A methodology for using crowdsourced data to measure uncertainty in natural speech

by Lara Martin, Matthew Stone, Florian Metze, Jack Mostow

Published by Figshare.



People sometimes express uncertainty unconsciously, adding layers of meaning on top of their speech and conveying doubts about the accuracy of the information they are trying to communicate. In this paper, we propose a methodology for annotating uncertainty, which is usually a subjective and expensive process, by using crowdsourcing. In our experiment, we used an online database of colors that more than 200,000 users have named. Based on the number of unique names that users gave each color, an entropy value was calculated to represent the color's uncertainty level. A model, which performed better than chance, was created to predict whether the color a participant was describing was ambiguous or borderline, given certain prosodic cues of their speech when asked to name the color aloud. Using crowdsourced data can greatly streamline the process of annotating uncertainty, but our methods have yet to be tested in domains other than color. By using methods such as ours to measure prosodic attributes of uncertainty, it should be possible to increase the accuracy of voice search.
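The abstract's entropy score can be illustrated with a short sketch. This is an assumption about the general approach, not the authors' exact formula: it computes the Shannon entropy (in bits) of the distribution of names that crowd users gave a single color, so a color everyone names the same way scores 0, and a color that draws many different names scores higher. The function name `name_entropy` and the sample name lists are hypothetical.

```python
from collections import Counter
from math import log2

def name_entropy(names):
    """Shannon entropy (bits) of the name distribution for one color.

    A plausible sketch of the uncertainty score described in the abstract:
    more distinct, evenly used names -> higher entropy -> more ambiguous color.
    """
    counts = Counter(names)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A color everyone agrees on has zero entropy.
print(name_entropy(["red"] * 10))  # 0.0

# Four equally frequent names give log2(4) = 2.0 bits.
print(name_entropy(["teal", "cyan", "turquoise", "aqua"]))  # 2.0
```

Under this reading, a classifier like the one in the paper would take prosodic features of the spoken color name as input and the (thresholded) entropy as the label.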

Archived Files and Locations

application/pdf  299.2 kB
s3-eu-west-1.amazonaws.com (publisher)
web.archive.org (webarchive)
Type  article-journal
Stage   published
Date   2018-06-10
Catalog Record
Revision: 09b13c16-81fe-4490-bded-3ecdaab4c08f