VALUE: Understanding Dialect Disparity in NLU
by
Caleb Ziems, Jiaao Chen, Camille Harris, Jessica Anderson, Diyi Yang
2022
Abstract
English Natural Language Understanding (NLU) systems have achieved strong
performance, even outperforming humans, on benchmarks like GLUE and
SuperGLUE. However, these benchmarks contain only textbook Standard American
English (SAE). Other dialects have been largely overlooked in the NLP
community. This leads to biased and inequitable NLU systems that serve only a
sub-population of speakers. To understand disparities in current models and to
facilitate more dialect-competent NLU systems, we introduce the VernAcular
Language Understanding Evaluation (VALUE) benchmark, a challenging variant of
GLUE that we created with a set of lexical and morphosyntactic transformation
rules. In this initial release (V.1), we construct rules for 11 features of
African American Vernacular English (AAVE), and we recruit fluent AAVE speakers
to validate each feature transformation via linguistic acceptability judgments
in a participatory design manner. Experiments show that these new dialectal
features can lead to a drop in model performance.
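As a rough illustration of the kind of rule-based SAE-to-dialect rewriting the abstract describes, the toy function below applies one well-documented AAVE feature, zero copula/auxiliary (dropping present-tense "is"/"are" after a subject pronoun). This is a hypothetical sketch for intuition only, not the authors' actual transformation rules, which cover 11 features and are validated by fluent AAVE speakers.

```python
import re

def drop_copula(sentence: str) -> str:
    """Toy zero-copula rule: delete present-tense 'is'/'are'
    between a subject pronoun and its predicate."""
    return re.sub(
        r"\b(he|she|they|we|you)\s+(?:is|are)\s+",
        r"\1 ",
        sentence,
        flags=re.IGNORECASE,
    )

print(drop_copula("She is going to the store."))  # She going to the store.
print(drop_copula("They are nice."))              # They nice.
```

Transformations like this, applied to GLUE inputs, produce the dialectal variants whose effect on model accuracy the paper measures.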
Archived Files and Locations
application/pdf, 454.0 kB, arXiv:2204.03031v1
(arxiv.org repository; web.archive.org webarchive)