Gender and Racial Bias in Visual Question Answering Datasets
by
Yusuke Hirota, Yuta Nakashima, Noa Garcia
2022
Abstract
Vision-and-language tasks have drawn increasing attention as a means
to evaluate human-like reasoning in machine learning models. A popular task in
the field is visual question answering (VQA), which aims to answer questions
about images. However, VQA models have been shown to exploit language bias by
learning the statistical correlations between questions and answers without
looking into the image content: e.g., questions about the color of a banana are
answered with yellow, even if the banana in the image is green. If societal
bias (e.g., sexism, racism, ableism, etc.) is present in the training data,
this problem may be causing VQA models to learn harmful stereotypes. For this
reason, we investigate gender and racial bias in five VQA datasets. In our
analysis, we find that the distribution of answers differs substantially
between questions about women and questions about men, and that detrimental
gender-stereotypical samples exist. Likewise, we identify that certain
race-related attributes are underrepresented, while potentially discriminatory
samples appear in the analyzed datasets. Our findings suggest that there are
dangers associated with using VQA datasets without considering and addressing
the potentially harmful stereotypes. We conclude the paper by proposing solutions
to alleviate the problem before, during, and after the dataset collection
process.
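
As an illustration of the kind of analysis described in the abstract, the following sketch shows one possible way to compare answer distributions between questions mentioning women and questions mentioning men in a VQA-style dataset. It is a minimal example, not the methodology used in the paper: the file name annotations.json, the keyword lists, and the VQA v2-style fields ("question", "multiple_choice_answer") are assumptions made for illustration.

    # Minimal sketch (not the authors' exact method): compare answer
    # distributions for questions mentioning women vs. men in a
    # VQA-style dataset. File name and keyword lists are illustrative.
    import json
    from collections import Counter

    FEMALE_WORDS = {"woman", "women", "girl", "girls", "lady", "she", "her"}
    MALE_WORDS = {"man", "men", "boy", "boys", "guy", "he", "his"}

    def answer_distribution(records, keywords):
        """Count answers for questions containing any of the given keywords."""
        counts = Counter()
        for rec in records:
            tokens = set(rec["question"].lower().rstrip("?").split())
            if tokens & keywords:
                counts[rec["multiple_choice_answer"]] += 1
        total = sum(counts.values())
        return {ans: n / total for ans, n in counts.items()} if total else {}

    def total_variation(p, q):
        """Total variation distance between two answer distributions."""
        answers = set(p) | set(q)
        return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in answers)

    if __name__ == "__main__":
        # Hypothetical merged question+answer annotation file.
        with open("annotations.json") as f:
            records = json.load(f)
        p_female = answer_distribution(records, FEMALE_WORDS)
        p_male = answer_distribution(records, MALE_WORDS)
        print("TV distance between answer distributions:",
              round(total_variation(p_female, p_male), 3))

A large distance under such a comparison would be one (rough) indicator that answers are distributed very differently across gendered questions, in the spirit of the analysis described above.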
Archived Files and Locations
application/pdf, 4.0 MB
arxiv.org (repository), web.archive.org (webarchive)
2205.08148v2