Challenges in Generalization in Open Domain Question Answering
by Linqing Liu, Patrick Lewis, Sebastian Riedel, Pontus Stenetorp (2021)
Abstract
Recent work on Open Domain Question Answering has shown that there is a large
discrepancy in model performance between novel test questions and those that
largely overlap with training questions. However, it is unclear which aspects
of novel questions make them challenging. Drawing upon studies on systematic
generalization, we introduce and annotate questions according to three
categories that measure different levels and kinds of generalization: training
set overlap, compositional generalization (comp-gen), and novel-entity
generalization (novel-entity). When evaluating six popular parametric and
non-parametric models, we find that for the established Natural Questions and
TriviaQA datasets, even the strongest model's performance on
comp-gen/novel-entity questions is 13.1/5.4% and 9.6/1.5% lower than on the
full test set -- indicating the challenge posed by these types of questions.
Furthermore, we show that whilst non-parametric models can handle questions
containing novel entities relatively well, they struggle with those requiring
compositional generalization. Lastly, we find that the key factors behind
question difficulty are cascading errors from the retrieval component, the
frequency of the question pattern, and the frequency of the entity.
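To make the first category concrete, the sketch below checks whether a test question overlaps with the training set via normalized string matching. This is only an illustrative assumption: the paper's actual annotation of overlap, comp-gen, and novel-entity questions is more involved, and the function and variable names here are hypothetical.

```python
# Minimal sketch of detecting training-set question overlap, one of the
# three generalization categories named in the abstract. Normalized exact
# matching is an illustrative assumption, not the authors' method.
import re

def normalize(question: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    q = re.sub(r"[^\w\s]", "", question.lower())
    return " ".join(q.split())

def has_train_overlap(test_q: str, train_qs: set[str]) -> bool:
    """True if the test question matches a training question after normalization."""
    normalized_train = {normalize(q) for q in train_qs}
    return normalize(test_q) in normalized_train

train = {"Who wrote Hamlet?", "What is the capital of France?"}
print(has_train_overlap("who wrote hamlet", train))    # True: matches after normalization
print(has_train_overlap("Who wrote Macbeth?", train))  # False: novel question
```

Under this kind of scheme, questions that fail the overlap test would then be sorted into comp-gen or novel-entity depending on whether their entities were seen during training.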
arXiv: 2109.01156v2