An Approach for Weakly-Supervised Deep Information Retrieval
by
Sean MacAvaney, Kai Hui, Andrew Yates
2017
Abstract
Recent developments in neural information retrieval models have been
promising, but a problem remains: human relevance judgments are expensive to
produce, while neural models require a considerable amount of training data. In
an attempt to fill this gap, we present an approach that---given a weak
training set of pseudo-queries, documents, and relevance information---filters
the data to produce effective positive and negative query-document pairs. This
allows large corpora to be used as neural IR model training data, while
eliminating training examples that do not transfer well to relevance scoring.
The filters include unsupervised ranking heuristics and a novel measure of
interaction similarity. We evaluate our approach using a news corpus with
article headlines acting as pseudo-queries and article content as documents,
with implicit relevance between an article's headline and its content. By using
our approach to train state-of-the-art neural IR models and comparing to
established baselines, we find that training data generated by our approach can
lead to good results on a benchmark test collection.
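The abstract does not specify the filters beyond "unsupervised ranking heuristics" and "interaction similarity." As a minimal sketch of the score-based filtering idea, the following assumes a simple TF-IDF cosine heuristic between headline (pseudo-query) and article content, with hypothetical thresholds for selecting positive and negative training pairs; the paper's actual filters may differ.

```python
import math
from collections import Counter

def tfidf_cosine(query_terms, doc_terms, df, n_docs):
    """Cosine similarity between TF-IDF vectors of a pseudo-query and a document.

    df: term -> document frequency; n_docs: corpus size (both assumed precomputed).
    """
    q_tf, d_tf = Counter(query_terms), Counter(doc_terms)

    def idf(t):
        # Smoothed inverse document frequency.
        return math.log((n_docs + 1) / (df.get(t, 0) + 1))

    dot = sum(q_tf[t] * d_tf[t] * idf(t) ** 2 for t in q_tf)
    q_norm = math.sqrt(sum((q_tf[t] * idf(t)) ** 2 for t in q_tf))
    d_norm = math.sqrt(sum((d_tf[t] * idf(t)) ** 2 for t in d_tf))
    return dot / (q_norm * d_norm) if q_norm and d_norm else 0.0

def filter_pairs(pairs, df, n_docs, pos_threshold=0.3, neg_threshold=0.05):
    """Split (headline, article) pairs into positive and negative training
    examples by unsupervised score; mid-range pairs are discarded as noisy.
    Thresholds are illustrative placeholders, not values from the paper."""
    positives, negatives = [], []
    for headline, article in pairs:
        score = tfidf_cosine(headline.split(), article.split(), df, n_docs)
        if score >= pos_threshold:
            positives.append((headline, article))
        elif score <= neg_threshold:
            negatives.append((headline, article))
    return positives, negatives
```

The resulting positive and negative pairs could then serve as weak supervision for training a neural IR model, with the discarded mid-range pairs corresponding to examples that "do not transfer well to relevance scoring."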
Archived Files and Locations
application/pdf, 646.5 kB
arxiv.org (repository); web.archive.org (webarchive)
arXiv:1707.00189v2