An Approach for Weakly-Supervised Deep Information Retrieval

by Sean MacAvaney, Kai Hui, Andrew Yates

Released as an article.

2017  

Abstract

Recent developments in neural information retrieval models have been promising, but a problem remains: human relevance judgments are expensive to produce, while neural models require a considerable amount of training data. In an attempt to fill this gap, we present an approach that, given a weak training set of pseudo-queries, documents, and relevance information, filters the data to produce effective positive and negative query-document pairs. This allows large corpora to be used as neural IR model training data, while eliminating training examples that do not transfer well to relevance scoring. The filters include unsupervised ranking heuristics and a novel measure of interaction similarity. We evaluate our approach using a news corpus with article headlines acting as pseudo-queries and article content as documents, with implicit relevance between an article's headline and its content. By using our approach to train state-of-the-art neural IR models and comparing to established baselines, we find that training data generated by our approach can lead to good results on a benchmark test collection.
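The ranking-heuristic filter described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical illustration and not the authors' implementation: it assumes BM25 as the unsupervised ranking heuristic, treats each headline as relevant only to its own article, keeps a headline-article pair only if BM25 ranks that article within the top `top_k` results for its headline, and takes negatives from the lowest-ranked other articles. The names `BM25`, `filter_pairs`, `top_k`, and `num_neg` are illustrative assumptions, and the paper's interaction-similarity filter is not modeled here.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive whitespace tokenizer (assumption, for illustration only).
    return text.lower().split()


class BM25:
    """Plain BM25 scorer over a small in-memory corpus (illustrative sketch)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.n_docs = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n_docs
        self.tf = [Counter(d) for d in self.docs]
        df = Counter()
        for d in self.docs:
            df.update(set(d))
        # IDF with +1 inside the log so scores stay non-negative.
        self.idf = {
            t: math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))
            for t, n in df.items()
        }

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):
            f = tf.get(t, 0)
            if f == 0:
                continue
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * f * (self.k1 + 1) / (f + norm)
        return total

    def rank(self, query):
        # Document indices, best first; Python's sort is stable, so ties keep index order.
        return sorted(range(self.n_docs), key=lambda i: self.score(query, i), reverse=True)


def filter_pairs(headlines, articles, top_k=1, num_neg=1):
    """Hypothetical filter: headline i is a pseudo-query whose relevant document is article i.

    Returns (positive_index, [negative_indices]) for each headline that survives.
    """
    bm25 = BM25(articles)
    kept = []
    for i, headline in enumerate(headlines):
        ranking = bm25.rank(headline)
        if i not in ranking[:top_k]:
            continue  # Drop: the heuristic does not retrieve the headline's own article.
        # Negatives: the lowest-ranked articles other than the positive one.
        negatives = [j for j in reversed(ranking) if j != i][:num_neg]
        kept.append((i, negatives))
    return kept


headlines = [
    "stocks fall on rate fears",
    "local team wins championship",
    "new phone released today",
]
articles = [
    "stocks fell sharply as investors weighed rate fears and inflation",
    "the local team won the championship after a tense final",
    "markets were quiet as traders waited",
]

print(filter_pairs(headlines, articles))  # [(0, [2]), (1, [2])]
```

In this toy corpus the third headline shares no terms with its article, so the filter discards it; this mirrors the abstract's goal of removing training examples that would not transfer well to relevance scoring.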

Archived Files and Locations

application/pdf  646.5 kB
file_rddsop7u45f4vcbuwz34ty4obq
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2017-07-24
Version   v2
Language   en
arXiv  1707.00189v2
Catalog Record
Revision: cc079f08-4d95-4c4b-987b-46f043f5b068