How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions

by Zewei Chu, Mingda Chen, Jing Chen, Miaosen Wang, Kevin Gimpel, Manaal Faruqui, Xiance Si

Published in PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE by Association for the Advancement of Artificial Intelligence (AAAI).

2020, Volume 34, Issue 05, pp. 7586-7593

Abstract

We present a large-scale dataset for the task of rewriting an ill-formed natural language question into a well-formed one. Our multi-domain question rewriting (MQR) dataset is constructed from human-contributed Stack Exchange question edit histories. The dataset contains 427,719 question pairs drawn from 303 domains. We provide human annotations for a subset of the dataset as a quality estimate. When moving from ill-formed to well-formed questions, the question quality improves by an average of 45 points across three aspects. We train sequence-to-sequence neural models on the constructed dataset and obtain an improvement of 13.2% in BLEU-4 over baseline methods built from other data resources. We release the MQR dataset to encourage research on the problem of question rewriting.
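
For readers unfamiliar with the BLEU-4 metric cited in the abstract, the following is a minimal sketch of how a sentence-level BLEU-4 score can be computed against a single reference. It is not the authors' evaluation code: whitespace tokenization, the absence of smoothing, and the function names `ngrams` and `bleu4` are all assumptions made for illustration, and published results are usually reported at corpus level with a standard toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against one reference, without smoothing.

    Assumption: both inputs are plain strings tokenized on whitespace.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any empty precision zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the four modified precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / 4)


# Hypothetical sentence pair, not taken from the MQR dataset.
score = bleu4("the cat sat on the mat", "the cat sat on a mat")
print(round(score, 4))
```

Under this sketch, a rewrite identical to its reference scores 1.0, and a rewrite sharing no 4-gram with the reference scores 0.0, which is why sentence-level use in practice normally adds smoothing.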

Archived Files and Locations

application/pdf  478.3 kB
file_hc2n3vnswvd6xmy24ee2gy2h2m
aaai.org (web)
web.archive.org (webarchive)
Type: article-journal
Stage: published
Date: 2020-04-03
Proceedings Metadata
Not in DOAJ
Not in Keepers Registry
ISSN-L: 2159-5399