Pre-Training for Query Rewriting in A Spoken Language Understanding System

by Zheng Chen, Xing Fan, Yuan Ling, Lambert Mathias, Chenlei Guo

Released as an article.

2020  

Abstract

Query rewriting (QR) is an increasingly important technique for reducing customer friction caused by errors in a spoken language understanding pipeline, where the errors originate from various sources such as speech recognition, language understanding, or entity resolution. In this work, we first propose a neural-retrieval based approach for query rewriting. Then, inspired by the wide success of pre-trained contextual language embeddings, and also as a way to compensate for insufficient QR training data, we propose a language-modeling (LM) based approach to pre-train query embeddings on historical user conversation data with a voice assistant. In addition, we propose to use the NLU hypotheses generated by the language understanding system to augment the pre-training. Our experiments show that pre-training provides rich prior information and helps the QR task achieve strong performance. We also show that joint pre-training with NLU hypotheses provides further benefit. Finally, after pre-training, we find that a small set of rewrite pairs is enough to fine-tune the QR model to outperform a strong baseline trained on the full QR training data.
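As a rough illustration of the neural-retrieval formulation described in the abstract, the sketch below encodes queries into fixed-size embeddings and proposes a rewrite by nearest-neighbor search over an index of known-good queries. The encoder architecture, vocabulary, tokenization, and index contents here are placeholders and not taken from the paper; this is a minimal sketch of the retrieval idea, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class QueryEncoder(nn.Module):
    """Encode a token-id sequence into a unit-length query embedding."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):                  # (batch, seq_len)
        emb = self.embed(token_ids)                # (batch, seq_len, embed_dim)
        _, h = self.rnn(emb)                       # h: (1, batch, hidden_dim)
        return F.normalize(h.squeeze(0), dim=-1)   # unit-length embeddings

# Hypothetical rewrite index: embeddings of historically successful queries.
encoder = QueryEncoder(vocab_size=1000)
index_queries = ["play thriller by michael jackson", "turn on kitchen lights"]
index_ids = torch.randint(0, 1000, (len(index_queries), 6))  # stand-in tokenization
index_emb = encoder(index_ids)                                # (num_candidates, hidden_dim)

# Retrieve a rewrite for a (possibly defective) live query via cosine similarity;
# the dot product equals cosine similarity because embeddings are normalized.
live_ids = torch.randint(0, 1000, (1, 6))
scores = encoder(live_ids) @ index_emb.T
best = scores.argmax(dim=-1).item()
print("proposed rewrite:", index_queries[best])

In the paper's setting, the encoder would instead be pre-trained with an LM-style objective on historical conversation data (optionally jointly with NLU hypotheses) and then fine-tuned on a small set of rewrite pairs.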

Archived Files and Locations

application/pdf  431.1 kB
file_6mgggt7h2zenbbigx3qvkfpvt4
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2020-02-13
Version   v1
Language   en
arXiv  2002.05607v1
Catalog Record
Revision: cc047582-b4f8-419e-b081-e26e8a36e63d