Pre-Training for Query Rewriting in a Spoken Language Understanding System
by
Zheng Chen, Xing Fan, Yuan Ling, Lambert Mathias, Chenlei Guo
2020
Abstract
Query rewriting (QR) is an increasingly important technique to reduce
customer friction caused by errors in a spoken language understanding pipeline,
where the errors originate from various sources such as speech recognition
errors, language understanding errors, or entity resolution errors. In this
work, we first propose a neural-retrieval based approach for query rewriting.
Then, inspired by the wide success of pre-trained contextual language
embeddings, and also as a way to compensate for insufficient QR training data,
we propose a language-modeling (LM) based approach to pre-train query
embeddings on historical user conversation data with a voice assistant. In
addition, we propose to use the NLU hypotheses generated by the language
understanding system to augment the pre-training. Our experiments show that
pre-training provides rich prior information and helps the QR task achieve
strong performance. We also show that joint pre-training with NLU hypotheses
has further benefit. Finally, after pre-training, we find that a small set of
rewrite pairs is enough to fine-tune the QR model to outperform a strong
baseline fully trained on all QR training data.
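To make the retrieval formulation concrete, the sketch below shows the basic shape of a neural-retrieval rewriter: an encoder maps the (possibly defective) query to a fixed-size embedding, and the rewrite is the nearest neighbor among embeddings of historical successful queries. The encode function, the candidate list, and the similarity scoring here are illustrative assumptions, not the paper's actual model; a real system would use a trained neural encoder such as the LM-pre-trained query embeddings described in the abstract.

# Minimal neural-retrieval query rewriting sketch (illustrative only).
# A hashed bag-of-words stands in for a trained neural encoder so the
# example runs end to end without any model weights.
import numpy as np

DIM = 256

def encode(query: str) -> np.ndarray:
    """Map a query to a unit-norm fixed-size embedding (toy stand-in encoder)."""
    vec = np.zeros(DIM)
    for token in query.lower().split():
        vec[hash(token) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Index of historical successful queries (the rewrite candidates).
candidates = [
    "play the weather report",
    "turn on the living room lights",
    "set a timer for ten minutes",
]
index = np.stack([encode(c) for c in candidates])

def rewrite(query: str) -> str:
    """Return the candidate whose embedding is closest (cosine) to the query."""
    scores = index @ encode(query)  # dot product of unit vectors = cosine
    return candidates[int(np.argmax(scores))]

print(rewrite("play the whether report"))  # -> "play the weather report"

In this toy version the misrecognized query still shares most tokens with the correct candidate, so retrieval recovers the intended request; with a trained encoder, the same nearest-neighbor lookup works even when the surface forms overlap little.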
Archived Files and Locations
application/pdf, 431.1 kB (file_6mgggt7h2zenbbigx3qvkfpvt4)
arxiv.org (repository); web.archive.org (webarchive)
arXiv: 2002.05607v1