Pushing the Limits of Non-Autoregressive Speech Recognition

by Edwin G. Ng, Chung-Cheng Chiu, Yu Zhang, William Chan

Released as an article.

2021  

Abstract

We apply recent advancements in end-to-end speech recognition to non-autoregressive automatic speech recognition. We push the limits of non-autoregressive state-of-the-art results on multiple datasets: LibriSpeech, Fisher+Switchboard, and Wall Street Journal. Key to our recipe, we leverage CTC on giant Conformer neural network architectures with SpecAugment and wav2vec2 pre-training. We achieve 1.8%/3.6% WER on the LibriSpeech test/test-other sets, 5.1%/9.8% WER on Switchboard, and 3.4% WER on Wall Street Journal, all without a language model.
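
The decoder side of this recipe is plain CTC, which is what makes the model non-autoregressive: the encoder scores every output frame in parallel, and the transcript is recovered by collapsing those frame-level predictions rather than generating tokens one at a time. The following is a minimal greedy CTC decoding sketch in Python/NumPy to illustrate that collapse step; the blank index, array shapes, and function name are illustrative assumptions, not the authors' code.

    # Minimal greedy (non-autoregressive) CTC decoding sketch.
    # Assumptions: blank has index 0, and the encoder output is a
    # (time, vocab) matrix of log-probabilities.
    import numpy as np

    BLANK = 0  # conventional CTC blank index (assumption)

    def ctc_greedy_decode(log_probs: np.ndarray) -> list[int]:
        """Collapse a (time, vocab) log-prob matrix into a token sequence.

        1. Take the argmax token at every frame; this happens for all
           frames at once, which is what makes decoding non-autoregressive.
        2. Merge consecutive repeated tokens.
        3. Drop blanks.
        """
        best = log_probs.argmax(axis=-1)  # (time,) frame-wise argmax
        tokens = []
        prev = None
        for t in best:
            if t != prev and t != BLANK:
                tokens.append(int(t))
            prev = t
        return tokens

    # Toy usage: 6 frames over a 4-symbol vocabulary {blank, a, b, c}.
    rng = np.random.default_rng(0)
    frames = np.log(rng.dirichlet(np.ones(4), size=6))
    print(ctc_greedy_decode(frames))

Because the argmax is taken over all frames simultaneously, decoding cost does not grow with output length; this is the latency advantage non-autoregressive CTC models trade against CTC's conditional-independence assumption between output frames.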

Archived Files and Locations

application/pdf  127.3 kB
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2021-06-16
Version   v3
Language   en
arXiv  2104.03416v3
Work Entity
access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: b7b45f8b-1b0d-4232-8c28-2b3af3b5df83