Pushing the Limits of Non-Autoregressive Speech Recognition
by Edwin G. Ng, Chung-Cheng Chiu, Yu Zhang, William Chan
Released as an article, 2021
Abstract
We combine recent advancements in end-to-end speech recognition and apply them to
non-autoregressive automatic speech recognition. We push the limits of
non-autoregressive state-of-the-art results for multiple datasets: LibriSpeech,
Fisher+Switchboard and Wall Street Journal. Key to our recipe, we leverage CTC
on giant Conformer neural network architectures with SpecAugment and wav2vec2
pre-training. We achieve 1.8%/3.6% WER on LibriSpeech test/test-other sets,
5.1%/9.8% WER on Switchboard, and 3.4% on the Wall Street Journal, all without
a language model.
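The recipe is non-autoregressive because CTC predicts every output frame independently of previously emitted tokens, so inference reduces to a single greedy pass over the encoder output followed by collapsing repeats and removing blanks. The sketch below illustrates only that decoding step under illustrative assumptions (a toy vocabulary, blank at index 0, and random stand-in logits in place of a CTC-trained Conformer encoder); it is not the paper's code.

```python
# Minimal sketch of greedy CTC decoding, the non-autoregressive inference step
# described in the abstract. The vocabulary, blank index, and random logits
# below are illustrative assumptions, not the paper's actual setup.
import numpy as np

BLANK = 0  # assumed CTC blank token index
VOCAB = ["<blank>", "a", "b", "c", " "]  # toy vocabulary for illustration


def ctc_greedy_decode(log_probs: np.ndarray) -> str:
    """Collapse repeated frame labels, then drop blanks.

    log_probs: (num_frames, vocab_size) per-frame log-probabilities.
    Each frame is argmax-ed independently, so no previously emitted token
    is consulted: this is what makes CTC inference non-autoregressive.
    """
    best_path = log_probs.argmax(axis=-1)  # (num_frames,)
    collapsed = [best_path[0]] + [
        t for prev, t in zip(best_path[:-1], best_path[1:]) if t != prev
    ]
    return "".join(VOCAB[t] for t in collapsed if t != BLANK)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_encoder_out = rng.standard_normal((12, len(VOCAB)))  # stand-in logits
    print(ctc_greedy_decode(fake_encoder_out))
```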
Archived Files and Locations
application/pdf, 127.3 kB
arxiv.org (repository), web.archive.org (webarchive)
arXiv: 2104.03416v3
Work Entity
access all versions, variants, and formats of this work (e.g., pre-prints)