TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids
by
Igor Fedorov, Marko Stamenovic, Carl Jensen, Li-Chia Yang, Ari Mandell, Yiming Gan, Matthew Mattina, Paul N. Whatmough
2020
Abstract
Modern speech enhancement algorithms achieve remarkable noise suppression by
means of large recurrent neural networks (RNNs). However, large RNNs limit
practical deployment in hearing aid hardware (HW) form-factors, which are
battery powered and run on resource-constrained microcontroller units (MCUs)
with limited memory capacity and compute capability. In this work, we use model
compression techniques to bridge this gap. We define the constraints imposed on
the RNN by the HW and describe a method to satisfy them. Although model
compression techniques are an active area of research, we are the first to
demonstrate their efficacy for RNN speech enhancement, using pruning and
integer quantization of weights/activations. We also demonstrate state update
skipping, which reduces the computational load. Finally, we conduct a
perceptual evaluation of the compressed models to verify audio quality with
human raters. Results show reductions in model size and operations of 11.9×
and 2.9×, respectively, over the baseline, with no statistically significant
difference in listening preference and only a 0.55 dB loss in SDR. Our model
achieves a computational latency of 2.39 ms, well within the 10 ms target and
351× better than previous work.
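The abstract names pruning and integer quantization of weights/activations but does not spell out the scheme. As a point of reference, below is a minimal NumPy sketch of two common variants, magnitude pruning and symmetric per-tensor int8 quantization; the function names, the 0.8 sparsity target, and the per-tensor scale are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.8):
    """Zero out the smallest-magnitude weights until roughly `sparsity` is reached."""
    k = int(w.size * sparsity)
    thresh = np.partition(np.abs(w).ravel(), k)[k]
    return np.where(np.abs(w) < thresh, np.float32(0.0), w)

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization; returns integer weights and a scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # stand-in for an LSTM gate weight
w_sparse = magnitude_prune(w, sparsity=0.8)
q, scale = quantize_int8(w_sparse)
w_dequant = q.astype(np.float32) * scale  # the approximation seen at inference time
print(f"sparsity: {np.mean(q == 0):.2f}, max abs error: {np.abs(w_dequant - w_sparse).max():.4f}")
```

Pruning shrinks model size (zeros compress well and can be skipped on sparse-aware kernels), while int8 quantization cuts both memory and the cost of each multiply-accumulate on an MCU.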
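State update skipping is likewise only named here. One plausible realization, sketched below, reuses the previous LSTM state whenever the input frame changes little, saving the recurrent matrix multiplies; the input-change threshold `delta` and the norm-based criterion are assumptions for illustration, not necessarily the paper's mechanism.

```python
import numpy as np

def lstm_step(x, h, c, params):
    """One standard LSTM cell update with concatenated gate weights (i, f, g, o)."""
    W, U, b = params
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c = sig(f) * c + sig(i) * np.tanh(g)
    return sig(o) * np.tanh(c), c

def run_with_skipping(xs, params, hidden, delta=0.5):
    """Skip the state update when the input barely changed; carry h, c forward."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    x_prev = None
    skipped = 0
    for x in xs:
        if x_prev is not None and np.linalg.norm(x - x_prev) < delta:
            skipped += 1  # reuse previous state, avoiding the matmuls entirely
        else:
            h, c = lstm_step(x, h, c, params)
            x_prev = x
    return h, skipped

rng = np.random.default_rng(0)
in_dim, H = 64, 128
params = (rng.standard_normal((in_dim, 4 * H)) * 0.1,
          rng.standard_normal((H, 4 * H)) * 0.1,
          np.zeros(4 * H))
xs = [rng.standard_normal(in_dim) * 0.01 for _ in range(100)]  # slowly varying frames
h, skipped = run_with_skipping(xs, params, hidden=H, delta=0.5)
print(f"skipped {skipped}/100 updates")
```

When consecutive audio frames are similar, most updates are skipped, which is where the reported reduction in operations would come from under this kind of scheme.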
arXiv: 2005.11138v1 (available at arxiv.org)