Convolutional Recurrent Neural Networks for Small-Footprint Keyword
Spotting
by
Sercan O. Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew
Gibiansky, Chris Fougner, Ryan Prenger, Adam Coates
2017
Abstract
Keyword spotting (KWS) is a major component of human-technology interfaces.
The goals of KWS are to maximize detection accuracy at a low false alarm (FA)
rate while minimizing footprint size, latency, and complexity. To this end,
we study Convolutional Recurrent Neural Networks (CRNNs).
Inspired by large-scale state-of-the-art speech recognition systems,
we combine the strengths of convolutional layers and recurrent layers to
exploit local structure and long-range context. We analyze the effect of
architecture parameters, and propose training strategies to improve
performance. With only ~230k parameters, our CRNN model yields acceptably low
latency and achieves 97.71% accuracy at 0.5 FA/hour at a 5 dB signal-to-noise
ratio.
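
To make the notion of "footprint" concrete, the following is a minimal sketch of how the parameter count of a convolutional-recurrent stack might be tallied. Every layer size in it (kernel shape, channel count, hidden sizes, the 96 features per time step) is a hypothetical value chosen for illustration, not the configuration reported by the authors, and the function names are assumptions of this sketch. The GRU formula assumes the common variant with a single bias vector per gate.

```python
# Hypothetical parameter-count sketch for a small CRNN.
# All layer sizes below are illustrative assumptions, not the paper's values.

def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    """Convolution weights plus one bias per output channel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def gru_params(input_size, hidden_size, bidirectional=False):
    """Three gates, each with input weights, recurrent weights, and one bias vector."""
    per_direction = 3 * (
        input_size * hidden_size + hidden_size * hidden_size + hidden_size
    )
    return per_direction * (2 if bidirectional else 1)


def dense_params(input_size, output_size):
    """Fully connected weights plus one bias per output unit."""
    return input_size * output_size + output_size


def crnn_params():
    """Sum the parameters of a hypothetical conv -> 2x bi-GRU -> dense -> output stack."""
    features_per_step = 96  # assumed size of the flattened conv output per time step
    hidden = 32             # assumed GRU hidden size per direction
    layers = [
        conv2d_params(20, 5, 1, 32),                           # one conv layer
        gru_params(features_per_step, hidden, bidirectional=True),
        gru_params(2 * hidden, hidden, bidirectional=True),    # input is both directions
        dense_params(2 * hidden, 64),
        dense_params(64, 2),                                   # keyword vs. non-keyword
    ]
    return sum(layers)


if __name__ == "__main__":
    print(f"total parameters: {crnn_params():,}")
```

In a tally of this kind, the recurrent layers tend to dominate the count, because each GRU gate carries both an input matrix and a recurrent matrix. The convolutional layer stays comparatively small thanks to weight sharing.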
arXiv: 1703.05390v1