WaveFlow: A Compact Flow-based Model for Raw Audio

by Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song

Released as an article.

2019  

Abstract

In this work, we present WaveFlow, a small-footprint generative flow for raw audio, which is trained with maximum likelihood without the probability density distillation and auxiliary losses used in Parallel WaveNet and ClariNet. It provides a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow as special cases. We systematically study these likelihood-based generative models for raw waveforms in terms of test likelihood and speech fidelity. We demonstrate that WaveFlow can synthesize speech with fidelity comparable to WaveNet, while requiring only a few sequential steps to generate very long waveforms with hundreds of thousands of time-steps. Furthermore, WaveFlow closes the significant likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Finally, our small-footprint WaveFlow has 5.91M parameters and can generate 22.05kHz high-fidelity speech 42.6 times faster than real-time on a GPU without engineered inference kernels.
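The abstract's key point is that WaveFlow is trained directly by maximum likelihood: an invertible transform maps the waveform to a latent with a simple prior, and the exact log-likelihood follows from the change-of-variables formula, so no teacher distillation or auxiliary losses are needed. As a minimal sketch (not the paper's architecture), the function below evaluates this objective for a toy element-wise affine flow; a real model such as WaveFlow stacks many learned invertible layers in place of the single affine transform assumed here.

```python
import numpy as np

def affine_flow_log_likelihood(x, log_scale, shift):
    """Exact log p(x) for a toy invertible affine flow z = (x - shift) * exp(-log_scale)
    with a standard normal prior on z, via log p(x) = log p_z(z) + log |dz/dx|.
    This is the maximum-likelihood objective flow-based models optimize directly."""
    z = (x - shift) * np.exp(-log_scale)
    log_prior = -0.5 * (z ** 2 + np.log(2.0 * np.pi))  # standard normal log-density per sample
    log_det = -log_scale                               # log |dz/dx| per sample (element-wise flow)
    return np.sum(log_prior + log_det)

# Usage: evaluate the exact likelihood of a short "waveform" in nats.
x = np.array([0.1, -0.2, 0.05, 0.0])
ll = affine_flow_log_likelihood(x, log_scale=0.0, shift=0.0)
```

Training maximizes this quantity (equivalently, minimizes negative log-likelihood, often reported as bits per dimension) by gradient ascent on the transform's parameters.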

Archived Files and Locations

application/pdf  470.3 kB
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2019-12-03
Version   v1
Language   en
arXiv  1912.01219v1
Work Entity
access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: d56979ef-0f34-4298-8c2c-f1402e96a6ec