Parallel WaveGAN: A fast waveform generation model based on generative
adversarial networks with multi-resolution spectrogram
by
Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
2019
Abstract
We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint
waveform generation method using a generative adversarial network. In the
proposed method, a non-autoregressive WaveNet is trained by jointly optimizing
multi-resolution spectrogram and adversarial loss functions, which can
effectively capture the time-frequency distribution of the realistic speech
waveform. As our method does not require density distillation used in the
conventional teacher-student framework, the entire model can be easily trained
even with a small number of parameters. In particular, the proposed Parallel
WaveGAN has only 1.44 M parameters and can generate 24 kHz speech waveforms
28.68 times faster than real-time on a single GPU. Perceptual listening test
results verify that our proposed method achieves a 4.16 mean opinion score
within a Transformer-based text-to-speech framework, which is comparable to
that of the best distillation-based Parallel WaveNet system.
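The multi-resolution spectrogram objective described above combines, at several STFT configurations, a spectral-convergence term and a log-magnitude term between generated and ground-truth waveforms. A minimal NumPy sketch of that idea follows; the `stft_magnitude` helper and the specific FFT/hop/window values are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def stft_magnitude(x, fft_size, hop, win_len):
    # Hann-windowed framed FFT magnitudes (simplified: no padding/centering).
    window = np.hanning(win_len)
    frames = [x[s:s + win_len] * window
              for s in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), n=fft_size, axis=1))

def multi_resolution_stft_loss(y_hat, y,
                               resolutions=((512, 128, 512),
                                            (1024, 256, 1024),
                                            (2048, 512, 2048))):
    # Average of spectral-convergence + log-magnitude terms over
    # several (fft_size, hop, win_len) resolutions; values are illustrative.
    eps = 1e-7
    total = 0.0
    for fft_size, hop, win_len in resolutions:
        S_hat = stft_magnitude(y_hat, fft_size, hop, win_len)
        S = stft_magnitude(y, fft_size, hop, win_len)
        # Spectral convergence: Frobenius-norm ratio of the magnitude error.
        sc = np.linalg.norm(S - S_hat) / (np.linalg.norm(S) + eps)
        # Log STFT-magnitude distance (mean absolute error).
        mag = np.mean(np.abs(np.log(S + eps) - np.log(S_hat + eps)))
        total += sc + mag
    return total / len(resolutions)
```

In training, this spectral loss would be added to the adversarial loss of the non-autoregressive WaveNet generator; identical waveforms yield zero loss, and mismatched time-frequency structure at any resolution increases it.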
Archived Files and Locations
application/pdf, 343.3 kB
arxiv.org (repository), web.archive.org (webarchive)
arXiv: 1910.11480v1