Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

by Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

Released as an article.

2019  

Abstract

We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network. In the proposed method, a non-autoregressive WaveNet is trained by jointly optimizing multi-resolution spectrogram and adversarial loss functions, which can effectively capture the time-frequency distribution of realistic speech waveforms. As our method does not require the density distillation used in the conventional teacher-student framework, the entire model can be easily trained even with a small number of parameters. In particular, the proposed Parallel WaveGAN has only 1.44 M parameters and can generate a 24 kHz speech waveform 28.68 times faster than real time in a single-GPU environment. Perceptual listening test results verify that our proposed method achieves a 4.16 mean opinion score within a Transformer-based text-to-speech framework, which is comparable to the best distillation-based Parallel WaveNet system.
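The multi-resolution spectrogram objective described in the abstract can be sketched as an average of spectral-convergence and log-magnitude losses computed with several STFT settings. The following is a minimal NumPy illustration of that idea, not the authors' implementation; the FFT sizes, hop sizes, and window lengths below are illustrative placeholders rather than the paper's actual settings.

```python
import numpy as np

def stft_magnitude(x, fft_size, hop, win_len):
    """Magnitude STFT of a 1-D signal via framed FFT with a Hann window."""
    window = np.hanning(win_len)
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        frame = x[start:start + win_len] * window
        frames.append(np.abs(np.fft.rfft(frame, n=fft_size)))
    return np.array(frames)  # shape: (num_frames, fft_size // 2 + 1)

def multi_resolution_stft_loss(
    y_hat, y,
    resolutions=((512, 128, 512), (1024, 256, 1024), (2048, 512, 2048)),
):
    """Average spectral-convergence + log-magnitude distance over
    several (fft_size, hop, win_len) STFT resolutions."""
    total = 0.0
    for fft_size, hop, win_len in resolutions:
        S_hat = stft_magnitude(y_hat, fft_size, hop, win_len)
        S = stft_magnitude(y, fft_size, hop, win_len)
        # Spectral convergence: relative Frobenius-norm error of magnitudes.
        sc = np.linalg.norm(S - S_hat) / (np.linalg.norm(S) + 1e-8)
        # Log-magnitude distance (small floor avoids log of zero).
        mag = np.mean(np.abs(np.log(S + 1e-7) - np.log(S_hat + 1e-7)))
        total += sc + mag
    return total / len(resolutions)
```

Using several resolutions keeps the generator from overfitting one fixed time-frequency trade-off: a loss that is zero only when the waveform matches at multiple analysis scales.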

Archived Files and Locations

application/pdf  343.3 kB
file_2tcpajv7xre4dm5xspsshn56te
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2019-10-25
Version   v1
Language   en
arXiv  1910.11480v1
Work Entity
Access all versions, variants, and formats of this work (e.g., pre-prints).
Catalog Record
Revision: 8430b301-7582-447a-a6e8-b8ecec3cf29c