Bandwidth Extension on Raw Audio via Generative Adversarial Networks
by Sung Kim, Visvesh Sathe
2019
Abstract
Neural network-based methods have recently demonstrated state-of-the-art
results on image synthesis and super-resolution tasks, in particular by using
variants of generative adversarial networks (GANs) with supervised feature
losses. Nevertheless, previous feature loss formulations rely on large
auxiliary classifier networks and on the labeled datasets needed to train
such classifiers. Furthermore, there has been
comparatively little work to explore the applicability of GAN-based methods to
domains other than images and video. In this work we explore a GAN-based method
for audio processing, and develop a convolutional neural network architecture
to perform audio super-resolution. In addition to several new architectural
building blocks for audio processing, a key component of our approach is the
use of an autoencoder-based loss that enables training in the GAN framework,
with feature losses derived from unlabeled data. We explore the impact of our
architectural choices and demonstrate significant improvements over previous
work in terms of both objective and perceptual quality.
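The abstract does not give the loss formulation, so the following is only a
minimal sketch of the general idea of an autoencoder-based feature loss, not
the authors' implementation. It assumes that the encoder half of an
autoencoder (trained on unlabeled audio) maps a waveform to feature vectors,
and that the generator is penalized for the distance between the features of
its output and those of the high-resolution target. The names make_encoder,
feature_loss, generator_loss, and feat_weight are hypothetical, the encoder
here is a toy strided 1-D convolution with fixed random weights standing in
for a pretrained one, and the non-saturating adversarial term is an assumed
choice.

```python
import math
import random


def make_encoder(kernel_size=8, stride=4, channels=4, seed=0):
    """Toy strided 1-D conv encoder with fixed random weights.

    Assumption: stands in for the encoder half of an autoencoder that
    was pretrained on unlabeled audio (no training happens here).
    """
    rng = random.Random(seed)
    kernels = [[rng.uniform(-1.0, 1.0) for _ in range(kernel_size)]
               for _ in range(channels)]

    def encode(signal):
        feats = []
        for start in range(0, len(signal) - kernel_size + 1, stride):
            window = signal[start:start + kernel_size]
            feats.append([math.tanh(sum(w * x for w, x in zip(k, window)))
                          for k in kernels])
        return feats

    return encode


def feature_loss(encode, generated, target):
    """Mean squared distance between encoder features of two waveforms."""
    total, count = 0.0, 0
    for f_gen, f_tgt in zip(encode(generated), encode(target)):
        for a, b in zip(f_gen, f_tgt):
            total += (a - b) ** 2
            count += 1
    return total / count


def generator_loss(encode, generated, target, disc_score, feat_weight=1.0):
    """Adversarial term plus weighted feature loss (assumed combination).

    disc_score is the discriminator's probability, in (0, 1], that the
    generated waveform is real.
    """
    adversarial = -math.log(disc_score)
    return adversarial + feat_weight * feature_loss(encode, generated, target)


# Usage: a 440 Hz tone at 16 kHz as the target, an attenuated copy as
# a stand-in for generator output.
encode = make_encoder()
target = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(256)]
generated = [0.9 * x for x in target]

same = feature_loss(encode, target, target)      # identical inputs give 0.0
diff = feature_loss(encode, generated, target)   # mismatched inputs give > 0
total = generator_loss(encode, generated, target, disc_score=0.5)
```

Because the encoder is learned by reconstruction rather than classification,
a loss of this shape needs no labels, which is the property the abstract
emphasizes.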
arXiv:1903.09027v1