Unsupervised Source Separation via Bayesian Inference in the Latent Domain
by Michele Mancusi, Emilian Postolache, Marco Fumero, Andrea Santilli, Luca Cosmo, Emanuele Rodolà
2021
Abstract
State-of-the-art audio source separation models rely on supervised
data-driven approaches, which can be expensive in terms of labeling resources.
On the other hand, approaches for training these models without any direct
supervision are typically highly demanding in terms of memory and time
requirements, and remain impractical at inference time. We aim to
tackle these limitations by proposing a simple yet effective unsupervised
separation algorithm, which operates directly on a latent representation of
time-domain signals. Our algorithm relies on deep Bayesian priors in the form
of pre-trained autoregressive networks to model the probability distributions
of each source. We leverage the low cardinality of the discrete latent space,
trained with a novel loss term imposing a precise arithmetic structure on it,
to perform exact Bayesian inference without relying on an approximation
strategy. We validate our approach on the Slakh dataset arXiv:1909.08494,
demonstrating results in line with state of the art supervised approaches while
requiring fewer resources with respect to other unsupervised methods.
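The following is a minimal sketch of why a small discrete latent space makes exact Bayesian inference tractable; it is not the authors' implementation. With a codebook of K tokens, the posterior over a pair of source tokens (z1, z2) given a mixture token m, p(z1, z2 | m) ∝ p1(z1) p2(z2) p(m | z1, z2), can be computed exactly by enumerating all K x K pairs. Several assumptions are made for illustration only: first-order Markov transition tables stand in for the pre-trained autoregressive priors; `sum_likelihood` is a toy deterministic "arithmetic" likelihood in which the mixture token equals k1 + k2, not the paper's learned structure; and tokens are decoded greedily over time, so only each per-step posterior is exact, not the sequence-level one. All function and variable names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Dist = Sequence[float]   # next-token probabilities over a K-token codebook
Pair = Tuple[int, int]   # (token of source 1, token of source 2)


def sum_likelihood(m: int, k1: int, k2: int) -> float:
    """Toy 'arithmetic' likelihood (assumption): the mixture token is k1 + k2."""
    return 1.0 if m == k1 + k2 else 0.0


def exact_pair_posterior(p1: Dist, p2: Dist, m: int,
                         likelihood: Callable[[int, int, int], float]) -> Dict[Pair, float]:
    """Exact posterior p(z1, z2 | m), proportional to p1(z1) * p2(z2) * p(m | z1, z2),
    obtained by enumerating every K x K code pair (feasible because K is small)."""
    joint = {
        (k1, k2): p1[k1] * p2[k2] * likelihood(m, k1, k2)
        for k1 in range(len(p1))
        for k2 in range(len(p2))
    }
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError(f"mixture token {m} has zero likelihood for every pair")
    return {pair: w / total for pair, w in joint.items()}


def separate_greedy(mixture: Sequence[int], init1: Dist, init2: Dist,
                    trans1: Sequence[Dist], trans2: Sequence[Dist],
                    likelihood: Callable[[int, int, int], float] = sum_likelihood
                    ) -> Tuple[List[int], List[int]]:
    """Split a mixture token sequence into two source token sequences."""
    z1: List[int] = []
    z2: List[int] = []
    for t, m in enumerate(mixture):
        # Hypothetical first-order Markov priors in place of autoregressive networks.
        p1 = init1 if t == 0 else trans1[z1[-1]]
        p2 = init2 if t == 0 else trans2[z2[-1]]
        posterior = exact_pair_posterior(p1, p2, m, likelihood)
        k1, k2 = max(posterior, key=posterior.get)  # most probable pair at this step
        z1.append(k1)
        z2.append(k2)
    return z1, z2


# "Sticky" priors: each source tends to repeat its previous token.
STICKY = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]

if __name__ == "__main__":
    z1, z2 = separate_greedy([2, 2, 3], init1=[0.7, 0.2, 0.1], init2=[0.1, 0.2, 0.7],
                             trans1=STICKY, trans2=STICKY)
    print(z1, z2)  # [0, 0, 1] [2, 2, 2]
```

Here source 1 favours low tokens and source 2 favours high ones, so the mixture token 2 is resolved as (0, 2) rather than (1, 1) or (2, 0). Replacing the greedy `max` with a beam search over the same per-step posteriors would be a natural extension; the cost of each step stays O(K²), which is what the low cardinality of the latent space keeps manageable.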
arXiv:2110.05313v1