Binaural Source Separation with Convolutional Neural Networks release_mye3jo4smng7jbvvnpk57ysgdi

by Gerard Erruz, Marius Miron, Adan Garriga

Published by Zenodo.

2017  

Abstract

This work is a study on source separation techniques for binaural music mixtures. The chosen framework uses a Convolutional Neural Network (CNN) to estimate time-frequency soft masks. This masks are used to extract the different sources from the original two-channel mixture signal. Its baseline single-channel architecture performed state-of-the-art results on monaural music mixtures under low-latency conditions. It has been extended to perform separation in two-channel signals, being the first two-channel CNN joint estimation architecture. This means that filters are learned for each source by taking in account both channels information. Furthermore, a specific binaural condition is included during training stage. It uses Interaural Level Difference (ILD) information to improve spatial images of extracted sources. Concurrently, we present a novel tool to create binaural scenes for testing purposes. Multiple binaural scenes are rendered from a music dataset of four instruments (voice, drums, bass and others). The CNN framework have been tested for these binaural scenes and compared with monaural and stereo results. The system showed a great amount of adaptability and good separation results in all the scenarios. These results are used to evaluate spatial information impact on separation performance.
In text/plain format

Archived Files and Locations

application/pdf  4.6 MB
file_q3en73h2z5b7fpryph6zchrgce
zenodo.org (repository)
web.archive.org (webarchive)
Read Archived PDF
Preserved and Accessible
Type  article
Stage   published
Date   2017-12-07
Language   en ?
Work Entity
access all versions, variants, and formats of this works (eg, pre-prints)
Catalog Record
Revision: 9c475fcd-6180-4f5a-8b45-047e2cb64846
API URL: JSON