Score-informed syllable segmentation for a cappella singing voice with convolutional neural networks

by Jordi Pons, Rong Gong, Xavier Serra

Released as an article.

2017  

Abstract

This paper introduces a new score-informed method for segmenting jingju a cappella singing phrases into syllables. The proposed method estimates the most likely sequence of syllable boundaries given the estimated syllable onset detection function (ODF) and the corresponding score. We first examine the structure of jingju syllables and propose a definition of the term "syllable onset". We then identify the challenges that jingju a cappella singing poses. Further, we investigate how to improve syllable ODF estimation with convolutional neural networks (CNNs). We propose a novel CNN architecture that efficiently captures different time-frequency scales for estimating syllable onsets. In addition, we propose using a score-informed Viterbi algorithm instead of thresholding the onset function, because the available musical knowledge (the score) can inform the Viterbi algorithm and help overcome the identified challenges. The proposed method outperforms the state of the art in syllable segmentation for jingju a cappella singing. We further provide an analysis of the segmentation errors, which points to possible research directions.
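The contrast between thresholding the ODF and decoding it with score knowledge can be illustrated with a minimal sketch. This is not the paper's formulation: the function names, the Gaussian duration prior, the `sigma` value, and the toy ODF are all assumptions made for illustration. The sketch takes an ODF (one value per frame) and the syllable durations expected from the score (in frames), then uses Viterbi-style dynamic programming to pick the boundary sequence that best balances onset evidence against the expected durations.

```python
import math

def threshold_onsets(odf, threshold=0.5):
    """Baseline: every local maximum at or above the threshold is an onset."""
    return [t for t in range(1, len(odf) - 1)
            if odf[t] >= threshold and odf[t] > odf[t - 1] and odf[t] >= odf[t + 1]]

def segment_with_score(odf, expected_durations, sigma=3.0, eps=1e-6):
    """Return interior boundary frames maximising log-ODF evidence plus an
    assumed Gaussian log-prior on the score-derived syllable durations."""
    n_frames = len(odf)
    n_boundaries = len(expected_durations) - 1
    if n_boundaries < 0 or n_frames <= n_boundaries:
        raise ValueError("need at least one frame per syllable")

    def duration_logprior(duration, expected):
        return -0.5 * ((duration - expected) / sigma) ** 2

    log_odf = [math.log(max(v, eps)) for v in odf]
    neg_inf = float("-inf")
    # best[k][t]: best score with boundary k placed at frame t.
    best = [[neg_inf] * n_frames for _ in range(n_boundaries + 1)]
    back = [[-1] * n_frames for _ in range(n_boundaries + 1)]
    best[0][0] = 0.0  # the phrase start acts as boundary 0 at frame 0

    for k in range(1, n_boundaries + 1):
        for t in range(k, n_frames):
            for prev in range(k - 1, t):
                if best[k - 1][prev] == neg_inf:
                    continue
                score = (best[k - 1][prev] + log_odf[t]
                         + duration_logprior(t - prev, expected_durations[k - 1]))
                if score > best[k][t]:
                    best[k][t] = score
                    back[k][t] = prev

    # The last syllable runs from the final boundary to the end of the phrase.
    final_t, final_score = -1, neg_inf
    for t in range(n_boundaries, n_frames):
        if best[n_boundaries][t] == neg_inf:
            continue
        score = best[n_boundaries][t] + duration_logprior(
            n_frames - t, expected_durations[-1])
        if score > final_score:
            final_t, final_score = t, score

    # Backtrack from the last boundary to the first.
    boundaries = []
    t = final_t
    for k in range(n_boundaries, 0, -1):
        boundaries.append(t)
        t = back[k][t]
    boundaries.reverse()
    return boundaries

# Toy ODF: true onsets at frames 5 and 12, plus a spurious peak at frame 8.
odf = [0.05] * 20
odf[5], odf[8], odf[12] = 0.9, 0.8, 0.9

print(threshold_onsets(odf))                # [5, 8, 12]
print(segment_with_score(odf, [5, 7, 8]))   # [5, 12]
```

In this toy case the thresholding baseline keeps the spurious peak, while the score-informed decoder rejects it because a boundary at frame 8 would imply syllable durations far from those given by the score.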

Type  article
Stage   submitted
Date   2017-07-12
Version   v1
Language   en
arXiv  1707.03544v1