Score-informed syllable segmentation for a cappella singing voice with
convolutional neural networks
by
Jordi Pons, Rong Gong, Xavier Serra
2017
Abstract
This paper introduces a new score-informed method for segmenting jingju
a cappella singing phrases into syllables. The proposed method estimates
the most likely sequence of syllable boundaries given the estimated syllable
onset detection function (ODF) and the phrase's musical score. We first
examine the structure of jingju syllables and propose a definition of the term
"syllable onset". We then identify the challenges that jingju a cappella
singing poses. Further, we investigate how to improve the syllable ODF
estimation with convolutional neural networks (CNNs). We propose a novel CNN
architecture that efficiently captures different time-frequency scales
for estimating syllable onsets. In addition, we propose using a score-informed
Viterbi algorithm instead of thresholding the onset function, because the
available musical knowledge (the score) can inform the Viterbi algorithm and
help overcome the identified challenges. The proposed
method outperforms the state-of-the-art in syllable segmentation for jingju a
cappella singing. We further provide an analysis of the segmentation errors
that points to possible research directions.
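
The idea of decoding syllable boundaries from an ODF with help from the score, rather than thresholding the ODF, can be illustrated with a small dynamic program. The sketch below is not the authors' implementation: the function name, the Gaussian duration prior, the `sigma` and `eps` parameters, and the assumption that the score provides relative syllable durations are all illustrative choices made here. It picks one boundary per syllable so that boundaries fall on high ODF values while syllable lengths stay close to the durations suggested by the score.

```python
import numpy as np

def score_informed_boundaries(odf, score_durations, sigma=0.2, eps=1e-8):
    """Hypothetical Viterbi-style decoder (illustrative, not the paper's code).

    odf: per-frame onset strength in [0, 1].
    score_durations: relative syllable durations taken from the score.
    Returns frame indices [0, b1, ..., n_frames] delimiting the syllables.
    """
    odf = np.asarray(odf, dtype=float)
    n_frames = len(odf)
    rel = np.asarray(score_durations, dtype=float)
    # Assumption: score durations are rescaled to the length of the phrase.
    expected = rel / rel.sum() * n_frames
    n_syl = len(expected)
    log_odf = np.log(odf + eps)

    # best[k, t]: best log-score with boundary k placed at frame t.
    # Boundary 0 is fixed at frame 0, boundary n_syl at frame n_frames.
    best = np.full((n_syl + 1, n_frames + 1), -np.inf)
    back = np.zeros((n_syl + 1, n_frames + 1), dtype=int)
    best[0, 0] = 0.0

    for k in range(1, n_syl + 1):
        for t in range(k, n_frames + 1):
            prev = np.arange(k - 1, t)  # candidate frames for boundary k-1
            dur = t - prev              # resulting length of syllable k
            # Assumed Gaussian log-prior centred on the score duration.
            log_prior = -0.5 * ((dur - expected[k - 1]) / (sigma * n_frames)) ** 2
            cand = best[k - 1, prev] + log_prior
            j = int(np.argmax(cand))
            # Internal boundaries are rewarded by the ODF; the end is fixed.
            emit = log_odf[t] if t < n_frames else 0.0
            best[k, t] = cand[j] + emit
            back[k, t] = prev[j]

    # Backtrack from the final boundary at the end of the phrase.
    bounds = [n_frames]
    for k in range(n_syl, 0, -1):
        bounds.append(int(back[k, bounds[-1]]))
    return bounds[::-1]
```

Unlike thresholding, this decoder always returns exactly as many syllables as the score contains, so spurious ODF peaks cannot add boundaries and weak peaks cannot remove them.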
arXiv:1707.03544v1