MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding
by Yi-Hui Chou, I-Chun Chen, Chin-Jui Chang, Joann Ching, and Yi-Hsuan Yang (2021)
Abstract
This paper presents an attempt to employ the masked language modeling approach
of BERT to pre-train a 12-layer Transformer model over 4,166 polyphonic piano
MIDI pieces for tackling a number of symbolic-domain discriminative music
understanding tasks. These include two note-level classification tasks, i.e.,
melody extraction and velocity prediction, as well as two sequence-level
classification tasks, i.e., composer classification and emotion classification.
We find that, given a pre-trained Transformer, our models outperform recurrent
neural network-based baselines with fewer than 10 epochs of fine-tuning.
Ablation studies show that the pre-training remains effective even if none of
the MIDI data of the downstream tasks are seen at the pre-training stage, and
that freezing the self-attention layers of the Transformer at the fine-tuning
stage slightly degrades performance. All five datasets employed in this work
are publicly available, as are checkpoints of our pre-trained and fine-tuned
models. As such, our research can serve as a benchmark for symbolic-domain
music understanding.
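
As a rough illustration of the pre-training recipe the abstract describes, the
following PyTorch sketch applies BERT-style masked-token prediction to batches
of MIDI token sequences. The vocabulary size, mask-token id, sequence length,
and all hyperparameters other than the 12 encoder layers are placeholder
assumptions for illustration, not the paper's actual configuration or token
representation (see the released checkpoints for those).

    import torch
    import torch.nn as nn

    # Placeholder sizes; the paper's real vocabulary depends on its MIDI
    # token representation, which is not reproduced here.
    VOCAB_SIZE = 1000
    MASK_ID = 0        # assumed id reserved for the [MASK] token
    SEQ_LEN = 512
    D_MODEL = 256

    class TinyMidiBert(nn.Module):
        """A 12-layer Transformer encoder with a masked-token prediction head."""
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
            layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=8,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=12)
            self.head = nn.Linear(D_MODEL, VOCAB_SIZE)

        def forward(self, tokens):
            return self.head(self.encoder(self.embed(tokens)))

    def mask_tokens(tokens, mask_prob=0.15):
        """Replace ~15% of tokens with [MASK]; loss is computed only there."""
        inputs = tokens.clone()
        is_masked = torch.rand(tokens.shape) < mask_prob
        inputs[is_masked] = MASK_ID
        targets = tokens.clone()
        targets[~is_masked] = -100       # ignored by the cross-entropy loss
        return inputs, targets

    model = TinyMidiBert()
    loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
    batch = torch.randint(1, VOCAB_SIZE, (2, SEQ_LEN))  # fake MIDI token batch
    inputs, targets = mask_tokens(batch)
    logits = model(inputs)               # (batch, seq, vocab)
    loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
    loss.backward()

For the downstream tasks, the prediction head would be swapped for a note- or
sequence-level classifier; per the ablation noted above, the self-attention
layers are best left trainable rather than frozen during fine-tuning.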
Archived Files and Locations
application/pdf, 1.7 MB (arXiv:2107.05223v1)
Available from arxiv.org (repository) and web.archive.org (webarchive).