Zero-shot Learning for Audio-based Music Classification and Tagging
by
Jeong Choi, Jongpil Lee, Jiyoung Park, Juhan Nam
2020
Abstract
Audio-based music classification and tagging is typically based on
categorical supervised learning with a fixed set of labels. This intrinsically
cannot handle unseen labels such as newly added music genres or semantic words
that users arbitrarily choose for music retrieval. Zero-shot learning can
address this problem by leveraging an additional semantic space of labels, in
which side information about the labels is used to reveal the relationships
among them. In this work, we investigate zero-shot learning in the music domain
and organize two different setups of side information. One uses human-labeled
attribute information based on the Free Music Archive and OpenMIC-2018
datasets. The other uses general word semantic information based on the
Million Song Dataset and Last.fm tag annotations. Considering that a music
track is usually multi-labeled in music classification and tagging datasets,
we also propose a data split scheme and associated evaluation settings for
multi-label zero-shot learning. Finally, we report experimental results and
discuss the effectiveness and new possibilities of zero-shot learning in the
music domain.
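
The core idea of the abstract lends itself to a short sketch. The following
Python example is illustrative only, not the authors' implementation: it
assumes a trained audio encoder and a learned projection matrix W into a
word-vector space (e.g., GloVe/word2vec vectors for tags), and shows how a tag
never seen during training can still be scored. All names and the random
stand-in data are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in word vectors for tags (300-d, as in common GloVe models).
    # An unseen tag never appears in training; only its word vector is known.
    tag_vectors = {
        "rock": rng.normal(size=300),       # seen during training
        "jazz": rng.normal(size=300),       # seen during training
        "vaporwave": rng.normal(size=300),  # unseen label
    }

    def project_audio(audio_features: np.ndarray, W: np.ndarray) -> np.ndarray:
        """Map an audio embedding into the word-vector (semantic) space.
        In a real system, W is learned so tracks land near their tags."""
        return audio_features @ W

    def score_tags(z: np.ndarray, tags: dict) -> dict:
        """Cosine similarity between the projected track and each tag vector.
        Because scoring happens in word-vector space, unseen tags rank too."""
        return {
            name: float(z @ v / (np.linalg.norm(z) * np.linalg.norm(v) + 1e-9))
            for name, v in tags.items()
        }

    # Toy inputs standing in for a trained audio encoder and projection.
    audio_features = rng.normal(size=128)   # e.g., CNN output for one track
    W = rng.normal(size=(128, 300)) * 0.05  # learned audio-to-semantic map

    scores = score_tags(project_audio(audio_features, W), tag_vectors)
    for tag, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{tag:>10}: {s:+.3f}")

Because the classifier ranks tags by similarity in the semantic space rather
than by a fixed output layer, adding a new tag only requires its word vector,
which is what makes the zero-shot setting possible.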
Archived Files and Locations
application/pdf, 876.9 kB
arxiv.org (repository) · web.archive.org (webarchive)
arXiv:1907.02670v2