Paper Title
JukeBox: A Multilingual Singer Recognition Dataset
Paper Authors
Paper Abstract
A text-independent speaker recognition system relies on successfully encoding speech factors such as vocal pitch, intensity, and timbre to achieve good performance. A majority of such systems are trained and evaluated using spoken voice or everyday conversational voice data. Spoken voice, however, exhibits a limited range of possible speaker dynamics, thus constraining the utility of the derived speaker recognition models. Singing voice, on the other hand, covers a broader range of vocal and ambient factors and can, therefore, be used to evaluate the robustness of a speaker recognition system. However, a majority of existing speaker recognition datasets only focus on the spoken voice. In comparison, there is a significant shortage of labeled singing voice data suitable for speaker recognition research. To address this issue, we assemble JukeBox, a speaker recognition dataset with multilingual singing voice audio annotated with singer identity, gender, and language labels. We use the current state-of-the-art methods to demonstrate the difficulty of performing speaker recognition on singing voice using models trained on spoken voice alone. We also evaluate the effect of gender and language on speaker recognition performance, both in spoken and singing voice data. The complete JukeBox dataset can be accessed at http://iprobe.cse.msu.edu/datasets/jukebox.html.
