Paper Title
Investigation of Data Augmentation Techniques for Disordered Speech Recognition
Paper Authors
Paper Abstract
Disordered speech recognition is a highly challenging task. The underlying neuro-motor conditions of people with speech disorders, often compounded by co-occurring physical disabilities, make it difficult to collect the large quantities of speech required for system development. This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation. Both normal and disordered speech were exploited in the augmentation process. Variability among impaired speakers in both the original and augmented data was modeled using learning hidden unit contributions (LHUC) based speaker adaptive training. The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to 2.92% absolute (9.3% relative) word error rate (WER) reduction over the baseline system without data augmentation, and gave an overall WER of 26.37% on the test set containing 16 dysarthric speakers.
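To make the speed perturbation idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes speed perturbation is realised as plain waveform resampling, which changes both duration and pitch (tempo perturbation, by contrast, changes duration while preserving pitch). The function name `speed_perturb`, the use of linear interpolation, and the example factors 0.9 and 1.1 are illustrative assumptions; the abstract does not state which factors or resampling method were used.

```python
import numpy as np


def speed_perturb(waveform: np.ndarray, factor: float) -> np.ndarray:
    """Resample a mono waveform so that it plays `factor` times faster.

    factor > 1 shortens the signal (faster speech, higher pitch);
    factor < 1 lengthens it (slower speech, lower pitch).
    Hypothetical helper: linear interpolation stands in for the
    higher-quality resampler a real toolkit would use.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(waveform)
    n_out = int(round(n_in / factor))
    # Position in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    # np.interp holds the last sample for positions past the end of the input.
    return np.interp(src_positions, np.arange(n_in), waveform)


if __name__ == "__main__":
    sample_rate = 16000  # assumed sampling rate in Hz
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz test tone

    # Assumed example factors; each copy would be added to the training set.
    for factor in (0.9, 1.0, 1.1):
        augmented = speed_perturb(tone, factor)
        print(factor, len(augmented) / sample_rate)
```

Each perturbed copy keeps the original transcript, so the amount of training audio grows without any new recording sessions, which is the appeal for speakers from whom large quantities of speech are hard to collect.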
