
Journal of University of Chinese Academy of Sciences ›› 2015, Vol. 32 ›› Issue (1): 97-102. DOI: 10.7523/j.issn.2095-6134.2015.01.016

• Information and Electronic Science •

A speech recognition system based on bottleneck features and SGMM under low-data-resource conditions

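WU Weilan1,2, CAI Meng3, TIAN Yao3, YANG Xiaohao3, CHEN Zhenfeng1,2, LIU Jia3, XIA Shanhong2   

  1. University of Chinese Academy of Sciences, Beijing 100190, China;
    2. State Key Laboratory of Transducer Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China;
    3. Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China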
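  • Received: 2014-02-27 Revised: 2014-03-07 Published: 2015-01-15
  • Corresponding author: XIA Shanhong
  • Funding:

    Supported by the National Natural Science Foundation of China (61005019, 61273268, 61370034, 90920302) and the Beijing Natural Science Foundation (KZ201110005005)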

Bottleneck features and subspace Gaussian mixture models for low-resource speech recognition

WU Weilan1,2, CAI Meng3, TIAN Yao3, YANG Xiaohao3, CHEN Zhenfeng1,2, LIU Jia3, XIA Shanhong2   

  1. University of Chinese Academy of Sciences, Beijing 100190, China;
    2. State Key Laboratory of Transducer Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China;
    3. Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
  • Received: 2014-02-27 Revised: 2014-03-07 Published: 2015-01-15

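Abstract:

Speech recognition systems need large amounts of labeled training data, and their recognition performance under low-data-resource conditions is often unsatisfactory. To address data scarcity, this paper first studies the subspace Gaussian mixture acoustic model, which reduces the number of parameters to be estimated through parameter sharing, and uses discriminative training based on the maximum mutual information criterion to improve recognition accuracy. It then applies bottleneck features based on deep neural networks at the feature level for feature extraction and dimensionality reduction. Finally, it combines these results to build a speech recognition system for low-resource conditions. Experiments on the internationally standard OpenKWS 2013 database show that the proposed techniques effectively improve recognition performance under low-resource conditions, reducing the word error rate by about 12% compared with the baseline system.

Key words: speech recognition, low-resource, acoustic model, acoustic feature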
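The parameter sharing mentioned in the abstract can be illustrated with a minimal sketch of the basic SGMM formulation, in which each state j keeps only a low-dimensional vector v_j, and the per-state means and mixture weights are derived from globally shared projections M_i and w_i. This is an illustration under stated assumptions, not the authors' implementation: the function names and all sizes (number of states, Gaussians, feature and subspace dimensions) are hypothetical and do not come from the paper.

```python
import math
import random

random.seed(0)

def sgmm_state_params(M, w, v):
    """Derive state-specific means and weights from shared SGMM parameters.

    M: list of I shared matrices, each D x S (list of D rows of length S)
    w: list of I shared weight-projection vectors of length S
    v: state-specific vector of length S
    Returns (means, weights): means[i] = M_i v, weights = softmax_i(w_i . v).
    """
    means = [[sum(m * x for m, x in zip(row, v)) for row in M_i] for M_i in M]
    logits = [sum(a * x for a, x in zip(w_i, v)) for w_i in w]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return means, [e / total for e in exps]

def param_counts(num_states, gauss_per_state, I, D, S):
    """Rough parameter counts (hypothetical model sizes, for comparison only)."""
    # Diagonal-covariance GMM: mean (D) + variance (D) + weight (1) per Gaussian.
    gmm_total = num_states * gauss_per_state * (2 * D + 1)
    # SGMM shared part: M_i (D*S), w_i (S), full covariance (D*(D+1)/2) per index i.
    sgmm_shared = I * (D * S + S + D * (D + 1) // 2)
    # SGMM state-specific part: one S-dimensional vector per state.
    sgmm_per_state = num_states * S
    return gmm_total, sgmm_shared, sgmm_per_state

# Tiny demo with hypothetical sizes: I=4 shared Gaussians, D=3, S=2.
I, D, S = 4, 3, 2
M = [[[random.gauss(0, 1) for _ in range(S)] for _ in range(D)] for _ in range(I)]
w = [[random.gauss(0, 1) for _ in range(S)] for _ in range(I)]
v_j = [random.gauss(0, 1) for _ in range(S)]
means, weights = sgmm_state_params(M, w, v_j)

# Hypothetical full-size comparison (numbers are assumptions, not from the paper).
gmm_total, sgmm_shared, sgmm_per_state = param_counts(2000, 16, 400, 39, 40)
```

Under these assumed sizes, only the per-state vectors grow with the number of states, which is why the state-specific part of the model needs far less training data than a conventional GMM with independent Gaussians per state.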

Abstract:

State-of-the-art speech recognition systems depend on large amounts of labeled training data and perform poorly when only limited data is available. In this paper, we study speech recognition under low-resource conditions. The subspace Gaussian mixture model (SGMM) is first applied to reduce the number of parameters to be estimated through parameter sharing. The model is further enhanced by discriminative training based on the maximum mutual information criterion. Bottleneck features based on deep neural networks are then studied for robust feature extraction and dimensionality reduction. Finally, the SGMM and the bottleneck features are combined to build a novel speech recognition system for low-resource conditions. On the standard OpenKWS 2013 evaluation corpus, experimental results show that combining the two techniques yields a relative word error rate reduction of about 12% over the baseline system.

Key words: speech recognition, low-resource, acoustic model, acoustic feature
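As a rough illustration of the bottleneck-feature idea named in the abstract, the sketch below passes a spliced window of acoustic frames through a small feed-forward network and reads out the activations of a narrow layer as the new, lower-dimensional feature vector. Everything here is an assumption made for illustration: the layer sizes, the context width, the linear bottleneck output, and the random (untrained) weights are hypothetical and not taken from the paper. In a real system the network would first be trained to classify phonetic targets, and the layers after the bottleneck would be discarded at extraction time.

```python
import math
import random

random.seed(0)

FRAME_DIM = 13                                # hypothetical per-frame feature size
CONTEXT = 5                                   # frames of context on each side
INPUT_DIM = FRAME_DIM * (2 * CONTEXT + 1)     # spliced input size
HIDDEN_DIM = 64                               # hypothetical wide hidden layer
BOTTLENECK_DIM = 20                           # hypothetical narrow layer

def make_layer(n_in, n_out):
    """Random affine layer (stand-in for trained weights)."""
    W = [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def affine(layer, x):
    W, b = layer
    return [sum(wk * xk for wk, xk in zip(row, x)) + bi for row, bi in zip(W, b)]

def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-a)) for a in z]

def splice(frames, t, context):
    """Concatenate frames t-context..t+context, repeating edge frames as padding."""
    last = len(frames) - 1
    out = []
    for k in range(t - context, t + context + 1):
        out.extend(frames[min(max(k, 0), last)])
    return out

# Only the layers up to and including the bottleneck are needed for extraction.
hidden_layer = make_layer(INPUT_DIM, HIDDEN_DIM)
bottleneck_layer = make_layer(HIDDEN_DIM, BOTTLENECK_DIM)

def bottleneck_features(frames):
    """Map each frame (with context) to a BOTTLENECK_DIM-dimensional feature."""
    feats = []
    for t in range(len(frames)):
        h = sigmoid(affine(hidden_layer, splice(frames, t, CONTEXT)))
        feats.append(affine(bottleneck_layer, h))  # linear bottleneck output
    return feats

# Demo on a hypothetical 8-frame utterance of random features.
utterance = [[random.gauss(0, 1) for _ in range(FRAME_DIM)] for _ in range(8)]
bn = bottleneck_features(utterance)
```

The extracted vectors would then replace or augment the conventional acoustic features fed to the acoustic model; here the 143-dimensional spliced input is reduced to 20 dimensions per frame.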

CLC number: