Journal of University of Chinese Academy of Sciences ›› 2017, Vol. 34 ›› Issue (5): 633-639. DOI: 10.7523/j.issn.2095-6134.2017.05.014

• Information and Electronic Science •

Investigation of normalization methods in speaker adaptation of deep neural network using i-vector

YANG Jianbin (杨建斌), ZHANG Weiqiang (张卫强), LIU Jia (刘加)

  1. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
  • Received: 2016-07-19 Revised: 2016-10-13 Published: 2017-09-15
  • Corresponding author: ZHANG Weiqiang, E-mail: wqzhang@tsinghua.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61370034, 61403224)

Abstract: Deep neural networks (DNNs) have become a popular acoustic modeling technique for speech recognition in recent years, and they perform significantly better than the previously dominant Gaussian mixture model. However, speaker adaptation of DNNs has not yet been solved satisfactorily. In this work, we adapt a DNN with the identity vector (i-vector) by feeding the i-vector together with the regular speech features as the DNN input in both training and testing. We then study how i-vector normalization affects the system and propose a new max-min linear normalization method. Experiments on the TIMIT dataset show a 5.10% relative reduction in word error rate compared with the traditional length normalization method.

Key words: identity vector, deep neural network, speaker adaptation, normalization
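
The input pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not state the max-min formula, so the code below assumes the common form: each i-vector dimension is mapped linearly to a fixed range using minima and maxima collected over the training i-vectors. The function names, the target range [0, 1], and the toy dimensions are all hypothetical choices for illustration, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def length_normalize(ivec: Sequence[float]) -> Vector:
    """Baseline: scale an i-vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in ivec))
    return [x / norm for x in ivec] if norm > 0 else list(ivec)


def fit_max_min(train_ivecs: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Collect per-dimension minima and maxima over the training i-vectors."""
    dims = list(zip(*train_ivecs))
    return [min(d) for d in dims], [max(d) for d in dims]


def max_min_normalize(ivec: Sequence[float], lo: Vector, hi: Vector,
                      low: float = 0.0, high: float = 1.0) -> Vector:
    """Assumed form: map each dimension linearly from [lo, hi] to [low, high]."""
    out = []
    for x, a, b in zip(ivec, lo, hi):
        # A constant dimension carries no information; place it mid-range.
        scale = (x - a) / (b - a) if b > a else 0.5
        out.append(low + (high - low) * scale)
    return out


def append_ivector(frames: Sequence[Sequence[float]],
                   ivec: Sequence[float]) -> List[Vector]:
    """Concatenate the same speaker i-vector to every acoustic feature frame."""
    return [list(frame) + list(ivec) for frame in frames]


if __name__ == "__main__":
    train = [[-2.0, 0.5], [0.0, 1.5], [2.0, 1.0]]  # toy 2-dim i-vectors
    lo, hi = fit_max_min(train)
    speaker = max_min_normalize([1.0, 1.0], lo, hi)
    frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]    # toy 3-dim speech features
    for row in append_ivector(frames, speaker):
        print(row)
```

In this sketch the statistics are fitted on training speakers only and reused unchanged at test time, which keeps the DNN input range consistent between training and testing. Swapping `max_min_normalize` for `length_normalize` gives the baseline configuration that the abstract compares against.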
