
Journal of University of Chinese Academy of Sciences ›› 2023, Vol. 40 ›› Issue (5): 687-693. DOI: 10.7523/j.ucas.2022.011

• Electronic Information and Computer Science •

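Multi-channel voice activity detection in low signal-to-noise ratio environment

XIAO Si¹,², GONG Jie², LI Baoqing²

  1. School of Microelectronics, University of Chinese Academy of Sciences, Beijing 100049, China;
  2. Key Laboratory of Microsystem Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 201800, China
  • Received: 2021-12-16  Revised: 2022-02-08  Published: 2022-03-16
  • Corresponding author: LI Baoqing, E-mail: sinoiot@mail.sim.ac.cn
  • Funding:
    Supported by the Foundation of the Key Laboratory of Microsystem Technology (6142804200408)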


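Abstract: Traditional voice activity detection (VAD) algorithms use only the time-frequency information of the signal, so their accuracy drops in low signal-to-noise ratio (SNR) environments, especially under non-stationary noise. Multi-channel speech signals carry rich spatial information that complements the time-frequency information and can therefore improve detection accuracy. Building on research into multi-channel spatial features, this paper uses the covariance matrix of the received array signals to propose a new multi-channel VAD algorithm based on the maximum eigenvalue of the multi-channel covariance matrix (covariance matrix maximum eigenvalue, CMME). First, the largest eigenvalue of each frame's covariance matrix is extracted as the detection feature in order to track the speech signal; then a double-threshold method decides whether the current frame is a speech frame. Experimental results on the VCTK corpus and a laboratory corpus show that, compared with the Mel energy ratio and the improved energy zero-entropy algorithms, the proposed algorithm achieves higher detection accuracy and is more robust in a -5 dB low-SNR environment and in non-stationary noise.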

Key words: voice activity detection, microphone array, covariance matrix, low signal-to-noise ratio
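The abstract outlines the method only at a high level: form the covariance matrix of the array signals for each frame, take its largest eigenvalue (CMME) as the feature, and apply a double-threshold decision. The Python sketch below illustrates that pipeline under assumptions that do not come from the paper. The covariance is taken to be a real, time-domain sample covariance per frame, with 256-sample frames and a 128-sample hop. The thresholds are hypothetical multiples of a noise floor estimated from leading frames, which are assumed to be speech-free. The function names (`cmme_feature`, `double_threshold`, `cmme_vad`, `make_demo_signal`) are illustrative and are not the authors' code.

```python
import numpy as np


def frame_signals(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split an (M, T) multichannel signal into (n_frames, M, frame_len) frames."""
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    return np.stack([x[:, i * hop:i * hop + frame_len] for i in range(n_frames)])


def cmme_feature(x: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Per-frame largest eigenvalue of the M x M inter-channel covariance matrix.

    Assumption: real time-domain sample covariance per frame; the paper may
    estimate the covariance differently (e.g. per frequency bin).
    """
    frames = frame_signals(x, frame_len, hop)
    feats = np.empty(len(frames))
    for l, frame in enumerate(frames):
        frame = frame - frame.mean(axis=1, keepdims=True)  # remove per-channel DC
        R = frame @ frame.T / frame_len                      # spatial covariance, M x M
        feats[l] = np.linalg.eigvalsh(R)[-1]                 # eigvalsh sorts ascending
    return feats


def double_threshold(feat: np.ndarray, t_low: float, t_high: float) -> np.ndarray:
    """Hysteresis decision: a speech segment must exceed t_high somewhere and is
    extended on both sides while the feature stays above t_low."""
    above_low = feat > t_low
    is_speech = np.zeros(len(feat), dtype=bool)
    for i in np.flatnonzero(feat > t_high):
        if is_speech[i]:
            continue  # already covered by an earlier segment
        j = i
        while j >= 0 and above_low[j]:         # grow the segment to the left
            is_speech[j] = True
            j -= 1
        k = i + 1
        while k < len(feat) and above_low[k]:  # grow the segment to the right
            is_speech[k] = True
            k += 1
    return is_speech


def cmme_vad(x, frame_len=256, hop=128, n_noise_frames=10,
             low_factor=3.0, high_factor=10.0):
    """Hypothetical threshold rule: scale a noise floor taken from the first
    n_noise_frames frames, which are assumed to contain no speech."""
    feat = cmme_feature(x, frame_len, hop)
    noise_floor = feat[:n_noise_frames].mean()
    return double_threshold(feat, low_factor * noise_floor, high_factor * noise_floor)


def make_demo_signal(seed: int = 0) -> np.ndarray:
    """Synthetic 4-mic signal: a 440 Hz tone in samples [8000, 16000) at
    16 kHz, scaled per channel, plus independent white noise on each channel."""
    rng = np.random.default_rng(seed)
    fs, T = 16000, 24000
    n = np.arange(T)
    source = np.where((n >= 8000) & (n < 16000), np.sin(2 * np.pi * 440 * n / fs), 0.0)
    gains = np.array([1.0, 0.9, 0.8, 0.7])[:, None]
    return gains * source + 0.1 * rng.standard_normal((4, T))


if __name__ == "__main__":
    vad = cmme_vad(make_demo_signal())
    idx = np.flatnonzero(vad)
    print("first/last speech frame:", idx[0], idx[-1])
```

The largest eigenvalue is a natural feature here because, when one directional source dominates, the covariance is approximately a rank-one term from the source plus a scaled identity from spatially white noise. Its largest eigenvalue therefore grows with the source power summed over all microphones, while for noise alone every eigenvalue stays near the noise variance. The demo uses a clean, high-SNR tone and says nothing about the -5 dB results reported in the paper.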

