
Journal of University of Chinese Academy of Sciences, 2023, Vol. 40, Issue (5): 687-693. DOI: 10.7523/j.ucas.2022.011

• Research Articles •

Multi-channel voice activity detection in low signal-to-noise ratio environment

XIAO Si1,2, GONG Jie2, LI Baoqing2   

  1. School of Microelectronics, University of Chinese Academy of Sciences, Beijing 100049, China;
    2. Key Laboratory of Microsystem Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 201800, China
  • Received: 2021-12-16 Revised: 2022-02-08 Online: 2023-09-15

Abstract: Traditional voice activity detection algorithms use only time-frequency information, so their detection accuracy degrades rapidly in low signal-to-noise ratio environments, especially when the noise is non-stationary. Multi-channel speech signals carry rich spatial information, which can supplement time-frequency information and improve detection accuracy. Building on a study of multi-channel spatial features, this paper proposes a new multi-channel voice activity detection algorithm based on the covariance matrix maximum eigenvalue (CMME) of the received array signals. First, the CMME of the array signal is extracted frame by frame as the detection feature to track the speech signal. Then a double-threshold method is used to decide whether the current frame is a speech frame. The results show that, compared with the Mel energy ratio and the improved energy zero-entropy algorithms, the proposed algorithm achieves higher detection accuracy on the VCTK and laboratory corpora and is therefore more robust in low signal-to-noise ratio and non-stationary noise environments.
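
The two steps named in the abstract (frame-wise CMME extraction, then a double-threshold decision) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the frame length, hop size, and threshold values are hypothetical, the frames are assumed to be zero-mean, and the double-threshold rule is interpreted here as simple hysteresis (speech starts when the feature rises above `high` and ends when it falls below `low`), which may differ from the rule used in the paper.

```python
import numpy as np


def cmme_per_frame(x, frame_len=256, hop=128):
    """Largest eigenvalue of the spatial covariance matrix, one value per frame.

    x: array of shape (channels, samples); frame_len and hop are
    illustrative values, not taken from the paper.
    """
    n_ch, n_samp = x.shape
    n_frames = max(0, 1 + (n_samp - frame_len) // hop)
    feats = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[:, i * hop : i * hop + frame_len]
        # Sample covariance (channels x channels), assuming zero-mean frames.
        cov = frame @ frame.conj().T / frame_len
        # eigvalsh returns eigenvalues in ascending order; keep the largest.
        feats[i] = np.linalg.eigvalsh(cov)[-1]
    return feats


def double_threshold(feats, low, high):
    """Hysteresis decision (an assumed reading of the double-threshold method):
    enter the speech state above `high`, leave it below `low`."""
    is_speech = np.zeros(len(feats), dtype=bool)
    active = False
    for i, f in enumerate(feats):
        if not active and f > high:
            active = True
        elif active and f < low:
            active = False
        is_speech[i] = active
    return is_speech
```

A signal that is coherent across microphones concentrates its energy in one dominant eigenvalue of the covariance matrix, whereas spatially uncorrelated noise spreads its energy over all eigenvalues, which is why the largest eigenvalue can serve as a detection feature. In practice the two thresholds would have to be tuned or adapted to the noise level.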

Key words: voice activity detection, microphone array, covariance matrix, low signal-to-noise ratio
