Journal of University of Chinese Academy of Sciences

   

Research on voiceprint recognition based on fusion features MGCC and CNN-SE-BiGRU

FAN Tao, ZHAN Xu   

  1. School of Automation and Information Engineering, Sichuan University of Science and Engineering, Yibin 644000, Sichuan, China
  • Received: 2023-11-23 Revised: 2024-01-26

Abstract: Voiceprint recognition systems often rely on a single acoustic feature with weak representation ability and poor noise robustness, while traditional CNN models express features weakly and capture temporal information incompletely. To address these problems, an acoustic feature that fuses mel-frequency cepstral coefficients (MFCC) and gammatone frequency cepstral coefficients (GFCC) is proposed and used for voiceprint recognition with a new deep network structure that integrates a squeeze-and-excitation convolutional neural network with a bidirectional gated recurrent unit network (CNN-SE-BiGRU). First, the extracted MFCC and GFCC features are normalized separately, and weights chosen according to the between-class discrimination of each feature are used to combine them linearly, yielding mel-gammatone cepstral coefficients (MGCC) with stronger speaker discrimination. Second, to improve the CNN's feature expression, an improved SE-Block (squeeze-and-excitation block), which models channel-wise feature responses, is introduced. Finally, after the improved squeeze-and-excitation convolutional network (CNN-SE-Net) extracts spatial features, a bidirectional gated recurrent unit network (BiGRU) further extracts temporal features to improve the performance of the whole network. Experimental results show that MGCC features have stronger representation ability and better robustness under different noise backgrounds, and that the CNN-SE-BiGRU model achieves the highest average recognition rate, 96.05%, with MGCC features, which supports the effectiveness and robustness of the proposed method.
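
The fusion and channel-reweighting steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions, since the abstract does not specify them: z-score normalization per coefficient, a single hypothetical weight w_mfcc with the GFCC weight taken as 1 - w_mfcc, and a standard (not the paper's improved) squeeze-and-excitation gate. The function names zscore, fuse_mgcc and squeeze_excite are illustrative only.

```python
import numpy as np


def zscore(features):
    """Normalize each coefficient (column) to zero mean, unit variance over frames.

    Assumption: the abstract only says the features are "normalized".
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + 1e-8)  # epsilon guards constant columns


def fuse_mgcc(mfcc, gfcc, w_mfcc=0.5):
    """Linearly weight normalized MFCC and GFCC matrices into one feature.

    mfcc, gfcc: arrays of shape (n_frames, n_coeffs).
    w_mfcc: hypothetical fusion weight in [0, 1]; the paper derives its
    weights from between-class discrimination, which is not reproduced here.
    """
    if mfcc.shape != gfcc.shape:
        raise ValueError("MFCC and GFCC must have the same shape")
    if not 0.0 <= w_mfcc <= 1.0:
        raise ValueError("w_mfcc must lie in [0, 1]")
    return w_mfcc * zscore(mfcc) + (1.0 - w_mfcc) * zscore(gfcc)


def squeeze_excite(feature_maps, w1, w2):
    """Standard squeeze-and-excitation gate on (channels, height, width) maps.

    w1: (channels // r, channels) and w2: (channels, channels // r) are the
    two fully connected layers, with r the reduction ratio.
    """
    squeezed = feature_maps.mean(axis=(1, 2))         # squeeze: global average pool
    hidden = np.maximum(w1 @ squeezed, 0.0)           # excitation: FC + ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # FC + sigmoid, one gate per channel
    return feature_maps * gates[:, None, None]        # rescale each channel
```

In a full pipeline of the kind the abstract outlines, the fused matrix would feed the convolutional layers, the gate would follow a convolution, and the resulting feature sequence would then pass to a bidirectional GRU; those stages are omitted here.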

Key words: voiceprint recognition, fusion features, bidirectional gated recurrent unit, squeeze and excitation block
