
Journal of University of Chinese Academy of Sciences ›› 2025, Vol. 42 ›› Issue (6): 832-842. DOI: 10.7523/j.ucas.2024.004

• Research Articles •

Voiceprint recognition based on fused MGCC and CNN-SE-BiGRU features

FAN Tao, ZHAN Xu   

  1. School of Automation and Information Engineering, Sichuan University of Science and Engineering, Yibin 644000, Sichuan, China
  • Received: 2023-11-23 Revised: 2024-01-26

Abstract: Voiceprint recognition systems often rely on a single acoustic feature with weak representation ability and poor noise robustness, while traditional convolutional neural network (CNN) models have limited feature expression ability and capture temporal features incompletely. To address these problems, an acoustic feature that fuses Mel frequency cepstral coefficients (MFCC) and Gammatone frequency cepstral coefficients (GFCC) is proposed, together with a novel voiceprint recognition model based on an enhanced CNN and a bidirectional GRU network (CNN-SE-BiGRU). Firstly, the extracted MFCC and GFCC features are normalized, and weights designed according to their inter-class discrimination power are used to combine them linearly, yielding Mel-gammatone cepstral coefficients (MGCC) with stronger speaker discrimination. Secondly, to improve the feature expression of the CNN, an improved channel feature response SE-Block (squeeze and excitation block) is introduced. Finally, building on the spatial features extracted by the enhanced squeeze-and-excitation CNN (CNN-SE), time series features are further extracted by a bidirectional gated recurrent unit network (BiGRU) to improve the performance of the whole network. Experimental results show that the MGCC features have stronger characterization ability and better robustness under different noise backgrounds, and that the CNN-SE-BiGRU model reaches an average recognition rate of 96.05% with MGCC features, demonstrating the effectiveness and robustness of the proposed method.
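The feature fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything specific here is an assumption made for illustration: the function names (`zscore`, `fisher_ratio`, `fuse_mgcc`) are hypothetical, z-score normalization stands in for the unspecified normalization, and a Fisher-style ratio of between-speaker to within-speaker variance stands in for the "inter-class discrimination power". The sketch also assumes that MFCC and GFCC frames have already been extracted with the same number of coefficients.

```python
import numpy as np


def zscore(x):
    """Normalize each coefficient (column) to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)


def fisher_ratio(x, labels):
    """Between-speaker variance over within-speaker variance,
    averaged across coefficients (an assumed discrimination measure)."""
    speakers = np.unique(labels)
    class_means = np.stack([x[labels == s].mean(axis=0) for s in speakers])
    between = class_means.var(axis=0)
    within = np.mean([x[labels == s].var(axis=0) for s in speakers], axis=0)
    return float(np.mean(between / (within + 1e-8)))


def fuse_mgcc(mfcc, gfcc, labels):
    """Linearly combine normalized MFCC and GFCC frames.

    mfcc, gfcc: arrays of shape (n_frames, n_coeffs), same shape.
    labels: speaker id per frame, shape (n_frames,).
    Returns the fused features and the weights (w_mfcc, w_gfcc).
    """
    mfcc_n, gfcc_n = zscore(mfcc), zscore(gfcc)
    f_m = fisher_ratio(mfcc_n, labels)
    f_g = fisher_ratio(gfcc_n, labels)
    # Assumed rule: weights proportional to each feature's discrimination score.
    w_m = f_m / (f_m + f_g)
    w_g = 1.0 - w_m
    return w_m * mfcc_n + w_g * gfcc_n, (w_m, w_g)


# Usage with synthetic data (random values, not real speech features).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 50)          # 4 speakers, 50 frames each
mfcc = rng.normal(size=(200, 13)) + labels[:, None] * 0.5
gfcc = rng.normal(size=(200, 13)) + labels[:, None] * 0.2
mgcc, weights = fuse_mgcc(mfcc, gfcc, labels)
```

With this rule, the feature set whose coefficients separate speakers more cleanly receives the larger weight; the weighting actually used in the paper may differ.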

Key words: voiceprint recognition, fusion features, bidirectional gated recurrent unit, squeeze and excitation block, convolutional neural network (CNN)
