Journal of University of Chinese Academy of Sciences

2017, Vol. 34, Issue (4): 431-438. DOI: 10.7523/j.issn.2095-6134.2017.04.004


Sample optimization based on local features in speech emotion recognition

SUI Xiaoyun1,2, ZHU Tingshao1, WANG Jingying1,2   

  1. Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China;
  2. Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2016-03-29  Revised: 2016-06-17  Online: 2017-07-15

Abstract: Emotion recognition is one of the most promising techniques in human-machine interaction. Most studies prefer statistical functional features, because these features are more consistent with how speech varies as emotion changes. However, local features, i.e., short-term or temporal features extracted from a single frame, also contain useful information. In this work, a new approach is proposed to optimize samples using local features. K-means clustering is employed to divide the frames of each sample into two groups: frames in which the emotion is obvious and frames in which it is less obvious. It is hypothesized that the cluster containing more frames is the emotionally obvious one. The results show that classification performs better on the optimized samples than on the original ones. The method was tested on 3 corpora, and classification accuracy increased by 5%-17%. The improvement also grows with speech length, which implies that the optimization approach may be more applicable to the recognition of longer speech.
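
The frame selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature set, the distance measure, the K-means initialization, or how ties are handled, so the Euclidean distance, the seeded random initialization, the tie rule, and all function names below (`two_means`, `optimize_sample`) are assumptions made for illustration. Each sample is taken to be a list of per-frame feature vectors.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def two_means(frames, iterations=50, seed=0):
    """Plain K-means with k = 2; returns a 0/1 label for every frame.

    Assumption: centroids start from two randomly chosen frames.
    """
    rng = random.Random(seed)
    centroids = rng.sample(frames, 2)
    labels = [0] * len(frames)
    for _ in range(iterations):
        # Assignment step: each frame joins its nearest centroid.
        new_labels = [
            0 if squared_distance(f, centroids[0]) <= squared_distance(f, centroids[1]) else 1
            for f in frames
        ]
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [f for f, lab in zip(frames, new_labels) if lab == k]
            if members:
                centroids[k] = mean_vector(members)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels


def optimize_sample(frames):
    """Keep only the frames of the larger cluster, in original order.

    Follows the abstract's hypothesis that the cluster with more frames
    is the emotionally obvious one. Assumption: on a tie, cluster 0 is kept.
    """
    if len(frames) < 2:
        return list(frames)
    labels = two_means(frames)
    keep = 1 if labels.count(1) > labels.count(0) else 0
    return [f for f, lab in zip(frames, labels) if lab == keep]


# Toy sample: five similar frames and two clearly different ones
# (hypothetical two-dimensional local features).
sample = [
    [0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.1],
    [0.1, 0.1], [5.1, 4.9], [0.05, 0.05],
]
optimized = optimize_sample(sample)
```

In this toy case the two outlying frames are discarded and the five similar frames are retained. In the setting of the paper, the retained frames would then presumably be used to build the sample passed to the emotion classifier; that step is outside this sketch.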

Key words: speech emotion recognition, local features, global features, cluster analysis, sample optimization
