Journal of University of Chinese Academy of Sciences ›› 2018, Vol. 35 ›› Issue (1): 109-117. DOI: 10.7523/j.issn.2095-6134.2018.01.015


A lip-reading recognition approach based on long short-term memory

MA Ning1,2, TIAN Guodong2, ZHOU Xi2   

  1. University of Chinese Academy of Sciences, Beijing 100049, China;
  2. Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
  • Received: 2016-11-23  Revised: 2017-03-15  Online: 2018-01-15

Abstract: Visual speech information is an important carrier of conversation. However, visual speech information varies across speakers because of differences in lip appearance, background, and speaking style, even when the content of the conversation is the same. To address this variability, we propose a new approach to lip-reading recognition based on long short-term memory (LSTM). We compute the positions of lip landmarks, which describe the dynamics of the lip shape, and use them as the features of a lip-reading video; these features exhibit within-class consistency and between-class distinctiveness. We then use an LSTM to encode temporal information, so that it learns spatio-temporal features that are both discriminative and generalizable. Our approach is evaluated on three public databases (GRID, MRIALC, and OuluVS) for lip-reading recognition of isolated words or phrases in speaker-independent experiments. On GRID and MRIALC, the accuracy of our approach is more than 30% higher than that of the conventional approach. On OuluVS, the accuracy of our approach is comparable to the state of the art. The experimental results indicate that our lip-reading recognition approach effectively solves the problem of variability in visual speech information.
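
The pipeline described in the abstract (per-frame lip landmark positions fed to an LSTM, followed by classification of the word or phrase) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the choice of 20 landmarks per frame (40 coordinates), the hidden size, the use of the final hidden state with a softmax classifier, and the random untrained weights. Landmark coordinates are assumed to be already extracted and normalized to roughly [0, 1].

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(input_dim, hidden_dim, num_classes, seed=0):
    """Randomly initialise weights (untrained; for illustration only)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # One (weights, bias) pair per gate: input (i), forget (f), output (o), candidate (g).
    gates = {
        name: (matrix(hidden_dim, input_dim + hidden_dim), [0.0] * hidden_dim)
        for name in ("i", "f", "o", "g")
    }
    classifier = (matrix(num_classes, hidden_dim), [0.0] * num_classes)
    return gates, classifier


def affine(weights, bias, vec):
    """Compute weights @ vec + bias using plain lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def lstm_step(gates, x, h, c):
    """One standard LSTM cell update for a single video frame."""
    z = x + h  # concatenate the frame's landmark features with the previous hidden state
    i = [sigmoid(v) for v in affine(*gates["i"], z)]
    f = [sigmoid(v) for v in affine(*gates["f"], z)]
    o = [sigmoid(v) for v in affine(*gates["o"], z)]
    g = [math.tanh(v) for v in affine(*gates["g"], z)]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def classify_sequence(params, frames):
    """Run the LSTM over all frames and return class probabilities."""
    gates, (w_out, b_out) = params
    hidden_dim = len(w_out[0])
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in frames:
        h, c = lstm_step(gates, x, h, c)
    # Assumption: classify from the final hidden state with a softmax layer.
    logits = affine(w_out, b_out, h)
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical usage: 12 frames, 20 landmarks (x, y) per frame, 5 word classes.
params = init_params(input_dim=40, hidden_dim=8, num_classes=5)
rng = random.Random(1)
frames = [[rng.random() for _ in range(40)] for _ in range(12)]
probs = classify_sequence(params, frames)
```

Because the recurrent state is carried from frame to frame, the same function accepts videos of different lengths, which is one reason a recurrent model suits utterances of varying duration. In practice the weights would be trained on labeled videos rather than drawn at random.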

Key words: lip-reading recognition, long short-term memory, computer vision
