
Journal of University of Chinese Academy of Sciences, 2018, Vol. 35, Issue (3): 402-408. DOI: 10.7523/j.issn.2095-6134.2018.03.015

• Computer Science •

Recognition of overlaid Chinese characters in videos without binarization

TIAN Jie, WANG Weiqiang, SUN Yi

  1. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 101408, China
  • Received: 2017-03-15 Revised: 2017-04-19 Published: 2018-05-15
  • Corresponding author: TIAN Jie
  • Funding:
    Supported by the National Natural Science Foundation of China (61271434)

Abstract: This paper proposes a new method for recognizing caption text in videos. Because font size, color, rendering style, and resolution vary widely, and video backgrounds are often complex, recognizing overlaid text in videos remains an open problem. Most existing methods combine text binarization with a traditional OCR engine. However, binarization tends to introduce noise and to lose stroke information. In addition, traditional OCR techniques mainly target high-resolution scans of printed documents, which have uniform backgrounds, little noise, and relatively complete strokes. As a result, traditional OCR engines may not be robust when applied to binarized overlaid text. To address this problem, Gabor features are extracted directly from overlaid text images without binarization and used to train a two-level character recognizer. Experimental results show that the proposed method performs well on overlaid Chinese text in videos with multiple fonts.

Keywords: video overlaid text, OCR, Gabor, nearest prototype classifier (NPC)
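
The abstract names two building blocks, Gabor features taken from the unbinarized image and a nearest prototype classifier (NPC), but gives no filter parameters and does not describe how the two-level recognizer is organized. The following is therefore only a minimal sketch of the general idea, not the authors' implementation: it uses a single-stage NPC, and every function name, parameter value (`size`, `wavelength`, `sigma`, `n_orient`, `grid`), and the toy stroke generator `make_bar` are assumptions chosen for illustration. It requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(size, wavelength, theta, sigma):
    """Real Gabor kernel: a Gaussian-windowed cosine oriented at angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * x_rot / wavelength)
    return kernel - kernel.mean()  # zero mean: uniform regions give ~0 response


def gabor_features(image, n_orient=4, grid=4, size=9, wavelength=6.0, sigma=3.0):
    """Grid-pooled Gabor magnitudes computed on the grayscale image itself.

    No thresholding or binarization is applied. Taking the absolute response
    makes the features insensitive to text/background polarity.
    All default parameter values are illustrative assumptions.
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    padded = np.pad(image, size // 2, mode="edge")
    windows = sliding_window_view(padded, (size, size))  # shape (h, w, size, size)
    feats = []
    for k in range(n_orient):
        kernel = gabor_kernel(size, wavelength, np.pi * k / n_orient, sigma)
        response = np.abs(np.einsum("ijkl,kl->ij", windows, kernel))
        # Average the response over a grid x grid layout of cells.
        cropped = response[: h - h % grid, : w - w % grid]
        cells = cropped.reshape(grid, h // grid, grid, w // grid).mean(axis=(1, 3))
        feats.append(cells.ravel())
    vec = np.concatenate(feats)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 1e-8 else vec


class NearestPrototypeClassifier:
    """One prototype per class (the class mean); predict the closest prototype."""

    def fit(self, features, labels):
        features, labels = np.asarray(features), np.asarray(labels)
        self.classes_ = np.unique(labels)
        self.prototypes_ = np.stack(
            [features[labels == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, features):
        features = np.atleast_2d(features)
        diff = features[:, None, :] - self.prototypes_[None, :, :]
        return self.classes_[np.linalg.norm(diff, axis=2).argmin(axis=1)]


def make_bar(rng, vertical, size=32):
    """Hypothetical toy 'glyph': one stroke on a noisy gray background."""
    image = np.full((size, size), rng.uniform(0.3, 0.7))
    image[14:18, 4:28] += 0.3 * rng.choice([-1.0, 1.0])  # random polarity
    image += rng.normal(0.0, 0.02, image.shape)
    return image.T if vertical else image


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kinds = [False, True] * 5
    X = np.stack([gabor_features(make_bar(rng, v)) for v in kinds])
    y = np.array(["vertical" if v else "horizontal" for v in kinds])
    clf = NearestPrototypeClassifier().fit(X, y)
    print(clf.predict(gabor_features(make_bar(rng, True))))
```

The toy strokes only exercise the pipeline end to end. A real recognizer for Chinese characters would need one or more prototypes per character class learned from labeled video text, and a coarse-to-fine arrangement is one plausible reading of "two-level", though the abstract does not confirm it.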
