Journal of University of Chinese Academy of Sciences

2009, Vol. 26, Issue (5): 703-711. DOI: 10.7523/j.issn.2095-6134.2009.5.017

• Research Articles •

Fast dictionary mechanism for Chinese word segmentation

WU Jing-Jing1,2, JING Ji-Wu2, NIE Xiao-Feng2, WANG Ping-Jian2

  1. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China;
  2. State Key Laboratory of Information Security, Graduate University of the Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2008-10-16 Revised: 2009-04-21 Online: 2009-09-15

Abstract:

With the development of global networking through the Internet, the number of articles in Chinese and other native languages is increasing rapidly. Because these character-based languages lack explicit word separators, word segmentation is a precondition for processing them, and it therefore affects the performance of the whole system. In this paper, we propose a new lexicon-based solution to the Chinese word segmentation problem, named double-character-and-long-word-hash-indexing (DCLWHI). Compared with traditional lexicon mechanisms, DCLWHI improves the speed and efficiency of word segmentation without extra memory cost while achieving the same accuracy.
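
For readers unfamiliar with lexicon-based segmentation, the following is a minimal sketch of the general idea of indexing a dictionary by the first two characters of each word and segmenting by forward maximum matching. It is not the DCLWHI structure itself, whose details are not given in this abstract; the function names `build_index` and `segment` and the toy lexicon are assumptions made purely for illustration.

```python
from collections import defaultdict


def build_index(words):
    """Group multi-character words by their first two characters.

    Illustrative only: a plain hash map from a two-character prefix to the
    set of lexicon words that start with it.
    """
    index = defaultdict(set)
    for w in words:
        if len(w) >= 2:
            index[w[:2]].add(w)
    return index


def segment(text, index):
    """Forward maximum matching using the two-character index."""
    tokens = []
    i = 0
    while i < len(text):
        # One hash lookup narrows the search to words sharing this prefix.
        candidates = index.get(text[i:i + 2], ())
        match = text[i]  # fall back to a single character
        # Try the longest candidates first.
        for w in sorted(candidates, key=len, reverse=True):
            if text.startswith(w, i):
                match = w
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical toy lexicon, not taken from the paper.
lexicon = ["中国", "中国科学院", "科学", "科学院", "大学", "研究生"]
index = build_index(lexicon)
print(segment("中国科学院研究生院", index))
```

In this sketch a single lookup on the two-character prefix replaces a scan over the whole lexicon, which is the kind of saving a hash-indexed dictionary aims for; how DCLWHI treats long words specifically is not described here.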

Key words: real-time text processing, Chinese word segmentation, lexicon mechanism, double-character-and-long-word-hash-indexing (DCLWHI)
