Journal of University of Chinese Academy of Sciences

2005, Vol. 22, Issue (5): 554-559. DOI: 10.7523/j.issn.2095-6134.2005.5.004

A Fast Text Categorization Approach Based on k-Nearest Neighbor

ZHANG Qing-Guo¹, ZHANG Hong-Wei², ZHANG Jun-Yu¹

1. Department of Mathematics, Graduate School of the Chinese Academy of Sciences, Beijing 100049, China;
2. Optical Memory National Engineering Research Center, Tsinghua University, Beijing 100084, China

Received: 2004-08-09; Revised: 2004-11-08; Online: 2005-09-15

Abstract:

k-Nearest Neighbor (k-NN) is one of the simplest and most effective algorithms for text categorization. However, k-NN search requires intensive similarity computations; for a large training set in particular, searching the whole set is unacceptably expensive. Speeding up k-NN search is therefore key to making k-NN categorization useful in practice. This paper presents a fast text categorization approach based on k-NN that classifies textual documents quickly and efficiently even when searching a very large training set. Experiments show that the new algorithm greatly improves performance.

Key words: text categorization, k-Nearest Neighbor (k-NN), multidimensional index, similarity retrieval
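To illustrate why exhaustive similarity search dominates the cost of k-NN categorization, here is a minimal Python sketch of a plain k-NN text classifier. It assumes whitespace tokenization, raw term-frequency vectors, cosine similarity and similarity-weighted voting. The names `tf_vector`, `cosine`, `KNNTextClassifier` and the `pruned` flag are hypothetical. The inverted-index candidate filter is a common generic speed-up shown for comparison. It is not the fast approach proposed in this paper, whose details the abstract does not give.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Raw term-frequency vector of a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b[t] for t, w in a.items())  # Counter returns 0 for missing terms
    if dot == 0:
        return 0.0
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)


class KNNTextClassifier:
    """Hypothetical k-NN text classifier with similarity-weighted voting."""

    def __init__(self, docs, labels, k=3):
        self.vectors = [tf_vector(d) for d in docs]
        self.labels = list(labels)
        self.k = k
        # Inverted index: term -> ids of the training documents containing it.
        self.index = defaultdict(set)
        for i, vec in enumerate(self.vectors):
            for term in vec:
                self.index[term].add(i)

    def classify(self, text, pruned=True):
        q = tf_vector(text)
        if pruned:
            # Only documents sharing a term with the query can score above zero.
            candidates = set().union(*(self.index.get(t, set()) for t in q))
        else:
            # Exhaustive scan: one similarity computation per training document.
            candidates = range(len(self.vectors))
        scored = [(cosine(q, self.vectors[i]), i) for i in candidates]
        # Highest similarity first; ties broken by training-set position.
        scored.sort(key=lambda s: (-s[0], s[1]))
        votes = defaultdict(float)
        for sim, i in scored[: self.k]:
            if sim > 0:  # zero-similarity neighbours carry no evidence
                votes[self.labels[i]] += sim
        return max(votes, key=votes.get) if votes else None


if __name__ == "__main__":
    docs = ["stock market shares rise", "market trading stock prices",
            "football match goal score", "team wins football league"]
    labels = ["finance", "finance", "sports", "sports"]
    clf = KNNTextClassifier(docs, labels, k=3)
    print(clf.classify("stock prices fall"))  # finance
```

With `pruned=False` the query is compared against every training document. That is the whole-set search the abstract describes as too expensive for large training sets. With `pruned=True` only documents sharing at least one term with the query are scored. Because zero-similarity neighbours are ignored in the vote, both modes return the same label.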
