Journal of University of Chinese Academy of Sciences ›› 2013, Vol. 30 ›› Issue (5): 706-712. DOI: 10.7523/j.issn.2095-6134.2013.05.021

• Computer Science •

An improved multi-pattern string matching algorithm and GPU parallelization

QIAN Quan1,2, ZHU Wei1,2, CHE Hong-Yi1,2, ZHANG Rui1,2

  1. School of Computer Engineering & Science, Shanghai University, Shanghai 200072, China;
  2. State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2012-10-11 Revised: 2013-02-06 Published: 2013-09-15
  • Contact: QIAN Quan, E-mail: qqian@shu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61003248); Natural Science Foundation of Shanghai (13ZR1416100); Shanghai Municipal Education Commission Key Discipline (J50103); Shanghai Municipal Education Commission Innovation Project (09YZ05); Doctoral Program Foundation of the Ministry of Education (20093108120016); Open Project of the Science and Technology Commission of Shanghai Municipality (09511501300)

Abstract: Multi-pattern matching algorithms are widely used in text searching, intrusion detection, and other areas. We focus on two matching approaches, the Aho-Corasick (AC) multi-pattern matching algorithm and regular expression matching. By analyzing their functional strengths and weaknesses and comparing how each builds its deterministic finite automaton (DFA), we integrate the two construction processes into a new AC algorithm that can recognize regular expressions in multi-pattern text matching. The new algorithm retains the advantages of the traditional AC algorithm: when a match fails, the pointer into the target string does not move back, and the AC automaton moves back by only a few states. It therefore combines the strengths of both approaches. We also discuss parallelizing the algorithm on the GPU in the CUDA programming environment, and compare in detail the running performance of implementations that use different types of GPU memory, namely global memory and texture memory.

Key words: multi-pattern string matching, regular expression matching, GPU, CUDA

CLC number:
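
The property the abstract highlights, that the pointer into the target string never moves back after a failed match, comes from the failure links of the classic Aho-Corasick automaton. The following is a minimal Python sketch of that classic algorithm only. It is not the authors' improved algorithm: it has no regular expression support and no GPU code, and the names `build_automaton` and `search` are illustrative assumptions rather than names from the paper.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for classic Aho-Corasick."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognized on reaching each state

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: breadth-first pass that sets the failure links.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):  # the text index i only ever increases
        # On a mismatch only the automaton state falls back, not i.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches
```

For example, `search("ushers", ["he", "she", "his", "hers"])` scans the text once from left to right and reports `she`, `he`, and `hers`. Each character of the text is read exactly once, which is the behavior the abstract says the improved algorithm preserves.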