Journal of University of Chinese Academy of Sciences ›› 2016, Vol. 33 ›› Issue (4): 562-569. DOI: 10.7523/j.issn.2095-6134.2016.04.019

• Computer Science •

  • Received: 2016-01-07 Revised: 2016-03-17 Published: 2016-07-15
  • Corresponding author: ZHAO Jianjun
  • Funding:

    Supported by the National Natural Science Foundation of China (61562059, 61461027)

Cyberspace device identification based on K-means with cosine distance measure

CAO Laicheng¹, ZHAO Jianjun¹,², CUI Xiang², LI Ke²,³

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China;
  2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China;
  3. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract:

Traditional web fingerprinting methods are limited to identifying mainstream web server software. To address this limitation, a cyberspace terminal device identification model based on K-means with a cosine distance measure is proposed. First, the identification model is designed and a verification method is determined. Second, the header fields and the status code of the HTTP response are selected as the features of a terminal device; these features are extracted and vectorized into a 32-dimensional feature vector. Third, the cosine distance function is selected as the similarity measure in the K-means clustering algorithm. Finally, an experimental procedure is designed according to the identification model, and identification experiments are carried out on unlabeled and labeled samples from cyberspace. The results show that the model can identify terminal devices such as wireless routers, network cameras, and intelligent switches, with a high identification accuracy rate and a low omission rate.

Key words: cyberspace, terminal device, K-means, cosine measure, fingerprinting
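
As an illustration of the clustering step described in the abstract, the following is a minimal sketch of K-means with cosine distance, not the authors' implementation. The function names (`normalize`, `cosine_distance`, `cosine_kmeans`), the `init_indices` parameter, and the 4-dimensional toy vectors are all assumptions made for this example; the paper uses 32-dimensional vectors built from HTTP header fields and status codes. Re-normalizing the cluster mean to unit length is one common way to update centroids under cosine distance and is likewise an assumption here.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) of two unit-length vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_kmeans(vectors, k, n_iter=100, seed=0, init_indices=None):
    """Cluster vectors with K-means, using cosine distance as the measure.

    init_indices (hypothetical parameter) selects the samples used as the
    initial centroids; if omitted, k samples are drawn at random.
    """
    data = [normalize(v) for v in vectors]
    if init_indices is None:
        init_indices = random.Random(seed).sample(range(len(data)), k)
    centroids = [list(data[i]) for i in init_indices]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: cosine_distance(v, centroids[j]))
            for v in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: centroid = mean of members, rescaled to unit length.
        for j in range(k):
            members = [v for v, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels, centroids


# Toy feature vectors (assumed data): two groups pointing in different directions.
toy = [
    [1, 0, 0.1, 0], [2, 0, 0.1, 0], [5, 0.1, 0, 0],
    [0, 1, 0, 0.1], [0, 3, 0.1, 0], [0.1, 4, 0, 0],
]
labels, centroids = cosine_kmeans(toy, k=2, init_indices=[0, 3])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because cosine distance depends only on the direction of a vector, samples whose feature values differ mainly in magnitude fall into the same cluster. As with any K-means variant, the result depends on the initial centroids, which is why the toy run fixes them explicitly.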

CLC number: