Journal of University of Chinese Academy of Sciences

2020, Vol. 37, Issue (4): 553-561. DOI: 10.7523/j.issn.2095-6134.2020.04.016

• Research Articles •

Intrusion detection method based on entity embedding and long short-term memory networks

LAI Xunfei1,2,3,4, LIANG Xuwen2,3,4, XIE Zhuochen3, LI Zongwang3,4   

    1. Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China;
    2. School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China;
    3. Shanghai Engineering Center for Microsatellites, Chinese Academy of Sciences, Shanghai 201203, China;
    4. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2019-01-25; Revised: 2019-04-03; Online: 2020-07-15

Abstract: Network intrusion detection suffers from low accuracy and a high false negative rate when the categorical variables in intrusion data are not represented effectively. A method combining entity embedding with a long short-term memory (LSTM) network is proposed. First, during preprocessing, the numerical and categorical variables are separated; the categorical variables are mapped into a Euclidean space by entity embedding to obtain vector representations, and these vectors are combined with the numerical data to form the input. The input is then fed into the LSTM network, and the parameters are updated by backpropagation through time. Training thus yields both the optimal embedding vectors, which serve as input features, and a relatively optimal LSTM detection model. Experiments on the NSL-KDD data set show that entity embedding is an effective way to handle categorical variables in network intrusion data and that the LSTM-based model effectively improves the detection rate. Compared with traditional One-Hot encoding of the categorical variables, entity embedding increases detection accuracy by 1.44 percentage points and decreases the false negative rate by 2.99 percentage points.
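
The preprocessing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the field names, vocabularies, embedding sizes, and the choice to concatenate the vectors with the numeric features are all assumptions made for illustration. The embedding vectors here are only randomly initialized and looked up; in the proposed method they would be learned jointly with the LSTM by backpropagation through time. The sketch also builds the One-Hot version of the same record so the two input widths can be compared.

```python
import random

# Hypothetical vocabularies for two categorical fields (illustrative only).
VOCAB = {
    "protocol_type": ["tcp", "udp", "icmp"],
    "flag": ["SF", "S0", "REJ", "RSTO"],
}
# Assumed embedding sizes; the abstract does not state the dimensions used.
EMBED_DIM = {"protocol_type": 2, "flag": 3}


def init_embeddings(vocab, dims, seed=0):
    """Create one small random vector per category value.

    In a trained model these vectors would be parameters updated by
    gradient descent; here they are only initialized.
    """
    rng = random.Random(seed)
    return {
        field: {
            value: [rng.uniform(-0.05, 0.05) for _ in range(dims[field])]
            for value in values
        }
        for field, values in vocab.items()
    }


def one_hot(field, value):
    """Encode a category value as a 0/1 indicator vector."""
    return [1.0 if v == value else 0.0 for v in VOCAB[field]]


def build_embedded_input(record, numeric_fields, tables):
    """Numeric features followed by the looked-up embedding vectors."""
    row = [float(record[f]) for f in numeric_fields]
    for field in sorted(tables):
        row.extend(tables[field][record[field]])
    return row


def build_one_hot_input(record, numeric_fields):
    """Numeric features followed by One-Hot vectors, for comparison."""
    row = [float(record[f]) for f in numeric_fields]
    for field in sorted(VOCAB):
        row.extend(one_hot(field, record[field]))
    return row


tables = init_embeddings(VOCAB, EMBED_DIM)
record = {"duration": 0, "src_bytes": 491, "protocol_type": "tcp", "flag": "SF"}
numeric = ["duration", "src_bytes"]

embedded_row = build_embedded_input(record, numeric, tables)
one_hot_row = build_one_hot_input(record, numeric)
print(len(embedded_row), len(one_hot_row))  # 7 9
```

With these toy vocabularies the embedded input has 7 columns and the One-Hot input has 9. The gap widens as the number of category values grows, since a One-Hot vector needs one column per value while an embedding has a fixed, usually smaller, dimension.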

Key words: entity embedding, LSTM, intrusion detection, categorical variables
