Two Stage P2P Botnet Detection Method Based on Flow Similarity

NIU Wei-na; ZHANG Xiao-song; SUN En-bo; YANG Guo-wu; ZHAO Ling-yuan

doi:10.3969/j.issn.1001-0548.2017.06.019

Dec. 2017

NIU Wei-na, ZHANG Xiao-song, SUN En-bo, YANG Guo-wu, ZHAO Ling-yuan. Two Stage P2P Botnet Detection Method Based on Flow Similarity[J]. Journal of University of Electronic Science and Technology of China, 2017, 46(6): 902-906, 948. doi: 10.3969/j.issn.1001-0548.2017.06.019

1. Center for Cyber Security, University of Electronic Science and Technology of China, Chengdu 611731
2. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731

Received Date: 2016-06-28
Rev Recd Date: 2017-03-09
Publish Date: 2017-11-30

Abstract

Botnets are among the most common threats to network security because they exploit multiple kinds of malicious code, such as worms, Trojans, and rootkits, to perform denial-of-service attacks, send phishing links, and provide malicious services. Peer-to-peer (P2P) botnets are harder to detect than IRC, HTTP, and other types of botnets because they are decentralized and distributed. To address this problem, we propose a two-stage traffic classification method that detects P2P botnet traffic by combining a non-P2P traffic filtering mechanism with machine learning on conversation features. In the first stage, non-P2P packets are filtered out according to well-known ports, DNS queries, and flow counts, which reduces the amount of network traffic to analyze. In the second stage, conversation features based on data flow features and flow similarity are extracted. Finally, the P2P botnet is detected with a Random Forest classifier built on the decision tree model. Experimental evaluation on the UNB ISCX botnet dataset shows that the two-stage method achieves higher accuracy than traditional P2P botnet detection methods.
Keywords: botnet detection; conversation feature; flow similarity; P2P traffic identification

References

[1]	ZHU Z, LU G, CHEN Y, et al. Botnet research survey[C]//IEEE International Computer Software and Applications Conference. Turku:IEEE, 2008:967-972. http://ieeexplore.ieee.org/document/4591703/
[2]	LIVADAS C, WALSH R, LAPSLEY D, et al. Using machine learning techniques to identify botnet traffic[C]//31st IEEE Conference on Local Computer Networks. Tampa:IEEE, 2006:967-974. http://www.mendeley.com/catalog/using-machine-learning-techniques-identify-botnet-traffic-12/
[3]	CAI T, ZOU F. Detecting HTTP botnet with clustering network traffic[C]//2012 8th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM). Shanghai:IEEE, 2012:1-7. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6478491
[4]	ZEIDANLOO H R, MANAF A B A, AHMAD R B, et al. A proposed framework for P2P botnet detection[J]. International Journal of Engineering and Technology, 2010, 2(2):161-168. https://www.researchgate.net/publication/310793596_A_Proposed_Framework_for_P2P_Botnet_Detection
[5]	HADDADI F, CONG D L, PORTER L, et al. On the effectiveness of different botnet detection approaches[C]//International Conf on Information Security Practice and Experience. Beijing:ACM, 2015:121-135. https://www.researchgate.net/publication/278667543_On_the_Effectiveness_of_Different_Botnet_Detection_Approaches
[6]	WANG J S, LIU F, ZHANG J. Botnet detecting method based on group-signature filter[J]. Journal on Communications, 2010, 31(2):29-35. https://www.researchgate.net/publication/291313592_Botnet_detecting_method_based_on_group-signature_filter
[7]	ZHANG J, PERDISCI R, LEE W, et al. Detecting stealthy P2P botnets using statistical traffic fingerprints[C]//IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN). Hong Kong:IEEE, 2011:121-132. doi: 10.1109/DSN.2011.5958212
[8]	ABDULLAH R S, ABDOLLAH M F, NOH Z A M, et al. Preliminary study of host and network-based analysis on P2P botnet detection[C]//TIME-E':International Conference on Technology, Informatics, Management, Engineering & Environment. Bandung:IEEE, 2013:105-109. http://ieeexplore.ieee.org/document/6611973/
[9]	ZHAO Y. The novel approach of P2P botnet node-based detection and applications[J]. Journal of Chemical and Pharmaceutical Research, 2014, 6(7):1055-1063. https://www.researchgate.net/publication/297510433_The_novel_approach_of_P2P_Botnet_node-based_detection_and_applications
[10]	ZHAO D, TRAORE I, SAYED B, et al. Botnet detection based on traffic behavior analysis and flow intervals[J]. Computers & Security, 2013, 39(4):2-16. http://www.sciencedirect.com/science/article/pii/S0167404813000837
[11]	ZHANG J, PERDISCI R, LEE W, et al. Building a scalable system for stealthy P2P-Botnet detection[J]. IEEE Transactions on Information Forensics & Security, 2014, 9(1):27-38. http://ieeexplore.ieee.org/document/6661360/
[12]	SHARIFNYA R, ABADI M. Dfbotkiller:Domain-flux botnet detection based on the history of group activities and failures in DNS traffic[J]. Digital Investigation, 2015, 12(12):15-26. http://www.sciencedirect.com/science/article/pii/S1742287614001182
[13]	BUCZAK A L, GUVEN E. A survey of data mining and machine learning methods for cyber security intrusion detection[J]. IEEE Communications Surveys & Tutorials, 2015, 18(2):1153-1176. http://ieeexplore.ieee.org/document/7307098/
[14]	YIN C, AWLLA A H, YIN Z, et al. Botnet detection based on genetic neural network[J]. International Journal of Security and Its Applications, 2015, 9(11):97-104. doi: 10.14257/ijsia
[15]	CONSTANTINOU F, MAVROMMATIS P. Identifying known and unknown peer-to-peer traffic[C]//IEEE International Symposium on Network Computing & Applications. Cambridge:IEEE, 2006:93-102. http://dl.acm.org/citation.cfm?id=1158211
[16]	MADHUKAR A, WILLIAMSON C. A longitudinal study of P2P traffic classification[C]//MASM' 06:14th IEEE International Symposium on Modeling, Analysis, and Simulation. Monterey:IEEE, 2006:179-188. http://dl.acm.org/citation.cfm?id=1158127
[17]	KARAGIANNIS T, BROIDO A, FALOUTSOS M, et al. Transport layer identification of P2P traffic[C]//ACM SIGCOMM Conference on Internet Measurement. Taormina:ACM, 2004:121-134. http://dl.acm.org/citation.cfm?id=1028804
[18]	WANG C, ZHOU X, YOU F, et al. Design of P2P traffic identification based on DPI and DFI[C]//CNMT' 09:Computer Network and Multimedia Technology. Wuhan:IEEE, 2009:1-4. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=5374577
[19]	BEIGI E B, JAZI H H, STAKHANOVA N, et al. Towards effective feature selection in machine learning-based botnet detection approaches[C]//CNS' 14:18th IEEE Conference on Communications and Network Security. San Francisco:IEEE, 2014:247-255. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6997492

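Today's network environment is intricate, and security problems are increasingly prominent. Because the C&C servers of botnets are highly stealthy, bot programs are often adopted by attackers who launch large-scale network attacks; almost all DDoS attacks and 80%~90% of spam attacks are launched by botnets [1]. Botnets have therefore become a problem that cannot be ignored in network security.

Early botnets mainly used IRC [2] and HTTP [3] as communication protocols. They suffered from a single point of failure and were easy to detect and take down. Today, most botnets use P2P technology to build their C&C (command and control) mechanism so that their communication is stealthier [4]. Compared with IRC and HTTP botnets, P2P botnets, which have no central node, are more threatening and harder to detect. P2P botnets are therefore increasingly favored by attackers, and P2P botnet detection [5] has become a research hotspot in the security field.

P2P applications have caused explosive growth in Internet traffic, which is a huge challenge for both data storage and real-time analysis. Filtering out the non-P2P traffic early in P2P botnet detection is therefore particularly important.

This paper proposes a two-stage detection method for P2P botnets. The first stage filters out non-P2P traffic based on port checks, DNS queries, and counts of the flows in a conversation. The second stage identifies P2P botnets from conversation features; working at the conversation level effectively reduces the number of records that must be analyzed. A random forest algorithm built on the decision tree model then classifies the traffic and thereby detects the botnet. The proposed algorithm is compared with two existing algorithms on the UNB dataset, and the experimental results show that the random forest algorithm detects P2P botnets with higher accuracy.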
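The flow-counting part of the first stage can be illustrated with a minimal Python sketch. It rests on assumptions that this excerpt does not confirm: a conversation is taken to be all flows between the same unordered pair of IP addresses, conversations with fewer than `min_flows` flows are treated as non-P2P, and the field names and threshold value are hypothetical.

```python
from collections import defaultdict

def group_into_conversations(flows):
    """Group flows by unordered IP pair (assumed definition of a conversation)."""
    conversations = defaultdict(list)
    for flow in flows:
        key = tuple(sorted((flow["src_ip"], flow["dst_ip"])))
        conversations[key].append(flow)
    return dict(conversations)

def filter_by_flow_count(conversations, min_flows=3):
    """Keep conversations with at least min_flows flows.

    Both the threshold value and the direction of the test are assumptions.
    """
    return {k: v for k, v in conversations.items() if len(v) >= min_flows}

# Hypothetical flow records: three flows between one host pair, one between another.
flows = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "dst_port": 40001},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1", "dst_port": 40002},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "dst_port": 40003},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.9", "dst_port": 40004},
]
conversations = group_into_conversations(flows)
kept = filter_by_flow_count(conversations, min_flows=3)
print(sorted(kept))
```

Grouping on the sorted IP pair makes the two directions of an exchange land in the same conversation, so later stages analyze one record per host pair instead of one per flow.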

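1. Related Work

Depending on the detection strategy, P2P botnet detection methods fall into four types: signature-based [6-7], host-behavior-based [8-9], flow-behavior-feature-based [10], and flow-similarity-based [11].

1.1. Signature-Based Detection

Signature-based detection [6-7] designs detection rules from features (such as MD5 hashes or PE header formats) extracted by analyzing botnet applications or their communication traffic. The original rules, however, stop working once a botnet application changes its communication method or packet format. Moreover, if the signatures in use do not represent the bot program well, this strategy produces a high false alarm rate.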
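A toy Python example shows why hash-based signatures are brittle. The byte strings and the signature set are hypothetical stand-ins for real samples; this is not the rule format used in [6-7].

```python
import hashlib

# Hypothetical signature database: MD5 digests of known bot samples.
KNOWN_BAD_MD5 = {hashlib.md5(b"bot-binary-v1").hexdigest()}

def matches_signature(sample: bytes, signatures=KNOWN_BAD_MD5) -> bool:
    """Return True if the sample's MD5 digest is in the signature set."""
    return hashlib.md5(sample).hexdigest() in signatures

original_hit = matches_signature(b"bot-binary-v1")  # the exact known sample
variant_hit = matches_signature(b"bot-binary-v2")   # a slightly changed variant
print(original_hit, variant_hit)
```

Changing a single byte of the sample changes the digest, so the variant goes undetected until a new signature is added.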

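1.2. Host-Behavior-Based Detection

Host-behavior-based detection [8-9] detects bot programs by monitoring changes to processes, files, network connections, and registry contents on a host in a controlled environment. It cannot detect new or variant bot programs; for example, attackers can evade it with new attack and hiding techniques such as rootkits and anti-debugging.

1.3. Flow-Behavior-Feature-Based Detection

Detection based on flow behavior features [10] is mainly applied during the botnet's C&C phase [12], because C&C traffic differs from normal network traffic in its flow features and communication patterns, for example in average packet size and periodic connections. It can therefore be combined with machine learning [13] and neural networks [14] to monitor botnets in real time.

Flow-behavior-based botnet detection mainly analyzes two features: the connection failure rate and flow features. The flow features include the number of uplink and downlink packets; the number of bytes transferred uplink and downlink; the average length, maximum length, and variance of uplink and downlink packet lengths; the duration of the flow; and the total length of the loaded packets in a flow.

This method achieves a high detection rate because it extracts common feature vectors of flows without depending on the botnet type. The strategy has therefore attracted wide attention from researchers in traffic analysis. In high-speed, complex, and changing network environments, the main factors that determine detection efficiency and accuracy are the features extracted and the classification strategy used.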
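Most of the per-flow features listed in this subsection can be computed from a packet list, as the Python sketch below shows. The packet representation `(timestamp, direction, length)`, the feature names, and the use of population variance are assumptions made for illustration, and the connection failure rate is left out.

```python
from statistics import mean, pvariance

def flow_features(packets):
    """Compute per-flow features.

    packets: list of (timestamp_seconds, direction, length_bytes) tuples,
    where direction is "up" or "down" (an assumed representation).
    """
    features = {}
    for direction in ("up", "down"):
        lengths = [length for _, d, length in packets if d == direction]
        features[f"{direction}_packets"] = len(lengths)
        features[f"{direction}_bytes"] = sum(lengths)
        features[f"{direction}_avg_len"] = mean(lengths) if lengths else 0.0
        features[f"{direction}_max_len"] = max(lengths, default=0)
        features[f"{direction}_var_len"] = pvariance(lengths) if lengths else 0.0
    timestamps = [t for t, _, _ in packets]
    features["duration"] = max(timestamps) - min(timestamps) if timestamps else 0.0
    return features

# Hypothetical flow with two uplink and two downlink packets.
packets = [(0.0, "up", 100), (0.5, "down", 300), (1.0, "up", 200), (2.0, "down", 500)]
features = flow_features(packets)
print(features)
```

A classifier would consume one such feature dictionary per flow, which is why the choice of features matters as much as the choice of classification strategy.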

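1.4. Flow-Similarity-Based Detection

Research shows [11] that bot hosts belonging to the same botnet communicate in similar ways. P2P botnet traffic can therefore be identified as follows: first, analyze and process the captured network traffic and extract features; next, cluster the flow data extracted in the previous stage with a clustering algorithm; finally, determine which cluster contains the P2P botnet traffic.

This scheme improves detection accuracy by setting thresholds and needs no existing botnet flows for training. However, if the network contains only one bot host, or the captured packets contain no traffic between different bot hosts, the method is not very effective.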
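The clustering step of this scheme can be sketched with a simple threshold-based grouping in Python. Greedy leader clustering is used here as a stand-in and is not the algorithm of [11]; the feature vectors, the distance threshold, and the rule that a suspicious cluster needs at least two members are all assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_cluster(vectors, threshold):
    """Greedy leader clustering.

    Each vector joins the first cluster whose leader (first member) lies
    within `threshold`; otherwise it starts a new cluster. Returns lists
    of indices into `vectors`.
    """
    clusters = []
    for i, v in enumerate(vectors):
        for cluster in clusters:
            if euclidean(vectors[cluster[0]], v) <= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Hypothetical (average packet length, flow duration) vectors, one per host.
vectors = [
    (120.0, 2.0), (122.0, 2.1), (119.0, 1.9),  # three hosts behaving alike
    (900.0, 30.0),                             # an unrelated host
    (400.0, 10.0),                             # another unrelated host
]
clusters = threshold_cluster(vectors, threshold=5.0)
# Assumed rule: similarity is only evidence when several hosts share a cluster.
suspicious = [c for c in clusters if len(c) >= 2]
print(clusters, suspicious)
```

The last two hosts end up in singleton clusters, which mirrors the limitation noted above: a lone bot host has no peer to be similar to. In practice the features would be normalized first so that no single feature dominates the distance.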

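4. Conclusion

This paper proposed a P2P botnet detection method based on conversation features. It first filters out non-P2P traffic at the packet, flow, and conversation levels, and then detects P2P botnets with a supervised machine learning algorithm on conversation features, combining the advantages of flow-feature-based and flow-similarity-based detection. The method was validated on a public dataset, and the experimental results show that it detects P2P botnet traffic efficiently.

Future work will optimize the non-P2P traffic filtering algorithm to further improve its performance. We also hope to extend conversation-feature-based detection to the detection and classification of other types of botnets.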

Application	Port numbers
SSH	22
Telnet	23
MAIL	25, 110, 143, 465, 220, 993, 995
NetBios	125, 137, 139, 445
Remote	3389
FTP	20, 21
NTP	123
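The port table above maps directly onto a lookup-based filter. In the Python sketch below the port numbers are copied from the table, while the function name and the choice to test both source and destination ports are assumptions.

```python
# Well-known ports taken from the table above, keyed by application.
WELL_KNOWN_PORTS = {
    "SSH": {22},
    "Telnet": {23},
    "MAIL": {25, 110, 143, 465, 220, 993, 995},
    "NetBios": {125, 137, 139, 445},
    "Remote": {3389},
    "FTP": {20, 21},
    "NTP": {123},
}
NON_P2P_PORTS = frozenset().union(*WELL_KNOWN_PORTS.values())

def is_non_p2p_by_port(src_port: int, dst_port: int) -> bool:
    """Treat a packet as non-P2P if either endpoint uses a well-known port."""
    return src_port in NON_P2P_PORTS or dst_port in NON_P2P_PORTS

# Hypothetical (source port, destination port) pairs.
packets = [(51000, 22), (40000, 40001), (3389, 52000)]
remaining = [p for p in packets if not is_non_p2p_by_port(*p)]
print(remaining)
```

Only the packet between two high ports survives the filter, so the later stages see less traffic.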

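Feature	Description
avg_dura	Mean of the total durations of the flows in the same conversation
std_dura	Standard deviation of the total durations of the flows in the same conversation
min_dura	Minimum of the total durations of the flows in the same conversation
max_dura	Maximum of the total durations of the flows in the same conversation
avg_f(b)int	Mean inter-arrival time of uplink (downlink) packets over the flows in the same conversation
max_f(b)pl	Mean, over the flows in the same conversation, of the maximum uplink (downlink) packet length
avg_f(b)pl	Mean, over the flows in the same conversation, of the mean uplink (downlink) packet length
min_f(b)pl	Mean, over the flows in the same conversation, of the minimum uplink (downlink) packet length
std_avg_f(b)pl	Standard deviation, over the flows in the same conversation, of the mean uplink (downlink) packet length
avg_f(b)pen	Mean number of valid uplink (downlink) packets per flow in the same conversation
std_avg_f(b)pen	Standard deviation of the number of valid uplink (downlink) packets per flow in the same conversation
avg_f(b)pb	Mean total number of uplink (downlink) bytes per flow in the same conversation
std_f(b)pb	Standard deviation of the total number of uplink (downlink) bytes per flow in the same conversation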
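Several of the conversation features in this table are plain aggregates over per-flow values, as the Python sketch below illustrates for the duration and byte-count features. The per-flow field names (`duration`, `fwd_bytes`, `bwd_bytes`), the reading of `f`/`b` as uplink/downlink, and the use of population standard deviation are assumptions.

```python
from statistics import mean, pstdev

def conversation_features(flows):
    """Aggregate per-flow values into conversation-level features.

    flows: non-empty list of dicts with hypothetical keys "duration",
    "fwd_bytes" (uplink) and "bwd_bytes" (downlink).
    """
    durations = [f["duration"] for f in flows]
    fwd_bytes = [f["fwd_bytes"] for f in flows]
    bwd_bytes = [f["bwd_bytes"] for f in flows]
    return {
        "avg_dura": mean(durations),
        "std_dura": pstdev(durations),
        "min_dura": min(durations),
        "max_dura": max(durations),
        "avg_fpb": mean(fwd_bytes),
        "std_fpb": pstdev(fwd_bytes),
        "avg_bpb": mean(bwd_bytes),
        "std_bpb": pstdev(bwd_bytes),
    }

# Hypothetical conversation made of three flows.
flows = [
    {"duration": 2.0, "fwd_bytes": 100, "bwd_bytes": 50},
    {"duration": 4.0, "fwd_bytes": 200, "bwd_bytes": 50},
    {"duration": 6.0, "fwd_bytes": 300, "bwd_bytes": 50},
]
feats = conversation_features(flows)
print(feats)
```

Each conversation yields one feature vector regardless of how many flows it contains, which is what keeps the number of records passed to the classifier small.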
