ATWebshell: Webshell Detection Model Based on Adversarial Learning and Long-Short Semantic Awareness

doi:10.11871/jfdc.issn.2096-742X.2022.05.008

Frontiers of Data and Computing, 2022, Vol. 4, Issue 5: 68-76.

CSTR: 32002.14.jfdc.CN10-1649/TP.2022.05.008

• Special Topic: Papers Submitted to the 37th National Computer Security Academic Exchange Conference •

GAO Hongkui, AN Tongjian*, SHUI Xuefei, WANG Xin, FAN Yuan

DAS-Security Co., Ltd., Hangzhou, Zhejiang 310051, China

Received: 2022-08-02 Online: 2022-10-20 Published: 2022-10-27
Corresponding author: AN Tongjian
About the authors:
GAO Hongkui is an employee of DAS-Security Co., Ltd. His main research field is AI-based network security technology. In this paper, he is responsible for experiment development and experimental design. E-mail: kui.hg@dbappsecurity.com.cn
AN Tongjian, Ph.D., is an employee of DAS-Security Co., Ltd. His main research field is AI-based network security technology. In this paper, he is responsible for the abstract, experimental design, conclusion and outlook, and paper revision. E-mail: pacino.an@dbappsecurity.com


Abstract

[Objective] A Webshell is a web attack program implemented as a web script. Attackers use Webshells to obtain server privileges, and then steal valuable information or tamper with web content. Because Webshells come in many varieties, existing detection techniques cannot cope with complex and flexible samples, which leads to poor detection performance and weak generalization. [Methods] To address these problems, this paper proposes ATWebshell, a Webshell detection model that combines adversarial learning with long-short semantic awareness. On the one hand, the model actively introduces adversarial perturbations at the word embedding layer to simulate an attacker's adversarial attacks on Webshell detection. On the other hand, a two-tower model consisting of TextCNN and GRU jointly learns malicious behavior within lines and across lines. [Results] The experimental results show that ATWebshell improves precision as well as recall. [Conclusions] The results support the soundness and effectiveness of the ATWebshell model, and the research method offers ideas for other studies.

Key words: Webshell detection, adversarial learning, GRU, TextCNN
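
The adversarial perturbation at the word embedding layer can be illustrated with a minimal sketch. It assumes the fast-gradient formulation of Miyato et al. [19], r_adv = epsilon * g / ||g||_2, where g is the gradient of the loss with respect to the embeddings. The abstract does not state which formulation or which epsilon ATWebshell uses, so both are assumptions here. The toy classifier (mean pooling followed by logistic regression) and the names `embeddings`, `weights`, and `epsilon` are hypothetical stand-ins for the real TextCNN and GRU towers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(embeddings, weights, label):
    """Binary cross-entropy of a toy classifier and its gradient w.r.t. every token embedding.

    The toy classifier mean-pools the token embeddings and applies logistic regression.
    """
    n = len(embeddings)
    dim = len(weights)
    pooled = [sum(tok[d] for tok in embeddings) / n for d in range(dim)]
    prob = sigmoid(sum(w * x for w, x in zip(weights, pooled)))
    loss = -(label * math.log(prob) + (1 - label) * math.log(1 - prob))
    # d(loss) / d(embeddings[t][d]) = (prob - label) * weights[d] / n
    grad = [[(prob - label) * w / n for w in weights] for _ in range(n)]
    return loss, grad


def fgm_perturb(embeddings, grad, epsilon):
    """Return embeddings + r_adv, with r_adv = epsilon * g / ||g||_2 (assumed formulation)."""
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    if norm == 0.0:
        return [row[:] for row in embeddings]
    return [[e + epsilon * g / norm for e, g in zip(e_row, g_row)]
            for e_row, g_row in zip(embeddings, grad)]


embeddings = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]]  # two tokens, dimension 3 (made-up values)
weights = [0.5, -1.0, 0.8]                          # made-up classifier weights
label = 1                                           # 1 = Webshell, 0 = benign script

clean_loss, grad = loss_and_grad(embeddings, weights, label)
adv_embeddings = fgm_perturb(embeddings, grad, epsilon=0.5)
adv_loss, _ = loss_and_grad(adv_embeddings, weights, label)

# Adversarial training in this style minimizes the clean loss plus the adversarial loss.
total_loss = clean_loss + adv_loss
print(f"clean loss = {clean_loss:.4f}, adversarial loss = {adv_loss:.4f}")
```

The perturbation moves the embeddings in the direction that increases the loss most steeply, so training on `total_loss` pushes the model to stay correct on inputs that an attacker has nudged slightly.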

GAO Hongkui, AN Tongjian, SHUI Xuefei, WANG Xin, FAN Yuan. ATWebshell: Webshell Detection Model Based on Adversarial Learning and Long-Short Semantic Awareness[J]. Frontiers of Data and Computing, 2022, 4(5): 68-76, https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2022.05.008.

Figures/Tables (9): Figure 1, Figure 2, Figure 3, Table 1, Table 2, Table 3, Table 4, Table 5, Figure 4

References (27)

[1]	Zhao R J, Shi Y, Zhang H, et al. Webshell file detection based on TF-IDF[J]. Computer Science, 2020, 47(11A):363-367. doi: 10.11896/jsjkx.200100064 (in Chinese)
[2]	Hou Y T, Chang Y, Chen T, et al. Malicious web content detection by machine learning[J]. Expert Systems with Applications, 2010, 37(1):55-60. doi: 10.1016/j.eswa.2009.05.023
[3]	Deng L Y, Lee D L, Chen Y H, et al. Lexical analysis for the webshell attacks[C]// 2016 International Symposium on Computer, Consumer and Control (IS3C), IEEE, 2016:579-582.
[4]	Mingkun X, Xi C, Yan H. Design of software to search ASP web shell[J]. Procedia Engineering, 2012, 29:123-127. doi: 10.1016/j.proeng.2011.12.680
[5]	Behrens S, Hagen B. Web shell detection using NeoPI[EB/OL]. [2022-09-22]. http://resources.infosecinstitute.com/web-shell-detection.
[6]	Hansen R J, Patterson M L. Guns and butter: Towards formal axioms of input validation[J]. Black Hat USA, August, 2005, 1(8):1-6.
[7]	Zheng Y. Research on IDS based on machine learning[J]. Modern Electronics Technique, 2006, 29(21):98-99. (in Chinese)
[8]	Tian X G, Gao L Z, Zhang E Y. A new intrusion detection method based on machine learning[J]. Journal on Communications, 2006, 27(6):108-114. (in Chinese)
[9]	Hu J K, Xu Z, Ma D H, et al. Research on Webshell detection method based on decision tree[J]. Journal of Network New Media, 2012, 1(6):15-19. (in Chinese)
[10]	Meng Z, Mei R, Zhang T, et al. Research on WebShell detection method based on SVM classifier under Linux[J]. Netinfo Security, 2014, 5(5):5-9. (in Chinese)
[11]	Xie M, Hu J. Evaluating host-based anomaly detection systems: A preliminary analysis of ADFA-LD[C]// 2013 6th International Congress on Image and Signal Processing (CISP), IEEE, 2013, 3:1711-1716.
[12]	Mao Y Q, Shi Y, Xue Z. Research on JSP Webshell detection method based on abstract syntax tree and XGBoost[J]. Communications Technology, 2020, 53(10):2543-2549. (in Chinese)
[13]	Sun X, Lu X, Dai H. A matrix decomposition based webshell detection method[C]// Proceedings of the 2017 International Conference on Cryptography, Security and Privacy, 2017:66-70.
[14]	Zhang H, Xue Z, Shi Y. Research on an improved Webshell detection method based on multi-layer neural networks[J]. Communications Technology, 2019, 52(1):179-183. (in Chinese)
[15]	Jiang T. Research on Webshell detection method based on convolutional neural network[J]. Information Technology and Network Security, 2019, 38(7):27-31. (in Chinese)
[16]	Wu B, Zhao L. Webshell detection method based on deep learning and semi-supervised learning[J]. Information Technology and Network Security, 2018, 37(8):19-22. (in Chinese)
[17]	Zhou L, Wang C, Shi Y. Research on Webshell detection based on RNN[J]. Computer Engineering and Applications, 2020, 56(14):88-92. doi: 10.3778/j.issn.1002-8331.1904-0420 (in Chinese)
[18]	Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint arXiv:1301.3781, 2013.
[19]	Miyato T, Dai A M, Goodfellow I. Adversarial training methods for semi-supervised text classification[J]. arXiv preprint arXiv:1605.07725, 2016.
[20]	Chung J, Gulcehre C, Cho K H, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[J]. arXiv preprint arXiv:1412.3555, 2014.
[21]	Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8):1735-1780. pmid: 9377276
[22]	Kim Y. Convolutional neural networks for sentence classification[EB/OL]. [2022-09-22]. https://arxiv.org/pdf/1408.5882.pdf.
[23]	Joulin A, Grave E, Bojanowski P, et al. Bag of tricks for efficient text classification[J]. arXiv preprint arXiv:1607.01759, 2016.
[24]	Cui Y P, Shi K X, Hu J W. Research on Webshell detection method based on XGBoost algorithm[J]. Computer Science, 2018, 45(6A):375-379. (in Chinese)
[25]	Li T, Ren C, Fu Y, et al. Webshell detection based on the word attention mechanism[J]. IEEE Access, 2019, 7:185140-185147. doi: 10.1109/ACCESS.2019.2959950
[26]	Lv Z H, Yan H B, Mei R. Automatic and accurate detection of webshell based on convolutional neural network[C]// China Cyber Security Annual Conference, Springer, Singapore, 2018:73-85.
[27]	Qi L, Kong R, Lu Y, et al. An end-to-end detection method for webshell with deep learning[C]// 2018 Eighth International Conference on Instrumentation & Measurement, Computer, Communication and Control (IMCCC), IEEE, 2018:660-665.

GitHub links for Webshell samples	GitHub links for normal scripts
https://github.com/ysrc/webshell-sample	https://github.com/laravel/laravel
https://github.com/xl7dev/WebShell	https://github.com/symfony/symfony
https://github.com/tanjiti/webshellSample	https://github.com/composer/composer
https://github.com/webshellpub/awsome-webshell	https://github.com/DesignPatternsPHP/DesignPatternsPHP
https://github.com/DeEpinGh0st/PHP-bypass-collection/	https://github.com/Seldaek/monolog
https://github.com/tdifg/WebShell	https://github.com/nextcloud/server
https://github.com/malwares/WebShell	https://github.com/bcit-ci/CodeIgniter
https://github.com/lhlsec/webshell	https://github.com/PHPMailer/PHPMailer
https://github.com/oneoneplus/webshell	https://github.com/monicahq/monica
https://github.com/vnhacker1337/Webshell	https://github.com/nikic/PHP-Parser

Method	Precision (P)	Recall (R)	F1 score
SVM [10]	0.749	0.672	0.708
XGBoost [24]	0.851	0.833	0.842
ATWebshell*	0.990	0.986	0.988
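
The F1 column in the table above is consistent with the usual definition F1 = 2PR / (P + R). The page does not state the formula, so the definition is an assumption; the sketch below recomputes F1 from the published precision and recall and compares it with the reported value, allowing a difference of 0.001 because all inputs are rounded to three decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of the F1 column)."""
    return 2 * precision * recall / (precision + recall)


# (method, precision, recall, reported F1) copied from the table above
rows = [
    ("SVM", 0.749, 0.672, 0.708),
    ("XGBoost", 0.851, 0.833, 0.842),
    ("ATWebshell", 0.990, 0.986, 0.988),
]

for method, p, r, reported in rows:
    computed = f1_score(p, r)
    print(f"{method}: computed F1 = {computed:.3f}, reported F1 = {reported:.3f}")
```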

Method	Precision (P)	Recall (R)	F1 score
GRU + Attention [25]	0.962	0.973	0.967
CNN [26]	0.953	0.959	0.956
LSTM [27]	0.948	0.977	0.962
ATWebshell*	0.990	0.986	0.988

Method	Precision (P)	Recall (R)	F1 score
GRU + Attention [25]	0.961	0.944	0.953
CNN [26]	0.965	0.958	0.961
LSTM [27]	0.960	0.979	0.969
ATWebshell*	0.991	0.980	0.985

Method	Precision (P)	Recall (R)	F1 score
GRU	0.922	0.940	0.931
TextCNN	0.937	0.920	0.928
GRU + TextCNN (without adversarial learning)	0.971	0.973	0.972
ATWebshell*	0.992	0.988	0.990
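
The rows above compare each tower on its own with the combined GRU + TextCNN model. How the two towers might be joined can be sketched as follows, under several assumptions that the page does not confirm: both towers read the same embedded token sequence, the TextCNN tower uses one kernel width with max pooling over time, the GRU tower contributes its final hidden state, and the two feature vectors are concatenated before a sigmoid output. All sizes and weights are random toy values, biases are omitted for brevity, and every name is hypothetical.

```python
import math
import random

random.seed(0)
EMB, FILTERS, KERNEL, HIDDEN = 4, 3, 2, 3  # toy sizes, chosen arbitrarily


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def textcnn_tower(tokens, filters):
    """Slide each filter over windows of KERNEL tokens, apply ReLU, then max-pool over time."""
    windows = [sum(tokens[i:i + KERNEL], []) for i in range(len(tokens) - KERNEL + 1)]
    return [max(max(0.0, sum(w * x for w, x in zip(f, win))) for win in windows)
            for f in filters]


def gru_tower(tokens, p):
    """Run a GRU over the whole sequence and return the final hidden state."""
    h = [0.0] * HIDDEN
    for x in tokens:
        z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
        h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    return h


def two_tower_score(tokens, filters, gru_params, out_weights):
    """Concatenate both towers' features and map them to a Webshell probability."""
    features = textcnn_tower(tokens, filters) + gru_tower(tokens, gru_params)
    return sigmoid(sum(w * f for w, f in zip(out_weights, features))), features


filters = rand_matrix(FILTERS, KERNEL * EMB)
gru_params = {name: rand_matrix(HIDDEN, EMB if name[0] == "W" else HIDDEN)
              for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
out_weights = [random.uniform(-0.5, 0.5) for _ in range(FILTERS + HIDDEN)]
tokens = rand_matrix(6, EMB)  # six embedded tokens of one script (random stand-ins)

score, features = two_tower_score(tokens, filters, gru_params, out_weights)
print(f"feature vector length = {len(features)}, untrained score = {score:.3f}")
```

In this sketch the convolution only sees short windows of neighboring tokens, while the recurrent state carries information across the whole sequence, which is one way to read the "long-short semantic awareness" in the title.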
