Frontiers of Data and Computing ›› 2024, Vol. 6 ›› Issue (3): 127-138.
CSTR: 32002.14.jfdc.CN10-1649/TP.2024.03.014
doi: 10.11871/jfdc.issn.2096-742X.2024.03.014
• Technology and Application •
YAN Jin, DONG Kejun, LI Hongtao*
Received: 2023-03-07
Online: 2024-06-20
Published: 2024-06-21
YAN Jin, DONG Kejun, LI Hongtao. A Deep Web Tracker Detection Method with Coordinated Semantic and Co-Occurrence Features[J]. Frontiers of Data and Computing, 2024, 6(3): 127-138, https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2024.03.014.
Table 4 Comparison results of different web tracker identification methods
Dataset | Method | Accuracy | Precision | Recall | F1 score |
---|---|---|---|---|---|
Top 1,000 | Decision Tree | 0.7667 | 0.7681 | 0.7714 | 0.7697 |
Top 1,000 | Random Forest | 0.8250 | 0.8323 | 0.9014 | 0.8655 |
Top 1,000 | Logistic Regression | 0.7857 | 0.7931 | 0.8452 | 0.8183 |
Top 1,000 | Label Propagation | 0.7592 | 0.5685 | 0.2011 | 0.2971 |
Top 1,000 | Proposed method | 0.8793 | 0.8525 | 0.9129 | 0.8817 |
Top 5,000 | Decision Tree | 0.7900 | 0.7896 | 0.7849 | 0.7873 |
Top 5,000 | Random Forest | 0.8534 | 0.8570 | 0.8985 | 0.8772 |
Top 5,000 | Logistic Regression | 0.8142 | 0.8185 | 0.8563 | 0.8370 |
Top 5,000 | Label Propagation | 0.7792 | 0.6610 | 0.2030 | 0.3106 |
Top 5,000 | Proposed method | 0.9061 | 0.8911 | 0.9245 | 0.9075 |
Top 10,000 | Decision Tree | 0.7886 | 0.7900 | 0.7972 | 0.7935 |
Top 10,000 | Random Forest | 0.8572 | 0.8609 | 0.9061 | 0.8830 |
Top 10,000 | Logistic Regression | 0.8136 | 0.8182 | 0.8594 | 0.8383 |
Top 10,000 | Label Propagation | 0.7900 | 0.6447 | 0.1991 | 0.3042 |
Top 10,000 | Proposed method | 0.9104 | 0.8902 | 0.9366 | 0.9128 |
Top 20,000 | Decision Tree | 0.7996 | 0.8007 | 0.8058 | 0.8032 |
Top 20,000 | Random Forest | 0.8630 | 0.8663 | 0.9075 | 0.8864 |
Top 20,000 | Logistic Regression | 0.8176 | 0.8224 | 0.8645 | 0.8429 |
Top 20,000 | Label Propagation | 0.7958 | 0.6243 | 0.1748 | 0.2731 |
Top 20,000 | Proposed method | 0.9109 | 0.8876 | 0.9414 | 0.9137 |
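
The F1 column of Table 4 can be cross-checked against the precision and recall columns. The sketch below assumes the standard definition F1 = 2PR / (P + R), which this page does not state explicitly. The helper name `f1_score` and the `rows` dictionary are illustrative only, and just the top-1,000 rows are included. Because the table values are rounded to four decimals, the comparison uses a small tolerance.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the top-1,000 rows of Table 4
rows = {
    "Decision Tree": (0.7681, 0.7714, 0.7697),
    "Random Forest": (0.8323, 0.9014, 0.8655),
    "Logistic Regression": (0.7931, 0.8452, 0.8183),
    "Label Propagation": (0.5685, 0.2011, 0.2971),
    "Proposed method": (0.8525, 0.9129, 0.8817),
}

# Largest gap between recomputed and reported F1 across the rows
max_gap = max(abs(f1_score(p, r) - reported) for p, r, reported in rows.values())

for method, (p, r, reported) in rows.items():
    print(f"{method}: recomputed {f1_score(p, r):.4f}, reported {reported:.4f}")
```

Under this assumption the recomputed values agree with the reported ones to within rounding. The check also makes Label Propagation's low F1 easy to read: its recall of about 0.20 dominates the harmonic mean even though its precision is moderate.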