Frontiers of Data & Computing ›› 2025, Vol. 7 ›› Issue (6): 1-12.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.06.001

doi: 10.11871/jfdc.issn.2096-742X.2025.06.001

• Special Issue: Papers from the 40th National Conference on Computer Security •

JPEG File Fragment Recognition Based on Transformer and Depth-Wise Convolution

ZHU Nan1,2,*, HUANG Zhiyuan2   

  1. Xi’an Technological University, Xi’an, Shaanxi 710021, China
    2. Shaanxi Yushu Weian Technology Co., Ltd., Xianyang, Shaanxi 712034, China
  • Received: 2025-08-02  Online: 2025-12-20  Published: 2025-12-17
  • Corresponding author: ZHU Nan
  • About the author: ZHU Nan is an associate professor and Master’s supervisor at Xi’an Technological University. His main research interests include digital media forensics and computer vision.
    In this paper, he was mainly responsible for model design, model optimization, and manuscript revision.
    E-mail: nanzhu.xatu@foxmail.com
  • Funding:
    Key Research and Development Program of Shaanxi Province (2023-YBSF-473)


Abstract:

[Objective] This paper aims to design a highly accurate JPEG file fragment recognition method. [Methods] A deep network for file fragment recognition is constructed based on the Transformer and depth-wise convolution. The network takes raw bytes as input. First, an embedding layer reduces the data dimensionality and learns the relationships between bytes. Its output is then fed separately into a global feature extraction branch composed of Transformer encoders and a local feature extraction branch composed of Inception modules. Finally, the extracted global and local features are concatenated and passed to the decision module for classification. [Results] On the FFT-75 dataset, which covers 75 different file fragment types, the proposed method achieves an overall classification accuracy of 70.7% (512-byte fragments) and 83.3% (4,096-byte fragments), and a JPEG fragment classification accuracy of 91.8% (512 bytes) and 94.2% (4,096 bytes). [Conclusions] In terms of both overall classification accuracy and JPEG fragment classification accuracy, the proposed method improves on the reference methods, which verifies the effectiveness of the proposed fusion of global and local features.
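
The [Methods] description maps onto a two-branch network: a byte embedding feeding a Transformer-encoder branch for global features and an Inception-style depth-wise-convolution branch for local features, whose outputs are concatenated and classified. The PyTorch sketch below illustrates that structure only; the embedding width, number of encoder layers, kernel sizes, pooling, and the class names InceptionDWBlock and FragmentClassifier are assumptions made for this example, not details taken from the paper.

# Minimal PyTorch sketch of the two-branch architecture; hyperparameters are illustrative.
import torch
import torch.nn as nn

class InceptionDWBlock(nn.Module):
    """Local-feature block: parallel depth-wise convolutions with different
    kernel sizes (Inception-style), fused by a point-wise convolution."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                # groups=channels makes the convolution depth-wise (one filter per channel)
                nn.Conv1d(channels, channels, k, padding=k // 2, groups=channels),
                nn.ReLU(),
            )
            for k in kernel_sizes
        )
        self.project = nn.Conv1d(channels * len(kernel_sizes), channels, 1)

    def forward(self, x):                        # x: (batch, channels, length)
        y = torch.cat([branch(x) for branch in self.branches], dim=1)
        return torch.relu(self.project(y))

class FragmentClassifier(nn.Module):
    def __init__(self, num_classes=75, embed_dim=64, frag_len=512):
        super().__init__()
        # embedding layer: raw byte values (0-255) -> dense vectors
        self.embed = nn.Embedding(256, embed_dim)
        self.pos = nn.Parameter(torch.zeros(1, frag_len, embed_dim))
        # global branch: Transformer encoders
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # local branch: Inception-style depth-wise convolutions
        self.local = InceptionDWBlock(embed_dim)
        # decision module: classify the concatenated global + local features
        self.head = nn.Linear(2 * embed_dim, num_classes)

    def forward(self, byte_seq):                 # byte_seq: (batch, frag_len), int64
        x = self.embed(byte_seq) + self.pos      # (batch, frag_len, embed_dim)
        g = self.encoder(x).mean(dim=1)          # pooled global features
        l = self.local(x.transpose(1, 2)).mean(dim=2)  # pooled local features
        return self.head(torch.cat([g, l], dim=1))

# Usage: classify a batch of eight 512-byte fragments into 75 classes.
logits = FragmentClassifier()(torch.randint(0, 256, (8, 512)))
print(logits.shape)                              # torch.Size([8, 75])

Setting groups equal to the channel count is what makes each convolution depth-wise, which keeps the local branch lightweight next to the Transformer branch.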

Key words: electronic data forensics, file fragment type identification, JPEG file, Transformer, CNN