Frontiers of Data & Computing ›› 2026, Vol. 8 ›› Issue (1): 158-167.

CSTR: 32002.14.jfdc.CN10-1649/TP.2026.01.013

doi: 10.11871/jfdc.issn.2096-742X.2026.01.013

• Technology and Applications •

A Text Augmentation Method for Imbalanced Criminal Cant Based on the PolyLoss Function

LI Wenbang, YAN Jinghua*, DONG Ze

  1. School of Information and Network Security, People's Public Security University of China, Beijing 100038, China
  • Received: 2025-04-01 Published: 2026-02-20 Released: 2026-02-02
  • Corresponding author: YAN Jinghua
  • About the authors: LI Wenbang is a master's student at the People's Public Security University of China. His research interest is police data analysis.
    In this paper, he was responsible for conducting the experiments and writing the paper.
    E-mail: 1053058921@qq.com
    YAN Jinghua, Ph.D., is an associate professor. Her research interests include data policing technology.
    In this paper, she was responsible for formulating the research plan.
    E-mail: yanjinghua@ppsuc.edu.cn
  • Funding:
    Fundamental Research Funds for Universities project "Simulation-Based Evaluation of Command Effectiveness" (2022JKF02038)

Abstract:

[Objective] This study addresses class imbalance in criminal cant datasets in order to further improve the classification of criminal cant texts. [Methods] Two methods are proposed: SimPoly for long texts and EDAPoly for short texts. [Results] Experimental results show that SimPoly and EDAPoly significantly improve classifier performance on imbalanced criminal cant datasets, with clear gains in accuracy, recall, and F1-score over baseline methods that use no text augmentation. [Conclusions] The proposed methods provide an effective solution for the practical application of criminal cant recognition and offer new ideas and support for similar imbalanced text classification tasks.

Keywords: criminal cant, text augmentation, loss function, imbalanced text processing
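
The abstract names PolyLoss without giving its form. As a rough illustration only, the sketch below computes the commonly used Poly-1 variant, cross-entropy plus ε(1 − p_t), for a single example, where p_t is the predicted probability of the true class. Which PolyLoss variant SimPoly and EDAPoly actually use, the value of ε, and the function names `softmax` and `poly1_cross_entropy` are assumptions made for this sketch, not details taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def poly1_cross_entropy(logits, target, epsilon=1.0):
    """Poly-1 loss for one example: CE + epsilon * (1 - p_t).

    epsilon=1.0 is a placeholder default, not a value from the paper.
    With epsilon=0.0 the loss reduces to plain cross-entropy.
    """
    p_t = softmax(logits)[target]      # probability of the true class
    cross_entropy = -math.log(p_t)
    return cross_entropy + epsilon * (1.0 - p_t)


# Hypothetical 3-class logits with the true class at index 0.
logits = [2.0, 0.5, -1.0]
ce_only = poly1_cross_entropy(logits, target=0, epsilon=0.0)
poly1 = poly1_cross_entropy(logits, target=0, epsilon=1.0)
print(f"cross-entropy: {ce_only:.4f}, Poly-1: {poly1:.4f}")
```

With a positive ε, the extra term adds a penalty proportional to 1 − p_t, so examples the model classifies with low confidence contribute more to the loss than under plain cross-entropy.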