Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (6): 35-43.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.06.004

doi: 10.11871/jfdc.issn.2096-742X.2025.06.004

• Special Issue: Call for Papers for the 40th National Conference on Computer Security

Sensitive Information Identification and Analysis Method for Microdata Anonymization Empowered by LLM

DONG Wei, LIAO Jiachun*, YAO Sicheng, CHEN Haisu, KAN Sunan

  1. Center of Big Data Technology, Nanhu Laboratory, Jiaxing, Zhejiang 314000, China
  • Received: 2025-08-02 Online: 2025-12-20 Published: 2025-12-17
  • Contact: LIAO Jiachun E-mail: dwei@nanhulab.ac.cn; jliao@nanhulab.ac.cn

Abstract:

[Objective] Data anonymization is an effective approach to protecting personal privacy and promoting the release of data value. However, the sensitive information identification phase of existing microdata anonymization processes, which detects items such as identifiers, suffers from two problems: it misses too much privacy information, and it neglects the internal associations within microdata. Large Language Models (LLMs), with their robust semantic understanding capabilities, offer a new approach to addressing these challenges. [Methods] This paper proposes an LLM-empowered method for identifying and analyzing sensitive information in microdata. On the one hand, the method uses data semantics to broaden the scope of direct identifier recognition and extracts refined quasi-identifier evaluation criteria from domestic anonymization standards, achieving high-precision automatic identification of quasi-identifiers. On the other hand, it analyzes the internal associations within microdata and performs the subsequent anonymization based on these associations to enhance both the privacy and the utility of the anonymized results. [Conclusions] Experimental results demonstrate that the method improves the accuracy of identifier recognition in microdata.

Key words: privacy protection, data anonymization, Large Language Model, microdata