Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (3): 48-66.
CSTR: 32002.14.jfdc.CN10-1649/TP.2025.03.005
doi: 10.11871/jfdc.issn.2096-742X.2025.03.005
• Special Issue: 30th Anniversary of the Computer Network Information Center, Chinese Academy of Sciences •
A Review of the Research and Application of Privacy Risk Assessment Techniques for Generative Models Based on Inference Attacks
张宁徽1,2, 龙春1,*, 万巍1, 李婧1, 杨帆1, 魏金侠1, 付豫豪1
Received: 2025-04-28
Online: 2025-06-20
Published: 2025-06-25
Corresponding author: *LONG Chun (E-mail: )
About the author: ZHANG Ninghui, Computer Network Information Center, Chinese Academy of Sciences; master's student; research interests: data security and privacy protection.
Funding:
ZHANG Ninghui1,2, LONG Chun1,*, WAN Wei1, LI Jing1, YANG Fan1, WEI Jinxia1, FU Yuhao1
Abstract:
[Objective] To systematically review the research progress and application status of inference-attack-based privacy risk assessment techniques for generative models. [Coverage] This survey examines more than 70 papers published in mainstream conferences and journals between 2015 and 2024. [Methods] Along the technical dimension, attacks are classified primarily by the black-box versus white-box assumption, and within each setting they are further subdivided by the attack methods developed for each class of generative model; along the application dimension, the survey compares privacy risk assessment frameworks for synthetic data. [Results] Existing attack techniques are fairly mature, but they are tightly coupled to specific model types and suffer limited accuracy in black-box settings; as a result, the frameworks used in practice to assess the privacy risk of synthetic data remain limited in both generality and accuracy. [Conclusions] Compared with existing surveys in this direction, this paper is the first to summarize the latest results on membership inference attacks against large language models, and it also provides a comparative analysis of the newest privacy risk assessment frameworks for synthetic data. This two-dimensional technique-application analysis offers researchers a valuable reference and guide for the field.
ZHANG Ninghui, LONG Chun, WAN Wei, LI Jing, YANG Fan, WEI Jinxia, FU Yuhao. A Review of the Research and Application of Privacy Risk Assessment Techniques for Generative Models Based on Inference Attacks[J]. Frontiers of Data and Computing, 2025, 7(3): 48-66, https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2025.03.005.
Table 1
Comparison of privacy risk assessment categories
Assessment type | Core idea | Techniques | Applicable scenarios | Strengths | Limitations | References |
---|---|---|---|---|---|---|
Privacy attack based | Simulate an adversary's capabilities to quantify how much information the model has memorized | Membership and attribute inference attacks | Privacy risk assessment of synthetic data | Highly empirical; quantifies leakage risk | Depends on attack assumptions; high computational cost | [ |
Utility evaluation based | Balance privacy protection against data utility | Privacy-utility trade-off curves; statistical similarity tests | Pre-release data assessment; differential privacy parameter tuning | Intuitive and easy to interpret; supports dynamic adjustment | Highly subjective; hard to cover all risks | [ |
Information entropy based | Quantify the degree of information leakage with entropy | Mutual information computation; conditional entropy analysis; anonymity-set entropy | Evaluation of anonymization schemes; design of privacy protection mechanisms | Theoretically rigorous; supports formal proofs | High computational complexity; hard to link to real-world risk | [ |
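To make the attack-based assessment row concrete: such frameworks typically run an inference attack against the model, then report an empirical leakage metric computed from the scores the attacker assigns to member versus non-member records. The following is a minimal sketch under that assumption; the function name and the toy scores are illustrative, not taken from any specific framework in the survey.

```python
# Hypothetical sketch: quantify empirical privacy leakage from the scores an
# inference attack assigns to known member vs. non-member records.

def attack_advantage(member_scores, nonmember_scores):
    """Best-threshold membership advantage: max over thresholds of (TPR - FPR).
    0 ~ no leakage (attack no better than random guessing); 1 ~ full leakage."""
    thresholds = sorted(set(member_scores) | set(nonmember_scores))
    best = 0.0
    for t in thresholds:
        tpr = sum(s >= t for s in member_scores) / len(member_scores)
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        best = max(best, tpr - fpr)
    return best

# Toy example: members receive systematically higher attack scores.
members = [0.9, 0.8, 0.75, 0.6]
nonmembers = [0.4, 0.3, 0.55, 0.2]
print(attack_advantage(members, nonmembers))  # 1.0: the scores separate perfectly
```

This is the "highly empirical, quantifies leakage risk" property from the table: the metric is computed from an actual simulated adversary, at the cost of depending on that adversary's assumptions.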
Table 2
Comparison of membership inference attack schemes against generative models
Setting | Target model | Representative work | Attack approach | Core technique | Attack efficiency | Attack accuracy |
---|---|---|---|---|---|---|
Black-box | GAN | Hayes[ | Discriminator score ranking | Rank samples by discriminator output confidence and label the top half as members | Medium | Low |
Black-box | GAN | Hilprecht[ | Monte Carlo integration attack | Combine PCA and Euclidean distance to find generated samples in a target's neighborhood and approximate membership probability | Low | Medium |
Black-box | GAN | Chen[ | Reconstruction-error score optimization | Use black-box optimization to improve the attack in some settings | High | High |
Black-box | Diffusion model | Hu[ | Loss- and likelihood-based attack | Infer membership from loss values at low-noise diffusion steps or from sample likelihoods | Medium | High |
Black-box | Diffusion model | Duan[ | Posterior estimation matching | Assume member samples have smaller reverse-diffusion error | Medium | High |
Black-box | Diffusion model | Dubinski[ | Loss-based study of the diffusion process | Modify the diffusion process to extract membership information | Medium | High |
Black-box | Diffusion model | Fu[ | Memorization-based attack | Detect fluctuations in the probability distribution around member records | Medium | High |
Black-box | Diffusion model | Li[ | Model API variant attack | Exploit differences in image reconstruction quality across diffusion steps | High | Medium |
Black-box | Multiple generative models | Zhang[ | Generalized membership inference attack | Train a binary classifier on generated samples and non-member auxiliary data | High | High |
Black-box | LLM | Duan[ | Blind attack via data distribution shift | Extract date information from text and separate members with a time threshold | High | Medium |
Black-box | LLM | Fu[ | Probability-distribution extreme-value detection | Measure probability changes over symmetric paraphrases of the text | Medium | High |
Black-box | LLM | Galli[ | Noisy-neighbor comparison | Generate noisy neighbors and compare perplexity differences | High | Medium |
Black-box | LLM | Mozaffari[ | Masked-perturbation comparison | Generate masked perturbed texts and quantify semantic distance and loss differences | Medium | High |
White-box | GAN | Hayes[ | Discriminator probability threshold | Judge membership directly from the discriminator's output probability | High | High |
White-box | GAN | Chen[ | Quasi-Newton optimization attack | Optimize the attack using the generator's and discriminator's internal parameters | High | High |
White-box | GAN | Azadmanesh[ | Membership-degree attack | Evaluate using the model type and training configuration | Medium | High |
White-box | VAE | Hilprecht[ | Reconstruction attack | Compute the target sample's reconstruction error; training samples have lower error | High | High |
White-box | Diffusion model | Matsumoto[ | Loss-function fit | Exploit differences in the diffusion model's loss on training data | Medium | Medium |
White-box | Diffusion model | Pang[ | Gradient-based attack | Analyze differences in model gradients across samples | Medium | High |
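Several rows above (the loss-, likelihood-, and reconstruction-based attacks) share one principle: samples seen during training tend to incur lower model loss, so an attacker thresholds a per-sample loss. The sketch below illustrates only that shared principle; the function names, the crude midpoint calibration, and the toy loss values are illustrative assumptions, not any one paper's method.

```python
# Hypothetical sketch of the loss-threshold principle behind several attacks in
# Table 2: members (training samples) tend to have lower per-sample loss, so an
# attacker predicts "member" when the loss falls below a calibrated threshold.

def infer_membership(losses, threshold):
    """Predict 'member' for every sample whose loss falls below the threshold."""
    return [loss < threshold for loss in losses]

def calibrate_threshold(shadow_member_losses, shadow_nonmember_losses):
    """Pick the midpoint of the two shadow-population means (crude calibration)."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(shadow_member_losses) + mean(shadow_nonmember_losses)) / 2

# Toy losses: members, seen in training, reconstruct better and score lower.
t = calibrate_threshold([0.10, 0.12, 0.08], [0.35, 0.40, 0.30])
print(infer_membership([0.09, 0.33], t))  # [True, False]
```

In practice the per-sample loss is whatever the attacked generative model exposes (e.g., denoising loss at selected diffusion steps, or VAE reconstruction error), and calibration is usually done with shadow models rather than a simple midpoint.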
References
[1] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[2] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019: 4171-4186.
[3] LI Yonghong, WANG Ying, LI Laquan, et al. Application of an improved feature selection algorithm in email filtering[J]. Computer Science, 2022, 49(S2): 740-744.
[4] BENGIO Y, LECUN Y, HINTON G. Deep learning for AI[J]. Communications of the ACM, 2021, 64(7): 58-65.
[5] CHEN X, CHO H, DOU Y, et al. Predicting future earnings changes using machine learning and detailed financial data[J]. Journal of Accounting Research, 2022, 60(2): 467-515.
[6] NEWTON E M, SWEENEY L, MALIN B. Preserving privacy by de-identifying face images[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(2): 232-243.
[7] SUN Y, LIU J, YU K, et al. PMRSS: Privacy-preserving medical record searching scheme for intelligent diagnosis in IoT healthcare[J]. IEEE Transactions on Industrial Informatics, 2021, 18(3): 1981-1990.
[8] CARLINI N, LIU C, ERLINGSSON Ú, et al. The secret sharer: Evaluating and testing unintended memorization in neural networks[C]// 28th USENIX Security Symposium (USENIX Security 19), 2019: 267-284.
[9] SONG C, RISTENPART T, SHMATIKOV V. Machine learning models that remember too much[C]// Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017: 587-601.
[10] ZHANG C, BENGIO S, HARDT M, et al. Understanding deep learning (still) requires rethinking generalization[J]. Communications of the ACM, 2021, 64(3): 107-115.
[11] TRAMÈR F, ZHANG F, JUELS A, et al. Stealing machine learning models via prediction APIs[C]// 25th USENIX Security Symposium (USENIX Security 16), 2016: 601-618.
[12] ZHOU J, CHEN Y, SHEN C, et al. Property inference attacks against GANs[J]. arXiv preprint arXiv:2111.07608, 2021.
[13] HU L, YAN A, YAN H, et al. Defenses to membership inference attacks: A survey[J]. ACM Computing Surveys, 2023, 56(4): 1-34.
[14] DING Hongfa. Rational privacy protection models and their applications[D]. Guizhou University, 2019. DOI: 10.27047/d.cnki.ggudu.2019.000036.
[15] LIU Ruixuan, CHEN Hong, GUO Ruoyang, et al. Privacy attacks and defenses in machine learning[J]. Journal of Software, 2020, 31(3): 866-892. DOI: 10.13328/j.cnki.jos.005904.
[16] RIGAKI M, GARCIA S. A survey of privacy attacks in machine learning[J]. ACM Computing Surveys, 2023, 56(4): 1-34.
[17] REN Kui, MENG Quanrun, YAN Shoukun, et al. Survey of attacks and defenses on data leakage of artificial intelligence models[J]. Chinese Journal of Network and Information Security, 2021, 7(1): 1-10.
[18] SHAO Guosong, HUANG Qi. Privacy protection issues in artificial intelligence[J]. Modern Communication (Journal of Communication University of China), 2017, 39(12): 1-5.
[19] FREDRIKSON M, JHA S, RISTENPART T. Model inversion attacks that exploit confidence information and basic countermeasures[C]// Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015: 1322-1333.
[20] WANG Lulu, ZHANG Peng, YAN Zheng, et al. A survey of membership inference on machine learning training datasets[J]. Cyberspace Security, 2019, 10(10): 1-7.
[21] WANG Pengyan. Research on membership inference attacks and defenses in machine learning[J]. Information Technology and Network Security, 2021, 40(8): 65-70, 83. DOI: 10.19358/j.issn.2096-5133.2021.08.011.
[22] NIU Jun, MA Xiaoji, CHEN Ying, et al. A survey of membership inference attacks and defenses in machine learning[J]. Journal of Cyber Security, 2022, 7(6): 1-30. DOI: 10.19363/J.cnki.cn10-1380/tn.2022.11.01.
[23] BAI Y, CHEN T, FAN M. A survey on membership inference attacks against machine learning[J]. Management, 2021, 6: 14.
[24] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144.
[25] CHEN Y, LIU J, PENG L, et al. Auto-encoding variational Bayes[J]. Cambridge Explorations in Arts and Sciences, 2024, 2(1): 1-12.
[26] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[J]. Advances in Neural Information Processing Systems, 2020, 33: 6840-6851.
[27] HOMER N, SZELINGER S, REDMAN M, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays[J]. PLoS Genetics, 2008, 4(8): e1000167.
[28] SHOKRI R, STRONATI M, SONG C, et al. Membership inference attacks against machine learning models[C]// 2017 IEEE Symposium on Security and Privacy (SP), IEEE, 2017: 3-18.
[29] GUPTA U, STRIPELIS D, LAM P K, et al. Membership inference attacks on deep regression models for neuroimaging[C]// Proceedings of the Medical Imaging with Deep Learning Conference, 2021: 228-251.
[30] HAYES J, MELIS L, DANEZIS G, et al. LOGAN: Membership inference attacks against generative models[J]. arXiv preprint arXiv:1705.07663, 2017.
[31] CHEN J, ZHANG J, ZHAO Y, et al. Beyond model-level membership privacy leakage: An adversarial approach in federated learning[C]// 2020 29th International Conference on Computer Communications and Networks (ICCCN), IEEE, 2020: 1-9.
[32] CHI X, ZHANG X, WANG Y, et al. Shadow-free membership inference attacks: Recommender systems are more vulnerable than you thought[J]. arXiv preprint arXiv:2405.07018, 2024.
[33] LIU Y, WANG C, PENG K, et al. SocInf: Membership inference attacks on social media health data with machine learning[J]. IEEE Transactions on Computational Social Systems, 2019, 6(5): 907-921.
[34] WANG Y, HUANG L, YU S, et al. Membership inference attacks on knowledge graphs[J]. arXiv preprint arXiv:2104.08273, 2021.
[35] TSENG C, KAO T, LEE H. Membership inference attacks against self-supervised speech models[J]. arXiv preprint arXiv:2111.05113, 2021.
[36] TABASSI E, BURNS K J, HADJIMICHAEL M, et al. A taxonomy and terminology of adversarial machine learning[J]. National Institute of Standards and Technology, 2019: 1-29.
[37] EUROPEAN PARLIAMENT AND COUNCIL. Regulation (EU) 2016/679: General Data Protection Regulation (GDPR)[J]. Official Journal of the European Union, 2016, L119: 1-88.
[38] ATENIESE G, MANCINI L, SPOGNARDI A, et al. Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers[J]. International Journal of Security and Networks, 2015, 10(3): 137-150.
[39] GIOMI M, BOENISCH F, WEHMEYER C, et al. A unified framework for quantifying privacy risk in synthetic data[J]. arXiv preprint arXiv:2211.10459, 2022.
[40] STADLER T, OPRISANU B, TRONCOSO C. Synthetic data-anonymisation groundhog day[C]// 31st USENIX Security Symposium (USENIX Security 22), 2022: 1451-1468.
[41] PENG Changgen, DING Hongfa, ZHU Yijie, et al. Information entropy model of privacy protection and its measurement methods[J]. Journal of Software, 2016, 27(8): 1891-1903. DOI: 10.13328/j.cnki.jos.005096.
[42] CLAUß S, SCHIFFNER S. Structuring anonymity metrics[C]// Proceedings of the Second ACM Workshop on Digital Identity Management, 2006: 55-62.
[43] DIAZ C, SEYS S, CLAESSENS J, et al. Towards measuring anonymity[C]// International Workshop on Privacy Enhancing Technologies, Berlin, Heidelberg: Springer Berlin Heidelberg, 2002: 54-68.
[44] HILPRECHT B, HÄRTERICH M, BERNAU D. Monte Carlo and reconstruction membership inference attacks against generative models[J]. Proceedings on Privacy Enhancing Technologies, 2019: 232-249.
[45] CHEN D, YU N, ZHANG Y, et al. GAN-Leaks: A taxonomy of membership inference attacks against generative models[C]// Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020: 343-362.
[46] HU H, PANG J. Membership inference of diffusion models[J]. arXiv preprint arXiv:2301.09956, 2023.
[47] DUAN J, KONG F, WANG S, et al. Are diffusion models vulnerable to membership inference attacks?[C]// International Conference on Machine Learning, ICML, 2023: 8717-8730.
[48] DUBIŃSKI J, KOWALCZUK A, PAWLAK S, et al. Towards more realistic membership inference attacks on large diffusion models[C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024: 4860-4869.
[49] FU W, WANG H, GAO C, et al. A probabilistic fluctuation based membership inference attack for diffusion models[J]. arXiv preprint arXiv:2308.12143, 2023.
[50] LI J, DONG J, HE T, et al. Towards black-box membership inference attack for diffusion models[J]. arXiv preprint arXiv:2405.20771, 2024.
[51] ZHANG M, YU N, WEN R, et al. Generated distributions are all you need for membership inference attacks against generative models[C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024: 4839-4849.
[52] DUAN M, SURI A, MIRESHGHALLAH N, et al. Do membership inference attacks work on large language models?[J]. arXiv preprint arXiv:2402.07841, 2024.
[53] DAS D, ZHANG J, TRAMÈR F. Blind baselines beat membership inference attacks for foundation models[J]. arXiv preprint arXiv:2406.16201, 2024.
[54] FU W, WANG H, GAO C, et al. Practical membership inference attacks against fine-tuned large language models via self-prompt calibration[J]. arXiv preprint arXiv:2311.06062, 2023.
[55] GALLI F, MELIS L, CUCINOTTA T. Noisy neighbors: Efficient membership inference attacks against LLMs[J]. arXiv preprint arXiv:2406.16565, 2024.
[56] MOZAFFARI H, MARATHE J. Semantic membership inference attack against large language models[J]. arXiv preprint arXiv:2406.10218, 2024.
[57] MAINI P, JIA H, PAPERNOT N, et al. LLM dataset inference: Did you train on my dataset?[J]. Advances in Neural Information Processing Systems, 2024, 37: 124069-124092.
[58] PUERTO H, GUBRI M, YUN S, et al. Scaling up membership inference: When and how attacks succeed on large language models[J]. arXiv preprint arXiv:2411.00154, 2024.
[59] HU H, PANG J. Membership inference attacks against GANs by leveraging over-representation regions[C]// Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 2021: 2387-2389.
[60] AZADMANESH M, GHAHFAROKHI S, TALOUKI A. A white-box generator membership inference attack against generative models[C]// 2021 18th International ISC Conference on Information Security and Cryptology (ISCISC), IEEE, 2021: 13-17.
[61] MATSUMOTO T, MIURA T, YANAI N. Membership inference attacks against diffusion models[C]// 2023 IEEE Security and Privacy Workshops (SPW), IEEE, 2023: 77-83.
[62] PANG Y, WANG T, KANG X, et al. White-box membership inference attacks against diffusion models[J]. arXiv preprint arXiv:2308.06405, 2023.
[63] ZHOU J, CHEN Y, SHEN C, et al. Property inference attacks against GANs[J]. arXiv preprint arXiv:2111.07608, 2021.
[64] WANG L, WANG J, WAN J, et al. Property existence inference against generative models[C]// 33rd USENIX Security Symposium (USENIX Security 24), 2024: 2423-2440.
[65] ZHANG Z, LEI L, WU L, et al. SafetyBench: Evaluating the safety of large language models[J]. arXiv preprint arXiv:2309.07045, 2023.
[66] YUAN X, LI J, WANG D, et al. S-Eval: Automatic and adaptive test generation for benchmarking safety evaluation of large language models[J]. arXiv preprint arXiv:2405.14191, 2024.
[67] HOUSSIAU F, JORDON J, COHEN N, et al. TAPAS: A toolbox for adversarial privacy auditing of synthetic data[J]. arXiv preprint arXiv:2211.06550, 2022.
[68] VAN BREUGEL B, SUN H, QIAN Z, et al. Membership inference attacks against synthetic data through overfitting detection[J]. arXiv preprint arXiv:2302.12580, 2023.
[69] ANNAMALAI S, GANEV G, DE CRISTOFARO E. "What do you want from theory alone?" Experimenting with tight auditing of differentially private synthetic data generation[J]. arXiv preprint arXiv:2405.10994, 2024.
[70] NASR M, HAYES J, STEINKE T, et al. Tight auditing of differentially private machine learning[C]// 32nd USENIX Security Symposium (USENIX Security 23), 2023: 1631-1648.
[71] GANEV G, ANNAMALAI S, DE CRISTOFARO E. The elusive pursuit of replicating PATE-GAN: Benchmarking, auditing, debugging[J]. arXiv preprint arXiv:2406.13985, 2024.
[72] JORDON J, YOON J, VAN DER SCHAAR M. PATE-GAN: Generating synthetic data with differential privacy guarantees[C]// International Conference on Learning Representations, 2018: 1-15.
[73] LU P H, WANG P C, YU C M. Empirical evaluation on synthetic data generation with generative adversarial network[C]// Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics, 2019: 1-6.
[74] SUNDARAM M, GADOTTI A, ROCHER L. A linear reconstruction approach for attribute inference attacks against synthetic data[J]. arXiv e-prints, 2023: arXiv:2301.10053.