Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (3): 48-66.
CSTR: 32002.14.jfdc.CN10-1649/TP.2025.03.005
doi: 10.11871/jfdc.issn.2096-742X.2025.03.005
• Special Issue: 30th Anniversary of the Computer Network Information Center, Chinese Academy of Sciences •
A Review of the Research and Application of Privacy Risk Assessment Techniques for Generative Models Based on Inference Attacks
张宁徽1,2, 龙春1,*, 万巍1, 李婧1, 杨帆1, 魏金侠1, 付豫豪1
Received: 2025-04-28
Online: 2025-06-20
Published: 2025-06-25
Corresponding author: *LONG Chun (E-mail: )
About the author: ZHANG Ninghui, Computer Network Information Center, Chinese Academy of Sciences; master's student; research interests: data security and privacy protection.
Funding:
ZHANG Ninghui1,2, LONG Chun1,*, WAN Wei1, LI Jing1, YANG Fan1, WEI Jinxia1, FU Yuhao1
Abstract:
[Objective] To systematically review the research progress and application status of inference-attack-based privacy risk assessment techniques for generative models. [Coverage] This survey examines more than 70 papers published in mainstream conferences and journals between 2015 and 2024. [Methods] Along the technical dimension, attacks are classified primarily by the black-box versus white-box assumption, and within each setting they are further subdivided by the attack methods developed for each class of generative model; along the application dimension, the survey compares privacy risk assessment frameworks for synthetic data. [Results] Existing attack techniques are fairly mature, but they are tightly coupled to specific model types and suffer limited accuracy in black-box settings; as a result, the frameworks used in practice to assess the privacy risk of synthetic data remain limited in both generality and accuracy. [Conclusions] Compared with existing surveys in this direction, this paper is the first to summarize the latest results on membership inference attacks against large language models, and it also provides a comparative analysis of the newest privacy risk assessment frameworks for synthetic data. This two-dimensional technique-application analysis offers researchers a valuable reference and guide for the field.
ZHANG Ninghui, LONG Chun, WAN Wei, LI Jing, YANG Fan, WEI Jinxia, FU Yuhao. A Review of the Research and Application of Privacy Risk Assessment Techniques for Generative Models Based on Inference Attacks[J]. Frontiers of Data and Computing, 2025, 7(3): 48-66, https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2025.03.005.
Table 1
Comparison of privacy risk assessment categories
Assessment type | Core idea | Techniques | Applicable scenarios | Strengths | Limitations | References |
---|---|---|---|---|---|---|
Privacy attack based | Simulate an adversary's capabilities to quantify how much information the model has memorized | Membership and attribute inference attacks | Privacy risk assessment of synthetic data | Highly empirical; quantifies leakage risk | Depends on attack assumptions; high computational cost | [ |
Utility evaluation based | Balance privacy protection against data utility | Privacy-utility trade-off curves; statistical similarity tests | Pre-release data assessment; differential privacy parameter tuning | Intuitive and easy to interpret; supports dynamic adjustment | Highly subjective; hard to cover all risks | [ |
Information entropy based | Quantify the degree of information leakage with entropy | Mutual information computation; conditional entropy analysis; anonymity-set entropy | Evaluation of anonymization schemes; design of privacy protection mechanisms | Theoretically rigorous; supports formal proofs | High computational complexity; hard to link to real-world risk | [ |
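To make the attack-based assessment row concrete: such frameworks typically run an inference attack against the model, then report an empirical leakage metric computed from the scores the attacker assigns to member versus non-member records. The following is a minimal sketch under that assumption; the function name and the toy scores are illustrative, not taken from any specific framework in the survey.

```python
# Hypothetical sketch: quantify empirical privacy leakage from the scores an
# inference attack assigns to known member vs. non-member records.

def attack_advantage(member_scores, nonmember_scores):
    """Best-threshold membership advantage: max over thresholds of (TPR - FPR).
    0 ~ no leakage (attack no better than random guessing); 1 ~ full leakage."""
    thresholds = sorted(set(member_scores) | set(nonmember_scores))
    best = 0.0
    for t in thresholds:
        tpr = sum(s >= t for s in member_scores) / len(member_scores)
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        best = max(best, tpr - fpr)
    return best

# Toy example: members receive systematically higher attack scores.
members = [0.9, 0.8, 0.75, 0.6]
nonmembers = [0.4, 0.3, 0.55, 0.2]
print(attack_advantage(members, nonmembers))  # 1.0: the scores separate perfectly
```

This is the "highly empirical, quantifies leakage risk" property from the table: the metric is computed from an actual simulated adversary, at the cost of depending on that adversary's assumptions.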
Table 2
Comparison of membership inference attack schemes against generative models
Setting | Target model | Representative work | Attack approach | Core technique | Attack efficiency | Attack accuracy |
---|---|---|---|---|---|---|
Black-box | GAN | Hayes[ | Discriminator score ranking | Rank samples by discriminator output confidence and label the top half as members | Medium | Low |
Black-box | GAN | Hilprecht[ | Monte Carlo integration attack | Combine PCA and Euclidean distance to find generated samples in a target's neighborhood and approximate membership probability | Low | Medium |
Black-box | GAN | Chen[ | Reconstruction-error score optimization | Use black-box optimization to improve the attack in some settings | High | High |
Black-box | Diffusion model | Hu[ | Loss- and likelihood-based attack | Infer membership from loss values at low-noise diffusion steps or from sample likelihoods | Medium | High |
Black-box | Diffusion model | Duan[ | Posterior estimation matching | Assume member samples have smaller reverse-diffusion error | Medium | High |
Black-box | Diffusion model | Dubinski[ | Loss-based study of the diffusion process | Modify the diffusion process to extract membership information | Medium | High |
Black-box | Diffusion model | Fu[ | Memorization-based attack | Detect fluctuations in the probability distribution around member records | Medium | High |
Black-box | Diffusion model | Li[ | Model API variant attack | Exploit differences in image reconstruction quality across diffusion steps | High | Medium |
Black-box | Multiple generative models | Zhang[ | Generalized membership inference attack | Train a binary classifier on generated samples and non-member auxiliary data | High | High |
Black-box | LLM | Duan[ | Blind attack via data distribution shift | Extract date information from text and separate members with a time threshold | High | Medium |
Black-box | LLM | Fu[ | Probability-distribution extreme-value detection | Measure probability changes over symmetric paraphrases of the text | Medium | High |
Black-box | LLM | Galli[ | Noisy-neighbor comparison | Generate noisy neighbors and compare perplexity differences | High | Medium |
Black-box | LLM | Mozaffari[ | Masked-perturbation comparison | Generate masked perturbed texts and quantify semantic distance and loss differences | Medium | High |
White-box | GAN | Hayes[ | Discriminator probability threshold | Judge membership directly from the discriminator's output probability | High | High |
White-box | GAN | Chen[ | Quasi-Newton optimization attack | Optimize the attack using the generator's and discriminator's internal parameters | High | High |
White-box | GAN | Azadmanesh[ | Membership-degree attack | Evaluate using the model type and training configuration | Medium | High |
White-box | VAE | Hilprecht[ | Reconstruction attack | Compute the target sample's reconstruction error; training samples have lower error | High | High |
White-box | Diffusion model | Matsumoto[ | Loss-function fit | Exploit differences in the diffusion model's loss on training data | Medium | Medium |
White-box | Diffusion model | Pang[ | Gradient-based attack | Analyze differences in model gradients across samples | Medium | High |
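Several rows above (the loss-, likelihood-, and reconstruction-based attacks) share one principle: samples seen during training tend to incur lower model loss, so an attacker thresholds a per-sample loss. The sketch below illustrates only that shared principle; the function names, the crude midpoint calibration, and the toy loss values are illustrative assumptions, not any one paper's method.

```python
# Hypothetical sketch of the loss-threshold principle behind several attacks in
# Table 2: members (training samples) tend to have lower per-sample loss, so an
# attacker predicts "member" when the loss falls below a calibrated threshold.

def infer_membership(losses, threshold):
    """Predict 'member' for every sample whose loss falls below the threshold."""
    return [loss < threshold for loss in losses]

def calibrate_threshold(shadow_member_losses, shadow_nonmember_losses):
    """Pick the midpoint of the two shadow-population means (crude calibration)."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(shadow_member_losses) + mean(shadow_nonmember_losses)) / 2

# Toy losses: members, seen in training, reconstruct better and score lower.
t = calibrate_threshold([0.10, 0.12, 0.08], [0.35, 0.40, 0.30])
print(infer_membership([0.09, 0.33], t))  # [True, False]
```

In practice the per-sample loss is whatever the attacked generative model exposes (e.g., denoising loss at selected diffusion steps, or VAE reconstruction error), and calibration is usually done with shadow models rather than a simple midpoint.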
References
[1] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[2] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019: 4171-4186.
[3] LI Yonghong, WANG Ying, LI Laquan, et al. Application of an improved feature selection algorithm in email filtering[J]. Computer Science, 2022, 49(S2): 740-744.
[4] BENGIO Y, LECUN Y, HINTON G. Deep learning for AI[J]. Communications of the ACM, 2021, 64(7): 58-65.
[5] CHEN X, CHO H, DOU Y, et al. Predicting future earnings changes using machine learning and detailed financial data[J]. Journal of Accounting Research, 2022, 60(2): 467-515.
[6] NEWTON E M, SWEENEY L, MALIN B. Preserving privacy by de-identifying face images[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(2): 232-243.
[7] SUN Y, LIU J, YU K, et al. PMRSS: Privacy-preserving medical record searching scheme for intelligent diagnosis in IoT healthcare[J]. IEEE Transactions on Industrial Informatics, 2021, 18(3): 1981-1990.
[8] CARLINI N, LIU C, ERLINGSSON Ú, et al. The secret sharer: Evaluating and testing unintended memorization in neural networks[C]// 28th USENIX Security Symposium (USENIX Security 19), 2019: 267-284.
[9] SONG C, RISTENPART T, SHMATIKOV V. Machine learning models that remember too much[C]// Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017: 587-601.
[10] ZHANG C, BENGIO S, HARDT M, et al. Understanding deep learning (still) requires rethinking generalization[J]. Communications of the ACM, 2021, 64(3): 107-115.
[11] TRAMÈR F, ZHANG F, JUELS A, et al. Stealing machine learning models via prediction APIs[C]// 25th USENIX Security Symposium (USENIX Security 16), 2016: 601-618.
[12] ZHOU J, CHEN Y, SHEN C, et al. Property inference attacks against GANs[J]. arXiv preprint arXiv:2111.07608, 2021.
[13] HU L, YAN A, YAN H, et al. Defenses to membership inference attacks: A survey[J]. ACM Computing Surveys, 2023, 56(4): 1-34.
[14] DING Hongfa. Rational privacy protection models and their applications[D]. Guizhou University, 2019. DOI: 10.27047/d.cnki.ggudu.2019.000036.
[15] LIU Ruixuan, CHEN Hong, GUO Ruoyang, et al. Privacy attacks and defenses in machine learning[J]. Journal of Software, 2020, 31(3): 866-892. DOI: 10.13328/j.cnki.jos.005904.
[16] RIGAKI M, GARCIA S. A survey of privacy attacks in machine learning[J]. ACM Computing Surveys, 2023, 56(4): 1-34.
[17] REN Kui, MENG Quanrun, YAN Shoukun, et al. Survey of attacks and defenses on data leakage of artificial intelligence models[J]. Chinese Journal of Network and Information Security, 2021, 7(1): 1-10.
[18] SHAO Guosong, HUANG Qi. Privacy protection issues in artificial intelligence[J]. Modern Communication (Journal of Communication University of China), 2017, 39(12): 1-5.
[19] FREDRIKSON M, JHA S, RISTENPART T. Model inversion attacks that exploit confidence information and basic countermeasures[C]// Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015: 1322-1333.
[20] WANG Lulu, ZHANG Peng, YAN Zheng, et al. A survey of membership inference on machine learning training datasets[J]. Cyberspace Security, 2019, 10(10): 1-7.
[21] WANG Pengyan. Research on membership inference attacks and defenses in machine learning[J]. Information Technology and Network Security, 2021, 40(8): 65-70, 83. DOI: 10.19358/j.issn.2096-5133.2021.08.011.
[22] NIU Jun, MA Xiaoji, CHEN Ying, et al. A survey of membership inference attacks and defenses in machine learning[J]. Journal of Cyber Security, 2022, 7(6): 1-30. DOI: 10.19363/J.cnki.cn10-1380/tn.2022.11.01.
[23] BAI Y, CHEN T, FAN M. A survey on membership inference attacks against machine learning[J]. Management, 2021, 6: 14.
[24] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144.
[25] CHEN Y, LIU J, PENG L, et al. Auto-encoding variational Bayes[J]. Cambridge Explorations in Arts and Sciences, 2024, 2(1): 1-12.
[26] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[J]. Advances in Neural Information Processing Systems, 2020, 33: 6840-6851.
[27] HOMER N, SZELINGER S, REDMAN M, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays[J]. PLoS Genetics, 2008, 4(8): e1000167.
[28] SHOKRI R, STRONATI M, SONG C, et al. Membership inference attacks against machine learning models[C]// 2017 IEEE Symposium on Security and Privacy (SP), IEEE, 2017: 3-18.
[29] GUPTA U, STRIPELIS D, LAM P K, et al. Membership inference attacks on deep regression models for neuroimaging[C]// Proceedings of the Medical Imaging with Deep Learning Conference, 2021: 228-251.
[30] HAYES J, MELIS L, DANEZIS G, et al. LOGAN: Membership inference attacks against generative models[J]. arXiv preprint arXiv:1705.07663, 2017.
[31] CHEN J, ZHANG J, ZHAO Y, et al. Beyond model-level membership privacy leakage: An adversarial approach in federated learning[C]// 2020 29th International Conference on Computer Communications and Networks (ICCCN), IEEE, 2020: 1-9.
[32] CHI X, ZHANG X, WANG Y, et al. Shadow-free membership inference attacks: Recommender systems are more vulnerable than you thought[J]. arXiv preprint arXiv:2405.07018, 2024.
[33] LIU Y, WANG C, PENG K, et al. SocInf: Membership inference attacks on social media health data with machine learning[J]. IEEE Transactions on Computational Social Systems, 2019, 6(5): 907-921.
[34] WANG Y, HUANG L, YU S, et al. Membership inference attacks on knowledge graphs[J]. arXiv preprint arXiv:2104.08273, 2021.
[35] TSENG C, KAO T, LEE H. Membership inference attacks against self-supervised speech models[J]. arXiv preprint arXiv:2111.05113, 2021.
[36] TABASSI E, BURNS K J, HADJIMICHAEL M, et al. A taxonomy and terminology of adversarial machine learning[J]. National Institute of Standards and Technology, 2019: 1-29.
[37] EUROPEAN PARLIAMENT AND COUNCIL. Regulation (EU) 2016/679: General Data Protection Regulation (GDPR)[J]. Official Journal of the European Union, 2016, L119: 1-88.
[38] ATENIESE G, MANCINI L, SPOGNARDI A, et al. Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers[J]. International Journal of Security and Networks, 2015, 10(3): 137-150.
[39] GIOMI M, BOENISCH F, WEHMEYER C, et al. A unified framework for quantifying privacy risk in synthetic data[J]. arXiv preprint arXiv:2211.10459, 2022.
[40] STADLER T, OPRISANU B, TRONCOSO C. Synthetic data-anonymisation groundhog day[C]// 31st USENIX Security Symposium (USENIX Security 22), 2022: 1451-1468.
[41] PENG Changgen, DING Hongfa, ZHU Yijie, et al. Information entropy model of privacy protection and its measurement methods[J]. Journal of Software, 2016, 27(8): 1891-1903. DOI: 10.13328/j.cnki.jos.005096.
[42] CLAUß S, SCHIFFNER S. Structuring anonymity metrics[C]// Proceedings of the Second ACM Workshop on Digital Identity Management, 2006: 55-62.
[43] DIAZ C, SEYS S, CLAESSENS J, et al. Towards measuring anonymity[C]// International Workshop on Privacy Enhancing Technologies, Berlin, Heidelberg: Springer Berlin Heidelberg, 2002: 54-68.
[44] HILPRECHT B, HÄRTERICH M, BERNAU D. Monte Carlo and reconstruction membership inference attacks against generative models[J]. Proceedings on Privacy Enhancing Technologies, 2019: 232-249.
[45] CHEN D, YU N, ZHANG Y, et al. GAN-Leaks: A taxonomy of membership inference attacks against generative models[C]// Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020: 343-362.
[46] HU H, PANG J. Membership inference of diffusion models[J]. arXiv preprint arXiv:2301.09956, 2023.
[47] DUAN J, KONG F, WANG S, et al. Are diffusion models vulnerable to membership inference attacks?[C]// International Conference on Machine Learning, ICML, 2023: 8717-8730.
[48] DUBIŃSKI J, KOWALCZUK A, PAWLAK S, et al. Towards more realistic membership inference attacks on large diffusion models[C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024: 4860-4869.
[49] FU W, WANG H, GAO C, et al. A probabilistic fluctuation based membership inference attack for diffusion models[J]. arXiv preprint arXiv:2308.12143, 2023.
[50] LI J, DONG J, HE T, et al. Towards black-box membership inference attack for diffusion models[J]. arXiv preprint arXiv:2405.20771, 2024.
[51] ZHANG M, YU N, WEN R, et al. Generated distributions are all you need for membership inference attacks against generative models[C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024: 4839-4849.
[52] DUAN M, SURI A, MIRESHGHALLAH N, et al. Do membership inference attacks work on large language models?[J]. arXiv preprint arXiv:2402.07841, 2024.
[53] DAS D, ZHANG J, TRAMÈR F. Blind baselines beat membership inference attacks for foundation models[J]. arXiv preprint arXiv:2406.16201, 2024.
[54] FU W, WANG H, GAO C, et al. Practical membership inference attacks against fine-tuned large language models via self-prompt calibration[J]. arXiv preprint arXiv:2311.06062, 2023.
[55] GALLI F, MELIS L, CUCINOTTA T. Noisy neighbors: Efficient membership inference attacks against LLMs[J]. arXiv preprint arXiv:2406.16565, 2024.
[56] MOZAFFARI H, MARATHE J. Semantic membership inference attack against large language models[J]. arXiv preprint arXiv:2406.10218, 2024.
[57] MAINI P, JIA H, PAPERNOT N, et al. LLM dataset inference: Did you train on my dataset?[J]. Advances in Neural Information Processing Systems, 2024, 37: 124069-124092.
[58] PUERTO H, GUBRI M, YUN S, et al. Scaling up membership inference: When and how attacks succeed on large language models[J]. arXiv preprint arXiv:2411.00154, 2024.
[59] HU H, PANG J. Membership inference attacks against GANs by leveraging over-representation regions[C]// Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 2021: 2387-2389.
[60] AZADMANESH M, GHAHFAROKHI S, TALOUKI A. A white-box generator membership inference attack against generative models[C]// 2021 18th International ISC Conference on Information Security and Cryptology (ISCISC), IEEE, 2021: 13-17.
[61] MATSUMOTO T, MIURA T, YANAI N. Membership inference attacks against diffusion models[C]// 2023 IEEE Security and Privacy Workshops (SPW), IEEE, 2023: 77-83.
[62] PANG Y, WANG T, KANG X, et al. White-box membership inference attacks against diffusion models[J]. arXiv preprint arXiv:2308.06405, 2023.
[63] ZHOU J, CHEN Y, SHEN C, et al. Property inference attacks against GANs[J]. arXiv preprint arXiv:2111.07608, 2021.
[64] WANG L, WANG J, WAN J, et al. Property existence inference against generative models[C]// 33rd USENIX Security Symposium (USENIX Security 24), 2024: 2423-2440.
[65] ZHANG Z, LEI L, WU L, et al. SafetyBench: Evaluating the safety of large language models[J]. arXiv preprint arXiv:2309.07045, 2023.
[66] YUAN X, LI J, WANG D, et al. S-Eval: Automatic and adaptive test generation for benchmarking safety evaluation of large language models[J]. arXiv preprint arXiv:2405.14191, 2024.
[67] HOUSSIAU F, JORDON J, COHEN N, et al. TAPAS: A toolbox for adversarial privacy auditing of synthetic data[J]. arXiv preprint arXiv:2211.06550, 2022.
[68] VAN BREUGEL B, SUN H, QIAN Z, et al. Membership inference attacks against synthetic data through overfitting detection[J]. arXiv preprint arXiv:2302.12580, 2023.
[69] ANNAMALAI S, GANEV G, DE CRISTOFARO E. "What do you want from theory alone?" Experimenting with tight auditing of differentially private synthetic data generation[J]. arXiv preprint arXiv:2405.10994, 2024.
[70] NASR M, HAYES J, STEINKE T, et al. Tight auditing of differentially private machine learning[C]// 32nd USENIX Security Symposium (USENIX Security 23), 2023: 1631-1648.
[71] GANEV G, ANNAMALAI S, DE CRISTOFARO E. The elusive pursuit of replicating PATE-GAN: Benchmarking, auditing, debugging[J]. arXiv preprint arXiv:2406.13985, 2024.
[72] JORDON J, YOON J, VAN DER SCHAAR M. PATE-GAN: Generating synthetic data with differential privacy guarantees[C]// International Conference on Learning Representations, 2018: 1-15.
[73] LU P H, WANG P C, YU C M. Empirical evaluation on synthetic data generation with generative adversarial network[C]// Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics, 2019: 1-6.
[74] SUNDARAM M, GADOTTI A, ROCHER L. A linear reconstruction approach for attribute inference attacks against synthetic data[J]. arXiv e-prints, 2023: arXiv:2301.10053.