Frontiers of Data and Computing (数据与计算发展前沿) ›› 2020, Vol. 2 ›› Issue (6): 90-102. doi: 10.11871/jfdc.issn.2096-742X.2020.06.010

• Technology and Applications •

Research on Privacy Security Based on Big Data Language Laboratory Platforms

ZHANG Jie*, GUO Yin

  1. School of Humanities and Foreign Languages, Qingdao University of Technology, Qingdao, Shandong 266520, China
  • Received: 2020-10-23 Online: 2020-12-20 Published: 2020-12-29
  • About the authors:
    ZHANG Jie holds a master's degree and is an experimentalist at the School of Humanities and Foreign Languages, Qingdao University of Technology. Her research areas include network data mining, data privacy protection, and network data security. She has participated in research projects of the National Internet Emergency Center (CNCERT) and the PKI Research Project of the Chinese Academy of Sciences, as well as soft-science projects of the China Internet Development Foundation, and has written three communications industry standards on the Network Time Protocol (NTP). In this paper, she is responsible for the overall writing and the technical part on experimental analysis. E-mail: 251153580@qq.com
    GUO Yin, Ph.D., is a professor, vice dean, and master's supervisor at the School of Humanities and Foreign Languages, Qingdao University of Technology. His main research interests are computational linguistics and cognitive linguistics. In this paper, he is responsible for the overall compilation of the manuscript and the editing of the first draft. E-mail: guoyenm@sina.com

Abstract:

[Objective] This paper studies privacy security on big data language laboratory platforms and offers research ideas for privacy security work in related fields. [Methods] For language laboratory platforms in a big data environment, traditional privacy protection processes and technologies no longer meet the requirements of big data privacy security. This paper analyzes the points at which data privacy may leak during the construction and use of a language laboratory platform and proposes a privacy security technical framework for big data language laboratory platforms. It examines each stage of data circulation, proposes technical solutions, and experimentally verifies a random reversible anonymization algorithm proposed for the data release stage. [Results] The research produced a privacy security technical framework that can provide privacy protection support for language laboratory platforms and that preserves data availability after private information has been removed from platform data. [Conclusions] Privacy security research on big data language laboratory platforms has high scientific and practical value, but the overall research is still at an early stage, and unexplored areas remain.

Key words: big data privacy security, language laboratory platform, EU GDPR, data life cycle