数据与计算发展前沿 (Frontiers of Data and Computing) ›› 2024, Vol. 6 ›› Issue (4): 22-33.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.04.002

doi: 10.11871/jfdc.issn.2096-742X.2024.04.002

• Special Issue: Basic Software Stack and Systems for National Scientific Data Centers

A Technical Framework of Cross-Center Trusted Sharing of Scientific Data for the New Paradigm of Convergence Science

YANG Jingru1,2, CAI Huaqian1,2,*, YANG Yong1,2, LI Ying3, LIU Jia4

  1. National Key Laboratory of Dataspace Technology and System, Beijing 100091, China
  2. School of Computer Science, Peking University, Beijing 100871, China
  3. School of Software and Microelectronics, Peking University, Beijing 100871, China
  4. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2024-02-05 Online: 2024-08-20 Published: 2024-08-20
  • Corresponding author: *CAI Huaqian (E-mail: caihq@pku.edu.cn)
  • About the authors:
    YANG Jingru, Ph.D., is a research associate at the National Key Laboratory of Dataspace Technology and System and a member of CCF. Her main research interests include data governance technology and systems and intelligent database systems.
    In this paper, she is responsible for proposing the innovative points and writing the article.
    E-mail: okiyang@pku.edu.cn
    CAI Huaqian received his Ph.D. from the School of Computer Science, Peking University, in 2018. He is an associate researcher at the School of Computer Science, Peking University, and a member of CCF. He has published more than 10 SCI/EI-indexed papers and holds more than 40 invention patents. His primary research interests include system software and distributed systems.
    In this paper, he is responsible for proposing the research direction and revising the article.
    E-mail: caihq@pku.edu.cn
  • Supported by:
    National Key Research and Development Program of China, "Basic Software Stack and Systems for National Scientific Data Centers" (2021YFF0704200)

Abstract:

[Objective] Big data has given rise to a new research paradigm, "Convergence Science", which addresses major scientific and technological problems by fusing multidisciplinary data. Collaborative analysis and application of scientific data across disciplines, domains, and institutions has become an important way to fully release the value of that data, making cross-center trusted sharing a key goal in building scientific data centers. [Methods] Scientific data centers hold multi-source heterogeneous data in large volumes, with dispersed resources, a high degree of specialization, and well-defined intellectual property rights. To address these characteristics and challenges, this paper proposes a technical framework for cross-center trusted sharing of scientific data. The framework comprises key technologies including scientific data modeling and interoperability methods, dual-identifier fusion resolution, trusted evidence storage, and data rights confirmation and circulation tracing. [Results] The effectiveness of the framework was verified in a data sharing scenario spanning five scientific data centers. [Conclusions] The framework provides a feasible technical path toward cross-center trusted sharing of scientific data under the new "Convergence Science" paradigm.

Key words: scientific data, convergence science, trusted sharing, interoperability, identifier resolution