Frontiers of Data and Computing ›› 2023, Vol. 5 ›› Issue (1): 1-14.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.01.001

doi: 10.11871/jfdc.issn.2096-742X.2023.01.001

• Special Issue: Joint Special Issue on Scientific Data Resources, Technology, and Policy

Research and Design of a Basic Software Stack Architecture for Scientific Data Center

WANG Jiulong (王九龙), WANG Ronghao (王容昊), WANG Huajin (王华进), LU Changfa* (路长发), DUAN Junlei (段军磊)

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2022-12-05 Online: 2023-02-20 Published: 2023-02-20
  • Corresponding author: LU Changfa
  • About the authors:
    WANG Jiulong is a postdoctoral researcher at the Department of Big Data Technology and Application Development, Computer Network Information Center, Chinese Academy of Sciences (CAS). His main research directions are scientific big data analysis methods and technologies, and the cross-disciplinary application of big data and AI technologies in energy, environment, and other fields. In 2021 he received one CAS Special Research Assistant grant. He has led one China Postdoctoral Science Foundation project and one CAS Network Security and Informatization Special Project, and has been a key participant in 6 projects, including the National 973 Program, the National Science and Technology Major Project, and the National Natural Science Foundation of China. He has published more than 20 papers, including 11 SCI/EI papers as first or corresponding author, and has applied for 8 national patents, 4 of which have been granted.
    In this paper, he is responsible for writing the introduction and Section 2, "Reference architecture of the scientific data center software stack", and for integrating the overall logic of the paper.
    E-mail: jlwang@cnic.cn

    WANG Ronghao, master's degree, is an assistant engineer at the Department of Big Data Technology and Application Development, Computer Network Information Center, Chinese Academy of Sciences. His main research direction is big data analysis methods and technologies. He has published 1 SCI paper as co-first author.
    In this paper, he is responsible for the research and writing of Section 1, "Typical scientific data center software stacks".
    E-mail: rhwang@cnic.cn

    WANG Huajin, Ph.D., is an assistant research fellow at the Department of Big Data Technology and Application Development, Computer Network Information Center, Chinese Academy of Sciences. His main research direction is big data management and processing technologies. He has published 8 papers in major journals and conferences in China and abroad, such as the Journal of Software and CCGrid, and has led or participated in the development of open-source software including PackOne, an elastic management tool for cloud big data software stacks, and PandaDB, a heterogeneous data fusion management system. He is currently undertaking a sub-project of the National Key Research and Development Program "Software Stack and System for National Scientific Data Centers".
    In this paper, he is responsible for formulating the framework of the paper, establishing the reference architecture of the scientific data center software stack, designing the technical solution of the software stack, and reviewing and revising the paper.
    E-mail: wanghj@cnic.cn

    LU Changfa, master's degree, is a senior engineer and director of the Big Data Technology and Application Laboratory, Department of Big Data Technology and Application Development, Computer Network Information Center, Chinese Academy of Sciences. His main research direction is big data management technology. He has led or participated in the technical research and development and engineering implementation of projects including "Big Data Middle Platform", "CAST Big Data Knowledge Management and Service Platform", "Tobacco Science and Technology Knowledge Graph", "National Space Science Center Domain Big Data Knowledge Graph Service Platform", "Smart CAS Knowledge Graph and Expert Profile System", and "CAS Academic Divisions Expert Talent Recommendation System".
    In this paper, he is responsible for writing Section 3, "Software stack prototype implementation", and for developing the key-technology prototype systems.
    E-mail: luchangfa@cnic.cn
  • Funding:
    National Key Research and Development Program of China, "Basic Software Stack and System for National Scientific Data Centers" (2021YFF0704200); CAS 14th Five-Year Plan Network Security and Informatization Special Engineering Construction Project, "Scientific Big Data Engineering (Phase III)" (CAS-WX2022GC-02)

Abstract:

[Background] Data resources in many disciplines are growing explosively, and scientific research paradigms are undergoing profound change. Scientific data centers have become important carriers of data applications within disciplines, and the basic software stack is currently the top priority in scientific data center construction. [Methods] By analyzing the software stack architectures of typical scientific data centers in China and abroad, this paper identifies the shortcomings of current mainstream software stack architectures in the storage, management, sharing, and analysis of multi-source heterogeneous data. [Results] A reference architecture for the basic software stack of scientific data centers and its prototype implementation are proposed, providing standardized storage, heterogeneous data fusion management, big data stream processing, and online interactive analysis. The proposed scheme can serve as a technical roadmap reference for building China's system of scientific data centers.

Key words: scientific data center, software stack, technical architecture