Frontiers of Data and Computing (数据与计算发展前沿), 2025, Vol. 7, Issue (4): 67-78.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.04.006

doi: 10.11871/jfdc.issn.2096-742X.2025.04.006

• Special Issue: Intelligent Algorithms, Models, and Tools for Space Science Big Data •

Research on Construction of a Semantic Association-Driven Space Science Data Repository System and Dataset Association Recommendation

WU Zhaochen, LU Changfa*, LI Gang, LAN Chenyang, WANG Cifeng

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2025-04-29 Online: 2025-08-20 Published: 2025-08-21
  • Corresponding author: LU Changfa
  • About the authors:
    WU Zhaochen is an engineer in the Scientific Data Software System Laboratory, Department of Big Data Technology and Application Development, Computer Network Information Center, Chinese Academy of Sciences. His primary research interests are big data management and analytical computing technologies. He has participated in the technical research and development and engineering implementation of projects including the “Big Data Platform”, the “CAST Big Data Knowledge Management and Service Platform”, the “Tobacco Science and Technology Knowledge Graph”, and the National Key R&D Program project “Key Technologies and Applications for Intelligent Management, Analysis, and Mining of Space Science Big Data”.
    In this paper, he is responsible for key technological innovations, system prototype development, and the writing of Chapter 1.
    E-mail: zcwu@cnic.cn
    LU Changfa, who holds a master’s degree, is a senior engineer and director of the Scientific Data Software System Laboratory, Department of Big Data Technology and Application Development, Computer Network Information Center, Chinese Academy of Sciences. His main research interest is big data management technology. He has presided over or participated in the technical research and development and engineering implementation of projects including the “Big Data Platform”, the “CAST Big Data Knowledge Management and Service Platform”, the “Tobacco Science and Technology Knowledge Graph”, the “National Space Science Center Big Data Knowledge Graph Service Platform”, the “Smart CAS Knowledge Graph and Expert Portrait System”, and the “Expert Talent Recommendation System of CAS”.
    In this paper, he is responsible for academic supervision, key technological innovations, and final approval of the paper.
    E-mail: luchangfa@cnic.cn
  • Funding:
    National Key R&D Program of China, “Key Technologies and Applications for Intelligent Management, Analysis, and Mining of Space Science Big Data” (2022YFF0711400)

Abstract:

[Background] With the exponential growth of multimodal data in space science, existing data management systems face serious challenges. Traditional architectures do not capture semantic associations between data, which severely limits the efficiency of interdisciplinary knowledge discovery. [Objective] This study aims to build a semantically enhanced space science data repository system that mines the metadata semantics of multi-source data and the associations among them, in order to break disciplinary barriers and strengthen association analysis. [Methods] We build a semantic association network of space science metadata using a three-tier conceptual-logical-physical modeling architecture. Using a non-intrusive data integration approach that leaves the architecture of existing business systems unchanged, we develop an archive repository interface service, a unified external service, graph database management, and a graph query engine, which together implement the space science data repository system. We further design a similarity algorithm based on metadata semantics to quantify the association strength between data entities, and validate it through dataset association recommendation experiments. [Conclusions] Experiments show that the proposed method effectively improves the efficiency of knowledge discovery over space science data, offers a new approach to the problem of multimodal data fusion, and significantly strengthens knowledge discovery over complex scientific data.

Key words: space science metadata semantics network, space science data repository system, property graph model, semantic similarity computation, related dataset recommendation
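
The abstract describes a similarity algorithm based on metadata semantics that quantifies the association strength between datasets and drives related-dataset recommendation, but it does not give the formula. The following Python sketch is only an illustration of that general idea, not the authors' algorithm: it assumes a weighted Jaccard overlap over metadata fields, and the record type `DatasetMeta`, its field names, and the weights in `WEIGHTS` are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class DatasetMeta:
    """Hypothetical metadata record; the field names are illustrative only."""
    name: str
    keywords: FrozenSet[str]
    instruments: FrozenSet[str]
    disciplines: FrozenSet[str]


# Assumed per-field weights (not taken from the paper); they sum to 1.
WEIGHTS = {"keywords": 0.5, "instruments": 0.3, "disciplines": 0.2}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard overlap of two sets, defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(x: DatasetMeta, y: DatasetMeta) -> float:
    """Weighted sum of per-field Jaccard overlaps, in the range [0, 1]."""
    return sum(
        weight * jaccard(getattr(x, field), getattr(y, field))
        for field, weight in WEIGHTS.items()
    )


def recommend(
    target: DatasetMeta, candidates: List[DatasetMeta], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return up to top_k (name, score) pairs, highest score first."""
    scored = [
        (c.name, similarity(target, c))
        for c in candidates
        if c.name != target.name  # never recommend the dataset to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

In a property graph store, scores of this kind could be written as weighted edges between dataset nodes, so that recommendation becomes a neighbor lookup. Whether the system described here does so is not stated in the abstract.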