Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (4): 20-32.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.04.002

doi: 10.11871/jfdc.issn.2096-742X.2025.04.002

• Special Issue: Intelligent Algorithms, Models, and Tools for Space Science Big Data •

Construction of an Intelligent Retrieval System for the Virtual Space Science Observatory

LI Yunlong1, JIAO Qirong1, WANG Cifeng2, ZOU Ziming1,*

  1. National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
  2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2025-05-31 Online: 2025-08-20 Published: 2025-08-21
  • Contact: ZOU Ziming
  • About the authors:
    LI Yunlong is an associate researcher and a master's supervisor at the National Space Science Center, Chinese Academy of Sciences. His main research interests include intelligent retrieval and knowledge mining of big data in space science.
    In this paper, he is mainly responsible for the implementation and validation of the prototype system.
    E-mail: liyunlong@nssc.ac.cn
    ZOU Ziming is a researcher and a doctoral supervisor at the National Space Science Center, Chinese Academy of Sciences. He is also the Director of the National Space Science Data Center. He has long been engaged in research at the intersection of space science and data science, including scientific data governance, development of standards, space science information management and interoperation, systems engineering for space science big data, and data mining and knowledge discovery in space weather.
    In this paper, he is mainly responsible for the design of the prototype system.
    E-mail: mzou@nssc.ac.cn
  • Funding:
    National Key R&D Program of China, Key Special Project "Basic Scientific Research Conditions and Major Scientific Instrument and Equipment Development" (2022YFF0711400)

Abstract:

[Background] As space science data grow rapidly and become increasingly multi-modal, traditional retrieval based on metadata fields can no longer meet researchers' needs for complex semantic queries and for queries that the predefined fields do not anticipate. Intelligent retrieval systems capable of semantic understanding are urgently needed. [Objective] This study aims to build an intelligent retrieval system for space science data that addresses the limitations of conventional metadata queries in semantic understanding and multi-modal data retrieval, thereby improving the efficiency and accuracy with which researchers discover heterogeneous space science data. [Methods] The system uses a dynamic semantic parsing mechanism based on large language models and combines BM25 with dense vector retrieval to provide hybrid retrieval of datasets. For image and time-series data, content features are extracted with models such as DINOv2, VISTA, and Timer-XL to build a multi-modal semantic index. The system adopts a layered architecture that integrates full-text search and a vector database, and it supports multiple query modes, including natural language, tags, and data examples. [Conclusion] By integrating multiple AI models, the intelligent retrieval system for the virtual space science observatory significantly improves the flexibility and accuracy of data discovery and offers a new paradigm for the efficient use of large-scale space science data.

Key words: space science, big data, intelligent retrieval
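
The hybrid retrieval named in the abstract (BM25 combined with dense vector search) can be illustrated with a minimal sketch. The abstract does not say how the two rankings are combined, so the reciprocal rank fusion step below is an assumption, not the authors' method. The `toy_embed` function is a hypothetical stand-in for a real dense encoder, and all function names and sample records are illustrative only.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (non-negative IDF variant)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_embed(text, dim=512):
    """ASSUMPTION: placeholder for a real dense encoder (hashed character trigrams)."""
    vec = [0.0] * dim
    for tok in tokenize(text):
        padded = f"#{tok}#"
        for i in range(len(padded) - 2):
            vec[zlib.crc32(padded[i:i + 3].encode()) % dim] += 1.0
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def dense_scores(query, docs):
    q = toy_embed(query)
    return [cosine(q, toy_embed(d)) for d in docs]


def reciprocal_rank_fusion(score_lists, k=60):
    """ASSUMPTION: fuse rankings with RRF; the source does not name a fusion rule."""
    n = len(score_lists[0])
    fused = [0.0] * n
    for scores in score_lists:
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for rank, idx in enumerate(order, start=1):
            fused[idx] += 1.0 / (k + rank)
    return fused


def hybrid_search(query, docs):
    """Return document indices ordered from best to worst fused rank."""
    fused = reciprocal_rank_fusion([bm25_scores(query, docs), dense_scores(query, docs)])
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)


# Hypothetical dataset descriptions, for illustration only.
SAMPLE_DOCS = [
    "solar wind magnetic field time series",
    "aurora images from ground cameras",
    "ionosphere electron density profiles",
]
ranking = hybrid_search("aurora image", SAMPLE_DOCS)
```

In this sketch the lexical score matches only the exact token "aurora", while the trigram embedding also credits the near-match between "image" and "images". Rank-based fusion is one common choice because it avoids having to calibrate BM25 scores against cosine similarities.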