Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (4): 20-32.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.04.002

doi: 10.11871/jfdc.issn.2096-742X.2025.04.002

• Special Issue: Artificially Intelligent Models and Tools for Space Science Big Data

Construction of an Intelligent Retrieval System for the Virtual Space Science Observatory

LI Yunlong1, JIAO Qirong1, WANG Cifeng2, ZOU Ziming1,*

  1. National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
  2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2025-05-31   Online: 2025-08-20   Published: 2025-08-21
  • Contact: ZOU Ziming   E-mail: liyunlong@nssc.ac.cn; mzou@nssc.ac.cn

Abstract:

[Background] As the volume of space science data grows rapidly, traditional metadata-based retrieval can no longer meet researchers' needs for complex semantic queries, creating an urgent need for intelligent retrieval systems capable of semantic understanding. [Objective] This study develops an intelligent retrieval system for space science data that addresses the limitations of conventional metadata-driven approaches in semantic comprehension and multi-modal data retrieval, thereby improving the efficiency and accuracy of access to heterogeneous space science datasets. [Methods] The system uses a dynamic semantic parsing mechanism based on large language models, combined with a hybrid retrieval strategy that integrates BM25 with dense vector search. For image and time-series data, feature representations are extracted with models such as DINOv2, VISTA, and Timer-XL to build a multi-modal semantic index. A hierarchical architecture integrates full-text search with vector databases and supports multiple query modes, including natural language, tags, and data examples. [Conclusion] By integrating multiple AI models, the intelligent retrieval system for the Virtual Space Science Observatory makes data discovery more flexible and accurate, offering a new paradigm for the efficient use of large-scale space science data.

Key words: space science, big data, intelligent retrieval
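
Illustrative sketch: the abstract describes a hybrid retrieval strategy that integrates BM25 with dense vector search but does not say how the two rankings are combined. The Python sketch below shows one common way to do it, under stated assumptions: the fusion rule (reciprocal rank fusion), the whitespace tokenizer, the BM25 parameters k1 and b, the toy embedding vectors, and all function names are hypothetical choices for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, for illustration only.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (sparse, lexical)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def dense_scores(query_vec, doc_vecs):
    """Cosine similarity between a query embedding and each document embedding."""
    return [cosine(query_vec, v) for v in doc_vecs]


def rrf_fuse(score_lists, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all rankers."""
    n = len(score_lists[0])
    fused = [0.0] * n
    for scores in score_lists:
        order = sorted(range(n), key=lambda i: -scores[i])
        for rank, i in enumerate(order, start=1):
            fused[i] += 1.0 / (k + rank)
    return fused


def hybrid_search(query, query_vec, docs, doc_vecs, top_k=3):
    """Return indices of the top_k documents under the fused ranking."""
    fused = rrf_fuse([bm25_scores(query, docs),
                      dense_scores(query_vec, doc_vecs)])
    return sorted(range(len(docs)), key=lambda i: -fused[i])[:top_k]


# Toy corpus; the 3-dimensional vectors stand in for real model embeddings.
docs = [
    "solar wind proton density measurements",
    "magnetosphere substorm auroral images",
    "lunar surface thermal maps",
]
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
top = hybrid_search("solar wind density", [0.9, 0.1, 0.0], docs, doc_vecs, top_k=2)
print([docs[i] for i in top])
```

Rank-based fusion is used here because BM25 scores and cosine similarities live on different scales; a weighted sum of normalized scores would be an equally plausible alternative.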