Frontiers of Data and Computing ›› 2023, Vol. 5 ›› Issue (1): 15-27.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.01.002

doi: 10.11871/jfdc.issn.2096-742X.2023.01.002

• Special Issue: Resources, Technology and Policy of Scientific Data

End-to-End Workflow Framework for Cross-Center Scientific Data Analysis

ZHU Xiaojie1, WANG Huajin1, SHEN Zhihong1,*, GUO Xuebing2, DONG Wen1

    1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
    2. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  • Received: 2022-12-27 Online: 2023-02-20 Published: 2023-02-20
  • Contact: SHEN Zhihong E-mail: xjzhu@cnic.cn; bluejoe@cnic.cn

Abstract:

[Objective] The rapid development of big data and artificial intelligence technologies has transformed research paradigms. The new paradigms generally require collaborative analysis, involve complex task types, and run analysis processes that span multiple scientific data centers. [Application background] Existing process-based analysis frameworks struggle to meet the requirements of end-to-end, cross-center scientific data analysis because they lack capabilities for expressing analysis processes, integrating heterogeneous computing frameworks, and scheduling jobs across centers. [Methods] This paper proposes a software framework for end-to-end, cross-center analysis of scientific data. The framework supports cross-center heterogeneous workflow construction, transparent data transfer across computing frameworks, and optimized cross-center job scheduling. [Results] The functionality and performance of the proposed framework are verified in the scenario of "cross-station online processing and quality control of aboveground grass biomass" at the National Ecosystem Science Data Center, demonstrating that the framework is both feasible and advanced.

Key words: scientific research paradigm, analysis workflow, scientific data center, cross-center computing