Frontiers of Data and Computing ›› 2026, Vol. 8 ›› Issue (3): 15-28.

doi: 10.11871/jfdc.issn.2096-742X.2026.03.002

• Special Issue: Call for Papers for the 21st National Conference on Scientific Computing •

From Building to Packaging: A Study on FAIR Data Traceability in HPC: AI-Driven Integrated Modeling of Magnetic Confinement Fusion

LIU Xiaojuan1, YU Zhi1,*, ZHANG Yundong2

  1 Institute of Plasma Physics, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui 230031, China
  2 University of Science and Technology of China Network Information Centre, Hefei, Anhui 230026, China
  • Received: 2025-10-28 Online: 2026-06-20 Published: 2026-06-18
  • Contact: YU Zhi E-mail: lxj@ipp.ac.cn; yuzhi@ipp.ac.cn

Abstract:

[Background] With advances in magnetic confinement fusion research and computing capabilities, integrated fusion modeling is evolving from single-physics modules toward complex, multi-physics, multi-scale coupled systems. The widespread adoption of artificial intelligence (AI) has given rise to a hybrid HPC-AI computing paradigm. [Objective] Traditional HPC codes rely on stable, system-level compilation environments, whereas AI applications depend on dynamic Python ecosystems and containerization. These fundamental differences in dependency management, build processes, and execution models make it difficult for FuYun, the existing in-house integrated modeling framework, to manage heterogeneous components uniformly; the resulting traceability gaps at the HPC-AI interface threaten reproducibility and provenance. This study addresses the FAIR compliance challenges of software environment management and physics module execution in FuYun under integrated HPC-AI environments. [Methods] The scope of FuYun is extended from HPC “build” to AI “packaging” by introducing a unified abstraction, the Computational Unit (CU), which encapsulates both traditional HPC programs and containerized AI applications. A cross-stack unique identifier (@pid) system and a provenance tracking mechanism are designed. The module description schema is extended with AI-specific metadata fields (e.g., model weights, training hyperparameters, random seeds) so that critical information is recorded completely. [Results] The extended framework unifies the management of heterogeneous components, and the @pid system together with the enhanced tracking provides full-stack data provenance. Experiments show a 95% reproducibility rate and an 85% improvement in environment deployment efficiency over manual methods.
This approach bridges the gap between HPC and AI modules: within a single integrated platform, users can flexibly select computational modules, organize them into analysis workflows, and record both the computational workflow and its results.
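
The abstract does not specify how a Computational Unit or its @pid is represented, so the following Python sketch is only one plausible reading of the idea. It assumes that a CU is a small descriptor covering both built HPC codes and packaged AI applications, that the AI-specific fields are the ones listed above (model weights, training hyperparameters, random seed), and that a @pid can be derived by hashing a canonical form of the descriptor. The class names, field names, the `cu:` prefix, and the example values are all hypothetical and are not taken from FuYun.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AIMetadata:
    """AI-specific fields named in the abstract; the layout is an assumption."""
    model_weights: str       # e.g. a digest or URI identifying the weights
    hyperparameters: dict    # training hyperparameters
    random_seed: int


@dataclass(frozen=True)
class ComputationalUnit:
    """Hypothetical descriptor shared by HPC programs and AI applications."""
    name: str
    version: str
    kind: str                # "hpc" (built) or "ai" (packaged)
    environment: str         # assumed: toolchain spec or container image digest
    ai: Optional[AIMetadata] = None

    def pid(self) -> str:
        # Assumed scheme: hash the canonical JSON form of the descriptor,
        # so identical descriptions always map to the same identifier.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return "cu:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def provenance_record(unit: ComputationalUnit, input_pids: list, output_digest: str) -> dict:
    """Link one execution of a unit to the identifiers of its inputs and output."""
    return {
        "@pid": unit.pid(),
        "kind": unit.kind,
        "inputs": sorted(input_pids),
        "output": output_digest,
    }


# Illustrative values only; the digests are placeholders, not real artifacts.
solver = ComputationalUnit("equilibrium_solver", "1.0", "hpc", "gcc-12/openmpi-4")
surrogate = ComputationalUnit(
    "transport_surrogate", "0.3", "ai", "sha256:<image-digest>",
    ai=AIMetadata("sha256:<weights-digest>", {"lr": 1e-3, "epochs": 50}, 42),
)
record = provenance_record(surrogate, [solver.pid()], "sha256:<output-digest>")
print(json.dumps(record, indent=2))
```

Under this assumed scheme, changing any recorded field, including the random seed, yields a different identifier, which is one way the AI-specific metadata could take part in provenance tracking across the HPC-AI interface.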

Key words: data management, FAIR4RS, integrated modeling, containers, magnetic confinement fusion, provenance