Frontiers of Data and Computing ›› 2026, Vol. 8 ›› Issue (3): 51-58.

doi: 10.11871/jfdc.issn.2096-742X.2026.03.005

• Special Issue: Call for Papers for the 21st National Conference on Scientific Computing

A Scientific Data Collection Service Platform for High-Energy Physics

WANG Shuang1,2,*, ZHANG Hongmei1,2, CHEN Gang1,2, QI Fazhi1,2

  1 Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2 National High Energy Physics Science Data Center, Beijing 100049, China
  • Received: 2025-10-29 Online: 2026-06-20 Published: 2026-06-18
  • Contact: WANG Shuang, E-mail: wangshuang@ihep.ac.cn

Abstract:

[Background] Scientific data generated by large-scale facilities in high-energy physics are massive in volume and highly complex, and they are recognized as strategic national resources. Standardized collection and management of these data are critical for promoting full use of data resources and fostering technological innovation. However, conventional data service platforms in high-energy physics focus primarily on data openness and cannot support full-lifecycle management of scientific data collection starting from research projects, and therefore fail to meet national strategic requirements. [Objective] To address these challenges, this paper presents the design and construction of a scientific data collection service platform for high-energy physics. The platform enables standardized, process-oriented management from metadata submission to data release. [Methods] Following user-friendly and extensible design principles, the platform establishes a standardized management process comprising six stages: user registration, collection intention, collection plan, data submission, data review, and data release. [Results] To date, the platform has supported data collection for 87 scientific projects, including 79 National Key R&D Program projects and 8 CAS Pilot Program projects, involving multiple large-scale scientific facilities. A total of 1,190 scientific datasets have been collected, amounting to 5.78 PB of data. All datasets have been publicly released through the website of the National High Energy Physics Science Data Center. [Conclusion] The development and application of this platform facilitate standardized management and sharing of scientific data in high-energy physics, helping to realize the scientific value of the data and providing solid data support for scientific research and major discoveries.
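
The management process named in the abstract can be pictured as a simple sequential state machine. The sketch below is illustrative only and is not the platform's implementation: the names `Stage`, `CollectionRecord`, and `advance` are hypothetical, and it assumes the stages are traversed strictly in the order listed. The abstract does not say whether a failed review returns a dataset to an earlier stage, so that path is omitted.

```python
from enum import Enum


class Stage(Enum):
    """Stages named in the abstract, in the order given there."""
    USER_REGISTRATION = 1
    COLLECTION_INTENTION = 2
    COLLECTION_PLAN = 3
    DATA_SUBMISSION = 4
    DATA_REVIEW = 5
    DATA_RELEASE = 6


class CollectionRecord:
    """Hypothetical record tracking one project's progress through the stages."""

    def __init__(self, project_name: str) -> None:
        self.project_name = project_name
        # Every record starts at the first stage of the process.
        self.stage = Stage.USER_REGISTRATION

    def advance(self) -> Stage:
        """Move to the next stage (assumption: stages are strictly sequential)."""
        if self.stage is Stage.DATA_RELEASE:
            raise ValueError("record has already been released")
        self.stage = Stage(self.stage.value + 1)
        return self.stage

    def is_released(self) -> bool:
        """Return True once the record has reached the final stage."""
        return self.stage is Stage.DATA_RELEASE
```

Under these assumptions, a new `CollectionRecord` reaches `Stage.DATA_RELEASE` after five calls to `advance()`, and a further call raises `ValueError`.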

Key words: high-energy physics, scientific data, data collection, data service platform, data management