Frontiers of Data and Computing, 2025, Vol. 7, Issue (3): 136-148.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.03.011

doi: 10.11871/jfdc.issn.2096-742X.2025.03.011

• Technology and Application •

XHDF5: A High-Performance HDF5 Remote Data Access System for HEPS

FENG Shichao1, CHENG Yaodong2,3,*, CHENG Yaosong2

1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001, China
2. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
3. University of Chinese Academy of Sciences, Beijing 100049, China

Received: 2024-11-20; Online: 2025-06-20; Published: 2025-06-25

Abstract:

[Objective] To meet the demand for remote access to the massive data generated by the High Energy Photon Source (HEPS), this paper proposes an efficient and adaptable remote access solution built on the HDF5 data format. [Background] HEPS is expected to produce an enormous volume of scientific data every year. With HDF5 adopted as the unified data format and a "centralized data storage, remotely distributed processing" computing paradigm, users no longer need to download data in advance; instead, their computing tasks run directly at the remote computing center where the data reside. [Methods] Based on an analysis of the current storage infrastructure of the HEPS data center and of existing data access methods, this paper designs and implements the XHDF5 system. Built on the HDF5 Virtual Object Layer (VOL) and XRootD, the system gives applications transparent access to remote HDF5 data, and it uses techniques such as concurrent access and compression of large data volumes to improve access performance. [Results] Functional tests, performance comparison experiments, and large-scale data processing evaluations show that XHDF5 delivers high performance and stable operation when accessing remote HDF5 files. Compared with conventional approaches based on file system mounting and on H5serv, XHDF5 performs better, especially under high network latency. [Conclusion] XHDF5 effectively supports cross-region data sharing and collaboration, giving researchers a stable and reliable data access platform for their scientific work.
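The abstract credits concurrent access with part of XHDF5's advantage under high network latency. As a hedged illustration of that general idea only (not XHDF5's actual code; its VOL connector and XRootD client calls are not described here), the Python sketch below splits one large byte-range read into fixed-size blocks and issues them in parallel against a simulated remote file. `SimulatedRemoteFile`, `read_range`, `split_ranges`, the block size, and the 50 ms round-trip latency are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class SimulatedRemoteFile:
    """Hypothetical stand-in for a remote file (NOT the XRootD API):
    an in-memory buffer where every read pays one simulated round trip."""

    def __init__(self, data: bytes, latency_s: float = 0.05):
        self._data = data
        self._latency_s = latency_s

    def read_range(self, offset: int, length: int) -> bytes:
        time.sleep(self._latency_s)  # simulated network round-trip latency
        return self._data[offset:offset + length]


def split_ranges(offset: int, length: int, block: int) -> list[tuple[int, int]]:
    """Split [offset, offset + length) into consecutive (offset, size) blocks."""
    ranges = []
    pos, end = offset, offset + length
    while pos < end:
        size = min(block, end - pos)
        ranges.append((pos, size))
        pos += size
    return ranges


def read_sequential(f, offset: int, length: int, block: int) -> bytes:
    # One request at a time: total latency grows with the number of blocks.
    return b"".join(f.read_range(o, n) for o, n in split_ranges(offset, length, block))


def read_concurrent(f, offset: int, length: int, block: int, workers: int = 8) -> bytes:
    # Several requests in flight at once, so their round trips overlap.
    ranges = split_ranges(offset, length, block)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so blocks reassemble correctly.
        return b"".join(pool.map(lambda r: f.read_range(*r), ranges))


if __name__ == "__main__":
    remote = SimulatedRemoteFile(bytes(range(256)) * 4096, latency_s=0.05)  # 1 MiB
    for name, fn in [("sequential", read_sequential), ("concurrent", read_concurrent)]:
        t0 = time.perf_counter()
        data = fn(remote, 0, 1 << 20, 64 * 1024)
        print(f"{name}: {len(data)} bytes in {time.perf_counter() - t0:.2f} s")
```

With 16 blocks of 64 KiB and 50 ms per simulated round trip, the sequential reader waits roughly 16 × 50 ms ≈ 0.8 s, while 8 workers need about two rounds, roughly 0.1 s. In a real deployment the gain would also be bounded by bandwidth and by how many concurrent requests the server accepts.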

Key words: XRootD, HDF5, HEPS, NFS, remote data access