Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (3): 136-148.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.03.011

doi: 10.11871/jfdc.issn.2096-742X.2025.03.011

• Technology and Application •

XHDF5: A High-Performance HDF5 Remote Data Access System for HEPS

FENG Shichao1, CHENG Yaodong2,3,*, CHENG Yaosong2

  1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001, China
  2. Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  3. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2024-11-20 Online: 2025-06-20 Published: 2025-06-25
  • Corresponding author: *CHENG Yaodong (E-mail: chyd@ihep.ac.cn)
  • About the authors: FENG Shichao is a master's student at the School of Computer and Artificial Intelligence, Zhengzhou University. His research interests include remote data access methods for HDF5 and their applications.
    In this paper, he is responsible for the design and implementation of the XHDF5 system and for drafting the manuscript.
    E-mail: scfeng@ihep.ac.cn
    CHENG Yaodong is a professor and doctoral supervisor at the Institute of High Energy Physics, Chinese Academy of Sciences. His research interests include cloud computing, mass storage, and big data.
    In this paper, he is responsible for research supervision, overall framework design and planning, and the final review and approval of the manuscript.
    E-mail: chyd@ihep.ac.cn

Abstract:

[Objective] To meet the demand of off-site users for access to the massive data produced by the High Energy Photon Source (HEPS), this paper proposes an efficient and adaptable remote access solution based on the HDF5 data format. [Background] HEPS is expected to produce a massive volume of scientific data every year. With HDF5 as the unified data format and a "centralized data storage, remotely distributed processing" computing model, users' computing tasks can run seamlessly at remote computing centers without downloading the data in advance. [Methods] Based on an analysis of the current storage infrastructure of the HEPS data center and of existing data access methods, the XHDF5 system is designed and implemented. It combines the HDF5 VOL (Virtual Object Layer) with XRootD to provide transparent access to HDF5 data, and uses concurrent access and data compression to improve access performance. [Results] In multiple technical tests, performance comparison experiments, and large-scale data processing tests, XHDF5 showed high performance and stability when accessing HDF5 files remotely. Compared with traditional access through a mounted file system and with H5serv, its advantage is most pronounced in high-latency network environments. [Conclusion] XHDF5 efficiently supports cross-region data sharing and collaborative processing, giving researchers a stable, reliable, high-performance data access environment.

Key words: XRootD, HDF5, HEPS, NFS, remote data access
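
The abstract names concurrent access and data compression as the two techniques used to improve remote access performance. The following Python sketch illustrates that general idea only; it is not the XHDF5 implementation and uses neither HDF5 nor XRootD. The in-memory "store", the function names `build_store`, `fetch_chunk`, and `read_all`, and the simulated latency are all hypothetical, introduced for illustration.

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor


def build_store(payload, chunk_size):
    """Hypothetical server side: split the payload into chunks and compress each."""
    return [
        zlib.compress(payload[off:off + chunk_size])
        for off in range(0, len(payload), chunk_size)
    ]


def fetch_chunk(store, index, latency_s):
    """Simulate one remote request: wait one round trip, then decompress the chunk."""
    time.sleep(latency_s)  # stands in for network latency (an assumption)
    return zlib.decompress(store[index])


def read_all(store, latency_s=0.01, workers=8):
    """Fetch every chunk with a thread pool so that round trips overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so chunks rejoin correctly.
        parts = pool.map(lambda i: fetch_chunk(store, i, latency_s), range(len(store)))
        return b"".join(parts)


if __name__ == "__main__":
    data = bytes(range(256)) * 64
    store = build_store(data, chunk_size=1024)
    for workers in (1, 8):
        start = time.perf_counter()
        assert read_all(store, latency_s=0.01, workers=workers) == data
        print(workers, "worker(s):", round(time.perf_counter() - start, 3), "s")
```

With one worker the simulated round trips are paid one after another, while with several workers they overlap. This is why concurrency tends to matter most when latency is high, and compression reduces the bytes sent per request.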