Frontiers of Data & Computing, 2024, Vol. 6, Issue (3): 58-66.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.03.006

doi: 10.11871/jfdc.issn.2096-742X.2024.03.006

• Conference paper

Research and Applications of High Energy Physics Grid Data Management Based on Rucio

ZHANG Xuantong1,*, ZHANG Xiaomei1, HU Hao1,2, WANG Haofan1

  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Science and Technology of China, Hefei, Anhui 230026, China
  • Received: 2023-10-30 Online: 2024-06-20 Published: 2024-06-21
  • Corresponding author: *ZHANG Xuantong (E-mail: zhangxuantong@ihep.ac.cn)
  • About the author: ZHANG Xuantong is a Research Associate at the Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences. His research interests include distributed computing systems and data management systems within the WLCG framework.
    In this paper, he was responsible for drafting the manuscript and developing the related software applications.

Abstract:

[Objective] In recent years, the scale of high-energy physics Grid data and the requirements of its users have changed substantially, calling for research into, and application of, emerging Grid data management technologies. [Methods] Building on Rucio, a new Grid data management system, and exploiting its high scalability, modularity, and extensibility together with its distributed data recovery and adaptive data replication capabilities, we designed experiment-oriented Grid data management solutions for several China-led international collaborative experiments. [Results] The solutions provide uniform naming of distributed data; basic management operations (creating, deleting, updating, and querying data); multi-site replica management; raw data distribution management; and data management interfaces embedded in experiment software. These functions have successively entered the testing and application phase. [Conclusions] This work explores the design and development of Grid data management solutions for future China-led international collaborative high-energy physics experiments. Further in-depth research on Grid architecture is planned, with the goal of a general, standardized Grid data management solution for experiments led by China.

Key words: Grid computing, distributed computing, Grid data management, high energy physics
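Two of the functions listed in the abstract, uniform naming of distributed data and multi-site replica management, rest on Rucio concepts: data are addressed by a data identifier (DID) written as `scope:name`, and a replication rule asks for a given number of copies of a DID on storage elements (RSEs) that satisfy some condition. The following is a toy model of those two ideas in plain Python, offered only as an illustration under stated assumptions. It is not the Rucio client API and not the authors' implementation; the `RSE` attribute dictionary, the site names, and the `place_replicas` selection policy are hypothetical simplifications of what real Rucio rule evaluation does.

```python
# Toy model only: NOT the Rucio client API. Names and policy are hypothetical.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DID:
    """Rucio-style data identifier, written as 'scope:name'."""
    scope: str
    name: str

    @classmethod
    def parse(cls, text: str) -> DID:
        # Assumption: split on the first colon; both parts must be non-empty.
        scope, sep, name = text.partition(":")
        if not sep or not scope or not name:
            raise ValueError(f"expected 'scope:name', got {text!r}")
        return cls(scope, name)

    def __str__(self) -> str:
        return f"{self.scope}:{self.name}"


@dataclass
class RSE:
    """Hypothetical storage element with free-form string attributes."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


def place_replicas(
    did: DID,
    copies: int,
    rses: list[RSE],
    required: dict[str, str],
    existing: dict[DID, set[str]],
) -> tuple[list[str], list[str]]:
    """Choose `copies` RSEs whose attributes match `required`.

    RSEs that already hold a replica of `did` are preferred, so only the
    remaining choices need new transfers. Returns (chosen, to_transfer).
    """
    eligible = [
        r.name
        for r in rses
        if all(r.attributes.get(k) == v for k, v in required.items())
    ]
    if len(eligible) < copies:
        raise ValueError(f"only {len(eligible)} eligible RSEs for {copies} copies")
    held = existing.get(did, set())
    # Stable ordering: sites that already hold a replica first, then the rest.
    ranked = [n for n in eligible if n in held] + [n for n in eligible if n not in held]
    chosen = ranked[:copies]
    to_transfer = [n for n in chosen if n not in held]
    return chosen, to_transfer


# Usage with made-up site names: two disk copies, one already at SITE-B_DISK.
sites = [
    RSE("SITE-A_DISK", {"type": "disk"}),
    RSE("SITE-B_DISK", {"type": "disk"}),
    RSE("SITE-C_TAPE", {"type": "tape"}),
]
raw = DID.parse("raw:run001.dat")
chosen, to_transfer = place_replicas(raw, 2, sites, {"type": "disk"}, {raw: {"SITE-B_DISK"}})
print(str(raw), chosen, to_transfer)
# prints: raw:run001.dat ['SITE-B_DISK', 'SITE-A_DISK'] ['SITE-A_DISK']
```

Preferring sites that already hold a replica mirrors the intent of a declarative rule: the system only creates the copies still missing, rather than re-transferring data that is already in place.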