Frontiers of Data & Computing, 2022, Vol. 4, Issue (1): 97-112.

doi: 10.11871/jfdc.issn.2096-742X.2022.01.008

• Special Issue: Joint Special Issue of the National Scientific Data Centers

Distributed Data Processing Platform of National High Energy Physics Data Center

SHI Jingyan, HUANG Qiulan, WANG Lu, LI Haibo, DU Ran, JIANG Xiaowei, HU Qingbao, ZHENG Wei, YAN Xiaofei, ZHANG Xuantong

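  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-10-08  Online: 2022-02-20  Published: 2022-03-04
  • Corresponding author: SHI Jingyan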
  • Author biographies:
    SHI Jingyan is a professor at the Institute of High Energy Physics, Chinese Academy of Sciences. Her research interests include high performance computing and virtualization technology. In this paper, she is responsible for compiling the overall manuscript and for the sections on the distributed data processing platform and high throughput computing. E-mail: jingyan.shi@ihep.ac.cn
    HUANG Qiulan is an associate professor at the Institute of High Energy Physics, Chinese Academy of Sciences. Her research interests include virtualized computing and data processing. In this paper, she is responsible for the sections on fast data processing and cross-domain unified authentication (single sign-on). E-mail: huangql@ihep.ac.cn
    WANG Lu is an associate professor at the Institute of High Energy Physics, Chinese Academy of Sciences. She is responsible for the planning, construction, and optimization of the storage system of the computing center of the Institute of High Energy Physics. Her research interests include the application of distributed file systems, cloud storage, and machine learning in high energy physics computing environments. In this paper, she is responsible for the distributed storage system. E-mail: lu.wang@ihep.ac.cn
    LI Haibo is an associate professor at the Institute of High Energy Physics, Chinese Academy of Sciences. His research interests include mass data storage systems and cloud computing. In this paper, he is responsible for the research on mass storage technology and the case analysis. E-mail: lihaibo@ihep.ac.cn
    DU Ran is an associate professor at the Institute of High Energy Physics, Chinese Academy of Sciences. Her research interests include high performance computing, grid computing, and volunteer computing. In this paper, she is responsible for the HPC cluster section. E-mail: duran@ihep.ac.cn
    JIANG Xiaowei is an engineer at the Institute of High Energy Physics, Chinese Academy of Sciences. His research interests include distributed computing and high throughput computing. In this paper, he is responsible for the research on high throughput computing and dHTC (distributed high throughput computing) technology. E-mail: jiangxw@ihep.ac.cn
    HU Qingbao is an assistant research fellow at the Institute of High Energy Physics, Chinese Academy of Sciences. His research interests include data stream processing, container virtualization, system operation and maintenance monitoring, massive data indexing and querying, data visualization, and cluster authentication and authorization. In this paper, he is responsible for unified cross-domain authentication and fine-grained access control. E-mail: huqb@ihep.ac.cn
    ZHENG Wei is a senior engineer at the Institute of High Energy Physics, Chinese Academy of Sciences. His research interests include virtualized container applications, deployment and monitoring of high performance computing cluster environments, and infrastructure operation management. In this paper, he is responsible for the research and application of the large-scale fine-grained monitoring system for distributed sites. E-mail: zhengw@ihep.ac.cn
    YAN Xiaofei is an associate professor at the Institute of High Energy Physics, Chinese Academy of Sciences. His research interests include grid computing, high performance computing, and intelligent operations and maintenance. In this paper, he is responsible for the operation and maintenance of the grid system and the cluster systems. E-mail: yanxf@ihep.ac.cn
    ZHANG Xuantong is an assistant research fellow at the Institute of High Energy Physics, Chinese Academy of Sciences. His research interests include distributed computing systems. In this paper, he is responsible for the application of the DIRAC distributed computing system at the National High Energy Physics Data Center. E-mail: zhangxuantong@ihep.ac.cn
  • Funding:
    National Natural Science Foundation of China (11775250, 11805226, 12075268, 11875283)

Abstract:

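[Objective] This paper gives a systematic and comprehensive introduction to the distributed data processing platform of the National High Energy Physics Data Center (NHEPDC), as a reference for data processing in large-scale scientific experiments in high energy physics (HEP) and related fields. [Methods] The paper presents the key technologies of the platform in terms of its overall composition, operation mode, and intelligent operation and maintenance. Based on an analysis of the computing characteristics and practical requirements of HEP experimental data processing, it introduces the "one platform, multiple centers" approach used to build the data center's processing platform, and explains the design and implementation of the technical solutions the platform provides to HEP experiments, including cross-regional resource sharing, high-performance access to massive data, and real-time interactive services for users. [Results] The paper presents two examples of HEP experiments supported by the platform, showing how it has helped these experiments obtain scientific results. [Conclusions] The NHEPDC distributed data processing platform has become an important part of the infrastructure of the HEP discipline and a major venue for interdisciplinary integration and for developing new research methods. It meets the data processing needs of research fields including particle physics, theoretical physics, space astronomy, ray science, and accelerator design.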

Key words: distributed data processing platform, cross-regional resource sharing, high performance computing, high throughput computing