Frontiers of Data and Computing ›› 2024, Vol. 6 ›› Issue (3): 83-91.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.03.009

doi: 10.11871/jfdc.issn.2096-742X.2024.03.009

• Conference Paper •

Porting of LHAASO Simulation Jobs from X86 to ARM Computing Cluster

CHENG Yaosong1,*, BI Yujiang1,2, GUO Chaoqi1, YAN Xiaofei1

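  1. Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. Tianfu Cosmic Ray Research Center, Institute of High Energy Physics, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China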
  • Received: 2023-11-10 Online: 2024-06-20 Published: 2024-06-21
  • Corresponding author: *CHENG Yaosong (E-mail: chengys@ihep.ac.cn)
  • About the author: CHENG Yaosong is an engineer at the Computing Center of the Institute of High Energy Physics, Chinese Academy of Sciences. He has long worked on building scientific data computing platforms, and his main research interests are the study and application of large-scale distributed storage systems and new technologies for scientific data processing.
    For this paper, he was responsible for writing the manuscript and for software porting and testing.
    E-mail: chengys@ihep.ac.cn
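  • Funding:
    National Natural Science Foundation of China project "Research on the LHAASO Scientific Big Data Management System and Key Technologies for Multiple Data Centers" (12075268)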

Porting of LHAASO Simulation Jobs from X86 to ARM Computing Cluster

CHENG Yaosong1,*, BI Yujiang1,2, GUO Chaoqi1, YAN Xiaofei1

  1. Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. Tianfu Cosmic Ray Research Center, Institute of High Energy Physics, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China
  • Received: 2023-11-10 Online: 2024-06-20 Published: 2024-06-21

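Abstract:

[Objective] As high-energy physics experiments advance and more sophisticated detectors are developed, the volume of scientific big data they produce has grown markedly. Analyzing and simulating these data makes it possible to uncover the laws that govern the universe and to further explore its origin. [Application Background] The explosive growth of scientific data places greater demands on the scale and performance of computing resources. For example, since the Large High Altitude Air Shower Observatory (LHAASO) began operating in 2020, its cosmic-ray event simulation jobs have run on an Intel X86 cluster, but because CPU resources are limited, only part of the data planned for the first phase has been produced. [Methods] Driven by the demand for computing resources and by changes in the international situation, this work uses an ARM-architecture computing cluster in Dongguan, Guangdong Province, China, to explore the application of heterogeneous computing servers in high-energy physics. [Results] This paper builds a complete application ecosystem that supports offline data processing for high-energy physics. The offline software of the Kilometer Square Array (KM2A), Water Cherenkov Detector Array (WCDA), and Wide Field-of-view Cherenkov Telescope Array (WFCTA) experiments is ported to run on ARM machines, data transfer and job scheduling strategies spanning geographically separate sites and heterogeneous computing clusters are defined, and the differences in performance and power consumption of simulation jobs between the Intel X86 and ARM clusters are evaluated. [Conclusions] In this environment, the ported LHAASO simulation jobs run correctly on the ARM computing cluster. Although Intel X86 CPUs deliver better single-core performance than ARM CPUs, the ARM server performs better when the whole multi-core server is considered.

Key words: scientific big data, data processing, heterogeneous computing, ARM architecture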

Abstract:

[Objective] With the advancement of high-energy physics experiments and the development of advanced detectors, the volume of scientific big data generated has increased significantly. Analyzing and simulating these data helps reveal the laws of the universe and further explore its origin. [Context] The explosive growth of scientific data places greater demands on the scale and performance of computing resources. For example, since the Large High Altitude Air Shower Observatory (LHAASO) began operating in 2020, its cosmic-ray event simulation jobs have been running on an Intel X86 cluster. However, because CPU resources are limited, only a portion of the data planned for the first stage has been produced. [Methods] Motivated by the demand for computing resources and by changes in the international situation, we explore the application of heterogeneous computing servers in high-energy physics using an ARM-architecture computing cluster located in Dongguan, Guangdong Province, China. [Results] We build a complete application ecosystem that supports offline data processing for high-energy physics, and port the offline software of the Kilometer Square Array (KM2A), Water Cherenkov Detector Array (WCDA), and Wide Field-of-view Cherenkov Telescope Array (WFCTA) experiments to ARM machines. We also develop data transfer and job scheduling strategies that span geographically separate sites and heterogeneous computing clusters, and evaluate the differences in performance and power consumption of simulation jobs between the Intel X86 and ARM clusters. [Conclusions] In this environment, the ported LHAASO simulation jobs run correctly on the ARM computing cluster. Although Intel X86 CPUs offer better single-core performance than ARM CPUs, ARM servers deliver better performance at the level of the whole multi-core server.

Key words: scientific big data, data processing, heterogeneous computing, ARM architecture
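
The conclusion of the abstract, that a server with slower individual cores can still outperform one with faster cores, can be illustrated with a small calculation. The sketch below is not taken from the paper: the `Server` class, the core counts, the per-core event rates, and the power figures are all hypothetical placeholders, and it assumes that simulation jobs are independent single-core tasks whose throughput scales linearly with the number of cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical server description; none of these fields come from the paper."""
    name: str
    cores: int                   # cores available for simulation jobs
    events_per_core_hour: float  # simulated events one core produces per hour
    power_watts: float           # average power draw under full load

    def throughput(self) -> float:
        """Events per hour for the whole server, assuming linear scaling with cores."""
        return self.cores * self.events_per_core_hour

    def events_per_kwh(self) -> float:
        """Events produced per kilowatt-hour of energy consumed."""
        return self.throughput() / (self.power_watts / 1000.0)


# Placeholder numbers chosen only to illustrate the argument, NOT measured values.
x86 = Server("x86", cores=48, events_per_core_hour=100.0, power_watts=500.0)
arm = Server("arm", cores=96, events_per_core_hour=70.0, power_watts=400.0)

for server in (x86, arm):
    print(
        f"{server.name}: per-core={server.events_per_core_hour:.0f} ev/h, "
        f"server={server.throughput():.0f} ev/h, "
        f"efficiency={server.events_per_kwh():.0f} ev/kWh"
    )
```

With these placeholder values the X86 core is faster, yet the ARM server produces more events per hour because it has more cores. Whether linear scaling holds in practice depends on memory bandwidth and I/O, which is why measured benchmarks such as those reported in the paper are needed.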