Frontiers of Data and Computing, 2022, Vol. 4, Issue (5): 108-119.

CSTR: 32002.14.jfdc.CN10-1649/TP.2022.05.012

doi: 10.11871/jfdc.issn.2096-742X.2022.05.012

• Technology and Application •

  • About the authors:
    ZHAO Wenlong is a master's student at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are high-performance computing and parallel computing.
    In this paper, he is mainly responsible for porting and optimizing Gadget-2 on the heterogeneous platform with HIP.
    E-mail: zhaowenlong@cnic.cn
    WANG Wu, Ph.D., is an associate researcher at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are parallel algorithms and high-performance computing.
    In this paper, he is mainly responsible for supervising the porting and optimization of Gadget-2 on the heterogeneous platform.
    E-mail: wangwu@sccas.cn
  • Funding:
    光合基金 Category A (GHFUND A ghfund)

Porting and Optimizing Gadget-2 on a Heterogeneous Accelerator Platform

ZHAO Wenlong1,2, WANG Wu1,*

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-11-23 Online: 2022-10-20 Published: 2022-10-27
  • Contact: WANG Wu

Abstract:

[Objective] This paper ports and optimizes Gadget-2, a parallel cosmological N-body simulation code based on the BH-tree (Barnes-Hut tree) and particle-mesh methods, on a domestically developed heterogeneous accelerator platform. [Methods] The most time-consuming part, the short-range force computation, including the traversal of the local tree, is ported to the accelerator with HIP. The arrays of structures are restructured, and registers and shared memory are fully exploited to improve memory-access efficiency on the device. [Results] Numerical results show speedups of up to 13.27 times for the whole application and 35.67 times for the short-range force computation, with a parallel efficiency of 57.29%. The power spectrum results verify the correctness of the ported and optimized version. [Conclusions] Gadget-2 has been ported to and optimized on a heterogeneous accelerator platform, providing support for large-scale cosmological simulations.

Key words: N-body problem, TreePM method, heterogeneous accelerator platform, HIP
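
The abstract mentions restructuring the arrays of structures to improve memory access on the device. One common form of such restructuring is converting an array of structures (AoS) into a structure of arrays (SoA); the abstract does not state the exact layout the authors used, so the following is only a minimal sketch under that assumption. It is plain host-side C++ rather than HIP, and the type and field names (`ParticleAoS`, `ParticlesSoA`, `pos`, `mass`) are hypothetical, not taken from Gadget-2.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle record in array-of-structures (AoS) form.
// Field names are illustrative only, not Gadget-2's actual layout.
struct ParticleAoS {
    double pos[3];
    double mass;
};

// Structure-of-arrays (SoA) form: one contiguous array per field, so that
// consecutive indices i, i+1, ... of the same field are adjacent in memory.
struct ParticlesSoA {
    std::vector<double> x, y, z, mass;
};

// Copy an AoS particle list into the SoA layout.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& in) {
    ParticlesSoA out;
    const std::size_t n = in.size();
    out.x.reserve(n);
    out.y.reserve(n);
    out.z.reserve(n);
    out.mass.reserve(n);
    for (const ParticleAoS& p : in) {
        out.x.push_back(p.pos[0]);
        out.y.push_back(p.pos[1]);
        out.z.push_back(p.pos[2]);
        out.mass.push_back(p.mass);
    }
    return out;
}

// Sum of all masses; in the SoA layout only the mass array is read.
double total_mass(const ParticlesSoA& p) {
    double sum = 0.0;
    for (double m : p.mass) {
        sum += m;
    }
    return sum;
}
```

On an accelerator, a layout like this typically lets neighbouring threads read neighbouring elements of the same field, which tends to give better-coalesced global memory access than striding through whole structures.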