Frontiers of Data and Computing ›› 2024, Vol. 6 ›› Issue (4): 182-193.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.04.016

doi: 10.11871/jfdc.issn.2096-742X.2024.04.016

• Technology and Application •

Performance Analysis and Runtime Optimization of GROMACS on Kunpeng-920 Platform

YUAN Huifeng1,2, LU Teng1, ZHU Yanchao3, YAN Chen3, MA Yingjin1, LIU Qian1, JIN Zhong1,*

    1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
    2. University of Chinese Academy of Sciences, Beijing 100190, China
    3. HPC Lab., Huawei Technologies Co., Ltd., Hangzhou, Zhejiang 310053, China
  • Received: 2023-10-31 Online: 2024-08-20 Published: 2024-08-20

Abstract:

[Background] ARM-based multicore processors play an increasingly important role in domains such as molecular dynamics, fluid dynamics, and weather simulation, owing to their high performance, high degree of parallelism, and low power consumption. [Limitation] However, molecular dynamics simulation software exhibits diverse workload characteristics and employs various parallelization strategies, such as particle interaction and spatiotemporal domain decomposition. These make it difficult to use the highly parallel computational resources of multicore processors efficiently and lead to low execution efficiency on individual compute units, which has become a significant bottleneck limiting performance improvement. [Method] This paper takes Huawei’s self-developed ARM-based Kunpeng-920 processor and the GROMACS software as its research subjects. It analyzes in detail the architecture and computational capabilities of the Kunpeng-920 processor, as well as the task decomposition and parallel execution characteristics of GROMACS. Based on this analysis, it proposes a strategy for optimizing runtime parallel parameters that better matches the software’s computational requirements to the hardware’s computational capabilities, thereby improving the software’s computational performance. [Result] By systematically identifying performance bottlenecks and applying the optimization strategy, the proposed scheme achieves a 16.9% speedup over the pre-optimization baseline. [Conclusion] These results can serve as a reference for performance optimization of molecular dynamics simulations in multicore computing environments, for the development of domestically produced high-performance computing systems, and for dedicated machines for molecular dynamics simulation.

Key words: molecular dynamics simulation, GROMACS, Kunpeng-920, performance optimization