Frontiers of Data and Computing, 2024, Vol. 6, Issue (4): 182-193.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.04.016

doi: 10.11871/jfdc.issn.2096-742X.2024.04.016

• Technology and Application •

Performance Analysis and Runtime Optimization of GROMACS on Kunpeng-920 Platform

YUAN Huifeng1,2, LU Teng1, ZHU Yanchao3, YAN Chen3, MA Yingjin1, LIU Qian1, JIN Zhong1,*

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100190, China
  3. HPC Lab., Huawei Technologies Co., Ltd., Hangzhou, Zhejiang 310053, China
  • Received: 2023-10-31  Online: 2024-08-20  Published: 2024-08-20
  • Corresponding author: *JIN Zhong (E-mail: zjin@sccas.cn)
  • About the authors: YUAN Huifeng is an engineer and a Ph.D. student at the Computer Network Information Center, Chinese Academy of Sciences, and a member of the China Computer Federation (CCF member number: I3071M). His main research interest is high-performance computing.
    In this paper, he is mainly responsible for the scheme design and for the analysis and tuning of GROMACS performance metrics.
    E-mail: hfyuan@cnic.cn
    JIN Zhong is a research professor and the director of the Department of High-Performance Computing Technology and Application Development at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests include high-performance computing and biomedical computing.
    In this paper, he is mainly responsible for the scheme design and for guidance on parameter tuning.
    E-mail: zjin@sccas.cn
  • Funding: National Key Research and Development Program of China, "Scientific Computing Application Platform for Multi-Physics Complex Systems" (2020YFB0204802); National Natural Science Foundation of China, "Research on High-Performance Computing Program Development for the Density Matrix Renormalization Group and Its Derived Methods" (22173114)

Abstract:

[Background] ARM many-core processors play an increasingly important role in domains such as molecular dynamics, fluid dynamics, and weather simulation owing to their high performance, high parallelism, and low power consumption. [Limitation] However, molecular dynamics software decomposes its work along several dimensions at run time (for example, by particle interactions and by spatial and temporal domain decomposition) and supports diverse parallelization strategies. The resulting workloads vary widely and are difficult to match to the highly parallel compute resources of many-core processors, so individual compute units often run inefficiently. This has become one of the main bottlenecks limiting runtime performance. [Method] Taking Huawei's self-developed ARM-based Kunpeng-920 processor and the GROMACS software as research subjects, this paper analyzes in depth the architecture and computing characteristics of the Kunpeng-920 processor, as well as the task decomposition and parallel execution of GROMACS. Based on this analysis, it proposes a runtime parallel-parameter optimization strategy that better matches the software's computational demands to the hardware's computing capabilities, thereby improving the software's computational performance. [Result] By systematically analyzing the performance bottlenecks and applying the optimization strategy, a 16.9% speedup over the unoptimized configuration is achieved. [Conclusion] These results can serve as a reference for optimizing the performance of molecular dynamics simulations in many-core computing environments, for developing domestically produced high-performance computing systems, and for building special-purpose machines for molecular dynamics simulation.

Key words: molecular dynamics, GROMACS, Kunpeng-920, performance optimization
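The [Method] summary describes tuning runtime parallel parameters so that GROMACS's work decomposition fits the processor's cores. As a rough illustration of the first step of such a search, the sketch below enumerates rank-by-thread layouts that fill one node and turns each into an `mdrun` command line. It is not the authors' procedure. The node shape (a dual-socket Kunpeng-920 node with 2 × 64 cores and 32 cores per NUMA domain), the candidate PME fractions, the NUMA-alignment rule, and the helper names `RunConfig` and `candidate_configs` are all hypothetical; only the `gmx mdrun` options `-s`, `-ntmpi`, `-ntomp`, `-npme`, and `-pin` are real GROMACS flags.

```python
"""Hypothetical sketch: enumerate candidate GROMACS mdrun parallel layouts
for one many-core node. Core counts and the PME split are assumptions."""
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RunConfig:
    ntmpi: int  # thread-MPI ranks        (gmx mdrun -ntmpi)
    ntomp: int  # OpenMP threads per rank (gmx mdrun -ntomp)
    npme: int   # separate PME ranks      (gmx mdrun -npme; 0 = none)

    def command(self, tpr: str = "topol.tpr") -> List[str]:
        """Build an mdrun argument list with thread pinning enabled."""
        return ["gmx", "mdrun", "-s", tpr,
                "-ntmpi", str(self.ntmpi),
                "-ntomp", str(self.ntomp),
                "-npme", str(self.npme),
                "-pin", "on"]


def candidate_configs(total_cores: int,
                      cores_per_numa: int,
                      max_ntomp: int = 8,
                      pme_fractions: Sequence[float] = (0.0, 0.25)) -> List[RunConfig]:
    """Return rank x thread layouts whose ranks * threads equal total_cores.

    Only thread counts that divide the NUMA-domain size are kept, so that,
    assuming contiguous pinning, no rank's threads straddle two NUMA domains
    (a hypothetical heuristic, not taken from the paper).
    """
    configs = []
    for ntomp in range(1, max_ntomp + 1):
        if total_cores % ntomp or cores_per_numa % ntomp:
            continue
        ntmpi = total_cores // ntomp
        for frac in pme_fractions:
            npme = int(ntmpi * frac)
            if frac > 0 and not 0 < npme < ntmpi:
                continue  # a PME split needs at least one PP rank left over
            configs.append(RunConfig(ntmpi, ntomp, npme))
    return configs


if __name__ == "__main__":
    # Assumption: dual-socket Kunpeng-920 node, 2 x 64 cores, 32 cores per NUMA node.
    for cfg in candidate_configs(total_cores=128, cores_per_numa=32):
        print(" ".join(cfg.command()))
```

In practice each candidate would then be timed with a short benchmark run (for example, by adding `-nsteps` and `-resethway` to the arguments) and the layout with the highest ns/day kept. Note that `-ntmpi` applies to thread-MPI builds; with an MPI build (`gmx_mpi` started by `mpirun`), the rank count is set by the launcher instead.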