Frontiers of Data & Computing, 2024, Vol. 6, Issue (6): 32-42.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.06.004

doi: 10.11871/jfdc.issn.2096-742X.2024.06.004


Application of Mixed Precision GMRES Method in Lattice Quantum Chromodynamics

ZHANG Kelong, HE Lianhua*, XU Shun, JIN Zhong

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2024-02-04 Online: 2024-12-20 Published: 2024-12-20
  • Corresponding author: HE Lianhua
  • About the authors:
    ZHANG Kelong, Ph.D., assistant professor. His research interests include high-performance computing, high energy physics, etc.
    In this paper, he was responsible for drafting the paper and developing the code.
    HE Lianhua, Ph.D., assistant professor. Her research interests include high-performance computing, computational mathematics, etc.
    In this paper, she was responsible for drafting the paper, designing the algorithms and developing the code.
  • Funding:
    National Key R&D Program of China "Application Support Environment and Development Framework for the New-Generation Domestic Supercomputing Systems" (2023YFB3001900); HPC Application Optimization Major Project YBN2020055045, "Porting and Optimization of Lattice Quantum Chromodynamics Software"

Abstract:

[Application Background] Lattice quantum chromodynamics (lattice QCD) is an important approach to particle physics research through computer simulation. Its physical model is built on a four-dimensional structured space-time lattice, and the main computational hotspot of a simulation is solving the large sparse linear systems arising from the Dirac equation, which typically have hundreds of millions of unknowns. [Methods] This paper solves these systems with the generalized minimal residual method (GMRES), using a matrix-free algorithm for the complex matrix-vector multiplication of the Wilson fermion operator. First, the choice of the subspace dimension m in GMRES is evaluated. Second, to reduce the data communication and memory footprint of GMRES and thereby improve its performance in lattice QCD, four mixed single/double-precision GMRES algorithms are implemented; their performance is tested on a domestic Chinese computing platform, and the speedup of each kernel in the four algorithms is analyzed. [Results] The experimental results show that all four mixed-precision GMRES algorithms converge consistently with the double-precision GMRES algorithm and achieve varying degrees of speedup. [Limitations and Conclusions] The performance bottlenecks of the mixed-precision GMRES algorithms are analyzed, and directions for future research are outlined.

Key words: lattice quantum chromodynamics, mixed precision, GMRES, parallel computing
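A matrix-free operator never assembles the sparse matrix; it recomputes each row from the fields and the lattice neighbour structure on every call. The sketch below illustrates that idea on a deliberately simplified, hypothetical operator, not the paper's Wilson kernel: one complex component per site instead of Wilson's 12 (4 spin × 3 color), U(1) phases instead of SU(3) link matrices, no (1 ∓ γ_μ) spin projectors, Wilson parameter r = 1, and periodic boundaries. All function names are illustrative.

```python
import cmath
import itertools
import math
import random


def neighbor_tables(dims):
    """Forward/backward neighbour indices on a periodic 4D lattice.

    Sites are numbered lexicographically, last coordinate fastest.
    """
    coords = list(itertools.product(*(range(n) for n in dims)))
    index = {c: i for i, c in enumerate(coords)}
    fwd = [[0] * len(coords) for _ in range(4)]
    bwd = [[0] * len(coords) for _ in range(4)]
    for i, c in enumerate(coords):
        for mu in range(4):
            up = list(c)
            up[mu] = (up[mu] + 1) % dims[mu]
            dn = list(c)
            dn[mu] = (dn[mu] - 1) % dims[mu]
            fwd[mu][i] = index[tuple(up)]
            bwd[mu][i] = index[tuple(dn)]
    return fwd, bwd


def random_u1_links(volume, seed=0):
    """Toy gauge field: one unit-modulus complex phase per site and direction
    (real lattice QCD stores an SU(3) matrix here)."""
    rng = random.Random(seed)
    return [[cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(volume)]
            for _ in range(4)]


def apply_toy_wilson(psi, links, fwd, bwd, mass):
    """Matrix-free y = D psi with
    (D psi)(x) = (mass + 4) psi(x)
                 - 1/2 * sum_mu [U_mu(x) psi(x+mu) + conj(U_mu(x-mu)) psi(x-mu)].
    Only psi, the links and the neighbour tables are stored, never D itself.
    """
    y = []
    for x in range(len(psi)):
        acc = (mass + 4.0) * psi[x]
        for mu in range(4):
            xf, xb = fwd[mu][x], bwd[mu][x]
            acc -= 0.5 * (links[mu][x] * psi[xf]
                          + links[mu][xb].conjugate() * psi[xb])
        y.append(acc)
    return y
```

With trivial links and a constant field the hopping terms cancel the "+4", so D applied to all-ones returns `mass` at every site. Dropping the spin structure also makes this toy operator Hermitian, whereas the real Wilson-Dirac operator is not (it is only γ5-Hermitian), which is why a non-Hermitian Krylov method such as GMRES is needed for it.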
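The role of the subspace dimension m can be seen in a minimal restarted GMRES(m) sketch. It is plain Python for readability, takes the operator as a callable so it works matrix-free, and uses modified Gram-Schmidt with complex Givens rotations. The names `gmres_restarted`, `vdot`, `norm` and `givens` are hypothetical; this is a textbook GMRES(m), not the paper's implementation.

```python
import math


def vdot(a, b):
    """Complex inner product <a, b> = sum_i conj(a_i) * b_i."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(vdot(a, a).real)


def givens(a, b):
    """Complex Givens rotation (c real, s complex) mapping (a, b) to (r, 0)."""
    rho = math.hypot(abs(a), abs(b))
    if rho == 0.0:
        return 1.0, 0j
    if a == 0:
        return 0.0, b.conjugate() / abs(b)
    return abs(a) / rho, (a / abs(a)) * b.conjugate() / rho


def gmres_restarted(apply_a, b, m=20, tol=1e-8, max_restarts=100):
    """Restarted GMRES(m) for A x = b with x0 = 0.

    apply_a(v) must return A v (matrix-free). Each cycle keeps at most
    m + 1 Krylov basis vectors. Returns (x, restart_cycles, converged).
    """
    x = [0j] * len(b)
    bnorm = norm(b)
    for cycle in range(max_restarts + 1):
        # True residual, recomputed at every restart.
        r = [bi - ai for bi, ai in zip(b, apply_a(x))]
        beta = norm(r)
        if beta <= tol * bnorm:
            return x, cycle, True
        if cycle == max_restarts:
            break
        V = [[ri / beta for ri in r]]            # orthonormal Krylov basis
        H = [[0j] * m for _ in range(m + 1)]     # Hessenberg, rotated in place
        rots = []                                # stored Givens rotations
        g = [complex(beta)] + [0j] * m           # rotated right-hand side beta*e1
        k = 0
        for j in range(m):
            w = apply_a(V[j])
            for i in range(j + 1):               # modified Gram-Schmidt
                H[i][j] = vdot(V[i], w)
                w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
            wnorm = norm(w)
            H[j + 1][j] = complex(wnorm)
            for i, (c, s) in enumerate(rots):    # apply earlier rotations
                top, bot = H[i][j], H[i + 1][j]
                H[i][j] = c * top + s * bot
                H[i + 1][j] = -s.conjugate() * top + c * bot
            c, s = givens(H[j][j], H[j + 1][j])
            rots.append((c, s))
            H[j][j] = c * H[j][j] + s * H[j + 1][j]
            H[j + 1][j] = 0j
            g[j], g[j + 1] = c * g[j], -s.conjugate() * g[j]
            k = j + 1
            # |g[j+1]| equals the residual norm of the current iterate.
            if abs(g[j + 1]) <= tol * bnorm or wnorm == 0.0:
                break
            V.append([wk / wnorm for wk in w])
        # Back substitution on the k x k upper-triangular system, then update x.
        y = [0j] * k
        for i in range(k - 1, -1, -1):
            y[i] = (g[i] - sum(H[i][l] * y[l] for l in range(i + 1, k))) / H[i][i]
        for i in range(k):
            x = [xk + y[i] * vk for xk, vk in zip(x, V[i])]
    return x, max_restarts, False
```

The parameter m trades per-cycle cost against restarts: a cycle stores m + 1 basis vectors and its j-th step orthogonalizes against j + 1 of them, while a small m discards more Krylov information at each restart and can slow down or stall convergence.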
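The memory argument for mixed precision can be put in numbers with a back-of-the-envelope estimate. The lattice size 48^3 × 96 and the restart length m = 30 below are hypothetical values chosen only for illustration, not the paper's settings; a Wilson fermion field carries 4 spin × 3 color = 12 complex numbers per site, complex double and complex single take 16 and 8 bytes, GMRES(m) keeps m + 1 basis vectors, and GB means 10^9 bytes.

```latex
\begin{aligned}
N &= 12 \times 48^3 \times 96 = 12 \times 10\,616\,832 = 127\,401\,984 \approx 1.27\times 10^{8},\\
M_{\text{vec}}^{\text{double}} &= 16\,\text{B} \times N \approx 2.04\ \text{GB}, \qquad
M_{\text{vec}}^{\text{single}} = 8\,\text{B} \times N \approx 1.02\ \text{GB},\\
M_{\text{basis}}^{\text{double}} &= (m+1)\, M_{\text{vec}}^{\text{double}} \overset{m=30}{\approx} 63.2\ \text{GB}, \qquad
M_{\text{basis}}^{\text{single}} = (m+1)\, M_{\text{vec}}^{\text{single}} \overset{m=30}{\approx} 31.6\ \text{GB}.
\end{aligned}
```

Storing the basis in single precision also halves the bytes read by memory-bound kernels such as the Gram-Schmidt inner products and vector updates, and halves the halo-exchange message size of a single-precision matrix-vector product, so about 2× is an idealized upper bound for those kernels alone.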
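The abstract does not spell out the four mixed single/double-precision variants, so the following shows only one common pattern, as an assumption: mixed-precision iterative refinement, where the residual r = b - Ax and the update x ← x + d stay in double precision while the correction equation A d = r is solved approximately by a few GMRES steps in single precision. Pure Python has no float32 type, so single precision is emulated by rounding every Krylov vector, Hessenberg entry and the returned correction through `struct`; the small least-squares problem is solved in double here for simplicity. All names are hypothetical.

```python
import math
import struct


def f32(v):
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def c64(z):
    """Round both parts of a complex number to single precision."""
    return complex(f32(z.real), f32(z.imag))


def vdot(a, b):
    return sum(x.conjugate() * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(vdot(a, a).real)


def inner_gmres_single(apply_a, r, k):
    """k GMRES steps (no restart) on A d = r with emulated single-precision storage."""
    beta = f32(norm(r))
    V = [[c64(ri / beta) for ri in r]]
    H = [[0j] * k for _ in range(k + 1)]
    steps = k
    for j in range(k):
        w = [c64(z) for z in apply_a(V[j])]       # low-precision mat-vec output
        for i in range(j + 1):                     # modified Gram-Schmidt
            H[i][j] = c64(vdot(V[i], w))
            w = [c64(wk - H[i][j] * vk) for wk, vk in zip(w, V[i])]
        H[j + 1][j] = c64(complex(norm(w)))
        if H[j + 1][j] == 0:                       # happy breakdown
            steps = j + 1
            break
        V.append([c64(wk / H[j + 1][j]) for wk in w])
    # min ||beta*e1 - H y|| via Givens rotations (assumes nonsingular A).
    g = [complex(beta)] + [0j] * steps
    for j in range(steps):
        a, b = H[j][j], H[j + 1][j]
        rho = math.hypot(abs(a), abs(b))
        c = abs(a) / rho
        s = (a / abs(a)) * b.conjugate() / rho if a != 0 else b.conjugate() / abs(b)
        for col in range(j, steps):
            top, bot = H[j][col], H[j + 1][col]
            H[j][col] = c * top + s * bot
            H[j + 1][col] = -s.conjugate() * top + c * bot
        g[j], g[j + 1] = c * g[j] + s * g[j + 1], -s.conjugate() * g[j] + c * g[j + 1]
    y = [0j] * steps
    for i in range(steps - 1, -1, -1):
        y[i] = (g[i] - sum(H[i][l] * y[l] for l in range(i + 1, steps))) / H[i][i]
    d = [0j] * len(r)
    for i in range(steps):
        d = [c64(dk + y[i] * vk) for dk, vk in zip(d, V[i])]
    return d


def mixed_precision_refinement(apply_a, b, inner_steps=10, tol=1e-10, max_outer=50):
    """Double-precision outer loop around a single-precision inner GMRES.

    Returns (x, outer_iterations, converged).
    """
    x = [0j] * len(b)
    bnorm = norm(b)
    for outer in range(max_outer + 1):
        r = [bi - ai for bi, ai in zip(b, apply_a(x))]   # double-precision residual
        rnorm = norm(r)
        if rnorm <= tol * bnorm:
            return x, outer, True
        if outer == max_outer:
            break
        # Normalize before rounding so small residuals do not underflow in float32.
        d = inner_gmres_single(apply_a, [c64(ri / rnorm) for ri in r], inner_steps)
        x = [xk + rnorm * dk for xk, dk in zip(x, d)]     # double-precision update
    return x, max_outer, False
```

Because the outer residual is recomputed in double precision, the attainable accuracy is set by the double-precision matrix-vector product, even though each inner solve is only accurate to roughly single precision; one outer step cannot reach a 1e-10 tolerance, but a few can. This emulation demonstrates the convergence behaviour only, not any speedup.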