Application of Mixed Precision GMRES Method in Lattice Quantum Chromodynamics

doi:10.11871/jfdc.issn.2096-742X.2024.06.004

Frontiers of Data and Computing, 2024, Vol. 6, Issue 6: 32-42.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.06.004

ZHANG Kelong, HE Lianhua*, XU Shun, JIN Zhong

Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China

Received: 2024-02-04 Online: 2024-12-20 Published: 2024-12-20
Corresponding author: HE Lianhua
About the authors:
ZHANG Kelong, Ph.D., assistant professor. His research interests include high-performance computing and high energy physics. In this paper, he is responsible for paper drafting and code development.
HE Lianhua, Ph.D., assistant professor. Her research interests include high-performance computing and computational mathematics. In this paper, she is responsible for paper drafting, algorithm design and code development.
Funding:
National Key Research and Development Program of China, "Application Support Environment and Development Framework for New-Generation Domestic Supercomputing Systems" (2023YFB3001900); HPC Application Optimization major project YBN2020055045, "Porting and Optimization of Lattice Quantum Chromodynamics Software".

Abstract

[Application Background] Lattice quantum chromodynamics (lattice QCD) is an important theoretical framework for studying particle physics through computer simulation. The model is defined on a four-dimensional structured space-time lattice, and the dominant computational hotspot is the solution of the large sparse linear systems arising from the Dirac equation, which can reach hundreds of millions of unknowns. [Methods] This paper solves these systems with the generalized minimal residual method (GMRES), using a matrix-free implementation of the complex matrix-vector product for Wilson fermions. First, the choice of the subspace dimension m in GMRES is evaluated. Second, to reduce data communication and memory footprint, and thereby improve the performance of GMRES in lattice QCD, four GMRES variants that mix single and double precision are implemented. Their performance is measured on a domestic (Chinese) computing platform, and the speedup of each kernel is analyzed. [Results] The experiments show that the four mixed-precision GMRES algorithms converge consistently with double-precision GMRES and achieve varying degrees of speedup. [Limitations and Outlook] The performance bottlenecks of the mixed-precision GMRES algorithms are analyzed, and directions for future research are discussed.

Keywords: lattice quantum chromodynamics, mixed precision, GMRES, parallel computing
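The abstract mentions a matrix-free complex matrix-vector product. The sketch below illustrates only the general idea on a toy problem: a nearest-neighbour hopping operator on a periodic one-dimensional lattice with U(1) link phases. It is not the Wilson-Dirac operator of the paper (there is no spin or colour structure and only one dimension); the names `apply_hopping`, `dense_hopping`, `U` and `mass` are hypothetical and chosen for illustration.

```python
import numpy as np

def apply_hopping(psi, U, mass):
    """Matrix-free y = D psi on a periodic 1D lattice (toy stand-in operator).

    (D psi)(x) = (mass + 1) psi(x) - 0.5 * [U(x) psi(x+1) + conj(U(x-1)) psi(x-1)]
    The matrix D is never stored; only the link phases U are kept.
    """
    fwd = U * np.roll(psi, -1)                       # U(x) psi(x+1)
    bwd = np.conj(np.roll(U, 1)) * np.roll(psi, 1)   # conj(U(x-1)) psi(x-1)
    return (mass + 1.0) * psi - 0.5 * (fwd + bwd)

def dense_hopping(U, mass):
    """Explicit dense matrix of the same operator, used only as a reference."""
    n = U.shape[0]
    D = np.zeros((n, n), dtype=np.complex128)
    for x in range(n):
        D[x, x] += mass + 1.0
        D[x, (x + 1) % n] -= 0.5 * U[x]
        D[x, (x - 1) % n] -= 0.5 * np.conj(U[(x - 1) % n])
    return D

# Hypothetical test data: random link phases and a random complex vector.
rng = np.random.default_rng(1)
n = 16
mass = 0.1
U = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n))
psi = rng.standard_normal(n) + 1j * rng.standard_normal(n)

y = apply_hopping(psi, U, mass)          # matrix-free product
y_ref = dense_hopping(U, mass) @ psi     # dense reference product
print(np.max(np.abs(y - y_ref)))
```

The matrix-free form stores n link phases instead of an n-by-n matrix, and calling it with `complex64` arrays keeps the whole product in single precision, which is the property the mixed-precision variants rely on.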

ZHANG Kelong, HE Lianhua, XU Shun, JIN Zhong. Application of Mixed Precision GMRES Method in Lattice Quantum Chromodynamics[J]. Frontiers of Data and Computing, 2024, 6(6): 32-42. https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2024.06.004

Figures and tables (13)

Algorithm 1: GMRES(m)

Input: $x_0$: initial guess.
Output: numerical solution.

1 for $i = 0, 1, \dots$ do
2   compute $r = b - A x_i$
3   compute $\beta = h_{1,0} = \|r\|_2$
4   check for convergence; exit if converged
5   for $k = 1, \dots, m$ do
6     $v_k = r / h_{k,k-1}$
7     $r = A v_k$
8     for $j = 1, \dots, k$ do
9       $h_{j,k} = (r, v_j)$
10      $r = r - h_{j,k} v_j$
11    end
12    compute $h_{k+1,k} = \|r\|_2$
13  end
14  define $V_k = [v_1, \dots, v_k]$, $H_k = \{h_{i,j}\}_{1 \le i \le k+1,\ 1 \le j \le k}$
15  compute $y_k = \arg\min_y \|\beta e_1 - H_k y\|_2$
16  $x_{i+1} = x_i + V_k y_k$
17 end
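The GMRES(m) listing above can be sketched in NumPy as follows. This is a minimal dense-matrix illustration, not the authors' implementation: the function name `gmres_restarted`, the relative stopping test against the initial residual, the `max_restarts` cap and the random test matrix are all assumptions made for the example. The small least-squares problem is solved with `np.linalg.lstsq` rather than the Givens rotations commonly used in production codes.

```python
import numpy as np

def gmres_restarted(A, b, x0, m=10, tol=1e-10, max_restarts=50):
    """Restarted GMRES(m); comments give the step numbers of the listing."""
    dtype = np.result_type(A, b, x0)
    x = x0.astype(dtype)
    n = b.shape[0]
    r0_norm = None
    for _ in range(max_restarts):                 # step 1
        r = b - A @ x                             # step 2
        beta = np.linalg.norm(r)                  # step 3
        if r0_norm is None:
            r0_norm = beta
        if beta <= tol * r0_norm:                 # step 4 (assumed relative test)
            break
        V = np.zeros((n, m + 1), dtype=dtype)     # Krylov basis vectors
        H = np.zeros((m + 1, m), dtype=dtype)     # upper Hessenberg matrix
        V[:, 0] = r / beta                        # step 6 for k = 1
        k_used = m
        for k in range(m):                        # step 5
            w = A @ V[:, k]                       # step 7
            for j in range(k + 1):                # steps 8-11, modified Gram-Schmidt
                H[j, k] = np.vdot(V[:, j], w)     # v_j^H w
                w = w - H[j, k] * V[:, j]
            H[k + 1, k] = np.linalg.norm(w)       # step 12
            if H[k + 1, k] == 0:                  # exact breakdown: subspace is invariant
                k_used = k + 1
                break
            V[:, k + 1] = w / H[k + 1, k]         # step 6 for the next k
        rhs = np.zeros(k_used + 1, dtype=dtype)
        rhs[0] = beta                             # beta * e_1
        y = np.linalg.lstsq(H[:k_used + 1, :k_used], rhs, rcond=None)[0]  # step 15
        x = x + V[:, :k_used] @ y                 # step 16
    return x

# Hypothetical well-conditioned complex test problem.
rng = np.random.default_rng(0)
n = 50
A = 4.0 * np.eye(n) + 0.5 * (rng.standard_normal((n, n))
                             + 1j * rng.standard_normal((n, n))) / np.sqrt(n)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x = gmres_restarted(A, b, np.zeros(n), m=10, tol=1e-10)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

The restart length `m` bounds the storage for `V` at `n * (m + 1)` entries, which is the trade-off behind evaluating the subspace dimension.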

Algorithm 2: GMRES-SD

Input: $x_0$: initial guess.
Output: numerical solution.

1 compute $r_0 = b - A x_0$ [single]
2 for $i = 0, 1, \dots$ do
3   solve $A e_i = r_i$ with GMRES(m) [single]
4   $x_{i+1} = x_i + e_i$ [single]
5   $r_{i+1} = b - A x_{i+1}$ [single]
6   if $\|r_{i+1}\|_2 / \|r_0\|_2 < 10^{-7}$, set $r_0 = r_{i+1}$ and exit the loop
7 end
8 for $i = 0, 1, \dots$ do
9   solve $A e_i = r_i$ with GMRES(m) [double]
10  $x_{i+1} = x_i + e_i$ [double]
11  $r_{i+1} = b - A x_{i+1}$ [double]
12  if $\|r_{i+1}\|_2 / \|r_0\|_2 < tol$, exit
13 end
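A rough NumPy sketch of the two-phase GMRES-SD idea follows: iterate entirely in single precision first, then continue in double precision. Several details are assumptions rather than facts from the paper: the helper `gmres_cycle`, the cap `max_single` on single-precision passes (added so the sketch cannot loop forever if single precision stagnates above the switching threshold), and the choice to measure both thresholds against the initial residual norm, which is only one possible reading of the listing.

```python
import numpy as np

def gmres_cycle(A, r, m, dtype):
    """One GMRES(m) cycle for A e = r from e = 0, with vectors stored in `dtype`."""
    A = A.astype(dtype, copy=False)
    r = r.astype(dtype, copy=False)
    n = r.shape[0]
    beta = np.linalg.norm(r)
    if beta == 0:
        return np.zeros(n, dtype=dtype)
    V = np.zeros((n, m + 1), dtype=dtype)
    H = np.zeros((m + 1, m), dtype=dtype)
    V[:, 0] = r / beta
    k_used = m
    for k in range(m):
        w = A @ V[:, k]
        for j in range(k + 1):                    # modified Gram-Schmidt
            H[j, k] = np.vdot(V[:, j], w)
            w = w - H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] == 0:                      # exact breakdown
            k_used = k + 1
            break
        V[:, k + 1] = w / H[k + 1, k]
    rhs = np.zeros(k_used + 1, dtype=dtype)
    rhs[0] = beta
    y = np.linalg.lstsq(H[:k_used + 1, :k_used], rhs, rcond=None)[0]
    return V[:, :k_used] @ y

def gmres_sd(A, b, x0, m=10, tol=1e-10, switch_tol=1e-7, max_single=5, max_iter=50):
    """Single-precision phase followed by a double-precision phase."""
    A32, b32 = A.astype(np.complex64), b.astype(np.complex64)
    x32 = x0.astype(np.complex64)
    r32 = b32 - A32 @ x32                                     # step 1 [single]
    r0_norm = float(np.linalg.norm(r32))
    for _ in range(max_single):                               # steps 2-7 [single]
        x32 = x32 + gmres_cycle(A32, r32, m, np.complex64)
        r32 = b32 - A32 @ x32
        if np.linalg.norm(r32) <= switch_tol * r0_norm:
            break
    A64, b64 = A.astype(np.complex128), b.astype(np.complex128)
    x = x32.astype(np.complex128)                             # promote the iterate
    r = b64 - A64 @ x
    for _ in range(max_iter):                                 # steps 8-13 [double]
        if np.linalg.norm(r) <= tol * r0_norm:
            break
        x = x + gmres_cycle(A64, r, m, np.complex128)
        r = b64 - A64 @ x
    return x

# Hypothetical well-conditioned complex test problem.
rng = np.random.default_rng(2)
n = 50
A = 4.0 * np.eye(n) + 0.5 * (rng.standard_normal((n, n))
                             + 1j * rng.standard_normal((n, n))) / np.sqrt(n)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x_sd = gmres_sd(A, b, np.zeros(n))
print(np.linalg.norm(b - A @ x_sd) / np.linalg.norm(b))
```

The single-precision phase moves half as many bytes per vector, and the double-precision phase only has to recover the remaining digits.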

Algorithm 3: GMRES-IR

Input: $x_0$: initial guess.
Output: numerical solution.

1 compute $r_0 = b - A x_0$ [double]
2 for $i = 0, 1, \dots$ do
3   solve $A e_i = r_i$ with GMRES(m) [single]
4   $x_{i+1} = x_i + e_i$ [double]
5   $r_{i+1} = b - A x_{i+1}$ [double]
6   check for convergence; exit if converged
7 end
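The GMRES-IR listing is an iterative-refinement loop: the residual and the solution update are kept in double precision, while each correction equation is solved by one GMRES(m) cycle in single precision. The sketch below is a dense NumPy illustration under stated assumptions; `gmres_cycle`, `gmres_ir`, the relative stopping test and the test matrix are hypothetical, and the convergence check is placed at the top of the loop for convenience.

```python
import numpy as np

def gmres_cycle(A, r, m, dtype):
    """One GMRES(m) cycle for A e = r from e = 0, with vectors stored in `dtype`."""
    A = A.astype(dtype, copy=False)
    r = r.astype(dtype, copy=False)
    n = r.shape[0]
    beta = np.linalg.norm(r)
    if beta == 0:
        return np.zeros(n, dtype=dtype)
    V = np.zeros((n, m + 1), dtype=dtype)
    H = np.zeros((m + 1, m), dtype=dtype)
    V[:, 0] = r / beta
    k_used = m
    for k in range(m):
        w = A @ V[:, k]
        for j in range(k + 1):                    # modified Gram-Schmidt
            H[j, k] = np.vdot(V[:, j], w)
            w = w - H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] == 0:                      # exact breakdown
            k_used = k + 1
            break
        V[:, k + 1] = w / H[k + 1, k]
    rhs = np.zeros(k_used + 1, dtype=dtype)
    rhs[0] = beta
    y = np.linalg.lstsq(H[:k_used + 1, :k_used], rhs, rcond=None)[0]
    return V[:, :k_used] @ y

def gmres_ir(A, b, x0, m=10, tol=1e-10, max_iter=50):
    """Iterative refinement: double-precision residual, single-precision inner solve."""
    A64 = A.astype(np.complex128)
    A32 = A.astype(np.complex64)                  # low-precision copy for the inner solve
    x = x0.astype(np.complex128)
    r = b - A64 @ x                               # step 1 [double]
    r0_norm = np.linalg.norm(r)
    for _ in range(max_iter):                     # step 2
        if np.linalg.norm(r) <= tol * r0_norm:    # step 6 (assumed relative test)
            break
        e = gmres_cycle(A32, r, m, np.complex64)  # step 3 [single]
        x = x + e.astype(np.complex128)           # step 4 [double]
        r = b - A64 @ x                           # step 5 [double]
    return x

# Hypothetical well-conditioned complex test problem.
rng = np.random.default_rng(3)
n = 50
A = 4.0 * np.eye(n) + 0.5 * (rng.standard_normal((n, n))
                             + 1j * rng.standard_normal((n, n))) / np.sqrt(n)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x_ir = gmres_ir(A, b, np.zeros(n))
print(np.linalg.norm(b - A @ x_ir) / np.linalg.norm(b))
```

Because the residual is recomputed in double precision at every pass, the final accuracy is not limited by the single-precision inner solve, provided the matrix is not too ill-conditioned.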

Algorithm 4: FGMRES-GMRES

Input: $x_0$: initial guess.
Output: numerical solution.

1 for $i = 0, 1, \dots$ do
2   compute $r = b - A x_i$ [double]
3   compute $\beta = h_{1,0} = \|r\|_2$ [double]
4   check for convergence; exit if converged
5   for $k = 1, \dots, m_{out}$ do
6     $v_k = r / h_{k,k-1}$ [double]
7     solve $A z_k = v_k$ with GMRES($m_{in}$) (initial guess $z_k = 0$) [single]
8     $r = A z_k$ [double]
9     for $j = 1, \dots, k$ do
10      $h_{j,k} = (r, v_j)$ [double]
11      $r = r - h_{j,k} v_j$ [double]
12    end
13    compute $h_{k+1,k} = \|r\|_2$ [double]
14  end
15  define $Z_k = [z_1, \dots, z_k]$, $H_k = \{h_{i,j}\}_{1 \le i \le k+1,\ 1 \le j \le k}$
16  compute $y_k = \arg\min_y \|\beta e_1 - H_k y\|_2$ [double]
17  $x_{i+1} = x_i + Z_k y_k$ [double]
18 end
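In the FGMRES-GMRES listing, a double-precision flexible GMRES outer iteration uses a single-precision GMRES($m_{in}$) cycle as a variable preconditioner, and the preconditioned vectors $z_k$ are stored because the solution update is built from them. The following NumPy sketch illustrates this structure under assumptions: `gmres_cycle`, `fgmres_gmres`, the relative stopping test, the parameter defaults and the test matrix are hypothetical, and the operator is applied to the preconditioned vector as in standard flexible GMRES.

```python
import numpy as np

def gmres_cycle(A, r, m, dtype):
    """One GMRES(m) cycle for A e = r from e = 0, with vectors stored in `dtype`."""
    A = A.astype(dtype, copy=False)
    r = r.astype(dtype, copy=False)
    n = r.shape[0]
    beta = np.linalg.norm(r)
    if beta == 0:
        return np.zeros(n, dtype=dtype)
    V = np.zeros((n, m + 1), dtype=dtype)
    H = np.zeros((m + 1, m), dtype=dtype)
    V[:, 0] = r / beta
    k_used = m
    for k in range(m):
        w = A @ V[:, k]
        for j in range(k + 1):                    # modified Gram-Schmidt
            H[j, k] = np.vdot(V[:, j], w)
            w = w - H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] == 0:                      # exact breakdown
            k_used = k + 1
            break
        V[:, k + 1] = w / H[k + 1, k]
    rhs = np.zeros(k_used + 1, dtype=dtype)
    rhs[0] = beta
    y = np.linalg.lstsq(H[:k_used + 1, :k_used], rhs, rcond=None)[0]
    return V[:, :k_used] @ y

def fgmres_gmres(A, b, x0, m_out=8, m_in=5, tol=1e-10, max_restarts=20):
    """Double-precision FGMRES(m_out) preconditioned by single-precision GMRES(m_in)."""
    A64 = A.astype(np.complex128)
    A32 = A.astype(np.complex64)
    x = x0.astype(np.complex128)
    n = b.shape[0]
    r0_norm = None
    for _ in range(max_restarts):                              # step 1
        r = b - A64 @ x                                        # step 2
        beta = np.linalg.norm(r)                               # step 3
        if r0_norm is None:
            r0_norm = beta
        if beta <= tol * r0_norm:                              # step 4
            break
        V = np.zeros((n, m_out + 1), dtype=np.complex128)
        Z = np.zeros((n, m_out), dtype=np.complex128)          # preconditioned vectors
        H = np.zeros((m_out + 1, m_out), dtype=np.complex128)
        V[:, 0] = r / beta
        k_used = m_out
        for k in range(m_out):                                 # step 5
            Z[:, k] = gmres_cycle(A32, V[:, k], m_in, np.complex64)  # step 7 [single]
            w = A64 @ Z[:, k]                                  # step 8
            for j in range(k + 1):                             # steps 9-12
                H[j, k] = np.vdot(V[:, j], w)
                w = w - H[j, k] * V[:, j]
            H[k + 1, k] = np.linalg.norm(w)                    # step 13
            if H[k + 1, k] == 0:
                k_used = k + 1
                break
            V[:, k + 1] = w / H[k + 1, k]                      # step 6 for the next k
        rhs = np.zeros(k_used + 1, dtype=np.complex128)
        rhs[0] = beta
        y = np.linalg.lstsq(H[:k_used + 1, :k_used], rhs, rcond=None)[0]  # step 16
        x = x + Z[:, :k_used] @ y                              # step 17
    return x

# Hypothetical well-conditioned complex test problem.
rng = np.random.default_rng(4)
n = 50
A = 4.0 * np.eye(n) + 0.5 * (rng.standard_normal((n, n))
                             + 1j * rng.standard_normal((n, n))) / np.sqrt(n)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x_fg = fgmres_gmres(A, b, np.zeros(n))
print(np.linalg.norm(b - A @ x_fg) / np.linalg.norm(b))
```

Storing `Z` in addition to `V` doubles the outer basis memory, which is the usual cost of allowing the preconditioner to change from one outer step to the next.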

[Figures 1-8 and Table 1: captions not available]


CPU cores	GMRES-SD	GMRES-IR	FGMRES-BiCGStab	FGMRES-GMRES
16	1.09	1.30	1.12	1.20
32	1.11	1.35	1.22	1.20
64	1.07	1.12	1.19	1.15
128	1.10	1.26	1.21	1.15
256	1.10	1.32	1.16	1.20
