Frontiers of Data and Computing, 2024, Vol. 6, Issue (1): 68-78.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.01.007

doi: 10.11871/jfdc.issn.2096-742X.2024.01.007

• Technology and Applications •

Implementation of CCFD-KSSolver Component for GPU Architecture

ZHANG Haoyuan1,2, MA Wenpeng3,*, YUAN Wu1,2, ZHANG Jian1,2, LU Zhonghua1,2

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Xinyang Normal University, Xinyang, Henan 464000, China
  • Received: 2022-09-19 Online: 2024-02-20 Published: 2024-02-21
  • Corresponding author: * MA Wenpeng (E-mail: mawp@xynu.edu.cn)
  • About the authors: ZHANG Haoyuan is a Ph.D. candidate at the Computer Network Information Center, Chinese Academy of Sciences (CNIC). His main research interests include sparse linear solvers and computational fluid dynamics.
    In this paper, he is mainly responsible for KSSolver software component design and performance tests.
    E-mail: zhanghaoyuan@cnic.cn
    MA Wenpeng, Ph.D., is an associate professor at Xinyang Normal University. His main research interests include parallel numerical computing and high-performance computing.
    In this paper, he is mainly responsible for guiding KSSolver software component design and algorithm development.
    E-mail: mawp@xynu.edu.cn
  • Funding:
    National Key Research and Development Program of China (2020YFB1709500); Key R&D and Promotion Special Project of Henan Province (222102210162)

Abstract:

[Application Background] In high-performance application domains such as computational fluid dynamics and materials science, the solution of large sparse linear systems directly affects the efficiency and accuracy of the application. Heterogeneous many-core design has become a defining feature of modern supercomputer architectures and a continuing trend. [Methods] This paper designs and implements the linear solver component CCFD-KSSolver for CPU+GPU heterogeneous supercomputing systems. Targeting the characteristics of heterogeneous architectures, the component implements Krylov subspace solvers for block-structured matrices arising from multi-physics problems, together with several typical preconditioners. Optimization techniques such as computation-communication overlap, GPU memory access optimization, and CPU-GPU collaborative computing are used to improve the computational efficiency of CCFD-KSSolver. [Results] Lid-driven cavity flow experiments show that, with 8 subdomains, Block-ISAI achieves speedups of 20.09× and 3.34× over the CPU and cuSPARSE subdomain solvers, respectively, and scales better. For matrices of order around one million, KSSolver with the three subdomain solvers reaches parallel efficiencies of 83.8%, 55.7%, and 87.4%, respectively, on 8 GPUs. [Conclusions] The solver and preconditioner components are tested on classical multi-physics applications with block structure. The results show that they are stable and efficient, which supports running high-performance computing applications, represented by numerical simulation in fluid dynamics, on heterogeneous systems.
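To put the parallel efficiencies quoted in the abstract in perspective, the short Python sketch below converts them into implied speedups. It assumes the common definition E = S / p (speedup S over a single-device run, divided by the device count p). The abstract does not state which baseline or definition the authors used, so the resulting speedups are only indicative. The function and variable names are illustrative and do not come from the paper.

```python
# Assumption: parallel efficiency E = S / p, with S the speedup relative to
# a single GPU and p the number of GPUs. The paper's abstract does not
# confirm this definition or the baseline.

def implied_speedup(efficiency: float, num_devices: int) -> float:
    """Return the speedup implied by a parallel efficiency under E = S / p."""
    if efficiency <= 0.0 or num_devices < 1:
        raise ValueError("efficiency must be positive and num_devices >= 1")
    return efficiency * num_devices


NUM_GPUS = 8
# Efficiencies quoted in the abstract for the three subdomain solvers,
# kept in the order in which the abstract lists them.
REPORTED_EFFICIENCY = [0.838, 0.557, 0.874]

IMPLIED = [implied_speedup(e, NUM_GPUS) for e in REPORTED_EFFICIENCY]

for eff, speedup in zip(REPORTED_EFFICIENCY, IMPLIED):
    print(f"efficiency {eff:.1%} on {NUM_GPUS} GPUs -> speedup of about {speedup:.2f}x")
```

Under this assumption, an efficiency near 56% on 8 GPUs corresponds to well under 5× over one GPU, while efficiencies above 80% correspond to more than 6.5×.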

Key words: GPU, KSSolver, parallel optimization, preconditioner, high-performance computing
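The general idea of a preconditioned Krylov subspace solver on a block-structured matrix can be sketched in a few lines of plain Python. This is not the CCFD-KSSolver implementation: conjugate gradient stands in for whichever Krylov methods the component provides, a block-Jacobi preconditioner with 2×2 blocks stands in for Block-ISAI, and the 4×4 symmetric positive definite test matrix and all function names are made up for illustration.

```python
# Illustrative only: block-Jacobi preconditioned conjugate gradient (PCG)
# on a tiny SPD matrix with 2x2 block structure. All names are hypothetical.

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def invert_2x2(B):
    """Explicit inverse of a 2x2 block."""
    (a, b), (c, d) = B
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def block_jacobi(A):
    """Return a function applying the inverse of the 2x2 diagonal blocks of A."""
    n = len(A)
    inv_blocks = [
        invert_2x2([[A[s][s], A[s][s + 1]], [A[s + 1][s], A[s + 1][s + 1]]])
        for s in range(0, n, 2)
    ]

    def apply(r):
        z = []
        for k, b_inv in enumerate(inv_blocks):
            z.extend(matvec(b_inv, r[2 * k:2 * k + 2]))
        return z

    return apply


def pcg(A, b, apply_prec, tol=1e-10, max_iter=100):
    """Preconditioned conjugate gradient; returns (solution, iterations)."""
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    r = list(b)                      # residual for the zero initial guess
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = apply_prec(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Made-up SPD test matrix: 2x2 diagonal blocks coupled by 0.5 * identity.
A = [
    [4.0, 1.0, 0.5, 0.0],
    [1.0, 3.0, 0.0, 0.5],
    [0.5, 0.0, 4.0, 1.0],
    [0.0, 0.5, 1.0, 3.0],
]
X_TRUE = [1.0, 2.0, 3.0, 4.0]
B_VEC = matvec(A, X_TRUE)

X_SOLVED, ITERS = pcg(A, B_VEC, block_jacobi(A))
print("solution:", [round(v, 8) for v in X_SOLVED], "iterations:", ITERS)
```

In a GPU setting, the preconditioner application is typically the part that benefits most from a formulation based on matrix-vector products rather than triangular solves, which is one common motivation for approximate-inverse preconditioners.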