Frontiers of Data and Computing ›› 2024, Vol. 6 ›› Issue (1): 68-78.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.01.007

doi: 10.11871/jfdc.issn.2096-742X.2024.01.007

• Technology and Application •

Implementation of CCFD-KSSolver Component for GPU Architecture

ZHANG Haoyuan1,2, MA Wenpeng3,*, YUAN Wu1,2, ZHANG Jian1,2, LU Zhonghua1,2

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Xinyang Normal University, Xinyang, Henan 464000, China
  • Received: 2022-09-19 Online: 2024-02-20 Published: 2024-02-21

Abstract:

[Application Background] In high-performance applications such as computational fluid dynamics and materials science, the solution of large sparse linear systems directly affects both efficiency and accuracy. Heterogeneous many-core design has become a defining feature of modern supercomputing architectures, and this trend is expected to continue. [Methods] The linear solver component CCFD-KSSolver is designed and implemented for CPU+GPU heterogeneous supercomputing systems. The component implements Krylov subspace solvers for block-structured matrices arising from multi-physics problems, together with a variety of typical preconditioners. Optimization techniques such as computation-communication overlap, GPU memory access optimization, and CPU-GPU collaborative computing are used to improve the computational efficiency of CCFD-KSSolver. [Results] Experimental results show that, with 8 subdomains, Block-ISAI achieves speedups of 20.09× and 3.34× over the CPU and cuSPARSE subdomain solvers, respectively, and scales better. For million-scale matrices, the parallel efficiency of the three subdomain solvers of KSSolver on eight GPUs is 83.8%, 55.7%, and 87.4%, respectively. [Conclusions] A classical multi-physics application with block structure was selected to test the solver and preconditioner components. The results show that the solver is stable and efficient, and that it can support the development of high-performance computing applications on heterogeneous systems.
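
For readers unfamiliar with the terminology, the following is a minimal sketch of a preconditioned Krylov subspace iteration. It is not the CCFD-KSSolver implementation: it uses the conjugate gradient method with a simple Jacobi (diagonal) preconditioner in place of Block-ISAI, runs on the CPU in pure Python, and uses dense list-of-lists storage. All function names (`matvec`, `dot`, `pcg`, `build_test_matrix`) and the test matrix are hypothetical and chosen only for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product y = A x (lists of floats)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for a symmetric positive definite A.

    The preconditioner is M^{-1} = diag(A)^{-1} (Jacobi), an assumption made
    for brevity. Returns (x, iterations_used).
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                    # r = b - A x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0               # avoid dividing by zero
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:       # relative residual test
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def build_test_matrix(n):
    """Hypothetical SPD tridiagonal matrix: diagonal 2 + i, off-diagonals -1."""
    return [[(2.0 + i) if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = build_test_matrix(10)
    b = matvec(A, [1.0] * 10)      # right-hand side whose exact solution is all ones
    x, iters = pcg(A, b)
    print(iters, max(abs(xi - 1.0) for xi in x))
```

In a GPU setting, the preconditioner application `z = M^{-1} r` is typically the step where approximate-inverse methods are attractive, since applying an explicit sparse approximate inverse reduces to a sparse matrix-vector product rather than a sequential triangular solve.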

Key words: GPU, KSSolver, parallel optimization, preconditioner, high-performance computing