%A Fu Yueyue
%A Wang Wu
%A Wang Qiao
%T The Implementation and Optimization of Cosmological N-Body Simulation by FMM-PM Method on GPUs
%0 Journal Article
%D 2020
%J Frontiers of Data and Computing
%R 10.11871/jfdc.issn.2096-742X.2020.02.013
%P 155-164
%V 2
%N 2
%U {http://www.jfdc.cnic.cn/CN/abstract/article_43.shtml}
%8 2020-04-20
%X

[Objective] This paper accelerates and optimizes the compute kernels of PhoToNs, an astronomical N-body simulation software based on the fast multipole method (FMM) and the particle-mesh (PM) method, using CUDA on a multi-GPU platform. [Methods] The main optimizations applied to the CUDA kernels are algorithm parameter tuning, the use of page-locked memory and CUDA streams, and the use of mixed precision and the fast math library. [Results] The kernel for the short-range force interaction is heavily optimized and achieves a speedup of about 410 times on four Titan V GPUs over the pure MPI code running on four Intel Xeon CPU cores. [Conclusions] The optimization methods in this paper can support further algorithm research and hyper-scale N-body simulations on other high-performance GPU-based heterogeneous platforms.