Frontiers of Data and Computing ›› 2020, Vol. 2 ›› Issue (2): 155-164. doi: 10.11871/jfdc.issn.2096-742X.2020.02.013

• Technology and Application •

The Implementation and Optimization of Cosmological N-Body Simulation by FMM-PM Method on GPUs

Fu Yueyue1,2, Wang Wu1,*, Wang Qiao2,3

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100012, China
  • Received: 2020-02-04 Online: 2020-04-20 Published: 2020-06-03
  • Contact: Wu Wang


[Objective] This paper accelerates and optimizes the compute kernels of PhoToNs, an astronomical N-body simulation code based on the fast multipole method (FMM) and the particle-mesh (PM) method, using CUDA on a multi-GPU platform. [Methods] The main optimizations applied to the CUDA kernels are tuning of algorithm parameters, use of page-locked memory and CUDA streams, and use of mixed precision and the fast math library. [Results] The kernel for the short-range force interaction is heavily optimized and runs about 410 times faster on four Titan V GPUs than the pure MPI code on four Intel Xeon CPU cores. [Conclusions] The optimization methods presented here can support further algorithm research and ultra-large-scale N-body simulation on other high-performance GPU-based heterogeneous platforms.
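
The abstract does not say how mixed precision is applied, so the following is only an illustrative sketch under stated assumptions, not the PhoToNs kernel. It assumes that each pairwise term is evaluated in single precision while the running sum is kept in double precision, and it uses a plain softened Newtonian force with a hard cutoff and G = 1 in place of the real short-range kernel. The names `f32`, `short_range_accel`, `r_cut`, and `eps` are hypothetical.

```python
import struct
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def short_range_accel(
    pos: Sequence[Vec3],
    mass: Sequence[float],
    i: int,
    r_cut: float,
    eps: float,
    single: bool = True,
) -> Vec3:
    """Acceleration on particle i from all particles closer than r_cut (G = 1).

    Assumption: a hard cutoff and Plummer-style softening stand in for the
    actual short-range kernel, which the source does not specify.
    """
    # Per-pair arithmetic is rounded to float32 when single=True.
    rnd = f32 if single else float
    ax = ay = az = 0.0  # accumulators stay in double precision
    xi, yi, zi = pos[i]
    for j, (xj, yj, zj) in enumerate(pos):
        if j == i:
            continue
        dx, dy, dz = rnd(xj - xi), rnd(yj - yi), rnd(zj - zi)
        r2 = rnd(dx * dx + dy * dy + dz * dz)
        if r2 >= r_cut * r_cut:
            continue  # beyond the cutoff; left to the long-range solver
        inv_r3 = rnd(mass[j] / rnd((r2 + eps * eps) ** 1.5))
        ax += rnd(dx * inv_r3)
        ay += rnd(dy * inv_r3)
        az += rnd(dz * inv_r3)
    return ax, ay, az
```

Calling `short_range_accel(pos, mass, 0, r_cut=2.0, eps=0.05)` and then repeating the call with `single=False` gives a quick estimate of the error introduced by the single-precision pair terms. On a GPU the same split is attractive because single-precision arithmetic has much higher throughput than double precision, while the double-precision accumulator limits the growth of round-off over many neighbours.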

Key words: N-Body simulation, fast multipole method, GPU, optimization