Frontiers of Data & Computing, 2020, Vol. 2, Issue (2): 155-164.

doi: 10.11871/jfdc.issn.2096-742X.2020.02.013

Topic: Special Issue on "Data Analysis Technology and Application"

• Technology and Application •

The Implementation and Optimization of Cosmological N-Body Simulation by FMM-PM Method on GPUs

Fu Yueyue1,2, Wang Wu1,*, Wang Qiao2,3

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China
  • Received: 2020-02-04  Online: 2020-04-20  Published: 2020-06-03
  • Corresponding author: Wang Wu
  • About the authors:
    Fu Yueyue is a master's student at the Computer Network Information Center, Chinese Academy of Sciences. Her main research interest is parallel computing algorithms.
    In this paper, she designed, implemented and optimized the fast multipole method on GPUs.
    E-mail: fuyueyue@cnic.cn
    Wang Wu, Ph.D., is an associate research fellow at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are parallel algorithms and high-performance computing.
    In this paper, he supervised the design, implementation and optimization of the fast multipole method on GPUs.
    Wang Qiao, Ph.D., is an associate research fellow at the National Astronomical Observatories, Chinese Academy of Sciences. His main research interests are cosmic large-scale structure and computational cosmology.
    In this paper, he designed and implemented the parallel fast multipole method and the particle mesh method.
    E-mail: qwang@nao.cas.cn
  • Funding:
    National Key R&D Program of China "High-Performance Heterogeneous Simulation System for Cosmology" (2017YFB0203302); Chinese Academy of Sciences 13th Five-Year Informatization Special Project "Scientific Research Informatization Application Engineering" (XXH13506-405); Strategic Priority Research Program of the Chinese Academy of Sciences (Class C) (XDC01040100)

Abstract:

[Objective] This paper accelerates and optimizes the core functions of PHoToNs with CUDA on a multi-GPU platform. PHoToNs is an astronomical N-body simulation code based on the fast multipole method (FMM) and the particle mesh (PM) method. [Methods] The main optimization techniques are algorithm parameter tuning, page-locked (pinned) memory combined with CUDA streams, and mixed precision combined with the fast math library. [Results] The optimized short-range force interaction kernel runs on four Titan V GPUs about 410 times faster than the pure MPI code running on four Intel Xeon CPU cores. [Conclusions] The optimization techniques in this paper can support further algorithm research and ultra-large-scale astronomical N-body simulations on other high-performance GPU-based heterogeneous platforms.

Key words: N-body simulation, fast multipole method, GPU, optimization
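The abstract names the short-range force interaction kernel as the main optimization target, but it does not give the kernel's formula. The sketch below is for illustration only. It assumes the Gaussian force split used in GADGET-2-style TreePM codes, and PHoToNs may use a different split function. Under that split, a pair closer than a cutoff radius contributes G m_j g(r) (x_j - x_i) / r^3 to particle i, with g(r) = erfc(r / 2r_s) + r / (r_s sqrt(pi)) exp(-r^2 / 4r_s^2). The code is a serial C++ reference, not a CUDA kernel. The names `Particle`, `Accel`, `short_range_accel`, `rs` and `rcut` are hypothetical. The sketch also shows one common form of mixed precision: pair terms are computed in float and sums are accumulated in double. This is not necessarily the scheme used in the paper.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle record (GPU codes usually prefer structure-of-arrays;
// array-of-structs is used here for readability).
struct Particle {
    float x, y, z;  // position
    float m;        // mass
};

// Accumulated acceleration, kept in double precision.
struct Accel {
    double ax = 0.0, ay = 0.0, az = 0.0;
};

// Short-range accelerations under an ASSUMED Gaussian force split
// (GADGET-2-style TreePM split, not taken from the paper).
// rs: split scale; rcut: cutoff radius beyond which the pair is skipped.
// Mixed precision: each pair term is evaluated in float, sums are in double.
std::vector<Accel> short_range_accel(const std::vector<Particle>& p,
                                     float G, float rs, float rcut) {
    const float inv_sqrt_pi = 0.5641895835f;  // 1 / sqrt(pi)
    const float rcut2 = rcut * rcut;
    std::vector<Accel> acc(p.size());
    for (std::size_t i = 0; i < p.size(); ++i) {      // target particle
        for (std::size_t j = 0; j < p.size(); ++j) {  // source particle
            if (i == j) continue;
            float dx = p[j].x - p[i].x;
            float dy = p[j].y - p[i].y;
            float dz = p[j].z - p[i].z;
            float r2 = dx * dx + dy * dy + dz * dz;
            // Skip pairs outside the cutoff and exactly coincident pairs.
            if (r2 >= rcut2 || r2 == 0.0f) continue;
            float r = std::sqrt(r2);
            float u = r / (2.0f * rs);
            // g(r) = erfc(u) + (r / (rs * sqrt(pi))) * exp(-u^2), with u = r / (2 rs)
            float g = std::erfc(u) + (r / rs) * inv_sqrt_pi * std::exp(-u * u);
            float f = G * p[j].m * g / (r2 * r);  // G m_j g(r) / r^3
            acc[i].ax += static_cast<double>(f * dx);
            acc[i].ay += static_cast<double>(f * dy);
            acc[i].az += static_cast<double>(f * dz);
        }
    }
    return acc;
}
```

As r goes to 0, g(r) tends to 1, so very close pairs recover the Newtonian G m / r^2. At larger r, the factor g(r) hands the force over to the long-range (PM) part. In a GPU port, one typical mapping assigns each target particle `i` to a thread, so the inner loop over sources `j` stays thread-local.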