Frontiers of Data and Computing ›› 2020, Vol. 2 ›› Issue (3): 101-112.doi: 10.11871/jfdc.issn.2096-742X.2020.03.009

• Technology and Applicaton • Previous Articles     Next Articles

Implementation of Parallel FMM Based on Charm++

Ding Lei1,2(),Wang Wu1,*(),Jiang Jinrong1(),Zhao Lian1()   

  1. 1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2020-02-19 Online:2020-06-20 Published:2020-08-19
  • Contact: Wang Wu E-mail:dinglei2017@cnic.cn;wangwu@sccas.cn;jjr@sccas.cn;zhaolian@sccas.cn

Abstract:

[Objective] This paper has implemented a parallel FMM based on Charm++ to take advantage of its over-decomposition and migratability. [Methods] It is achieved by analyzing communication, separating parallel tasks, and converting synchronous communication to asynchronous communication. Also, the SDAG was used to implement the basic communication calls and the LPT approximation strategy was adopted for dynamic load balancing. [Results] The results show that the implementation of parallel FMM based on Charm++ has the same accuracy as that of MPI implementation, and its execution speed on the thousand-core scale is better than that of MPI implementation. Over-decomposition and load-balancing strategy contribute to the execution time reduction by 10% in the unbalance particle distribution. [Limitations] The current implementation does not use the shared memory structure of Charm++ and needs further optimizations. Besides, the load balancing strategy is simple. [Conclusions] This paper gives a relatively general method to convert the MPI style programs to Charm++ style ones and proves that over-decomposition and load-balancing strategy can accelerate FMM execution.

Key words: Charm++, FMM, load balancing, over-decomposition