Frontiers of Data & Computing, 2020, Vol. 2, Issue (3): 101-112.

doi: 10.11871/jfdc.issn.2096-742X.2020.03.009

Special topic: Next-Generation Internet Technologies and Applications

Section: Technology and Application

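Implementation of Parallel FMM Based on Charm++

Ding Lei (1, 2), Wang Wu (1, *), Jiang Jinrong (1), Zhao Lian (1)

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2020-02-19  Publication date: 2020-06-20  Release date: 2020-08-19
  • Corresponding author: Wang Wu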
  • About the authors:
    Ding Lei is a master's student at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are high performance computing and parallel computing.
    In this paper, he ported the code, designed the load-balancing scheme, implemented it, and tested the results.
    E-mail: dinglei2017@cnic.cn
    Wang Wu, Ph.D., is an associate research fellow at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are parallel algorithms and high performance computing.
    In this paper, he guided the development of the FMM and the N-body simulation.
    E-mail: wangwu@sccas.cn
    Jiang Jinrong, Ph.D., is a research fellow at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are parallel algorithms and framework software, and computational geoscience.
    In this paper, he guided the overall design.
    E-mail: jjr@sccas.cn
    Zhao Lian, Ph.D., is an assistant research fellow at the Computer Network Information Center, Chinese Academy of Sciences. Her main research interests are high performance computing and parallel computing.
    In this paper, she guided the Charm++ implementation.
    E-mail: zhaolian@sccas.cn
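  • Funding:
    National Key R&D Program of China, "Improvement, Application Development and High Performance Computing of Earth System Models" (2016YFB0200800); Scientific Research Informatization Application Project of the Chinese Academy of Sciences: High Performance Application Software (XXH13506-405); Strategic Priority Research Program of the Chinese Academy of Sciences (Category C): Development of Domestic, Secure and Controllable Advanced Computing Systems (XDC01040100)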



Abstract:

[Objective] To exploit the over-decomposition and runtime migration features of Charm++ and improve the parallel efficiency of the fast multipole method (FMM), this paper implements a parallel FMM on Charm++. [Methods] The implementation is obtained by analyzing the communication pattern, decomposing the work into parallel tasks, and converting synchronous calls into asynchronous ones. The basic communication functions are implemented with Structured Dagger (SDAG), and load balance is achieved with an approximation strategy based on LPT (Longest Processing Time first) scheduling. [Results] Tests show that the Charm++ implementation achieves exactly the same accuracy as the MPI implementation and runs faster than it at the thousand-core scale. For non-uniform particle distributions, over-decomposition and the load-balancing strategy reduce the execution time by 10%. [Limitations] The current implementation does not exploit the shared-memory structure of Charm++, leaving room for further optimization, and its load-balancing strategy is relatively simple. [Conclusions] This paper presents a fairly general strategy for converting MPI-style programs to Charm++ and demonstrates that the over-decomposition and load-balancing strategy of Charm++ speed up FMM.

Key words: Charm++, FMM, load balancing, over-decomposition
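The abstract credits the speedup on uneven particle distributions to over-decomposition combined with an LPT approximation strategy. As a rough illustration only, the Python sketch below implements the classic greedy LPT rule: sort work units by measured cost and place each one on the currently least-loaded processor. It assumes each over-decomposed unit (for example, a chare owning part of the FMM tree) has a known cost from a previous step. The names `lpt_assign` and `makespan` and the load values are hypothetical; this is not the authors' Charm++ load balancer, which would run inside the Charm++ runtime rather than as a standalone function.

```python
import heapq


def lpt_assign(loads, num_procs):
    """Greedy LPT (Longest Processing Time first) mapping.

    loads[i] is the (assumed, measured) cost of work unit i.
    Returns owner, where owner[i] is the processor assigned to unit i.
    """
    heap = [(0.0, p) for p in range(num_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    owner = [None] * len(loads)
    # Visit units from heaviest to lightest; each goes to the least-loaded processor.
    for i in sorted(range(len(loads)), key=lambda j: loads[j], reverse=True):
        load, p = heapq.heappop(heap)
        owner[i] = p
        heapq.heappush(heap, (load + loads[i], p))
    return owner


def makespan(loads, owner, num_procs):
    """Largest per-processor load, i.e. the finishing time of the slowest processor."""
    per_proc = [0.0] * num_procs
    for cost, p in zip(loads, owner):
        per_proc[p] += cost
    return max(per_proc)


# Hypothetical costs of 8 spatial blocks; clustered particles make the first few heavy.
loads = [9, 7, 6, 5, 4, 3, 2, 1]
num_procs = 2

# No over-decomposition: one contiguous chunk of 4 units per processor.
static_owner = [i * num_procs // len(loads) for i in range(len(loads))]
# Over-decomposition + LPT: each unit can be placed individually.
lpt_owner = lpt_assign(loads, num_procs)

print("static makespan:", makespan(loads, static_owner, num_procs))  # 27.0
print("LPT makespan:", makespan(loads, lpt_owner, num_procs))        # 19.0
```

With one coarse chunk per processor, the heavy clustered blocks all land on processor 0 (27 versus 10). Letting the balancer place the eight finer units individually lowers the maximum load to 19, close to the ideal 18.5, which is why over-decomposition gives a load balancer room to work. These numbers are illustrative and are not the paper's measured 10% reduction.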