Frontiers of Data & Computing, 2020, Vol. 2, Issue 4: 92-104.

doi: 10.11871/jfdc.issn.2096-742X.2020.04.008

Special topic: Next-Generation Internet Technology and Applications

Section: Technology and Application

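The Implementation and Optimization of LICOM on GPUs

Zhang Liuying1,2, Wang Pengfei3, Zhang Feng1,2, Liu Hailong3, Lin Pengfei3, Wang Tao1, Wei Junlin1,2, Tian Shaobo1,2, Jiang Jinrong1,*, Chi Xuebin1

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China
  • Received: 2020-03-13 Online: 2020-08-20 Published: 2020-09-10
  • Corresponding author: Jiang Jinrong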
  • Author biographies:
    Zhang Liuying is a master's student at the Computer Network Information Center, Chinese Academy of Sciences. Her main research interests are high-performance computing and applications.
    In this paper she carried out the code implementation of LICOM3 on GPUs and its application testing.
    E-mail: zhangliuying@cnic.cn
    Wang Pengfei is a senior engineer at the Institute of Atmospheric Physics, Chinese Academy of Sciences. His main research interests are numerical simulation, development and application of climate system models, air-sea interaction, and predictability.
    In this paper he provided development guidance for LICOM3 and verified its correctness.
    E-mail: wpf@mail.iap.ac.cn
    Zhang Feng is a PhD student at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are high-performance computing and applications.
    In this paper he carried out the code implementation of LICOM3 on GPUs and its application testing.
    E-mail: zhangfeng@cnic.cn
    Liu Hailong is a research fellow at the Institute of Atmospheric Physics, Chinese Academy of Sciences. His main research interests are ocean circulation and its numerical simulation.
    In this paper he provided development guidance for LICOM3 and verified its correctness.
    E-mail: lhl@lasg.iap.ac.cn
    Lin Pengfei is a research fellow at the Institute of Atmospheric Physics, Chinese Academy of Sciences. His main research interests are ocean modeling, physical oceanography, aquatic ecosystems, air-sea interactions and their climate effects, and mesoscale eddies.
    In this paper he provided development guidance for LICOM3.
    E-mail: linpf@mail.iap.ac.cn
    Wang Tao is an engineer at the Computer Network Information Center, Chinese Academy of Sciences. His main research interest is high-performance computing.
    In this paper he implemented LICOM3 on GPUs.
    E-mail: wangtao@cnic.cn
    Wei Junlin is a master's student at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are high-performance computing and applications.
    In this paper he implemented LICOM3 on GPUs.
    E-mail: weijunlin19@mails.ucas.edu.cn
    Tian Shaobo is a PhD student at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are high-performance computing and scientific computing.
    In this paper he implemented LICOM3 on GPUs.
    E-mail: tianshaobo@cnic.cn
    Jiang Jinrong is a research fellow at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are parallel algorithms and framework software, and computational geoscience.
    In this paper he designed the overall structure of LICOM3 on GPUs and provided research guidance.
    E-mail: jjr@sccas.cn
    Chi Xuebin is a research fellow at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are high-performance computing and parallel computing.
    In this paper he provided guidance on the design of the parallel algorithms of LICOM3.
    E-mail: chi@sccas.cn
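  • Funding:
    National Key R&D Program of China, “Improvement, Application Development and High-Performance Computing of Earth System Models” (2016YFB0200800); Key Program of the National Natural Science Foundation of China, “Eddy-Resolving Global Climate Ocean Model and Its Oceanic Climate Effects” (41931183); CAS Informatization Application Project, “Integration and Optimization of High-Resolution Earth System Models” (XXH13506-402); Strategic Priority Research Program of the Chinese Academy of Sciences (Category C), “Development of Domestic, Secure and Controllable Advanced Computing Systems” (XDC01040100)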

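Abstract:

[Objective] To accelerate the integration of the LICOM ocean circulation model and to reduce the running cost that comes with higher resolution, this paper designs and implements a GPU-accelerated version based on CUDA C. [Methods] Starting from LICOM3, the latest version, we analyze the parallel algorithm over LICOM's ocean grid blocks and use CUDA threads to compute the ocean grid points in parallel. This ports the main computational routines of LICOM to the GPU platform. We then optimize both host-device data transfer and device memory usage. [Results] Experiments show that the simulation results of the GPU version are essentially consistent with those of the original CPU version. Compared with the same number of Intel Xeon E5-2680 V2 CPUs, 2 to 16 NVIDIA K20 GPUs achieve speedups of 9.31x down to 1.27x per simulated model day. [Limitations] LICOM3 performs many boundary synchronization communications, and these limit the scalability of the program. Future work needs to improve the model's scalability through boundary-communication optimization and algorithmic optimization. [Conclusions] This paper implements and optimizes a GPU version of LICOM3 that obtains a measurable speedup while maintaining reasonably good scalability. The work provides experience and a reference for developing ocean circulation models aimed at larger-scale computation.

Key words: GPU, CUDA, parallel computing, high-performance computing, LICOM, ocean circulation model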
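The [Methods] part of the abstract describes computing ocean grid points in parallel with CUDA threads. A minimal CUDA C++ sketch of that general pattern assigns one thread to each (i, j, k) point and masks out land. It is not taken from the LICOM3 source. Every name in it is hypothetical and chosen only for illustration:

- the arrays `t_old`, `tend` and `t_new`
- the per-column ocean-level count `kmt`
- the `[k][j][i]` flat memory layout
- the forward-Euler update
- the host helper `run_update`

```cuda
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, not LICOM3 code. Fields are stored as field[k][j][i] in one
// flat array with i fastest, so adjacent threads in x read adjacent addresses
// (coalesced access). kmt[j][i] is the number of ocean levels in column (i, j);
// kmt == 0 marks a land column.
__global__ void update_tracer(const double* __restrict__ t_old,
                              const double* __restrict__ tend,
                              const int* __restrict__ kmt,
                              double* __restrict__ t_new,
                              int imt, int jmt, int km, double dt)
{
    // One CUDA thread per (i, j, k) grid point.
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    const int j = blockIdx.y * blockDim.y + threadIdx.y;
    const int k = blockIdx.z;                      // grid z-dimension = vertical level
    if (i >= imt || j >= jmt || k >= km) return;   // threads past the domain edge idle

    const size_t col = (size_t)j * imt + i;
    const size_t idx = (size_t)k * jmt * imt + col;
    // Ocean points get a forward-Euler step; land / below-seafloor points are zeroed.
    t_new[idx] = (k < kmt[col]) ? t_old[idx] + dt * tend[idx] : 0.0;
}

// Hypothetical host wrapper: allocate, copy in, launch, copy out, free.
// Returns 0 on success, -1 on any CUDA error.
int run_update(const double* h_t_old, const double* h_tend, const int* h_kmt,
               double* h_t_new, int imt, int jmt, int km, double dt)
{
    assert(imt > 0 && jmt > 0 && km > 0 && km <= 65535);  // gridDim.z limit
    const size_t n3 = (size_t)imt * jmt * km;
    const size_t n2 = (size_t)imt * jmt;
    double *d_old = nullptr, *d_tend = nullptr, *d_new = nullptr;
    int* d_kmt = nullptr;

    cudaError_t err = cudaMalloc(&d_old, n3 * sizeof(double));
    if (err == cudaSuccess) err = cudaMalloc(&d_tend, n3 * sizeof(double));
    if (err == cudaSuccess) err = cudaMalloc(&d_new, n3 * sizeof(double));
    if (err == cudaSuccess) err = cudaMalloc(&d_kmt, n2 * sizeof(int));
    if (err == cudaSuccess) err = cudaMemcpy(d_old, h_t_old, n3 * sizeof(double), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_tend, h_tend, n3 * sizeof(double), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_kmt, h_kmt, n2 * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const dim3 block(32, 8, 1);  // 256 threads per block; x extent = one warp
        const dim3 grid((imt + block.x - 1) / block.x, (jmt + block.y - 1) / block.y, km);
        update_tracer<<<grid, block>>>(d_old, d_tend, d_kmt, d_new, imt, jmt, km, dt);
        err = cudaGetLastError();    // catches launch-configuration errors
    }
    // The blocking device-to-host copy waits for the kernel and reports execution errors.
    if (err == cudaSuccess) err = cudaMemcpy(h_t_new, d_new, n3 * sizeof(double), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(d_old); cudaFree(d_tend); cudaFree(d_new); cudaFree(d_kmt);  // no-op on nullptr
    return err == cudaSuccess ? 0 : -1;
}
```

For self-containment, this wrapper copies every field to and from the device on each call. In a time-stepping model, a common way to cut that cost is to allocate device arrays once and keep them resident across time steps, copying back only for output or boundary exchange. This is one plausible reading of the data-transfer optimization mentioned in the abstract, not a description of the authors' code.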
