Frontiers of Data and Computing, 2020, Vol. 2, Issue (4): 92-104. doi: 10.11871/jfdc.issn.2096-742X.2020.04.008

• Technology and Application •

The Implementation and Optimization of LICOM on GPUs

Zhang Liuying1,2, Wang Pengfei3, Zhang Feng1,2, Liu Hailong3, Lin Pengfei3, Wang Tao1, Wei Junlin1,2, Tian Shaobo1,2, Jiang Jinrong1,*, Chi Xuebin1

    1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China
    3. Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China
  • Received: 2020-03-13 Online: 2020-08-20 Published: 2020-09-10
  • Contact: Jiang Jinrong E-mail: zhangliuying@cnic.cn; wpf@mail.iap.ac.cn; zhangfeng@cnic.cn; lhl@lasg.iap.ac.cn; linpf@mail.iap.ac.cn; wangtao@cnic.cn; weijunlin19@mails.ucas.edu.cn; tianshaobo@cnic.cn; jjr@sccas.cn; chi@sccas.cn

Abstract:

[Objective] To accelerate the LICOM oceanic circulation model and reduce the computational cost of high-resolution simulation, this paper designs and implements a GPU-accelerated version using CUDA C. [Methods] Based on LICOM3, the latest version of LICOM, this paper analyzes the parallel algorithm over ocean grid blocks and uses CUDA threads to compute the grid points in parallel, which allows the main program of LICOM to be ported to the GPU platform; data transfer and device memory usage are also optimized. [Results] Experiments show that the simulation results of the GPU version are essentially the same as those of the original CPU version, with speedups ranging from 9.31x to 1.27x on 2 to 16 NVIDIA K20 GPUs compared with the same number of Intel Xeon E5-2680 V2 CPUs. [Limitations] LICOM3 contains many synchronous boundary communications, which limit the scalability of the program; the scalability of the model needs to be improved through boundary-communication optimization and algorithm optimization. [Conclusions] This paper implements and optimizes a GPU version of LICOM3 that achieves a speedup and maintains good scalability, providing experience and reference for the future development of larger-scale oceanic circulation models.

Key words: GPU, CUDA, parallel computing, high performance computing, LICOM, oceanic circulation model