Frontiers of Data and Computing ›› 2023, Vol. 5 ›› Issue (6): 58-66.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.06.006

doi: 10.11871/jfdc.issn.2096-742X.2023.06.006

Performance Research of a Class of Stencil Applied in Many-Core NUMA Architecture

GAO Lingyun1,2, GOU Wenjin3, LIU Xiazhen1, YUAN Wu1,2,*, ZHANG Jian1,2, LU Zhonghua1,2

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Huawei Technologies Co. Ltd, Hangzhou, Zhejiang 310053, China
  • Received: 2022-07-11 Online: 2023-12-20 Published: 2023-12-25

Abstract:

[Application Background] Stencil computation is a typical kernel in scientific computing fields such as computational fluid dynamics (CFD), and its memory access performance has attracted wide attention. Thanks to its good scalability, the NUMA architecture is widely used in ARM-based processors, represented by the Kunpeng 920. [Methods] Performance analysis tools and benchmark programs are used to test the memory access and communication subsystems of the Kunpeng platform. Hotspot analysis and performance tests are carried out on CCFD V3.0, a typical stencil application, and a Roofline model is established. [Results] Owing to its many-core NUMA architecture, the Kunpeng 920 processor outperforms the Intel Xeon E5-2680v2 and another domestic processor in single-node floating-point performance, peak memory bandwidth, and communication latency. On a single node, CCFD V3.0 runs about 2 to 3 times as fast on the Kunpeng platform as on the Intel platform, and 1.5 to 2 times as fast as on the domestic processor. [Conclusions] Programs are easy to port to the ARM-based Kunpeng platform, and its NUMA architecture is advantageous for memory-intensive applications such as stencil computations.

Key words: Stencil, Kunpeng 920, performance evaluation, CFD
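
The Roofline model mentioned in the abstract bounds attainable performance by the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity. The following is a minimal sketch of that bound, not the model built in the paper: the function names, the 5-point stencil, and the machine figures are illustrative assumptions, and none of the numbers are measurements of the Kunpeng 920 or any other processor.

```python
def stencil_intensity(flops_per_point, bytes_per_point):
    """Arithmetic intensity (flop/byte) of a stencil update."""
    return flops_per_point / bytes_per_point


def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: min(compute peak, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


# Hypothetical 5-point double-precision stencil: 4 additions + 1 multiplication
# per grid point. Assuming ideal cache reuse, each point costs one 8-byte load
# and one 8-byte store from main memory, i.e. 16 bytes.
intensity = stencil_intensity(flops_per_point=5, bytes_per_point=16)

# Placeholder machine parameters (assumptions, not measured values).
bound = attainable_gflops(peak_gflops=1000.0, bandwidth_gbs=100.0,
                          intensity=intensity)

print(f"arithmetic intensity: {intensity} flop/byte")
print(f"attainable performance: {bound} GFLOP/s")
```

With these placeholder values the bound is set by memory bandwidth rather than by the compute peak, which is the usual situation for low-intensity stencil kernels and the reason memory bandwidth matters for this class of application.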