Frontiers of Data & Computing ›› 2023, Vol. 5 ›› Issue (6): 58-66.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.06.006

doi: 10.11871/jfdc.issn.2096-742X.2023.06.006


Performance Research of a Class of Stencil Applied in Many-Core NUMA Architecture

GAO Lingyun1,2, GOU Wenjin3, LIU Xiazhen1, YUAN Wu1,2,*, ZHANG Jian1,2, LU Zhonghua1,2

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Huawei Technologies Co. Ltd, Hangzhou, Zhejiang 310053, China
  • Received: 2022-07-11 Online: 2023-12-20 Published: 2023-12-25
  • Corresponding author: YUAN Wu (E-mail: yuanwu@sccas.cn)
  • About the authors:
    GAO Lingyun is a Ph.D. candidate at the Computer Network Information Center (CNIC), Chinese Academy of Sciences. His research focuses on high-performance computing and its applications.
    In this paper, he was mainly responsible for porting CCFD v3.0 to the Huawei Kunpeng platform and for testing and evaluating its memory access and communication performance.
    E-mail: gaolingyun@cnic.cn
    YUAN Wu, Ph.D., is an associate researcher at CNIC. His research covers computational fluid dynamics, overlapping grid techniques, and numerical simulation of unsteady flow problems.
    In this paper, he was mainly responsible for guiding the methods used to test and evaluate the memory access and communication performance of CCFD v3.0 on the Huawei Kunpeng platform.
    E-mail: yuanwu@sccas.cn
  • Funding:
    National Key Research and Development Program of China, project "Research and Development of a CAE Cloud Service Platform for Complex Equipment" (2020YFB1709500)

Abstract:

[Application Background] Stencil computation is a typical algorithm in scientific computing fields such as CFD (Computational Fluid Dynamics), and its memory access performance has attracted attention. Because it scales well, the NUMA architecture is widely adopted on ARM platforms, of which the Kunpeng 920 processor is a representative. [Methods] Performance analysis tools and benchmark programs are used to test the memory access and communication subsystems of the Kunpeng platform. Hot spot analysis and performance tests are carried out on CCFD V3.0, a typical stencil application, and a Roofline model is built for it. [Results] Thanks to its many-core NUMA architecture, the Kunpeng 920 processor outperforms the Intel Xeon E5-2680v2 and a domestic processor in single-node floating-point performance, peak memory bandwidth, and communication latency. On a single node, CCFD V3.0 runs about 2 to 3 times as fast on the Kunpeng platform as on the Intel platform, and 1.5 to 2 times as fast as on the domestic processor. [Conclusions] Applications are easy to port to the ARM-based Kunpeng platform, and its NUMA architecture is advantageous for memory-intensive applications such as stencil computation.

Key words: Stencil, Kunpeng 920, performance evaluation, CFD
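
To illustrate the two ideas the abstract relies on, the following is a minimal sketch of a stencil sweep and of the Roofline bound, not the code or the model parameters used in the paper. The 5-point averaging stencil, the function names, the per-point FLOP and byte counts, and the peak and bandwidth figures are all assumptions chosen for illustration; they are not measurements from the Kunpeng, Intel, or domestic platforms.

```python
def five_point_stencil(grid):
    """Apply one 5-point averaging sweep to the interior of a 2D grid.

    Hypothetical example kernel: each interior point becomes the mean of
    itself and its four neighbours. Boundary values are left unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so the sweep reads only old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = 0.2 * (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                               + grid[i][j - 1] + grid[i][j + 1])
    return out


def roofline_bound(peak_gflops, bandwidth_gbs, flops_per_point, bytes_per_point):
    """Return (arithmetic intensity in FLOP/byte, attainable GFLOP/s).

    Roofline model: attainable = min(peak compute, intensity * memory bandwidth).
    """
    intensity = flops_per_point / bytes_per_point
    return intensity, min(peak_gflops, intensity * bandwidth_gbs)


if __name__ == "__main__":
    grid = [[0.0, 0.0, 0.0],
            [0.0, 5.0, 0.0],
            [0.0, 0.0, 0.0]]
    print(five_point_stencil(grid)[1][1])

    # Assumed counts per grid point: 4 additions + 1 multiplication = 5 FLOPs;
    # one 8-byte read and one 8-byte write = 16 bytes, assuming neighbours
    # are served from cache. Peak and bandwidth are placeholder values.
    print(roofline_bound(peak_gflops=1000.0, bandwidth_gbs=100.0,
                         flops_per_point=5, bytes_per_point=16))
```

With these placeholder inputs the intensity is well below the ratio of peak compute to bandwidth, so the bound is set by memory bandwidth rather than by floating-point throughput. This is the sense in which stencil codes are described as memory-intensive.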