Frontiers of Data and Computing, 2024, Vol. 6, Issue (1): 12-20.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.01.002

doi: 10.11871/jfdc.issn.2096-742X.2024.01.002

• Special Topic: Supercomputing Internet and Applications •

Development of Performance Portable Solver Based on Kokkos Template Metaprogramming

ZHENG Liang1,*, LI Kunyun1, ZHOU Xingbin1, LI Yonghui1, YU Yaojie1, XIANG Yukai1, HU Jian1, CHAI Hua1, GUO Li1,2

  1. National Supercomputing Center in Chengdu, Chengdu, Sichuan 610213, China
  2. Chengdu Data Group, Chengdu, Sichuan 610041, China
  • Received: 2023-09-29 Online: 2024-02-20 Published: 2024-02-21
  • Corresponding author: * ZHENG Liang (E-mail: zhengl@cdcszx.cn / 23798906@qq.com)
  • About the author: ZHENG Liang is an associate research fellow at the National Supercomputing Center in Chengdu (Chengdu Supercomputing Center Operation Management Co., Ltd.), where he is deputy general manager and director of the Department of High Performance Computing. He has long been engaged in high performance computing research, and his main research interests include parallel computing, industrial software, and computational geodynamics.
    In this paper, he is responsible for writing the paper and for designing and developing ChipSum.
  • Funding:
    Ministry of Industry and Information Technology, Industrial Technology Foundation Public Service Platform program, Application Scenario Public Service Platform for Artificial Intelligence Innovation and Application Pilot Zones: Construction of the Chengdu Industrial Technology Foundation Public Service Platform for AI Application Development (CEIEC-2021-ZM02-0166); Sichuan Science and Technology Program: Research and Application of Job Management Technology for Industrial Software (2022YFG0040); Guanghe Fund (GHFUND202107014373)

Abstract:

[Objective] This paper addresses the problem of programming linear algebra solvers for diverse heterogeneous computing architectures. [Context] Supercomputer hardware architectures are becoming increasingly diverse. Because the software ecosystems of new heterogeneous architectures are still immature, porting software to them often involves a high development barrier and a long development cycle, and the same code must be adapted and ported again for each kind of hardware. [Methods] Building on the Kokkos algebraic operator library, we developed an open-source, performance portable template metaprogramming interface framework for programming linear algebra solvers in domestic (Chinese) exascale computing environments. [Results] The paper gives a simple programming demonstration of the framework with Krylov subspace algorithms and ports linear algebra solvers to a domestic heterogeneous processor; some of the solvers achieve speedups of several tens of times or more over a 10-core hyper-threaded Xeon CPU. [Conclusions] Performance portable programming can serve as a solution for diverse heterogeneous computing.

Key words: performance portability, Kokkos, template metaprogramming, linear algebra solver