[1] |
MESSINA P. The exascale computing project[J]. Computing in Science & Engineering, 2017, 19(3): 63-67.
|
[2] |
DUBEY A, MCINNES L C, THAKUR R, et al. Performance portability in the exascale computing project: exploration through a panel series[J]. Computing in Science & Engineering, 2021, 23(5): 46-54.
|
[3] |
DEAKIN T, MCINTOSH-SMITH S, PRICE J, et al. Performance portability across diverse computer architectures[C]// 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC). IEEE, 2019: 1-13.
|
[4] |
EDWARDS H C, TROTT C R, SUNDERLAND D. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns[J]. Journal of Parallel and Distributed Computing, 2014, 74(12): 3202-3216.
doi: 10.1016/j.jpdc.2014.07.003
|
[5] |
EDWARDS H C, TROTT C R. Kokkos: Enabling performance portability across manycore architectures[C]// 2013 Extreme Scaling Workshop (XSW 2013). IEEE, 2013: 18-24.
|
[6] |
BECKINGSALE D A, BURMARK J, HORNUNG R, et al. RAJA: Portable performance for large-scale scientific applications[C]// 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC). IEEE, 2019: 71-81.
|
[7] |
REGULY I Z. Evaluating the performance portability of SYCL across CPUs and GPUs on bandwidth-bound applications[J]. arXiv preprint arXiv:2309.10075, 2023.
|
[8] |
BARATTA I, RICHARDSON C, WELLS G. Performance analysis of matrix-free conjugate gradient kernels using SYCL[C]// International Workshop on OpenCL. 2022: 1-10.
|
[9] |
RUL S, VANDIERENDONCK H, D'HAENE J, et al. An experimental study on performance portability of OpenCL kernels[C]// 2010 Symposium on Application Accelerators in High Performance Computing (SAAHPC'10), 2010.
|
[10] |
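TAN G M. Performance engineering problems in high performance computing[J]. Journal on Numerical Methods and Computer Applications, 2022, 43(4): 343. (in Chinese)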
doi: 10.12288/szjs.s2022-0842
|
[11] |
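LI W, WEN Y B, SUN G Z, et al. A domain-specific language for improving the performance portability of high performance computing programs[J]. Chinese High Technology Letters, 2020, 30(2): 141-149. (in Chinese)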
|
[12] |
HUANG X, HUANG X, WANG D, et al. OpenArray v1.0: a simple operator library for the decoupling of ocean modeling and parallel computing[J]. Geoscientific Model Development, 2019, 12(11): 4729-4749.
doi: 10.5194/gmd-12-4729-2019
|
[13] |
XU M, LI H, SONG Z, et al. An Electric-Thermal-Solid Physical Fields Coupling Calculation Based on FELAC Platform[C]// 2022 IEEE 5th International Conference on Electronics Technology (ICET). IEEE, 2022: 532-536.
|
[14] |
ABRAHAMS D, GURTOVOY A. C++ template metaprogramming: concepts, tools, and techniques from Boost and beyond[M]. New York: Pearson Education, 2004.
|
[15] |
TROTT C R, HAMMOND S D, MOORE S G, et al. Sustainability and Performance through Kokkos: A Case Study with LAMMPS[R]. Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), 2016.
|
[16] |
MOORE S G. LAMMPS Kokkos Package--Targeting Next-Generation HPC Platforms[R]. Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), 2016.
|
[17] |
EDWARDS H C, TROTT C R, SUNDERLAND D. Kokkos Tutorial: A Trilinos package for manycore performance portability[R]. Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), 2013.
|
[18] |
MILLS R T, ADAMS M F, BALAY S, et al. Toward performance-portable PETSc for GPU-based exascale systems[J]. Parallel Computing, 2021, 108: 102831.
doi: 10.1016/j.parco.2021.102831
|
[19] |
CIESKO J, POLIAKOFF D, HOLLMAN D S, et al. Towards generic parallel programming in computer science education with Kokkos[C]// 2020 IEEE/ACM Workshop on Education for High-Performance Computing (EduHPC). IEEE, 2020: 35-42.
|
[20] |
TROTT C, BERGER-VERGIAT L, POLIAKOFF D, et al. The Kokkos ecosystem: Comprehensive performance portability for high performance computing[J]. Computing in Science & Engineering, 2021, 23(5): 10-18.
|
[21] |
LIU Y, SCHMIDT B. LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs[C]// 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP). IEEE, 2015: 82-89.
|
[22] |
LANGR D, TVRDIK P. Evaluation criteria for sparse matrix storage formats[J]. IEEE Transactions on Parallel and Distributed Systems, 2015, 27(2): 428-440.
doi: 10.1109/TPDS.2015.2401575
|