[1] |
RAWAT P S, VAIDYA M, SUKUMARAN-RAJAM A, et al. Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations[J]. Proceedings of the IEEE, 2018, 106(11): 1902-1920.
doi: 10.1109/JPROC.2018.2862896
[2] |
LI K, YUAN L, ZHANG Y, et al. Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations[C]. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021: 1-15.
[3] |
YUAN L, CAO H, ZHANG Y, et al. Temporal Vectorization for Stencils[C]. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2020: 1-13.
[4] |
YOSHIDA T. Fujitsu high performance CPU for the Post-K Computer[C]. Hot Chips 2018, 30(1): 1-22.
[5] |
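WEI J, ZHANG X J, JI Z Y, et al. Performance evaluation and tuning of distributed parallel deep neural networks on the Tianhe-3 prototype[J]. Computer Engineering & Science, 2021, 43(5): 782-791. (in Chinese)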
[6] |
WIKICHIP. TaiShan v110-Microarchitectures-HiSilicon[EB/OL].[2019-05-02]. https://en.wikichip.org/wiki/hisilicon/microarchitectures/taishan_v110.
[7] |
MCCALPIN J D. Memory bandwidth and machine balance in current high performance computers[J]. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, 1995, 2: 19-25.
[8] |
MVAPICH. MPI over InfiniBand, Omni-Path, Ethernet/iWARP, and RoCE - Network Based Computing Laboratory, OSU Micro-Benchmarks[EB/OL]. [2022-02-03]. http://mvapich.cse.ohio-state.edu/benchmarks/.
[9] |
WEAVER V M. Self-monitoring overhead of the Linux perf_event performance counter interface[C]. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2015: 102-111.
[10] |
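LIU X Z, MA W P, ZHANG J, et al. Design and implementation of load balancing for large-scale parallel computing on multi-block structured grids[J]. E-Science Technology & Application, 2013, 4(5): 18-25. (in Chinese)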
[11] |
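LIU X Z. Design of the parallel flow-field software CCFDv3.0 and its implementation on domestic heterogeneous platforms[D]. Beijing: University of Chinese Academy of Sciences, 2021. (in Chinese)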
[12] |
YAMAMOTO Y, KAI T, HOZUMI K. Numerical rebuilding of aerothermal environments and CFD analysis of post flight wind tunnel tests for hypersonic flight experiment HYFLEX[C]. AIAA Thermophysics Conference, 2001, 2899: 1-16.
[13] |
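GONG C Y, BAO W M, TANG G J, et al. Parallel LU-SGS time-advancing algorithm for CFD on two-dimensional structured grids[J]. Journal of Frontiers of Computer Science and Technology, 2013, 7(10): 936-943. (in Chinese)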
[14] |
WILLIAMS S, WATERMAN A, PATTERSON D. Roofline: an insightful visual performance model for multicore architectures[J]. Communications of the ACM, 2009, 52(4): 65-76.
doi: 10.1145/1498765.1498785
[15] |
BROWNE S, DONGARRA J, GARNER N, et al. A portable programming interface for performance evaluation on modern processors[J]. The International Journal of High Performance Computing Applications, 2000, 14(3): 189-204.
doi: 10.1177/109434200001400303