Parallel Optimization of CFD Core Algorithms Based on Domestic Processor
CAO Yikui, LU Zhonghua, ZHANG Jian, LIU Xiazhen, YUAN Wu, LIANG Shan
Table 1 Running time of ADI_kernel1_i and ADI_kernel5_i under different thread block sizes
Block size    ADI_kernel1_i    ADI_kernel5_i
32*4*1        2.86 s           1.84 s
32*8*1        2.26 s           1.58 s
32*4*2        2.31 s           1.69 s
32*4*4        2.64 s           1.18 s
32*8*2        2.16 s           1.25 s
32*8*4        2.26 s           1.04 s