[1] |
CHI Xuebin, WANG Yangang, WANG Jue, et al. Parallel Computing and Implementation Technology (并行计算与实现技术)[M]. Beijing: Science Press, 2015: 7.
|
[2] |
MARCO Z, BROND L, STEVE T, et al. Performance Analysis Using the MIPS R10000 Performance Counters[C]. Proceedings of the 1996 ACM/IEEE Conference on Supercomputing (CDROM), 1996.
|
[3] |
BROWNE S, DONGARRA J, GARNER N, et al. A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters[C]. ACM/IEEE SC 2000 Conference, 2000.
|
[4] |
RUDOLPH B, ZIEGLER H. PCL - The Performance Counter Library: A Common Interface to Access Hardware Performance Counters on Microprocessors[Z]. Germany: Research Centre Juelich GmbH, 1993.
|
[5] |
LUC S. System and Kernel Thread Performance Monitor API Reference Guide[R]. IBM RS/6000 Division, 1999.
|
[6] |
GERNDT M, KEREKU E. Periscope: Advanced Techniques for Performance Analysis[C]. Parallel Computing: Current & Future Issues of High-End Computing, Proceedings of the International Conference ParCo 2005, 2005.
|
[7] |
FAHRINGER T, GERNDT M. Specification of Performance Problems in MPI-Programs with ASL[C]. International Conference on Parallel Processing (ICPP’00), 2000.
|
[8] |
JOSEPH E, WILLARD C. A new HPC technical computing benchmark: the IDC balanced rating[C]. IDC Bulletin W, 2000.
|
[9] |
BKDG. A Programming Based Performance Counter Statistical Method for Modern High end Microprocessors[EB/OL]. [2010-04-12]. http://support.amd.com/us/Processor_TechDocs.
|
[10] |
TALLENT N, MELLOR J, ADHIANTO L, et al. HPCToolkit: Performance Tools for Scientific Computing[C]. Journal of Physics: Conference Series, 2008.
|
[11] |
Open SpeedShop. Performance measurement tools[EB/OL]. [2010-04-12]. http://www.openspeedshop.org/wp/.
|
[12] |
PAPI. Performance measurement tools[EB/OL]. [2010-04-12]. http://icl.cs.utk.edu/papi/.
|
[13] |
SHENDE S, MALONY A. The TAU Parallel Performance System[J]. International Journal of High Performance Computing Applications, 2006, 20(2): 287-311.
|
[14] |
MARTIN B, KIM B D, JEFF D, et al. PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications[C]. 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
|
[15] |
GERNDT M, MOHR B, LARSSON J. Evaluating OpenMP Performance Analysis Tools with the APART Test Suite[C]. Fifth European Workshop on OpenMP (EWOMP’03), 2003.
|
[16] |
GEIMER M, WOLF F, WYLIE B J, et al. The Scalasca Performance Toolset Architecture[J]. Concurrency and Computation: Practice and Experience, 2010, 22(6): 702-719.
|
[17] |
KNUPFER A, BRUNST H, DOLESCHAL J, et al. The Vampir Performance Analysis Tool Set[C]. Tools for High Performance Computing. Berlin: Springer, 2008: 139-155.
|
[18] |
SHENDE S S, MALONY A D. The TAU Parallel Performance System[J]. International Journal of High Performance Computing Applications, 2006, 20(2): 287-311.
|
[19] |
DIETER M, SCOTT B. Score-P: A Unified Performance Measurement System for Petascale Applications[C]. Competence in High Performance Computing, 2010.
|
[20] |
DONGARRA J, GANNON D, FOX G, et al. The Impact of Multicore on Computational Science Software[J]. ResearchGate, 2007, 3: 3-10.
|
[21] |
JEFF D, MARTIN B. Evaluation and Optimization of Multicore Performance Bottlenecks in Supercomputing Applications[C]. IEEE International Symposium on Performance Analysis of Systems and Software, 2011.
|
[22] |
CHEN D, SINGH D. Fractal Video Compression in OpenCL: An Evaluation of CPUs, GPUs, and FPGAs as Acceleration Platforms[C]. Asia and South Pacific Design Automation Conference (ASP-DAC), 2013.
|
[23] |
BAKHODA A, YUAN G. Analyzing CUDA Workloads Using a Detailed GPU Simulator[C]. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2009.
|
[24] |
BOR K. K, SU B Y. clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs[C]. ACM International Conference on Supercomputing (ICS), 2012.
|
[25] |
JOHN C, SETH K, et al. Performance Analysis Framework for High-Level Language Applications in Reconfigurable Computing[C]. ACM Trans. Reconfigurable Technology, 2010.
|
[26] |
SILVA B, BRAEKEN A. Performance Modeling for FPGAs: Extending the Roofline Model with High-Level Synthesis Tools[C]. International Journal of Reconfigurable Computing (IJRC), 2013.
|
[27] |
ALTERA. Implementing FPGA Design with the OpenCL Standard[R]. Altera Whitepaper, 2011.
|
[28] |
SUBIN B, AJEESH S. Automated Performance Modeling of HPC Applications Using Machine Learning[C]. IEEE Transactions on Computers, 2020.
|
[29] |
OZAN T, EMRE A. Diagnosing Performance Variations in HPC Applications Using Machine Learning[C]. Lecture Notes in Computer Science, 2017.
|
[30] |
JONATHAN R, JACK D. Timemory: Modular Performance Analysis for HPC[C]. Lecture Notes in Computer Science, 2020.
|
[31] |
ZHOU K, MARK W K. Tools for top-down performance analysis of GPU-accelerated applications[C]. Proceedings of the 34th ACM International Conference on Supercomputing, 2020.
|
[32] |
LARISSA S, MARCIN C. Performance-Detective: Automatic Deduction of Cheap and Accurate Performance Models[C]. International Conference on Supercomputing 2022, 2022.
|
[33] |
FELIPE T, ALVARO W. Scalable Performance Analysis Method for SPMD Applications[J]. The Journal of Supercomputing, 2022, 78: 19346-19371.
|