[1] |
NVIDIA Corporation. NVIDIA CUDA Programming Guide (in Chinese)[EB/OL]. [2021/11/04]. https://www.nvidia.cn/docs/IO/51635/NVIDIA_CUDA_Programming_Guide_1.1_chs.pdf.
|
[2] |
Munshi A. The OpenCL specification[C]. 2009 IEEE Hot Chips 21 Symposium (HCS), IEEE, 2009: 1-314.
|
[3] |
Abadi M, Barham P, Chen J, et al. TensorFlow: A System for Large-Scale Machine Learning[C]. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016: 265-283.
|
[4] |
Abadi M, Agarwal A, Barham P, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems[J]. arXiv preprint arXiv:1603.04467, 2016.
|
[5] |
Sanders J, Kandrot E. CUDA by example: an introduction to general-purpose GPU programming[M]. Addison-Wesley Professional, 2010: 14-19.
|
[6] |
NVIDIA Corporation. CUDA Toolkit | NVIDIA Developer[EB/OL]. [2021/11/04]. https://developer.nvidia.cn/zh-cn/cuda-toolkit.
|
[7] |
The Khronos® Group Inc. OpenCL Overview - The Khronos Group Inc[EB/OL]. [2021/11/04]. https://www.khronos.org/opencl/.
|
[8] |
Perkins H. CUDA-on-CL: a compiler and runtime for running NVIDIA® CUDA™ C++ 11 applications on OpenCL™ 1.2 Devices[C]// Proceedings of the 5th International Workshop on OpenCL, 2017: 1-4.
|
[9] |
hughperkins. tf-coriander - OpenCL 1.2 implementation for Tensorflow[EB/OL]. [2021/11/04]. https://github.com/hughperkins/tf-coriander.
|
[10] |
The Khronos® Group Inc. SYCL Overview - The Khronos Group Inc[EB/OL]. [2021/11/04]. https://www.khronos.org/sycl/.
|
[11] |
Goli M, Iwanski L, Richards A. Accelerated machine learning using TensorFlow and SYCL on OpenCL Devices[C]// Proceedings of the 5th International Workshop on OpenCL, 2017: 1-4.
|
[12] |
Goli M, Iwanski L, Lawson J, et al. OpenCL Acceleration for TensorFlow[J]. arXiv preprint arXiv:1605.02688, 2018: 1-3.
|
[13] |
Codeplay Developer. Home - ComputeCpp CE - Products[EB/OL]. [2021/11/04]. https://developer.codeplay.com/products/computecpp/ce/home.
|
[14] |
The Khronos Group Inc. SPIR Overview[EB/OL]. [2021/11/04]. https://www.khronos.org/spir/.
|
[15] |
NVIDIA Corporation. An Easy Introduction to CUDA C and C++[EB/OL]. [2021/11/04]. https://developer.nvidia.com/blog/easy-introduction-cuda-c-and-c/.
|
[16] |
Kondratyuk N, Nikolskiy V, Pavlov D, et al. GPU-accelerated molecular dynamics: State-of-art software performance and porting from Nvidia CUDA to AMD HIP[J]. The International Journal of High Performance Computing Applications, 2021, 35(4): 312-324.
|
[17] |
Keryell R, Reyes R, Howes L. Khronos SYCL for OpenCL: a tutorial[C]. Proceedings of the 3rd International Workshop on OpenCL, 2015: 1-1.
|
[18] |
TensorFlow. Create an op | TensorFlow Core[EB/OL]. [2021/11/04]. https://www.tensorflow.org/guide/create_op.
|
[19] |
KnuEdge. Constructing a fake device in TensorFlow[EB/OL]. [2021/11/04]. https://github.com/knuedge/tensorflow/blob/36e0cdf04f294bfd51931d4f78e291590ed0d3ec/tensorflow/g3doc/hardware/adding_support/fake_device.md.
|
[20] |
Martin York. C++ singleton design pattern[EB/OL]. [2021/11/04]. https://stackoverflow.com/questions/1008019/c-singleton-design-pattern.
|
[21] |
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
|
[22] |
He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]. Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 770-778.
|
[23] |
YunYang1994. TensorFlow2.0-Examples - Difficult algorithm, Simple code[EB/OL]. [2021/11/04]. https://github.com/YunYang1994/TensorFlow2.0-Examples.
|
[24] |
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
|
[25] |
He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 770-778.
|