Design and Implementation of Testing and Verification Method for OpenCL Kernels in TensorFlow

doi:10.11871/jfdc.issn.2096-742X.2022.02.002

Frontiers of Data and Computing, 2022, Vol. 4, Issue (2): 17-28.


Special Issue: Advanced Intelligent Computing Platforms and Applications


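CHEN Yuqiao, SUN Yufei*, CHENG Daguo, ZHANG Yuzhi, ZHOU Jianyu, SUI Yicheng, SHI Changqing

College of Software, Nankai University, Tianjin 300350, China

Received: 2022-02-16 Online: 2022-04-20 Published: 2022-04-30
Corresponding author: SUN Yufei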
About the authors:
CHEN Yuqiao is a master's student in the College of Software at Nankai University, Tianjin, China. His research interests include deep learning framework porting and high-performance computing. In this paper, he is mainly responsible for designing the structure of the paper and for participating in the design and implementation of the method.
E-mail: ujoenk@mail.nankai.edu.cn
SUN Yufei, Ph.D., is a professor at the College of Software, Nankai University. Her research interests include deep learning, heterogeneous computing, and artificial intelligence. In this paper, she is mainly responsible for guiding and revising the paper.
E-mail: yufei_sun@sina.com
CHENG Daguo is a master's student in the College of Software at Nankai University. His research interests include deep learning and high-performance computing. In this paper, he is mainly responsible for participating in the design and implementation of the method.
E-mail: chengdaguo@mail.nankai.edu.cn
ZHANG Yuzhi is a chair professor and the Dean of the College of Software at Nankai University. His research interests include artificial intelligence, pattern recognition, and natural language processing. In this paper, he is mainly responsible for the literature survey and for guidance.
E-mail: zyz@nankai.edu.cn
ZHOU Jianyu, Ph.D., is a lecturer at the College of Software, Nankai University. His research interests include algorithm design and optimization and statistical machine learning. In this paper, he is mainly responsible for the literature survey and for revising the paper.
E-mail: jyzhou@nankai.edu.cn
SUI Yicheng is a Ph.D. student in the College of Software at Nankai University. His research interests include artificial intelligence. In this paper, he is mainly responsible for participating in the design and implementation of the method.
E-mail: suiyicheng@mail.nankai.edu.cn
SHI Changqing is a master's student in the College of Software at Nankai University. His research interests include deep learning and high-performance computing. In this paper, he is mainly responsible for participating in the design and implementation of the method.
E-mail: shichangqing@mail.nankai.edu.cn
Funding:
National Key Research and Development Program of China (2021YFB0300104)


Abstract

[Objective] TensorFlow is the most representative deep learning framework in the field of artificial intelligence. Domestic accelerator devices need a version of TensorFlow with OpenCL support to deliver their acceleration performance, so the CUDA code in TensorFlow must be converted to OpenCL. Verifying the correctness of the resulting OpenCL kernels is an important problem in this development work. [Methods] Based on TensorFlow custom ops built as dynamic link libraries and on the raw_ops test interface, this paper proposes a testing scheme for OpenCL kernels that comprises source code design rules for custom ops, test code rules, code review methods, and a test process. [Results] This paper reviews and tests the code of 135 OpenCL kernels, runs comparative tests across various data types and multiple data scales, verifies the correctness of the OpenCL kernels, and compares their performance with that of the CUDA kernels. [Conclusions] This paper provides a reliable and effective solution for testing OpenCL kernels in TensorFlow.

Keywords: TensorFlow, CUDA, OpenCL, code review, code testing

Cite this article:

CHEN Yuqiao, SUN Yufei, CHENG Daguo, ZHANG Yuzhi, ZHOU Jianyu, SUI Yicheng, SHI Changqing. Design and Implementation of Testing and Verification Method for OpenCL Kernels in TensorFlow[J]. Frontiers of Data and Computing, 2022, 4(2): 17-28.


References (20)

[1]	Abadi M, Agarwal A, Barham P, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems[J]. arXiv preprint arXiv:1603.04467, 2016.
[2]	CUDA Nvidia. Nvidia CUDA C programming guide[J]. Nvidia Corporation, 2011, 120(18): 8-245.
[3]	Zhu H, Hall P A V, May J H R. Software unit test coverage and adequacy[J]. ACM Computing Surveys (CSUR), 1997, 29(4): 366-427. doi: 10.1145/267580.267590
[4]	Mcnerney P J. Your First Bazel Project[M]. Beginning Bazel, Apress, Berkeley, CA, 2019: 23-41.
[5]	TensorFlow. TensorFlow testing best practices[EB/OL]. [2022-01-08]. https://tensorflow.google.cn/community/contribute/tests.
[6]	TensorFlow. Create an op | TensorFlow Core[EB/OL]. [2022-01-08]. https://www.tensorflow.org/guide/create_op.
[7]	TensorFlow. Module: tf.raw_ops | TensorFlow Core v2.2.0[EB/OL]. [2022-01-08]. https://www.tensorflow.org/versions/r2.2/api_docs/python/tf/raw_ops.
[8]	TensorFlow. Running unit tests[EB/OL]. [2022-01-08]. https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md#running-unit-tests.
[9]	Acharya S, Pandya V. Bridge between black box and white box - gray box testing technique[J]. International Journal of Electronics and Computer Science Engineering, 2012, 2(1): 175-185.
[10]	Munshi A. The OpenCL specification[C]// 2009 IEEE Hot Chips 21 Symposium (HCS), IEEE, 2009: 1-314.
[11]	Wai Yip Tung. HTMLTestRunner introduction[EB/OL]. [2022-01-08]. http://tungwaiyip.info/software/HTMLTestRunner.html.
[12]	Python Software Foundation. unittest introduction[EB/OL]. [2022-01-08]. https://docs.python.org/zh-cn/3.7/library/unittest.html.
[13]	Hosmer Jr D W, Lemeshow S, Sturdivant R X. Applied logistic regression[M]. John Wiley & Sons, 2013: 1-8.
[14]	Taud H, Mas J F. Multilayer perceptron (MLP)[M]// Geomatic approaches for modeling land change scenarios, Springer, Cham, 2018: 451-455.
[15]	Albawi S, Mohammed T A, Al-Zawi S. Understanding of a convolutional neural network[C]// 2017 International Conference on Engineering and Technology (ICET), IEEE, 2017: 1-6.
[16]	Baldi P. Autoencoders, unsupervised learning, and deep architectures[C]// Proceedings of ICML Workshop on Unsupervised and Transfer Learning. JMLR Workshop and Conference Proceedings, 2012: 37-49.
[17]	He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[18]	Redmon J, Farhadi A. YOLOv3: An incremental improvement[J]. arXiv preprint arXiv:1804.02767, 2018.
[19]	Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(6): 1137-1149. doi: 10.1109/TPAMI.2016.2577031
[20]	Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation[C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015: 234-241.

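Data type	Comparison function	Description
Integer	assertAllEqual()	Checks that the results are equal at every corresponding position
Floating point	assertAllClose()	Checks that the error at every corresponding position is less than 10^-5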
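
The comparison rule in the table above can be sketched in plain Python, without TensorFlow. This is only an illustrative sketch and not the authors' test code: `all_equal` and `all_close` are hypothetical stand-ins for the `assertAllEqual()` and `assertAllClose()` checks, `reference_add` and `candidate_add` are hypothetical stand-ins for a trusted result and a kernel under test, and treating the 10^-5 threshold as an absolute per-element error is an assumption. The `sweep` helper loosely mirrors the idea of testing across several data types and data scales; its sizes are arbitrary.

```python
import random

FLOAT_TOLERANCE = 1e-5  # threshold from the table; absolute error is an assumption


def all_equal(actual, expected):
    """Integer rule: every corresponding position must match exactly."""
    return len(actual) == len(expected) and all(
        a == e for a, e in zip(actual, expected)
    )


def all_close(actual, expected, tol=FLOAT_TOLERANCE):
    """Floating-point rule: every corresponding position differs by less than tol."""
    return len(actual) == len(expected) and all(
        abs(a - e) < tol for a, e in zip(actual, expected)
    )


def results_match(actual, expected, kind):
    """Pick the comparison by data type, as the table does."""
    if kind == "int":
        return all_equal(actual, expected)
    if kind == "float":
        return all_close(actual, expected)
    raise ValueError(f"unsupported kind: {kind}")


def make_inputs(kind, size, seed):
    """Generate reproducible test data of the requested type and size."""
    rng = random.Random(seed)
    if kind == "int":
        return [rng.randint(-1000, 1000) for _ in range(size)]
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


def reference_add(xs, ys):
    """Hypothetical trusted implementation (stand-in for a reference result)."""
    return [x + y for x, y in zip(xs, ys)]


def candidate_add(xs, ys):
    """Hypothetical implementation under test (stand-in for a ported kernel)."""
    return [y + x for x, y in zip(xs, ys)]


def sweep(reference, candidate, kinds=("int", "float"), sizes=(1, 16, 1024)):
    """Compare candidate against reference over all (type, size) pairs.

    Returns the list of (kind, size) combinations that failed.
    """
    failures = []
    for kind in kinds:
        for size in sizes:
            xs = make_inputs(kind, size, seed=size)
            ys = make_inputs(kind, size, seed=size + 1)
            if not results_match(candidate(xs, ys), reference(xs, ys), kind):
                failures.append((kind, size))
    return failures


if __name__ == "__main__":
    print("failed cases:", sweep(reference_add, candidate_add))
```

Returning the failing (type, size) pairs rather than stopping at the first mismatch makes it easier to see whether a defect is tied to one data type or one input scale.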

Hardware/software item	Environment
CPU	Intel(R) Xeon(R) Gold 5218 CPU @ 2.30 GHz
RAM	187 GB DDR4 2933 MT/s
GPU	NVIDIA Tesla V100S
NVIDIA CUDA Toolkit	CUDA 10.2
OpenCL	OpenCL 1.2
Host compiler	GCC 7.5
TensorFlow	TensorFlow-GPU r2.2

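Model or task	Description	Correct?
Logistic Regression	Logistic regression analysis, one of the basic machine learning models; it can estimate the probability that an event occurs and can also analyze which factors influence a problem^[13].	Yes
Multilayer Perceptron	A feedforward artificial neural network that maps a set of input vectors to a set of output vectors^[14].	Yes
CNN	Convolutional neural network, a class of deep feedforward neural networks that include convolution computations; one of the representative algorithms of deep learning^[15].	Yes
AutoEncoder	An unsupervised learning algorithm used mainly for dimensionality reduction or feature extraction^[16].	Yes
ResNet	Deep residual network, a classic neural network that serves as the backbone of many computer vision tasks^[17].	Yes
YOLOv3	A model with very fast detection speed^[18].	Yes
RPN	Region Proposal Network, used to extract candidate object boxes from an input image^[19].	Yes
UNet	An application of convolutional networks to biomedical image segmentation^[20].	Yes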
