Implementation and Integration of OpenCL Operators in TensorFlow Framework

doi:10.11871/jfdc.issn.2096-742X.2022.02.001

Frontiers of Data and Computing, 2022, Vol. 4, Issue (2): 3-16.

Special Issue: Advanced Intelligent Computing Platforms and Applications

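GUO Qiang, CHENG Daguo, SUN Yufei*, ZHOU Jianyu, ZHANG Yuzhi, PEI Jiaao, GAN Rundong, CHEN Rui

College of Software, Nankai University, Tianjin 300450, China

Received: 2022-02-23  Online: 2022-04-20  Published: 2022-04-30
Corresponding author: SUN Yufei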
About the authors:
GUO Qiang is a master's student in the College of Software, Nankai University. His research interests include deep learning and high-performance computing. In this paper, he is responsible for the integration of the OpenCL operators, the model-related experiments, and the writing of the abstract, introduction, and background sections. E-mail: guoqiang701@mail.nankai.edu.cn
CHENG Daguo is a master's student in the College of Software, Nankai University. His research interests include deep learning and high-performance computing. In this paper, he is responsible for the implementation and integration of some of the OpenCL operators and for writing the experimental analysis. E-mail: chengdaguo@mail.nankai.edu.cn
SUN Yufei, Ph.D., is a professor in the College of Software, Nankai University. Her research interests include deep learning, heterogeneous computing, and artificial intelligence. In this paper, she is responsible for the overall design, revision, and guidance. E-mail: yufei_sun@sina.com
ZHOU Jianyu, Ph.D., is a lecturer in the College of Software, Nankai University. His research interests include algorithm design and optimization and statistical machine learning. In this paper, he is mainly responsible for revision and guidance. E-mail: jyzhou@nankai.edu.cn
ZHANG Yuzhi is a chair professor and the Dean of the College of Software, Nankai University. His research interests include artificial intelligence, pattern recognition, and natural language processing. In this paper, he is mainly responsible for the overall design. E-mail: zyz@nankai.edu.cn
PEI Jiaao is a master's student in the College of Software, Nankai University. His research interests include software porting and applications of enterprise blockchain. In this paper, he is mainly responsible for writing the experimental part. E-mail: peijiaao@mail.nankai.edu.cn
GAN Rundong is a master's student in the College of Software, Nankai University. His main research direction is scene recognition based on deep learning. In this paper, he is mainly responsible for writing the parts on kernel functions and operator composition based on the OpenCL parallel computing framework. E-mail: raineast666@163.com
CHEN Rui is a Ph.D. student in the College of Software, Nankai University. His research interests include deep learning and high-performance computing. In this paper, he is mainly responsible for the design of the experiments. E-mail: rzchen@mail.nankai.edu.cn
Funding:
National Key R&D Program of China (2021YFB0300104)

Abstract:

[Objective] TensorFlow, a mainstream machine learning framework, is widely used together with the CUDA heterogeneous programming environment in academia and industry, and TensorFlow operators implemented in CUDA are the key to accelerating computation. However, TensorFlow does not support OpenCL, an open, general-purpose heterogeneous programming standard. This severely limits the versatility of TensorFlow and prevents the computing power of OpenCL hardware devices from being fully used. [Methods] To address this issue, this paper explores the underlying implementation of TensorFlow, implements OpenCL operators based on an in-depth analysis of the TensorFlow code structure, and integrates them into version 2.2.0 of the TensorFlow framework. [Results] With this implementation, TensorFlow can run on hardware devices that support OpenCL 1.2 by means of the OpenCL operators. In addition, the optimization methods proposed in this paper significantly improve the computational efficiency of the OpenCL operators. [Conclusions] Experiments show that the proposed method effectively solves the problem that TensorFlow cannot be used on OpenCL hardware devices.

Key words: TensorFlow, OpenCL, Operator
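
The integration described in the abstract comes down to making an operator available on an additional device type. The toy registry below is only a sketch of that idea, not the paper's code or TensorFlow's API: the names `register_kernel`, `run_op`, and the device label "OPENCL" are hypothetical, and the "OpenCL" kernel is computed on the host as a stand-in for a real device launch.

```python
from typing import Callable, Dict, List, Tuple

Kernel = Callable[[List[float], List[float]], List[float]]

# Hypothetical registry keyed by (operator name, device type).
_KERNELS: Dict[Tuple[str, str], Kernel] = {}


def register_kernel(op_name: str, device: str) -> Callable[[Kernel], Kernel]:
    """Return a decorator that registers a kernel for one op on one device."""
    def decorator(fn: Kernel) -> Kernel:
        _KERNELS[(op_name, device)] = fn
        return fn
    return decorator


@register_kernel("Add", "CPU")
def add_cpu(a: List[float], b: List[float]) -> List[float]:
    return [x + y for x, y in zip(a, b)]


@register_kernel("Add", "OPENCL")
def add_opencl(a: List[float], b: List[float]) -> List[float]:
    # Stand-in for enqueueing an OpenCL kernel; computed on the host here.
    return [x + y for x, y in zip(a, b)]


def run_op(op_name: str, device: str, a: List[float], b: List[float]) -> List[float]:
    """Look up the kernel registered for (op_name, device) and run it."""
    kernel = _KERNELS.get((op_name, device))
    if kernel is None:
        raise NotImplementedError(f"no {device} kernel registered for op {op_name}")
    return kernel(a, b)
```

In this sketch, an operator without an "OPENCL" entry simply cannot run on that device, which mirrors the limitation the abstract sets out to remove.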

GUO Qiang, CHENG Daguo, SUN Yufei, ZHOU Jianyu, ZHANG Yuzhi, PEI Jiaao, GAN Rundong, CHEN Rui. Implementation and Integration of OpenCL Operators in TensorFlow Framework[J]. Frontiers of Data and Computing, 2022, 4(2): 3-16.

Hardware/software	Configuration
CPU	Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
RAM	187GB DDR4 2933 MT/s
GPU	NVIDIA Tesla V100S
NVIDIA CUDA Toolkit	CUDA-10.2
OpenCL	OpenCL 1.2
Host compiler	GCC 7.5

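Operator	Number of kernels	Function
BiasAdd	2	Adds the bias term bias to value
BiasAddGrad	4	The backward operation of BiasAdd on the bias tensor
BatchToSpace	1	BatchToSpace for 4-D tensors of type T
Concat	2	Concatenates two tensors in a specified manner
DepthToSpace	3	Rearranges data from depth into blocks of spatial data
DynamicStitch	1	Interleaves the values from the data tensors into a single tensor
Resize_bilinear	3	Computes bilinear interpolation
SplitV	2	Splits a tensor into multiple tensors along one dimension
SpaceToBatch	1	SpaceToBatch for 4-D tensors of type T
Tile	1	Constructs a tensor by tiling a given tensor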
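
To make the BiasAdd and BiasAddGrad rows concrete, the following host-side reference shows what the two operators compute. It is a minimal sketch and not the paper's OpenCL kernels: it assumes a flat, channel-last buffer, and each loop iteration stands in for one work-item whose global id is `gid`. The function names are hypothetical.

```python
from typing import List


def bias_add(value: List[float], bias: List[float]) -> List[float]:
    """Add bias[c] to every element of a flat, channel-last buffer."""
    channels = len(bias)
    if channels == 0 or len(value) % channels != 0:
        raise ValueError("value length must be a multiple of the channel count")
    # With a channel-last layout, element gid belongs to channel gid % channels.
    return [v + bias[gid % channels] for gid, v in enumerate(value)]


def bias_add_grad(out_grad: List[float], channels: int) -> List[float]:
    """Sum the incoming gradient over every dimension except the channel one."""
    if channels <= 0 or len(out_grad) % channels != 0:
        raise ValueError("gradient length must be a multiple of the channel count")
    grad = [0.0] * channels
    for gid, g in enumerate(out_grad):
        grad[gid % channels] += g
    return grad


# Two "pixels" with three channels each.
activations = bias_add([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [10.0, 20.0, 30.0])
bias_gradient = bias_add_grad([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)
```

The forward pass is independent per element, while the backward pass is a reduction over channels; on a parallel device a reduction generally needs more coordination than an element-wise map.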

Model	Learning rate	Batch size	Training epochs
VGG16	0.01	64	10
ResNet18	0.0001	32	20
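
The hyperparameters in the table above can be captured in a small configuration object. This is an illustrative sketch only: `TrainConfig`, `CONFIGS`, and `total_steps` are hypothetical names, and the dataset size passed to `total_steps` is a placeholder, since this page does not state one.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float
    batch_size: int
    epochs: int

    def total_steps(self, num_samples: int) -> int:
        """Number of optimizer steps, counting a final partial batch per epoch."""
        return math.ceil(num_samples / self.batch_size) * self.epochs


# Values copied from the table; the keys are the model names listed there.
CONFIGS = {
    "VGG16": TrainConfig(learning_rate=0.01, batch_size=64, epochs=10),
    "ResNet18": TrainConfig(learning_rate=0.0001, batch_size=32, epochs=20),
}

# Hypothetical dataset size, used only to show the arithmetic.
vgg_steps = CONFIGS["VGG16"].total_steps(num_samples=50_000)
```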
