The Research Progress of Dataflow Computing: A Brief Survey

doi:10.11871/jfdc.issn.2096-742X.2021.05.005

Frontiers of Data and Computing, 2021, Vol. 3, Issue 5: 65-81.

Special Issue: "Chokepoint" Problems and Countermeasures in China's Information Technology Sector

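FAN Zhihua^1,2, LI Wenming^1, YE Xiaochun^1, FAN Dongrui^1,2,*

1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China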

Received: 2021-09-30  Online: 2021-10-20  Published: 2021-11-24
Corresponding author: FAN Dongrui
Author biographies:
FAN Zhihua is a Ph.D. candidate at the Institute of Computing Technology, Chinese Academy of Sciences, and the University of Chinese Academy of Sciences. He is a student member of the China Computer Federation. His research interests include dataflow computing and high-throughput computing architecture.
His main contributions to this paper are the summary of the research progress of dataflow computing and the writing of the paper.
E-mail: fanzhihua@ict.ac.cn
LI Wenming, Ph.D., is an associate professor and master's supervisor at the Institute of Computing Technology, Chinese Academy of Sciences. He is a senior member of the China Computer Federation. His research interests include high-throughput computing architecture and software simulation.
His main contributions to this paper are the overall organization of the paper and the discussion of development trends in dataflow computing.
E-mail: liwenming@ict.ac.cn
YE Xiaochun, Ph.D., is a professor and master's supervisor at the Institute of Computing Technology, Chinese Academy of Sciences. He is a senior member of the China Computer Federation. His research interests include high-throughput computing architecture and software simulation.
His main contributions to this paper are the analysis and writing of the section on domestic dataflow processors.
E-mail: yexiaochun@ict.ac.cn
FAN Dongrui, Ph.D., is a professor and doctoral supervisor at the Institute of Computing Technology, Chinese Academy of Sciences. He is a distinguished member of the China Computer Federation. His research interests include high-throughput computing architecture and high-performance many-core processor microarchitecture.
His main contributions to this paper are leading the research project and designing the overall architecture.
E-mail: fandr@ict.ac.cn
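Funding:
National Natural Science Foundation of China (61732018, 61872335, 61802367); Strategic Priority Research Program of the Chinese Academy of Sciences, Category C (XDC05000000); International Partnership Program of the Chinese Academy of Sciences (171111KYSB20200002)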

Abstract

[Objective] This paper traces the origin of dataflow computing and introduces the research background and key technologies of dataflow computing theory and systems. [Coverage] The survey draws on the research literature on dataflow computing from the 1960s to the present. [Methods] Significant works and key technologies are introduced from three perspectives: the origin of dataflow, research progress in software systems, and research progress in hardware architectures. [Results] The development trends and challenges of dataflow computing are analyzed and summarized. [Conclusions] This paper aims to provide a reference for future research on dataflow computing and to offer some inspiration to researchers in the field.

Key words: dataflow execution model, dataflow software system, dataflow hardware architecture

FAN Zhihua, LI Wenming, YE Xiaochun, FAN Dongrui. The Research Progress of Dataflow Computing: A Brief Survey[J]. Frontiers of Data and Computing, 2021, 3(5): 65-81.

Figures/Tables (11): Figures 1-10, Table 1

References (78)

[1]	Dennis J. B. First version of a data flow procedure language[C]. Paris: In Proceeding of the Colloque sur la Programmation, 1974.
[2]	Li Guojie. A new architecture: the dataflow computer[J]. 电子计算机动态, 1981, 11:1-8. (in Chinese)
[3]	Adams D. A. A Computation Model with DataFlow Sequencing[Z]. Technical Report CS 117, Computer Science Department, School of Humanities and Sciences, Stanford University, Calif, 1968.
[4]	Dennis J. B. First version of a data flow procedure language[C]. Paris: In Proceeding of the Colloque sur la Programmation, 1974.
[5]	Dennis J B, David P M. A preliminary architecture for a basic data-flow processor[C]. Proceedings of the 2nd Annual Symposium on Computer Architecture, 1974.
[6]	Dennis J.B., J.B. Fosseen. Introduction to DataFlow Schemas[Z]. 1973.
[7]	Kosinski P.R. A DataFlow Programming Language[Z]. IBM, 1973.
[8]	Kosinski P.R. A DataFlow Language for operating systems programming[C]. Proceeding of ACM SIGPLAN-SIGOPS Interface Meeting, 1973.
[9]	G. R. Gao. An implementation scheme for array operations in static data flow computers[Z]. Technical report, Cambridge, 1982.
[10]	G. R. Gao. A pipelined code mapping scheme for static data flow computers[D]. Massachusetts Institute of Technology, 1986.
[11]	G. R. Gao. An efficient hybrid dataflow architecture model[J]. Journal of Parallel and Distributed Computing, 1993, 19(4):293-307. doi: 10.1006/jpdc.1993.1113
[12]	J. B. Dennis. General parallel computation can be performed with a cycle-free heap[C]. In Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998.
[13]	A. L. Davis and R. M. Keller. Data flow program graphs [C]. 1982.
[14]	D. E. Culler and Arvind. Resource requirements of dataflow programs[C]. Washington: In Proceedings of the 15th Annual International Symposium on Computer Architecture, IEEE Computer Society Press, 1988.
[15]	J. B. Dennis. Compiling fresh breeze codelets[C]. New York: In Proceedings of Programming Models and Applications on Multicores and Manycores, 2014.
[16]	A. Danalis, K. Y. Kim, L. Pollock, M. Swany. Transformations to parallel codes for communication-computation overlap[C]. Seattle: In ACM/IEEE Conference on Supercomputing, 2005.
[17]	Zhang Weiwei, Wei Haitao, Yu Junqing, Li He, Li Hao, Yang Qiuji. COStream: A dataflow-oriented programming language and its compiler implementation[J]. Chinese Journal of Computers, 2013, 36(10):1993-2006. (in Chinese)
[18]	S. Zuckerman, J. Suetterlein, R. Knauerhase, G. R. Gao. Position paper: Using a "codelet" program execution model for exascale machines[J]. New York: In ACM International Conference Proceeding Series, ACM Press, 2011.
[19]	G. R. Gao, H. H. Hum, Y.-B. Wong. Parallel function invocation in a dynamic argument-fetching dataflow architecture[C]. Miami Beach, FL: In Proceedings. PARBASE-90: International Conference on Databases, Parallel Architectures, and Their Applications, IEEE, 1990.
[20]	G. R. Gao. Maximum pipelining linear recurrence on static data flow computers[J]. International journal of parallel programming, 1986, 15(2):127-149. doi: 10.1007/BF01414442
[21]	G. R. Gao. A pipelined code mapping scheme for static data flow computers[D]. Massachusetts Institute of Technology, 1986.
[22]	G. R. Gao. Algorithmic aspects of balancing techniques for pipelined data flow code generation[J]. Journal of Parallel and Distributed Computing, 1989, 6(1):39-61. doi: 10.1016/0743-7315(89)90041-5
[23]	G. R. Gao. A pipelined code mapping scheme for solving tridiagonal linear system equations[C]. Nice, France: In Proceeding of IFIP Highly Parallel Computer Conference, 1986.
[24]	G. R. Gao. A Code Mapping Scheme for Dataflow Software Pipelining[C]. Springer Science & Business Media, 2012.
[25]	G. R. Gao. An efficient hybrid dataflow architecture model[J]. Journal of Parallel and Distributed Computing, 1993, 19(4):293-307. doi: 10.1006/jpdc.1993.1113
[26]	Xiaochun Ye, Xu Tan, Meng Wu, Yujing Feng, Da Wang, Hao Zhang, Songwen Pei, Dongrui Fan, An efficient dataflow accelerator for scientific applications[J]. Future Generation Computer Systems, 2020, Volume 112, 580-588.
[27]	Taoran Xiang and Lunkai Zhang, et al. RISC-NN: Use RISC, NOT CISC as Neural Network Hardware Infrastructure[J]. arXiv preprint, arXiv:2103.12393, 2021. [2021-09-27] https://arxiv.org/abs/2103.12393v1
[28]	Baumgarte V, Ehlers G, May F, et al. PACT XPP—A Self-Reconfigurable Data Processing Architecture[J]. Journal of Supercomputing, 2003, 26(2):167-184. doi: 10.1023/A:1024499601571
[29]	Swanson S, Schwerin A, Mercaldi M, et al. The WaveScalar architecture[J]. ACM Transactions on Computer Systems, 2007, 25(2):4.
[30]	Mattson P, Dally W J. A programming system for the imagine media processor[D]. Stanford: Stanford University, 2002.
[31]	Khailany B, Dally W J, Kapasi U J, et al. Imagine: media processing with streams[J]. IEEE Micro, 2001, 21:35-46. doi: 10.1109/40.918001
[32]	C.A.R. Hoare. Communicating Sequential Processes[M]. Prentice Hall International, 1985.
[33]	Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, et al. In-Datacenter Performance Analysis of a Tensor Processing Unit[C]. Toronto: In Proceedings of ISCA, 2017.
[34]	H. T. Kung. Why systolic architectures?[C]. IEEE Computer, 1982, Vol. 15:37-46. doi: 10.1109/MC.1982.1653825
[35]	H. H.-J. Hum. The Super-actor Machine: A Hybrid Dataflow/Von Neum[D]. Montreal, Que., Canada, 1992.
[36]	H. H. J. Hum, K. B. Theobald, and G. R. Gao. Building multithreaded architectures with off-the-shelf microprocessors[D]. In Proceedings of 8th International Parallel Processing Symposium, IEEE, 1994, pages 288-294.
[37]	G. R. Gao. An implementation scheme for array operations in static data flow computers[D]. Technical report, Cambridge, MA, USA, 1982.
[38]	H. H. Hum, O. Maquelin, K. B. Theobald, X. Tian, X. Tang, G. R. Gao, P. Cupryk, N. Elmasri, L. J. Hendren, A. Jimenez, et al. A design study of the earth multiprocessor[C]. Citeseer: In PACT, 1995.
[39]	K. B. Theobald and G. R. Gao. Earth: an efficient architecture for running threads[D]. McGill University, Montreal, Canada, 1999.
[40]	H. H. Hum and G. R. Gao. A high-speed memory organization for hybrid dataflow/von neumann computing[J]. Future Generation Computer Systems, 1992, 8(4):287-301. doi: 10.1016/0167-739X(92)90064-I
[41]	G. R. Gao and V. Sarkar. Location consistency: Stepping beyond the barriers of memory coherence and serializability[Z]. In McGill University, School of Computer. Citeseer, 1994.
[42]	G. R. Gao and V. Sarkar. On the importance of an end-to-end view of memory consistency in future computer systems[C]. Fukuoka, Japan: In International Symposium on High Performance Computing, 1997.
[43]	G. R. Gao and V. Sarkar. Location consistency-a new memory model and cache consistency protocol[J]. IEEE Transactions on Computers, 2000, 49(8):798-813. doi: 10.1109/12.868026
[44]	G. Tan, N. Sun, G. R. Gao. Improving performance of dynamic programming via parallelism and locality on multicore architectures[J]. IEEE Trans. Parallel Distrib. Syst. 2009, 20(2):261-274. doi: 10.1109/TPDS.2008.78
[45]	Tan Guangming. Parallelism and locality in irregular computing[D]. University of Chinese Academy of Sciences, 2005. (in Chinese)
[46]	L. J. Hendren, X. Tang, Y. Zhu, S. Ghobrial, G. R. Gao, X. Xue, H. Cai, P. Ouellet. Compiling C for the earth multithreaded architecture[J]. International Journal of Parallel Programming, 1997, 25(4):305-338. doi: 10.1007/BF02699905
[47]	J. B. Dennis. A parallel program execution model supporting modular software construction[C]. In Massively Parallel Programming Models, IEEE, 1997.
[48]	J. B. Dennis. Compiling fresh breeze codelets[C]. New York, NY: In International Symposium on Code Generation and Optimization, ACM, 2014.
[49]	J. B. Dennis, G. R. Gao, X. X. Meng. Experiments with the fresh breeze tree-based memory model[J]. Computer Science-Research and Development, 2011, 26(3-4):325-337.
[50]	J. B. Dennis, L. Huang, W. Y. P. Lim, H. Wu, and Y. Yan. Implementing deep neural networks on fresh breeze[C]. Italy: International Conference on Parallel Computing, 2017.
[51]	H. Wei, M. Qin, W. Zhang, J. Yu, D. Fan, G. R. Gao. Streamtmc: Stream compilation for tiled multi-core architectures[J]. Journal of Parallel & Distributed Computing, 2013, 73(4):484-494.
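[52]	Wei Haitao, Qin Mingkang, Yu Junqing, Fan Dongrui. A dataflow compilation framework for many-core architectures[J]. Chinese Journal of Computers, 2014, 37(07):128-137. (in Chinese)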
[53]	Y. Wu, L. Zheng, B. Heilig, G. R. Gao. HAMR: A dataflow-based real-time in-memory cluster computing engine[J]. International Journal of High Performance Computing Applications, 2017, 31(5):361-374. doi: 10.1177/1094342016672080
[54]	Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, T. Darrell. Caffe: Convolutional architecture for fast feature embedding[C]. New York: In Proceedings of the 22nd ACM International Conference on Multimedia, Association for Computing Machinery, 2014.
[55]	M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, X. Zheng. TensorFlow: A System for Large-Scale Machine Learning[C]. USA: In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation.USENIX Association, 2016.
[56]	U. Brüning, W. K. Giloi, W. Schroeder-Preikschat. Latency hiding in message-passing architectures[C]. Mexico: In Proceedings of 8th International Parallel Processing Symposium, IEEE, 1994.
[57]	H. H. Hum, O. Maquelin, K. B. Theobald, X. Tian, G. R. Gao, L. J. Hendren. A study of the earth-manna multithreaded system[J]. International Journal of Parallel Programming, 1996, 24(4):319-348. doi: 10.1007/BF03356753
[58]	Intel Corp. i860 Microprocessor Family Programmer’s Reference Manual[Z]. Intel Corporation, Santa Clara, CA, USA, 1992.
[59]	Andres Marquez and Guang R. Gao. CARE: Overview of an Adaptive Multithreaded Architecture[C]. Tokyo, Japan: In Proceedings of Fifth International Symposium on High Performance Computing, 2003.
[60]	M. S. S. Govindan, D. Burger, and S. Keckler. Trips: A distributed explicit data graph execution (edge) microprocessor[C]. In 2007 IEEE Hot Chips 19 Symposium, 2007.
[61]	S. Swanson, A. Schwerin, M. Mercaldi, A. Petersen, A. Putnam, K. Michelson, M. Oskin, S. J. Eggers, The wave scalar architecture[C]. ACM Trans. Comput. Syst 2007, 25(2):4:1-4:54.
[62]	Ziang Hu, Juan del Cuvillo, Weirong Zhu, and Guang R. Gao. Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences[C]. Dresden: In Proceedings of the 12th International European Conference on Parallel Processing (Euro-Par 2006), 2006.
[63]	Y. Ji, Y. Zhang, X. Xie, S. Li, P. Wang, X. Hu, Y. Zhang, Y. Xie. Fpsa: A full system stack solution for reconfigurable reram-based nn accelerator architecture[C]. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ACM, 2019.
[64]	Zhang Y., Qu P., Ji Y. et al. A system hierarchy for brain inspired computing[J]. Nature, 2020, 586(7829):378-384. doi: 10.1038/s41586-020-2782-y
[65]	Jian Weng, Sihao Liu, Zhengrong Wang, Vidushi Dadu, Tony Nowatzki. A Hybrid Systolic-Dataflow Architecture for Inductive Matrix Algorithms[C]. HPCA 2020.
[66]	Dongrui Fan, Hao Zhang, Da Wang, Xiaochun Ye, Fenglong Song, Guojie Li, Ninghui Sun. Godson-T: An Efficient Many-Core Processor Exploring Thread-Level Parallelism[J]. IEEE Micro., 2012, 32:38-47.
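[67]	Xiang Taoran, Feng Yujing, Ye Xiaochun, et al. Accelerating CNN algorithm with fine-grained dataflow architectures[C]. Piscataway: Proc of 2018 IEEE Conf on High Performance Computing and Communications, 2018.
[68]	Shen Xiaowei, Ye Xiaochun, Wang Da, et al. A dataflow optimization method for scientific computing[J]. Chinese Journal of Computers, 2017, 40(9):223-238. (in Chinese)
[69]	Xiang Taoran, Ye Xiaochun, Li Wenming, et al. Accelerating the fully connected layers of sparse neural networks with a fine-grained dataflow architecture[J]. Journal of Computer Research and Development, 2019, 56(6):1192-1204. (in Chinese)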
[70]	X. Tan, X.-W. Shen, X.-C. Ye, D.-R. Fan, L. Zhang, W.-M. Li, Z.-M. Zhang, Z.-M. Tang. A non-stop double buffering mechanism for dataflow architecture[J]. J.Comput. Sci. Tech., 2018, 33(1):145-157.
[71]	X. Tan, X.-C. Ye, X.-W. Shen, Y.-C. Xu, D. Wang, L. Zhang, W.-M. Li, D. R. Fan, Z.-M. Tang. A pipelining loop optimization method for dataflow architecture[J]. J.Comput.Sci.Tech., 2018, 33(1):116-130.
[72]	X. Shen, X. Ye, X. Tan, D. Wang, Z. Zhang, D. Fan, Z. Tang. Poster: An optimization of dataflow architectures for scientific applications[C]. in: 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT),2016.
[73]	X. Shen, X. Ye, X. Tan, D. Wang, Z. Zhang, Z. Tang, D. Fan. Memory partition for simd in streaming dataflow architectures[C]. in: 2016 Seventh International Green and Sustainable Computing Conference (IGSC), 2016.
[74]	Farabet, Clément, et al. Neuflow: A runtime reconfigurable dataflow processor for vision[C]. CVPR 2011 Workshops. IEEE, 2011.
[75]	Robatmili, Behnam, et al. How to implement effective prediction and forwarding for fusable dynamic multicore architectures[C]. 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2013.
[76]	Giorgi, Roberto, et al. TERAFLUX: Harnessing dataflow in next generation teradevices[J]. Microprocessors and Microsystems, 2014, 38(8):976-990. doi: 10.1016/j.micpro.2014.04.001
[77]	Chen, Yu-Hsin, Joel Emer, and Vivienne Sze. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks[J]. ACM SIGARCH Computer Architecture News, 2016, 44(3):367-379.
[78]	Lu, Wenyan, et al. Flexflow: A flexible dataflow accelerator architecture for convolutional neural networks[C]. IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2017.

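Work	Year	Field
Argument-flow [4]	1973	Origin of dataflow
DSG [37]	1982	Program execution model
Argument-fetch [19]	1990	Program execution model
Gao et al. [25]	1993	Dynamic dataflow
Super-actor [35]	1992	Hybrid dataflow/control flow
EARTH-MANNA [57]	1994	Computer system
EARTH [40]	1995	Program execution model
EARTH-C [46]	1997	Compilation
CARE [59]	2003	Computer system
WaveScalar [61]	2003	Computer system
TRIPS [60]	2004	Dataflow chip
Cyclops64 [62]	2006	Dataflow chip
Percolation model [44,45]	2009	Program execution model
NeuFlow [74]	2011	Dataflow chip
Godson-T [66]	2012	High-performance processor
Codelet [18]	2013	Program execution model
COStream [51,52]	2013	Compiler
T3 [75]	2013	Computer system
Caffe [54]	2014	Software framework
Teraflux [76]	2014	Computer system
Fresh Breeze [48,49,50]	2014	Compilation
TensorFlow [55]	2016	Software framework
Eyeriss [77]	2016	Dataflow chip
HAMR [53]	2017	Software system
FlexFlow [78]	2017	Dataflow chip
Stream-dataflow [78]	2017	Dataflow chip
Tianjic [63,64]	2019	Brain-inspired chip
SPU [26]	2020	Dataflow chip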
