Frontiers of Data and Computing ›› 2023, Vol. 5 ›› Issue (3): 66-91.
CSTR: 32002.14.jfdc.CN10-1649/TP.2023.03.006
doi: 10.11871/jfdc.issn.2096-742X.2023.03.006
• Special Issue: AI for Science •
WEI Jia1, CHEN Mo2, WANG Longxiang1,*, REN Pei2, LEI Yujia1, QU Yuqi1, JIANG Qiyu1, DONG Xiaoshe1, WU Weiguo1, ZHANG Kaili2, ZHANG Xingjun1
Received: 2022-06-22
Online: 2023-06-20
Published: 2023-06-21
Contact: *WANG Longxiang (王龙翔)
WEI Jia, CHEN Mo, WANG Longxiang, REN Pei, LEI Yujia, QU Yuqi, JIANG Qiyu, DONG Xiaoshe, WU Weiguo, ZHANG Kaili, ZHANG Xingjun. Status, Challenges, and Trends of Data-Intensive Supercomputing[J]. Frontiers of Data and Computing, 2023, 5(3): 66-91, https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2023.03.006.
Table 1
Mainstream supercomputing support capabilities for data-intensive applications
Name | File system | Storage access interface | Storage capacity | Compute type | Storage bandwidth |
---|---|---|---|---|---|
Fugaku | Lustre+BeeGFS | POSIX | 1 EB | CPU | 10 TB/s |
Summit | IBM Spectrum Scale | POSIX, SWIFT, HDFS | 250 PB | CPU, GPU | 2.5 TB/s |
Sunway TaihuLight | Lustre+SWGFS | POSIX | 10 PB | CPU | 341 GB/s |
Tianhe-2A | Lustre+H2FS | POSIX, HDFS | 19 PB | CPU, MIC, GPU | 1 TB/s |
Table 4
Comparative hardware architecture analysis
Hardware | Model | Power (W) | Compute capability (FLOPS) | Open-source software support | Parallel algorithm development difficulty | Development cycle |
---|---|---|---|---|---|---|
CPU | Ryzen Threadripper 3970X | 280 | 4062G SP | Good | Easy | Short |
GPU | Tesla V100 | 235 | 15T SP | Good | Medium | Short |
MIC | KNL | 225-300 | 3T DP | Limited | Easy | Medium |
FPGA | UltraScale+ | 27 | 4903G SP | Limited | Hard | Relatively long |
ASIC | VPU | 1 | 150G SP | Limited | Hard | Long |
TPU | TPU v2 | 118 | 180T SP | Good | Hard | Relatively long |
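Table 4 lists power and peak compute separately. A derived energy-efficiency figure (GFLOPS per watt) makes the rows easier to compare. The sketch below is illustrative only and is not part of the original article. It takes the table values as given without verifying them against vendor data, treats "G" and "T" as 10^9 and 10^12 FLOPS, and reports a low/high range where the table gives a power range. The names `TABLE4`, `gflops_per_watt`, and `efficiency_table` are hypothetical. The MIC row is double precision (DP) while the others are single precision (SP), so its figure is not directly comparable.

```python
# Values transcribed from Table 4 (assumed correct as printed).
# Each row: (hardware, model, (min_watts, max_watts), peak_gflops, precision)
TABLE4 = [
    ("CPU", "Ryzen Threadripper 3970X", (280, 280), 4_062, "SP"),
    ("GPU", "Tesla V100", (235, 235), 15_000, "SP"),
    ("MIC", "KNL", (225, 300), 3_000, "DP"),  # DP, not SP
    ("FPGA", "UltraScale+", (27, 27), 4_903, "SP"),
    ("ASIC", "VPU", (1, 1), 150, "SP"),
    ("TPU", "TPU v2", (118, 118), 180_000, "SP"),
]


def gflops_per_watt(row):
    """Return (low, high) GFLOPS/W for one table row.

    The low bound divides by the maximum power, the high bound by the minimum.
    """
    _, _, (w_min, w_max), gflops, _ = row
    return gflops / w_max, gflops / w_min


def efficiency_table(rows=TABLE4):
    """Map each hardware type to its (low, high) GFLOPS/W range."""
    return {row[0]: gflops_per_watt(row) for row in rows}


if __name__ == "__main__":
    precision = {row[0]: row[4] for row in TABLE4}
    for hw, (low, high) in efficiency_table().items():
        if low == high:
            print(f"{hw:5s} {low:8.1f} GFLOPS/W ({precision[hw]})")
        else:
            print(f"{hw:5s} {low:8.1f}-{high:.1f} GFLOPS/W ({precision[hw]})")
```

These are peak-rate ratios only. Sustained efficiency on data-intensive workloads depends on memory and I/O behavior that the table does not capture.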