Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (6): 136-148.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.06.013

doi: 10.11871/jfdc.issn.2096-742X.2025.06.013

• Technology and Application •

Porting and Adapting Deep Learning Framework Operators on Domestic Supercomputers

ZHOU Faguo1, LIU Fang2,*, WANG Yangang2, WANG Jue2, YU Miao1, LI Shunde2, ZHOU Chunbao2, WANG Jing2, YANG Qinmeng2

  1. School of Artificial Intelligence, China University of Mining and Technology-Beijing, Beijing 100083, China
    2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2025-04-23  Online: 2025-12-20  Published: 2025-12-17
  • Corresponding author: LIU Fang
  • About the authors: ZHOU Faguo, PhD, is an associate professor and master's supervisor at the School of Artificial Intelligence, China University of Mining and Technology-Beijing, where he serves as the Party Branch Secretary of the Computer Science Department. He is a member of IEEE CS and ACM and a senior member of CCF. His research interests include natural language processing, data mining and social network analysis, artificial intelligence and deep learning, knowledge graph construction and its applications (automatic question answering, community question answering, and chatbots), and image recognition and processing.
    In this paper, he is responsible for providing writing guidance and managing the overall project.
    E-mail: zhoufaguo@cumtb.edu.cn
    LIU Fang is an associate researcher at the Computer Network Information Center, Chinese Academy of Sciences. Her research interests include GPU-based general-purpose computing, high-performance computing, and artificial intelligence. In recent years, she has led 5 projects, published more than 20 papers, and authored 2 monographs.
    In this paper, she is responsible for the experimental design.
    E-mail: liufang@sccas.cn
  • Funding:
    National Key R&D Program of China, "Application Support Environment and Development Framework for Next-Generation Domestic Supercomputing Systems" (2023YFB3001900)

Abstract:

[Application background] With the rapid development of large-scale deep learning models, the computing resources required to train them keep increasing, and a single computing device can no longer meet the training demand. Enabling deep learning frameworks to support supercomputing platforms is therefore of great strategic significance. As a domestically developed deep learning framework, MindSpore has become an important tool in artificial intelligence research thanks to its efficient computing performance, flexible debugging capabilities, and convenient support for distributed training. [Problem] The MindSpore framework does not support the Sugon high-performance computer and cannot be directly deployed and run on this supercomputing platform, which severely limits its application in supercomputing environments. [Method] To address this issue, this paper ports and adapts the MindSpore framework based on the hardware architecture and software environment of the Sugon high-performance computer. The Sugon machine adopts a heterogeneous architecture that pairs CPUs with Hygon DCUs, and the framework's lack of support for this platform manifests as its operators being unable to be scheduled and executed on the Hygon DCU. Starting from the framework's original GPU operators, this paper therefore designs an operator porting scheme for the Hygon DCU. [Result] Following this scheme, a total of 278 operators were successfully ported, enabling the MindSpore framework to run on the Sugon high-performance computer. Distributed parallel training of the LLaMA model was then carried out on this machine, verifying the good execution performance of the Hygon DCU operators in the MindSpore framework.

Key words: deep learning framework, supercomputer, operator porting, distributed parallel training
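
As background for the porting route summarized in the abstract: Hygon DCUs are typically programmed through a ROCm/HIP-compatible toolchain, so a framework's CUDA operator kernel can often be moved over by mapping CUDA runtime calls (cudaMalloc, cudaMemcpy, ...) to their HIP counterparts while the kernel body itself compiles unchanged. The sketch below illustrates this mechanical core of such a port on a hypothetical elementwise Add operator; it is an illustration under these assumptions, not code from the paper or from the MindSpore source tree, and all names (scalar_add_kernel, etc.) are invented for the example.

    // Minimal sketch of a GPU -> DCU operator port (assumed ROCm/HIP route).
    // The CUDA original differs only in the runtime-API prefix: cudaMalloc ->
    // hipMalloc, cudaMemcpy -> hipMemcpy, cudaFree -> hipFree, and so on;
    // the __global__ kernel body itself compiles unchanged under hipcc.
    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <vector>

    __global__ void scalar_add_kernel(const float* a, const float* b,
                                      float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n) out[i] = a[i] + b[i];                // elementwise add
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hout(n);

        float *da, *db, *dout;
        hipMalloc((void**)&da, bytes);    // was: cudaMalloc
        hipMalloc((void**)&db, bytes);
        hipMalloc((void**)&dout, bytes);
        hipMemcpy(da, ha.data(), bytes, hipMemcpyHostToDevice);  // was: cudaMemcpy
        hipMemcpy(db, hb.data(), bytes, hipMemcpyHostToDevice);

        dim3 block(256), grid((n + block.x - 1) / block.x);
        // hipcc accepts the familiar triple-chevron launch syntax.
        scalar_add_kernel<<<grid, block>>>(da, db, dout, n);
        hipDeviceSynchronize();

        hipMemcpy(hout.data(), dout, bytes, hipMemcpyDeviceToHost);
        printf("out[0] = %.1f (expected 3.0)\n", hout[0]);

        hipFree(da); hipFree(db); hipFree(dout);
        return 0;
    }

Beyond this source-level translation, a framework-level port also has to register each ported kernel with the framework's operator dispatch layer so that it can actually be scheduled onto the DCU backend, which is consistent with the abstract's observation that the missing support shows up as operators failing to be scheduled and executed on the Hygon DCU.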