Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (2): 120-129.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.02.012

doi: 10.11871/jfdc.issn.2096-742X.2025.02.012

• Technology and Applications •

Optimization Method for Large Language Models on Domestic Supercomputer System

QU Zhiyong1, WANG Xiaoguang2, ZHOU Chunbao2,*, SHI Yuanxiang1, QIAO Jiawei1

  1. Shanxi Meteorological Information Center, Taiyuan, Shanxi 030006, China
  2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2024-11-04  Online: 2025-04-20  Published: 2025-04-23
  • Corresponding author: ZHOU Chunbao
  • About the authors: QU Zhiyong is a senior engineer at the Shanxi Meteorological Information Center. His main research direction is meteorological information technology.
    In this paper, he is responsible for method research, experimental design, and paper writing.
    E-mail: 153224922@qq.com
    ZHOU Chunbao, Ph.D., is a researcher and master's supervisor at the Computer Network Information Center, Chinese Academy of Sciences. His main research directions include parallel computing as well as basic algorithms and software for artificial intelligence.
    In this paper, he is responsible for method design and experimental guidance.
    E-mail: zhoucb@cnic.cn
  • Funding:
    Shanxi Meteorological Bureau Open Competition ("Jiebang Guashuai") Project (SXKJBGS202409); jointly funded by the Shanxi Provincial Archives Science and Technology Project (2024-SX-002); Key Innovation Team of the National Meteorological Information Center (NMIC-2024-ZD08)

Abstract:

[Objective] To reduce the training cost of large language models on domestic supercomputer systems, we propose an optimization method. [Methods] We build a communication backend based on MPI and UCC that combines rapid process-group construction with low-latency collective communication, and we further introduce a compression-based collective communication optimization. [Results] In large language model training experiments under various configurations on a domestic supercomputer system, the proposed optimization method effectively reduces training cost. [Conclusions] The experimental results demonstrate the effectiveness of the proposed optimization method in reducing the training cost of large models.
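
For readers who want a concrete picture of compression-based collective communication, the sketch below illustrates the general idea using PyTorch's documented DistributedDataParallel communication-hook API: each gradient bucket is cast to FP16 before the all-reduce and cast back afterwards, roughly halving the bytes each collective moves. This is only an illustrative stand-in, not the method of the paper; the MPI/UCC backend and the specific compression scheme described above are not reproduced here, and the hook name and the usage line are assumptions made for the example.

import torch
import torch.distributed as dist

def fp16_compress_allreduce_hook(process_group, bucket):
    """Illustrative DDP communication hook: compress the gradient bucket to
    FP16, all-reduce the compressed buffer, then cast back to the original
    dtype. Not the backend described in the paper."""
    group = process_group if process_group is not None else dist.group.WORLD
    world_size = group.size()

    # Compress: cast the flattened FP32 gradients to FP16 and pre-divide by
    # world_size so that the SUM all-reduce yields an average.
    compressed = bucket.buffer().to(torch.float16).div_(world_size)

    # Run the collective asynchronously on the half-size buffer.
    fut = dist.all_reduce(compressed, group=group, async_op=True).get_future()

    def decompress(fut):
        # Decompress: cast the reduced result back to the bucket's dtype.
        return fut.value()[0].to(bucket.buffer().dtype)

    return fut.then(decompress)

# Hypothetical usage once a model has been wrapped in DistributedDataParallel:
#   ddp_model.register_comm_hook(state=None, hook=fp16_compress_allreduce_hook)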

Key words: large language model, distributed training, collective communication, data compression