Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (2): 120-129.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.02.012

doi: 10.11871/jfdc.issn.2096-742X.2025.02.012

• Technology and Application •

Optimization Method for Large Language Models on Domestic Supercomputer System

QU Zhiyong1, WANG Xiaoguang2, ZHOU Chunbao2,*, SHI Yuanxiang1, QIAO Jiawei1

  1. Shanxi Meteorological Information Center, Taiyuan, Shanxi 030006, China
    2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2024-11-04 Online: 2025-04-20 Published: 2025-04-23
  • Contact: ZHOU Chunbao E-mail: 153224922@qq.com; zhoucb@cnic.cn

Abstract:

[Objective] To reduce the training cost of large language models on domestic supercomputer systems, we propose an optimization method. [Methods] We build a communication backend based on MPI and UCC that combines rapid construction of process groups with low-latency collective communication, and we introduce a compression-based optimization for collective communication. [Results] In training experiments with large language models of various configurations on domestic supercomputer systems, the proposed optimization method effectively reduces training costs. [Conclusions] The experimental results demonstrate the effectiveness of the proposed training optimization method in reducing the cost of training large models.
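Below is a minimal sketch of the compression-based collective communication idea mentioned in the abstract, assuming a PyTorch-style distributed training job. The "gloo" backend, the float16 cast, and the helper name compressed_allreduce are illustrative assumptions for this sketch; the paper's own backend is built on MPI and UCC.

    # Minimal sketch: compress gradients before an allreduce to cut communication volume.
    import torch
    import torch.distributed as dist

    def compressed_allreduce(grad: torch.Tensor) -> torch.Tensor:
        """All-reduce a gradient in float16 to roughly halve the bytes on the wire."""
        compressed = grad.to(torch.float16)            # lossy compression before transfer
        dist.all_reduce(compressed, op=dist.ReduceOp.SUM)
        return compressed.to(grad.dtype) / dist.get_world_size()  # decompress and average

    if __name__ == "__main__":
        # Process group construction; rank and world size are read from the launcher's
        # environment variables (e.g. set by torchrun or an MPI launcher wrapper).
        dist.init_process_group(backend="gloo")
        local_grad = torch.randn(1024)                 # stand-in for a local gradient shard
        avg_grad = compressed_allreduce(local_grad)
        dist.destroy_process_group()

Casting to a narrower dtype before the allreduce trades a small loss of precision for lower communication volume, which is the basic trade-off behind compression-based collective communication optimizations of this kind.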

Key words: large language model, distributed training, collective communication, data compression