Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (5): 65-87.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.05.006

doi: 10.11871/jfdc.issn.2096-742X.2025.05.006

• Special Issue: New Domestic Computing Power Empowers the Development of Scientific Computing Applications •

FlowAware: A Feature-Aware Automated Model Parallelization Method for AI-for-Science Tasks

ZENG Yan1, WU Baofu1, YI Guangzheng1, HUANG Chengchuang1, QIU Yang1, CHEN Yue1, WAN Jian1,2,*, HU Fan3, JIN Sicong1, LIANG Jiajun1, LI Xin1

  1. Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China
    2. Zhejiang University of Science and Technology, Hangzhou, Zhejiang 310023, China
    3. Zhejiang Sugon Information Technology Co., Ltd, Hangzhou, Zhejiang 310013, China
  • Received: 2025-02-28; Online: 2025-10-20; Published: 2025-10-23
  • Contact: WAN Jian, E-mail: yz@hdu.edu.cn; wanjian@hdu.edu.cn
  • Supported by:
    National Key Research and Development Program of China (2023YFB3001501); National Natural Science Foundation of China (NSFC) (62302133); Key Research and Development Program of Zhejiang Province (2024C01026); Yangtze River Delta Project (2023Z-Y1068); Hangzhou Key Research Plan Project (2024SZD1A02); GHfund A (202302019816)

Abstract:

[Objective] This study addresses the inefficiency of AI-for-Science tasks that stems from the difficulty of designing and implementing distributed parallel computing strategies for deep learning models, and from the inefficient execution of those strategies. [Methods] We propose FlowAware, an automatic distributed parallelization method for AI-for-Science tasks. Built on the AI-for-Science framework JAX, FlowAware thoroughly analyzes the task characteristics, operator structures, and data-flow properties of deep learning models and, by incorporating cluster topology information, constructs a search space of distributed parallel computing strategies. Guided by load-balancing and communication-optimization objectives, it then automatically identifies the optimal distributed parallel computing strategy for an AI model. [Results] Comparative experiments conducted on both GPU-like accelerator clusters and GPU clusters demonstrate that FlowAware achieves a throughput improvement of up to 7.8× compared to Alpa. [Conclusions] FlowAware effectively improves the search efficiency of distributed parallel computing strategies for AI models in scientific computing tasks and significantly enhances their computational performance.
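The abstract describes, at a high level, what FlowAware searches over: mappings of a model's tensors and operators onto a device cluster. The abstract itself contains no code, so the sketch below is only an illustrative assumption of what a single point in such a search space looks like, written against JAX's public sharding API (Mesh, NamedSharding, PartitionSpec). The mesh shape, tensor shapes, and the layer function are hypothetical examples, not the authors' implementation.

    # Illustrative sketch only (not the authors' code): one candidate
    # distributed parallel strategy expressed with JAX's sharding API.
    import numpy as np
    import jax
    import jax.numpy as jnp
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

    # Arrange the visible devices into a 2D mesh: one axis for data
    # parallelism ("data"), one for tensor/model parallelism ("model").
    # The mesh shape itself is one dimension a strategy search explores.
    devs = np.array(jax.devices())
    mesh = Mesh(devs.reshape(-1, 1), axis_names=("data", "model"))

    @jax.jit
    def layer(x, w):
        # A single dense layer; the input shardings chosen below determine
        # which collectives XLA must insert, i.e., the strategy's
        # communication cost.
        return jnp.tanh(x @ w)

    # Shard the batch dimension over the "data" axis and the weight's output
    # dimension over the "model" axis (batch size is assumed divisible by
    # the data-axis size).
    x = jax.device_put(jnp.ones((8, 256)), NamedSharding(mesh, P("data", None)))
    w = jax.device_put(jnp.ones((256, 512)), NamedSharding(mesh, P(None, "model")))

    y = layer(x, w)    # runs distributed; XLA inserts the needed communication
    print(y.sharding)  # the output inherits a sharding from the inputs

In this framing, FlowAware's role as described in the abstract is to choose the mesh shape and the per-tensor partition specifications automatically, under load-balancing and communication-cost objectives, rather than requiring them to be written by hand.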

Key words: AI for Science, deep learning, distributed parallel computing