Computation Resource Assessment Methodology for Large Meteorological AI Models

doi:10.11871/jfdc.issn.2096-742X.2025.04.015

Abstract

[Objective] In recent years, large meteorological AI models have shown the potential to surpass traditional numerical methods in weather forecasting, but training and deploying them requires substantial computational resources. Existing resource assessment methods, designed primarily for large natural language processing (NLP) models, struggle to accommodate the dynamic computational demands of meteorological tasks, such as spatiotemporal multidimensionality, and the distinctive architectures of meteorological models. The result is inefficient resource utilization and high computational cost. To address these challenges, this study proposes a computational resource assessment framework for large meteorological models. By quantifying parameter count, computational load, memory usage, and communication overhead, the framework provides a theoretical basis for hardware configuration and resource allocation, with the aim of reducing computational cost and keeping the development and operation of large meteorological models efficient and stable. [Methods] We introduce the Multi-Granularity Computing Resource Joint Evaluation Framework (MGCRJEF), which establishes modular models for parameter calculation, spatiotemporal-aware FLOPs assessment, memory usage prediction, and distributed communication analysis. By accounting for the spatiotemporal heterogeneity of meteorological data, it comprehensively evaluates the core hardware resource requirements of large meteorological models. [Results] Using the Pangu-Weather model, which is based on the Swin-Transformer architecture, as a case study, the framework reveals the model’s resource demand characteristics. For instance, memory usage increases significantly with higher input resolution, while communication overhead becomes a major performance bottleneck during multi-node training. These insights provide practical guidance for optimizing resource allocation. Furthermore, the framework’s estimated resource demands closely align with actual consumption, demonstrating its accuracy and effectiveness. [Conclusions] MGCRJEF provides a standardized approach to assessing the resource demands of large meteorological models and facilitates resource planning in intelligent computing hardware environments. It offers both theoretical and practical references for model deployment and hardware optimization in meteorology.

Key words: large meteorological AI models, multi-granularity computing resource joint evaluation framework, resource optimization, intelligent computing

SHI Yiheng, WANG Qiyi, SUN Jing, ZHAO Chunyan, DENG Shuai, WU Peng, YAO Wang. Computation Resource Assessment Methodology for Large Meteorological AI Models[J]. Frontiers of Data and Computing, 2025, 7(4): 182-195. https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2025.04.015.

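Parameter	Value	Description
Hidden dimension D	1,152	Read from the structure of the open-source ONNX inference model on GitHub
Feed-forward expansion ratio r	4	Controls the parameter count of the MLP layers; not stated directly in the paper, typically 4
Encoder-decoder layers L	8+8	The first 2 layers keep the resolution (8×360×181×D); the remaining 6 layers are downsampled to (8×180×91×2D); the decoder is symmetric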
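
The two tensor shapes in the table determine how large a single activation tensor is, which is the quantity that makes memory grow with input resolution. A minimal sketch of that arithmetic follows. It assumes FP32 storage (4 bytes per value), which the table does not state, and the helper name `activation_bytes` is hypothetical.

```python
from math import prod

BYTES_FP32 = 4  # assumption: activations are stored as 32-bit floats


def activation_bytes(shape, bytes_per_value=BYTES_FP32):
    """Size in bytes of one dense activation tensor with the given shape."""
    return prod(shape) * bytes_per_value


D = 1152  # hidden dimension from the table

# Shapes from the table: resolution-preserving layers and downsampled layers.
full_res = (8, 360, 181, D)
downsampled = (8, 180, 91, 2 * D)

full_bytes = activation_bytes(full_res)
down_bytes = activation_bytes(downsampled)

print(f"full resolution: {prod(full_res):,} values, {full_bytes / 1e9:.2f} GB")
print(f"downsampled:     {prod(downsampled):,} values, {down_bytes / 1e9:.2f} GB")
```

Under these assumptions a single full-resolution tensor holds 600,514,560 values (about 2.40 GB) and a downsampled one holds 301,916,160 values (about 1.21 GB). Downsampling cuts the number of grid positions to roughly a quarter while doubling the channel width, so each tensor shrinks by about half rather than by three quarters.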

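Metric	Computed result	Notes
Parameter count	266 million	256 million (official)
Training FLOPs per sample per iteration	1254 TFLOPs
Training GPU memory usage	59.19 GB	Includes model parameters, gradients, optimizer states, and activations
Inference GPU memory usage	28.53 GB	Includes model parameters and activations
Communication volume	24.6 GB	192 V100 GPUs; inter-node transfer over InfiniBand
Training time	17 days	100 epochs on 192 V100 GPUs; trained in 16 days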
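
The memory rows above list their components (parameters, gradients, optimizer states, activations) but not how the total splits among them. The sketch below estimates the static part from the parameter count and treats the remainder as activations. It assumes FP32 for every tensor, an Adam-style optimizer that keeps two extra values per parameter, and decimal gigabytes. None of these are stated in the table, so the resulting split is an inference, not a reported result. The function name `static_memory_gb` is hypothetical.

```python
GB = 1e9  # assumption: decimal gigabytes; the table does not say GB vs GiB


def static_memory_gb(n_params, bytes_per_value=4, optimizer_states=2):
    """Memory in GB for weights, gradients, and optimizer states.

    Assumes one precision for all tensors and an optimizer that keeps
    `optimizer_states` extra values per parameter (2 for Adam).
    """
    weights = n_params * bytes_per_value / GB
    return {
        "weights": weights,
        "grads": weights,
        "optimizer": optimizer_states * weights,
    }


N_PARAMS = 266_000_000   # estimated parameter count from the table
TRAIN_TOTAL_GB = 59.19   # training memory usage from the table
INFER_TOTAL_GB = 28.53   # inference memory usage from the table

static = static_memory_gb(N_PARAMS)
train_static = sum(static.values())

# Whatever the static state does not explain is attributed to activations.
train_activations = TRAIN_TOTAL_GB - train_static
infer_activations = INFER_TOTAL_GB - static["weights"]

print(f"static training state: {train_static:.2f} GB")
print(f"implied training activations: {train_activations:.2f} GB")
print(f"implied inference activations: {infer_activations:.2f} GB")
```

Under these assumptions the static state is about 4.26 GB, which leaves roughly 54.93 GB of the training figure and 27.47 GB of the inference figure for activations. Most of the memory therefore comes from the spatiotemporal grid size rather than the parameter count, which is consistent with the abstract's observation that memory usage rises sharply with input resolution.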