[1] |
PATHAK J, SUBRAMANIAN S, HARRINGTON P, et al. FourCastNet: A global data-driven high-resolution weather model using adaptive Fourier neural operators[J/OL]. arXiv, 2022. arXiv:2202.11214. https://arxiv.org/abs/2202.11214.
|
[2] |
LAM R, SANCHEZ-GONZALEZ A, WILLSON M, et al. Learning skillful medium-range global weather forecasting[J]. Science, 2023, 382(6677): 1416-1421.
|
[3] |
BI K, XIE L, ZHANG H, et al. Accurate medium-range global weather forecasting with 3D neural networks[J]. Nature, 2023, 619(7970): 533-538.
|
[4] |
CHEN K, HAN T, GONG J, et al. FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead[J/OL]. arXiv, 2023. arXiv:2304.02948. https://arxiv.org/abs/2304.02948.
|
[5] |
CHEN L, ZHONG X, ZHANG F, et al. FuXi: a cascade machine learning forecasting system for 15-day global weather forecast[J]. npj Climate and Atmospheric Science, 2023, 6(1): 190.
|
[6] |
ECMWF. ECMWF Annual Report 2022[M]. Reading: ECMWF Publications, 2022: 43.
|
[7] |
RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional Networks for Biomedical Image Segmentation[C]. Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015. Cham: Springer, 2015: 234-241.
|
[8] |
LIU Z, LIN Y, CAO Y, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows[C]. IEEE International Conference on Computer Vision (ICCV). Montreal: IEEE, 2021: 9992-10002.
|
[9] |
SCARSELLI F, GORI M, TSOI AC, et al. The Graph Neural Network Model[J]. IEEE Transactions on Neural Networks, 2009, 20(1): 61-80.
|
[10] |
BROWN TB, MANN B, RYDER N, et al. Language Models are Few-Shot Learners[J/OL]. arXiv, 2020. arXiv:2005.14165. https://arxiv.org/abs/2005.14165.
|
[11] |
KAPLAN J, MCCANDLISH S, HENIGHAN T, et al. Scaling Laws for Neural Language Models[J/OL]. arXiv, 2020. arXiv:2001.08361. https://arxiv.org/abs/2001.08361.
|
[12] |
ZAHEER M, GURUGANESH G, DUBEY A, et al. Big Bird: Transformers for Longer Sequences[C]. Advances in Neural Information Processing Systems (NeurIPS). Virtual: NeurIPS Foundation, 2020: 12.
|
[13] |
CHILD R, GRAY S, RADFORD A, et al. Generating Long Sequences with Sparse Transformers[J/OL]. arXiv, 2019. arXiv:1904.10509. https://arxiv.org/abs/1904.10509.
|
[14] |
RAJBHANDARI S, RASLEY J, RUWASE O, et al. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models[C]. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2020). Atlanta: IEEE, 2020: 1-24.
|
[15] |
BAUER P, THORPE A, BRUNET G. The Quiet Revolution of Numerical Weather Prediction[J]. Nature, 2015, 525(7567): 47-55.
|
[16] |
SHOEYBI M, PATWARY M, PURI R, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism[J/OL]. arXiv, 2019. arXiv:1909.08053. https://arxiv.org/abs/1909.08053.
|
[17] |
NARAYANAN D, SHOEYBI M, CASPER J, et al. Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM[C]. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2021). St. Louis: IEEE, 2021: 1-15.
|
[18] |
POPE R, DOUGLAS S, CHOWDHERY A, et al. Efficiently Scaling Transformer Inference[J/OL]. arXiv, 2022. arXiv:2211.05102. https://arxiv.org/abs/2211.05102.
|
[19] |
QI P, WAN X, HUANG G, et al. Zero Bubble (Almost) Pipeline Parallelism[C]. 12th International Conference on Learning Representations (ICLR 2024). Vienna: ICLR, 2024: 1-19.
|
[20] |
AMINABADI RY, RAJBHANDARI S, AWAN AA, et al. DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale[C]. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2022). Dallas: IEEE, 2022: 1-15.
|
[21] |
DENG Q, LU P, ZHAO S, et al. U-Net: A Deep-Learning Method for Improving Summer Precipitation Forecasts in China[J]. Atmospheric and Oceanic Science Letters, 2023, 16(4): 100322.
|
[22] |
TREBING K, STAŃCZYK T, MEHRKANOON S. SmaAt-UNet: Precipitation Nowcasting Using a Small Attention-UNet Architecture[J]. Pattern Recognition Letters, 2021, 145: 178-186.
|
[23] |
TISHBY N, PEREIRA FC, BIALEK W. The Information Bottleneck Method[J/OL]. arXiv, 2000. arXiv:physics/0004057. https://arxiv.org/abs/physics/0004057.
|
[24] |
SOHONI NS, ABERGER CR, LESZCZYNSKI M, et al. Low-Memory Neural Network Training: A Technical Report[J/OL]. arXiv, 2019. arXiv:1904.10631. https://arxiv.org/abs/1904.10631.
|
[25] |
HUANG Y, CHENG Y, BAPNA A, et al. GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism[C]. Advances in Neural Information Processing Systems 32 (NeurIPS 2019). Vancouver: NeurIPS Foundation, 2019: 103-113.
|
[26] |
FOLEY D, DANSKIN J. Ultra-Performance Pascal GPU and NVLink Interconnect[J]. IEEE Micro, 2017, 37(2): 7-17.
|
[27] |
HAN Y, ZHANG Q, LI S, et al. Latency-Aware Unified Dynamic Networks for Efficient Image Recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(12): 7760-7774.
|
[28] |
Pangu-Weather[EB/OL]. GitHub. https://github.com/198808xc/Pangu-Weather.
|
[29] |
CHOWDHERY A, NARANG S, DEVLIN J, et al. PaLM: Scaling Language Modeling with Pathways[J/OL]. arXiv, 2022. arXiv:2204.02311. https://arxiv.org/abs/2204.02311.
|