Frontiers of Data & Computing, 2023, Vol. 5, Issue (3): 13-28.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.03.002

doi: 10.11871/jfdc.issn.2096-742X.2023.03.002

• Special Issue: Paradigm Shift in Scientific Research Driven by "AI & Big Data" (Part 2)

An HPC+AI-Driven AI for Science Computing Platform for First-Principles Calculations

LIU Tao1, ZHAO Tong1, TAN Guangming1,2, JIA Weile1,2,*

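  1. State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2023-05-04 Online: 2023-06-20 Published: 2023-06-21
  • Corresponding author: *JIA Weile (E-mail: jiaweile@ict.ac.cn)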
  • About the authors: LIU Tao is a senior engineer at the Institute of Computing Technology, Chinese Academy of Sciences. His research interests include high-performance computing, machine learning, and AI for Science applications.
    In this paper, he is mainly responsible for the overall architecture design of the AI for Science computing platform.
    E-mail: liutao17@ict.ac.cn
    JIA Weile is an associate professor and Ph.D. supervisor at the Institute of Computing Technology, Chinese Academy of Sciences, working on intelligent scientific computing (HPC+AI). He is one of the key developers of a high-performance deep-learning molecular dynamics software package that is four orders of magnitude more efficient than comparable software and is widely used; users have applied it in papers published in Nature, Science, and PRL. He received the 2020 ACM Gordon Bell Prize, and the work was selected as one of China's Top Ten Science and Technology Progress News of 2020, as voted by academicians of the Chinese Academy of Sciences and the Chinese Academy of Engineering. A recent collaborative work was a 2022 ACM Gordon Bell Prize finalist and won the 2022 China Supercomputing Best Application Award.
    In this paper, he is mainly responsible for the literature review and platform overview.
    E-mail: jiaweile@ict.ac.cn
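  • Funding:
    National Key R&D Program of China, High-Performance Computing Special Program (Young Scientists Project) (2021YFB0300600); National Natural Science Foundation of China (92270206, T2125013, 62032023, 61972377); Stable Support Program for Young Scientist Teams of the Chinese Academy of Sciences (YSBR-005); Chinese Academy of Sciences Cyberspace and Informatization Special Program, "Big Data + AI" Research Paradigm Shift Application Demonstration (CAS-WX2021SF-0103)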

An AI-for-Science Platform of Molecular Dynamics with Ab initio Accuracy

LIU Tao1, ZHAO Tong1, TAN Guangming1,2, JIA Weile1,2,*

  1. State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2023-05-04 Online: 2023-06-20 Published: 2023-06-21

Abstract:

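[Objective] AI for Science methods are profoundly changing the landscape of scientific computing. By combining physical models, artificial intelligence, and high-performance computing, they tackle the high-dimensional problems of traditional scientific computing through data fitting, extending the temporal and spatial scales of high-accuracy scientific computations by orders of magnitude and driving a paradigm shift in scientific research. [Methods] For molecular dynamics with first-principles accuracy, this paper proposes an HPC+AI-driven AI for Science computing platform. To address the changes and challenges that AI for Science brings to the workflow, it describes the key technologies and processes for building such a platform from four aspects: generating scientific data and preparing datasets, exploring configuration space and labeling training samples, efficiently training AI for Science models, and performing efficient large-scale inference. [Results] Building on an integrated AI for Science workflow, the platform targets the representative application of HPC+AI-driven molecular dynamics with first-principles accuracy. It introduces an active learning strategy based on Kalman filtering; improves a quasi-second-order training method for AI models, reducing training time from days to minutes; and uses fifth-order polynomial model compression to increase the system size reachable in model inference by one order of magnitude on the same hardware, with a 3-9x shorter time-to-solution. [Conclusions] Integrating these components yields an AI for Science computing platform for molecular dynamics with first-principles accuracy. [Limitations and Prospects] AI for Science methods and workflows are still developing rapidly and face major challenges in high-accuracy data, more general AI models, and efficient computational methods; these will be important directions for future work.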
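The abstract does not spell out how the fifth-order polynomial compression works. One common way to realize it, assumed here, is to tabulate a smooth one-dimensional function (for example, an embedding network evaluated on a scalar input) on a uniform grid and, on each interval, use a quintic that matches the value, first derivative, and second derivative at both endpoints. In the sketch below, `g` is a hypothetical stand-in (`tanh`) for a trained network, and `build_table`/`eval_table` are illustrative names, not the platform's API.

```python
import math

# Hypothetical stand-in for a trained 1-D embedding function; tanh is used only
# because its first and second derivatives are known in closed form.
def g(x):
    return math.tanh(x)

def dg(x):
    t = math.tanh(x)
    return 1.0 - t * t

def d2g(x):
    t = math.tanh(x)
    return -2.0 * t * (1.0 - t * t)

def build_table(lo, hi, n):
    """Tabulate g, g' and g'' on n + 1 uniform nodes spanning [lo, hi]."""
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    return {"lo": lo, "h": h, "n": n,
            "f": [g(x) for x in xs],
            "df": [dg(x) for x in xs],
            "d2f": [d2g(x) for x in xs]}

def eval_table(tab, x):
    """Evaluate the piecewise quintic Hermite interpolant at x in [lo, hi]."""
    lo, h, n = tab["lo"], tab["h"], tab["n"]
    i = min(max(int(math.floor((x - lo) / h)), 0), n - 1)  # interval index, clamped
    t = (x - lo) / h - i                                    # local coordinate in [0, 1]
    t3, t4, t5 = t ** 3, t ** 4, t ** 5
    # Quintic Hermite basis: each term matches one of value/slope/curvature at one end.
    h0 = 1 - 10 * t3 + 15 * t4 - 6 * t5        # value at left node
    h1 = t - 6 * t3 + 8 * t4 - 3 * t5          # slope at left node
    h2 = 0.5 * (t * t - 3 * t3 + 3 * t4 - t5)  # curvature at left node
    h3 = 0.5 * (t3 - 2 * t4 + t5)              # curvature at right node
    h4 = -4 * t3 + 7 * t4 - 3 * t5             # slope at right node
    h5 = 10 * t3 - 15 * t4 + 6 * t5            # value at right node
    f, df, d2f = tab["f"], tab["df"], tab["d2f"]
    return (f[i] * h0 + h * df[i] * h1 + h * h * d2f[i] * h2
            + h * h * d2f[i + 1] * h3 + h * df[i + 1] * h4 + f[i + 1] * h5)

if __name__ == "__main__":
    tab = build_table(-3.0, 3.0, 60)
    grid = [-3.0 + 6.0 * k / 997 for k in range(998)]
    print("max abs error:", max(abs(eval_table(tab, x) - g(x)) for x in grid))
```

Because the interpolant matches second derivatives at the nodes, it is C2-continuous, so forces (first derivatives) computed from the table stay smooth across interval boundaries; inference then replaces a network evaluation with a table lookup and a polynomial.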

Key words: AI for Science, first-principles calculation, molecular dynamics, active learning, Kalman filtering, model compression

Abstract:

[Objective] AI for Science is changing the landscape of traditional scientific computing by combining physical models, artificial intelligence, and high-performance computing to address challenging problems such as molecular dynamics with ab initio accuracy. The approach uses neural networks to fit high-dimensional functions, extending the temporal and spatial scales of high-accuracy simulations by orders of magnitude and driving a paradigm shift in scientific research. [Methods] This paper proposes an HPC+AI-driven computing platform for molecular dynamics with ab initio accuracy. To address the changes and challenges that AI for Science brings to the workflow, it describes the key technologies and processes for building an AI for Science computing platform from four aspects: generating scientific data and preparing datasets, exploring configuration space and labeling training samples, efficiently training AI for Science models, and performing efficient large-scale inference (MD simulation). [Results] Building on the proposed platform and its integrated AI for Science workflow, this paper targets the representative application of HPC+AI-driven molecular dynamics with first-principles accuracy and proposes an active learning strategy based on Kalman filtering. An improved quasi-second-order training method for AI models reduces training time from days to minutes. A fifth-order polynomial model compression technique increases the system size reachable in model inference by one order of magnitude on the same hardware and shortens time-to-solution by a factor of 3-9. [Conclusions] Together, these components form an AI for Science computing platform for molecular dynamics with first-principles accuracy. [Limitations and Prospects] AI for Science methods and workflows are still developing rapidly and face significant challenges in high-accuracy data, more general AI models, and efficient computing methods. These challenges will be important directions for future work.
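The abstract names a quasi-second-order training method without describing it. One widely used quasi-second-order scheme for fitting neural-network potentials is the extended Kalman filter (EKF): the weights are the filter state, an error-covariance matrix P plays the role of an approximate inverse Hessian, and each labeled sample updates the weights through a Kalman gain. Whether the paper's method matches this is an assumption. The sketch below applies one EKF update per sample to a linear model y = w·x, for which the Jacobian is simply x and the EKF reduces to recursive least squares; for a neural network, x would be replaced by the gradient of the prediction with respect to the weights. The noise variance `R` and forgetting factor `lam` are illustrative, not the paper's settings.

```python
import random

def ekf_step(w, P, x, y, R=1.0, lam=1.0):
    """One extended-Kalman-filter update for the linear model y_hat = w . x.

    w   : current weights (the filter state)
    P   : n x n weight error covariance (list of lists, symmetric)
    x   : input vector; for a linear model it is also the Jacobian d y_hat / d w
    y   : target label
    R   : assumed measurement-noise variance
    lam : assumed forgetting factor in (0, 1]; 1.0 means no forgetting
    """
    n = len(w)
    Ph = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]  # P h
    s = sum(x[i] * Ph[i] for i in range(n)) + R                     # h^T P h + R
    K = [Ph[i] / s for i in range(n)]                               # Kalman gain
    err = y - sum(w[i] * x[i] for i in range(n))                    # prediction error
    w_new = [w[i] + K[i] * err for i in range(n)]
    # Covariance update P <- (P - K h^T P) / lam, using h^T P = (P h)^T for symmetric P.
    P_new = [[(P[i][j] - K[i] * Ph[j]) / lam for j in range(n)] for i in range(n)]
    return w_new, P_new

if __name__ == "__main__":
    rng = random.Random(0)
    true_w = [2.0, -1.0, 0.5]  # hypothetical target weights
    w = [0.0, 0.0, 0.0]
    P = [[100.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(200):
        x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        y = sum(a * b for a, b in zip(true_w, x))
        w, P = ekf_step(w, P, x, y)
    print("estimated weights:", w)
```

Unlike plain stochastic gradient descent, each step rescales the update by P, which carries curvature information accumulated from all previous samples; this is what lets Kalman-type optimizers converge in far fewer passes, at the cost of storing and updating P (in practice often block-diagonal for large networks).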

Key words: AI for Science, first-principles calculation, molecular dynamics, active learning, Kalman filtering, model compression
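The abstract mentions an active learning strategy based on Kalman filtering but gives no details. One plausible reading, assumed here, reuses the Kalman-filter weight covariance P as a cheap uncertainty estimate: for a candidate configuration whose prediction has Jacobian h with respect to the weights, the predictive variance is hᵀPh + R, and candidates with large variance are sent for first-principles labeling. The function names, the threshold, and `max_n` below are hypothetical, and in a real workflow h would be the gradient of the predicted energy with respect to the network weights.

```python
def predictive_variance(P, h, R):
    """Kalman-style predictive variance h^T P h + R for a candidate with Jacobian h."""
    n = len(h)
    return sum(h[i] * P[i][j] * h[j] for i in range(n) for j in range(n)) + R

def select_for_labeling(P, candidates, R=1e-3, threshold=0.1, max_n=10):
    """Return indices of up to max_n candidates whose predictive variance exceeds
    threshold, most uncertain first (threshold and max_n are illustrative)."""
    scored = [(predictive_variance(P, h, R), k) for k, h in enumerate(candidates)]
    picked = [k for var, k in sorted(scored, reverse=True) if var > threshold]
    return picked[:max_n]

if __name__ == "__main__":
    # Hypothetical covariance: the first two weight directions are well constrained
    # by earlier training data, the third is not.
    P = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 10.0]]
    candidates = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0], [0.3, 0.0, 0.3]]
    print(select_for_labeling(P, candidates))
```

Only candidates that probe the poorly constrained third direction are selected, which is the intended effect: expensive first-principles labels are spent where the current model is least certain, rather than on configurations it already describes well.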