Frontiers of Data & Computing (数据与计算发展前沿), 2026, Vol. 8, Issue 2: 184-203.

CSTR: 32002.14.jfdc.CN10-1649/TP.2026.02.014

doi: 10.11871/jfdc.issn.2096-742X.2026.02.014

• Technology and Applications •

A Review on AI-Driven Methods and Data Resources for Molecule Generation

XU Huangchao 1,2, ZHANG Baohua 1,*, LIU Qian 1, JIN Zhong 1,*

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2025-08-08 Online: 2026-04-20 Published: 2026-04-23
  • Corresponding authors: *ZHANG Baohua (E-mail: zhangbh@cnic.cn); JIN Zhong (E-mail: zjin@sccas.cn)
  • About the authors:
    XU Huangchao is a Ph.D. student at the Computer Network Information Center, Chinese Academy of Sciences, and the University of Chinese Academy of Sciences. Her main research interest is AI-assisted drug design.
    In this paper, she is responsible for literature collection and organization, as well as paper writing.
    E-mail: hcxu@cnic.cn
    ZHANG Baohua is a senior engineer at the Computer Network Information Center, Chinese Academy of Sciences. Her main research interests include HPC and AI integration technology and its applications in fields such as pharmaceuticals.
    In this paper, she is responsible for framework design and guidance.
    E-mail: zhangbh@cnic.cn
    JIN Zhong is a professor at the Computer Network Information Center, Chinese Academy of Sciences. His main research areas include biomedical computing and parallel software implementation.
    In this paper, he is responsible for guiding the research direction and content.
    E-mail: zjin@sccas.cn
  • Funding:
    Major Special Project on the Four Major Chronic Diseases (四大慢病重大专项, 2024ZD0533504)

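Abstract (Chinese edition):

[Objective] Driven by both artificial intelligence (AI) technology and massive molecular data, AI-enabled molecular generation has become a key technology for drug design and chemical innovation. This paper focuses on small molecule design and aims to systematically review AI-driven small molecule generation methods and their applications in drug research and development. [Coverage] We survey the main data resources, generation methods, and application studies that support small molecule generation, both in China and internationally. [Methods] Organized around technical routes including variational autoencoders, generative adversarial networks, Transformers, diffusion models, and large language models, we introduce the current mainstream models and their core mechanisms, and summarize them in light of strategies such as target guidance, structural constraints, and language modeling. [Results] AI-based small molecule generation methods show significant advantages in multiple application scenarios, but they place higher demands on data quality, algorithmic complexity, and computing resources. [Limitations] Owing to space limitations, this paper does not comprehensively cover all branches of the field or its latest progress. [Conclusions] AI-driven small molecule generation methods are accelerating molecular discovery and innovation in drug design. Building high-quality, AI-ready datasets, improving the controllability and generalization of small molecule generation models, and refining the evaluation of generated molecules will be important directions for future research.

Keywords (Chinese edition): drug discovery, small molecule generation, deep generative models, artificial intelligence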

Abstract:

[Objective] Advances in artificial intelligence (AI) and the rapid growth of molecular data have made AI-driven molecular generation a key technology in drug design and chemical innovation. This review focuses on small molecule design, summarizing AI-driven generation methods, key data resources, and their current applications in drug discovery. [Coverage] We survey key data resources that support small molecule generation, summarize representative generative modeling approaches, and analyze recent application studies from both domestic and international research. [Methods] The article reviews mainstream deep learning architectures used in molecular generation, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Transformers, Diffusion Models, and Large Language Models (LLMs). It further summarizes advanced strategies including target-guided, fragment-based, and language model-driven generation. [Results] AI-driven small molecule generation has proven effective across a range of applications. However, growing model complexity and data volume demand higher-quality datasets, more advanced algorithms, and greater computational resources. [Limitations] This review mainly covers key data resources, core methods, and main applications in small molecule generation; it does not encompass all subfields or the latest developments in this rapidly evolving area. [Conclusions] The development of domain-specific data resources has enabled a range of AI-driven small molecule generation methods, accelerating molecular discovery and drug design. Building high-quality, AI-ready datasets, improving model controllability and generalization, and refining evaluation systems remain important directions for future research.

Key words: drug design, molecule generation, deep generative models, artificial intelligence