Frontiers of Data and Computing, 2026, Vol. 8, Issue (2): 184-203.

CSTR: 32002.14.jfdc.CN10-1649/TP.2026.02.014

doi: 10.11871/jfdc.issn.2096-742X.2026.02.014

• Technology and Application •

A Review on AI-Driven Methods and Data Resources for Molecule Generation

XU Huangchao1,2, ZHANG Baohua1,*, LIU Qian1, JIN Zhong1,*

  1 Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2 University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2025-08-08 Online: 2026-04-20 Published: 2026-04-23

Abstract:

[Objective] Advances in artificial intelligence (AI) and the rapid growth of molecular data have made AI-driven molecular generation a key technology in drug design and chemical innovation. This review focuses on small molecule design, summarizing AI-driven generation methods, key data resources, and their current applications in drug discovery. [Coverage] We survey key data resources that support small molecule generation, summarize representative generative modeling approaches, and analyze recent application studies by research groups in China and abroad. [Methods] The article reviews mainstream deep learning architectures used in molecular generation, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Transformers, Diffusion Models, and Large Language Models (LLMs). It further summarizes advanced strategies, including target-guided, fragment-based, and language model-driven generation. [Results] AI-driven small molecule generation has proven effective across a range of applications. As model complexity and data volume grow, the field requires higher-quality datasets, more advanced algorithms, and greater computational resources. [Limitations] This review mainly covers key data resources, core methods, and main applications in small molecule generation; it does not encompass all subfields or the latest developments in this rapidly evolving area. [Conclusions] The development of domain-specific data resources has enabled a range of AI-driven small molecule generation methods, accelerating molecular discovery and drug design. Building high-quality, AI-ready datasets, improving model controllability and generalization, and refining evaluation systems remain important directions for future research.

Key words: drug design, molecule generation, deep generative models, artificial intelligence