Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (4): 155-168.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.04.013

doi: 10.11871/jfdc.issn.2096-742X.2025.04.013

• Technology and Application •

CAE-Bench: An Evaluation of Large Language Models in Structural Mechanics Simulation

LIU Dianyu1,2, LIU Qingkai2,3, XIAO Yuyang1,2, WANG Jie2,3,*

  1. Graduate School of China Academy of Engineering Physics, Beijing 100193, China
    2. Institute of Applied Physics and Computational Mathematics, Beijing 100094, China
    3. CAEP Software Center for High Performance Numerical Simulation, Beijing 100088, China
  • Received: 2025-01-06 Online: 2025-08-20 Published: 2025-08-21
  • Contact: WANG Jie, E-mail: wang_jie@iapcm.ac.cn

Abstract:

[Background] Computer-aided engineering (CAE) plays a crucial role in equipment design and development. Recent advances in large language models (LLMs) present new possibilities for intelligent CAE assistance. [Methods] This study explores the capability of LLMs in CAE tasks by designing a dedicated evaluation benchmark, CAE-Bench, focused on structural mechanics simulation. The benchmark comprises three capability levels (knowledge retention, problem solving, and simulation application) spanning six fundamental subjects and nine subfields. Based on CAE-Bench, we develop a dataset of 3,340 multiple-choice questions to systematically assess 15 prominent LLMs. [Conclusions] Although LLMs exhibit a basic understanding of CAE knowledge, their proficiency remains limited. Average accuracy reaches 70% on knowledge-retention questions but falls to 50% on problem-solving questions. On simulation-application tasks, which require reasoning and comprehensive analysis, accuracy declines further as task difficulty increases. Performance also varies substantially across application scenarios, with accuracies differing by up to a factor of four, indicating that a considerable gap remains before these models can be deployed in real-world engineering applications. This study provides a feasible framework for evaluating LLMs in the CAE domain and can serve as a reference for future work on intelligent CAE and automated simulation.
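To make the multiple-choice evaluation protocol concrete, the sketch below shows one minimal way to score model replies and aggregate accuracy per capability level. This is an illustrative assumption, not the authors' released harness: the item schema (level, answer, model_reply), the sample records, and the standalone-letter extraction rule are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical item format; CAE-Bench's actual schema may differ.
# Each record carries the capability level, the gold option letter,
# and the raw text the model returned for that question.
items = [
    {"level": "knowledge-retention",    "answer": "B", "model_reply": "B"},
    {"level": "knowledge-retention",    "answer": "D", "model_reply": "A."},
    {"level": "problem-solving",        "answer": "C", "model_reply": "The answer is C"},
    {"level": "simulation-application", "answer": "A", "model_reply": "D"},
]

def extract_choice(reply: str) -> str:
    """Return the first standalone A-D letter in the reply, or '' if none."""
    m = re.search(r"\b([ABCD])\b", reply.upper())
    return m.group(1) if m else ""  # an unparseable reply scores as incorrect

correct: dict[str, int] = defaultdict(int)
total: dict[str, int] = defaultdict(int)
for item in items:
    total[item["level"]] += 1
    if extract_choice(item["model_reply"]) == item["answer"]:
        correct[item["level"]] += 1

for level, n in total.items():
    print(f"{level}: {correct[level] / n:.0%} ({correct[level]}/{n})")
```

The word-boundary match matters because models often answer in free text (e.g., "The answer is C"), where a naive first-letter scan would wrongly pick up the "A" in "answer".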

Key words: large language models, benchmark, computer-aided engineering, structural mechanics simulation