Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (4): 155-168.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.04.013

doi: 10.11871/jfdc.issn.2096-742X.2025.04.013

• Technology and Applications •

CAE-Bench: An Evaluation Benchmark for Large Language Models in Structural Mechanics Simulation

LIU Dianyu1,2, LIU Qingkai2,3, XIAO Yuyang1,2, WANG Jie2,3,*

  1. Graduate School of China Academy of Engineering Physics, Beijing 100193, China
    2. Institute of Applied Physics and Computational Mathematics, Beijing 100094, China
    3. CAEP Software Center for High Performance Numerical Simulation, Beijing 100088, China
  • Received: 2025-01-06 Online: 2025-08-20 Published: 2025-08-21
  • Contact: WANG Jie
  • About the authors: LIU Dianyu is a Master's student at the Graduate School of China Academy of Engineering Physics and the Institute of Applied Physics and Computational Mathematics. Her main research interests include AI for CAE software.
    In this paper, she was responsible for designing the evaluation methods, conducting the assessment experiments, and writing the manuscript.
    E-mail: liudianyu22@gscaep.ac.cn
    WANG Jie, Ph.D., is an associate researcher at the Institute of Applied Physics and Computational Mathematics. His main research interests include computational mechanics and high-performance CAE software.
    In this paper, he was responsible for constructing the CAE evaluation datasets.
    E-mail: wang_jie@iapcm.ac.cn


Abstract:

[Objective] In fields such as aviation, aerospace, shipbuilding, energy, and automotive engineering, computer-aided engineering (CAE) is an indispensable tool for equipment design and development. As the digital transformation of equipment deepens, intelligentization has become a major trend for CAE software, and the effective integration of CAE simulation with artificial intelligence is the key. In recent years, the rapid development of large language models (LLMs) has brought new opportunities for intelligent CAE simulation; however, whether LLMs are well suited to the CAE simulation domain remains unclear. [Methods] To assess how well LLMs have mastered CAE simulation knowledge, this paper designs and constructs an evaluation benchmark, CAE-Bench, for structural mechanics simulation. CAE-Bench spans three capability levels (knowledge retention, problem solving, and simulation application), covers six fundamental CAE simulation subjects and nine simulation application scenarios, and contains 3,340 multiple-choice questions in total. Based on CAE-Bench, 15 mainstream LLMs are systematically evaluated. [Conclusions] The results show that LLMs have acquired basic CAE simulation knowledge, but their mastery is limited: average accuracy reaches 70% at the knowledge-retention level yet only 50% at the problem-solving level. LLMs also struggle with complex problems that require reasoning and comprehensive analysis: at the simulation-application level, accuracy declines as question difficulty increases and varies widely across application scenarios, with the highest and lowest accuracies differing by up to a factor of four, so a considerable gap remains before real-world engineering deployment. The proposed CAE-Bench provides a feasible scheme for evaluating LLMs in the CAE domain and can serve as a reference and support for the effective integration of LLMs with CAE simulation.

Keywords: large language models, evaluation benchmark, computer-aided engineering, structural mechanics simulation

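The evaluation protocol the abstract describes, i.e. scoring a bank of multiple-choice questions and reporting accuracy per capability level, can be sketched as follows. This is an illustrative outline only, not the paper's actual harness: the record fields (`level`, `model_answer`, `reference`) and the toy data are hypothetical.

```python
from collections import defaultdict

def accuracy_by_level(records):
    """Aggregate multiple-choice accuracy per capability level.

    Each record carries a 'level' tag (e.g. 'knowledge',
    'problem_solving', 'application'), the model's chosen option,
    and the reference answer. Returns {level: accuracy}.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["level"]] += 1
        if r["model_answer"] == r["reference"]:
            correct[r["level"]] += 1
    return {lvl: correct[lvl] / total[lvl] for lvl in total}

# Toy records (hypothetical, not drawn from CAE-Bench itself)
records = [
    {"level": "knowledge", "model_answer": "A", "reference": "A"},
    {"level": "knowledge", "model_answer": "C", "reference": "B"},
    {"level": "problem_solving", "model_answer": "D", "reference": "D"},
]
print(accuracy_by_level(records))
# → {'knowledge': 0.5, 'problem_solving': 1.0}
```

Comparing such per-level accuracies across models is what exposes the gap the abstract reports between knowledge retention (~70%) and problem solving (~50%).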