Frontiers of Data & Computing ›› 2025, Vol. 7 ›› Issue (1): 19-37.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.01.002

doi: 10.11871/jfdc.issn.2096-742X.2025.01.002

• Special Issue: Generative Artificial Intelligence •

Review of Research on Chart Question Answering

MA Qiuping, ZHANG Qi*, ZHAO Xiaofan

  1. School of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China
  • Received: 2024-11-10 Online: 2025-02-20 Published: 2025-02-21
  • Corresponding author: *ZHANG Qi (E-mail: qi.zhang@ppsuc.edu.cn)
  • About the authors: MA Qiuping is a master’s student at the People’s Public Security University of China and a CCF student member. His main research interests include data analysis and visual question answering.
    In this paper, he is responsible for the literature review and paper writing.
    E-mail: maqiuping@stu.ppsuc.edu.cn
    ZHANG Qi, Ph.D., is an associate professor at the People’s Public Security University of China. Her main research interests include computer vision and pattern recognition.
    In this paper, she is mainly responsible for revising the manuscript.
    E-mail: qi.zhang@ppsuc.edu.cn
  • Funding:
    Fundamental Research Funds for the Central Universities (2024JKF18)

Abstract:

[Objective] This paper comprehensively reviews research progress in Chart Question Answering (CQA), analyzes existing models and methods, and explores future development directions. [Methods] CQA models are first divided into two categories: those based on deep learning and those based on multi-modal large models. Deep learning-based methods are further subdivided into end-to-end models and two-stage models. The three core processes of deep learning-based CQA are then analyzed in depth, with a detailed classification and analysis of the existing processing methods for each process. CQA models based on multi-modal large models are also examined, along with their advantages, limitations, and future development directions. [Results] The paper summarizes the current state of CQA research and analyzes existing models and methods in depth. Deep learning-based CQA models perform well on standard chart types and simple tasks, but fall short on complex, non-standardized charts and on tasks that require deep reasoning. CQA models based on multi-modal large models show great potential, but their performance gains often come with increases in model size and computational complexity. Future research should focus on developing more lightweight question answering models and on improving model interpretability.

Key words: chart question answering, visual question answering, deep learning, multi-modal large language models