Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (1): 19-37.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.01.002

doi: 10.11871/jfdc.issn.2096-742X.2025.01.002

• Special Issue: Generative Artificial Intelligence

Review of Research on Chart Question Answering

MA Qiuping, ZHANG Qi*, ZHAO Xiaofan

  1. School of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China
  • Received: 2024-11-10 Online: 2025-02-20 Published: 2025-02-21

Abstract:

[Objective] This paper comprehensively reviews research progress in Chart Question Answering (CQA), analyzes existing models and methods, and explores future development directions. [Methods] First, CQA models are divided into two categories: deep learning-based models and models based on multi-modal large models. Deep learning-based methods are further subdivided into end-to-end models and two-stage models. Next, the three core processes of deep learning-based CQA are analyzed in depth, and the existing processing methods for each process are classified and examined in detail. CQA models based on multi-modal large models are also explored, with an analysis of their advantages, limitations, and future development directions. [Results] The current state of CQA research is comprehensively summarized, and existing models and methods are analyzed in depth. Deep learning-based CQA models perform well on standard chart types and simple tasks but fall short on complex, non-standardized charts and on tasks requiring deep reasoning. In contrast, CQA models based on multi-modal large models show great potential, but their performance gains often come with increased model size and computational complexity. Future research should focus on developing more lightweight question answering models and enhancing model interpretability.

Key words: chart question answering, visual question answering, deep learning, multi-modal large language models