Review of Research on Chart Question Answering

doi: 10.11871/jfdc.issn.2096-742X.2025.01.002

Frontiers of Data and Computing, 2025, Vol. 7, Issue (1): 19-37.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.01.002

Special Issue: Generative Artificial Intelligence

MA Qiuping, ZHANG Qi*, ZHAO Xiaofan

School of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China

Received: 2024-11-10 Online: 2025-02-20 Published: 2025-02-21
Corresponding author: *ZHANG Qi (E-mail: qi.zhang@ppsuc.edu.cn)
About the authors:
MA Qiuping is a master’s student at the People’s Public Security University of China and a CCF student member. His main research interests include data analysis and visual question answering. In this paper, he is responsible for the literature review and paper writing. E-mail: maqiuping@stu.ppsuc.edu.cn
ZHANG Qi, Ph.D., is an associate professor at the People’s Public Security University of China. Her main research directions include computer vision and pattern recognition. In this paper, she is mainly responsible for revising the manuscript. E-mail: qi.zhang@ppsuc.edu.cn
Funding:
Fundamental Research Funds for the Central Universities (2024JKF18)

Abstract

[Objective] This paper comprehensively reviews research progress in Chart Question Answering (CQA), analyzes existing models and methods, and explores future development directions. [Methods] CQA models are first divided into two categories: those based on deep learning and those based on multi-modal large models. Deep learning-based methods are further subdivided into end-to-end models and two-stage models. The three core processes of deep learning-based CQA are then analyzed, and the existing methods for each process are classified and examined in detail. CQA models based on multi-modal large models are also discussed, including their advantages, limitations, and future development directions. [Results] The paper summarizes the current state of CQA research and analyzes existing models and methods in depth. Deep learning-based CQA models perform well on standard chart types and simple tasks, but fall short on complex, non-standardized charts or tasks requiring deep reasoning. CQA models based on multi-modal large models show great potential, but their performance gains often come with increases in model size and computational complexity. Future research should focus on developing more lightweight question answering models and improving model interpretability.

Key words: chart question answering, visual question answering, deep learning, multi-modal large language models

MA Qiuping, ZHANG Qi, ZHAO Xiaofan. Review of Research on Chart Question Answering[J]. Frontiers of Data and Computing, 2025, 7(1): 19-37, https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2025.01.002.

Figures/Tables (9): Figures 1-7; Tables 1-2

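Table 2. Evaluation results of chart question answering models on datasets (DVQA columns report No OCR/Oracle)

| Category | Model | FigureQA Val1 | FigureQA Val2 | DVQA Familiar | DVQA Novel | PlotQA Test1 | PlotQA Test2 | PlotQA Avg | ChartQA Human | ChartQA Machine | ChartQA Avg | OpenCQA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Deep learning-based CQA | RN [13] | 76.39 | 72.54 | - | - | - | - | - | - | - | - | - |
| | MOM [14] | - | - | 45.03 | 40.9 | - | - | - | - | - | - | - |
| | ARN [16] | 85.48 | 82.95 | 44.5/79.43 | 44.51/79.58 | - | - | - | - | - | - | - |
| | SANDY [14] | - | - | 36.02/56.48 | 36.14/56.62 | - | - | - | - | - | - | - |
| | LEAF-Net [15] | - | 81.15 | -/72.72 | -/72.89 | - | - | - | - | - | - | - |
| | PReFIL [37] | 94.84 | 93.26 | 47.7/96.37 | 47.86/96.53 | - | - | - | - | - | - | - |
| | PlotQA-M [25] | - | - | 57.99 | 59.54 | - | - | 22.52 | - | - | 36.15 | - |
| | CRCT [50] | 94.61 | 85.04 | - | - | 79.64 | 34.44 | - | - | - | - | - |
| | VL-T5 [6] | 88.6 | 88.49 | -/94.8 | -/77.04 | 75.9 | 56.02 | - | 26.24 | 56.88 | 41.56 | - |
| | VisionTaPas [6] | 91.46 | 91.45 | -/95.38 | -/95.46 | 65.3 | 42.5 | - | 29.6 | 61.44 | 45.52 | - |
| | ChartT5 [45] | - | - | - | - | - | - | - | 31.8 | 74.4 | 53.16 | - |
| | GoT-CQA [56] | - | - | - | - | 92.8 | 78.3 | - | 47.1 | 87.9 | 67.5 | - |
| | LWR-RN [29] | 85.91 | 83.43 | 44.76/79.83 | 44.49/79.96 | - | - | - | - | - | - | - |
| | FIBT [73] | - | - | - | - | - | 60.44 | - | - | - | - | - |
| | STL-CQA [46] | - | - | -/97.35 | -/97.51 | - | - | - | - | - | - | - |
| Large model-based CQA | QDCHART [17] | - | - | - | - | - | - | - | 34.9 | 79.4 | 57.2 | - |
| | Matcha [18] | - | - | - | - | 92.3 | 90.7 | - | 38.2 | 90.2 | 64.2 | - |
| | Unichart [19] | - | - | - | - | - | - | - | 43.92 | 88.56 | 66.24 | 14.88 |
| | CCM+GPT-3.5+reph PoT SC [10] | - | - | - | - | 62.0 | 71.4 | 66.7 | 67.6 | 91.4 | 79.5 | - |
| | ChartLLaMa [74] | - | - | - | - | - | - | - | 48.96 | 90.36 | 69.66 | - |
| | ChartAst-D [68] | - | - | - | - | - | - | - | 45.3 | 91.3 | 68.3 | 14.9 |
| | ChartAst-S [68] | - | - | - | - | - | - | - | 65.9 | 93.9 | 75.1 | 15.5 |
| | ChartGemma [63] | - | - | - | - | - | - | - | 64.8 | 89.44 | 77.12 | - |
| | ChartInstruct-Flan-T5-XL [75] | - | - | - | - | - | - | - | 50.16 | 93.84 | 72 | 14.81 |
| | ChartMoE [72] | - | - | - | - | - | - | - | 71.36 | 91.04 | 81.2 | - |
| | ChartReader-TaPas [76] | 91.12 | 91.4 | -/92.2 | -/94.3 | 74.2 | 56.2 | - | - | - | 48.3 | - |
| | EvoChart-4B [77] | - | - | - | - | - | - | - | - | - | 81.5 | - |
| | mChartQA(Intern-LM2) [78] | 96.06 | 96.3 | - | - | 78.25 | 74.79 | - | 68.24 | 89.76 | 79 | - |
| | mChartQA(Qwen) [78] | 90.32 | 92.75 | - | - | 78 | 62.95 | - | 58.56 | 93.44 | 76 | - |
| | ChartReader-T5 [76] | 95.5 | 95.8 | -/95.4 | -/96.5 | 78.1 | 59.3 | - | - | - | 49.5 | - |
| | SIMPLOT [79] | - | - | - | - | - | - | - | 78.07 | 88.42 | 83.24 | - |
| | SynChart [69] | - | - | - | - | - | - | - | 74.24 | 94.96 | 84.6 | - |
| | TinyChart@768 [80] | - | - | - | - | - | - | - | 73.34 | 93.86 | 83.6 | - |
| | ChartPaLI-5B(Conv) [66] | 96.3 | 96.2 | -/86 | -/70.7 | - | - | - | 60.88 | 93.68 | 77.28 | - |
| | VPAgent [81] | - | - | -/98.5 | -/97.1 | 92.6 | 78.1 | - | 48.1 | 87.4 | 67.8 | - |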
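The ChartQA columns of the evaluation table can be sanity-checked with a few lines of Python. This is a minimal sketch under one assumption that the source does not state: that the ChartQA "Avg" column is the unweighted mean of the Human and Machine splits. The function names `split_mean` and `check_rows` and the tolerance `tol` are illustrative choices, and only a handful of rows are copied from the table.

```python
# Rows copied from the evaluation table:
# (model, ChartQA Human, ChartQA Machine, reported ChartQA Avg).
rows = [
    ("VL-T5", 26.24, 56.88, 41.56),
    ("VisionTaPas", 29.6, 61.44, 45.52),
    ("Matcha", 38.2, 90.2, 64.2),
    ("Unichart", 43.92, 88.56, 66.24),
    ("ChartAst-S", 65.9, 93.9, 75.1),
]


def split_mean(human, machine):
    """Unweighted mean of two split accuracies (assumed definition of Avg)."""
    return (human + machine) / 2


def check_rows(rows, tol=0.06):
    """Return models whose reported Avg differs from the split mean by more than tol.

    tol is a hypothetical tolerance chosen to absorb rounding to one decimal place.
    """
    return [
        name
        for name, human, machine, avg in rows
        if abs(split_mean(human, machine) - avg) > tol
    ]


mismatches = check_rows(rows)
print(mismatches)
```

Under this assumption, most of the sampled rows agree with their reported average, while the ChartAst-S row (65.9 and 93.9 against a listed 75.1) does not. Whether that reflects a transcription issue or a different averaging rule cannot be determined from the table alone.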

References (81)

[1]	AGRAWAL A, LU J, ANTOL S, et al. VQA: Visual Question Answering[J]. International Journal of Computer Vision, 2015, 123(1): 4-31. DOI: 10.1109/ICCV.2015.279.
[2]	KIM J, SRINIVASAN A, KIM N W, et al. Exploring Chart Question Answering for Blind and Low Vision Users[C]// Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany: ACM, 2023: 1-15.
[3]	BADAM S K, LIU Z, ELMQVIST N. Elastic Documents: Coupling Text and Tables through Contextual Visualizations for Enhanced Document Reading[J]. IEEE Transactions on Visualization and Computer Graphics, 2019, 25(1): 661-671.
[4]	SETLUR V, TORY M, DJALALI A. Inferencing underspecified natural language utterances in visual analysis[C]// Proceedings of the 24th International Conference on Intelligent User Interfaces, Marina del Ray, California: ACM, 2019: 40-51.
[5]	KIM D H, HOQUE E, KIM J, et al. Facilitating document reading by linking text and tables[C]// Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology, Berlin, Germany: ACM, 2018: 423-434.
[6]	MASRY A, LONG D X, TAN J Q, et al. Chartqa: A benchmark for question answering about charts with visual and logical reasoning[J]. arXiv preprint arXiv:2203.10244, 2022.
[7]	LUO J, LI Z, WANG J, et al. Chartocr: Data extraction from charts images via a deep hybrid framework[C]// Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2021: 1917-1925.
[8]	RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer[J]. Journal of Machine Learning Research, 2020, 21(140): 1-67.
[9]	HERZIG J, NOWAK P K, MÜLLER T, et al. TAPAS: Weakly Supervised Table Parsing via Pre-training[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020: 4320-4333.
[10]	LIU Y C, CHU W T. Chart Question Answering based on Modality Conversion and Large Language Models[C]// Proceedings of the 1st ACM Workshop on AI-Powered Q&A Systems for Multimedia, 2024: 19-24.
[11]	LIU Z, LIN Y, CAO Y, et al. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021: 10012-10022.
[12]	LEWIS M, LIU Y, GOYAL N, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension[C/OL]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics, 2020: 7871-7880. https://www.aclweb.org/anthology/2020.acl-main.703.
[13]	KAHOU S E, MICHALSKI V, ATKINSON A, et al. Figureqa: An annotated figure dataset for visual reasoning[J]. arXiv preprint arXiv:1710.07300, 2017.
[14]	KAFLE K, PRICE B, COHEN S, et al. Dvqa: Understanding data visualizations via question answering[C]// Proceedings of the IEEE conference on computer vision and pattern recognition, 2018: 5648-5656.
[15]	CHAUDHRY R, SHEKHAR S, GUPTA U, et al. Leaf-qa: Locate, encode & attend for figure question answering[C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020: 3512-3521.
[16]	ZOU J, WU G, XUE T, et al. An affinity-driven relation network for figure question answering[C]// 2020 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2020: 1-6.
[17]	ZHENG H, WANG S, THOMAS C, et al. Advancing Chart Question Answering with Robust Chart Component Recognition[J]. arXiv preprint arXiv:2407.21038, 2024.
[18]	LIU F, PICCINNO F, KRICHENE S, et al. MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering[J]. arXiv preprint arXiv:2212.09662, 2022.
[19]	MASRY A, KAVEHZADEH P, DO X L, et al. Unichart: A universal vision-language pretrained model for chart comprehension and reasoning[J]. arXiv preprint arXiv:2305.14761, 2023.
[20]	WU Q, TENEY D, WANG P, et al. Visual question answering: A survey of methods and datasets[J]. Computer Vision and Image Understanding, 2017, 163: 21-40.
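[21]	WANG Yu, SUN Haichun. Survey of visual question answering technology research[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(7): 1487-1505. doi: 10.3778/j.issn.1673-9418.2303025 (in Chinese)
[22]	ZHANG Yifei, MENG Chunyun, JIANG Zhou, et al. Research progress on explainable visual question answering[J]. Application Research of Computers, 2024, 41(1): 10-20. (in Chinese)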
[23]	WU A, WANG Y, SHU X, et al. Ai4vis: Survey on artificial intelligence approaches for data visualization[J]. IEEE Transactions on Visualization and Computer Graphics, 2021, 28(12): 5049-5070.
[24]	HOQUE E, KAVEHZADEH P, MASRY A. Chart question answering: State of the art and future directions[C]// Computer Graphics Forum, 2022, 41(3): 555-572.
[25]	METHANI N, GANGULY P, KHAPRA M M, et al. Plotqa: Reasoning over scientific plots[C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020: 1527-1536.
[26]	YANG Z, HE X, GAO J, et al. Stacked attention networks for image question answering[C]// Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 21-29.
[27]	KIM J H, JUN J, ZHANG B T. Bilinear attention networks[C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018: 1571-1581.
[28]	SINGH A, NATARAJAN V, SHAH M, et al. Towards vqa models that can read[C]// Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019: 8317-8326.
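[29]	LI Ying, WU Qingfeng, LIU Jiatong, et al. Guiding weight-driven re-location relation network for chart question answering[J]. Journal of Image and Graphics, 2023, 28(2): 510-521. (in Chinese)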
[30]	RONNEBERGER O, FISCHER P, BROX T. U-net: Convolutional networks for biomedical image segmentation[C]// Medical image computing and computer-assisted intervention-MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer International Publishing, 2015: 234-241.
[31]	SANTORO A, RAPOSO D, BARRETT D G T, et al. A simple neural network module for relational reasoning[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems, Red Hook, NY, USA: Curran Associates Inc., 2017: 4974-4983.
[32]	MASRY A, PRINCE E H. Integrating image data extraction and table parsing methods for chart question answering[C]// Chart Question Answering Workshop, in conjunction with the Conference on Computer Vision and Pattern Recognition (CVPR), 2021: 1-5.
[33]	REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE transactions on pattern analysis and machine intelligence, 2016, 39(6): 1137-1149.
[34]	SMITH R. An overview of the Tesseract OCR engine[C]// Ninth international conference on document analysis and recognition (ICDAR 2007), IEEE, 2007, 2: 629-633.
[35]	PASUPAT P, LIANG P. Compositional semantic parsing on semi-structured tables[J]. arXiv preprint arXiv:1508.00305, 2015. http://aclweb.org/anthology/P15-1142.
[36]	JAIN H, JAYARAMAN S, SOORYANATH I T, et al. TapasQA-Question Answering on Statistical Plots Using Google TAPAS[C]// International Conference on Image Processing and Capsule Networks, Cham: Springer International Publishing, 2022: 63-77.
[37]	KIM D H, HOQUE E, AGRAWALA M. Answering questions about charts and generating visual explanations[C]// Proceedings of the 2020 CHI conference on human factors in computing systems, 2020: 1-13.
[38]	SATYANARAYAN A, MORITZ D, WONGSUPHASAWAT K, et al. Vega-lite: A grammar of interactive graphics[J]. IEEE transactions on visualization and computer graphics, 2016, 23(1): 341-350.
[39]	MAZRAEH FARAHANI A, ADIBI P, EHSANI M S, et al. Chart Question Answering with Multimodal Graph Representation Learning[A/OL]. Rochester, NY, 2023[2024-10-05]. https://www.ssrn.com/abstract=4655474. DOI:10.2139/ssrn.4655474.
[40]	HE K, GKIOXARI G, DOLLÁR P, et al. Mask r-cnn[C]// Proceedings of the IEEE international conference on computer vision, 2017: 2961-2969.
[41]	HOWARD A G, ZHU M, CHEN B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
[42]	DEVLIN J, CHANG M W, LEE K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of NAACL-HLT, 2019, 1: 2.
[43]	KAFLE K, SHRESTHA R, PRICE B, et al. Answering questions about data visualizations using efficient bimodal fusion[C]// Proceedings of the IEEE/CVF Winter conference on applications of computer vision, 2020: 1498-1507.
[44]	HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// Proceedings of the IEEE conference on computer vision and pattern recognition, 2017: 4700-4708.
[45]	ZHOU M, FUNG Y R, CHEN L, et al. Enhanced chart understanding in vision and language task via cross-modal pre-training on plot table pairs[J]. arXiv preprint arXiv:2305.18641, 2023.
[46]	SINGH H, SHEKHAR S. STL-CQA: Structure-based transformers with localization and encoding for chart question answering[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020: 3275-3284.
[47]	GUPTA A, GUPTA V, ZHANG S, et al. Enhancing Question Answering on Charts Through Effective Pre-training Tasks[J]. arXiv preprint arXiv:2406.10085, 2024.
[48]	HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. doi: 10.1162/neco.1997.9.8.1735. pmid: 9377276.
[49]	LIPTON Z C, BERKOWITZ J, ELKAN C. A Critical Review of Recurrent Neural Networks for Sequence Learning[J/OL]. arXiv Preprint, CoRR, abs/1506.00019, 2015: 1-1. http://arxiv.org/abs/1506.00019.
[50]	LEVY M, BEN-ARI R, LISCHINSKI D. Classification-regression for chart comprehension[C]// European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 469-484.
[51]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Red Hook, NY, USA: Curran Associates Inc., 2017: 6000-6010.
[52]	LIU Y, GU J, GOYAL N, et al. Multilingual denoising pre-training for neural machine translation[J/OL]. Transactions of the Association for Computational Linguistics, 2020, 8: 726-742. https://api.semanticscholar.org/CorpusID:210861178.
[53]	MANNING C, SURDEANU M, BAUER J, et al. The Stanford CoreNLP natural language processing toolkit[C]// Proceedings of 52nd annual meeting of the association for computational linguistics:system demonstrations, 2014: 55-60.
[54]	ZHANG Y, PASUPAT P, LIANG P. Macro grammars and holistic triggering for efficient semantic parsing[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark: Association for Computational Linguistics, 2017: 1214-1223.
[55]	LIU C, YU J, GUO Y, et al. Breathing New Life into Existing Visualizations: A Natural Language-Driven Manipulation Framework[J]. arXiv preprint arXiv:2404.06039, 2024.
[56]	ZHANG L, HUANG M, WANG Q, et al. GoT-CQA: Graph-of-Thought Guided Compositional Reasoning for Chart Question Answering[J]. arXiv preprint arXiv:2409.02611, 2024.
[57]	REDDY R, RAMESH R, DESHPANDE A, et al. Figurenet: A deep learning model for question-answering on scientific plots[C]// 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, 2019: 1-8.
[58]	CORTES C, VAPNIK V. Support-Vector Networks[J]. Machine Learning, 1995, 20(3): 273-297.
[59]	SHARMA M, GUPTA S, CHOWDHURY A, et al. Chartnet: Visual reasoning over statistical charts using mac-networks[C]// 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, 2019: 1-7.
[60]	FENG J. Chart Understanding with Large Language Model[J/OL]. Engineering Archive, 2023[2024-10-05]. https://engrxiv.org/preprint/view/3401/version/4747. DOI: 10.31224/3401.
[61]	RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// International conference on machine learning, PMLR, 2021: 8748-8763.
[62]	CHIANG W L, LI Z, LIN Z, et al. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality[J]. See https://vicuna.lmsys.org (accessed 14 April 2023), 2023, 2(3): 6.
[63]	MASRY A, THAKKAR M, BAJAJ A, et al. Chartgemma: Visual instruction-tuning for chart reasoning in the wild[J]. arXiv preprint arXiv:2407.04172, 2024.
[64]	CHEN X, WANG X, BEYER L, et al. Pali-3 vision language models: Smaller, faster, stronger[J]. arXiv preprint arXiv:2310.09199, 2023.
[65]	FORD J, ZHAO X, SCHUMACHER D, et al. Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations[J]. arXiv preprint arXiv:2409.18764, 2024.
[66]	CARBUNE V, MANSOOR H, LIU F, et al. Chart-based reasoning: Transferring capabilities from llms to vlms[C]// Findings of the Association for Computational Linguistics: NAACL 2024, 2024: 989-1004.
[67]	WU Y, YAN L, SHEN L, et al. ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering[C]// Findings of the Association for Computational Linguistics: EMNLP 2024, 2024: 12174-12200.
[68]	MENG F, SHAO W, LU Q, et al. Chartassisstant: A universal chart multimodal language model via chart-to-table pre-training and multitask instruction tuning[J]. arXiv preprint arXiv:2401.02384, 2024.
[69]	LIU M, LI Q, CHEN D, et al. SynChart: Synthesizing Charts from Language Models[J]. arXiv preprint arXiv:2409.16517, 2024.
[70]	LI Z, JASANI B, TANG P, et al. Synthesize Step-by-Step: Tools Templates and LLMs as Data Generators for Reasoning-Based Chart VQA[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 13613-13623.
[71]	ZENG X, LIN H, YE Y, et al. Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning[J]. IEEE Transactions on Visualization and Computer Graphics, 2024 (1): 1-11.
[72]	XU Z, QU B, QI Y, et al. ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding[J]. arXiv preprint arXiv:2409.03277, 2024.
[73]	BHAISAHEB S, PALIWAL S, PATIL R, et al. Program Synthesis for Complex QA on Charts via Probabilistic Grammar Based Filtered Iterative Back-Translation[C]// Findings of the Association for Computational Linguistics: EACL, 2023: 2501-2515.
[74]	HAN Y, ZHANG C, CHEN X, et al. Chartllama: A multimodal llm for chart understanding and generation[J]. arXiv preprint arXiv:2311.16483, 2023.
[75]	MASRY A, SHAHMOHAMMADI M, PARVEZ M R, et al. ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning[J]. arXiv preprint arXiv:2403.09028, 2024.
[76]	CHENG Z Q, DAI Q, HAUPTMANN A G. Chartreader: A unified framework for chart derendering and comprehension without heuristic rules[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 22202-22213.
[77]	HUANG M, HAN L, ZHANG X, et al. EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding[J]. arXiv preprint arXiv:2409.01577, 2024.
[78]	WEI J, XU N, CHANG G, et al. mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning[J]. arXiv preprint arXiv:2404.01548, 2024.
[79]	KIM W, PARK S, IN Y, et al. SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials[J]. arXiv preprint arXiv:2405.00021, 2024.
[80]	ZHANG L, HU A, XU H, et al. Tinychart: Efficient chart understanding with visual token merging and program-of-thoughts learning[J]. arXiv preprint arXiv:2404.16635, 2024.
[81]	HUANG M, ZHANG L, HAN L, et al. VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning[J]. arXiv preprint arXiv:2409.01667, 2024.

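Table 1

| Dataset | Created | Chart types | Question types | Description |
|---|---|---|---|---|
| FigureQA | 2018 | Line, bar, and pie charts | Structural queries | Data are randomly sampled; charts are drawn with the Bokeh plotting library; questions are generated from templates; answers are limited to Yes/No |
| DVQA | 2018 | Bar charts | Structure understanding, data retrieval, reasoning | Data are randomly sampled; charts are drawn with the Matplotlib plotting library; questions are generated from templates |
| LEAF-QA | 2019 | Bar charts, pie charts, box plots, scatter plots, line charts | Structure understanding, reasoning | Data come from various online statistical tables, from which charts in a variety of styles are generated with Matplotlib; questions are generated from templates |
| PlotQA | 2020 | Bar charts, line charts, scatter plots | Structure understanding, data retrieval, reasoning | Chart data come from real statistical data; questions and answers are based on crowdsourced collection and manually built templates, covering multiple chart types, complex questions, and open-vocabulary answers |
| ChartQA | 2022 | Bar, line, and pie charts | Structure understanding, data retrieval, reasoning | Charts come from real-world scenarios; questions are produced in two ways: human annotation and generation by a T5 model |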

