Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (1): 19-37.
CSTR: 32002.14.jfdc.CN10-1649/TP.2025.01.002
doi: 10.11871/jfdc.issn.2096-742X.2025.01.002
MA Qiuping, ZHANG Qi*, ZHAO Xiaofan
Received: 2024-11-10
Online: 2025-02-20
Published: 2025-02-21
Corresponding author: ZHANG Qi
About the first author: MA Qiuping is a master's student at the People's Public Security University of China and a CCF student member; research interests: data analysis and visual question answering. Main contributions to this paper: literature survey and manuscript writing.
Abstract:
[Objective] This paper presents a comprehensive survey of research progress in chart question answering (CQA), analyzes existing models and methods, and discusses future research directions. [Methods] CQA models are first divided into two broad categories: deep-learning-based models and models based on multimodal large models. The deep-learning-based methods are further subdivided into end-to-end models and two-stage models. The paper then analyzes the three core stages of the deep-learning-based CQA pipeline in depth, and gives a detailed classification and analysis of the existing methods for each stage. It also examines CQA models built on multimodal large models, discussing their strengths, limitations, and future directions. [Results] The paper comprehensively summarizes the current state of CQA research and analyzes existing models and methods in depth. It finds that deep-learning-based CQA models perform well on standard chart types and simple tasks, but remain inadequate for complex or non-standard charts and for tasks that require deep reasoning. CQA models based on multimodal large models show great potential, but their performance gains usually come with increases in model size and computational complexity. Future research should focus on developing more lightweight question answering models and on improving model interpretability.
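The two-stage pipeline category described in the abstract (a chart-to-table extraction stage followed by a table question answering stage) can be sketched as follows. This is a minimal illustrative sketch only: both stages are stubbed with hypothetical placeholder logic, not the implementation of any surveyed model.

```python
# Sketch of a two-stage chart QA pipeline: stage 1 turns the chart image
# into a data table, stage 2 answers the question over that table.
# Both functions are hypothetical stand-ins for learned components
# (e.g. a visual chart-to-table extractor and a table QA model).

def extract_table(chart_image):
    """Stage 1: chart-to-table extraction (stubbed with fixed values)."""
    # A real system would run chart element detection + OCR here.
    return {"2019": 12.0, "2020": 15.5, "2021": 9.8}

def answer_question(table, question):
    """Stage 2: table question answering via simple rules (stub)."""
    if question.startswith("max"):
        return max(table, key=table.get)    # label with the largest value
    if question.startswith("value of"):
        return table[question.split()[-1]]  # data-retrieval question
    raise ValueError("unsupported question type")

def chart_qa(chart_image, question):
    return answer_question(extract_table(chart_image), question)

print(chart_qa(None, "max"))            # -> 2020
print(chart_qa(None, "value of 2020"))  # -> 15.5
```

The appeal of the two-stage design, as the survey notes, is that each stage can be developed and evaluated separately; its weakness is that extraction errors in stage 1 propagate to stage 2.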
Cite as: MA Qiuping, ZHANG Qi, ZHAO Xiaofan. Review of Research on Chart Question Answering[J]. Frontiers of Data and Computing, 2025, 7(1): 19-37. https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2025.01.002.
Table 1  Chart question answering datasets

| Dataset | Year | Chart types | Question types | Notes |
|---|---|---|---|---|
| FigureQA | 2018 | line, bar, pie | structural queries | Data are randomly sampled; charts are rendered with the Bokeh plotting library; questions are generated from templates; answers are limited to Yes/No |
| DVQA | 2018 | bar | structural understanding, data retrieval, reasoning | Data are randomly sampled; charts are rendered with Matplotlib; questions are generated from templates |
| LEAF-QA | 2019 | bar, pie, box, scatter, line | structural understanding, reasoning | Data come from various online statistical tables, rendered in multiple styles with Matplotlib; questions are generated from templates |
| PlotQA | 2020 | bar, line, scatter | structural understanding, data retrieval, reasoning | Chart data come from real statistics; questions and answers are collected by crowdsourcing and generated from manual templates, covering multiple chart types, complex questions, and open-vocabulary answers |
| ChartQA | 2022 | bar, line, pie | structural understanding, data retrieval, reasoning | Charts come from real-world sources; questions are produced both by human annotation and by a T5 model |
Table 2  Evaluation results of chart question answering models on the datasets (DVQA columns report No OCR/Oracle accuracy; inline citation numbers were lost in extraction and are omitted)

| Category | Model | FigureQA Val1 | FigureQA Val2 | DVQA Familiar | DVQA Novel | PlotQA Test1 | PlotQA Test2 | PlotQA Avg | ChartQA Human | ChartQA Machine | ChartQA Avg | OpenCQA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Deep-learning-based CQA | RN | 76.39 | 72.54 | - | - | - | - | - | - | - | - | - |
| | MOM | - | - | 45.03 | 40.9 | - | - | - | - | - | - | - |
| | ARN | 85.48 | 82.95 | 44.5/79.43 | 44.51/79.58 | - | - | - | - | - | - | - |
| | SANDY | - | - | 36.02/56.48 | 36.14/56.62 | - | - | - | - | - | - | - |
| | LEAF-Net | - | 81.15 | -/72.72 | -/72.89 | - | - | - | - | - | - | - |
| | PReFIL | 94.84 | 93.26 | 47.7/96.37 | 47.86/96.53 | - | - | - | - | - | - | - |
| | PlotQA-M | - | - | 57.99 | 59.54 | - | - | 22.52 | - | - | 36.15 | - |
| | CRCT | 94.61 | 85.04 | - | - | 79.64 | 34.44 | - | - | - | - | - |
| | VL-T5 | 88.6 | 88.49 | -/94.8 | -/77.04 | 75.9 | 56.02 | - | 26.24 | 56.88 | 41.56 | - |
| | VisionTaPas | 91.46 | 91.45 | -/95.38 | -/95.46 | 65.3 | 42.5 | - | 29.6 | 61.44 | 45.52 | - |
| | ChartT5 | - | - | - | - | - | - | - | 31.8 | 74.4 | 53.16 | - |
| | GoT-CQA | - | - | - | - | 92.8 | 78.3 | - | 47.1 | 87.9 | 67.5 | - |
| | LWR-RN | 85.91 | 83.43 | 44.76/79.83 | 44.49/79.96 | - | - | - | - | - | - | - |
| | FIBT | - | - | - | - | - | 60.44 | - | - | - | - | - |
| | STL-CQA | - | - | -/97.35 | -/97.51 | - | - | - | - | - | - | - |
| Large-model-based CQA | QDCHART | - | - | - | - | - | - | - | 34.9 | 79.4 | 57.2 | - |
| | Matcha | - | - | - | - | 92.3 | 90.7 | - | 38.2 | 90.2 | 64.2 | - |
| | Unichart | - | - | - | - | - | - | - | 43.92 | 88.56 | 66.24 | 14.88 |
| | CCM+GPT-3.5+reph PoT SC | - | - | - | - | 62.0 | 71.4 | 66.7 | 67.6 | 91.4 | 79.5 | - |
| | ChartLLaMa | - | - | - | - | - | - | - | 48.96 | 90.36 | 69.66 | - |
| | ChartAst-D | - | - | - | - | - | - | - | 45.3 | 91.3 | 68.3 | 14.9 |
| | ChartAst-S | - | - | - | - | - | - | - | 65.9 | 93.9 | 75.1 | 15.5 |
| | ChartGemma | - | - | - | - | - | - | - | 64.8 | 89.44 | 77.12 | - |
| | ChartInstruct-Flan-T5-XL | - | - | - | - | - | - | - | 50.16 | 93.84 | 72 | 14.81 |
| | ChartMoE | - | - | - | - | - | - | - | 71.36 | 91.04 | 81.2 | - |
| | ChartReader-TaPas | 91.12 | 91.4 | -/92.2 | -/94.3 | 74.2 | 56.2 | - | - | - | 48.3 | - |
| | EvoChart-4B | - | - | - | - | - | - | - | - | - | 81.5 | - |
| | mChartQA(Intern-LM2) | 96.06 | 96.3 | - | - | 78.25 | 74.79 | - | 68.24 | 89.76 | 79 | - |
| | mChartQA(Qwen) | 90.32 | 92.75 | - | - | 78 | 62.95 | - | 58.56 | 93.44 | 76 | - |
| | ChartReader-T5 | 95.5 | 95.8 | -/95.4 | -/96.5 | 78.1 | 59.3 | - | - | - | 49.5 | - |
| | SIMPLOT | - | - | - | - | - | - | - | 78.07 | 88.42 | 83.24 | - |
| | SynChart | - | - | - | - | - | - | - | 74.24 | 94.96 | 84.6 | - |
| | TinyChart@768 | - | - | - | - | - | - | - | 73.34 | 93.86 | 83.6 | - |
| | ChartPaLI-5B(Conv) | 96.3 | 96.2 | -/86 | -/70.7 | - | - | - | 60.88 | 93.68 | 77.28 | - |
| | VPAgent | - | - | -/98.5 | -/97.1 | 92.6 | 78.1 | - | 48.1 | 87.4 | 67.8 | - |
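For most rows of Table 2, the ChartQA "Avg" column is simply the unweighted mean of the Human and Machine split accuracies (a few rows deviate slightly, presumably from rounding or different weighting in the original papers). A quick sanity check on three rows copied from the table:

```python
# Check that ChartQA "Avg" equals the mean of the Human and Machine
# split accuracies for a few (human, machine, avg) triples from Table 2.
rows = {
    "Unichart":   (43.92, 88.56, 66.24),
    "ChartLLaMa": (48.96, 90.36, 69.66),
    "ChartGemma": (64.80, 89.44, 77.12),
}
for name, (human, machine, avg) in rows.items():
    assert abs((human + machine) / 2 - avg) < 0.005, name
print("Avg = (Human + Machine) / 2 holds for the sampled rows")
```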
[1] | AGRAWAL A, LU J, ANTOL S, et al. VQA: Visual Question Answering[J]. International Journal of Computer Vision, 2017, 123(1): 4-31. |
[2] | KIM J, SRINIVASAN A, KIM N W, et al. Exploring Chart Question Answering for Blind and Low Vision Users[C]// Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany: ACM, 2023: 1-15. |
[3] | BADAM S K, LIU Z, ELMQVIST N. Elastic Documents: Coupling Text and Tables through Contextual Visualizations for Enhanced Document Reading[J]. IEEE transactions on visualization and computer graphics, 2019, 25(1): 661-671. |
[4] | SETLUR V, TORY M, DJALALI A. Inferencing underspecified natural language utterances in visual analysis[C]// Proceedings of the 24th International Conference on Intelligent User Interfaces, Marina del Ray California: ACM, 2019: 40-51. |
[5] | KIM D H, HOQUE E, KIM J, et al. Facilitating document reading by linking text and tables[C]// Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology, Berlin Germany: ACM, 2018: 423-434. |
[6] | MASRY A, LONG D X, TAN J Q, et al. Chartqa: A benchmark for question answering about charts with visual and logical reasoning[J]. arXiv preprint arXiv:2203.10244, 2022. |
[7] | LUO J, LI Z, WANG J, et al. Chartocr: Data extraction from charts images via a deep hybrid framework[C]// Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2021: 1917-1925. |
[8] | RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer[J]. Journal of Machine Learning Research, 2020, 21(140): 1-67. |
[9] | HERZIG J, NOWAK P K, MÜLLER T, et al. TAPAS: Weakly Supervised Table Parsing via Pre-training[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020: 4320-4333. |
[10] | LIU Y C, CHU W T. Chart Question Answering based on Modality Conversion and Large Language Models[C]// Proceedings of the 1st ACM Workshop on AI-Powered Q&A Systems for Multimedia, 2024: 19-24. |
[11] | LIU Z, LIN Y, CAO Y, et al. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021: 10012-10022. |
[12] | LEWIS M, LIU Y, GOYAL N, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension[C/OL]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics, 2020: 7871-7880. https://www.aclweb.org/anthology/2020.acl-main.703. |
[13] | KAHOU S E, MICHALSKI V, ATKINSON A, et al. Figureqa: An annotated figure dataset for visual reasoning[J]. arXiv preprint arXiv:1710.07300, 2017. |
[14] | KAFLE K, PRICE B, COHEN S, et al. Dvqa: Understanding data visualizations via question answering[C]// Proceedings of the IEEE conference on computer vision and pattern recognition, 2018: 5648-5656. |
[15] | CHAUDHRY R, SHEKHAR S, GUPTA U, et al. Leaf-qa: Locate, encode & attend for figure question answering[C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020: 3512-3521. |
[16] | ZOU J, WU G, XUE T, et al. An affinity-driven relation network for figure question answering[C]// 2020 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2020: 1-6. |
[17] | ZHENG H, WANG S, THOMAS C, et al. Advancing Chart Question Answering with Robust Chart Component Recognition[J]. arXiv preprint arXiv:2407.21038, 2024. |
[18] | LIU F, PICCINNO F, KRICHENE S, et al. MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering[J]. arXiv preprint arXiv:2212.09662, 2022. |
[19] | MASRY A, KAVEHZADEH P, DO X L, et al. Unichart: A universal vision-language pretrained model for chart comprehension and reasoning[J]. arXiv preprint arXiv:2305.14761, 2023. |
[20] | WU Q, TENEY D, WANG P, et al. Visual question answering: A survey of methods and datasets[J]. Computer Vision and Image Understanding, 2017, 163: 21-40. |
[21] | WANG Yu, SUN Haichun. A survey of visual question answering techniques[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(7): 1487-1505. DOI: 10.3778/j.issn.1673-9418.2303025. |
[22] | ZHANG Yifei, MENG Chunyun, JIANG Zhou, et al. Research progress of explainable visual question answering[J]. Application Research of Computers, 2024, 41(1): 10-20. |
[23] | WU A, WANG Y, SHU X, et al. Ai4vis: Survey on artificial intelligence approaches for data visualization[J]. IEEE Transactions on Visualization and Computer Graphics, 2021, 28(12): 5049-5070. |
[24] | HOQUE E, KAVEHZADEH P, MASRY A. Chart question answering: State of the art and future directions[C]// Computer Graphics Forum, 2022, 41(3): 555-572. |
[25] | METHANI N, GANGULY P, KHAPRA M M, et al. Plotqa: Reasoning over scientific plots[C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020: 1527-1536. |
[26] | YANG Z, HE X, GAO J, et al. Stacked attention networks for image question answering[C]// Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 21-29. |
[27] | KIM J H, JUN J, ZHANG B T. Bilinear attention networks[C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018: 1571-1581. |
[28] | SINGH A, NATARAJAN V, SHAH M, et al. Towards vqa models that can read[C]// Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019: 8317-8326. |
[29] | LI Ying, WU Qingfeng, LIU Jiatong, et al. Guidance-weight-driven re-localization relation network for chart question answering[J]. Journal of Image and Graphics, 2023, 28(2): 510-521. |
[30] | RONNEBERGER O, FISCHER P, BROX T. U-net: Convolutional networks for biomedical image segmentation[C]// Medical image computing and computer-assisted intervention-MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer International Publishing, 2015: 234-241. |
[31] | SANTORO A, RAPOSO D, BARRETT D G T, et al. A simple neural network module for relational reasoning[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems, Red Hook, NY, USA: Curran Associates Inc., 2017: 4974-4983. |
[32] | MASRY A, PRINCE E H. Integrating image data extraction and table parsing methods for chart question answering[C]// Chart Question Answering Workshop, in conjunction with the Conference on Computer Vision and Pattern Recognition (CVPR), 2021: 1-5. |
[33] | REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE transactions on pattern analysis and machine intelligence, 2016, 39(6): 1137-1149. |
[34] | SMITH R. An overview of the Tesseract OCR engine[C]// Ninth international conference on document analysis and recognition (ICDAR 2007), IEEE, 2007, 2: 629-633. |
[35] | PASUPAT P, LIANG P. Compositional semantic parsing on semi-structured tables[J]. arXiv preprint arXiv:1508.00305, 2015. http://aclweb.org/anthology/P15-1142. |
[36] | JAIN H, JAYARAMAN S, SOORYANATH I T, et al. TapasQA-Question Answering on Statistical Plots Using Google TAPAS[C]// International Conference on Image Processing and Capsule Networks, Cham: Springer International Publishing, 2022: 63-77. |
[37] | KIM D H, HOQUE E, AGRAWALA M. Answering questions about charts and generating visual explanations[C]// Proceedings of the 2020 CHI conference on human factors in computing systems, 2020: 1-13. |
[38] | SATYANARAYAN A, MORITZ D, WONGSUPHASAWAT K, et al. Vega-lite: A grammar of interactive graphics[J]. IEEE transactions on visualization and computer graphics, 2016, 23(1): 341-350. |
[39] | MAZRAEH FARAHANI A, ADIBI P, EHSANI M S, et al. Chart Question Answering with Multimodal Graph Representation Learning[A/OL]. Rochester, NY, 2023[2024-10-05]. https://www.ssrn.com/abstract=4655474. DOI:10.2139/ssrn.4655474. |
[40] | HE K, GKIOXARI G, DOLLÁR P, et al. Mask r-cnn[C]// Proceedings of the IEEE international conference on computer vision, 2017: 2961-2969. |
[41] | HOWARD A G, ZHU M, CHEN B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017. |
[42] | DEVLIN J, CHANG M W, LEE K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of naacL-HLT. 2019, 1: 2. |
[43] | KAFLE K, SHRESTHA R, PRICE B, et al. Answering questions about data visualizations using efficient bimodal fusion[C]// Proceedings of the IEEE/CVF Winter conference on applications of computer vision, 2020: 1498-1507. |
[44] | HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// Proceedings of the IEEE conference on computer vision and pattern recognition, 2017: 4700-4708. |
[45] | ZHOU M, FUNG Y R, CHEN L, et al. Enhanced chart understanding in vision and language task via cross-modal pre-training on plot table pairs[J]. arXiv preprint arXiv:2305.18641, 2023. |
[46] | SINGH H, SHEKHAR S. STL-CQA: Structure-based transformers with localization and encoding for chart question answering[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020: 3275-3284. |
[47] | GUPTA A, GUPTA V, ZHANG S, et al. Enhancing Question Answering on Charts Through Effective Pre-training Tasks[J]. arXiv preprint arXiv:2406.10085, 2024. |
[48] | HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. DOI: 10.1162/neco.1997.9.8.1735. |
[49] | LIPTON Z C, BERKOWITZ J, ELKAN C. A Critical Review of Recurrent Neural Networks for Sequence Learning[J/OL]. arXiv Preprint, CoRR, abs/1506.00019, 2015: 1-1. http://arxiv.org/abs/1506.00019. |
[50] | LEVY M, BEN-ARI R, LISCHINSKI D. Classification-regression for chart comprehension[C]// European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 469-484. |
[51] | VASWANI A, SHAZEER N, PARMAR N. Attention is all you need[C]. NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Sys-tems, Red Hook, NY, USA: Curran Associates Inc., 2017: 6000-6010. |
[52] | LIU Y, GU J, GOYAL N, et al. Multilingual denoising pre-training for neural machine translation[J/OL]. Transactions of the Association for Computational Linguistics, 2020, 8: 726-742. https://api.semanticscholar.org/CorpusID:210861178. |
[53] | MANNING C, SURDEANU M, BAUER J, et al. The Stanford CoreNLP natural language processing toolkit[C]// Proceedings of 52nd annual meeting of the association for computational linguistics:system demonstrations, 2014: 55-60. |
[54] | ZHANG Y, PASUPAT P, LIANG P. Macro grammars and holistic triggering for efficient semantic parsing[C]. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: Association for Computational Linguistics, 2017: 1214-1223. |
[55] | LIU C, YU J, GUO Y, et al. Breathing New Life into Existing Visualizations: A Natural Language-Driven Manipulation Framework[J]. arXiv preprint arXiv:2404.06039, 2024. |
[56] | ZHANG L, HUANG M, WANG Q, et al. GoT-CQA: Graph-of-Thought Guided Compositional Reasoning for Chart Question Answering[J]. arXiv preprint arXiv:2409.02611, 2024. |
[57] | REDDY R, RAMESH R, DESHPANDE A, et al. Figurenet: A deep learning model for question-answering on scientific plots[C]// 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, 2019: 1-8. |
[58] | CORTES C, VAPNIK V. Support-Vector Networks[J]. Machine Learning, 1995, 20(3): 273-297. |
[59] | SHARMA M, GUPTA S, CHOWDHURY A, et al. Chartnet: Visual reasoning over statistical charts using mac-networks[C]// 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, 2019: 1-7. |
[60] | FENG J. Chart Understanding with Large Language Model[J/OL]. Engineering Archive, 2023[2024-10-05]. https://engrxiv.org/preprint/view/3401/version/4747. DOI: 10.31224/3401. |
[61] | RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// International conference on machine learning, PMLR, 2021: 8748-8763. |
[62] | CHIANG W L, LI Z, LIN Z, et al. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality[J]. See https://vicuna.lmsys.org (accessed 14 April 2023), 2023, 2(3): 6. |
[63] | MASRY A, THAKKAR M, BAJAJ A, et al. Chartgemma: Visual instruction-tuning for chart reasoning in the wild[J]. arXiv preprint arXiv:2407.04172, 2024. |
[64] | CHEN X, WANG X, BEYER L, et al. Pali-3 vision language models: Smaller, faster, stronger[J]. arXiv preprint arXiv:2310.09199, 2023. |
[65] | FORD J, ZHAO X, SCHUMACHER D, et al. Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations[J]. arXiv preprint arXiv:2409.18764, 2024. |
[66] | CARBUNE V, MANSOOR H, LIU F, et al. Chart-based reasoning: Transferring capabilities from llms to vlms[C]// Findings of the Association for Computational Linguistics: NAACL 2024, 2024: 989-1004. |
[67] | WU Y, YAN L, SHEN L, et al. ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering[C]// Findings of the Association for Computational Linguistics: EMNLP 2024, 2024: 12174-12200. |
[68] | MENG F, SHAO W, LU Q, et al. Chartassisstant: A universal chart multimodal language model via chart-to-table pre-training and multitask instruction tuning[J]. arXiv preprint arXiv:2401.02384, 2024. |
[69] | LIU M, LI Q, CHEN D, et al. SynChart: Synthesizing Charts from Language Models[J]. arXiv preprint arXiv:2409.16517, 2024. |
[70] | LI Z, JASANI B, TANG P, et al. Synthesize Step-by-Step: Tools Templates and LLMs as Data Generators for Reasoning-Based Chart VQA[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 13613-13623. |
[71] | ZENG X, LIN H, YE Y, et al. Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning[J]. IEEE Transactions on Visualization and Computer Graphics, 2024 (1): 1-11. |
[72] | XU Z, QU B, QI Y, et al. ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding[J]. arXiv preprint arXiv:2409.03277, 2024. |
[73] | BHAISAHEB S, PALIWAL S, PATIL R, et al. Program Synthesis for Complex QA on Charts via Probabilistic Grammar Based Filtered Iterative Back-Translation[C]// Findings of the Association for Computational Linguistics: EACL, 2023: 2501-2515. |
[74] | HAN Y, ZHANG C, CHEN X, et al. Chartllama: A multimodal llm for chart understanding and generation[J]. arXiv preprint arXiv:2311.16483, 2023. |
[75] | MASRY A, SHAHMOHAMMADI M, PARVEZ M R, et al. ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning[J]. arXiv preprint arXiv:2403.09028, 2024. |
[76] | CHENG Z Q, DAI Q, HAUPTMANN A G. Chartreader: A unified framework for chart derendering and comprehension without heuristic rules[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 22202-22213. |
[77] | HUANG M, HAN L, ZHANG X, et al. EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding[J]. arXiv preprint arXiv:2409.01577, 2024. |
[78] | WEI J, XU N, CHANG G, et al. mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning[J]. arXiv preprint arXiv:2404.01548, 2024. |
[79] | KIM W, PARK S, IN Y, et al. SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials[J]. arXiv preprint arXiv:2405.00021, 2024. |
[80] | ZHANG L, HU A, XU H, et al. Tinychart: Efficient chart understanding with visual token merging and program-of-thoughts learning[J]. arXiv preprint arXiv:2404.16635, 2024. |
[81] | HUANG M, ZHANG L, HAN L, et al. VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning[J]. arXiv preprint arXiv:2409.01667, 2024. |