Document Image Recognition: Retrospective and Perspective of Technology

doi:10.11871/jfdc.issn.2096-742X.2019.02.002

Frontiers of Data and Computing, 2019, Vol. 1, Issue (2): 17-25.

Special issue: "Artificial Intelligence"

Liu Chenglin^1,^2,^3,^*

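1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
3. CAS Center for Excellence of Brain Science and Intelligence Technology, Beijing 100190, China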

Received: 2019-11-07 Online: 2019-12-20 Published: 2020-01-15
Corresponding author: Liu Chenglin
Author biography: Liu Chenglin, born in 1967, is a Professor and doctoral supervisor at the Institute of Automation, Chinese Academy of Sciences, where he is deputy director of the institute and director of the National Laboratory of Pattern Recognition (NLPR). He received the B.S. degree from the Department of Radio and Information Engineering, Wuhan University, in 1989, the M.E. degree in circuits and systems from Beijing Polytechnic University in 1992, and the Ph.D. degree in pattern recognition and intelligent control from the Institute of Automation, Chinese Academy of Sciences, in 1995. He was a postdoctoral fellow at the Korea Advanced Institute of Science and Technology (KAIST) from March 1996 to October 1997, and at Tokyo University of Agriculture and Technology from November 1997 to March 1999. From March 1999 to December 2004, he was a researcher and later a senior researcher at the Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan. In 2005 he was selected for the "Hundred Talents Program" of the Chinese Academy of Sciences and won the IAPR/ICDAR Young Investigator Award. In 2008 he received support from the National Science Fund for Distinguished Young Scholars. His research interests include image processing, pattern recognition, neural networks, and machine learning, especially their applications to character recognition and document analysis. He has published over 300 papers in journals and conferences and co-authored one English monograph. He is an associate editor-in-chief of Pattern Recognition, an editorial board member of Image and Vision Computing, International Journal on Document Analysis and Recognition, and Cognitive Computation, and an associate editor-in-chief of the Chinese journal Acta Automatica Sinica. He is a Fellow of the IEEE and the IAPR.
Funding: National Natural Science Foundation of China (61721004)

Abstract

[Objective] Document images are a widespread class of data with important application value. The main objective of document image recognition (also called character recognition or OCR) is to detect text in images and convert it into machine-encoded (digital) text. Since the 1950s, the field has seen tremendous advances in research and applications. This paper gives researchers and engineers a fairly comprehensive overview of document image recognition to support technical innovation and application. [Methods] After introducing the application background of document recognition, the paper reviews the main historical methods in the field, analyzes the current state of the technology and research trends, including the strengths and weaknesses of the methods, and outlines future directions. [Results] From the 1950s to the 2000s, many effective methods were developed for statistical pattern recognition, feature extraction, structural analysis, character segmentation, character string recognition, and layout analysis. [Conclusions] In recent years, deep learning (deep neural networks, DNNs) has gradually become the dominant approach and has significantly improved the performance of text detection and recognition. However, shortcomings remain in complex layout analysis and in the reliability and generalization of character recognition.

Key words: document recognition, layout analysis, text detection, deep learning, character recognition, text line recognition

Liu Chenglin. Document Image Recognition: Retrospective and Perspective of Technology[J]. Frontiers of Data and Computing, 2019, 1(2): 17-25.

Figures and tables (5): Table 1, Figure 1, Figure 2, Table 2, Table 3
References (33)

[1] H. Fujisawa. Forty years of research in character and document recognition - an industrial perspective[J]. Pattern Recognition, 2008, 41(8): 2435-2446.
[2] G. Meng, C. Pan, S. Xiang, J. Duan, N. Zheng. Metric rectification of curved document images[J]. IEEE Trans. Pattern Analysis and Machine Intelligence, 2012, 34(4): 707-722.
[3] F. Shafait, D. Keysers, T. Breuel. Performance evaluation and benchmarking of six page segmentation algorithms[J]. IEEE Trans. Pattern Analysis and Machine Intelligence, 2008, 30(6): 941-954.
[4] S. Eskenazi, P. Gomez-Kramer, J.-M. Ogier. A comprehensive survey of mostly textual document segmentation algorithms since 2008[J]. Pattern Recognition, 2017, 64: 1-14.
[5] G. Nagy, S. Seth, M. Viswanathan. A prototype document image analysis system for technical journals[J]. Computer, 1992, 25(7): 10-22.
[6] L. O’Gorman. The document spectrum for page layout analysis[J]. IEEE Trans. Pattern Analysis and Machine Intelligence, 1993, 15(11): 1162-1173.
[7] K. Kise, A. Sato, M. Iwata. Segmentation of page images using the area Voronoi diagram[J]. Computer Vision and Image Understanding, 1998, 70(3): 370-382.
[8] X.-H. Li, F. Yin, C.-L. Liu. Printed/handwritten texts and graphics separation in complex documents using conditional random fields[C]. Proc. 13th IAPR Int. Workshop on Document Analysis Systems, Vienna, Austria, April 24-27, 2018, pp. 145-150.
[9] X.-H. Li, F. Yin, T. Xue, L. Liu, J.-M. Ogier, C.-L. Liu. Instance aware document image segmentation using label pyramid networks and deep watershed transformation[C]. Proc. 15th ICDAR, Sydney, Australia, September 20-25, 2019, pp. 514-519.
[10] Q. Ye, D. Doermann. Text detection and recognition in imagery: A survey[J]. IEEE Trans. Pattern Analysis and Machine Intelligence, 2015, 37(7): 1480-1500.
[11] B. Shi, X. Bai, S. Belongie. Detecting oriented text in natural images by linking segments[C]. CVPR 2017, pp. 2550-2558.
[12] W. He, X.-Y. Zhang, F. Yin, C.-L. Liu. Multi-oriented and multi-lingual scene text detection with direct regression[J]. IEEE Trans. Image Processing, 2018, 27(11): 5406-5419.
[13] S. Long, J. Ruan, W. Zhang, X. He, W. Wu, C. Yao. TextSnake: A flexible representation for detecting text of arbitrary shapes[C]. ECCV 2018, LNCS 11206, 2018, pp. 19-35.
[14] X. Wang, Y. Jiang, Z. Luo, C.-L. Liu, H. Choi, S. Kim. Arbitrary shape scene text detection with adaptive text region representation[C]. CVPR 2019, Long Beach, CA, June 16-20, 2019.
[15] C.-L. Liu, F. Yin, D.-H. Wang, Q.-F. Wang. Online and offline handwritten Chinese character recognition: Benchmarking on new databases[J]. Pattern Recognition, 2013, 46(1): 155-162.
[16] L. Jin, Z. Zhong, Z. Yang, W. Yang, Z. Xie, J. Sun. Applications of deep learning for handwritten Chinese character recognition: A review[J]. Acta Automatica Sinica, 2016, 42(8): 1125-1141. (in Chinese)
[17] F. Yin, Q.-F. Wang, X.-Y. Zhang, C.-L. Liu. ICDAR 2013 Chinese handwriting recognition competition[C]. Proc. 12th ICDAR, Washington D.C., 2013, pp. 1095-1101.
[18] X.-Y. Zhang, Y. Bengio, C.-L. Liu. New benchmark for online and offline handwritten Chinese character recognition with deep convolutional network and adaptation[J]. Pattern Recognition, 2017, 61: 348-360.
[19] H. Murase. Online recognition of free-format Japanese handwritings[C]. Proc. 9th ICPR, Rome, Italy, 1988, pp. 1143-1147.
[20] Q.-F. Wang, F. Yin, C.-L. Liu. Handwritten Chinese text recognition by integrating multiple contexts[J]. IEEE Trans. Pattern Analysis and Machine Intelligence, 2012, 34(8): 1469-1481.
[21] Y.-C. Wu, F. Yin, C.-L. Liu. Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models[J]. Pattern Recognition, 2017, 65: 251-264.
[22] H. Bunke, M. Roth, E.G. Schukat-Talamazzini. Off-line cursive handwriting recognition using hidden Markov models[J]. Pattern Recognition, 1995, 28(9): 1399-1414.
[23] W. Cho, S.-W. Lee, J.H. Kim. Modeling and recognition of cursive words with hidden Markov models[J]. Pattern Recognition, 1995, 28(12): 1941-1954.
[24] A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, J. Schmidhuber. A novel connectionist system for unconstrained handwriting recognition[J]. IEEE Trans. Pattern Analysis and Machine Intelligence, 2009, 31(5): 855-868.
[25] B. Shi, X. Bai, C. Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition[J]. IEEE Trans. Pattern Analysis and Machine Intelligence, 2017, 39(11): 2298-2304.
[26] Y.-C. Wu, X.-Y. Zhang, C.-L. Liu. Scene text recognition with sliding convolutional character models. arXiv:1709.01727, 2017.
[27] Z. Xie, Z. Sun, L. Jin, H. Ni, T. Lyons. Learning spatial-semantic context with fully convolutional recurrent network for online handwritten Chinese text recognition[J]. IEEE Trans. Pattern Analysis and Machine Intelligence, 2018, 40(8): 1903-1917.
[28] U.-V. Marti, H. Bunke. The IAM-database: an English sentence database for offline handwriting recognition[J]. Int. J. Document Analysis and Recognition, 2002, 5(1): 39-46.
[29] E. Grosicki, H. El-Abed. ICDAR 2011 - French handwriting recognition competition[C]. Proc. 11th ICDAR, Beijing, China, 2011, pp. 1460-1464.
[30] K. Dutta, P. Krishnan, M. Mathew, C.V. Jawahar. Improving CNN-RNN hybrid networks for handwriting recognition[C]. Proc. 16th ICFHR, Niagara Falls, NY, 2018.
[31] C.-L. Liu, F. Yin, D.-H. Wang, Q.-F. Wang. CASIA online and offline Chinese handwriting databases[C]. Proc. 11th ICDAR, Beijing, China, 2011, pp. 37-41.
[32] X.-Y. Zhang, F. Yin, Y.-M. Zhang, C.-L. Liu, Y. Bengio. Drawing and recognizing Chinese characters with recurrent neural network[J]. IEEE Trans. Pattern Analysis and Machine Intelligence, 2018, 40(4): 849-862.
[33] X. Xiao, L. Jin, Y. Yang, W. Yang, J. Sun, T. Chang. Building fast and compact convolutional neural networks for offline handwritten Chinese character recognition[J]. Pattern Recognition, 2017, 72: 72-81.

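Table 1

Period	Main methods	Recognition targets	Related events
1920s	Optical template matching	Printed digits and letters	First OCR patent
1950s-1960s	Correlation matching, simple structural analysis	Printed digits and letters; printed Chinese character recognition (1966)	First "pattern recognition" workshop (1966)
1970s-1980s	Feature matching, shape normalization, direction feature extraction, structural matching, statistical pattern recognition	Handwritten digits and letters; printed or handwritten English word recognition; handwritten Japanese and Chinese character recognition	First International Conference on Pattern Recognition (ICPR) in 1972; International Association for Pattern Recognition (IAPR) formally founded in 1978
1990s	Neural networks; research on a range of document analysis techniques, including layout analysis, character segmentation, and character string recognition	Rapid spread of applications (document digitization, mail sorting, bill and form processing, business card recognition, online handwriting input, etc.)	Spread of PCs and growth of the Internet; first International Workshop on Frontiers in Handwriting Recognition (IWFHR) in 1990; first International Conference on Document Analysis and Recognition (ICDAR) in 1991; first International Workshop on Document Analysis Systems (DAS) in 1994
2000s	Hidden Markov models (HMM), recurrent neural networks (RNN), deep learning	Handwritten text recognition, camera-captured document recognition, historical documents, online handwritten documents mixing text and graphics, natural scene text	Web search, big data, smartphones, social networks (Weibo, WeChat, etc.)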

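Table 2

Published in	Authors	Method	Recognition rate (%)
ICDAR 2013	Fujitsu Laboratories	CNN	94.77
ICDAR 2013	IDSIA, Switzerland	CNN	94.42
ICDAR 2013	Harbin Institute of Technology	Feature extraction + classification	92.62
ICFHR 2014	Fujitsu Laboratories	CNN	95.04; 96.06 (multi-model ensemble)
PR 2017 ^[18]	Institute of Automation, CAS	Direction features + CNN	96.95; 97.12 (multi-model ensemble)
PR 2017 ^[33]	South China University of Technology	CNN	97.30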

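Table 3

Published in	Authors	Method	Character correct rate (%)
ICDAR 2013	Harbin Institute of Technology	Over-segmentation + character classification	88.76
ICDAR 2013	Institute of Automation, CAS	Over-segmentation + character classification	90.22
ICFHR 2016	University of Science and Technology of China	HMM	93.27; 94.86 (writer adaptation)
ICFHR 2016	Fujitsu Laboratories	Over-segmentation + CNN classification	95.53
PR 2017 ^[21]	Institute of Automation, CAS	Over-segmentation + CNN classification	96.32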
