Frontiers of Data and Computing, 2019, Vol. 1, Issue (2): 17-25. doi: 10.11871/jfdc.issn.2096-742X.2019.02.002

Special Issue: "Artificial Intelligence" Special Issue


Document Image Recognition: Retrospective and Perspective of Technology

Liu Chenglin1,2,3,*

    1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
    2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
    3. CAS Center for Excellence of Brain Science and Intelligence Technology, Beijing 100190, China
  • Received: 2019-11-07 Online: 2019-12-20 Published: 2020-01-15
  • Contact: Liu Chenglin


[Objective] Document images carry important information in the form of text, which is pervasive in daily life. The main objective of document image recognition (also called character recognition or OCR) is to extract text from images and convert it into digital text that computers can process. Since the 1950s, the field of document recognition has seen tremendous advances in research and applications. This paper provides an overview of document image recognition to facilitate research innovation and engineering applications. [Methods] In this article, I first introduce the application needs of document recognition, then review the main research advances in this field, analyze the strengths and weaknesses of the methods, and finally discuss prospects for future development. [Results] Numerous methods for statistical recognition, feature extraction, structural analysis, character segmentation, character string recognition, and layout analysis were proposed from the 1950s to the 2000s. [Conclusions] In recent years, deep learning methods (deep neural networks, DNNs) have dominated the field and have significantly improved the performance of text detection and recognition. However, insufficiencies are still evident in complex layout analysis, character recognition reliability, and generalization.

Key words: document recognition, layout analysis, text detection, deep learning, character recognition, text line recognition