Deep Learning Techniques for Detecting and Segmenting Text in Natural Scene Images: Review

Authors

  • Alaa Hussein, Computer Science Department, College of Science, Al-Nahrain University, Baghdad, Iraq
  • Mohammed Sahib Mahdi Altaei, Computer Science Department, College of Science, Al-Nahrain University, Baghdad, Iraq

DOI:

https://doi.org/10.22401/4qpt9s13

Keywords:

Image processing, Text detection, Text recognition, OCR

Abstract

Text detection and segmentation in natural scene images is an active research problem in computer vision and document analysis. Unlike scanned documents, scene text varies widely in appearance, orientation, scale, font, and lighting conditions. This review surveys the current state of the art in techniques and methodologies for detecting and segmenting text regions in images of natural scenes. Both traditional approaches based on hand-crafted features and modern data-driven deep learning methods are discussed. The review analyzes the common datasets, evaluation protocols, and metrics used for benchmarking. Limitations of existing methods and open challenges in handling multilingual text, curved text, and efficiency are highlighted, and promising future directions toward robust and generalizable scene text extraction systems are identified. In summary, the review provides a comprehensive overview of the advances, remaining challenges, and future opportunities in developing automated systems for detecting and segmenting text in unconstrained natural images.

Published

2024-06-15

Section

Articles

How to Cite

Deep Learning Techniques for Detecting and Segmenting Text in Natural Scene Images: Review. ANJS 2024, 27 (2), 133-144. https://doi.org/10.22401/4qpt9s13.