A Comprehensive Framework for Integrated Text Detection and Recognition in Natural Images Through Unified Deep Learning Approaches

Abstract

Scene text recognition is a long-standing and popular research area in computer vision, with applications in domains such as automated driving, assistive technology, and geographic information systems. It involves identifying and interpreting text in natural scenes, which is challenging because of complex backgrounds and varying fonts, sizes, colors, and orientations. This thesis presents an approach to scene text recognition that combines deep learning with state-of-the-art image processing techniques to improve both the detection and the recognition of text in diverse and dynamic scenes. Existing approaches have several shortcomings, such as suboptimal feature extraction for Devanagari script and difficulty with irregular scenes. In deep learning, integrating two distinct neural networks allows each to compensate for the weaknesses of the other; this work therefore combines different neural networks and develops three models: Hybrid-CNN (Convolutional Neural Network), FNN (Fusion Neural Network), and EDN (Ensemble Deep Network). The first part develops an efficient technique for detecting text in natural scene images. It constructs a CRF model by relating the CNN scores of Maximally Stable Extremal Regions (MSERs) to multiple neighborhood information, and it adopts a YOLO-based object detector together with a CNN-based classification approach. The second part of the thesis develops an optimal FNN for detection and recognition. The FNN combines layers of a Convolutional Neural Network and a Recurrent Neural Network.
In the FNN, the convolutional layers extract features and the recurrent layers predict the feature sequence; an optimal training model is also designed to improve classification accuracy. The FNN is tested on the Devanagari MLT-19 dataset. Three measures are used for evaluation: script word identification, word recognition rate, and character recognition rate. A comparison with current methodologies shows the effectiveness of the proposed model: the FNN reports 98.67% accuracy in script word identification, 84.65% in word recognition rate, and 92.93% in character recognition rate. The third part of the thesis proposes a framework, the Ensemble Deep Network (EDN), which consists of a deep autoencoder and a customized CNN, to handle irregular text. A spatial transformation module is included in the customized CNN design so that irregular text inputs are read at the same size, and the deep autoencoder, which makes use of its own intrinsic features, is provided with an efficient attention mechanism. For both irregular and regular scene text, the proposed EDN-PS approach outperforms current techniques. In further simulations, the proposed model yields better results on the IIIT5K, ICDAR-13, ICDAR-15, and CUTE datasets than the current system.
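
The abstract reports a word recognition rate and a character recognition rate but does not define them. As a rough illustration only, the sketch below assumes the common conventions: a word counts as recognized when the prediction matches the ground truth exactly, and the character recognition rate is one minus the total edit distance divided by the total number of ground-truth characters. The function names are hypothetical, and the thesis may compute these measures differently.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_recognition_rate(preds: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of words predicted exactly (assumed definition)."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)


def character_recognition_rate(preds: Sequence[str], targets: Sequence[str]) -> float:
    """1 - (total edit distance / total ground-truth characters), floored at 0 (assumed definition)."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    total_chars = sum(len(t) for t in targets)
    if total_chars == 0:
        return 0.0
    errors = sum(edit_distance(p, t) for p, t in zip(preds, targets))
    return max(0.0, 1.0 - errors / total_chars)


if __name__ == "__main__":
    # Toy data, not taken from the thesis.
    targets = ["hello", "world"]
    preds = ["hello", "wrld"]
    print(word_recognition_rate(preds, targets))
    print(character_recognition_rate(preds, targets))
```

Note that Python strings are sequences of Unicode code points, so for Devanagari a consonant and its dependent vowel sign are counted as separate characters here; a grapheme-level count would need an additional segmentation step.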
