Image and Video Text Recognition System

Files

80_recommendation.pdf (1.16 MB)

abstract.pdf (432.91 KB)

chapter 1.pdf (1.27 MB)

chapter 2.pdf (724.76 KB)

chapter 3.pdf (1.35 MB)

Abstract

Text conveys a great deal of information through tags, signs, logos, labels, billboards, and markers, and has long been an integral part of human life. Because text embedded in natural scene images and videos carries such information, it has received increasing research attention in computer vision. Furthermore, with the development of digital technology, Text Detection and Recognition (TDR) in images and video has become more popular for real-time applications such as robot navigation, assisting blind people travelling on roads, monitoring vehicle license plates, and security purposes. Text in natural scenes has arbitrary orientations, varied font sizes, and varied aspect ratios, all of which are challenging to address. Video text scenes present further difficulties: lighting changes and effects, motion blur, and occlusion. Because of the importance of TDR from images and video frames, several researchers are working on effective text recognition systems for videos and images. The proposed system therefore introduces several efficient text detection and recognition methods. First, different pre-processing techniques are studied, and the results show that the Radon Transform (R.T.) performs well compared with other filtering techniques, improving the F-Score by 4.68%. However, this traditional method is inefficient for vertical and distant text and is sensitive to blurred text. Hence, the second method uses a neural network approach: deep learning localizes text better than the previous method. Localizing text in natural scene images nevertheless remains challenging. The proposed research provides a comprehensive solution for text localization using Transfer Learning (T.L.) with a Deep Convolutional Neural Network (DCNN), an improved version of the first objective that achieves a reasonably good F-Score of 82.79%. As part of the third objective, we designed the DEFUSE (Deep Fused) model to identify and recognize text in video data. DEFUSE fuses the DEASTD (Deep Efficient and Accurate Scene Text Detector) and KOCR (Keras Optical Character Recognition) models to locate and recognize text in image and video frames, and is powered by a neural network. The model handles the high complexity of challenging text and the difficulties of data dynamicity, where the video screen changes from one location to another. It improved accuracy by 2.85% and 10.55% on different datasets. To work with images and video frames together, the YOLOv5x model is used for text detection and Tesseract OCR for text recognition. The proposed work also concentrates on capturing text in challenging real-time videos and obtains good results on different datasets.
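
The abstract reports its detection results as F-Scores. As a reminder of how that metric is usually computed, here is a minimal sketch assuming the standard definition (the harmonic mean of precision and recall over matched text regions). The abstract does not state the thesis's exact matching or evaluation protocol, so the function name `f_score` and the counts in the usage line are hypothetical and purely illustrative.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F-Score (harmonic mean of precision and recall).

    Assumes the standard definition; the counts would come from matching
    detected text regions against ground-truth regions.
    """
    detected = true_positives + false_positives   # all regions the detector reported
    actual = true_positives + false_negatives     # all ground-truth text regions
    if detected == 0 or actual == 0:
        return 0.0  # no detections or no ground truth: define the score as 0

    precision = true_positives / detected  # share of detections that are correct
    recall = true_positives / actual       # share of ground truth that was found
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the thesis.
print(f"{f_score(true_positives=8, false_positives=2, false_negatives=2):.2%}")
```

Because the F-Score penalizes both missed text and false detections, it is a common single-number summary for comparing text localization methods.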

URI

http://hdl.handle.net/10603/481188

Collections

Department of Computer Science Engineering
