Image and Video Text Recognition System
Loading...
Date
item.page.authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
The text conveys much information through tags, signs, logos, labels, billboards, and
newlinemarkers and has been an integral part of human life for ages now. It can deliver
newlineinformation by embedding it in natural scenes images/videos; hence, they have received
newlineincreasing research attention in computer vision. Furthermore, with the development of
newlinedigital technology, TDR-Text Detection and Recognition in images/video has become
newlinemore popular for real-time applications, such as robot navigation systems, assisting
newlineblind people in travelling on roads, monitoring vehicle license plates, and security
newlinereasons. The text properties include arbitrary orientations, varied font sizes, and aspect
newlineratios, which are challenging to address. Difficulties presented in video-text scenes are
newlinelighting changes/effects, motion blur, and occlusion. Because of the importance of
newlineTDR from images and frames/videos, several researchers are working towards the
newlinedevelopment of effective text recognition systems from videos and images. Therefore,
newlinethe proposed system introduces different efficient text detection and recognition
newlinemethods. Firstly, different pre-processing techniques are studied in the proposed
newlinemethod, and it showed that Radon Transform (R.T.) gives good results compared to
newlineother filtering techniques and improved by 4.68% F-Score. However, this traditional
newlinemethod is inefficient in the case of vertical, far text and is sensitive to blurred text.
newlineHence we proposed the next method using the neural network approach. The second
newlineapproach, deep learning, is better for text localization than the previous method.
newlineHowever, localizing the text in natural scene images has become challenging. The
newlineproposed research provides a comprehensive solution for text localization using
newlineTransfer Learning (T.L.) with Deep Convolution Neural Network (DCNN), an
newlineimproved version of the first objective with a reasonably good F-Score of 82.79% is
newlineachieved. As a part of the third objective, we have designed the model to identify and
newlinerecognize the text from the video data, which is DEFUSE (Deep Fused) Model.
newlineDEFUSE model is fused with the DEASTD (Deep Efficient and Accurate Scene Text
newlineDetector) and KOCR (Keras Optical Character Recognition) model to locate and
newlinerecognize the text in image/video frames and is powered by a neural network. The
newlinemodel has handled the high complexity of challenging text and data dynamicity
newlinedifficulties, where the video screen changes from one location to another. This model
newlineimproved the accuracy by 2.85% and 10.55% on different datasets. To work with
newlineimages and video frames together, the YOLOv5x model is used for text detection and
newline
newlinexix
newlineTesserectOCR for text recognition purposes. The proposed work also concentrates on
newlinecapturing text on real-time challenging videos and getting good results on different
newlinedatasets.
newline