Extraction of Text from Compound Images using Ridler Calvard Based Median Filtering Algorithm

M. Praneesh; Dr. D. Napoleon

doi:10.17577/IJERTCONV8IS04007

NSDARM – 2020 (Volume 8 - Issue 04)

Authors : M. Praneesh, Dr. D. Napoleon
Paper ID : IJERTCONV8IS04007
Volume & Issue : NSDARM – 2020 (Volume 8 – Issue 04)
Published (First Online): 17-03-2020
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


M. Praneesh

Assistant Professor, Department of Computer Science

Sri Ramakrishna College of Arts and Science, Bharathiar University

Dr. D. Napoleon

Assistant Professor, Department of Computer Science

School of Computer Science and Engineering, Bharathiar University

ABSTRACT: Extracting text from compound images remains a major problem in image processing. Large amounts of information are embedded in images, and this information often needs to be automatically recognized and processed. Extracting it involves detection, localization, tracking, extraction, enhancement, and recognition of the text in a given image. However, variations of text in size, style, orientation, and alignment, together with low image contrast and complex backgrounds, make text extraction extremely challenging. In this paper, we propose a median filter approach based on the Ridler Calvard method for enhancing the image before extracting the text from it.

Keywords: Text Extraction, Text Detection, Shannon Entropy.

I. INTRODUCTION

With recent advances in digital technology, more and more databases are multimedia in nature, containing images and video in addition to textual information. Research on text extraction from images has been growing [1]. Many methods have been proposed based on edge detection, binarization, spatial-frequency image analysis and mathematical morphology operations. All these systems show that text areas cannot be perfectly extracted from an image, because natural scenes contain complex and sometimes highly textured objects, such as buildings, trees and window frames, which give rise to false detections and misses. The first step in developing our text reading system is to address the problem of text detection in natural scene images [2]. Written text provides important information, yet reliably detecting and localizing text embedded in natural scene images is not an easy problem [8]. Character size can vary from very small to very large, the font can differ, the text may have multiple colors, it may appear in different orientations, and it can occur on a complex background. In addition, the captured textual and other information is affected by significant degradations such as perspective distortion, blur, shadow and uneven lighting. Hence, automatic detection and segmentation of text is a difficult and challenging problem. Reported works have identified a number of approaches for text localization from natural scene images.

The various approaches are categorized as connected component based, edge based and texture based methods. Connected component based methods use a bottom-up approach, grouping smaller components into larger ones until all regions in the image are identified. A geometrical analysis is then needed to identify text components and group them to localize text regions. Edge based methods focus on the high contrast between the background and the text: the edges of the text boundary are identified and merged.

Afterwards, several heuristics are required to filter out non-text regions. However, noise, complex backgrounds and significant degradation in low-resolution natural scene images can affect the extraction of connected components and the identification of boundary lines, making both approaches inefficient [3]. This paper presents an image pre-processing method based on Shannon's definition of information entropy. The approach is generally applicable to any image. The basic concept is that the background is informatively poor, whereas the objects carry relevant information. The method preserves details, highlights edges, and reduces random noise.
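The paper does not spell out its entropy-based pre-processing formula, so the following is only a minimal Python sketch of the underlying idea, assuming a gray-scale image stored as a list of rows of integer gray levels. It computes the Shannon entropy, H = -Σ p_i log2 p_i, of each pixel's neighbourhood: flat background windows score near zero, while windows that straddle object or text edges score higher. The function names `shannon_entropy` and `local_entropy_map` and the 3×3 window (`radius=1`) are hypothetical choices for illustration, not the method of [4].

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete gray levels."""
    counts = Counter(values)
    n = len(values)
    # p * log2(1/p) summed over the observed gray levels (equals -sum p log2 p).
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def local_entropy_map(image, radius=1):
    """Entropy of each pixel's (2*radius+1)^2 window; windows are clipped at the border.

    Hypothetical helper: low values suggest uniform background,
    high values suggest detail such as text strokes or object edges.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = shannon_entropy(window)
    return out
```

For example, `shannon_entropy([0, 255])` returns 1.0 bit, while a window of identical values returns 0.0, which matches the intuition that a uniform background carries little information.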

RELATED WORKS

Chitrakala Gopalan and Manjula D. (2008) addressed the problem of extracting text from different kinds of images, such as scene text images, caption text images and document images, within a unified framework. Their method applies a variation of the contourlet transform to decompose the image into a set of directional sub-bands that capture texture details in different orientations at various scales [1].

Nobuo Ezaki, Marius Bulacu and Lambert Schomaker (2004) proposed four character-extraction methods based on connected components and tested their effectiveness on the ICDAR 2003 Robust Reading Competition data. The performance of the different methods depends on character size. Bigger characters are more prevalent in the data, and the most effective extraction method proves to be the sequence: Sobel edge detection, Otsu binarization, connected component extraction and rule-based connected component filtering [2].
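The connected component stage of such a pipeline, and the bottom-up grouping of components described in the Introduction, can be sketched in Python as below. This assumes a binary image (lists of 0/1 rows) with text pixels set to 1 and 4-connectivity. `connected_components` labels regions by breadth-first flood fill and records each bounding box; `filter_components` applies two hypothetical rules (minimum area and maximum aspect ratio) that merely stand in for the rule set of [2], which is not described here.

```python
from collections import deque


def connected_components(binary):
    """Find 4-connected regions of 1-pixels; return bounding box and area of each."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited seed pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right, area = y, x, y, x, 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append({"top": top, "left": left, "bottom": bottom,
                          "right": right, "area": area})
    return boxes


def filter_components(boxes, min_area=2, max_aspect=10.0):
    """Hypothetical rule-based filter: drop tiny specks and very elongated regions."""
    kept = []
    for b in boxes:
        height = b["bottom"] - b["top"] + 1
        width = b["right"] - b["left"] + 1
        aspect = max(height, width) / min(height, width)
        if b["area"] >= min_area and aspect <= max_aspect:
            kept.append(b)
    return kept
```

In a real system the filtering rules would be tuned to character size and layout; the thresholds here only show where such heuristics plug in.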

G. Sahoo, Tapas Kumar et al. proposed a set of sequential algorithms for text extraction and image enhancement using cellular automata. The image enhancement includes gray level and contrast manipulation, edge detection, and filtering [5].

JiSoo Kim, SangCheol Park et al. proposed three text extraction methods based on intensity information for natural scene images. The first method consists of gray value stretching followed by binarization using the average intensity of the image; it is appropriate for extracting text from complex backgrounds. The second method is the split-and-merge approach, one of the well-known algorithms for image segmentation. The third is a combination of the two. Experimental results show that the proposed approaches outperform conventional methods on both simple and complex images [7].
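The first method of [7], gray value stretching followed by binarization at the average intensity, might look like the minimal Python sketch below. The linear min-max stretch to the range [0, 255] and the choice that pixels exactly at the mean map to 1 are assumptions; the summary above specifies neither. The function names are hypothetical.

```python
def stretch_gray_values(image, out_min=0, out_max=255):
    """Linearly map the image's [min, max] gray range onto [out_min, out_max]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: there is no contrast to stretch
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (v - lo) * scale) for v in row] for row in image]


def binarize_by_mean(image):
    """Set pixels at or above the mean intensity to 1 and all others to 0."""
    flat = [v for row in image for v in row]
    mean = sum(flat) / len(flat)
    return [[1 if v >= mean else 0 for v in row] for row in image]
```

For a low-contrast patch such as `[[100, 110], [120, 130]]`, stretching yields `[[0, 85], [170, 255]]`, and binarizing at the mean (127.5) gives `[[0, 0], [1, 1]]`.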
G. Rama Mohan Babu and P. Srimaiyee proposed an algorithm that is insensitive to noise, skew and text orientation. Because it relies on morphological operators, it is free from the artifacts usually introduced by thresholding [6].

Xiao-Wei Zhang, Xiong-Bo Zheng et al. proposed a new algorithm for extracting text from background images based on the two-dimensional wavelet transform.
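For readers unfamiliar with the two-dimensional wavelet transform, the sketch below computes one level of a 2-D Haar decomposition in Python, assuming an image with even height and width. It is a generic illustration only: the wavelet family and the detection rules used by Zhang et al. are not stated here, and the averaging normalisation and sub-band names are assumptions. Each 2×2 block yields one approximation coefficient and three detail coefficients; text strokes, being high-contrast edges, tend to produce large detail values, which is the kind of cue wavelet-based text detection can build on.

```python
def haar_2d_level(image):
    """One level of an averaging 2-D Haar decomposition.

    Returns (approx, horizontal, vertical, diagonal) sub-bands, each half the
    input size in both directions. Height and width must be even.
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    approx, horiz, vert, diag = [], [], [], []
    for y in range(0, h, 2):
        rows = ([], [], [], [])
        for x in range(0, w, 2):
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            rows[0].append((a + b + c + d) / 4)  # block average
            rows[1].append((a + b - c - d) / 4)  # top minus bottom: horizontal edges
            rows[2].append((a - b + c - d) / 4)  # left minus right: vertical edges
            rows[3].append((a - b - c + d) / 4)  # diagonal detail
        for band, row in zip((approx, horiz, vert, diag), rows):
            band.append(row)
    return approx, horiz, vert, diag
```

A block whose top row is bright and bottom row dark, such as `[[10, 10], [0, 0]]`, produces a horizontal detail of 5.0 and zero vertical and diagonal details, while a uniform block produces zeros in all three detail sub-bands.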
PROPERTIES OF TEXT CHARACTERS
METHODOLOGY

In this work, the main focus is the application of a median filter approach based on the Ridler Calvard method for extracting text from compound images. The overall system is shown in Fig-1.

Input image → Pre-processing → Edge detection → Text region detection → Text extraction → Extracted text

Fig-1 System Architecture
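The title and abstract name a median filter as the enhancement step, but the paper gives no window size or border rule. A minimal Python sketch of a median filter for the pre-processing stage of Fig-1, assuming a 3×3 window (`radius=1`), windows clipped at the image border, and an image stored as a list of rows, is shown below; `median_filter` is a hypothetical name.

```python
import statistics


def median_filter(image, radius=1):
    """Replace each pixel with the median of its (2*radius+1)^2 neighbourhood.

    Border pixels use only the part of the window that lies inside the image;
    for even-sized border windows statistics.median averages the two middle values.
    """
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(statistics.median(window))
        out.append(row)
    return out
```

A single bright noise pixel in a flat region is removed entirely, whereas a mean filter would smear it into its neighbours; this edge-preserving behaviour is the usual reason to apply a median filter before thresholding.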
Binarization is applied to extract the text from the identified text region, which enables the extracted text to be parsed and recognized by common OCR systems. Binarization converts gray-scale images into binary images. The most common method is to select a suitable threshold for the image and then map all intensity values below the threshold to one intensity level and all values above it to the other, so that every pixel becomes either black or white. This segments the image into foreground and background. The foreground contains the characters of interest, and the process generates an output image with white text against a black background.
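Global thresholding as described above can be sketched in Python as follows. Because text may be darker or lighter than its background, the hypothetical `text_is_dark` flag chooses which side of the threshold becomes white (1), so the output always has white text on a black background. The flag and the function name are assumptions added for illustration.

```python
def binarize(image, threshold, text_is_dark=True):
    """Map a gray-scale image to binary with white (1) text on black (0).

    If text is darker than the background, pixels below the threshold are text;
    otherwise pixels at or above the threshold are text.
    """
    if text_is_dark:
        return [[1 if v < threshold else 0 for v in row] for row in image]
    return [[1 if v >= threshold else 0 for v in row] for row in image]
```

With `image = [[20, 200], [30, 220]]` and a threshold of 128, dark text gives `[[1, 0], [1, 0]]` and light text gives `[[0, 1], [0, 1]]`.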

Ridler Calvard Method

The Ridler Calvard algorithm uses an iterative clustering approach. First, an initial estimate of the threshold, T0, is made, for example the mean image intensity. Pixels above and below the threshold are assigned to the object and background classes, respectively. The mean intensity of the pixels in the object class is then computed as µf, and that of the background class as µb. Using these two mean values, an improved threshold T1 is computed as

$$T_1 = \frac{\mu_f + \mu_b}{2} \qquad (1)$$

The image is then partitioned into background and foreground using T1, the class means are recomputed, and Eq. (1) is applied again, with T0 now denoting the threshold from the previous iteration. This procedure is repeated until T1 = T0. The final T1 is taken as the threshold T. Using T, the input image f(x,y) is converted to a binary image g as:

$$g(x,y) = \begin{cases} 1 & \text{if } f(x,y) \ge T \\ 0 & \text{otherwise} \end{cases}$$
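The iteration above translates directly into Python. This is a minimal sketch, assuming a gray-scale image stored as a list of rows. Pixels equal to the threshold go to the object class to match the definition of g(x,y), and because exact floating-point equality T1 = T0 is a fragile stopping test, the loop stops when the change falls below a tolerance `tol`, with `max_iter` as a safeguard. Both parameters and the function names are assumptions.

```python
def ridler_calvard_threshold(image, tol=0.5, max_iter=100):
    """Iterative (isodata) threshold selection following the steps described above."""
    pixels = [v for row in image for v in row]
    t0 = sum(pixels) / len(pixels)  # initial estimate: mean image intensity
    for _ in range(max_iter):
        obj = [v for v in pixels if v >= t0]  # object class
        bkg = [v for v in pixels if v < t0]   # background class
        if not obj or not bkg:  # one class is empty: nothing left to split
            return t0
        mu_f = sum(obj) / len(obj)
        mu_b = sum(bkg) / len(bkg)
        t1 = (mu_f + mu_b) / 2  # Eq. (1)
        if abs(t1 - t0) < tol:  # threshold has stopped changing
            return t1
        t0 = t1
    return t0


def apply_threshold(image, t):
    """g(x, y) = 1 if f(x, y) >= T, else 0."""
    return [[1 if v >= t else 0 for v in row] for row in image]
```

For instance, on `[[0, 0, 0, 10, 100]]` the mean gives T0 = 22, the class means 100 and 2.5 give T1 = 51.25, and the next pass returns the same value, so the threshold settles at 51.25.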
RESULT AND DISCUSSION

The results of the proposed median filter approach based on the Ridler Calvard method for extracting text from compound images are shown in Fig-3.

Panels: Original image | Edge detection | Text localization | Extracted text

Fig-3 Experimental results for the proposed algorithm
CONCLUSION

The main aim of this paper is to extract text from images. Before extraction, problems such as noise and image blur must be dealt with, which makes extraction complicated. The Shannon entropy method is therefore used to enhance the image: its entropy contribution compensates for local changes without blurring and partially denoises the output, making text extraction easier.

REFERENCES

[1] Chitrakala Gopalan and Manjula D., "Contourlet Based Approach for Text Identification and Extraction from Heterogeneous Textual Images," International Journal of Computer Science and Engineering, vol. 2, no. 4, 2008.
[2] Nobuo Ezaki, Marius Bulacu et al., "Text Detection from Natural Scene Images: Towards a System for Visually Impaired Persons," ICPR 2004, vol. II.
[3] S. A. Angadi and M. M. Kodabagi, "A Texture Based Methodology for Text Region Extraction from Low Resolution Natural Scene Image," ICPR 2004, vol. II.
[4] Jan Urban, Jan Vanek et al., "Preprocessing of Microscopy Images via Shannon's Entropy."
[5] G. Sahoo, Tapas Kumar et al., "Text Extraction and Enhancement of Binary Images Using Cellular Automata," vol. 6, no. 3, August 2009.
[6] G. Rama Mohan Babu and P. Srimaiyee, "Text Extraction from Heterogeneous Image Using Mathematical Morphology," JATIT, 2005-2010.
[7] JiSoo Kim, SangCheol Park et al., "Text Locating from Natural Scene Images Using Image Intensities," 16 January 2006.
