Extraction of Text from Compound Images using Ridler Calvard Based Median Filtering Algorithm



Assistant Professor Department of Computer Science

Sri Ramakrishna College of Arts and Science Bharathiar University


Assistant Professor Department of Computer Science

School of Computer Science and Engineering Bharathiar University

ABSTRACT: Extracting text from compound images remains a major problem in image processing. Large amounts of information are embedded in images and often need to be recognized and processed automatically. Extracting this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text in a given image. However, variations in text size, style, orientation, and alignment, together with low image contrast and complex backgrounds, make text extraction extremely challenging. In this paper we propose a median filter approach based on the Ridler Calvard method for enhancing the image before the text is extracted.

Key words: Text Extraction, Text detection, Shannon Entropy.


With the recent advances in digital technology, more and more databases are multimedia in nature, containing images and video in addition to textual information. Research on text extraction from images has been growing recently [1]. Many methods have been proposed based on edge detection, binarization, spatial-frequency image analysis, and mathematical morphology operations. All these systems make evident that text areas cannot be perfectly extracted from an image, because natural scenes contain complex and sometimes highly textured objects, such as buildings, trees, and window frames, which give rise to false detections and misses. The first step in developing our text reading system is to address the problem of text detection in natural scene images [2]. Written text provides important information, but reliably detecting and localizing text embedded in natural scene images is not an easy problem [8]. The size of the characters can vary from very small to very big. The font of the text can differ. Text in the image may have multiple colors, may appear in different orientations, and can occur against a complex background. The captured text and other information are also affected by significant degradations such as perspective distortion, blur, shadow, and uneven lighting. Hence, the automatic detection and segmentation of text is a difficult and challenging problem. Reported works have identified a number of approaches for text localization in natural scene images.

The various approaches are categorized as connected component based, edge based, and texture based methods. Connected component based methods use a bottom-up approach, grouping smaller components into larger ones until all regions in the image are identified. A geometrical analysis is then needed to identify text components and group them to localize text regions. Edge based methods focus on the high contrast between the background and the text: the edges of the text boundary are identified and merged, and several heuristics are then required to filter out non-text regions. However, noise, complex backgrounds, and significant degradation in low-resolution natural scene images can affect the extraction of connected components and the identification of boundary lines, making both approaches inefficient [3].

This paper presents a method for image preprocessing based on Shannon's definition of information entropy. The approach is generally applicable to any image. The basic concept is that the background remains informatively poor, whereas the objects carry relevant information. This method preserves details, highlights edges, and decreases random noise.


    Chitrakala Gopalan and Manjula D. (2008) addressed text extraction from different kinds of images, such as scene text images, caption text images, and document images, within a unified framework. Their method applies a variation of the contourlet transform to decompose an image into a set of directional sub-bands, with texture details captured in different orientations at various scales [1].

    Nobuo Ezaki, Marius Bulacu, and Lambert Schomaker (2004) proposed four character-extraction methods based on connected components and tested their effectiveness on the ICDAR 2003 Robust Reading Competition data. The performance of the different methods depends on character size. In that data, bigger characters are more prevalent, and the most effective extraction method proved to be the sequence of Sobel edge detection, Otsu binarization, connected component extraction, and rule-based connected component filtering [2].

    G. Sahoo, Tapas Kumar, et al. proposed a set of sequential algorithms for text extraction and image enhancement using cellular automata. The image enhancement includes gray level, contrast manipulation, edge detection, and filtering [5].

    JiSoo Kim, SangCheol Park, et al. proposed three text extraction methods for natural scene images based on intensity information. The first method consists of gray value stretching followed by binarization using the average intensity of the image, and is appropriate for extracting text from complex backgrounds. The second method is a split-and-merge approach, a well-known algorithm for image segmentation. The third is a combination of the two. Experimental results show that the proposed approaches are superior to conventional methods on both simple and complex images [7].

    G. Rama Mohan Babu and P. Srimaiyee proposed an algorithm that is insensitive to noise, skew, and text orientation. It is free from the artifacts that are usually introduced by thresholding using morphological operators [6].

    Xiao-Wei Zhang, Xiong-Bo Zheng, et al. proposed a new algorithm, based on two-dimensional wavelet transforms, for extracting text from a background image.


      1. Characters are normally arranged either horizontally or vertically.

      2. Given a fixed font type, style and size, heights of the characters belonging to the same group (groups include ascenders, descenders, capitals, and lowercases) are approximately constant.

      3. Characters form regular (or weakly periodic) structures in both horizontal and vertical directions.

      4. Given a fixed font type, style and size, characters are composed of strokes of approximately constant width.


    In this work, the main focus is the application of a median filter approach based on the Ridler Calvard method to the extraction of text from compound images. The overall system is shown in Fig-1.

    Input image -> Pre-processing -> Edge detection -> Text region detection -> Text extraction -> Extracted text

    Fig-1 System Architecture

        1. Pre-processing

          Image preprocessing is one of the important steps in image processing, because various kinds of noise are introduced while the image is captured. Preprocessing steps are therefore applied to the image to remove this noise. Here, the median filter method is used to remove the noise and highlight the edges of the image.
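The median filtering step can be illustrated with a minimal NumPy sketch. The function name `median_filter`, the 3 x 3 window, and the border replication are assumptions made for this illustration; the paper does not state the window size or border handling it uses.

```python
import numpy as np

def median_filter(image, size=3):
    """Apply a size x size median filter to a 2-D grayscale array.

    Hypothetical helper for illustration; window size and border
    handling are assumptions, not taken from the paper.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    pad = size // 2
    # Replicate border pixels so the output keeps the input shape.
    padded = np.pad(image, pad, mode="edge")
    # View every size x size neighbourhood, then take its median.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1)).astype(image.dtype)

# A flat gray patch with one salt-noise pixel in the middle.
noisy = np.full((5, 5), 100, dtype=np.uint8)
noisy[2, 2] = 255
cleaned = median_filter(noisy)

# A vertical step edge, to check that the filter does not blur it.
step = np.zeros((5, 6), dtype=np.uint8)
step[:, 3:] = 200
step_filtered = median_filter(step)
```

In this toy input the isolated bright pixel is replaced by the value of its neighbours, while the step edge passes through unchanged, which is the property that makes a median filter attractive before edge detection.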

        2. Edge detection

          First, the image is converted to gray level, because we want to be able to detect text even in low-light conditions where color information is absent or noisy. Next, the image is downsampled, which enables the detection of text at different size levels. An edge detection method is then applied to detect the text. Because a closed contour is characteristic of a text region, we apply connected component analysis to the resulting image. Some non-text regions are still detected; to eliminate them, we consider geometric rules, for example that neighboring text components lying in the horizontal direction belong to the same text line and have similar heights. Apart from these rules, the edge intensity of each component is considered to discriminate text regions from non-text regions [5].
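A rough sketch of the first three operations in this step (gray-level conversion, downsampling, edge detection) is given here. The paper does not name its edge detector, so the Sobel operator is an assumption, as are the helper names `to_gray`, `downsample`, and `sobel_magnitude`, the BT.601 luma weights, the downsampling factor of 2, and the edge threshold of half the maximum gradient.

```python
import numpy as np

def to_gray(rgb):
    """Convert an H x W x 3 RGB array to gray level (BT.601 luma weights)."""
    return rgb[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])

def downsample(gray, factor=2):
    """Shrink the image by averaging non-overlapping factor x factor blocks."""
    h = gray.shape[0] - gray.shape[0] % factor
    w = gray.shape[1] - gray.shape[1] % factor
    blocks = gray[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def sobel_magnitude(gray):
    """Gradient magnitude from the 3 x 3 Sobel operator, borders replicated."""
    p = np.pad(gray.astype(float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

# Toy input: left half black, right half white.
rgb = np.zeros((8, 8, 3), dtype=np.uint8)
rgb[:, 4:] = 255

small = downsample(to_gray(rgb))           # 4 x 4 gray-level image
magnitude = sobel_magnitude(small)
edges = magnitude > 0.5 * magnitude.max()  # assumed edge threshold
```

Only the two columns on either side of the black-to-white transition are marked as edges here. Connected component analysis and the geometric rules would then operate on the `edges` map; they are not shown in this sketch.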

        3. Text regions detection

          CT is applied to spread luminance values throughout the image, which increases the contrast between the possibly interesting text regions and the rest of the image. The value of CT lies between the minimum and maximum values and separates the text region from the background region. If the background is simple, a text string, even one of low contrast, can easily be detected with a low threshold, whereas a text string embedded in a complex background needs a higher threshold to further simplify the background (white) [5].

          Fig -2 Contrast
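The text does not define CT precisely. One common way to "spread luminance values" over the available range is a linear min-max contrast stretch, sketched here purely as an assumption about what this step might look like; `stretch_contrast` is a hypothetical helper, not the authors' operator.

```python
import numpy as np

def stretch_contrast(gray):
    """Linearly map the darkest pixel to 0 and the brightest to 255.

    Assumed stand-in for the contrast step; the paper's CT is not specified.
    """
    gray = gray.astype(float)
    lo, hi = gray.min(), gray.max()
    if hi == lo:
        # A flat image has no contrast to stretch.
        return np.zeros(gray.shape, dtype=np.uint8)
    return np.round((gray - lo) / (hi - lo) * 255).astype(np.uint8)

low_contrast = np.array([[100, 110],
                         [120, 150]], dtype=np.uint8)
stretched = stretch_contrast(low_contrast)
```

After stretching, the intensities occupy the full 0 to 255 range, so a threshold chosen between the minimum and maximum separates dark and bright regions with a wider margin.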

        4. Text extraction

    Binarization is applied to extract the text from the identified text region. It enables the extracted text to be parsed and recognized by common OCR systems. Binarization is a technique by which gray scale images are converted to binary images. The most common method is to select a suitable threshold for the image, convert all intensity values below the threshold to one intensity level (black or white), and convert all values above it to the other. This segments the image into foreground and background. The foreground contains the characters of interest, and the process generates an output image with white text against a black background.
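A minimal sketch of fixed-threshold binarization follows. The helper name `binarize`, the `text_is_dark` flag, and the threshold of 128 are illustrative assumptions; the flag only controls which side of the threshold is treated as text so that the output always shows white text on a black background.

```python
import numpy as np

def binarize(gray, threshold, text_is_dark=True):
    """Return a 0/255 image with text pixels white and background black."""
    if text_is_dark:
        mask = gray < threshold    # dark text on a light background
    else:
        mask = gray >= threshold   # light text on a dark background
    return np.where(mask, 255, 0).astype(np.uint8)

# Toy text region: a dark vertical stroke on a light background.
region = np.array([[200, 30, 210],
                   [190, 25, 205]], dtype=np.uint8)
binary = binarize(region, threshold=128)
```

The middle column (the dark stroke) becomes 255 and the rest becomes 0. Choosing the threshold automatically, rather than fixing it by hand, is the role of the Ridler Calvard method.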

    Ridler Calvard Method

    The Ridler Calvard algorithm uses an iterative clustering approach. First, an initial estimate T0 of the threshold is made, for example the mean image intensity. Pixels above and below the threshold are assigned to the object and background classes respectively. Then the mean intensity of the pixels in the object class is computed as µf and that of the background class as µb. Using these two mean values, an improved threshold T1 is computed as

    T1 = (µb + µf) / 2    (1)

    The image is then partitioned into background and foreground regions using the threshold T1 from eqn (1), with T1 taking the role of T0 in the next pass. This procedure is continued until T1 = T0, that is, until the threshold no longer changes. The final T1 is taken as the threshold T. Using the threshold value T, the given input image f(x,y) is converted to a binary image g as:

    g(x,y) = 1 if f(x,y) >= T

           = 0 otherwise
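The iteration described above can be sketched in a few lines of NumPy. The function name `ridler_calvard_threshold` and the parameters `tol` and `max_iter` are assumptions for this illustration: the text stops when the new threshold equals the previous one, and the sketch relaxes that to a small tolerance so that floating-point means cannot prevent termination.

```python
import numpy as np

def ridler_calvard_threshold(gray, tol=0.5, max_iter=100):
    """Iteratively estimate a global threshold from the two class means."""
    gray = np.asarray(gray, dtype=float)
    t = gray.mean()                      # initial estimate: mean intensity
    for _ in range(max_iter):
        foreground = gray[gray >= t]     # object class
        background = gray[gray < t]      # background class
        if foreground.size == 0 or background.size == 0:
            break                        # flat image: nothing to separate
        # Improved threshold: midpoint of the two class means.
        t_new = (background.mean() + foreground.mean()) / 2
        if abs(t_new - t) < tol:         # threshold has stopped changing
            return t_new
        t = t_new
    return t

gray = np.array([[10, 10, 10],
                 [10, 20, 200]], dtype=np.uint8)
T = ridler_calvard_threshold(gray)
g = (gray >= T).astype(np.uint8)         # binary image: 1 where f(x,y) >= T
```

For this toy image the initial mean is about 43.3; one pass moves the threshold to the midpoint of the class means 12 and 200, which is 106, and the next pass leaves it unchanged, so only the pixel with value 200 is assigned to the object class.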


    The results of applying the proposed algorithm, the median filter approach based on the Ridler Calvard method, to the extraction of text from compound images are shown in Fig-3.

    [Figure panels: original image, edge detection, text localization, extracted text]

    Fig-3 Experimental results for proposed Algorithm


The main aim of this paper is to extract text from images. Before extraction we faced many problems, such as noise and image blur, which make the extraction very complicated. That is why the Shannon entropy method is used for enhancing the image: it uses the entropy contribution, which compensates for local changes (without blurring) and partially denoises the output, making the text extraction easier.


  1. Chitrakala Gopalan and Manjula D., "Contourlet Based Approach for Text Identification and Extraction from Heterogeneous Textual Images", International Journal of Computer Science and Engineering 2:4, 2008.

  2. Nobuo Ezaki, Marius Bulacu et al., "Text Detection from Natural Scene Images: Towards a System for Visually Impaired Persons", vol. II (ICPR 2004).

  3. S. A. Angadi and M. M. Kodabagi, "A Texture Based Methodology for Text Region Extraction from Low Resolution Natural Scene Image", vol. II (ICPR 2004).

  4. Jan Urban, Jan Vanek et al., "Preprocessing of microscopy images via Shannon's entropy".

  5. G. Sahoo, Tapas Kumar et al., "Text Extraction and Enhancement of Binary Images Using Cellular Automata", vol. 6(3), August 2009.

  6. G. Rama Mohan Babu and P. Srimaiyee, "Text Extraction From Heterogeneous Image using Mathematical Morphology", JATIT 2005 2010.

  7. JiSoo Kim, SangCheol Park et al., "Text locating from natural scene images using image intensities", 16 January 2006.
