Machine Printed Punjabi Character Recognition Using Morphological Operators on Binary Images

DOI : 10.17577/IJERTV1IS3236


Usha Rani, Computer Science Deptt., Punjabi University Guru Kashi College, Damdama Sahib

Er. Balwinder Singh, Computer Science Deptt., Govt. Multipurpose Sec. School, Patiala

Er. Ravinder Singh, Computer Science Deptt., Govt. Multipurpose Sec. School, Patiala

ABSTRACT

In this paper we present a character recognition system that uses morphological operators on binary images, applied to the characters of the Punjabi language. The recognition system is purely feature-based, with no need for a learning phase or any kind of memory. The main advantage of the system is its accuracy in recognizing Punjabi characters. The input to the system is scanned images from newspapers, magazines and old books.

Keywords: Morphological operators, Punjabi character recognition, machine printed, binary images.

  1. Introduction

    Optical Character Recognition (OCR) is a technology used to convert machine-printed material into editable word-processing file formats. It has long been used by libraries and government agencies to make lengthy documents quickly available electronically.

    Optical character recognition is a challenging problem whose solution researchers have been pursuing for more than 100 years. Several algorithms have been proposed to improve recognition capabilities [1,2]. Methods used to recognize characters inside a bitmapped image fall mainly into two categories: pattern matching, used in cheaper systems, and feature analysis, used in more sophisticated systems.

    Pattern matching methods store bitmaps for every character in each of the supported fonts and type sizes. The program tries to recognise a letter by comparing the bitmap of the scanned character with this database of stored bitmaps. The pattern with the best correlation is taken to be the scanned letter. If no pattern has a sufficiently high degree of correlation with the scanned character, the character is considered unclassifiable. This family of methods is usually not computationally heavy and is quite robust to noise. Its severe drawback is a lack of generality: it is only useful for the fonts and sizes stored, so complex multifont documents are beyond its scope.

    Feature extraction methods, on the other hand, attempt to recognise characters by identifying their universal features, which makes OCR typeface independent. Such a method looks for certain features in letters, such as intersections of lines, diagonal lines, closed shapes and open shapes. The features read from the scanned letter are then compared with the list of features available in the software's program code. This method is more versatile because it works with many types of fonts and characters.

    Our algorithm is based on feature analysis and, as a consequence, shows great generalization capabilities. Moreover, all the needed features can be extracted using morphological operators.

    Morphology is the branch of biology that deals with the form and structure of animals and plants. Similarly, mathematical morphology is a tool for extracting image components that are useful in the representation and description of region shape, such as boundaries, skeletons and the convex hull. Morphological techniques are also used for pre- or post-processing, such as morphological filtering, thinning, skeletonization and pruning [4]. We present an algorithm that has a high degree of accuracy on different Punjabi fonts and sizes. In this technique we use the morphological operators for branch points, endpoints and thinning of binary images.

    Thinning: reducing binary objects or shapes in an image to strokes that are a single pixel wide.

    Skeletonization: another process that reduces binary image objects to a set of thin strokes that retain important information about the shape of the original objects.

  2. Pre-Processing

    Pre-processing removes noise and extra objects from the image so that only the character to be recognized remains. There are several methods for pre-processing, such as increasing the intensity level or applying various types of filtering. To remove the extra objects, calculate the area of each object in the image and sort the areas in descending order. The largest area is the character; all other objects are removed from the image based on their area.

    Figure 1. Pre-processing of Image
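The area-based clean-up described above can be sketched in plain Python. This is only an illustration under stated assumptions: the paper's implementation is in MATLAB, the image is taken to be a list of rows of 0/1 values, objects are assumed to be 8-connected, and the name `keep_largest_object` is hypothetical.

```python
from collections import deque

def keep_largest_object(img):
    """Return a copy of a binary image (list of rows of 0/1) that keeps only
    the largest 8-connected foreground object; all other objects become 0."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    largest = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects the pixels of one object.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # The area of an object is its pixel count.
            if len(component) > len(largest):
                largest = component
    out = [[0] * cols for _ in range(rows)]
    for y, x in largest:
        out[y][x] = 1
    return out

# Assumed toy input: a 2x2 "character" plus two isolated noise pixels.
noisy = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0],
]
cleaned = keep_largest_object(noisy)
```

Keeping only the single largest object matches the text's assumption that the character is one connected shape; a character made of several disconnected strokes would need a different rule.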

  3. Feature Extraction

    For better generalization capability and low computational cost, we considered only three features of characters: holes, junctions and ends. Recognition is based on the number and position of these features. After pre-processing, the image is normalized to a dimension of 50×50, maintaining the aspect ratio.
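The paper does not say how the 50×50 normalization is carried out. One possible reading, sketched below in plain Python, crops the character's bounding box, scales it with nearest-neighbour sampling so that its longer side becomes 50 pixels, and centres it on a zero-padded canvas. The sampling method, the centring and the name `normalize` are all assumptions made for illustration.

```python
def normalize(img, size=50):
    """Fit the character's bounding box into a size x size canvas while
    preserving its aspect ratio (nearest-neighbour, centred, zero-padded)."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    out = [[0] * size for _ in range(size)]
    if not rows:                      # empty image: nothing to place
        return out
    top, left = rows[0], cols[0]
    h, w = rows[-1] - top + 1, cols[-1] - left + 1
    scale = size / max(h, w)          # same factor on both axes
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    off_y, off_x = (size - new_h) // 2, (size - new_w) // 2
    for y in range(new_h):
        src_y = top + min(h - 1, int(y / scale))
        for x in range(new_w):
            src_x = left + min(w - 1, int(x / scale))
            out[off_y + y][off_x + x] = img[src_y][src_x]
    return out

# Assumed toy input: a 4x2 block of foreground inside a 6x6 image.
sample = [[0] * 6 for _ in range(6)]
for r in range(1, 5):
    for c in range(2, 4):
        sample[r][c] = 1
normalized = normalize(sample)
```

Using a single scale factor for both axes is what keeps the aspect ratio; the shorter side is padded with background rather than stretched.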

    1. Holes

      The first feature we considered is the number of holes in the character. To obtain an image in which every hole is represented as a point, first fill the holes in the input image. In MATLAB the following command is used, where BWI is the input image and BWO is the output image:

      BWO = imfill(BWI,'holes');

      Then take the difference between the hole-filled image and the input image and shrink it; the final image contains one point per hole.

      Figure 2. Hole detection

    2. Junctions

      A junction, also known as a branch point, is a point where two or more strokes meet. For example:

      0 0 1 0 0           0 0 0 0 0
      1 1 1 1 1  becomes  0 0 1 0 0
      0 0 1 0 0           0 0 0 0 0
      0 0 1 0 0           0 0 0 0 0

      In MATLAB the following command is used to extract the branch points from the input image BWI and return the output image BWO:

      BWO = bwmorph(BWI,'branchpoints');

      The number and position of the junctions in the character are then obtained by counting the pixels set in the output image.

    3. Ends

      An endpoint is a mark of termination or completion. Every character has a number of endpoints, which play a significant role in recognizing the character. For example:

      1 0 0 0           1 0 0 0
      0 1 0 0  becomes  0 0 0 0
      0 0 1 0           0 0 1 0
      0 0 0 0           0 0 0 0

      In MATLAB the following command is used to find the endpoints, where BWI is the input image and BWO is the output image:

      BWO = bwmorph(BWI,'endpoints');
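To make the junction and endpoint definitions concrete, the sketch below classifies the pixels of a one-pixel-wide skeleton with a crossing-number rule: walking once around a pixel's eight neighbours, one 0-to-1 transition marks an endpoint and three or more mark a junction. This is a swapped-in technique written in plain Python for illustration, not the internal algorithm of MATLAB's `bwmorph`; the names `crossing_number` and `feature_points` are hypothetical.

```python
# Clockwise 8-neighbour offsets, starting from the pixel directly above.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def crossing_number(img, r, c):
    """Count 0 -> 1 transitions while walking once around pixel (r, c).
    Pixels outside the image are treated as background."""
    rows, cols = len(img), len(img[0])
    ring = [img[r + dr][c + dc]
            if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
            for dr, dc in RING]
    return sum(1 for i in range(8)
               if ring[i] == 0 and ring[(i + 1) % 8] == 1)

def feature_points(skeleton):
    """Return (ends, junctions) as lists of (row, col) positions."""
    ends, junctions = [], []
    for r, row in enumerate(skeleton):
        for c, value in enumerate(row):
            if value != 1:
                continue
            n = crossing_number(skeleton, r, c)
            if n == 1:
                ends.append((r, c))
            elif n >= 3:
                junctions.append((r, c))
    return ends, junctions

# The two example patterns from the text.
cross = [
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
diagonal = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
cross_ends, cross_junctions = feature_points(cross)
diagonal_ends, diagonal_junctions = feature_points(diagonal)
```

On these two patterns the rule marks the same junction and endpoint pixels as the examples in the text; on thicker or noisier strokes the results may differ from `bwmorph`.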

      Figure 3. Steps for the whole procedure

      Figure 4. Steps in Figure 3 applied to a binary image containing a character

  4. Classification

    First, we pre-process the image: we adjust its intensity so that it converts cleanly into a binary image, then remove extra objects and noise by calculating the area of each object in the image. The largest area represents the character and all other areas are noise, so we keep only the largest area.

    Secondly, holes are detected by filling the holes in the input image, taking the logical difference with the original image and shrinking the result.
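The hole-detection step can be sketched in plain Python as follows. Background reachable from the image border is treated as "outside"; the remaining background pixels are exactly the difference between the hole-filled image and the input. As a simplification, each hole is reduced to its centroid instead of being shrunk morphologically, so the point may not coincide with the one MATLAB's shrink operation would give. The names `flood` and `find_holes` are hypothetical.

```python
from collections import deque

STEPS4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def flood(start_pixels, allowed):
    """Return the pixels of `allowed` reachable from start_pixels
    using 4-connected steps."""
    seen = set(p for p in start_pixels if p in allowed)
    queue = deque(seen)
    while queue:
        r, c = queue.popleft()
        for dr, dc in STEPS4:
            p = (r + dr, c + dc)
            if p in allowed and p not in seen:
                seen.add(p)
                queue.append(p)
    return seen

def find_holes(img):
    """Return one (row, col) centroid per hole of a binary image."""
    rows, cols = len(img), len(img[0])
    background = {(r, c) for r in range(rows) for c in range(cols)
                  if img[r][c] == 0}
    border = [(r, c) for (r, c) in background
              if r in (0, rows - 1) or c in (0, cols - 1)]
    outside = flood(border, background)
    hole_pixels = background - outside   # filled image minus input image
    holes = []
    while hole_pixels:
        blob = flood([next(iter(hole_pixels))], hole_pixels)
        hole_pixels -= blob
        holes.append((sum(r for r, _ in blob) / len(blob),
                      sum(c for _, c in blob) / len(blob)))
    return sorted(holes)

# Assumed toy shapes: one with two holes, one open shape with none.
two_holes = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
]
open_shape = [
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
]
holes_found = find_holes(two_holes)
```

The length of the returned list is the hole count used for the rough classification, and the coordinates give the hole positions.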

    Then the image is thinned and skeletonized, and the result is used for both junction detection and end detection.
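The paper does not name a specific thinning algorithm. As one common choice, the sketch below implements Zhang-Suen thinning in plain Python; it is an assumption made for illustration, and its output will not necessarily match MATLAB's thinning operation pixel for pixel. The name `thin` is hypothetical.

```python
# Clockwise 8-neighbour offsets, starting from the pixel directly above.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def thin(img):
    """Zhang-Suen thinning of a binary image (list of rows of 0/1).
    Returns a new image; the input is left unchanged."""
    img = [row[:] for row in img]
    rows, cols = len(img), len(img[0])

    def ring(r, c):
        return [img[r + dr][c + dc]
                if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
                for dr, dc in RING]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(rows):
                for c in range(cols):
                    if img[r][c] != 1:
                        continue
                    p = ring(r, c)
                    n, _, e, _, s, _, w, _ = p
                    neighbours = sum(p)
                    transitions = sum(1 for i in range(8)
                                      if p[i] == 0 and p[(i + 1) % 8] == 1)
                    if step == 0:
                        side_ok = n * e * s == 0 and e * s * w == 0
                    else:
                        side_ok = n * e * w == 0 and n * s * w == 0
                    if 2 <= neighbours <= 6 and transitions == 1 and side_ok:
                        to_clear.append((r, c))
            # Pixels are removed together after each sub-iteration.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img

# Assumed toy input: a bar three pixels thick.
bar = [[0] * 9 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 8):
        bar[r][c] = 1
skeleton = thin(bar)
```

The two sub-iterations peel pixels from opposite sides of each stroke, which is what keeps the remaining line roughly centred.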

    A first rough classification can be obtained just by counting the number of holes, junctions and ends. For example, if the letter has three holes, we can immediately recognize it as [glyph]; if it has two holes, four junctions and two endpoints, it is [glyph]. Nevertheless, recognizing a letter by simply counting the holes, junctions and ends is sometimes not possible: for example, if we count 0 holes, 1 junction and 3 extremities, the letter could be [glyph]. In this case, we can discriminate between the candidates by also looking at the position of the feature points:

    Divide the image into two equal parts, an upper half and a bottom half, and similarly into a left half and a right half.

    If the junction lies in the right side and the upper half, two endpoints lie in the right half and one in the left half, all three endpoints lie in the upper half, and there is no hole, then the letter is [glyph].

    The decision tree for the first part is reproduced in Figure 5.

    Figure 5. Decision Table
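The position-based rule quoted above can be written as a small check in plain Python. This is a sketch only: the feature points are assumed to be (row, col) coordinates in the normalized 50×50 image, the function names are hypothetical, and the function returns a yes/no answer because the letter the rule identifies is not given in the text.

```python
def half_labels(point, size=50):
    """Return ('upper' or 'lower', 'left' or 'right') for a (row, col) point."""
    r, c = point
    return ("upper" if r < size / 2 else "lower",
            "left" if c < size / 2 else "right")

def matches_example_rule(holes, junctions, ends, size=50):
    """Check the rule from the text: no hole, one junction in the upper
    right, three endpoints all in the upper half, two of them on the right
    and one on the left."""
    if len(holes) != 0 or len(junctions) != 1 or len(ends) != 3:
        return False
    if half_labels(junctions[0], size) != ("upper", "right"):
        return False
    labels = [half_labels(p, size) for p in ends]
    return (all(vertical == "upper" for vertical, _ in labels)
            and sum(side == "right" for _, side in labels) == 2)

# Assumed feature positions, chosen only to exercise the rule.
example_match = matches_example_rule(
    holes=[], junctions=[(10, 40)], ends=[(5, 30), (20, 45), (8, 10)])
example_miss = matches_example_rule(
    holes=[], junctions=[(10, 40)], ends=[(5, 30), (40, 45), (8, 10)])
```

A full classifier would chain one such check per candidate letter after the rough count-based split, in the manner of the decision table.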

  5. Experiment

    We developed an application using MATLAB 2009 and conducted tests on different Punjabi fonts and on characters of different sizes. The input to this application is any scanned image from a magazine, newspaper or old Punjabi record. The results are shown in Table 1.

    Table 1. Recognition rate for different types of fonts

    Font             Recognition Rate
    AnmolLipi        100%
    Amrlipi          100%
    Gurbani Akhar    94.28%
    Asees            97.14%
    AmrNeon          91.42%

    Figure 6. Application to conduct Experiment

  6. Result

    The accuracy of this system is very good, as shown in Table 1; accuracy drops only in the cases where we need to divide the image into two or four equal parts. For languages whose characters can be recognized without dividing the image into equal parts, this system is expected to give an accuracy of 100%.

  7. Conclusion and Future Work

A method for the recognition of Punjabi characters in machine-printed documents has been developed based on morphological operators. The capability of these operators to detect patterns with specific geometric properties in the image is used to accomplish different essential tasks in the pattern recognition process. The algorithm can be used for different fonts and sizes, and it can be extended to other languages.

We have implemented this algorithm on individual Punjabi characters; in future work we will apply it to complete Punjabi words.

REFERENCES

  1. S. Kahan, T. Pavlidis, H. S. Baird, On the recognition of printed characters of any font and size, IEEE Trans. Pattern Anal. Machine Intell., vol. 9, no. 2, pp. 274-288, March 1987.

  2. R.E. Howard, B. Boser, J.S. Denker, H.P. Graf, D. Henderson, W. Hubbard, L.D. Jackel, Y. Le Cun, H.S. Baird, Optical character recognition: a technology driver for neural networks, Circuits and Systems, 1990, IEEE International Symposium.

  3. Daniele Casali, Giovanni Costantini, Massimo Carota, A Cellular Neural Network based character recognition system.

  4. Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing Using MATLAB.

  5. M. Salman Jelodar, M.J. Fadaeieslam, N. Mozayani, M. Fazeli, A Persian OCR System using Morphological Operators, World Academy of Science, Engineering and Technology 4, 2005.

  6. J. Serra, Image Analysis and Mathematical Morphology, Academic Press, New York, 1982.

  7. B. Timsari, Character recognition in typed Persian words: a morphological approach, M.S. Thesis, Isfahan Univ. of Tech., Iran, 1992.

  8. J.W. Smith and Z. Merali, Optical Character Recognition, The British Library, Wetherby, West Yorkshire LS23 7BQ, UK, 1985.
