Integrated Segmentation And Recognition Of Handwritten Devnagari Characters

DOI : 10.17577/IJERTV2IS3487


Mitrakshi B. Patil #1

Department of Computer Engineering, MGM's College of Engineering and Technology, Navi Mumbai, University of Mumbai, India.

Vaibhav E. Narawade #2

Head of Department of Information Technology, Padmabhushan Vasantdada Patil College of Engineering and Technology, Mumbai, University of Mumbai, India.

Vijay R. Bhosale #3

Department of Computer Engineering, MGM's College of Engineering and Technology, Navi Mumbai, University of Mumbai, India.

Abstract

Handwritten character recognition is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. Handwritten Devnagari characters are more complex to recognize than the corresponding English characters because of the many possible variations in the order, number, direction and shape of the constituent strokes. The main purpose of this paper is to introduce a method that first segments a text document of offline handwritten Devnagari characters and then recognizes them. The whole process includes two phases: segmentation of the document into lines, words and characters, and then recognition through a feed-forward neural network.

Keywords: handwritten Devnagari character recognition, segmentation, line segmentation, word segmentation, character segmentation, lower modifier, upper modifier, header line, baseline, feed-forward neural network.

  1. Introduction

    Character recognition plays an important role in the modern world. It can solve complex problems and make people's work easier. An example is handwritten character recognition. Every individual has his or her own style of writing. A person with a very good knowledge of the script of a language can, on the basis of a mental dictionary, easily read words written on paper even when they are written badly. Such words cannot be read easily by a machine, because the irregularities introduced in writing them are hard for a machine to handle. Highly unusual writing styles cause many difficulties in the machine recognition process. In recent years, a lot of research has been done on handwritten character recognition, but no work has been done on integrating segmentation and recognition of handwritten Devnagari characters. Optical character recognition (OCR) is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text [1]. It converts words or characters on a printed page into a digital image and creates a digital file, so that users can later search for that text and for characters within it. Handwritten character recognition is an important field of OCR. In this paper, we consider the integration of segmentation and recognition using artificial neural networks.

    The paper is organized as follows. Optical character recognition is introduced in Section 2. Applications of OCR are discussed in Section 2.1. Section 3 describes the Devnagari script. Section 4 presents the proposed system. The overall process is explained in Section 5. The experimental results are discussed in Section 6. Concluding remarks are given in Section 7.

  2. Optical Character Recognition

    Optical Character Recognition (OCR) translates the scanned printed or handwritten document images into a text document. Handwritten Character Recognition is an intelligent OCR capable of handling the complexity of writing, writing environment, materials, etc. Here is the traditional OCR system structure [2, 6, 8]:

    Figure 1: Traditional OCR (blocks: Document Imaging, Pre-processing, Segmentation, Text Lines, Character Image, Feature Extraction, Classifier, Labels)

    2.1 Applications of OCR

The following are applications of OCR [3, 10].

  1. Automatic text entry into the computer for desktop publication, library cataloguing, ledgering, etc.

  2. Automatic reading for sorting of postal mail and bank cheques, postal code reading, and reading of commercial forms, government records, manuscripts and other documents, and their archival.

  3. Document data compression: from document image to ASCII format.

  4. Language processing such as indexing, spell checking, grammar checking, etc.

  5. Multi-media system design, etc.

  3. Devnagari Script

    Devnagari script differs from Roman script in several ways. It has a two-dimensional composition of symbols: core characters in the middle strip, with optional modifiers above and/or below the core characters. Two characters may lie in the shadow of each other. While line segments (strokes) are the predominant features of English characters, most characters in Devnagari script are formed by curves and holes as well as strokes. Devnagari script has no concept of upper-case and lower-case characters, but its alphabet contains more symbols than the English alphabet. Marathi, which is written in the Devnagari script, is an Indo-Aryan language spoken by about 71 million people, mainly the Marathi people of western and central India [4, 7]. It is the official language of the state of Maharashtra. Marathi is thought to be a descendant of Maharashtri, one of the Prakrit languages, which developed from Sanskrit. Handwriting style varies from person to person. The script has a large character set whose shapes are formed of curves and lines, and characters may overlap or touch within a word. Because individual writing styles vary greatly, touching characters can touch each other at different positions. The various regions of a Devnagari script are as follows [5, 21].

    Figure 2: Devnagari script structure

    Devanagari script has 13 vowels (svar), 36 consonants (vyanjan) and 10 numerals, along with modifier symbols. The individual characters of a word are joined by a header line called the shirorekha (Shiro Rekha), which makes it difficult to isolate individual characters from words. Various vowel modifiers add to the confusion [4, 9]. Handwriting can also contain minor variations between similar characters.

    Figure 3: Modifiers

    • Preprocessing

      The preprocessing of the image can be summarized as the following steps [4, 14]: normalization, binarization, dilation, noise removal and thinning.
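
      Two of the preprocessing steps named above, binarization and dilation, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the fixed threshold of 128, the 3x3 neighbourhood, the convention that 1 marks an ink pixel, and the function names `binarize` and `dilate` are all assumptions made for the example.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/1, with 1 = ink.

    Assumes dark ink on a light page, so pixels darker than the
    threshold become 1. The threshold value is an assumption.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def dilate(binary):
    """3x3 dilation: a pixel becomes 1 if it or any of its 8 neighbours is 1."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Scan the 3x3 window around (y, x), clipped at the image border.
            for ny in range(max(0, y - 1), min(h, y + 2)):
                for nx in range(max(0, x - 1), min(w, x + 2)):
                    if binary[ny][nx]:
                        out[y][x] = 1
    return out


# A tiny 3x5 "scan" with a single dark pixel in the middle.
gray = [
    [255, 255, 255, 255, 255],
    [255, 255, 40, 255, 255],
    [255, 255, 255, 255, 255],
]
binary = binarize(gray)
dilated = dilate(binary)
```

      Dilation thickens thin or broken strokes, which is presumably why it precedes thinning in the list above; the single ink pixel in the example grows into a 3x3 block.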

      Figure 6: Binarized image

      Figure 4: vowels and consonants

  4. The Proposed System

    The proposed system is summarized in Figure 5 [3, 6, 11, 12, 13, 15, 18, 22].

    Figure 7: Dilated image

    [Block diagram for Figure 5. Blocks: Image Acquisition, Pre-processing, Segmentation, Line segmentation, Word Segmentation, Character Segmentation, Recognition]

    Figure 8: Thinned image

    • Line segmentation and word segmentation

      Lines are segmented using bounding boxes. The shirorekha is detected as the line that contains the maximum number of white pixels. For this purpose, we first find the joint points, a new concept that has not been used before, and remove the joint pixels that join the shirorekha to the character. The lines with white pixels are then expanded, and the lines with the maximum number of white pixels are detected as the shirorekha. The baseline is also detected.
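
      The idea that the shirorekha is the row with the most foreground pixels can be illustrated with a horizontal projection profile. The sketch below uses that common technique as a stand-in; it does not reproduce the bounding-box and joint-point procedure described above. It assumes a binary image stored as lists of 0/1 with 1 for foreground, and the names `row_profile`, `segment_lines` and `find_shirorekha` are hypothetical.

```python
def row_profile(binary):
    """Number of foreground pixels (value 1) in each row."""
    return [sum(row) for row in binary]


def segment_lines(binary):
    """Return (top, bottom) row ranges, bottom exclusive, of non-empty row runs."""
    lines, start = [], None
    for y, count in enumerate(row_profile(binary)):
        if count and start is None:
            start = y                      # a text line begins
        elif not count and start is not None:
            lines.append((start, y))       # a blank row ends the line
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines


def find_shirorekha(binary, top, bottom):
    """Row index within [top, bottom) that holds the most foreground pixels."""
    profile = row_profile(binary)
    return max(range(top, bottom), key=lambda y: profile[y])


page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],  # header line of the first text line
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],  # header line of the second text line
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)
headers = [find_shirorekha(page, top, bottom) for top, bottom in lines]
```

      Words could be separated in the same way by looking for wide gaps in the column profile of each line; that step is left out to keep the sketch short.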

      Figure 5: The proposed system

  5. The Overall Process

    In the proposed system, converting a scanned text document image into digitized text consists of the following steps [11, 17]: preprocessing, segmentation of lines, segmentation of words, segmentation of characters, and recognition using a neural network.

    Figure 9: Image with joint points identified

    Figure 10: Shirorekha detection

    • Character segmentation

      Figure 11: Detection of vertical bars

      To segment the characters, we first identify the vertical bars. The characters are then separated using bounding boxes.

      Figure 12: Segmented characters

      Figure 13: Labelled characters
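
      A simplified view of character separation is sketched below. It ignores the header line row and cuts the word wherever a column contains no remaining foreground pixels, which yields the horizontal extent of each character's bounding box. This is an assumption-laden stand-in: it does not model the vertical-bar detection described above, it assumes a header line one pixel thick at a known row, and `segment_characters` is a hypothetical name.

```python
def segment_characters(word, header_row):
    """Split a binary word image into (left, right) column ranges, right exclusive.

    The header line row is skipped so that characters joined only by the
    header line fall apart into separate column runs.
    """
    width = len(word[0])
    col_ink = [
        sum(row[x] for y, row in enumerate(word) if y != header_row)
        for x in range(width)
    ]
    boxes, start = [], None
    for x, count in enumerate(col_ink):
        if count and start is None:
            start = x                      # a character begins
        elif not count and start is not None:
            boxes.append((start, x))       # an empty column ends it
            start = None
    if start is not None:
        boxes.append((start, width))
    return boxes


word = [
    [1, 1, 1, 1, 1, 1, 1],  # shirorekha joining both characters
    [0, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 1, 0],
]
boxes = segment_characters(word, header_row=0)
```

      Touching or overlapping characters would not be separated by this column test, which is consistent with the connected-character errors reported in the experimental results.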

    • Neural Networks

      Neural networks are a preferred approach for recognizers when the variability of patterns is small. They are well suited to specific types of problems, such as processing stock market data or finding trends in graphical patterns. Here, we use a feed-forward neural network with one hidden layer to recognize the segmented handwritten Devnagari characters [16, 18, 19, 20]. In this work, we have taken 160 input nodes, that is, 4 samples each for 40 characters. The network has 40 hidden nodes, and the output consists of 40 classes. The database used is as shown.

    Figure 14: The database
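
    The layer sizes given above (160 inputs, 40 hidden nodes, 40 output classes) can be made concrete with a forward pass through a one-hidden-layer feed-forward network. The sketch is illustrative only: the paper does not state the activation functions, the weight initialization or the training procedure, so the sigmoid hidden layer, the softmax output, the random untrained weights and the names `make_layer` and `forward` are all assumptions.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Small random weights and zero biases for a fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden, output):
    """One forward pass: input -> sigmoid hidden layer -> softmax class scores."""
    hw, hb = hidden
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(hw, hb)]
    ow, ob = output
    logits = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(ow, ob)]
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
hidden = make_layer(160, 40, rng)           # 160 inputs -> 40 hidden nodes
output = make_layer(40, 40, rng)            # 40 hidden nodes -> 40 classes
x = [rng.random() for _ in range(160)]      # placeholder feature vector
probs = forward(x, hidden, output)
predicted_class = probs.index(max(probs))
```

    With untrained weights the predicted class is meaningless; in practice the weights would be fitted to the labelled character database, for example with backpropagation as in [16].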

  6. Experimental Results

    The implemented method gives almost 100% accuracy for segmentation. The system identifies the shirorekha and the baseline properly, and it also performs line, word and character segmentation properly. The segmentation results are promising. However, the method gives 60% accuracy for recognition of the handwritten Devnagari text. Because both the input text and the database are handwritten, the system cannot recognize the characters with 100% accuracy. From the experiments, we noticed that the errors occurred mainly because of similarly shaped characters and connected characters.

  7. Conclusion and Future Work

    Development of a handwritten Devnagari OCR is still a challenging task in the pattern recognition area. The integration of segmentation and recognition of handwritten Devnagari characters has not been done before. Earlier work integrated segmentation and recognition only for numerals, and only for English numerals.

    There is a lot of difference between hand-printed and machine-printed text. In hand-printed writing, writers introduce many irregularities, which lower the recognition rate of an OCR system considerably.

    A lot of work remains to be done on the recognition of handwritten Devnagari characters. As future work, the recognition stage can be improved to obtain more accurate results. Work also needs to be done on conjunct characters.

  8. References

  1. Aarti Desai, Latesh Malik, Rashmi Welekar, A New Methodology for Devnagari Character Recognition, JMIJIT, January 2011, Vol. 1, Issue 1, JM Academy, ISSN (Print): 2229-6115

  2. Surbhi Syal, Sandeep Sood, Sunny Sharma, Er. Navneet Randhawa, Segmented Character Recognition using Neural Networks, International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 1, Issue 4, pp. 1731-1735

  3. Jesse Hansen, A Matlab Project in Optical Character Recognition.

  4. S. Arora, D. Bhattacharjee, M. Nasipuri, D. K. Basu and M. Kundu, Recognition of Non-Compound Handwritten Devnagari Characters using a Combination of MLP and Minimum Edit Distance, International Journal of Computer Science and Security (IJCSS), Volume 4, Issue 1

  5. Satish Kumar, An Analysis of Irregularities in Devnagari Script Writing, International Journal on Computer Science & Engineering (IJCSE), Vol. 2, No. 2, 2010, pp. 274-279

  6. Anita Pal, Dayashankar Singh, Handwritten English Character Recognition Using Neural Network, International Journal of Computer Science & Communication, Vol. 1, No. 2, July-December 2010, pp. 141-144

  7. Ajmire P.E. and Warkhede S.E., Handwritten Marathi Character (Vowel) Recognition, Advances in Information Mining, ISSN: 0975-3265, Volume 2, Issue 2, 2010, pp. 11-13

  8. Satish Kumar, A Three Tier Scheme for Devanagari Hand-printed Character Recognition, 978-1-4244-5612-3/09/$26.00 © 2009 IEEE

  9. Sandhya Arora, Debotosh Bhattacharjee, Mita Nasipuri, Latesh Malik, A Two Stage Classification Approach for Handwritten Devanagari Characters, International Conference on Computational Intelligence and Multimedia Applications 2007, 0-7695-3050-8/07 $25.00 © 2007 IEEE, DOI 10.1109/ICCIMA.2007.254

  10. J. Pradeep, E. Srinivasan and S. Himavathi, Diagonal Based Feature Extraction for Handwritten Alphabets Recognition System using Neural Network, International Journal of Computer Science & Information Technology (IJCSIT), Vol. 3, No. 1, Feb 2011

  11. Naresh Kumar Garg, Lakhwinder Kaur, M. K. Jindal, Segmentation of Handwritten Hindi Text, © 2010 International Journal of Computer Applications (0975-8887), Volume 1, No. 4

  12. Dayashankar Singh, Sanjay Kr. Singh, Dr. (Mrs.) Maitreyee Dutta, Hand Written Character Recognition Using Twelve Directional Feature Input and Neural Network, © 2010 International Journal of Computer Applications (0975-8887), Volume 1, No. 3

  13. Seong-Whan Lee and Sang-Yup Kim, Integrated Segmentation and Recognition of Handwritten Numerals with Cascade Neural Network, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 29, No. 2, February 1999

  14. Sandhya Arora, Debotosh Bhattacharjee, Mita Nasipuri, Dipak Kumar Basu, Mahantapas Kundu, Combining Multiple Feature Extraction Techniques for Handwritten Devnagari Character Recognition, 2008 IEEE Region 10 Colloquium and the Third ICIIS, Kharagpur, India, December 8-10.

  15. Pooja Agrawal, M. Hanmandlu, Brejesh Lall, Coarse Classification of Handwritten Hindi Characters, International Journal of Advanced Science and Technology Vol. 10, September, 2009

  16. Srinivasa Kumar Devireddy, Settipalli Appa Rao, Hand Written Character Recognition Using Back Propagation Network, Journal of Theoretical and Applied Information Technology, © 2005-2009 JATIT

  17. http://en.wikipedia.org/wiki/Handwriting_recognition

  18. http://tcts.fpms.ac.be/rdf/hcrinuk.htm

  19. S. Arora, D. Bhattacharjee, M. Nasipuri, D. K. Basu, M. Kundu, Application of Statistical Features in Handwritten Devnagari Character Recognition, International Journal of Recent Trends in Engineering, Vol. 2, No. 2, November 2009

  20. Muhammad Faisal Zafar, Dzulkifli Mohamad, Razib M. Othman, Online Handwritten Character Recognition: An Implementation of Counterpropagation Neural Net, World Academy of Science, Engineering and Technology 10, 2005.

  21. Veena Bansal and R. M. K. Sinha, Segmentation of Touching and Fused Devanagari Characters

  22. Seong-Whan Lee and Sang-Yup Kim, Integrated Segmentation and Recognition of Handwritten Numerals with Cascade Neural Network, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 29, No. 2, February 1999, p. 285
