Hindi Handwritten Character Recognition from Digital Image using Deep Learning Neural Network


Abhishek Mehta1, Dr. Subhashchdra Desai2, Dr. Ashish Chaturvedi3

1Research Scholar (Department of Computer and Informative Science, Sabarmati University, India)

2Director (Department of Computer and Informative Science, Sabarmati University, India)

3 Registrar (Department of Computer and Informative Science, Sabarmati University, India)

Abstract: This paper presents an overview of feature extraction methods for off-line recognition of segmented (isolated) digits/characters. Selection of a feature extraction method is probably the single most important factor in achieving high recognition performance in character recognition systems. Different feature extraction methods are designed for different representations of the digits/characters, such as solid binary characters, skeletons (thinned digits/characters), or gray-level sub-images of each individual character. Recent research in this area has produced new methodologies to overcome the complexity of Gujarati digit writing styles. Handwritten digits that are written properly are easy to read. The problem is that people write digits in many different styles, which a computer cannot identify directly; feature extraction methodologies based on end points, junction points, straight lines, etc. address this. For feature identification and character classification, different algorithms and techniques are studied.

Keywords: Character Features Extraction, Digit Recognition, End Point, Junction Point, Classification of Digit.

INTRODUCTION

Optical character recognition (OCR) refers to the branch of computer science that involves reading text from paper and translating the images into a form that the computer can manipulate (for example, into ASCII codes). More recently, the term Intelligent Character Recognition (ICR) has been used to describe the process of interpreting image data, in particular alphanumeric text. Forms containing character images can be scanned with a scanner; the recognition engine of the OCR system then interprets the images and turns handwritten or printed characters into ASCII data (machine-readable characters) [14]. OCR therefore allows users to quickly automate data capture from forms and eliminate keystrokes to reduce data entry costs, while still maintaining the high level of accuracy required in forms processing applications. The technology provides a complete form processing and document capture solution. Usually, OCR uses a modular architecture that is open, scalable and workflow controlled. It includes forms definition, scanning, image pre-processing, and recognition capabilities [16]. OCR faces several challenges. The first is layout detection, for example small fonts and curved fonts. The second is text type, for example broken characters, characters mixed with noise, and characters in old documents that are not available in modern computer fonts. The third is language issues: historically, spelling was not unified, and consequently there are many different writing variants.

Currently, most research aims to recognize isolated characters, words, phrases, or entire documents. The Indian scripts are a composition of the constituent symbols in two dimensions [18]. In conventional research, segmentation is applied first, so that each word is split into its constituent characters. A lot of research is still needed for word, sentence and document recognition, including semantics and lexicon. Gujarati is a regional language of the state of Gujarat in India. Gujarati characters have many different shapes, and it is very difficult to recognize those shapes. Gujarati handwritten character recognition is very difficult because it depends on the individual writers and their writing styles [13]. The main focus of this paper is handwritten digit recognition from images. Handwritten digit recognition is an active topic in OCR applications and pattern classification/learning research. In OCR applications, digit recognition is used in postal mail sorting, bank check processing, form data entry, etc. For these applications, the performance (accuracy and speed) of digit recognition is crucial to the overall performance. Template matching techniques were used for machine-printed digit recognition [18].

LITERATURE REVIEW

The term Intelligent Character Recognition (ICR) has been used to describe the process of interpreting image data, in particular alphanumeric text. Different methods have been used for Gujarati handwritten digit recognition from images; they are as follows:

Handwritten Digit Recognition Using Special Points: Special points in image processing, such as end points and junction points, are detected while scanning the thinned image and storing it in matrix form [1]. They are defined in terms of the pixels of the thinned image that are connected to them vertically, horizontally, or diagonally. An end point is defined as a pixel that has exactly one "1" among its eight neighbors; it is denoted by "E" [1]. A junction point is defined as a pixel that has more than two "1"s among its eight neighbors. However, this rule alone yields many undesired junction points, so an additional condition is applied: besides having more than two "1"s among its neighbors, the number of 0-to-1 and 1-to-0 transitions in the eight-neighborhood of the pixel must be greater than or equal to six. A junction point is denoted by "J". These two features, end points and junction points, are used in the classification stage [1].
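The end-point and junction-point rules above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [1]: it assumes the thinned image is a list of lists with 1 for stroke pixels and 0 for background, treats pixels outside the image as 0, and the function name `special_points` is hypothetical.

```python
def special_points(img):
    """Return (end_points, junction_points) of a thinned binary image.

    img is a list of rows; 1 marks a stroke pixel, 0 the background
    (an assumed convention). Points are (row, col) tuples in row-major order.
    """
    rows, cols = len(img), len(img[0])
    # The eight neighbours in clockwise order, starting from north.
    ring = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

    def pixel(r, c):
        # Pixels outside the image count as background.
        return img[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    ends, junctions = [], []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1:
                continue
            nb = [pixel(r + dr, c + dc) for dr, dc in ring]
            ones = sum(nb)
            # Count 0-to-1 and 1-to-0 changes while walking around the ring.
            transitions = sum(nb[i] != nb[(i + 1) % 8] for i in range(8))
            if ones == 1:
                ends.append((r, c))          # "E": exactly one neighbour set
            elif ones > 2 and transitions >= 6:
                junctions.append((r, c))     # "J": more than two, >= 6 changes
    return ends, junctions
```

On a plus-shaped skeleton, the pixels next to the centre also have more than two set neighbors, but only four transitions, so the transition test rejects them and only the centre is reported as a junction.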

Figure 1. Stages of Hand Written Digit Recognition Using Special Point

The first step in this process is to acquire handwritten numeral characters with a scanner at 300 dpi. This yields a binary image, which is then stored in compressed format in memory [1]. In pre-processing, the preliminary steps include normalization, digitization and thinning. It is important to extract features so that the recognition of different numerals becomes easier on the basis of the individual features of each numeral. The features extracted are the special points of the digitized image. Classification is the essential stage of numeral character recognition: the extracted special points are used to identify the numeral. Once the features have been extracted, they must be stored in some structure. Each pattern should uniquely identify a character, and each character may be represented by several distinct patterns. Finally, the recognized digits are stored in a text document, such as a Notepad or Word file. The advantages of this method are that it is fast, responsive and natural, with a low memory requirement. The disadvantage is that it applies only to particular digits, such as 3 and 2 [1]. The recognition rates (in percent) of handwritten numeral characters obtained with this method are as follows:

Figure 2. Result of recognition rates of handwritten numeral characters using special point
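Two of the pre-processing steps named for the special-point method, digitization and normalization, could look roughly like the sketch below. It is only an illustration and not the procedure of [1]: the threshold of 128, the dark-ink-on-light-paper convention, the nearest-neighbour resize, and the function names `binarize`, `crop_to_ink` and `resize_nearest` are all assumptions, and thinning is left out.

```python
def binarize(gray, threshold=128):
    """Digitization: dark ink (value below threshold) becomes 1, paper 0."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def crop_to_ink(img):
    """Cut the image down to the bounding box of its ink pixels."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:
        return img  # blank image: nothing to crop
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def resize_nearest(img, size):
    """Normalization: scale to size x size with nearest-neighbour sampling."""
    height, width = len(img), len(img[0])
    return [[img[r * height // size][c * width // size] for c in range(size)]
            for r in range(size)]
```

Cropping before resizing makes the result independent of where the digit sat on the scanned form, which is one common reason for normalizing before feature extraction.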

Handwritten Digit Recognition Using Image Processing and Neural Network:

Once the relevant forms have been filled in by hand by various people, the forms are scanned with a scanner. This gives images of handwriting samples of the digits [2]. It is then possible to recognize the meaning of any handwritten digit with the help of an AI engine [2]. Whenever a handwritten digit is given as test input to the system, the output display automatically shows the digit whose corresponding match value is recognized. This procedure loosely mirrors the human cognitive reasoning process. A neural network is used: the digits are extracted from the image and stored in a database, and a classification technique is applied. The stages of recognition using a neural network are as follows:

Figure 3. Process of Hand Written Digit Recognition Using Neural Network

For the neural network, a two-dimensional array of 10000 × 95 elements (INPUT-ARRAY) is created, where 10,000 is the number of records in the database and 95 comprises 94 global histogram values plus 1 value that identifies the digit. A second array of 10000 × 10 elements (IDEALARRAY) is also created, with 10,000 records and 10 columns used to identify each digit. A database stores the data of each image for every digit in a table, say Digit-information, which has 95 columns (a1-a94 plus the digit ID). Column 1 stores the digit ID, a value from 1 to 10. All 1000 images previously cut and stored are read one by one, and the global histogram for digits 0 to 9 is stored in these columns [2]. The network has 94 input neurons in the input layer, 15 hidden neurons in the hidden layer, and 10 output neurons in the output layer, which correspond to the digits 0-9. The input for the 94 input neurons is read from the 94 elements of the global histogram. The 10 output neurons indicate the corresponding digit identification (1 to 10) [2]. Once the neural network has been trained for each of the ten digits, it is possible to recognize the meaning of any handwritten digit with the help of the trained network [2]. At recognition time the network takes the biases and weights from the previously stored text files and uses them to determine the neuron firing sequence. The advantages of this method are the highest accuracy and recognition of the most natural variety of digit styles. The disadvantage is that it requires more memory, because the different digit styles are stored [2].

Figure 4. Result of recognition of handwritten numeral characters using Neural Network
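The forward pass of the 94-15-10 network described in this subsection could be sketched as follows. Only the layer sizes come from the text; the sigmoid activation, the random weights (a stand-in for the trained weights and biases that [2] loads from text files), the 0-9 one-hot encoding of the ideal array, and all function names are assumptions made for illustration.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 94, 15, 10  # layer sizes quoted in the text


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Random stand-in for trained weights and biases (assumption)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward_layer(inputs, weights, biases):
    """Weighted sum plus bias for every neuron, squashed by the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def one_hot(digit):
    """One row of the 10-column ideal array for a digit 0-9 (assumed encoding)."""
    return [1.0 if i == digit else 0.0 for i in range(N_OUTPUT)]


def classify(histogram, hidden, output):
    """Return (index of the strongest output neuron, all ten output scores)."""
    if len(histogram) != N_INPUT:
        raise ValueError("expected 94 global histogram values")
    hidden_out = forward_layer(histogram, *hidden)
    scores = forward_layer(hidden_out, *output)
    best = max(range(N_OUTPUT), key=scores.__getitem__)
    return best, scores
```

With untrained random weights the predicted digit is meaningless; the sketch only shows how 94 histogram values flow through 15 hidden neurons to 10 output scores, and how the output neuron with the largest score selects the digit.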

Handwritten Digit Recognition Tested on the MNIST Database:

Neural networks are widely used for recognition, trained with a gradient algorithm. The network is mainly divided into three layers:

The first layer corresponds to the retina, meaning that it matches the input image [3].

The second layer (hidden layer) corresponds to the characteristics extraction subsystem [3].

The third layer corresponds to the output system.

Each neuron in this layer corresponds to one of the output classes [3]. The process of digit recognition using MNIST is as follows:

Figure 5. Process of Hand Written Digit Recognition Using MNIST Database

This method detects and extracts characteristic zones of the digit. It uses three zones, defined as follows:

Extraction of East Characteristic Zone:

A point of the image belongs to the East characteristic zone if and only if: the point does not belong to the object (the white pixels in the image); moving from this point in a straight line toward the East, we do not cross the object; and moving from this point in a straight line toward the South, North and West, we cross the object [3].

Figure 6. Extract the Digit East Characteristic Zone (EZ)

Extraction of West Characteristic Zone:

A point of the image belongs to the West characteristic zone if and only if: the point does not belong to the object (the white pixels in the image); moving from this point in a straight line toward the West, we do not cross the object; and moving from this point in a straight line toward the South, North and East, we cross the object [3].

Figure 7. Extract the Digit West Characteristic Zone (WZ)

Extraction of the Central Characteristic Zone:

A point of the image belongs to the Central characteristic zone if and only if: the point does not belong to the boundary of the object; and moving from this point in a straight line toward the South, North, East and West, we cross the object [3].

Figure 8. Central Characteristic Zone (CZ)

Figure 9. Digit Two after Surround and its Characteristic Zones

Figure 10. Digit Two after Surround and its Characteristic Zones
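The East, West and Central zone definitions can be turned into a small ray-casting sketch. It is an illustration under stated assumptions rather than the code of [3]: object pixels are encoded as 1 and background as 0, "not on the boundary of the object" is simplified to "not an object pixel", a ray that leaves the image without meeting a 1 counts as not crossing the object, and the names `crosses` and `zone` are hypothetical.

```python
def crosses(img, r, c, dr, dc):
    """True if a straight ray from (r, c) in direction (dr, dc) hits the object."""
    rows, cols = len(img), len(img[0])
    r, c = r + dr, c + dc
    while 0 <= r < rows and 0 <= c < cols:
        if img[r][c] == 1:
            return True
        r, c = r + dr, c + dc
    return False


def zone(img, r, c):
    """Return "EZ", "WZ", "CZ", or None for the background point (r, c)."""
    if img[r][c] == 1:
        return None  # object pixels belong to no characteristic zone
    north = crosses(img, r, c, -1, 0)
    south = crosses(img, r, c, 1, 0)
    east = crosses(img, r, c, 0, 1)
    west = crosses(img, r, c, 0, -1)
    if north and south and west and not east:
        return "EZ"  # open toward the East only
    if north and south and east and not west:
        return "WZ"  # open toward the West only
    if north and south and east and west:
        return "CZ"  # enclosed on all four sides
    return None
```

For example, the inside of a "C"-shaped stroke is open only toward the East and is labelled EZ, while the inside of a closed loop is labelled CZ. Counting the pixels in each zone is one plausible way to turn these labels into features, though the source does not spell that step out.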

The advantage of this method is that it gives the most natural recognition. The drawback is that it needs a large database of different units: like the MNIST database, it stores different sides of each digit (north, east, west), so more memory is required [3].

Feature Extraction based on DCT for Handwritten Digit Recognition:

Feature extraction is an important and challenging step in most pattern recognition problems, and especially in handwritten digit recognition applications [4]. However, extracting the most informative features, with highly discriminative ability to improve the classification accuracy and reduce complexity, remains one of the most important problems for this task. The goal of this work is to identify the optimal feature extraction approach that speeds up the learning algorithms while maximizing the classification accuracy [4]. The database used for this work is the MNIST dataset [4].

Figure 11. Simple Database for MNIST

The DCT, initially used for image compression, has been of growing interest in the pattern recognition community [4]. The DCT is a technique that converts the data of an image into its elementary frequency components. This work investigates the effectiveness of DCT features for handwritten digit recognition. To that end, the performance of four variants of DCT coefficients is compared, namely:

Upper left corner (ULC) coefficients [4].

DCT zigzag coefficients [4].

Block-based DCT ULC coefficients [7].

Block-based DCT zigzag coefficients [7].

Figure 12. Example of initial 28×28 image and image reconstructed with only 15×15 DCT ULC coefficients

Figure 13. Selecting DCT coefficients in a zigzag fashion
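The two whole-image variants, ULC and zigzag selection, could be sketched as below. This is an illustration rather than the pipeline of [4]: it assumes an orthonormal 2-D DCT-II computed directly from its definition on a square image, a JPEG-style zigzag order, and hypothetical function names. The block-based variants would apply the same selection to each sub-block and are not shown.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    cos = [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
           for u in range(n)]
    # The transform is separable: 1-D DCT along each row, then along each column.
    tmp = [[alpha(v) * sum(block[x][y] * cos[v][y] for y in range(n))
            for v in range(n)] for x in range(n)]
    return [[alpha(u) * sum(tmp[x][v] * cos[u][x] for x in range(n))
             for v in range(n)] for u in range(n)]


def ulc_features(coeffs, k):
    """Keep the k x k upper-left corner (lowest frequencies), row by row."""
    return [coeffs[u][v] for u in range(k) for v in range(k)]


def zigzag_indices(n):
    """(row, col) pairs of an n x n matrix in JPEG-style zigzag order."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(u, s - u) for u in range(n) if 0 <= s - u < n]
        if s % 2 == 0:
            diagonal.reverse()  # even anti-diagonals run bottom-left to top-right
        order.extend(diagonal)
    return order


def zigzag_features(coeffs, count):
    """Keep the first `count` coefficients along the zigzag path."""
    return [coeffs[u][v] for u, v in zigzag_indices(len(coeffs))[:count]]
```

Both selections keep low-frequency coefficients, where most of the energy of a digit image usually sits. For a 28×28 image, `ulc_features(dct2(image), 15)` would give the 225 coefficients that the Figure 12 caption refers to.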

In summary, feature extraction based on DCT for handwritten digit recognition uses the Discrete Cosine Transform (DCT) technique for classification. Its advantage is that it maximizes the classification accuracy; its disadvantage is that it requires more time [9].

COMPARISON TABLE OF ABOVE METHODS

| No | Title | Methodology | Pros | Cons |
|----|-------|-------------|------|------|
| 1 | Handwritten Digit Recognition Using Special Points | Junction point and end point | Fast, responsive and natural; low memory requirement | Applies only to particular digits, such as 3 |
| 2 | Handwritten Digit Recognition Using Image Processing and Neural Network | Classification technique | Highest accuracy; recognizes the most natural variety of digit styles | Requires more memory because different digit styles are stored |
| 3 | Handwritten Digit Recognition Using MNIST Database | North, east, west style of digit recognition | Most natural recognition | Large database of different units: like MNIST, the database stores different sides of each digit (north, east, west); more memory required |
| 5 | Feature Extraction Based on DCT for Handwritten Digit Recognition | Discrete Cosine Transform (DCT) technique used for classification | Maximizes the classification accuracy | More time required |

Table I: Comparison table of recognition methods

CONCLUSION

Gujarati handwritten digit recognition in digital images is a challenging problem because of differences in size, style, orientation, and alignment, as well as low image contrast and complex backgrounds. A completely robust and generalized method for digit recognition remains hard to find, which makes it difficult to provide proper input to the optical character recognition (OCR) system. Many algorithms have been proposed for recognizing text data in an image. Each method gives robust results for a specific set of images. We used the following steps for digit extraction: image pre-processing, segmentation and localization, feature extraction, and classification.

REFERENCES

  1. K. V. Vaghela, N. P. Patel, "Automatic Text Detection Using Morphological Operations and Inpainting," International Journal of Innovative Research in Science, Engineering and Technology, Volume 2, Issue 5, May 2013, ISSN: 0219-8753, pp. 1333-1336.

  2. D. S. Sharma, D. G. Gupta, "Isolated Handwritten Digit Recognition using Adaptive Unsupervised Incremental Learning Technique," International Journal of Computer Applications, Volume 7, Issue 4, September 2010, ISSN: 0975-8887, pp. 27-33.

  3. P. P. Chaudhari, K. R. Sarode, "Handwritten Digits Recognition Special point," International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 3, March 2014, ISSN: 2277-128X, pp. 1467-1470.

  4. P. A. Agarwal, R. V. Varna, "Text Extraction from Images," International Journal of Innovative Research in Science, Engineering and Technology, Volume 2, Issue 4, April 2012, ISSN: 2231-0711, pp. 1083-1087.

  5. S. T. Tehsin, A. M. Masood and S. P. Kausar, "Survey of Region-Based Text Extraction Techniques for Efficient Indexing of Image/Video Retrieval," International Journal of Image, Graphics and Signal Processing, Volume 5, Issue 3, November 2014, ISSN: 2567-2568, pp. 53-64.

  6. A. A. Andhale, P. R. Yeolekar, "Survey on Optimized solution for efficient detection of Text from images," International Journal of Computer Technology & Applications, Volume 5, Issue 6, September 2013, ISSN: 2229-6093, pp. 1950-1954.

  7. Faisal Tehseen Shah, K. Y. Yousaf, "Handwritten Digit Recognition Using Image Processing and Neural Networks," International Journal of Proceedings of the World Congress on Engine, Volume 1, Issue 4, July 2007, ISSN: 2290-2293, pp. 5-7.

  8. G. L. Patel, H. G. Zhang, "A new Feature Extraction Method Based on Fourier Transform in Handwriting Digits Recognition," International Journal of Computer Technology & Applications, Volume 7, Issue 5, June 2013, ISSN: 2278-2279, pp. 1245-1250.

  1. "Gujarati language – wikipedia," 16 February 2017. [Online]. Available: https://en.wikipedia.org/wiki/Gujarati_language.

  2. "Tesseract OCR GitHub," 16 02 2017. [Online]. Available: https://github.com/tesseract-ocr.

  3. M. A. Hasnat, M. R. Chowdhury, M. Khan and others, "Integrating Bangla script recognition support in Tesseract OCR," BRAC University, 2009.

  4. N. Mishra, C. Patvardhan, V. C. Lakshimi and S. Singh, "Shirorekha Chopping Integrated Tesseract OCR Engine for Enhanced Hindi Language Recognition," International Journal of Computer Applications, vol. 39, no. 6, pp. 19-23, 2012.

  5. C. Patel, A. Patel and D. Patel, "Optical character recognition by open source OCR tool tesseract: A case study," International Journal of Computer Applications, vol. 55, no. 10, 2012.

  6. R. Smith, "An overview of the Tesseract OCR engine," Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on, vol. 2, pp. 629-633, 2007.

  7. Thien M. Ha and H. Bunke, Image Processing Methods for Document Image Analysis, In: Handbook of Character Recognition and Document Image Analysis, pp. 1-47, 1997.

  8. Ulrich and Jurgen, Pattern Classification Techniques Base on Function Approximation, In: Handbook of Character Recognition and Document Image Analysis, World Scientific Publishing Company, pp. 49-78, 1997.

  9. H. Yasuada, K. Takahashi, and T. Matsumoto, Online Handwriting Recognition by Discrete HMM with Fast Learning, In: Advances in Handwriting Recognition, World Scientific Publications, pp. 19-28, 1997.

  10. G. Rigoll, A. Kosmala, and D. Willet, A Systematic Comparison of Advanced Modeling Techniques For Very Large Vocabulary On-line Cursive Handwriting Recognition, In: Advances in Handwriting Recognition, World Scientific Publications, pp. 69-78, 1997.

  11. Kam-Fai Chan and Dit-Yan Yeung, A Simple Yet Robust Structural Approach For On-line Handwritten Alphanumeric Character Recognition, In: Advances in Handwriting Recognition, World Scientific Publications, pp. 39-48, 1997.

  12. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, Handwritten zip code recognition with multilayer networks, International Conference on Pattern Recognition, 1990, pp. 35-44.

  13. K. Fukushima, T. Imagawa, and E. Ashida, Character recognition with selective attention, 1991 International Joint Conference on Neural Networks (I), pp. 593-598.

  14. neocognitron, IEEE Trans. on Neural Networks, Vol. 2, No. 3, May 1991, pp. 355-365.

  15. W. H. Joerding and J. L. Meador, Encoding a priori information in feedforward networks, Neural Networks, Vol. 4, No. 6, December 1991, pp. 847-856.

  16. J. S. N. Jean and J. Wang, Weight smoothing to improve network generalization, to appear in IEEE Trans. on Neural Networks.

  17. J. Wang and J. S. N. Jean, Multiresolution neural network for omnifont character recognition, submitted to 1999 IEEE International Conference on Neural Networks.

  18. A. Rajavelu, M. T. Musavi, and M. V. Shirvaikar, A neural network approach to character recognition, Neural Networks, Vol. 2, No. 5, 1989, pp. 387-389.

AUTHORS PROFILE

Mr. Abhishek Mehta is working as an Assistant Professor at Parul Institute of Engineering and Technology. He has 5 years and 10 months of teaching experience. He holds a BCA (Computer Science) from Veer Narmad South Gujarat University and an MCA from Shrimad Rajchandra Institute of Management and Computer Application, Uka Tarsadia University. He is presently pursuing a doctorate (PhD) at Calorx Teachers University, Ahmadabad, on emerging technology in image processing and natural language processing. His research interests include digital image processing, natural language processing, artificial intelligence, source code management and information systems, with the aim of building systems for use in education. He has published 5+ research/review papers in international journals, presented 2+ research/review papers at national/international conferences, and attended 15+ seminars and workshops.

Dr. Ashish Chaturvedi is working as a Professor in the Computer Science Department at Calorx Teachers University. He has published 25+ research/review papers in international journals, presented 10+ research/review papers at national/international conferences, and attended 30+ seminars and workshops. He has also received many awards and achievements.

Mr. Dharmendrasinh Rathod is working as an Assistant Professor at Parul Institute of Computer Application, Parul University. He has 3 years of teaching experience. He holds a B.E. (Computer Engineering) from Gujarat Technological University and an M.Tech (Computer Engineering) from Parul University. He is presently pursuing a doctorate (PhD) at Parul University. His research interests include digital image processing, natural language processing, artificial intelligence, and data mining. He has published 2+ research/review papers in international journals, presented research/review papers at national/international conferences, and attended 5+ seminars and workshops.

Mr. Maulik Patel is a student at Parul Institute of Computer Application, Parul University. He is presently pursuing a Bachelor of Computer Application (BCA) at Parul Institute of Computer Application, Parul University. His areas of interest are web development and programming languages. During the BCA he obtained 3rd rank in F.Y. BCA and 1st rank
