Handwritten Devanagari Character Recognition Using Convolutional Neural Network

DOI : 10.17577/IJERTV13IS040076


Asra Masrat

Dept. of Information Technology

K.J Somaiya College of Engineering Mumbai, India

Yogita Borse

Dept. of Information Technology

K.J Somaiya College of Engineering Mumbai, India

Deepa Kumari

Dept. of Information Technology

K.J Somaiya College of Engineering Mumbai, India

Hardika Gawde

Dept. of Information Technology

K.J Somaiya College of Engineering Mumbai, India

Abstract: In this paper, we explore the use of Convolutional Neural Networks (CNN) for handwritten character recognition. The details and working of the models that were tried and tested are explained in depth. Recent improvements in CNN technology have produced significant advances in Handwritten Character Recognition by learning discriminating features from enormous volumes of raw data. The CNN has a substantial benefit over traditional pattern recognition algorithms in that it can extract features, reduce data dimensionality, and classify, all in one network structure. A 32 x 32 pixel handwritten character image recorded from a canvas can be processed and classified by the neural network. Training converges in fewer than 26 epochs. The final models for Devanagari consonants (and numerals) and for vowels, which have been integrated into our intended project, achieve training accuracies of 99.54% and 99.64% respectively. These results show that the suggested CNN has superior classification performance and is a viable real-time solution for Handwritten Character Recognition.

Index Terms: Convolutional Neural Network, Language Learning, Handwritten Character Recognition, Devanagari Characters Dataset.

  1. INTRODUCTION

    The Devanagari script evolved from the North Indian Gupta script and, ultimately, from the Brahmi script. Devanagari is an alphasyllabary composed of 13 vowels and 36 consonants. Consonant and vowel sequences are written as units in this segmental writing system. The script is written from left to right and top to bottom, with no distinction between upper and lower case. Because the shape, position, and number of the constituent strokes can vary, unconstrained Devanagari handwriting is more intricate than English handwriting. Devanagari's letter order is the same as that of the other Brahmic scripts. [1]

    Devanagari script is the world's fourth most widely adopted writing system. It has been used to write over a hundred languages. It is essential when learning to write any Indian regional language such as Marathi, Hindi or Sanskrit. Handwritten character recognition of Devanagari text can have multiple applications. Because of its significant contribution to automation and AI-based language learning platforms, handwritten character recognition is garnering more attention. We have integrated our model for handwritten character recognition of Devanagari characters into our language learning application, where it is used in the writing practice module for the Marathi language.

    In handwritten character recognition (HCR), the model interprets the user's handwritten characters, which could be alphabets, numerals or words, and converts them into a machine-readable format. HCR of complex-shaped characters is still seen as a challenge by many. By learning discriminatory features from enormous amounts of raw data, recent breakthroughs in convolutional neural networks (CNN) have made significant progress in HCR.

    The Convolutional Neural Network (CNN) is a deep learning architecture inspired by the natural visual perception process of the human brain. Owing to the growth in available data and the increased power of graphics processing units (GPUs), the study of CNNs has expanded tremendously, and CNNs have found applications in image classification, text detection, object tracking, action detection, speech processing, NLP, etc.

    The main goal of this study is to develop a low-cost custom architecture for recognizing handwritten Devanagari characters.

  2. LITERATURE REVIEW

    A paper by I. Dokare, S. Gadge, K. Kharde, S. Bhere, and R. Jadhav, titled "Recognition of Handwritten Devanagari Character using Convolutional Neural Network", has explored the use of Convolutional Neural Networks on Devanagari characters. [2] In this study, they employed different approaches in CNN architecture and evaluated their accuracy to determine the best match based on the accuracy of each strategy. They also covered methodologies used by other researchers, employing CNN, ANN, SVM, and so on. They employed two separate datasets, namely the Devanagari Handwritten Character Dataset and the Devanagari (Nepali) Handwritten Character Dataset, both of which contained a wide range of data and were large enough to verify validity. These datasets contained consonants, numerals, and vowels. The system's primary focus was on Marathi language characters. The paper includes a full description of the creation of the convolutional neural network and its use to classify the images, along with a comparison of the four CNN designs and their accuracies.

    (This work is licensed under a Creative Commons Attribution 4.0 International License.)

    The methodology is described in full, including the number of layers in the CNN model and the role of each. They also examined other CNN designs with different layer compositions and calculated the accuracy of each. For architectures 1 and 2, the training and testing accuracy on consonants and numerals is reported in detail. For architectures 3 and 4, performance is discussed in terms of training accuracy, testing accuracy, and the number of epochs. They then compared the performance of the designs thoroughly. [2] The results show that architecture 2 has a testing accuracy of 98% for consonants and 99% for numerals, while architecture 4 has a testing accuracy of 99%. Overall, the recognition accuracy is 98% for Devanagari consonants, 97.56% for vowels, and 99% for numerals. [2]

  3. DATASET

    Three datasets were used in this system. Although the final models were based on only two of them, trials with all three made the choice of dataset better informed. The three datasets are the Devanagari Handwritten Character Dataset (DHCD) [5], the Devanagari (Nepali) Handwritten Character Dataset (NHCD) [3], and the Handwritten Devanagari Characters – Vowels and Numerals dataset. [6]

    1. Devanagari Handwritten Character Dataset

      The DHCD was created by Shailesh Acharya and Prashnna Kumar Gyawali, and was taken from the UCI Machine Learning Repository. An important characteristic of a good dataset is variety in the data, which helps models reach better accuracy. Large datasets are often preferred for this reason. Based on this notion, the DHCD was made by extraction and manual annotation of various handwritten documents. This image dataset of handwritten Devanagari characters consists of 46 classes of characters, 36 consonants and 10 numerals, with 2000 examples each. The dataset is divided into two parts: training (85%) and testing (15%). There are 92000 images in total (78200 in the training set and 13800 in the testing set). The images are 32 x 32 pixels, with the actual character centered within 28 x 28 pixels and a padding of 2 pixels on all sides. Figures 1 and 2 show what the consonants and numerals in the dataset look like. [5]

      Fig. 1. Devanagari Consonant Dataset (DHCD)

      Fig. 2. Devanagari Numeral Dataset (DHCD)
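      The DHCD figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable names are our own and are purely illustrative.

```python
# Cross-check of the DHCD figures quoted in the text.
# All constants are taken from the description above; names are illustrative.
NUM_CLASSES = 36 + 10        # 36 consonants + 10 numerals
IMAGES_PER_CLASS = 2000
TRAIN_FRACTION = 0.85        # 85% training, 15% testing

total_images = NUM_CLASSES * IMAGES_PER_CLASS
train_images = round(total_images * TRAIN_FRACTION)
test_images = total_images - train_images

# Each image holds a 28 x 28 character plus 2 pixels of padding on every side.
CHAR_SIZE, PADDING = 28, 2
image_size = CHAR_SIZE + 2 * PADDING

print(total_images, train_images, test_images, image_size)
# prints: 92000 78200 13800 32
```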

    2. Devanagari (Nepali) Handwritten Character Dataset

      The DHCD described above does not contain vowels; hence, two datasets were used for Devanagari vowels. The first was the NHCD (Devanagari (Nepali) Handwritten Character Dataset). From this dataset, only the 12 vowel classes were used, with 221 samples per class; the remaining 46 classes (10 numerals and 36 consonants) were not utilized. The samples were collected from 40 individuals from different fields. Each image in this collection is 28 x 28 pixels in size, with a white background and a black character, and has been cropped to the character boundaries. Figure 3 shows what the dataset looks like. [3]

      Fig. 3. Devanagari Vowels Dataset (NHCD)


    3. Handwritten Devanagari Characters – Vowels and Numerals

    This dataset contains handwritten examples of Devanagari vowels and numerals, for a total of 23 distinct Devanagari characters (10 numerals and 13 vowels). The vowels were obtained from 1400 participants of various ages. The data was also separated, pre-processed, and saved in a publicly accessible place. Since the DHCD already contained consonants and numerals, only the vowels from this dataset were used to train and test the model. After eliminating the occluded images, the scribbles, and the numerals, the final data collection comprises 16,250 digitized images of vowels (1,250 per vowel). This data was manually divided into folders and was also made available in CSV format, with labels attached. Each image in this dataset has a black background with a white character. [6]

    Fig. 4. Devanagari Vowels Mendeley Dataset

  4. METHODOLOGY

    In this section, we discuss the workflow of the models that we have used for handwritten character recognition of Devanagari consonants, numerals, and vowels. The workflow for both models is the same; the only difference is the dataset used. The pictorial representation of the layers is shown in Fig. 5. Before the dataset was split, the pixel values of the images were rescaled by a factor of 1/255. A shear range of 0.2, which controls how strongly an image is distorted by shearing, was applied. Following that, the dataset is divided into two parts: training and testing. The dataset was categorized into 45 classes for Devanagari consonants and numerals, and into 13 classes for Devanagari vowels. After splitting, the dataset is passed through the CNN model and the model is trained for 25 epochs.

    Our proposed models for Devanagari consonants and vowels use a total of 11 layers each. The input shape of the image is 32 x 32 and the kernel size for the model layers is (3, 3). The CNN model is composed mainly of three kinds of layer: convolutional layers, max pooling layers, and fully connected layers. Different combinations of multiple layers can be applied to enhance feature extraction and increase the accuracy achieved. The non-linear activation function ReLU is used in the initial layers. In the final layer, the softmax activation function is used. The key advantage of employing softmax is that the output probabilities lie between 0 and 1 and sum to 1. The Adam optimizer is used to compute adaptive learning rates during training. The feature extraction involves a repetition of sequential steps, as is visible in Fig. 5.
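    The rescaling step and the two activation functions mentioned above can be illustrated with a short, dependency-free sketch. It is not the code used to train our models; the function names `rescale`, `relu` and `softmax` and the sample values are assumptions chosen for illustration only.

```python
import math

def rescale(pixels):
    """Map 8-bit pixel values (0..255) into the range 0..1."""
    return [p / 255.0 for p in pixels]

def relu(x):
    """Rectified linear unit: negative inputs are clipped to zero."""
    return max(0.0, x)

def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(scores)                          # subtract the max to avoid overflow
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative values only (not taken from the datasets).
scaled = rescale([0, 128, 255])
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))      # index of the most probable class
```

    Every entry of `probs` lies strictly between 0 and 1 and the entries sum to 1, which is what allows the output of the final layer to be read as class probabilities.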

    Fig. 5. Flowchart of CNN Model 1 for Devanagari Consonants and Numerals
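    To show how a 32 x 32 input shrinks as it passes through 3 x 3 convolutions and max pooling, the sketch below traces the spatial size through a layer stack. The stack itself is hypothetical: the exact ordering of our 11 layers is given only in Fig. 5, and the sketch further assumes unpadded convolutions with stride 1 and 2 x 2 pooling with stride 2, none of which is stated in the text.

```python
def conv_output_size(size, kernel=3, stride=1, padding=0):
    """Spatial size after a square convolution (no padding by default)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, pool=2, stride=2):
    """Spatial size after square max pooling."""
    return (size - pool) // stride + 1

# Hypothetical feature-extraction stack, NOT the architecture from Fig. 5.
HYPOTHETICAL_STACK = ["conv", "conv", "pool", "conv", "conv", "pool"]

def trace_sizes(size, layers):
    """Return the spatial size before the first layer and after each layer."""
    sizes = [size]
    for layer in layers:
        if layer == "conv":
            size = conv_output_size(size)
        else:
            size = pool_output_size(size)
        sizes.append(size)
    return sizes

sizes = trace_sizes(32, HYPOTHETICAL_STACK)
print(sizes)  # prints: [32, 30, 28, 14, 12, 10, 5]
```

    Under these assumptions, the feature maps entering the fully connected layers would be 5 x 5; a different padding or layer order would change this figure.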

  5. RESULTS AND CONCLUSION

In the early stages of development, several layer combinations were tried and tested in the CNN, and these models were studied extensively to give us perspective. After further study and trials, we proposed two models for handwritten character recognition: one for Devanagari consonants and numerals, and one for vowels. The model under study was initially trained for 30 epochs. Because the accuracy could be improved further, the next run of the models used 25 epochs, along with changes to the optimization process. The consonant and numeral model achieved a training accuracy of 99.54% and a testing accuracy of 99.16%, which is higher than that of the model under study. The model predictions were also examined manually and found to be valid. The dataset for vowels was divided into 13 classes. The training and testing accuracies of the Devanagari vowels model were 99.64% and 99.73% respectively, which were again better than the accuracy of the model studied prior to implementation. The accuracies of all the models studied and proposed are tabulated in Table 1. As can be seen from the table, our models perform better than the existing ones.

Table 1. Results of models
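The accuracies reported above are the fraction of images whose predicted class matches the true label. A minimal sketch of that calculation follows; the helper names `argmax` and `accuracy` and the toy probabilities and labels are made up for illustration and are not results from our models.

```python
def argmax(scores):
    """Index of the largest value in a list of class probabilities."""
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Toy softmax outputs for four images over three classes (illustrative only).
toy_probs = [
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.30, 0.30, 0.40],
    [0.60, 0.30, 0.10],
]
toy_labels = [0, 1, 2, 1]

toy_predictions = [argmax(row) for row in toy_probs]
toy_accuracy = accuracy(toy_predictions, toy_labels)
print(toy_predictions, toy_accuracy)  # prints: [0, 1, 2, 0] 0.75
```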

REFERENCES

  1. A. Mohite and S. Shelke, "Handwritten Devanagari Character Recognition using Convolutional Neural Network," 2018 4th International Conference for Convergence in Technology (I2CT), 2018, pp. 1-4, doi: 10.1109/I2CT42659.2018.9057991.

  2. I. Dokare, S. Gadge, K. Kharde, S. Bhere and R. Jadhav, "Recognition of Handwritten Devanagari Character using Convolutional Neural Network," 2021 3rd International Conference on Signal Processing and Communication (ICSPC), 2021, pp. 353-359, doi: 10.1109/ICSPC51351.2021.9451716.

  3. A. K. Pant, S. P. Panday and S. R. Joshi, "Off-line Nepali handwritten character recognition using Multilayer Perceptron and Radial Basis Function neural networks," 2012 Third Asian Himalayas International Conference on Internet, 2012, pp. 1-5, doi: 10.1109/AHICI.2012.6408440.

  4. Y. Gurav, P. Bhagat, R. Jadhav and S. Sinha, "Devanagari Handwritten Character Recognition using Convolutional Neural Networks," International Conference on Electrical, Communication, and Computer Engineering (ICECCE), 2020.

  5. S. Acharya, A. K. Pant and P. K. Gyawali, "Deep learning based large scale handwritten Devanagari character recognition," 9th International Conference on Software, Knowledge, Information Management and Applications (SKIMA), 2015.

  6. D. Sai Prashanth and R. Vasanth Kumar Mehta, "Handwritten Devanagari Characters – Vowels and Numerals (38,750 Isolated images)," Mendeley Data, V3, 2021, doi: 10.17632/pxrnvp4yy8.3.

  7. M. Jangid and S. Srivastava, "Handwritten devanagari character recognition using layer-wise training of deep convolutional neural networks and adaptive gradient methods," Journal of Imaging, vol. 4, no. 2, p. 41, 2018.

  8. K. V. Kale, S. V. Chavan, M. M. Kazi and Y. S. Rode, "Handwritten Devanagari Compound Character Recognition Using Legendre Moment: An Artificial Neural Network Approach," 2013 International Symposium on Computational and Business Intelligence, 2013, pp. 274-278, doi: 10.1109/ISCBI.2013.62.

  9. Jomy John, "Support Vector Machine for Handwritten Character Recognition," KKTM Cognizance – A Multidisciplinary Journal, March 2016, ISSN: 2456-4168, arXiv:2109.03081.

  10. Megha Agarwal, Shalika, Vinam Tomar and Priyanka Gupta, "Handwritten Character Recognition using Neural Network and Tensor Flow," International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume-8, Issue-6S4, April 2019.
