A Survey on Recognition of Offline Handwritten Words

M S Patel, Research Scholar, VTU, Belgaum, Department of Information Science and Engineering, DSCE, Bangalore, India

Shruthi A, M.Tech Student, VTU, Belgaum, Department of Information Science and Engineering, DSCE, Bangalore, India

Abstract – Handwritten Word Recognition (HWR) is one of the most attractive and challenging research areas in the field of pattern recognition. Pattern recognition is a branch of machine learning that focuses on recognizing patterns and regularities in data. HWR is a challenging task because there is no constraint on handwriting style, size, slant angle or letter shape. In contrast to character recognition, in which each character of a word is recognized individually, HWR treats each word as a single entity and recognizes it from its overall shape. HWR is used in document verification, forensic science, historical manuscript processing, etc. This paper surveys the major works on offline handwritten word recognition, covering the filters and classifiers used along with their reported performance. Major works on English, Arabic, Hindi and other scripts are addressed.

Keywords: HWR; filters; classifiers

  1. INTRODUCTION

    Handwritten Word Recognition (HWR) is the conversion of handwritten text in an image into a computer-readable format. Document image processing involves handwriting recognition, and HWR can be classified into two methods, namely offline and online recognition, based on the format of the handwriting input [1].

    In offline word recognition, only a scanned image of the handwritten text is the input to the HWR system. Online word recognition additionally provides the HWR system with temporal information such as the position and velocity of the pen along its trajectory.

    There are two main approaches in HWR, namely the segmentation approach and the segmentation-free approach. The segmentation approach requires each word to be segmented into characters, whereas the segmentation-free approach recognizes the whole word. Line and word segmentation is used in the segmentation-free approach to create an index based on word matching [2,3].

    This paper discusses the offline handwritten word recognition system, which involves the six steps shown in Fig. 1.

    Fig. 1. Major steps of HWR system

    1. Data collection

      Handwritten documents are collected from different writers irrespective of age group. The collected documents are scanned with a scanner such as the HP Scanjet G2410 to obtain digitized images. Many databases are also available on the internet, such as the IAM Historical Handwriting Database, IFN/ENIT and IRONOFF.

    2. Pre-processing

      The scanned document is stored as a binary image in a format such as JPEG, GIF or TIF, and this image is the input to pre-processing. Pre-processing is applied to the scanned document to reduce noise in the image using filters such as the median filter, morphological filter and Gaussian filter. Further operations are applied for normalization, slant correction, stroke thickness normalization, baseline detection, contour smoothing, etc.
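As an illustration of the noise-reduction step, the following is a minimal sketch of a median filter written in plain Python. The function name `median_filter`, the list-of-rows image representation and the border handling (using only the neighbours that fall inside the image) are assumptions made for this example; the surveyed papers do not specify their implementations.

```python
from statistics import median_low


def median_filter(image, size=3):
    """Apply a size x size median filter to a grayscale image.

    `image` is a list of rows of integer pixel values and `size` is
    assumed to be odd. Border pixels use only the neighbours that lie
    inside the image (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    half = size // 2
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(height, r + half + 1))
                for cc in range(max(0, c - half), min(width, c + half + 1))
            ]
            # median_low always returns a value present in the window,
            # so no new gray levels are introduced.
            row.append(median_low(window))
        result.append(row)
    return result


# A single isolated bright pixel (salt noise) on a dark background.
noisy = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
cleaned = median_filter(noisy)
```

In this example the isolated pixel is removed because it is outnumbered by background pixels in every window that contains it.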

    3. Segmentation

      Segmentation is used to separate text from graphs, images and lines. In the segmentation-based approach, the word is separated into characters for recognition. In the segmentation-free approach, the whole word is considered for recognition without separating it into characters.

    4. Feature Extraction

      Feature extraction finds the set of parameters that define the shape of a word. Many kinds of features are considered, namely structural features, statistical features, selected features, density features, contour features, etc.
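As an illustration of density features, the sketch below divides a binary word image into a grid of zones and returns the fraction of ink pixels in each zone. The function name `zone_density_features`, the convention that 1 marks ink, and the grid size are assumptions made for this example and are not taken from any of the surveyed papers.

```python
def zone_density_features(image, rows=2, cols=4):
    """Return the ink density of each zone of a binary word image.

    `image` is a list of rows containing 0 (background) or 1 (ink).
    The image is split into `rows` x `cols` zones, scanned left to right
    and top to bottom, and each feature is ink pixels / zone area.
    """
    height, width = len(image), len(image[0])
    features = []
    for i in range(rows):
        top, bottom = i * height // rows, (i + 1) * height // rows
        for j in range(cols):
            left, right = j * width // cols, (j + 1) * width // cols
            ink = sum(
                image[r][c]
                for r in range(top, bottom)
                for c in range(left, right)
            )
            area = (bottom - top) * (right - left)
            features.append(ink / area if area else 0.0)
    return features


# Hypothetical 4 x 4 word image, split into 2 x 2 zones.
word = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
features = zone_density_features(word, rows=2, cols=2)
```

The resulting vector has one value per zone, so its length is fixed by the grid and does not depend on the size of the word image.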

    5. Classification

      Classification is the process of identifying each word and assigning it to the correct word class, based on the extracted features, using classifiers such as neural networks, Hidden Markov Models (HMM), Support Vector Machines (SVM), the nearest neighbor classifier, etc.
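Of the classifiers listed, the nearest neighbor classifier is the simplest to illustrate. The sketch below assigns a query feature vector to the word class of the closest training vector, assuming Euclidean distance as the measure; the function name, the feature values and the labels are hypothetical.

```python
import math


def nearest_neighbor(query, training_set):
    """Return (label, distance) of the training vector closest to `query`.

    `training_set` is a list of (feature_vector, label) pairs whose
    vectors have the same length as `query`.
    """
    best_label, best_distance = None, math.inf
    for features, label in training_set:
        distance = math.dist(query, features)  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance


# Hypothetical feature vectors for two word classes.
training = [
    ([1.0, 0.0, 0.0, 0.25], "india"),
    ([0.0, 1.0, 0.5, 0.0], "paper"),
]
label, distance = nearest_neighbor([0.9, 0.1, 0.0, 0.2], training)
```

Here the query is assigned to the first class because its vector differs only slightly from that training example.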

    6. Post-processing

      In post-processing, the recognition rate can be improved by contextual or lexical post-processing, which resolves ambiguities [4].
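One simple form of lexical post-processing is to replace the recognizer output with the closest entry of a lexicon. The sketch below assumes Levenshtein edit distance as the closeness measure; this choice, the helper names and the lexicon are hypothetical and are not taken from [4].

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def correct_with_lexicon(word, lexicon):
    """Return the lexicon entry with the smallest edit distance to `word`."""
    return min(lexicon, key=lambda entry: edit_distance(word, entry))


# Hypothetical lexicon and a recognizer output with one wrong letter.
lexicon = ["bangalore", "belgaum", "mysore"]
corrected = correct_with_lexicon("bangalare", lexicon)
```

A closed lexicon of this kind suits applications such as city name recognition, where the set of valid words is known in advance.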

  2. REVIEW

      • Yousri Kessentini et al. proposed an approach for multi-script handwritten word recognition. The approach uses a multi-stream HMM to combine two low-level feature streams, namely density-based features extracted from two different sliding window widths and contour-based features extracted from the upper and lower contours. They used the IFN/ENIT benchmark database for Arabic script and the IRONOFF database for Latin script. The approach achieved recognition rates of 86.2% and 81.2% on IFN/ENIT for contour and density features respectively, and 91.6% and 90% on IRONOFF-196 for contour and density features respectively [5].

      • Douglas J. Kennard et al. proposed word warping for offline handwriting recognition. They used automatic image morphing to compute 2-D geometric warps that align the strokes of each word image with the strokes of the training word images. On two datasets of their own they obtained recognition accuracies of 88.77% and 89.33%, which are increases of 7.89% and 17.16% over the 1-D DP approach [6].

      • Anuja Naik and M S Patel proposed a method that performs pre-processing steps such as skew and slant correction. To find the skew of a word, the lowest black pixel in every column is determined, and the input image is rotated by the estimated rotation angle to remove the skew. Next, slant is estimated by finding the contours of the thresholded image as chains of connected pixels representing the edges of strokes; the orientation of those edges that are close to vertical is taken as the slant. The upper and lower black pixels are used to determine the upper and lower baselines respectively. For skeletonization, the input image is first smoothed by convolution with a Gaussian filter to remove noise, and then an iterative erosive thinning algorithm is applied to reduce the stroke width to one pixel. They used structural features for feature extraction and a Euclidean distance method for classification, which returns the single matching word with the minimum difference value [7].

    • Soulef Nemouchi et al. presented Arabic word recognition applied to handwritten Algerian city names using a combination of classifiers. The work focuses on the feature extraction and classification phases. The system retains three feature sets and uses four classifiers, namely the K-Nearest Neighbor algorithm (KNN), the Fuzzy C-Means algorithm (FCM), the K-Means algorithm and the Probabilistic Neural Network (PNN). The classifier results are combined using simple vote and weighted sum methods, and a recognition rate of 80% was obtained [8].

    • Ahlam Maqqor et al. presented offline handwritten Arabic word recognition using a multi-stream HMM approach. Two methods are used to extract a set of simple statistical features: a window that slides along the text line from right to left, and the Vertical Horizontal 2-dimensional (VH2D) approach. Thresholding (binarization), normalization, filtering, smoothing and skew detection are applied to the text image to simplify feature extraction. The multi-stream approach involves multi-classifier, multi-model and multi-band approaches and the multi-stream formalism. A Hidden Markov Model is used for recognition, and they achieved recognition rates of 78.2% for the sliding window, 76.6% for VH2D and 83.8% for the combination of both [9].

    • Youssouf Chherawala, Partha Pratim Roy and Mohamed Cheriet studied feature design for offline Arabic handwriting recognition. They evaluated the performance of automatically learned features and compared it with that of handcrafted features. The recognition model is based on long short-term memory (LSTM) neural networks with connectionist temporal classification (CTC). An HMM is used as the classifier in this method. A multidimensional LSTM (MDLSTM) network is able to learn features automatically from the input document image. The IFN/ENIT database is used as the benchmark for Arabic word recognition [10].

    • Silky Bansal, Munish Kumar and Mamta Garg proposed an approach for recognizing handwritten city names written in Gurmukhi script for postal automation. They used a holistic approach, in which the whole word is considered. For recognizing words they used a tree-diagonal feature extraction technique, in which a tree structure built from zoning and diagonal feature extraction is used with SVM and k-NN classifiers. They collected 18,000 samples of handwritten city names in Gurmukhi script from 60 different writers. A maximum recognition accuracy of 90.8% was achieved with the SVM classifier [11].

    • Anne-Laure Bianne-Bernard et al. proposed HMM modeling with dynamic and contextual information for HWR. To model the contextual units, a state-tying process based on decision tree clustering is introduced. This modeling is applied to the recognition of handwritten words, and experiments are conducted on three publicly available databases: Rimes, IAM and OpenHart [12].

      • Ankush Acharyya et al. proposed a holistic HWR approach using an MLP-based classifier. The holistic approach treats the word as a single, indivisible entity and attempts to recognize words from their overall shape. A neural-network-based classifier is used to classify word images belonging to different classes. The CMATERdb1.2.1 dataset is used in this work. The best-case and average-case performances of the technique on this dataset are 89.9% and 83.24% respectively [13].

      • B. Gatos et al. proposed efficient off-line cursive handwriting word recognition. The approach combines two different modes of word image normalization with robust hybrid feature extraction. Pre-processing is used to correct word skew and word slant and to normalize the stroke thickness. Two types of features are combined in a hybrid fashion. The first divides the word image into a set of zones and calculates the density of each zone. The second calculates the area formed from the projections of the upper and lower profiles of the word. They used the IAM database and obtained a recognition rate of 80.76% [14].
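Several of the works reviewed above combine the outputs of more than one classifier; the simple vote and weighted sum rules mentioned for Nemouchi et al. [8] can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, scores, weights and the example city labels are assumptions.

```python
from collections import Counter


def simple_vote(predictions):
    """Return the label predicted by the most classifiers.

    `predictions` holds one label per classifier. On a tie, the label
    that was encountered first is returned.
    """
    return Counter(predictions).most_common(1)[0][0]


def weighted_sum(score_tables, weights):
    """Return the label with the highest weighted sum of scores.

    `score_tables` holds one {label: score} dict per classifier and
    `weights` holds one weight per classifier.
    """
    totals = {}
    for scores, weight in zip(score_tables, weights):
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + weight * score
    return max(totals, key=totals.get)


# Hypothetical outputs of four classifiers for one word image.
voted = simple_vote(["Annaba", "Annaba", "Batna", "Annaba"])

# Hypothetical per-class scores of two classifiers, the second trusted more.
fused = weighted_sum(
    [{"Annaba": 0.6, "Batna": 0.4}, {"Annaba": 0.3, "Batna": 0.7}],
    weights=[0.3, 0.7],
)
```

The two rules can disagree: voting counts only the top choice of each classifier, while the weighted sum also uses how confident each classifier is and how much it is trusted.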

    TABLE 1: Brief description of the survey.

    | Authors | Script | Filters | Classifiers or Classification Method | Features | Database | Accuracy |
    |---|---|---|---|---|---|---|
    | Yousri Kessentini et al. | Multi-script | - | HMM | Density and Contour Features | IFN/ENIT and IRONOFF | average 83.7% for IFN/ENIT and average 90.8% for IRONOFF-196 |
    | Douglas J. Kennard et al. | English | - | 2D-Warping and Distance Map | - | Own Dataset | 88.77% |
    | Anuja Naik, M S Patel | English | Gaussian Filters | Euclidean Distance | Structural Features | Own Dataset | - |
    | Soulef Nemouchi et al. | Arabic | - | FCM, K-Means, KNN and PNN | Global Structural Features | Own Dataset | 80% |
    | Ahlam Maqqor et al. | Arabic | Median Filters | Sliding Window and VH2D approach | Statistical Features | Own Dataset | 83.8% |
    | Youssouf Chherawala et al. | Arabic | - | HMM | Distribution, Concavity, Visual-descriptor-based and Automatically learned features | IFN/ENIT | 89.1% for MDLSTM |
    | Silky Bansal et al. | Gurmukhi | - | SVM and KNN | - | Own Dataset | 90.8% |
    | Anne-Laure Bianne-Bernard et al. | Latin and Arabic | - | HMM and Neural Network | Geometric features | Rimes, IAM, and OpenHart | - |
    | Ankush Acharyya et al. | English | - | MLP | Holistic features | CMATERdb1.2.1 | Average of 83% |
    | B. Gatos et al. | English | - | Minimum Distance Classifier and SVM | Hybrid features | IAM | 80.76% |
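The averaged accuracies listed in Table 1 for Kessentini et al. appear to be the means of the per-feature recognition rates quoted in Section 2. The short check below assumes a plain arithmetic mean of the contour and density results; the helper name `mean_rate` is hypothetical.

```python
def mean_rate(rates):
    """Arithmetic mean of recognition rates in percent, to one decimal."""
    return round(sum(rates) / len(rates), 1)


# Contour and density recognition rates reported for [5].
ifn_enit_average = mean_rate([86.2, 81.2])
ironoff_average = mean_rate([91.6, 90.0])
```

Under this assumption the results are 83.7 for IFN/ENIT and 90.8 for IRONOFF-196, which agree with the table entries.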

  3. APPLICATIONS

    There has been significant growth in the application of off-line handwriting recognition during the last decade.

    • Signature Verification

    • Forensic Science

    • Bank Check Recognition

    • Handwritten Address Interpretation

    • Historical Manuscript conversion etc

  4. CONCLUSION

    Handwritten word recognition is a challenging task that requires a high level of accuracy. Most of the techniques used for HWR are script dependent, and the holistic approach avoids the challenges of character segmentation. The extracted features are used in classification, and classifiers perform word matching based on those features. Some authors used a combination of classifiers. Most of the work in this area achieves more than 80% accuracy, but an efficient HWR system for the recognition of handwritten words still does not exist. The applications of HWR are extensive and span many fields.

  5. REFERENCES

  1. Pooja Yadav and Neha Popli, Handwriting Recognition System A Survey, IJETST, Volume 01, Issue 03, Pages 405-410, ISSN 2348-9480, May 2014.

  2. N. Azizi, N. Farah and M. Sellami, Off-line Handwritten Word Recognition Using Ensemble of Classifiers Selection and Feature Fusion, Journal of Theoretical and Applied Information Technology, 2005-2010.

  3. Jino P. and Kannan Balakrishnan, HWR for Indian Languages: A Comprehensive Survey, Econometric Institute research papers, Feb 2014.

  4. Ashwin S Ramteke and Milind E Rane, A Survey on Offline Recognition of Handwritten Devanagari Script, International Journal of Scientific & Engineering Research, Volume 3, Issue 5, ISSN 2229-5518, May 2012.

  5. Yousri Kessentini, Thierry Paquet and AbdelMajid Benhamadou, A Multi-Stream HMM-Based Approach for Off-line Multi-Script Handwritten Word Recognition, Pattern Recognition Letters, Volume 31, Issue 1, January 2010.

  6. Douglas J. Kennard, William A. Barrett and Thomas W. Sederberg, Wordwarping for Offline Handwriting Recognition, ICDAR, Beijing, September 2011.

  7. Anuja Naik and M S Patel, Offline English Handwritten Word Recognizer Using Best Feature Extraction, IJACTE, Volume 3, Issue 2, ISSN 2319-2526, 2014.

  8. Soulef Nemouchi, Labiba Souici Meslati and Nadir Farah, Classifiers Combination for Arabic Words Recognition Application to Handwritten Algerian City Names, ICISP, Volume 7340, Pages 562-570, Agadir, Morocco, June 2012.

  9. Ahlam Maqqor, Akram Halli and Khaled Satori, A Multi-Stream HMM Approach to Offline Handwritten Arabic Word Recognition, International Journal on Natural Language Computing (IJNLC), Vol. 2, No. 4, Aug 2013.

  10. Youssouf Chherawala, Partha Pratim Roy and Mohamed Cheriet, Feature Design for Offline Arabic Handwriting Recognition: Handcrafted vs Automated?, ICDAR, Washington, ISSN 1520-5363, 2013.

  11. Silky Bansal, Munish Kumar and Mamta Garg, A New Approach for Handwritten City Name Recognition, ICAET, ISBN 978-1-63248-028-6, 2014.

  12. Anne-Laure Bianne-Bernard et al., Dynamic and Contextual Information in HMM Modeling for Handwritten Word Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 10, Oct 2011.

  13. Ankush Acharyya, Sandip Rakshit, Ram Sarkar, Subhadip Basu and Mita Nasipuri, Handwritten Word Recognition using MLP based Classifier: A holistic approach, IJCSI, Vol. 10, Issue 2, No. 2, March 2013.

  14. B. Gatos, I. Pratikakis, A.L. Kesidis and S.J. Perantonis, Efficient Off-Line Cursive Handwriting Word Recognition, Proceedings of the Tenth International Workshop on Frontiers in Handwriting Recognition, La Baule, Oct. 2006.

  15. Andreas Fischer, Emanuel Indemuhle, Horst Bunke, Gabriel Viehhauser and Michael Stolz, Ground Truth Creation for Handwriting Recognition in Historical Documents, 9th IAPR International Workshop on Document Analysis Systems, Pages 3-10, ISBN 978-1-60558-773-8, USA, 2010.
