Convolutional Neural Network based Recognition of Myanmar Text Warning Sign for Mobile Platform

DOI : 10.17577/IJERTV8IS010102


Saw Zay Maung Maung University of Computer Studies, Yangon

Nyein Aye Computer University, Hpa-an

Abstract

Recognizing text images on a mobile phone is a challenging task because of the device's limited memory capacity and processing power, and the accuracy of a text image recognition system is also important. In this paper, we aim to develop a text image recognition system for the mobile environment using a Myanmar character dataset. First, an image is captured with the mobile phone's camera, and each connected character is segmented using the Connected Labeling Algorithm. These connected components are then passed layer by layer into a Convolutional Neural Network to recognize the words in a given warning text sign. This paper presents Myanmar character recognition from various warning text signboards using end-to-end recognition with a Convolutional Neural Network.

Keywords: Connected Labeling Algorithm, Convolutional Neural Network

  1. INTRODUCTION

    There are many image processing utilities that can be applied to recognizing characters in a text image. In earlier days, many optical character recognition (OCR) systems were developed for computer environments such as laptop and desktop computers, and many real-world applications use OCR. Nowadays, we use smartphones every day. With a smartphone, we can build systems as capable as those earlier ones while taking advantage of more advanced technology. Mobile computing devices include a camera, so software on the device can take pictures of images such as handwritten text as well as printed text. For mobile devices, many image processing tasks can be applied to develop smart applications using OCR.

    With innovations in technology, many modern algorithms are applied in image classification and recognition systems. Deep learning is a subfield of machine learning that gives systems high accuracy. The Convolutional Neural Network (CNN) is one kind of deep learning algorithm, and it is applied in many image recognition systems to obtain results with higher accuracy. There are many deep learning frameworks (TensorFlow, Caffe, CNTK, PyTorch, Keras, and Deeplearning4j) for developing deep learning applications. In this paper, we used the TensorFlow and TensorFlow Lite frameworks for the mobile environment. Mobile phones include many devices (sound, camera, and Internet) with which to develop real-time applications. By combining deep learning with a mobile OS, mobile users can get many advantages with high accuracy and high availability.

    The organization of this paper is as follows. Section 2 provides the related works. Section 3 shows the step-by-step image processing in the Android environment. The Convolutional Neural Network is presented in Section 4. In Section 5, the nature of Myanmar scripts is explained, and Section 6 provides the system overview and a detailed explanation of the deep learning design. Section 7 provides the experimental results of this system, and Section 8 concludes the paper.

  2. RELATED WORKS

    A Kohonen Neural Network based character recognition system is explained in [1]. That paper provides a framework for object-oriented modeling and explains the challenges faced and the feature extraction method used to detect characters. The OCR implementation is developed to learn Indian regional languages, as the characters, including vowels, consonants and complicated letters, are very similar across most Indian languages. The work in [2] aims to recognize text in an image and produce editable text using the Optical Character Recognition (OCR) method with Tesseract, an OCR engine which, along with an image processing suite, is installed in an Android app.

    Reference [3] proposes a method for Tamil text detection in natural scene pictures. Maximally Stable Extremal Regions (MSERs) are extracted as character candidates using the strategy of minimizing regularized variations. Using a single-link clustering algorithm, character candidates are merged into text candidates, where distance weights and the clustering threshold are learned automatically by a novel self-training distance metric learning algorithm. The documents in this application are scanned as images; once an image is scanned, the data in the image is extracted automatically and shown in the application as text. The text is then given to a translator tool, which converts the Tamil text into an English text message.

      The text region in a document is scanned properly and the characters are then segmented in [4]. After a given text image is preprocessed and recognized, the English text is converted into Marathi in a translation process. Neural networks have been applied to various pattern classification and recognition tasks. The input to the Kohonen algorithm is given to the neural network through the input neurons. These input neurons are easily trained and have properties such as topological ordering and good generalization. The system uses smart mobile phones on the Android platform.

      Myanmar text extraction and recognition from warning signboard images taken by a mobile phone camera is presented in [8]. The horizontal projection profile, vertical projection profile and bounding boxes are used to segment Myanmar characters, and blocking-based pixel count and eight-direction chain code features are used in a template matching method for recognition.

  3. MOBILE BASED IMAGE PROCESSING

    Image processing is the task of analyzing and manipulating a digitized image to improve its quality. To use image processing on a mobile phone, the image is first captured from real objects using the mobile phone camera. After capture, the digitized image is preprocessed so that it is suitable for a particular application. In this system, we applied black-and-white conversion and morphological closing to enhance the image quality. To segment each character or word, the Connected Labeling Algorithm is used to get bounding boxes of characters or words. The algorithm consists of two passes, as follows:

    On the first pass:

      1. Scan column pixels first, then row pixels (Raster Scanning)

      2. If the pixel is the foreground pixel

        1. Get the neighboring pixels of the current pixel

        2. If there are no neighbors, uniquely label the current pixel and continue

        3. Otherwise, find the neighbor with the smallest label and assign it to the current pixel

        4. Store the equivalence between neighboring labels

    On the second pass:

    1. Scan column pixels first, then row pixels (Raster Scanning)

    2. If the pixel is the foreground pixel

      1. Relabel the element with the lowest equivalent label

    This method is applied on the mobile phone, and the result is shown in figure 1. The development tool we used for image processing on the mobile platform is the OpenCV computer vision and image processing library; a sketch using its built-in connected components routine follows figure 1.

    Figure 1. Result of Using Connected Labeling Algorithm Method
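    This system implements the two-pass labeling described above. As a minimal sketch, equivalent bounding boxes can be obtained with OpenCV's built-in connected components routine; the input file name and the minimum-area filter are assumptions for illustration:

```python
import cv2

# Load a hypothetical preprocessed image and binarize it (foreground = white).
binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(binary, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# Label connected components; stats holds one row per label:
# (x, y, width, height, area). Label 0 is the background.
n_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(
    binary, connectivity=8)

boxes = []
for x, y, w, h, area in stats[1:]:
    if area > 20:  # assumed minimum area, to skip isolated noise pixels
        boxes.append((x, y, w, h))
```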

  4. CONVOLUTIONAL NEURAL NETWORK

    Convolutional neural networks transform the input data by passing it through each layer of the network to automatically extract features of the images. In a CNN, there are several different types of layers, such as convolution layers, Rectified Linear Units (ReLU), pooling, dropout, and fully connected layers.

    The convolution layer in figure 2, also called a feature extractor, extracts features from the input image and tries to find them at all places in the image by using a matrix called a filter (kernel).

    Figure 2. Convolution Layer

    In figure 3, after each convolution, the Rectified Linear Unit (ReLU) activation function is applied to the output of the neurons.

    Figure 3. ReLU activation Layer

    The pooling layer, shown in figure 4, reduces the dimensionality of each feature map from the previous layers while preserving the most important features.

    Figure 4. Pooling Layer

    Dropout layers drop out a random set of neurons in a layer by setting their activations to zero, with the aim of reducing overfitting. In a fully connected layer, every neuron is connected to every neuron in the adjacent layers. A minimal sketch combining these layer types is given below.
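    As a sketch (not necessarily this system's exact architecture), these layer types can be combined in Keras as follows; the 32x32 grayscale input size, the layer widths, and the 250-class output are assumptions based on the dataset described later:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Convolution + ReLU extract features, pooling reduces dimensionality,
# dropout zeroes random activations, and dense layers classify.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(250, activation="softmax"),  # one output per syllable class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```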

    The concept of convolutional neural networks was introduced in [6], inspired by the locally sensitive and orientation-selective nerve cells found in the visual cortex [7]. The network structure is designed to extract relevant features implicitly, by restricting the neural weights of one layer to a local receptive field in the previous layer to obtain a feature map. By reducing the spatial resolution of the feature map, a certain degree of shift and distortion invariance is achieved [7]. In addition, the number of parameters is significantly decreased by using the same weights for all positions in the feature map [5].

    The following figure 5 shows the architecture of a CNN for classifying character images.

    Figure 5. Convolutional Neural Network

    TensorFlow is an open-source deep learning and machine learning software library for high-performance numerical computation. It allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), from desktops to clusters of servers to mobile and edge devices.

    TensorFlow Lite is TensorFlow's lightweight solution for mobile and embedded devices. It enables on-device machine learning inference with low latency and a small binary size. To apply deep learning on mobile devices, we first need to convert the trained TensorFlow model into a TensorFlow Lite model (.tflite) by using the TensorFlow Lite converter. The following figure 6 shows this process in the TensorFlow Lite architecture:

    Figure 6. Flow of TensorFlow Lite
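    A minimal sketch of this conversion step, assuming the trained model is exported as a TensorFlow 2 SavedModel directory; this system converts a frozen .pb graph, for which TensorFlow 1.x provided tf.compat.v1.lite.TFLiteConverter.from_frozen_graph instead:

```python
import tensorflow as tf

# Convert a SavedModel ("saved_model/" is an assumed path) to TensorFlow Lite.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")
tflite_model = converter.convert()

# Write the .tflite flat buffer, ready to bundle into the Android app.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```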

  5. NATURE OF MYANMAR SCRIPTS

    Myanmar characters include consonants, vowels, and compounds (i.e., combinations of basic consonants with vowel modifiers or consonant modifiers). Given the nature of these complex combinations, Myanmar characters can be divided into two types: basic characters and extended characters. A basic character (consonant) may stand as a single character or may be combined with one or more extended characters. Many combinations of extended characters may appear to the left, right, top, or bottom of the basic character. Figure 7 shows the chart of Myanmar scripts.

    Figure 7. Myanmar Scripts and Sample Road Sign

    In the road sign text above, some words are formed by combining a consonant with several extended characters. One example is shown in figure 8.

    Figure 8. Myanmar Characters Combination

    6. SYSTEM DESIGN AND DESCRIPTION

    There are two main processes in this system: training and recognition. Training is performed on a desktop computer to produce a trained model that is later transferred to the mobile phone. Recognition takes place on the mobile phone using several image processing utilities. To process an image captured by the mobile camera, the system first converts the original image into a grayscale image and then applies the Sobel edge detection algorithm to enhance the character edges. The Sobel operator computes an approximation of the gradient of the image intensity. The horizontal changes are computed by convolving the image $I$ with a kernel $G_x$ of odd size. For example, for a kernel size of 3, $G_x$ would be computed as:

    $$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * I$$

    The vertical changes are computed by convolving the image $I$ with a kernel $G_y$ of odd size. For example, for a kernel size of 3, $G_y$ would be computed as:

    $$G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * I$$

    An approximation of the gradient at each point of the image is calculated using the following equation:

    $$G = \sqrt{G_x^2 + G_y^2} \quad (1)$$

    After that, a thresholding operation is applied to convert the result into a binary image using the following formula:

    $$\mathrm{dst}(x, y) = \begin{cases} \mathrm{maxVal} & \text{if } \mathrm{src}(x, y) > \mathrm{thresh} \\ 0 & \text{otherwise} \end{cases} \quad (2)$$

    Morphological closing is then applied to remove small holes (dark regions) using the following equation:

    $$A \bullet B = (A \oplus B) \ominus B \quad (3)$$

    Closing is formed by first dilating image $A$ by the structuring element $B$, after which the dilated set, $A \oplus B$, is eroded by $B$. A sketch of this preprocessing pipeline in OpenCV follows.
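    Putting these steps together, a minimal OpenCV sketch of the preprocessing pipeline might look as follows; the input file name, threshold value, and structuring element size are assumptions:

```python
import cv2

img = cv2.imread("sign.jpg")                       # assumed input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # grayscale conversion

gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)    # horizontal changes (Gx)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)    # vertical changes (Gy)
grad = cv2.convertScaleAbs(cv2.magnitude(gx, gy))  # gradient magnitude, eq. (1)

_, binary = cv2.threshold(grad, 60, 255, cv2.THRESH_BINARY)  # eq. (2)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)   # closing, eq. (3)
```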

    Finally, the Connected Labeling Algorithm is used to find the character boundaries. The processed characters are then ready for character recognition using the Convolutional Neural Network. The details of training and recognition are described below, and the overview of the system design is shown in figure 9.

    Figure 9. Overview of System Design

    To train the deep learning model for this system, we used a Myanmar character dataset collected from warning text signs.

    We used the 250 most important syllables from warning text sign words to train this system. We then generated variations of each syllable using affine transformations (rotation, scaling, zooming, shearing) to produce 12,500 syllable images; a sketch of this augmentation step is given below.
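    A minimal sketch of this augmentation step using OpenCV; the transformation ranges and the 50 variants per syllable (250 x 50 = 12,500 images) are assumptions rather than values reported here:

```python
import cv2
import numpy as np

def augment(glyph, n_variants=50, rng=np.random.default_rng(0)):
    """Generate affine-transformed variants of one syllable image."""
    h, w = glyph.shape[:2]
    variants = []
    for _ in range(n_variants):
        angle = rng.uniform(-10, 10)    # rotation in degrees (assumed range)
        scale = rng.uniform(0.9, 1.1)   # scaling / zooming (assumed range)
        shear = rng.uniform(-0.1, 0.1)  # horizontal shear (assumed range)
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
        M[0, 1] += shear                # fold shear into the affine matrix
        variants.append(cv2.warpAffine(glyph, M, (w, h), borderValue=255))
    return variants
```

    Figure 10 below shows the design of the Convolutional Neural Network for classifying Myanmar characters in this system: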

    Figure 10. Design of Convolutional Neural Network

    During recognition, segmented texts are passed layer by layer through the network to obtain activation feature maps. The visualization of these feature maps is shown in figure 11.

    Figure 11. Visualization of feature maps

    After training is finished in the desktop environment, we get a TensorFlow model (.pb file), which is then converted into a TensorFlow Lite model (.tflite file) and embedded into the mobile environment for recognition.

    7. EXPERIMENTAL RESULTS

      For the experiments, we split the dataset of texts collected from warning text signs into training and test samples at 75% and 25%, respectively. We then trained and tested the words with iteration = 10, iteration = 20, and iteration = 50. The accuracy results are shown in table 1; a sketch of this experimental setup follows the table.

      TABLE 1: Accuracy

      Iteration    Precision    Recall
      10           0.93         0.91
      20           0.94         0.94
      50           0.95         0.94
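      A minimal sketch of this experimental setup, assuming the CNN sketched in section 4 is available as `model`; the arrays here are random placeholders so the sketch runs, and Keras epochs stand in for the iteration counts above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the 12,500 augmented syllable images.
images = np.random.rand(12500, 32, 32, 1).astype("float32")
labels = np.random.randint(0, 250, size=12500)

# 75% training / 25% test split, as in the experiments.
x_train, x_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.25, random_state=42)

model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
loss, accuracy = model.evaluate(x_test, y_test)
```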

      The following figure shows the loss and accuracy during the training process for iteration = 10 and iteration = 20, respectively.

      (a)

      (b)

      Figure 12. Loss And Accuracy (a) iteration=10 (b) iteration=20.

    8. CONCLUSION AND FUTURE WORKS

Building a text image recognition system for a mobile environment is not an easy task because of the device's limited capacity. Nowadays, the use of deep learning is increasing in the fields of computer vision and image analysis because deep learning provides results with higher accuracy than many traditional approaches. Mobile phones include many devices (sound, camera, and Internet) with which to develop real-time applications, and by combining deep learning with a mobile OS, mobile users can gain many advantages in accuracy and availability. In this paper, we used a Convolutional Neural Network to train on and recognize Myanmar character images with high accuracy. By transferring the trained model into the mobile environment, we obtain high availability anytime and anywhere. In future work, we can convert the recognized text into an editable format for OCR applications, provide translation into other languages by recognizing the semantics of the combined images, and apply the recognized images in speech-based application development.

REFERENCES

  1. R. Shukla, "Object oriented framework modeling of a Kohonen network based character recognition system", Computer Communication and Informatics International Conference (ICCCI), pp. 93-100, 2012.

  2. Ishita Pal, Mohammadraza Rajani, Anusha Poojary, Priyanka Prasad, "Implementation of Image to Text Conversion using Android App", International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, Vol. 6, Issue 4, April 2017.

  3. K. Jayasakthi Velmurugan, M.A. Dorairangaswamy, "Tamil Character Recognition Using Android Mobile Phone", ARPN Journal of Engineering and Applied Sciences, Vol. 13, No. 3, February 2018.

  4. Mayuri B Gosavi, Ishwari V Pund, Harshada V Jadhav, Sneha R Gedam, "Mobile Application with Optical Character Recognition Using Neural Network", International Journal of Computer Science and Mobile Computing, Vol. 4, Issue 1, January 2015.

  5. Lecun Y, Bottou L, Bengio Y, Haffner P, "Gradient-Based Learning Applied to Document Recognition", in Proceedings of the IEEE, 1998.

  6. Lecun Y, Bengio Y, "Convolutional Networks for Images, Speech, and Time-Series", in The Handbook of Brain Theory and Neural Networks, MIT Press, 1995.

  7. Haykin S, Neural Networks: A Comprehensive Foundation, second edition, Prentice Hall, 1999. Chapter 4, Multilayer Perceptrons, pp. 156-255.

  8. Kyi Pyar Zaw, Zin Mar Kyu, "Camera Captured based Myanmar Character Recognition Using Dynamic Blocking and Chain Code Normalization", International Journal of Scientific and Research Publications, Volume 8, Issue 8, August 2018

  9. https://adeshpande3.github.io/A-Beginners-Guide-To-Understanding-Convolutional-Neural-Networks

  10. https://www.tensorflow.org/lite/overview
