Android based Mathematical Expression Evaluation from Images

DOI: 10.17577/IJERTV5IS080475

Aanal Patel
IT Dept., LDRP-ITR,
Sector-15, Gandhinagar, Gujarat, INDIA

Neel Acharya
CE Dept., LDRP-ITR,
Sector-15, Gandhinagar, Gujarat, INDIA

Dhiren Pandit
Science & Humanities Dept., LDRP-ITR,
Sector-15, Gandhinagar, Gujarat, INDIA

Abstract: The current era is an era of automated systems, and machine learning plays a vital role in the construction of such systems. Within machine learning, the neural network is a key tool. OCR is used for the recognition of characters from printed text, and the mathematical tool of the neural network is used for classification. The proposed system combines OCR and mathematics; hence, a system is designed that can compute mathematical expressions from images containing equations. An OCR system is used to convert the image of an expression into a string of mathematical expressions, which is then combined with mathematical libraries and a compiler to obtain the solution of such mathematical expressions.

Keywords: OCR, Expression, Android, Neural Networks, Evaluation, Image Processing, Character detection, Printed Character Recognition

I. INTRODUCTION

Closed-form expression evaluation is one of the easiest tasks in mathematics, and its computer implementation is similarly straightforward. With built-in operators, many simple mathematical equations can be solved easily using computers. However, as mathematical complexity increases, solving an equation becomes an increasingly difficult task, since more sophisticated solution techniques and more complex algorithmic procedures are needed. Many systems exist today that help us solve complex mathematical equations. By combining such systems with an optical character recognition (OCR) system, this problem can be carried into a new domain: the calculation of printed expressions.

II. M-EXPRESSION FOR MATHEMATICAL EXPRESSION EVALUATION

M-Expression is one such Android- and Java-based system under development that performs the task of printed expression calculation. The front-end is an Android application that allows a user to choose an image of a printed expression, which is then forwarded to the Java-based back-end. The back-end performs OCR and obtains the expression in a form that the Java code can calculate; this is then processed further with a math library to compile it and find errors, if any. If there are no errors, the expression is evaluated and a response comprising the calculated answer is sent to the front-end. For the OCR task, a neural-network-based approach has been taken: a Java library is used to train on a dataset of fonts, and the trained network classifies the test image. Along with a basic set of mathematical operations (like addition, subtraction, multiplication, division, power), relational operations (like equals, not-equals, less-than, greater-than, less-than-or-equals, greater-than-or-equals) and Boolean operations (like and, or, not) are supported. Trigonometric functions (like sine, cosine, …) and other complex mathematical operations (like ceil, floor, log, …) are supported as well.

1. Flow Diagram

The system flow is: start, load image, pre-process image (image API), then detect characters and separate them. If a character detection error occurs (Yes branch), the flow loops back; otherwise the classification task / OCR (neural network) is run. If an OCR error occurs, the flow again loops back; otherwise the expression is obtained in string form and passed on to mathematical processing, where the expression is compiled. On a compilation error the flow loops back; otherwise the expression is evaluated, the answer is produced, and the flow ends.

Fig. 1. Flow-chart part-1

Fig. 2. Flow-chart part-2
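A minimal Java sketch of this flow is given below. All class and method names here (preprocess, segmentCharacters, classify, evaluate) are illustrative assumptions standing in for the stages of Fig. 1 and Fig. 2, not the actual M-Expression API; the stubs exist only to make the sketch self-contained.

import java.awt.image.BufferedImage;
import java.util.List;

// Hypothetical sketch of the back-end flow of Fig. 1 and Fig. 2.
public class MExpressionPipeline {

    public String process(BufferedImage input) {
        BufferedImage clean = preprocess(input);              // grayscale + blur + threshold
        List<BufferedImage> glyphs = segmentCharacters(clean);
        if (glyphs.isEmpty()) {
            return "Error: character detection failed";       // "Character detection error?" branch
        }
        StringBuilder expr = new StringBuilder();
        for (BufferedImage glyph : glyphs) {
            expr.append(classify(glyph));                     // neural-network OCR step
        }
        try {
            return String.valueOf(evaluate(expr.toString())); // compile + evaluate
        } catch (IllegalArgumentException e) {
            return "Error: " + e.getMessage();                // compilation-error branch
        }
    }

    // Stubs standing in for the stages described in the following subsections.
    private BufferedImage preprocess(BufferedImage img) { return img; }
    private List<BufferedImage> segmentCharacters(BufferedImage img) { return List.of(); }
    private char classify(BufferedImage glyph) { return '?'; }
    private double evaluate(String expr) { return 0.0; }
}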

2. Preprocessing

Once an image of a mathematical expression is loaded into the system, the image is pre-processed. The pre-processing step includes enhancing the image and converting it to grayscale. This step makes the image ready to be processed properly and reduces the chance of minor errors in the OCR process [1]. For this purpose, different filters are applied; in this work, the following filters are used for a better recognition rate.

1) RGB to Grayscale (Dimension Reduction)

The given input color image has three-dimensional pixel values (RGB), so the matrix obtained for the input image is a three-dimensional matrix. It is quite difficult to apply image-processing techniques to a three-dimensional matrix; hence, for suitable and smoother processing, the image is converted to gray format. A gray image has two-dimensional pixel values lying in [0, 255], according to the gray value of each pixel [2].

Fig. 3. Dimensional reduction using RGB to Gray (input image to grayscale image)
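A minimal sketch of this conversion is shown below, assuming the standard luminosity weighting (0.299 R + 0.587 G + 0.114 B); the paper does not state which weighting the system actually uses.

import java.awt.image.BufferedImage;

public class Grayscale {
    // Converts an RGB image to grayscale using the common luminosity weights
    // (an assumption; the exact conversion used by M-Expression is not published).
    public static BufferedImage toGray(BufferedImage src) {
        BufferedImage gray = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                int v = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b); // gray value in [0, 255]
                gray.setRGB(x, y, (v << 16) | (v << 8) | v);
            }
        }
        return gray;
    }
}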

2) Gaussian Blur filter

Once the grayscale image is obtained, filters are applied to smooth the image. Smoothing, also called blurring, is a simple and frequently used image-processing operation. There are many reasons to use such a filter; here, smoothing is used to reduce noise. Many smoothing filters exist, but the most common type is linear, in which an output pixel's value $g(i,j)$ is determined as a weighted sum of input pixel values, $g(i,j) = \sum_{k,l} f(i+k, j+l)\, h(k,l)$. The most useful filter, however, is the Gaussian filter. Gaussian filtering is done by convolving each point in the input array with a Gaussian kernel and then summing them all to produce the output array. Following is an image of a 1D Gaussian kernel.

Fig. 4. 1D Gaussian kernel

Assuming that the image is 1D, it can be noticed that the pixel located in the middle has the biggest weight, and the weights of its neighbors decrease as the spatial distance between them and the center pixel increases.

A 2D Gaussian can be represented as:

$$G_0(x, y) = A \, e^{-\frac{(x - \mu_x)^2}{2\sigma_x^2} - \frac{(y - \mu_y)^2}{2\sigma_y^2}}$$

where $\mu$ is the mean (peak) and $\sigma^2$ represents the variance (per each of the variables $x$ and $y$).

Fig. 5. Effect of Gaussian filter
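As a sketch of this filter on a raw gray array: the kernel below is built directly from the 2D Gaussian formula above (with the mean at the kernel center) and normalized so the weights sum to 1. Production code would typically use a separable 1D kernel or a library routine instead; the radius and sigma values are illustrative.

public class GaussianBlur {

    // Builds a (2r+1)x(2r+1) kernel from the 2D Gaussian formula and normalizes it.
    static double[][] kernel(int r, double sigma) {
        double[][] k = new double[2 * r + 1][2 * r + 1];
        double sum = 0;
        for (int y = -r; y <= r; y++)
            for (int x = -r; x <= r; x++) {
                k[y + r][x + r] = Math.exp(-(x * x + y * y) / (2 * sigma * sigma));
                sum += k[y + r][x + r];
            }
        for (double[] row : k)
            for (int i = 0; i < row.length; i++) row[i] /= sum;  // weights sum to 1
        return k;
    }

    // Convolves a grayscale image (values 0..255) with the Gaussian kernel.
    static int[][] blur(int[][] img, int r, double sigma) {
        double[][] k = kernel(r, sigma);
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                double acc = 0;
                for (int dy = -r; dy <= r; dy++)
                    for (int dx = -r; dx <= r; dx++) {
                        int yy = Math.min(h - 1, Math.max(0, y + dy));  // clamp at image borders
                        int xx = Math.min(w - 1, Math.max(0, x + dx));
                        acc += img[yy][xx] * k[dy + r][dx + r];
                    }
                out[y][x] = (int) Math.round(acc);
            }
        return out;
    }
}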

3) Adaptive threshold

Image binarization, or thresholding, is an important tool in image processing and computer vision used to separate the object pixels in an image from the background pixels. A number of methods have been proposed for image thresholding. A bi-level image is used as a pre-processing unit in several applications, since the use of binary images decreases the computational load of the overall application. These applications include document analysis, optical character recognition systems, scene matching, quality inspection of materials, etc.

The thresholding process computes the threshold value that differentiates object and background pixels. Under varying illumination and noise, thresholding can become a challenging job. A number of factors complicate the thresholding scheme, including ambient illumination, variance of gray levels within the object and the background, inadequate contrast, and object shape and size non-commensurate with the scene. A wrong selection of the threshold value may misinterpret a background pixel and classify it as object, and vice versa, resulting in overall degradation of system performance. In document analysis, thresholding is sensitive to noise, surrounding illumination, gray-level distribution, local shading effects, inadequate contrast, the presence of dense non-text components such as photographs, etc.

There are a number of important performance requirements that need to be considered while thresholding gray-level images. These include:

a) The loss of features after thresholding the input image should be zero or minimal.
b) Features (objects) with similar relative gray levels should have the same binary values in the processed output image.
c) The effect of noise or minor gray-level variations should be eliminated.

Fig. 6. Effect of Adaptive threshold
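The paper does not state which adaptive-threshold variant is used; the following sketch implements the common local-mean method as an assumption (a pixel is classified as object if it is darker than its neighborhood mean minus a constant C), which addresses the requirements above for documents with uneven illumination.

public class AdaptiveThreshold {

    // Mean-based adaptive thresholding: a pixel becomes foreground (0, black)
    // if it is darker than the mean of its (2r+1)x(2r+1) neighborhood minus c.
    static int[][] binarize(int[][] gray, int r, int c) {
        int h = gray.length, w = gray[0].length;
        int[][] bin = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long sum = 0;
                int count = 0;
                for (int dy = -r; dy <= r; dy++)
                    for (int dx = -r; dx <= r; dx++) {
                        int yy = y + dy, xx = x + dx;
                        if (yy >= 0 && yy < h && xx >= 0 && xx < w) {
                            sum += gray[yy][xx];
                            count++;
                        }
                    }
                double mean = (double) sum / count;
                bin[y][x] = (gray[y][x] < mean - c) ? 0 : 255;  // object vs. background
            }
        }
        return bin;
    }
}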

3. Character detection

Next, the system separates the image into individual character images. This is done by reading the pre-processed, grayscale-converted image pixel by pixel. A group of black pixels spaced by a minimum threshold is considered a character in the image. This can be achieved by moving a variable-sized window over the image pixel array and adjusting the window until a suitable group of pixels within a predefined threshold distance has been found [3][4].
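One simple way to realize this grouping is a column-projection scan over the binarized image: consecutive columns containing black pixels form a character, and a run of blank columns at least the threshold distance wide separates characters. The sketch below (with hypothetical names) shows that idea; the actual system's variable-sized-window adjustment is not published.

import java.util.ArrayList;
import java.util.List;

public class CharacterSegmenter {

    // Returns [start, end) column ranges of characters in a binarized image
    // (0 = black ink, 255 = background), splitting on gaps of blank columns
    // at least minGap pixels wide.
    static List<int[]> segmentColumns(int[][] bin, int minGap) {
        int h = bin.length, w = bin[0].length;
        boolean[] hasInk = new boolean[w];
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
                if (bin[y][x] == 0) { hasInk[x] = true; break; }

        List<int[]> chars = new ArrayList<>();
        int start = -1, gap = 0;
        for (int x = 0; x < w; x++) {
            if (hasInk[x]) {
                if (start < 0) start = x;   // a character begins
                gap = 0;
            } else if (start >= 0 && ++gap >= minGap) {
                chars.add(new int[] { start, x - gap + 1 });  // character ends at first blank column
                start = -1;
                gap = 0;
            }
        }
        if (start >= 0) chars.add(new int[] { start, w });    // trailing character
        return chars;
    }
}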

  4. Classification / OCR

After detecting each character, the next job is to classify the characters into clusters of predefined characters. Put simply, this step applies an OCR technique to recognize each character. For this purpose, we use a properly trained neural network.

The neural network consists of n input neurons, where n = (height) × (width) of an individual character from the detection step [5][6][7].

The number of output neurons is m, where m is the total number of detectable characters.

There can be any number of hidden neurons, but it was experimentally found that 11 to 15 neurons in the hidden layer give optimal results (in our case, 12) [8][9][10].

Multilayer perceptrons can easily learn tougher patterns and complex decision boundaries using the feed-forward and back-propagation algorithms.

Any input N = (N1, N2, N3, …, Nn) is taken as the input layer. Every connection between two neurons/perceptrons has an associated weight Wab (the weight of the connection from node a in one layer to node b in the next layer).

    Fig. 7. Neural network: Input layer, Hidden layer, Output layer

Similarly, there are output neurons M = (M1, M2, M3, …, Mm) that provide the output of the network. In our case, the input values (N1, N2, N3, …, Nn) are the pixels of the detected input character. The output layer consists of m nodes (neurons), each representing a character, and the output at any output node is either 1 or 0: 1 at the node of the detected character and 0 for the rest. These outputs are the ideal values and may deviate slightly in practice.

Each input node (neuron) is connected to the output nodes (neurons) via one or more hidden nodes (neurons).

The value of any node is the weighted sum of all the input values to the node (passed, in practice, through an activation function such as the sigmoid, which is what the $V(1 - V)$ factors in the error terms below assume).

Back propagation algorithm:

• Initialize the network weights.
• Until the termination condition is met:
{
For each training example:
{
Propagate the input forward through the network and compute the observed outputs.
Propagate the errors backward as follows:
For each network output unit o, calculate its error term
$\delta_o = V_o (1 - V_o)(T_o - V_o)$,
where $V_o$ is the value of node o and $T_o$ is its target output.
For each hidden unit k, calculate its error term
$\delta_k = V_k (1 - V_k) \sum_{o} W_{ko} \, \delta_o$,
where $V_k$ is the value of node k.
Finally, update each weight:
$W_{ab} \leftarrow W_{ab} + \Delta W_{ab}$, where $\Delta W_{ab} = \eta \, \delta_b V_a$.
}
}
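A compact Java sketch of one training update for a single-hidden-layer network with sigmoid units follows, implementing the error terms above; the momentum term discussed next is included. The array shapes and the use of the sigmoid are assumptions consistent with the $V(1 - V)$ factors in the error terms, not the project's actual training code.

public class Backprop {
    // One stochastic-gradient step for a single-hidden-layer sigmoid network,
    // implementing the error terms and weight-update rule given above,
    // with an optional momentum term (see "Important parameters" below).
    static void trainStep(double[] input, double[] target,
                          double[][] wIn,  double[][] dwIn,   // input->hidden weights + last deltas
                          double[][] wOut, double[][] dwOut,  // hidden->output weights + last deltas
                          double eta, double alpha) {
        int nHidden = wOut.length, nOut = wOut[0].length;

        // Forward pass: V = sigmoid(weighted sum of inputs).
        double[] vHidden = new double[nHidden];
        for (int k = 0; k < nHidden; k++) {
            double s = 0;
            for (int a = 0; a < input.length; a++) s += wIn[a][k] * input[a];
            vHidden[k] = 1.0 / (1.0 + Math.exp(-s));
        }
        double[] vOut = new double[nOut];
        for (int o = 0; o < nOut; o++) {
            double s = 0;
            for (int k = 0; k < nHidden; k++) s += wOut[k][o] * vHidden[k];
            vOut[o] = 1.0 / (1.0 + Math.exp(-s));
        }

        // Output error terms: delta_o = Vo (1 - Vo)(To - Vo).
        double[] deltaOut = new double[nOut];
        for (int o = 0; o < nOut; o++)
            deltaOut[o] = vOut[o] * (1 - vOut[o]) * (target[o] - vOut[o]);

        // Hidden error terms: delta_k = Vk (1 - Vk) * sum_o Wko * delta_o.
        double[] deltaHidden = new double[nHidden];
        for (int k = 0; k < nHidden; k++) {
            double s = 0;
            for (int o = 0; o < nOut; o++) s += wOut[k][o] * deltaOut[o];
            deltaHidden[k] = vHidden[k] * (1 - vHidden[k]) * s;
        }

        // Weight updates with momentum: dW(n) = eta * delta_b * Va + alpha * dW(n-1).
        for (int k = 0; k < nHidden; k++)
            for (int o = 0; o < nOut; o++) {
                dwOut[k][o] = eta * deltaOut[o] * vHidden[k] + alpha * dwOut[k][o];
                wOut[k][o] += dwOut[k][o];
            }
        for (int a = 0; a < input.length; a++)
            for (int k = 0; k < nHidden; k++) {
                dwIn[a][k] = eta * deltaHidden[k] * input[a] + alpha * dwIn[a][k];
                wIn[a][k] += dwIn[a][k];
            }
    }
}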

Important parameters in the above algorithm:

The two most important parameters in backpropagation are the learning rate and the momentum.

• Learning Rate: in the above algorithm, the weights of the nodes are updated by $\Delta W_{ab} = \eta \, \delta_b V_a$; the parameter $\eta$ is called the learning rate.

• Momentum: the weight-update rule can be modified so that the update in the n-th iteration is also affected by the previous iteration, in multiples of what is called a momentum factor $\alpha$. By adding this term, the update rule becomes $\Delta W_{ab}(n) = \eta \, \delta_b V_a + \alpha \, \Delta W_{ab}(n-1)$. Momentum takes values in the range $0 \le \alpha < 1$.

  5. Mathematical-Processing (Compiling + Evaluation)

For compiling a mathematical expression, we use a classical calculator stack approach that pushes the operators and operands one by one onto a stack; once everything is on the stack, the stack is evaluated one operator at a time by pushing and popping according to the precedence and associativity of each operator. In case of any error, the process is stopped and the error is returned.
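The sketch below shows this two-stack (operand/operator) approach for the basic binary operators, popping by precedence. It is a simplified illustration of the described scheme, not the project's evaluator: parentheses, function calls (abs, log10, …), and the error reporting shown in Table II would extend it, and '^' is treated as left-associative for brevity.

import java.util.ArrayDeque;
import java.util.Deque;

public class StackEvaluator {

    static int precedence(char op) {
        switch (op) {
            case '+': case '-': return 1;
            case '*': case '/': return 2;
            case '^':           return 3;
            default: throw new IllegalArgumentException("no operator " + op);
        }
    }

    static double apply(char op, double a, double b) {
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
            case '^': return Math.pow(a, b);
            default: throw new IllegalArgumentException("no operator " + op);
        }
    }

    // Evaluates expressions like "2^2+3*4" using an operand stack and an
    // operator stack, reducing whenever the incoming operator does not
    // have higher precedence than the operator on top of the stack.
    static double evaluate(String expr) {
        Deque<Double> values = new ArrayDeque<>();
        Deque<Character> ops = new ArrayDeque<>();
        int i = 0;
        while (i < expr.length()) {
            char c = expr.charAt(i);
            if (Character.isDigit(c) || c == '.') {
                int j = i;                                   // scan a whole number token
                while (j < expr.length()
                        && (Character.isDigit(expr.charAt(j)) || expr.charAt(j) == '.')) j++;
                values.push(Double.parseDouble(expr.substring(i, j)));
                i = j;
            } else {
                while (!ops.isEmpty() && precedence(ops.peek()) >= precedence(c)) {
                    double b = values.pop(), a = values.pop();
                    values.push(apply(ops.pop(), a, b));     // reduce by precedence
                }
                ops.push(c);
                i++;
            }
        }
        while (!ops.isEmpty()) {                              // drain remaining operators
            double b = values.pop(), a = values.pop();
            values.push(apply(ops.pop(), a, b));
        }
        return values.pop();
    }
}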

III. STANDARD DATASET FOR EXPERIMENT

For testing the above-mentioned procedure, a dataset consisting of 7 different fonts was used.

TABLE I. FONTS USED FOR TESTING

Shruti
Candara
Times New Roman
Carlito
Consolas
System
Book Antiqua

All the mentioned fonts were tested on characters in the ASCII sequence from 32 (space) to 126 (~).

Below are a few samples of the test dataset (System font, Candara font, Book-Antiqua font).

Fig. 8. Images used for testing
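As an illustration of how such a training set can be produced, the following sketch renders each printable ASCII character (32 to 126) in a given font to a small grayscale tile using java.awt. The 32x32 tile size, font size, and glyph position are assumptions, not the dimensions used by the actual system.

import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class DatasetGenerator {

    // Renders printable ASCII characters 32..126 in the given font,
    // one 32x32 grayscale tile per character (sizes are illustrative).
    static BufferedImage[] renderFont(String fontName) {
        BufferedImage[] tiles = new BufferedImage[127 - 32];
        Font font = new Font(fontName, Font.PLAIN, 24);
        for (char c = 32; c <= 126; c++) {
            BufferedImage img = new BufferedImage(32, 32, BufferedImage.TYPE_BYTE_GRAY);
            Graphics2D g = img.createGraphics();
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, 32, 32);                // white background
            g.setColor(Color.BLACK);
            g.setFont(font);
            g.drawString(String.valueOf(c), 6, 24);  // black glyph near the baseline
            g.dispose();
            tiles[c - 32] = img;
        }
        return tiles;
    }

    public static void main(String[] args) {
        // The seven fonts listed in Table I.
        String[] fonts = { "Shruti", "Candara", "Times New Roman", "Carlito",
                           "Consolas", "System", "Book Antiqua" };
        for (String f : fonts) renderFont(f);
    }
}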

With the setup as mentioned above and the 7-font dataset, the results were quite impressive. The OCR process reaches an efficiency of approximately 91%, and the mathematical evaluation is quite effective, calculating all the operators thrown at it.

It is also important to note that for certain fonts with similar-looking characters, the error rate was high. For example, in the Broadway and Britannic fonts, O (capital O) and 0 (zero) look almost the same, and were therefore recognized erroneously.

Fig. 9. Confusion for Zero & O in different fonts (Britannic and Broadway)

Procedure for the app:

Step 1: Open the application.
Step 2: Select the expression image (capture via camera or browse the gallery).
Step 3: Crop the image to select the proper expression.
Step 4: Wait for the processing.
Step 5: Check the output.

Fig. 10. App screenshots (1)

Fig. 11. App screenshots (2)

TABLE II. EXPERIMENTAL RESULTS: INPUT IMAGE AND EVALUATED EXPRESSIONS

Detected Expression | Evaluated Expression
2^2 | 4
2^2 | 4
abs(-90) | 90
loglo( 100 ) | Error: no operator loglo
log10(100) | 2
max(100,56) | 100
PI*10*10 | 314
Pi*10*10 | 314
sin(90) | 1
tan(90) | 16331239350000000 (infinite)
tanh(90) | 1

(The first column of the original table shows the input images; they are omitted here.)

Fig. 12. Efficiency versus number of fonts in the dataset (x-axis: 5 to 78 fonts; y-axis: efficiency, 0.00 to 100.00)

As fonts were added to the dataset, we were able to achieve a stable efficiency rate of about 90% to 92%. Up to a certain number of fonts the efficiency plateaued, and further increases led to a decline in efficiency due to ambiguity in classification.

REFERENCES

1. P. P. Roy, J. Lladós, U. Pal, Text/graphics separation in color maps, in: Computing: Theory and Applications, 2007 (ICCTA '07), International Conference on, IEEE, 2007, pp. 545-551.

2. L. Peng, J. Li, A facial expression recognition method based on quantum neural networks, in: Proceedings of ISKE 2007, October 2007.

3. R. Singh, C. Yadav, P. Verma, V. Yadav, Optical character recognition (OCR) for printed Devnagari script using artificial neural network, International Journal of Computer Science & Communication 1 (1) (2010) 91-95.

4. M. S. Uddin, T. Rahman, U. S. Busra, M. Sultana, Automated extraction of text from images using a morphology based approach, International Journal of Electronics & Informatics 1 (1).

5. F. Günther, S. Fritsch, neuralnet: Training of neural networks, The R Journal 2 (1) (2010) 30-38.

6. C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.

7. S. B. Maind, P. Wankar, Research paper on basics of artificial neural network, International Journal on Recent and Innovation Trends in Computing and Communication 2 (1) (2014) 96-100.

8. C. Peterson, T. Rögnvaldsson, L. Lönnblad, JETNET 3.0: a versatile artificial neural network package, Computer Physics Communications 81 (1) (1994) 185-220.

9. K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Networks 4 (2) (1991) 251-257.

10. R. Lippmann, An introduction to computing with neural nets, IEEE ASSP Magazine 4 (2) (1987) 4-22.
