Decision Support Through Deep Learning: Application To Image Classification and Recognition

DOI : 10.17577/IJERTV11IS050074


Glory Ndele Wum Edmond (1), Simon Ntumba Badibanga (2).

(1) Senior lecturer, Department of Mathematics and Computer Science

(2) Professor, Department of Mathematics and Computer Science.

Department of Mathematics and Computer Science, Faculty of Sciences, University of Kinshasa, Kinshasa, Democratic Republic of Congo. February 2022

Abstract:- By extracting the relevant features with four convolution layers placed in front of a multilayer perceptron, we built a Convolutional Neural Network (CNN) model for facial recognition, using an existing CNN architecture whose parameters are trained by the gradient backpropagation algorithm. On a database of 1672 images, split into 80% for training (1337 images) and 20% for testing (335 images), the model predicts with a reliability of 99.8% while keeping the error at 0.02%. The model was trained for 20 epochs.

Keywords :- MLP, CNN, Neurons, Deep Learning, ReLU, Flatten, Softmax.


    Deep learning has revolutionized machine learning in recent years. While the first striking results were obtained mainly in image analysis, current work in deep learning now covers all types of data and almost all types of processing. Its impact on data science and knowledge extraction is considerable.

    Observation databases describe a particular domain (animals, fruit, patients, genes, ...) whose items are grouped into several classes. Automatic image classification is an application of pattern recognition that consists of automatically assigning an image to a class using a classification system. [2, 6, 8]

    The problem addressed in this research is to recognize images, restructure them, and apply techniques for searching training and test images in order to facilitate decision-making, using a convolutional neural network whose classifier is a multilayer perceptron.

    This problem leads to the hypothesis that images can be classified into several classes to allow better processing, which helps deep learning optimization in image search, in the recognition of images similar to a query image, and in achieving good average performance over all images.

    The objective of this research is to facilitate image search in automatic classification through automatable methods that allow a machine to evolve through a learning process and thus perform tasks that are difficult or impossible with more conventional algorithmic means.

    We used the analytical method, which made it possible to analyze the different convolutional neural network (CNN) models, a type of deep neural network, together with the documentary technique. With them we were able to build this model: a multilayer perceptron trained by the gradient backpropagation algorithm on an existing CNN architecture.


    Machine learning is a technique through which a machine acquires new knowledge for future use. It is a field of data science that comprises two types of learning and builds a machine $y = f(x)$ from a set of observations $D = \{(x_i, y_i),\ i = 1, \dots, n\}$. This machine can be built with a decision tree, a neural network, a Support Vector Machine (SVM), a Bayesian network, or a random forest. [2,3,5,6]

      1. Unsupervised learning

        Unsupervised learning, also known as automatic classification, consists of bringing out information hidden in a large volume of data in order to detect hidden trends. The most appropriate techniques for this type of learning are clustering, principal component analysis, and correspondence factor analysis. Automatic classification produces a partition of individuals that obeys the law of intra-class homogeneity and inter-class heterogeneity. It relies on two main families of methods: classification by partitioning and hierarchical classification. [3,10,11,12,14]

        1. Partitioning classification

          Partitioning classification consists of segmenting a heterogeneous population into homogeneous subgroups so as to minimize intra-class inertia and maximize inter-class inertia. In the mathematical sense, it consists of partitioning a heterogeneous population $X$ of $n$ individuals into $K$ homogeneous classes $C_1, C_2, \dots, C_K$ such that $P = \{C_k,\ 1 \le k \le K\}$ satisfies:

          $$C_k \neq \emptyset \ (1 \le k \le K); \qquad C_k \cap C_l = \emptyset \ (k \neq l); \qquad \bigcup_{k=1}^{K} C_k = X$$

          Inertia measures the dispersion around the mean, that is, the distance between the individuals ($x_i$) and the center of gravity ($g$). According to Huygens' theorem, Total Inertia = Intra-class Inertia + Inter-class Inertia. [7, 8, 9]

          $$I_{total} = \frac{1}{n}\sum_{i=1}^{n} d^2(x_i, g); \qquad I_{intra} = \frac{1}{n}\sum_{j=1}^{K}\sum_{x_i \in C_j} d^2(x_i, g_j); \qquad I_{inter} = \sum_{j=1}^{K} P_j\, d^2(g_j, g); \qquad P_j = \frac{|C_j|}{n}$$

          Here $P_j$ is the weight of class $C_j$, that is, the ratio of the number of individuals in the class to the total number; $g_j$ is the center of gravity of class $C_j$; and $d(x_i, g_j)$ is the distance of an individual from the center of gravity of its class.

          Figure 2: Nested partition
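The decomposition of total inertia into intra-class and inter-class inertia can be checked numerically. The sketch below is only an illustration: it assumes the squared Euclidean distance, and the function name `inertia_decomposition` and the toy points are hypothetical, not taken from the paper.

```python
import numpy as np

def inertia_decomposition(X, labels):
    """Return (total, intra, inter) inertia for points X and their class labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = len(X)
    g = X.mean(axis=0)                      # global center of gravity
    total = np.sum((X - g) ** 2) / n
    intra = 0.0
    inter = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        g_c = members.mean(axis=0)          # center of gravity of the class
        weight = len(members) / n           # P_j = |C_j| / n
        intra += np.sum((members - g_c) ** 2) / n
        inter += weight * np.sum((g_c - g) ** 2)
    return total, intra, inter

# Hypothetical toy data: two well-separated groups in the plane.
points = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0], [9.0, 10.0]]
classes = [0, 0, 1, 1, 1]
total, intra, inter = inertia_decomposition(points, classes)
print(total, intra, inter)
```

A good partition is one where `intra` is small compared with `inter`, which is the criterion the partitioning methods try to optimize.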

          The most commonly used partitioning (segmentation) algorithms are:

          • The K-Means algorithm (K-means or moving centers);

          • Dynamic clouds (K-nuclei), which generalizes K-Means;

          • Fuzzy classification;

          • Gaussian mixture model. [12,14]

    2.1.2. Hierarchical classification.

    Hierarchical classification is part of the unsupervised learning methods and includes two main families of methods: ascending hierarchical classification and descending hierarchical classification. Individuals are represented by a tree structure called the dendrogram or by the nested partition. [1,5,7]

    Figure 1: Dendrogram (eight individuals, 1 to 8, merged successively into the groups (1, 2), (3, 4), (5, 6), (1, 2, 3, 4), (1, ..., 6), (1, ..., 7) and (1, ..., 8))

      2. Supervised learning

        Supervised learning involves extrapolating new knowledge from a representative sample obtained from unsupervised learning. This learning is done on labeled data, and its end result is a classifier. [6,9,13]

        Figure 4: Supervised learning (diagram labels: input data, supervisor, classifier, true class, estimated class)
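As an illustration of supervised learning on labeled data, here is a minimal nearest-centroid classifier. It is not the classifier used in the paper; the function names `fit_centroids` and `predict` and the toy data are hypothetical and serve only to show the "learn from labeled examples, then estimate the class of new inputs" cycle.

```python
import numpy as np

def fit_centroids(X, y):
    """Learn one center of gravity per labeled class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids):
    """Assign each input to the class whose centroid is nearest."""
    X = np.asarray(X, dtype=float)
    # Squared distances, shape (n_samples, n_classes).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]

# Hypothetical labeled training data (true classes 0 and 1).
X_train = [[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]]
y_train = [0, 0, 1, 1]
classes, centroids = fit_centroids(X_train, y_train)

estimated = predict([[2.0, 1.0], [8.0, 9.0]], classes, centroids)
print(estimated)
```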

    Deep learning models such as convolutional neural networks are very similar to ordinary neural networks. They exploit one of the important characteristics of images, namely the spatial distribution of the samples (pixels). They consist successively of convolution layers, pooling layers, and fully connected layers.

    The term deep learning refers to the many layers that need to be learned during training. Convolutional neural networks are not directly inspired by biology and rely on learning algorithms that can differ fundamentally from those of biological brains. However, they learn internal representations that strongly resemble what one imagines the representations in the visual cortex to be.

    Consider the classical architecture of a convolutional neural network: an image is provided as input and is convolved with filters (first convolution layer), and the resulting activation maps are pooled and concatenated.

    3.1. Layers of Deep Learning

    The major layers in a convolutional neural network are the convolution layer, the pooling layer, and the fully connected layers. [10,12,14]

    1. Convolutional Layers

      Convolution layers are a set of filters that are learned during training. The size and number of these filters are defined a priori. [14]

    2. Pooling Layers

      Pooling layers are predefined functions that reduce the number of parameters to learn in later layers while expanding the receptive field. They operate independently at each depth of the network and have no weights to train. One of the classic operations is the maximum function, where only the maximum of a neighborhood of N pixels is retained in the pooling layer.

    3. Fully Connected Layers

      The neurons in these layers are connected to all the neurons in the previous activation maps.

      In short, the components of a convolutional neural network are generally:

      • Layers of filters convolved over the different channels;

      • Pooling: maximum value (max pool) or average (avg pool) over a given window of the convolved map;

      • Transfer functions: ReLU, etc.;

      • Near the output, fully connected layers (as in a multilayer perceptron).

    3.2. Contribution of Deep Learning

      Deep learning handles situations that earlier classical algorithms cannot. It makes it possible to:

      • Improve on traditional algorithms in the processing of artificial intelligence data;

      • Process large amounts of data such as Big Data;

      • Adapt to any type of problem;

      • Extract features automatically. [3,11,13]

    3.3. Diagram of Deep Learning

      To properly build a deep learning application, the following scheme should be respected:

      Figure 5: Diagram of Deep learning (boxes: specify the problem well; set up a database; construct or adjust the network structure; use the algorithm; estimate the performance)

    3.4. Convolutional operation

      Let $X$ be an image of size $n \times m$ and $f$ a filter of size $h \times w$, with $h$ and $w$ odd. The convolution of $X$ by $f$ at the pixel of coordinates $(i, j)$, $i \in [0, n-1]$, $j \in [0, m-1]$, is defined by:

      $$(X \star f)[i, j] = \sum_{k=-\lfloor h/2 \rfloor}^{\lfloor h/2 \rfloor}\ \sum_{l=-\lfloor w/2 \rfloor}^{\lfloor w/2 \rfloor} X[i + k,\, j + l]\; f\big[k + \lfloor h/2 \rfloor,\, l + \lfloor w/2 \rfloor\big]$$

      If $(i + k, j + l)$ falls outside the image $X$ (for example $i + k < 0$), we take $X[i + k, j + l] = 0$; this is called zero padding. Other paddings are possible, for example taking the value of the nearest pixel. We can also restrict the calculation to the pixels $[i, j]$ for which $(i + k, j + l)$ always lies inside the image; the output dimension is then reduced. [7,8,10]

      The filter $f$ is also called the kernel. In a convolution neuron, the filters are not chosen; they are learned, because they are trainable parameters of the network.

      For an input $X$ with $c$ channels: when $c = 1$ the image is in gray levels, and when $c = 3$ it is a color image; in the intermediate layers, $c$ corresponds to the number of filters of the previous layer. A convolution neuron applies its filter across all $c$ channels of its input.
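The convolution with zero padding described in this section can be sketched in a few lines of NumPy. This is a minimal single-channel illustration under the assumption of an odd-sized filter; the function name `convolve2d_zero_pad` and the sample image are hypothetical. A real CNN layer would also sum over channels, add a bias, and learn the filter values.

```python
import numpy as np

def convolve2d_zero_pad(X, f):
    """Slide the filter f over image X, treating pixels outside X as 0."""
    X = np.asarray(X, dtype=float)
    f = np.asarray(f, dtype=float)
    h, w = f.shape
    ph, pw = h // 2, w // 2
    # Zero padding: surround the image with ph rows and pw columns of zeros.
    padded = np.pad(X, ((ph, ph), (pw, pw)), mode="constant", constant_values=0.0)
    n, m = X.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            window = padded[i:i + h, j:j + w]   # neighborhood centered on (i, j)
            out[i, j] = np.sum(window * f)
    return out

image = np.arange(1, 10).reshape(3, 3)          # hypothetical 3x3 gray-level image
box = np.ones((3, 3))                           # filter that sums the neighborhood
identity = np.zeros((3, 3))
identity[1, 1] = 1.0                            # filter that copies the center pixel

summed = convolve2d_zero_pad(image, box)
copied = convolve2d_zero_pad(image, identity)
print(summed)
```

With zero padding the output keeps the size of the input; computing only the positions where the window lies fully inside the image would instead shrink the output, as noted in the text.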


    Pooling is an operation used to reduce the dimension; it looks for coarser details of larger structures in the image. With max pooling of size l, we take the maximum element of each sub-array of size l; with sum pooling of size l, we sum all the elements of each sub-array of size l.

    Mean Pooling computes the sum of all the values in a region and divides it by the number of values to obtain the representative average of that batch of pixels:

    $$\begin{bmatrix} 20 & 74 & 17 & 52 \\ 15 & 35 & 21 & 30 \\ 26 & 34 & 15 & 25 \\ 12 & 60 & 40 & 20 \end{bmatrix} \longrightarrow \begin{bmatrix} 36 & 30 \\ 33 & 25 \end{bmatrix}$$

    By applying Mean Pooling, the starting matrix is divided into square regions of order 2; for each region we compute the average of its values (144/4, 120/4, 132/4, 100/4) to form the Mean Pooling matrix.

    Sum Pooling computes the sum of all the values in each region:

    $$\begin{bmatrix} 20 & 74 & 17 & 52 \\ 15 & 35 & 21 & 30 \\ 26 & 34 & 15 & 25 \\ 12 & 60 & 40 & 20 \end{bmatrix} \longrightarrow \begin{bmatrix} 144 & 120 \\ 132 & 100 \end{bmatrix}$$

    By applying Sum Pooling, the starting matrix is divided into square regions of order 2; for each region we compute the sum of all the values to form the Sum Pooling matrix.

    1. Flattening

      Figure 6. Flattening (the matrices are unrolled, element by element, into a single numbered column vector)

    Pooling as described applies to square matrices; for rectangular matrices, a staggering (echelon) algorithm is needed to bring the matrix to a square form.

    3.4.1. Matrix staggering algorithm (reduction to reduced staggered form).

    In general, let $M \in \mathcal{M}_{m,n}(K)$. There exist a staggered matrix $H \in \mathcal{M}_{m,n}(K)$ and an invertible matrix $V \in GL_n(K)$ such that $H = M \cdot V$.

    This reduced staggered form is obtained by elementary operations on the columns, more precisely by the following algorithm:

    1. Input: $M \in \mathcal{M}_{m,n}(K)$

    2. Initialization: H = M, V = $I_n$ (identity matrix of order n), j = 1

    3. Main loop: for i = 1 to m:

      a) Finding a pivot:

        If $H_{i,j} = 0$:

        If there is $k > j$ with $H_{i,k} \neq 0$:

        1. X = P(n), the permutation matrix that exchanges columns j and k

        2. V = V.X

        3. H = H.X

          Else next step.

      b) Set the pivot to 1:

        1. X = the dilation matrix that multiplies column j by $1 / H_{i,j}$

        2. V = V.X

        3. H = H.X

      c) Reduce and stagger: set to 0 the other coefficients of the pivot row; for s ranging from 1 to j - 1, then from j + 1 to n, loop {

        i. X = the transvection matrix that adds $-H_{i,s}$ times column j to column s

        ii. H = H.X

        iii. V = V.X

      }

      d) j = j + 1

      e) Exit when j exceeds n

      End loop.

    4. Output: H, the staggered form of M, such that H = M.V
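The column reduction outlined above can be sketched with NumPy as follows. This is an illustrative reading of the algorithm, not the authors' code: the function name `column_echelon`, the tolerance `tol`, the 0-based indexing, and the sample matrix are assumptions. Each elementary operation is applied as a right multiplication by a matrix X, so that H = M·V holds at every step.

```python
import numpy as np

def column_echelon(M, tol=1e-12):
    """Reduce M to a reduced column echelon form H and return (H, V) with H = M @ V."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    H = M.copy()
    V = np.eye(n)
    j = 0                                   # next pivot column (0-based)
    for i in range(m):
        if j >= n:
            break
        # a) Finding a pivot in row i, at or to the right of column j.
        if abs(H[i, j]) <= tol:
            candidates = [k for k in range(j + 1, n) if abs(H[i, k]) > tol]
            if not candidates:
                continue                    # no pivot in this row: next row
            k = candidates[0]
            X = np.eye(n)
            X[:, [j, k]] = X[:, [k, j]]     # permutation exchanging columns j and k
            H, V = H @ X, V @ X
        # b) Set the pivot to 1 (dilation of column j).
        X = np.eye(n)
        X[j, j] = 1.0 / H[i, j]
        H, V = H @ X, V @ X
        # c) Cancel the other coefficients of the pivot row (transvections).
        for s in range(n):
            if s != j:
                X = np.eye(n)
                X[j, s] = -H[i, s]
                H, V = H @ X, V @ X
        j += 1
    return H, V

M = np.array([[2.0, 4.0, 1.0],
              [1.0, 2.0, 3.0]])             # hypothetical rectangular matrix
H, V = column_echelon(M)
print(H)
```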

    In our research, we apply Max Pooling to square matrices, as defined in section 3.4. Max Pooling (the max subsampling operation) takes a region and outputs its maximum value [12,13,14]; an illustrative example is given below. The Flatten operation then turns the resulting matrices into a single column vector.


A convolutional network (ConvNet) is a stack of these layers, which induces local translation-invariance properties. These properties are essential for recognizing characters and, more generally, images that can be seen from different angles. It is in this area that the most spectacular results were obtained, and the name deep learning was put forward to accompany the growing success and the associated hype.

$$\begin{bmatrix} 20 & 74 & 17 & 52 \\ 15 & 35 & 21 & 30 \\ 26 & 34 & 15 & 25 \\ 12 & 60 & 40 & 20 \end{bmatrix} \longrightarrow \begin{bmatrix} 74 & 52 \\ 60 & 40 \end{bmatrix}$$

By applying Max Pooling, the starting matrix is divided into square regions of order 2; for each region we keep the maximum value to form the Max Pooling matrix.
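The pooling and flattening operations can be reproduced on the 4×4 starting matrix of the example. The sketch below is an illustration with NumPy; the helper name `pool2d` is hypothetical, and it assumes non-overlapping square regions whose size divides the matrix dimensions.

```python
import numpy as np

def pool2d(matrix, size, reducer):
    """Split the matrix into size x size regions and reduce each one."""
    A = np.asarray(matrix, dtype=float)
    rows, cols = A.shape
    if rows % size or cols % size:
        raise ValueError("matrix dimensions must be multiples of the region size")
    # Axes: (block row, row inside block, block column, column inside block).
    blocks = A.reshape(rows // size, size, cols // size, size)
    return reducer(blocks, axis=(1, 3))

start = np.array([[20, 74, 17, 52],
                  [15, 35, 21, 30],
                  [26, 34, 15, 25],
                  [12, 60, 40, 20]])

max_pooled = pool2d(start, 2, np.max)     # [[74, 52], [60, 40]]
sum_pooled = pool2d(start, 2, np.sum)     # [[144, 120], [132, 100]]
mean_pooled = pool2d(start, 2, np.mean)   # [[36, 30], [33, 25]]

# Flatten: unroll the pooled matrix into a single column vector.
flattened = max_pooled.reshape(-1, 1)
print(flattened.ravel())
```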



Figure 7. Shape recognition steps.

  1. The physical world: presents the object in its real environment and in its normal form, before any processing.

  2. Coding: the shape is observed in its environment in analog form and then represented in discrete form so that it can be processed by the system. The object is encoded as binary sequences.

  3. Preprocessing: standardizes the coded information so as to keep only the information essential to the system.

  4. Analysis: extracts the features that characterize the represented object, in order to establish the parameters on which learning will be based.

  5. Learning: memorizes and exploits the knowledge resulting from the analysis parameters. Once the model has been trained, new data are used for the test that leads to prediction. It is during this stage that we learn, in practice, with what coefficient the new model predicts.

  6. Decision: this is the image recognition stage, in which the system establishes a definitive classification. It is also the optimization step, because it is here that the object is defined exactly.

  7. Interpretation: this is the prediction on the image obtained, that is, the accurate prediction of the object. [3,6,11,14]

    1. Principles and experiments of the facial recognition system.

      The facial recognition problem is stated as follows: given an image of a face, find the identity of the corresponding person. Face recognition is part of the field of pattern recognition, whose purpose is to classify objects of interest into a number of categories or classes. Objects of interest are usually called models or patterns; in our case they are feature vectors, and the classes represent the different people. The classification procedure is therefore applied to feature vectors.

      Recognition is the core of this system: it compares the feature vector of the input face with those of the database. Since we want to model a function of the human brain and we have a classification problem, we chose to implement a neural network, which simulates the biological neural network. In the recognition phase of our system, we used two types of neural network: a Multi-Layer Perceptron (MLP) and a convolutional neural network.

      1. Structure of the application.


V. Application.

    1. Face Database.

      Generally, face databases are adapted to the needs of a few specific recognition algorithms. Our database consists of 1672 face images divided into two classes: [2,4,13,14]

      • The "target" class, containing 1113 images of the person to be recognized (Intouche).

      • The "non-target" class, containing 559 images of faces of people other than Intouche.

    2. Separation of Databases.

The implementation of our facial recognition system required two image sets: one for training and the other for testing the effectiveness of the trained model. [4,11]

  1. Training images

    Of the 1672 images in the database, we reserved 80% (1337 images) for training.

  2. Test images

    The remaining 20% of the 1672 images (335 images) were used for testing.
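The 80/20 separation can be sketched with the Python standard library. The paper does not say how the images were assigned to each set, so the random shuffle, the fixed seed, and the function name `split_indices` below are assumptions made only for illustration.

```python
import random

def split_indices(n_items, train_fraction, seed=0):
    """Shuffle item indices and cut them into a training part and a test part."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)     # reproducible shuffle (assumed)
    n_train = int(n_items * train_fraction)  # truncates 1337.6 to 1337
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_indices(1672, 0.8)
print(len(train_idx), len(test_idx))         # 1337 335
```

Every image index ends up in exactly one of the two sets, so no test image is seen during training.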

Figure 8. Structure of image recognition (diagram labels include: classifier, convolutional neural network, face not recognized)

Figure 8 shows the image used in our application to test our convolutional neural network.

      1. Application architecture.

        The architecture we used is a convolutional neural network whose classifier is a multilayer perceptron.

        This architecture uses four convolution layers that filter the image before its pixel matrix is vectorized (flattened). This gave a good prediction: with these four convolutional layers, the model predicts with 99.8% reliability and an error of 0.02%. These results are presented in the prediction curves. [3,10,13,14]



        Figure 9. Architecture of our CNN

      2. Backpropagation algorithm.

        The backpropagation-based procedure used in this application is carried out in three steps:

        1. Resizing the input images to the 50×50×1 format;

        2. Building a CNN structure with four convolutional layers, each followed by a ReLU correction layer: the first layer uses a depth of 64 neurons (filters), the second 32, the third 16, and the last 8, all with a kernel of size 3×3. Max pooling of size 2×2 is applied after every two convolutional layers;

        3. After all the features are extracted, the Flatten operation flattens the feature maps into a vector.

          • Dense: specifies the number of neurons wanted in the layer; it always takes as input the flattened output of the previous layer. [13,14]

          • Softmax: a classifier that outputs a probability distribution, that is, the sum of all the output probabilities must equal 1.
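The two activation functions named in this section can be written in a few lines of NumPy. This is a generic sketch rather than the paper's code, and the two-class scores used in the example are hypothetical.

```python
import numpy as np

def relu(x):
    """ReLU correction: negative activations are replaced by 0."""
    return np.maximum(0.0, np.asarray(x, dtype=float))

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()          # shift for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

activated = relu([-2.0, 0.5, 3.0])
probabilities = softmax([2.0, 0.5])          # hypothetical scores for two classes
print(activated, probabilities, probabilities.sum())
```

The predicted class is the index of the largest probability, for example `int(np.argmax(probabilities))`.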

      3. Sequential model testing

        Figure 10. Structure of the implementation of the CNN model architecture sequentially.

        As the summary shows, the sequential model has four convolution layers using the ReLU activation function, and its total number of parameters is 57,440. We then run prediction on the training data to see how the model performs over 20 epochs.
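The reported total of 57,440 parameters can be cross-checked by hand. The sketch below counts weights and biases layer by layer for a 50×50×1 input and four 3×3 convolutions with 64, 32, 16 and 8 filters, with 2×2 max pooling after every two convolutions. Three settings are assumptions that the text does not state: no padding on the convolutions, a hidden dense layer of 50 units, and a 2-class softmax output. They are chosen because, together, they reproduce the reported total.

```python
def conv_params(in_channels, filters, kernel=3):
    """Weights plus one bias per filter of a kernel x kernel convolution."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus one bias per neuron of a fully connected layer."""
    return (inputs + 1) * units

def count_parameters(input_size=50, input_channels=1,
                     conv_filters=(64, 32, 16, 8),
                     hidden_units=50, n_classes=2):
    size, channels, total = input_size, input_channels, 0
    for index, filters in enumerate(conv_filters, start=1):
        total += conv_params(channels, filters)
        size -= 2                 # 3x3 kernel without padding (assumed)
        channels = filters
        if index % 2 == 0:
            size //= 2            # 2x2 max pooling after every two conv layers
    flat = size * size * channels # length of the flattened vector
    total += dense_params(flat, hidden_units)    # hidden Dense layer (assumed 50 units)
    total += dense_params(hidden_units, n_classes)  # softmax output (assumed 2 classes)
    return total, flat

total_params, flat_length = count_parameters()
print(total_params, flat_length)  # 57440 648
```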

        Figure 11. The result found when evaluating the CNN model with 20 epochs.

        As stated above, with 1337 training images and 20 epochs the model predicts with a coefficient of 99.8%, from which we conclude that the model optimizes our learning.

      4. Overview of the GUI

        Figure 12. Testing the application with a Java interface.

        The GUI was created in JavaScript to test the model.

        With this interface, we confirm the result of our model, which validates the research result.

      5. Presentation of results.

Figure 13. Increased accuracy with the number of epochs.

Figure 14. Reduction of errors with the number of epochs.

By analyzing the results in Figures 13 and 14, we find that the training and validation accuracy increases with the number of epochs and then falls again, which means that once the accuracy has plateaued, the model no longer learns new information. If the accuracy decreases, the model needs more information to learn, so the number of epochs must be increased, and vice versa. Similarly, the training and validation error decreases with the number of epochs.


In summary, image classification is an important task in computer vision, object recognition, and machine learning. The objective of this work was to build an application that classifies an image database into a set of classes in order to recognize the objects in the images, in this case for facial recognition. To carry out this classification with deep learning, we used the learning method that has shown its performance in recent years, and we chose the gradient backpropagation algorithm as the classification method; this choice is justified by the simplicity and efficiency of the method.

Thanks to the Python language in the Anaconda environment (TensorFlow), we were able to implement our model, which is also served by an application created in JavaScript on NetBeans 8.4. Each time, the parameters of the model are trained with the new data. The results obtained during the test phase confirm the effectiveness of our approach.


[1] Antoine Cornuéjols, Laurent Miclet, Yves Kodratoff, Apprentissage Artificiel : Concepts et Algorithmes, deuxième tirage, Eyrolles, 2003.

[2] Achraf Cherti, Jargon Informatique, logiciel, version 1.3.6, avril 2006.

[3] Christian Gagné, Réseau de neurones à convolution, éd. Laval, 2008.

[4] Gérard Swinnen, Apprendre à Programmer en Python 3, 2010.

[5] Karpathy A., Convolutional neural networks for visual recognition. Neural networks, 2016.

[6] Philippe Preux, Fouille de données, Lille, 2009.

[7] Richard O. Duda, Peter E. Hart, David G. Stork, Pattern Classification, Wiley-Interscience, 2001.

[8] Thibault Allançon, Introduction à l'apprentissage artificiel, Paris, 2016.

[9] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag (200).

[10] Lee, H., Grosse, R., Ranganath, R., and Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 609-616. ACM.

[11] Marc Bidan, Intelligence artificielle : les défis de l'apprentissage profond, Université de Nantes, 2018-2019.

[12] Samuel, A. L. (2000). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 44(1.2): 206-226.

[13] S. Dib, Identification des individus multimodals : application sur les images du visage, thèse de magister, Université de Mohamed Boudiaf, Oran, 2015.

[14] M. Lemmouchi, Identification des visages humains par réseaux de neurones, thèse de magister, Université de Batna 2, 2013.
