A Face Recognition System using Convolutional Neural Network for Extraction-Grey Wolf Optimizer based Modified Dual Linear Collaborative Discriminant Regression Classification

DOI : 10.17577/IJERTCONV10IS11146


Sangamesh Hosgurmath, Visvesvaraya Technological University, Belagavi, India

Viswanatha Vanjre Mallappa

  1. K. E. Societys S. L. N College of Engineering Raichur,India

    Nagaraj B Patil

    Government Engineering College Gangavati,India

    Vishwanath P

    1. K. E. Societys S. L. N College of Engineering Raichur,India

      Abstract: Nowadays, many applications demand biometrics, and face recognition is one of the principal methods used in biometric systems. Face-based biometrics have several advantages: they are contactless, and faces differ from one person to another. Face recognition is now used wherever biometrics are used, for example in criminal identification, access control, security applications and identity verification.

      However, face recognition faces many challenges because the appearance of a face depends on many intra-personal variations, such as changes in expression, changes due to ageing and changes in illumination. These factors make face recognition challenging.

      To address these challenges, the proposed algorithm fuses a convolutional neural network (CNN), the Grey Wolf Optimizer (GWO) and a modified linear collaborative discriminant regression classification algorithm to recognize face images. The CNN is adopted to extract discriminant features automatically.

      First, the CNN is trained with the data split into 60% training data and 40% testing data (training number 6). With this setting, more discriminant and effective features are extracted using a minimal number of CNN layers. The Grey Wolf Optimizer then produces the optimized weights. Finally, the modified Dual Linear Collaborative Discriminant Regression Classifier (DLCDRC) uses the weights generated by GWO together with a combination of a loss function and a fused distance grade.

      This combination of feature extraction, optimization and classification improves the recognition rate while reducing the recognition time and the number of iterations, so that face images are recognized accurately.

      In our experiments, the proposed CNN-GWO-DLCDRC method achieved improvements of approximately 4%, 8.5% and 9% in face recognition (FR) accuracy on the YALE B, ORL and extended YALE B face datasets.

      Keywords: Convolutional neural network; Deep learning features; Euclidean distance; Feature extraction; Fused distance grade; Large loss function


        Automatic biometric face recognition is currently an emerging area of image processing [1].

        Generally, there are two different ways of learning: 1) simplistic learning and 2) deep learning.

          Simplistic learning includes the subspace analysis method [2], the geometric characteristic method [3], the elastic graph matching method [4] and the hidden Markov model method [5]. In general, simplistic learning methods can extract some basic features of an image, but all of them rely on manually (artificially) designed procedures to extract representative features.

          Methods based on convolutional neural networks are considered deep learning because, in addition to basic features, they can extract more important and complex features such as edge and planar features. Here, the input image is applied directly to the CNN, which is robust to rotation, translation and scaling of the image. CNNs can extract effective facial features automatically; in this work they are used as feature extractors.

          When a CNN is used as a feature extractor, the main purpose of its max-pooling layers is to reduce the resolution of the image patches.
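          As an illustration of the pooling step, the following NumPy sketch applies 2×2 max pooling with stride 2 to a single feature map, halving its resolution. The 2×2 window and the helper name `max_pool_2x2` are our assumptions; the paper does not state the pooling window size.

```python
import numpy as np


def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """Non-overlapping 2x2 max pooling (stride 2) on one 2-D feature map.

    The 2x2 window is an assumption, not a value given in the paper.
    A trailing odd row or column is dropped, as in 'valid' pooling.
    """
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    trimmed = feature_map[: h2 * 2, : w2 * 2]
    # View the map as (h2, 2, w2, 2) blocks and keep the largest value of each block.
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))


patch = np.array([[1, 3, 2, 0],
                  [4, 2, 1, 5],
                  [0, 1, 7, 2],
                  [3, 2, 4, 6]])
pooled = max_pool_2x2(patch)
print(pooled)  # 4x4 input -> 2x2 output
```

          Each output value keeps only the strongest response in its 2×2 neighbourhood, which is why pooling both lowers resolution and adds some tolerance to small shifts.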

          The extracted features are then fed into the DLCDRC, whose weights are optimized by the grey wolf optimizer: GWO selects the optimal weights in DLCDRC to decrease the reconstruction error. DLCDRC uses a two-step scheme consisting of a fused distance grade and a large loss function.

          The large loss function is the combination of the inter-class and intra-class loss functions.

          The large loss function is used to reduce the intra-class and inter-class variations in the within-class reconstruction error (WCRE) and the collaborative between-class reconstruction error (CBCRE).

          Together with the large loss function, the fused distance grade is used to compute CBCRE and WCRE with minimal reconstruction values, which effectively improves recognition performance.
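          To make the combination of intra-class and inter-class terms concrete, here is a minimal NumPy sketch. The intra-class term sums squared distances from each feature vector to its class centroid. The inter-class term is a hypothetical stand-in (the sum over centroid pairs of 1 / (squared distance + 1)), chosen only because it shrinks as class centroids move apart; the balance constant `lam = 0.5` and all helper names are likewise illustrative assumptions, not the authors' definitions.

```python
import numpy as np


def class_centroids(features, labels):
    """Return the sorted class labels and the mean feature vector of each class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def intra_class_loss(features, labels):
    """Sum of squared distances from each feature vector to its class centroid."""
    classes, centroids = class_centroids(features, labels)
    idx = np.searchsorted(classes, labels)  # position of each label in `classes`
    return float(np.sum((features - centroids[idx]) ** 2))


def inter_class_loss(features, labels):
    """HYPOTHETICAL inter-class term: sum of 1 / (||c_j - c_k||^2 + 1) over centroid pairs.

    The +1 keeps each term finite and positive; the term shrinks as centroids move apart.
    """
    _, centroids = class_centroids(features, labels)
    total = 0.0
    for j in range(len(centroids)):
        for k in range(j + 1, len(centroids)):
            total += 1.0 / (float(np.sum((centroids[j] - centroids[k]) ** 2)) + 1.0)
    return total


def large_loss(features, labels, lam=0.5):
    """Combined loss L_intra + lam * L_inter (lam is a hypothetical balance constant)."""
    return intra_class_loss(features, labels) + lam * inter_class_loss(features, labels)


feats = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labs = np.array([0, 0, 1, 1])
print(large_loss(feats, labs))
```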

          In this paper, we consider a face recognition method that combines a CNN with the Grey Wolf Optimizer and DLCDRC. After feature extraction by the CNN, GWO and the modified dual linear collaborative discriminant regression classification are applied.

          The previous method uses a CNN with an SVM: feature extraction is performed by the CNN, and the SVM is used as the classifier for recognition.

          The SVM is a supervised learning method and shows many advantages in small-sample, nonlinear and multivariate pattern recognition problems.

          Existing methods have limitations related to poor loss functions, overfitting, data imbalance and unstable performance.

          The input face images are taken from three publicly available datasets: YALE B, ORL and extended YALE B.

          The datasets considered must be free from noise; otherwise, the intended operation, such as classification, is affected. The CNN extracts the most effective features, and the weights in DLCDRC are selected by the grey wolf optimizer in order to increase the CBCRE and decrease the WCRE.

          The large loss function of the DLCDRC algorithm uses intra-class and inter-class loss functions to reduce the inter-class and intra-class variability of CBCRE and WCRE.
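          The sketch below computes the two reconstruction errors in the way we understand the LCDRC formulation of [22]: each training feature vector is reconstructed by least squares from the other samples of its own class (WCRE) and from all samples of the other classes taken together (CBCRE). It omits subspace learning and any GWO weighting, and the helper names are hypothetical.

```python
import numpy as np


def reconstruction_error(x, basis):
    """Squared residual of the least-squares reconstruction of x from the columns of basis."""
    coeffs, *_ = np.linalg.lstsq(basis, x, rcond=None)
    residual = x - basis @ coeffs
    return float(residual @ residual)


def wcre_cbcre(X, labels):
    """Average within-class (WCRE) and collaborative between-class (CBCRE) errors.

    X has one feature vector per column; every class needs at least two samples.
    The leave-one-out and pooled-other-classes choices are our reading of LCDRC.
    """
    n = X.shape[1]
    wcre = cbcre = 0.0
    for j in range(n):
        x = X[:, j]
        same = labels == labels[j]
        same[j] = False                    # leave-one-out: x may not reconstruct itself
        other = labels != labels[j]        # all other classes pooled together
        wcre += reconstruction_error(x, X[:, same])
        cbcre += reconstruction_error(x, X[:, other])
    return wcre / n, cbcre / n


X = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 3.0],
              [0.0, 0.0, 0.0, 0.0]])
labels = np.array([0, 0, 1, 1])
print(wcre_cbcre(X, labels))  # small WCRE, large CBCRE: well-separated classes
```

          A discriminant projection is good when it keeps WCRE small and CBCRE large, which is the trade-off the DLCDRC weights are tuned for.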

          The Euclidean distance used to compute the reconstruction error is replaced with a fused distance grade, which gives efficient results in high-dimensional subspaces and reduces the reconstruction error value for each class.

          Next, the CNN-GWO-based modified DLCDRC algorithm is tested in terms of recognition accuracy (Critical Success Index) and sensitivity.

          The combination of the CNN with the grey wolf optimizer-based modified DLCDRC classifier gives better results than existing algorithms on three benchmark databases: ORL, YALE B and extended YALE B.

          This article is organized as follows.

          Section 2 reviews some existing papers on face recognition. Section 3 discusses feature extraction by the CNN, the Grey Wolf Optimizer and classification by the DLCDRC algorithm. Section 4 explains the system model design, Section 5 presents a comparative study of the CNN-GWO-based DLCDRC algorithm, and Section 6 concludes the research work.


        Shanshan Guo et al. [6] introduced a combination of a CNN and a support vector machine (SVM) to solve the face recognition problem. In this model, the CNN extracts latent features and the SVM is used for accurate face image recognition. Experimental studies demonstrated its effectiveness, with a high recognition rate and short training time. However, for larger datasets the CNN optimization method does not balance recognition speed and learning time.

        Reecha Sharma et al. [7] presented a novel hybrid approach using PCA for pose-invariant face recognition. The approach combines three steps. In the first step, the face and its parts are detected by the Viola-Jones algorithm, and five parts of the face image (face, left eye, right eye, nose and mouth) are located. In the second step, the LBP (Local Binary Pattern) features of each extracted part are calculated. In the third step, PCA is applied for recognition. This hybrid approach improves face recognition speed, but it only addresses pose invariance and does not take into account other factors such as lighting, age and partial occlusion.

        Sun et al. [8] considered a set of high-level deep features, known as deep hidden identity features (DeepID), for face verification. Here, the CNN is used for feature extraction and not for classification. With this DeepID method, it was found that facial recognition results are better than iris recognition.

        Yunfei Li et al. [9] presented deep learning face recognition aided by facial texture features (FTFA-DLF). FTFA-DLF combines the strengths of deep learning features and handcrafted features. The handcrafted features are texture features extracted from the eyes, mouth and nose. Both the handcrafted features and the deep learning features are then included in the non-discriminatory feature layer. In this way, a deep learning feature that interacts with the handcrafted features is established to improve face recognition performance.

        Tao Lu et al. [10] proposed an efficient three-step face recognition algorithm based on a low-rank supported extreme learning machine. In the first step, pre-clustering divides the given set of samples into specific training subspaces. In the second step, a decomposition method extracts effective low-rank features that are noise-free and insensitive to occlusion, expression changes and lighting. In the third step, these low-rank features are used by a low-rank supported extreme learning machine (LSELM). This algorithm improves recognition performance with high efficiency in complex scenarios.

        Saiyed Umer et al. [11] employed a fusion of feature learning techniques for face recognition. The method includes three components: 1) face pre-processing, 2) feature extraction and 3) classification. In pre-processing, a tree-structured model locates the face region according to the facial landmark points. During feature extraction, a scale-invariant feature descriptor is computed in the detected face region. A multi-class linear support vector machine classifier is used for recognition. Finally, the scores of the various learning methods are fused to determine the subject, using decision-level fusion and sum-rule score-level fusion. Although the recognition rate is high, the time required for recognition is long.

          Mingjie He et al. [12] proposed a deformable face net (DFN) for pose-invariant face recognition. Its deformable convolution module simultaneously learns recognition-oriented face alignment and identity feature extraction. A displacement consistency loss is formulated to learn the displacement field of each grid, so that faces are aligned with locally consistent direction and amplitude. Because the face has a strong structure, an identity consistency loss (ICL) and a pose-triplet loss (PTL) are also introduced to minimize intra-class feature variation caused by different poses and to maximize the distance between class features in the same pose. Experimental analysis shows that the DFN model effectively recognizes faces under pose variation. Its main disadvantage is that it addresses only pose and does not consider other factors.

          Yandong Wen et al. [13] developed a discriminative feature learning approach for deep face recognition. CNNs typically use the softmax loss function as the supervision signal to train deep models.

          In this work, however, a new loss function, the center loss, is used for face recognition to improve the discriminative power of the deeply learned features. The center loss learns a centroid for the deep features of each class and reduces the distance between each deep feature and the centroid of its class. The softmax loss increases the variation of deep features between classes, while the center loss reduces the variation within each class. Therefore, joint supervision improves the discriminative ability of the deeply learned features.
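          The center loss of [13] and its mini-batch center update can be sketched in NumPy as follows: the loss is one half of the summed squared distances between features and their class centers, and each center moves towards the mean of its class samples in the batch. The update rate `alpha = 0.5` and the helper names are illustrative assumptions, and the softmax term of the joint loss is omitted.

```python
import numpy as np


def center_loss(features, labels, centers):
    """L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2, with labels used as row indices into centers."""
    diff = features - centers[labels]
    return 0.5 * float(np.sum(diff ** 2))


def update_centers(features, labels, centers, alpha=0.5):
    """One mini-batch center update: c_j <- c_j - alpha * delta_c_j, where
    delta_c_j = sum_{i: y_i = j} (c_j - x_i) / (1 + n_j). alpha is an assumed rate."""
    new_centers = centers.copy()
    for j in range(centers.shape[0]):
        mask = labels == j
        n_j = int(mask.sum())
        if n_j == 0:
            continue  # classes absent from the mini-batch keep their centers
        delta = np.sum(centers[j] - features[mask], axis=0) / (1 + n_j)
        new_centers[j] = centers[j] - alpha * delta
    return new_centers


x = np.array([[1.0, 1.0], [3.0, 3.0], [10.0, 0.0]])
y = np.array([0, 0, 1])
c = np.zeros((2, 2))
c_new = update_centers(x, y, c)
print(center_loss(x, y, c), center_loss(x, y, c_new))  # the loss drops after the update
```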


        In this research, the proposed CNN-GWO-based DLCDRC methodology comprises five phases: data collection, feature extraction by the CNN, the Grey Wolf Optimizer, classification by the DLCDRC classifier, and performance analysis. The input face images are selected from the ORL and Yale databases.

          A. Database Explanation

            In this article, we evaluate the performance of the CNN-GWO-based modified DLCDRC classifier algorithm on three public databases: YALE B, ORL and extended YALE B.

            The YALE B database contains 11 face images of each of 15 individuals, a total of 165 grayscale images with variations such as surprised, sad, sleepy, center light, wink, right light and left light [14], and with and without eyeglasses. The grayscale images were recorded under these different conditions. Examples of YALE B face images are shown in Figure 1 below.

            Figure 1: Examples of YALE B data face images

            The ORL database contains 10 different face images of each of 40 people, a total of 400 face images [15]. In the ORL database, the face images were recorded at different times and with varying expressions, lighting conditions and poses.

            Additionally, the extended YALE B database contains 16,128 images of 28 subjects, each with 576 face images captured under 9 poses and 64 illumination conditions [16].

            Images from the ORL and extended YALE B face databases are shown in Figures 2 and 3 below, respectively.

            Figure 2: Examples of ORL data face images

            Figure 3: Examples of YALE Extended data face images

          B. CNN Structure: The CNN structure used here for feature extraction consists of nine layers: three convolutional layers, three max-pooling layers and three batch normalization layers. Each convolutional block comprises a convolution layer, batch normalization and a ReLU activation.

        Convolutional layers 1, 2 and 3 have 30, 60 and 80 feature maps, respectively, which extract and combine the features. S1, S2 and S3 are subsampling layers with the same number of maps as the preceding convolutional layer. The output layer of the CNN identifies input images according to specific features. Two optimization methods are used in the CNN. First, the rectified linear unit (ReLU) function, which describes the activation of neural signals well, replaces the sigmoid function in the convolutional layers.
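        To show how the three convolution and subsampling stages shrink the input, the sketch below traces feature-map shapes. Only the map counts (30, 60, 80) come from the text; the 64×64 input, the 3×3 kernels with padding 1 and the 2×2 pooling are hypothetical values chosen for illustration.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Spatial size after a convolution (assumed 3x3 kernel, padding 1, stride 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial size after max pooling (assumed 2x2 window, stride 2)."""
    return (size - window) // stride + 1


def trace_shapes(input_size=64, maps=(30, 60, 80)):
    """List (layer, channels, height, width) for the conv/subsampling stages.

    input_size=64 is a hypothetical input resolution, not taken from the paper.
    """
    h = w = input_size
    shapes = []
    for i, channels in enumerate(maps, start=1):
        h, w = conv_out(h), conv_out(w)   # convolution + BN + ReLU keep the size
        shapes.append((f"conv{i}", channels, h, w))
        h, w = pool_out(h), pool_out(w)   # subsampling layer S_i halves it
        shapes.append((f"S{i}", channels, h, w))
    return shapes


layers = trace_shapes()
for name, channels, h, w in layers:
    print(f"{name}: {channels} x {h} x {w}")
_, c3, h3, w3 = layers[-1]
print("flattened feature length:", c3 * h3 * w3)
```

        Under these assumptions the CNN hands an 80 × 8 × 8 volume (5,120 values) to the later stages; a different input size or kernel choice changes that length.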

        Second, batch normalization (BN) is a main building block of CNN architectures, and standard CNN models have a large number of BN layers in their deep structure.

        In the course of training, BN calculates the mean and variance of each mini-batch.
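        The training-mode BN computation can be sketched in NumPy: for convolutional activations, the mean and variance are taken per channel over the mini-batch and spatial positions, and the normalized values are then scaled by gamma and shifted by beta. The tensor layout `(N, C, H, W)`, `eps = 1e-5` and the function name are assumptions for illustration; the running averages used at inference time are omitted.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for conv activations of shape (N, C, H, W).

    Mean and variance are computed per channel over the mini-batch and spatial positions,
    then each channel is normalized, scaled by gamma and shifted by beta.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)  # biased (population) variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)


rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(8, 30, 16, 16))  # 30 maps, as in conv layer 1
out = batch_norm_train(batch, gamma=np.ones(30), beta=np.zeros(30))
print(out.mean(axis=(0, 2, 3))[:3], out.var(axis=(0, 2, 3))[:3])  # approximately 0 and 1
```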

        C. Grey Wolf Optimizer (GWO): The grey wolf optimizer is used to solve the following problem.

        The CNN-GWO-based modified DLCDRC algorithm of the FR system is used to improve the performance of the BCRE and WCRE. In the modified DLCDRC, reducing the discrimination loss draws samples of the same class closer to each other, while samples of different classes keep a greater distance. Therefore, to reduce inter-class and intra-class variation, the discrimination loss is taken into account in the modified DLCDRC algorithm, and the equations for CBCRE and WCRE are updated accordingly.

        Here, a new optimization algorithm, the Grey Wolf Optimizer, is used to select the optimal weight values in DLCDRC. With GWO-DLCDRC, the distance between classes increases considerably and the distance within classes decreases significantly. To reduce the reconstruction error (RE), the GWO algorithm selects the best, optimized weight values in DLCDRC. As a result, the algorithm determines the discriminant subspace by increasing the BCRE and decreasing the WCRE at the same time.
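        A minimal sketch of the standard GWO update (Mirjalili et al., 2014) is given below. How the paper encodes the DLCDRC weights as a wolf position, and which fitness it minimizes, are not specified here, so the quadratic objective is a hypothetical stand-in for the reconstruction-error criterion. This variant keeps the three best positions found so far as the alpha, beta and delta wolves; the function name and default parameters are our assumptions.

```python
import numpy as np


def gwo_minimize(objective, lower, upper, n_wolves=12, n_iter=100, seed=0):
    """Minimize `objective` over the box [lower, upper] with the Grey Wolf Optimizer."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    wolves = rng.uniform(lower, upper, size=(n_wolves, dim))
    fitness = np.array([objective(w) for w in wolves])
    # Alpha, beta and delta: the three best positions found so far.
    top = np.argsort(fitness)[:3]
    leaders_x, leaders_f = wolves[top].copy(), fitness[top].copy()
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for i in range(n_wolves):
            estimates = []
            for leader in leaders_x:
                A = 2.0 * a * rng.random(dim) - a
                C = 2.0 * rng.random(dim)
                D = np.abs(C * leader - wolves[i])
                estimates.append(leader - A * D)
            # New position: average of the moves suggested by the three leaders.
            wolves[i] = np.clip(np.mean(estimates, axis=0), lower, upper)
        fitness = np.array([objective(w) for w in wolves])
        pool_x = np.vstack([leaders_x, wolves])
        pool_f = np.concatenate([leaders_f, fitness])
        top = np.argsort(pool_f)[:3]
        leaders_x, leaders_f = pool_x[top].copy(), pool_f[top].copy()
    return leaders_x[0], float(leaders_f[0])


# Hypothetical stand-in objective: a smooth error surface over two weights.
target = np.array([0.3, 0.7])
weights, error = gwo_minimize(lambda w: float(np.sum((w - target) ** 2)), [0.0, 0.0], [1.0, 1.0])
print(weights, error)
```

        Large |A| early on (while a is near 2) lets wolves explore away from the leaders; as a shrinks, the pack closes in on the best positions found.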

        D. Classification Using Modified Dual Linear Collaborative Discriminant Regression Classification:

        Here, the discriminant subspace is found by minimizing the WCRE and maximizing the CBCRE, relying on the large loss function and the fused distance grade, which improves face recognition performance.

        The modified DLCDRC classifier is used for face recognition because it takes two measures into account in these steps: the large loss function and the fused distance grade. A major requirement of an effective face recognition system is to reduce inter-class and intra-class variability.
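        The final decision step shared, as we understand it, by LRC-family classifiers such as LRC, LDRC and LCDRC can be sketched as follows: the test vector is reconstructed from each class's training samples by least squares and assigned to the class with the smallest reconstruction error. This is plain Euclidean LRC without the fused distance or GWO weights; the optional `projection` argument (standing in for a learned discriminant subspace) and the function name are hypothetical.

```python
import numpy as np


def lrc_predict(train, train_labels, y, projection=None):
    """Assign y to the class whose training samples reconstruct it with the least error.

    train: (d, N) matrix with one training feature vector per column.
    projection: optional (d, p) matrix, e.g. a learned discriminant subspace (hypothetical).
    """
    if projection is not None:
        train = projection.T @ train
        y = projection.T @ y
    best_class, best_err = None, np.inf
    for c in np.unique(train_labels):
        Xc = train[:, train_labels == c]
        coeffs, *_ = np.linalg.lstsq(Xc, y, rcond=None)  # class-specific least squares
        err = float(np.linalg.norm(y - Xc @ coeffs))      # Euclidean reconstruction error
        if err < best_err:
            best_class, best_err = c, err
    return best_class, best_err


train = np.array([[1.0, 2.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 3.0],
                  [0.0, 0.0, 0.0, 0.0]])
train_labels = np.array([0, 0, 1, 1])
print(lrc_predict(train, train_labels, np.array([0.1, 5.0, 0.0])))
```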

        The large loss function takes full advantage of the labelled features in the feature space, so that feature vectors of the same class are grouped near their class centroid and the class centroids are far away from each other. The large loss function is a combination of two loss functions: an intra-class loss function and an inter-class loss function. The intra-class loss function is expressed as

        $$L_{intra} = \sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2$$

        where m is the size of the face image in grayscale, x_i is the test image of face class y_i, and c_{y_i} is the feature centroid belonging to y_i.

        By reducing the intra-class loss of the LCDRC algorithm, labelled samples of the same class move closer to their class centroid. The inter-class loss increases the distance between the class centroids, which reduces the overlap of the clustered feature vectors and the similarity between classes. In the inter-class loss, the constant term 1 keeps the loss positive. The two terms are combined into the discrimination loss, which is expressed as

        $$L = L_{intra} + \lambda L_{inter} \quad (3)$$

        where λ is a constant used to balance L_intra and L_inter.

        In this subsection, the fused distance grade is used instead of the Euclidean distance. The fused distance grade relies on the concept of unrelated subspaces for recognition. Euclidean distance measurements are suitable for low-dimensional spaces but are inefficient for high-dimensional subspaces and categorical values. Moreover, the Euclidean distance ignores the similarities between feature vectors and handles them poorly. To solve these problems, we use the fused distance grade instead of the Euclidean distance measure, and then reduce the reconstruction error for each class i to obtain better results in the multidimensional subspace.

        The fused distance grade d_F is a combination of a related (associated) distance d_R and an unrelated distance d_U:

        $$d_F = \alpha\, d_R + \beta\, d_U$$

        where α denotes the related joint coefficient and β the unrelated joint coefficient. The reconstruction error of each class is then computed with this fused distance metric, and the test image is assigned to the class with the minimum reconstruction error.
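        The exact related and unrelated distances are not recoverable from this text, so the sketch below plugs in Euclidean distance and cosine distance as hypothetical placeholders for d_R and d_U, with illustrative coefficients α = 0.7 and β = 0.3 and an assumed linear combination. Only the idea of fusing a related and an unrelated distance through joint coefficients, and of choosing the class with the minimum reconstruction error, comes from the description above.

```python
import numpy as np


def euclidean(u, v):
    return float(np.linalg.norm(u - v))


def cosine_distance(u, v):
    denom = float(np.linalg.norm(u) * np.linalg.norm(v))
    return 1.0 - float(u @ v) / denom if denom > 0 else 1.0


def fused_distance(u, v, alpha=0.7, beta=0.3,
                   d_related=euclidean, d_unrelated=cosine_distance):
    """alpha * d_R + beta * d_U; both component metrics and coefficients are placeholders."""
    return alpha * d_related(u, v) + beta * d_unrelated(u, v)


def classify_fused(train, train_labels, y, **fused_kwargs):
    """Reconstruct y from each class and pick the class with the smallest fused error."""
    errors = {}
    for c in np.unique(train_labels):
        Xc = train[:, train_labels == c]
        coeffs, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        errors[int(c)] = fused_distance(y, Xc @ coeffs, **fused_kwargs)
    return min(errors, key=errors.get), errors


train = np.array([[1.0, 2.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 3.0],
                  [0.0, 0.0, 0.0, 0.0]])
train_labels = np.array([0, 0, 1, 1])
pred, errs = classify_fused(train, train_labels, np.array([0.1, 5.0, 0.0]))
print(pred, errs)
```

        Swapping in the paper's actual d_R and d_U only requires passing different `d_related` and `d_unrelated` functions.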


        In this section, we evaluate the proposed CNN-GWO-based DLCDRC algorithm on three datasets, namely YALE B, ORL and extended YALE B, in terms of mean recognition accuracy.

        Recognition accuracy is used as the evaluation parameter of the CNN-GWO-based modified DLCDRC algorithm. It is computed as

        $$\text{Recognition Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \quad (11)$$

        where TP denotes true positives, TN true negatives, FP false positives and FN false negatives.

        A. Analysis of proposed CNN-GWO based modified DLCDRC algorithm on ORL data set.

          The performance of the CNN-GWO-based modified DLCDRC is compared with the conventional LCDRC applied to CNN features (CNN-LCDRC).

          In the tables below, the training number represents the ratio of training data to testing data for the input face images; for example, training number 6 corresponds to 60% training data and 40% testing data.

          Table 1 shows the verified average accuracy of the proposed method, and Figure 4 shows the results of the CNN-GWO-based modified DLCDRC as a graph.
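          Eq. (11) translates directly into code. The paper does not say how TP, TN, FP and FN are counted in multi-class face identification, so the second helper shows, as an assumption, the common closed-set alternative: the percentage of test images assigned to the correct subject. Both function names are hypothetical.

```python
def recognition_accuracy(tp, tn, fp, fn):
    """Eq. (11): (TP + TN) / (TP + TN + FP + FN) * 100."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return 100.0 * (tp + tn) / total


def identification_accuracy(true_ids, predicted_ids):
    """Closed-set identification: percentage of test images assigned to the right subject."""
    if len(true_ids) != len(predicted_ids) or not true_ids:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(true_ids, predicted_ids))
    return 100.0 * correct / len(true_ids)


print(recognition_accuracy(tp=40, tn=50, fp=5, fn=5))
print(identification_accuracy([1, 2, 3, 4], [1, 2, 0, 4]))
```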

          The results show that the CNN efficiently extracts features from the ORL face images, GWO supplies the optimized weights, and DLCDRC then efficiently classifies them.

          Table 1. Performance of the CNN-GWO modified DLCDRC on the ORL database in terms of average recognition accuracy (%)


          [Columns: Training Number; Proposed Method]

          Figure 4: Graphical representation of proposed CNN-GWO based modified DLCDRC algorithm in terms of mean accuracy

        B. Performance Analysis on YALE B dataset

        In this section, we present the performance analysis of the CNN-GWO-based modified DLCDRC method on the YALE B dataset.

        For instance, with 60% training and 40% testing data (i.e., training number 6), the proposed CNN-GWO-based DLCDRC method attains 89.78% mean accuracy, whereas the CNN-LCDRC method attains 86.30% average accuracy. The proposed method outperforms the CNN-LCDRC and LCDRC methods because it uses the CNN as the feature extractor, GWO to obtain the optimized weights, and DLCDRC as the classifier. Table 2 presents the comparative study of CNN-LCDRC and the proposed CNN-GWO-based modified DLCDRC in terms of average accuracy, and Figure 5 shows a graphical presentation of the CNN-GWO-based modified DLCDRC method.
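        The training-number protocol (training number 6 meaning 60% training and 40% testing data) can be sketched as a per-subject random split. Splitting per subject, the random shuffling, the rounding rule for subjects whose image count is not a multiple of 10, and the function name are our assumptions; the paper does not say how the training images were chosen.

```python
import random


def split_by_training_number(images_per_subject, training_number, seed=0):
    """Per-subject split where training number k means k*10% of each subject's images train.

    images_per_subject: dict mapping subject id -> list of image ids.
    Returns (train, test) lists of (subject, image) pairs.
    """
    rng = random.Random(seed)
    train, test = [], []
    for subject, images in sorted(images_per_subject.items()):
        shuffled = list(images)
        rng.shuffle(shuffled)
        # Round to the nearest count, keeping at least one image on each side.
        n_train = min(max(round(len(shuffled) * training_number / 10), 1), len(shuffled) - 1)
        train += [(subject, img) for img in shuffled[:n_train]]
        test += [(subject, img) for img in shuffled[n_train:]]
    return train, test


# ORL-like layout: 40 subjects x 10 images, training number 6 -> 6 train / 4 test per subject.
orl_like = {s: [f"s{s}_{i}" for i in range(10)] for s in range(40)}
train, test = split_by_training_number(orl_like, training_number=6)
print(len(train), len(test))
```

        Splitting per subject guarantees that every person appears in both the training and the testing set, which closed-set face identification requires.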

        Table 2. Performance of the CNN-GWO modified DLCDRC on the YALE B database in terms of average recognition accuracy (%)



        [Columns: Training Number; YALE B, Proposed Method]







        Figure 5: Graphical representation of Proposed CNN-GWO based DLCDRC algorithm in terms of mean Accuracy

        C. Analysis of proposed CNN-GWO based modified DLCDRC algorithm on Extended YALE B data set.

        In this section, the experiment is carried out on the extended YALE B database to obtain the performance of the CNN-GWO-based modified DLCDRC method and the CNN-LCDRC method.

        The validated results, in terms of mean recognition accuracy, are tabulated in Table 3 and illustrated graphically in Figure 6. For instance, the CNN-GWO-based modified DLCDRC attained 86.78% mean accuracy for training number 6 (i.e., 60% training data and 40% testing data), whereas the CNN-LCDRC method attained 82.75% average accuracy on the same data.

        Table 3. Performance of the CNN-GWO modified DLCDRC on the extended YALE B database in terms of average recognition accuracy (%)

        [Columns: Training Number; Extended YALE B, Proposed Method]








        Figure 6: Graphical representation of Proposed CNN-GWO based DLCDRC algorithm in terms of mean Accuracy

        The above experiments show that the proposed CNN-GWO-based modified DLCDRC method attains better mean accuracy than the CNN-LCDRC method. The high performance is due to effective feature extraction by a CNN with a small number of layers, weight selection for classification by GWO, and the effective use of the loss function and fused distance grade in the DLCDRC classifier.


In this research paper, a novel approach to face recognition is proposed by combining three methods. First, a CNN with a minimal number of layers is used to extract discriminant and effective features. Second, the Grey Wolf Optimizer is used to obtain optimized weights. Finally, the DLCDRC is used as the classifier. The DLCDRC uses two steps: the large loss function and the combined distance metric. With the discriminant and effective features, the optimized weights forwarded by GWO and the large loss function in the DLCDRC, recognition becomes more effective because the CBCRE is increased while the WCRE is simultaneously decreased. In addition, the combined distance metric minimizes the reconstruction error, which enhances the efficiency of face recognition.

[1] Moussa, Mourad, Maha Hmila, and Ali Douik. "Face recognition using fractional coefficients and discrete cosine transform tool." International Journal of Electrical and Computer Engineering 11, no. 1 (2021): 892.


[2] Turk, Matthew, and Alex Pentland. "Eigenfaces for recognition." Journal of Cognitive Neuroscience 3, no. 1 (1991): 71-86.

[3] Sibanda, W., and P. Pretorius. "Novel application of Multi-Layer Perceptrons (MLP) neural networks to model HIV in South Africa using Seroprevalence data from antenatal clinics." International Journal of Computer Applications 35, no. 5 (2011): 26-31.

[4] Wiskott, Laurenz, Jean-Marc Fellous, Norbert Krüger, and Christoph von der Malsburg. "Face recognition by elastic bunch graph matching." IEEE Transactions on Pattern Analysis and Machine Intelligence 19, no. 7 (1997): 775-779.

[5] Othman, Hisham, and Tyseer Aboulnasr. "A separable low complexity 2D HMM with application to face recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence 25, no. 10 (2003): 1229-


[6] Guo, Shanshan, Shiyu Chen, and Yanjie Li. "Face recognition based on convolutional neural network and support vector machine." In 2016 IEEE International conference on Information and Automation (ICIA), pp. 1787-1792. IEEE, 2016.

[7] Sharma, Reecha, and Manjeet Singh Patterh. "A new hybrid approach using PCA for pose invariant face recognition." Wireless Personal Communications 85, no. 3 (2015): 1561-1571.

[8] Sun, Yi, Xiaogang Wang, and Xiaoou Tang. "Deep learning face representation from predicting 10,000 classes." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1891-1898. 2014.

[9] Li, Yunfei, Zhaoyang Lu, Jing Li, and Yanzi Deng. "Improving deep learning feature with facial texture feature for face recognition." Wireless Personal Communications 103, no. 2 (2018): 1195-1206.

[10] Lu, Tao, Yingjie Guan, Yanduo Zhang, Shenming Qu, and Zixiang Xiong. "Robust and efficient face recognition via low-rank supported extreme learning machine." Multimedia Tools and Applications 77, no. 9 (2018): 11219-11240.

[11] Umer, Saiyed, Bibhas Chandra Dhara, and Bhabatosh Chanda. "Face recognition using fusion of feature learning techniques." Measurement 146 (2019): 43-54.

[12] He, Mingjie, Jie Zhang, Shiguang Shan, Meina Kan, and Xilin Chen. "Deformable face net for pose invariant face recognition." Pattern Recognition 100 (2020): 107113.

[13] Wen, Yandong, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. "A discriminative feature learning approach for deep face recognition." In European conference on computer vision, pp. 499-515. Springer, Cham, 2016.

[14] Zaqout I, Al-Hanjori M (2018) An improved technique for face recognition applications. Information and Learning Science. DOI: 10.1108/ILS-03-2018-0023

[15] Luaibi MK, Mohammed FG (2019) Facial Recognition Based on DWT- HOG-PCA Features with MLP Classifier. Journal of Southwest Jiaotong University, 54.

[16] Dumitrescu CM, Dumitrache I (2019) Combining deep learning technologies with multi-level Gabor features for facial recognition in biometric automated systems. Studies in Informatics and Control, 28:221-230. DOI:10.24846/v28i2y201910

[17] Lu, Yuwu, Xiaozhao Fang, and Binglei Xie. "Kernel linear regression for face recognition." Neural Computing and Applications 24, no. 7 (2014): 1843-1849.

[18] Peng, Yali, Jingcheng Ke, Shigang Liu, Jun Li, and Tao Lei. "An improvement to linear regression classification for face recognition." International Journal of Machine Learning and Cybernetics 10, no. 9 (2019): 2229-2243.

[19] Huang, Shih-Ming, and Jar-Ferr Yang. "Linear discriminant regression classification for face recognition." IEEE Signal Processing Letters 20, no. 1 (2012): 91-94.

[20] Huang, Pu, Zhihui Lai, Guangwei Gao, Geng Yang, and Zhangjing Yang. "Adaptive linear discriminant regression classification for face recognition." Digital Signal Processing 55 (2016): 78-84.


BINARY PATTERN. " Journal of Theoretical & Applied Information Technology 95, no. 16 (2017).

[22] Qu, Xiaochao, Suah Kim, Run Cui, and Hyoung Joong Kim. "Linear collaborative discriminant regression classification for face recognition." Journal of Visual Communication and Image Representation 31 (2015): 312-319.