Face Mask Detection using Deep Learning and Computer Vision


Swetha Mohan, Ankit Kumar, Abinash Kushwaha

School of Computer Science Engineering, Vellore Institute of Technology, Vellore

Abstract – Wearing a mask is among the non-pharmaceutical measures that can cut off the primary source of COVID droplets expelled by an infected individual. To contribute to community health, this project aims to devise a highly accurate, real-time technique that efficiently detects unmasked faces in public and thereby helps enforce mask wearing. Although numerous researchers have devoted effort to designing efficient algorithms for face detection and recognition, there is an essential difference between detecting a face under a mask and detecting a mask over a face.

According to the available literature, very little research has attempted to detect masks over faces. This work therefore aims to develop techniques that can accurately detect masks over faces in public. Detecting faces with or without a mask in public is not easy, because the datasets available for detecting masks on human faces are relatively small, which makes the model hard to train. The concept of transfer learning is therefore used here to transfer the learned kernels from networks trained for a similar face detection task on an extensive dataset. The dataset covers various face images, including faces with masks, faces without masks, faces with and without masks in one image, and confusing images without masks.

Keywords – Transfer learning, kernels, COVID droplets, training of models, mask over face.


    With countries reopening after COVID-19 lockdowns, governments and public health agencies are recommending face masks as an essential measure to keep people safe in public, to curtail the spread of the coronavirus and thereby contribute to public health. Regardless of the debate over medical resources and the variety of masks, many countries are mandating coverings over the nose and mouth in public.

    To mandate the use of face masks, it becomes essential to devise techniques that ensure individuals put on a mask before entering public places. Such an application can be very useful in public areas such as airports, railway stations, crowded markets, and malls. The proposed method is carried out in two steps. The first step is to train the face mask detector using transfer learning. The second step is to apply the trained face mask detector to images or videos of people to identify whether they are wearing a mask.


    MobileNet Mask Model

    Governments around the world are struggling against COVID-19, which has caused a serious health crisis. Regulating the use of face masks can therefore slow the rapid spread of the virus. Dey proposed a deep learning-based model for detecting face masks. This model, named MobileNet Mask, is multiphase. A pretrained model based on the ResNet-10 architecture is used to find faces in the video stream. The remaining steps include loading the classifier (MobileNet), building the FC layer, and a testing phase. All experiments were run on Google Colab, which runs in the cloud and provides over 12 GB of RAM. Several performance metrics (accuracy, F1-score, precision, and recall) are used to judge the performance of the proposed model.
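As an illustration of the four metrics named above, the following minimal sketch computes them for a binary mask / no-mask task from lists of true and predicted labels. The function name and the label convention (1 = mask) are assumptions made for this example; they are not taken from the cited work.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only (1 = mask, 0 = no mask); not real results.
print(classification_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                             [1, 1, 0, 0, 0, 1, 1, 0]))
```

Reporting precision and recall alongside accuracy matters when the two classes are not equally frequent, since accuracy alone can hide a detector that misses most unmasked faces.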

    ResNet-50 with YOLO-V2 Model

    Annotating and localizing medical face masks in real-life images is among the most important object detection applications. In this context, the main objective of Loey is to annotate and localize medical face mask objects in real-life images. They proposed a model consisting of two components: feature extraction and medical face mask detection.

    Two public datasets of medical face masks are merged into one dataset for their research. The first is the Medical Masks Dataset (MMD), which contains 682 images with more than 3,000 faces wearing masks. The second is the Face Mask Dataset (FMD), which contains 853 images. Combining the two (1,535 images in total) yielded a dataset of 1,415 images after low-quality images were deleted.

    Deep Learning Tools and CXR Image-Based COVID-19 Detection

    Radiography is a technique used to assess the functional and structural consequences of chest diseases and to provide high-resolution images of disease progression. Several works have been carried out in this context. Echtioui proposed a new CNN-based method for COVID-19 recognition that analyses radiographic images of a patient's lungs. The aim of this scheme is to provide clinical decision support for healthcare workers and researchers. The reported accuracy of about 91.34%, together with the recall, precision, and F1-score, demonstrates the efficiency of the method. In the same context, Ozturk introduced a new automatic COVID-19 detection model using CXR images, named DarkCovidNet. It is designed to provide accurate diagnosis for both binary classification (COVID-19 vs. no findings) and multiclass classification (COVID-19 vs. pneumonia vs. no findings).

    Deep Learning Tools and CT Image-Based COVID-19 Detection

    A computed tomography (CT) scan is a medical imaging technique used in radiology to obtain detailed images of the body for diagnostic purposes. Accurate and fast COVID-19 screening is achievable using CT scan images, and various works have been carried out in this context. Shah proposed several deep learning techniques to differentiate CT scan images of COVID-19 and non-COVID-19 patients, which helps in diagnosis. The dataset contains 349 images from patients with COVID-19 and 463 images from patients without COVID-19. These images were divided into three sets: 80% for training, 10% for validation, and 10% for testing.
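The 80/10/10 split described above can be sketched as follows. This is only an illustration: the file names are placeholders, the shuffle seed is arbitrary, and the split is a plain random one (the cited work may have used a different procedure, for example a stratified split).

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle items and split them into train / validation / test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# 349 COVID-19 and 463 non-COVID-19 images, as reported above.
# The file names are hypothetical placeholders.
samples = ([(f"covid_{i}.png", 1) for i in range(349)]
           + [(f"non_covid_{i}.png", 0) for i in range(463)])
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))  # 649 81 82
```

With 812 images in total, the fractions do not divide evenly, so the rounding remainder ends up in the test set.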

    Methods Using CXR and CT Images

    Combining two types of images in one dataset is an effective way to detect a disease. Sedik presented two deep learning models: CNN and ConvLSTM. To evaluate the models, two datasets are used. The first dataset contains CT images, while the second contains X-ray images. Each dataset contains COVID-19 and non-COVID-19 image categories. The image categories, COVID-19 and pneumonia, were classified to validate the proposed models.

    The first model based on CNN includes five convolutional layers (CNVLs) accompanied by five pooling layers (PLs). Two layers (fully connected layer (FC) and classification layer) make up the classification network. The second model is a hybrid one. It combines ConvLSTM and CNN at the same time.
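To give a feel for how five convolution and pooling stages shrink an image before the fully connected layer, the sketch below tracks the side length of the feature map. It assumes "same"-padded convolutions, non-overlapping 2x2 pooling, and a 224x224 input; none of these values is stated for the model above, so they are illustrative only.

```python
def feature_map_side(input_side, num_stages=5, pool_size=2):
    """Side length of a square feature map after conv + pool stages."""
    side = input_side
    for _ in range(num_stages):
        # A 'same'-padded convolution keeps the side length unchanged;
        # non-overlapping pooling then divides it by pool_size (floor).
        side //= pool_size
    return side


print(feature_map_side(224))  # 224 -> 112 -> 56 -> 28 -> 14 -> 7
```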

    Related Work

    Machine learning: Machine learning is a method of teaching a system to make predictions based on data. It is a branch of artificial intelligence in which the performance of the system improves as more data is fed to the algorithm. There are three types of machine learning:

    Supervised learning: In supervised learning, we have several data points or samples, each described by predictive variables (features) and a target variable, with the data represented in a table structure. The aim of supervised learning is to build a model that is able to predict the target variable.

    Unsupervised learning: Unsupervised learning is the machine learning task of uncovering hidden patterns in unlabeled data.

    Reinforcement learning (RL): In reinforcement learning, machine or software agents interact with an environment. The agents can automatically figure out how to optimize their behavior given a system of rewards and punishments. Reinforcement learning draws inspiration from behavioral psychology.

    Computer Vision: Computer vision is a field that includes processing, analyzing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information. In other words, it is the science and technology of machines that see and obtain information from images.

    Deep Learning: Deep learning is a powerful set of techniques for learning using neural networks. Neural networks are a beautiful biologically inspired programming paradigm which enables a computer to learn from data. These are learning algorithms.


    Phases and individual steps for building a COVID-19 face mask detector with computer vision and deep learning using Python, OpenCV, and TensorFlow/Keras.

    In order to train a custom face mask detector, we need to break our project into two distinct phases, each with its own respective sub-steps:

    Training: Here we'll focus on loading our face mask detection dataset from disk, training a model (using Keras/TensorFlow) on this dataset, and then serializing the face mask detector to disk.

    Deployment: Once the face mask detector is trained, we can then move on to loading the mask detector, performing face detection, and then classifying each face as with_mask or without_mask.
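A minimal sketch of the training phase with transfer learning is given below. It assumes a MobileNetV2 backbone pretrained on ImageNet, a 224x224 input, and a small two-class head; these choices, the function name, and the hyperparameters are assumptions for illustration and are not confirmed as the configuration used in this work.

```python
import tensorflow as tf


def build_mask_classifier(input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen pretrained backbone plus a new two-class head (sketch)."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights)
    base.trainable = False  # keep the transferred kernels fixed

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # run the backbone in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    # Two outputs: with_mask and without_mask.
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"])
    return model
```

After calling `model.fit(...)` on the labelled images, `model.save(path)` would serialize the detector to disk so that the deployment phase can load it again. Only the new head is trained here, which is what makes training feasible on a small dataset.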

    COVID-19 face mask detection dataset

    This dataset consists of 1,376 images belonging to two classes: with_mask (690 images) and without_mask (686 images).
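Loading such a dataset from disk can be sketched as follows, assuming one sub-directory per class named with_mask and without_mask. The directory layout, the accepted file extensions, and the function names are assumptions for this example.

```python
from pathlib import Path

CLASSES = ("with_mask", "without_mask")
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed file types


def list_labelled_images(dataset_dir, classes=CLASSES):
    """Return (path, label) pairs; the label is the index of the class."""
    samples = []
    for label, class_name in enumerate(classes):
        class_dir = Path(dataset_dir) / class_name
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((path, label))
    return samples


def class_counts(samples, classes=CLASSES):
    """Count how many samples fall into each class."""
    counts = {name: 0 for name in classes}
    for _, label in samples:
        counts[classes[label]] += 1
    return counts
```

For the dataset described above, `class_counts(list_labelled_images(dataset_dir))` should report 690 and 686 images, which is a quick check that the two classes are nearly balanced.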


    Our goal is to train a custom deep learning model to detect whether a person is or is not wearing a mask.


    Python3, OpenCV, Keras, TensorFlow.


5. After iterating through each frame, we shall be able to get the output video with the results we wanted.


Dataset Used:

[Figure: sample dataset images with mask]


  1. Generate your own annotation file and class names file. Row format: image_file_path box1, box2, … boxN.

    Box format: x_min, y_min, x_max, y_max, class_id

  2. Convert the pre-trained weights to the .h5 format required by Keras.

  3. Freeze all layers except the final layers and train for some epochs until a plateau (no further improvement) is reached.

  4. Unfreeze all the layers and train all the weights while continuously reducing the learning rate until a plateau is reached again.

  5. End
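The annotation rows from step 1 can be read with a small parser such as the sketch below. It assumes that the image path and the boxes are separated by whitespace, that the five values of a box are separated by commas without spaces, and that paths contain no spaces; the exact delimiters are an assumption, as are the names `Box` and `parse_annotation_row`.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    class_id: int


def parse_annotation_row(row: str) -> Tuple[str, List[Box]]:
    """Split one annotation row into the image path and its boxes."""
    parts = row.split()
    if not parts:
        raise ValueError("empty annotation row")
    image_path, box_fields = parts[0], parts[1:]

    boxes = []
    for field in box_fields:
        values = [int(v) for v in field.split(",")]
        if len(values) != 5:
            raise ValueError(f"expected 5 values per box, got {field!r}")
        box = Box(*values)
        if box.x_min >= box.x_max or box.y_min >= box.y_max:
            raise ValueError(f"degenerate box: {field!r}")
        boxes.append(box)
    return image_path, boxes


# Hypothetical row with two boxes of class 0 and class 1.
path, boxes = parse_annotation_row(
    "images/img_001.jpg 10,20,110,140,0 200,30,290,150,1")
print(path, len(boxes))  # images/img_001.jpg 2
```

Validating each row before training catches malformed boxes early, which is cheaper than discovering them as a failed or degraded training run.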

Testing using OpenCV:

  1. Capture video from a webcam or from a saved test video file.

  2. Pass each frame of the video, or each frame captured from the webcam, through the model.

  3. Get the boxes, scores and classes obtained as output and draw the boxes on the frame accordingly (colour depending on the class of each box).

  4. Display the total number of boxes of each class at the bottom of the screen and record them in a variable.
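The testing steps above can be sketched with OpenCV as follows. The `detect` argument is a hypothetical callable standing in for the trained model: it is assumed to take a frame and return three parallel sequences (boxes as x_min, y_min, x_max, y_max; scores; class ids). The class order, colours, and score threshold are also assumptions.

```python
from collections import Counter

CLASS_NAMES = ("with_mask", "without_mask")        # assumed class order
CLASS_COLOURS = {0: (0, 255, 0), 1: (0, 0, 255)}   # BGR: green, red


def count_detections(classes, scores, min_score=0.5):
    """Count the boxes of each class whose score passes the threshold."""
    kept = Counter(int(c) for c, s in zip(classes, scores) if s >= min_score)
    return {name: kept.get(i, 0) for i, name in enumerate(CLASS_NAMES)}


def annotate_video(source, detect, min_score=0.5):
    """Run detect() on every frame, draw the boxes, return per-frame counts."""
    import cv2  # imported lazily so count_detections works without OpenCV

    capture = cv2.VideoCapture(source)  # 0 for a webcam, or a file path
    per_frame_counts = []
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # end of the video or camera failure
            boxes, scores, classes = detect(frame)
            for (x1, y1, x2, y2), score, cls in zip(boxes, scores, classes):
                if score >= min_score:
                    cv2.rectangle(frame, (int(x1), int(y1)),
                                  (int(x2), int(y2)),
                                  CLASS_COLOURS[int(cls)], 2)
            counts = count_detections(classes, scores, min_score)
            per_frame_counts.append(counts)
            text = ", ".join(f"{k}: {v}" for k, v in counts.items())
            # Write the totals near the bottom of the frame.
            cv2.putText(frame, text, (10, frame.shape[0] - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2)
            cv2.imshow("Face mask detection", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break  # stop when the user presses q
    finally:
        capture.release()
        cv2.destroyAllWindows()
    return per_frame_counts
```

Keeping the counting logic in its own function makes it easy to check without a camera or a trained model, while the drawing loop stays a thin wrapper around the OpenCV calls.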

    Training the model:

    [Figure: sample dataset images without mask]

    Our model achieved 98% accuracy on face mask detection after training with tensorflow-gpu==2.5.0.

    We used our own images to verify the working of the custom deep learning model to detect whether a person is or is not wearing a face mask.

    [Figure: detection results on our own images, without mask and with mask]


      Given the urgency of controlling COVID-19, the practical value and importance of real-time mask and social distancing detection are increasing. Our face mask detector does not use any dataset of morphed masked images, and the model is accurate. It is also computationally efficient, which makes it easier to deploy to embedded systems (Raspberry Pi, Google Coral, etc.).

      This system can therefore be used in real-time applications that require face mask detection for safety purposes due to the outbreak of COVID-19. The project can be integrated with embedded systems for use in airports, railway stations, offices, schools, and public places to ensure that public safety guidelines are followed.


