Face Smart Home Door Lock System Using AI & Deep Learning

DOI : 10.17577/IJERTCONV12IS03077


P. M. Manochitra, Assistant Professor, Computer Science and Engineering, Shree Venkateshwara Hi-Tech Engineering College, Gobi, Erode

S. Sakthivel, Computer Science and Engineering, Shree Venkateshwara Hi-Tech Engineering College, Gobi, Erode

K. Kaleeswaran, Computer Science and Engineering, Shree Venkateshwara Hi-Tech Engineering College, Gobi, Erode




S. Naveen Kumar, Student, Computer Science and Engineering, Shree Venkateshwara Hi-Tech Engineering College, Gobi, Erode


Abstract: Security is of utmost concern for everyone nowadays, whether it is data security or the security of their own home. Face recognition systems are broadly used for human identification because of their capacity to measure facial points and recognize identity in an unobtrusive way. Deep convolutional neural networks have recently been applied successfully to face detection. This project presents a face detection and identification method based on an improved Mask R-CNN, named G-Mask, which incorporates face detection and recognition into one framework, aiming to obtain more fine-grained face information. Furthermore, the system employs a deep learning-based anomaly detection mechanism to detect suspicious activities or intrusions and alert users. By continuously analyzing patterns of behavior and monitoring the surroundings, the system can identify anomalies such as unusual movements or unfamiliar faces and promptly notify homeowners through real-time alerts. This proactive approach to security enables homeowners to take appropriate actions to mitigate potential threats and safeguard their property.

In addition to its security features, the Face Smart Home Door Lock System offers integration with other smart home devices and platforms. Through APIs and interoperability protocols, the system can communicate with various IoT devices such as smart lights, cameras, and alarms, allowing for centralized control and management of the entire smart home ecosystem. This integration enhances the overall user experience and enables homeowners to create personalized automation routines tailored to their preferences and lifestyle.

Moreover, the system prioritizes user privacy and data security by implementing robust encryption techniques and adhering to strict privacy regulations. All sensitive user data, including facial images and access logs, are securely stored and encrypted to prevent unauthorized access or tampering. Additionally, the system provides users with granular control over their data and privacy settings, empowering them to make informed decisions about how their information is used and shared.

In conclusion, the Face Smart Home Door Lock System represents a sophisticated yet user-friendly solution for modern home security. By harnessing the power of AI and deep learning, the system offers accuracy, reliability, and intelligence in protecting homes and ensuring peace of mind for homeowners. With its advanced features, integration with other devices, and commitment to privacy and security, the system aims to set a new standard for smart home security solutions in the digital age.

Keywords: Facial recognition, Smart lock, Biometric security, Artificial intelligence, Deep learning, Access control, Home automation, Face detection, Security system, Keyless entry, Authentication, Facial biometrics, Machine learning, IoT (Internet of Things), Facial feature extraction.


  1. Introduction

    The application of Artificial Intelligence (AI) has upgraded different aspects of our lives. Banking [1], healthcare [2], education [3], agriculture [4], industrial automation [5], transportation [6], and many other sectors are leveraging AI technology to reduce human intervention and automate services. The application of AI in cybersecurity and its effectiveness is worth attention as well [7]. However, the application of AI in physical security is still an under-explored area. This paper explores the effectiveness of applying AI to strengthen front-door security through an innovative algorithm named FDS, which automates front-door security and removes the need for constant human intervention. The proposed algorithm mimics human-like intelligence to recognize gun violence and attempts to break the front door by hitting, kicking, or punching, and it alerts the residents like a loyal security guard. In this paper, we created a hybrid network combining GoogleNet and a BiLSTM network to introduce human-like intelligence into a front-door video surveillance system. GoogleNet, a 22-layer deep Convolutional Neural Network (CNN) developed by a group of researchers at Google [8], can extract image features the way the human visual cortex does [9]. A Bidirectional Long Short-Term Memory (BiLSTM) network can learn these features to classify YouTube videos [10], identify human sentiments [11], and recognize human activities [12]. A combination of these two networks therefore has the potential to extract features from video streams and identify human activities from them. This functionality is the heart of the FDS algorithm proposed in this paper.

    We were under the impression that building an automatic intelligent front-door security system would be effortless. However, the reality is different, filled with challenges and difficulties. The challenges we faced, the solutions we discovered, and the problems we overcame are presented, analyzed, and discussed in this paper. Our findings, limitations, and solutions are worth recognition as novel contributions because they pave the path to an automatic intelligent security system for everyone at an affordable cost. False alarm prevention is one of the challenges of intelligent surveillance systems [13]. The higher the accuracy of human activity detection, the lower the probability of false alarms [14]. However, apart from accuracy, there are other context-specific factors that govern the false alarm rate [15]. The methodology discussed in this paper recognizes human activities from a live video feed with acceptable accuracy and addresses the false alarm problem. The core contributions of this paper are listed as follows:

    • An innovative hybrid network design for HAR-based security system development with an average accuracy of 73.18%.

    • Development of a novel algorithm using a CNN-BiLSTM combination in intelligent surveillance.

    • A cost-effective solution with a nominal upfront cost and no subscription fee, to make automatic and intelligent security affordable for everyone.

    The rest of the paper is organized as follows. The second section contains the literature review. The methodology is presented in the third section and is divided into two subsections: Dataset and Network Architecture. The fourth section presents the experimental results and performance evaluation. The real-world implementation and its analysis are presented in the fifth section. The sixth section discusses the limitations and future scope of the proposed system. Finally, the seventh section concludes the paper.

  2. Literature Review

    A. Smart Application for Front Door Security

    Sarp et al. used a Raspberry Pi-based video surveillance system to ensure front door security through two features – video feed and communication. In their system, users can remotely monitor the activities in front of the door and also communicate with someone at the front door. They further connected the door through a cellular network to access the functionality in real time over the internet [18]. While this approach effectively ensures front door security, it has a drawback: it requires manual inspection. The proposed FDS system does not require human intervention to monitor front door security. It is a fully automatic system that identifies the activities of the individuals at the front door and alerts the homeowner if anything suspicious happens.

    A home monitoring system based on ESP32, published by R. C. Aldawira et al., shows the application of IoT to ensure home security, including front door security. This system allows users to remotely monitor the activities happening inside and outside the house and to control the door lock. It also has a motion sensor to sense any motion and alert the users. Moreover, it has a touch sensor that is used to identify human touch on the door knob [19]. These multiple features make the home more secure. However, the system does not use human-like intelligence. Because it relies on motion and touch sensors, the rate of false alarms is high, and it requires manual adjustment. Compared to this approach, the proposed FDS is more advanced as it uses a CNN and recognizes activities as the human visual cortex does [20]. IoT-based home security systems [21], edge computer-based security systems [22], and intelligent warning-based security systems [23] are the common approaches to enhance home security. The literature review demonstrates a research gap in front-door security using a convolutional.

    B. Computer Vision-Based HAR & Application

    Computer vision-based human activity recognition is the dominating technology in video analysis and its applications in intelligent surveillance, autonomous vehicles, video retrieval, and entertainment [24]. The review presented in this paper aligns with our observation and methodology. For a front-door security algorithm, a computer vision-based, machine learning-centered approach is appropriate. V. Mazzia et al. developed a short-term pose-based human action recognition system. It achieved 90.86% accuracy with 227,000 parameters [25]. The accuracy is eye-catching, but the computational cost makes the method expensive and therefore unsuitable for developing an affordable security system. A promising 93.89% accuracy was achieved by a DCNN-based framework with depth vision guided by Q. video datasets. However, it depends on the Microsoft Kinect camera and also requires an Inertial Measurement Unit (IMU). These devices are expensive.


  3. Methodology

    The proposed FDS algorithm uses a GoogleNet-BiLSTM hybrid network as the classifier. This hybrid network requires a video dataset. The video dataset selection criteria, dataset processing, network architecture, the working principle of the network with the necessary mathematical interpretation, and the FDS algorithm are described in this section.

    A. Dataset

      The quality and relevance of the dataset play a significant role in the overall performance of a machine learning model, including the proposed network [27]. That is why selecting an appropriate dataset, based on criteria determined by the goals of the experiment, is an essential step in CNN-based research. This subsection presents our process of selecting the most appropriate dataset and the methods of processing it.

      FIG 1: Class Overlapping Among Experimenting Datasets.


        Predicting security breaches in real-time video streams is a broad field of research. We have narrowed it down to front-door security breaches. Various activities are subject to CCTV footage analysis to predict security threats. However, activities that pose a threat at the front door are limited. The possible incidents at the front door and their class names are listed in table 1. The target class names are the primary dataset selection criteria. The target datasets are Human Activity Recognition (HAR) datasets that have a rich collection of punching, kicking, hitting, and shooting samples. Based on social observation, the activities presented in table 1 have increased gun violence. Another typical incident at the front door is hitting the door knob to break it and enter the house. Usually, burglars target empty houses and try to gain access by breaking the door. For this reason, we have included hitting the door in the incident list. Someone who is furious and intends to physically hit another person usually punches or kicks the front door to express anger; this is common human behavior. We have selected punching and kicking as target incidents considering these social phenomena.

        This paper explores the potential of the available HAR datasets instead of creating one for the experiment. We analyzed three main criteria while selecting the dataset. These criteria have been set with the ease of implementing the proposed FDS in mind. That is why the availability of the activities listed in table 1 is the critical selection criterion. The secondary criterion is the similarity of the videos of the selected activities to the front-door environment. The third criterion is the feature richness of the available videos.


        The performance of Convolutional Neural Networks (CNNs) depends on both network architecture and training datasets [2]. We studied and analyzed six video datasets related to Human Activity Recognition (HAR) and listed them in table 2 [24]. We explored the datasets listed in table 2. It has been observed that there are overlapping classes among these datasets, which have be

        FIG 2: Videoclip Sequence Lengths Analysis.


        The HMDB51 dataset has been selected as the primary dataset for this paper. It is a Human Activity Recognition (HAR) dataset. The proposed FDS algorithm focuses on front-door security only. The activities that are considered threats at the front door form a subset of the HMDB51 classes. This dataset contains videos of punching (p), kicking (k), hitting (h), and holding guns (g), along with 47 other classes. A model trained using the HMDB51 dataset is therefore suitable for classifying the activities listed in table 1. Equation 1 defines the relation between the set of selected activities and the HMDB51 dataset.

        $$A = \{\, x \mid x \in H,\ x \in \{p, k, h, g\} \,\} \tag{1}$$

        Here in equation 1, A is the set of target activities, and H is the set of activities available in the HMDB51 dataset.
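As a rough illustration of equation 1, the target activities can be treated as a subset of the dataset's class labels. The sketch below is only an example: the label strings are a small hand-picked sample rather than the full 51-class list, and the mapping of the symbols p, k, h, g to those strings is an assumption.

```python
# Illustrative sample of HMDB51-style class labels (not the complete list).
hmdb51_sample = {"punch", "kick", "hit", "shoot_gun", "walk", "wave", "clap"}

# Target activities A from equation 1: punching, kicking, hitting, holding guns.
# The label spellings are assumed for this example.
target_activities = {"punch", "kick", "hit", "shoot_gun"}


def select_targets(available, wanted):
    """Return the wanted labels that are actually present in the dataset."""
    return {label for label in available if label in wanted}


selected = select_targets(hmdb51_sample, target_activities)
```

If any target label were missing from the dataset, it would simply be absent from `selected`, which makes the availability criterion easy to check.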


      The video clips of HMDB51 do not require any low-level filtering or improvement [34]. That means the dataset is ready to train the LSTM network. However, we have observed that some of the video clips are very lengthy. We analyzed the lengths using the histogram illustrated in figure 3.

      The histogram shows that there are a few lengthy video clips. The lengthier the videos, the longer the network takes to learn. Training machine learning models on image datasets is time-consuming, and it takes even longer for a video dataset. A longer training period occupies computing resources for a longer time. At the same time, retraining the model for newer datasets becomes a serious issue as well. This experiment explored the opportunity to reduce the training time by limiting the video length. The learning curve shows that the accuracy of the proposed hybrid model does not change significantly after two hundred minutes. However, the training process requires more iterations because of the video length. It is observable that the video length no longer positively impacts the model's performance. The average training accuracy is 73.24%, with a deviation of ±0.06% for epochs after two hundred minutes.
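The length analysis described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the frame counts are made up, the bin width of 50 frames is arbitrary, and the cutoff of 400 frames is hypothetical because the text does not state the limit that was used.

```python
from collections import Counter


def length_histogram(lengths, bin_width=50):
    """Count clips per bin of `bin_width` frames, keyed by the bin's lower edge."""
    return Counter((n // bin_width) * bin_width for n in lengths)


def drop_long_clips(lengths, max_frames):
    """Return the indices of clips that are at most `max_frames` long."""
    return [i for i, n in enumerate(lengths) if n <= max_frames]


# Hypothetical frame counts for eight clips; two of them are unusually long.
clip_lengths = [40, 75, 90, 120, 130, 410, 95, 1050]
hist = length_histogram(clip_lengths)
kept = drop_long_clips(clip_lengths, max_frames=400)
```

Dropping the few outliers keeps most of the data while bounding the longest sequence the network has to process.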


    B. Network Architecture

      The proposed network combines a pretrained Convolutional Neural Network (CNN), GoogleNet, and a Long Short-Term Memory (LSTM) network.


        The inputs to the network are video feeds, which are sequences of image frames that maintain a specific temporal distribution [35]. The features of the video frames need to be extracted to train the network. However, the feature extraction delay of the network and the temporal distribution of the video frames are not the same. As a result, extracting video features directly from the stream of video frames is not a realistic approach. A study by A. George & A. Ravindran shows that the latency for machine vision can be controlled through approximate computing [36]. However, we have taken a much simpler approach to reduce the computational cost: this problem has been handled using sequence folding.

        Considering the necessity of frequently retraining the network for the updated dataset, the


        $$I(x_i, y_i) = \sum_{t=1} f_r\big((x_t, y_t),\, t\big) \tag{2}$$


        The folded sequences contain video features. The performance of these pretrained networks is demonstrated in the Results and Performance Evaluation section. Despite their acceptable performance, we excluded VGG-19 and AlexNet because of their size. SqueezeNet is very lightweight and has significantly fewer learnable parameters. However, it lowers the accuracy of the overall network. GoogleNet outperforms SqueezeNet and ResNet50. The final classification accuracy of VGG-19, AlexNet, and GoogleNet is almost identical. That is why, considering everything, we used GoogleNet in algorithm 1 to extract the feature vectors.

        TABLE 1: The pretrained CNNs experimented with in this paper.

        Network | Size (MB) | Input Size
        AlexNet [38] | |

        Algorithm 1 Frames to Feature Vectors
        Input: GoogleNet, GN; Dataset, Ds
        Output: Feature Vector Sequence, s
        Start
          Is ← Size(Layers(1, GN))
          L ← pooling(N = 5, K = 7 × 7, S = 1)
          F ← num(readfiles(Dataset))
          for i ← 1 : F do
            v ← readVideo(file(i))
            s(i, 1) ← activations(GN, v, L)
          end for
          save(s, FeatureVector)
        end
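The loop in algorithm 1 can be sketched in Python as follows. This is a simplified illustration, not the original implementation: `extract_features` is a hypothetical stand-in for reading the pooling-layer activations of GoogleNet, and `mean_pool` is a toy extractor used only so that the example runs without a trained network.

```python
def frames_to_feature_vectors(videos, extract_features):
    """Map each video (a list of frames) to a sequence of per-frame feature vectors.

    `extract_features` stands in for the pretrained network's activations;
    any callable that turns one frame into a flat list works here.
    """
    sequences = []
    for frames in videos:
        sequences.append([extract_features(frame) for frame in frames])
    return sequences


def mean_pool(frame):
    """Toy extractor: one feature per row, the mean intensity of that row."""
    return [sum(row) / len(row) for row in frame]


# Two tiny hypothetical "videos": 2 frames and 1 frame of 2x2 grayscale pixels.
videos = [
    [[[0, 2], [4, 6]], [[1, 1], [3, 3]]],
    [[[8, 8], [0, 4]]],
]
features = frames_to_feature_vectors(videos, mean_pool)
```

Passing the extractor as a parameter keeps the loop independent of which pretrained CNN supplies the features.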

        The features are saved as feature vectors. Once the features are extracted, the folded sequences are unfolded by inverting equation 2. The video frames and corresponding feature vectors are 2D spatial signals. They are flattened using a flatten layer before being sent to the LSTM network.
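The folding, unfolding, and flattening steps can be illustrated with plain lists. The sketch below assumes that folding means concatenating the frames of all videos in a batch so that a frame-level network can process them independently, and that unfolding regroups the results by the original sequence lengths; the function names are illustrative.

```python
def fold(batch):
    """Fold a batch of frame sequences into one flat list of frames.

    Returns the frames plus the original sequence lengths needed to unfold.
    """
    lengths = [len(seq) for seq in batch]
    frames = [frame for seq in batch for frame in seq]
    return frames, lengths


def unfold(items, lengths):
    """Invert `fold`: regroup a flat list back into per-video sequences."""
    out, start = [], 0
    for n in lengths:
        out.append(items[start:start + n])
        start += n
    return out


def flatten(feature_map):
    """Flatten a 2D feature map (a list of rows) into a 1D vector."""
    return [value for row in feature_map for value in row]


batch = [["a1", "a2", "a3"], ["b1", "b2"]]  # two videos, frames shown as labels
frames, lengths = fold(batch)
restored = unfold(frames, lengths)
flat = flatten([[1, 2], [3, 4]])
```

Because `unfold` only needs the stored lengths, the temporal order inside each video is preserved across the round trip.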


        The proposed GoogleNet-BiLSTM hybrid network has been developed by combining part of GoogleNet with a BiLSTM network. GoogleNet has been used as a feature extractor, and the BiLSTM network is responsible for the classification. GoogleNet is a 22-layer deep convolutional neural network. The 19th layer of GoogleNet is an average pooling layer with a size of 7 × 7. This layer passes the extracted features to the subsequent Fully Connected (FC) layer [42]. However, the proposed hybrid network takes the activations of this pooling layer as feature vectors, as in algorithm 1, and passes them to the BiLSTM network. The classification layer classifies the input video into one of the four classes – Punch, Kick, Hit, or Shoot.
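The final classification step, which assigns one of the four classes, can be sketched as a softmax over four scores followed by an argmax. The use of softmax is an assumption, since the text does not name the output activation, and the scores below are hypothetical rather than real network outputs.

```python
import math

CLASSES = ["Punch", "Kick", "Hit", "Shoot"]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most likely class name and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Hypothetical scores for one video; the second entry is the largest.
label, confidence = classify([0.2, 2.5, 0.1, -1.0])
```

The probability returned with the label is the kind of score that a per-class confidence threshold can later be applied to.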


        We split the dataset 80:10:10 for training, testing, and validation, respectively. We used the mini-batch method with a batch size of 16. The videos of each batch are internally shuffled in every iteration. Along with shuffling, we used k-fold cross-validation with k = 4.
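The data handling described above can be sketched with the standard library. This is an illustrative sketch under assumptions: clips are represented by integer ids, shuffling is done with a fixed seed for repeatability, and the fold assignment is a simple interleaving rather than whatever scheme was actually used.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle once, then split into training, testing, and validation parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_test = len(items) // 10
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val


def mini_batches(items, batch_size=16, seed=0):
    """Yield shuffled mini-batches; the last batch may be smaller."""
    items = list(items)
    random.Random(seed).shuffle(items)
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def k_fold_indices(n, k=4):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, val_idx


clips = list(range(100))  # hypothetical clip ids
train, test, val = split_80_10_10(clips)
batches = list(mini_batches(train))
folds = list(k_fold_indices(len(train)))
```

With 100 clips this gives 80 training clips, which divide evenly into five batches of 16 and four folds of 20.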

        We experimented with three optimization algorithms – Adaptive Gradient Algorithm (AdaGrad) [45], Root Mean Squared Propagation (RMSProp) [46], and Adaptive Moment Estimation (ADAM) [47], defined in equations 3, 4, and 5, respectively. We analyzed the validation loss curve with respect to the number of iterations, illustrated in figure 7. It shows that ADAM performs better than the AdaGrad and RMSProp algorithms. The validation loss reduces rapidly up to 120 iterations for each optimization algorithm. After that, the loss reduces gradually up to 1000 iterations. The validation loss is the lowest for ADAM. That is why it has been used in our network. The training progress is illustrated in figure 8.
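The three optimizers compared above can be sketched on a toy problem. The update rules below are the commonly used textbook forms and are not necessarily identical to the formulation in equations 3 to 5; the quadratic objective, the learning rate of 0.1, and the other hyperparameters are assumptions chosen only to keep the example small.

```python
import math


def adagrad(grad, x, lr=0.1, steps=200, eps=1e-8):
    """AdaGrad: scale each step by the root of the accumulated squared gradients."""
    accum = 0.0
    for _ in range(steps):
        g = grad(x)
        accum += g * g
        x -= lr * g / (math.sqrt(accum) + eps)
    return x


def rmsprop(grad, x, lr=0.1, steps=200, decay=0.9, eps=1e-8):
    """RMSProp: like AdaGrad, but with an exponentially decaying average."""
    avg = 0.0
    for _ in range(steps):
        g = grad(x)
        avg = decay * avg + (1 - decay) * g * g
        x -= lr * g / (math.sqrt(avg) + eps)
    return x


def adam(grad, x, lr=0.1, steps=200, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected running averages of the gradient and its square."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def loss(x):
    """Toy objective with its minimum at x = 0."""
    return x * x


def grad(x):
    """Derivative of the toy objective."""
    return 2 * x


results = {name: loss(opt(grad, 5.0))
           for name, opt in [("AdaGrad", adagrad), ("RMSProp", rmsprop), ("ADAM", adam)]}
```

On this toy objective all three optimizers reduce the loss from its starting value; which one ends lowest depends on the hyperparameters and says nothing about the ranking on the video dataset.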

        The validation accuracy increases rapidly until the 120th iteration. After that, the rate of increase slows, but the accuracy keeps rising until the 1000th iteration. The validation loss shows a similar but inverse pattern: it drops rapidly until the 120th iteration, after which the rate of change gradually reduces. The training process completes after 1000 iterations with 81.16% validation accuracy. It takes 197 minutes and 15 seconds with a learning rate of 0.0001.


        The FDS written in algorithm 2 uses the trained BiLSTM network to classify the four actions mentioned in table 1. This algorithm reads the real-time video stream. When the camera is active, it reads the frames. If a significant difference exists between two successive frames, the current frame is passed to the trained network for classification, as algorithm 2 shows. The thresholds of the proposed FDS algorithm play significant roles in the overall performance. We experimented with threshold values between 0 and 1 in 0.1 increments. It has been observed that a threshold below 0.6 degrades the algorithm's performance for every class. In this range, the average accuracy is 34.10%, which is not feasible for a security system. However, there is a significant improvement within the threshold range of 0.6 to 0.95. The analysis within this range is presented in figure 9. It is evident that

        Algorithm 2 The FDS Algorithm
        Input: CCTV Video Stream, vs
        Output: Alert, a
        Start
          i ← 0
          F[i] ← read(vs)
          while vs = True do
            i ← i + 1
            F[i] ← read(vs)
            c ← compare(F[i − 1], F[i])
            if c ≥ 0.5 then
              [p, s] ← LSTM[F[i]]
              if p == Punch & s ≥ 0.70 then
                a ← DoorPunch
              else if p == Kick & s ≥ 0.72 then
                a ← DoorKick
              else if p == Hit & s ≥ 0.65 then
                a ← DoorHit
              else if p == Shoot & s ≥ 0.85 then
                a ← DoorShooting
              end if
            end if
          end while
        end
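The control flow of algorithm 2 can be sketched in Python as follows. The per-class thresholds and the 0.5 motion threshold are taken from the algorithm; everything else is an assumption made for illustration: frames are flat lists of grayscale pixels, `frame_difference` is one possible choice for the `compare` step, and `fake_classifier` is a hypothetical stand-in for the trained network.

```python
# Per-class confidence thresholds and alert names taken from algorithm 2.
THRESHOLDS = {"Punch": 0.70, "Kick": 0.72, "Hit": 0.65, "Shoot": 0.85}
ALERTS = {"Punch": "DoorPunch", "Kick": "DoorKick",
          "Hit": "DoorHit", "Shoot": "DoorShooting"}


def frame_difference(prev, curr):
    """Mean absolute pixel difference of two equal-size frames, scaled to [0, 1]."""
    total = sum(abs(a - b) for a, b in zip(prev, curr))
    return total / (255 * len(curr))


def fds(stream, classify, motion_threshold=0.5):
    """Yield an alert name for every frame that passes both threshold checks."""
    frames = iter(stream)
    prev = next(frames, None)
    if prev is None:
        return
    for curr in frames:
        if frame_difference(prev, curr) >= motion_threshold:
            label, score = classify(curr)
            if label in THRESHOLDS and score >= THRESHOLDS[label]:
                yield ALERTS[label]
        prev = curr


def fake_classifier(frame):
    """Hypothetical stand-in for the network: bright frames look like a kick."""
    return ("Kick", 0.9) if frame[0] > 200 else ("Hit", 0.3)


# Three flattened 4-pixel frames: still, sudden change, still again.
stream = [[0, 0, 0, 0], [255, 255, 255, 255], [255, 255, 255, 255]]
alerts = list(fds(stream, fake_classifier))
```

Only the sudden change between the first two frames triggers classification, and the alert is raised because the score of 0.9 exceeds the Kick threshold of 0.72. A prediction below its class threshold is dropped, which is how the thresholds suppress false alarms.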

  4. Results And Performance Evaluation

    The performance of the FDS algorithm depends on the accuracy of the proposed GoogleNet-BiLSTM hybrid network. As it is a Deep Learning (DL) approach, we first used state-of-the-art machine learning performance evaluation metrics [48] to assess the performance of the proposed network. After that, we compared the performance of different models. This performance comparison demonstrates the superiority of the proposed methodology. We then performed another experiment with different datasets. The purpose of this third experiment is to analyze the robustness of the system. The proposed model has been trained with the HMDB51 dataset. However, we have used videos from all datasets mentioned in table 2.
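Typical evaluation metrics for a four-class classifier can be computed from a confusion matrix, as sketched below. The choice of accuracy, precision, recall, and F1 is an assumption, since the specific metrics are not listed here, and the confusion counts are hypothetical values for illustration only, not results from this paper.

```python
def per_class_metrics(confusion, labels):
    """Compute accuracy plus per-class precision, recall, and F1.

    `confusion[i][j]` counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    report = {}
    for i, name in enumerate(labels):
        tp = confusion[i][i]
        predicted = sum(row[i] for row in confusion)  # column sum
        actual = sum(confusion[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[name] = (precision, recall, f1)
    return correct / total, report


labels = ["Punch", "Kick", "Hit", "Shoot"]
# Hypothetical counts for illustration only.
confusion = [
    [8, 1, 1, 0],
    [2, 7, 1, 0],
    [1, 1, 8, 0],
    [0, 0, 1, 9],
]
accuracy, report = per_class_metrics(confusion, labels)
```

Per-class precision is the figure most directly tied to false alarms, because it measures how many raised alerts of a class were correct.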

  5. Real World Implementation

    The performance of the proposed front door security system in the laboratory experiment is satisfactory. However, there are always some differences between laboratory and real-world scenarios. We have implemented the security system at the front door of an apartment, which is illustrated in figure 12. We used a Logitech C270 HD webcam. It has been mounted at the top of the door at a 45° viewing angle.

    FIG 3: The Camera Mounted Front Door.

  6. Limitation And Future Scope

    No computing system is immune to limitations. The proposed FDS is not an exception. Despite good performance in solving real-world problems, it has several limitations worth attention.


          One of the limitations of the proposed methodology is its sensitivity to the subject-camera angle. It has been observed that if the subject is too close to the camera, which means the angle is closer to 70 degrees, the accuracy of the kicking and punching classes increases. However, the accuracy of the remaining classes decreases. When the angle is between 40 and 60 degrees, the system generates the best result. The performance keeps degrading as the angle increases beyond 60 degrees. The experimental results presented in this paper have been measured between 40 and 60 degrees. This limitation opens the opportunity to conduct further research and make the FDS more robust.


          Security systems are for both day and night. The proposed methodology has been tested in daylight and under floodlights at night. Experimenting with night-vision mode is another challenge that this paper has not addressed. However, subsequent research is in progress to explore the performance in night-vision mode.


    This paper deals with four security threats. There are many other potential risks at the front door which we could not include because of limited data availability. A custom dataset designed for the FDS algorithm would allow us to detect all of the common security threats at the front door. It paves the path to conduct more research, including new dataset creation and processing, discovering limitations and addressing them for new classes, and handling computational complexities for additional classes.

    The limitations of the proposed FDS algorithm pave the path to develop this system further. Instead of considering these as limitations, the researchers of this project consider them as opportunities to continue the research work and integrate more innovative features with it.

  7. Conclusion

Human Activity Recognition (HAR) has drawn attention from both the wearable sensor and the computer vision scientific communities. In this paper, we created a hybrid network combining state-of-the-art techniques found in current research trends, and our approach is a potential solution for better front-door security. The advancement of research in HAR is eye-catching. However, its application in front door security is unexplored. There are expensive, large-scale AI surveillance services available that use HAR technology to strengthen the security of large premises. However, these services require expensive infrastructure. The FDS algorithm we presented in this paper does not require additional equipment. Integrating a CCTV camera video stream or a simple webcam is enough to recognize the security threats with 73.1% accuracy, with an optimized threshold to reduce the false alarm rate. The real-world implementation and its experimental results show the adaptability of the FDS system in strengthening security at the front door. However, alongside its impressive performance, the FDS algorithm has some limitations. These limitations are further opportunities to improve the system's service quality and robustness to make intelligent front-door security affordable and available for everyone.


  References

  1. A. B. Malali and S. Gopalakrishnan, "Application of artificial intelligence and its powered technologies in the Indian banking and financial industry: An overview," IOSR J. Hum. Social Sci., vol. 25, no. 4, pp. 55–60, 2020.

  2. N. Faruqui, M. A. Yousuf, M. Whaiduzzaman, A. K. M. Azad, A. Barros, and M. A. Moni, "LungNet: A hybrid deep-CNN model for lung cancer diagnosis using CT and wearable sensor-based medical IoT data," Comput. Biol. Med., vol. 139, Dec. 2021, Art. no. 104961.

  3. W. Xu and F. Ouyang, "The application of AI technologies in STEM education: A systematic review from 2011 to 2021," Int. J. STEM Educ., vol. 9, no. 1, pp. 1–20, Sep. 2022.

  4. N. C. Eli-Chukwu, "Applications of artificial intelligence in agriculture: A review," Eng., Technol. Appl. Sci. Res., vol. 9, no. 4, pp. 4377–4383, 2019.

  5. S. Weibin, L. Yun, D. Yi, D. Yingguo, P. Mingbo, and X. Gang, Three real-time architecture of industrial automation based on edge

  6. R. Abduljabbar, H. Dia, S. Liyanage, and S. A. Bagloee, "Applications of artificial intelligence in transport: An overview," Sustainability, vol. 11, no. 1, p. 189, 2019.

  7. S. Laato, A. Farooq, H. Tenhunen, T. Pitkamaki, A. Hakkala, and A. Airola, "AI in cybersecurity education - A systematic literature review of studies on cybersecurity MOOCs," in Proc. IEEE 20th Int. Conf. Adv. Learn. Technol. (ICALT), Jul. 2020, pp. 6–10.

  8. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.

  9. B. Tripp, "Approximating the architecture of visual cortex in a convolutional network," Neural Comput., vol. 31, no. 8, pp. 1551–1591, Aug. 2019.

  10. K. Yousaf and T. Nawaz, "A deep learning-based approach for inappropriate content detection and classification of YouTube videos," IEEE Access, vol. 10, pp. 16283–16298, 2022.

  11. T Senthil Prakash, V CP, RB Dhumale, and A Kiran, "Auto-metric graph neural network for paddy leaf disease classification," Archives of Phytopathology and Plant Protection, 2023.

  12. TS Prakash, AS Kumar, CRB Durai, and S Ashok, "Enhanced Elman spike Neural network optimized with flamingo search optimization algorithm espoused lung cancer classification from CT images," Biomedical Signal Processing and Control, 2023.

  13. TS Prakash, SP Patnayakuni, and S Shibu, "Municipal Solid Waste Prediction using Tree Hierarchical Deep Convolutional Neural Network Optimized with Balancing Composite Motion Optimization Algorithm," Journal of Experimental & Theoretical Artificial Intelligence, 2023.

  14. C Aswath, T Prakash, P Kumari, N Thakur, and R Sharma, "Effect of Gamma Radiation on Pollen Viability and Pollen Germination of Marigold Cultivar," Think India Journal, 2019.

  15. R. Senthilkumar et al., "Pearson Hashing B-Tree With Self Adaptive Random Key Elgamal Cryptography For Secured Data Storage And Communication In Cloud," Webology, vol. 18, no. 5, 2021.

  16. R. Senthilkumar and B. G. Geetha, "Asymmetric Key Blum-Goldwasser Cryptography for Cloud Services Communication Security," Journal of Internet Technology, vol. 21, no. 4, pp. 929–939, 2020.

  17. D. Anusuya, R. Senthilkumar, and T. Senthil Prakash, "Evolutionary Feature Selection for big data processing using Map reduce and APSO," International Journal of Computational Research and Development (IJCRD), vol. 1, no. 2, pp. 30–35, 2017.

  18. K. Farhanath, O. Farooqui, and K. Asique, "Comparative Analysis of Deep Learning Models for PCB Defects Detection and Classification," Journal of Positive School Psychology, vol. 6, no. 5, 2022.