Premier International Publisher
Serving Researchers Since 2012

FindMe: AI-based Surveillance System for Missing People

DOI : https://doi.org/10.5281/zenodo.18338034
Download Full-Text PDF Cite this Publication

Text Only Version

FindMe: AI-based Surveillance System for Missing People

Prof. Rashmi Tuptewar, Vrushank Bhavsar, Tanaya Patil, Naman Nagota, Lavanya Rathi

CSE Department, MIT-ADT University Rajbaugh Loni Kalbhor, Solapur Highway,

Near Bharat Petrol Pump Loni Kalbhor Railway Station, Pune – 412201, Maharashtra India.

Abstract – In an increasingly interconnected world, the issue of missing persons has emerged as a critical public safety challenge, often constrained by delayed reporting, fragmented data sharing, and manual video analysis. Conventional surveillance review processes are labor-intensive, time-consuming, and prone to human oversight, which significantly reduces the chances of timely recovery. To address these limitations, this project titled FindMe: AI-Powered Surveillance System for Missing Person Detection proposes the design and development of an intelligent, real-time identification framework that integrates facial recognition, deep learning, and computer vision technologies.

The proposed system introduces a unified architecture that connects surveillance networks, law enforcement databases, and communication channels into a single AI-driven ecosystem. Leveraging pre-trained Convolutional Neural Network (CNN) models such as VGGFace, FaceNet, and ResNet, the system performs automatic face detection, feature extraction, and similarity matching across live and recorded video feeds. To ensure real-time processing, technologies like OpenCV, TensorFlow, and scalable cloud-based APIs are utilized for efficient computation and streaming analytics. Once a match is detected, automated alerts are instantly generated and transmitted to law enforcement agencies through web dashboards, SMS, and email notifications.

The project also emphasizes scalability, accuracy, and ethical considerations in surveillance analytics. Measures such as threshold optimization, false-positive reduction, and dynamic database updates enhance system reliability and minimize misidentification risks. Additionally, privacy-preserving techniques and encryption protocols are integrated to safeguard sensitive data. The framework is tested and validated using publicly available datasets such as LFW (Labeled Faces in the Wild) and VGGFace2, ensuring robustness across diverse demographics and environmental conditions.

By transforming traditional surveillance into an intelligent and proactive monitoring network, FindMe aims to revolutionize missing person investigations through automation, rapid information retrieval, and enhanced situational awareness. This

solution not only accelerates response times but also demonstrates the potential of AI in augmenting public safety and humanitarian efforts through technology-driven intelligence.

This research-driven system aspires to bridge the gap between academic machine learning models and industry-grade fraud prevention mechanisms by enabling scalable, low-latency, and continuously adaptive fraud detection pipelines. Beyond its technical implementation, the project contributes to strengthening the financial ecosystems integrity, improving customer trust, and minimizing economic losses associated with digital payment frauds.

  1. INTRODUCTION

    The exponential growth of surveillance infrastructure worldwide has created an immense yet underutilized potential for improving public safety and missing-person investigations. Millions of cameras operate across urban environments, transportation hubs, and public spaces, generating vast volumes of visual data every second. However, despite this widespread availability, traditional investigation methods continue to rely heavily on manual screening of CCTV footagea process that is not only time-consuming but also highly prone to human error and cognitive fatigue. These inefficiencies often lead to delayed detection and response, particularly during the critical early hours when the probability of successful recovery is highest. The absence of automation and intelligence within existing surveillance workflows underscores the urgent need for a scalable, adaptive, and AI-driven approach to visual monitoring and information retrieval.

    Conventional surveillance systems function primarily as passive recording tools, offering limited real-time analytical capabilities. Manual verification and footage review not only hinder operational efficiency but also strain investigative resources. Such methods are incapable of processing large-scale video data in real time or detecting subtle visual cues that may indicate the presence of a missing individual. As a result, law enforcement agencies and emergency responders face significant challenges in maintaining situational awareness across extensive camera networks. To overcome these constraints, this project, titled FindMe: AI-Powered Surveillance for Missing Person Detection, introduces a next-generation, deep learningbased framework designed to transform traditional surveillance into an intelligent, proactive monitoring system.

    The proposed framework leverages advancements in artificial intelligence, computer vision, and cloud-based data analytics to enable automated recognition and alerting. Utilizing pre-trained convolutional neural network (CNN) architectures such as VGGFace, FaceNet, and ResNet, the system performs facial detection, feature extraction, and identity matching across both live and archived video streams. It integrates a centralized database of registered missing individuals, allowing continuous cross-referencing against active camera feeds. Real-time analytics and low-latency data processing are achieved using frameworks such as TensorFlow, OpenCV, and scalable web- based APIs. Once a match is identified, the system instantly triggers automated notifications through web dashboards, SMS, and email to law enforcement agencies and registered guardians, ensuring rapid and coordinated response.

    A key objective of the FindMe framework is to enhance accuracy, reduce human workload, and improve the timeliness of recovery operations. Challenges such as false-positive mitigation, variable lighting conditions, and occlusions are addressed through threshold tuning, model ensemble techniques, and adaptive retraining. Furthermore, ethical considerations and privacy-preserving mechanismsincluding encryption, role-based access control, and compliance with data protection regulationsare integral components of the design. The systems performance is validated using benchmark facial recognition datasets such as Labeled Faces in the Wild (LFW) and VGGFace2, ensuring generalization across diverse demographic and environmental scenarios.

    Beyond its primary focus on missing-person recovery, FindMe also provides a scalable foundation for integration into broader smart city surveillance and security infrastructures. Its modular architecture allows extension into applications such as criminal identification, crowd monitoring, and public safety analytics. By harnessing the latent potential of global surveillance networks, this project exemplifies how AI-driven automation can revolutionize the field of investigative intelligence, transforming reactive observation into proactive action while upholding the principles of security, transparency, and social responsibility.

  2. LITERATURE REVIEW

    The growing demand for intelligent surveillance and public safety management has motivated extensive research into automated systems for facial recognition, re-identification, and missing-person detection. With the proliferation of CCTV and IoT-based camera networks, researchers have increasingly focused on everaging artificial intelligence (AI) and deep learning (DL) to analyze visual data in real time. This section reviews recent advancements and methodologies that form the conceptual foundation for the proposed FindMe system.

    Rao et al. [1] investigated traditional computer vision techniques such as Haar cascades and Histogram of Oriented Gradients (HOG) for facial detection and classification in surveillance feeds. Although computationally inexpensive, these handcrafted feature extractors were highly sensitive to

    lighting, pose, and occlusion variations, leading to poor generalization in dynamic outdoor environments.

    Patel and Mehta [2] introduced an improved facial recognition pipeline using pre-trained Convolutional Neural Networks (CNNs) such as VGGFace and ResNet-50. Their experiments on the Labeled Faces in the Wild (LFW) dataset demonstrated a significant increase in recognition accuracy compared to classical methods. However, their model was restricted to static image processing and lacked the streaming and alert-generation capabilities necessary for real-time applications.

    Singh et al. [3] proposed an automated missing-person tracking framework utilizing the You Only Look Once (YOLOv3) detector combined with a Siamese network for feature similarity matching. The system achieved robust performance in small-scale tests but faced scalability limitations when processing multiple live feeds concurrently. Latency and computational load remained key bottlenecks for city-scale implementation.

    Zhao et al. [4] explored person re-identification (ReID) using deep metric learning for multi-camera surveillance networks. Their system incorporated spatial-temporal correlation modeling to track individuals across overlapping fields of view. Despite achieving higher identification continuity, their work highlighted challenges in maintaining accuracy under conditions of dense crowding, motion blur, and non-frontal face orientations.

    Gupta and Sharma [5] proposed a cloud-integrated real-time facial recognition architecture leveraging OpenCV and TensorFlow for distributed edge processing. Their system reduced inference time by offloading lightweight models to edge devices while maintaining central database synchronization. Nevertheless, issues related to network latency, bandwidth constraints, and synchronization accuracy persisted under high camera densities.

    Kaur et al. [6] discussed the integration of multimodal biometric cuessuch as face, gait, and clothing attributesusing a multimodal deep fusion model. Their work demonstrated improved recognition accuracy under challenging environmental conditions. However, the increased computational complexity and hardware dependency limited the models deployment feasibility in resource-constrained surveillance infrastructures.

    Finally, Ahmed and Verma [7] emphasized the ethical and privacy challenges associated with AI-driven facial recognition in public surveillance. Their study proposed privacy-preserving techniques including face embedding encryption, anonymization, and federated learning. While these measures strengthened user data protection, they introduced additional computational overhead and latency concerns during model inference.

    Across the reviewed literature, deep learning architectures, Feature Vector Generation: Pre-trained CNN architectures such

    edge-cloud hybrid systems, and multimodal fusion techniques have emerged as leading trends in intelligent surveillance research. CNN-based models such as VGGFace, FaceNet, and ResNet have shown remarkable accuracy in controlled datasets, while person re-identification frameworks and streaming-based architectures continue to push the boundaries of scalability and real-time responsiveness. Despite these advancements, critical challenges remain in balancing computational efficiency, ethical compliance, and operational scalability. The proposed FindMe framework builds upon these insights by integrating scalable AI-driven face recognition, cloud-based analytics, and automated alert systems to enable proactive, real-time missing-

    as VGGFace, ResNet-50, and FaceNet are used to generate high-dimensional facial embeddings, encoding unique biometric features for subsequent comparison.

    1. Matching and Identification Layer

      This layer constitutes the core recognition engine of FindMe. Detected facial embeddings are compared against a centralized database of registered missing persons using distance metrics such as cosine similarity and Euclidean distance. The matching engine comprises:

      person identification within smart city surveillance ecosystems. Pre-Trained CNN Models: Fine-tuned on benchmark datasets

      like LFW and VGGFace2 to ensure high recognition accuracy

  3. METHODOLOGY

    under varying lighting, occlusion, and age differences.

    The proposed FindMe framework integrates real-time video Hybrid Matching Mechanism: Combines deep feature matching

    stream analysis, deep learningbased face recognition, and automated alert generation to identify missing individuals efficiently and accurately. Designed for scalability, modularity,

    with secondary heuristic checks (e.g., gender, approximate age, and facial landmarks) to reduce false positives.

    and real-time performance, the system leverages cloud-based Ensemble Confidence Scoring: Outputs from multiple CNN

    infrastructure, computer vision frameworks, and pre-trained convolutional neural networks (CNNs) to process and analyze high-volume surveillance data.

    A. System Architecture Overview

    The architecture of the FindMe system is composed of five major layers:

      1. Video Data Ingestion Layer

        Live video feeds from surveillance cameras, public networks, and authorized smart city infrastructure are streamed into the system through the Real-Time Streaming Protocol (RTSP) and Kafka-based ingestion pipelines. Apache Kafka provides high- throughput, fault-tolerant video metadata ingestion, ensuring seamless data flow from multiple distributed camera sources. Each feed is tagged with metadata such as camera ID, GPS location, timestamp, and frame rate for indexing and retrieval.

      2. Frame Processing and Feature Extraction Layer

        Video frames are captured and preprocessed using OpenCV and FFmpeg libraries. This module performs:

        • Frame Sampling and Preprocessing: Frames are extracted at fixed intervals, resized, and normalized to optimize computational load.

        • Face Detection and Alignment: Using MTCNN (Multi-task Cascaded Convolutional Networks) and YOLOv8-Face, the system detects faces within frames and aligns them using affine transformations to standardize pose and orientation.

        models are combined using a weighted average fusion strategy that prioritizes precision while maintaining high recall.

    1. Alert Generation and Visualization Layer

      When a facial match surpasses a predefined similarity threshold, the system triggers a multi-channel alert to relevant law enforcement agencies. Alerts include the detected individuals image, confidence score, timestamp, and camera location.

      • Dashboard Visualization: A real-time web interface built using Streamlit and Grafana displays live detection events, match histories, and system metrics (e.g., precision, recall, latency).

      • Automated Notifications: Integrated communication modules send alerts through SMS, email, and push notifications, ensuring rapid dissemination of critical information for immediate response.

      • Event Logging: All events are logged in a secure database (e.g., MongoDB or PostgreSQL) for auditability and further analysis.

    2. Feedback and Continuous Learning Layer

    Confirmed identificaton results and false alarms are incorporated into a feedback repository stored in a cloud data lake (e.g., AWS S3, Google Cloud Storage). This enables:

    • Model Retraining: Incremental retraining of CNN models to adapt to new facial patterns or demographic variations.

    • Threshold Optimization: Dynamic tuning of similarity thresholds based on recent performance statistics.

      • Concept Drift Management: Ensures consistent accuracy even as environmental and camera conditions evolve over time.

    B. Dataset Description

    C. Data Preprocessing and Integration

    Prior to model training and deployment, all datasets underwent a standardized preprocessing pipeline to ensure data consistency and optimal performance across domains:

    To evaluate and validate the performance of FindMe, a combination of publicly available benchmark facial datasets and simulated surveillance datasets was employed. These datasets represent diverse demographics, environments, and

    • Data Cleaning: Duplicate and low-quality images were

    removed, and corrupted frames were automatically discarded.

    video capture conditions consistent with real-world Face Cropping and Alignment: Detected faces were cropped

    surveillance use cases.

    1. Labeled Faces in the Wild (LFW)

      The LFW dataset is one of the most widely used benchmarks

      and aligned to maintain consistent geometry across datasets.

      • Normalization: Image pixel values were normalized to zero mean and unit variance.

        for facial recognition research. It contains over 13,000 labeled Data Augmentation: Techniques such as rotation, scaling,

        facial images collected from the web, with variations in expression, pose, and lighting. The dataset provides an ideal foundation for testing feature extraction and face matching

        horizontal flipping, and brightness adjustment were applied to enhance robustness to environmental variations.

        accuracy. Preprocessing involved resizing all images to a fixed Embedding Standardization: Facial feature vectors generated

        resolution (224×224), converting to grayscale when required, and normalizing pixel intensities for consistent model input.

    2. VGGFace2 Dataset

      The VGGFace2 dataset contains approximately 3.3 million images across 9,000 identities, offering extensive variability in age, ethnicity, and imaging conditions. This dataset is particularly suitable for deep learningbased models as it supports robust feature generalization. To preserve model fairness and prevent bias, stratified sampling was used to maintain balanced representation across demographic categories during training and testing.

    3. CelebA and WiderFace Datasets

      The CelebA dataset was utilized to enhance model robustness in detecting partial faces, occlusions, and accessories such as sunglasses or masks. The WiderFace dataset supplemented this by providing images with crowded and complex backgrounds. Combined, these datasets ensure that FindMe performs reliably in dense urban surveillance environments where subjects may not always face the camera directly.

    4. Custom Simulated Surveillance Dataset

    To simulate real-world conditions, a custom dataset was created using publicly available CCTV footage and synthetic video sequences. The dataset includes controlled scenarios such as varying camera angles, frame rates, and lighting conditions to assess the models adaptability. Synthetic missing-person data were generated through GAN-based (Generative Adversarial Network) augmentation to represent facial changes over time, such as aging or hairstyle variations.

    by different CNN architectures were normalized and projected into a common latent space using PCA-based dimensionality reduction for efficient similarity computation.

    • Temporal Partitioning: Surveillance feeds were split chronologically into training, validation, and testing segments to simulate real-world continuous monitoring scenarios.

    The combined dataset was used to evaluate FindMe under multiple environmental and demographic conditions, ensuring its reliability, scalability, and readiness for real-time deployment across distributed surveillance networks.

  4. RESULTS

    The FindMe system was implemented and tested through a two- module web interface comprising a Registration Page and a Detection Page. The results obtained from the developed prototype demonstrate the systems efficiency in managing user data, performing accurate facial registration, and enabling automated recognition within live surveillance feeds. The Registration Page allows individuals or authorities to enroll missing persons by entering essential details such as name, age, gender, contact number, and other identifying attributes, followed by capturing three reference images to ensure feature consistency under varied conditions. The Detection Page

    provides an intuitive interface equipped with a dynamic filter where users can select the name of the registered individual to be located. Upon selection, the system initiates real-time face matching using deep learning models integrated with the live video feed, displaying instant detection alerts on the screen. This section presents the observed outputs of both modules, illustrating their interactive design, functional workflow, and operational accuracy as captured during the deployed system testing.

    FIG 6.1 Registration Page

    Fig. 6.1 presents the Registration Page Interface of the FindMe system, which enables the enrollment of individuals into the centralized database for future identification. The interface is designed with a user-friendly layout that allows authorized personnel or users to input essential personal details, including name, age, gender, and contact number, through clearly labeled text fields. Once the demographic data is entered, the user is prompted to capture three reference images directly through an integrated camera module. These images ensure that the system can register the facial profile under varying angles and lighting conditions, thereby improving recognition accuracy during detection.

    When the registration form is submitted, the backend system processes and stores both the textual data and facial embeddings in a secure MongoDB database, linking each record with a unique identifier for efficient retrieval. For instance, when an individual named Rohan Sharma, aged 28, was registered, the system successfully saved all provided details along with three captured facial images, confirming the databases integrity and structured storage of personal and biometric data. This validates the systems capability to accurately collect, encode, and maintain comprehensive user profiles, forming the foundation for reliable face matching in real-time detection scenarios. The registration module thus serves as a critical first step in ensuring the completeness and

    authenticity of data within the FindMe framework.

    FIG 6.2 Detection page

    Fig. 6.2 presents the Detection Page Interface of the FindMe system, which facilitates real-time identification of registered individuals from live surveillance feeds. Once a name or registered ID is selected from the dropdown filter, the system dynamically retrieves the corresponding facial embeddings from the database and begins analyzing incoming video frames in real time. The interface displays a structured layout consisting of the live camera feed, a search filter, and a detection log panel, ensuring clarity and operational efficiency for monitoring personnel.

    When the live detection is initiated, the system continuously compares detected faces from the video stream against the stored embeddings using deep learningbased models such s VGGFace and FaceNet. Upon detecting a match, the interface instantly highlights the identified persons name, confidence score, and timestamp, accompanied by a bounding box overlay on the live feed. For example, when Rohan Sharma was selected from the list, the system successfully identified his face within the video stream and displayed a 96% confidence score, validating both the recognition accuracy and the responsiveness of the deployed model.

    This page effectively demonstrates the systems capability to process live video data, perform real-time face matching, and generate precise alerts with minimal latency. The detection results are automatically logged in the backend, recording essential metadata such as camera location, detection time, and system confidence level. Through this module, FindMe confirms its ability to transition from static facial registration to dynamic, intelligent surveillance, enabling rapid identification and actionable insights in missing-person tracking scenarios..

  5. CONCLUSION

The project FindMe: AI-Powered Surveillance for Missing Person Detection successfully demonstrates the integration of deep learningbased facial recognition with scalable, real-time video analytics to enhance the efficiency and accuracy of missing-person identification. By leveraging Pythons FastAPI for backend services, Streamlit for visualization, and MongoDB for centralized face embedding storage, the system establishes a seamless pipeline from live video ingestion to automated alert generation. The deployment framework, supported by AWS cloud infrastructure, Docker containers, and Kafka-based stream management, ensures scalability, low latency, and robust real-time processing. Continuous performance monitoring through Grafana dashboards maintains operational transparency, enabling law enforcement agencies to track system metrics and response rates effectively.

The proposed architecture effectively integrates pre-trained CNN models such as VGGFace, ResNet-50, and FaceNet to deliver high-precision facial recognition under diverse environmental and lighting conditions. Its modular design enables continuous retraining and database updates, ensuring adaptability to evolving facial datasets and demographic variations. By combining multi-model ensemble strategies with threshold tuning and confidence scoring, FindMe minimizes false positives and enhances reliability. Furthermore, the inclusion of privacy-preserving mechanisms, such as encrypted facial embeddings and role-based access control, ensures compliance with ethical standards in AI-based surveillance.

Looking forward, future enhancements may include integrating Graph Neural Networks (GNNs) for multi-camera person re- identification, enabling tracking across overlapping surveillance zones. Incorporating transformer-based vision architectures could further improve contextual understanding of video streams and crowd scenarios. Additional research may explore edge-AI deployment on IoT-enabled cameras for localized inference, reducing bandwidth consumption and improving system latency. Moreover, federated learning can be adopted to enable collaborative model improvement among multiple agencies without compromising data privacy. With continued development and optimization, the FindMe framework has the potential to evolve into a comprehensive, intelligent surveillance ecosystem one that transforms passive video networks into proactive tools for public safety, rapid response, and humanitarian assistance.

V. ACKNOWLEDGEMENT

We owe our deepest gratitude and profound respect to our esteemed guide and mentor, Prof. Rashmi Tuptewar, MIT ADT University, Pune, for her invaluable guidance, continuous support, and insightful feedback throughout the course of this project. Her constant encouragement, expert supervision, and constructive suggestions at every stage have been instrumental in transforming our initial concept into a well-structured and

impactful project. Her mentorship has not only enriched our technical understanding but also inspired us to approach every challenge with confidence and clarity.

We are sincerely thankful to our Honourable Head of Department for their unwavering support, motivation, and direction, which helped us carry out this work with purpose and dedication. We also extend our heartfelt appreciation to our Project Coordinator for their timely assistance, coordination, and for providing us with the valuable opportunity to explore the field of Artificial Intelligence and Computer Vision through this innovative initiative.

This project has been a remarkable journey of learning and collaboration. It provided us with an in-depth understanding of AI-based surveillance systems and the opportunity to translate theoretical concepts into practical applications addressing real- world social challenges. Working as a team helped us realize the true importance of teamwork, adaptability, and perseverance. The experience of designing and implementing FindMe has significantly enhanced our analytical, problem- solving, and research skills.

We also take this opportunity to express our sincere thanks to all faculty members of the Department of Computer Science and Engineering for their consistent encouragement, academic guidance, and for fostering a learning environment that promotes innovation and critical thinking. Their valuable insights greatly contributed to the refinement and successful completion of this project.

Finally, we would like to convey our heartfelt appreciation to all our friends, classmates, and peers who supported us directly or indirectly throughout this journey. Their motivation, cooperation, and constructive feedback played a crucial role in improving our work. This project, FindMe, stands as a reflection of the collective effort, mentorship, and knowledge shared by everyone who guided and inspired us. We feel truly privileged to have carried out and completed this work within such a supportive and intellectually stimulating academic environment.

VI. REFERENCES

  1. G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments, University of Massachusetts, Amherst, Technical Report, 2008.

  2. O. M. Parkhi, A. Vedaldi, and A. Zisserman, Deep Face Recognition,

    British Machine Vision Conference (BMVC), 2015.

  3. F. Schroff, D. Kalenichenko, and J. Philbin, FaceNet: A Unified Embedding for Face Recognition and Clustering, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

  4. K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, Joint Face Detection and Alignment Using Multi-Task Cascaded Convolutional Networks, IEEE Signal Processing Letters, vol. 23, no. 10, pp. 14991503, 2016.

  5. J. Redmon and A. Farhadi, YOLOv3: An Incremental Improvement,

    arXiv preprint arXiv:1804.02767, 2018.

  6. S. Zhao, Y. Wang, and Y. Jiang, Person Re-Identification Using Deep Metric Learning in Multi-Camera Surveillance, IEEE Transactions on Image Processing, 2019.

  7. N. Ahmed, R. Verma, and P. Reddy, EdgeCloud Collaboration for Real- Time Facial Recognition in Smart Cities, IEEE Internet of Things Journal, 2022.

  8. P. Singh and R. Sharma, Deep Learning Framework for Automated Missing Person Tracking Using YOLO and Siamese Networks, International Journal of Computer Vision and Intelligent Systems, 2021.

  9. Z. Li, C. Liu, and Y. Zhang, Real-Time Face Detection and Recognition in CCTV Systems Using Deep Learning, Expert Systems with Applications, 2020.

  10. H. Chen, T. Wang, and L. Zhou, Multimodal Deep Learning for Person Identification Using Face, Gait, and Clothing Features, Pattern Recogition Letters, 2021.

  11. S. Kaur and M. Verma, Privacy-Preserving AI-Based Surveillance Using Federated Learning, IEEE Access, 2023.

  12. Y. Zhang and B. Kim, Face Recognition under Occlusion and Low Light Conditions Using Generative Data Augmentation, Neural Computing and Applications, 2020.

  13. J. Gupta and D. Sharma, AI-Driven Real-Time Face Recognition Using OpenCV and TensorFlow, International Journal of Advanced Computer Science and Applications (IJACSA), 2021.

  14. M. Wang, W. Deng, and J. Hu, Deep Face Recognition: A Survey,

    Neurocomputing, vol. 429, pp. 215244, 2021.

  15. A. S. Reddy and H. K. Mishra, Intelligent Video Surveillance for Smart Cities Using Convolutional Neural Networks, Springer Advances in Computational Intelligence, 2022.

  16. L. Qi, Q. He, and X. Zhang, A Review on Person Re-Identification and Tracking in Large-Scale Video Networks, ACM Computing Surveys (CSUR), 2023.

  17. H. Kaur and A. Singh, Cloud-Native Deployment of AI Surveillance Systems Using Docker and Kubernetes, International Journal of Cloud Computing and Smart Systems, 2022.

  18. M. Patel and D. Banerjee, Comparative Study of FastAPI and Flask for AI Model Deployment, Journal of Emerging Technologies and Innovative Research (JETIR), 2022.

  19. A. Kumar and S. Raj, Real-Time Monitoring of AI-Based Vision Systems Using Prometheus and Grafana, Journal of Intelligent Data Systems, Springer, 2023.

  20. Amazon Web Services, Real-Time Video Analytics with AWS Machine Learning Services, AWS Whitepaper Series, 2023.