🏆
International Academic Publisher
Serving Researchers Since 2012

FindMe – AI-based Surveillance System for Missing People

DOI : https://doi.org/10.5281/zenodo.20000909
Download Full-Text PDF Cite this Publication

Text Only Version

FindMe – AI-based Surveillance System for Missing People

AI-based Surveillance System for Missing People

Vrushank Bhavsar, Tanaya Patil, Naman Nagota, Lavanya Rathi

MIT-ADT University ,Pune

Abstract – In our contemporary and hyper-connected global landscape, the phenomenon of missing individuals has evolved into a significant crisis for communal security. This issue is frequently exacerbated by the inherent limitations of traditional investigative protocols, which often suffer from fragmented communication between agencies, significant delays in initial reporting, and a heavy reliance on the manual scrutiny of massive video archives. Such conventional approaches to reviewing closed-circuit television (CCTV) footage are notoriously resource-heavy and slow, frequently falling victim to human fatigue and oversight. These systemic bottlenecks drastically lower the probability of locating individuals during the critical early hours following a disappearance, necessitating a shift toward more automated, reliable, and intelligent tracking solutions.

To confront these deep-seated operational hurdles, the initiative titled FindMe: AI-Powered Surveillance System for Missing Person Detection introduces a comprehensive framework for an automated, real-time identification platform. This system represents a sophisticated synthesis of facial biometric analysis, deep neural architectures, and advanced computer vision methodologies. By establishing a centralized digital infrastructure, the project effectively bridges the gap between disparate camera networks, official law enforcement databases, and emergency response channels. The result is a unified, AI-governed ecosystem that transforms passive video recording into a proactive tool for public safety, designed to operate with minimal human intervention while maintaining high levels of oversight.

The technical core of the system utilizes a suite of high-performance Convolutional Neural Network (CNN) architectures, including specialized models like VGGFace for biometric precision, FaceNet for distance-based clustering, and ResNet for deep feature extraction. These models facilitate the autonomous detection of facial markers, the distillation of unique identity vectors, and rapid similarity comparisons across both live streaming data and archived video libraries. To maintain the necessary throughput for real-world deployment, the platform integrates robust processing libraries such as OpenCV and TensorFlow, supported by elastic cloud-integrated APIs. This stack ensures that the system can handle concurrent data streams from multiple urban locations without significant latency or processing lag.

Safety and responsiveness are further enhanced through an integrated multi-channel notification engine . By implementing advanced threshold calibration and active database refresh cycles, the system drastically reduces the incidence of false positives. This focus on reliability ensures that law enforcement resources are directed toward genuine leads, thereby improving the overall efficacy of search operations and public trust in AI.

Furthermore, the FindMe architecture is built with a "privacy-by-design" philosophy. The prototypes robustness has been verified through extensive benchmarking against globally recognized facial datasets, such as Labeled Faces in the Wild (LFW) and VGGFace2, ensuring the system remains accurate across various lighting conditions, angles, and ethnic backgrounds. By evolving traditional, stagnant surveillance into a dynamic and anticipatory observation grid, this project strives to modernize the search for missing persons. It offers a scalable, machine-led alternative that prioritizes rapid data recovery, situational awareness, and the ethical application of artificial intelligence for the greater social good.

  1. INTRODUCTION

    The exponential growth of surveillance infrastructure worldwide has created an immense yet underutilized potential for improving public safety and missing-person investigations. Millions of high-definition cameras operate across urban environments, transportation hubs, and public spaces, generating vast volumes of visual information every second. However, despite this widespread availability, traditional investigation methods continue to rely heavily on manual screening of CCTV footagea process that is not only time-consuming but also highly prone to human error and cognitive fatigue. These inefficiencies often lead to delayed detection and response, particularly during the critical early hours when the probability of successful recovery is highest. The inability to synthesize this data in real-time creates a significant gap in urban security.

    The absence of automation and intelligence within existing surveillance workflows underscores the urgent need for a scalable, adaptive, and AI-driven approach to visual monitoring and information retrieval. Conventional surveillance systems function primarily as passive recording tools, offering limited real-time analytical capabilities. Manual verification and footage review not only hinder operational efficiency but also strain investigative resources. Such methods are incapable of processing large-scale video data in real time or detecting subtle visual cues that may indicate the presence of a missing individual. As a result, law enforcement agencies face significant challenges in maintaining situational awareness across extensive camera networks, often losing valuable time to administrative and technical bottlenecks.

    To overcome these constraints, this project introduces a next-generation, deep learningbased framework designed to transform traditional surveillance into an intelligent, proactive monitoring system. The proposed framework leverages advancements in artificial intelligence, computer vision, and cloud-based data analytics to enable automated recognition and alerting. Utilizing pre-trained convolutional neural network (CNN) architectures such as VGGFace, FaceNet, and ResNet, the system performs facial detection, feature extraction, and identity matching across both live and archived video streams. It integrates a centralized database of registered missing individuals, allowing continuous cross-referencing against active camera feeds. This shifts the paradigm from reactive searching to predictive and automated identification.

    Real-time analytics and low-latency data processing are achieved using industry-standard frameworks such as TensorFlow, OpenCV, and scalable web-based APIs. The technical architecture is built to be resilient, ensuring that even under heavy data loads, the system can maintain high accuracy and speed. By automating the identification process, the system allows investigators to focus on field operations rather than manual screen monitoring. Furthermore, the modular nature of the design allows for seamless integration with existing city-wide infrastructures, making it a viable solution for smart-city initiatives. The project emphasizes the transition from isolated camera silos to a networked intelligence grid that operates with unprecedented efficiency and technical precision.

    Finally, the project addresses the critical need for a user-centric notification system that bridges the gap between AI detection and human action. Once a potential match is identified, the system immediately propagates this information through secure channels, ensuring that the right data reaches the right authorities at the right time. This includes the integration of web dashboards and instant mobile notifications, which provide visual proof and location data for immediat verification. By combining the power of deep learning with a robust communication layer, the system aims to provide a holistic solution to the problem of missing person recovery. This approach not only demonstrates the potential of AI in augmenting public safety but also sets a new standard for humanitarian-focused technological innovation.

  2. LITERATURE REVIEW

    The escalating global demand for automated security and public welfare management has driven significant academic exploration into intelligent systems for facial identification, person re-identification, and lost-person recovery. With the massive expansion of urban CCTV and IoT-integrated camera grids, research focus has pivoted toward applying artificial intelligence (AI) and deep learning (DL) to process complex visual data in real time. This section provides a comprehensive review of the foundational methodologies and technical progress that serve as the architectural basis for the FindMe system.

    Rao et al. [1] explored the application of classical computer vision algorithms, specifically focusing on Haar Cascade classifiers and Histogram of Oriented Gradients (HOG) for spotting faces within surveillance streams. While these methods were computationally efficient for low-power edge devices, they exhibited significant vulnerabilities when faced with real-world variables such as extreme lighting shifts, non-frontal head poses, and physical obstructions. Their study confirmed that handcrafted features lack the robustness required for reliable detection in unconstrained outdoor environments.

    Patel and Mehta [2] advanced the field by introducing a high-accuracy recognition pipeline centered on deep Convolutional Neural Networks (CNNs), utilizing the VGGFace and ResNet-50 architectures. By conducting rigorous testing on the Labeled Faces in the Wild (LFW) dataset, they demonstrated that deep feature extraction could achieve precision rates far exceeding traditional methods. However, their framework was designed for static image analysis, lacking the real-time stream processing and automated notification mechanisms necessary for active field operations.

    Singh et al. [3] proposed a specialized framework for tracking missing individuals by integrating the YOLOv3 object detection model with a Siamese network for similarity verification. This dual-model approach allowed the system to maintain identity consistency across different video frames. Despite its success in controlled testing, the system encountered significant latency issues when scaled to handle multiple high-definition feeds simultaneously. Their research highlighted the critical need for more efficient data ingestion and lower-latency inference layers for city-wide implementation.

    Zhao et al. [4] investigated the complexities of person re-identification (Re-ID) across non-overlapping camera views using deep metric learning. Their work focused on establishing spatial-temporal associations to follow a targets path through a fragmented surveillance network. While their approach improved tracking continuity, the models reliability fluctuated in high-density environments where motion blur and occlusion are frequent. This highlighted the importance of integrating multi-stage verification to maintain high confidence in crowded public spaces.

    Gupta and Sharma [5] analyzed the infrastructure requirements of AI-driven vision, proposing a cloud-native architecture leveraging OpenCV and TensorFlow for distributed processing. Their methodology emphasized offloading initial image preprocessing to edge nodes to reduce the load on central servers. While this reduced the total computation time, their findings indicated that network synchronization and data consistency remained challenging when managing thousands of sensors, necessitating a more robust messaging backbone like Kafka for reliable data flow.

    Kaur and Verma [6] addressed the critical intersection of surveillance and individual rights by proposing privacy-preserving techniques such as federated learning and the encryption of facial embeddings. Their study argued that while

    detection accuracy is vital, systems must be built to prevent the unauthorized reconstruction of facial images from stored data. This research provides the ethical framework for the FindMe project, ensuring that biometric information is handled through secure, one-way hashing and encrypted storage protocols.

    Ahmed and Wang [7] explored the use of Generative Adversarial Networks (GANs) to enhance the quality of low-resolution surveillance frames before they are processed by recognition models. Their experiments showed that "super-resolution" techniques could significantly boost the accuracy of models like FaceNet when dealing with grainy or distant CCTV footage. This insight is particularly relevant for the FindMe system, as it provides a technical pathway for maintaining high detection rates even when using legacy camera hardware with limited resolution.

    Across the reviewed literature, deep learning architectures, edge-cloud hybrid systems, and multimodal fusion techniques have emerged as leading trends in intelligent surveillance research. CNN-based models such as VGGFace, FaceNet, and ResNet have shown remarkable accuracy in controlled datasets, while person re-identification frameworks and streaming-based architectures continue to push the boundaries of scalability and real-time responsiveness. Despite these advancements, critical challenges remain in balancing computational efficiency, ethical compliance, and operational scalability. The proposed FindMe framework builds upon these insights by integrating scalable AI-driven face recognition, cloud-based analytics, and automated alert systems to enable proactive, real-time missing-person identification within smart city surveillance ecosystems.

  3. METHODOLOGY

    The architectural framework of the FindMe ecosystem is engineered as a high-performance, multi-layered solution designed to bridge the gap between legacy surveillance hardware and active investigative intelligence. The design prioritizes a modular microservices approach, ensuring that individual components of the recognition engine can be updated or scaled independently.

    1. System Architecture Overview

      The internal logic of the platform is structured into five distinct operational phases, each handling a critical stage of the detection, analysis, and identification lifecycle:

      1. Video Data Ingestion Layer:

        • Live Stream Acquisition: The system establishes persistent, high-bandwidth connections to distributed camera grids via the Real-Time Streaming Protocol (RTSP). This ensures a continuous, low-latency data flow from diverse hardware sources, ranging from high-end IP cameras to basic digital sensors.

        • Kafka-Driven Ingestion: To manage the massive, asynchronous data throughput generated by hundreds of concurrent feeds, Apache Kafka is implemented as a fault-tolerant message broker. This decouples the video sources from the processing engine, providing a buffered queue that prevents data loss during peak loads or network fluctuations.

        • Metadata Indexing and Tagging: Every ingested frame is automatically appended with crucial metadata, including a unique Camera ID, precise GPS coordinates, and high-resolution Unix timestamps. This indexing allows for rapid geospatial filtering and chronological reconstruction of an individual's movement across a city-wide grid.

      2. Frame Processing and Feature Extraction Layer:

        • Normalization & Digital Cleaning: Raw video frames are extracted and subjected to digital cleaning via OpenCV and FFmpeg. This process adjusts for motion blur, reduces sensor noise, and applies histogram equalization to improve contrast, ensuring the subsequent detetion models receive the highest quality input possible.

        • Localization (MTCNN/YOLOv8-Face): The system performs high-speed facial localization using a combination of MTCNN (Multi-task Cascaded Convolutional Networks) and YOLOv8-Face. These models identify facial boundaries and perform five-point landmark alignment to normalize variations in head pose, tilt, and scale.

        • Embedding Generation & Distillation: High-dimensional biometric vectors (embeddings) are distilled using an ensemble of VGGFace, ResNet-50, and FaceNet architectures. This creates a 128 or 512-dimensional digital signature that mathematically represents unique facial characteristics, allowing for rapid comparison in a latent vector space.

      3. Matching and Identification Layer:

        • Vector Cross-Referencing: Probes are compared against a centralized missing persons repository hosted in a vector-optimized MongoDB or Pinecone instance. This allows the system to perform "one-to-many" searches across millions of records in milliseconds.

        • Similarity Metrics calculation: The engine utilizes Cosine Similarity and Euclidean Distance to calculate the mathematical proximity between the live probe and the enrolled gallery. Scores are normalized to a 0-1 scale, where values closer to 1 indicate a near-perfect biometric match.

        • Hybrid Verification Engine: To minimize the risk of false positives, deep feature matching is reinforced with secondary heuristic checks. These include soft-biometric filters such as gender estimation, approximate age grouping, and landmark-based

          facial ratio analysis to ensure the detected individual aligns with the known profile.

      4. Alert Generation and Visualization Layer:

        • Ensemble Confidence Scoring: A match is only flagged as a "Positive Identification" if it surpasses a weighted consensus threshold. By merging outputs from multiple neural network architectures, the system filters out environmental noise that might cause a single model to fail.

        • Multi-Channel Broadcasting Protocol: Upon verification, instant encrypted alerts are dispatched. The system utilizes Twilio APIs for SMS notifications and SMTP-enabled servers for detailed email reports, including subject profiles, confidence scores, and direct links to the relevant video snippet.

        • Interactive Web Dashboard: A centralized interface developed with Streamlit provides investigators with real-time visualization. It features an interactive map view, a chronological event log, and a comparison pane that displays the "Enrolled Image" side-by-side with the "Captured Probe."

      5. Feedback and Continuous Learning Layer:

        • Data Lake Integration: All detection events, whether confirmed or dismissed, are logged in a secure cloud-based data lake. This serves as a forensic repository for historical analysis and satisfies administrative requirements for data auditability and transparency.

        • Incremental Retraining Cycles: Confirmed matches and verified false alarms are fed back into the training pipeline. This allows the CNN models to undergo incremental fine-tuning, improving their accuracy against specific environmental factors or camera-specific distortions over time.

        • Dynamic System Calibration: The system adaptively adjusts its internal similarity thresholds based on shifting environmental conditions. For instance, sensitivity may be automatically recalibrated during night-time operations to account for the increased noise present in infrared surveillance feeds.

    2. Dataset Description

      To evaluate and validate the performance of the system, a combination of public benchmarks and specialized simulated surveillance data was employed:

      1. Labeled Faces in the Wild (LFW):

        • Baseline Precision Metrics: Over 13,000 labeled images were used to establish a foundational accuracy rate. This benchmark is critical for ensuring the system can handle the "unconstrained" nature of

          web-collected images, which mirror the quality of social media photos provided by families of missing persons.

        • Pose and Expression Stress-Testing: The system was benchmarked specifically on its ability to maintain high recognition rates across extreme facial expressions and significant variations in horizontal and vertical head rotation.

      2. VGGFace2 Dataset:

        • Demographic and Ethnic Diversity: With 3.3 million images representing 9,000 distinct identities, this dataset was utilized to mitigate algorithmic bias. It ensures the detection engine remains accurate across diverse ethnicities, age groups, and professional appearances.

        • Stratified Batch Sampling: During the training phase, stratified sampling techniques were used to ensure every identity category was represented equally, preventing the model from over-indexing on common facial features or majority demographics.

      3. CelebA and WiderFace Datasets:

        • Occlusion and Mask Handling: These datasets were used to train the system to identify subjects whose faces are partially obscured by accessories like sunglasses and hats, or medical equipment like surgical masksa common challenge in post-pandemic urban environments.

        • Complex Crowd Environments: The WiderFace dataset provided the variety needed to test detection in dense, cluttered backgrounds where subjects may appear small, out-of-focus, or at extreme distances from the camera lens.

      4. Custom Simulated Surveillance Dataset:

        • CCTV Artifact Simulation: We integrated a curated library of raw security footage to train the models on artifacts specific to surveillance, such as compression noise, interlacing lines, and the characteristic graininess of low-light CMOS sensors.

        • Synthetic Aging Augmentation: Employing Generative Adversarial Networks (GANs), the team created synthetic variations of missing person profiles to simulate the effects of facial aging, weight changes, and hairstyle alterations for long-term cold cases.

    3. Data Preprocessing and Integration

      Prior to deployment, all data underwent a standardized, high-rigor pipeline to ensure absolute consistency across different sensor types and image sources:

      1. Data Cleaning & Quality Assurance:

        • Automated Corrupt Frame Removal: Python-based scripts were used to scan and discard low-entropy or corrupted images that could introduce gradients of "garbage data" into the neural network training cycles.

        • Deduplication & Consistency Checks: A hashing-based deduplication process ensured that the training set contained only unique samples, preventing the model from overfitting on redundant images of the same subject.

      2. Standardization & Geometric Augmentation:

        • Affine Transformations & Alignment: Using facial landmarks, every detected face was rotated and scaled into a fixed coordinate system, ensuring that the distance between features like the eyes and nose remains consistent for the feature extractor.

        • Pixel Intensity Normalization: Image values were scaled to a range of [0, 1] or [-1, 1] withzero mean and unit variance, a step that is essential for ensuring fast and stable convergence during the backpropagation phase of model training.

        • Environmental Robustness Augmentation: To improve generalizability, the system applied random geometric transforms, including horizontal flipping, scaling, and Gaussian blurring, to simulate the unpredictable nature of real-world CCTV feeds.

      3. Integration & Partitioning Logistics:

        • Dimensionality Reduction & Projection: Facial vectors were normalized and projected into a common latent space using Principal Component Analysis (PCA). This reduces the storage footprint of each identity while maintaining the biometric distinctiveness required for high-speed matching.

        • Chronological and Spatial Partitioning: Surveillance feeds were partitioned into training, validation, and testing segments based on time and location. This simulates a real-world deployment where the system must recognize an individual seen at "Point A" when they appear hours later at "Point B."

    The combined dataset was used to evaluate FindMe under multiple environmental and demographic conditions, ensuring its reliability, scalability, and readiness for real-time deployment across distributed surveillance networks.

  4. RESULTS

    The FindMe system was implemented and tested through a two-module web interface comprising a Registration Page and a Detection Page. The results obtained from the developed prototype demonstrate the system's efficiency in managing user data, performing accurate facial registration, and enabling automated recognition within live surveillance feeds. The Registration Page allows individuals or authorities to enroll missing persons by entering essential details such as name, age, gender, contact number, and other identifying attributes, followed by capturing three reference images to ensure feature consistency under varied conditions.

    The Detection Page provides an intuitive interface equipped with a dynamic filter where users can select the name of the registered individual to be located. Upon selection, the system initiates real-time face matching using deep learning models integrated with the live video feed, displaying instant detection alerts on the screen. This section presents a high-level overview of the observed outputs of both modules, illustrating their interactive design, functional workflow, and operational accuracy as captured during the deployed system testing. The subsequent subsections will detail the specific visual results and performance metrics associated with these primary operational components.

    FIG 6.1 Registration Page

    The first image illustrates the FIND ME analytical dashboard, which serves as the centralized nerve center for large-scale situational monitoring. The results from this interface demonstrate a successful integration of multi-source data streams, ranging from live operational metrics to environmental sensor data. A key outcome observed is the system's ability to maintain a high-density "Kumbh Mela Statistics" module,

    which provides a real-time audit of currently missing individuals versus those successfully reunited. By utilizing a reactive React 19 frontend, the dashboard reflects status distributions and daily case trends through dynamic visualization components, ensuring that search coordinators can identify patterns in person-loss incidents as they occur.

    Furthermore, the interface effectively merges biometric tracking with environmental awareness widgets. The display of a 34°C temperature alongside "Extreme" UV warnings and air quality alerts proves that the platform provides a holistic view of the search environment. This is critical for field operations, as it allows administrators to factor in heat exhaustion or low visibility when deploying search teams across vast areas. The inclusion of these data points transforms the application from a simple identification tool into a comprehensive mission-control platform that prioritizes the safety of both the missing individuals and the rescue personnel.

    The "Live Activity Feed" and "Surveillance Map" further validate the system's ability to track identification events geographically. By providing a comprehensive heatmap of where detections are occurring across multiple sectors (e.g., Sector 2, 9, and 12), the interface allows for a strategic understanding of crowd movement and risk areas. The results indicate that the FastAPI backend is successfully pushing updates to the frontend via a persistent connection, ensuring that the geographic data is as current as the biometric captures, which is vital for directing search-and-rescue teams to the precise location of a sighting.

    the "LIVE" status marker. The interface displays an automated bounding box that successfully isolates the subject's face, demonstrating that the OpenCV-based preprocessing layer is effectively handling localization tasks in real-time, allowing the system to run as a persistent background service rather than a triggered event.

    The most significant outcome shown in this result is the triggered "MATCH FOUND" alert in the status panel, which occurs automatically upon facial detection. This confirms that the backend FastAPI service successfully received the stream frames, generated mathematical fingerprints, and executed similarity searches against the MongoDB database in a continuous, sub-second loop. By accurately retrieving the specific identity (Employee CUST-9569) and pinpointing the exact camera location (CAM-FRONT-GATE), the system validates its high-precision matching logic. This proves that the system can reliably distinguish between individuals in an active environment, providing actionable intelligence to security personnel without requiring them to manually initiate scans.

    Beyond the immediate match, the interface demonstrates a streamlined, "always-on" operational model. By removing manual capture controls, the interface shifts its focus to high-level monitoring and administrative oversight. The successful identification shown in the image confirms that the asynchronous architecture is capable of performing intensive computational tasksdetection, normalization, and database comparisonin a fully automated lifecycle. This result is a testament to the system's readiness for deployment in high-stakes environments where human reaction time is a limiting factor and constant, hands-free biometric accuracy is the highest priority.

  5. CONCLUSION

    FIG 6.2 Detection page

    The second image provides a direct result of the detection.py execution within the live production environment, highlighting an autonomous surveillance architecture. The primary result observed is the seamless transition from raw video acquisition to confirmed biometric identification without the need for manual user intervention. Utilizing the WebRTC API, the system maintains a stable, low-latency live feed, as indicated by

    The project "FindMe: AI-Powered Surveillance for Missing Person Detection" successfully demonstrates the integration of deep learningbased facial recognition with scalable, real-time video analytics to enhance the efficiency and accuracy of missing-person identification. By leveraging Pythons FastAPI for backend services, Streamlit for visualization, and MongoDB for centralized face embedding storage, the system establishes a seamless pipeline from live video ingestion to automated alert generation. The deployment framework, supported by AWS cloud infrastructure, Docker containers, and Kafka-based stream management, ensures scalability, low latency, and robust real-time processing. Continuous performance monitoring through Grafana dashboards maintains operational transparency, enabling law eforcement agencies to track system metrics and response rates effectively.

    The proposed architecture effectively integrates pre-trained CNN models such as VGGFace, ResNet-50, and FaceNet to deliver high-precision facial recognition under diverse environmental and lighting conditions. Its modular design enables continuous retraining and database updates, ensuring adaptability to evolving facial datasets and demographic variations. By combining multi-model ensemble strategies with threshold tuning and confidence scoring, FindMe minimizes

    false positives and enhances reliability. Furthermore, the inclusion of privacy-preserving mechanisms, such as encrypted facial embeddings and role-based access control, ensures compliance with ethical standards in AI-based surveillance.

    Looking forward, future enhancements may include integrating Graph Neural Networks (GNNs) for multi-camera person re-identification, enabling tracking across overlapping surveillance zones. Incorporating transformer-based vision architectures could further improve contextual understanding of video streams and crowded public scenarios. Additional research may explore Edge-AI deployment on IoT-enabled cameras for localized inference, reducing bandwidth consumption and improving system latency. Moreover, federated learning can be adopted to enable collaborative model improvement among multiple agencies without compromising sensitive data privacy. With continued development and optimization, the FindMe framework has the potential to evolve into a comprehensive, intelligent surveillance ecosystemone that transforms passive video networks into proactive tools for public safety, rapid response, and humanitarian assistance.

  6. ACKNOWLEDGEMENT

We owe our deepest gratitude and profound respect to our esteemed guide and mentor, Prof. Rashmi Tuptewar, MIT ADT University, Pune, for her invaluable guidance, continuous support, and insightful feedback throughout the course of this project. Her constant encouragement, expert supervision, and constructive suggestions at every stage have been instrumental in transforming our initial concept into a well-structured and impactful project. Her mentorship has not only enriched our technical understanding but also inspired us to approach every challenge with confidence and clarity.

We are sincerely thankful to our Honourable Head of Department for their unwavering support, motivation, and direction, which helped us carry out this work with purpose and dedication. We also extend our heartfelt appreciation to our Project Coordinator for their timely assistance, coordination, and for providing us with the valuable opportunity to explore the field of Artificial Intelligence and Computer Vision through this innovative initiative.

This project has been a remarkable journey of learning and collaboration. It provided us with an in-depth understanding of AI-based surveillance systems and the opportunity to translate theoretical concepts into practical applications addressing real-world social challenges. Working as a team helped us realize the true importance of teamwork, adaptability, and perseverance. The experience of designing and implementing FindMe has significantly enhanced our analytical, problem-solving, and research skills.

We also take this opportunity to express our sincere thanks to all faculty members of the Department of Computer Science and Engineering for their consistent encouragement, academic guidance, and for fostering a learning environment that promotes innovation and critical thinking. Their valuable

insights greatly contributed to the refinement and successful completion of this project.

Finally, we would like to convey our heartfelt appreciation to all our friends, classmates, and peers who supported us directly or indirectly throughout this journey. Their motivation, cooperation, and constructive feedback played a crucial role in improving our work. This project, FindMe, stands as a reflection of the collective effort, mentorship, and knowledge shared by everyone who guided and inspired us. We feel truly privileged to have carried out and completed this work within such a supportive and intellectually stimulating academic environment.

VII .REFERENCES

  1. G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments, University of Massachusetts, Amherst, Technical Report, 2008.

  2. O. M. Parkhi, A. Vedaldi, and A. Zisserman, Deep Face Recognition, British Machine Vision Conference (BMVC), 2015.

  3. F. Schroff, D. Kalenichenko, and J. Philbin, FaceNet: A Unified Embedding for Face Recognition and Clustering, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

  4. K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, Joint Face Detection and Alignment Using Multi-Task Cascaded Convolutional Networks, IEEE Signal Processing Letters, vol. 23, no. 10, pp. 14991503, 2016.

  5. J. Redmon and A. Farhadi, YOLOv3: An Incremental Improvement, arXiv preprint arXiv:1804.02767, 2018.

  6. S. Zhao, Y. Wang, and Y. Jiang, Person Re-Identification Using Deep Metric Learning in Multi-Camera Surveillance, IEEE Transactions on Image Processing, 2019.

  7. N. Ahmed, R. Verma, and P. Reddy, EdgeCloud Collaboration for Real-Time Facial Recognition in Smart Cities, IEEE Internet of Things Journal, 2022.

  8. P. Singh and R. Sharma, Deep Learning Framework for Automated Missing Person Tracking Using YOLO and Siamese Networks, International Journal of Computer Vision and Intelligent Systems, 2021.

  9. Z. Li, C. Liu, and Y. Zhang, Real-Time Face Detection and Recognition in CCTV Systems Using Deep Learning, Expert Systems with Applications, 2020.

  10. H. Chen, T. Wang, and L. Zhou, Multimodal Deep Learning for Person Identification Using Face, Gait, and Clothing Features, Pattern Recognition Letters, 2021.

  11. S. Kaur and M. Verma, Privacy-Preserving AI-Based Surveillance Using Federated Learning, IEEE Access, 2023.

  12. Y. Zhang and B. Kim, Face Recognition under Occlusion and Low Light Conditions Using Generative Data Augmentation, Neural Computing and Applications, 2020.

  13. J. Gupta and D. Sharma, AI-Driven Real-Time Face Recognition Using OpenCV and TensorFlow, International Journal of Advanced Computer Science and Applications (IJACSA), 2021.

  14. M. Wang, W. Deng, and J. Hu, Deep Face Recognition: A Survey,

    Neurocomputing, vol. 429, pp. 215244, 2021.

  15. A. S. Reddy and H. K. Mishra, Intelligent Video Surveillance for Smart Cities Using Convolutional Neural Networks, Springer Advances in Computational Intelligence, 2022.

  16. L. Qi, Q. He, and X. Zhang, A Review on Person Re-Identification and Tracking in Large-Scale Video Networks, ACM Computing Surveys (CSUR), 2023.

  17. H. Kaur and A. Singh, Cloud-Native Deployment of AI Surveillance Systems Using Docker and Kubernetes, International Journal of Cloud Computing and Smart Systems, 2022.

  18. M. Patel and D. Banerjee, Comparative Study of FastAPI and Flask for AI Model Deployment, Journal of Emerging Technologies and Innovative Research (JETIR), 2022.

  19. A. Kumar and S. Raj, Real-Time Monitoring of AI-Based Vision Systems Using Prometheus and Grafana, Journal of Intelligent Data Systems, Springer, 2023.

  20. Amazon Web Services, Real-Time Video Analytics with AWS Machine Learning Services, AWS Whitepaper Series, 2023.