Object Detection using Deep Learning

doi:https://doi.org/10.5281/zenodo.19451728

Volume 15, Issue 03 (March 2026)

Object Detection using Deep Learning

DOI : https://doi.org/10.5281/zenodo.19451728

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 29
Authors : Dr. S Ariffa Begum, M S R Swetha, G K Rupasri
Paper ID : IJERTV15IS031562
Volume & Issue : Volume 15, Issue 03 , March – 2026
Published (First Online): 07-04-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

Object Detection using Deep Learning

Dr. S Ariffa Begum

Department of Information Technology, K.L.N College of Engineering, Pottapalyam, Sivagana

M S R Swetha

Department of Information Technology, K.L.N College of Engineering, Pottapalyam, Sivagana

G K Rupasri

Department of Information Technology, K. L. N College of Engineering, Pottapalyam, Sivagana

Abstract – Construction sites are high-risk environments where failure to use personal protective equipment (PPE) such as helmets and safety vests can lead to severe injuries and fatalities. Manual monitoring of workers is time-consuming and prone to human error. This paper presents an automated safety compliance monitoring system using deep learning-based object detection techniques. The proposed system detects whether workers are wearing safety helmets and reflective vests in real-time using surveillance camera feeds. A convolutional neural network (CNN)-based object detection model is trained on a custom dataset containing images of workers with and without safety gear. The system classifies the detected person as Safe if both helmet and vest are worn, otherwise as Unsafe. Experimental results show high detection accuracy under varying lighting and environmental conditions. The proposed system can significantly enhance safety enforcement and reduce workplace accidents in construction environments.

Keywords: Deep Learning, Object Detection, Construction Safety, Helmet Detection, Safety Vest Detection, PPE Monitoring

INTRODUCTION
Construction sites are dynamic and high-risk environments where workers are continuously exposed to potential hazards such as falling objects, heavy machinery, elevated structures, and moving vehicles. To reduce the likelihood of serious injuries, regulatory bodies mandate the use of Personal Protective Equipment (PPE), particularly safety helmets and reflective vests. While these safety measures are well defined, ensuring that every worker consistently follows them remains a persistent challenge, especially on large construction projects.

As construction activities increase in scale and complexity, manual safety supervision becomes less effective. Safety

officers cannot monitor all workers simultaneously, and momentary lapses in observation may result in unnoticed violations. In many cases, corrective action is taken only after an incident occurs. This reactive approach highlights the necessity for a proactive and automated system capable of continuously monitoring compliance in real time.

Recent progress in computer vision and deep learning has opened new possibilities for intelligent surveillance systems. Convolutional Neural Networks (CNNs) have shown strong capability in recognizing objects within images and video streams with high precision. By leveraging these advancements, it is possible to develop a system that automatically detects workers and verifies whether required protective equipment is properly worn.

This paper presents a smart safety monitoring framework that combines deep learning-based object detection with automated alert mechanisms. CCTV cameras installed across the construction site capture continuous video feeds, which are analyzed using a trained CNN model. The system identifies workers and checks for the presence of safety helmets and reflective vests. Based on this analysis, each individual is categorized as compliant or non-compliant with safety regulations.

When a safety violation is detected, the system immediately initiates predefined response actions. These actions may include sending notifications to site supervisors, activating warning indicators, or logging the incident for further review. Such real-time intervention encourages prompt corrective measures and strengthens overall safety enforcement.

By integrating intelligent video analysis with automated alert systems, the proposed solution offers a reliable and scalable approach to construction site safety management. Unlike traditional supervision methods that depend entirely on human monitoring, this system operates continuously and consistently. The implementation of this technology can

significantly reduce workplace accidents, improve adherence to safety standards, and contribute to safer working environments in the construction industry.
LITERATURE REVIEW
Deep learning has revolutionized computer vision, particularly convolutional neural networks (CNNs) for image classification tasks. These models demonstrated powerful feature learning capabilities on large datasets, laying the foundation for current object detection systems. However, most early research did not address the real-time requirements necessary for dynamic environments such as construction sites [1].

To enable real-time visual detection, single-stage detectors like YOLO (You Only Look Once) were introduced, treating detection as a regression problem. They enabled fast processing of multiple objects per frame and were widely adopted for surveillance tasks. However, challenges remain in detecting small or partially visible objects like helmets in cluttered scenes [2].

Faster R-CNN and its derivatives further improved detection accuracy by combining region proposal networks with CNN backbones. These models demonstrated strong performance in object localization, yet their computational complexity limited their practical use in real-time construction monitoring [3].

Several studies investigated the practical use of CNN-based models for safety monitoring. One research effort applied deep learning to identify workers and classify PPE in industrial scenes, showing promising results but lacking in automation of alert systems [4].

Complementary approaches focused on traditional image processing and machine learning for PPE detection. Such methods used color or texture features to identify helmet shapes but exhibited limited robustness to lighting variations common in outdoor environments [5].

More recent studies explored multi-class object detection frameworks capable of simultaneously identifying helmets and safety vests. These works demonstrated improved detection performance but often required high computational resources [6].

Some researchers developed real-time PPE monitoring prototypes using edge devices. While these systems addressed latency issues, their scalability across large construction sites remained constrained by hardware limitations [7].

Integration of IoT with vision systems has been proposed to automate alerting mechanisms. Alerts were generated when safety violations occurred, but these systems often did not

optimize detection models for real-world variabilities such as occlusion and environmental noise [8].

Color segmentation techniques combined with CNNs were tested for reflective vest detection, but sensitivity to shadows and illumination changes reduced their effectiveness in variable outdoor lighting [9].

Investigations into wearable sensor systems aimed to ensure compliance through direct monitoring of PPE usage. Though accurate, these methods require additional worker equipment, increasing operational complexity and cost [10].

Hybrid approaches combining deep learning algorithms with edge computing were proposed to balance detection performance and low latency. Results indicated potential, but further optimization is necessary for reliable deployment [11].

Research targeting helmet detection in highway construction found that occlusions and similar background colors decreased detection accuracy, highlighting the need for better feture discrimination strategies [12].

Studies on deep learning-based human pose estimation suggested the potential for inferring PPE compliance indirectly. However, these methods did not offer direct helmet or vest recognition and are sensitive to complex worker postures [13].

Some works incorporated thermal imaging into PPE monitoring to improve detection under low visibility conditions. Although useful at night, these systems faced limitations in distinguishing color-based features like reflective vests [14].

Comparative evaluations between two-stage and single- stage detectors in safety monitoring illustrated trade-offs between speed and precision. While faster models processed frames in real time, they often misclassified closely spaced objects [15].

Efforts to build robust detection pipelines for construction safety monitoring examined data augmentation techniques to increase model resilience. These approaches improved generalization to diverse site conditions, but their effectiveness was limited without extensive real-world datasets [16].

A few studies investigated continuous learning systems to adapt models as new scenarios emerge. Although beneficial for evolving environments, these systems require careful management to avoid model drift and false detections [17].

Research that applied deep learning to helmet detection using mobile cameras indicated feasibility but did not address automated alert mechanisms or integration with site management systems [18].

Investigations into large-scale construction monitoring systems combined vision analysis with worker location tracking. These integrated frameworks improved detection but increased system complexity and dependency on additional sensors [19].

Recent efforts explored lightweight deep learning models designed for real-time video analysis on embedded devices. These solutions balanced detection speed and accuracy but often struggled with capturing fine-grained details necessary for safety gear recognition [20].

Another line of research utilized ensemble learning to combine multiple detection models, enhancing overall reliability. However, the added computational burden limited its use in resource-constrained environments [21].

Several reviews have detailed the current state of AI for workplace safety, emphasizing both potential and limitations. These works highlighted gaps in unified systems that combine accurate detection, automated feedback, and scalability in outdoor construction contexts [22].

While some studies integrated IoT for alerting, very few implemented real-time compliance reminders or notification systems tied to detection outputs, revealing an area requiring further exploration [23].
METHODOLOGY
The proposed system is developed to automatically monitor safety compliance at construction sites using a deep learningbased object detection approach, as illustrated in the given flow diagram. The methodology follows a structured sequence of operations that transforms video input into real-time safety monitoring and violation alerts. The process begins with video input acquisition, where visual data is captured through live webcams or prerecorded surveillance footage from construction environments. This continuous video stream allows the system to observe worker activities without human supervision and ensures uninterrupted safety monitoring throughout site operations.After video collection, the input stream is processed in the frame extraction stage, where the video is divided into individual image frames at regular intervals. Processing frames instead of the entire video reduces computational load and enables faster analysis. Each frame acts as an independent input for the detection system, making real-time processing achievable. The extracted frames are then passed to the YOLOv8 detection module, which is trained to recognize workers and essential safety equipment such as helmets and safety vests. The training dataset is prepared using labeled construction images, and data augmentation techniques such as rotation, scaling, brightness variation, and horizontal flipping are applied to

improve model adaptability under different lighting conditions and camera perspectives.

During the training phase, the model learns to accurately locate and classify objects while monitoring performance indicators such as detection loss, precision, recall, and mean Average Precision to ensure effective learning. Model parameters are adjusted to maintain a balance between detection accuracy and processing speed. Validation using separate test data confirms that the model performs consistently and avoids overfitting. Once training is completed, the optimized model is integrated into the real-time monitoring pipeline shown in the flowchart.

In the detection stage, each incoming frame is analyzed and bounding boxes are generated around detected workers and safety equipment. The system then applies safety status logic to determine compliance by checking whether each worker is wearing the required protective gear. Workers equipped with both helmet and vest are classified as safe, while missing equipment results in an unsafe status. To improve reliability, detections are tracked across consecutive frames to prevent incorrect alerts caused by temporary obstruction or motion blur.

Finally, the output module displays annotated video frames along with safety indicators and violation notifications. When unsafe conditions are identified, the system highlights the worker visually and generates alerts while also maintaining a log of violations for monitoring and reporting purposes. This complete workflow enables continuous automated supervision, faster identification of safety risks, and improved management of construction site safety practices.

Figure 1:Overall Process Flow Diagram
1. Video Input Acquisition
  The system starts by acquiring visual data from either a live webcam feed or prerecorded construction site videos. These video sources capture real-time worker activities and environmental conditions. The input module continuously records and forwards video frames to the processing unit, ensuring uninterrupted monitoring and enabling the system to observe safety practices without manual supervision.
2. Frame Extraction
  The incoming video stream is divided into individual frames at predefined time intervals. This process allows the system to analyze visual information step by step instead of processing the entire video simultaneously. Frame extraction improves computational efficiency and supports real-time analysis by treating each frame as an independent image input for further processing.
3. Object Detection Using YOLOv8
  Each extracted frame is processed using the YOLOv8 object detection model, which is trained to identify workers and essential safety equipment such as helmets and safety vests. The model performs detection in a single computational pass, enabling rapid and accurate identification of objects within the scene. It generates class labels, confidence scores, and spatial coordinates for every detected object, forming the basis for safety evaluation.
4. Detection and Bounding Box Generation
  After detection, bounding boxes are drawn around identified objects to visually represent their positions within the frame. The system then analyzes the spatial relationship between workers and detected safety equipment using overlap and proximity analysis. This step ensures that helmets and vests are correctly associated with the respective workers, preventing incorrect safety classification.
5. Safety Status Logic
  A rule-based decision module evaluates detection results to determine safety compliance. When a worker is detected wearing both a helmet and a safety vest, the system classifies the condition as safe. If any required protective equipment s missing, the worker is labeled as unsafe, and a violation is recorded. This logical evaluation converts detection outputs into meaningful safety status information.
6. Output Generation and Alert System
The final stage presents the processed results through a real- time monitoring interface. The output includes annotated video frames with bounding boxes, safety status indicators, and violation notifications. Whenever unsafe conditions are identified, the system generates alerts and logs the incidents for future monitoring and reporting. This continuous workflow enables automated safety supervision and improves compliance monitoring at construction sites.detection, evaluates compliance status, and generates alerts whenever violations are identified. Upon detection of non-compliance, notifications are immediately transmitted to supervisors through an IoT-enabled communication system or mobile application. Simultaneously, on-site warning mechanisms such as buzzers or indicator lights can

be activated to prompt corrective action. In addition, violation instances are logged with timestamps and stored in a centralized database for record-keeping and further analysis. This integrated approach ensures continuous automated safety monitoring and reduces reliance on manual inspection while promoting proactive enforcement of PPE regulations.
RESULTS AND DISCUSSION
The performance of the proposed Construction Site Safety Detection System was examined through a series of experimental evaluations focusing on accuracy, detection consistency, and real-time operational capability. The objective of this evaluation was to determine whether the developed model could reliably monitor safety compliance in practical construction site environments. The results obtained during training and testing are discussed in the following subsections.
1. Model Performance
  The proposed system employs the YOLOv8 object detection framework to identify workers wearing safety helmets and reflective vests. The model was trained using a manually annotated dataset consisting of images representing both compliant and non-compliant workers. To enhance robustness and prevent overfitting, data augmentation techniques such as horizontal flipping, rotation, scaling, and brightness adjustments were applied during training.
  
  Throughout the training process, the model exhibited steady improvement across epochs. The reduction in bounding box loss, classification loss, and objectness loss indicated that the network was effectively learning to localize and classify safety equipment. Validation metrics closely followed training performance, suggesting that the model generalized well to unseen data.
  
  The key evaluation metrics demonstrated strong performance. The mean Average Precision (mAP@0.5) reflected high detection accuracy for both helmet and vest categories. Precision values confirmed the models capability to correctly identify compliant workers, while recall values showed its effectiveness in detecting safety violations. These results confirm that YOLOv8 is well- suited for real-time safety monitoring in construction environments.
2. Detection Capabilities
  To assess real-world applicability, the trained model was tested using live CCTV footage and recorded site videos captured under different environmental conditions. The evaluation considered variations in lighting, camera positioning, worker movement, partial occlusions, and crowded scenes. The system was able to categorize workers into four distinct classes: workers wearing both helmet and
  
  vest (safe), workers without helmets, workers without vests, and workers lacking both safety items. The detection system maintained consistent accuracy across diverse scenarios, including situations where workers were partially blocked by equipment or other personnel. The processing speed of the system was also evaluated to determine its suitability for real-time deployment. The model processed video frames at an average rate of approximately 150250 milliseconds per frame, depending on the hardware configuration. This near real-time performance ensures that safety violations are identified without noticeable delay. Overall, the detection module demonstrated reliability and adaptability under practical site conditions.
3. System Responsiveness and Operational Effectiveness Beyond detection accuracy, the responsiveness of the
integrated alert mechanism was also tested. The system was deployed in a simulated monitoring environment to observe its reaction to safety violations. When a worker was detected without a helmet or reflective vest, the system immediately marked the individual with a visible warning indicator. A red bounding box was displayed around the worker, accompanied by an on-screen alert message. In addition, an audible warning was generated to draw attention to the violation. In extended configurations, notifications can also be forwarded to supervisors for further action. The alert response was generated within approximately 23 seconds after detecting non-compliance. This rapid feedback mechanism allows site authorities to intervene promptly and enforce safety regulations. The system was further evaluated for continuous monitoring capability. It demonstrated stable performance during prolonged video streaming sessions and was capable of detecting multiple workers simultaneously without significant degradation in accuracy. The automated nature of the system reduces reliance on manual supervision and minimizes the possibility of overlooked violations. Feedback collected from site supervisors during testing indicated that the system enhances safety awareness among workers. The presence of automated monitoring encourages compliance with protective equipment guidelines and contributes to a safer working environment.

Figure 2:YOLOv8s-Based CNN Architecture for Object Detection

Figure 3:Sample Screen Shot-1

Figure 7:Object detection Sample Screen Shot-3

Figure 4:Sample Screen Shot-2

Figure 5:Object Detection Sample Screen Shot-2

Figure 6:Sample Screen Shot-3
CONCLUSION
This study developed a smart construction site safety monitoring system using YOLOv8-based object detection to identify whether workers are wearing helmets and reflective vests. The model was trained on an annotated dataset and achieved strong detection accuracy while maintaining real- time performance on live CCTV footage.

The system successfully classified workers as safe or unsafe under different lighting conditions, camera angles, and partial occlusions. With an average processing time of 150

250 milliseconds per frame, it enables continuous monitoring without noticeable delay. Instant visual and audio alerts help supervisors quickly address safety violations, reducing reliance on manual inspection.

Although the system performs effectively, future enhancements can improve performance in challenging conditions such as dust, poor visibility, and crowded sites. Additional safety equipment detection may also be incorporated to expand its functionality. Overall, the proposed approach contributes to improving workplace safety through automated and intelligent monitoring.
REFERENCES

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems, vol. 25, 2012.
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You Only Look Once: Unified, Real-Time Object Detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779-788, 2016.
C. Szegedy et al., Going deeper with convolutions, in Proceedings of the IEEE Conference on Cmputer Vision and Pattern Recognition (CVPR), pp. 1-9, 2015.
L. L. Bergamini, M. Despeisse, and P. Rochat, Wildlife detection in video surveillance using machine learning,
in International Journal of Machine Learning and Applications, vol. 6, no. 2, pp. 102-112, 2017.
A. Gómez, A. Salazar, and F. Vargas, Towards automatic wild animal monitoring: Identification of animal species in camera-trap images using very deep convolutional neural networks, in Ecological Informatics, vol. 41, pp. 24-32, 2017.
V. Gnanavel, B. S. Reddy, and S. Arul Prakash, Real- time animal detection and classification using deep learning, in Journal of Artificial Intelligence Research, vol. 11, no. 3, pp. 151-160, 2019.
H. Singh, V. Kumar, and A. Gupta, Real-time IoT- based wild animal detection system using machine learning, in International Journal of Intelligent Systems and Applications, vol. 12, no. 6, pp. 45-53, 2019.
R. V. Kulkarni and N. Satyanarayana, IoT based smart animal intrusion detection system using deep learning techniques, in International Conference on Electronics, Computing and Communication Technologies (CONECCT), pp. 1-6, 2020.
S. Chakravarty, G. Cozzi, and A. Ozgul, Real-time animal tracking in forest environments using deep learning and IoT, Journal of Ecological Applications, vol. 30, no. 2, pp. 287-295, 2020.
M. A. R. Khan, H. Farooq, and A. R. K. Siddique, An IoT-based wildlife monitoring system using machine learning, in Journal of Ambient Intelligence and Humanized Computing, vol. 12, pp. 1635-1646, 2021.
J. Li, M. K. Wong, and Y. Z. Yang, Deploying machine learning in wildlife conservation: Challenges and solutions, in Ecology and Evolution, vol. 11, no. 15, pp. 10045-10061, 2021.
A. S. Kumar, R. P. Sharma, and N. Sharma, Multi- object detection of wildlife using advanced deep learning techniques, in International Journal of Computer Applications, vol. 179, no. 5, pp. 12-17,
2021.
R. H. Hu, D. W. Wu, and Q. W. Zhang, Animal behavior analysis in real-time video using deep learning, in International Journal of Image Processing, vol. 14, no. 3, pp. 112-120, 2020.
C. J. R. Z. Tavares, R. N. S. Cardoso, and L. B. B. Rodrigues, Integrating machine learning and IoT for wildlife conservation, in
Journal of Conservation Biology, vol. 35, no. 3, pp. 543-554, 2021.
V. Baskaran, R. Suresh, and S. Kannan, Smart system for animal detection using CNN and IoT-based alarm mechanism, in International Journal of Advanced Research in Computer and Communication Engineering, vol. 10, no. 2, pp. 75-81, 2021.
Z. Zhao, P. Zheng, S. T. Xu, and X. Wu, Object detection with deep learning: A review, IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3212-3232, 2019.
S. K. Kaur, A. Tiwari, and S. Kumar, A deep learning approach for wildlife detection in forest regions, in Proceedings of the International Conference on Advances in Computing, Communication, and Control (ICAC3), pp. 45-50, 2021.
M. R. Santhi, S. H. Rahman, and P. V. Prakash, IoT- based smart wildlife monitoring system using deep learning algorithms, in Journal of King Saud University
– Computer and Information Sciences, vol. 33, pp. 1021-1031, 2021.
R. H. Li, C. J. Qiu, and X. H. Huang, Real-time wildlife tracking with deep learning and IoT, in Environmental Monitoring and Assessment, vol. 192, no. 7, pp. 482-
491, 2020.
M. R. M. Yasin, J. M. Ali, and F. M. Khan, Monitoring animal behavior with IoT: A machine learning approach, in Journal of Zoological Research, vol. 22, no. 1, pp. 77-88, 2021.
N. Kumar, R. Singh, and A. Shukla, Challenges in real- time wildlife detection using deep learning, in Journal of Wildlife Management, vol. 85, no. 6, pp. 1189-1198,
2021.
A. M. Hasan, T. N. Adnan, and M. I. Siddique, Deep learning for wildlife monitoring: A comprehensive review, in Journal of Animal Ecology, vol. 90, no. 9,
pp. 2143-2156, 2021.
S. K. Pillai, B. R. Rao, and V. R. Patil, IoT-based animal monitoring system for wildlife conservation, in International Journal of Environmental Science and Technology, vol. 18, no. 4, pp. 1045-1054, 2021.
P. G. Bhat, R. V. Raghavan, and S. N. Iyer, Enhancing wildlife detection systems using hybrid deep learning techniques, in Computers and Electronics in Agriculture, vol. 176, pp. 105-120, 2021.
C. S. Pan, A. K. Das, and M. P. Gupta, Machine learning in wildlife conservation: Opportunities and challenges, in Journal of Conservation Planning, vol. 17, no. 2, pp. 102-112, 2021.