Detection of Human Targets from Thermal Images


Mr. More Rahul Tanaji, Computer Systems Dept., MCEME, Secunderabad, 500015.

Abstract- Existing human detection systems are used in applications such as autonomous vehicles and search and rescue operations. Because they rely on RGB cameras, these surveillance systems are of limited use at night. Many applications, such as boundary surveillance, security, monitoring systems, and anomaly or interloper detection, require a system capable of night surveillance [1]. This paper presents a human detection method for infrared images. The major contribution is the combination of pixel-gradient and body-part processing, proposed to reduce false detections. The presented algorithm has been tested on real thermal images captured with an HHTI (hand-held thermal imager) in a real environment. Some limitations have also been identified, such as difficulty detecting groups of overlapping people, occluded targets, and small targets.

Keywords – Interloper detection, geometric matching, edge detection, foreground segmentation, parallax, clustering and filtering, temporal average, image fusion, pixel-level semantic features.

Detection and tracking in thermal infrared images has long been of interest, mainly for military purposes. Increasing image quality and resolution, combined with decreasing price and size, have opened up new application areas. Thermal cameras are advantageous in many applications because they can see in total darkness and are robust to changes in illumination. Many applications of thermal cameras serve civil society, e.g. prevention and localization of energy losses, environmentally friendly transportation, and reduction of false alarms among automatically detected potential heat leakages in power distribution transformers.

Thermal images differ from visible-light camera images in that they capture the distribution of temperatures in the scene. On the one hand this is an advantage, because temperature is an additional quantity that can be used to detect humans; on the other hand, thermal images may contain objects whose temperature is similar to that of a human.

The objective of this work is to implement a technique for identifying human targets in thermal images.

Fig 1- Approach to Paper


Human recognition, categorization, or, more generally, detection in thermal images is a growing area of computer vision. In practice, human detection is a difficult task because many physical parameters can affect the final result of image processing. Detection depends on the person's pose, clothing, hair, and level of physical effort, and on the environment in which the person is present, as well as on the type of camera, the camera angle, the distance to the person, and a changing background. In recent years many solutions have been proposed, based on shape, geometric matching, skin color, or classification of body parts. There is no universal method for human detection; all existing methods have been adjusted to specific situations [3].

The human detection process can be described in a few general steps. First, after image acquisition, preprocessing such as filtering and edge detection is performed. Next, feature selection produces a number of candidate objects in the image. Finally, a classifier decides which of the detected objects can be categorized as human. In many solutions, human detection is combined with other applications and can consequently yield information not obviously connected with human detection, such as height, haircut, walking style, and whether a person is walking alone or in a group. Human detection algorithms now find applications in many fields, for example scientific simulation, sports, the automotive industry (intelligent cars that prevent accidents), rehabilitation, video games (MS Kinect), and biometric identification systems. They also provide useful information such as human position, state, motion direction, gesture recognition, and 3D modeling of the human body.


The collection of the dataset consists of three phases.

  1. Phase-I Experimentation using publicly available datasets [2]

  2. Phase-II Generation of own dataset.

  3. Phase-III Human detection from thermal image dataset.

    Experimentation Using Publicly Available Datasets

    A free thermal dataset has been open-sourced for algorithm training. It comprises annotated grayscale (thermal) imagery and non-annotated RGB imagery for reference. The camera centerlines are around 2 inches apart and collimated to minimize parallax.

    Visible-infrared databases for research, developed by various private companies, are available online. The infrared and visible sequences are synchronized and registered. The databases are freely available for research and development purposes.

    Benchmark Dataset Collection – This is a publicly available benchmark dataset for testing and evaluating novel and state-of-the-art computer vision algorithms.

    Manual generation of thermal images requires access to a thermal camera, which makes it necessary to use open-source thermal image datasets.


    Feature extraction from a thermal image greatly affects the design and performance of the subsequent classifier. It is an aspect of image preprocessing, image classification and recognition, and pattern recognition. Feature extraction transforms high-dimensional feature information into low-dimensional feature information by mapping or transformation, and selects the features that contribute most to classification and best reflect its essential characteristics. Its purpose is to convert the image, through suitable transformations, into the essential characteristics that best support classification, which lays a solid foundation for the subsequent target recognition and classification. After connected-component clustering and filtering, regions of interest are tracked to obtain chronological information about their 2D movements. These connected components are then identified, that is, classified as human or not.
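The connected-component clustering and filtering step mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the 4-connectivity, the `min_area` filter and the function name `connected_components` are assumptions chosen for illustration. The function groups foreground pixels of a binary mask into regions, discards regions too small to be a person, and returns one bounding box per surviving region.

```python
from collections import deque

def connected_components(mask, min_area=3):
    """Group 4-connected foreground pixels; drop regions below min_area.

    mask is a 2D sequence of 0/1 values. Returns a list of inclusive
    bounding boxes (top, left, bottom, right), one per surviving region.
    min_area is a hypothetical filter value, not taken from the paper.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Filtering step: keep only regions that are large enough.
            if len(pixels) >= min_area:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes

# Toy mask: one 5-pixel blob and one isolated noise pixel.
mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
boxes = connected_components(mask, min_area=3)
print(boxes)
```

In this toy mask the isolated pixel is filtered out and only the larger blob yields a bounding box, which is the kind of region of interest that would then be passed on to tracking and classification.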




    Desktop Keyboard

    Python 3.5 or higher version

    Conda lib & Python lib

    TensorFlow framework, API tool

    OpenCV

    Graphics card

    Image Training & labeling link

    Image Capture connector

    Darknet code base

    Foreground Segmentation-

    The main objective is to simplify the image for further processing without altering any important information. Only the detected regions of interest are treated in the following steps. The background is the union of all static objects, and the foreground is composed of all regions in which there is a high probability of finding a human target.

    There are two approaches to foreground segmentation. Some algorithms are based on the temporal difference between two or three consecutive frames. Other algorithms subtract a background model from the current frame. The most widely used background models are a temporal average and a single Gaussian distribution.
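The two approaches can be contrasted with a short NumPy sketch. The function names, the fixed threshold of 30 gray levels and the toy 3x3 frames are assumptions made for illustration; they do not come from the paper.

```python
import numpy as np

def temporal_difference(prev_frame, frame, threshold):
    """Mark pixels whose gray value changed by more than threshold
    between two consecutive frames."""
    diff = np.abs(frame.astype(np.float64) - prev_frame.astype(np.float64))
    return (diff > threshold).astype(np.uint8)

def temporal_average_subtraction(history, frame, threshold):
    """Mark pixels that differ from the per-pixel mean of past frames
    (the background model) by more than threshold."""
    background = np.mean(np.stack(history).astype(np.float64), axis=0)
    diff = np.abs(frame.astype(np.float64) - background)
    return (diff > threshold).astype(np.uint8)

# Toy scene: a static background at gray level 10, then a warm object
# (gray level 200) appears at the centre pixel.
empty = np.full((3, 3), 10, dtype=np.uint8)
with_target = empty.copy()
with_target[1, 1] = 200

mask_diff = temporal_difference(empty, with_target, threshold=30)
mask_avg = temporal_average_subtraction([empty, empty, empty], with_target, threshold=30)
still = temporal_difference(with_target, with_target, threshold=30)
print(mask_diff)
print(mask_avg)
print(still)
```

Both masks flag the centre pixel when the object appears. The last call shows a known weakness of frame differencing: a target that stands still produces no difference between consecutive frames, whereas a background model continues to detect it.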

    Background subtraction with a single Gaussian distribution performs well in terms of both detection and computation time, and it can take the noise of the acquisition system into account. The foreground detection is given by [4]:

    $$R_{1,t}(p,q) = \begin{cases} 1 & \text{if } |I_t(p,q) - \mu_t(p,q)| > T_1\,\sigma_t(p,q) \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

    where $(p,q)$ are the pixel coordinates and $I_t(p,q)$ is the gray-scale value of that pixel at time $t$. $R_{1,t}$ is the binary image of the foreground detection, $\mu_t$ and $\sigma_t$ are respectively the average and the standard deviation of the Gaussian model, and $T_1$ is a threshold set to 2.6.

    If $R_{1,t}(p,q) = 0$, the Gaussian model is updated with:

    $$\mu_t(p,q) = (1-k)\,\mu_{t-1}(p,q) + k\,I_t(p,q) \qquad (2)$$

    $$\sigma_t^2(p,q) = (1-k)\,\sigma_{t-1}^2(p,q) + k\,\bigl(I_t(p,q) - \mu_{t-1}(p,q)\bigr)^2 \qquad (3)$$
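A minimal NumPy sketch of the single-Gaussian background model of Eqs. (1) to (3) follows. The class name `GaussianBackground`, the initial standard deviation `init_sigma` and the value `k=0.05` are assumptions chosen for illustration; only the threshold of 2.6 comes from the text. The sketch also assumes that the test of Eq. (1) is evaluated against the model as it stands before the current frame is absorbed.

```python
import numpy as np

class GaussianBackground:
    """Per-pixel single-Gaussian background model (sketch of Eqs. 1 to 3)."""

    def __init__(self, first_frame, init_sigma=5.0, k=0.05, t1=2.6):
        # init_sigma and k are hypothetical values, not from the paper.
        self.mu = first_frame.astype(np.float64)
        self.var = np.full(self.mu.shape, init_sigma ** 2)
        self.k = k
        self.t1 = t1

    def apply(self, frame):
        frame = frame.astype(np.float64)
        # Eq. (1): foreground where the pixel deviates by more than T1 sigmas.
        foreground = np.abs(frame - self.mu) > self.t1 * np.sqrt(self.var)
        background = ~foreground
        # Eqs. (2) and (3): candidate updates, using the mean from t-1.
        residual = frame - self.mu
        new_mu = (1 - self.k) * self.mu + self.k * frame
        new_var = (1 - self.k) * self.var + self.k * residual ** 2
        # The model is updated only at pixels classified as background.
        self.mu = np.where(background, new_mu, self.mu)
        self.var = np.where(background, new_var, self.var)
        return foreground.astype(np.uint8)

# Toy scene: uniform background at gray level 20, then a warm 2x2 object.
scene = np.full((4, 4), 20, dtype=np.uint8)
model = GaussianBackground(scene)
frame = scene.copy()
frame[1:3, 1:3] = 120
r1 = model.apply(frame)
print(r1)
```

Updating the model only where a pixel is classified as background keeps a detected person from being absorbed into the background statistics while they remain in view.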

    The parameter k, which controls how quickly the model adapts, is determined empirically. Far-infrared vision makes it possible to see at night and gives information about the temperature of the scene. Under the hypothesis that human body temperature is noticeably higher than the average temperature of the environment, we can apply a binary threshold to detect hot areas in the image:

    $$R_{2,t}(p,q) = \begin{cases} 1 & \text{if } I_t(p,q) > T_2 \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

    where $R_{2,t}$ is the binary image representing hot areas and $T_2$ is an arbitrary threshold. A logical AND between the background subtraction and the hot-area detection gives:

    $$R_t(p,q) = R_{1,t}(p,q) \wedge R_{2,t}(p,q) \qquad (5)$$

    where $R_t$ is the result of our foreground segmentation.
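The hot-area threshold of Eq. (4) and the combination of Eq. (5) can be sketched as below. The sketch assumes the combination is a logical AND of the two binary masks; the function names, the threshold `t2=150` and the toy arrays are hypothetical and serve only as illustration.

```python
import numpy as np

def hot_area_mask(frame, t2):
    """Eq. (4): 1 where the gray value exceeds the threshold T2."""
    return (frame > t2).astype(np.uint8)

def combine_masks(r1, r2):
    """Eq. (5), assumed here to be a logical AND of the two binary masks."""
    return np.logical_and(r1, r2).astype(np.uint8)

# Toy thermal frame: higher gray values mean hotter pixels.
frame = np.array([[30,  30, 200],
                  [30, 180, 190],
                  [30,  30,  30]], dtype=np.uint8)

# Hypothetical background-subtraction result R1. The top-right pixel is a
# static hot object (so it is not flagged as changed) and the bottom-left
# pixel is a moving but cold object.
r1 = np.array([[0, 0, 0],
               [0, 1, 1],
               [1, 0, 0]], dtype=np.uint8)

r2 = hot_area_mask(frame, t2=150)
r = combine_masks(r1, r2)
print(r2)
print(r)
```

Only pixels that are both changed with respect to the background and hot survive, so the static hot object and the cold moving object are both rejected.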

    Fig 2- Detection of nearby objects

    Fig 3- Detection of Occluded object

    Fig 4- Detection of distant object

    Fig 5- Detection of Group objects

    Fig 6- Detection accuracy


      In many circumstances, information about room occupancy is important for many systems, but human detection in video or still images remains a challenging task. In this report we, as a syndicate, propose an extension of object detection systems that exploits the advantages offered by video. In our approach, foreground segmentation is used to limit the search space of our classifier.

      Moreover, the 2D tracking system improves overall performance because it provides multiple images of the same person at different moments. Experimental results show the efficiency of our approach. However, several avenues for improvement remain. First, classifier performance is closely tied to the quality of the database, and our database of infrared images can be improved. Second, we have to learn several classifiers for one human.

      Because we work in an indoor environment, occlusions are frequent; we could improve robustness by learning a part of the body that is more often visible (e.g. head and shoulders). Fusion with the visible spectrum could also decrease the number of false positive detections (caused by reflective surfaces in the infrared spectrum). Finally, we plan to develop our system further in order to recover high-level information about human activities in a room.


      Although the suggested algorithm for human detection from thermal images is successful, it struggles to reach similar accuracy for small and occluded targets. In future work, detection results could therefore be improved by using more powerful matching strategies for assigning weak labels to classification data. Image fusion is an enhancement technique that combines images obtained by different kinds of sensors to generate a robust or informative image that can facilitate subsequent processing or support decision making. A method with adequate fusion principles and powerful representations that combine pixel-level semantic features is still awaited; it would significantly improve accuracy for current applications and contribute to the invention of advanced systems in different areas.


    The author would like to thank Mr Krishna Kumar KP (Dean, Faculty of Electronics), Mr Kuldeep Yadav (HoD, CS Dept) and Mr Shantanu Ghatak (Project Guide) for their insightful guidance, instructions, comments and suggestions, which have greatly improved this paper.


    1. Krišto, M., Ivašić-Kos, M. and Pobar, M., 2020. Thermal Object Detection in Difficult Weather Conditions Using YOLO. IEEE Access, 8, pp.125459-125476.

    2. Ivašić-Kos, M., Krišto, M. and Pobar, M., 2019, April. Human detection in thermal imaging using YOLO. In Proceedings of the 2019 5th International Conference on Computer and Technology Applications (pp. 20-24).

    3. Redmon, J. and Farhadi, A., 2018. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767.

    4. Redmon, J. and Farhadi, A., 2017. YOLO9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263-7271).

    5. Khandhediya, Y., Sav, K. and Gajjar, V., 2017. Human detection for night surveillance using adaptive background subtracted image. arXiv preprint arXiv:1709.09389.

    6. Redmon, J., Divvala, S., Girshick, R. and Farhadi, A., 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).

    7. Johansson, J., Solli, M. and Maki, A., 2016, October. An evaluation of local feature detectors and descriptors for infrared images. In European Conference on Computer Vision (pp. 711-723). Springer, Cham.

    8. Jeon, E.S., Kim, J.H., Hong, H.G., Batchuluun, G. and Park, K.R., 2016. Human detection based on the generation of a background image and fuzzy system by using a thermal camera. Sensors, 16(4), p.453.

    9. Jeon, E.S., Choi, J.S., Lee, J.H., Shin, K.Y., Kim, Y.G., Le, T.T. and Park, K.R., 2015. Human detection based on the generation of a background image by using a far-infrared light camera. Sensors, 15(3), pp.6763-6788.
