Machine Health Monitoring Using Infrared Thermal Image by Convolution Neural Network

DOI : 10.17577/IJERTCONV6IS07026


M. Keerthi


Department of CSE, Arasu Engineering College, Kumbakonam, Tamilnadu, India.

R. Rajavignesh

Associate Professor,

Department of CSE, Arasu Engineering College, Kumbakonam, Tamilnadu, India.

Abstract: Monitoring the condition of a machine relies on various features and characteristics of the measured signals. In practice, machine health monitoring depends entirely on human experience: the features are hand-crafted from expert knowledge, so their performance and usefulness depend on the expert's knowledge of statistics, and every new condition requires new feature extraction to be designed. To overcome these drawbacks of feature engineering, this paper uses a deep learning technique, specifically the Convolutional Neural Network (CNN). The main objective of this paper is to investigate how a CNN can be applied to infrared thermal (IRT) videos or images to determine the condition of a machine and to detect machine faults. The proposed system can detect conditions in rotating machinery accurately without requiring detailed knowledge of statistics. We also show that, using trained CNNs, the regions of an infrared thermal image that matter for a specific condition can be identified, which leads to new physical insights.

Keywords: Convolutional Neural Network, Infrared Thermal Image, Machine Health Monitoring, Deep Learning.


Thermal imaging is a method of improving the visibility of objects in a dark environment by detecting the objects' infrared radiation and creating an image based on that information. Thermal imaging, near-infrared illumination and low-light imaging are the three most commonly used night vision technologies. Unlike the other two methods, thermal imaging works in environments without any ambient light. Like near-infrared illumination, thermal imaging can penetrate obscurants such as smoke, fog and haze. Machine health monitoring provides an efficient way of reducing downtime and preserving assets. Vibration monitoring enables preventive maintenance on almost any type of machine, either through periodic checks or through real-time analysis. Condition monitoring involves fixing sensors to mechanical parts of the machines in order to track failures and malfunctions [1].

Deep learning is a class of machine learning algorithms that use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output of the previous layer as its input. Deep learning can be supervised (e.g., classification) or unsupervised (e.g., pattern analysis), and it learns multiple levels of representation that correspond to different levels of abstraction; these levels form a hierarchy of concepts. In machine learning, a Convolutional Neural Network (CNN) is a class of deep, feed-forward artificial neural networks that has been applied successfully to analyzing visual imagery. CNNs use a variation of the multilayer perceptron designed to require minimal preprocessing [2].


    In this paper, the authors describe a deep learning algorithm based on a CNN. Different combinations of condition patterns built from some basic fault conditions are considered: 20 test cases are used, each including 12 combinations of different basic condition patterns. Vibration signals are preprocessed using statistical measures of the time-domain signal such as standard deviation, skewness and kurtosis. In the frequency domain, the spectrum obtained with the FFT is divided into multiple bands and the root mean square (RMS) value is calculated for each band, so that the energy keeps its shape at the spectrum peaks. The achieved accuracy indicates that the approach is highly reliable and applicable to fault diagnosis of industrial reciprocating machinery [1].
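    The feature extraction summarized above can be sketched in Python with NumPy. This is a minimal illustration, not the code of [1]: the number of bands (8), the equal-width split of the spectrum and the use of population moments are assumptions, and the sine wave is only a synthetic test signal.

```python
import numpy as np

def time_domain_features(x):
    """Standard deviation, skewness and kurtosis of a 1-D vibration signal."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()                 # population standard deviation
    z = (x - mu) / sigma            # standardized signal
    skewness = np.mean(z ** 3)
    kurtosis = np.mean(z ** 4)      # Pearson kurtosis: 3.0 for Gaussian noise
    return sigma, skewness, kurtosis

def band_rms_features(x, n_bands=8):
    """Split the FFT magnitude spectrum into n_bands equal-width bands; return each band's RMS."""
    spectrum = np.abs(np.fft.rfft(np.asarray(x, dtype=float)))
    bands = np.array_split(spectrum, n_bands)
    return np.array([np.sqrt(np.mean(b ** 2)) for b in bands])

# Example: a pure tone completing 150 cycles in 1024 samples.
n = np.arange(1024)
x = np.sin(2 * np.pi * 150 * n / 1024)
print(time_domain_features(x))   # sigma ~ 0.707, skewness ~ 0, kurtosis ~ 1.5
print(band_rms_features(x))      # largest value in band index 2 (FFT bins 129-192)
```

    Each vibration record thus becomes a short, fixed-length feature vector (3 time-domain values plus one RMS value per band) that a classifier can consume.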

    Here the authors take a typical component of rotating machinery, the gearbox, as the target for studying a suitable monitoring and fault diagnosis method that prevents malfunction and failure. Failures are divided into two levels: the component level, which includes various gear faults, and the system level, which covers machinery states such as looseness, misalignment and unbalance. Two intelligent methods, artificial neural networks (ANNs) and genetic algorithms (GAs), are combined to carry out signal classification and analysis. ANNs are among the most common machine learning techniques used for detecting and diagnosing faults in rotating machinery [8].

    In this paper, the authors discuss the infrared imaging technique. Several infrared thermal videos of a rotating set-up are captured using various rotation speeds, loads, oil temperatures and flow rates. These videos are given as input to an image processing and machine learning system that automatically extracts the relevant region of interest and features, and subsequently predicts the oil level in the bearing. Evaluation showed that the system achieves an accuracy of 96.67%. Determining the oil level automatically is a prerequisite for regulating it automatically [4].

    In this paper, thermal imaging is used for condition monitoring and fault diagnosis: the authors present a novel automatic fault detection system using infrared imaging, focusing on bearings of rotating machinery. The set of monitored bearing faults includes faults for which state-of-the-art techniques have no general solution, such as bearing-lubricant starvation. For each fault, several recordings are made using different bearings to ensure that the fault detection system generalizes. The system contains two image-processing pipelines, each with its own purpose. The final system distinguishes between all eight conditions with an accuracy of 88.25% [3].

    In particular, this paper presents the application of SIMAP to the health condition monitoring of a wind turbine gearbox as an example of its capabilities and main features. SIMAP is a general tool oriented to the diagnosis and maintenance of industrial processes; however, its first application has been at a wind farm. In this real case, SIMAP is able to optimize and dynamically adapt a maintenance calendar for a monitored wind turbine according to the turbine's real needs and operating life, as well as other technical and economic criteria [10].


    A Convolutional Neural Network (CNN) is a kind of feed-forward neural network. Its origins go back to the 1960s, when Hubel and Wiesel studied locally sensitive neurons in the visual cortex. The CNN is an efficient recognition algorithm that is widely used in pattern recognition and image processing. It has a simple structure, few training parameters and good adaptability, and it has become a hot topic in speech analysis and image recognition. Its weight-shared network structure makes it more similar to biological neural networks and reduces both the complexity of the network model and the number of weights. Generally, a CNN contains two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, from which it extracts a local feature. The second is the feature mapping layer, composed of feature maps whose neurons share the same weights.
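    The saving from weight sharing can be made concrete with a little arithmetic. The sketch below assumes a 32 × 32 single-channel input and a single shared 5 × 5 kernel (sizes chosen for illustration only) and compares it with a fully connected layer that produces the same 28 × 28 output.

```python
def fully_connected_params(in_h, in_w, out_h, out_w):
    """Each output unit has its own weight for every input pixel, plus one bias."""
    return (in_h * in_w) * (out_h * out_w) + out_h * out_w

def shared_conv_params(k, in_channels=1, out_channels=1):
    """One k x k kernel per (input, output) channel pair, reused at every
    spatial position, plus one bias per output channel."""
    return k * k * in_channels * out_channels + out_channels

# Mapping a 32x32 image to a 28x28 output (the size a 5x5 'valid' convolution produces):
print(fully_connected_params(32, 32, 28, 28))  # 803600
print(shared_conv_params(5))                   # 26
```

    The shared kernel needs 26 parameters instead of roughly 800,000, because the same 25 weights are reused at every position of the image.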

    CNNs are mainly used to recognize two-dimensional patterns in a way that is invariant to displacement, scaling and other forms of distortion. Since the feature detection layers of a CNN learn from training data, explicit feature extraction is avoided: the features are learned implicitly from the training data. Furthermore, neurons in the same feature map share identical weights, so the network can learn in parallel. This is a major advantage of convolutional networks over networks in which all neurons are fully connected to each other. Because of its special structure of locally shared weights, the CNN has a unique advantage in speech recognition and image processing, and its layout is closer to an actual biological neural network.

    A CNN consists of an input layer, an output layer and multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers and normalization layers. Calling the operation a convolution is a convention in neural networks; mathematically it is a cross-correlation rather than a convolution. The difference only matters for the indices in the matrix, and thus for which weights are placed at which index.
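    The difference between the two operations is only a flip of the kernel, as this NumPy sketch shows (the helper names are hypothetical, chosen for illustration).

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel over the image without flipping it."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """Mathematical 2-D convolution: cross-correlation with the kernel flipped in both axes."""
    return cross_correlate2d(image, kernel[::-1, ::-1])

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
print(cross_correlate2d(image, kernel))  # every entry is -5.0
print(convolve2d(image, kernel))         # every entry is 5.0
```

    Because a CNN learns its kernels from data, training with either operation simply learns a flipped version of the same kernel, which is why the naming convention is harmless in practice.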

    Fig 1.1 Neural Network Diagram

    Three basic components define a basic convolutional network:

    1. The convolutional layer

    2. The Pooling layer [optional]

    3. The output layer

    1. The convolution layer mainly performs a convolution operation between the input matrix (a representation of the input image or a feature map, discussed later) and the convolution kernel (a small coefficient matrix). Given that f is the filter index, c is the channel index and C is the total number of channels, the convolution layer can be described as follows:

      $$O_f = \sum_{c=1}^{C} \operatorname{conv}(I_c, K_{f,c}) + b_f \qquad (1)$$

      This equation means that each output filter $f$ sums the convolution results between every channel of the input feature map ($I_c$) and the corresponding kernel ($K_{f,c}$), and then adds a bias $b_f$. In many architectures, an activation function such as the Rectified Linear Unit (ReLU) is applied to the resulting elements.

    2. The fully connected layer is an affine transformation of the input feature vector: it consists of a single matrix-vector multiplication followed by a bias offset.

    3. The max-pooling layer performs sub-sampling by keeping only the maximum value of each small region of the input matrix. These regions are obtained by sliding a window over the input matrix.

    4. The feature map is the core idea for understanding how a CNN works. Every input and output matrix inside the CNN can be viewed as a feature map, which contains the features extracted from the given image. Image classification transforms the whole feature map into object classification scores using fully connected layers, whereas object detection explores region information.
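    The layer types above can be chained into a tiny forward pass. This NumPy sketch is illustrative only: the conv of Eq. (1) is taken as a stride-1, unpadded cross-correlation, the 2 × 2 pooling window with stride 2 and all layer shapes are arbitrary choices, and the function names are hypothetical.

```python
import numpy as np

def conv2d_valid(x, k):
    """Stride-1, unpadded 2-D cross-correlation of one channel x with one kernel k."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv_layer(I, K, b, relu=True):
    """Eq. (1): O_f = sum_c conv(I_c, K_{f,c}) + b_f, optionally followed by ReLU.
    I: (C, H, W) input feature map; K: (F, C, k, k) kernels; b: (F,) biases."""
    F, C = K.shape[:2]
    assert I.shape[0] == C, "kernel channels must match input channels"
    O = np.stack([sum(conv2d_valid(I[c], K[f, c]) for c in range(C)) + b[f]
                  for f in range(F)])
    return np.maximum(O, 0.0) if relu else O

def max_pool2d(x, size=2, stride=2):
    """Keep the maximum of each size x size window slid over x with the given stride."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.empty((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()
    return out

def fully_connected(x, W, b):
    """Affine transformation: one matrix-vector product followed by a bias offset."""
    return W @ x + b

# Tiny forward pass: 2-channel 6x6 input -> 3 filters of 3x3 -> 2x2 max-pool -> 4 scores.
rng = np.random.default_rng(0)
I = rng.normal(size=(2, 6, 6))
K = rng.normal(size=(3, 2, 3, 3))
feature_maps = conv_layer(I, K, b=np.zeros(3))              # shape (3, 4, 4)
pooled = np.stack([max_pool2d(fm) for fm in feature_maps])  # shape (3, 2, 2)
scores = fully_connected(pooled.reshape(-1), rng.normal(size=(4, 12)), np.zeros(4))
print(feature_maps.shape, pooled.shape, scores.shape)       # (3, 4, 4) (3, 2, 2) (4,)
```

    Each intermediate array is a feature map in the sense of item 4; only the final fully connected layer turns it into a vector of class scores.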

    CNN Architecture Design

    Designing a CNN requires experience in architecture design and continual debugging in practical applications in order to obtain the architecture best suited to a particular application. The input is a 96 × 96 grayscale image, which is resized to 32 × 32 in the preprocessing stage. The designed model is seven layers deep: input layer, convolution layer C1, subsampling layer S1, convolution layer C2, subsampling layer S2, hidden layer H and output layer F.
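    The layer sizes of this seven-layer model are not given beyond the 96 × 96 input and the 32 × 32 resized image, so the sketch below assumes LeNet-style settings: resizing by averaging 3 × 3 pixel blocks, 5 × 5 convolution kernels with stride 1 and no padding, and 2 × 2 subsampling. Under those assumptions it traces the spatial size of each feature map; the widths of H and F are left open.

```python
import numpy as np

def downsample_by_block_mean(img, factor):
    """Shrink an image by averaging non-overlapping factor x factor blocks (96 -> 32 for factor 3)."""
    h, w = img.shape
    assert h % factor == 0 and w % factor == 0, "image size must be divisible by factor"
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def conv_output_size(size, kernel, stride=1, pad=0):
    """Standard output-size formula for a convolution along one spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1

def trace_shapes(size=32, conv_kernel=5, pool=2):
    """Spatial size after C1, S1, C2 and S2 under the ASSUMED kernel and pooling sizes."""
    shapes = {"input": size}
    size = conv_output_size(size, conv_kernel)
    shapes["C1"] = size
    size //= pool
    shapes["S1"] = size
    size = conv_output_size(size, conv_kernel)
    shapes["C2"] = size
    size //= pool
    shapes["S2"] = size
    return shapes

gray = np.random.default_rng(0).random((96, 96))   # stand-in for a 96 x 96 grayscale image
print(downsample_by_block_mean(gray, 3).shape)      # (32, 32)
print(trace_shapes())   # {'input': 32, 'C1': 28, 'S1': 14, 'C2': 10, 'S2': 5}
```

    With these assumed settings, S2 would pass 5 × 5 feature maps to the hidden layer H; different kernel or pooling sizes would change every entry.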

    CNN Pre-Training

    1. Pre-train the CNN for image classification.

    2. Fine-tune the CNN on the target dataset.
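    The paper does not describe how fine-tuning is performed. The sketch below shows its simplest variant under loudly stated assumptions: a hypothetical frozen layer (`W_PRETRAINED`, random here) stands in for the pre-trained CNN, and only a new softmax output layer is trained on a toy target dataset by gradient descent. Full fine-tuning would also update the earlier layers, usually with a small learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# HYPOTHETICAL stand-in for a pre-trained feature extractor: its weights stay frozen.
W_PRETRAINED = rng.normal(size=(16, 4))

def frozen_features(X):
    """Pre-trained layer + ReLU; never updated during fine-tuning."""
    return np.maximum(X @ W_PRETRAINED.T, 0.0)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fine_tune_head(X, y, n_classes, epochs=200):
    """Train a new softmax output layer on the target data; earlier layers stay frozen."""
    H = frozen_features(X)
    H = np.hstack([H, np.ones((len(H), 1))])                 # bias column
    # Conservative step size: 1 / largest eigenvalue of H^T H / n.
    lr = 1.0 / np.linalg.eigvalsh(H.T @ H / len(H)).max()
    W = np.zeros((H.shape[1], n_classes))
    Y = np.eye(n_classes)[y]                                 # one-hot targets
    losses = []
    for _ in range(epochs):
        P = softmax(H @ W)
        losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))
        W -= lr * H.T @ (P - Y) / len(H)                     # gradient of mean cross-entropy
    return W, losses

# Toy "target dataset": two Gaussian clusters in 4-D.
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 4)), rng.normal(2.0, 1.0, size=(50, 4))])
y = np.array([0] * 50 + [1] * 50)
W_head, losses = fine_tune_head(X, y, n_classes=2)
print(losses[0], losses[-1])   # the loss starts at log(2) and decreases
```

    The step size is derived from the features so that plain gradient descent on this convex loss decreases steadily without hand tuning.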

    Algorithm: convolution layer computation with two nested loops

    Input: a feature map I of shape NC × BH × BW
    Input: a coefficient matrix K of shape NF × NC × k × k
    Output: a feature map O of shape NF × (BH/s) × (BW/s)

    for f ← 0 to NF - 1 do
        for c ← 0 to NC - 1 do
            O[f] ← O[f] + conv(I[c], K[f, c])

    (a) Filter major:

    The algorithm above presents the filter-major pattern. Once the inner reduce-add loop over channels is complete for an output filter f, the final result for this filter is ready. Thus, only $B_H B_W / s^2$ elements, the size of one output filter, need to be stored in the output buffer. However, the pattern has to iterate through all the channels of the input feature map and the associated coefficient kernels, so the input buffer size of the filter-major pattern is

    $$(B_H B_W + k^2)\,N_C + k B_W,$$

    where $k B_W$ is the line buffer size.

    (b) Channel major:

    In this case, the channel iteration is the outer loop. After each iteration of the outer loop, only partial results for all $N_F$ filters are available, and they are updated in the following iterations. Thus, the output buffer must have size

    $$N_F \times B_H B_W / s^2.$$

    For the input buffer, only one channel of the input feature map needs to be cached, but all the coefficients for this channel must also be stored in the input buffer. Hence the input buffer size is

    $$B_H B_W + k^2 N_F + k B_W.$$

    The line buffer is also required in this case.
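    Both loop orders compute the same output; they differ only in what must be buffered. The NumPy sketch below checks this on random data and evaluates the buffer-size formulas given for the two patterns. It is an illustration under assumptions: the example convolution uses stride 1 and no padding, so its output is smaller than BH × BW, while `buffer_sizes` simply encodes the formulas from the text with s as the stride.

```python
import numpy as np

def conv_valid(x, k):
    """Single-channel 'valid' 2-D cross-correlation with stride 1."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def filter_major(I, K):
    """Outer loop over filters: each O[f] is complete after its inner channel loop."""
    NF, NC = K.shape[:2]
    outputs = []
    for f in range(NF):
        acc = conv_valid(I[0], K[f, 0])
        for c in range(1, NC):
            acc += conv_valid(I[c], K[f, c])   # reduce-add over input channels
        outputs.append(acc)                     # only this one filter needs buffering
    return np.stack(outputs)

def channel_major(I, K):
    """Outer loop over channels: all NF partial sums stay live until the last channel."""
    NF, NC = K.shape[:2]
    O = np.zeros((NF,) + conv_valid(I[0], K[0, 0]).shape)
    for c in range(NC):
        for f in range(NF):
            O[f] += conv_valid(I[c], K[f, c])
    return O

def buffer_sizes(BH, BW, k, NC, NF, s=1):
    """Buffer sizes in elements, from the formulas in the text (k*BW is the line buffer)."""
    return {
        "filter_major": {"input": (BH * BW + k * k) * NC + k * BW, "output": BH * BW // s**2},
        "channel_major": {"input": BH * BW + k * k * NF + k * BW, "output": NF * BH * BW // s**2},
    }

rng = np.random.default_rng(1)
I = rng.normal(size=(3, 6, 6))       # NC = 3 channels
K = rng.normal(size=(4, 3, 3, 3))    # NF = 4 filters, k = 3
print(np.allclose(filter_major(I, K), channel_major(I, K)))  # True
print(buffer_sizes(BH=32, BW=32, k=3, NC=16, NF=32))
# {'filter_major': {'input': 16624, 'output': 1024}, 'channel_major': {'input': 1408, 'output': 32768}}
```

    For these example sizes, the filter-major pattern needs a large input buffer and a small output buffer, while the channel-major pattern reverses the trade-off.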


    The system architecture represents the crack detection process. In this module, various images related to the machine are uploaded. The preprocessing step removes low-frequency background noise. The denoising step handles scaling, enhancement and Gaussian filtering; its goal is to remove noise while preserving the useful information. The denoised image is then passed to the screening stage, whose purpose is to remove noisy areas and which has a training phase and a testing phase. During the training phase, samples are extracted from the damaged regions to expand the training database. In the discrimination stage, the candidates obtained from the screening stage are carefully distinguished by applying HCSD, which retains only the candidates having the specific structure. The final detection results are shown as bounding boxes.

    Fig 1.2 System Architecture

    Fig 1.3 Dataflow diagram

    A dataflow diagram (DFD) specifies the flow of data between the user, the actors and the system. It shows the actions carried out by the actors and users step by step, and it simplifies the overall system behavior by representing it as a flow diagram. A DFD is often used as a preliminary step to create an overview of the system, which can later be elaborated, and it can also be used to visualize data processing. The input of any process is denoted by a rectangular box.

    Module Description

    1. Image Preprocessing

      Thermal image preprocessing can significantly increase the reliability of an optical inspection. Several filter operations that intensify or reduce certain image details enable easier or faster evaluation. This includes correcting any intensity or shading irregularities in the image, which compensates for the inherent problems of the scanning technology used and makes the image a true replica of the actual features of the imaged object. Image enhancement is part of this preprocessing; two procedures are used to enhance image quality, (a) image sharpening and (b) contrast enhancement, along with the following modules:


      • Blocking: the RGB image is divided into four blocks of 250 × 250 pixels each.

      • RGB to gray: each RGB block is converted into a grayscale image.

      • Image blocks (gray) and denoising: the grayscale images consist of the full image (500 × 500) and the top-left, top-right, bottom-left and bottom-right blocks (250 × 250 each).
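    The blocking and gray-conversion modules can be sketched as below. The luminance weights are the ones MATLAB documents for `rgb2gray`; the paper reports using MATLAB but does not state which conversion it used, so this choice is an assumption, and the synthetic 500 × 500 image is only a test input.

```python
import numpy as np

def split_into_blocks(img):
    """Full image plus its top-left, top-right, bottom-left and bottom-right quarters."""
    h, w = img.shape[:2]
    hh, hw = h // 2, w // 2
    return {
        "full": img,
        "top_left": img[:hh, :hw],
        "top_right": img[:hh, hw:],
        "bottom_left": img[hh:, :hw],
        "bottom_right": img[hh:, hw:],
    }

def rgb_to_gray(rgb):
    """Weighted sum of the R, G and B channels (weights documented for MATLAB's rgb2gray)."""
    return 0.2989 * rgb[..., 0] + 0.5870 * rgb[..., 1] + 0.1140 * rgb[..., 2]

rgb = np.zeros((500, 500, 3))
rgb[:250, 250:, 0] = 1.0                       # make only the top-right block red
gray_blocks = {name: rgb_to_gray(block) for name, block in split_into_blocks(rgb).items()}
print({name: g.shape for name, g in gray_blocks.items()})
```

    Because the blocks are NumPy views, splitting costs no copying; only the gray conversion allocates new arrays.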

    2. Image Screening

    In order to detect cracks accurately and efficiently from volumetric data, a robust and efficient method leveraging 3D CNNs is proposed. Specifically, the method consists of two stages designed in a cascaded manner. The first stage is the screening stage, in which a small number of candidates are retrieved using a novel 3D fully convolutional network (3D FCN) model. Compared with the conventional sliding-window strategy with the same sampling stride, screening with the 3D FCN model achieves a significant speed-up. During the training phase, positive samples are extracted from crack regions and augmented by translation, rotation and mirroring to expand the training database. In practice, the network is trained in three sub-steps. First, an initial 3D CNN is trained using randomly selected non-crack regions as negative samples. Next, false-positive samples obtained by applying the initial model to the training set are added.

    3. Candidate Generation

    During the testing phase, the 3D FCN model takes the whole volume as input and generates a corresponding coarse 3D score volume. Because the score volume can be noisy, local non-maximum suppression in 3D is applied as post-processing. Locations in the 3D score volume are then traced back sparsely to coordinates in the original input space through the index mapping process. Finally, regions with high prediction probabilities are selected as potential candidates. The fewer false positives a screening method produces, the more powerful its discrimination capability. The proposed 3D FCN model achieves the highest sensitivity with the lowest average number of false positives, which highlights the efficacy of the method.
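    The post-processing step can be sketched in NumPy. This is an illustration with placeholder settings: the neighborhood radius, the 0.5 threshold and the `score_to_input_coords` mapping (a hypothetical total stride of 4 and offset of 8) are assumptions, since the actual values depend on the 3D FCN's layer configuration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_nms_3d(score, radius=1, threshold=0.5):
    """Return (z, y, x) indices of voxels that are the maximum of their
    (2*radius+1)^3 neighborhood and whose score exceeds the threshold."""
    size = 2 * radius + 1
    padded = np.pad(score, radius, mode="constant", constant_values=-np.inf)
    windows = sliding_window_view(padded, (size, size, size))
    neighborhood_max = windows.max(axis=(-3, -2, -1))
    keep = (score >= neighborhood_max) & (score > threshold)
    return np.argwhere(keep)

def score_to_input_coords(coords, stride=4, offset=8):
    """HYPOTHETICAL index mapping; the real stride and offset depend on the 3D FCN's layers."""
    return coords * stride + offset

score = np.zeros((8, 8, 8))
score[2, 3, 4] = 0.9   # strong response
score[2, 3, 5] = 0.8   # neighbor of the peak: suppressed
score[6, 6, 6] = 0.7   # separate peak
score[0, 0, 0] = 0.3   # below threshold
peaks = local_nms_3d(score)
print(peaks.tolist())                          # [[2, 3, 4], [6, 6, 6]]
print(score_to_input_coords(peaks).tolist())   # [[16, 20, 24], [32, 32, 32]]
```

    Padding with negative infinity keeps border voxels eligible as maxima without letting the padding itself win any comparison.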

    4. Image Discrimination

    The second stage is the discrimination stage, where the candidates obtained from the screening stage are carefully distinguished with a 3D CNN discrimination model. This stage removes a large number of false-positive candidates and yields the final detection results. In this stage, small 3D blocks are cropped centered on the screened candidate positions, and the block size was carefully validated. It was first found that, with a training block size of 16 × 16 × 10, a number of false positives were produced in the first stage. Enlarging the block size provides richer contextual information from a larger surrounding neighborhood, which gives additional clues for distinguishing cracks from their mimics. However, because cracks are small, the cropped block cannot be too large; otherwise redundant contextual information would be introduced and could degrade the performance. An ensemble of deep belief networks (DBNs) with a multi-objective evolutionary algorithm based on decomposition (MOEA/D) has also been proposed for fault diagnosis with multivariate sensory data.

    Step 1: Identify the data set.

    Step 2: Data preprocessing.

    Step 2.1: Data cleaning (remove noisy or inconsistent data).

    Step 2.2: Data transformation (normalize the data).

    Step 3: Training set data.

    Step 3.1: Implement correlation analysis.

    Step 3.2: Implement the DNN.

    Step 4: Construct the pre-training process.

    Step 5: Construct the fine-tuning process.

    Step 6: Diagnose the output image.
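    Steps 2.1 to 3.1 can be sketched with NumPy. Treating "noisy or inconsistent" data as samples with missing or infinite values, and normalizing with min-max scaling, are assumptions; the steps above do not specify either choice.

```python
import numpy as np

def clean(X):
    """Step 2.1 (one possible reading): drop samples containing NaN or infinite values."""
    X = np.asarray(X, dtype=float)
    return X[np.all(np.isfinite(X), axis=1)]

def min_max_normalize(X):
    """Step 2.2: rescale each feature column to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # constant columns would otherwise divide by zero
    return (X - lo) / span

def correlation_matrix(X):
    """Step 3.1: Pearson correlation between feature columns."""
    return np.corrcoef(X, rowvar=False)

raw = [[1, 2, 10], [2, 4, np.nan], [3, 6, 30], [5, 10, 50]]
X = min_max_normalize(clean(raw))
print(X)                       # 3 rows remain; each column becomes 0, 0.5, 1
print(correlation_matrix(X))   # all ones: the three toy features are perfectly correlated
```

    Highly correlated features, as in this toy input, are the kind of redundancy the correlation analysis of Step 3.1 is meant to reveal before training.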

    Fig 1.5 Confusion Matrix using CNN


    Thus, the deep-learning-based work achieved 90% accuracy in machine health monitoring using infrared thermal images.
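    Accuracy is read off a confusion matrix such as the one in Fig. 1.5 as the fraction of samples on its diagonal. The sketch below uses hypothetical toy labels, not the paper's data.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    """Correct predictions (the diagonal) divided by the total number of samples."""
    return np.trace(cm) / cm.sum()

cm = confusion_matrix([0, 1, 1, 2], [0, 1, 2, 2], n_classes=3)   # hypothetical toy labels
print(cm.tolist())    # [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
print(accuracy(cm))   # 0.75
```

    The off-diagonal entries show which conditions are confused with each other, which a single accuracy figure hides.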

    Figs. 2.1 and 2.2 show the output of the analysis based on thermal images and of the detection of cracks on the walls; these were performed in the MATLAB tool.

    Fig. 2.1

    Fig. 2.2



This work focuses on automated early-stage bearing fault detection using IR imaging. IR imaging has already been applied to detect shaft misalignment, rotor imbalance, bearing looseness and general bearing faults, as discussed. Therefore, this project focuses on faults that current state-of-the-art techniques cannot detect in a timely and reliable way, such as different levels of bearing lubrication degradation, in addition to outer-raceway faults. All conditions are tested under both imbalanced and balanced machine conditions, as presented. To detect these faults, both specific features and a system architecture are proposed, and the resulting classifications are presented and discussed.


  1. ZhiQiang Chen, Chuan Li, "Gearbox Fault Identification and Classification with Convolutional Neural Networks," Shock and Vibration, Hindawi Publishing Corporation, vol. 2015.

  2. Y. Gao, M. A. Maraci and J. A. Noble, "Describing Ultrasound Video Content Using Deep Convolutional Neural Networks," IEEE, 978-1-4799-2349-6/16, 2016.

  3. Olivier Janssens, Raiko Schulz, Viktor Slavkovikj, "Thermal image based fault diagnosis for rotating machinery," Elsevier, 1350-4495, 2015.

  4. Olivier Janssens, Mathieu Rennuy, "Towards Intelligent Lubrication Control: Infrared Thermal Imaging for Oil Level Prediction in Bearings," IEEE, 978-1-5090-0755-4/16, 2016.

  5. Wade A. Smith, Robert B. Randall, "Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study," Elsevier, 0888-3270, 2015.

  6. Xueqiong Zeng, Yixiao Liao, Weihua, "Gearbox fault classification using S-transform and convolutional neural network," 2016 Tenth International Conference on Sensing Technology, 978-1-5090-0795-0/15.

  7. Wenlu Zhang, Rongjian Li, Tao Zeng, Qian Sun, Sudhir Kumar, Jieping Ye, "Deep Model Based Transfer and Multi-Task Learning for Biological Image Analysis," IEEE, 2332-7790, 2016.

  8. Zhixin Yang, Wui Ian Hoi, Jianhua Zhong, "Gearbox Fault Diagnosis based on Artificial Neural Network and Genetic Algorithms," Proceedings of the 2011 International Conference on System Science and Engineering, 2011.

  9. Ali Md. Younus, Bo-Suk Yang, "Intelligent fault diagnosis of rotating machinery using infrared thermal image," Elsevier, 2011, doi:10.1016/j.eswa.2011.08.004.

  10. Mari Cruz Garcia, Miguel A. Sanz-Bobi, Javier del, "Intelligent System for Predictive Maintenance: Application to the health condition monitoring of a windturbine gearbox," doi:10.1016/j.compind.2006.02.011.
