Detection Of Foliar Disease In Apple Leaves Using Deep Learning

DOI : 10.17577/NCRTCA-PID-402


Mukkandla Venkatesh, Bandi Nikhil Babu

Master of Computer Applications (MCA)

Madanapalle Institute of Technology and Science (MITS), Angallu, India

Under the Guidance of:

Dr. R. Maruthamutu (Assistant Professor)

Foliar diseases of apple trees commonly reduce photosynthesis and crop yield, which lowers productivity. Diagnosing foliar damage is not easy when there are no distinct patterns such as fungal fruiting bodies, and an undiagnosed disease will spread to the rest of the crop. Foliar disease in apple trees has both biotic and abiotic causes. Biotic causes of foliar damage include bacterial diseases, fungal diseases, viral diseases, and insects and mites that damage foliage. Abiotic causes include iron chlorosis, misapplied herbicide, and winter desiccation of evergreens. Traditional approaches rely on visual inspection by an expert, with biological examination as the second choice. These approaches are time-consuming and expensive, and identification by an expert is slow and non-optimal for large farms. Automatic detection of plant diseases is therefore necessary: it reduces the tedious work of monitoring large farms and detects disease at an early stage, minimizing further degradation of the plants. Besides the decline in plant health, lower production also strongly affects a country's economy. We use machine learning methods to classify diseases in apple trees. We use pre-trained CNN models to extract features from the dataset, apply our own CNN model, and compare the two. We achieve an accuracy of over 93% with the CNN and 92% with the pre-trained models.

Keywords: Dataset Collection, Data Pre-Processing, Data Visualization, Convolutional Neural Networks (CNN), Deep Learning


Plant disease affects crop growth and yield, and therefore productivity. Human beings depend heavily on crops. Diagnosing a crop involves considering many factors behind the disease. The right diagnosis leads to a healthy crop, but the process takes longer than expected, and hiring an expert to inspect the crop visually is expensive. Misdiagnosis is one of the common causes of crop failure: it leads to misuse of chemicals, which in turn raises input costs and promotes the emergence of resistant pathogen strains. Machine learning can classify crop diseases, and many machine learning methods have proven efficient at crop disease detection, but factors such as the lighting conditions of the crop images affect detection accuracy. To address this problem and improve crop yield, we train a model on apple leaf images to distinguish leaves that are healthy, have rust, are infected with scab, or are infected with multiple diseases.

With over 2,000 years of history in China, the area of apple cultivation has expanded annually, and 65% of all apples in the world are now produced in China. The traditional apple industry has been modernized with the support of national economic policies, and China has gradually developed into a major power in the apple industry. Nonetheless, behind the rapid development of apple cultivation, disease prevention and control have long been important problems for orchard farmers, who identify apple diseases by drawing on experience, books, and the Internet, and by consulting professional and technical personnel. Relying solely on these sources is not conducive to timely and effective identification of diseases, and might even cause other problems associated with subjective judgment.

We use the Plant Pathology dataset, with its given disease categories, to identify the category of foliar disease in apple trees. The dataset consists of 1821 labeled training images of apple tree leaves and 1821 unlabeled test images. The images are categorized into four classes: healthy, scab, rust, and multiple diseases. Once leaves are infected, they fall from the tree and die. If a tree has orange or yellow spots on its leaves and fruit that is mottled or distorted, it is probably affected by rust. The multiple diseases class indicates that the leaf is affected by both scab and rust.

Effective automated detection of apple diseases during production not only promptly monitors the health status of apples but also helps orchard farmers to judge apple diseases correctly. They can then implement timely prevention and control to avoid large-scale outbreaks, which is crucial for promoting the healthy growth of apples and increasing the economic benefits of orchards. Hundreds of apple diseases occur in fruits, leaves, branches, roots, and other parts of the tree, but they often appear first in the leaves, which are easily observed, collected, and managed. Leaves are therefore an important reference for disease identification, and effective automated detection of leaf diseases is essential. However, judging differences among diseases is difficult because of the complexity of leaf veins, which leads to unsatisfactory outcomes for experimental detection methods.


      1. Kanaka Durga and Anuradha used SVM and ANN algorithms to categorise leaf diseases in tomato and corn plants. The dataset contained 200 images of healthy and diseased leaves, including those with tomato mosaic virus, common rust, bacterial spot, and northern leaf blight. They used the following procedure to determine the diseases: the RGB image was converted to grayscale, and the image was then segmented by computing the intensity gradient at each pixel. Features were extracted with HOG (histogram of oriented gradients). The extracted features were passed to the SVM and ANN classifier models. SVM provided 55-65% precision for the corn crop compared to ANN.

      2. To discriminate between diseases that appear identical, Bhatt et al. employed CNN architectures to build a system for diagnosing corn leaf diseases. The framework combines adaptive boosting with decision-tree-based classifiers. The four image categories were common rust, healthy leaf, leaf blight, and leaf spot. The images of each class came from the PlantVillage collection. The images were scaled to meet the input requirements of the CNN models, which supplied the extracted features to the classifiers. Inception-v2 gave the highest accuracy with a random forest classifier. The authors acknowledged that it was challenging to distinguish between the leaf blight and leaf spot classes based on the extracted features.

      3. Chen et al. proposed a lightweight network architecture called MobInc-Net for crop disease recognition and detection. The architecture enhances the Inception module by replacing the original convolutions with depth-wise and point-wise convolutions. The modified Inception (M-Inception) module is paired with a pre-trained MobileNet to extract high-quality image features. A fully connected Softmax layer sized to the actual number of categories is added, followed by an SSD block, to classify and detect crop disease types. Because it is built from lightweight modules, the architecture may be suitable for deployment on resource-constrained devices.


Convolutional Neural Network (CNN) based deep learning models have been the preferred choice over Multilayer Perceptrons (MLP) for image classification tasks. The CNN is the most widely used method for extracting important features from huge datasets, and image datasets are usually very large. Convolutional layers shrink the size of the image representation and hence reduce computation. CNNs share weights (parameter sharing) through the filters (or kernels) they use. They use pooling, which helps with location invariance. They use filters to identify patterns in image data without flattening, unlike MLPs, which flatten the input images. Hence CNNs make effective use of spatial information. We built our own CNN on this architecture. The model pipeline keeps the input images in RGB rather than converting them to grayscale.
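The weight sharing and pooling described above can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the 6x6 toy image, the vertical-edge kernel, and the function names `conv2d_valid` and `max_pool2d` are assumptions chosen only to make the idea concrete.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over a 2-D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 6x6 single-channel image with a vertical edge in the middle (illustrative data).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# A 3x3 vertical-edge kernel: the same 9 weights are reused at every position.
kernel = [[-1, 0, 1]] * 3

feature_map = conv2d_valid(image, kernel)  # 4x4 map, each row is [0, 3, 3, 0]
pooled = max_pool2d(feature_map)           # 2x2 map, [[3, 3], [3, 3]]

# Weight sharing: the filter needs 9 weights whatever the image size, while a
# single fully connected unit on the flattened 6x6 image needs 36.
conv_weights = len(kernel) * len(kernel[0])
dense_weights_per_unit = len(image) * len(image[0])
```

The pooled map is the same wherever the edge falls inside a pooling window, which is the sense in which pooling helps with location invariance.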

The model consists of several layers of different types. It begins with a Sequential container, followed by several convolutional layers, pooling layers, and activation layers, and it ends with a dense layer of four units, one for each class in the dataset: scab, rust, healthy, and multiple diseases. In each convolutional layer, the convolution operation extracts features, and the output is passed to the next layer through the activation function. We use ReLU as the activation function in our model because it performs well compared with other activation functions.
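To make the layer stack concrete, the following sketch traces output shapes and parameter counts through a Sequential-style model. The paper does not give the number of layers or filters, so the stack below (two convolution and pooling pairs with 32 and 64 filters, 3x3 kernels, a 224x224 RGB input) is purely a hypothetical example; only the final four-unit dense layer and the ReLU activation come from the text.

```python
def relu(x):
    """ReLU activation: pass positive values through, clamp negatives to zero."""
    return x if x > 0 else 0.0


def trace(input_shape, layers):
    """Return (kind, output_shape, n_params) for each layer of a Sequential-style stack."""
    h, w, c = input_shape
    units = None  # set once the feature maps have been flattened
    rows = []
    for layer in layers:
        kind = layer[0]
        if kind == "conv":      # ("conv", filters, kernel_size); no padding, stride 1
            _, filters, k = layer
            params = (k * k * c + 1) * filters  # weights plus one bias per filter
            h, w, c = h - k + 1, w - k + 1, filters
            rows.append((kind, (h, w, c), params))
        elif kind == "pool":    # ("pool", window_size); non-overlapping windows
            _, size = layer
            h, w = h // size, w // size
            rows.append((kind, (h, w, c), 0))
        elif kind == "flatten":
            units = h * w * c
            rows.append((kind, (units,), 0))
        elif kind == "dense":   # ("dense", output_units)
            if units is None:
                raise ValueError("add a flatten layer before the first dense layer")
            _, out_units = layer
            params = (units + 1) * out_units
            units = out_units
            rows.append((kind, (units,), params))
        else:
            raise ValueError(f"unknown layer kind: {kind}")
    return rows


# Hypothetical stack; only the four-unit output layer is taken from the paper.
stack = [
    ("conv", 32, 3), ("pool", 2),
    ("conv", 64, 3), ("pool", 2),
    ("flatten",),
    ("dense", 4),  # one unit per class: scab, rust, healthy, multiple diseases
]
rows = trace((224, 224, 3), stack)
for kind, shape, params in rows:
    print(f"{kind:8s} {str(shape):16s} {params}")
```

In this hypothetical stack the convolutional layers hold about 19 thousand parameters, while the dense layer on the flattened features holds about 747 thousand, which shows why shrinking the feature maps before the dense layer matters.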



      Figure: 2.1 Analysing Data

      From the above plot, we observe that:

      • 71.7% of the training images are unhealthy, and only 5% of the training images have multiple diseases.

      • Only 28% of all samples are non-infected, that is, completely healthy.

      • The dataset contains approximately the same number of infected samples for rust and for scab.
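The percentages above can be reproduced from per-class counts. The counts in this sketch are an assumption: they are the figures commonly reported for the 1821-image Plant Pathology training set and are not stated in this paper, so they should be checked against the actual label file.

```python
# Assumed per-class counts (not given in the paper); they sum to 1821.
counts = {"healthy": 516, "multiple_diseases": 91, "rust": 622, "scab": 592}

total = sum(counts.values())

# Share of each class as a percentage of all training images.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Everything that is not healthy counts as unhealthy.
unhealthy_share = round(100 * (total - counts["healthy"]) / total, 1)

print(total)            # 1821
print(shares)           # {'healthy': 28.3, 'multiple_diseases': 5.0, 'rust': 34.2, 'scab': 32.5}
print(unhealthy_share)  # 71.7
```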


Data preprocessing is required to remove unwanted noise and outliers from the dataset that could lead the model to depart from its intended training. This stage addresses everything that prevents the model from functioning efficiently. After the relevant dataset is collected, the data must be cleaned and prepared for model development. The dataset contains 3642 images, some of which are duplicates or near-duplicates. If the dataset contains images of different sizes, they must be resized to a fixed size. Large images require more trainable parameters, and therefore more computation power, to train a model. Hence, when computing power is limited, reducing the image size can be the remedy. The target size should be chosen carefully: it must not throw away much of the information in the image, and at the same time it must not overload the model with too many parameters. An overloaded model results in very slow training and resource-exhausted errors. In this case study, we resized the images to (224, 224), which gave reasonably good results.
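As a rough illustration of the resizing step, the sketch below implements nearest-neighbour resizing in pure Python and counts the input values the model has to process. The paper does not say which interpolation or library was used, so `resize_nearest`, `input_values`, and the 2048 x 1365 source size are hypothetical; in practice an image library would normally do this work.

```python
def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize of an image stored as rows of pixels."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def input_values(height, width, channels=3):
    """Number of values fed to the first layer for one RGB image."""
    return height * width * channels


# Tiny 4x4 RGB example (illustrative data): each pixel records its own position.
tiny = [[(r, c, 0) for c in range(4)] for r in range(4)]
small = resize_nearest(tiny, 2, 2)
# small == [[(0, 0, 0), (0, 2, 0)], [(2, 0, 0), (2, 2, 0)]]

# Effect of the (224, 224) target size on the input volume,
# assuming a hypothetical 2048 x 1365 source image.
before = input_values(1365, 2048)  # 8386560 values
after = input_values(224, 224)     # 150528 values
```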

Label encoding converts the dataset's string literals to integer values that the computer can work with. Because models are trained on numbers, the strings must be converted to integers. The gathered dataset has five columns of the string data type. All strings are encoded during label encoding, so the whole dataset is transformed into a collection of numbers. The dataset used in this study is very imbalanced.
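A minimal sketch of label encoding follows. The helper name `fit_label_encoder` and the sample labels are illustrative assumptions; the alphabetical ordering of classes is one common convention, not something the paper specifies.

```python
def fit_label_encoder(labels):
    """Map each distinct string label to an integer, in alphabetical order."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


# Illustrative labels using the four class names of the dataset.
labels = ["rust", "healthy", "scab", "multiple_diseases", "rust"]

mapping = fit_label_encoder(labels)
encoded = [mapping[label] for label in labels]

print(mapping)  # {'healthy': 0, 'multiple_diseases': 1, 'rust': 2, 'scab': 3}
print(encoded)  # [2, 0, 3, 1, 2]
```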

Figure: 2.2 Data Pre-Processing


Data visualizations are used to discover unknown facts and trends. Line charts display change over time, bar and column charts are useful for observing relationships and making comparisons, pie charts show parts of a whole, and maps are the best way to share geographical data visually. We examined the distribution of RGB channel values across all the images. This initial look indicates that the red channel values are close to a normal distribution. The blue channel has a heavy right tail, and its peak is flat and wide. The green channel has a long left tail, with overall values greater than those of the other two channels. Very few images have extreme values, and the majority of images have red and green channel values close to the mean.
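The tail behaviour described above can be quantified with per-image channel means and a skewness statistic. This is a sketch under assumptions: the paper does not state how the distributions were computed, and `channel_means`, `skewness`, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def channel_means(image):
    """Mean R, G and B value of one image given as rows of (r, g, b) pixels."""
    pixels = [px for row in image for px in row]
    return tuple(mean(px[ch] for px in pixels) for ch in range(3))


def skewness(values):
    """Population skewness: above 0 means a longer right tail, below 0 a longer left tail."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return 0.0
    return mean(((v - mu) / sigma) ** 3 for v in values)


# One tiny illustrative image with two pixels.
example = [[(10, 20, 30), (30, 40, 50)]]
r_mean, g_mean, b_mean = channel_means(example)  # 20, 30, 40

# Illustrative per-image channel means (made-up numbers, not from the dataset).
right_tailed = [1, 1, 1, 10]   # most values low, a few high, as described for blue
left_tailed = [1, 10, 10, 10]  # most values high, a few low, as described for green

print(skewness(right_tailed) > 0)  # True
print(skewness(left_tailed) < 0)   # True
```

Applied to the real data, `channel_means` would be computed once per image and `skewness` once per channel over all images.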

Figure: 2.3 Data Visualization


      • Convolutional Neural Network (CNN) based deep learning models have been the preferred choice over Multilayer Perceptrons (MLP) for image classification tasks.

      • CNNs share weights (parameter sharing) through the filters (or kernels) they use.

      • They use pooling, which helps with location invariance.

      • They use filters to identify patterns in image data without flattening, unlike MLPs, which flatten the input images. Hence CNNs make effective use of spatial information.

      • The advantage of using such models is that they are built after extensive research and are trained using high-end processors; hence, in most cases, they give very good results.



    The model evaluation module is the final step in a deep learning pipeline, which includes data collection, preprocessing, training and validation. It uses the trained model generated in the previous steps to make predictions on new data.


Once the data has been processed, it is available for model construction. Model construction requires a preprocessed dataset and a Convolutional Neural Network. The block diagram of the designed system is shown below.

Figure: 2.4 Model Creation


The model evaluation module is the component of the convolutional neural network system that uses the trained model to make predictions on new data. In the context of detecting foliar disease in apple leaves using deep learning, images are categorized into four classes: healthy, multiple diseases, rust, and scab. The module could be designed to output the probability that an apple leaf is diseased. The exact output depends on the specific design of the convolutional neural network system and the goals of the project. It is important to note that the accuracy of the predictions depends on the quality and quantity of the data used to train the model, and on how well the model generalizes to new cases. In a project that detects foliar disease in apple leaves using deep learning, the model evaluation module would involve the following steps:

    1. Input



    One of the biggest challenges in this problem is that the share of leaves with multiple diseases is very small compared with the other classes, which introduces severe class imbalance. Accuracy is therefore not the right metric for measuring the performance of the deep learning models. As an important first step, we plot the confusion matrix and then check the misclassifications, i.e., the false positives (FP) and false negatives (FN). An FN is a leaf that the model predicted as healthy but that was actually diseased. An FP is a leaf that the model predicted as diseased but that was actually healthy.

So, we look at the following metrics to measure model performance:

Confusion Matrix: It is the table in which the TP, FP, TN, and FN counts are plotted. From this table, we can visualize and track the number of mistakes made by the model.

F1 Score: It is the harmonic mean of precision and recall.
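The two metrics can be sketched in pure Python as follows. The labels and predictions are made-up illustrative values, not results from this study, and the per-class F1 treats each class in turn as the positive class (one-vs-rest), which is one common way to extend FP and FN to four classes.

```python
CLASSES = ["healthy", "multiple_diseases", "rust", "scab"]


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_f1(matrix):
    """F1 = harmonic mean of precision and recall, computed one class at a time."""
    scores = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fp = sum(row[k] for row in matrix) - tp  # predicted k, actually another class
        fn = sum(matrix[k]) - tp                 # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


# Made-up labels and predictions for illustration only.
y_true = ["healthy", "healthy", "rust", "rust", "scab", "scab",
          "multiple_diseases", "multiple_diseases"]
y_pred = ["healthy", "rust", "rust", "rust", "scab", "scab",
          "rust", "multiple_diseases"]

matrix = confusion_matrix(y_true, y_pred, CLASSES)
f1_scores = per_class_f1(matrix)
macro_f1 = sum(f1_scores) / len(f1_scores)

for name, row in zip(CLASSES, matrix):
    print(f"{name:18s} {row}")
print([round(s, 3) for s in f1_scores], round(macro_f1, 3))
```

Averaging the per-class scores (macro F1) gives the rare multiple diseases class the same weight as the larger classes, which plain accuracy does not.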





Figure No.1

Figure No.2

Figure No.3

Figure No.4


[1] apple-production-in-india/

[2] scab-disease-intensifies-in-himachals-orchards/

[3] Bhatt, P.; Sarangi, S.; Shivhare, A.; Singh, D.; Pappula, S. Identification of Diseases in Corn Leaves using Convolutional Neural Networks and Boosting. In Proceedings of the ICPRAM, Prague, Czech Republic, 19-21 February 2019; pp. 894-899.

[4] Chen, J.; Chen, W.; Zeb, A.; Yang, S.; Zhang, D. Lightweight inception networks for the recognition and detection of rice plant diseases. IEEE Sens. J. 2022, 22, 14628-14638.

[5] Alatawi, A.A.; Alomani, S.M.; Alhawiti, N.I.; Ayaz, M. Plant Disease Detection using AI based VGG-16 Model. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 718-727.

[6] Mukti; Zahan, I.; Biswas, D. Transfer learning-based plant diseases detection using ResNet50. In Proceedings of the 2019 4th International Conference on Electrical Information and Communication Technology (EICT), Khulna, Bangladesh, 20-22 December 2019; IEEE: Piscataway, NJ, USA, 2019.

[7] Wu, Z.; Jiang, F.; Cao, R. Research on recognition method of leaf diseases of woody fruit plants based on transfer learning. Sci. Rep. 2022, 12, 15385.

[8] Swaminathan, A.; Varun, C.; Kalaivani, S. Multiple Plant Leaf Disease Classification using Densenet-121 Architecture. Int. J. Electr. Eng. Technol. 2021, 12, 38-57.