Identification of Soybean Leaf Spot Diseases using Deep Convolutional Neural Networks

Abstract — In this paper, we designed a Deep Convolutional Neural Network based on LeNet to recognize and classify soybean leaf spot diseases using the affected areas of disease spots. The affected areas were segmented from the leaf images using the unsupervised fuzzy clustering algorithm. The proposed Deep Convolutional Neural Network model achieved a testing accuracy of 89.84%, but poor per-class recognition results, with 1378 images misclassified and 1271 images correctly classified. VGG16 achieved the best performance, reaching a 93.54% success rate and better per-class recognition results, with 1245 images misclassified and 1404 images correctly classified.


I. INTRODUCTION
Most countries around the world have economies that depend on agriculture, which occupies an important place in the production and distribution of food. The growth of crops can not only bring necessities to people's daily life, but also improve soil fertility, maintain a good soil ecosystem, control soil erosion, reduce natural disasters such as mudslides and sandstorms, and improve the environment on which humans depend. However, the growth and development of crops are closely related to their surrounding environment, and as the environment becomes more and more polluted, crop diseases may be induced. Soybean is one of the most important agricultural products in the world [1]. Today, it is affected by several diseases that worry many farmers, and the fight against crop diseases remains a major problem for them. To control these diseases, a large number of chemicals or fungicides are used on the crop, which results in both economic loss and environmental pollution [2]. New technologies based on artificial intelligence can now enable precision agriculture, improve crops, and manage and limit the misuse of chemicals in the fields [3,4]. Detection and classification of plant diseases are important tasks to increase plant productivity and economic growth [5,6,7]. Computer vision, machine learning, and deep learning algorithms make it possible to develop tools for the control and analysis of plant diseases [2,8,9].
II. RELATED WORK

Today, computer vision, machine learning, and deep learning are used in many fields, and the power of their classification and image analysis algorithms is revolutionizing agriculture. Different approaches have been developed for crop disease classification and detection. Machine learning methods such as artificial neural networks (ANNs), decision trees, k-means, k-nearest neighbors, and Support Vector Machines (SVMs) have been applied in agricultural research [10,11,12,13]. In [14], Shinde PG, Shinde AK, Shinde AA, and Borate SP (2017) used an image processing technique (k-means clustering) and a Raspberry Pi to develop a system for automated crop disease detection that made use of email alerting and SMS functionalities to report the disease and pesticide name. In [15], Baghel J. (2016) proposed a k-means clustering segmentation technique; the proposed algorithm analyzes the area of the leaf to separate the infected and uninfected parts of the leaf. Recently, convolutional neural networks (CNNs) have been used for object recognition and image classification. In [16], Serawork Wallelign, Mihai Polceanu, and Cédric Buche (2017) defined a CNN model based on the LeNet architecture to identify and classify 12,673 leaf images of the soybean crop with 3 classes of symptom images (Septoria leaf blight, Frogeye leaf spot, Downy mildew), and the proposed algorithm achieved a good result, with a success rate of 99.32% in the classification of four different classes of soybean leaves. In [17], Ma J, Du K, Zheng F, Zhang L, Gong Z, and Sun Z (2018) designed a new CNN model similar to LeNet-5 and adopted the same image processing technique as in their previous work [18].
The architecture of the model was composed of four modules. The first module was a convolutional layer with 20 filters of size 5 × 5, followed by a ReLU activation and a max-pooling layer with a 2 × 2 filter and a stride of 2. The second module was a convolutional layer with 100 filters of size 3 × 3 and a max-pooling layer with a 2 × 2 filter and a stride of 2. The third module consisted of a convolutional layer with 1000 filters of size 3 × 3, and the last module consisted of a fully connected layer with 1500 neurons. The algorithm was applied to four different cucumber diseases: when it was applied to the segmented symptom images the accuracy was 93.4%, and when it was implemented to identify symptom images under the influence of illumination the accuracy was 98.1%. When data augmentation methods were used to enlarge the dataset formed by the segmented symptom images, the model achieved 93.47% accuracy. In [19], several known CNN architectures, including AlexNetOWTBn [20], GoogLeNet [21], Overfeat [22], and VGG [23], were trained to classify plant diseases. Using a public dataset of 87,848 images of diseased and healthy plant leaves collected under controlled conditions, the CNNs were trained to identify 25 different plants in a set of 58 distinct [plant, disease] classes. The best performance reached a 99.53% success rate in identifying the corresponding [plant, disease] combination (or healthy plant).
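The feature-map sizes implied by a convolutional stack like the one above can be checked with simple convolution/pooling arithmetic. The sketch below assumes "valid" (no) padding and a hypothetical 64 × 64 input, since neither is stated in the text:

```python
def out_size(n, k, stride=1, pad=0):
    """Output side length of a conv/pool layer applied to an n x n input."""
    return (n + 2 * pad - k) // stride + 1

# Trace the four modules of the LeNet-5-like model on an assumed 64 x 64 input
n = 64
n = out_size(n, 5)             # Conv: 20 filters, 5x5  -> 60
n = out_size(n, 2, stride=2)   # Max-pool 2x2, stride 2 -> 30
n = out_size(n, 3)             # Conv: 100 filters, 3x3 -> 28
n = out_size(n, 2, stride=2)   # Max-pool 2x2, stride 2 -> 14
n = out_size(n, 3)             # Conv: 1000 filters, 3x3 -> 12
```

With these assumptions, the fully connected layer of 1500 neurons would receive a flattened 12 × 12 × 1000 feature map.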
The main objective of this study was to recognize and classify soybean leaf spot diseases using the affected areas of disease spots with a Deep Convolutional Neural Network.

III. MATERIALS AND METHODS

A. Datasets
The datasets used for this work were downloaded from different databases on the internet (https://www.forestryimages.org, https://plantvillage.org, http://www.image-net.org/challenges/LSVRC/2012/). The dataset includes 13,243 leaf images of the soybean crop with 5 classes of soybean leaf spot disease (Alternaria Leaf Spot, Phyllosticta Leaf Spot, Target Leaf Spot, Frogeye Leaf Spot, Bacterial Blight). TABLE 1 summarizes the number of images for each type of soybean leaf spot disease.
The symptom images were segmented using the unsupervised fuzzy clustering algorithm, which combines the advantages of the fuzzy c-means algorithm and an unsupervised optimal clustering algorithm. It is based on fuzzy c-means: by gradually increasing the number of clusters and evaluating a validity criterion, it can find the best number of clusters without supervision. Compared with plain fuzzy c-means, the unsupervised fuzzy clustering algorithm improves the distance function so that clustering is not disturbed by the shape of the classes, the number of clusters can be found accurately, and a more accurate clustering result can be achieved. The symptom images were segmented following these steps:
1) Select the initial cluster centers, and set the contrast factor, the maximum allowable error, and the maximum number of clusters.
2) Obtain an initial clustering model with the fuzzy c-means algorithm, using the Euclidean distance as the distance function.
3) Cluster again with the fuzzy c-means algorithm but, unlike step 2, change the distance function to an exponential distance function.
4) Calculate the clustering validity measures (fuzzy partition criteria) for the current number of clusters.
5) If the current number of clusters is smaller than the predetermined maximum, increase the number of clusters by one and return to step 2; otherwise, stop and use the validity criterion to select the number of clusters with the largest average separation density as the best clustering.
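The exponential distance function and the validity criterion of the unsupervised variant are not reproduced here, but the fuzzy c-means core that the procedure iterates over can be sketched in a few lines of NumPy. This is a generic FCM sketch (Euclidean distance, fuzzifier m = 2), not the exact implementation used in the paper:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Minimal fuzzy c-means sketch.

    X : (n_samples, n_features) array (e.g., pixel colors of a leaf image)
    c : number of clusters; returns (centers, memberships U).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # each row sums to 1
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        Um = U ** m
        # Cluster centers: membership-weighted means of the samples
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Euclidean distances from every sample to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-10)                  # avoid division by zero
        # Standard FCM membership update: u_ik ∝ d_ik^(-2/(m-1))
        U_new = (d ** -p) / np.sum(d ** -p, axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U
```

Segmentation would then assign each pixel to the cluster with the highest membership (`U.argmax(axis=1)`) and keep the cluster(s) corresponding to the disease spots.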
Images of different sizes were all resized to 128 × 128 pixels, and the dataset was divided into two sets, 80% for training and 20% for testing, as shown in TABLE 1.
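The 80/20 partition can be sketched as a shuffled index split; the exact splitting procedure (stratification, random seed) is not specified in the paper, so this is only one plausible realization:

```python
import numpy as np

def train_test_split_indices(n, train_frac=0.8, seed=0):
    """Shuffle n sample indices and split them train_frac / (1 - train_frac)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]

# For the 13,243 images in the dataset:
train_idx, test_idx = train_test_split_indices(13243)
```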

B. Deep Convolutional Neural Network model (Deep CNN)
The proposed CNN model used in this work consists of 5 convolutional layers (Conv), 3 max-pooling layers, and 2 fully connected layers. An input image size of 32 × 32 was selected for classification. The architecture of the trained CNN model is shown in Fig. 2. The first two convolutional layers (Conv1 and Conv2) are composed of 32 kernels of size 3 × 3, followed by the ReLU activation function, which forces the neurons to return positive values, and a max-pooling layer with a 2 × 2 filter and a stride of 3. The third convolutional layer (Conv3) consists of 64 kernels of size 4 × 4, followed by the ReLU activation function and a max-pooling layer with a 2 × 2 filter and a stride of 2. The last two convolutional layers (Conv4 and Conv5) are composed of 128 filters of size 3 × 3, also followed by the ReLU activation function and a max-pooling layer with a 2 × 2 filter and a stride of 2. Each of the two fully connected layers has 1024 neurons; ReLU activation and Dropout were applied after each fully connected layer, with a probability of randomly dropping a unit of 25% for the first Dropout and 50% for the second. A softmax output layer was implemented to calculate the probability distribution over the 5 classes of symptom images.
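The architecture described above can be sketched in Keras (the library the paper uses). The padding mode and the RGB input are assumptions, since they are not specified in the text:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes=5):
    """Sketch of the proposed Deep CNN; padding='same' is an assumption."""
    return models.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),                         # 32x32 RGB input (channels assumed)
        layers.Conv2D(32, 3, padding="same", activation="relu"),   # Conv1: 32 kernels, 3x3
        layers.Conv2D(32, 3, padding="same", activation="relu"),   # Conv2: 32 kernels, 3x3
        layers.MaxPooling2D(pool_size=2, strides=3),               # 2x2 pool, stride 3 (as described)
        layers.Conv2D(64, 4, padding="same", activation="relu"),   # Conv3: 64 kernels, 4x4
        layers.MaxPooling2D(pool_size=2, strides=2),               # 2x2 pool, stride 2
        layers.Conv2D(128, 3, padding="same", activation="relu"),  # Conv4: 128 kernels, 3x3
        layers.Conv2D(128, 3, padding="same", activation="relu"),  # Conv5: 128 kernels, 3x3
        layers.MaxPooling2D(pool_size=2, strides=2),               # 2x2 pool, stride 2
        layers.Flatten(),
        layers.Dense(1024, activation="relu"),                     # FC1: 1024 neurons
        layers.Dropout(0.25),                                      # drop 25% of units
        layers.Dense(1024, activation="relu"),                     # FC2: 1024 neurons
        layers.Dropout(0.50),                                      # drop 50% of units
        layers.Dense(num_classes, activation="softmax"),           # 5-class output
    ])
```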

C. Model performance
As mentioned earlier, the dataset was divided into two sets, 80% for training and 20% for testing, as shown in TABLE 1. The proposed CNN model was implemented in the Python programming language using the Keras and TensorFlow deep learning libraries. To enlarge the dataset and overcome overfitting, simple and efficient techniques such as data augmentation and Dropout were applied. The algorithms were trained with a batch size of 32 for 1000 epochs, with a momentum of 0.9, a weight decay of 0.0005, and a learning rate of 0.001. Training was performed on the GPU of a GeForce GTX 1070 card, using the CUDA parallel programming platform, in a Linux environment (Ubuntu 16.04 LTS, 64-bit).
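The paper does not list the exact augmentation transforms, so the following is only an illustrative sketch of the kind of label-preserving transforms commonly used to enlarge a leaf-image dataset (random flips and 90-degree rotations):

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped/rotated copy of a (H, W[, C]) image array.

    The specific transforms here are illustrative assumptions, not the
    paper's documented augmentation pipeline.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1]              # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :]              # vertical flip
    image = np.rot90(image, k=rng.integers(0, 4))  # rotate by 0/90/180/270 deg
    return image
```

Applying such transforms on the fly during training multiplies the effective dataset size without storing extra images.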
As we can see from TABLE 2, the designed CNN model achieved good overall recognition results, with a success rate of 89.84% and an average loss of 0.20. Fig. 3 illustrates the testing accuracy and testing loss of the Deep CNN model on the testing dataset.

D. Recognition performance
In this work, we use three metrics derived from the confusion matrix, Precision, Recall, and F-score, which can be calculated from equations 1, 2, and 3, respectively, to measure the per-class recognition performance of the DCNN algorithm:

Precision = TP / (TP + FP)    (1)
Recall = TP / (TP + FN)    (2)
F-score = 2 × Precision × Recall / (Precision + Recall)    (3)

where TP, FP, and FN are the numbers of true positives, false positives, and false negatives for the class. As we can see from those results, the implemented CNN algorithm did not achieve a good per-class recognition performance: the number of symptom images misclassified (1378 images, a rate of 52%) was larger than the number of symptom images correctly classified (1271 images, a rate of 48%), as shown in Fig. 5. This poor recognition performance can be explained by the fact that the disease spots were segmented from the input images. During the image recognition process, a Deep CNN logically analyzes the input, first by simplifying the image and extracting the most important information, then by organizing the data through feature extraction and classification.
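The per-class metrics of equations 1-3 can be computed directly from the confusion matrix; a minimal NumPy sketch (assuming rows are true classes and columns are predicted classes, and no empty rows or columns):

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall and F-score per class from a confusion matrix.

    cm : square array, rows = true class, columns = predicted class.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                     # correctly classified, per class
    precision = tp / cm.sum(axis=0)      # TP / (TP + FP), eq. 1
    recall = tp / cm.sum(axis=1)         # TP / (TP + FN), eq. 2
    f_score = 2 * precision * recall / (precision + recall)  # eq. 3
    return precision, recall, f_score
```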
We hypothesized that these results may be due to the quality of the images and to characteristics such as the shape, the texture, and the margin extracted from the input images [24]. As shown in the per-class recognition results in Fig. 4a, Fig. 6a, and Fig. 7a, all the implemented models (the Deep CNN model, VGG16, and SVM) showed similar per-class recognition performance on the Target Leaf Spot, Frogeye Leaf Spot, and Bacterial Blight classes, as shown in Fig. 10. The similar prediction results for these soybean leaf spot disease classes may be caused by the fact that the patterns of their disease spots on the leaf area are nearly identical: their spots are characterized by small, pale green spots or streaks that soon appear water-soaked and circular to angular in shape, as shown in Fig. 8. The implemented algorithms have difficulty predicting the correct class using the segmented disease spots. These results verify the hypothesis described in Section 4.2. We conclude that the quality of the input images and characteristics such as the shape, the texture, and the margin are influencing factors for the Deep CNN. For image recognition, it is therefore better to feed the input images to the network without first extracting features or other information from them, because the Deep CNN needs to learn more information about the input images during the training stage and already has the ability to perform feature extraction. In deep learning, each layer learns to transform its input data into a slightly more abstract and composite representation, and each layer has a specific task in image processing, for example:
• The convolution layer (CONV), which processes data from a receptive field.
• The pooling layer (POOL), which compresses information by reducing the size of the intermediate image (often by sub-sampling).
• The correction layer (ReLU), named after its activation function (Rectified Linear Unit).
• The fully connected (FC) layer, which is a perceptron-type layer.
Future work will implement the Deep CNN for leaf disease recognition and classification under real cultivation conditions in the field.

V. CONCLUSION
In this paper, we presented Deep convolutional neural networks for soybean leaf spot disease recognition. The results show that the Deep CNN has difficulty recognizing the leaf spot diseases and predicting the correct class using the segmented disease spots. Comparing the results, all the implemented models (the Deep CNN model, VGG16, and SVM) showed similar per-class recognition performance. Moreover, it is better to analyze the input images without extracting any features from them when using a DCNN, because the convolutional layers have the ability to discover both the important and the unimportant features of the input images.
ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (61105035, 61502430).