Plant Identification Methodologies using Machine Learning Algorithms


Skanda H N, Smitha S Karantp, Suvijith S, Swathi K S (UG Scholars) and Pragati P (Asst. Professor)

KSIT, Bengaluru, India

Abstract:- Plants are the backbone of all life. There are hundreds of thousands of known plant species on Earth, providing us with oxygen, food and many other products essential to human existence. A good understanding of plants is essential for identifying new or rare plant species and for maintaining the balance of the ecosystem. Matching a specimen plant to a known taxon is termed plant identification: a particular plant is assigned to a known taxonomic group by comparing certain characteristics. Plant identification, which has evolved over hundreds of years, depends on the criteria and the system used. Identification is essential because it enables us to retrieve the facts associated with different species for a particular application. This paper reviews the methodologies of several authors who have worked on different plant identification techniques.


    Plants are of central importance to natural resource conservation. Plant species identification provides significant information about the categorisation of plants and their characteristics. Manual interpretation is not precise because it depends on an individual's visual perception. Sampling and capturing digital leaf images is convenient, and such images contain texture features that help in determining a specific pattern. The most important features for distinguishing among plant species are the venation and shape of a leaf. As information technology progresses rapidly, techniques such as image processing and pattern recognition are used to identify plants on the basis of leaf shape description and venation, which are the key concepts in the identification process. Because leaf characteristics vary and are difficult to record over time, it is necessary to create a dataset that serves as a reference for comparative analysis. Leaves are used in most plant identification methodologies because of their useful properties and their availability throughout the year.


    The paper[1] describes an image processing technique for identifying ayurvedic medicinal plants from leaf samples. Forests and wastelands are the source of over 80% of ayurvedic plants. Since no predefined database of Ayurvedic plant leaves exists, a set of leaf images of medicinal plants was collected from a botanical garden. To improve the efficiency of the plant identification system, machine learning techniques are used in place of human visual perception, as they are more effective. Weka is a collection of machine learning algorithms for data mining. It contains tools for pre-processing, classification, regression and feature selection, and its functions are accessed through a graphical user interface. The proposed scheme uses classifiers such as the Support Vector Machine (SVM) and the Multilayer Perceptron (MLP). SVM is used for regression and classification of data. MLP is an artificial neural network that maps sets of input data onto a set of appropriate outputs. The highest identification rates obtained were 98.8% with SVM and 99% with MLP.
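The SVM classification step can be illustrated with a minimal linear SVM trained by Pegasos-style stochastic subgradient descent on the hinge loss. This is only a sketch under stated assumptions: it is not Weka's SVM implementation (Weka's SMO classifier uses a different optimiser), the two-dimensional "leaf features" are made-up numbers, and the bias is handled by appending a constant 1 to every feature vector. The MLP classifier is not shown.

```python
import random


def train_linear_svm(samples, labels, lam=0.01, epochs=100, seed=0):
    """Pegasos-style linear SVM: minimise lam/2*|w|^2 + mean hinge loss.

    samples: list of feature vectors; labels: +1 or -1 for each sample.
    A constant 1.0 is appended to every vector so the last weight acts as a bias.
    """
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    rng = random.Random(seed)
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing learning rate
            x, y = data[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink the weights (gradient of the L2 regulariser)...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ...and, if the hinge loss is active, step towards the sample.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical two-feature leaf descriptors for two species (+1 / -1).
X = [(2.0, 2.0), (2.5, 1.5), (3.0, 2.5), (-2.0, -1.5), (-2.5, -2.0), (-1.5, -2.5)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # → [1, 1, 1, -1, -1, -1]
```

A linear kernel keeps the sketch short; kernelised SVMs, as offered by Weka, would be needed for classes that are not linearly separable.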

    The paper[2] discusses a computer-assisted Android system for plant identification from leaf images, using SIFT features with a Bag of Words (BoW) model and an SVM classifier. The identification method involves 8 stages and employs a client-server architecture. The server performs 2 main activities. The first is to train the SVM classifier on the feature vectors required for classification and then save it. The second is to generate a feature vector from each photograph uploaded by the Android client; the generated vector is used by the SVM classifier for identification. Training the SVM relies on SIFT descriptors together with a Bag of Features model and involves 4 steps. In the first step, SIFT descriptors are extracted from each leaf image in the training dataset using a data-space reduction method. The second step clusters all the extracted features into feature bags using BoW methods. In the third step, BoW histograms are generated for all the images in the training dataset. In the final step, all the histograms are passed to the SVM as classification feature vectors, and the SVM creates the classifier and saves it in the server storage. As a pre-processing step, the RGB image is converted into a greyscale image before SIFT feature points are extracted. Key points are then extracted and descriptors generated using the SIFT algorithm, which is widely used in content-based image retrieval (CBIR). All the SIFT features collected from the training dataset are grouped into several clusters using k-means clustering, so that each image in the training dataset is represented by a histogram. The histograms are classified using a multi-class linear support vector machine. The Android implementation consists of a client application that consumes the leaf recognition algorithm, and a Dynamic Link Library (DLL) handles communication between the web service and the OpenCV implementation of the image processing. This methodology obtains an average accuracy of about 96.48% on 20 different species.
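The k-means and BoW histogram steps of this pipeline can be sketched in plain Python. The sketch assumes the SIFT descriptors have already been extracted (here they are toy two-dimensional points, whereas real SIFT descriptors are 128-dimensional) and, for reproducibility, seeds k-means with the first k descriptors instead of a randomised initialisation. The resulting histograms are what would be passed to the SVM.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centroids):
    """Index of the centroid (visual word) closest to point."""
    return min(range(len(centroids)), key=lambda j: distance(point, centroids[j]))


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (simplifying assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids


def bow_histogram(descriptors, vocabulary):
    """Normalised histogram of visual-word assignments for one image."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest(d, vocabulary)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy descriptors pooled from all training images.
training = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
vocabulary = kmeans(training, k=2)
print(bow_histogram([(0.2, 0.1), (0.5, 0.5), (1, 1), (9, 9)], vocabulary))
# → [0.75, 0.25]
```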

    The paper[3] discusses the general steps of plant identification: pre-processing, feature extraction and classification. Because classic classification algorithms were not readily applicable, new methodologies that apply data mining methods to this specific domain emerged. The first stage is pre-processing, in which the available data are extracted to form images. These leaf images are transformed into good-quality binary images using normalization and segmentation. Most leaf datasets are available online, and the images are scaled to constrain their size. Image normalization adjusts brightness and contrast. Leaf segmentation produces binary images of the leaves, and morphological operations are used to eliminate noise. Contour extraction then yields the geometric features of the leaves. The feature extraction process used for plant recognition considers parameters such as area convexity and perimeter convexity that describe leaf characteristics. Classification is a supervised learning step in which ANN, SVM and KNN classifiers are used to improve classification accuracy.
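Two of the geometric features named above can be computed from an extracted contour as follows. This sketch assumes the common definitions area convexity = contour area / convex hull area and perimeter convexity = hull perimeter / contour perimeter (the paper's exact formulas may differ), and it assumes the contour is an ordered list of boundary points. The convex hull uses Andrew's monotone chain algorithm.

```python
import math


def polygon_area(pts):
    """Shoelace formula for the area enclosed by an ordered closed contour."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def polygon_perimeter(pts):
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convexity_features(contour):
    hull = convex_hull(contour)
    area_convexity = polygon_area(contour) / polygon_area(hull)
    perimeter_convexity = polygon_perimeter(hull) / polygon_perimeter(contour)
    return area_convexity, perimeter_convexity


# A square outline with a notch cut into its top edge.
notched = [(0, 0), (4, 0), (4, 4), (2, 2), (0, 4)]
print(convexity_features(notched))  # area convexity 0.75, perimeter convexity ≈ 0.906
```

Both ratios equal 1 for a convex outline and fall below 1 as lobes, teeth or notches appear, which is what makes them useful for separating leaf shapes.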

    The paper[4] describes shape feature extraction with the Scale Invariant Feature Transform (SIFT) and colour feature extraction with the Grid Based Colour Moment (GBCM) to identify plants. The approach comprises image acquisition, image processing, feature extraction, identification and performance measurement phases. Image acquisition deals with acquiring datasets of different tree species. Image processing enhances the image data required for further processing by discarding undesired distortions; this phase includes rotation, scaling and variation of the leaf samples for further testing. Shape features and colour features are then extracted using SIFT and GBCM respectively. SIFT considers both the spatial and frequency domains. It is invariant to geometric transforms such as scaling and rotation, is robust to illumination changes and noise, and also tolerates varying views of the object. Scale-space extrema are detected first, and a detailed analysis of each candidate keypoint allows points in low-contrast regions to be rejected. The gradient magnitude and orientation are then computed for each image sample; the orientations span the full 360 degrees, and the magnitudes are weighted by a Gaussian-weighted circular window. GBCM features are extracted using the colour moment technique, in which three parameters, the mean, standard deviation and skewness, are calculated for the image. Identification is then based on the Euclidean distance, the square root of the sum of squared differences between the feature values of a pair of objects. This methodology achieved an accuracy of 87.5%.
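The GBCM and Euclidean-distance matching steps can be sketched as below. The sketch assumes the image is split into a 2x2 grid and that skewness is taken as the signed cube root of the third central moment, a common convention for colour moments; the grid size, the tiny RGB images and the species labels are all hypothetical. The SIFT branch is not shown.

```python
import math


def colour_moments(pixels):
    """Mean, standard deviation and skewness of each RGB channel."""
    n = len(pixels)
    feats = []
    for c in range(3):
        vals = [p[c] for p in pixels]
        mean = sum(vals) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)
        m3 = sum((v - mean) ** 3 for v in vals) / n
        skew = math.copysign(abs(m3) ** (1.0 / 3.0), m3)  # cube root keeps the sign
        feats += [mean, std, skew]
    return feats


def grid_colour_moments(image, grid=2):
    """Concatenate the colour moments of each cell in a grid x grid partition.
    Assumes the image has at least `grid` rows and columns."""
    h, w = len(image), len(image[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [image[y][x]
                    for y in range(gy * h // grid, (gy + 1) * h // grid)
                    for x in range(gx * w // grid, (gx + 1) * w // grid)]
            feats += colour_moments(cell)
    return feats


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(query, database):
    """Return the database label whose feature vector is nearest to the query."""
    return min(database, key=lambda label: euclidean(query, database[label]))


# Hypothetical 4x4 RGB leaf images for two reference species and one query.
green = [[(10, 200, 10)] * 4 for _ in range(4)]
brown = [[(120, 80, 40)] * 4 for _ in range(4)]
query = [[(12, 190, 15)] * 4 for _ in range(4)]
db = {"species_a": grid_colour_moments(green), "species_b": grid_colour_moments(brown)}
print(identify(grid_colour_moments(query), db))  # → species_a
```

Computing moments per grid cell, rather than over the whole image, keeps some information about where colours occur on the leaf.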

    The paper[5] discusses leaf features based on the shape contour, which is represented mathematically. Positions along the contour are given by the arc length, the distance travelled from the starting point. For each point, a function of the curve segment centred on that point gives the perpendicular distance from the point to the straight line (chord) connecting the segment's end points; because the contour is closed, this function is periodic. The convexity and concavity of the arc are then measured, and from these values two multi-scale shape features are derived: smaller scales capture shape details, while larger scales reflect global properties. To achieve scale invariance, the values are normalized by their maximum and then subjected to a Fourier transform to describe the shape, and standard deviation measures are added to enhance the discriminative power of the shape descriptor. The dissimilarity between the resulting shape descriptors is then computed. Mobile leaf identification on Android OS is a convenient and efficient approach to application development. However, the constraints of a mobile device, such as storage, RAM, bandwidth and computational power, often call for a high-performance server connected over the internet, so both an online and an offline leaf database are implemented. On leaf image datasets, the method is compared with classical Fourier descriptors and with the inner-distance shape context (IDSC), multi-scale convexity/concavity representation (MCC) and triangle-area representation (TAR) approaches. The proposed method achieves 26.47% higher retrieval accuracy and is over 170 times faster than MCC, TAR and IDSC. In offline leaf recognition, a database is downloaded in advance during installation, which gives a consistent matching speed and is the most reliable option.
    In online leaf recognition, the database is updated regularly, and the computation and memory requirements are handled by sending the feature vector to the main server. Feature extraction is done on the phone itself, which drastically reduces bandwidth. The server then returns the closest matches from the database and presents the result in a webpage. The proposed method is 30 times faster, with an almost instant response.
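A simplified version of this contour descriptor is sketched below: the signed arch height at each contour point (perpendicular distance to the chord joining the neighbours `scale` steps away), normalised by its maximum, followed by DFT magnitudes at several scales. This is an illustration under assumptions, not the paper's exact descriptor: the chosen scales, the number of Fourier coefficients, the omission of the standard-deviation terms and the L1 dissimilarity are all choices made for the example. Contours are assumed to be ordered counter-clockwise.

```python
import cmath
import math


def arch_height(contour, scale):
    """Signed distance from each point to the chord joining its neighbours
    `scale` steps away (positive at convex points), normalised by the maximum."""
    n = len(contour)
    heights = []
    for i in range(n):
        (x1, y1), (x0, y0), (x2, y2) = contour[(i - scale) % n], contour[i], contour[(i + scale) % n]
        chord = math.hypot(x2 - x1, y2 - y1)
        cross = (x0 - x1) * (y2 - y1) - (y0 - y1) * (x2 - x1)
        heights.append(cross / chord if chord else 0.0)
    peak = max(abs(h) for h in heights) or 1.0
    return [h / peak for h in heights]  # max normalisation gives scale invariance


def fourier_magnitudes(signal, n_coeffs):
    """|DFT| of the first n_coeffs frequencies; magnitudes ignore the starting point."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
            for k in range(n_coeffs)]


def shape_descriptor(contour, scales=(1, 2, 4), n_coeffs=8):
    feats = []
    for s in scales:  # small scales: local detail; large scales: global shape
        feats += fourier_magnitudes(arch_height(contour, s), n_coeffs)
    return feats


def dissimilarity(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


# An ellipse, the same ellipse enlarged 3x with a different starting point, and a circle.
N = 32
ellipse = [(2 * math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N)) for k in range(N)]
bigger = [(3 * x, 3 * y) for x, y in ellipse[5:] + ellipse[:5]]
circle = [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N)) for k in range(N)]
print(dissimilarity(shape_descriptor(ellipse), shape_descriptor(bigger)))  # close to 0
print(dissimilarity(shape_descriptor(ellipse), shape_descriptor(circle)))  # clearly larger
```

Because only DFT magnitudes are kept, shifting the contour's starting point leaves the descriptor unchanged, which is why the enlarged, re-started ellipse matches the original.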

    The paper[6] describes a graphical identification tool and a computer-aided system for automatic plant identification. The graphical tool has three main components: a graphical interface, a plant identification module and a result interface. The graphical interface characterises plants by their leaf, venation and other traits, shown as graphical icons. The user-defined input is then compared against the database of plants to measure their similarity during identification. Finally, the result interface presents the identification result and lists the plants in the database in decreasing order of similarity. Although the graphical tool makes plant identification easier, identification still depends on the features being extracted and described correctly, which can sometimes lead to improper identification. The automatic plant identification technique is therefore used to overcome the disadvantages of the graphical tool. In the automatic technique, leaf characteristics are used to identify the plant, since the leaf plays an important role in plant identification. In object detection and identification, the histogram of oriented gradients (HOG) is recognised as a robust image descriptor, so HOG is employed in a three-stage process: (i) HOG is computed for all the images in the database; (ii) the Maximum Margin Criterion (MMC) is used to reduce the descriptor dimension; and (iii) SVM is applied for leaf identification. To analyse the performance of the system, HOG is compared with the Hu moment descriptor, which has also been used to recognise plants from leaf images.
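The core of the first HOG stage, a single cell's gradient-orientation histogram, is sketched below with 9 unsigned orientation bins over 0 to 180 degrees. It is a simplification: each pixel votes with its gradient magnitude into one bin only, whereas full HOG implementations also interpolate between bins and normalise over overlapping blocks. The MMC and SVM stages are not shown, and the test images are synthetic.

```python
import math


def hog_cell_histogram(gray, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg)
    for one cell, using central differences and L2 normalisation."""
    h, w = len(gray), len(gray[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // (180.0 / bins)) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


# Synthetic 6x6 cells: a vertical edge (dark left, bright right) and a horizontal edge.
vertical_edge = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
horizontal_edge = [[0] * 6 for _ in range(3)] + [[255] * 6 for _ in range(3)]
print(hog_cell_histogram(vertical_edge))    # all weight in bin 0 (0-20 degrees)
print(hog_cell_histogram(horizontal_edge))  # all weight in bin 4 (80-100 degrees)
```

Concatenating such histograms over all cells of a leaf image gives the descriptor whose dimension MMC would then reduce.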

    The paper[7] divides plant identification into three stages: synthetic plant collection, the spatiotemporal evolution model (STEM) and automata extraction. In the first stage, the development and growth of the plants in the synthetic collection are characterised by a finite set of elements, even though the resulting shapes are indeterminate and complex. The mathematical formulation of the underlying rules is called an L-system. An L-system is defined as the 3-tuple G = (V, ω, P), where V is the alphabet, ω is the axiom (the initial string) and P is the set of production rules. To reduce the artificial regularity of such a system, randomness is also introduced into its productions. Image processing and feature extraction methods are also used in the synthetic plant collection. The L-systems are visualized using turtle interpretation and saved as JPEG images to simulate real plants, and the Hough transform is used to detect the main axis and root of the plant. In the second stage, the spatiotemporal evolution model, a kernel adaptive autoregressive-moving-average (KAARMA) network models a dynamic system defined by general continuous nonlinear state-transition functions and an observation function; KAARMA is used to train the STEM. In the third stage, automata extraction, a deterministic finite automaton (DFA) is used, in which every state transition is uniquely determined by the input symbol, starting from an initial state. The DFA models the discrete-time dynamical system in a discrete state space and can be represented in two ways: as state transitions or as a lookup table. An automaton is an analytical descriptor of a language, and the DFA also validates the corresponding regular grammar generated by the language.
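Two of the formal objects in this paper can be made concrete in a few lines: parallel rewriting with a deterministic L-system G = (V, ω, P), and a DFA stored as a lookup table. The example uses Lindenmayer's classic algae system rather than a plant-branching grammar, and the DFA is built by hand for illustration; in the paper the automaton is extracted from a trained KAARMA network, which is not reproduced here.

```python
def expand_lsystem(axiom, rules, iterations):
    """Apply the productions P of a deterministic L-system G = (V, omega, P)."""
    s = axiom
    for _ in range(iterations):
        # Every symbol is rewritten simultaneously; symbols without a rule are kept.
        s = "".join(rules.get(symbol, symbol) for symbol in s)
    return s


def dfa_accepts(table, start, accepting, word):
    """Run a DFA stored as a lookup table {(state, symbol): next_state}."""
    state = start
    for symbol in word:
        if (state, symbol) not in table:  # missing entry acts as a reject state
            return False
        state = table[(state, symbol)]
    return state in accepting


# Lindenmayer's algae example: V = {A, B}, omega = "A", P = {A -> AB, B -> A}.
rules = {"A": "AB", "B": "A"}
print([expand_lsystem("A", rules, n) for n in range(5)])
# → ['A', 'AB', 'ABA', 'ABAAB', 'ABAABABA']

# Hand-built DFA for strings over {A, B} with no two consecutive B's,
# a property of every string the rules above generate.
table = {("q0", "A"): "q0", ("q0", "B"): "q1", ("q1", "A"): "q0"}
print(dfa_accepts(table, "q0", {"q0", "q1"}, expand_lsystem("A", rules, 6)))  # → True
print(dfa_accepts(table, "q0", {"q0", "q1"}, "ABBA"))  # → False
```

The lookup-table form makes each transition a single dictionary access, which is why it is convenient for checking whether observed growth sequences fit the grammar.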

      The paper[8] proposes a convex combination of two LMS adaptive transversal filters, one with a large adaptation step and the other with a small one. Together, these step sizes strike a balance between convergence speed and steady-state error: the combined scheme has the tracking capability of the fast LMS filter and the low error of the slow filter during stationary periods. An additional advantage of this procedure is that switching procedures can be avoided. In this work, "plant identification" refers to identifying an unknown system (the "plant" in control and signal-processing terminology) rather than a botanical plant.
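The convex combination of a fast and a slow LMS filter can be sketched as follows. The sketch follows the commonly published form of this scheme, in which the mixing weight is λ = sigmoid(a) and a is adapted by gradient descent on the combined error and clipped to a fixed range. The step sizes, the clipping bound of ±4, the 3-tap unknown system h and the noise level are illustrative assumptions, not values taken from the paper.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def combined_lms(x, d, taps, mu_fast=0.05, mu_slow=0.005, mu_a=10.0, a_max=4.0):
    """Convex combination y = lam*y1 + (1 - lam)*y2 of two LMS filters, lam = sigmoid(a)."""
    w1, w2 = [0.0] * taps, [0.0] * taps
    a, lam = 0.0, 0.5
    buf = [0.0] * taps  # tapped delay line: x[n], x[n-1], ...
    errors = []
    for n in range(len(x)):
        buf = [x[n]] + buf[:-1]
        y1, y2 = dot(w1, buf), dot(w2, buf)
        lam = 1.0 / (1.0 + math.exp(-a))
        e = d[n] - (lam * y1 + (1.0 - lam) * y2)
        # Each component filter adapts on its own error with its own step size.
        e1, e2 = d[n] - y1, d[n] - y2
        w1 = [w + mu_fast * e1 * u for w, u in zip(w1, buf)]
        w2 = [w + mu_slow * e2 * u for w, u in zip(w2, buf)]
        # Gradient step on the mixing parameter; no hard switching is needed.
        a += mu_a * e * (y1 - y2) * lam * (1.0 - lam)
        a = max(-a_max, min(a_max, a))
        errors.append(e)
    w = [lam * p + (1.0 - lam) * q for p, q in zip(w1, w2)]  # equivalent combined filter
    return w, errors


# Identify a hypothetical 3-tap unknown system ("plant") from noisy observations.
rng = random.Random(1)
h = [0.5, -0.3, 0.2]
x = [rng.gauss(0.0, 1.0) for _ in range(3000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) + rng.gauss(0.0, 0.01)
     for n in range(len(x))]
w, errors = combined_lms(x, d, taps=3)
print([round(v, 3) for v in w])  # should be close to h
```

Clipping a keeps λ(1 - λ) away from zero, so the mixing weight can still move when the relative performance of the two filters changes.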

      The paper[9] proposes leaf identification using triangular representations. All contour points are marked, and a dynamic space warping matching method is then used to compare the similarity between the query image and the database. Two types of contour points are employed: salient points, which mark the locations of maximum activity on the leaf, and margin points, which lie on the leaf edge. Imaginary lines drawn from point to point form triangles over the surface area of the leaf. The dimensions of these triangles are calculated and matched against the corresponding values in the dataset. The triangle-area representation (TAR) or the triangle side-length representation (TSL) is used to calculate the shape dimensions, and these measurements are combined in TASLA (a triangle represented by two side lengths and two angles), which uses the angles of the formed triangles together with the side lengths as a set. An experimental fusion method, in which two or more of these methods were combined and used together, was also evaluated.
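A triangle-area representation and an elastic matching step can be sketched as below. The sketch computes TAR at every point of a counter-clockwise contour for a single triangle side offset `ts`, normalised by the largest magnitude, and uses classic dynamic time warping (DTW) as a stand-in for the paper's dynamic space warping matcher; the salient/margin point selection and the TSL and TASLA variants are not reproduced, and the example contours are hypothetical.

```python
def triangle_area(p, q, r):
    """Signed area: positive for convex turns of a counter-clockwise contour."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def tar_signature(contour, ts):
    """Triangle-area representation with side offset ts, normalised to [-1, 1]."""
    n = len(contour)
    areas = [triangle_area(contour[(i - ts) % n], contour[i], contour[(i + ts) % n])
             for i in range(n)]
    peak = max(abs(a) for a in areas) or 1.0
    return [a / peak for a in areas]


def dtw_distance(s, t):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(s), len(t)
    D = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
notched = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 1), (0, 2), (0, 1)]
print(tar_signature(square, 1))  # corners → 1.0, straight edges → 0.0
print(dtw_distance(tar_signature(square, 1), tar_signature(notched, 1)))  # > 0
```

The sign of each area separates convex from concave contour regions, so a notch appears as a negative entry in the signature.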

      The paper[10] proposes the use of Convolutional Neural Networks (CNN) to build a model from the features of the input leaf images. The network uses numerous layers, and at each layer a convolved feature map of the input image is formed. The responses are passed through a rectified linear function into their own feature maps, which are pooled and sent to the next layer for further refinement. Consecutive layers apply kernels to the incoming pooled maps, continuing up to layer n+1. The paper also uses a Deconvolutional Network (DN) to interpret the model learnt by the CNN. The version used, V1, takes unpooled maps and deconvolves them from layer n back to the first layer to reconstruct the image. This image is then rotated to 7 different orientations. This provides an accurate visualization technique and creates a dataset for further reference. Experimental results also demonstrated the importance of venation in each leaf. This method achieved an accuracy of 98.1%.
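One convolution, ReLU and max-pooling stage of the kind described above can be written out directly. This is an illustrative sketch, not the paper's network: the 2x2 kernel is a hand-set vertical-edge detector (in a trained CNN the kernels are learnt), the convolution is computed as cross-correlation as in most deep-learning libraries, and the deconvolutional visualisation is not shown.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[y + i][x + j] * kernel[i][j] for i in range(kh) for j in range(kw))
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]


def relu(feature_map):
    """Rectified linear unit: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(feature_map[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, len(feature_map[0]) - size + 1, size)]
            for y in range(0, len(feature_map) - size + 1, size)]


# A 5x5 image with a vertical edge and a hand-set vertical-edge kernel.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]
print(max_pool(relu(conv2d(image, kernel))))  # → [[2, 0.0], [2, 0.0]]
```

Stacking this stage, with the pooled map of one layer as the input to the next, gives the layered refinement described for layers 1 to n+1.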

      The paper[11] proposes a straightforward method of leaf identification using image processing. It has 3 basic steps: (i) an image acquisition phase, where the leaf image is captured with a high-resolution camera; (ii) an image pre-processing phase, where the image is cleaned of noise and irregularities; and (iii) a feature extraction phase, where morphological parameters such as size, area and thickness are acquired. A reference table is used for comparison. Simple software tools are used: an ANN for classification, Python for maintaining the dataset and MATLAB for testing and comparison. The image is first converted to greyscale and then to a black-and-white pixel layout. The resulting binary image, in which leaf pixels are counted, is represented as a matrix of rows and columns. These values are summarised by their mean and standard deviation and placed in a confusion matrix, where the leaf parameters are compared using MATLAB. This method achieved an accuracy of 98.61%.
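The greyscale, black-and-white and pixel-count steps, together with accuracy from a confusion matrix, can be sketched as follows. The BT.601 luma weights, the fixed threshold of 128 and the assumption that the leaf is darker than a light background are choices made for this example rather than details from the paper, and the 3x3 photo and class labels are hypothetical.

```python
def to_grey(rgb_image):
    """Luma conversion with ITU-R BT.601 weights (one common choice)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb_image]


def binarise(grey, threshold=128.0):
    """1 = leaf pixel (assumed darker than a light background), 0 = background."""
    return [[1 if v < threshold else 0 for v in row] for row in grey]


def leaf_area(binary):
    """Area as the number of leaf pixels."""
    return sum(sum(row) for row in binary)


def confusion_matrix(actual, predicted, classes):
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1  # rows: true class, columns: predicted class
    return matrix


def accuracy(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


W, G = (255, 255, 255), (30, 120, 30)
photo = [[W, W, W], [W, G, G], [W, G, G]]  # hypothetical 3x3 leaf photo
print(leaf_area(binarise(to_grey(photo))))  # → 4
m = confusion_matrix(["a", "a", "b", "b"], ["a", "b", "b", "b"], ["a", "b"])
print(m, accuracy(m))  # → [[1, 1], [0, 2]] 0.75
```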


    Most of the methodologies mentioned above require a reference table or an inbuilt dataset. This means that data must be analysed and collected in advance to serve as a reference for future comparison. Avoiding this preliminary step is difficult, but with the advance of cloud storage, where digital data are stored in logical pools, the content can be stored more efficiently. New methods can also be built on advances in present technology. We therefore propose the following new methods.

      1. Leaves can be identified using a digital fingerprint. This method works in the same way as a media recognition app. By scanning the leaf with lasers, different depth points can be marked and connected to form an image that can be plotted on a graph. The area enclosed by the graph forms the unique digital fingerprint of the leaf, which can be used to recognise the plant.

      2. Leaf recognition can also be done by tracing the leaf's outline on a digital screen, such as a camera display. Just as with a swipe keyboard on a phone, the path taken by the user's finger while tracing the leaf image can be passed to a preset algorithm. Once the finger is lifted from the screen, the path is mapped, the most similar path is retrieved from the dataset and the leaf is recognised. Moreover, leaves with similar shapes, and therefore similar path maps, can be suggested to avoid errors. It can be argued that inputs will differ from user to user, but the uniqueness of the digital fingerprints and the fixed preset algorithms (implemented in Python) are expected to stabilise this variation.
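The proposal does not fix a matching algorithm, so the following is only a hypothetical sketch of how a traced path could be compared with stored outlines, in the spirit of template-based gesture recognisers: resample the path to a fixed number of points by arc length, remove position and size, and rank templates by mean point-to-point distance. The function names, the 32-point resampling and the template outlines are assumptions; the sketch is not invariant to rotation or to where the user starts tracing.

```python
import math


def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def resample(path, n=32):
    """n points spaced evenly by arc length along a traced path (>= 2 points)."""
    cumulative = [0.0]
    for a, b in zip(path, path[1:]):
        cumulative.append(cumulative[-1] + distance(a, b))
    total = cumulative[-1]
    out, j = [], 0
    for k in range(n):
        target = total * k / (n - 1)
        while j < len(path) - 2 and cumulative[j + 1] < target:
            j += 1
        seg = cumulative[j + 1] - cumulative[j]
        t = (target - cumulative[j]) / seg if seg else 0.0
        (x1, y1), (x2, y2) = path[j], path[j + 1]
        out.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return out


def normalise(points):
    """Remove position and size: centre on the centroid, scale to unit extent."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    shifted = [(x - cx, y - cy) for x, y in points]
    scale = max(max(abs(x), abs(y)) for x, y in shifted) or 1.0
    return [(x / scale, y / scale) for x, y in shifted]


def rank_matches(trace, templates, n=32):
    """Template names sorted from most to least similar to the traced path."""
    sig = normalise(resample(trace, n))
    scores = {}
    for name, template in templates.items():
        ref = normalise(resample(template, n))
        scores[name] = sum(distance(p, q) for p, q in zip(sig, ref)) / n
    return sorted(scores, key=scores.get)


def outline(rx, ry, cx=0.0, cy=0.0, steps=40):
    return [(cx + rx * math.cos(2 * math.pi * k / steps), cy + ry * math.sin(2 * math.pi * k / steps))
            for k in range(steps + 1)]


templates = {"round_leaf": outline(1.0, 1.0), "narrow_leaf": outline(0.2, 1.0)}
finger_trace = outline(50.0, 50.0, cx=100.0, cy=200.0)  # same shape, drawn larger elsewhere
print(rank_matches(finger_trace, templates))  # → ['round_leaf', 'narrow_leaf']
```

Returning a ranked list rather than a single answer also supports the suggestion of similarly shaped leaves mentioned above.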


  1. P. M. Kumar, C. M. Surya and V. P. Gopi, "Identification of ayurvedic medicinal plants by image processing of leaf samples," 2017 Third International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), Kolkata, 2017, pp. 231-238. DOI: 10.1109/ICRCICN.2017.8234512.

  2. H. A. Chathura Priyankara and D. K. Withanage, "Computer assisted plant identification system for Android," 2015 Moratuwa Engineering Research Conference (MERCon), Moratuwa, 2015, pp. 148-153. DOI: 10.1109/MERCon.2015.7112336.

  3. R. Rojas-Hernández and A. López-Chau, "Plant identification using new geometric features with standard data mining methods," 2016 IEEE 13th International Conference on Networking, Sensing, and Control (ICNSC), 2016, pp. 1-4.

  4. N. A. Che Hussin, N. Jamil, S. Nordin and K. Awang, "Plant species identification by using scale invariant feature transform (SIFT) and grid based colour moment (GBCM)," 2013 IEEE Conference on Open Systems (ICOS), 2013, pp. 226-230. DOI: 10.1109/ICOS.2013.6735079.

  5. B. Wang, D. Brown, Y. Gao and J. L. Salle, "Mobile plant leaf identification using smart-phones," 2013 IEEE International Conference on Image Processing, Melbourne, VIC, 2013, pp. 4417-4421. DOI: 10.1109/ICIP.2013.6738910.

  6. N. H. Pham, T. L. Le, P. Grard and V. N. Nguyen, "Computer aided plant identification system," 2013 International Conference on Computing, Management and Telecommunications (ComManTel), Ho Chi Minh City, Vietnam, 2013, pp. 134-139. DOI: 10.1109/ComManTel.2013.6482379.

  7. K. Li, Y. Ma and J. C. Príncipe, "Automatic plant identification using stem automata," 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), Tokyo, 2017, pp. 1-6. DOI: 10.1109/MLSP.2017.8168147.

  8. M. Martinez-Ramon and J. Arenas-Garcia, "An adaptive combination of adaptive filters for plant identification," 2002 14th International Conference on Digital Signal Processing (DSP 2002), vol. 2, 2002. DOI: 10.1109/ICDSP.2002.1028307.

  9. Z. Q. Zhao, Y. Hong, P. Zheng and X. Wu, "Plant identification using triangular representation based on salient points and margin points," 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, 2015, pp. 1145-1149. DOI: 10.1109/ICIP.2015.7350979.

  10. S. H. Lee, C. S. Chan, P. Wilkin and P. Remagnino, "Deep-plant: Plant identification with convolutional neural networks," 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, 2015, pp. 452-456. DOI: 10.1109/ICIP.2015.7350839.

  11. R. G. de Luna et al., "Identification of Philippine herbal medicine plant leaf using artificial neural network," 2017 IEEE 9th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), Manila, 2017, pp. 1-8. DOI: 10.1109/HNICEM.2017.8269470.
