
Herbal-i: A Mobile Application for Deep-learning-Based Medicinal Leaf Recognition Using YOLOv5 and MobileNetV2

DOI: 10.17577/IJERTCONV14IS010038


Stan Avil Dsouza
Dept. of Computer Applications, St Joseph Engineering College (An Autonomous Institution), Vamanjoor, Mangaluru, India

Mr. Sunith Kumar T
Assistant Professor, Dept. of Computer Applications, St Joseph Engineering College (An Autonomous Institution), Vamanjoor, Mangaluru, India

Nisha H
Dept. of Computer Applications, St Joseph Engineering College (An Autonomous Institution), Vamanjoor, Mangaluru, India

Mr. Murari B K
Assistant Professor, Dept. of Computer Applications, St Joseph Engineering College (An Autonomous Institution), Vamanjoor, Mangaluru, India

Abstract – Herbal-i is a lightweight, mobile-based intelligent system developed for the automated identification of medicinal plants through leaf image analysis. The system incorporates a two-stage deep-learning pipeline that integrates the object-detection capability of YOLOv5 with the classification efficiency of MobileNetV2, enabling accurate recognition of plant species from leaf characteristics. A curated dataset of 4,988 high-resolution images, encompassing 40 distinct species of medicinal plants, was compiled from field-collected samples and publicly available sources. The dataset was constructed to include diverse environmental conditions, inconsistent lighting, complex backgrounds, and varied leaf orientations, to ensure robustness in real-world settings. In the first stage, YOLOv5 detects and isolates the leaf region from the input image, achieving a mean Average Precision (mAP) of 77.46% at an IoU threshold of 0.5 and 76.27% averaged over IoU 0.5:0.95. The extracted region is then passed to a MobileNetV2-based classifier, which attained a classification accuracy of 95.13% on the independent test set. Both models were optimized for fast, low-latency inference within the limited resources of a mobile phone. When a user captures or uploads a leaf picture, Herbal-i instantly returns the plant's scientific and common names, along with a short note on its medicinal uses. The app is intended for a broad audience, from students and farmers to health workers and conservationists. By integrating modern artificial intelligence with long-standing knowledge of medicinal herbs, Herbal-i seeks to improve public understanding of therapeutic plants, promote their thoughtful usage, and increase the general public's access to scientific knowledge.

Keywords – Herbal leaf recognition, deep learning for botany, mobile plant classification, YOLOv5, MobileNetV2, image-based detection, real-time plant scanning.

  1. INTRODUCTION

In fields like ethnobotany, herbal pharmacology, traditional medicine, and ecological conservation, being able to identify medicinal plants is absolutely vital. For centuries, these species have served as the backbone of healthcare, supplying natural remedies for a wide range of medical needs. Yet, in real-world conditions, distinguishing between plant species remains surprisingly tough. Many leaves share very similar patterns: veins, edges, and shapes can all blur together, making it easy to confuse one plant with another just by looking. The situation is further complicated by environmental influences: poor lighting, changing seasons, or leaves that are partly hidden can make identification much harder. People living in remote locations often can't access detailed taxonomic guides or consult experts, on which traditional identification depends. Understandably, there's a growing call for automated, user-friendly solutions that deliver quick, accurate results, especially when specialist knowledge is lacking and timely identification is critical.

The last few years have brought impressive advances in object detection and image classification, driven by rapid progress in artificial intelligence and visual computing. Convolutional neural networks, in particular, have proven remarkably adept at capturing visual hierarchies, fueling innovation in fields like disease detection, precision farming, and species monitoring. Even when images are cluttered or contain several overlapping leaves, detection models like YOLOv5 have gained a reputation for quickly and precisely picking out relevant features. Meanwhile, lightweight neural networks such as MobileNetV2 make high-quality classification possible even on mobile or resource-limited devices. Despite extensive use of these models individually, it is relatively rare to see fully integrated systems aimed specifically at identifying medicinal plants by their leaves. Addressing this gap, the present work introduces Herbal-i: a two-stage system that first employs YOLOv5 to detect leaves and then uses MobileNetV2 to classify the extracted regions among 40 labelled species. The entire pipeline is built for genuine, real-world use, trained on a large and carefully curated dataset, and aims to give a diverse set of users fast, reliable access to medicinal plant identification. Herbal-i combines traditional botanical knowledge with contemporary artificial intelligence by training both models on a varied collection of 4,988 high-resolution leaf photographs gathered from field surveys and public image archives.

  2. RELATED WORK

    Lately, studies in medicinal plant recognition have improved dramatically thanks to deep-learning methods, particularly using YOLO and MobileNet models. Various studies have contributed datasets, model innovations, or hybrid strategies that push the boundaries of accuracy, efficiency, and real-world deployment.

    1. YOLO-based approaches:

Valdez et al. [1] proposed a real-time detection system using YOLO-based medicinal plant recognition with a new image dataset, achieving 83% mAP on Philippine herbal species. This approach demonstrated the feasibility of deploying YOLOv5 on mobile platforms for plant identification. Similarly, Banala and Duvvuru [13] compared YOLOv5 and YOLOv8 on turmeric leaf diseases in YOLO-based detection of turmeric leaf disease using image processing, where YOLOv5 achieved superior accuracy (98.6% mAP), reinforcing its robustness in agricultural scenarios.

    2. MobileNet-based classifiers:

Pushpa et al. [2] introduced multiple hybrid MobileNet architectures in A deep-learning hybrid model for classification of medicinal plant leaves, with the top model achieving 94.24% accuracy using MobileNetV2 and SE blocks. Abdollahi [4] employed MobileNetV2 in Classification of medicinal plants using transfer learning, achieving 98.05% on a 30-class dataset. Kavitha et al. [5] trained a MobileNet model in Real-time identification of medicinal plants using MobileNet model, reaching 98.3% accuracy, confirming MobileNet's effectiveness for real-time classification on mobile platforms.

    3. Hybrid and ensemble models:

Sachar and Kumar [7] developed an ensemble model in Ensemble deep-learning architecture for medicinal leaf identification, fusing MobileNetV2, InceptionV3, and ResNet50 for 99.66% accuracy. Dwivedi et al. [8] introduced a CNN-SVM hybrid in Progressive transfer learning and SVM for medicinal plant identification, combining ResNet50 and an optimized SVM to reach 96.8%. Manoharan [10] addressed segmentation challenges in Two-stage herbal plant recognition using deep knowledge-based decision fusion, using a two-stage XOR-fused classifier pipeline to improve resilience to seasonal variation and incomplete features.

    4. Traditional features and surveys:

Saikia et al. [9] explored classical approaches in Identification of some medicinal plants based on leaf using neural network, combining handcrafted features like shape and texture with a backpropagation neural network, showing reasonable performance even with limited data. Hajam et al. [3] provided a systematic overview in Medicinal plant recognition using leaf image: A survey of deep-learning and machine-learning approaches, summarizing effective CNN-based techniques and highlighting data diversity as a common limitation across studies.

    5. Dataset contributions:

Zhang et al. [11] presented a large-scale dataset in TCMP-300: A large-scale benchmark dataset for traditional Chinese medicinal plant recognition, containing over 52,000 images across 300 species, on which CNN models achieved up to 89.64% accuracy. Pushpa and Rani [12] created DIMPSAR: Indian medicinal plant leaf and species image dataset, comprising over 12,000 images across different seasons and plant parts, offering realistic diversity for training automated plant classifiers.

    6. Comparative context:

    While earlier approaches tend to focus on either detection or classification alone, our method splits these tasks into two dedicated stages, using YOLOv5 for detection and MobileNetV2 for classification. This modular design improves spatial precision and classification generalization, offering better scalability across diverse species compared to prior monolithic or ensemble approaches. Our dataset contains

    40 well-balanced medicinal plant classes, making it more generalizable than earlier works focused on fewer species or domain-specific datasets.

  3. METHODOLOGY

This research adopts a two-stage recognition pipeline for the automated identification and classification of herbal plant leaves. The system comprises two independently trained deep-learning models: YOLOv5 for object detection and MobileNetV2 for image classification. Although YOLOv5 and MobileNetV2 share the same image collection, each network is trained and fine-tuned independently for its specific role: spatial detection or species classification.

    1. Dataset Preparation and Annotation

A carefully assembled dataset of 4,988 high-resolution leaf images was prepared, covering 40 different species of medicinal plants. Each plant class contained between 100 and 120 images, striking a careful balance to minimize bias and ensure the model could learn effectively from a consistent, well-distributed sample. Recognizing that object detection and image classification have differing requirements, we adopted two tailored annotation strategies. For YOLOv5, every image was labeled in the YOLO format, with bounding boxes defined by normalized coordinates for the center point, width, and height of each leaf. Class identifiers were included where relevant, and some images contained multiple leaf regions. For the classification phase, the focus shifted to single, isolated leaf segments, which were either manually cropped or derived from YOLOv5 outputs. The cropped images were then organized into folders, with each directory representing a separate plant species. At this stage, bounding box information was purposefully omitted, as it was unnecessary for classification.
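To make the label format concrete, the short sketch below shows how a pixel-space box could be converted to a YOLO-format line; the function name and values are illustrative, not taken from the project code.

```python
# Illustrative conversion of a pixel-space box (x_min, y_min, x_max, y_max) into a
# YOLO-format label line: "class x_center y_center width height", all normalized.
def to_yolo_label(class_id: int, box, img_w: int, img_h: int) -> str:
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2.0 / img_w   # normalized box center (x)
    y_center = (y_min + y_max) / 2.0 / img_h   # normalized box center (y)
    width = (x_max - x_min) / img_w            # normalized box width
    height = (y_max - y_min) / img_h           # normalized box height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: one leaf of class 7 in a 640x480 photograph.
print(to_yolo_label(7, (120, 80, 420, 360), 640, 480))
```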

Before model training began, images underwent preprocessing to satisfy each architecture's specific needs. MobileNetV2 inputs were resized to 224×224 pixels, and YOLOv5 inputs were adjusted to 640×640 pixels to fit their respective architectures. To further promote training stability and speed, pixel values in all images were normalized to a fixed range. This consistent and systematic preprocessing streamlined the learning process and improved the performance of both models during training and evaluation. For data splitting, the YOLOv5 dataset used an 80% training and 20% validation split, while the MobileNetV2 dataset was divided into 70% for training, 10% for validation, and 20% for testing, tailoring the approach to the needs of each task. The annotation procedure for the detection task is depicted in Figure 1, and example images with bounding boxes are shown in Figure 1.1.
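A minimal preprocessing sketch under these settings is given below; normalization to [0, 1] is an assumption on our part, since the exact range is not restated here, and the file paths are illustrative.

```python
# Illustrative preprocessing: resize to each architecture's input size and
# scale pixel values (assumed here to be normalized to [0, 1]).
import numpy as np
from PIL import Image

def load_for_classifier(path: str) -> np.ndarray:
    # MobileNetV2 input: 224x224 RGB
    img = Image.open(path).convert("RGB").resize((224, 224))
    return np.asarray(img, dtype=np.float32) / 255.0

def load_for_detector(path: str) -> np.ndarray:
    # YOLOv5 input: 640x640 RGB
    img = Image.open(path).convert("RGB").resize((640, 640))
    return np.asarray(img, dtype=np.float32) / 255.0
```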

      Figure 1. Annotation-to-bounding box conversion workflow

Figure 1.1. Sample YOLOv5 Training Images with Annotated Bounding Boxes

    2. YOLOv5 Object Detection Pipeline

      We chose the YOLOv5s variant for its good trade-off between inference speed and detection precision. The model underwent 100 full training cycles (epochs) with batch size 16, using SGD (momentum 0.937, weight decay 0.0005) and a cosine-annealing schedule to adjust the learning rate over time, starting from 0.01. Augmentation techniques included

      horizontal flipping (p=0.5), mosaic augmentation, and random cropping.

Training was conducted in Google Colab using an NVIDIA Tesla T4 GPU and Python 3, with the Ultralytics implementation of YOLOv5s. The model was trained on YOLO-formatted annotations and evaluated using precision, recall, box loss, objectness loss, and classification loss. The best model state was selected by monitoring validation mAP. Figure 2 illustrates the YOLOv5 detection pipeline.
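For reference, a training run with these settings could be launched roughly as follows using the Ultralytics YOLOv5 repository; the dataset configuration file name is hypothetical, and the momentum, weight decay, and 0.01 initial learning rate correspond to the repository's default hyperparameters.

```python
# Approximate YOLOv5s training invocation (run from a clone of the Ultralytics
# YOLOv5 repository; "herbal_leaves.yaml" is a hypothetical dataset config).
import subprocess

subprocess.run([
    "python", "train.py",
    "--img", "640",                  # detector input resolution
    "--batch-size", "16",            # batch size used in this work
    "--epochs", "100",               # 100 training cycles
    "--data", "herbal_leaves.yaml",  # hypothetical dataset definition
    "--weights", "yolov5s.pt",       # YOLOv5s pretrained checkpoint
    "--optimizer", "SGD",            # SGD; momentum/weight decay from the default hyp file
    "--cos-lr",                      # cosine-annealing learning-rate schedule
], check=True)
```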

      Figure 2. YOLOv5 process flow from image input to final detection output

    3. Image Classification Pipeline for MobileNetV2

For the classification stage, each cropped leaf image was processed using the MobileNetV2 architecture. A two-stage transfer-learning strategy was implemented: for the first 15 epochs, only the classification head was trained while the base layers, pretrained on ImageNet, remained frozen. Following this period, the entire network was unfrozen and fine-tuned for an additional ten epochs, allowing both the feature extractor and classifier to adapt more fully to the dataset's specific characteristics.

      Random vertical and horizontal flips, rotations up to ±20°, and 10% zoom were applied as data augmentation. Training was carried out on a v2-8 TPU in Google Colab, using TensorFlow and Keras. The model achieved 95.27% training accuracy, 95.22% validation accuracy, and 95.11% test accuracy. Figure 3 presents the classification pipeline.
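A condensed Keras sketch of this two-stage schedule is shown below; the augmentation layers mirror the transformations listed above, while the optimizer, learning rates, and dataset objects are illustrative assumptions rather than the exact training code.

```python
# Condensed sketch of the two-stage MobileNetV2 transfer-learning schedule.
import tensorflow as tf

NUM_CLASSES = 40

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # stage 1: keep the ImageNet-pretrained base frozen

inputs = tf.keras.Input(shape=(224, 224, 3))  # inputs assumed resized and normalized
x = tf.keras.layers.RandomFlip("horizontal_and_vertical")(inputs)
x = tf.keras.layers.RandomRotation(20 / 360)(x)   # roughly +/-20 degrees
x = tf.keras.layers.RandomZoom(0.1)(x)            # 10% zoom
x = base(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=15)   # stage 1: head only

base.trainable = True  # stage 2: unfreeze and fine-tune the whole network
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)   # stage 2: full model
```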

Figure 3. MobileNetV2 process flow from image input to final classification output

    4. Self-contained Model Structures and Functions

While both YOLOv5 and MobileNetV2 were trained on the same dataset, each was specifically designed to address a different component of the recognition task. YOLOv5 detects one or several leaves in a complete image, generating rectangular regions each labeled with a class identifier, which is crucial for accurately segmenting plant material from its background. In contrast, MobileNetV2 classifies the individual, cropped leaves by analyzing learned features related to their texture, shape, and structural characteristics. By integrating these two stages, the system achieves both reliable spatial detection and species-level classification. This division of tasks not only improves overall robustness but also makes the model's outputs more interpretable and practical for real-world use.
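To show how the two stages interact at inference time, a sketch is given below; the checkpoint paths, image path, and label list are hypothetical, with YOLOv5 loaded through torch.hub and the classifier through Keras.

```python
# Illustrative two-stage inference: YOLOv5 proposes leaf boxes, MobileNetV2
# classifies each cropped box. All file paths and labels are placeholders.
import numpy as np
import tensorflow as tf
import torch
from PIL import Image

detector = torch.hub.load("ultralytics/yolov5", "custom", path="yolov5_leaf.pt")
classifier = tf.keras.models.load_model("mobilenetv2_leaf.h5")
class_names = [f"species_{i}" for i in range(40)]  # placeholder species labels

image = Image.open("leaf_photo.jpg").convert("RGB")
results = detector(image)                            # stage 1: leaf detection
for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
    crop = image.crop((x1, y1, x2, y2)).resize((224, 224))
    batch = np.asarray(crop, dtype=np.float32)[None] / 255.0
    probs = classifier.predict(batch, verbose=0)[0]  # stage 2: species classification
    print(class_names[int(probs.argmax())], round(float(probs.max()), 3))
```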

    5. Evaluation Metrics

To fully assess model performance, we used standard metrics including accuracy, precision, recall, and the F1 score. These metrics provide a comprehensive view of how well the system performs, especially for classes with fewer examples.

    Precision refers to how many of the positive predictions made by the model are truly correct, or more formally, the proportion of predicted positives that are actual positives.

    Recall (also called sensitivity) measures the model's ability to identify all relevant examples in the dataset. It expresses the proportion of real positives that have been detected.

The F1 score offers a single performance value that balances recall and precision. By accounting for both missed detections and false positives, it is particularly useful when dealing with unbalanced datasets and is calculated as the harmonic mean of precision and recall.

    Accuracy gives an overall indication of correctness by measuring the percentage of all predictions (both true positives and true negatives) that are accurate.
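In standard notation, with TP, FP, TN, and FN denoting true positives, false positives, true negatives, and false negatives, these metrics are:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$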

Using these metrics to compare YOLOv5 and MobileNetV2 in the context of medicinal leaf identification helped clarify the strengths and weaknesses of each approach at different stages, providing valuable insight into the system's overall effectiveness.

  4. RESULTS AND ANALYSIS

To evaluate the effectiveness of the proposed two-stage medicinal leaf recognition system, we tested its performance on independently prepared subsets of our dataset. The YOLOv5 model's object-detection capabilities were assessed using a dedicated validation split, while a separate test set was used to measure MobileNetV2's classification accuracy. Collectively, these evaluations illustrate how precise localization of leaf regions contributes to reliable species identification, further supporting the validity of the combined approach.

    1. YOLOv5 Detection Results

      Extensive data augmentation was applied while training the YOLOv5s model for 100 epochs. During evaluation, we used an NMS IoU threshold of 0.45 and set the confidence cutoff at 0.25. The detector recorded a precision of 98.22%, recall of 75.88%, and an F1-score of 85.61%. Its mAP reached 77.46% at IoU = 0.5, dropping slightly to 76.27% when averaged over IoU thresholds from 0.5 to 0.95. Figure 4(a) illustrates that both precision and recall rose rapidly within the first 50 epochs before leveling off, demonstrating quick and stable convergence.
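These evaluation settings correspond roughly to the following invocation of the YOLOv5 validation script; the checkpoint and dataset paths are hypothetical.

```python
# Approximate YOLOv5 evaluation invocation with the thresholds reported above.
import subprocess

subprocess.run([
    "python", "val.py",
    "--weights", "runs/train/exp/weights/best.pt",  # hypothetical checkpoint path
    "--data", "herbal_leaves.yaml",                 # hypothetical dataset config
    "--img", "640",
    "--conf-thres", "0.25",   # confidence cutoff
    "--iou-thres", "0.45",    # NMS IoU threshold
], check=True)
```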

      Figure 4(a). Precision and recall curves for YOLOv5

The component-wise losses (box loss, objectness loss, and classification loss) declined consistently throughout training, as illustrated in Figure 4(b), suggesting stable convergence and an absence of overfitting.

      Figure 4(b). Loss components (box, objectness, classification) for YOLOv5

      Qualitative analysis of detection outputs confirmed strong generalization across varied environments, including different backgrounds and lighting conditions. Occasional detection failures were noted in images with dense foliage or overlapping leaves, pointing to potential improvements through multi-scale feature learning or context-aware attention mechanisms.

    2. MobileNetV2 Classification Results

    MobileNetV2 was trained in two segments. During the first stage, the pretrained convolutional base was frozen, and the classifier head was trained for 15 epochs. In the second phase, the full model was unfrozen and fine-tuned for an additional 10 epochs. The model achieved a training accuracy of 95.27%, a validation accuracy of 95.22%, and a test accuracy of 95.11%, with a final test loss of 0.1551. As depicted in Figure 5(a), the validation performance improved significantly after unfreezing the full network.

    Figure 5(a) Training and validation accuracy for MobileNetV2

    Figure 5(b) shows smooth and converging loss curves with minimal difference between training and validation losses, indicating strong generalization.

    Figure 5(b). Training and validation loss for MobileNetV2

    Misclassifications were primarily observed among species with visually similar leaf textures and shapes. These errors suggest possible gains from incorporating attention-based feature refinement or multi-view image inputs. Overall, the results confirm that YOLOv5 provides reliable leaf detection, while MobileNetV2 effectively classifies species from cropped regions. The two-stage design enables accurate and scalable medicinal plant identification through coordinated detection and classification.

Table 1. Performance comparison with state-of-the-art methods

Method             | Dataset Size | No. of Classes | Detection mAP | Classification Accuracy
Valdez et al. [1]  | Not reported | 4              | 83%           | Not reported
Abdollahi [4]      | 3,000        | 30             | Not reported  | 98.05%
Zhang et al. [11]  | 52,089       | 300            | Not reported  | 89.64%
Our Method         | 4,988        | 40             | 77.46%        | 95.13%

  5. DISCUSSION

This study implemented a two-stage recognition framework for medicinal plant leaves by leveraging YOLOv5's ability to precisely locate leaf regions and MobileNetV2's efficiency in classifying those regions. Evaluation results confirm that each model excels in its respective role, demonstrating the real-world applicability of our pipeline. In the detection phase, YOLOv5 achieved 98.22% precision and 75.88% recall, showing that most leaf instances are correctly identified while a minority are missed under challenging conditions such as cluttered backgrounds, overlapping foliage, and low lighting. Its F1 score reached 85.61% and its mean Average Precision measured 77.46% at an intersection-over-union threshold of 0.5, with a slight decrease to 76.27% when averaged over IoU values from 0.5 to 0.95. Visual inspection of sample outputs confirmed strong localization across a variety of leaf shapes and scene conditions. Nevertheless, the recall rate could improve by integrating multi-scale feature extraction or attention modules that focus on finer spatial details. On the classification side, MobileNetV2 delivered stable and high performance, reporting 95.27% accuracy during training, 95.22% on validation data, and 95.11% on the final evaluation dataset. Even after the fine-tuning phase, the classification model showed little sign of overfitting, as reflected by the stable alignment between training and validation accuracy. Some misclassifications did occur, especially when the leaves of different species shared strikingly similar vein structures or overall arrangements. These cases indicate that performance may be further enhanced by adopting strategies like contrastive learning or advanced techniques for separating similar features, which could help the model better distinguish between closely related categories.

The two-part architecture was purposefully designed to address semantic classification and spatial detection as separate, modular challenges. YOLOv5 handled the localization of leaf regions reliably, even under diverse lighting and background conditions. However, occasional misses were noted, particularly with leaves partially hidden or set in especially complex scenes; using multi-level feature maps or attention modules could help the detection stage become more sensitive to finer detail. MobileNetV2, meanwhile, continued to generalize effectively after full fine-tuning, performing well on classification across all test splits, though rare confusion persisted among species with nearly identical appearance. Importantly, the modular separation of detection and classification means enhancements can be made to either part independently, with no need to retrain the entire system. This flexible design also streamlines maintenance and future upgrades. Furthermore, the system's architecture allows for implementing adaptive confidence thresholds or fallback procedures when predictions are uncertain, supporting more robust real-world use. Both detection and classification stages meet real-time processing constraints based on inference tests with a desktop GPU, suggesting that only minor adjustments would be needed for deployment on mobile or embedded hardware. Even though direct user studies or large-scale field trials have not yet been conducted, the system is readily adaptable to such settings. Its straightforward nature and focus on species-level identification make it suitable for conservation, educational activities, and agricultural monitoring, particularly in environments where fast, accurate plant identification is critical. When compared to single-stage approaches that address only detection or only classification, this two-stage method provides a more balanced integration of precise localization and high classification accuracy, leading to superior overall results.

  6. CONCLUSION

In our work, we built a two-stage approach for recognizing medicinal plant leaves: YOLOv5 handles the detection, and MobileNetV2 takes care of the classification. Both models were trained on a carefully assembled collection of 4,988 images covering 40 species and then fine-tuned independently. The detection model consistently produced precise and accurate results, successfully identifying leaf regions across a broad range of backgrounds and lighting conditions. At the same time, the classification model demonstrated strong performance when tested on new, unseen images, reinforcing its ability to generalize beyond the training data. One of the key strengths of the system lies in its modular setup: since detection and classification are handled independently, each component can be improved or replaced without requiring a full system retraining. This architectural flexibility not only simplifies maintenance but also makes the overall framework more adaptable for different use cases, such as academic research, environmental education, or on-site identification tasks in the field.

Despite these strengths, there are still areas where the system could be refined. For instance, the detection model sometimes struggles with scenes that include cluttered backgrounds or partially obscured leaves. Enhancements such as multi-scale feature fusion or the addition of advanced attention modules could help address this limitation by improving the model's ability to detect smaller or less distinct leaf regions. In terms of classification, errors occasionally occurred between species with nearly identical shapes or venation patterns. Future versions of the system could explore structural analysis techniques, such as modeling the internal skeleton of the leaf or analyzing vein geometry, to better differentiate these visually similar species and improve the overall reliability of the pipeline. Expanding the dataset to cover broader geographic regions and seasonal conditions, along with incorporating learning techniques that improve feature separation, may further enhance the robustness of the current system. In future efforts, we plan to broaden the dataset accordingly and to conduct mobile-based user trials for real-world deployment and testing. This study provides a foundation for continued development of intelligent plant recognition systems grounded in deep learning and tailored for real-world use.

  REFERENCES

[1] D. Valdez, F. C. Garcia, D. Roque, R. L. Reonal, and A. B. Gadia, "YOLO-based medicinal plant recognition with a new image dataset," Indonesian Journal of Electrical Engineering and Computer Science, vol. 31, no. 1, pp. 238–246, Jan. 2023.

[2] B. Pushpa, S. S. Gowda, and R. Venkatesh, "A deep-learning hybrid model for classification of medicinal plant leaves," Indonesian Journal of Electrical Engineering and Computer Science, vol. 30, no. 1, pp. 159–165, Apr. 2023.

[3] S. M. Hajam, M. A. Lone, and M. A. Shah, "Medicinal plant recognition using leaf image: A survey of deep-learning and machine-learning approaches," Materials Today: Proceedings, vol. 72, pp. 1035–1040, 2023.

[4] H. Abdollahi, "Classification of medicinal plants using transfer learning," Indonesian Journal of Electrical Engineering and Computer Science, vol. 25, no. 1, pp. 191–198, Jan. 2022.

[5] S. Kavitha and D. S. S. Sai, "Real-time identification of medicinal plants using MobileNet model," Indonesian Journal of Electrical Engineering and Computer Science, vol. 26, no. 1, pp. 248–254, Apr. 2022.

[6] F. M. Md Zin, M. F. A. Hamid, and M. H. Marhaban, "Medicinal plant identification using deep-learning and data augmentation," IOP Conference Series: Materials Science and Engineering, vol. 917, p. 012007, 2020.

[7] A. Sachar and A. Kumar, "Ensemble deep-learning architecture for medicinal leaf identification," Biomedical Signal Processing and Control, vol. 74, p. 103524, 2022.

[8] R. Dwivedi, A. Agrawal, and R. Bansal, "Progressive transfer learning and SVM for medicinal plant identification," Multimedia Tools and Applications, vol. 82, pp. 16773–16791, 2023.

[9] P. Saikia and R. K. Sarma, "Identification of some medicinal plants based on leaf using neural network," in Proc. 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, Mar. 2021, pp. 245–249.

[10] S. Manoharan, "Two-stage herbal plant recognition using deep knowledge-based decision fusion," Indonesian Journal of Electrical Engineering and Computer Science, vol. 24, no. 3, pp. 1513–1520, Dec. 2021.

[11] Q. Zhang, Y. Wang, Y. Yu, and Y. Bai, "TCMP-300: A large-scale benchmark dataset for traditional Chinese medicinal plant recognition," Data in Brief, vol. 47, p. 108993, 2025.

[12] B. Pushpa and D. Rani, "DIMPSAR: Indian medicinal plant leaf and species image dataset," Data in Brief, vol. 50, p. 109497, 2023.

[13] M. Banala and R. Duvvuru, "YOLO-based detection of turmeric leaf disease using image processing," Materials Today: Proceedings, 2025. [Online]. Available: https://doi.org/10.1016/j.matpr.2025.03.165.