DOI : https://doi.org/10.5281/zenodo.19878445
- Open Access

- Authors : Priyanka, Kothagundi Sardhar, Sakinala Pavan Ajay, Abbineni Roshan Chowdary, Bavisetti V Venkata Sai Ganga Bulli Raju, Devika S, Mahammad Abujar
- Paper ID : IJERTV15IS042941
- Volume & Issue : Volume 15, Issue 04 , April – 2026
- Published (First Online): 29-04-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Deep Learning-Based Breed Classification of Cattle and Buffaloes using EfficientNetV2-S
(1) Priyanka, (2) Kothagundi Sardhar, (3) Sakinala Pavan Ajay, (4) Abbineni Roshan Chowdary, (5) Bavisetti V Venkata Sai Ganga Bulli Raju,
(6) Devika S, (7) Mahammad Abujar
(1,2,3,4,5,6,7) Department of Computer Science and Engineering,
Lovely Professional University, Phagwara, Punjab, India
Abstract – Accurate breed identification enables breeding initiatives, insurance procedures, and herd management applications; however, breed assessment in the field remains laborious and relies on human judgment. This paper proposes an image classification model for breed prediction in bovines based on EfficientNetV2-S. We experiment with 2,427 images from a publicly available cattle-and-buffalo dataset; our study dataset covers ten cattle breeds and is partitioned into training, validation, and testing sets using a stratified 70/15/15 split. Our breed prediction model comprises a pre-trained EfficientNetV2-S backbone, a regularized classifier, focal loss, re-weighting, and extensive augmentation to compensate for data scarcity and class imbalance. Training employs the AdamW optimizer, cosine decay, and early stopping. The proposed architecture achieves 90.93% validation accuracy and 90.68% test accuracy with test-time augmentation. Macro and weighted F1 scores on the test set are both 0.91, while class-wise F1 scores lie between 0.79 and 0.98. Misclassifications occur mainly among morphologically similar Zebu breeds, namely Gir, Sahiwal, and Tharparkar.
Index Terms – Breed classification, EfficientNetV2-S, transfer learning, livestock vision, imbalanced learning
-
Introduction
Breed identification is a crucial part of cattle documentation, breeding, conservation, and pricing. In India, where cattle breeds are highly diverse and breed registration is difficult to carry out, visual recognition is still common practice, although it relies heavily on subjective criteria and practical experience. An objective, image-based method for breed identification could therefore expedite this process.
From a computer vision standpoint, breed identification is a fine-grained classification task. Many breeds look alike in body color, hump shape, horn direction, and facial characteristics, while field photos vary widely in camera angle, lighting, and occlusion. A useful approach should capture these subtle differences without becoming too complex to deploy, especially for farming applications that may require inference on an ordinary machine or even a mobile device.
The current literature demonstrates the effectiveness of deep learning methods for livestock identification; however, three major challenges keep emerging in these studies: small datasets, imbalanced classes, and insufficient representation of indigenous Indian breeds. The present research draws on the Kaggle cattle-and-buffalo dataset as its data source; the subset chosen for evaluation comprises ten cattle breeds.
Our three key contributions are: 1) a transfer learning pipeline based on EfficientNetV2-S for classifying cattle into ten breed classes; 2) a technique for handling data imbalance using focal loss, weighted sampling, and multi-level augmentation; and 3) a concise evaluation process covering overall accuracy, class-wise metrics, the confusion matrix, and the impact of test-time augmentation. The rest of the paper is structured as follows. Section II summarizes the existing literature, Section III presents the methodology, Section IV describes the system architecture, Section V reports the results, and Section VI concludes and discusses implications.
-
Literature Review
Breed identification is a major challenge in animal breeding owing to its applicability in traceability, conservation of genetic material, product certification, and herd management. Current efforts develop approaches for breed identification through genomics and image analysis based on specific markers within cattle breeds. Both lines of work aim to reduce reliance on visual inspection and to enhance scalability [1], [2].
As regards genomic breed identification, Kasarda et al. [1] examined 17 cattle breeds using 50K SNP data and analyzed various feature-selection approaches, including FST, PCA, and random-forest-based methods. They found that random forests achieved high classification accuracy, and that different importance measures may yield an identical informative SNP profile, suggesting that a minimal set of markers can suffice for efficient identification.
Kumar et al. [2] furthered this research area by considering Tharparkar cattle and demonstrating that an identification system using ultra-low-density SNP panels can yield similarly high identification rates. Combining GWAS, PCA, and machine learning techniques, they shortened a large list of candidate genes to two small panels of 23 and 48 SNPs. This research is particularly significant because it shows that breed identification can be made more affordable and feasible through small genomic panels. Another relevant study further validates the hypothesis that discriminative target selection can enhance the efficiency of biological identification systems: Yang et al. [3] developed an algorithm that designs typing assays from whole-genome sequences by identifying the most informative loci. Although this approach lies beyond the immediate scope of our study, it underscores the same general concept.
Regarding computer vision techniques, Shojaeipour et al. [4] developed a novel two-stage framework for cattle biometric identification in which the muzzle region is first detected and few-shot deep transfer learning is then applied for identification. This approach proved that cattle can be identified automatically even with a minimal number of pictures per animal.
Gupta et al. [5] proposed a machine-vision algorithm for automatic identification of dairy cow breeds based on the YOLOv4 framework, using an Internet-mined dataset of eight breeds. While the system performed well in classification accuracy, its stability could be further improved through frame-to-frame video tracking. These results are significant because they show both the advantages and the drawbacks of single-shot recognition.
Duraiswami et al. [6] represent an earlier stage of cattle breed recognition research, in which image processing and classical machine-learning methods were used for breed detection and categorization. Their study usefully illustrates the progression from handcrafted visual descriptors to more advanced deep-learning methods. However, classical image-processing methods are generally less robust to variations in pose, illumination, background clutter, and partial occlusion, which are common in real farm environments.
In general, the literature shows continuous advancement in genomics-based and imaging-based cattle identification. Nevertheless, several issues remain to be resolved. In particular, robustness to harsh farming conditions is insufficient; imaging-based identification suffers from blurring, occlusion, complex backgrounds, and pose variability [4]–[6]. Breed generalization is another difficulty, given that genetically or physically similar breeds can be hard to distinguish [1], [2]. There is also, as yet, no integrated pipeline that combines breed detection, breed identification, and decision-making for deployment. These issues necessitate developing compact deep learning models that can accommodate class imbalance in real-world cattle breed classification.
Proposed Methodology
A. Dataset and Preprocessing
Experiments were conducted on a handpicked subset of the Kaggle Cattle and Buffaloes dataset. This curated dataset consists of 2,427 images of ten cattle breeds: Gir, Holstein Friesian, Jersey, Konkan Kapila, Mewati, Ongole, Shweta Kapila, Tharparkar, Ponwar, and Sahiwal. The breeds range from exotic to indigenous, with large variations in viewpoint, posture, background, and lighting conditions.
TABLE I
Breed-Wise Image Counts in the Curated Dataset

Breed              Images
Gir                207
Holstein Friesian  326
Jersey             162
Konkan Kapila      226
Mewati             232
Ongole             231
Shweta Kapila      229
Tharparkar         196
Ponwar             202
Sahiwal            416
Total              2427
The split was stratified into 70/15/15 ratios, yielding 1,696 training images, 366 validation images, and 365 test images. The dataset exhibits mild class imbalance: the largest class is Sahiwal and the smallest is Jersey, an imbalance ratio of 2.57:1. All images were resized to 320 × 320, and a random 288 × 288 crop was taken for training.
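As a concrete illustration, the stratified split can be sketched in plain Python: group image indices by breed, shuffle each group with a fixed seed, and slice each group by ratio. This is a sketch of the protocol, not the authors' code; the function name and seed are illustrative.

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Split sample indices into train/val/test sets, preserving the
    per-class proportions. `labels` holds one class name per image;
    returns three lists of indices."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train, val, test = [], [], []
    for lab, idxs in by_class.items():
        rng.shuffle(idxs)
        n = len(idxs)
        n_tr = round(n * ratios[0])   # 70% of this class
        n_va = round(n * ratios[1])   # 15% of this class
        train += idxs[:n_tr]
        val += idxs[n_tr:n_tr + n_va]
        test += idxs[n_tr + n_va:]    # remainder (~15%)
    return train, val, test
```

Applied to the Table I counts, per-class rounding of this kind yields splits close to the reported 1,696/366/365 images; the exact totals depend on how each class's fractional counts are rounded.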
The augmentation policy was designed to enhance robustness under realistic imaging scenarios. Geometric transformations included horizontal flips, affine transformations, and perspective warps; photometric transformations included brightness, contrast, and color changes; degradation transformations applied blurring and added noise. CoarseDropout was included to prevent the network from becoming overly dependent on localized texture information.
TABLE II
Training-Time Augmentation Policy

Group           Operations
Geometric       Flip, shift, scale, rotation, perspective
Photometric     Color jitter, HSV/RGB shift, brightness, contrast
Degradation     Noise, blur, motion blur
Regularization  Coarse dropout
The adoption of this policy aims to recreate field-like conditions in which image capture is hand-held, the view is not frontal, and there is some motion artifact and partial occlusion. In other words, the network is forced to depend on structural information at the breed level.

B. Model Architecture

The classification network employs EfficientNetV2-S as the backbone owing to its favorable accuracy-efficiency ratio within the EfficientNet family. The globally average-pooled 1280-dimensional feature is normalized through batch norm, followed by dropout (p = 0.5), a dense layer (1280→1024), a GELU non-linearity, dropout (p = 0.4), and a linear layer for classification into ten categories.

TABLE III
Training Configuration

Item                   Setting
Input resolution       320 × 320, crop to 288 × 288
Backbone               EfficientNetV2-S
Batch size             16
Optimizer              AdamW
Initial learning rate  5 × 10⁻⁴
Weight decay           10⁻⁴
Scheduler              Cosine annealing
Loss                   Focal loss (γ = 2)
Sampling               Weighted random sampler
Early stopping         Patience = 6 epochs
Inference              TTA with 5 passes
Fig. 1. Classifier architecture used for breed prediction: input image (320 × 320) → EfficientNetV2-S feature extractor → BatchNorm + Dropout → FC (1280→1024) + GELU + Dropout → FC (1024→10) → breed logits.
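The classification head of Figure 1 can be sketched in PyTorch as follows. The backbone is stubbed out here (only its 1280-dimensional pooled output matters for the head); the class name is illustrative, and layer sizes follow the paper.

```python
import torch
import torch.nn as nn

class BreedHead(nn.Module):
    """Regularized classifier head over EfficientNetV2-S pooled features:
    BatchNorm -> Dropout(0.5) -> 1280->1024 -> GELU -> Dropout(0.4) -> 1024->10."""
    def __init__(self, in_dim=1280, hidden=1024, num_classes=10):
        super().__init__()
        self.head = nn.Sequential(
            nn.BatchNorm1d(in_dim),          # normalize pooled features
            nn.Dropout(0.5),
            nn.Linear(in_dim, hidden),
            nn.GELU(),
            nn.Dropout(0.4),
            nn.Linear(hidden, num_classes),  # breed logits
        )

    def forward(self, feats):
        return self.head(feats)

head = BreedHead()
logits = head(torch.randn(4, 1280))  # batch of 4 pooled feature vectors
```

In the full model, `feats` would come from `torchvision`'s EfficientNetV2-S feature extractor after global average pooling.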
C. Training Strategy
Training used the AdamW optimizer with an initial learning rate of 5 × 10⁻⁴, weight decay of 10⁻⁴, a cosine annealing schedule, and early stopping with a patience of six epochs; the EfficientNetV2-S backbone was initialized from pretrained weights rather than trained from scratch. Focal loss together with a WeightedRandomSampler was employed to handle class imbalance. The focal loss is defined as

L_focal = −α_t (1 − p_t)^γ log(p_t),   (1)

where p_t is the predicted probability of the true class and α_t the class weighting factor, with the exponent γ = 2 focusing training on difficult examples. Gradient magnitudes were clipped to 1.0, and the batch size was set to 16. The checkpoint with the best validation accuracy was selected. During inference, five stochastic augmentations of each test image were performed.
Focal loss along with weighted sampling is essential for this particular dataset since imbalance exists on both the class level and the mini-batch level. Weighted sampling helps the model focus more on the smaller classes, whereas focal loss ensures that the decision boundary is not overwhelmed by the easy cases belonging to the majority class.
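The effect of the modulating term in Eq. (1) can be seen in a few lines of Python. This is a didactic single-sample sketch, not the training code.

```python
import math

def focal_loss(p_t, alpha_t=1.0, gamma=2.0):
    """Focal loss for one sample, following Eq. (1):
    L = -alpha_t * (1 - p_t)**gamma * log(p_t),
    where p_t is the predicted probability of the true class and
    alpha_t its class weight. With gamma = 0 this reduces to
    alpha-weighted cross-entropy."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# With gamma = 2, a confidently-correct ("easy") sample contributes a
# tiny loss, so easy majority-class examples cannot dominate training.
easy = focal_loss(0.95)  # small loss
hard = focal_loss(0.30)  # much larger loss
```

In the actual pipeline this per-sample term would be averaged over a mini-batch drawn by the WeightedRandomSampler, so minority classes are both seen more often and weighted more heavily.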
D. Implementation Details

The training pipeline was built with PyTorch and run on Tesla T4 GPUs. Model selection was driven solely by validation accuracy, and the best model was kept along with its class mapping to ensure label consistency during the test phase.
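The early-stopping and checkpoint-selection logic described above can be sketched as a minimal helper; the class name and the weight-saving hook are illustrative, not the authors' code.

```python
class EarlyStopper:
    """Track validation accuracy across epochs; remember the best epoch
    and signal a stop after `patience` epochs without improvement."""
    def __init__(self, patience=6):
        self.patience = patience
        self.best_acc = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_acc):
        """Record one epoch's validation accuracy.
        Returns True when training should stop."""
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.best_epoch = epoch
            self.bad_epochs = 0
            # here one would also save the model weights + class mapping
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Feeding in the validation accuracies of Table IV, the tracker's best checkpoint lands on epoch 11 (85.16%), and no run of six non-improving epochs occurs within that window.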
-
System Architecture
The entire pipeline is structured as a succinct four-stage process: dataset preparation, image transformation, model training, and deployment-oriented inference. Images are organized by breed before being split with a stratified technique. Data augmentation and imbalance-aware sampling are used on the training path, while deterministic preprocessing is carried out on the evaluation path. Class probabilities are generated by EfficientNetV2-S. Figure 2 summarizes the workflow.

Fig. 2. Compact workflow of the proposed breed classification system: curated breed images and stratified split → resize, crop, augmentation, and ImageNet normalization → EfficientNetV2-S training with focal loss and AdamW → per-class evaluation, TTA, and deployment-ready inference.
Two key choices must be made when implementing this solution. The first is to keep the same preprocessing definition during validation, testing, and inference, ensuring consistency between predicted scores and the evaluation method. The second is to retain the optimal validation checkpoint along with the label mapping for direct top-k inference on single images.
The decoupling of heavy training from fast inference is valuable for farm-centric applications. Model training happens once, with optimization and selection, while deployment only involves loading the images, normalizing them, running a forward pass, and scoring the resulting probabilities. This makes the process compatible with existing desktop dashboards, smartphone applications, or edge computing solutions with minimal modification.
In an actual deployment, the inference engine can output not only the predicted best breed but also its alternatives along with their probabilities. This helps when the classification of a sample is uncertain, where it may be more appropriate to send the sample for manual inspection than to make a conclusive decision.
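Extracting such a ranked shortlist is a one-liner over the class-probability vector; the breed names and probabilities below are illustrative, not model outputs.

```python
def top_k_breeds(probs, k=3):
    """Return the k most likely breeds with their confidence scores,
    for human-in-the-loop review. `probs` maps breed name -> probability."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical softmax output for one image:
scores = {"Gir": 0.41, "Sahiwal": 0.35, "Tharparkar": 0.18, "Ongole": 0.06}
shortlist = top_k_breeds(scores, k=2)  # [('Gir', 0.41), ('Sahiwal', 0.35)]
```

A simple routing rule on top of this, such as flagging any image whose top score falls below a threshold, implements the manual-inspection fallback described above.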
-
Results
TABLE IV
Per-Epoch Training and Validation Metrics

Epoch  Train Loss  Train Acc (%)  Val Loss  Val Acc (%)
1      0.6391      67.37          0.3460    75.27
2      0.8444      58.42          0.3397    78.85
3      0.5671      71.73          0.5030    76.92
4      0.4434      76.68          0.7385    73.63
5      0.3912      78.39          0.3077    79.40
6      0.3710      80.45          0.3622    81.59
7      0.3495      80.74          0.2985    82.97
8      0.3598      80.09          0.4315    80.22
9      0.2557      84.45          0.4191    77.47
10     0.2769      84.28          0.2751    84.07
11     0.2468      85.75          0.2556    85.16
-
Training Dynamics
The per-epoch losses and accuracies for the training and validation sets over the first 11 epochs are listed in Table IV and plotted in Figure 3. The training loss fell from 0.6391 at epoch 1 to 0.2468 at epoch 11, while training accuracy rose from 67.37% to 85.75% and validation accuracy from 75.27% to 85.16%. The validation loss did not behave monotonically, owing to the stochasticity of the augmented training.
Fig. 3. Training and validation curves over 30 epochs. (Left) The training loss gradually decreases toward 0.06, whereas the validation loss shows the typical oscillations caused by extensive data augmentation. (Right) Training accuracy peaks at approximately 96%, and validation accuracy stabilizes at 88–90%.
The accuracy curves (Figure 3, right) exhibit strong convergence behavior. By the twentieth epoch, training accuracy exceeded 90%, with a peak of about 97% toward the end of training. Validation accuracy followed a similar pattern, reaching its maximum of around 90.93% at the twenty-fourth epoch. The small gap between the two implies successful regularization of the EfficientNetV2-S backbone through dropout and augmentation.

The quick progress in the first couple of epochs, where validation accuracy rose from 75.27% to 78.85% between epoch 1 and epoch 2, reflects the ImageNet-21k pretrained backbone's proficiency in capturing low- and mid-level visual information.
-
Test-Set Performance
The best-performing checkpoint was evaluated on the test set of 365 images. Table V compares single-pass inference with test-time augmentation using five stochastic forward passes.

TABLE V
Test-Set Accuracy With and Without TTA

Inference Mode             Accuracy (%)
Single inference (no TTA)  90.41
TTA (n = 5 passes)         90.68
TTA yielded only a minor improvement of 0.27 percentage points over single inference. This is likely because the validation and test preprocessing pipeline uses only deterministic resizing and ImageNet normalization, whereas TTA re-introduces some of the stochastic noise used during training.
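The five-pass TTA used here amounts to averaging predicted class probabilities over stochastic augmentations of each image. A minimal numpy sketch follows; the function and argument names are illustrative.

```python
import numpy as np

def tta_predict(predict, augment, image, n_passes=5):
    """Average the class-probability vector over n stochastic augmentations
    of one image. `predict` maps an image to a probability vector;
    `augment` applies a random training-style transform."""
    probs = np.stack([predict(augment(image)) for _ in range(n_passes)])
    return probs.mean(axis=0)  # still a valid probability vector
```

Because the mean of probability vectors is itself a probability vector, the averaged output can be ranked or thresholded exactly like a single-pass prediction.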
-
Per-Class Metrics
Table VI reports precision, recall, F1-score, and support for each of the ten breeds on the test set.
TABLE VI
Per-Class Classification Report on the Test Set (365 Samples)

Class              Precision  Recall  F1    Support
Gir                0.88       0.94    0.91  31
Holstein Friesian  0.92       0.96    0.94  51
Jersey             0.91       0.81    0.86  26
Konkan Kapila      0.89       0.97    0.93  34
Mewati             0.94       1.00    0.97  33
Ongole             0.81       0.88    0.84  33
Shweta Kapila      0.97       0.88    0.92  33
Tharparkar         0.84       0.75    0.79  28
Ponwar             0.99       0.97    0.98  30
Sahiwal            0.91       0.88    0.89  66
Macro avg          0.91       0.90    0.90  365
Weighted avg       0.91       0.91    0.91  365
Accuracy                              0.91  365
Ponwar and Mewati exhibited the highest F1-scores, 0.98 and 0.97 respectively. Mewati attained a flawless recall of 1.00, in line with its distinctive coat pattern and physical appearance. The Holstein Friesian (F1 = 0.94) and Konkan Kapila (F1 = 0.93) breeds also did well. However, Tharparkar (F1 = 0.79) and Ongole (F1 = 0.84) produced the lowest scores; both are white or light-grey Zebus with similar phenotypes, and the confusion matrix shows mutual misclassifications between them. The Gir breed had excellent recall (0.94) but lower precision (0.88), as samples of other breeds were sometimes misclassified as Gir.
-
Comparison with Prior Work
Table VII situates the proposed architecture among existing works on cattle and livestock breed classification. Where the same dataset was not used, statistics are taken from other livestock classification datasets with similar numbers of classes or dataset sizes.
TABLE VII
Comparison with Prior Methods on Cattle/Livestock Breed Classification

Method              Backbone                                   Classes  Accuracy (%)
Khan et al. [7]     Hybrid CNN + ML model                      5        83.50
Warhade et al. [8]  Attention-based TL (ResNet/EfficientNet)   6        90.00
Manoj et al. [9]    CNN (custom)                               n/a      82.00
Proposed            EfficientNetV2-S                           10       90.68

Some prior works focus on cattle identification rather than breed classification; datasets and evaluation protocols vary.

The proposed architecture performs better than all the baselines discussed above. Compared with the closest prior result (EfficientNetB4, 90.00%), our approach improves accuracy by 0.68 percentage points despite using a lighter backbone, owing to focal loss and weighted random sampling together with the stronger ImageNet-21k pretraining of EfficientNetV2-S.

-

Conclusion

In this research, a transfer-learning approach based on the EfficientNetV2-S architecture was developed for automated breed recognition of Indian cattle. With an input dataset of 2,427 images split into stratified 70%/15%/15% training, validation, and testing sets, the trained model attained 90.93% validation accuracy and 90.68% test accuracy with test-time augmentation. The focal loss function, weighted random sampling, and multi-level augmentation proved effective against class imbalance and the variability of natural images. Visually distinctive breeds such as Ponwar and Mewati were classified with high accuracy (F1-scores of 0.98 and 0.97, respectively), whereas morphologically similar Zebu breeds (Tharparkar, Sahiwal, and Gir) continued to be problematic. Lightweight inference allows the model to be ported to edge devices in constrained environments. Limitations of this approach include the limited dataset, the absence of buffalo breed classification, and reduced robustness in fully unconstrained scenarios.

-

Applications

The proposed framework can support digital herd registration, breed-specific advisory systems, and insurance documentation by determining breed quickly from an image, without any need for expert opinion. All inference needs are fulfilled by a photo and a saved checkpoint, which allows seamless integration into edge-based mobile applications used in low-resource settings.

As far as the breeding process is concerned, the proposed system can act as an initial visual classifier to be validated by domain experts, saving the time and effort otherwise spent on manual labeling. Providing the top-k breeds along with confidence scores supports human-in-the-loop processing of results.

-

References

[1] R. Kasarda, N. Moravčíková, G. Mészáros, M. Simcic, and D. Zaborski, "Classification of cattle breeds based on the random forest approach," Livestock Science, vol. 267, p. 105143, 2023, doi: 10.1016/j.livsci.2022.105143.
[2] H. Kumar, M. Panigrahi, D. Seo, S. Cho, B. Bhushan, and T. Dutt, "Machine learning-aided ultra-low-density single nucleotide polymorphism panel helps to identify the Tharparkar cattle breed," OMICS: A Journal of Integrative Biology, vol. 28, no. 10, pp. 514–525, 2024, doi: 10.1089/omi.2024.0135.

[3] J. Y. Yang, S. Brooks, J. A. Meyer, R. R. Blakesley, A. M. Zelazny, J. A. Segre, and E. S. Snitkin, "Pan-PCR, a computational method for designing bacterium-typing assays based on whole-genome sequence data," Journal of Clinical Microbiology, vol. 51, no. 3, pp. 752–758, 2013, doi: 10.1128/JCM.02525-12.

[4] A. Shojaeipour, G. Falzon, P. Kwan, N. Hadavi, F. C. Cowley, and D. Paul, "Automated muzzle detection and biometric identification via few-shot deep transfer learning of mixed breed cattle," Agronomy, vol. 11, no. 11, p. 2365, 2021, doi: 10.3390/agronomy11112365.

[5] H. Gupta, P. Jindal, O. P. Verma, R. K. Arya, A. A. Ateya, N. F. Soliman, and V. Mohan, "Computer vision-based approach for automatic detection of dairy cow breed," Electronics, vol. 11, no. 22, p. 3791, 2022, doi: 10.3390/electronics11223791.

[6] N. R. Duraiswami, S. Bhalerao, A. Watni, and C. N. Aher, "Cattle breed detection and categorization using image processing and machine learning," in International Conference on Advancements in Smart, Secure and Intelligent Computing (ASSIC), 2022, pp. 1–6, doi: 10.1109/ASSIC55218.2022.10088326.

[7] S. S. Khan, N. V. Doohan, M. Gupta, S. Jaffari, A. Chourasia, K. Joshi, and B. Panchal, "Hybrid deep learning approach for enhanced animal breed classification and prediction," Traitement du Signal, vol. 40, no. 5, pp. 2087–2099, 2023, doi: 10.18280/ts.400526.

[8] R. Warhade, I. Devi, N. Singh, S. Arya, K. Dudi, S. S. Lathwal, and D. S. Tomar, "Attention module incorporated transfer learning empowered deep learning-based models for classification of phenotypically similar tropical cattle breeds (Bos indicus)," Tropical Animal Health and Production, vol. 56, no. 6, p. 192, 2024, doi: 10.1007/s11250-024-04050-7.

[9] S. Manoj, R. S, and K. V, "Identification of cattle breed using the convolutional neural network," in 3rd International Conference on Signal Processing and Communication (ICPSC), 2021, pp. 503–507, doi: 10.1109/ICSPC51351.2021.9451706.
