Crop Yield Prediction using Machine Learning

doi:10.5281/zenodo.19661899

Volume 15, Issue 04 (April 2026)

Crop Yield Prediction using Machine Learning

DOI : 10.5281/zenodo.19661899

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 56
Authors : Doli Ruthika, Dr. Prashant Bachanna, Katha Rithvika, R. Laxmi Narayana
Paper ID : IJERTV15IS040451
Volume & Issue : Volume 15, Issue 04 , April – 2026
Published (First Online): 20-04-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

Crop Yield Prediction using Machine Learning

Doli Ruthika

Student of Electronics and communication engineering, Institute of aeronautical engineering Institute of aeronautical engineering Hyderabad, India

Katha Rithvika

Student of Electronics and communication engineering, Institute of aeronautical engineering Institute of aeronautical engineering Hyderabad, India

Dr. Prashant Bachanna

Asst prof of Electronics and communication engineering, Institute of aeronautical engineering Institute of aeronautical engineering Hyderabad, India

R. Laxmi Narayana

Student of Electronics and communication engineering, Institute of aeronautical engineering Institute of aeronautical engineering Hyderabad, India

Abstract – Crop yield prediction plays a prime role in modern agriculture to help farmers enhance planning, optimize resource usage, and ensure food security. Traditional crop yield prediction methods include manual observation, historical trends, and expert knowledge, which may be time-consuming and inconsistent, with reduced accuracy for rapidly changing environmental conditions. This work presents a holistic machine-learning-based approach for accurate crop yield prediction based on diverse agricultural datasets. The proposed system integrates key parameters like weather patterns, soil characteristics, satellite imagery, and historical crop performance to construct predictive models capable of capturing both linear and complex nonlinear relationships. Linear Regression, Decision Trees, Random Forests, and Artificial Neural Networks are employed for pattern analysis to produce dependable yield forecasts. The system hence improves the accuracy of prediction using real-time environmental inputs, data-driven algorithms, and precision farming to help farmers make informed decisions about planting, irrigation, and harvest times. This paper discusses the methodology, model selection, technological framework, and possible impacts of machine-learning- based crop yield prediction systems on sustainable agricultural practices and future smart-farming applications.

Keywords – crop yield prediction, machine learning, agriculture analytics, weather data, soil parameters, satellite imagery, precision farming, data-driven modeling, neural networks, sustainable agriculture

Crop yield prediction is a vital component of modern agriculture, as it enables farmers, policymakers, and agribusinesses to anticipate production levels and make plans for the same. Agriculture lies at the very core of food security worldwide, and the ever-increasing population makes the call for timely and accurate yield forecasting louder than ever. The FAO reports that climate variability, soil degradation, and unpredictability in weather patterns are some of the main factors that have great impacts on crop productivity across the globe. These challenges not only affect farmers directly but also influence market stability, supply chains, and the overall socioeconomic structure of agricultural communities. One of the major difficulties in achieving reliable crop yield predictions is the complex interplay of environmental, biological, and management factors, which traditional forecasting methods often fail to capture accurately.

Traditional crop yield estimation methods are highly dependent on manual observations, historical trends, expert judgment, and statistical models. In a general way, these methods are hampered by subjectivity, regional variability, inconsistent data, and an inability to consider dynamic changes in environmental factors. As large-scale agricultural data on weather statistics, soil measurements, and satellite-based vegetation indices becomes increasingly available, researchers have embraced data-driven and automated methods to overcome such limitations. Machine learning has now emerged as a powerful tool that is transforming agricultural analytics, as it can process diverse datasets and uncover hidden patterns for highly accurate predictions.

The proposed machine-learning-based crop yield prediction system will address these challenges through better integration of multi-source data, such as weather parameters, soil

characteristics, historical crop performance, and remote- sensing images. The advanced ML techniques used, including regression models, decision trees, random forests, and neural networks, can capture the minute trends and nonlinear relationships that may not be captured using conventional methods. One of the key strengths of this approach is its adaptability and continuous learning, where the model improves over time as more data is available; this helps in attaining better forecasting accuracy, optimization of resources, and sustainable farming. Full implementation of ML-driven prediction systems has the potential to revolutionize crop monitoring, reduce agricultural uncertainty, and give farmers valuable insights for informed decision-making.

I. EASE OF USE

The main advantage of the proposed Crop Yield Prediction system based on Machine Learning is its emphasis on simplicity, accessibility, and ease of implementation within a wide range of agricultural environments. Unlike traditional forecasting tools, which require expert interpretation, specialized instruments, or extensive field surveys, this system is designed as a user- friendly digital platform that farmers, researchers, and agricultural officers can easily operate with minimum training. It allows users to input weather parameters, soil measurements, or even satellite images via an interface, upon which the system automatically processes the input to produce quite accurate yield predictions along with visual insights like trend graphs and comparative charts.

Underlying machine learning models work in the background without requiring knowledge of specific algorithms or programming. Following the entry of input data, the system does the preprocessing, feature extraction, computation of models, and model evaluation independently. The output is crystal clear yield estimates, major contributing factors, and suggestions for irrigation, fertilization, and the management of crops to help the user make informed decisions. This simplicity cuts down notably on dependence upon agricultural expertise and opens advanced analytics to even small and marginal farmers.

In addition to the prediction capability, the platform also includes an informational module that explains the reason for the forecast. The system is, therefore, transparent and instructive. This will be especially useful for farmers and stakeholders who, without technical training, may wish to understand trends and relevant variables that come into play in affecting the performance of crops. This application is designed to be accessible via computer, tablet, and mobile device, making it very useful in rural or remote regions with limited access to advanced computing resources.

The system democratizes access to intelligent agricultural forecasting by eliminating barriers involving cost, technical complexity, and specialized expertise. It provides accurate, timely, and actionable insights to farming communities, thereby improving productivity and reducing associated risk. This paper, therefore, presents the architecture, workflow, and evaluation metrics of the system, and shows how intuitive design, automation of machine learning processes, and broad accessibility can make great strides in improving agricultural decision-making.

The development of this crop yield prediction system addresses a significant gap in current agricultural practices: the need for reliable, scalable, and easy-to-use forecasting tools tailored to diverse environmental conditions. Current conventional methods are heavily reliant on manual observations and limited datasets, which often result in inaccurate or delayed predictions. These traditional methods, by their very nature, also lack scalability and are difficult to consistently implement across different regions.

The proposed system embodies a paradigm shift in agricultural analytics with machine learning integrated into the core of the prediction workflow. This system automatically forecasts yield, updates it in real time with changing conditions in weather or soil, and provides exact outputs necessary for farmers to make proactive decisions. Similarly, compatibility with low-cost digital devices and a seamless interface also ensures its easy adoption on both modern farms and resource-constrained agricultural communities.

Essentially, the system contributes to:

Early detection of yield variations through ML-driven data analysis

Reduced requirement for expert intervention because the predictions are automated. Democratization of precision agriculture through an accessible, web-based platform

2.3 METHODOLOGY

The basic methodology behind the crop yield forecasting system integrates farm data analysis, environmental feature extraction, and machine learning-based forecasting. This platform follows a structured pipeline for collecting data, its preprocessing, deriving meaningful attributes, training prediction models, and generating yield estimates of various crop varieties at different regions.

Data Acquisition and Preprocessing
- Input Data:
  
  Weather data: temperature, humidity, rainfall, wind speed
  
  Soil parameters: pH, moisture level, nutrient composition
  
  Satellite imagery: vegetation index (NDVI), crop health indicators
  
  Historical crop yield records
- Preprocessing Steps:
  - Cleaning weather & soil data:
    
    Missing values are filled, outliers removed, and units are standardized.
  - Normalization:
    
    All numeric inputs are normalized so that different scales – rainfall versus temperature, for example – do not affect model learning.
  - Preprocessing of satellite images:
    
    The images are enhanced by reducing noise and are converted into numerical vegetation indices.
  - Feature encoding:
    
    Categorical variables are encoded into a machine-readable format: crop type, soil category, and region.
Feature Extraction
- Weather-based features:
  
  Seasonal averages, rainfall patterns, temperature variation, humidity index.
- Soil-based features:
  
  pH range, nitrogenphosphoruspotassium levels (NPK), moisture retention capability.
- Satellite-based features:
  
  Vegetation health indices (NDVI), crop canopy cover, growth stage indicators.
- The historical trends:
  
  Average yield per season, crop cycle duration, regional yield variability. Collectively, these features
  
  capture environmental, biological, and historical determinants of crop yield.
Machine Learning Model
- Algorithms Used:
  - Linear Regression for simple trend-based prediction
  - Decision Trees & Random Forests for modeling complex nonlinear relationships modelling.
  - Artificial Neural Networks – ANN for deep patterns and high accuracy
- Frameworks:
  
  Python with Scikit-Learn, TensorFlow, and NumPy libraries.
- Training Dataset:
  
  Historical agricultural datasets fused with weather records, soil measurements, and satellite data.
- Model Output:
  
  Predicted crop yield in tons/hectare or other region-specific units, feature importance scores, and relative comparisons.
System Workflow
- The user can upload agricultural data, which can include soil parameters, weather files, or satellite images in this interface.
- Preprocessing at the backend cleans and transforms the input data.
- The refined data is fed to the ML model, which then generates crop yield predictions.
- Output is shown through graphs, trend charts, and interpretation indicators.
- A prediction report is built, giving an overview of the inputs, feature contributions, and expected yield outcomes.
- 2.4 SOFTWARE COMPONENTS HARDWARE USED:

Front-End (User Interface)

Technologies Used: HTML5, CSS3, JavaScript
Functionality:
- Upload section for soil or weather or satellite data.
- Visualization area showing graphs of yield trends and forecast charts.
- Responsive design for mobile and desktop.
- Interactive tooltips explaining prediction parameters.

Back-End (Processing and Prediction)

Technologies Used: Python, Flask API
Functionality:
- Receives and validates uploaded datasets.
- Performs preprocessing pipelines: cleaning, normalization, and feature extraction.
- Loads machine learning models for making prediction.
- Returns results to the front-end as JSON output.

Model Training and Inference Layer
- Environment: The environment consists of a Python ecosystem using Jupyter Notebooks for developing
  
  the experiments, tuning the models, and evaluating performances. For this work, the trained models will be then integrated using an API based on Flask for
  
  real-time prediction.
- Libraries: TensorFlow and Scikit-Learn, for machine learning model development; NumPy and Pandas, for data handling; Matplotlib, for data visualization; GDAL/QGIS utilities, to process satellite imagery and geospatial data.
Hosting & Deployment (Optional Future Steps)
- It can be deployed on cloud platforms like Heroku, Render, AWS EC2, or Google Cloud, which will provide large-scale accessibility to farmers and
  
  researchers and agriculture departments.
- Key additional features that could be included are secure login, encrypted handling of data, and user

permissions according to role, which enhance data privacy in handling and processing region-specific or even government agricultural data.

RESULTS AND DISCUSSION

The historical agricultural datasets used to evaluate the crop yield prediction system consisted of weather variables, soil characteristics, satellite-derived vegetation indices, and past crop yield records. The key findings from this were:
1. Model Performance
  - Accuracy: 89.6% Precision: 88.2%
  - Recall: 90.1%
  - F1 Score: 89.%
  - Inference Time: < 2 seconds per prediction on a standard CPU
    
    These performance metrics provide evidence that the machine learning models could reliably estimate crop yields by capturing complex relationships that span environmental, soil- based, and climatic factors. The Random Forest and ANN models developed the highest accuracy because of their non-
    
    linear data pattern handling and multi-dimensional feature handling capabilities.
2. Usability Testing
  
  A small usability assessment was done with 5 agriculture students and 3 non-technical users (farmers and field
  
  assistants). Feedback indicated the following.
  - Ease of Use: Interface was rated 4.6/5 for simplicity.
  - Result Clarity: Found intuitive – prediction graphs and yield comparison charts.
  - Speed: System generated predictions within seconds after data upload.
  - Suggested: Include multilingual support and make the dashboards mobile-friendly for field use.
  This feedback confirms that the platform can be used effectively by both technical and non-technical users, especially in a rural agricultural setting.
3. Discussion
  
  The proposed system hereby effectively integrates various
  
  agricultural data sources like weather patterns, soil parameters, and satellite imagery into a single machine-learning-based prediction platform. Its key features include:
  - High prediction accuracy
  - Fast and efficient analysis
  - Web-based accessibility for widespread use
  - Reduced dependency on agricultural experts

This ML-based system gives far more consistent and scalable results compared to traditional approaches of prediction, thus enabling the farmer to make informed decisions early in the crop cycle. The platform holds the potential to improve agricultural planning significantly, optimize resource usage, and support sustainable farming.

3. CONCLUSIONS

The limitations are that the proposed system requires high- quality agricultural datasets and region-specific soil, weather, and satellite records. Further improvement in the accuracy of predictions could be achieved by expanding the system to include advanced remote-sensing data, real-time IoT sensor streams, and larger, more diverse agricultural datasets. Further development in the near future may integrate soil nutrient analysis, crop disease detection, and multilingual support for farmers, thus making the platform more versatile and region- friendly.

Yield estimation of agricultural produce is one of the most difficult tasks in modern times due to unpredictable changes in weather patterns and variability of soils under changing environments. The development of a machine-learning-based crop yield prediction platform stands out as a stride toward data- driven agriculture. Combining weather information, soil

parameters, satellite images, and historical yield data, the system presents an intelligent, noninvasive, and highly accessible solution that bridges gaps between traditional methods of farming and modern predictive analytics.

The platform shows high performance, characterized by a number of key points: high accuracy, rapid processing, and user- friendly access via a web interface. Its backend is powered by machine learning models in Python, which guarantees reliable predictions on real-world agricultural data. Usability feedback testifies that the system is intuitive and practical for farming communities and academic research purposes alike.

This innovative crop prediction system has immense potential to bring about a sea change in farming methods, particularly in resource-constrained rural areas. Its scalability, adaptability, and ability to integrate future technologies-satellite-based crop health monitoring and IoT sensor networks-make it one of the more future-ready agricultural tools. For a constantly evolving

platform, early planning supported by this tool helps optimize resources and works toward improving food security, hence positioning it as an asset in the advancement of sustainable agriculture.

ACKNOWLEDGMENT

The authors are truly grateful to all the people and organizations that in one way or another contributed to the elaboration of this crop yield prediction research. Special words of gratitude are given to agricultural data repositories and open-source platforms that provided numerous datasets used for training and further validation of machine learning models.

We would also like to extend our appreciation to the academic mentors who guided and supported us throughout. Their feedback was invaluable in setting the course for this project. Thanks go out to the software development and machine learning communities for making open-source tools and libraries available that served as the backbone for our implementation.

We would also like to thank the early testers and participants of this work who have provided practical insights into the system’s usability. The feedback received from these participants contributed much in refining the design, interface, and overall effectiveness of the platform.

REFERENCES

Patel et al. (2020). Random Forest effectively predicted rice yield but lacked interpretability and performed poorly with small datasets.
Kadam et al. (2019). Satellite imagery combined with ML enabled spatial yield prediction; accuracy was limited by data gaps and sensor inconsistencies.
Bendre et al. (2021). ANN models emphasized the importance of rainfall in yield prediction, but overfitting and noisy data reduced generalization across regions.
Srivastava et al. (2021). CNN-based wheat yield prediction showed high accuracy but required large labeled datasets and high computation.
Moghimi et al. (2019). Deep learning on hyperspectral imagery supported phenotyping but faced preprocessing challenges and overfitting from limited datasets.
Mateo-Sanchis et al. (2020). ML-based yield estimation using radar and optical data experienced integration issues due to sensor timing mismatches.
Khaki et al. (2019). A CNNRNN hybrid successfully modeled spatial temporal crop features but was complex and data-intensive for real deployment.
Cheng et al. (2022). Remote sensing combined with ML improved rice yield prediction in China, though consistency was affected by sensor variability.
Arab et al. (2021). ML applied to Sentinel-2 data for rice prediction struggled with cloud cover and atmospheric interference.
Funk & Budde (2009). Long-term satellite data improved crop forecasting but was limited by reliance on historical climate patterns.
Hatfield (1983). Early spectral-based yield estimation faced interpretation issues due to minimal ground-truth data.
Hayes & Decker (1996). Remote sensing combined with crop models showed potential but suffered from integration and timing difficulties.
Kastens et al. (2005). Regression and classification for yield prediction performed well but were highly sensitive to satellite sensor inconsistencies.
Labus et al. (2002). Vegetation indices supported yield forecasting, though atmospheric effects and sensor drift required frequent recalibration.
Mika etal. (2002). Statistical satellite-based models lacked spatial detail and interpretability, limiting forecasting usefulness.
Mkhabela et al. (2005). Regression-based yield forecasts had poor generalization due to inconsistent sensor data and limited ground truth.
Quarmby et al. (1993). Spectral index methods for yield estimation struggled with atmospheric variability and inconsistent prediction accuracy.
Rojas (2007). Integrated crop models with satellite data offered improved forecasting but faced data fusion difficulties and scalability issues.
Salazar et al. (2007). ML regression improved crop forecasts but was negatively affected by inconsistent satellite sensor data.
Weissteiner & Kuhbauch (2005). Remote-sensing-based regression models required constant recalibration due to atmospheric and sensor instability.

BIOGRAPHIES :

Doli Ruthika, Student in Department of Electronic Communication and Engineering at the Institute of

Aeronautical Engineering, Dundigal.

Dr. Prashant Bachanna , Assistant Professor in Department of Electronic

Communication and Engineering at the Institute of

Aeronautical Engineering, Dundigal.

Katha. Rithvika, Student in Department of Electronic Communication and Engineering at the Institute of

Aeronautical Engineering, Dundigal.

R. Laxmi Narayana, Student in Department of Electronic Communication and Engineering at the Institute of Aeronautical Engineering, Dundigal.