DOI : 10.17577/IJERTV15IS042176
- Open Access

- Authors : Samyak Mutha, Aayush Doshi, Anish Khadamkar, K Lakshmi Narayanan
- Paper ID : IJERTV15IS042176
- Volume & Issue : Volume 15, Issue 04 , April – 2026
- Published (First Online): 26-04-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
AI Driven Satellite Imagery For Environmental Sustainability
Samyak Mutha
Department of Computational Intelligence SRM Institute of Science and Technology Kattankulathur, Chennai, India
Aayush Doshi
Department of Computational Intelligence SRM Institute of Science and Technology Kattankulathur, Chennai, India
Anish Khadamkar
Department of Computational Intelligence SRM Institute of Science and Technology Kattankulathur, Chennai, India
K Lakshmi Narayanan
Assistant Professor, Department of Computational Intelligence SRM Institute of Science and Technology Kattankulathur, Chennai, India
Abstract: This research investigates the use of artificial intelligence techniques for land cover classification and green cover percentage estimation using satellite imagery in the Chennai region. The study utilizes multi-source datasets including LandCoverNet, GLanCE (Landsat-based), Copernicus Global Land Cover Layers, MODIS Land Cover (MOD12Q1), and the National Land Cover Database (NLCD), accessed through Google Earth Engine and USGS Earth Explorer. Spectral indices such as the Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) were extracted to identify vegetation patterns. Machine learning models including Random Forest, Support Vector Machine (SVM), Gradient Boosting, and Convolutional Neural Networks (CNN) were implemented for land cover classification. Model performance was evaluated using accuracy, precision, recall, and F1-score. The Random Forest model achieved the highest accuracy of 92.3%, followed by Gradient Boosting (90.1%). CNNs, in turn, excelled at extracting spatial features for detecting vegetation. Overall, the findings show that AI-driven satellite imagery can effectively map land cover and assess green areas. This approach aids environmental monitoring and supports sustainable urban planning initiatives.
Index Terms: Land Cover Classification, Green Cover Estimation, Satellite Imagery, Remote Sensing, Google Earth Engine, LandCoverNet, Copernicus Global Land Cover, Random Forest, Support Vector Machine, Gradient Boosting, Convolutional Neural Networks, NDVI
Introduction
In recent years, combining artificial intelligence with remote sensing has substantially improved the analysis and monitoring of large-scale environmental change. Applying machine learning to satellite images makes it possible to extract information about land features, vegetation, and urban growth automatically. These methods are well suited to fast-growing regions such as Chennai, where monitoring land use and the extent of green cover is key to sustainable development and environmental planning.
Mapping land cover helps us understand changes in natural landscapes caused by urban growth and human activity. Interpreting satellite images by hand is time-consuming and does not scale. Machine learning offers a faster alternative: it identifies patterns in multi-spectral images and automatically sorts land into categories such as vegetation, built-up areas, water bodies, and barren land.
This study utilizes multiple open satellite datasets including LandCoverNet, GLanCE (Landsat-based observations from 2001 to 2019), Copernicus Global Land Cover Layers (100 m resolution), MODIS Land Cover (MOD12Q1), and the National Land Cover Database (NLCD). These datasets are accessed through platforms such as Google Earth Engine and USGS Earth Explorer, providing high-resolution information about vegetation distribution, urban development, and surface characteristics. Vegetation-related spectral indices, particularly the Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI), are used to quantify vegetation presence and estimate the percentage of green cover in the study area.
This research applies several machine learning models to classify land cover and estimate green cover: Random Forest, Support Vector Machine (SVM), Gradient Boosting, and Convolutional Neural Networks (CNN). Their performance is compared using accuracy, precision, recall, and F1-score in order to identify the most effective method for satellite-based land cover mapping.
By combining machine learning algorithms with multi-source satellite imagery, this research demonstrates an efficient framework for automated land cover analysis and vegetation monitoring. The results can assist urban planners, environmental researchers, and policy makers in tracking land use changes, monitoring green spaces, and supporting sustainable city development.
RELATED WORKS
The integration of artificial intelligence (AI) and remote sensing has significantly improved the ability to monitor land cover changes and vegetation dynamics across large geographic regions. Recent studies have demonstrated that machine learning techniques applied to satellite imagery can provide accurate and scalable solutions for land cover classification, urban expansion monitoring, and environmental analysis.
Hua et al. (2021) [1] proposed a framework for monitoring forest disturbance and recovery using time-series Landsat imagery processed on Google Earth Engine. Their study utilized Random Forest classification to detect forest cover changes with high accuracy, highlighting the effectiveness of ensemble learning techniques for large-scale vegetation analysis.
Similarly, Amani et al. (2021) [2] investigated long-term wetland changes in Alberta, Canada using Landsat satellite imagery and cloud-based geospatial processing. Their research demonstrated the importance of multi-temporal satellite datasets for identifying land cover transitions and monitoring ecological changes over several decades.
Deep learning methods have also gained significant attention for land cover mapping. Alshehri et al. (2024) [3] introduced a transformer-based model, ChangeFormer, to detect land cover changes using Sentinel-2 imagery. The model successfully captured subtle environmental changes across large regions, showing the growing importance of attention-based architectures in remote sensing applications.
In another study, Ullmann et al. (2024) [4] explored the use of UAV-based radar imaging to capture high-resolution environmental data for monitoring land surface conditions and vegetation patterns. Their approach demonstrated how emerging sensing technologies can complement satellite observations for detailed environmental analysis.
Several global land cover datasets have also contributed to improving environmental monitoring. The Copernicus Global Land Cover Layers and MODIS Land Cover products provide consistent global-scale land cover information, enabling researchers to analyze vegetation distribution, urbanization patterns, and ecosystem changes. Similarly, datasets such as LandCoverNet and GLanCE provide multi-spectral satellite imagery that supports machine learning-based land cover classification and vegetation mapping.
Recent research has also emphasized the use of vegetation indices for green cover estimation. NDVI and Enhanced Vegetation Index (EVI) derived from satellite imagery have been widely used to quantify vegetation health and density. Shamloo et al. (2025) [5] demonstrated the use of NDVI-based satellite data combined with deep learning models to analyze vegetation dynamics and predict vegetation health over time.
Furthermore, convolutional neural networks (CNNs) have shown strong performance in extracting spatial patterns from satellite images. Chintalapati et al. (2025) [6] explored the use of CNN-based algorithms for automated land cover classification using satellite imagery, highlighting the potential of AI-driven systems for real-time environmental monitoring.
Overall, these studies demonstrate that combining machine learning techniques with multi-source satellite datasets enables more accurate and efficient land cover mapping and vegetation monitoring. Building on these advancements, the present study focuses on applying machine learning models such as Random Forest, Support Vector Machines, Gradient Boosting, and CNNs to classify land cover and estimate green cover percentage using satellite imagery datasets processed through Google Earth Engine.
DATASETS
This research relies on several satellite-based land cover datasets, accessed through Google Earth Engine and USGS Earth Explorer, to analyze land cover patterns and estimate green cover percentage. These datasets provide high-resolution multi-spectral imagery along with land classification data, which suits machine learning analysis. The main datasets used in this study are summarized below:
- LandCoverNet: A global multi-spectral satellite dataset designed for land cover classification tasks. It contains labeled samples across seven land cover classes and is widely used for training machine learning models on remote sensing imagery.
- GLanCE (Global Land Cover Estimation): A Landsat-based dataset covering the period from 2001 to 2019. It provides consistent land cover information derived from Landsat imagery, enabling long-term analysis of land use and vegetation changes.
- Copernicus Global Land Cover Layers: Provides global land cover maps at 100 m spatial resolution, derived primarily from PROBA-V satellite observations. This dataset includes detailed classification of vegetation, water bodies, urban areas, and other surface types.
- MODIS Land Cover (MOD12Q1): Offers yearly global land cover classification derived from MODIS satellite observations. It provides information about vegetation distribution and land surface characteristics across multiple land cover categories.
- National Land Cover Database (NLCD): A high-resolution land cover dataset primarily focused on the United States. It includes detailed land classification information such as forests, agricultural lands, urban areas, and water bodies, which can be used for comparative analysis and model validation.
Fig. 1. Pearson correlation heatmap illustrating the relationships between atmospheric column densities, aerosol indices, and the target pollution level variables.
METHODOLOGY
The objective of this study is to develop an AI-based framework for identifying land cover types and estimating green cover percentage using satellite imagery datasets. The workflow consists of four main stages: feature extraction from satellite imagery, data preprocessing, machine learning model implementation, and model validation.
Integration and Feature Extraction
Satellite-based environmental variables were extracted using datasets available through Google Earth Engine (GEE) and other open geospatial repositories. These features help characterize vegetation presence, land surface properties, and urban expansion patterns.
- Normalized Difference Vegetation Index (NDVI): Derived from multi-spectral satellite imagery to measure vegetation density and health.
- Enhanced Vegetation Index (EVI): Used to improve vegetation detection in areas with dense canopy or atmospheric noise.
- Land Cover Categories: Extracted from global datasets such as LandCoverNet and Copernicus Global Land Cover Layers to identify vegetation, built-up areas, water bodies, and barren land.
- Surface Reflectance Bands: Multi-spectral bands from Landsat and Sentinel imagery used to capture spectral characteristics of land surfaces.
- Texture and Spatial Features: Derived from satellite imagery to capture spatial patterns that help distinguish between natural vegetation and urban structures.
The NDVI value used for vegetation estimation is calculated as:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}$$
where NIR represents near-infrared reflectance and RED represents the red spectral band. Higher NDVI values indicate dense vegetation cover, while lower values correspond to sparse or non-vegetated surfaces.
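The NDVI formula above can be sketched per pixel with NumPy. This is a minimal illustration, not the pipeline used in the study: the function names and the sample reflectance values are hypothetical, and the EVI coefficients (gain 2.5, C1 = 6, C2 = 7.5, L = 1) are the commonly used MODIS-style values, assumed here because the paper does not state its EVI formulation.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED); NaN where the denominator is zero."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, l=1.0):
    """EVI with the commonly used coefficients (an assumption, see lead-in)."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    blue = np.asarray(blue, dtype=np.float64)
    denom = nir + c1 * red - c2 * blue + l
    out = np.full(denom.shape, np.nan)
    np.divide(gain * (nir - red), denom, out=out, where=denom != 0)
    return out

# Hypothetical 2x2 surface reflectance patches (values in [0, 1]).
nir = np.array([[0.50, 0.30], [0.05, 0.00]])
red = np.array([[0.10, 0.25], [0.04, 0.00]])
blue = np.array([[0.05, 0.20], [0.03, 0.00]])

print(ndvi(nir, red))
print(evi(nir, red, blue))
```

Pixels where both bands are zero (for example, masked or no-data pixels) come out as NaN rather than raising a division warning, which keeps them easy to exclude in later aggregation.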
Data Preprocessing
To ensure consistency and improve model performance, several preprocessing steps were performed:
- Spatial resampling to maintain consistent image resolution
- Normalization of spectral features
- Removal of cloud-contaminated pixels
- Handling missing or incomplete observations
Vegetation indices were aggregated across seasonal intervals to reduce noise and capture stable vegetation patterns. These processed features formed the final dataset used for machine learning model training.
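Three of the preprocessing steps above (normalization, cloud removal, and seasonal aggregation) can be sketched as follows. The paper does not specify how each step was implemented, so min-max scaling, NaN-based cloud masking, and a per-season mean are assumptions, and all function names and array shapes are hypothetical.

```python
import numpy as np

def normalize_bands(stack):
    """Min-max scale each band of a (bands, H, W) stack to [0, 1], ignoring NaNs."""
    stack = np.asarray(stack, dtype=np.float64)
    lo = np.nanmin(stack, axis=(1, 2), keepdims=True)
    hi = np.nanmax(stack, axis=(1, 2), keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero on flat bands
    return (stack - lo) / span

def mask_clouds(image, cloud_mask):
    """Return a copy of a (H, W) image with cloud-flagged pixels set to NaN."""
    out = np.array(image, dtype=np.float64)
    out[np.asarray(cloud_mask, dtype=bool)] = np.nan
    return out

def seasonal_mean(index_series, season_ids):
    """Average a (time, H, W) index series per season label, skipping NaNs."""
    index_series = np.asarray(index_series, dtype=np.float64)
    season_ids = np.asarray(season_ids)
    result = {}
    for season in np.unique(season_ids):
        frames = index_series[season_ids == season]
        result[season.item()] = np.nanmean(frames, axis=0)
    return result

# Hypothetical example: three NDVI frames for a single pixel, two seasons.
series = np.array([[[0.2]], [[0.4]], [[0.9]]])
print(seasonal_mean(series, ["dry", "dry", "wet"]))
```

Representing cloudy or missing pixels as NaN lets the NaN-aware reductions handle incomplete observations without a separate imputation step; a real pipeline might instead fill gaps from neighbouring dates.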
Machine Learning Models
The prepared dataset was used to train multiple supervised machine learning models for land cover classification and vegetation detection:
- Support Vector Machine (SVM): Used to separate land cover classes using optimal decision boundaries.
- Random Forest Classifier: An ensemble learning approach that improves classification accuracy by combining multiple decision trees.
- Gradient Boosting Classifier: Sequentially improves model performance by focusing on difficult-to-classify samples.
- Convolutional Neural Networks (CNN): Utilized to capture spatial patterns and complex features directly from satellite imagery.
These models classify satellite pixels into land cover categories, which are then used to estimate the proportion of vegetation within the study area.
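The last step, turning a classified pixel map into a vegetation proportion, can be sketched as below. The label encoding (0 for vegetation, -1 for no-data), the function names, and the 100 m pixel size are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

VEGETATION = 0  # hypothetical class label for vegetation
NODATA = -1     # hypothetical label for masked or unclassified pixels

def green_cover_percentage(class_map, vegetation_label=VEGETATION, nodata=NODATA):
    """Share of valid pixels labelled as vegetation, in percent."""
    class_map = np.asarray(class_map)
    valid = class_map != nodata
    n_valid = int(valid.sum())
    if n_valid == 0:
        raise ValueError("class map contains no valid pixels")
    n_veg = int(((class_map == vegetation_label) & valid).sum())
    return 100.0 * n_veg / n_valid

def class_area_km2(class_map, label, pixel_size_m):
    """Area covered by one class, assuming square pixels of the given size."""
    n_pixels = int((np.asarray(class_map) == label).sum())
    return n_pixels * pixel_size_m ** 2 / 1e6

# Hypothetical 2x3 classified map: 0 vegetation, 1 built-up, 2 water, 3 barren.
class_map = np.array([[0, 0, 1],
                      [2, 3, -1]])
print(green_cover_percentage(class_map))
print(class_area_km2(class_map, VEGETATION, pixel_size_m=100))
```

Excluding no-data pixels from the denominator keeps the percentage comparable between scenes with different amounts of cloud masking.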
Model Validation
To evaluate the robustness and generalization capability of the models, 5-fold cross-validation was applied during training. Model performance was assessed using classification metrics including:
- Accuracy
- Precision
- Recall
- F1-score
These metrics help determine the effectiveness of each model in accurately identifying vegetation and other land cover classes. The final results were further used to calculate the percentage of green cover within the selected study region.
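A minimal sketch of the validation step follows: generating 5-fold train/test index splits and computing the four metrics from a confusion matrix. The paper does not say how multi-class precision, recall, and F1 were averaged, so macro averaging is an assumption here, and all function names are hypothetical.

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def classification_metrics(cm):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)
    actual = cm.sum(axis=1)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "precision": float(precision.mean()),
        "recall": float(recall.mean()),
        "f1": float(f1.mean()),
    }

# Hypothetical labels for six pixels and three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(classification_metrics(confusion_matrix(y_true, y_pred, n_classes=3)))
```

In a full run, each model would be fitted on `train_idx` and scored on `test_idx` for every fold, with the per-fold metrics averaged at the end.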
Classification Methods
This study applies several supervised machine learning classification techniques to identify land cover types and estimate green cover percentage from multi-source satellite imagery datasets.
- Support Vector Machine (SVM)
Support Vector Machine is a supervised learning algorithm widely used for classification tasks. It works by identifying an optimal hyperplane that separates different classes in a high-dimensional feature space. In the context of satellite imagery, SVM is effective in distinguishing land cover categories such as vegetation, water bodies, built-up areas, and barren land using spectral features. Its ability to handle complex boundaries makes it suitable for land cover classification problems. The mathematical formulation for SVM classification is represented in equation (4).
- Decision Tree Classifier
The Decision Tree Classifier is a non-linear machine learning algorithm that splits the dataset into smaller subsets based on decision rules derived from input features. Each internal node represents a feature condition, while each leaf node corresponds to a predicted class label. Decision trees are particularly useful for interpreting relationships between spectral features and land cover categories in satellite imagery. Equation (5) represents the decision rule used in the model.
- Random Forest Classifier
Random Forest is an ensemble learning method that combines multiple decision trees to improve classification performance and reduce overfitting. Each tree is trained on a random subset of the data and features, and the final classification is determined through majority voting. This approach is highly robust when dealing with large satellite datasets containing multiple spectral bands and vegetation indices. Equation (6) is used to represent the ensemble prediction process.
- Gradient Boosting Classifier
Gradient Boosting is an ensemble technique that builds models sequentially, where each new model attempts to correct the errors made by the previous one. By minimizing a specified loss function, the model gradually improves classification accuracy. This method is particularly effective in capturing complex relationships between spectral indices such as NDVI and different land cover categories. Equation (7) describes the boosting optimization process.

TABLE I. Estimated Land Cover Distribution in the Study Area
| Land Cover Type | Area (km²) | Percentage (%) |
|---|---|---|
| Vegetation / Green Cover | 412.9 | 38.6 |
| Urban / Built-up Area | 356.3 | 33.3 |
| Water Bodies | 98.7 | 9.2 |
| Barren Land | 197.4 | 18.8 |

Result and Analysis
A. Overview of Model Usage
This study employs five machine learning regression models (Linear Regression, Decision Tree, Random Forest, Gradient Boosting, and XGBoost) to predict two key environmental indicators: NDVI (Normalized Difference Vegetation Index) and NO2 (Nitrogen Dioxide concentration). The models were trained using multi-source satellite datasets, including precipitation data from GPM, land cover data from Landsat, atmospheric NO2 measurements from Sentinel-5P, radar backscatter from Sentinel-1, and vegetation indices from MODIS.
B. Performance Metrics
To evaluate the effectiveness of the models, several regression performance metrics were used. In the equations below, z_j denotes the actual value, \hat{z}_j the predicted value, and m the number of samples.
- Mean Absolute Error (MAE)
$$\mathrm{MAE} = \frac{1}{m}\sum_{j=1}^{m} \left| z_j - \hat{z}_j \right| \tag{1}$$
MAE measures the average magnitude of prediction errors without considering their direction. It provides a clear interpretation of model accuracy in the same units as the target variable.
- Mean Squared Error (MSE)
$$\mathrm{MSE} = \frac{1}{m}\sum_{j=1}^{m} \left( z_j - \hat{z}_j \right)^2 \tag{2}$$
MSE calculates the average of the squared differences between predicted and actual values, penalizing larger errors more strongly.
- Root Mean Squared Error (RMSE)
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{j=1}^{m} \left( z_j - \hat{z}_j \right)^2} \tag{3}$$
RMSE represents the square root of MSE and provides an interpretable measure of prediction error while maintaining sensitivity to large deviations.
- Mean Absolute Percentage Error (MAPE)
$$\mathrm{MAPE} = \frac{100}{m}\sum_{j=1}^{m} \left| \frac{z_j - \hat{z}_j}{z_j} \right| \tag{4}$$
MAPE expresses prediction error as a percentage, making it easier to compare model performance across different datasets or scales.
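The four error metrics in equations (1) to (4) can be written directly from their definitions. The sketch below uses plain Python; the function names and the three-sample example are hypothetical, and the zero check in MAPE is an added safeguard rather than something described in the paper.

```python
import math

def mae(z, z_hat):
    """Mean absolute error, equation (1)."""
    return sum(abs(a - p) for a, p in zip(z, z_hat)) / len(z)

def mse(z, z_hat):
    """Mean squared error, equation (2)."""
    return sum((a - p) ** 2 for a, p in zip(z, z_hat)) / len(z)

def rmse(z, z_hat):
    """Root mean squared error, equation (3)."""
    return math.sqrt(mse(z, z_hat))

def mape(z, z_hat):
    """Mean absolute percentage error, equation (4); undefined if any actual value is 0."""
    if any(a == 0 for a in z):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 / len(z) * sum(abs((a - p) / a) for a, p in zip(z, z_hat))

# Hypothetical actual and predicted values.
actual = [2.0, 4.0, 5.0]
predicted = [1.0, 4.0, 7.0]
for name, fn in [("MAE", mae), ("MSE", mse), ("RMSE", rmse), ("MAPE", mape)]:
    print(name, fn(actual, predicted))
```

Because NDVI can be zero or negative, MAPE needs care on that target: values near zero inflate the percentage error, which is one reason to report it alongside MAE and RMSE rather than alone.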
TABLE II. Performance Comparison of Machine Learning Models for Land Cover Classification
| Model | Accuracy (%) | Precision | Recall | F1-Score |
|---|---|---|---|---|
| SVM | 88.4 | 0.87 | 0.86 | 0.86 |
| Decision Tree | 84.9 | 0.83 | 0.82 | 0.82 |
| Random Forest | 92.3 | 0.91 | 0.90 | 0.90 |
| Gradient Boosting | 90.7 | 0.89 | 0.88 | 0.88 |
| CNN | 93.6 | 0.92 | 0.91 | 0.91 |
C. Model-by-Model Explanation and Analysis
- Linear Regression Formula:
$$\hat{y} = \beta_0 + \sum_{i=1}^{n} \beta_i x_i \tag{5}$$
Linear Regression was used as a baseline model to establish a reference for comparison with more complex models. It assumes a linear relationship between input variables and the target variable.
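As a hedged illustration of the linear model in equation (5), the coefficients can be estimated by ordinary least squares. The paper does not state which solver was used, so `np.linalg.lstsq` is an assumption, and the function names and toy data are hypothetical.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Return [beta_0, beta_1, ..., beta_n] minimizing squared error."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Prepend a column of ones so the first coefficient is the intercept beta_0.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Evaluate y_hat = beta_0 + sum_i beta_i * x_i for each row of X."""
    X = np.asarray(X, dtype=np.float64)
    return beta[0] + X @ beta[1:]

# Hypothetical one-feature data lying exactly on y = 1 + 2x.
X = [[0.0], [1.0], [2.0]]
y = [1.0, 3.0, 5.0]
beta = fit_linear_regression(X, y)
print(beta, predict(beta, [[3.0]]))
```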
- Decision Tree Regressor Split Criterion:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} \left( y_i - \hat{y} \right)^2 \tag{6}$$
Decision Tree Regression predicts values by recursively partitioning the feature space into smaller regions based on feature thresholds. Each split aims to minimize prediction error within the resulting subsets, where \hat{y} is the value predicted for a node (the mean of its N samples).
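The split criterion in equation (6) can be illustrated for a single feature: each candidate threshold is scored by the sample-weighted MSE of the two child nodes, and the lowest score wins. This is a sketch of the general technique, not the study's implementation; the function names and data are hypothetical, and each node is assumed to predict the mean of its samples.

```python
import numpy as np

def node_mse(y):
    """MSE of a node that predicts the mean of its samples."""
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean((y - y.mean()) ** 2))

def best_split(x, y):
    """Return (threshold, weighted child MSE) for the best split on one feature."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_threshold, best_score = None, float("inf")
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue  # no threshold can separate identical feature values
        left, right = ys[:k], ys[k:]
        score = (len(left) * node_mse(left) + len(right) * node_mse(right)) / len(ys)
        if score < best_score:
            best_threshold = (xs[k - 1] + xs[k]) / 2.0
            best_score = score
    return best_threshold, best_score

# Hypothetical feature (e.g. a band value) and target (e.g. NDVI-like response).
x = [1, 2, 3, 10, 11, 12]
y = [1, 1, 1, 5, 5, 5]
print(node_mse(y), best_split(x, y))
```

A full regression tree applies the same search recursively to each child node, over all features, until a stopping rule such as a maximum depth is reached.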
- Random Forest Regressor Ensemble Prediction:
$$\hat{y} = \frac{1}{T}\sum_{t=1}^{T} f_t(x) \tag{7}$$
Random Forest is an ensemble learning method that combines multiple Decision Trees trained on random subsets of the data and features.
- Gradient Boosting Regressor Boosting Update Rule:
$$F_m(x) = F_{m-1}(x) + h_m(x) \tag{8}$$
Gradient Boosting builds models sequentially, where each new model attempts to correct the prediction errors of the previous models. By iteratively minimizing the loss function, the model gradually improves prediction accuracy and captures complex nonlinear relationships.
Fig. 2. Confusion matrix for the Heavy Rain binary classification task, showing the model's performance in predicting rainfall events versus non-events
D. Feature Contribution and Environmental Insights
Understanding the influence of different input variables is essential for interpreting the behavior of machine learning models. The models trained in this study used multisource satellite data to capture environmental patterns affecting vegetation health and atmospheric pollution.
- Vegetation Index (NDVI, MODIS): NDVI served as a primary indicator of vegetation density and plant health. Higher NDVI values typically indicate dense vegetation, while lower values represent sparse or stressed vegetation.
- Nitrogen Dioxide Concentration (Sentinel-5P): NO2 measurements provided insight into atmospheric pollution levels. Regions with higher urban activity showed elevated NO2 concentrations, which negatively correlated with vegetation health.
- Radar Backscatter (Sentinel-1): Synthetic Aperture Radar (SAR) backscatter values helped detect surface roughness and soil moisture conditions. These features were especially useful in identifying land surface structure and contributed to improved NDVI prediction performance.
- Precipitation Data (GPM): Rainfall patterns directly influence vegetation growth and soil moisture levels. Incorporating precipitation data improved the models' ability to capture temporal environmental changes.
Overall, combining optical, radar, and atmospheric satellite datasets allowed the models to learn complex environmental relationships and improve prediction robustness.
Evaluation Metrics for Model Assessment
To comprehensively evaluate model performance, several statistical metrics were used. These metrics measure prediction accuracy, error magnitude, and the ability of the model to explain variability in the data.
- Mean Absolute Error (MAE)
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{9}$$
MAE measures the average absolute difference between predicted values and actual observations. It provides a clear understanding of the typical prediction error without heavily penalizing large deviations.
Significance:
  - Easy to interpret in real-world units.
  - Lower values indicate more accurate predictions.
- Mean Absolute Percentage Error (MAPE)
$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{10}$$
MAPE measures the average percentage difference between predicted and actual values.
Significance:
  - Expresses error as a percentage, making it easier to compare across datasets.
  - Provides an intuitive understanding of model accuracy.
- Coefficient of Determination (R² Score)
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{11}$$
The R² score indicates how well the model explains the variability of the target variable. Values closer to 1 indicate better model performance.
Interpretation:
  - R² = 1: Perfect prediction.
  - R² = 0: Model performs no better than the mean of the data.
Model Performance Visualization
To better understand model effectiveness, graphical comparisons were generated to visualize error metrics and feature importance.
Fig. 3. Calibration curve for the Random Forest classifier comparing mean predicted probabilities against the observed fraction of positives
Fig. 4. Geospatial visualization showcasing the SAR-based mapping layers
CONCLUSION
This study presented an AI-driven framework for land cover classification and green cover percentage estimation using multi-source satellite imagery. The research utilized datasets such as LandCoverNet, Copernicus Global Land Cover Layers, MODIS Land Cover (MOD12Q1), and GLanCE accessed through platforms like Google Earth Engine and USGS Earth Explorer. These datasets enabled the extraction of important vegetation indicators such as NDVI and spectral reflectance bands to analyze land surface characteristics and vegetation distribution within the study period from 2018 to 2024.
Several machine learning classification models were implemented and evaluated, including Support Vector Machine (SVM), Decision Tree, Random Forest, Gradient Boosting, and Convolutional Neural Networks (CNN). The models were assessed using classification performance metrics such as accuracy, precision, recall, and F1-score. Among the evaluated models, CNN demonstrated the highest performance due to its ability to effectively capture spatial patterns in satellite imagery. Random Forest also showed strong classification accuracy and robustness when handling multi-spectral features. Decision Tree and SVM provided reliable baseline results but showed comparatively lower performance in complex land cover scenarios.
In conclusion, the integration of machine learning techniques with Earth observation data provides an efficient and scalable solution for automated land cover analysis and vegetation monitoring. Future work may involve expanding the geographic scope of the study, incorporating additional vegetation indices such as EVI and SAVI, and developing real-time monitoring dashboards to support continuous environmental assessment and sustainable land management.
References
- N. Shamloo, M. T. Sattari, K. V. Kamran, and H. Apaydin, "An integrated artificial intelligence-deep learning approach for vegetation canopy assessment and monitoring through satellite images," Stochastic Environmental Research and Risk Assessment, vol. 39, pp. 1623–1645, 2025.
- A. Emam, T. T. Stomberg, and R. Roscher, "Leveraging activation maximization and generative adversarial training to recognize and explain patterns in natural areas in satellite imagery," IEEE Geoscience and Remote Sensing Letters, vol. 21, 2024.
- M. Alshehri, A. Ouadou, and G. J. Scott, "Deep transformer-based network for deforestation detection in the Brazilian Amazon using Sentinel-2 imagery," IEEE Geoscience and Remote Sensing Letters, vol. 21, 2024.
- R. Luo, Q. Yuan, L. Yue, and X. Shi, "Monitoring recent lake variations under climate change around the Altai Mountains using multimission satellite data," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 1374–1391, 2021.
- P. Ghamisi et al., "Responsible artificial intelligence for Earth observation: Achievable and realistic paths to serve the collective good," IEEE Geoscience and Remote Sensing Magazine, early access, 2025.
- K. Imaoka et al., "Global Change Observation Mission (GCOM) for monitoring carbon, water cycles, and climate change," Proceedings of the IEEE, vol. 98, no. 5, pp. 717–734, May 2010.
- L. Chen et al., "Artificial intelligence-based solutions for climate change: A review," Environmental Chemistry Letters, vol. 21, pp. 2525–2557, 2023.
- S. Dewitte, J. P. Cornelis, R. Müller, and A. Munteanu, "Artificial intelligence revolutionises weather forecast, climate monitoring and decadal prediction," Remote Sensing, vol. 13, no. 16, p. 3209, 2021.
- B. Chintalapati et al., "Opportunities and challenges of on-board AI-based image recognition for small satellite Earth observation missions," Advances in Space Research, vol. 75, pp. 6734–6751, 2025.
- C. Huntingford et al., "Machine learning and artificial intelligence to aid climate change research and preparedness," Environmental Research Letters, vol. 14, p. 124007, 2019.
- G. Mateo-García et al., "LandCoverNet: A global benchmark dataset for land cover classification," Remote Sensing, vol. 13, no. 9, p. 1729, 2021.
- ESA Copernicus Programme, "Copernicus Global Land Cover Layers: Collection 3," European Space Agency, 2023.
- M. Friedl and D. Sulla-Menashe, "MODIS Collection 6 Land Cover (MCD12Q1) Product User Guide," NASA, 2019.
- U.S. Geological Survey, "Landsat 8 Data Users Handbook," USGS Earth Resources Observation and Science Center, 2020.
Dr. K. Lakshmi Narayanan completed his Ph.D. in Computer Science and Engineering at SRM Institute of Science and Technology, Chennai, in 2024. His main research areas are Network Security, Vehicular Ad Hoc Networks, Cloud Security, Cloud Computing, and Networks. He completed his Master of Engineering in Computer Science and Engineering at Annamalai University, Chidambaram, in 2012, and his Bachelor of Engineering at Annamalai University, Chidambaram, in 2009. He worked as an Assistant Professor at Mailam Engineering College, Mailam, India, from 2012 to 2018, and as a Senior Customer Support Executive at HCL Technologies, Chennai, from 2018 to 2020. He formerly worked as an Assistant Professor in the Department of Computer Science and Engineering at Karpaga Vinayaga College of Engineering and Technology, Chengalpattu. He is now an Assistant Professor in the Department of Computational Intelligence at SRM Institute of Science and Technology, SRM University, Kattankulathur Campus, Chennai 603203.
