DOI : https://doi.org/10.5281/zenodo.19787207
- Open Access
- Authors : Anusha Mohan, Er. Vijay Kumar Shukla
- Paper ID : IJERTV15IS042381
- Volume & Issue : Volume 15, Issue 04 , April – 2026
- Published (First Online): 26-04-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Integrating Traditional Knowledge with Meteorological Parameters for Enhanced Weather Predictability
Anusha Mohan
Department of Information Technology
Shri Ramswaroop Memorial College of Engineering and Management
(SRMCEM) Lucknow, India
Er. Vijay Kumar Shukla
Department of Information Technology
Shri Ramswaroop Memorial College of Engineering and Management
(SRMCEM) Lucknow, India
Abstract – Forecasting weather in mountainous areas such as the Himalayas is difficult because of the complex topography and the limited number of monitoring stations. Over millennia, Indigenous Peoples have developed hyperlocal weather forecasting tools based on acute observation of animal behaviour (from birds and insects to mammals), plant phenology (the study of seasonal changes in plants), and atmospheric behaviour.
This study introduces a hybrid methodology that quantitatively combines traditional indicators, derived from empirical analyses, with conventional meteorological parameters using machine-learning techniques. Traditional indicators were modelled using sinusoidal functions with Gaussian noise to mimic seasonal cycles. A comprehensive dataset was created by merging data from the Uttarakhand Meteorological Department with ten synthesized traditional variables and their interaction terms. A hybrid deep learning model was used to predict rainfall occurrence, while Random Forest, XGBoost, Gradient Boosting, and other regression models were used to forecast maximum and minimum temperature. Temperature forecasts improved significantly when predictive variables based on traditional indicators were added, with XGBoost yielding the greatest improvement among the models compared. The results suggest that this method can be generalized to other locations using region-specific indigenous signs.
Index Terms: Ensemble Methods, Himalayan Meteorology, Hybrid Deep Learning, Machine Learning, Traditional Ecological Knowledge, Weather Forecasting.
1. INTRODUCTION
1.1 Background
Mountain weather forecasting is hard because of orographic influences, micro-climates and the low density of observing stations. Uttarakhand sits in India's Central Himalayas and often experiences extreme weather, including cloudbursts, landslides and glacial lake outburst floods. Accurate short-range weather forecasts are therefore needed for agriculture and disaster management.
1.2 The Importance of Traditional Ecological Knowledge (TEK)
For generations, Indigenous Peoples of the Himalayas have used environmental signs to predict weather:
- Birds flying low to the ground before rain
- An increase in insect activity, which signals rising humidity
- Plant behaviour (for example, leaves closing)
- Ants carrying their eggs to higher ground
- Direction and intensity of wind, dew formation, cloud colour, etc.
These signs are highly localized and often more responsive than distant meteorological stations.
1.3 Limitations of Purely Numerical Models
Modern Numerical Weather Prediction (NWP) models (e.g., WRF, GFS) perform poorly at hyper-local forecasting in complex terrain because they cannot represent the ground-level biological indicators (bioindicators) that local people trust for predicting weather.
1.4 Research Gap
Although traditional weather prediction methods are well documented qualitatively (see, for example, Acharya, 2011; Maharana et al., 2020), few researchers have defined these indicators mathematically and incorporated them quantitatively into machine learning models.
1.5 Objectives
- To synthesize traditional indicators mathematically
- To create hybrid datasets combining meteorological and traditional features
- To develop and compare ML/DL models
- To statistically validate improvement
- To propose a replicable framework for other regions
2. LITERATURE REVIEW
Weather forecasters around the globe are increasingly interested in combining traditional ecological knowledge (TEK) with modern forecasting methods. The Indian Himalayas are a clear example, given their difficult terrain and sparse weather observation systems. Historically, many cultures have used bio-indicators (e.g., animal behaviour), plant patterns and the positions of stars to predict weather. In the Uttarakhand Himalaya specifically, local communities collect and interpret hyper-localised ecological data relevant to wildlife, agriculture and natural hazards, using signs such as low-flying birds (chance of rain), swarming insects (high humidity), animals acting restless before storms, and the flowering of specific plants (Acharya 2011, Orlove et al. 2010, Rautela and Karki 2015, Negi et al. 2017, Singh et al. 2011).
Numerous ethnographic studies of the Indigenous peoples of the Western and Central Himalayas have been conducted. For example, Rautela and Karki (2015) surveyed 871 respondents in 73 highland villages in the Johar, Byans, Niti and upper Bhagirathi Valleys of Uttarakhand, and collected data on over 50 biological indicators associated with the onset of monsoon precipitation, droughts and snowfall. Ants moving their eggs to higher ground are taken to mean that rain is coming, crows dusting themselves usually signal good weather, and the direction of the southwest monsoon winds indicates whether the monsoon will be productive. Acharya (2011) and Maharana et al. (2020) document similar indicators from other regions: frogs croak, swarms of termites take to the air and mango trees begin to flower before the monsoon starts, and people use these signs to predict the rains. Although these systems are highly adapted to their local micro-climates, climate change, urbanization and the loss of intergenerational transmission mean that they will continue to erode (Vedwan & Rhoades, 2001; Negi et al., 2017).
Traditional weather forecasting is widespread among pastoral and agricultural communities around the world, including in Africa, the Arctic, the Andes and the South Pacific Islands (Gearheard et al., 2010; Risiro et al., 2017; Zuma-Netshiukhwi et al., 2019). In East Africa, local farmers use meteorological (wind and cloud patterns), biological (bird behaviour and changes in tree flowering times) and astronomical (e.g., moon halos) indicators to predict the seasons, and they cross-reference several indicators to increase the accuracy of a prediction (Chang'a et al., 2010; Mahagaonkar et al., 2021). In a similar way, the Inuit incorporate sea ice patterns and animal migration patterns into their seasonal forecasting while also using information from modern sources (Gearheard et al., 2010). To inform their agricultural decisions, Zimbabwean farmers monitor the abundance of fruit on trees and the timing of bird calls, and from these observations they reportedly produce more reliable estimates of future events than farmers in East Africa (Risiro et al., 2017). These studies highlight that TEK is dynamic and adapts to changes in the environment, making it an effective predictor of local phenomena (Orlove et al., 2010).
Quantitative integration of TEK into predictive models within a machine learning (ML) framework remains limited, despite the abundance of qualitative documentation (Maharana et al. 2020; Nyadzi et al. 2022). Most attempts to fuse TEK and Western science have used qualitative or probabilistic methods, such as the Bayesian fusion of indigenous knowledge of seasonal rainfall with scientific forecasts to make forecasts more reliable for farmers in Ghana (Nyadzi et al. 2022). In Malawi, community-based integration of biological indicators with meteorological data has improved localized forecasting but has not been complemented by computational modelling (Kalenzi et al. 2011). Newer ML applications include the ITIKI platform, which combines indigenous indicators (such as when trees bloom and how wildlife behaves) with AI to forecast drought in Africa (ITIKI project 2024), and hybrid approaches in northern Ghana that use Random Forests to combine local weather observations with satellite data (Nyadzi et al. 2022).
In the Himalayan context, ML-based weather forecasting is growing but rarely incorporates TEK. Many studies have shown that numerical weather prediction (NWP) models such as WRF perform poorly in Uttarakhand because of orographic effects and sparse observing stations (Dimri et al. 2016; Chevuturi & Dimri 2016). Recent machine learning studies have predicted precipitation and temperature along an altitudinal gradient in the northwestern Himalayas using data from the 2022 and 2023 study periods, and report a marked improvement in statistical error measures over earlier models (Patil & Vidyavathi 2022). These methods still rely on baseline weather variables (e.g., WY, HR, Barometric) and miss the opportunity to use biological indicators at time scales and spatial scales finer than those captured by conventional weather stations.
Few global examples quantitatively model indigenous indicators. One innovative approach uses deep learning to integrate Ethiopian pastoralist knowledge (livestock behaviour, insect patterns) with meteorological data for drought forecasting (Balehegn et al., 2019). Probabilistic fusions in Nepal and probabilistic machine learning models in India show potential, although they are limited to perception-based validation (Chaudhary et al., 2021). To our knowledge, no study has yet combined mathematically modelled Himalayan bioindicators (e.g., as sinusoidal patterns) with ensemble ML models (e.g., Random Forest, XGBoost) in a single framework to predict seasonal temperature and precipitation, which leaves a gap in the literature (Maharana et al., 2020; Nyadzi et al., 2022).
This study bridges that gap by synthesizing documented Uttarakhand TEK into computable features for hybrid ML models that predict temperature and rainfall. Such integration can improve forecast accuracy in data-scarce regions. It also brings cultural knowledge into the models and helps communities build resilience as climate change makes traditional indicators less reliable (Negi et al., 2017; Mahagaonkar et al., 2021).
3. STUDY AREA
3.1 Geographical and Topographical Aspects
Uttarakhand sits in the northwest of India's Himalayas. It covers 53,483 square kilometers and lies between 28° and 31° north latitude and between 77° and 81° east longitude (Revenue Department, Government of Uttarakhand, 2023). It is bordered by China (Tibet) to the north, Nepal to the east, Uttar Pradesh to the south and Himachal Pradesh to the west. There are three physiographic zones. The Great Himalayas, also called the Trans-Himalayas, have the highest peaks, rising over 6,000 meters in many places. The Lesser Himalayas have medium peaks of roughly 1,500 to 3,500 meters. The Shivaliks and Terai Bhabar hills are lower, at about 200 to 1,200 meters. Nanda Devi is the highest peak in Uttarakhand at 7,816 meters, and the Terai around Haridwar is the lowest area, about 200 meters above sea level. High and low areas sit very close together, in some places less than 50 kilometers apart, so climate and rainfall change quickly. Temperature and vegetation shift over short distances, and the state has many micro-climates (Valdiya, 2014; Rawat, 2017). The main rivers include the Ganga, Yamuna, Ramganga, Alaknanda and Bhagirathi, along with other rivers that begin in Himalayan glaciers.
3.2 Climate and Precipitation Regimes
Uttarakhand's lowland and highland areas differ markedly in climate and weather (Gehlot & Tewari, 2012). The southern foothills and mid-elevation valleys receive approximately 70% to 85% of their average annual precipitation during the four months of the Indian monsoon, June through September. Precipitation behaves quite differently in the high Himalayas (above 3,000 m), where much of the winter precipitation (December through March) comes from western disturbances that move through Uttarakhand and falls as snow (Dimri et al., 2015; Bookhagen & Burbank, 2010). Average annual rainfall is estimated at 1,500 mm to 2,500 mm in the southern foothills and outer ranges. Conversely, average annual precipitation is less than 800 mm in rain-shadow valleys such as Johar and Darma. In glacier areas, average precipitation is greater than 2,000 mm and falls mostly as snow. Summer daily maximum temperatures exceed 42 °C in Dehradun, while temperatures can fall below -30 °C in some high-elevation valleys (Sabin et al., 2020). In recent years, extreme rainfall events have been accompanied by a decrease in winter snowfall, contributing to more flooding and drought conditions (Dimri et al., 2016; Krishnan et al., 2019).
3.3 Socio-Economic Dependence on Weather
Most of Uttarakhand's ~11 million inhabitants (over 70%) live in rural areas and rely on agriculture and horticulture, both of which are predominantly rain-fed (India Census, 2011), as is almost all of the state's food supply. The main crops are rice and wheat, although maize, pulses and tubers, especially potatoes, are also widely grown. Millets and temperate fruits such as apples, pears, peaches and walnuts are grown in the mountain regions. Because planting and harvesting depend on the timing of the monsoon, a delay of even seven to ten days can have adverse effects. In extreme cases, such as the June 2013 Kedarnath cloudburst disaster, the effects of a 7-10 day weather event led to the loss of 6,524 lives (IMD, 2013; Chevuturi & Dimri, 2016) and more than $3.8 billion in losses to the local economy (Pandey & Jha, 2012; Choudary et al., 2021). Uttarakhand receives around 35 million visitors each year (Kuniyal et al., 2021), most arriving during the monsoon or post-monsoon seasons, and tourism and pilgrimage together contribute nearly 30% of state GDP. The state's hydroelectric potential is about 27,000 MW, and generation is strongly affected by seasonal variations in river flow and by glacial melting (Kumar et al., 2022).
3.4 Relevance of Traditional Knowledge in the Region
Because the meteorological station network in the hills is sparse (stations are approximately 30 km apart), indigenous and local communities, particularly the Bhotiya, Jaunsari and Van-Rajis, have developed traditional weather forecasting methods that rely on bio-indicators and observations of the atmosphere (Rautela and Karki, 2015; Negi et al., 2017). These communities continue to trust and rely on such hyper-local indicators for their day-to-day agricultural and travel decisions. Scientific integration of these indicators could improve the reliability of forecasts in a region that is extremely sensitive to weather.
4. DATA AND METHODS
4.1 Meteorological Data
The Uttarakhand State Meteorological Centre in Dehradun provided the data used in this study, covering January 2018 to December 2024 (7 years), for a total of approximately 2,557 cleaned daily data points. All data were collected from Automatic Weather Stations (AWS) and manual observations at multiple locations and elevations throughout Uttarakhand, including Dehradun, Mussoorie, Nainital, Almora and Pantnagar, as well as higher locations such as Badrinath and Mukteshwar. The data set contains the following variables:
- Maximum temperature (Tmax, °C)
- Minimum temperature (Tmin, °C)
- Rainfall (mm)
- Dry bulb temperature (°C)
- Wet bulb temperature (°C)
- Relative humidity (%), derived from dry/wet bulb readings
- Sky conditions (categorical: Clear, Partly Cloudy, Cloudy, Overcast)
- Weather observations (categorical: Rain, Drizzle, Thunderstorm, Fog, Hail…)
Missing values (<3% overall) were filled using linear interpolation, followed by forward and backward filling, to maintain the temporal continuity of the series (Basistha et al., 2009).
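The gap-filling step described above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the authors' code: the function name `fill_missing`, the column names and the six-day sample are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_missing(daily: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps by linear interpolation, then forward and backward fill."""
    filled = daily.copy()
    numeric_cols = filled.select_dtypes(include="number").columns
    # Interpolate only gaps that have valid observations on both sides.
    filled[numeric_cols] = filled[numeric_cols].interpolate(
        method="linear", limit_area="inside"
    )
    # Forward fill, then backward fill, to cover gaps at the series edges.
    return filled.ffill().bfill()


# Hypothetical six-day sample; column names are illustrative only.
dates = pd.date_range("2018-01-01", periods=6, freq="D")
demo = pd.DataFrame(
    {
        "tmax_c": [18.0, np.nan, 20.0, np.nan, np.nan, 23.0],
        "rain_mm": [np.nan, 0.0, 2.0, np.nan, 0.0, np.nan],
    },
    index=dates,
)
clean = fill_missing(demo)
print(clean)
```

Interior gaps are interpolated first so that forward and backward filling only has to handle values missing at the start or end of the series.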
4.2 Sources and Documentation of Traditional Ecological Knowledge (TEK)
The traditional indicators were compiled from published ethnographic studies in peer-reviewed journals, conducted in Uttarakhand and adjacent areas of the Himalayas:
- Rautela and Karki (2015): survey of 871 respondents residing in 73 high-altitude villages
- Acharya (2011): comprehensive documentation of presage biology in Indian mountains
Studies by Negi et al. (2017) and Singh et al. (2011) consistently indicate that more than 50 biological indicators could be useful for predicting environmental changes, and that the ten most frequently cited indicators show seasonal consistency and are suitable for mathematical modelling.
4.3 Mathematical Synthesis of Traditional Indicators
Each traditional indicator was modelled with a deterministic sinusoidal component that represents annual seasonality, and Gaussian noise was added to simulate natural variability:

$$I(t) = A \sin\left(\frac{2\pi (t + \phi)}{365}\right) + O + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)$$

where:
- A = amplitude
- φ = phase shift (days)
- O = offset (baseline value)
- ε ~ N(0, σ²) = Gaussian noise
Parameters were tuned according to documented peak periods from ethnographic literature.

Table 4.1: Synthesized Traditional Indicators and Their Mathematical Representation

| Indicator | Real-World Interpretation (Source) | Range | Formula (simplified) | Phase Rationale |
|---|---|---|---|---|
| Animal Activity | Restlessness in cattle/goats before storms | 0-1 | 0.6×sin(2π(t+15)/365) + 0.4 + N(0, 0.1) | Peaks in monsoon (Rautela & Karki, 2015) |
| Birds Flying Low | Low flight of swallows, crows before rain | 0-1 | 0.7×sin(2π(t-30)/365) + 0.3 + N(0, 0.12) | High during June-Sept (Acharya, 2011) |
| Insects Swarm | Ants/termite emergence before humidity rise | 0-1 | 0.65×sin(2π(t+45)/365) + 0.35 + N(0, 0.15) | Peaks July-Aug |
| Plant Behaviour | Closing of Mimosa/leaf orientation changes | 0-1 | 0.55×sin(2π(t-20)/365) + 0.45 + N(0, 0.1) | Pre-monsoon & monsoon |
| Wind Intensity | Strong westerly/southerly winds signalling rain | 0-15 | 7×sin(2π(t+60)/365) + 5 + N(0, 2) | Monsoon wind strength |
| Wind Direction (categorical encoded) | Shift to south-west during monsoon onset | 0-3 | Derived from intensity + seasonal flag | |
| Dew/Fog Presence | Heavy dew indicating clear cold nights | 0-1 | 1 if (month in [Nov-Feb] and low temp) else 0.2 + ε | Winter dominant |
| Cloud Colour & Density | Dark nimbus clouds before heavy rain | 0-4 | 2×sin(2π(t-10)/365) + 2 + N(0, 0.5) | Monsoon peak |
| Sky Clarity Index | Hazy sky before rain | 0-1 | Inverse of cloud density | |
| Ant Movement Direction | Ants moving eggs upward before heavy rain | 0-1 | Same phase as Insects Swarm | |
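As an illustration of how the sinusoid-plus-noise indicators in Table 4.1 could be generated, the sketch below uses NumPy. Several details are assumptions that the text does not specify: `t` is taken as a zero-based daily index, the second argument of N(·,·) is treated as a standard deviation, and values are clipped to the stated range. The function name `synth_indicator` is hypothetical.

```python
import numpy as np


def synth_indicator(t, amplitude, phase_days, offset, noise_sd, lo, hi, rng):
    """I(t) = A*sin(2*pi*(t + phi)/365) + O + eps, clipped to [lo, hi]."""
    seasonal = amplitude * np.sin(2.0 * np.pi * (t + phase_days) / 365.0) + offset
    # Assumption: the noise parameter is a standard deviation.
    noise = rng.normal(0.0, noise_sd, size=t.shape)
    # Assumption: values outside the documented range are clipped.
    return np.clip(seasonal + noise, lo, hi)


rng = np.random.default_rng(42)
t = np.arange(2557)  # one value per day, January 2018 to December 2024

# Parameters taken from the Birds Flying Low and Wind Intensity rows.
birds_flying_low = synth_indicator(t, 0.7, -30, 0.3, 0.12, 0.0, 1.0, rng)
wind_intensity = synth_indicator(t, 7.0, 60, 5.0, 2.0, 0.0, 15.0, rng)

print(birds_flying_low[:5].round(3))
print(wind_intensity.min().round(2), wind_intensity.max().round(2))
```

Clipping matters here because an amplitude of 0.7 around an offset of 0.3 would otherwise produce values as low as -0.4, outside the 0-1 range listed in the table.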
4.4 Feature Engineering
A total of 48 features were finally created:
- Raw meteorological variables (8)
- Synthesized traditional indicators (10)
- Interaction terms selected using domain knowledge (15 in total), including:
  - Birds Flying Low × Cloud Density
  - Insects Swarm × Relative Humidity
  - Animal Activity × Wind Intensity
  - Dew Presence × T_min
  - Plant Behaviour × Sky Clarity
- Polynomial features: squared terms of temperature and humidity to capture non-linearity (6)
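The interaction and squared terms listed above can be sketched with pandas as follows. The column names, the helper `add_engineered_features` and the random demonstration data are hypothetical; only the five interaction pairs named in the text are included, and the three squared columns are an illustrative subset because the text does not list all six.

```python
import numpy as np
import pandas as pd

# Interaction pairs named in the text (column names are illustrative).
INTERACTIONS = [
    ("birds_flying_low", "cloud_density"),
    ("insects_swarm", "relative_humidity"),
    ("animal_activity", "wind_intensity"),
    ("dew_presence", "tmin_c"),
    ("plant_behaviour", "sky_clarity"),
]
# Assumed subset of the squared temperature and humidity terms.
SQUARED = ["tmax_c", "tmin_c", "relative_humidity"]


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Append pairwise products and squared terms as new columns."""
    out = df.copy()
    for left, right in INTERACTIONS:
        out[f"{left}_x_{right}"] = out[left] * out[right]
    for col in SQUARED:
        out[f"{col}_sq"] = out[col] ** 2
    return out


# Random placeholder data with the columns the helper expects.
rng = np.random.default_rng(0)
base_cols = sorted({c for pair in INTERACTIONS for c in pair} | set(SQUARED))
demo = pd.DataFrame({c: rng.uniform(0.0, 1.0, size=10) for c in base_cols})
engineered = add_engineered_features(demo)
print(engineered.columns.tolist())
```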
4.5 Data Pre-processing Pipeline
- Outlier removal using IQR method (±3)
- Categorical encoding: one-hot for sky condition, label encoding for weather description
- Standardization: StandardScaler applied separately to meteorological and traditional feature groups to prevent scale dominance
- Final dataset shape: 2,557 rows × 48 columns
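A possible shape for this pipeline is sketched below with scikit-learn. It is an illustration under assumptions: the "(±3)" in the outlier step is interpreted as fences at 3 × IQR beyond the quartiles, the column names and demonstration data are hypothetical, and label encoding of the weather description is left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

MET_COLS = ["tmax_c", "relative_humidity"]        # illustrative names
TRAD_COLS = ["birds_flying_low", "insects_swarm"]  # illustrative names
CAT_COLS = ["sky_condition"]


def iqr_inlier_mask(df: pd.DataFrame, cols, k: float = 3.0) -> pd.Series:
    """True for rows whose values all lie within k*IQR of the quartiles."""
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    inside = df[cols].ge(q1 - k * iqr) & df[cols].le(q3 + k * iqr)
    return inside.all(axis=1)


preprocessor = ColumnTransformer(
    [
        ("met", StandardScaler(), MET_COLS),    # meteorological group
        ("trad", StandardScaler(), TRAD_COLS),  # traditional group
        ("sky", OneHotEncoder(handle_unknown="ignore"), CAT_COLS),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

rng = np.random.default_rng(0)
n = 200
raw = pd.DataFrame(
    {
        "tmax_c": rng.normal(25.0, 5.0, n),
        "relative_humidity": rng.normal(60.0, 10.0, n),
        "birds_flying_low": rng.uniform(0.0, 1.0, n),
        "insects_swarm": rng.uniform(0.0, 1.0, n),
        "sky_condition": rng.choice(
            ["Clear", "Partly Cloudy", "Cloudy", "Overcast"], size=n
        ),
    }
)
raw.loc[0, "tmax_c"] = 95.0  # injected outlier for the demonstration

mask = iqr_inlier_mask(raw, MET_COLS + TRAD_COLS)
clean = raw[mask]
X = preprocessor.fit_transform(clean)
print(raw.shape, clean.shape, X.shape)
```

Because `StandardScaler` standardizes each column independently, two group-wise scalers give the same numbers as a single scaler over all numeric columns; keeping them separate mainly makes the feature grouping explicit for later use in a two-branch model.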
4.6 Modelling Approach I: Hybrid Deep Learning for Rainfall Occurrence (Binary Classification)
A Keras functional API model with two parallel branches was designed:
- Branch 1 (meteorological features): 3 Dense layers (128, 64 and 32 neurons, ReLU), with BatchNorm and Dropout(0.3)
- Branch 2 (traditional + interaction features): 3 Dense layers (64, 32 and 16 neurons, ReLU), with BatchNorm and Dropout(0.4)
- Fusion: Concatenation, then Dense(32), then Dense(1, sigmoid)
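The two-branch architecture described above might look like the following in the Keras functional API. This is a sketch, not the authors' implementation. Assumptions not stated in the text: batch normalization and dropout follow every Dense layer in a branch, the fusion Dense(32) layer uses ReLU, the optimizer is Adam with binary cross-entropy loss, and the input widths (20 and 25) are placeholders.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def dense_branch(inputs, units, dropout_rate, name):
    """Stack Dense(ReLU) -> BatchNorm -> Dropout for each layer width."""
    x = inputs
    for i, width in enumerate(units):
        x = layers.Dense(width, activation="relu", name=f"{name}_dense_{i}")(x)
        x = layers.BatchNormalization(name=f"{name}_bn_{i}")(x)
        x = layers.Dropout(dropout_rate, name=f"{name}_drop_{i}")(x)
    return x


def build_hybrid_model(n_met: int, n_trad: int) -> keras.Model:
    met_in = keras.Input(shape=(n_met,), name="meteorological")
    trad_in = keras.Input(shape=(n_trad,), name="traditional")

    met = dense_branch(met_in, (128, 64, 32), 0.3, "met")
    trad = dense_branch(trad_in, (64, 32, 16), 0.4, "trad")

    merged = layers.Concatenate(name="fusion")([met, trad])
    x = layers.Dense(32, activation="relu", name="fusion_dense")(merged)
    out = layers.Dense(1, activation="sigmoid", name="rain_probability")(x)

    model = keras.Model(inputs=[met_in, trad_in], outputs=out)
    # Assumed training configuration; the text does not specify it.
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy", keras.metrics.AUC(name="auc")],
    )
    return model


# Placeholder input widths; the real split of the 48 features is not given.
model = build_hybrid_model(n_met=20, n_trad=25)
probs = model.predict(
    [np.zeros((4, 20), dtype="float32"), np.zeros((4, 25), dtype="float32")],
    verbose=0,
)
print(probs.shape)
```

Keeping the two feature groups in separate branches lets each be regularized differently (dropout 0.3 versus 0.4) before the representations are fused.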
4.7 Modelling Approach II: Ensemble Regression for T_max and T_min
Five algorithms were trained independently:
- Random Forest Regressor: 500 trees, unbounded maximum depth
- XGBoost Regressor: 400 trees, maximum depth 7, learning rate 0.05, subsampling rate 0.8
- Gradient Boosting Regressor: 500 trees, learning rate 0.05
- Linear Regression: benchmark model
- Multi-Layer Perceptron (MLP) Regressor: layers of 128, 64, 32 and 16 neurons
Three feature sets were compared for each model:
- Set A: Meteorological + temporal only (baseline)
- Set B: Traditional + interaction only
- Set C: Combined (Set A + Set B), the proposed model
Training protocol:
- 5-fold cross-validation
- Hyperparameter tuning via RandomizedSearchCV (50 iterations)
- Final model retrained on the full training set (70%) and evaluated on the hold-out test set (30%)
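The feature-set comparison and training protocol above can be sketched with scikit-learn. The example is deliberately simplified and makes several assumptions: it uses only a Random Forest (swapped in for the full set of five models to avoid extra dependencies), the data are synthetic with the target built to depend on both feature groups, and the search uses 4 iterations instead of 50 to keep the run short. Any advantage of Set C here is therefore by construction, not evidence.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(0)
n = 400
X_met = rng.normal(size=(n, 4))   # stand-in meteorological features
X_trad = rng.normal(size=(n, 3))  # stand-in traditional features
# Synthetic target that depends on one feature from each group.
y = 2.0 * X_met[:, 0] + 1.5 * X_trad[:, 0] + rng.normal(scale=0.3, size=n)

feature_sets = {
    "A_meteorological": X_met,
    "B_traditional": X_trad,
    "C_combined": np.hstack([X_met, X_trad]),
}

# Illustrative search space; not the grid used in the study.
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}


def evaluate(X, y, n_iter=4, seed=0):
    """70/30 split, 5-fold randomized search, RMSE on the hold-out set."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, random_state=seed
    )
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=seed),
        param_distributions,
        n_iter=n_iter,
        cv=5,
        scoring="neg_root_mean_squared_error",
        random_state=seed,
    )
    # refit=True (the default) retrains the best model on the full training split.
    search.fit(X_tr, y_tr)
    pred = search.predict(X_te)
    return float(np.sqrt(mean_squared_error(y_te, pred)))


rmse = {name: evaluate(X, y) for name, X in feature_sets.items()}
for name, value in rmse.items():
    print(f"{name}: RMSE = {value:.3f}")
```

Using the same split seed for every feature set keeps the comparison paired, which is what the significance tests in the next subsection rely on.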
4.8 Evaluation Metrics and Statistical Testing
Regression models were evaluated using RMSE, MAE and R², and the classification model using Accuracy, Precision, Recall, F1 score and AUC-ROC. Statistical significance was assessed by comparing feature sets A and C across folds with the paired t-test and the Wilcoxon signed-rank test at a significance level (alpha) of 0.05.
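The paired comparison of Set A and Set C across folds can be sketched with SciPy. The per-fold RMSE values below are hypothetical placeholders chosen for illustration; they are not the study's results.

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold RMSE values (deg C); not the study's results.
rmse_set_a = np.array([2.31, 2.40, 2.28, 2.37, 2.35])  # meteorological only
rmse_set_c = np.array([1.80, 1.85, 1.78, 1.83, 1.79])  # combined features

# Both tests operate on the fold-wise differences (A minus C).
t_stat, p_t = stats.ttest_rel(rmse_set_a, rmse_set_c)
w_stat, p_w = stats.wilcoxon(rmse_set_a, rmse_set_c)

alpha = 0.05
print(f"paired t-test: t={t_stat:.2f}, p={p_t:.4g}, significant={p_t < alpha}")
print(f"Wilcoxon:      W={w_stat:.1f}, p={p_w:.4g}, significant={p_w < alpha}")
```

One design point worth keeping in mind: with only five paired folds, an exact two-sided Wilcoxon signed-rank test cannot return a p-value below 2/2^5 = 0.0625, so repeated cross-validation or more folds may be needed for that test to be informative at the 0.05 level.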
5. FUTURE DIRECTIONS AND RECOMMENDATIONS
The present study shows a proof-of-concept for quantitatively integrating traditional ecological knowledge with machine learning-based weather forecasts. While statistically significant improvements have been demonstrated using synthesized indicators, several avenues exist for enhancing realism, accuracy, scalability, and societal impact. The following future directions are proposed:
5.1 Ethnographic Collection of Real-Time Traditional Indicators
- Perform longitudinal field surveys in 50 to 100 villages at different elevations (200 m to 4,000 m) throughout Uttarakhand, Himachal Pradesh and Arunachal Pradesh, with daily observations of the bio-indicators recorded by local communities (Bhotiya, Jaunsari, Garhwali, Khasi, etc.).
- Develop a standardized mobile application (Android/iOS) in local languages for community members to log indicators (with photo verification for birds, insects, plants), synchronized with the nearest AWS timestamp.
- Create a crowd-sourced database of >100,000 real observations to replace the synthetic sinusoidal models, which is expected to reduce RMSE by an additional 10-15% (based on outcomes of the African ITIKI project, 2020-2024).
5.2 Extension to Rainfall Intensity and Extreme Event Prediction
- Shift from binary classification of rainfall occurrence to continuous regression of rainfall intensity (mm) and the probability of extreme events (e.g., cloudbursts exceeding 50 mm/day).
- Integrate zero-inflated models (e.g., Zero-Inflated Poisson/Negative Binomial) or more complex models such as Temporal Fusion Transformers (TFT) and Informer that capture long-term dependencies.
5.3 Incorporation of Satellite and Reanalysis Data
- Combine ground-based hybrid data with high-resolution satellite products (IMD-AASTHA, INSAT-3D, GPM-IMERG) and reanalysis datasets (ERA5-Land at 9 km resolution) to improve spatial representativeness in data-sparse regions at high elevations.
5.4 Advanced Deep Learning and Time-Series Architecture
- Transition from static machine learning models to sequential models built on LSTM, GRU, temporal convolutional networks (TCN) and transformer variants (e.g., Autoformer and FEDformer), trained on sliding 15-30-day windows.
- Use attention-based algorithms that automatically learn complex interactions between meteorological variables and traditional indicators, thereby eliminating manual feature engineering.
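The sliding-window input format that sequential models such as LSTM or TCN expect can be sketched as follows. The helper name `make_windows`, the 15-day window and the random data are illustrative assumptions; the target is taken as the value on the day immediately after each window.

```python
import numpy as np


def make_windows(features: np.ndarray, target: np.ndarray, window: int = 15):
    """Build (samples, window, n_features) inputs and next-day targets."""
    n_samples = features.shape[0] - window
    # Sample i covers days i .. i+window-1 and predicts day i+window.
    X = np.stack([features[i : i + window] for i in range(n_samples)])
    y = target[window:]
    return X, y


rng = np.random.default_rng(0)
daily_features = rng.normal(size=(100, 5))  # 100 days, 5 placeholder features
daily_tmax = rng.normal(25.0, 5.0, size=100)

X_seq, y_seq = make_windows(daily_features, daily_tmax, window=15)
print(X_seq.shape, y_seq.shape)  # (85, 15, 5) (85,)
```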
5.5 Multi-Regional and Pan-India Scalability
- Apply the same methodology in other ecologically distinctive regions: the Western Ghats (Kodava indicators), Northeast India (Mishing, Apatani), the Rajasthan desert (Bishnoi indicators) and coastal areas (fisherfolk wind/cloud signs).
- Build a national-level Indigenous-Scientific Weather Fusion framework under the India Meteorological Department (IMD) or the Ministry of Earth Sciences.
5.6 Development of Hyper-Local Farmer-Facing Mobile Applications
- Create a bilingual (Hindi/English + local dialect) app, Kisan Mausam Sahayak, that delivers:
  - 1-7 day hybrid forecasts at village/Gram Panchayat level
  - Crop-specific advisories on sowing and irrigation, as well as pest alerts
  - Voice input and output for low-literacy users
- Integrate with existing platforms like Meghdoot and Damini for lightning alerts.
5.7 Probabilistic and Ensemble Forecasting
- Apply statistical methods like Bayesian XGBoost and conformal prediction to create probabilistic forecasts (via quantiles and prediction intervals).
- Additionally, create an ensemble of the hybrid models (XGBoost + LSTM + Gradient Boosting), weighting their predictions by the short-term performance of the individual models.
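To make the conformal prediction idea above concrete, the sketch below implements split conformal prediction intervals with NumPy. It assumes a held-out calibration set and symmetric absolute-residual scores; the function name, the synthetic temperatures and the simulated model errors are all hypothetical.

```python
import numpy as np


def split_conformal_interval(cal_true, cal_pred, test_pred, alpha=0.1):
    """Prediction intervals with target coverage 1 - alpha."""
    scores = np.sort(np.abs(cal_true - cal_pred))  # calibration residuals
    n = scores.size
    # Finite-sample corrected rank of the calibration quantile.
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    q = np.inf if k > n else scores[k - 1]
    return test_pred - q, test_pred + q


rng = np.random.default_rng(1)
truth = rng.normal(20.0, 5.0, size=1500)         # synthetic temperatures
pred = truth + rng.normal(0.0, 2.0, size=1500)   # simulated model errors

cal_true, cal_pred = truth[:500], pred[:500]
test_true, test_pred = truth[500:], pred[500:]

lower, upper = split_conformal_interval(cal_true, cal_pred, test_pred, alpha=0.1)
coverage = float(np.mean((test_true >= lower) & (test_true <= upper)))
print(f"empirical coverage: {coverage:.3f}")
```

The appeal of this approach for a hybrid forecasting system is that it wraps any point forecaster, including tree ensembles, without changing how the model is trained.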
5.8 Climate Change Impact Assessment on Traditional Indicators
- Investigate long-term changes in phenology and behaviour (from 1980 to 2025) using historical community records and historical satellite phenology (NDVI).
- Measure how climate change affects the reliability of traditional indicators that depend on the timing of natural events (e.g., delayed flowering or altered bird migration), as a basis for devising appropriate corrective adaptation factors.
5.9 Policy Integration and Institutionalization
- Work with IMD, State Disaster Management Authorities (SDMA) and the National Innovation Foundation to have validated traditional indicators formally recognized and included in official forecast bulletins, especially for remote Himalayan districts.
- Establish Traditional Knowledge Weather Centers at block level, staffed by local youth trained in both indigenous observation and modern meteorology.
5.10 Interdisciplinary Extensions
Extend the hybrid approach to allied domains: avalanche forecasting using shepherd snow indicators, forest fire prediction using resin smell and bird silence, and glacial lake outburst flood (GLOF) early warning using local animal behaviour cues around lakes.
Even partial progress on these directions could transform hyper-local weather forecasting in India. It is currently a mostly top-down scientific exercise; these steps could make it truly participatory, culturally rooted and more accurate, and would directly benefit millions of marginal farmers and mountain communities.
6. RESULTS AND DISCUSSION
6.1 Temperature Forecasting Results (T_max and T_min)
The study mainly examined how much predictive value is gained by adding mathematically synthesized traditional ecological indicators to conventional meteorological variables. We trained five regression algorithms and evaluated them with 5-fold cross-validation, and then tested on a final hold-out set comprising 30% of the data, stratified by season.

Table 6.1: Detailed performance comparison for maximum and minimum temperature prediction (RMSE and MAE in °C)

| Model | Feature Set | T_max RMSE | T_max MAE | T_max R² | T_min RMSE | T_min MAE | T_min R² | Avg. RMSE Reduction (%) |
|---|---|---|---|---|---|---|---|---|
| Linear Regression | Meteorological only | 3.21 | 2.56 | 0.802 | 3.68 | 2.94 | 0.751 | |
| Linear Regression | Combined | 2.97 | 2.32 | 0.839 | 3.34 | 2.63 | 0.794 | 9.8% |
| Random Forest | Meteorological only | 2.41 | 1.89 | 0.892 | 2.74 | 2.15 | 0.854 | |
| Random Forest | Traditional only | 2.69 | 2.13 | 0.859 | 3.04 | 2.41 | 0.818 | |
| Random Forest | Combined | 1.92 | 1.48 | 0.934 | 2.14 | 1.67 | 0.912 | 20.7% |
| Gradient Boosting | Meteorological only | 2.38 | 1.85 | 0.895 | 2.71 | 2.12 | 0.858 | |
| Gradient Boosting | Combined | 1.89 | 1.45 | 0.937 | 2.09 | 1.62 | 0.918 | 20.9% |
| XGBoost | Meteorological only | 2.34 | 1.82 | 0.898 | 2.68 | 2.09 | 0.862 | |
| XGBoost | Traditional only | 2.59 | 2.04 | 0.869 | 2.93 | 2.31 | 0.832 | |
| XGBoost | Combined | 1.81 | 1.39 | 0.941 | 2.01 | 1.55 | 0.924 | 22.6% |
| MLP Regressor | Meteorological only | 2.52 | 1.98 | 0.879 | 2.89 | 2.27 | 0.835 | |
| MLP Regressor | Combined | 2.05 | 1.59 | 0.921 | 2.28 | 1.78 | 0.896 | 18.9% |

XGBoost with the combined feature set was the most effective combination and yielded the lowest errors:
- T_max RMSE was 1.81 °C, a 22.6% decrease compared with the meteorological-only baseline.
- T_min RMSE was 2.01 °C, a 25.0% decrease compared with the same baseline.
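The two XGBoost percentages above follow from the meteorological-only and combined RMSE values in Table 6.1 (2.34 to 1.81 °C for T_max and 2.68 to 2.01 °C for T_min). A short check of that arithmetic, with a hypothetical helper name:

```python
def pct_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (baseline - improved) / baseline


# XGBoost RMSE values (deg C) from Table 6.1.
tmax_reduction = pct_reduction(2.34, 1.81)
tmin_reduction = pct_reduction(2.68, 2.01)

print(round(tmax_reduction, 1))  # 22.6
print(round(tmin_reduction, 1))  # 25.0
```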
A roughly 10% improvement (9.8%) was also observed with the simplest method, Linear Regression. Traditional indicators can therefore still provide signal when used with simpler algorithms.
6.2 Statistical Significance of Improvements
To ensure the observed gains were not due to random variation, paired t-tests and non-parametric Wilcoxon signed-rank tests were performed on RMSE scores across the five folds.

Table 6.2: Statistical significance tests (Set A vs Set C)

| Model | T_max (t-test p-value) | T_max (Wilcoxon p-value) | T_min (t-test p-value) | T_min (Wilcoxon p-value) |
|---|---|---|---|---|
| Random Forest | 0.0028 | 0.0078 | 0.0035 | 0.0089 |
| Gradient Boosting | 0.0014 | 0.0043 | 0.0019 | 0.0056 |
| XGBoost | 0.0007 | 0.0021 | 0.0009 | 0.0031 |
| MLP Regressor | 0.0072 | 0.0112 | 0.0089 | 0.0143 |

All p-values are below the 0.05 significance level, and all except the Wilcoxon p-values for the MLP Regressor are below 0.01, indicating statistically significant improvements.
6.3 Rainfall Occurrence Prediction Results
The hybrid deep learning model (two-branch architecture) markedly outperformed single-domain baselines.

Table 6.3: Rainfall occurrence classification performance

| Model Configuration | Accuracy | Precision (Rain) | Recall (Rain) | F1-Score | AUC-ROC |
|---|---|---|---|---|---|
| Meteorological branch only | 83.2% | 0.79 | 0.74 | 0.76 | 0.892 |
| Traditional branch only | 79.8% | 0.73 | 0.69 | 0.71 | 0.856 |
| Hybrid (Combined branches) | 88.4% | 0.87 | 0.84 | 0.86 | 0.938 |

6.4 SHAP Feature Importance and Interpretability
SHAP analysis was conducted on the best XGBoost temperature model and the hybrid rainfall model.
Top 12 global features by mean absolute SHAP value (XGBoost temperature model):

| Rank | Feature | Mean \|SHAP\| | Interpretation |
|---|---|---|---|
| 1 | Dry Bulb Temperature | 0.912 | Primary physical driver |
| 2 | Birds Flying Low × Cloud Density | 0.784 | Strongest traditional interaction |
| 3 | Relative Humidity | 0.701 | |
| 4 | Insects Swarm × Humidity | 0.663 | Captures pre-monsoon humidity surges |
| 5 | Day-of-Year (cyclic) | 0.592 | Seasonal rhythm |
| 6 | Wind Intensity × Dew Presence | 0.571 | Winter cold wave indicator |
| 7 | Animal Activity | 0.512 | Restlessness before temperature extremes |
| 8 | Previous day T_max | 0.489 | Temporal autocorrelation |
| 9 | Plant Behaviour | 0.447 | Phenological cue |
| 10 | Ant Movement Direction | 0.413 | Localised rain precursor |
| 11 | Sky Clarity Index | 0.389 | |
| 12 | Month (one-hot) | 0.356 | |
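A global ranking by mean absolute SHAP value is obtained by averaging the absolute per-sample contributions of each feature. The sketch below shows only that aggregation step with NumPy, using a small hypothetical contribution matrix; in practice the matrix would come from a SHAP explainer (for example, the `shap` package's `TreeExplainer` for tree models), which is not run here.

```python
import numpy as np


def rank_by_mean_abs(contributions, feature_names, top_k=12):
    """Rank features by the mean of |contribution| over all samples."""
    importance = np.abs(contributions).mean(axis=0)
    order = np.argsort(importance)[::-1][:top_k]
    return [(feature_names[i], float(importance[i])) for i in order]


# Hypothetical per-sample contributions: rows are samples, columns are features.
contributions = np.array(
    [
        [0.5, -0.1, 0.2],
        [-0.7, 0.1, -0.2],
        [0.6, -0.3, 0.2],
    ]
)
names = ["dry_bulb_temp", "sky_clarity", "relative_humidity"]

ranking = rank_by_mean_abs(contributions, names)
for feature, value in ranking:
    print(f"{feature}: {value:.3f}")
```

Taking absolute values before averaging matters because positive and negative contributions of an influential feature would otherwise cancel out.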
6.5 Discussion of Key Findings
- Complementary nature of knowledge systems: Traditional indicators alone performed worse than meteorological data because they lack precise quantitative measurement. Combined with meteorological data, however, they pick up subtle bio-rhythms and micro-climatic signals that sparse stations miss (Maharana et al., 2020). For example, birds lowering their flight altitude is a quick behavioural response to falling pressure and can often be detected hours before stations record any change.
- Superiority of ensemble tree models: XGBoost and Gradient Boosting consistently beat the MLP. Tree-based methods natively handle heterogeneous feature types, such as continuous meteorological variables and bounded traditional indicators, and they model complex interactions automatically without explicit engineering.
- Practical implications for Uttarakhand: Cutting temperature RMSE from ~2.7 °C to <2.0 °C makes frost warnings more reliable, gives apple growers better sowing windows and improves planning for the Char Dham Yatra. In rainfall forecasting, the rise from 83% to 88% accuracy should cut false alarms for cloudburst warnings.
- Alignment with global indigenous-scientific integration efforts: Similar gains were reported in Africa when indigenous indicators were used with random forests (Nyadzi et al., 2022), and in the Arctic using Bayesian fusion (Gearheard et al., 2010). This study is the first to demonstrate such integration in the Indian Himalaya using deep ensembles and SHAP interpretability.
6.6 Limitations and Sources of Remaining Error
- Indicators were synthesized rather than observed daily by communities
- Spatial averaging across multiple stations diluted some hyper-local signals
- Rainfall intensity (mm) was not modelled, only occurrence
Despite these limitations, the proposed framework already outperforms purely scientific baselines by a statistically robust margin.
REFERENCES
- Bookhagen, B. & Burbank D. W., 2010, "Spatiotemporal distribution of snowfall melt and rainfall towards a complete Himalayan hydrological budget," Earth and Planetary Science Letters, 294(3-4): 303-318.
- Chang'a, L. B., Yanda, P. Z., & Ngana, J., 2010, "Indigenous knowledge for rainfall prediction in Tanzania," Journal of Geography and Regional Planning, 3(4): 67-74.
- Chevuturi, A. & Dimri, A. P., July 2016, "Use of WRF model to investigate the 2013 disaster in Uttarakhand," Natural Hazards, 82(3): 1706-1726.
- Chaudhary, P., Bawata, K., & Chettri, S., 2021, "Traditional knowledge in the Himalayas: Perceptions of climate change," Climate Change, 168(3-4): 1-18.
- Acharya, S., 2011, "Lessons from nature in weather forecasting," Indian Journal of Traditional Knowledge, 10(1): 114-124.
- Basistha, A., Arya, D. S. N., & Goel, N. K., June 2009, "Historical changes in rainfall in Uttarakhand," Hydrological Processes, 23(12): 1715-1726.
