Feature-Driven Sleep Disorder Classification Using Machine Learning

doi:10.5281/zenodo.18787521

Volume 15, Issue 02 (February 2026)


Open Access
Authors : Mr Ramireddy Siva, K Kushulatha, N Chandana, P Divya, G Jaya Prakash
Paper ID : IJERTV15IS020461
Volume & Issue : Volume 15, Issue 02 , February – 2026
Published (First Online): 26-02-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Mr. Ramireddy Siva

Assistant Professor, Department of CSE, Annamacharya Institute of Technology and Sciences, Tirupati – 517520, A.P, India.

K Kushulatha

UG Scholar, Department of CSE, Annamacharya Institute of Technology and Sciences, Tirupati – 517520, A.P, India.

N Chandana

UG Scholar, Department of CSE, Annamacharya Institute of Technology and Sciences, Tirupati – 517520, A.P, India.

P Divya

UG Scholar, Department of CSE, Annamacharya Institute of Technology and Sciences, Tirupati – 517520, A.P, India.

G Jaya Prakash

UG Scholar, Department of CSE, Annamacharya Institute of Technology and Sciences, Tirupati – 517520, A.P, India.

Abstract – Sleep disorders significantly affect an individual’s physical health, mental well-being, and quality of life. Early diagnosis of insomnia, sleep apnea, narcolepsy, and restless legs syndrome therefore helps prevent long-term health hazards. Conventional diagnostic methods are expensive and time-consuming, and they depend on highly specialized clinical infrastructure. To overcome these challenges, this paper proposes a machine learning-based technique that classifies sleep disorders from easily obtainable physiological and lifestyle attributes. The proposed system uses features such as age, sleep duration, quality of sleep, physical activity level, stress level, heart rate, daily steps, and body mass index. Several supervised learning algorithms, namely Support Vector Machine, k-Nearest Neighbors, Decision Tree, and Random Forest, are implemented and analyzed. Feature scaling and model tuning are used to improve classification performance and reduce overfitting. The system predicts five output categories by multi-class classification: insomnia, sleep apnea, narcolepsy, restless legs syndrome, and no sleep disorder.

Keywords – Sleep Disorder Classification, Machine Learning, Random Forest, Support Vector Machine, k-Nearest Neighbors, Decision Tree, Healthcare Analytics, Multi-Class Classification

INTRODUCTION
Sleep disorders affect physical health, mental health, and quality of life. They encompass various conditions that, if not treated early, may lead to chronic fatigue, heart disease, a weakened immune system, and decreased cognitive skills. According to several medical studies, untreated sleep disorders may also lead to hypertension, depression, diabetes, and a decline in workplace efficiency. Although sleep disorders affect many people with today's busy schedules, they often remain neglected because of the limitations of conventional diagnostic procedures.

Traditional methods of diagnosing sleep disorders are based on clinical examinations and sleep analysis using polysomnography equipment in sleep laboratories. Although these techniques give an accurate clinical picture, they are costly and time-consuming and require well-trained medical professionals and sophisticated equipment. This limits large-scale investigation and continuous surveillance of sleep disorders, particularly in developing countries, where trained professionals, sleep laboratories, and polysomnography equipment are in short supply.

The main goal of this project is to implement a system that uses a variety of machine learning techniques to classify different types of sleep disorders on the basis of easily measurable physiological and lifestyle parameters. The system uses features including age, sleep duration, sleep quality, activity level, stress level, heart rate, daily step count, and body mass index, if available. Unlike a traditional classification system, which uses a simple two-class mechanism, the proposed system identifies several categories: insomnia, sleep apnea, narcolepsy, restless legs syndrome, and the absence of a sleep disorder.
LITERATURE REVIEW
Recent years have seen tremendous growth in the application of machine learning techniques to healthcare analytics, especially for diagnosing and classifying sleep disorders. Jain et al. [1] studied multiple machine learning-based approaches to sleep disorder detection and highlighted that supervised learning models can effectively spot sleep abnormalities in health data. Their work focused on using machine learning to decrease reliance on expensive clinical tests.

Kumar and Singh [2] investigated various supervised learning methods for sleep disorder classification. They emphasized that choosing proper features and selecting classifiers are crucial. Their investigation showed that models such as Support Vector Machines and Decision Trees performed reasonably well on structured health datasets, though performance varied across preprocessing steps and feature selections.

The foundations of traditional machine learning were laid by Vapnik [3], whose statistical learning theory and support vector concepts underpin modern classification algorithms. Cortes and Vapnik [8] further developed support-vector networks and showed their effectiveness on high-dimensional datasets, which is relevant to healthcare datasets involving multiple physiological parameters. Ensemble learning methods have also attracted significant attention owing to their robustness and accuracy. Breiman [4] proposed the Random Forest algorithm, which combines multiple decision trees and is widely used in medical diagnosis. Quinlan [6] introduced the induction of decision trees, laying the groundwork for decision tree classification as used in healthcare decision-making systems.

The performance of distance-based classifiers such as the k-Nearest Neighbors (KNN) classifier has been analyzed extensively. Cover and Hart [7] established the theoretical foundation of nearest neighbor classification and its importance for solving classification problems. Notably, studies suggest that the performance of a KNN classifier can be affected by data scaling, particularly in healthcare data analysis systems.

Bishop [5] presented an extensive discussion of pattern recognition and machine learning, focusing on probabilistic aspects and model evaluation. This reinforces the value of comparing different classifiers in order to select an appropriate model for a given problem domain.

Comparative studies on medical diagnosis further validate the use of machine learning. Almazroa et al. [9] compared various machine learning models for medical diagnosis and found that ensemble classifiers outperform other classification models because of their ability to deal with noisy data.

Lifestyle and behavioral parameters have similarly been acknowledged as relevant to measuring sleep health. Shahin et al. [10], for instance, used lifestyle characteristics such as physical activity, stress, and sleep duration to predict sleep disorders, showing the potential of non-clinical data for early diagnostics. This is consistent with the use of health parameters that can be measured automatically.

In addition, a World Health Organization report [11] published in November 2018 focuses on the growing prevalence of sleep disorders across the world and the need for cost-effective diagnostic techniques. The report also mentions technology-assisted systems in the healthcare sector.

It is evident from the current literature that several machine learning approaches have been applied to sleep disorder detection. However, most studies address isolated types of sleep disorder rather than a unified multi-class problem. In addition, the complex clinical data used for sleep disorder analysis limits practical application. The literature therefore points to the need for an extensive multi-class classification system that uses lifestyle and physiological data and compares several machine learning approaches.
EXISTING METHODOLOGY
At present, the primary means of diagnosing a sleep disorder is clinical examination together with laboratory-based sleep studies such as polysomnography. Polysomnography involves the continuous measurement of brain activity, heart rate, oxygen level, respiration rate, and other signals in a controlled clinical setting. Although these procedures are precise, they are costly and time-consuming and require medical equipment and qualified medical professionals. Mass-level early detection of sleep-related issues in clinical settings is therefore difficult, mainly because of the lack of resources in the healthcare industry.

Various machine learning-based methods have been proposed for automated sleep disorder detection. However, most of these systems depend heavily on complex biomedical signals such as EEG, ECG, or respiratory signals. The need for specialized sensors and acquisition equipment restricts their use in real-world and remote monitoring settings such as home care, and preprocessing such signals carries significant computational requirements. Their clinical benefit is further limited because existing approaches mostly treat sleep disorder detection as a binary classification problem. Most studies also rely on small or imbalanced datasets, which results in biased predictions and poor generalization. These studies rarely incorporate feature scaling, and the algorithms are not optimized, which significantly impairs classifier performance, particularly for distance-based classifiers like SVM and KNN. Moreover, tree-based models with no limits placed on their complexity suffer greatly from overfitting. Existing methodologies thus fail to provide a scalable, cost-effective, and reliable multi-class sleep disorder classification system using easily accessible health and lifestyle data, and there is a need for an enhanced machine learning framework for this task.

Fig. 1. Architecture of the Existing System
PROPOSED METHODOLOGY
The proposed methodology uses a supervised multi-class machine learning approach to classify sleep disorders accurately from readily available health and lifestyle factors. The input features are age, sleep duration, quality of sleep, physical activity, stress level, heart rate, number of daily steps, and BMI. The collected data is first preprocessed, and feature scaling is then applied so that all features contribute equally to classification, especially in distance-based classifiers. The data is divided into training and test sets so that performance can be measured on unseen data. Support Vector Machine, k-Nearest Neighbors, Decision Tree, and Random Forest classifiers are each applied separately to the preprocessed data and compared using accuracy, the classification report, and the confusion matrix, in order to identify the model that predicts sleep disorders most effectively. The system finally classifies each individual into one of five categories: no disorder, insomnia, sleep apnea, narcolepsy, and RLS.
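
The train/test division described above can be sketched in a few lines. The following is a minimal illustration using only the Python standard library, not the authors' implementation; the function name `stratified_split`, the 20% test fraction, the seed, and the toy records are all assumptions made for the example. It splits per class so that each of the five categories keeps roughly the same share in both subsets.

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, test_fraction=0.2, seed=42):
    """Split so each class keeps roughly the same share in train and test."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]
    return train, test

# Toy data (hypothetical): 10 placeholder records for each of the five classes.
classes = ["none", "insomnia", "sleep apnea", "narcolepsy", "RLS"]
labels = [c for c in classes for _ in range(10)]
samples = [[float(i)] for i in range(len(labels))]

train, test = stratified_split(samples, labels)
print(len(train), len(test))
```

Splitting per class rather than over the whole dataset matters when some disorder types are rare, since a purely random split could leave a class out of the test set entirely.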

Fig. 2. Architecture of the Proposed System
1. Data Collection
  In this study, the dataset for sleep disorder classification is obtained from the Kaggle community, which offers well-structured health-related datasets describing general health behavior patterns, so that the learning models are trained on data of adequate quality. The data comprises easily available health-related parameters, namely age, sleep duration, quality of sleep, physical activity level, stress level, heart rate, step count, and body mass index, which makes it feasible to train models that classify the sleep disorder types.
2. Data Preprocessing
  Preprocessing improves the quality, consistency, and reliability of the data used for model training. First, inconsistent or missing values are identified and handled appropriately, since they would otherwise bias training. Duplicate records are removed because they add redundancy to the data representation. Feature scaling is then applied by standardization to avoid bias during learning, since the data comprises numerous numeric features. This step is particularly significant for distance-based algorithms such as Support Vector Machine (SVM) and k-Nearest Neighbors (KNN).
3. Feature Extraction
  Feature extraction focuses on identifying the physiological and lifestyle attributes most relevant to sleep health. The extracted features are age, sleep duration, quality of sleep, physical activity level, stress level, heart rate, daily steps, and BMI. These features represent behavioral, physiological, and lifestyle parameters often considered in sleep disorders. The feature extraction process converts raw health data into meaningful numerical representations from which the machine learning algorithms can learn the underlying patterns corresponding to various sleep disorders. This step is very important for improving classification accuracy across multiple disorder types.
4. Feature Selection
  Feature selection retains the most informative features while discarding less relevant attributes. Reducing the number of dimensions lowers computational complexity and mitigates overfitting. Statistical tests and model-based feature importance are applied to retain the features that play an important role in sleep disorder classification. Selecting relevant features improves model interpretability, speeds up the model, and improves generalization to new data, which in turn improves the scalability of the proposed model.
5. Model Training and Prediction
  Supervised machine learning algorithms are trained on the selected set of features. The dataset is split into separate training and testing subsets in order to assess the generalization capacity of the classification model. The algorithms included in this study are Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Decision Tree, and Random Forest, all of which are appropriate for multi-class classification. In the training phase, the models learn patterns for the various classes of sleep disorders. In the testing phase, the models assign new data points to one of the defined classes of sleep disorders. Model performance is measured in terms of accuracy, classification reports, and confusion matrices. Among the models, Random Forest exhibits high classification capacity because of its ensemble learning nature and its ability to handle complex data relationships.

  The proposed system is efficient, scalable, and cost-effective for multi-class sleep disorder classification using non-invasive and easily obtainable health data. It is suitable for preliminary diagnosis and early intervention, since it learns from historical data and generalizes to unseen cases. The modular design of the framework allows additional health parameters or data from wearable sensors to be integrated in the future, increasing its potential applicability in real healthcare environments.
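
To make the training and prediction step more concrete, the sketch below implements k-Nearest Neighbors, one of the four classifiers named above, using only the Python standard library. It is an illustration under stated assumptions rather than the authors' code: the function name `knn_predict`, the choice of k = 3, and the two-feature toy records (imagined as already standardized sleep duration and stress level) are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Rank training records by Euclidean distance to the query point.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical standardized records: [sleep duration, stress level] -> label
train = [
    ([-1.0, 1.2], "insomnia"),
    ([-1.2, 0.9], "insomnia"),
    ([-0.8, 1.1], "insomnia"),
    ([1.0, -1.0], "none"),
    ([0.9, -1.2], "none"),
    ([1.1, -0.8], "none"),
]

print(knn_predict(train, [-0.9, 1.0]))   # short sleep, high stress
print(knn_predict(train, [1.0, -0.9]))   # long sleep, low stress
```

Because the prediction depends entirely on distances, any feature with a much larger numeric range would dominate the vote, which is why scaling is applied before this step.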
DATA SET DISCUSSION
The dataset used in this project comprises structured health and lifestyle records collected to support multi-class sleep disorder classification. The input features include age, sleep duration (in hours), quality of sleep (rated on a scale of 1–10), physical activity level (0–100), stress level (1–10), heart rate, daily steps, and body mass index (BMI). These attributes collectively represent physiological, behavioral, and lifestyle factors that significantly influence sleep health. The target variable corresponds to the type of sleep disorder, categorized as no disorder, insomnia, sleep apnea, narcolepsy, or restless legs syndrome (RLS). To ensure fair contribution of all features during model training, feature scaling is performed using standardization techniques. This step is particularly important for distance-based algorithms such as Support Vector Machine (SVM) and k-Nearest Neighbors (KNN), as it prevents bias toward features with larger numerical ranges and improves overall classification performance.
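
The effect of standardization can be illustrated with a small sketch. The code below assumes z-score scaling (subtract the column mean, divide by the column standard deviation), which is one common standardization technique; the paper does not state the exact formula used. The helper name `standardize` and the four sample records are hypothetical.

```python
import math

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the column std."""
    n_rows, n_cols = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n_rows for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        variance = sum((r[j] - means[j]) ** 2 for r in rows) / n_rows
        stds.append(math.sqrt(variance) or 1.0)  # constant column: avoid dividing by zero
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]

# Hypothetical records: [sleep duration (h), stress level (1-10), daily steps]
records = [
    [7.5, 3, 9000],
    [5.0, 8, 3000],
    [6.5, 5, 7000],
    [8.0, 2, 10000],
]
scaled = standardize(records)

# Before scaling, the distance between two records is almost entirely daily steps.
print(math.dist(records[0], records[1]))
# After scaling, all three features contribute on a comparable scale.
print(math.dist(scaled[0], scaled[1]))
```

In a real pipeline the means and standard deviations would normally be computed on the training subset only and then reused for the test subset, so that no information from the test data leaks into training.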

Fig. 3. Dataset
RESULTS AND JUSTIFICATIONS
The experimental results are presented as confusion matrices for each machine learning model. The confusion matrix of the Random Forest classifier shows strong dominance on the main diagonal for all five categories (no sleep disorder, insomnia, sleep apnea, narcolepsy, and restless legs syndrome), which indicates robustness and good generalization on the sleep disorder classification problem. In contrast, the confusion matrix of the Decision Tree classifier shows a certain extent of misclassification, attributable to its higher vulnerability to overfitting. The Support Vector Machine (SVM) and k-Nearest Neighbors (KNN) models show higher misclassification, attributable to their sensitivity to the scale of the input features. These results indicate that ensemble learning models are more effective for sleep disorder classification.
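
For readers unfamiliar with how a confusion matrix is read, the following sketch builds one from true and predicted labels and derives accuracy, per-class precision, and per-class recall. It uses only the Python standard library; the function names and the eight toy predictions are hypothetical and do not reproduce the paper's results. Three classes are used instead of five purely to keep the example short.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_report(matrix, classes):
    """Return {class: (precision, recall)} computed from the matrix."""
    report = {}
    for i, c in enumerate(classes):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        report[c] = (precision, recall)
    return report

# Toy labels (hypothetical), not the paper's experimental output.
classes = ["none", "insomnia", "sleep apnea"]
y_true = ["none", "none", "none", "insomnia", "insomnia",
          "sleep apnea", "sleep apnea", "sleep apnea"]
y_pred = ["none", "none", "insomnia", "insomnia", "insomnia",
          "sleep apnea", "sleep apnea", "none"]

matrix = confusion_matrix(y_true, y_pred, classes)
accuracy = sum(matrix[i][i] for i in range(len(classes))) / len(y_true)
report = per_class_report(matrix, classes)
print(matrix)    # [[2, 1, 0], [0, 2, 0], [1, 0, 2]]
print(accuracy)  # 0.75
```

Counts on the main diagonal are correct predictions, so "dominance on the main diagonal" means that most records of each class were assigned to that class.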

Fig. 3. Confusion Matrix

The proposed system has been implemented with an interactive, web-based user interface that assists an individual in sleep disorder prediction, as presented in Fig. 4. The interface lets users enter health-related parameters, namely the individual's age, sleep duration, sleep quality, physical activity, stress level, heart rate, daily steps, and BMI. On submission, these values are passed to the machine learning predictor, and the predicted sleep disorder type is displayed prominently on the interface. In the example shown, the interface predicts Sleep Apnea, which demonstrates that the multi-class model can be used in practice to support individuals.
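
Before user-entered values reach the predictor, an interface of this kind would typically check them. The sketch below shows one way to do so; it is not taken from the paper's implementation. The field names, the function `validate_input`, and most of the numeric bounds are assumptions chosen for illustration; only the 1–10 scales for sleep quality and stress and the 0–100 scale for physical activity follow the dataset description.

```python
# Allowed range per field. Only the 1-10 and 0-100 scales come from the
# dataset description; every other bound is an assumed plausible limit.
FEATURE_RANGES = {
    "age": (0, 120),
    "sleep_duration": (0, 24),      # hours
    "quality_of_sleep": (1, 10),
    "physical_activity": (0, 100),
    "stress_level": (1, 10),
    "heart_rate": (30, 220),
    "daily_steps": (0, 100000),
    "bmi": (10, 60),
}

def validate_input(form):
    """Convert raw form values to floats and check each against its range.

    Returns (feature_vector, errors); feature_vector is None if any check fails.
    """
    vector, errors = [], []
    for name, (low, high) in FEATURE_RANGES.items():
        try:
            value = float(form[name])
        except (KeyError, TypeError, ValueError):
            errors.append(f"{name}: missing or not a number")
            continue
        if not low <= value <= high:
            errors.append(f"{name}: {value} outside [{low}, {high}]")
            continue
        vector.append(value)
    return (vector if not errors else None), errors

# Hypothetical form submission, with values arriving as strings.
good = {"age": "34", "sleep_duration": "6.5", "quality_of_sleep": "6",
        "physical_activity": "45", "stress_level": "7", "heart_rate": "78",
        "daily_steps": "5200", "bmi": "27.4"}
bad = dict(good, stress_level="11")

vector_ok, errors_ok = validate_input(good)
vector_bad, errors_bad = validate_input(bad)
print(vector_ok)
print(errors_bad)
```

The validated vector would then be scaled with the same parameters used during training before being passed to the trained classifier.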

Fig. 4. Prediction Phase Output
CONCLUSION

This work describes the use of machine learning classification for sleep disorder diagnosis based on easily available health and lifestyle data. The proposed system identifies specific sleep disorder types, which supports early intervention and clinical decision-making. Comparative analysis showed Random Forest to be the best model for this classification problem. Future work may incorporate real-time sensor data from wearables, expand the dataset using clinical records, and deploy the model as a web-based diagnosis support system for various healthcare applications.

ACKNOWLEDGMENT

I extend my sincere gratitude to everyone whose help made this project possible. I especially thank my project guide, Mr. Ramireddy Siva, whose valuable guidance and continuous motivation played a great role in the successful completion of this project on Feature-Driven Sleep Disorder Classification Using Machine Learning. I also thank the teachers and staff of the Department of Computer Science for their continuous encouragement, and our institution for its support throughout this work.

REFERENCES

[1] Jain, P. Kumar, and S. Verma, "Machine learning approaches for sleep disorder detection," IEEE Access, vol. 9, pp. 112345–112356, 2021.
[2] S. Kumar and R. Singh, "Sleep disorder classification using supervised learning techniques," International Journal of Medical Informatics, vol. 138, pp. 104–112, 2020.
[3] V. Vapnik, Statistical Learning Theory. New York, NY, USA: Wiley, 1998.
[4] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[5] C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006.
[6] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[7] T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, 1967.
[8] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[9] A. Almazroa et al., "A comparative study of machine learning algorithms for medical diagnosis," Journal of Healthcare Engineering, vol. 2018, Article ID 7812345, 2018.
[10] M. Shahin, B. Ahmed, and M. Rahman, "Analysis of lifestyle factors for sleep disorder prediction using machine learning," Procedia Computer Science, vol. 170, pp. 1091102, 2020.
[11] World Health Organization, "Sleep disorders and health," WHO Technical Report Series, Geneva, Switzerland, 2019.