Detection of Autism Spectrum Disorder using Machine Learning

DOI : 10.17577/IJERTV11IS070268


  1. Bhuvaneshwari

    Department of Information Technology Madras Institute of Technology, Anna University

    Chennai, India

    Dr. P. Lakshmi Harika

    Department of Information Technology Madras Institute of Technology, Anna University

    Chennai, India

    Pranusha S Bavan

    Department of Information Technology Madras Institute of Technology, Anna University

    Chennai, India

    N. Mathubaala

    Department of Information Technology Madras Institute of Technology, Anna University

    Chennai, India

    Dr. M R Sumalatha


    Department of Information Technology Madras Institute of Technology, Anna University Chennai, India

    Abstract: Autism Spectrum Disorder (ASD) is a severe neurological condition that affects the entire brain system, which in turn impacts the cognitive, emotional, social, and physical health of the individual. People with ASD experience difficulty in socializing and communicating with others, and they need support from parents, relatives, or friends to guide them. There is no cure for autism, but early detection can help in better treatment. Autism is conventionally diagnosed from a person's behavioural traits. This method of diagnosis is time-consuming and ineffective for early detection. There is therefore a need for time-efficient, low-cost ASD screening to help individuals decide whether they should undergo a clinical diagnosis and seek treatment. We propose a machine learning-based, time-efficient solution to detect autism.


      Autism is a serious condition that affects a person's overall behavior as well as their emotional, cognitive, social, and physical health. It can be observed in individuals of any age (toddlers, children, teenagers, adults, and senior citizens). Autism screening tests are both time-consuming and costly. A technique based on machine learning is proposed to help a person decide whether to seek a formal clinical diagnosis, based on the prediction of the machine learning model. ASD is not curable, but early detection helps to determine a better treatment methodology. Such screening can be of great help and can significantly reduce healthcare costs.


      1. Madhura Ingalhalikar, Sumeet Shinde, Arnav Karmarkar, Archith Rajan, D. Rangaprakash, and Gopikrishna Deshpande (2021) used simple neural network models for classification. Compared with more sophisticated models, these neural networks made it easier to achieve higher accuracy on harmonised data. Ablation analysis was crucial for describing the most discriminative sub-networks that were directly linked to the clinical markers of autism.

      2. Md. Fazle Rabbi, S. M. Mahedy Hasan, Arifa Islam Champa, and Md. Asif Zaman (2021) applied five classification algorithms to identify autism in children and compared them to find the most accurate model. Across the evaluation metrics compared, the CNN outperformed all other algorithms. The dataset used consists of 2940 images of children.

      3. M. S. Mythili and A. R. Mohamed Shanavas (2014) used classification techniques to research ASD. The paper's primary goals were to identify autism and its degrees of severity. SVM and neural networks were two of the classification algorithms employed. WEKA tools and fuzzy techniques were also employed to examine the social interaction and conduct of the students.

      4. To identify autism, J. A. Kosmicki, V. Sochat, M. Duda, and D. P. Wall (2015) searched for the smallest set of features. The authors employed a machine learning methodology to evaluate the clinical assessment of ASD. The ADOS was applied to children's behaviour that fell within the autism spectrum. In this research, eight distinct machine learning algorithms from the ADOS's four modules were applied. The study also performed stepwise backward feature selection on score sheets from 4540 people. To detect ASD risk, it used 9 of the 28 behaviours from module 2 and 12 of the 28 behaviours from module 3, with overall accuracies of 98.27 percent and 97.66 percent, respectively.

      5. Fadi Thabtah (2017) proposed an ASD screening method that makes use of machine learning adaptation and the DSM-5. In this article, the researcher discussed the benefits and drawbacks of machine learning classification for ASD. He also called attention to the problems with current ASD screening techniques and the way they consistently rely on the DSM-IV rather than the DSM-5 manual.



The architecture follows this flow:

  • Collection of Dataset: The Autism Screening Datasets are used and cover three age groups: adults, toddlers, and children. The datasets were gathered from the UCI Repository and Kaggle.

  • Data Pre-processing: The raw data is cleaned during data pre-processing.

  • Model development and Evaluation: After pre-processing, the dataset is split into training and testing sets. Multiple classifiers are developed using prominent machine learning algorithms (Decision Tree, Naive Bayes, KNN and SVM). In the training phase, the training data is used to train each classifier. In the testing phase, class predictions are made on the test dataset. The classifiers are evaluated on how well they detect autism on the test data, and the accuracies of the classification algorithms are compared. Using a Voting classifier, a hybrid ensemble machine learning model is developed.

    Accuracy and recall are the metrics computed during model evaluation. Accuracy is the ratio of correctly predicted observations to all observations. Recall is the ratio of correctly predicted positive observations to all positive observations (output label 1). The voting classifier is evaluated using RepeatedStratifiedKFold cross-validation, from which accuracy scores and recall scores are obtained.

  • Model deployment: The model with the highest accuracy is then deployed using Flask. In the frontend, the user enters their basic details in a form and then attempts the ASD Screening test for their age category. At the backend, the user's input is pre-processed. The model is loaded from the pickle file and performs prediction on the processed input. The model's prediction is displayed at the user's front end.

Fig. 1. System Architecture


    1. Pre-processing

      Three datasets are used, one for each age category: adults, children, and toddlers. The datasets are obtained from Kaggle and the UCI Repository. Each dataset has nearly 20 attributes, which consist of categorical, continuous, and binary values. The Class/ASD output label indicates whether a person has ASD (1) or not (0). First, the raw data is pre-processed: unnecessary columns are dropped, column names are renamed so that they are uniform across all datasets, and null values are removed.
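The cleaning steps described above could be sketched with pandas roughly as follows. The column names, the rename mapping, and the list of dropped columns are hypothetical placeholders chosen for illustration; the paper does not list the actual columns it dropped or renamed.

```python
import pandas as pd

# Hypothetical mapping from dataset-specific names to one shared naming scheme.
COLUMN_RENAMES = {"Class/ASD": "class_asd", "Age_Mons": "age"}
# Hypothetical columns assumed to carry no predictive information.
DROP_COLUMNS = ["Case_No", "age_desc"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop unneeded columns, unify column names, and remove rows with nulls."""
    df = df.drop(columns=[c for c in DROP_COLUMNS if c in df.columns])
    df = df.rename(columns=COLUMN_RENAMES)
    return df.dropna().reset_index(drop=True)


# Toy stand-in for one of the screening datasets.
raw = pd.DataFrame({
    "Case_No": [1, 2, 3],
    "A1_Score": [1, 0, 1],
    "ethnicity": ["Asian", None, "White"],
    "Class/ASD": [1, 0, 0],
})
clean = preprocess(raw)
print(clean)
```

Applying the same function to each of the three datasets is one way to end up with uniform column names across them.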

    2. Encoding

      Encoding is performed on categorical values to convert string values into numerical values. For this purpose, Label Encoder is used. The fitted label encoder is saved as a pickle file, which is later loaded at the backend to encode the input obtained from the user.
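A minimal sketch of this step with scikit-learn's `LabelEncoder` is shown below. The column values are invented for illustration, and the encoder is pickled to an in-memory byte string rather than to a file so that the example stays self-contained.

```python
import pickle

from sklearn.preprocessing import LabelEncoder

# Toy categorical column (hypothetical values).
ethnicity = ["Asian", "White", "Asian", "Latino"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(ethnicity)  # classes are sorted alphabetically
print(list(encoded))

# Serialise the fitted encoder; the described system writes this to a pickle file.
blob = pickle.dumps(encoder)

# Later, at the backend: restore the encoder and apply it to user input.
restored = pickle.loads(blob)
user_code = restored.transform(["White"])
print(int(user_code[0]))
```

A `LabelEncoder` handles a single column, so one encoder per categorical column would be needed, and `transform` raises a `ValueError` for a value that was not seen during fitting.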

    3. Model development

      After pre-processing, the dataset is split into a training set and a testing set using a train-test split ratio of 80:20. Multiple classification algorithms are used to develop the model: K-nearest neighbours, Support Vector Machine, Decision Tree, and Naive Bayes. Using these classification algorithms, a hybrid ensemble machine learning model is developed with a Voting classifier. Both hard voting and soft voting methods are used.
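The model development step might look like the following scikit-learn sketch. It runs on synthetic data rather than the screening datasets, and the choice of `GaussianNB` as the Naive Bayes variant, the stratified split, the default hyperparameters, and the random seeds are all assumptions not stated in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a pre-processed, encoded screening dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# 80:20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def base_estimators():
    """Return fresh instances of the four base classifiers."""
    return [
        ("knn", KNeighborsClassifier()),
        # probability=True is required so the SVM can take part in soft voting.
        ("svm", SVC(probability=True, random_state=0)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ]


hard_vote = VotingClassifier(estimators=base_estimators(), voting="hard")
soft_vote = VotingClassifier(estimators=base_estimators(), voting="soft")
hard_vote.fit(X_train, y_train)
soft_vote.fit(X_train, y_train)

print("hard voting accuracy:", hard_vote.score(X_test, y_test))
print("soft voting accuracy:", soft_vote.score(X_test, y_test))
```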

      The Voting Classifier is an estimator that combines several classification models, each optionally given an individual confidence weight. By integrating various classification models, the voting estimator acts as a powerful meta-classifier that counteracts the limitations of the individual classifiers on a given dataset. It assigns a class label to a record based on a weighted majority vote over the predicted class labels (hard voting) or over the predicted class probabilities (soft voting).
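To make the difference between the two voting rules concrete, the sketch below applies both to a single record. The probabilities and weights are made-up numbers, and the helper functions are illustrative rather than taken from the paper.

```python
import numpy as np

# Hypothetical class probabilities [P(class 0), P(class 1)] that three
# classifiers assign to one record.
probs = np.array([
    [0.45, 0.55],
    [0.40, 0.60],
    [0.95, 0.05],
])
weights = np.array([1.0, 1.0, 1.0])  # equal confidence in every classifier


def hard_vote(probs: np.ndarray, weights: np.ndarray) -> int:
    """Weighted majority vote over each classifier's predicted label."""
    votes = probs.argmax(axis=1)
    tallies = np.bincount(votes, weights=weights, minlength=probs.shape[1])
    return int(tallies.argmax())


def soft_vote(probs: np.ndarray, weights: np.ndarray) -> int:
    """Pick the class with the highest weighted average probability."""
    averaged = np.average(probs, axis=0, weights=weights)
    return int(averaged.argmax())


print("hard vote:", hard_vote(probs, weights))  # two of three vote for class 1
print("soft vote:", soft_vote(probs, weights))  # mean P(class 0) = 0.6 wins
```

Here the two rules disagree: two classifiers lean weakly towards class 1, while the third is very confident in class 0, so soft voting follows the confident classifier.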

    4. Model Evaluation

      The Voting classifier is evaluated using RepeatedStratifiedKFold cross-validation, which repeats Stratified K-Fold n times. Three repeats of stratified 10-fold cross-validation are performed. KFold: the dataset is split into k consecutive folds. Stratified: the folds preserve the percentage of samples for each class. Repeats: the number of times the cross-validator is repeated.

      The metrics calculated are accuracy and recall. The accuracy and recall scores from all folds are averaged to produce the mean accuracy and mean recall.
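The evaluation procedure can be sketched as below, again on synthetic data. The ensemble configuration and seeds are assumptions; only the use of three repeats of stratified 10-fold cross-validation with accuracy and recall comes from the text. The first few lines check the two metric definitions on a toy example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Metric definitions on a toy example: 3 of 5 predictions are correct,
# and 2 of the 3 actual positives (label 1) are found.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1]
toy_accuracy = accuracy_score(y_true, y_pred)
toy_recall = recall_score(y_true, y_pred)

# Synthetic stand-in for a pre-processed screening dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

model = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("svm", SVC(probability=True, random_state=0)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)

# Three repeats of stratified 10-fold cross-validation: 30 fits in total.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "recall"])

print("mean accuracy:", scores["test_accuracy"].mean())
print("mean recall:", scores["test_recall"].mean())
```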

    5. Model Deployment

    The Flask web app starts with a form page that asks the user to enter the required input. After submitting the form, the user attempts the ASD Screening test. When the user submits the test, the backend receives the input data. The user's input is processed before prediction: the encoder is loaded from its pickle file and applied to the input. The model is then loaded from its pickle file and performs prediction on the processed input. The model's prediction, that is, the predicted class and the resulting statement of whether the user is autistic or not, is displayed at the user's front end.
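A minimal Flask sketch of this backend flow is given below. The route name, form field names, toy training data, and result messages are all hypothetical, and the encoder and model are round-tripped through in-memory pickles instead of files so that the example runs on its own.

```python
import pickle

from flask import Flask, jsonify, request
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Stand-ins for the pickle files produced during training (toy data).
# Features: two screening answers (0/1) and an encoded yes/no attribute.
_encoder = LabelEncoder().fit(["no", "yes"])
_X = [[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1], [1, 1, 0], [0, 0, 1]]
_y = [0, 0, 0, 1, 1, 0]
_model = DecisionTreeClassifier(random_state=0).fit(_X, _y)
ENCODER_BLOB = pickle.dumps(_encoder)
MODEL_BLOB = pickle.dumps(_model)

# What the backend does at start-up: load the encoder and the model.
encoder = pickle.loads(ENCODER_BLOB)
model = pickle.loads(MODEL_BLOB)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Encode the submitted form input and return the model's prediction."""
    form = request.form
    answers = [int(form["a1"]), int(form["a2"])]
    jaundice = int(encoder.transform([form["jaundice"]])[0])
    label = int(model.predict([answers + [jaundice]])[0])
    message = "ASD traits indicated" if label == 1 else "ASD traits not indicated"
    return jsonify({"class": label, "result": message})


# Exercise the route without starting a server.
with app.test_client() as client:
    response = client.post(
        "/predict", data={"a1": "1", "a2": "1", "jaundice": "yes"}
    )
    print(response.get_json())
```

In a deployed app the JSON response would typically be replaced by a rendered results page, and the app would be served with `flask run` or an equivalent WSGI server.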

    Fig. 2. Web page


    The Hard Voting classifier obtained accuracies of 83.76% for adults, 84.48% for children, and 96.39% for toddlers. Its recall values are 39.88% for adults, 73.11% for children, and 96.11% for toddlers.

    The Soft Voting classifier obtained accuracies of 94.45% for adults, 91.30% for children, and 96.71% for toddlers. Its recall values are 88.35% for adults, 88.77% for children, and 97.85% for toddlers.

    In comparison, the Soft Voting classifier has better performance metrics than the Hard Voting classifier.


It is inferred that the Soft Voting classifier performs better than the Hard Voting classifier. On comparing the ensemble model with the individual classification algorithms, it is found that, for the adult dataset, the Naive Bayes classifier has slightly higher accuracy than the ensemble model. For the other datasets, the Soft Voting classifier has the highest accuracy of all the algorithms. If the person is found to be autistic, they are advised to seek a proper clinical diagnosis and are pointed to support institutions that provide help to autistic people. A list of organizations, such as schools for children with special needs and other relevant institutions for all age groups, is suggested.


[1] M. Ingalhalikar, S. Shinde, A. Karmarkar, A. Rajan, D. Rangaprakash and G. Deshpande, Functional Connectivity-Based Prediction of Autism on Site Harmonized ABIDE Dataset, in IEEE Transactions on Biomedical Engineering, vol. 68, no. 12, pp. 3628-3637, Dec. 2021, doi: 10.1109/TBME.2021.3080259.

[2] M. F. Rabbi, S. M. M. Hasan, A. I. Champa and M. A. Zaman, A Convolutional Neural Network Model for Early-Stage Detection of Autism Spectrum Disorder, 2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), 2021, pp. 110-114, doi: 10.1109/ICICT4SD50815.2021.9397020.

[3] Mythili, M. S., and A. R. Mohamed Shanavas. An Analysis of students' performance using classification algorithms. IOSR Journal of Computer Engineering 16.1 (2014): 63-69.

[4] Kosmicki, J. A., Sochat, V., Duda, M., & Wall, D. P. (2015). Searching for a minimal set of behaviors for autism detection through feature selection-based machine learning. Translational Psychiatry, 5(2), e514.

[5] Thabtah, Fadi. Autism spectrum disorder screening: machine learning adaptation and DSM-5 fulfillment. Proceedings of the 1st International Conference on Medical and Health Informatics 2017. 2017.

[6] Vaishali, R., and R. Sasikala. A machine learning based approach to classify autism with optimum behaviour sets. International Journal of Engineering & Technology 7.4 (2018): 18.

[7] Vakadkar, Kaushik, Diya Purkayastha, and Deepa Krishnan. Detection of Autism Spectrum Disorder in Children Using Machine Learning Techniques. SN Computer Science 2.5 (2021): 1-9.

[8] Cavus, Nadire, Abdulmalik A. Lawan, Zurki Ibrahim, Abdullahi Dahiru, Sadiya Tahir, Usama Ishaq Abdulrazak, and Adamu Hussaini. A systematic literature review on the application of machine-learning models in behavioral assessment of autism spectrum disorder. Journal of Personalized Medicine 11, no. 4 (2021): 299.

[9] S.B. Shuvo, J. Ghosh and A. S. Oyshi, A Data Mining Based Approach to Predict Autism Spectrum Disorder Considering Behavioral Attributes, 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 2019, pp. 1-5, doi: 10.1109/ICCCNT45670.2019.8944905.

[10] J. Baio, Prevalence of autism spectrum disorders: Autism and developmental disabilities monitoring network, united states, 2008. morbidity and mortality weekly report.61. Centers for Disease Control and Prevention, 2012.

[11] S. E. Bryson, L. Zwaigenbaum, and W. Roberts, The early detection of autism in clinical practice, Paediatrics & Child Health, vol. 9, no. 4, pp. 219-221, 2004.

[12] F. Thabtah and D. Peebles, A new machine learning model based on induction of rules for autism detection, and A complete guide to the random forest algorithm, Built In, vol. 16, 2019.

[13] Haishuai Wang, Li Li, Lianhua Chi, Ziping Zhao, Autism Screening Using Deep Embedding Representation, International Conference on Computational Science, Lecture Notes in Computer Science, vol. 11537, pp. 160-173, Jun. 2019.

[14] Muhammad Nazrul Islam, Kazi Shahrukh Omar, Prodipta Mondal, Nabila Shahnaz Khan, A Machine Learning Approach to Predict Autism Spectrum Disorder, International Conference on Electrical, Computer and Communication Engineering, Feb. 2019.