An Empirical Study of the Applications of Data Mining Techniques in Higher Education

DOI : 10.17577/IJERTCONV8IS04002

Download Full-Text PDF Cite this Publication

Text Only Version

An Empirical Study of the Applications of Data Mining Techniques in Higher Education

Ms.Blessy Paul P

Dept. Of Vocational Studies

Abstract : Few years ago, the information flow in education field was relatively simple and the application of technology was limited. However, as we progress into a more integrated world where technology has become an integral part of the business processes, the process of transfer of information has become more complicated. Today, one of the biggest challenges that educational institutions face is the explosive growth of educational data and to use this data to improve the quality of managerial decisions. Data mining techniques are analytical tools that can be used to extract meaningful knowledge from large data sets. This paper addresses the applications of data mining in educational institution to extract useful information from the huge data sets and providing analytical tool to view and use this information for decision making processes by taking real life examples.

Keywords- Higher education,; Data mining; Knowledge discovery; Classification; Association rules; Prediction; Outlier analysis;


    In modern world a huge amount of data is available which can be used effectively to produce vital information. The information achieved can be used in the field of Medical science, Education, Business, Agriculture and so on. As huge amount of data is being collected and stored in the databases, traditional statistical techniques and database management tools are no longer adequate for analyzing this huge amount of data. Data Mining (sometimes called data or knowledge discovery) has become the area of growing significance because it helps in analyzing data from different perspectives and summarizing it into useful information. [1]There are increasing research interests in using data mining in education. This new emerging field, called Educational Data Mining, concerns with developing methods that discover knowledge from data originating from educational environments [1].

    The data can be collected from various educational institutes that reside in their databases. The data can be personal or academic which can be used to understand students' behavior, to assist instructors, to improve teaching, to evaluate and improve e-learning systems , to improve curriculums and many other benefits.[1][2]

    Educational data mining uses many techniques such as decision trees, neural networks, k-nearest neighbour, naive bayes, support vector machines and many others.[3] Using these techniques many kinds of knowledge can be discovered such as association rules, classifications and clustering. The discovered knowledge can be used for organization of syllabus, prediction regarding enrolment of students in a particular programme, alienation of traditional classroom teaching model, detection of unfair means used in online examination, detection of abnormal values in the result sheets of the students and so on.


    Data mining in higher education is a recent research field and this area of research is gaining popularity because of its potentials to educational institutes.

    1. gave case study of using educational data mining in Moodle course management system. They have described how different data mining techniques can be used in order to improve the course and the students learning. All these techniques can be applied separately in a same system or together in a hybrid system.

    2. have a survey on educational data mining between1995 and 2005. They have compared the Traditional Classroom teaching with the Web based Educational System. Also they have discussed the use of Web Mining techniques in Education systems.

    3. have a described the use of k-means clustering algorithm to predict students learning activities. The information generated after the implementation of data mining technique may be helpful for instructor as well as for students.

    4. discuss how data mining can help to improve an education system by enabling better understanding of the students. The extra information can help the teachers to manage their classes better and to provide proactive feedback to the students.

    5. have described the use of data mining techniques to predict the strongly related subject in a course curricula. This information can further be used to improve the syllabi of any course in any educational institute.

    6. describes how data mining techniques can be used to determine The student learning result evaluation system is an essential tool and approach for monitoring and controlling the learning quality. From the perspective of data analysis, this paper conducts a research on student learning result based on data mining.


    The object of the present study is to identify the potential areas in which data mining techniques can be applied in the field of Higher education and to identify which data mining technique is suited for what kind of application.


    Simply stated, data mining refers to extracting or mining" knowledge from large amounts of data. [5] Data mining techniques are used to operate on large volumes of data to discover hidden patterns and relationships helpful in decision making. The sequences of steps identified in extracting knowledge from data are: shown in Figure 1.


    Pattern evaluation

    Pattern evaluation

    Data mining

    Data mining

    Data selection and transformation

    Data selection and transformation

    Data cleaning and integration

    Data cleaning and integration

    Figure 1. The steps of extracting knowledge from data

    The various techniques used in Data Mining are:

    A :Classification and Prediction

    Classification is the processing of finding a set of models (or functions) which describe and distinguish data classes or concepts, for the purposes of being able to use the model to predict the class of objects whose class label is nknown. The derived model may be represented in various forms, such as classification (IF-THEN) rules, decision trees, mathematical formulae, or neural networks. Classification can be used for predicting the class label of data objects. However, in many applications, one may like to predict some missing or unavailable data values rather than class labels. This is usually the case when the predicted values are numerical data, and is often specifically referred to as prediction.

    IF-THEN rules are specified as

    Syntax : IF condition THEN conclusion

    e.g. IF age=youth and student=yes then buys_computer=yes

    1. Clustering Analysis

      Unlike classification and predication, which analyze classlabeled data objects, clustering analyzes data objects without consulting a known class label. In general, the class labels are not present in the training data simply because they are not known to begin with. Clustering can be used to generate such labels. The objects are clustered or grouped based on the principle of maximizing the intra class similarity and minimizing the interclass similarity. That is, clusters of objects are formed so that objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters. Each cluster that is formed can be viewed as a class of objects, from which rules can be derived. [5] Application of clustering in education can help institutes group individual student into classes of similar behaviour. Partition the students into clusters, so thatstudents within a cluster (e.g. Average) are similar to each other while dissimilar to students in other clusters (e.g. Intelligent, Weak).

      Figure 2. Picture showing the partition of students in clusters


      1. Organization of Syllabus

        It is important for educational institutes to maintain a high quality educational programme which will improve the students learning process and will help the institute to optimize the use of resources. A typical student at the university level completes a number of courses (i.e. "course" and "subject" are used synonymously) prior to graduation. Presently, organization of syllabi is influenced by many factors such as affiliated, competing or collaborating programmes of universities, availability of lecturers, expert judgments and experience. This method of organization may not necessarily facilitate students' learning capacity optimally. Exploration of subjects and their relationships can directly assist in better organization of syllabi and provide insights to existing curricula of educational programmes. One of the application of data mining is to identify related subjects in syllabi of educational programmes in a large educational institute.[6] A case study has been performed where the student data collected over a period of time at the Sri Lanka Institute of Information Technology (SLIIT) [7].

        The main aim of the study was to find the strongly related subjects in a course offered by the institute. For this purpose following methodology was followed to:

        • Identify the possible related subjects.

        • Determine the strength of their relationships and determine strongly related subjects.

          METHODOLOGY :

          Student ID

          Subject 1

          Subject 2

          Subject 3



          Advanced Databases




          Advanced Databases




          Advanced Databases




          Advanced Databases

          Visual Basic



          Advanced Databases

          Web Designing

          Table 1. The subjects chosen by students

          In the first step, association rule mining is used to identify possibly related two subject combinations in the syllabi which also reduce our search space. In the second step(see[6]), Pearson Correlation Coefficient was applied to determine the strength of the relationships of subject combinations identified in the first step

          Association Rules that can be derived from Table 1 are of the form:

          (X,subject 1)=> (X,subject2)

          (X,subject 1)^ (X,subject 2)=> (X,subject3) (X,database)=> (X,advanced database) [support=2% and confidence=60%]

          (X,database)^ (X,advanced database)=> (X,data mining)

          [support=1% and confidence=50%]

          Where support factor of the association rule shows that 1% of the students have taken both the subjects Databases and Advanced Databases and confidence factor shows that there is a chance that 50% of the students who have taken Databases will also take Advanced Databases. This way we can find the strongly related subjects and can optimize the syllabi of an educational programme.

      2. Predicting The Registration Of Students in an Educational Programme

        Now a days educational organization are getting strong competition from other Academic competitors. To have an edge over other organizations, needs deep and enough knowledge for a better assessment, evaluation, planning, and decision making. Data Mining helps organizations to identify the hidden patterns in databases; the extracted patterns are then used to build data mining models, and hence can be used to predict performance and behaviour with high accuracy. As a result of this, universities are able to allocate resources more effectively.


        One of the applications of data mining can be for example, to efficiently assign resources with an accurate estimate of how many male or female will register in a particular program by using the Prediction techniques.

        Figure 4. Prediction of female students in the coming year


        Training Data

        Year of Students

        Number of Male Students

        Number of Female Students

        Unseen data (2010,80,?)



















        Female Students=60

        In real scenario couple of other associated attributes like type of course, transport facility, hostel facility etc can be used to predict the registration of students.

      3. Predicting Student Performance

        One of the question whose answer almost every stakeholder of an educational system would like to know Can we predict student performance?

        Over the years, many researchers applied various data mining techniques to answer this question. In modern

        times, learning is taking on a more important role in the development of our civilization. Learning is an individual behaviour as well as a social phenomenon.

        It is a difficult task to deeply investigate and successfully develop models for evaluating learning efforts with the combination of theory and practice. University goals and outcomes clearly relate to promoting learning through effective undergraduate and graduate teaching, scholarship, and research in service to University.

        Student learning is addressed in some goals and outcomes related to the development of overall student knowledge, skill, and dispositions. Collections of randomly selected student work are examined and assessed by small groups of faculty teaching courses within some general education categories. [8]

        With the help of data mining techniques a result evaluation system can be developed which can help teachers and students to know the weak points of the traditional classroom teaching model. Also it will help them to face the rapidly developing real-life environment and adapt the current teaching realities.


        We can use student participation data as part of the class grading policy. An instructor can assess the quality of student by conducting an online discussion among a group of students and use the possible indicators such as the time difference between posts, frequency distribution of the postings, duration between postings and replies etc. Given this data, we can apply classification algorithms to classify the students into possible levels of quality.

      4. Detecting Cheating in Online Examination

        We can say that online assessments are useful to evaluate students knowledge; they are used around the world in schools -since elementary to higher education institutions- and in recognized training centers like the Cisco Academy [9].

        Now a days exams are conducted online remotely through the Internet and if a fraud occurs then one of the basic problems to solve is to know: who is there? Cheating is not only done by students but the recent scandals in business and journalism show that it has become a common practice. Data Mining techniques can propose models which can help orgaizations to detect and to prevent cheats in online assessments. The models generated use data comprising of different students personalities, stress situations generated by online assessments, and common practices used by students to cheat to obtain a better grade on these exams.

      5. Identifying Abnormal/ Erroneous Values

    The data stored in a database may reflect outliers-|noise, exceptional cases, or incomplete data objects. These objects may confuse the analysis process, causing over fitting of the data to the knowledge model constructed. As a result, the accuracy of the discovered patterns can be poor. [5] One of the applications of Outlier Analysis can be to detect the abnormal values in the result sheet of the students. This may be due many factors like a software fault, data entry operator negligence or an extraordinary performance of the student in a particular subject.


In the present study, I have discussed the various data mining techniques which can support education system via generating strategic information.. Since the application of data mining brings a lot of advantages in higher learning institution, it is recommended to apply these techniques in the areas like optimization of resources, prediction of retainment of faculties in the university, to find the gap between the number of candidates applied for the post, number of applicants responded, number of applicants appeared, selected and finally joined.


  1. C. Romero, S. Ventura, E. Garcia, "Datamining in course management systems: Moodle case study and tutorial", Computers & Education, Vol. 51, No. 1, pp. 368-384, 2008

  2. C. Romero, S. Ventura "Educational data Mining: A Survey from 1995 to 2005", Expert Systems with Applications (33), pp. 135- 146, 2007

  3. Shaeela Ayesha, Tasleem Mustafa, Ahsan Raza Sattar, M. Inayat Khan, Data Mining Model for Higher Education System, Europen Journal of Scientific Research, Vol.43, No.1, pp.24-29, 2010

  4. K. H. Rashan, Anushka Peiris, Data Mining Applications in the Education Sector, MSIT, Carnegie Mellon University, retrieved on 28/01/2011

  5. Han Jiawei, Micheline Kamber, Data Mining: Concepts and Technique. Morgan Kaufmann Publishers,2000

  6. W.M.R. Tissera, R.I. Athauda, H. C. Fernando Discovery of Strongly Related Subjects in the Undergraduate Syllabi using Data Mining, IEEE International Conference on Information Acquisition, 2006

  7. Sri Lanka Institute of Information Technology,, retrieved on 28/02/2011

  8. Sun Hongjie, Research on Student Learning Result System based on Data Mining, IJCSNS International Journal of Computer Science and Network Security, Vol.10, No. 4, April 2010

  9. Academy Connection Training Resources In html,, December 28th, 2005.

  10. Wayne Smith, "Applying Data Mining to Scheduling Courses at a University", Communications of the Association for Information Systems, Vol. 16, Article 23, 2005

  11. Firdhous, M. F. M. (2010). Automating Legal Research through Data Mining. International Journal of Advanced Computer Science and Applications – IJACSA, 1(6), 9-16.

  12. The Result Oriented Process for Students Based On Distributed Data Mining. (2010). International Journal of Advanced Computer Science and Applications – IJACSA, 1(5), 22-25.

  13. Jadhav, R. J. (2011). Churn Prediction in Telecommunication Using Data Mining Technology. International Journal of Advanced Computer Science and Applications – IJACSA, 2(2), 17-19.

Leave a Reply