Enhanced Student Bunglers Detection using Association Rules and Predicting Outliers

Devikala. D; Kamalraj. N

doi:10.17577/IJERTV3IS100801

Volume 03, Issue 10 (October 2014)


Open Access
Paper ID : IJERTV3IS100801
Published (First Online): 27-10-2014
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Devikala. D

Research Scholar, Department of Computer Science,

Dr. SNS Rajalakshmi College of Arts & Science, Coimbatore 641 049, India.

Kamalraj. N

Head of the Department, Department of Computer Technology,

Dr. SNS Rajalakshmi College of Arts & Science, Coimbatore 641 049, India.

Abstract – Many countries have recently shown interest in, and concern about, the problem of student failure and in determining the main factors that affect student performance. A great deal of research is under way to identify the factors behind low student performance using the large amount of information stored in databases.

This paper proposes a novel classification approach that combines association rule mining and outlier detection. Data mining is applied after the data has been preprocessed and proceeds through association, classification and outlier detection. The main objective of the paper is to detect dropout and failure cases as early as possible and to expose the contributing factors, so that the number of students who drop out or fail can be reduced. The outcomes of the methods are compared and the best result is identified.

Keywords: Educational data mining (EDM), Classification, Association, Outlier detection.

INTRODUCTION

Advances in information technology across disciplines such as database technology, scientific data, machine learning, neural networks, information retrieval and statistics have led to large volumes of data being stored in many formats, including records, files, documents, images, sound, video and newer data formats. Data mining is the process of identifying meaningful patterns and relationships in very large databases; it is also called knowledge discovery in databases (KDD). The steps carried out before data mining are data cleaning, data selection, pre-processing and data transformation.

A great deal of research [1] has been done on identifying the factors that affect the low performance of students at different educational levels, using the large amount of information that current computers can store in databases. Current research in educational data mining focuses on developing methods for better understanding students and the settings in which they learn [2]. Earlier work shows promising results on the economic, sociological and educational characteristics that are most relevant to predicting low academic performance [8], but the classification-based algorithms used add some complexity in time and processing. This paper proposes the Apriori algorithm in association rule mining for classification, which provides more efficient results than the existing system. It reduces the complexity of the system, and extreme data, that is, data which is abnormal, is detected by an outlier detection method. A density-based outlier method is used to detect the abnormal data. The results produced by the system are more accurate, take less time and give better performance.
LITERATURE REVIEW

Romero et al. [2] study educational data mining and how the field has developed by exploring educational data. The paper introduces educational data mining, the different types of user groups, and the types of educational environment that provide the data. It lists the most common tasks in the educational environment that data mining techniques have been used to resolve, and finally discusses some promising features.

N. V. Chawla et al. [3] proposed a method that combines over-sampling of the abnormal (minority) class with under-sampling of the normal (majority) class, which can achieve better classifier performance than varying the loss ratios of the classes.

S. Kotsiantis [4] studies the various methodologies that have been proposed to help failing students in academics. The author proposes a local cost-sensitive technique and concludes that this framework is a more effective solution to the problem.

M. N. Quadri et al. [6] study the use of data mining to predict student dropout. They propose a decision tree technique for choosing the best predictors and analyzing the features of failing students. The authors produce lists of students predicted as likely to drop out of college, which are handled by the management and teachers.
METHOD

This paper proposes a method for predicting academic student failure that follows the process of knowledge discovery and data mining. The stages of the method are:

1. Outlier Detection: Data objects in the database that do not follow the general behavior of the normal data are called outliers. They are detected with an outlier detection method; this paper uses a density-based outlier detection method [10].
2. Interpretation: The obtained models are analyzed to detect the failing students in the database.

Figure (workflow of the method; box labels recovered from the figure): Collected data, Data preprocessing, Attribute selection, Dataset, Outlier detection, Association rule mining, Classification.
PROPOSED ALGORITHM

Classification based decision tree and rule induction

Imbalanced dataset classification with SMOTE
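
The idea behind SMOTE [3] can be illustrated with a minimal sketch. It is a simplification, not the paper's implementation: the function name `smote`, the sample records and the parameter values are hypothetical, and base points are drawn at random rather than iterating over every minority sample as in the original algorithm. Each synthetic record is placed on the line segment between a minority-class record and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class ("fail") records: (attendance %, average mark)
fail_students = [(55.0, 38.0), (60.0, 42.0), (48.0, 35.0), (62.0, 40.0)]
new_points = smote(fail_students, n_synthetic=6, k=2)
```

Because every synthetic point is an interpolation of two real minority records, it always stays inside the range of values already observed for the minority class.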

Apriori algorithm
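
As an illustration of how Apriori [9] can feed a classifier, the following sketch mines frequent itemsets level by level and then keeps only the rules whose consequent is a class label. It is a sketch under stated assumptions: the function names `apriori` and `rules_for`, the attribute names, the sample records and the thresholds are all hypothetical and are not taken from the paper's dataset.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules_for(frequent, consequent, min_confidence):
    """Rules X -> consequent with confidence support(X + consequent) / support(X)."""
    target = frozenset([consequent])
    rules = []
    for itemset, supp in frequent.items():
        if consequent in itemset and len(itemset) > 1:
            antecedent = itemset - target
            confidence = supp / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, confidence))
    return rules


# Hypothetical student records, one set of attribute values per student.
records = [
    {"low_attendance", "no_homework", "fail"},
    {"low_attendance", "fail"},
    {"low_attendance", "no_homework", "fail"},
    {"high_attendance", "pass"},
    {"high_attendance", "no_homework", "pass"},
]
frequent = apriori(records, min_support=0.4)
fail_rules = rules_for(frequent, "fail", min_confidence=0.8)
```

On these toy records, `low_attendance -> fail` survives the confidence threshold while `no_homework -> fail` does not, because one student without homework still passed.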

Density based approach
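Reachability distance from o' to o:

reachdist_k(o <- o') = max{dist_k(o), dist(o, o')}

where k is a user-specified parameter and dist_k(o) is the distance between o and its k-th nearest neighbor.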
Figure 3 depicts outlier detection using the local outlier factor for students.

Figure: 3. Outlier Detection
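
The density-based approach can be sketched with the standard local outlier factor (LOF) computation [10]. This is an illustrative sketch, not the authors' code: the function name, the sample points and the choice k = 2 are assumptions, and the reachability distance is taken in its usual form, max{dist_k(o), dist(o, o')}. The sketch also assumes that no two records are identical, since duplicates would give a zero reachability sum.

```python
import math


def local_outlier_factors(points, k):
    """Compute the local outlier factor (LOF) of every point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k-distance and k-neighbourhood of each point
    # (points tied at the k-distance are all kept as neighbours)
    k_dist, neighbours = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbours.append([j for d, j in others if d <= kd])

    def reach_dist(i, j):
        # reachability distance from point i to point j
        return max(k_dist[j], dist[i][j])

    # local reachability density: inverse of the mean reachability distance
    lrd = [
        len(neighbours[i]) / sum(reach_dist(i, j) for j in neighbours[i])
        for i in range(n)
    ]

    # LOF: average ratio of the neighbours' density to the point's own density
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical student records: (attendance %, average mark)
students = [(70, 65), (72, 68), (68, 66), (71, 63), (69, 67), (20, 15)]
scores = local_outlier_factors(students, k=2)
outlier_index = scores.index(max(scores))
```

A score close to 1 means a record is about as dense as its neighbours; a score well above 1, as for the last record here, marks it as an outlier.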
PERFORMANCE EVALUATION

The existing classification and prediction system and the proposed grammar-based genetic programming approach for deriving the pass/fail result are tested. Performance is measured in terms of the true positive rate (TPR), false positive rate (FPR), false negative rate (FNR), true negative rate (TNR), accuracy and time.

We analyze and compare the performance of classification, classification with feature selection, imbalanced classification with the SMOTE oversampling technique, and prediction using association rule mining with outlier detection. Performance is evaluated with parameters such as accuracy. The comparison and the experimental results show that the proposed approach works better than the existing system.

Accuracy

Accuracy is calculated from the following formula:

Accuracy = (True positive + True negative) / (True positive + True negative + False positive + False negative)

TP (True positive)

In a statistical hypothesis test, two types of incorrect conclusions can be drawn: the hypothesis can be inappropriately rejected or inappropriately accepted. A positive test result that accurately reflects the condition being tested is a true positive: if the outcome of a prediction is p and the actual value is also p, it is called a true positive (TP).

True positive rate (TPR) = TP / P, with P = (TP + FN)

where P is the number of actual positives and TP is the number of true positives.
TN (True negative)

A result that appears negative when it should. A true negative (TN) has occurred when both the prediction outcome and the actual value are n.

True negative rate (TNR) = TN / N, with N = (TN + FP)

where N is the number of actual negatives and TN is the number of true negatives.
FP (False positive)

A result that indicates that a given condition is present when it is not. If the prediction outcome is p but the actual value is n, it is said to be a false positive (FP).

False positive rate (FPR) = FP / (FP + TN)
FN (False negative)

A false negative (FN) occurs when the prediction outcome is n while the actual value is p.

False negative rate (FNR) = FN / (TP + FN)
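
The measures in this section can be computed together from the four confusion-matrix counts. The sketch below is illustrative: the labels, the predictions and the choice of "fail" as the positive class are assumptions, and the actual negatives are counted as N = TN + FP.

```python
def confusion_counts(actual, predicted, positive="fail"):
    """Count TP, FP, TN and FN for the chosen positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Rates as defined in this section, with P = TP + FN and N = TN + FP."""
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (tp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical ground truth and predictions for eight students.
actual = ["fail", "fail", "fail", "pass", "pass", "pass", "pass", "pass"]
predicted = ["fail", "fail", "pass", "pass", "pass", "pass", "fail", "pass"]

counts = confusion_counts(actual, predicted)
metrics = rates(*counts)
```

Since TPR and FNR share the denominator P, and TNR and FPR share the denominator N, each pair sums to 1, which is a quick consistency check on any reported results.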

Accuracy comparison

Figure: 4. Accuracy comparison graph (y-axis: Accuracy rate (%), scale 80 to 100; x-axis: Methods)

The graph shows the accuracy rate of the existing systems (classification, classification with feature selection, and imbalanced classification with the SMOTE oversampling technique) and of the proposed system (prediction using association rule mining with density-based outlier detection), plotted as accuracy against method. The graph shows that the accuracy of the existing systems is somewhat lower than that of the proposed system, so the proposed system gives the best accuracy of the methods compared.

CONCLUSION AND FUTURE WORK

The aim of this system is to analyze the factors that affect the academic achievement of students. It is useful in identifying weak students who are likely to perform poorly in their studies. Classification is one of the most essential tasks in data mining and machine learning. An educational institution needs approximate prior knowledge of its enrolled students to predict their future academic performance. Various data mining techniques can be effectively applied to educational data. The results show that classification techniques can be applied to educational data to predict student outcomes and help improve their results. The efficiency of various decision tree algorithms is analyzed based on their accuracy and the time taken to derive the tree. The predictions obtained from the system have helped the tutor to identify the weak students and improve their performance. Classification accuracy and performance are higher in the proposed system than in the existing system, and the experimental results indicate that the proposed system is more efficient than the existing system.

As the next step in our research, more experiments can be carried out using more data and different educational levels, to test whether the same performance results are obtained with different DM approaches.

Future work will aim to predict student failure as early as possible, to detect students at risk before it is too late, to propose actions for helping students identified as belonging to the risk group, and then to measure how often the failure or dropout of a previously detected student is prevented.

REFERENCES

[1] F. Araque, C. Roldán, and A. Salguero, Factors influencing university dropout rates, Comput. Educ., vol. 53, no. 3, pp. 563-574, 2009.
[2] C. Romero and S. Ventura, Educational data mining: A review of the state of the art, IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 6, pp. 601-618, Nov. 2010.
[3] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, Synthetic minority over-sampling technique, J. Artif. Intell. Res., vol. 16, pp. 321-357, Jun. 2002.
[4] S. Kotsiantis, Educational data mining: A case study for predicting dropout-prone students, Int. J. Know. Eng. Soft Data Paradigms, vol. 1, no. 2, pp. 101-111, 2009.
[5] Carlos Márquez-Vera, Cristóbal Romero Morales, and Sebastián Ventura Soto, Predicting School Failure and Dropout by Using Data Mining Techniques, IEEE Journal of Latin-American Learning Technologies, vol. 8, no. 1, Feb. 2013.
[6] M. N. Quadri and N. V. Kalyankar, Drop Out Feature of Student Data for Academic Performance Using Decision Tree Techniques, Global Journal of Computer Science and Technology, vol. 10, no. 2, 2010.
[7] Devikala. D and Kamalraj. N, Data Mining Approaches on Detection of Students Academic Failure and Dropout: A Brief Survey, International Journal of Computer Trends and Technology (IJCTT), vol. 14, no. 3, Aug. 2014.
[8] J. Más-Estellés, R. Alcover-Arándiga, A. Dapena-Janeiro, A. Valderruten-Vidal, R. Satorre-Cuerda, F. Llopis-Pascual, T. Rojo-Guillén, R. Mayo-Gual, M. Bermejo-Llopis, J. Gutiérrez-Serrano, J. García-Almiñana, E. Tovar-Caro, and E. Menasalvas-Ruiz, Rendimiento académico de los estudios de informática en algunos centros españoles, in Proc. 15th Jornadas Enseñanza Univ. Inf., Barcelona, Rep. Conf., 2009, pp. 5-12.
[9] http://en.wikipedia.org/wiki/Association_rule_learning#Apriori_algorithm
[10] http://en.wikipedia.org/wiki/Local_outlier_factor
