Credit Card Fraud Detection using Classification, Unsupervised, Neural Networks Models

L. Bhavya; V. Sasidhar Reddy; U. Anjali Mohan; S. Karishma

doi:10.17577/IJERTV9IS040749

Volume 09, Issue 04 (April 2020)

Paper ID : IJERTV9IS040749
Published (First Online): 05-05-2020
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


L. Bhavya, Assistant Professor (Adhoc), Dept of CSE, JNTUACEP, Pulivendula, AP, India
V. Sasidhar Reddy, U. Anjali Mohan, S. Karishma, Students, Dept of CSE, JNTUACEP, Pulivendula, AP, India

Abstract: Online transactions have grown enormously, and online credit card transactions account for a large share of them. There is therefore a strong need for credit card fraud detection applications in banks and financial businesses. The purpose of credit card fraud may be to obtain goods without paying or to obtain unauthorized funds from an account. With the growing demand for money, credit card fraud has become common, resulting in huge financial losses for cardholders.

Earlier work used common methods such as rule-induction techniques, fuzzy systems, decision trees, Support Vector Machines (SVM), and K-Nearest Neighbor algorithms to detect fraudulent credit card transactions. In our view, neural networks can generate more accurate results.

To increase accuracy and precision, we use Logistic Regression, K-Means, and Convolutional Neural Networks (CNN). Logistic Regression is a statistical model that minimizes a cost measuring how wrong its predictions are. The CNN is used to capture the intrinsic patterns of fraudulent behavior learned from labeled data. We use accuracy and precision to evaluate the performance of the proposed system.

Keywords: CNN, Credit Card, Fraud detection, Logistic Regression, K-Means.

INTRODUCTION

The rise of E-commerce has led to a steady growth in the use of credit cards for online transactions and purchases. With this rise in credit card usage, the number of fraud cases has also doubled. Credit card fraud is committed with the intention of gaining money deceptively, without the cardholder's knowledge.

Credit card fraud can be committed in the following ways:
1. Fake IDs: the fraudster uses another person's personal information without authorization.
2. Skimming: the fraudster installs a device called a skimmer at an ATM, which captures the data on the card's magnetic stripe when the card is swiped.
3. Card not present: the fraudster steals card data such as the expiry date and account number and uses the card without physically possessing it.
4. False merchant sites: the fraudster uses phishing, creating a website or webpage that looks similar to the original site.

Detecting fraud is extremely difficult. A large variety of parameters are involved in selecting and classifying transactions, and transactions cannot be strictly classified as fraudulent or genuine. Fraud can, however, be detected through intensive study of spending habits and customer behavior and by analyzing pre-fraud patterns. Fraud detection techniques face several challenges: they must detect fraud in a very short time, and a large number of parameters must be processed during training.

In this paper, we compare these techniques to determine which gives the most effective performance and under what conditions.

Logistic Regression, K-Means, and Convolutional Neural Network algorithms are used to achieve high accuracy and to improve the detection process. Convolutional Neural Networks are used to identify the underlying patterns followed in previous fraud cases, thereby increasing efficiency.

LITERATURE SURVEY

In 2019, Yashvin Jain, Namrata Tiwari et al. [1] presented a comparative study of existing credit card fraud detection systems, namely Support Vector Machines, Bayesian networks, and Artificial Neural Networks.

In [2], the authors discussed the performance of Logistic Regression, Decision Tree, and Random Forest in detecting fraud.

In [3], the authors discuss the probability of fraudulent transactions in terms of their prevalence and the context of credit card usage.

METHODOLOGY

The models we used for detecting credit card fraud are:
1. Logistic Regression
2. K-Means
3. Convolutional Neural Networks

III.2.1 LOGISTIC REGRESSION

import pandas as pd
from sklearn.model_selection import train_test_split

# Load the transactions with pandas (file name assumed; it is not given in the paper)
df = pd.read_csv("creditcard.csv")

# Features: every column except the last; label: the binary 'Class' column
X = df.iloc[:, :-1]
y = df['Class']
print("X and y sizes, respectively:", len(X), len(y))

# Hold out 25% of the transactions as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

Logistic Regression is a classification model used mainly for binary classification problems. Since our dataset is a binary classification dataset, we used Logistic Regression, which classifies each credit card transaction as 0 or 1 to detect fraud. First, the dataset is loaded with the pandas library. Next, it is split into the features X and the labels y, and their sizes are printed. For training and testing, we use the train_test_split() method with test_size set to 25%, as shown above.

After the dataset is divided into training and test sets, the Logistic Regression model is applied to it. The model is first trained on the training set, and the remaining data is then tested using the prediction method.

The Logistic Regression score (mean accuracy on the test set) that we obtained is 99.88%.
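For intuition about what fitting the Logistic Regression model does, the following minimal pure-Python sketch minimizes the mean log-loss (the cost of how wrong each predicted probability is) with batch gradient descent and then reports mean accuracy. It is an illustration, not the authors' code: the function names fit_logistic, predict and score, the learning rate, the epoch count, and the six toy transactions are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=3000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # For the logistic model, d(log-loss)/dz = p - y
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, threshold=0.5):
    """Label 1 (fraud) when the predicted probability reaches the threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


def score(w, b, X, y):
    """Mean accuracy: the fraction of transactions labeled correctly."""
    preds = predict(w, b, X)
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Hypothetical toy transactions with two features each
X_train = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [2.0, 2.1], [2.2, 1.9], [1.9, 2.3]]
y_train = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X_train, y_train)
print("Score: ", score(w, b, X_train, y_train))
```

scikit-learn's LogisticRegression minimizes the same log-loss (plus an L2 penalty by default) with a dedicated solver and is far faster on a dataset of this size; the explicit loop here only makes the gradient p - y visible.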

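The model is trained and scored as below, and finally the binary confusion matrix for Logistic Regression is obtained with the ConfusionMatrix() method, then printed, plotted, and summarized:

from sklearn.linear_model import LogisticRegression
from pandas_ml import ConfusionMatrix
import matplotlib.pyplot as plt

# Fit on the training split (default settings assumed) and report mean accuracy on the test split
logistic = LogisticRegression()
logistic.fit(X_train, y_train)
print("Score: ", logistic.score(X_test, y_test))

# y_right: true test labels; y_predicted: labels predicted by the fitted model
y_right = y_test
y_predicted = logistic.predict(X_test)

confusion_matrix = ConfusionMatrix(y_right, y_predicted)
print("Confusion matrix:\n%s" % confusion_matrix)

confusion_matrix.plot(normalized=True)  # plot the normalized confusion matrix
plt.show()
confusion_matrix.print_stats()  # print statistics derived from the matrix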

The resulting confusion matrix is:

Confusion matrix:
Predicted    False    True    all
Actual
False        99477    30      99507
True         76       100     176
all          99553    130     99683

Fig. 2. Confusion Matrix for Logistic Regression
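The counts in Fig. 2 are enough to recompute the evaluation metrics the paper relies on. A short sketch, treating True as the fraud class (the helper name metrics_from_counts is hypothetical):

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Accuracy, precision and recall from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)  # share of transactions flagged as fraud that are fraud
    recall = tp / (tp + fn)     # share of fraudulent transactions that were flagged
    return accuracy, precision, recall


# Counts from Fig. 2 (rows: actual class, columns: predicted class)
acc, prec, rec = metrics_from_counts(tn=99477, fp=30, fn=76, tp=100)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f}")
# prints: accuracy=0.9989 precision=0.7692 recall=0.5682
```

Recall makes visible what accuracy hides on this imbalanced test set: 76 of the 176 fraudulent transactions were predicted as genuine.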
MODEL COMPARISON

Among the three models, Logistic Regression performed best. This is because its decision boundary shifts with the class-weight settings. Convolutional Neural Networks came second, and K-Means had the poorest performance. K-Means performs poorly because it is a clustering method, and clustering depends entirely on finding similarities and differences among the features of the dataset and grouping the data into clusters. For this dataset, grouping is difficult because fraudulent and genuine transactions look very similar, so it is very hard to put them into separate groups.

Logistic Regression gave the best result among the three models. Its accuracy rate is 99.88%, with 0.079% of the validation set misclassified. The best overall results came from Logistic Regression with balanced class weights, which has an accuracy rate of 97.5%.
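The paper does not spell out how the balanced weights were computed. One common scheme, the one scikit-learn applies for class_weight='balanced', gives each class the weight n_samples / (n_classes × count of that class), so errors on the rare fraud class weigh far more in the loss and the decision boundary tends to move toward the genuine class. Assuming that scheme, the sketch below computes the weights for the actual class totals in Fig. 2 and applies them in a weighted log-loss; the helper names are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """weight[c] = n_samples / (n_classes * count[c])."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_log_loss(y_true, p_pred, weights):
    """Mean log-loss with each sample scaled by the weight of its true class."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(y_true)


# Actual class totals from Fig. 2: 99,507 genuine (0) and 176 fraudulent (1)
labels = [0] * 99507 + [1] * 176
weights = balanced_class_weights(labels)
print({c: round(wt, 2) for c, wt in sorted(weights.items())})  # {0: 0.5, 1: 283.19}

# Equally confident mistakes: a missed fraud vs. a false alarm
missed_fraud = weighted_log_loss([1], [0.1], weights)
false_alarm = weighted_log_loss([0], [0.9], weights)
print(round(missed_fraud / false_alarm, 1))  # 565.4, i.e. 99507 / 176
```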

The Convolutional Neural Network model ranks next to Logistic Regression. For the simple neural network, we obtained an accuracy rate of 99.61% and a loss rate of 99.87%. To decrease the loss, we used multiple hidden layers, together with Principal Component Analysis (PCA). With multiple hidden layers, the accuracy rate is 99.9% and the loss is reduced to 0.037%, while the validation accuracy is 47.05%.
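This section does not describe the CNN architecture itself. As a hedged illustration of the operation a convolutional layer applies to a transaction's feature vector, the pure-Python sketch below slides one small kernel along the features (a "valid" 1-D convolution, computed as cross-correlation like most deep-learning libraries), applies ReLU, and max-pools the result. The feature values, kernel weights, and helper names are invented; in a trained CNN many such kernels are learned from labeled data.

```python
def conv1d_valid(x, kernel, bias=0.0):
    """Slide the kernel along x with no padding (cross-correlation, as CNN layers compute)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias for i in range(len(x) - k + 1)]


def relu(values):
    """Zero out negative activations."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical scaled features of one transaction and one hand-picked kernel
features = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]
kernel = [1.0, 0.0, -1.0]  # output is x[i] - x[i + 2]: fires when a value drops two steps later

feature_map = relu(conv1d_valid(features, kernel))
print(feature_map)              # [0.0, 0.0, 0.5, 0.5]
print(max_pool1d(feature_map))  # [0.0, 0.5]
```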

The K-Means clustering model has the poorest performance on the dataset. K-Means is an unsupervised method, but the dataset we used is a labeled classification dataset: each transaction must be classified as 0 or 1 in the dependent attribute named Class. K-Means has a low accuracy rate of 54.27%. Of the wrongly predicted transactions, 99.75% were false positives and only 0.24% were false negatives, amounting to 0.11% of the validation set.
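K-Means groups transactions only by feature similarity and never sees the Class label, which is why it struggles here. The minimal sketch below runs Lloyd's algorithm, using the first k points as initial centroids for reproducibility (an assumption, not the authors' setup), on six hypothetical transactions whose fraud labels occur in both groups. Cluster ids are arbitrary, so both ways of mapping clusters to classes are scored.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; the first k points serve as initial centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical transactions: two clear groups, but fraud (1) occurs in both
points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
true_class = [0, 1, 0, 0, 1, 0]

labels, _ = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]

# Cluster ids are arbitrary, so score both cluster-to-class mappings
agree = sum(lab == t for lab, t in zip(labels, true_class)) / len(points)
print(max(agree, 1 - agree))  # 0.5
```

The clusters are geometrically clean, yet they agree with the labels only half the time, mirroring the point that similar-looking fraudulent and genuine transactions cannot be separated by grouping alone.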

Model                  Accuracy
Logistic Regression    99.88%
K-Means                54.27%
CNN                    99.61%

Accuracy Table

CONCLUSION

The results make it clear that the Logistic Regression model performed best on our dataset. The Neural Network model came next, owing to its additional hidden layers. The K-Means model performed poorly on our dataset, as it works efficiently on unlabeled (unsupervised) data. Besides our models, there are many other submissions on this problem, the most common using the Random Forest model, which also works very efficiently. Thus, Logistic Regression works effectively on the credit card fraud dataset as a binary classification model.

REFERENCES

[1] Yashvin Jain, Namrata Tiwari, ShriPriya Dubey, Sarika Jain, "A Comparative Analysis of Various Credit Card Fraud Detection Techniques," International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Volume-7, Issue-5S2, January 2019.
[2] Kaithekuzhical Leena Kurien, Ajeet Chikkamannur, "Detection and Prediction of Credit Card Fraud Transactions Using Machine Learning," VTU Research Centre, Bangalore, India; Department of CSE & Engg, R. L. Jalappa Institute of Technology, Doddaballapur, Bangalore.
[3] "Machine Learning For Credit Card Fraud Detection System," International Journal of Applied Engineering Research, ISSN 0973-4562, Volume 13, Number 24 (2018), pp. 16819-16824, © Research India Publications, http://www.ripublication.com
[4] Ong Shu Yee, Saravanan Sagadevan, Nurul Hashimah Ahamed Hassain Malim, "Credit Card Fraud Detection Using Machine Learning as Data Mining Technique," School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia.
[5] K. Karthikeyan, K. P. Sangeeth Raj, S. Ramaganesp, P. Parthasarathi, N. Suguna, "Credit card fraud detection using Machine Learning."

Predicted    0.0      1.0      all
Actual
0.0          43924    49910    93834
1.0          62       91       153
all          43986    50001    93987
