Credit Card Fraud Detection Using Classification, Unsupervised, and Neural Network Models

Abstract: Online transactions have grown enormously in volume, and online credit card transactions account for a large share of them. There is therefore a strong need for credit card fraud detection applications in banks and financial businesses. The purpose of credit card fraud may be to obtain goods without paying or to obtain unauthorized funds from an account. As the demand for money has grown, credit card fraud has become common, causing heavy financial losses to cardholders. Earlier approaches to detecting fraudulent credit card transactions used methods such as rule-induction techniques, fuzzy systems, decision trees, Support Vector Machines (SVM), and K-Nearest Neighbor algorithms. In our view, neural networks can generate more accurate results. To increase accuracy and precision, we use Logistic Regression, K-Means, and Convolutional Neural Networks (CNN). Logistic Regression is a statistical model that is trained by minimizing a cost function measuring how wrong its predictions are. The CNN is used to capture the intrinsic patterns of fraudulent behavior learned from labeled data. We use accuracy and precision to evaluate the performance of the proposed system.


I. INTRODUCTION
The rise of e-commerce has led to steady growth in the use of credit cards for online transactions and purchases. With this rise, the number of fraud cases has also doubled. Credit card fraud is committed with the intention of obtaining money deceptively, without the cardholder's knowledge. It can be carried out in the following ways:
1. Fake IDs: the fraudster uses another person's personal information without authorization.
2. Skimming: the fraudster installs a device called a "skimmer" on an ATM, which collects the data on the card's magnetic stripe when the card is swiped.
3. Card not present: once the fraudster has stolen card data such as the account number and expiry date, the card can be used without physically possessing it.
4. Fake merchant sites: the fraudster uses phishing, creating a website or web page that looks like the original site.
Detecting fraud is extremely difficult. A large variety of parameters must be used to select and classify transactions, and transactions cannot be strictly separated into fraudulent and genuine. Fraud can, however, be identified through careful study of spending habits and customer behavior and by analyzing pre-fraud patterns. Fraud detection techniques face several challenges: they must detect fraud within a very short time, and a large number of parameters must be processed during training. In this paper, we compare several techniques to evaluate which performs best and under what conditions. Logistic Regression, K-Means, and Convolutional Neural Networks are used to achieve high accuracy and to improve the detection process. Convolutional Neural Networks are used to identify the underlying patterns followed in previous fraud cases, thereby increasing efficiency.
II. LITERATURE SURVEY
In 2019, Yashvin Jain and Namrata Tiwari presented a comparative study of existing fraud detection systems, namely Support Vector Machines, Bayesian networks, and Artificial Neural Networks. In [2], the authors discussed the performance of Logistic Regression, Decision Tree, and Random Forest for detecting fraud. [3] discusses the probability of fraudulent transactions in terms of their prevalence and the context of credit card usage.

In the first step, we downloaded the dataset from Kaggle (https://www.kaggle.com/mlgulb/creditcardfraud). The dataset was first read carefully and understood. Next, it was sampled, i.e., a part of the dataset was taken for observation, and preprocessing was performed, removing unwanted data where necessary. After preprocessing, the dataset was divided into training and testing sets: 75% for training and the remaining 25% for testing. The next step is to create the models.

Logistic Regression is a classification model used mainly for binary classification. Since our dataset is labeled for a binary classification task, we used Logistic Regression. It classifies each transaction into one of two values, 0 or 1, to detect fraud in credit card transactions. First, the dataset is loaded with the pandas library. Next, it is split into the feature matrix X and the target y, and their sizes are printed. For training and testing, we use the train_test_split() method with test_size set to 25%.
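A minimal sketch of the loading and 75:25 split described above, using pandas and scikit-learn. The file name creditcard.csv and the label column name Class are assumptions about the Kaggle download; so that the sketch runs on its own, it builds a small synthetic stand-in table with a hypothetical helper, make_toy_transactions, instead of reading the file. The stratify argument is our addition and is not stated in the paper.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def make_toy_transactions(n: int = 1000, fraud_rate: float = 0.01, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for the Kaggle CSV: random features plus a rare 0/1 'Class' label."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.normal(size=(n, 5)), columns=[f"V{i}" for i in range(1, 6)])
    df["Amount"] = rng.exponential(scale=50.0, size=n)
    labels = np.zeros(n, dtype=int)
    labels[: max(2, int(n * fraud_rate))] = 1  # a fixed, small number of fraud rows
    rng.shuffle(labels)
    df["Class"] = labels
    return df


def split_features_target(df: pd.DataFrame, target: str = "Class"):
    """Separate the feature columns (X) from the 0/1 label column (y)."""
    return df.drop(columns=[target]), df[target]


# With the real download this would be something like pd.read_csv("creditcard.csv").
df = make_toy_transactions()
X, y = split_features_target(df)
print("X:", X.shape, "y:", y.shape)

# 75% training / 25% testing; stratify=y keeps the fraud proportion similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print("train:", X_train.shape, "test:", X_test.shape)
```

Stratifying matters when one class is rare: without it, a random 25% test split can end up with very few fraudulent rows.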
After the split, a Logistic Regression model is applied to the divided dataset.
The model is first trained on the training set, and the remaining test data is then evaluated with the model's prediction method.
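For reference, the standard logistic regression model and the cross-entropy (log-loss) cost it minimizes are written out below. The notation (weight vector $\mathbf{w}$, bias $b$, $m$ training transactions) is ours, not the paper's; note also that scikit-learn's LogisticRegression adds an L2 penalty to this cost by default.

```latex
$$
\hat{y}_i = \sigma\!\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right)
          = \frac{1}{1 + e^{-\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right)}}
$$

$$
J(\mathbf{w}, b) = -\frac{1}{m}\sum_{i=1}^{m}
  \Big[\, y_i \log \hat{y}_i + (1 - y_i)\log\!\big(1 - \hat{y}_i\big) \Big]
$$
```

Here $y_i \in \{0, 1\}$ is the true label of transaction $i$ and $\hat{y}_i$ is the predicted probability that it is fraudulent; a transaction is labeled 1 when $\hat{y}_i$ exceeds a threshold, conventionally 0.5.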
The logistic regression score (accuracy) obtained here is 99.88%. Finally, the binary confusion matrix for Logistic Regression is obtained with the ConfusionMatrix() method.
The resulting confusion matrix is shown in Fig. 2.

Fig. 2. Confusion Matrix for Logistic Regression
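The training, scoring, and confusion-matrix steps can be sketched with scikit-learn as below. This is an illustration on synthetic imbalanced data generated with make_classification, not the paper's code; it uses sklearn.metrics.confusion_matrix in place of the ConfusionMatrix() method named above, and the max_iter value is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced stand-in for the credit card data (about 2% class 1).
X, y = make_classification(
    n_samples=4000, n_features=10, n_informative=5,
    weights=[0.98, 0.02], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train on the 75% split, then predict the held-out 25%.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# score() returns the mean accuracy on the given data.
acc = model.score(X_test, y_test)

# Rows are true labels (0, 1); columns are predicted labels (0, 1).
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(f"accuracy = {acc:.4f}")
print(cm)
print(f"false positives = {fp}, false negatives = {fn}")
```

The diagonal of the matrix holds the correct predictions, so accuracy equals the trace divided by the total count.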
III.2.2. K-MEANS
K-Means clustering is an unsupervised machine learning algorithm. It groups similar data points together to find the underlying patterns in the data. To do so, it forms clusters, each of which is represented by its centroid. In our K-Means model, we used PCA to reduce the dataset attributes to 2 components and scaled them. We then used the train_test_split() method to create the training and testing sets. In the figure above, the centroids are marked with white crosses; there are two, so the model has two centroids for our dataset. This model gives a test accuracy of 54.27%. The accuracy is low because K-Means is unsupervised: it never sees the class labels, so the clusters it forms need not correspond to the fraudulent and genuine classes.
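A sketch of the K-Means pipeline described above, on synthetic data: standardize, project onto 2 principal components, and fit two clusters. The order (scaling before PCA), the synthetic data, and all parameter values are assumptions. Because K-Means assigns arbitrary cluster ids, the sketch scores both possible id-to-class mappings and keeps the better one; this uses the test labels and is therefore an optimistic, illustrative measure rather than a proper evaluation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (about 5% class 1).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=4,
    weights=[0.95, 0.05], random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# Scale, then project onto 2 principal components (both fitted on the training split only).
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=2).fit(scaler.transform(X_train))
Z_train = pca.transform(scaler.transform(X_train))
Z_test = pca.transform(scaler.transform(X_test))

# Two clusters; the labels y are never shown to K-Means.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=1).fit(Z_train)
clusters = kmeans.predict(Z_test)
print("centroids in PCA space:\n", kmeans.cluster_centers_)

# Cluster ids 0/1 are arbitrary, so try both id-to-class mappings and keep the better one.
acc_direct = float(np.mean(clusters == y_test))
acc = max(acc_direct, 1.0 - acc_direct)
print(f"test accuracy after best mapping = {acc:.4f}")
```

The two rows of cluster_centers_ are the centroids that a scatter plot of the two PCA components would mark.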

III.2.3. CONVOLUTIONAL NEURAL NETWORKS
The Convolutional Neural Network is a neural network architecture used mainly for image classification and detection, but it can also be applied to credit card fraud detection. First, the dataset is loaded and split into training and testing sets in a 75:25 ratio.

a) Simple Neural Networks
To create the neural network, we used the Sequential() model. Initially, the model had only two hidden layers. Although its accuracy was good, its loss value was high.
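The paper builds its network with Sequential(); as a dependency-light stand-in, the sketch below uses scikit-learn's MLPClassifier to build a fully connected network with two hidden layers. This is a substitution, not the authors' model: the layer sizes, activation, and iteration count are assumptions, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data, split 75:25.
X, y = make_classification(n_samples=3000, n_features=20, weights=[0.97, 0.03], random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2, stratify=y
)

# Neural networks train more reliably on standardized inputs.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Two fully connected hidden layers (16 and 8 ReLU units; sizes are assumptions).
net = MLPClassifier(hidden_layer_sizes=(16, 8), activation="relu", max_iter=200, random_state=2)
net.fit(X_train_s, y_train)

print(f"test accuracy = {net.score(X_test_s, y_test):.4f}")
print(f"final training loss = {net.loss_:.4f}")
```

Reporting both numbers mirrors the paper's observation that a model can have good accuracy while its loss is still high.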
The summary of the model that was created is:

The summary of the five-hidden-layer model is: After the model was created, the PCA-reduced components were used to train and evaluate it over five epochs. After five epochs, the model reached an accuracy of 99.99% with a low loss value of 0.038%.
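A sketch of the deeper variant: PCA-reduced inputs, five hidden layers, and five training epochs, with accuracy on a held-out validation split reported after each epoch. As above, MLPClassifier stands in for the paper's Sequential() model; each partial_fit call makes one pass over the training data, which plays the role of an epoch. The number of PCA components (10), the layer sizes, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=4000, n_features=30, weights=[0.97, 0.03], random_state=3)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=3, stratify=y
)

# Standardize, then keep a reduced set of principal components (10 is an assumption).
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=10).fit(scaler.transform(X_train))
P_train = pca.transform(scaler.transform(X_train))
P_val = pca.transform(scaler.transform(X_val))

# Five hidden layers; each partial_fit call is one pass (epoch) over the training data.
net = MLPClassifier(hidden_layer_sizes=(64, 32, 16, 8, 4), random_state=3)
history = []  # (training loss, training accuracy, validation accuracy) per epoch
for epoch in range(5):
    net.partial_fit(P_train, y_train, classes=np.array([0, 1]))
    loss, train_acc, val_acc = net.loss_, net.score(P_train, y_train), net.score(P_val, y_val)
    history.append((loss, train_acc, val_acc))
    print(f"epoch {epoch + 1}: loss={loss:.4f} train_acc={train_acc:.4f} val_acc={val_acc:.4f}")
```

Tracking validation accuracy separately from training accuracy makes it visible when the two diverge.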

The Convolutional Neural Network model ranks second, after Logistic Regression. For the simple neural network, we obtained an accuracy of 99.61% and a loss of 99.87%. To decrease the loss, we used multiple hidden layers together with Principal Component Analysis (PCA). With multiple hidden layers, the accuracy is 99.9% and the loss is reduced to 0.037%, while the validation accuracy is 47.05%. The K-Means clustering model performs worst on the dataset because the dataset is a labeled classification dataset: the task is to classify each transaction as 0 or 1 in the dependent attribute named class, whereas K-Means does not use these labels. K-Means has a low accuracy of 54.27%. Of the wrongly predicted transactions, 99.75% were false positives and only 0.24% were false negatives, with 0.11% of the validation set.
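The false-positive and false-negative figures above are read off a confusion matrix. The sketch below shows how accuracy, precision, and recall follow from the four confusion-matrix counts; the counts are hypothetical, not the paper's results, and are chosen to show why accuracy alone can look high when fraud is rare. The function name metrics_from_confusion is our own.

```python
def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Accuracy, plus precision and recall for the positive (fraud = 1) class."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Precision: of the transactions flagged as fraud, the fraction that really were fraud.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real frauds, the fraction that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts: 10,000 test transactions, 20 of them fraudulent.
# A model that labels every transaction as genuine:
print(metrics_from_confusion(tn=9980, fp=0, fn=20, tp=0))
# A model that catches 15 of the 20 frauds at the cost of 10 false alarms:
print(metrics_from_confusion(tn=9970, fp=10, fn=5, tp=15))
```

The first model reaches 99.8% accuracy while catching no fraud at all (recall 0); the second has similar accuracy (99.85%) but precision 0.6 and recall 0.75, which is why precision is reported alongside accuracy.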