Machine Learning Framework for Detecting Spammer and Fake Users on Twitter

Akshatha T M; Dr. M. N Veena

doi:10.17577/IJERTCONV8IS14051

NCETESFT - 2020 (Volume 8 - Issue 14)


Open Access
Authors : Akshatha T M, Dr. M. N Veena
Paper ID : IJERTCONV8IS14051
Volume & Issue : NCETESFT – 2020 (Volume 8 – Issue 14)
Published (First Online): 28-08-2020
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Akshatha T M
Department of MCA, PES College of Engineering, Mandya, Karnataka, India

Dr. M. N Veena
Department of MCA, PES College of Engineering, Mandya, Karnataka, India

Abstract: Twitter has rapidly become an online source for acquiring real-time information about users. Twitter is an Online Social Network (OSN) where users can share anything and everything, such as news, opinions, and even their moods. Arguments can be held over many topics, such as politics, current affairs, and important events. When a user tweets something, it is instantly conveyed to his or her followers, allowing them to spread the received information at a much broader level. With the evolution of OSNs, the need to study and analyze users' behaviors on online social platforms has intensified. Spammers can be identified based on: (i) fake content, (ii) URL-based spam detection, (iii) spam in trending topics, and (iv) fake user identification. With the help of machine learning algorithms, we identify fake users and spammers on Twitter.

Keywords: Spammers, fake identification, machine learning, online social platform.

INTRODUCTION

Several research works have been carried out on popular social media platforms such as Twitter. Twitter is now used by a very large number of people, and some of its accounts belong to fake users. In this paper we identify fake users on Twitter based on: (i) fake content, (ii) URL-based spam detection, (iii) spam in trending topics, and (iv) fake user identification. Identifying them matters because fake users waste other users' time: they post frequently, and their posts are unrelated to the users who receive them.
LITERATURE REVIEW

Earlier surveys cover new methods and techniques for Twitter spam detection and present a comparative study of the current approaches. Other authors conducted a survey on the different behaviors exhibited by spammers on the Twitter social network; that study also provides a literature review that establishes the existence of spammers on Twitter. Despite all the existing studies, there is still a gap in the existing literature.
In the proposed system, the admin identifies fake users with the K-Means algorithm, detecting them through hybrid techniques.
PROPOSED METHODOLOGY

In this paper we divide fake-user detection into four types: (i) fake content, (ii) URL-based spam detection, (iii) detecting spam in trending topics, and (iv) fake user identification. We use the machine learning algorithms Random Forest, Minimum Weight, and K-Means at different stages to identify fake users and spammers on Twitter.

3.1 Random Forest Algorithm

In this paper we use random forest, a supervised machine learning algorithm used for classification. We first define the spammer categories and then use the classifier to identify the spammers.

Steps of the Random Forest algorithm:

Step 1: Draw several different training samples from the training dataset.

Step 2: From each training sample, learn the relevant information.

Step 3: Finally, combine the results to make the prediction.

[Figure: the training dataset is divided into training data 1 to N, and their results are combined into a prediction.]

Figure 1: Random Forest Algorithm
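
The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it uses bagged decision stumps (depth-1 trees, one randomly chosen feature each) as a deliberately reduced stand-in for the full decision trees of a random forest. The two features (posts per hour, fraction of posts containing URLs), the toy data, and all function names are assumptions made for the example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows, labels, feature):
    """Find the best threshold split on one feature (a depth-1 tree)."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold,
                    Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    if best is None:
        # The feature is constant in this sample: always predict the majority class.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature, float("inf"), majority, majority)
    _, threshold, left_label, right_label = best
    return (feature, threshold, left_label, right_label)

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Step 1: draw a training sample (with replacement) from the dataset.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Step 2: learn from that sample, using one randomly chosen feature.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest

def predict(forest, row):
    """Step 3: combine the individual predictions by majority vote."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]

# Hypothetical toy data: [posts per hour, fraction of posts containing URLs].
rows = [[30, 0.9], [25, 0.8], [40, 0.95], [35, 0.85],
        [1, 0.1], [2, 0.0], [3, 0.2], [1, 0.05]]
labels = ["spammer"] * 4 + ["normal"] * 4

forest = fit_forest(rows, labels)
print(predict(forest, [50, 1.0]))
print(predict(forest, [0.5, 0.0]))
```

A production system would more likely rely on an established library implementation with full trees; the sketch only makes the sample, learn, and vote structure visible.
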
K-Means is an unsupervised learning algorithm used for clustering. In this paper it is used to identify fake users on Twitter.

Steps of K-Means Algorithm:

Step 1: Choose the number of clusters, K, that the algorithm should generate.

Step 2: Randomly select K data points as the initial centroids and assign each data point to the cluster of its nearest centroid.

Step 3: Compute the centroid of each cluster.

Step 4: Repeat steps 2 and 3 until the centroids are optimal, that is, until the assignment of data points to clusters no longer changes.
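
As a rough illustration of these four steps, the following Python sketch clusters a handful of accounts with K = 2. It is a minimal sketch under stated assumptions, not the system described in the paper: the two features (follower-to-following ratio, posts per day), the sample values, and the function name `kmeans` are all hypothetical.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Plain K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 2 (first half): pick K data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iters):
        # Step 2 (second half): assign each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop once the assignment no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment

# Hypothetical accounts: (follower-to-following ratio, posts per day).
accounts = [
    (1.2, 3), (0.9, 5), (1.5, 4), (1.1, 2),          # ordinary-looking activity
    (0.01, 80), (0.02, 95), (0.05, 70), (0.03, 88),  # very high posting rate
]

# Step 1: choose the number of clusters.
centroids, cluster_of = kmeans(accounts, k=2)
print(cluster_of)
```

K-Means only groups similar accounts; deciding which cluster corresponds to fake users still requires inspecting the clusters, for example by looking at their centroids.
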

These are the algorithms used in this work.

Figure 2: Proposed Model
EXPERIMENTAL RESULTS

Figure 3: User dataset

The table in the above figure contains the user information. The table also stores the URLs of the images.

Figure 4: User Profile

The user page contains the user's information, which is also displayed to other Twitter users. From this page, a user can easily identify the type of a user.

Figure 5: Search Friend

The Search Friends page is used to search for other Twitter users, so that a user can easily find their friends.

Figure 6: View Post Comment

On Twitter, users post pictures and information for their friends, who may then comment on them. On this page the user can easily view the comments sent by their friends.

Figure 7 : Add Spammer Filter

This page is on the admin side. Here the admin adds spam words to a spammer category; based on the spammer category, spammer users can be easily identified. The random forest algorithm is used here.
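
The idea of matching posts against admin-defined spam words can be sketched as follows. This is only an assumed illustration of the word-to-category lookup; it does not include the random forest step, and the categories, words, and the function name `categorize_post` are hypothetical rather than taken from the system.

```python
import re

# Hypothetical spam words grouped by category, as an admin might enter them.
SPAM_WORDS = {
    "advertising": {"discount", "offer", "buy"},
    "phishing": {"password", "verify", "login"},
}

def categorize_post(text, spam_words=SPAM_WORDS):
    """Return the sorted spam categories whose words occur in the post."""
    # Lowercase the post and split it into simple word tokens.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(cat for cat, words in spam_words.items() if tokens & words)

print(categorize_post("Buy now, huge DISCOUNT!"))
print(categorize_post("Verify your login to claim this offer"))
print(categorize_post("Lunch with friends today"))
```

In a fuller pipeline, the matched categories or per-category word counts could serve as input features for a classifier rather than as the final decision.
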

Figure 8: Spammer Detection

On this page the admin identifies the spammer and its category.

The minimum weight algorithm is used here.

Figure 9: Fake user Identification

The page shown in the above figure is also very important for identifying fake users on Twitter. It shows the information of the fake user. We use the K-Means algorithm to find the fake users on Twitter.

Figure 10: Fake User Identification Result

The graph in the above figure shows the fake users' posts.
CONCLUSION

Despite the development of successful strategies for spam detection and fake user recognition on Twitter, there are still many open problems for researchers to address. The issues are highlighted as follows. False news identification on social media is a problem that needs to be explored because of the serious repercussions of such news at the individual level as well as at other levels. Another related subject worth exploring is the discovery of rumor sources on social media. While a few experiments based on different techniques have already been performed to identify the origins of misinformation, more advanced approaches, e.g., social network-based approaches, can be applied because of their demonstrated efficacy.
REFERENCES

Faiza Masood, Ghana Ammad, Ahmad Almogren, Assad Abbas, Spammer Detection and Fake User Identification on Social Networks, May 2019.
Isa Inuwa-Dutse, Mark Liptrott, Ioannis Korkontzelos, Detection of spam-posting accounts on Twitter.
Mudasir Ahmad Wani, Suraiya Jabina, A sneak into the Devil's Colony - Fake Profiles in Online Social Networks.
Michael Fire, Gilad Katz, Yuval Elovici, Strangers Intrusion Detection – Detecting Spammers and Fake Profiles in Social Networks Based on Topology Anomalies.
Ashwini Bhangare, Smita Ghodke, Kamini Walunj, Utkarsha Yewale, Twitter Spammer Detection.
Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, Fake News Detection on Social Media: A Data Mining Perspective.
Stephen Marsland, Machine Learning (An Algorithmic Perspective).
N. Eshraqi, M. Jalali, and M. H. Moattar, Detecting spam tweets in Twitter using a data stream clustering algorithm, in Proc. Int. Congr. Technol., Commun. Knowl. (ICTCK), Nov. 2015, pp. 347-351.
C. Chen, Y. Wang, J. Zhang, Y. Xiang, W. Zhou, and G. Min, Statistical features-based real-time detection of drifted Twitter spam, IEEE Trans Apr. 2017.
C. Buntain and J. Golbeck, Automatically identifying fake news in popular Twitter threads, Nov. 2017.
C. Chen, J. Zhang, Y. Xie, Y. Xiang, W. Zhou, M. M. Hassan, A. AlElaiwi, and M. Alrubaian, A performance evaluation of machine learning-based streaming spam tweets detection, Sep. 2015.
G. Stafford and L. L. Yu, An evaluation of the effect of spam on Twitter trending topics, Sep. 2013.
M. Mateen, M. A. Iqbal, M. Aleem, and M. A. Islam, A hybrid approach for spam detection for Twitter, Jan. 2017.
A. Gupta and R. Kaushal, Improving spam detection in online social networks, Mar. 2015.
Parameshachari B D et al., Epileptic Seizure Detection Using Machine Learning, 1st International Conference on Emerging Trends in Engineering, Innovative Science and Management (ICETEISM-2019), 2019.
