Web Mining to Detect Online Spread of Terrorism

Rinkle Goradia

K J Somaiya Institute of Engineering & Information Technology University of Mumbai, India

Anjali Jhakhariya

K J Somaiya Institute of Engineering & Information Technology University of Mumbai, India

Shravan Mohite

K J Somaiya Institute of Engineering & Information Technology University of Mumbai, India

Vijaya Pinjarkar

K J Somaiya Institute of Engineering & Information Technology University of Mumbai, India

Abstract: In recent times, terrorism has grown exponentially in certain parts of the world. This enormous growth in terrorist activities has made it important to stop terrorism and prevent its spread before it causes damage to human life or property. With the development of technology, the internet has become a medium for spreading terrorism through speeches and videos. Terrorist organizations use the internet to harm and defame individuals and to promote terrorist activities through web pages that coerce people into joining terrorist organizations and committing crimes on behalf of those organizations. Web mining and data mining are used together for efficient system development. Web mining includes many text mining methods that help scan and extract relevant data from unstructured data. Text mining is very helpful in detecting patterns, keywords, and significant information in unstructured texts. Data mining and web mining systems are widely used for mining text. Data mining algorithms are used to manage organized data sets, while web mining algorithms help in mining and extracting information from unstructured web pages and text data available across the web. Websites built on different platforms have varying data structures, which makes them quite difficult for a single algorithm to read.

Keywords: Terrorism, Naïve Bayes, random forest, online spread

I. INTRODUCTION

Terrorist organizations are using the internet to spread their propaganda, radicalize youth online and encourage them to commit terrorist activities. In order to minimise the online presence of such harmful websites, we need to devise a system which detects specific keywords in a particular website. The website should be flagged as inappropriate if the keywords are found. Data mining consists of text mining methods that help us to scan and extract useful content from unstructured data. Text mining helps us to detect keywords, patterns and important information in unstructured texts. Hence, we plan to implement an efficient web data mining system to detect such web properties and flag them for further human review. Data mining is a technique used to extract patterns of relevant data from large data sets and gain maximum insight from the obtained results. Web mining and data mining are used together for efficient system development. The literature survey shows the previous work that has been carried out on this subject, and the existing systems are explained in the paper. The system that we propose significantly improves on the existing systems and eliminates their flaws. The methodology and the results that we achieved after implementing the proposed system are also explained in brief. This system should be helpful to anti-terrorism and cyber security response departments. It should help the police to track communication between terrorists and should detect web pages developed on different platforms.

II. LITERATURE REVIEW

    [1.] Aakash Negandhi et al. applied various machine learning algorithms in "Detect Online Spread of Terrorism Using Data Mining" to mine textual information on web pages and detect its relevance to terrorism.

    [2.] Chen, H. et al. used the features of sentiment analysis to segregate the words of a web page, classify them and assign a score to each word in "Sentiment Analysis in Multiple Languages: Feature Selection for Opinion Classification in Web Forums."

    [3.] Fawad Ali et al. studied various methods by which textual data can be fetched and scanned, and applied them to counter terrorism on online social networks using web mining techniques.

    [4.] Naseema Begum et al. classified web pages into various categories and sorted them appropriately. Their system uses two features: data mining and web mining.

    [5.] T. Anand et al. used data mining and web mining together for efficient system development. Their system tracks web pages that are more susceptible to terrorism and reports the IP address to the user of the system.

Table 1. Comparison of existing systems

| Sr No. | Paper | Algorithms | Scope |
|---|---|---|---|
| 1. | Aakash Negandhi, Soham Gawas, Prem Bhatt, Priya Porwal, "Detect Online Spread of Terrorism Using Data Mining" | Logistic Regression; Decision Tree; Random Forest | Finds only the words that can be pegged as related to terrorism |
| 2. | T. Anand, "Terror Tracking Using Advanced Web Mining", AYC College of Engg, Mayiladuthurai, India | Uses WEKA; Decision Tree; Naïve Bayes | Finds the sentiments of the words |
| 3. | Fawad Ali, Farhan Hassan Khan, Saba Bashir, and Uzair Ahmad, "Counter Terrorism on Online Social Networks Using Web Mining Technique", Department of Computer Science, Federal Urdu University of Arts, Science and Technology (FUUAST), Islamabad, Pakistan | Naïve Bayes; KNN; Decision Tree; Logistic Regression; Random Forest | Uses various techniques like facial recognition; uses text mining on OSN |
| 4. | Naseema Begum A., "Detection of online spread of terrorism using web data mining", Institute of Engineering and Technology, Coimbatore | Decision Tree; Random Forest | Performs a well defined cleaning of data and also data storage |

III. PROPOSED SYSTEM

We propose a system whose primary goal is a website where users can check any webpage or website for traces of terrorist activity. To do so, our website lets the user enter the URL of the webpage to be scanned. After the URL is entered, our system extracts the words of the whole webpage and tallies them against the words already present in our database. Each word stored in our database has a score assigned to it. Our system fetches from the database the score of each word present in the user's webpage, and in the end it calculates a total rank for the website.

This rank determines whether the user's webpage contains any trace of terrorism.
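The scoring step described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `KEYWORD_SCORES` dictionary stands in for the word-score database, and the words, weights and `FLAG_THRESHOLD` value are hypothetical placeholders chosen only for the example.

```python
import re

# Hypothetical word scores standing in for the word-score database.
KEYWORD_SCORES = {"attack": 3, "bomb": 5, "recruit": 2}

# Hypothetical cut-off at which a page is flagged for human review.
FLAG_THRESHOLD = 8


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def page_rank_score(text, keyword_scores):
    """Sum the stored score of every occurrence of a listed word."""
    return sum(keyword_scores.get(word, 0) for word in tokenize(text))


def is_flagged(text, keyword_scores=KEYWORD_SCORES, threshold=FLAG_THRESHOLD):
    """Return True when the total score reaches the threshold."""
    return page_rank_score(text, keyword_scores) >= threshold
```

In this sketch only exact word matches contribute to the total, so a real system would probably also need stemming or phrase matching before looking words up.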

Our system will detect patterns, keywords and relevant information in the unstructured text of a webpage using web mining as well as data mining. It will use a web mining algorithm to mine the textual information on web pages and detect those pages that are relevant to terrorism. Data mining and web mining are used together for efficient results.

Machine Learning algorithms:

  • Random Forest:

    The random forest algorithm, as its name implies, consists of a large number of individual decision trees that operate together as an ensemble. Each individual tree in the random forest outputs a class prediction, and the class with the most votes becomes the model's prediction. [Deziel, M. et al.]

  • Decision Tree:

    A decision tree is a tree-like graph in which each node is a point where we pick an attribute and ask a question, each edge represents an answer to that question, and each leaf represents the actual output or class label. Decision trees handle non-linear decision making by combining simple splits on individual attributes.

  • Naïve Bayes:

    Naive Bayes classifiers are a collection of classification algorithms based on Bayes' Theorem. Naive Bayes is not a single algorithm but a family of algorithms that share a common principle: every feature is assumed to be independent of every other feature, given the class.

  • Logistic Regression:

    Logistic regression is a predictive analysis technique. It is used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables.

  • K-nearest Neighbours:

    K-nearest neighbours is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions). KNN has been used in statistical estimation and pattern recognition as a non-parametric technique since the early 1970s.
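To make the independence assumption behind Naïve Bayes concrete, here is a minimal multinomial Naïve Bayes text classifier in plain Python. It is only an illustrative sketch: the paper's experiments were run in WEKA, and the function names, the add-one (Laplace) smoothing and the word-list input format are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """Count labels and per-label word frequencies.

    documents is a list of (list_of_words, label) pairs.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in documents:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def predict(model, words):
    """Return the label with the highest log-posterior for the given words."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        # Start from the log prior of the class.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # Each word contributes independently (the "naive" assumption),
            # with add-one smoothing to avoid zero probabilities.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because each word adds its own log-probability term, the classifier never models interactions between words, which is exactly the shared principle described in the Naïve Bayes entry above.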

Figure 1. System block diagram

Traditionally, there was no system to keep an eye on websites or on suspicious words present online. The police were unable to track terrorism-related websites or any person with suspicious information. The incidence of terrorism is high in today's world, so there must be a system to track suspicious words online and help bring it down. Web pages come in various arrangements and have images, videos, etc. intermixed on a single page. We therefore propose to use smartly designed web mining algorithms to mine textual information on web pages and detect its relevance to terrorism. In this way we may judge web pages and check whether they may be promoting terrorism. This system should prove useful to anti-terrorism agencies and even to search engines for classifying web pages by category. Their relevance to the field helps classify and sort them appropriately and flag them for human review.

IV. IMPLEMENTATION DETAILS

    We implemented various machine learning algorithms using the tool WEKA (Waikato Environment for Knowledge Analysis), which is free software licensed under the GNU General Public License and the companion software to the book "Data Mining: Practical Machine Learning Tools and Techniques".

| Sr No. | Algorithm | Accuracy (in percentage) |
|---|---|---|
| 1. | Logistic Regression | 77.47 |
| 2. | Naive Bayes | 88.23 |
| 3. | Decision Tree | 71.44 |
| 4. | k-Nearest Neighbors | 84.96 |
| 5. | Random Forest | 98.66 |

Table 2: Comparison of machine learning algorithms
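As a small illustration of how a comparison like Table 2 can be read, the sketch below computes accuracy as the percentage of correctly classified instances and then picks the best entry from the reported figures. The `accuracy_percent` helper is an assumption about how accuracy is defined here (the paper does not spell it out), while the values in `REPORTED_ACCURACY` are copied from Table 2.

```python
def accuracy_percent(true_labels, predicted_labels):
    """Percentage of predictions that match the true labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Accuracy figures as reported in Table 2.
REPORTED_ACCURACY = {
    "Logistic Regression": 77.47,
    "Naive Bayes": 88.23,
    "Decision Tree": 71.44,
    "k-Nearest Neighbors": 84.96,
    "Random Forest": 98.66,
}

# Select the algorithm with the highest reported accuracy.
best_algorithm = max(REPORTED_ACCURACY, key=REPORTED_ACCURACY.get)
```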

    We compared all of the algorithms on the basis of their accuracy and correctness (tallying the words and scores stored in the database against the words on the webpage that the user wants to check) by applying them to our dataset, and chose the one with the highest accuracy: Random Forest. The table above shows each of the implemented algorithms and its accuracy. Once you log in, you are redirected to a page where you can enter the URL of the webpage that you want to check for any trace of terrorism. On entering the URL and clicking on Search, the system shows the complete webpage that it is checking, along with the words that have the maximum occurrences and that are tagged in the database as related to terrorism.
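The step of listing the tagged words with the most occurrences on a page could look roughly like the following sketch, which uses only the Python standard library. The `VisibleTextParser` class, the `flagged_word_counts` function and the sample word list are hypothetical names introduced for illustration; the sketch takes an HTML string as input and leaves out fetching the page from the entered URL.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes while skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def flagged_word_counts(html, flagged_words, top_n=5):
    """Return the most frequent flagged words in the page text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    counts = Counter(word for word in words if word in flagged_words)
    return counts.most_common(top_n)
```

Stripping markup first matters because pages built on different platforms wrap their text in very different structures, while the word tally only needs the visible text.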

    The images below show the complete result.

    Figure 2. URL page

    Figure 3. Final score

    Figure 4. History of visited websites

V. CONCLUSION AND FUTURE SCOPE

    To curb the menace of terrorism and to destroy the online presence of dangerous terrorist organizations like ISIS and other radicalization websites, we need a proper system to detect and terminate websites which spread harmful content used to radicalize youth and helpless people. We analysed the usage of Online Social Networks (OSNs) in the event of a terrorist attack.

    We used different metrics such as the number of tweets, whether users in developing countries tended to tweet, re-tweet or reply, demographics and geo-location, and we defined new metrics (reach and impression of the tweet) and presented their models. While developing countries face many limitations in using OSNs, such as unreliable power and poor Internet connections, the study's findings still challenge the traditional media of reporting during disasters like terrorist attacks. We recommend that centres globally make full use of OSNs for crisis communication in order to save more lives during such events.

VI. REFERENCES

Journal Papers:

  1. Aakash Negandhi, Soham Gawas, Prem Bhatt, Priya Porwal, Detect Online Spread of Terrorism Using Data Mining, IOSR Journal of Engineering, Volume 13, 17 April 2019. They propose an efficient web data mining system to detect such web properties and flag them automatically for human review. Keywords: Anti-Terrorism, Data Mining, Online, Terrorism, World

  2. Avishag Gordon, The spread of terrorism publications: A database analysis, Terrorism and Political Violence, published in Dec 2007. This research note focuses on the spread of terrorism publications from 1988 to 1995 compared to their frequency of appearance from 1996 to 1998. It also identifies the core journals of this research field.

  3. A. Sai Hanuman, G. Charles Babu, P. Vara Prasad Rao, P. S. V. Srinivasa Rao, B. Sankara Babu, A Schematic Approach on Web Data Mining In Online Spread Detection of Terrorism, International Journal of Recent Technology and Engineering, Volume-8, Issue-1, May 2019. They propose a web data mining structure to recognize such web properties and flag them for human review. Index Terms: web data mining, terrorism, web structure mining, dread monger affiliations.

  4. Fawad Ali, Farhan Hassan Khan, Saba Bashir, and Uzair Ahmad, Counter Terrorism on Online Social Networks Using Web Mining Techniques, Department of Computer Science, Federal Urdu University of Arts, Science and Technology (FUUAST), Islamabad, Pakistan. In this paper some major web mining techniques are discussed which can be helpful to identify such people so that terrorism may be countered on OSNs. Each technique is discussed thoroughly, and its effectiveness along with its pros and cons are also presented.

  5. Chen, H., "Sentiment Analysis in Multiple Languages: Feature Selection for Opinion Classification in Web Forums," ACM Transactions on Information Systems, forthcoming, June 2008. In this study the use of sentiment analysis methodologies is proposed for classification of Web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content.

  6. J. Kiruba, P. Sumitha, K. Monisha, S. Vaishnavi, Enhanced Content Detection Method to Detect Online Spread of Terrorism, International Journal of Engineering and Advanced Technology, Volume-8, Issue-6S3, September 2019. They proposed an event notification delivery system which is used to monitor activities and delivers notifications according to the investigation knowledge. An alert reporting system is developed that takes earthquakes from websites, and a message is sent to the registered user.

  7. Michael Grenieri, Anthony Estrada, Down Converter Characterization in a Synthetic Instrument Context, 2006 IEEE Autotestcon. This paper provides an overview of the need for a common set of specification parameters to characterize a down converter in a synthetic instrument (SI). The paper then provides an in-depth technical discussion of two of the less understood down converter related intermediate frequency (IF) output parameters: group delay and phase linearity.

  8. Naseema Begum A., Detection of online spread of terrorism using web data mining, Institute of Engineering and Technology, Coimbatore, Tamil Nadu, International Journal of Advance Research, Ideas and Innovations in Technology, Volume 5, Issue 1. The basic idea of this project is to reduce or stop spreading of terrorism and to remove all these accounts.

  9. T. Anand, S. Padmapriya, E. Kirubakaran, Terror Tracking Using Advanced Web Mining, 2009 International Conference on Intelligent Agent & Multi-Agent Systems. Web mining techniques can be used for detecting and avoiding terror threats caused by terrorists all over the world.

  10. United Nations Counter-Terrorism Implementation Task Force, The use of the Internet for terrorist purposes, 2009.

  11. Jiawei Han, Micheline Kamber, Jian Pei, "Data Mining Concepts and Techniques", Morgan Kaufmann, 3rd Edition.

  12. P. N. Tan, M. Steinbach, Vipin Kumar, Introduction to Data Mining, Pearson Education.
