Comparative Analysis of Various Tools for Data Mining and Big Data Mining

Data mining and knowledge discovery have emerged to extract useful, interesting, and previously unknown patterns and knowledge from huge databases. Big data is the term used to describe massive amounts of information of both structured and unstructured data types. Data mining techniques can be classified as classification, association, clustering, anomaly detection, regression analysis, prediction, and tracking patterns. Data mining tools help to apply these techniques. This research analyzes various data mining and big data mining tools from different perspectives. It will help researchers select an appropriate data mining tool or tools for their research.


I. INTRODUCTION
Data mining is an essential step of the knowledge discovery process: it analyzes varied data from multiple perspectives and summarizes it into useful knowledge [1, 7, 12, 13, 14]. Data mining is widely used in application domains such as Future Healthcare, Market Basket Analysis, Manufacturing Engineering, Education, Customer Relationship Management, Fraud Detection, Intrusion Detection, Lie Detection, Customer Segmentation, Financial Banking, Corporate Surveillance, Research Analysis, Criminal Investigation, Bioinformatics, and Science Exploration [7, 8, 13, 15, 16, 25, 31, 32, 35]. In today's digital world, we are surrounded by big data that is forecast to grow 40% per year into the next decade. The data could be anything from real-time transactions, climatic conditions, and computer and mobile logs to posts or tweets from social media, and more. If the data cannot be stored and processed on a single machine, it can be called Big Data. Data mining techniques can be classified as follows:
Classification: Classification is the most commonly used data mining technique. It uses a set of pre-classified samples to create a model that can classify big data [7, 8, 9].
Association: This technique helps to find the association between two or more items. It helps to identify the relations between different variables in databases [17, 18].
Clustering: Clustering is the division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. Clustering can be viewed as a data modeling technique that provides concise summaries of the data [2, 3].
Anomaly Detection: Anomaly detection is defined as the process of finding patterns in a given dataset whose behavior is abnormal or unexpected.
In today's big data context, the previous approaches are either incomplete or substandard.
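As an illustration of two of the techniques described above, the following is a minimal Python sketch, not taken from any of the surveyed tools. It assumes that association is measured with the usual support and confidence of a rule, and that anomaly detection is done with a simple z-score threshold; the function names `rule_metrics` and `zscore_outliers`, the threshold of 2.0, and the sample data are all hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = fraction of transactions containing both item sets
    confidence = fraction of antecedent transactions that also contain
                 the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    has_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    has_both = sum(
        1 for t in transactions if (antecedent | consequent) <= set(t)
    )
    support = has_both / n if n else 0.0
    confidence = has_both / has_antecedent if has_antecedent else 0.0
    return support, confidence


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` population standard
    deviations from the mean (an assumed, very simple anomaly criterion)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing is unusual
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical market-basket data: each set is one transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence = rule_metrics(baskets, {"bread"}, {"milk"})

# Hypothetical sensor readings with one abnormal value.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = zscore_outliers(readings)

print(f"bread -> milk: support={support:.2f}, confidence={confidence:.2f}")
print(f"outliers: {outliers}")
```

The tools surveyed in this paper generally offer far more robust implementations (for example, frequent itemset mining and index-accelerated outlier detection); the sketch only makes the definitions concrete on a dataset small enough to check by hand.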
The Big Data analytics lifecycle can be divided into the following nine stages.

B. Data Mining Tools
The implementation of data mining techniques requires the use of powerful software tools. Today a large number of data mining tools are available in different categories, so choosing the most suitable tool becomes increasingly difficult. This paper surveys the available traditional data mining and big data mining software across several categorizations, such as commercial versus open source, business size, platform, deployment, data mining tasks and methods, and visualization.

II. LITERATURE REVIEW
A comparative analysis of data mining tools, observing their behavior based on selected parameters, is helpful for finding the most appropriate tool for a given data set and parameters [
[...] data science teams more productive through an open source platform for data preparation, machine learning, deep learning, text mining, predictive analytics, and model deployment.
• TANAGRA: Tanagra is free open source software for academic and research purposes that provides various data mining methods from exploratory data analysis, statistical data mining, machine learning, and deep learning.
• ELKI: This is open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection. To achieve high performance and scalability, ELKI offers data index structures such as the R*-tree that can provide major performance gains.

B. Traditional Commercial Data Mining Tools
• Sisense: Sisense is a business intelligence platform that lets users join, analyze, and visualize the information they need to make better and more intelligent business decisions and to craft workable plans and strategies.
• Neural Designer: This is a desktop application for data mining that uses neural networks and machine learning.
• SharePoint: SharePoint is a Microsoft-hosted cloud service that empowers companies to store, access, share, and manage documented information from all devices.
• Cognos: IBM Cognos is a set of smart self-service capabilities that enable users to quickly and confidently gain insight and make decisions based on it. The engaging experience provided by Cognos Analytics encourages business users to create and configure dashboards and reports on their own, while providing IT with a proven and scalable platform that can be deployed either on premises or in the cloud.
• Board: Board is a Management Intelligence Toolkit that combines features of business intelligence (BI) and corporate performance management (CPM) into comprehensive and compact software. Board enables users to collect and gather data from almost any source, as well as create full self-service reporting. These reports can be delivered in different formats if needed, such as CSV, HTML, and more.
C. Big Data Mining Tools
• Sisense: Sisense is a business intelligence platform that lets users join, analyze, and visualize the information they need to make better and more intelligent business decisions and to craft workable plans and strategies.