- Open Access
- Authors: Urvashi Sharma, Dr. Sunita Varma
- Paper ID: IJERTV9IS100229
- Volume & Issue: Volume 09, Issue 10 (October 2020)
- Published (First Online): 31-10-2020
- ISSN (Online): 2278-0181
- Publisher Name: IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Web Visitors Data Analytics using Hadoop Ecosystem
Urvashi Sharma, Dr. Sunita Varma
Shri G.S.I.T.S Indore, 452003, INDIA
Abstract. Hadoop is an open-source framework for storing and processing huge amounts of data, i.e. big data. In this paper, the tools of the Hadoop ecosystem are used to store, process, and analyse web visitor data. The data is ingested into HDFS using Flume, then filtered and analysed with Pig and Hive. A comparison is made of the time each tool requires to process the same query, and it is concluded that Hive processes the query in less time than Pig.
1 INTRODUCTION
Big data refers to massive data sets that cannot be studied with traditional computing techniques. Such immense volumes of data can be neither stored nor processed by conventional methods, and a normal relational database system cannot accommodate them. This is why a different kind of data structure and database is required, over which the data can be analysed to produce meaningful output. Hadoop is an open-source framework for storing such data, and it can handle various forms of structured and unstructured data.
2 TECHNOLOGIES USED
Hadoop 1.0 consists of two components: HDFS and the MapReduce programming model. Hadoop 2.0, also called the Hadoop Ecosystem, consists of the following components:
Apache Pig: Pig is a high-level procedural language platform used for programming on Hadoop and MapReduce. Pig is an Apache open-source project.
Apache Hive: Hive is a tool for processing structured data in HDFS. It sits on top of HDFS to help summarize and analyse data.
HDFS (Hadoop Distributed File System): It stores data files in a form as close to the original as possible.
HBase: It is Hadoop's database and compares well with an RDBMS. It supports structured data storage for large tables.
ZooKeeper: It is a coordination service for distributed applications.
Sqoop: It is used to move bulk data between Hadoop and structured data stores such as relational databases.
Flume: Flume is a system for moving large quantities of streaming data into HDFS.
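To make the roles of Pig and Hive concrete, the kind of filter-and-group query they run over web visitor logs can be sketched in plain Python. The log format and field names below are illustrative assumptions, not taken from the paper:

```python
from collections import Counter

# Illustrative web-server log lines (assumed fields: ip, page, status).
LOG_LINES = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /about.html 200",
    "10.0.0.1 /index.html 200",
    "10.0.0.3 /missing.html 404",
]

def parse(line):
    """Split one log line into (ip, page, status) fields."""
    ip, page, status = line.split()
    return ip, page, int(status)

def hits_per_page(lines):
    """Filter out failed requests, then count hits per page,
    mirroring a Pig FILTER/GROUP or a Hive GROUP BY query."""
    records = [parse(line) for line in lines]
    ok = [r for r in records if r[2] == 200]   # FILTER status == 200
    return Counter(page for _, page, _ in ok)  # GROUP BY page, COUNT(*)

print(hits_per_page(LOG_LINES))
# Counter({'/index.html': 2, '/about.html': 1})
```

In Pig this would be a `FILTER` followed by `GROUP` and `COUNT`; in Hive, a single `SELECT page, COUNT(*) ... WHERE status = 200 GROUP BY page` over the same HDFS data.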
3 PROCESS OF ANALYSIS THROUGH THE HADOOP ECOSYSTEM
The process of analysis through the Hadoop ecosystem typically involves three steps: (i) the data is ingested into HDFS using Flume; (ii) this data is filtered and analysed with Pig and Hive; (iii) the results are stored in SQL.
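The three steps above can be sketched end to end in miniature Python: a list of raw lines stands in for the Flume-ingested stream, a filter-and-count stands in for the Pig/Hive analysis stage, and an in-memory sqlite3 database stands in for the final SQL store. All data and names here are illustrative assumptions:

```python
import sqlite3
from collections import Counter

# (i) Ingestion: in the paper this is Flume writing into HDFS; here a
# list of raw log lines stands in for the ingested stream.
raw = [
    "10.0.0.1 /home 200",
    "10.0.0.2 /home 200",
    "10.0.0.2 /cart 500",
    "10.0.0.3 /home 200",
]

# (ii) Filtration and analysis: drop failed requests and count visits
# per page (the kind of aggregation the paper runs with Pig and Hive).
counts = Counter(
    line.split()[1] for line in raw if line.split()[2] == "200"
)

# (iii) Storage: write the aggregated result to SQL (an in-memory
# sqlite3 database as a stand-in for the paper's SQL store).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page_hits (page TEXT PRIMARY KEY, hits INTEGER)")
db.executemany("INSERT INTO page_hits VALUES (?, ?)", counts.items())
db.commit()

rows = db.execute(
    "SELECT page, hits FROM page_hits ORDER BY hits DESC"
).fetchall()
print(rows)
# [('/home', 3)]
```

On a real cluster each stage is a separate service, but the data flow (ingest, filter/aggregate, persist to SQL) is the same shape as this sketch.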
[Figure: pipeline stages — Data Ingestion; Data Filtration and Analysis]
4.1 Performance Analysis of Results