Analyse Electoral Performance of Parliament Using Hadoop

DOI : 10.17577/IJERTCONV5IS22017


K. Sai Prasad, K Shekar, G. Prabhakar Reddy, K Sai Teja

Computer Science Engineering, MLR Institute of Technology, Hyderabad, India.

Abstract – Big data is a buzzword in the market and in industry because of its wide applications. Hadoop is a framework that is considered a solution to many big data problems. This paper uses Hadoop to analyse data on the election results of, and criminal charges against, all candidates contesting elections to the lower house of the Parliament of India, the Lok Sabha, in 2004 and 2009. These were the first national elections conducted after the 2002 Supreme Court ruling mandating that all candidates running for public office file affidavits with the Election Commission of India prior to the election. In these affidavits, candidates report their criminal histories or pending criminal charges. We present graphical results of the election analysed with the surveyed data and predict the winner of the elections, i.e., the Member of Parliament, using data collected from election surveys.

Keywords: Elections, Big data, Hadoop, Parliament, Lok Sabha.

    Big data [1] is a term for data sets, both structured and unstructured, that are so large or complex that traditional data processing applications are inadequate; it describes the large volume of data that inundates a business on a day-to-day basis. But it is not the amount of data that is important; it is what organizations do with the data that matters. The data analysed in this work come from the study "Electoral Performance and Criminal Status of Candidates Contesting the 2004 and 2009 Parliamentary Elections to the Lok Sabha (India)" (ICPSR 35512). To understand the effectiveness of visualizing historical data, we conducted a case study to determine how many votes each candidate received. The same approach can be applied to elections of any size across the globe. Candidates and voters can judge to what extent analysis plays a role in the visualization of elections. Visualization is a convenient way to display the results, and visualizing these patterns and attaching them to our memories could make the difference between winning and losing.


    The traditional way to handle structured data is a Relational Database Management System (RDBMS), in which data is stored in a relational (tabular) format. Relational databases are powerful because they require few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways. The proposed work uses Hadoop [2].

    Fig. 1: Functional flow of processing in Hadoop.

    Fig. 1 describes the process of data loading and processing in Hadoop.

    • The required datasets are collected and loaded into a MySQL server.

    • Using Sqoop, the data in the MySQL server is loaded into Hive, a data warehouse system built on Hadoop that provides an SQL-like query language (HiveQL).

    • Using the Hive query language and Hue, an open-source web interface for analysing data, we analyse the data and generate the required results.
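Classic Hive (running on MapReduce) compiles a HiveQL aggregation into map, shuffle and reduce stages; newer releases can also run on Tez or Spark. The single-process Python sketch below only illustrates that data flow for a query like `SELECT party, SUM(votes) FROM parliament GROUP BY party`. The record layout and the column names `party` and `votes` are hypothetical, not taken from the paper's dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (constituency, party, votes).
Record = Tuple[str, str, int]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Map: emit one (key, value) pair per input row, keyed by party."""
    for _constituency, party, votes in records:
        yield party, votes


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values sharing a key (Hadoop does this between map and reduce)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into one aggregate (here a SUM)."""
    return {key: sum(values) for key, values in groups.items()}


def total_votes_by_party(records: Iterable[Record]) -> Dict[str, int]:
    # Rough equivalent of: SELECT party, SUM(votes) FROM parliament GROUP BY party;
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    toy = [("C1", "P1", 500), ("C1", "P2", 700), ("C2", "P1", 900)]  # hypothetical rows
    print(total_votes_by_party(toy))  # {'P1': 1400, 'P2': 700}
```

A real Hadoop job runs many mappers and reducers in parallel and sorts keys during the shuffle; this sketch keeps only the grouping logic.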


Collecting Data Sets

  • Fig. 2 below shows samples of the data sets collected and stored in HDFS with pipe (|) separation.

  • The figure depicts the datasets collected for analysing electoral performance.

  • The step-by-step procedure followed in our process is described below:

  • We saved the data sets in CSV format.

  • After arranging the data sets, we uploaded them into the MySQL database in Cloudera.

  • Later, we copied the CSV files from the local system to Cloudera.
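The step of storing the data with pipe separation can be sketched in Python as a re-delimiting pass. One plausible reason for preferring `|` is that free-text fields such as candidate names may contain commas. The sketch assumes the source CSV uses standard double-quote escaping and that no field contains a literal `|`; the function name `csv_to_pipe` and the sample columns are hypothetical.

```python
import csv
import io


def csv_to_pipe(text: str) -> str:
    """Rewrite comma-separated text as pipe-separated text.

    csv.reader honours double quotes, so a comma inside a quoted field
    (for example a candidate name written as "Singh, A") stays in one column.
    """
    out_lines = []
    for row in csv.reader(io.StringIO(text, newline="")):
        if any("|" in field for field in row):
            # A raw '|' inside a field would be read as an extra column later.
            raise ValueError(f"field contains the '|' delimiter: {row}")
        out_lines.append("|".join(row))
    return "\n".join(out_lines) + "\n" if out_lines else ""


if __name__ == "__main__":
    sample = 'year,candidate,party\r\n2009,"Singh, A",P1\r\n'  # hypothetical sample
    print(csv_to_pipe(sample), end="")
    # year|candidate|party
    # 2009|Singh, A|P1
```

A Hive table over such files would then be declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'`.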

    Fig. 2: Sample data format for processing

    Log in to MySQL.

    Now create the databases and tables for the corresponding CSV files in MySQL.
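The MySQL commands in this section create the `elections` database and load `elections.csv` into a table named `parliament`, but the table definition itself is not given. The sketch below stands in for that step using Python's built-in `sqlite3` module, swapped in for MySQL only so that it runs without a server. The six-column schema, the function name `load_parliament` and the toy rows are assumptions for illustration, not the real ICPSR 35512 layout.

```python
import csv
import io
import sqlite3

# Hypothetical toy rows (no header); the real ICPSR 35512 files have a different, wider schema.
SAMPLE_CSV = (
    "2009,C1,A,P1,500,0\r\n"
    "2009,C1,B,P2,700,2\r\n"
    "2009,C2,C,P1,900,1\r\n"
)


def load_parliament(conn: sqlite3.Connection, text: str) -> int:
    """Create the (hypothetical) parliament table if needed and bulk-insert CSV rows.

    Returns the number of rows inserted, analogous to the
    "Records: N" line MySQL prints after LOAD DATA.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS parliament (
               year INTEGER,
               constituency TEXT,
               candidate TEXT,
               party TEXT,
               votes INTEGER,
               criminal_cases INTEGER
           )"""
    )
    # Skip blank lines; SQLite's INTEGER affinity converts numeric strings to integers.
    rows = [tuple(r) for r in csv.reader(io.StringIO(text, newline="")) if r]
    conn.executemany("INSERT INTO parliament VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_parliament(conn, SAMPLE_CSV))  # 3
    print(conn.execute("SELECT SUM(votes) FROM parliament").fetchone()[0])  # 2100
```

In MySQL the same step would be a CREATE TABLE statement with matching column types, run before the LOAD DATA command.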

    Creating the database and table in MySQL

    • The commands used to create and select the database are:

      mysql> create database elections;

      mysql> use elections;

    • Loading data from the CSV file into MySQL:

      mysql> load data local infile '/home/cloudera/elections/elections.csv' into table parliament fields terminated by ',' lines terminated by '\r\n';

      Running the above command produces the following output:


      Query OK, 368 rows affected, 15 warnings (0.04 sec)

      Records: 368 Deleted: 0 Skipped : 0 Warnings: 15

      Now exit MySQL using the EXIT command.

    • Import the data from MySQL into Hadoop using Sqoop:

      > sqoop import --connect jdbc:mysql://localhost/elections --username xxxx --password xxxx --table parliament --fields-terminated-by ',' --hive-import -m 1

      Here -m 1 runs the import with a single map task. After the Sqoop command executes, the results shown in Fig. 3 are generated.

      Fig. 3: Result of the Sqoop command

    • After the Sqoop import completes successfully, open a web browser and click on the Hue interface to interact with the loaded data.

    • Then go to the Query Editor and choose the Hive editor in the Hue browser, as shown in Fig. 4.

      Fig. 4: Hue browser for generating results using Hadoop.

    • For the analysis, select the chart option to see the relationship between different columns in the form of:

  1. Pie charts

  2. Bar charts, etc.

    • To check the relationships between different columns in a data set, change the columns on the X and Y axes accordingly.

    • The analysed results are shown in Fig. 5.

Fig. 5: Results obtained using a Hive query in Hue.
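The winner of each constituency is the candidate with the most votes. A minimal Python sketch of that per-group maximum, together with a list of winners who declared criminal cases, is shown below. The `Candidate` fields are hypothetical, and the tie rule (the first candidate seen wins) is an assumption. In HiveQL the same per-constituency ranking could be expressed with a window function such as `RANK() OVER (PARTITION BY constituency ORDER BY votes DESC)`.

```python
from typing import Dict, Iterable, List, NamedTuple


class Candidate(NamedTuple):
    # Hypothetical fields; the paper does not publish its table schema.
    constituency: str
    name: str
    party: str
    votes: int
    criminal_cases: int


def winners(candidates: Iterable[Candidate]) -> Dict[str, Candidate]:
    """Return the highest-polling candidate in each constituency.

    Ties keep the candidate seen first; real tie-breaking rules are not modelled.
    """
    best: Dict[str, Candidate] = {}
    for cand in candidates:
        current = best.get(cand.constituency)
        if current is None or cand.votes > current.votes:
            best[cand.constituency] = cand
    return best


def winners_with_cases(candidates: Iterable[Candidate]) -> List[str]:
    """Sorted names of winners who declared at least one criminal case."""
    return sorted(c.name for c in winners(candidates).values() if c.criminal_cases > 0)


if __name__ == "__main__":
    toy = [  # hypothetical rows
        Candidate("C1", "A", "P1", 500, 0),
        Candidate("C1", "B", "P2", 700, 2),
        Candidate("C2", "C", "P1", 900, 1),
        Candidate("C3", "E", "P3", 400, 0),
        Candidate("C3", "F", "P2", 100, 3),
    ]
    print({k: v.name for k, v in winners(toy).items()})  # {'C1': 'B', 'C2': 'C', 'C3': 'E'}
    print(winners_with_cases(toy))  # ['B', 'C']
```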

Fig. 6: Final result of analysis
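A pie chart, one of the chart types offered in Hue, shows each category's share of the total. The arithmetic behind such a chart can be sketched as follows; the function name `percentage_shares` and the party totals in the example are hypothetical.

```python
from typing import Dict


def percentage_shares(totals: Dict[str, int], digits: int = 1) -> Dict[str, float]:
    """Convert per-group totals (e.g. votes per party) into percentage shares."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("cannot chart an empty or all-zero total")
    return {key: round(100 * value / grand_total, digits) for key, value in totals.items()}


if __name__ == "__main__":
    print(percentage_shares({"P1": 1400, "P2": 700}))  # {'P1': 66.7, 'P2': 33.3}
```

Because each share is rounded, the displayed percentages may not add up to exactly 100.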


Using Hadoop framework tools, we analysed a historical dataset of elections and obtained results in the form of graphs and pie charts. This process involves working with HDFS, Hive, Sqoop, Hue, Impala and Cloudera. A novice Hadoop user can easily follow the process we used to obtain the results and apply it to analyse other types of data. Our results show the criminal status of candidates who contested parliamentary seats in different locations. The architecture shows how the collected data can be processed and analysed. For very large volumes of structured or unstructured data, Hadoop can produce results effectively.

Scope of Future Work:

Based on this framework, we can create a platform to predict election results, or other analytical results, for all countries, and even build a database of the world population. This could be made into a standard system for analysing heavy datasets, making such work easier for organizations.


  1. H. Hu, Y. Wen, T.-S. Chua, and X. Li, "Toward Scalable Systems for Big Data Analytics: A Technology Tutorial," IEEE Access, 2014, ISSN 2169-3536.

  2. J. R. Owens, B. Femiano, and J. Lentz, Hadoop Real World Solutions Cookbook, ISBN 1849519129, 9781849519120.



  3. of-HDFS-file-read-operations.html

  4. architecture/

  5. ads/vmware_workstation/7_0

