An Iterative Mapreduce based Frequent Subgraph Mining Algorithm with Load Balancing

Ms. Vishnuppriya S; Sandeepkumar R; Swathi S; Vishnu Priya T. S

doi:10.17577/IJERTCONV8IS12005

RTICCT - 2020 (Volume 8 - Issue 12)


Open Access
Authors : Ms. Vishnuppriya S, Sandeepkumar R, Swathi S, Vishnu Priya T. S
Paper ID : IJERTCONV8IS12005
Volume & Issue : RTICCT – 2020 (Volume 8 – Issue 12)
Published (First Online): 04-08-2020
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Ms. Vishnuppriya S (1), Sandeepkumar R (2), Swathi S (3), Vishnu Priya T. S (4)

Assistant Professor (1), Final Year (2), (3), (4), Department of CSE

Velalar College of Engineering and Technology, Thindal, Erode

Abstract:- Health plays a major role in human life. Insurance companies provide healthcare insurance to reduce the financial cost of treatment. Healthcare insurance fraud is now common and has caused huge financial losses all over the world. Much of it involves chronic diseases, that is, diseases that last for more than three months. If investigators understand how a disease evolves, they can detect insurance fraud earlier and avoid paying out on fraudulent claims. The existing method uses a Frequent Subgraph Mining algorithm. First, a graph is constructed for each patient. Frequent Subgraph Mining is then used to mine subgraphs from these graphs, and the base disease is found from the subgraphs. If the base disease satisfies the chronic disease rules, the insurance claim is accepted. However, this does not solve every type of problem: different medication stages of the same chronic disease are not considered, and all processing is done on a single system regardless of its capability, so overall time efficiency is poor. The proposed method uses a Load Balancing algorithm along with the Frequent Subgraph Mining algorithm. Before graphs are assigned to nodes for the mapper process, the load is balanced equally across the nodes. For example, several small graphs are assigned to node A and one big graph is assigned to node B. Different medication stages of the same chronic disease are also considered. Balancing the load across the system improves time efficiency.

Keywords: Evolution of disease; subgraph mining; chronic disease

INTRODUCTION:

Data mining is the extraction of information from large amounts of data; it discovers hidden information in large datasets. It is a useful technology that helps companies focus on the information held in their data warehouses. Data mining tools help predict future trends and support business decisions. They answer business questions that traditionally took too long to resolve, and they help organizations make use of the data stored in their databases. This holds for decision making across many fields and organizations. Knowledge Discovery in Databases (KDD) is defined as the process of discovering useful information from a collection of data. The process includes data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge representation.

In data cleaning, data that are not relevant are removed, and in data integration, data from different sources are combined. In data selection, the data relevant to the analysis are selected, and in data transformation, the selected data are transformed into a form suitable for mining. The next phase is data mining, in which patterns are extracted. In pattern evaluation, the patterns that represent knowledge are identified according to the chosen measures. The final step is knowledge representation, in which the knowledge discovered by mining is presented.

Data mining techniques help users understand and interpret mining results. These methods have been applied for many years to uncover hidden patterns. The main reason for using data mining is to observe and analyze behavior. Note that the data subsets analyzed may or may not represent the whole domain. Data mining functions fall into several categories.

Data mining tasks are classified as predictive or descriptive. Predictive data mining makes predictions about future data, while descriptive data mining describes new information found in the data. Data mining is used in many sectors. In healthcare, for example, chronic diseases can be identified, the regions where a disease is spreading can be tracked, and programs can be designed to reduce the spread. Healthcare professionals analyze diseases together with the regions from which the most patients are admitted to hospital. With this information they can raise awareness among people in those regions, which in turn reduces the number of patients admitted to hospital.
BACKGROUND:
U. Kang, C. E. Tsourakakis, and C. Faloutsos

We describe a very important primitive for PEGASUS called GIM-V (Generalized Iterated Matrix-Vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up on the number of available machines, (b) linear running time on the number of edges, and (c) more than 5 times faster performance over the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world.
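As an illustration of the primitive described above, the following is a minimal single-process sketch of a GIM-V step, used here to label connected components. The operation names `combine2`, `combine_all`, and `assign` follow the way GIM-V is usually described for PEGASUS; the function names, the sparse dictionary representation, and the toy graph are assumptions made for this example, and no Hadoop job is involved.

```python
from collections import defaultdict

def gim_v(matrix, vector, combine2, combine_all, assign):
    """One GIM-V step: v'_i = assign(v_i, combine_all([combine2(m_ij, v_j), ...]))."""
    partial = defaultdict(list)
    # "Map" side: each nonzero matrix entry m_ij meets vector element v_j.
    for (i, j), m_ij in matrix.items():
        partial[i].append(combine2(m_ij, vector[j]))
    # "Reduce" side: merge the partial results of each row, then update v_i.
    return {
        i: assign(v_i, combine_all(partial[i])) if partial[i] else v_i
        for i, v_i in vector.items()
    }

def connected_components(undirected_edges, nodes):
    """Iterate GIM-V until the labels stop changing (node ids must be integers)."""
    matrix = {}
    for a, b in undirected_edges:
        matrix[(a, b)] = 1  # symmetric adjacency matrix, stored sparsely
        matrix[(b, a)] = 1
    labels = {n: n for n in nodes}  # every node starts as its own component
    while True:
        new_labels = gim_v(
            matrix, labels,
            combine2=lambda m_ij, v_j: m_ij * v_j,  # neighbour's current label
            combine_all=min,                        # smallest label among neighbours
            assign=min,                             # keep the smaller of old and new
        )
        if new_labels == labels:
            return labels
        labels = new_labels

components = connected_components([(1, 2), (2, 3), (4, 5)], [1, 2, 3, 4, 5])
```

Each node ends up labelled with the smallest node id in its component. Swapping the three operations for multiplication, summation, and a damped update would give a PageRank-style iteration with the same loop.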

We report our findings on several real graphs, including one of the largest publicly available Web graphs, thanks to Yahoo!. Graphs are ubiquitous: computer networks, mobile call networks, the World Wide Web [1], and protein regulation networks, to name a few. The large volume of available data, the stunning success of online social networks, and Web 2.0 applications all lead to graphs of unprecedented size.

Typical graph mining algorithms silently assume that the graph fits in the memory of a typical workstation, or at least on a single disk. The above graphs violate these assumptions: they span multiple gigabytes and are heading to terabytes and petabytes of data.
EXISTING METHODS :

The existing system proposes a method to mine chronic disease progression and uses it to detect chronic disease-related healthcare insurance fraud.

The steps are mentioned below:

Step 1 : Construct a health-seeking temporal graph for each patient.

Step 2 : Apply the Constrained Frequent Subgraph Mining (CFSM) algorithm to these graphs to mine the frequent disease-process subgraphs.

Step 3 : Construct the base disease progression network from the subgraphs by aggregating the recoded graphs.

Step 4 : Conduct community detection on the base disease progression network and transform the communities into chronic disease progression rules.

Step 5 : Conduct chronic disease-based healthcare insurance fraud detection according to the rules obtained in Step 4.

The application executes even if the number of graphs is large; that is, all of them are mined on a single system.

3.1 DRAWBACKS :
- All the graphs are processed on a single system, even if the capability of that system is low.
- For example, one node may be assigned many big graphs while another node receives only small graphs.
- Overall time efficiency is poor.

4.2 SYSTEM FLOW DIAGRAM

[Figure: system flow diagram of the effective chronic disease progression model. Labels recoverable from the original: Add Node, Add Edge, View Graph, Graph Mining, Frequent Sub (truncated), View, Proposed Capable Node Finding For Graph Allocation.]
PROPOSED SYSTEM :

A health-seeking temporal graph is constructed for each patient. The Constrained Frequent Subgraph Mining (CFSM) algorithm is then applied to these graphs to mine the frequent disease-process subgraphs. From the subgraphs, the base disease progression network is constructed by aggregating the recoded graphs.
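The first two stages can be pictured with a deliberately simplified sketch. It assumes that a patient's temporal graph is just the set of directed edges between consecutive diagnoses, and it counts only single-edge patterns in a map and reduce style. This is not the CFSM algorithm itself, and the record layout, function names, and toy patient data are all hypothetical.

```python
from collections import defaultdict

def build_temporal_graph(visits):
    """Return directed edges between consecutive diagnoses, ordered by visit day."""
    ordered = [diagnosis for _, diagnosis in sorted(visits)]
    return {(a, b) for a, b in zip(ordered, ordered[1:]) if a != b}

def map_phase(patient_id, visits):
    # Emit one (pattern, patient) pair per distinct edge in the patient's graph.
    for edge in build_temporal_graph(visits):
        yield edge, patient_id

def reduce_phase(pairs, min_support):
    # Support of a pattern = number of distinct patients whose graph contains it.
    patients_by_edge = defaultdict(set)
    for edge, patient_id in pairs:
        patients_by_edge[edge].add(patient_id)
    return {
        edge: len(patients)
        for edge, patients in patients_by_edge.items()
        if len(patients) >= min_support
    }

# Made-up toy records: patient id -> list of (day, diagnosis).
records = {
    "p1": [(1, "hypertension"), (40, "diabetes"), (200, "nephropathy")],
    "p2": [(3, "hypertension"), (90, "diabetes")],
    "p3": [(5, "asthma"), (60, "diabetes"), (150, "nephropathy")],
}
pairs = [kv for pid, visits in records.items() for kv in map_phase(pid, visits)]
frequent = reduce_phase(pairs, min_support=2)
```

With a minimum support of 2, only the transitions shared by at least two patients survive. A full frequent subgraph miner would go on to grow these single edges into larger candidate subgraphs over further iterations.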

Community detection is then conducted on the base disease progression network, and the resulting communities are transformed into chronic disease progression rules. Chronic disease-based healthcare insurance fraud detection is carried out according to these rules. Additionally, each node is assigned graphs for the map process, and a load balancing technique is used so that the load is equal across all the nodes.

For example, several small graphs may be given to Node A while one big graph is given to Node B. The map process then completes in a similar, shorter interval on all the nodes, so that the reduce phase can start immediately.
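One common way to achieve this kind of balance is a greedy "largest first" assignment, sketched below. The paper does not specify its load balancing algorithm, so this heuristic is a stand-in; the function name `balance_graphs`, the use of edge count as the size measure, and the sample sizes are assumptions.

```python
import heapq

def balance_graphs(graph_sizes, num_nodes):
    """Greedy largest-first assignment of graphs to worker nodes."""
    # Min-heap of (current load, node index); the lightest node is always on top.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    # Place the largest graphs first so the small ones can fill the gaps.
    by_size = sorted(graph_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for graph_id, size in by_size:
        load, node = heapq.heappop(heap)
        assignment[node].append(graph_id)
        heapq.heappush(heap, (load + size, node))
    loads = {
        node: sum(graph_sizes[g] for g in graphs)
        for node, graphs in assignment.items()
    }
    return assignment, loads

# Hypothetical graph sizes (for example, edge counts).
sizes = {"g1": 12, "g2": 3, "g3": 4, "g4": 5}
assignment, loads = balance_graphs(sizes, num_nodes=2)
```

In this example the one big graph goes to the first node and the three small graphs go to the second, so both mappers carry the same total load and finish at about the same time.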
CONCLUSION :

The proposed approach calls the map functions on a larger number of key-value pairs to produce new key-value pairs, and the reduce function is used to compute the new cluster centers.

Unlike the existing approach, where all the data are given to the map functions in a single pass, here further passes may or may not be run depending on the values from the previous iteration.
REFERENCES :

G. Liu, M. Zhang, and F. Yan, "Large-scale social network analysis based on MapReduce," in Proc. Int. Conf. Comput. Aspects Soc. Netw., 2010, pp. 487-490.
J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Commun. ACM, vol. 51, pp. 107-113, 2008.
U. Kang, C. E. Tsourakakis, and C. Faloutsos, "PEGASUS: A peta-scale graph mining system implementation and observations," in Proc. 9th IEEE Int. Conf. Data Mining, 2009, pp. 229-238.
U. Kang, B. Meeder, and C. Faloutsos, "Spectral analysis for billion-scale graphs: Discoveries and implementation," in Proc. 15th Pacific-Asia Conf. Adv. Knowl. Discov. Data Mining, 2011, pp. 13-25.
