Effective Chronic Disease Progression Model using Frequent Subgraph Mining Algorithm

M. S. Gayathri; M. Shiva; T. Hariharasudh A N; K. Ravikumar

doi:10.17577/IJERTCONV7IS01014

RTICCT - 2019 (Volume 7 Issue 01)

Open Access
Paper ID : IJERTCONV7IS01014
Volume & Issue : RTICCT – 2019 (Volume 7 – Issue 01 )
Published (First Online): 05-04-2019
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

Effective Chronic Disease Progression Model using Frequent Subgraph Mining Algorithm

M. S. Gayathri, B.E., Department of CSE, Builder Engineering College, Kangayam, Tirupur, Tamilnadu, India.

M. Shiva, B.E., Department of CSE, Builder Engineering College, Kangayam, Tirupur, Tamilnadu, India.

T. Hariharasudh A N, B.E., Department of CSE, Builder Engineering College, Kangayam, Tirupur, Tamilnadu, India.

K. Ravikumar, E., (Ph.D), Assistant Professor, Department of CSE, Builder Engineering College, Kangayam, Tirupur, Tamilnadu, India.

ABSTRACT: Public healthcare funds around the world lose billions of dollars to healthcare insurance fraud. Understanding disease progression can help investigators detect healthcare insurance fraud early. Existing disease progression methods often ignore complex relations, such as the time gap between disease occurrences and the pattern in which diseases occur. They also do not take into account the different medication stages of the same chronic disease, which is of great help when detecting healthcare insurance fraud and reducing healthcare costs. This project proposes a heterogeneous network-based chronic disease progression mining method to improve the current understanding of the progression of chronic diseases, including orphan diseases. The method also considers the different medication stages of the same chronic disease. Combining automated methods with statistical knowledge has led to the emergence of a new interdisciplinary branch of science named Knowledge Discovery from Databases (KDD).

Keywords: Disease progression, Heterogeneous network, Knowledge from Database, Healthcare fraud.
1. INTRODUCTION:
  
  Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting meaning from that data. Data mining tools predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. They can also answer business questions that were traditionally too time-consuming to resolve.
  
  Classifications of Data Mining Methods: Data mining methods can be classified in different ways, depending on the kind of data being mined, the kind of knowledge being discovered, and the kind of technique used. Commonly used classification techniques include the following.
  1. Logistic regression is a machine learning algorithm for classification. In this algorithm, the probabilities describing the possible outcomes of a single trial are modelled using a logistic function.
  2. The Naive Bayes algorithm is based on Bayes' theorem with the assumption of independence between every pair of features. Naive Bayes classifiers work well in many real-world situations, such as document classification and spam filtering.
  3. Stochastic gradient descent is a simple and very efficient approach to fitting linear models. It is particularly useful when the number of samples is very large. It supports different loss functions and penalties for classification.
  4. Neighbours-based classification is a type of lazy learning: it does not attempt to construct a general internal model, but simply stores instances of the training data.
  5. Decision tree: given data consisting of attributes together with their classes, a decision tree produces a sequence of rules that can be used to classify the data.
  6. A random forest classifier is a meta-estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy of the model and to control over-fitting. The sub-sample size is the same as the original input sample size, but the samples are drawn with replacement.
  7. A support vector machine represents the training data as points in space, separated into categories by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.
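To make the "lazy learning" idea in item 4 concrete, the following is a minimal sketch of a nearest-neighbour classifier in plain Python. It is not taken from the paper: the class name `NearestNeighbours`, the choice of Euclidean distance, the value `k=3`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


class NearestNeighbours:
    """Lazy learner: fit() only stores the data; all work happens in predict()."""

    def __init__(self, k=3):
        self.k = k
        self.points = []
        self.labels = []

    def fit(self, points, labels):
        # No internal model is built; the training instances are simply kept.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Rank the stored instances by Euclidean distance to the query point.
        ranked = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        # Majority vote among the k closest stored instances.
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Toy data (invented for illustration): two numeric features per record.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
train_y = ["low", "low", "low", "high", "high", "high"]

model = NearestNeighbours(k=3).fit(train_x, train_y)
print(model.predict((1.1, 0.9)))
print(model.predict((5.0, 5.0)))
```

Because nothing is learned at fit time, the cost of this approach is paid at prediction time, when every stored instance is compared with the query.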
  KDD technique:
  
  The term Knowledge Discovery in Databases (KDD) refers to the broad process of finding knowledge in data and emphasizes the "high-level" application of particular data mining methods. KDD involves several steps: understanding the organizational environment, determining clear objectives, understanding the data, cleaning, preparing and transforming the data, selecting the appropriate data mining approach, applying data mining algorithms, and evaluating and interpreting the findings. In this paper, KDD is the overall process we follow to find the progression of chronic diseases from the available data. We propose a Heterogeneous Network-based Chronic Disease Progression Mining (HNCDPM) method to help us understand the progression of chronic diseases, including orphan diseases, detect chronic disease fraud, and reduce healthcare costs.
  
  The method applies Constrained Frequent Subgraph Mining (CFSM), which can retain rare nodes and mine only subgraphs with a certain structure. This reduces the size of the candidate subgraph set and improves computational efficiency.
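The idea of mining with a support threshold while still retaining rare nodes can be sketched for the simplest possible pattern, a single edge. This is only an illustration under stated assumptions, not the CFSM algorithm itself: the function `frequent_edges`, the `keep_labels` constraint, and the toy patient graphs are hypothetical.

```python
from collections import Counter


def frequent_edges(graphs, min_support, keep_labels=frozenset()):
    """Return single-edge patterns that meet min_support.

    graphs: list of graphs, each given as a set of (source_label, target_label) edges.
    keep_labels: hypothetical constraint; edges touching one of these labels are
                 retained even when their support is below the threshold.
    """
    support = Counter()
    for edges in graphs:
        # Count each edge once per graph: support = number of graphs containing it.
        support.update(set(edges))

    result = {}
    for edge, count in support.items():
        touches_kept = edge[0] in keep_labels or edge[1] in keep_labels
        if count >= min_support or touches_kept:
            result[edge] = count
    return result


# Toy patient graphs (invented): an edge points from an earlier to a later diagnosis.
patients = [
    {("diabetes", "hypertension"), ("hypertension", "kidney_disease")},
    {("diabetes", "hypertension"), ("diabetes", "retinopathy")},
    {("diabetes", "hypertension"), ("hypertension", "kidney_disease")},
    {("rare_disease_x", "kidney_disease")},
]

plain = frequent_edges(patients, min_support=2)
constrained = frequent_edges(patients, min_support=2, keep_labels={"rare_disease_x"})
print(sorted(plain))
print(sorted(constrained))
```

With a plain threshold the edge involving `rare_disease_x` is discarded because it occurs in only one graph; with the constraint it is kept, which is the behaviour one would want for orphan diseases. A full frequent subgraph miner would grow these edges into larger candidate subgraphs, which this sketch does not attempt.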
2. RELATED WORKS:
  1. T. Ramraja and R. Prabhakar: Graphs are a common data structure used to represent and model real-world systems. This paper surveys approaches for finding frequent subgraphs and various scalable techniques for mining them.
  2. K. Lakshmi and Dr. T. Meyyappan: This work aims to help users apply frequent subgraph mining in a task-specific manner across various application domains. The paper presents a detailed survey of frequent subgraph mining algorithms, which are used for knowledge discovery in complex objects, and proposes a framework for classifying these algorithms.
  3. Mrs. M. H. Sangle and Prof. S. A. Bhavsar: This work proposes a frequent subgraph mining algorithm called gSpan-H, built on an iterative MapReduce-based framework. The algorithm uses a breadth-first search strategy and is an isomorphism-testing-free approach for efficiently mining frequent subgraphs. Experiments with real-life and large synthetic datasets validate the effectiveness of gSpan-H for mining frequent subgraphs from large distributed datasets.
  4. Chenfei Sun, Qingzhong Li, Lizhen Cui, Hui Li, and Yuliang Shi: This work is based on frequent subgraph mining (FSM). The objective of FSM is to extract all of the frequent subgraphs in a given data set, that is, those with occurrence counts above a specified threshold.
  5. Chenfei Sun, Qingzhong Li, Lizhen Cui, Hui Li, and Yuliang Shi: This part of the framework determines the similarity between the base chronic disease network and the healthcare history of a new patient. The method is called longitudinal node matching, and it combines sequential phases of rule-based matching and graph theory.
3. EXISTING SYSTEM
  The base disease progression network is constructed from a recoded graph set using statistical aggregation. In the base network Gbase, each node represents a mined frequent disease-process subgraph, and the edge between two nodes records the frequency with which those nodes occur sequentially. Each node also carries an attribute called frequentness.
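The aggregation described above can be sketched as a counting pass over recoded patient histories. This is a minimal illustration, not the authors' implementation: the function `build_base_network`, the subgraph identifiers `S1` to `S3`, and in particular the reading of frequentness as "number of histories containing the node" are assumptions, since the text does not define the attribute.

```python
from collections import Counter


def build_base_network(recoded_histories):
    """Aggregate recoded patient histories into a weighted directed network.

    recoded_histories: list of sequences; each element is the identifier of a
    mined frequent disease-process subgraph, in order of occurrence.
    Returns (frequentness, edge_weight) as two Counters.
    """
    frequentness = Counter()  # assumed meaning: number of histories containing the node
    edge_weight = Counter()   # number of times node u is directly followed by node v
    for history in recoded_histories:
        frequentness.update(set(history))
        edge_weight.update(zip(history, history[1:]))
    return frequentness, edge_weight


# Toy recoded histories (invented): S1..S3 stand for mined frequent subgraphs.
histories = [
    ["S1", "S2", "S3"],
    ["S1", "S2"],
    ["S2", "S3"],
    ["S1", "S3"],
]

frequentness, edge_weight = build_base_network(histories)
print(dict(frequentness))
print(dict(edge_weight))
```

In this toy run the edge from `S1` to `S2` receives weight 2 because two histories contain that direct transition, while the edge from `S1` to `S3` receives weight 1.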
4. PROPOSED SYSTEM
  
  The proposed system includes all of the approaches of the existing system. In addition, before graphs are assigned to worker nodes for the map process, the assignment is balanced so that every worker node receives a comparable share of graphs, measured by graph node count. For example, two small graphs are given to Node A and one big graph is given to Node B. As a result, the map processes finish in fewer intervals on all worker nodes, and the reduce phase can start immediately.
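As an illustration of the balancing step described above, the following sketch assigns graphs to workers with a greedy "largest first, lightest worker next" heuristic. The paper does not state which balancing rule it uses, so the heuristic, the function `balance_graphs`, and the use of node count as a proxy for map cost are all assumptions.

```python
import heapq


def balance_graphs(graph_sizes, n_workers):
    """Assign graphs to workers so that total node counts are roughly equal.

    graph_sizes: dict mapping graph id -> number of nodes in that graph.
    Returns a list with one (total_nodes, [graph ids]) entry per worker.
    """
    # Min-heap of (current load, worker index); the lightest worker is popped first.
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assigned = [[] for _ in range(n_workers)]

    # Place the largest graphs first; ties are broken by graph id for determinism.
    for gid, size in sorted(graph_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, w = heapq.heappop(heap)
        assigned[w].append(gid)
        heapq.heappush(heap, (load + size, w))

    loads = {w: load for load, w in heap}
    return [(loads[w], assigned[w]) for w in range(n_workers)]


# Toy input (invented) mirroring the example: one big graph and two small ones.
sizes = {"g_big": 10, "g_small_1": 5, "g_small_2": 5}
plan = balance_graphs(sizes, n_workers=2)
for total, graphs in plan:
    print(total, graphs)
```

Here one worker receives the big graph and the other receives both small graphs, so both carry the same total node count and their map tasks would be expected to finish at about the same time.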
5. CONCLUSION

This paper proposes HNCDPM to help detect health insurance fraud. The developed method helps us understand the progression of chronic diseases, including orphan diseases, and is helpful in detecting chronic disease-related fraud and reducing healthcare costs. HNCDPM considers different medication periods of the same disease and produces two types of rules: the pattern between different stages of different chronic diseases, which indicates the relationship between different types of chronic disease, and the pattern between different stages of the same chronic disease, which shows the clinical path of the disease. These two types of rules can be used to help detect chronic disease fraud. The proposed system presents a novel iterative MapReduce-based frequent subgraph mining algorithm, called FSM-H, and evaluates its performance on real-life and large synthetic datasets for various system and input configurations. This project also compares the execution time of FSM-H with that of an existing method, which shows that FSM-H is significantly faster than the existing method.

