Novel Most Frequent Pattern Mining Approach Using Distributed Computing Environment

Parag M. Moteria; Dr. Y. R. Ghodasara

doi:10.17577/IJERTV2IS2293

Volume 02, Issue 02 (February 2013)


Open Access
Authors : Parag M. Moteria, Dr. Y. R. Ghodasara
Paper ID : IJERTV2IS2293
Volume & Issue : Volume 02, Issue 02 (February 2013)
Published (First Online): 28-02-2013
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Parag M. Moteria

PhD Scholar,

School of Computer Science, R K University, Rajkot.

Dr. Y. R. Ghodasara

Associate Professor

Abstract

Frequent patterns are itemsets that occur frequently in a transactional dataset. They play an essential role in mining associations, correlations and many other interesting relationships among data, which leads to knowledge discovery and supports many business decision-making processes [1]. Data mining is a basic operational technique in knowledge discovery and decision making. Frequent pattern mining techniques have become necessary for massive datasets, which are handled through distributed data mining in a distributed computing environment. This paper discusses a novel approach to an efficient and scalable distributed algorithm that generates the most frequent itemsets from Boolean, single-dimensional, single-level transactional data in a distributed computing environment.

INTRODUCTION

Data mining is the process of finding interesting trends or patterns in large datasets to steer decisions about future activities. Knowledge discovery in databases and data mining help to extract useful information from raw data. Frequent itemsets play an essential role in many data mining tasks that try to find interesting patterns in databases or transactional datasets, such as association rules, correlations, sequences, episodes, classifiers and clusters. Frequent pattern mining is one of the most important and well-researched techniques of data mining. Association rules can be useful for decisions concerning product pricing, promotions, store layout and many others [2]. Thus, frequent pattern mining has become an important data mining task and a focused theme in data mining research [3]. In our novel most frequent pattern mining approach, computations are spread over many independent nodes of a distributed computing environment that share a central transactional dataset. This paper describes a theoretical approach to mining the most frequent itemset in a distributed computing environment without a user-specified threshold.

MOST FREQUENT PATTERN MINING (MFPM) APPROACH

The proposed approach to designing an efficient and scalable MFPM in a distributed computing environment over a transactional dataset is as follows [4].

Figure 1

Assume that we have one server (master), say S, and a number of nodes (slaves), say Ni (where i = 2, 3, ..., n). Here, n equals the total number of different items in the itemsets. Each itemset consists of unique items per transaction.

Consider the following transactional dataset, with items in lexicographic order [1]:

Table 1
TID	List of ITEM IDs
T100	I1, I2, I5
T200	I2, I4
T300	I2, I3
T400	I1, I2, I4
T500	I1, I3
T600	I2, I3
T700	I1, I3
T800	I1, I2, I3, I5
T900	I1, I2, I3

Step 1:

The transactional dataset resides on the server.

Step 2:

The server builds a cardinality table of the itemsets and stores the total number of different items in the itemsets in a variable, say n.

Table 2
Cardinality	List of ITEM IDs
3	I1, I2, I5
2	I2, I4
2	I2, I3
3	I1, I2, I4
2	I1, I3
2	I2, I3
2	I1, I3
4	I1, I2, I3, I5
3	I1, I2, I3
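Step 2 can be illustrated with a minimal sketch, assuming the dataset of Table 1 is held in a Python dictionary on the server. The names `TRANSACTIONS` and `build_cardinality_table` are hypothetical and do not come from the paper.

```python
# Table 1 from the text, keyed by transaction ID (illustrative representation).
TRANSACTIONS = {
    "T100": ["I1", "I2", "I5"],
    "T200": ["I2", "I4"],
    "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"],
    "T500": ["I1", "I3"],
    "T600": ["I2", "I3"],
    "T700": ["I1", "I3"],
    "T800": ["I1", "I2", "I3", "I5"],
    "T900": ["I1", "I2", "I3"],
}


def build_cardinality_table(transactions):
    """Return (cardinality, itemset) rows in transaction order, plus n.

    n is the total number of different items over all itemsets.
    """
    rows = [(len(set(items)), sorted(set(items)))
            for items in transactions.values()]
    distinct_items = set().union(*(items for _, items in rows))
    return rows, len(distinct_items)


rows, n = build_cardinality_table(TRANSACTIONS)
for cardinality, items in rows:
    print(cardinality, ", ".join(items))
print("n =", n)
```

Counting the items of each itemset directly, rather than storing the cardinality by hand, keeps the table consistent with the dataset.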

Step 3:

The server sends n, the number of different items in the itemsets, to each node Ni. Each node Ni generates flags, say fp (where p = 1, 2, ..., nCi).

For example,

Total number of different items in the itemsets n = 5

Cardinality of itemset = 4

Node N4 generates fp (here, p = 1, 2, ..., 5C4 = 5)
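The flag count in this example can be checked with a short sketch. It assumes that the predetermined combinations are enumerated in lexicographic order and that flag p is paired with combination p; the paper does not state the ordering, and `predetermined_combinations` is a hypothetical helper.

```python
from itertools import combinations
from math import comb


def predetermined_combinations(items, i):
    """Map p = 1..nCi to the p-th i-item combination (lexicographic order assumed)."""
    return {p: frozenset(c)
            for p, c in enumerate(combinations(sorted(items), i), start=1)}


ITEMS = ["I1", "I2", "I3", "I4", "I5"]           # n = 5 different items
combos_n4 = predetermined_combinations(ITEMS, 4)  # combinations held by node N4
flags_n4 = {p: False for p in combos_n4}          # every flag f_p starts off

# Number of flags generated by N4 next to the binomial coefficient 5C4.
print(len(flags_n4), comb(len(ITEMS), 4))
```

In general node Ni would hold nCi flags, so the nodes in the middle of the range carry the largest tables.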

Step 4:

The server reads each cardinality from Table 2 and, depending on the cardinality i, sends the corresponding itemset to node Ni.
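A minimal sketch of this dispatch step follows, assuming the server simply groups itemsets by cardinality before sending them. The function name `dispatch_by_cardinality` and the in-memory "outbox" are illustrative assumptions; real message passing between server and nodes is not modelled.

```python
from collections import defaultdict

# Itemsets of Table 1 in transaction order.
ITEMSETS = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]


def dispatch_by_cardinality(itemsets):
    """Group itemsets so that every itemset of cardinality i is routed to node Ni."""
    outbox = defaultdict(list)
    for items in itemsets:
        itemset = frozenset(items)
        outbox[len(itemset)].append(itemset)
    return dict(outbox)


outbox = dispatch_by_cardinality(ITEMSETS)
for i in sorted(outbox):
    print(f"N{i} receives {len(outbox[i])} itemset(s)")
```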

Step 5:

Node Ni scans the itemset, sets the matching flag fp on, and codes this combination of items as combp (where p = 1, 2, ..., nCi). Each flag and combination of items is predetermined. If all predetermined flags are set on, node Ni raises a stop message to the server, followed by its computed result.

For example,

Node N4 scans an itemset such as {I1, I2, I3, I4}, sets f1 on and generates the code comb1, which represents {I1, I2, I3, I4}.
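The flag handling of Step 5 might look like the sketch below. The `Node` class is hypothetical, and the lexicographic pairing of flags with combinations is an assumption chosen so that f1 corresponds to {I1, I2, I3, I4} as in the example.

```python
from itertools import combinations


class Node:
    """Hypothetical node Ni with predetermined flags f_p and combinations comb_p."""

    def __init__(self, items, i):
        # comb_p -> p, enumerated in lexicographic order (an assumption).
        self.index = {frozenset(c): p
                      for p, c in enumerate(combinations(sorted(items), i), start=1)}
        self.flags = {p: False for p in self.index.values()}

    def scan(self, itemset):
        """Set the flag of the received itemset on.

        Returns True when all predetermined flags are on, which is the
        condition under which the node would raise its stop message.
        """
        p = self.index[frozenset(itemset)]
        self.flags[p] = True
        return all(self.flags.values())


n4 = Node(["I1", "I2", "I3", "I4", "I5"], 4)
stop = n4.scan({"I1", "I2", "I3", "I4"})
print(n4.flags, stop)
```

Because flags are only ever switched on, a repeated itemset costs one dictionary lookup and leaves the node state unchanged.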

Step 6:

When the end of the last itemset is reached, the server sends an appropriate message to each node Ni, except the nodes that have already submitted their results.

Step 7:

Each node Ni now computes the intersection of the combinations combp whose flags are set on, and sends the result to the server.
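As a rough illustration of Step 7, the sketch below intersects the combinations a node has flagged. It uses the three-item and four-item itemsets of Table 1 as the inputs of N3 and N4; `node_result` is a hypothetical helper, and returning `None` for a node that received nothing is an assumption.

```python
from functools import reduce


def node_result(seen_combinations):
    """Intersect the combinations whose flags are on; None if none were seen."""
    if not seen_combinations:
        return None
    return reduce(frozenset.intersection, seen_combinations)


# Combinations flagged at N3 (three-item itemsets of Table 1).
n3_seen = [
    frozenset({"I1", "I2", "I5"}),
    frozenset({"I1", "I2", "I4"}),
    frozenset({"I1", "I2", "I3"}),
]
# Combinations flagged at N4 (the single four-item itemset of Table 1).
n4_seen = [frozenset({"I1", "I2", "I3", "I5"})]

print(sorted(node_result(n3_seen)))
print(sorted(node_result(n4_seen)))
print(node_result([]))
```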

Step 8:

The server computes the intersection of the results submitted by each node, which gives the final result.

Case i)

If the final result is NULL, take the union of the results. This generates the most frequent itemset in the transactional dataset.

Case ii)

Omit any result set with cardinality one and compute the intersection of the remaining results sent by each node Ni. This generates the most frequent itemset in the transactional dataset.

Case iii)

Otherwise, the final result is the most frequent itemset.

As per the above steps,

N2 generates result {I1, I2, I3}

N3 generates result {I1, I2}

N4 generates result {I1, I2, I3, I5}

N5 generates no result, as there is no itemset with cardinality five

The final result equals {I1, I2}, which is the most frequent itemset in the transactional dataset.
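Step 8 can be sketched by feeding the node results listed above into a small function. The text does not say when Case ii applies, so the order used here is an assumption: the plain intersection is tried first (Case iii), then the intersection without single-item results (Case ii), and finally the union (Case i). `final_result` is a hypothetical name.

```python
from functools import reduce


def final_result(node_results):
    """Combine node results on the server (case order is an assumption)."""
    results = [frozenset(r) for r in node_results if r]  # skip nodes with no result
    if not results:
        return frozenset()
    common = reduce(frozenset.intersection, results)
    if common:
        return common                                    # Case iii
    larger = [r for r in results if len(r) > 1]
    if larger and len(larger) < len(results):
        common = reduce(frozenset.intersection, larger)  # Case ii
        if common:
            return common
    return reduce(frozenset.union, results)              # Case i


# Node results as reported in the text; N5 has no itemset of cardinality five.
reported = {
    "N2": {"I1", "I2", "I3"},
    "N3": {"I1", "I2"},
    "N4": {"I1", "I2", "I3", "I5"},
    "N5": None,
}
print(sorted(final_result(reported.values())))
```

With the reported node results the plain intersection is already non-empty, so only the Case iii branch is exercised in this example.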

Conclusion

Our novel approach to most frequent pattern mining in a distributed computing environment may provide an efficient and scalable distributed mining method that strengthens knowledge discovery. It may help to determine the most frequent itemset in a transactional dataset. The approach has so far been developed only theoretically; implementation is left as future work.

References

[1] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Third Edition, Elsevier, Morgan Kaufmann Publishers, July 6, 2011
[2] D. N. Goswami, Anshu Chaturvedi, C. S. Raghuvanshi, "Frequent Pattern Mining Using Record Filter Approach", International Journal of Computer Science, Vol. 7, Issue 4, No 7, July 2010, pp 38-43
[3] Jiawei Han, Hong Cheng, Dong Xin, Xifeng Yan, "Frequent pattern mining: current status and future directions", Springer Science+Business Media, LLC 2007, pp 55-86
[4] Anjan K Koundinya, Srinath N K, K A K Sharma, Kiran Kumar, Madhu M N and Kiran U Shanbag, "Map/Reduce Design And Implementation Of Apriori Algorithm For Handling Voluminous Data-Sets", ACIJ, Vol.3, No.6, November 2012, pp 29-39
[5] Lamine M. Aouad, Nhien-An Le-Khac and Tahar M. Kechadi, "Distributed Frequent Itemsets Mining in Heterogeneous Platforms", Journal of Engineering, Computing and Architecture, Vol. 1, Issue 2, 2007
[6] Bagrudeen Bazeer Ahamed and Shanmugasundaram Hariharan, "A Survey On Distributed Data Mining Process Via Grid", International Journal of Database Theory and Application, Vol. 4, No. 3, September 2011, pp 77-90
[7] D. N. Goswami, Anshu Chaturvedi, C. S. Raghuvanshi, "An Algorithm for Frequent Pattern Mining Based On Apriori", IJCSE, Vol. 02, No. 04, 2010, pp 942-947
[8] Sunil Joshi, R S Jadon and R C Jain, "A Frame Work for Frequent Pattern Mining Using Dynamic Function", IJCSI, Vol. 8, Issue 3, No. 1, May 2011, pp 141-147
[9] R. Sumithra, S. Paul, "Using distributed apriori association rule and classical apriori mining algorithms for grid based knowledge discovery", Computing Communication and Networking Technologies (ICCCNT), 2010 International Conference on, 29-31 July 2010, pp 1-5
