Apriori Based Algorithms And Their Comparisons

Rupali; Gaurav Gupta

doi:10.17577/IJERTV2IS70053

Volume 02, Issue 07 (July 2013)

Apriori Based Algorithms And Their Comparisons

DOI : 10.17577/IJERTV2IS70053

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 492
Total Downloads : 1414
Authors : Rupali, Gaurav Gupta
Paper ID : IJERTV2IS70053
Volume & Issue : Volume 02, Issue 07 (July 2013)
Published (First Online): 01-07-2013
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

Apriori Based Algorithms And Their Comparisons

Rupali

Student, Computer Engineering University College of Engineering Punjabi University

Patiala, India

Gaurav Gupta Department of Computer Engineering

University College of Engineering Punjabi University

Patiala, India

Abstract Mining frequent itemsets from the large transactional database is a very critical and important task. The first algorithm for mining all frequent itemsets and strong association rules was the AIS algorithm by [2]. Shortly after that, the algorithm was improved and renamed Apriori. Apriori algorithm is the most classical and important algorithm for mining frequent itemsets. This paper aims to presents a basic Concepts of some of the algorithms (Direct Hashing and Pruning (DHP), Partitioning, Sampling, Dynamic Itemset Counting (DIC), Improved Apriori algorithm) based upon the Apriori like algorithm for mining the frequent item sets along with their capabilities and comparisons.

Keywords Data mining, Apriori variations, Frequent Itemsets.

INTRODUCTION

Frequent item set mining is one of the most important and common topic of research for association rule mining in data mining research area. As an association rule mining is defined as the relation between various itemsets. Association rule mining takes part in pattern discovery techniques in

knowledge discovery and data mining (KDD). As performance of association rule mining is depends upon the frequent itemsets mining, thus is necessary to mine frequent item set efficiently.

The process of extracting association rule mining consists of two parts firstly, mine all frequent itemsets pattern each of these pattern should satisfy the minimum support threshold. Once these entire frequent patterns are mined, then only second phase of mining

i.e. association rules are produced from these frequent itemsets. These association rules must satisfy the minimum support and minimum confidence. This minimum support and confidence should be defined by the user. In this way the generation of association rule is largely depends upon the generation of the frequent items in the first phase. [1, 2, 3]
A large number of algorithms with different mining efficiencies were proposed by many researchers for generation of frequent itemsets. Any algorithm should find the same set of rules though their computational efficiencies and memory requirements may be different. The best known mining algorithm is Apriori algorithm.

The further organisation of this paper is as follows. In Section 2, we briefly define the problem statement for finding the frequent itemsets from transactional database. Section

3 defines the classic Apriori algorithm. Sections 4 define the existing techniques based upon the Apriori algorithm. Section 5 defines their comparisons. Section 6 concludes the paper.
PROBLEM STATEMENT

The problem of mining association rules over market basket analysis was introduced in [4] .i.e. finding associations between the items that are present in the transaction from the database. The database may be from any retail shop, medical or from any other applications [5]. As defined in [2] the problem is stated as follows:

Let I= i1, i2 . . . im be a set of literals, called items and m is considered the dimensionality of the problem. Let D be a set of transactions, where each transaction T is a set of items such that T I. A unique identifier TID is given to each transaction. A transaction T is said to contain X, a set of items in I. X T An association rule is an implication of the form X Y , where X

I , Y I, and X Y = . An itemset X is said to be large or frequent if its support s is greater or equal than a given minimum support threshold . An itemset X satisfies a constraint C if and only if C (X) is true. The rule X Y has a support s in the transaction set D if s% of the transactions in D contain X

Y. In other words, the support of the rule is the probability that X and Y hold together among all the possible presented cases. It is said that the rule X Y holds in the

transaction set D with confidence c if c% of transactions in D that contain X also contain Y . In other words, the confidence of the rule is the conditional probability that the consequent Y is true under the condition of the antecedent X. The problem of discovering all association rules from a set of transactions D consists of generating the rules that have a support and confidence greater than a given threshold. These rules are called Strong Rules. This association- mining task can be broken into two steps:

Finding the frequent k-itemset from the large database.
Generate the association rule from these frequent item sets.
APRIORI ALGORITHM

The first algorithm for mining all frequent itemsets and strong association rules was the AIS algorithm by [1]. Shortly after that, the algorithm was improved and renamed Apriori. Apriori algorithm is, the most classical and important algorithm for mining frequent itemsets. Apriori is used to find all frequent itemsets in a given database DB. The key idea of Apriori algorithm is to make multiple passes over the database. It employs an iterative approach known as a breadth-first search (level-wise search) through the search space, where k-itemsets are used to explore (k+1)-itemsets.

The working of Apriori algorithm is fairly depends upon the Apriori property which states that All nonempty subsets of a frequent itemsets must be frequent [6]. It also described the anti monotonic property which says if the system cannot pass the minimum support test, all its

supersets will fail to pass the test [6, 1]. Therefore if the one set is infrequent then all its supersets are also frequent and vice versa. This property is used to prune the infrequent candidate elements.

It is no doubt that Apriori algorithm successfully finds the frequent elements from the database. But as the dimensionality of the database increase with the number of items then:
- More search space is needed and I/O cost will increase.
- Number of database scan is increased thus candidate generation will increase results in increase in computational cost.
  
  Therefore many variations have been takes place in the Apriori algorithm to minimize the above limitations arises due to increase in size of database. These subsequently proposed algorithms adopt similar database scan level by level as in Apriori algorithm, while the methods of candidate generation and pruning, support counting and candidate representation may differ. The algorithms improve the Apriori algorithms by:
  - Reduce passes of transaction database scans
  - Shrink number of candidates
  - Facilitate support counting of candidates
APRIORI ALGORITHM

VARIATIONS
This algorithm [12] also used to reduce the number of database scan. It is based upon the downward disclosure property in which adds the candidate itemsets at different point of time during the scan. In this dynamic blocks are formed from the database marked by start points and unlike the previous techniques of Apriori it dynamically changes the sets of candidates during the database scan. Unlike the Apriori it cannot start the next level scan at the end of first level scan, it start the scan by starting label attached to each dynamic partition of candidate sets.

In this way it reduce the database scan for finding the frequent itemsets by just adding the new candidate at any point of time during the run time. But it generates the large number of candidates and computing their frequencies are the bottleneck of performance while the database scans only take a small part of runtime.
COMPARISON OF ALGORITHMS
CONCLUSION

In this paper we studied the basic algorithm

i.e. Apriori, to mine the frequent item sets.

Apriori algorithm is, the most classical and important algorithm for mining frequent itemsets. Apriori is used to find all frequent itemsets in a given database DB. Apriori algorithm is associated with certain limitations of large database scans. Thus variations of Apriori come into existence. The various variations we discussed are DHP, Partition, DIC and Sampling algorithms. We also studied their comparisons.

REFERENCES

R.Agrawal, R.Srikant, Fast algorithms for mining association rules, Proceedings of the 20th Very Large DataBases Conference (VLDB94), Santiago de Chile, Chile, 1994, pp. 487-499.
Jiawei Han, M.Kamber, Data Mining- Concepts and Techniques, Morgan Kanufmann Publishers, Sam Francisco, 2009.
R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. 1993 ACMSIGMOD Int. Conf. Management of Data, pages 207216, Washington, D.C., May 1993.
J.Han, J.Pei and Y.Yin., Mining frequent patterns without candidate Generation, in: Proceeding of ACM SIGMOD International Conference Management of Data, 2000, pp.1-12
M.-L. Antonie and O. R. ZaÂ¨ane. Text document categorization by term association. In IEEE International Conference on Data Mining, pages 1926, December 2002.
Aggrawal.R, Imielinski.t, Swami.A. Mining Association Rules between Sets of Items in Large Databases. In Proc. Intl

Conf. of the 1993 ACM SIGMOD Conference Washington DC, USA.
Park. J. S, M.S. Chen, P.S. Yu. An effective hash-based algorithm for mining association rules. In Proc. ACM-SIGMOD Intl Conf. Management of Data (SIGMOD), San Jose, CA, May 1995, pages 175186.
A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In Proc. Intl Conf. Very Large Data Bases (VLDB), Sept. 1995, pages 432443.
Pei.J, Han.J, Lu.H, Nishio.S. Tang. S. and Yang. D. H-mine: Hyper-structure mining of frequent patterns in large databases. In Proc. Intl Conf. Data Mining (ICDM), November 2001
Toivonen.H. Sampling large databases for association rules. In Proc. Intl Conf. Very Large Data Bases (VLDB), Sept. 1996, Bombay, India, pages 134145.
Data mining Concepts and Techniques by By Jiawei Han, Micheline Kamber, Morgan Kaufmann Publishers, 2006.
Brin.S, Motwani. R, Ullman. J.D, and S. Tsur. Dynamic itemset counting and implication rules for market basket analysis. In Proc. ACM-SIGMOD Intl Conf. Management of Data (SIGMOD), May 1997, pages 255264.
Data mining Concepts and Techniques by By Jiawei Han, Micheline Kamber, Morgan Kaufmann Publishers, 2006.
Dongme Sun, Shaohua Teng, Wei Zhang, Haibin Zhu. An Algorithm to Improve the Effectiveness of Apriori. In Proc. Intl Conf. on 6th IEEE Int. Conf. on Cognitive Informatics (ICCI'07), 2007.

Apriori Based Algorithms And Their Comparisons

Leave a Reply