 Open Access
 Total Downloads : 523
 Authors : R. Madhuri, P. Pushpa Latha, K. Prasad Rao
 Paper ID : IJERTV2IS2378
 Volume & Issue : Volume 02, Issue 02 (February 2013)
 Published (First Online): 28022013
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Weighted Association Rule Mining without Predetermined Weights
R. Madhuri1 1Asst.professor, CSE Department, GMR
Institute of Technology, A.P, India,
P. Pushpa Latha2 2Asst.professor, CSE Department, GMR
Institute of Technology, A.P, India,
K. Prasad Rao3
3Asst.professor, CSE Dept, Aditya Institute of Technology, A.P, India,
Abstract
Compared to traditional frequent pattern mining, group pattern mining incurs much more computation and storage. To discover group patterns from a set of mobile users, we have proposed an efficient fast mining algorithm that is based on frequent pattern mining. In this paper, we introduced wsupport, a new measure of item sets in databases with only binary attributes which does not require any pre determined weights. These weights are completely derived from the internal structure of the database based on the assumption that good number in transactions consists of good items. A fast miming algorithm is given and experimental results show that wsupport can be worked out without much overhead, and interesting patterns may be discovered through this new measurement.
Index Terms: Apriori, Data mining, HITS, Link based association rule, Weighted association rule.

Introduction:
Association rule mining aims to explore large transaction databases for association rules, which may reveal the implicit relationships among the data attributes. It has turned into a thriving research topic in data mining and has numerous practical applications, including cross marketing, classification, text mining, Web log analysis, and recommendation systems.
The classical model of association rule mining employs the support measure, which treats every transaction equally. In contrast, different transactions have different weights in Reallife data sets. For example, in the market basket data, each transaction is recorded with some profit. Much effort has been dedicated to association rule mining with pre assigned weights. However, most data types do not come with such preassigned weights, such as Web
page click–stream data. There should be some notion of importance in those data. For instance, transactions with a large amount of items should be considered more important than transactions with only one item. Current methods, though, are not able to estimate this type of importance and adjust the mining results by emphasizing the important transactions.
In this paper, we introduce wsupport, a new measure of item sets in databases with only binary attributes. The basic idea behind wsupport is that a frequent item set may not be as important as it appears, because the weights of transactions are different. These weights are completely derived from the internal structure of the database based on the assumption that good transactions consist of good items.
This assumption is exploited by extending Kleinbergs HITS model and algorithm to bipartite graphs. Therefore, wsupport is distinct from weighted support in weighted association rule mining, where item weights are assigned. Furthermore, a new measurement framework of association rules based on wsupport is proposed. We used HITS and Apriori algorithm on mobile and laptop database. Experimental results show that w support can be worked out without much overhead, and interesting patterns may be discovered through this new measurement. This work would be better useful for marketing people.

Data Mining Techniques:

Association Rule Mining:
Association Rule mining is one of the fundamental data mining method. Agarwal first introduced the problem of association rule mining in 1993. Since then it is one of the most popular research area on the field of knowledge discovery. The association rule
mining problem is commonly known as the market basket analysis, but there are several applications that use association rules as well i.e. biological research areas, telecommunication and network analysis etc.
Regarding the diversity of the applications that use association rule mining, several algorithms have been developed. All of these algorithms have their own advantages and disadvantages, so it is useful to compare them.
Goal of Association rule mining helps in finding interesting association relationships among large set of data items. The discovery of such associations can help develop strategies to predict.
An Association Rule is a rule of the format
LHS (left hand side) => RHS (ride hand side).
Let X and Y are two item sets where X, Y I (both side contains sets of items) and XY=Ã˜ (dont share common items).
Briefly, an Association Rule is an expression; X =>Y, where X and Y are set of items.
Each rule is assigned with two interesting measures:

Support

Confidence
Let A= { l1, l2, l3 ln} be the set of items. Let T be the set of task relevant data containing database D with transactions Ti, where Ti is a subset of the items in A.

Support:
A transaction T is said to support an item I1. If I1 is present in T it is said to support a subset of items X A, if it supports l1 in X. An item X A has support s in T denoted by s(X)T , if s% of transactions in T support X.
Support X =>Y = (XUY) = s (XUY) / D. (Where probability denotes as ).

Frequent Item set:
Let T be the transaction database and s be the user specified minimum support. An item set X A is said
to be frequent with respect to s, if s(X) T

Confidence:
The confidence of an association rule is defined as the measure of the probability of an item set dependency on the other item sets in the association rule.
Confidence X UY = (X  Y) = s (XUY) / s(X).

Association Rule:
For a given transaction database T, an association rule is an expression of the form X=>Y, where X and Y are subsets of A and X=>Y holds with confidence
,if %of transactions in T that support X also support Y. The rule X=>Y has support in the
transaction set T if % of the transactions in T support XUY.
Generally association rule mining is performed in two steps:

Find all frequent item sets:
The basic foundation of Association Rule algorithm is fact that any subset of a frequent item set must also be a frequent item set. i.e., if {AB} is a frequent item set, both {A} and {B} should be a frequent item set. Iteratively find frequent item sets with cardinality from 1 to k (kitem set)

Use frequent item sets to generate strong rules having minimum confidence:
The first step, find all frequent item sets, is expensive in terms of computation, memory usage and I/O resources. Much of the research effort in these algorithms has been related to improving the efficiency of this first step.
The second step, use frequent item sets to generate strong rules having minimum confidence, is relatively certain, but it can still be very expensive when solving realworld problems.
The aim of the algorithm is to discover all association rules with
Support minimum Support Confidence minimum Confidence
These rules are very simple, understandable and
useful for association rule mining. However, the determination of these rules is a major challenge due to the very large data sets and the large number of potential rule.


Weighted Association Rule Mining:
The methodology of weighted association rule mining is to assign weights to items, invent new measures (weighted support) basd on these weights, and develops the corresponding mining algorithms. A directed graph is created where nodes denote items and links represent association rules. A generalized version of HITS algorithm which is shown below applied to the graph to rank the items, where all nodes and links are allowed to have weights. However, the model has a limitation that it only ranks items but does not provide a measure like weighted support to evaluate an arbitrary item set. Anyway, it may be the first successful attempt to apply linkbased models to association rule mining.

HyperlinkInduced Topic Search (HITS) Algorithm:
Kleinberg's HITS algorithm, Hypertext Induced Topic Selection", is a standard algorithm of Link Analysis that rates web pages, developed by Jon Kleinberg. The premise of the HITS algorithm is that a web page serves two purposes: to provide information and to provide links relevant to a topic. This gives two ways to categorize a web page. A web page is an authority on a topic if it provides good information, and it is a hub if it provides links to good authorities. The HITS algorithm is an iterative algorithm developed to quantify each pages value as a hub and an authority.
The idea behind Hubs and Authorities stemmed from a particular insight into the creation of web pages when the Internet was originally forming; that is, certain web pages, known as hubs, served as large directories that were not actually authoritative in the information that it held, but were used as compilations of a broad catalog of information that led users directly to other authoritative pages. In other words, a good hub represented a page that pointed to many other pages, and a good authority represented a page that was linked by many different hubs.
The Hub score and Authority score for a node is calculated with the following algorithm:
Start with each node having a hub score and authority score of 1.
Run the Authority Update Rule Run the Hub Update Rule
Normalize the values by dividing each Hub score by the sum of the squares of all Hub scores, and dividing each Authority score by the sum of the squares of all Authority scores.
Repeat from the second step as necessary.
Item set evaluation by support in classical association rule mining is based on counting. We introduced a linkbased measure called wsupport and formulate association rule mining in terms of this new concept.
The scheme therefore assigns two scores for each page: its authority, which estimates the value of the content of the page, and its hub value, which estimates the value of its links to other pages. We have taken the models of mobiles and laptops for finding the hub values which are very important to find the number of hits for each and every model.
Here we have taken the sample data of mobiles and found the hub weights of each model specified.
Following hub weights are obtained for each mobile transactions
Table 1. Hub weights of Mobile Transactions
TID
Transaction
Hub weight
1
X501, lumia 900, Xperia mini pro, Txt Pro
0.518
2
Xperia mini pro, 7210, X202
0.436
3
X501, lumia 900
0.233
4
X501
0.148
5
Xperia mini pro, 7210, X202, p350
0.544
6
X501, X202, p350
0.412
In HITS algorithm, as the iteration converges, the authority weight auth(i)=T:iThub(T) represents the significance of an item i. Accordingly, we generalize the formula of auth(i) to depict the significance of an arbitrary item set, as the following definition shows:
The wsupport of an item set X is defined as
T: XTTD hub(T) Wsupp (x) = ——————————
T: TD hub(T)
Where hub(T) is the hub weight of transaction T. An item set is said to be significant if its wsupport is larger than a userspecified value.
2.4 Mining the Most Interesting Rules:
Several algorithms have been proposed for finding the best, optimal, or most interesting rule(s) in a database according to a variety of metrics including confidence, support, gain, chisquared value, gini, entropy gain, laplace, lift, and conviction. In this paper, we show that the best rule according to any of these metrics must reside along a support/confidence border. Further, in the case of conjunctive rule
mining within categorical data, the number of rules along this border is conveniently small, and can be mined efficiently from a variety of realworld data sets. We also show how this concept can be generalized to mine all rules that are best according to any of these criteria with respect to an arbitrary subset of the population of interest. We argue that by returning a broader set of rules than previous algorithms, our techniques allow for improved insight into the data and support more userinteraction in the optimized rulemining process.


Problem Description:
In classical models users can rate the products in the website based upon binary attributes such as like or dislike only. Here the top most products are getting by using preassigned weights, so apart from top most products it can display the top most products according to the support system. Already existing model is having some limitations such as
There is no calculation of quality transaction.
There is no estimation of awareness of user.
Chance of getting quality product is very less.
It gives equal priority to all transaction items
In this paper we eliminate equal priority of each transaction item and the priority of transaction items are calculated based on the number of clicks made by the user. Once a registered user rate a product, analysis is made on the database. By considering this rating, we update product rating every time in database. We calculate the Wsupport values of products to display the top most products. Finally an analysis is made over top products rated by the users and frequent products are mined using an Apriori algorithm to get the perfect results. Hence the results that were observed are
It provides quality of transactions for the particular user.
It provides wsupport value based on the transaction items.
By using this system easily we can maintain different user details.
We introduce wsupport, a new measure of item sets in databases. The basic idea behind wsupport is that a frequent item set may not be as important as it appears, because the weights of transactions are different. These weights are completely derived from the internal structure of the database based on the assumption that good transactions
consist of good items. The concept of association rule was proposed the support measurement framework and reduced association rule mining to the discovery of frequent item sets. A fast mining algorithm, Apriori, was proposed. Much effort has been dedicated to the classical (binary) association rule mining problem. These algorithms strictly follow the classical measurement framework and produce the same results once the minimum support is given.

Algorithms:

Fast Mining Algorithm:
The problem of mining association rules that satisfy some minimum wsupport and wconfidence can be decomposed into two sub problems:

Find all significant item sets with wsupport above the given threshold.

Derive rules from the item sets found in Step 1.
The first step is more important and expensive. The key to achieving this step is that if an item set satisfies some minimum wsupport, then all its subsets satisfy the minimum wsupport as well. It is called the downward closure property.
 h4>Initialize auth (i) to 1 for each item i

for (l=0; l<num_it; l++) do begin

auth (i) =0 for each item i

for all transactions tD do begin

hub (t) = i: it auth (i)

auth (i) +=hub (t) for each item it

End

auth (i) =auth (i) for each item I,
normalize auth

End

L1= {{i}: wsupp (i)>=minsupp}

for (k=2; Lk1; k++) do begin

Ck=apriorigen (Lk1)

for all transactions td do begin

Ct=subset (Ck, t)

for all candidates c Ct do

C.wsupp+=hub (t)

H +=hub (t)

End

Lk= {cckc.wsupp/H>=minwsupp}

End


Apriori algorithm:
For finding frequent patterns, associations, correlations, or causal structures among sets of items in transaction databases, relational databases, and other information repositories we use an influential algorithm known as Apriori Algorithm. Apriori algorithm is used in Basket data analysis, cross marketing, catalog design, Lossleader analysis, clustering, classification etc.
In order to express the quality of an association rule one uses measures such as
Support: This metric determines how often a rule is satisfied in the transaction database. It is obtained by dividing the support count for X Y by the total number of transactions.
Confidence: This metric determines how often items in Y appear in transactions that contain X.
Algorithm
Join Step: Ck is generated by joining Lk1with itself Prune Step: Any (k1)itemset that is not frequent cannot be a subset of a frequent kitemset


Results:
Figure 5.1 Home Page
Pseudocode
Ck: Candidate itemset of size k. Lk : frequent itemset of size k. L1 = {frequent items};
for (k = 1; Lk !=Ã˜; k++) do begin Ck+1 = candidates generated from Lk; for each transaction t in database
do
increment the count of all candidates in
Ck+1 that are contained in t
Lk+1 = candidates in Ck+1 with min support end
return Uk Lk;
Generate Candidates
Suppose the items in Lk1 are listed in an order Step 1: selfjoining Lk1
insert into Ck
select p.item1,p.item2, ,p.itemk1, q.itemk1 from Lk1 p, Lk1 q
where p.item1=q.item1, , p.itemk
2=q.itemk2, p.itemk1 < q.itemk1 Step 2: pruning
For all itemsets c in Ck do For all (k1)subsets s of c do
if (s is not in Lk1) then delete c from Ck
Figure 5.2 Product details Page
Figure 5.3 Mobile details page
Figure 5.4 Mobile ranking page
Figure 5.5 WAR of mobiles using WSupport
Figure 5.6 Frequent item sets using measure

Conclusion:
In this paper we have developed a novel framework in association rule mining. First, the HITS model and algorithm are used to derive the weights of transactions from a database with only binary attributes. Based on these weights, a new measure w support is defined to give the significance of item sets. It differs from the traditional support in taking the quality of transactions into consideration. Then, the wsupport of association rules are defined in analogy to the definition of support. An Apriority like algorithm is proposed to extract association rules whose wsupport and wconfidence are above some given thresholds.
Experimental results show that the computational cost of the linkbased model is reasonable. At the expense of three or four additional database scans, we can acquire results different from those obtained by traditional countingbased models. Particularly for sparse data sets, some significant item sets that are not so frequent can be found in the link based model. Through comparison, we found that our model and method address emphasis on highquality transactions. The linkbased model is useful in adjusting the mining results given by the traditional techniques. Some interesting patterns may be discovered when the hub weights of transactions are taken into account. Moreover, the transaction ranking approach is precious for estimating customer potential when only binary attributes are available, such as in Web log analysis or recommendation systems.
References:

R.Agrawal, T.Imielinski and A. Swami, Mining Association Rules between Sets of Items in Large Datasets, Proc. ACM SIGMOD 93, pp. 207216, 1993.

R.Agrawal and R.Srikant, Fast Algorithms for Mining Association Rules, Proc. 20th Intl Conf. Very Large Data Bases (VLDB 94), pp. 487499, 1994.

J.M.Kleinberg, Authoritative Sources in a Hyperlinked Environment, J. ACM, vol. 46, no. 5, pp. 604632, 1999.

O. Kurland and L. Lee, Respect My Authority! HITS without Hyperlinks, Utilizing ClusterBased Language Models, Proc. ACM SIGIR, 2006.

K.Wang and M.Y.Su, Item Selection by Hub Authority Profit Ranking, Proc. ACM SIGKDD, 2002.

G.D.Ramkumar, S.Ranka, and S.Tsur, Weighted Association Rules: Model and Algorithm, Proc. ACM SIGKDD, 1998.

W. Wang, J. Yang, and P.S. Yu, Efficient Mining of Weighted Association Rules (WAR), Proc. ACM SIGKDD 00, pp. 270274, 2000.
BIOGRAPHIES
Madhuri Ravi, is M.Tech in Computer Science from JNTU, A.P., India. She has vast experience in Computer Science and Engineering areas. She is presently working as Assistant Professor in Department of Computer Science and Engineering, GMRIT, Rajam, A.P, India. Her area of research includes Data Mining, Computer Networks and Computer Architecture. 

Pushpa Latha Palli, is M.Tech in Computer Science from Andhra University, A.P., India. She has vast experience in Computer Science and Engineering areas. She is presently working as Assistant Professor in Department of Computer Science and Engineering, GMRIT, Rajam, A.P, India. Her area of research includes Data Mining, Web Technologies and Emerging Technologies. 

Prasad Rao Karu, is M.Tech in Computer Science from JNTU, A.P., India. He has vast experience in Computer Science and Engineering areas. He is presently working as Assistant Professor in Department of Computer Science and Engineering, AITAM, Tekkali, A.P, India. His area of research includes Data Mining, Databases, Web Technologies. 