Mining Association Rules Using Ontologies

DOI : 10.17577/IJERTV1IS7457


P. Sarala#1, S. Jayaprada*2

#Department Of Computer Science and Engineering

V.R.Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India

*Department Of Computer Science and Engineering V.R.Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India

Abstract

Association rule mining is considered one of the most important tasks in Knowledge Discovery in Databases. Among sets of items in transaction databases, it aims at discovering implicative tendencies that can be valuable information for the decision-maker. Existing methods, however, generate a very large number of rules. Several post-processing methods and other techniques have been developed to reduce the number of rules, but they are not effective. This paper develops a new framework, Mining Interest Rules Using Ontologies, for extracting association rules based on user interest, and also implements a real-time web semantic engine using an extended robust framework.

Keywords- Association Rules, Association Rule Mining, Ontology, correlation measures, user constraints, Web Ontology Language.

Introduction

Data mining is the nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns from data. One important topic in data mining is the discovery of interesting association rules. Association rule mining allows unsupervised discovery of implicative and interesting tendencies in databases. An association rule a → b implies the presence of the itemset b when an itemset a occurs in a database transaction. Apriori [3] is the first algorithm proposed to extract all association rules satisfying minimum thresholds of support and confidence. A low support threshold lets us extract more valuable information, but it usually also produces a very large number of rules.
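The two thresholds mentioned above can be illustrated with a minimal sketch. The transactions, item names and function names below are hypothetical and chosen only for illustration; they are not taken from the supermarket dataset used in this paper.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(a: FrozenSet[str], b: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Confidence of the rule a -> b: support(a and b together) / support(a)."""
    return support(a | b, transactions) / support(a, transactions)


# Hypothetical toy database of four transactions.
transactions = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"bread"}),
    frozenset({"milk", "butter"}),
]

a = frozenset({"milk"})
b = frozenset({"bread"})
rule_support = support(a | b, transactions)        # 2 of 4 transactions
rule_confidence = confidence(a, b, transactions)   # 2 of the 3 milk transactions
print(f"milk -> bread: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule is reported only when both values reach the user-chosen minimum thresholds, which is why lowering the support threshold increases the number of rules.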

To reduce the number of rules, several post-processing methods have been developed, based on nonredundant rules or on techniques such as pruning, summarizing, grouping or visualization that use statistical information in the database. Methods such as the rule deductive method [4], Stream Mill Miner, Data Stream Management Systems [2], the partitioning around medoids clustering technique [5], and constraint-based multi-level association rules with ontology support [1] have been proposed, but they are not effective.

This paper implements a new framework that reduces the number of rules based on user interest. User constraints can be categorized as follows. Succinct constraints can be pushed into the initial data selection process at the start of mining. Monotonic constraints need to be checked only once: after a pattern satisfies them, no further checking is required as the pattern grows. Anti-monotonic constraints can be pushed deep into the mining process to restrain pattern growth. In addition, instead of statistical measures we can use correlation measures such as lift, cosine and overall confidence. This work is extended to implement a real-time web semantic engine using the Association Rule Interactive post-Processing using Schemas and Ontologies framework.
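The correlation measures named above can be sketched from the supports of a rule a → b. The formulas for lift and cosine are the standard ones; reading "overall confidence" as the all-confidence measure is an assumption made here, and the function names and numeric values are hypothetical.

```python
import math


def lift(s_ab: float, s_a: float, s_b: float) -> float:
    """Lift: 1 means a and b are independent, >1 positive, <1 negative correlation."""
    return s_ab / (s_a * s_b)


def cosine(s_ab: float, s_a: float, s_b: float) -> float:
    """Cosine: like lift, but normalized by the geometric mean of the supports."""
    return s_ab / math.sqrt(s_a * s_b)


def all_confidence(s_ab: float, s_a: float, s_b: float) -> float:
    """All-confidence (assumed reading of 'overall confidence'):
    the smaller of the confidences of a -> b and b -> a."""
    return s_ab / max(s_a, s_b)


# Hypothetical supports: support(a) = support(b) = 0.75, support(a and b) = 0.5.
s_a, s_b, s_ab = 0.75, 0.75, 0.5
print(f"lift={lift(s_ab, s_a, s_b):.3f}")
print(f"cosine={cosine(s_ab, s_a, s_b):.3f}")
print(f"all_confidence={all_confidence(s_ab, s_a, s_b):.3f}")
```

In this example the confidence of a → b is about 0.67, which looks strong, while the lift is below 1. That kind of disagreement is what motivates using correlation measures in addition to support and confidence.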

Methodology

Our proposed system, Mining Interest Rules Using Ontologies, consists of the following steps:

Define User Data and Search Ontology: The user specifies the item of interest, and the system searches for that item in the ontology.

Here the user searches for the item milk-cream, in which he/she is interested.

Figure1: Mining Interest Rules Using Ontologies (boxes shown: Define User Data and Search; Ontology Construction; Generate Candidate Itemsets; Visualize the Results)

Input Ontology: An ontology is constructed for a supermarket dataset. To describe the ontology, the framework uses the Web Semantic representation language OWL.

The ontological representation of the supermarket dataset is given as input.

Figure2: Ontological representation of supermarket dataset

Figure3: List of items in dataset

Generate Candidate Itemsets: Candidate itemsets are generated from the dataset.

Visualize the Results: The results are visualized in the form of rules.

The rules are generated according to the user-selected item, milk-cream.

Figure4: Association Rules for Milk-Cream item

Because the results are produced only for the item the user selected, this approach generates fewer association rules than the existing approach.
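The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the ontology is a plain dictionary rather than an OWL file, the transactions are invented, and all names (ONTOLOGY, find_in_ontology, mine_rules_for_interest) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy ontology: concept -> items belonging to it.
ONTOLOGY = {
    "DairyProducts": {"milk-cream", "cheese"},
    "Bakery": {"bread", "biscuits"},
}

# Hypothetical transactions.
TRANSACTIONS = [
    {"milk-cream", "bread"},
    {"milk-cream", "bread", "biscuits"},
    {"cheese", "bread"},
    {"milk-cream", "biscuits"},
    {"bread", "biscuits"},
]


def find_in_ontology(item):
    """Step 1: search the ontology for the user-specified item."""
    for concept, items in ONTOLOGY.items():
        if item in items:
            return concept
    return None


def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def interesting_itemsets(transactions, interest, min_support):
    """Step 3: level-wise candidate generation, restricted to itemsets
    that contain the interest item. Infrequent candidates are not extended."""
    items = sorted(set().union(*transactions))
    level = [frozenset({interest})]
    level = [s for s in level if support(s, transactions) >= min_support]
    found = []
    while level:
        found.extend(level)
        candidates = {s | {i} for s in level for i in items if i not in s}
        level = [c for c in candidates if support(c, transactions) >= min_support]
    return found


def mine_rules_for_interest(transactions, interest, min_support, min_confidence):
    """Step 4: turn the frequent itemsets into (antecedent, consequent, confidence)."""
    rules = []
    for itemset in interesting_itemsets(transactions, interest, min_support):
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                conf = support(itemset, transactions) / support(a, transactions)
                if conf >= min_confidence:
                    rules.append((a, itemset - a, conf))
    return rules


concept = find_in_ontology("milk-cream")
rules = mine_rules_for_interest(TRANSACTIONS, "milk-cream", 0.4, 0.6)
for a, c, conf in rules:
    print(f"{sorted(a)} -> {sorted(c)} (confidence {conf:.2f})")
```

Because every candidate must contain the interest item, itemsets such as {bread, biscuits} are never generated, which is the source of the reduction in the number of rules.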

The above approach handles a single dataset, so a real-time web semantic engine is implemented using Association Rule Interactive post-Processing using Schemas and Ontologies.

The framework implementation consists of the following steps:

Step1: Integrate user knowledge in association rule mining using two different types of formalism: ontologies and rule schemas.

Step2: Use ontologies to improve the integration of user knowledge in the post-processing task.

Step3: Use the Rule Schema formalism, which extends the specification language proposed by Liu et al. for user expectations.

Step4: Design an interactive framework to assist the user throughout the analysis task.

Interactive post-mining process description:

Figure5: Interactive post mining Process (components shown: Ontology Construction; Defining Rule Schemas; Applying Operators; Filters; Visualizing Results; Selection/Validation; Interactive Loop)

Step1: The user develops an ontology over the database items. To describe the ontology, the framework uses the Web Semantic representation language OWL-DL.

Step2: The user expresses his/her local goals and expectations concerning the association rules he/she wants to find, in terms of rule schemas.

Step3: Operators are applied over the rule schemas created. The operators are pruning and filtering. The filters are the minimum improvement constraint filter and the item relatedness filter.

Step4: The user visualizes the filtered association rules.

Step5: The user validates the results obtained.

Step6: Filters can be applied over the rules whenever the user wants to reduce the number of rules.

Step7: The interactive loop allows the user to revise the information he/she proposed. The user can return to Step2 to modify the rule schemas, or to Step3 to change the operators.
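Steps 2 and 3 above can be sketched as follows. This is one possible reading, not the framework's actual code: a rule schema is modelled as a pair of ontology concept sets, a rule matches a schema when every concept is represented by at least one item on the corresponding side, and the minimum improvement filter compares a rule only against simpler rules present in the same list. The ontology, the rules and all names are hypothetical.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float


class RuleSchema(NamedTuple):
    antecedent_concepts: FrozenSet[str]
    consequent_concepts: FrozenSet[str]


# Hypothetical ontology: concept -> database items.
ONTOLOGY: Dict[str, Set[str]] = {
    "Dairy": {"milk", "cream", "cheese"},
    "Bakery": {"bread", "biscuits"},
}


def covers(concepts: FrozenSet[str], items: FrozenSet[str]) -> bool:
    """True if every concept has at least one of its items in `items`."""
    return all(ONTOLOGY[c] & items for c in concepts)


def matches(rule: Rule, schema: RuleSchema) -> bool:
    return (covers(schema.antecedent_concepts, rule.antecedent)
            and covers(schema.consequent_concepts, rule.consequent))


def pruning(rules: List[Rule], schema: RuleSchema) -> List[Rule]:
    """Pruning operator: drop the rules the user already expects."""
    return [r for r in rules if not matches(r, schema)]


def filtering(rules: List[Rule], schema: RuleSchema) -> List[Rule]:
    """Filtering operator: keep only the rules that match the schema."""
    return [r for r in rules if matches(r, schema)]


def minimum_improvement(rules: List[Rule], min_imp: float) -> List[Rule]:
    """Keep a rule only if it beats every simpler rule (same consequent,
    strictly smaller antecedent) by at least `min_imp` in confidence."""
    kept = []
    for r in rules:
        simpler = [s.confidence for s in rules
                   if s.consequent == r.consequent and s.antecedent < r.antecedent]
        if all(r.confidence - c >= min_imp for c in simpler):
            kept.append(r)
    return kept


rules = [
    Rule(frozenset({"milk"}), frozenset({"bread"}), 0.70),
    Rule(frozenset({"milk", "cheese"}), frozenset({"bread"}), 0.72),
    Rule(frozenset({"bread"}), frozenset({"biscuits"}), 0.60),
]
schema = RuleSchema(frozenset({"Dairy"}), frozenset({"Bakery"}))

print(len(filtering(rules, schema)), "rules match the schema")
print(len(pruning(rules, schema)), "rules remain after pruning")
print(len(minimum_improvement(rules, 0.05)), "rules pass minimum improvement")
```

In the interactive loop the user would revise `schema` or choose a different operator and apply it again to the current rule set.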

Figure 6: Ontology Searching

When the user searches the ontology for a word, the system collects all URLs related to that word and constructs an ontology for them. The filters specified in the framework are then applied, producing the filtered rules (filtered links) shown above.

Two representations are used to visualize the results: Tree and Visualization. Rule generation also takes user interest into account: the user selects an item of interest, and rules are generated according to the selected item.

Figure 7: Representation of Results using Tree

Figure8: Representation of Results using Visualization

Figure9: Rules for user selected item Encyclopedia

Results

The results of the implementation are as follows:

The following graph shows the number of association rules generated for each item based on user interest.

Graph: Rank per item (y-axis: Rank, 0 to 160; item labels as they appear on the axis)

Item    Froz  Bak  Brea  Bisc  Milk  Juic  Fruit  Part  Che
Rank    55    42   14    58    70    58    81     18    23

The following graph compares Apriori with Improved Apriori.

Graph: Number of Association Rules (y-axis, 0 to 1200) for Apriori and Improved Apriori

-The Apriori algorithm, implemented with the support and confidence measures, generates 1048 association rules.

-Improved Apriori, implemented with the lift, confidence, leverage and conviction measures, generates 148 association rules.

-Compared with Apriori, Improved Apriori reduces the number of association rules from 1048 to 148.

Real-time analysis: the proposed approach gives better rank results. The following graph compares Google search with our approach (Association Rule Interactive Post mining using Ontologies and Rule schemas).

Graph: ProposedRank and GoogleRank (y-axis, 0 to 30) for three links

Link                  ProposedRank  GoogleRank
http://www.java.co    1             1
http://www.oracle.c   3             4
http://www.lonelyp    14            24

When searching for oracle, the related link http://www.oracle.com appears in 4th position in Google search and in 3rd position with our approach.

When searching for lonely planet, the related link http://www.lonelyplanet.com appears in 24th position in Google search and in 14th position with our approach.
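The figures reported in this section can be put in relative terms with a few lines of arithmetic. The numbers below are the ones stated in the text; the variable names are chosen here for illustration.

```python
# Rule counts reported for the two algorithms.
apriori_rules = 1048
improved_apriori_rules = 148

reduction = (apriori_rules - improved_apriori_rules) / apriori_rules
print(f"Rules removed: {apriori_rules - improved_apriori_rules} "
      f"({reduction:.1%} of the Apriori rule set)")

# Reported positions of the related link: (our approach, Google search).
ranks = {
    "http://www.oracle.com": (3, 4),
    "http://www.lonelyplanet.com": (14, 24),
}
positions_gained = {url: google - ours for url, (ours, google) in ranks.items()}
for url, gained in positions_gained.items():
    print(f"{url}: {gained} position(s) higher than in Google search")
```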

Conclusion

The post-processing of association rules is improved by a framework that generates association rules of user interest. A real-time web semantic engine is also implemented using an extended robust framework called Association Rule Interactive Post mining Using Schemas and Ontologies.

References

[1]. A. Bellandi, B. Furletti, V. Grossi, and A. Romei, Ontology-Driven Association Rule Extraction: A Case Study, Proc. Workshop Context and Ontologies: Representation and Reasoning, pp. 1-10, 2007.

[2]. Hetal Thakkar, Barzan Mozafari, Carlo Zaniolo. Continuous Post-Mining of Association Rules in a Data Stream Management System. Chapter VII in Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction, Yanchang Zhao; Chengqi Zhang; and Longbing Cao (eds.), ISBN: 978-1-60566-404-0.

[3]. Srikant, R., Agrawal, R.: Mining generalized association rules. In: VLDB 1995: Proceedings of the 21th International Conference on Very Large Data Bases, pp. 407-419. Morgan Kaufmann Publishers Inc., San Francisco (1995)

[4]. Wenxiang Dou, Jinglu Hu, Gengfeng Wu, Interesting Rules Mining with Deductive Method, ICROS-SICE International Joint Conference 2009.

[5]. Xuan-Hiep Huynh, Fabrice Guillet and Henri Briand, Extracting representative measures for the post-processing of association rules, 2006.
