Author(s): Harshita Taran, Shilpa Ghode
Published in: International Journal of Engineering Research & Technology
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Volume/Issue: Volume 6, Issue 07, July 2017
Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Conventional data mining techniques have focused largely on finding the items that appear frequently in transaction databases, a task called frequent itemset mining. These techniques are based on the support-confidence model, which assumes that itemsets appearing more frequently in the database are more meaningful to the user from a business point of view. High utility itemset mining instead discovers itemsets by considering not only their frequency but also the utility associated with them. Every item carries a value reflecting the user's interest, such as quantity or profit, and the utility of an itemset is derived from the values of its items. Itemsets whose utility is no less than a given threshold are called high utility itemsets.

Prior work on this problem employs a two-phase, candidate-generation approach, with one exception that is, however, inefficient and does not scale to large databases. The two-phase approach suffers from scalability issues because of the huge number of candidates it generates. This paper surveys a novel algorithm that finds high utility patterns in a single phase without generating candidates. Its novelties lie in a high utility pattern growth approach, a lookahead strategy, and a linear data structure. Concretely, the pattern growth approach searches a reverse set enumeration tree and prunes the search space by utility upper bounding. The lookahead strategy identifies high utility patterns without enumeration, using a closure property and a singleton property.
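To make the definitions and the idea of single-phase mining with upper-bound pruning concrete, here is a minimal Python sketch. It assumes the common formulation in which the utility of an item in a transaction is its quantity times its unit profit, and the utility of an itemset is summed over the transactions that contain all of its items. The function names, the alphabetical item order, and the toy data are hypothetical. The search is a generic depth-first set-enumeration tree with a "prefix utility plus remaining utility" bound; it is not the exact reverse enumeration, bound, or lookahead strategy of the surveyed algorithm.

```python
from typing import Dict, List, Tuple

# A transaction maps each purchased item to its quantity (internal utility).
Transaction = Dict[str, int]


def itemset_utility(itemset: Tuple[str, ...], db: List[Transaction],
                    profit: Dict[str, int]) -> int:
    """u(X): sum of u(X, T) over every transaction T that contains all of X."""
    total = 0
    for tx in db:
        if all(item in tx for item in itemset):
            total += sum(profit[item] * tx[item] for item in itemset)
    return total


def mine_high_utility(db: List[Transaction], profit: Dict[str, int],
                      min_util: int) -> Dict[Tuple[str, ...], int]:
    """Single-phase depth-first search over a set-enumeration tree.

    A node X is extended only with items ranked after its last item. For each
    transaction T containing X, u(X, T) plus the utility of T's items ranked
    after X's last item bounds u(Y, T) for every descendant Y of X, so when
    the summed bound is below min_util the whole subtree is pruned.
    """
    order = sorted(profit)  # hypothetical choice: alphabetical item order
    rank = {item: r for r, item in enumerate(order)}
    results: Dict[Tuple[str, ...], int] = {}

    def search(prefix: Tuple[str, ...], last_rank: int, tids: List[int]) -> None:
        for item in order[last_rank + 1:]:
            # Transactions that still contain the extended itemset.
            new_tids = [t for t in tids if item in db[t]]
            if not new_tids:
                continue
            itemset = prefix + (item,)
            r = rank[item]
            util = upper = 0
            for t in new_tids:
                tx = db[t]
                u_xt = sum(profit[i] * tx[i] for i in itemset)
                remaining = sum(profit[i] * q for i, q in tx.items() if rank[i] > r)
                util += u_xt
                upper += u_xt + remaining
            if util >= min_util:
                results[itemset] = util  # exact utility, no candidate phase
            if upper >= min_util:
                search(itemset, r, new_tids)  # otherwise skip the subtree

    search((), -1, list(range(len(db))))
    return results


# Usage on a hypothetical toy database.
profit = {"a": 5, "b": 2, "c": 1}
db = [{"a": 1, "b": 2}, {"b": 4, "c": 3}, {"a": 2, "c": 1}]
print(mine_high_utility(db, profit, 11))
# {('a',): 15, ('a', 'c'): 11, ('b',): 12, ('b', 'c'): 11}
```

In this toy database, {c} alone has utility 4, yet its supersets {a, c} and {b, c} reach the threshold. Utility is not anti-monotone the way support is, which is why frequency-style pruning cannot be applied directly and an upper bound on the utility of all extensions is used instead.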
The linear data structure is used to compute a tight upper bound for powerful pruning and to identify high utility patterns directly, in an efficient and scalable way, thereby addressing the root cause of the inefficiency of prior algorithms.
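The following sketch illustrates, under stated assumptions, why a flat per-transaction layout helps bound computation. Each transaction is stored as parallel arrays of items (sorted by one fixed global order) and their utilities, plus precomputed suffix sums, so the utility of every item stored after a given position is a single lookup. The names `FlatTransaction`, `flatten`, `prefix_bound`, and `utility_and_bound` are hypothetical, and this is not the exact data structure of the surveyed algorithm. It only shows a linear layout from which both an itemset's exact utility and an upper bound for its extensions can be read.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FlatTransaction:
    items: List[str]   # items sorted by one fixed global order
    utils: List[int]   # utils[k] = profit * quantity of items[k]
    suffix: List[int]  # suffix[k] = sum(utils[k:]), with suffix[len(utils)] == 0


def flatten(tx: Dict[str, int], profit: Dict[str, int],
            order: List[str]) -> FlatTransaction:
    """Store one transaction as parallel arrays plus precomputed suffix sums."""
    rank = {item: r for r, item in enumerate(order)}
    items = sorted(tx, key=lambda item: rank[item])
    utils = [profit[item] * tx[item] for item in items]
    suffix = [0] * (len(utils) + 1)
    for k in range(len(utils) - 1, -1, -1):
        suffix[k] = suffix[k + 1] + utils[k]
    return FlatTransaction(items, utils, suffix)


def prefix_bound(itemset: Tuple[str, ...],
                 flat: FlatTransaction) -> Optional[Tuple[int, int]]:
    """Return (u(X, T), u(X, T) + remaining utility), or None if T lacks X.

    The remaining utility (items stored after X's last item) is one suffix
    lookup. The position map is rebuilt per call only to keep the sketch short.
    """
    pos = {item: k for k, item in enumerate(flat.items)}
    if not all(item in pos for item in itemset):
        return None
    u_xt = sum(flat.utils[pos[item]] for item in itemset)
    last = max(pos[item] for item in itemset)
    return u_xt, u_xt + flat.suffix[last + 1]


def utility_and_bound(itemset: Tuple[str, ...],
                      flats: List[FlatTransaction]) -> Tuple[int, int]:
    """Exact utility u(X) and its upper bound, summed over all transactions."""
    total_u = total_ub = 0
    for flat in flats:
        pair = prefix_bound(itemset, flat)
        if pair is not None:
            total_u += pair[0]
            total_ub += pair[1]
    return total_u, total_ub


# Usage on a hypothetical toy database.
profit = {"a": 5, "b": 2, "c": 1}
db = [{"a": 1, "b": 2}, {"b": 4, "c": 3}, {"a": 2, "c": 1}]
flats = [flatten(tx, profit, ["a", "b", "c"]) for tx in db]
print(utility_and_bound(("a",), flats))  # (15, 20)
```

During a search, a node whose bound falls below the minimum utility threshold can be skipped together with all of its extensions. The closer the bound is to the exact utility, the more of the search space is pruned.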