 Open Access
 Authors : Karunambika R, Dr. R. Venkatesan, Dr. S. Lovelyn Rose
 Paper ID : IJERTV12IS090079
 Volume & Issue : Volume 12, Issue 09 (September 2023)
 Published (First Online): 12102023
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Analysing Customer Data using Utility Mining
Karunambika R PG Scholar Department of CSE
PSG College of Technology Coimbatore, India
Dr.R.Venkatesan Professor&Head Department of CSE
PSG College of Technology Coimbatore, India
Dr.S.Lovelyn Rose professor Department of CSE
PSG College of Technology Coimbatore, India
Abstract Mining High Utility Itemsets is a popular data mining task, which discovers High Utility Itemsets (HUIs) from a transaction database. Recently, several efficient algorithms have been proposed for this task with different datasets. But, many of them do not consider the onshelf time period of items and negative unit profit value of items, which is common in real transaction databases. For efficiently finding HUIs, considering these two things is crucial. In this paper, we address both of these challenges in analyzing customer data by proposing a novel efficient algorithm named FHM(Fast High Utility Itemsets Mining) to mine HUIs while considering onshelf time periods of items, and items having positive or negative unit profit. FHM is an extension of EFIM algorithm. An extensive experimental study with reallife dataset shows that the proposed algorithm can analyze the customer data (transaction database may represent purchasing behavior of customer) efficiently to discover the importance (utility) of those customer data.
Keywords: Itemset mining, Highutility mining; Fast utility mining; Highutility database merging and projection; onshelf time periods.

INTRODUCTION
Frequent Mining (FM) is a popular data mining task, which discovers frequent itemsets (groups of items) appearing frequently in the transaction database. But, the main drawback of FM is that it assumes all items have the same weight (profit or importance) and appears only once in each transaction. This lead to a bias in the realtime applications. For example, consider the transactions of the database which describes the customer behavior of purchasing items and unit profit value for each item in the transactions. In that case, FM discovers only the frequent itemsets that the customer bought, it does not consider about profit value of that frequent itemsets. So there is a
chance to miss high profitable itemsets which appears infrequently in that transaction (customers purchasing behavior). To address this issue many efficient algorithms for mining high utility itemsets are proposed which are used different techniques that reduce the time complexity and different data structures which reduces space complexity to discover high profitable itemsets from transactions. EFIM (A Fast and Memory Efficient Algorithm for High Utility Itemset Mining) is one of the most efficient algorithms for finding high profitable itemsets from transactions. EFIM uses database projection and transaction merging technique to analyze the customer data(transactions) and discovering high profitable itemsets., but EFIM was not designed to handle negative unit profit value and the onself periods of items. In this paper, we present a novel algorithm named FHM (Fast Highutility itemsets Mining) to mine HUIs while considering both positive and negative unit prots with the onself periods. It extends the current EFIM algorithm so that it can handle negative unit prots and the onself periods eciently. The rest of this paper is organized as follows. Section 2, 3, 4and 5 respectively presents the problem denition and related work, the proposed FHM algorithm, the experimental evaluation and the conclusion.

PROBLEM STATEMENT AND RELATED
WORK
In this section, we first state the problem of utility mining and then review the previous extensive studies of utility mining.
Problem Statement
Given a transaction database, the problem of finding high utility itemset is to find the complete set of itemsets whose utilities are greater than or equal to minimum utility threshold by analyzing customer data which contains different time periods for each
transaction, positive or negative unit profit value and minimum utility threshold.
Related Works
Extensive studies have been proposed for mining high utility itemsets. A base of utility mining is frequent itemsets mining. One of the important frequent mining algorithms is FPGrowth [1], which is the treebased approach, so it can find frequent itemsets without generating any candidate itemset by scanning the database just twice. However, the main drawback of frequent mining is not considering unit profit value of items. To overcome this drawback many utility mining algorithms are proposed, like FPGrowth one of the popular algorithms for utility mining is UPGrowth: An Efficient Algorithm for High Utility Itemset Mining[2], which is also used treedata structure to efficiently pruning unprofitable itemsets by scanning the database just twice and it uses different utility values like TU and TWU for pruning the search space for discovering high utility itemsets. But it generates more candidate itemsets before discovering HUIs based on userspecified minimum utility threshold. Mining High Utility Itemsets without Candidate Generation [3] is another popular utility mining algorithm, which used list structure instead of a tree. It scans the transaction database just once to calculate TWU and construct list structure for each item with item utility and remaining utility. Based on this remaining utility, items can be explored for discovering HUIs so it can give better performance than UPGrowth. EFIM is one of the most efficient algorithms [4], which uses different techniques (explained in section 3.2) to discover HUIs in less time and space requirement. The above algorithms are only considering positive unit profit value of items, does not care about negative unit profit.

THE FHM ALGORITHM
We next present our proposal, the FHM algorithm. It is an extended version of EFIM onephase algorithm, EFIM uses several novel ideas to minimize the time and memory required for discovering high utility itemsets. FHM algorithm uses these novel ideas to the database which has transactions with the onself period, negative or positive unit profit value. Subsection 3.1 reviews the depthfirst search of itemsets. Subsection 3.2 gives an overview of EFIM algorithm. Subsection 3.3 presents proposed FHM algorithm. Finally, subsection 3.4 gives the pseudo code of FHM.

DEPTHFIRST SEARCH
The FHM algorithm explores this search space using a depthfirst search, starting from the root (the empty set). During this depthfirst search, for any itemset, FHM recursively adds one item at a time according to the increasing order of TWU because it generally reduces the search space, to generate larger itemsets.

EFIM – ALGORITHM
EFIM is a single phase algorithm to discover High Utility Itemsets (HUIs) for the customer data with only positive unit profit value. Table 1. shows a Transaction Database which contains five transactions, each row represents the transaction.id, items in that transaction, TU, the utility of the items in that transaction and the onself period of that transaction.
Fig.1. DepthFirst Search Procedure for I – {a,b,c,d} Table.1. Transaction Database
Table.2. Projected Database of {c}
3.2.1 Transaction Utility (TU) and Transaction Weighted Utility (TWU)
Transaction Utility is a utility of transaction, which is a summation of utility values of the items in a transaction. The utility value of an item is a product of unit profit and frequency count of an item. For example, In Table 1 first transaction, iem as utility is
5, likewise (c, 1) and (d, 2). TU is 3(actually it is 3( 5+1+2) but here we take only positive utilities for TU calculation while we have negative utility, the reason for this is given in section 3.3.2). Transaction weighted Utility is a summation of the TU values of transactions in which a particular item appears. For example, item as TWU is 45(3+17+25). TU values of transactions shown in Table 1.
3.2.3 Three Techniques in EFIM
This algorithm mainly used three techniques to efficiently reduce the search space and time required to discover HUIs. Three techniques are,

High Utility Database Projection,

High Utility Transaction Merging,

Using UtilityBin Array to calculate Utility (Transaction Utility(TU), Transaction Weighted Utility(TWU).

High Utility Database Projection (HDP)
EFIM is a patterngrowth algorithm. It scans the horizontal database to calculate the utility. By using projected databases it reduces the cost of database scans. To reduce the memory space, EFIM performs pseudoprojections. Table 2 shows projection database of an item {c}.

High Utility Transaction Merging (HTM)
To further reduce the cost of database scans, EFIM also introduced an efficient transaction merging technique named HighUtility Transaction Merging. HFM is based on the observation that transaction databases often contain identical transactions. The technique consists of identifying these transactions and then replaces them with a single transaction while combining their utilities. Table 3 shows Merged Database for an item {c}. In that Transactions T2 and T5 has identical items e and g so, those two transactions can be merged into a single transaction T(2,5) and frequency count of those items also added respectively(for example if T2 is (e,1) (g,1) and T5 is (e,2) (g,1) after merging it becomes T(2,5) which is (e,3) (g,2)).

UtilityBin Array to calculate Utility
A utilitybin array can be used to efficiently calculate TWU of items in O(n) time (n is the number of transactions). A utilitybin array U is initialized with
zero. For example, consider the database of the running example. In this example I = {a; b; c; d; e; f}. A utilitybin array U is constructed with 7 bins since there are 7 items as illustrated in Fig. 2 A. Then, the database is scanned one transaction at a time. The first transaction is T1, which has a transaction utility of 3. Because items a, c, and d appear in transaction T1, the value 3 is added to the utilitybins U[a], U[c], and U[d]. The result is shown in Fig. 2 B. Then the next transaction T2 is read, which has a transaction utility of 17. Because items a, c, e, and g appear in transaction T2, the value 17 is added to the utilitybins U[a], U[c], and U[d]. The result is shown in Fig. 2 C. Then, the same process is repeated for the remaining transactions. The content of the utilitybin array after reading transactions T3, T4, and T5 are respectively shown in Fig. 2 D, E, and F. After the last transaction has been read, it is found that the TWU of items a, b, c, d, e, f, and g are respectively 45,56, 76, 48, 73, 25, and 53, according to their respective entries in the utilitybin array.
Table.3. Merged Transactions Database of {c}

FHM – ALGORITHM
Fast High Utility Itemsets Mining (FHM) is an extended version of EFIM algorithm with the onself period and positive or negative unit profit value. It uses the same three techniques explained in 3.2.3, but with some modifications corresponding to negative unit profit and the onself period. Following subsections illustrates handling onself periods of transactions (3.3.1) and handling negative unit profit of items (3.3.2).

HANDLING ONSELF PERIODS
This high utility mining algorithm does not consider the onshelf period of items. In reallife, some items are only sold during specific time periods (For example summer). High utility mining is biased towards the items with long shelftime because they have more opportunity to generate a higher profit. Each transaction has its period which is shown in Table 1. In FHM, we first calculate Total Period Utility (PTO) of each period. Our running example has three periods (0, 1, 2). Total Period Utility of a period is defined as the summation of TU ( and +) of
transactions in which that period appears. For example period 0th PTO is 5(2 for T1 + 7 for T2). In FHM, we calculate TWU for all items and for all periods. Each TWU is divided by that corresponding periods PTO. This value can be used for comparing with user defined Minimum Utility threshold value (minutil) for discovering HUI.
Fig.2. TWU Calculation using UtilityBin Array

HANDLING NEGATIVE UNIT PROFIT

In high utility mining, items are not allowed to have negative unit profit, but in reallife transaction databases, items are often sold at a loss. FHM integrate new ideas to handle negative items more efficiently. In FHM, first, we calculate two types of TU, TU (+) is calculated with the only positive utility of items and TU ( and +) is with both positive and negative utility of items. While calculating TWU, we use TU (+) because if we use the negative utility of an item then that item cant be explored, but there may be a chance to miss itemset which can give HUI. Likewise while calculating Total Period Utility and transaction merging (for comparing items to find identical items) we used TU ( and +).
3.4 PSEUDOCODE OF FHM
In this subsection, we present the FHM algorithm, which combines all the ideas presented in the previous section. The main procedure (Algorithm 1) takes as input a transaction database and the minutil threshold. The algorithm initially considers that the current itemset is the empty set (line 1). The algorithm then scans the database once to calculate the local utility of each item w.r.t. and positive and negative unit profit, using a utilitybin array (line 2). Note that in the case where = , the local utility of an item is its TWU
which is divided by PTO (h) h represents corresponding period. Then, the local utility of each item is compared with minutil to obtain the secondary items w.r.t to that is items that should be considered in extensions of (line3). Then, these items are sorted by ascending order of TWU and that order is thereafter used as the x order (line 4). The database is then scanned once to remove all items that are not secondary items w.r.t to since they cannot be part of any highutility itemsets (line 5). At the same time, items in each transaction are sorted according to x, and if a transaction becomes empty, it is removed from the database. Then, the database is scanned again to sort transactions by the xT order to allow O(nw) transaction merging, thereafter (line6). Then, the algorithm scans the database again to calculate the subtree utility of each secondary item w.r.t. , using a utilitybin array (line 7 and 8).Thereafter, the algorithm calls the recursive procedure Search to perform the depthfirst search starting from (line 9).
The Search procedure (Algorithm 2) takes as parameters the current itemset to be extended, the projected database, the primary and secondary items
w.r.t and the minutil threshold. The procedure performs a loop to consider each singleitem extension of of the form = U {i}, where i is a primary item
w.r.t (line 1 to 9). For each such extension , a database scan is performed to calculate the utility of and at the same time construct the projected database (line 3). Note that transaction merging is performed at the same time the projected database is constructed. If the utility of is no less than minutil, is output as a highutility itemset (line 4). Then, the database is scanned again to calculate the subtree and local utility
w.r.t of each item z that could extend (the secondary items w.r.t to ), using two utilitybin arrays (line 5). This allows determining the primary and secondary items of (line 6and 7). Then, the Search procedure is recursively called with to continue the depthfirst search by extending (line 8). Based on properties and theorems presented in previous sections, it can be seen that when FHM terminates, all and only the highutility itemsets have been output.


EXPERIMENTAL RESULT
We performed several experiments to evaluate the performance of the proposed FHM algorithm. Experiments were carried out on a computer with 64 bit Core i7 processor running Windows 7 and 4 GB of RAM. Algorithms were implemented in Java and memory measurements were done using the standard Java API. Experiments were performed using a standard dataset used in the FOSHU literature for evaluating FHM algorithm which is downloaded from SPMF opensource data mining library details shown in Table4. Subsection 4.1 gives the influence of user specified minimum utility threshold and 4.2 gives time and space complexity of FHM algorithm.

INFLUENCE OF MINIMUM UTILITY THRESHOLD USER SPECIFIED (MINUTIL)
Table 5 shows the results for minutil (0.65, 0.75, 0. 85), we can observe that while minutil value increases the number of HUIs generated was reduced.

TIME AND SPACE COMPLEXITY
The time complexity of FHM is O(lnw), where n and w is respectively the number of transactions and the average length of transactions, finally, l is a number of itemsets in the search space. Space complexity is O(Ih)+O(lnw), wherein O(Ih) I and h means total items and period, Ih gives space requirement of binarray to calculate TU and TWU and O(lnw) is space requirement for projected database because the number of projected databases is determined by the number of itemsets in the search space l.
Table.4. Dataset – mushroom
Table.5. Results


CONCLUSION
High Utility mining is an important task in many applications. But still, it has some drawbacks like not considering negative or positive unit profit value with an onself period. The EFIM algorithm uses some unique techniques to reduce the time and memory required to discover HUIs but not handled negative profit and the onself period. So, for analyzing customer data (for example, transactions represents purchasing behavior of customers) we presented an algorithm called FHM with onself period and negative unit profit of items. FHM uses the techniques in EFIM with some modifications for handling periods and negative unit profit values, like using TU(+), TU( and
+), PTO for each period and TWU which is used by dividing with respect to corresponding periods PTO.
REFERENCES
[1] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation, in Proc. of the ACMSIGMOD Int'l Conf. on Management of Data, pp. 112, 2000. [2] V.S. Tseng, B.E. Shie, C.W. Wu and P.S. Yu. Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases, in IEEE Trans. Knowl.Data Eng, 25(8):1772{1786, 2013. [3] M. Liu and J. Qu. Mining High Utility Itemsets without Candidate Generation, in Proc. 21st ACM Intern. Conf. Inform. Known. Management, pp. 55{64, 2012. [4] Zida S, FournierViger P et al. EFIM: A Highly Efficient Algorithm for HighUtility Itemset Mining, in Proceedings of the 14th Mexicaninternational conference on artificial intelligence, 9413, Springer, Berlin, pp. 530{546, 2015.
[5] G.C. Lan, T.P. Hong and V.S. Tseng. Discovery of high utility itemsets from onshelf time periods of products, in Expert Systems with Applications,.38:5851{5857, 2011. [6] G.C. Lan, T.P. Hong, J.P. Huang and V.S. Tseng. Onshelf utility mining with negative item values, in Expert Systems with Applications,.41:3450{3459, 2014. [7] C.Y. Li, J.S. Yeh, C.C. Chang. Isolated items discarding strategy for discovering high utility itemsets, in Data & Knowledge Engineering, 64(1): 198{217, 2008.