An Efficient Way to Handle the High Dimensional Problem with Fuzzy Association Rule

Indhumathi. V; Karthika. S K

doi:10.17577/IJERTCONV4IS19038

NCICCT - 2016 (Volume 4 - Issue 19)

An Efficient Way to Handle the High Dimensional Problem with Fuzzy Association Rule

DOI : 10.17577/IJERTCONV4IS19038

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 98
Total Downloads : 16
Authors : Indhumathi. V, Karthika. S K
Paper ID : IJERTCONV4IS19038
Volume & Issue : NCICCT – 2016 (Volume 4 – Issue 19)
Published (First Online): 24-04-2018
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

An Efficient Way to Handle the High Dimensional Problem with Fuzzy Association Rule

Indhumathi. V

Department of Computer Science and Engineering M.A.M College of Engineering Siruganur, Trichy,

Karthika. S K

Department of Computer Science and Engineering M.A.M College of Engineering Siruganur, Trichy,

AbstractPrediction is one of the important tasks in data mining process which predicts the future events and unknowns in the form of continuous valued functions. Mining sequential rule in data mining has limitations like very specific rules and similar rules representing the same situation. This creates three major problems such as rating of similar rules can be different, rules are not found because the individual rules are considered uninteresting and too specific rules have less chance for making predictions. In this paper, these issues can be addressed using the idea of mining Partially Ordered Sequential Rules (POSR), where items in the antecedent and consequent are unordered. Fuzzy logic is also used in this paper to remove the infrequent items. RuleGrowth and TRule Growth algorithm are proposed to mine POSR where TRule Growth algorithm accepts the sliding window constraint to find rules in maximum amount of time which has excellent scalability and provides higher prediction accuracy.

KeywordsPrediction; Sequential Rule; Fuzzy logic; Rule Growth; TRule Growth; sliding window constraint

INTRODUCTION

The most important research domain in this 21st century is the Data Mining, the process of examining large pre-existing database to generate new information. The goal is to extract only frequents from a large amount of data. Several algorithms have been proposed for this purpose of extraction. Prediction is final goal in data mining to predict something that will happen in future. For example, predicting the future value of a company in a stock market analysis, weather condition, and risk of individual diseases and also behavior of the learners.

Major issues concerned with prediction are speed, scalability, predictive accuracy, goodness of rules. Before starting the prediction the dataset need to be classified first as to make the process of prediction easier. The existing work uses many algorithms such as GSP, SPADE, and PrefixSpan for sequential pattern mining. These algorithms mislead the user to make decisions because the patterns found by these algorithms are only based on the support value. Solution to this problem is to add a measure of confidence but adding this to sequential pattern is not straightforward because it contains multiple items. An alternative approach is the sequential rule (SR) mining has

three drawbacks. First, rules may have many variations with different item ordering. Second, the ratings by the algorithms show differences between the rules and their variations. Third, rules have less chance for making prediction because of specific rules.

The proposed work in this paper overcomes these issues by introducing the partially-ordered sequential rules (POSR), where items are unordered. The algorithms RuleGrowth and TRuleGrowth proposed for mining partially-ordered sequential rules find rules by applying constraints. The constraint is to find rules with a sliding window because users often wish to discover rules within the maximum amount of time. To find rules only frequent items are extracted the infrequent items are eliminated using the fuzzy logic technique. Linguistic terms and membership functions are used removing those infrequent items.

This paper is organized as follows: Section 2 presents the related works on sequential rule mining and association rule. Section 3 describes the proposed work. Section 4 concludes the proposed work with the observations.
RELATED WORKS

Several algorithms and techniques have been proposed and used for the sequential rule and association rule mining. In [7] Phillippe Fournier Viger et al. proposed an algorithm named TNS (Top k-Non redundant Sequential rule) avoids the generation of redundant sequential rule based on the depth first search strategy which shows excellent performance and scalability. The new data structure named CMAP (Co- occurrence MAP) is suggested by Antonio Gomariz et al. in [10] to store co-occurrence information to overcome the performance bottleneck problem of vertical representation in sequential pattern mining. CMAP is used to prune the candidate generation in three state-of- the-art vertical algorithms namely SPADE, SPAM and GSP. Ted Gueniche in [11] suggested a framework called SPMF (Sequential Pattern Mining Framework) is a cross platform library in java for discovering patterns. Command line interface and graphical interface that are offered by the sequential pattern mining framework performs fast testing. Algorithms are framed during the source code version and released during the release version are the two versions in SPMF. Predicting the sequential data is research

problem with many applications like stock market prediction and web link recommendation. Instead of using mining sequential rules for large sequences, partially-ordered sequential rule is new approach in [9] by Phillippe Fournier Viger et al. this improves the accuracy of prediction and the matching rate. In [6] Phillippe Fournier Viger et al. proposed an algorithm CMRules scans the sequence database and finds all the association rules by applying the algorithm of association rule mining algorithms like Apriori. Original database also scanned for calculating the sequential support and sequential confidence for each association rule. Finally the association rules are eliminated (SeqSup(r) < minSeqSup and SeqConf(r) < minSeqConf) and return the remaining set of rules. David Lo et al. in [17] focused on recovering rules from the execution traces. Propositional rule mining algorithm is used which enables the mining to be complete by avoiding exponential blowup. The results produced by this approach are efficient and very effective.

A syntactic characterization of a non-redundant sequential rules is investigated and proposed in [16] by David Lo et al. built on the representative patterns. A rule is said to be redundant if the support and confidence values are same. CNR (Compressed set of Non-Redundant) rule mining algorithm is proposed based on the definition of configuration key that improves the runtime and compactness of the rule. Minqing Hu et al. focused on extracting the opinion features from the benefits and drawbacks in [13]. For this purpose a language pattern based approach is proposed and the patterns are generated from the CSR (class sequential rules). The class sequential rule is different from the classic sequential rule where CSR has fixed class showing the implication as X Y. Jay Ayres et.al, introduced new efficient algorithm in [3] named SPAM (Sequential Pattern Mining). This algorithm uses the depth first strategy and uses vertical bitmap data allowing for simple and efficient counting. The lexicographic tree is traversed for sequences and the pruning methods are used to reduce the search space. The problem of finding rules by relating the patterns to the time series is considered in [5] by Gautham Das et al. Adaptive methods are described based on discretizing the sequence in the database by resembling the methods as vector quantization. Subsequence are first formed by sliding window and based on the similarities these subsequence are clustered. Rakesh Agrawal et al. in [1] presented an efficient template algorithm. The algorithm generates all significant rules between the items which incorporates the buffer management, estimation echniques and pruning techniques. Rules are found with the minimum support and minimum confidence. In [2] two algorithms have been proposed AprioriSome and AprioriAll by Rakesh Agrawal et al. AprioriSome find patterns with minimum number of customers and the sequential pattern found by this is low. AprioriAll algorithm scales up linearly with the maximum number of customers.

Sherri K. Harms et al. in [12] provided a new approach MOWCATL (Minimal Occurrences with Constraints and Time

Lags). Separate antecedent and consequent inclusive constraints and window width are used by the MOWCATL approach to find sequential patterns and separated by the time lag. Algorithm returns the temporal rules with minimum confidence threshold values. Inge Jonassen et al. identified the conserved patterns form a set of unaligned protein sequences in [15] by proposing new methods. Those identified patterns are refined by guaranteed refinement algorithms to get only the flexible patterns and known motifs are recovered for PROSITE families. Assembling the frequent event which are close to each other in a database that are partially ordered is called as an episode predicts and describes the behaviour of the system. Two efficient WINEPI and MINEPI algorithms are proposed in [19] by Heikki Mannila finds all episodes and discovers only frequent episodes based on the minimal occurrences of the previous episode. In [21] Jian Pei et al. proposed a pattern- growth approach based on projection by converting the large sequence database into projected database and the frequent items are explored to develop the sequential patterns.

Mohammed J.Zaki reported that the SPADE algorithm works in the best way in [25] by splitting the larger problem into small sub-problems to solve them independently in the memory using the lattice search techniques and joined again using the join operations. In [24] Yanchang Zhao et al. presented impact oriented negative sequential rules to consider positive and negative correlation. SPAM (Sequential Pattern Mining) algorithm is used as a starting point. Two metrics such as contribution and impact have been considered to discover impact oriented rules. Innovative approach SOMAD (Service Oriented Mining for Antipattern Detection) proposed by Mathieu Nayrolles in
[20] to detect SOA antipattern. Strong association is detected and filtered using detection metrics. Phillippe Fournier Viger et al. in [8] used a RuleGrowth algorithm find rules based on the pattern growth approach. First find rules between items in the database then the expandleft and expandright procedure is followed by scanning the database.

Srivatsan Laxman et al. in [18] studied the algorithms of temporal data mining and presented the results regarding the pattern discovery methods. Investors in the stock market wish to predict the behaviour of the customers by understanding the relationship between items purchased by the customers from the market. Y.L Hsieh et al. in [14] understand the relationship by using data mining knowledge and techniques. Arthur Pitman in [22] focused on mining sequential navigation patterns from web. Spurious sequences are eliminated using the closed criteria. Tjorben Bogon et al. presented a new approach in
1. by combining the assistance functionalities to analyse the input and output. EDASIM tool is developed for input selection, preparation and validation to assist the output data. Maureen A.Sartor et al. developed a web based gene set called ConceptGen and mapping tool in [23]. It offers more concepts related to the biological systems.
  
  The previous research works offers limitations such as infrequent items leads to predict inaccurate data, rules have same support and confidence value rules generated are more and slow generation of rules. The proposed work overcomes
  
  these issues by using the fuzzy logic technique that clearly removes all those infrequent items and provides fast prediction with accurate data.
PROPOSED WORK
Fuzzy logic is the technique used to remove those infrequent items from the database with linguistic terms and membership function. Linguistic terms is the process of converting the variable into words and the membership function represents the data set using the Boolean value.

Weight value assigned to the association rule found to make the process of predicting easier. Best quality rules alone extracted finally by using those weight value reduction from larger rules.

Input: transaction database

Output: frequent items Start

Step 1: database is taken as the input

Step 2: the database is classified based on the attribute range values

Step 3: after classification the variables are converted into words using the linguistic term as low, middle and high

Step 4: the membership function with the Boolean value 0 and 1 is used to again classify the data set as low, high and medium by converting the linguistic terms into Boolean variables.

Step 5: removes the infrequent from the data set

Step 6: the association rule forming procedure proceeds with frequent items are further it can be reduced by using the weight to form best quality rules.

Stop
CONCLUSION

The two algorithms RuleGrowth and TRuleGrowth for the proposed work of partially ordered sequential rule find rules with excellent scalability and extension of the RuleGrowth algorithm is the TRuleGrowth which accepts the sliding window constraint. Eliminating the infrequent items before generating the sequential rules so that rules are found only with the frequent items. Using fuzzy logic infrequent items are removed which improves the performance and also the prediction accuracy. Weight value assigned to the association rules also improves the accuracy of the predicting data. Best quality rules only extracted finally which increases the speed of the process. In future it can be extended to predict particular value from the set of predicted data. For that obtained data threshold value can be given for the weight value assigned for the association rule as minimum support threshold value and maximum confidence threshold value. Y assigning threshold value the very particular data can be obtained.

REFERENCES

R. Agrawal, T. Imielminski, and A. Swami, Mining association rules between sets of items in large databases, in Proc. 13th ACM SIGMOD Int. Conf. Manage. Data, 1993, pp. 207216.
R. Agrawal and R. Srikant, Mining sequential patterns, in Proc. 11th International Conference of Data Engineering., 1995, pp. 314.
J. Ayres, J. Flannick, J. Gehrke, and T. Yiu, Sequential Pattern mining using a bitmap representation, in Proc. 8th ACM

Int. Conf. Knwl. Discovery Data Mining, 2002, pp. 429435.
T. Bogon, I. J. Timm, A. D. Lattner, D. Paraskevopoulos, U. Jessen, M. Schmitz, S. Wenzel, and S. Spieckermann, Towards assisted input and output data analysis in manufacturing simulation:The EDASIM approach, in Proceedings of Winter Simulation Conference, 2012, pp. 257269.
G. Das, K.-I. Lin, H. Mannila, G. Renganathan, and P. Smyth, Rule discovery from time series, in Proc. 4th ACM Int. Conf. Knowl. Discovery Data Mining, 1998, pp. 1622.
P. Fournier-Viger, U. Faghihi, R. Nkambou, and E. Mephu Nguifo,

CMRules: An efficient algorithm for mining sequential rules common to several sequences, Knowledge Based System, Elsevier, vol. 25, no. 1, pp. 6376, 2012.
P. Fournier-Viger and V. S. Tseng, TNS: Mining Top-K Nonredundant sequential rules, in Proc. 28th Symp. Appl. Comput., ACM, 2013, pp. 164166.
P. Fournier-Viger, R. Nkambou, and V. S. Tseng, RuleGrowth: Mining sequential rules common to several sequences by patterngrowth, in Proc. 26th ACM Symp. Appl. Comp., 2011, pp. 954959.
P. Fournier-Viger, T. Gueniche, and V. S. Tseng, Using partiallyordered sequential rules to generate more accurate sequence prediction, in Proc. 8th Int. Conf. Adv. Data Mining Appl., Springer, 2012, pp. 431442.
P. Fournier-Viger, A. Gomariz, M. Campos, and R. Thomas, Fast vertical sequential pattern mining using co-occurrence information, in Proc. 18th Pacific-Asia Conf. Knowl. Discovery Data Mining, Springer, 2014, pp. 4052.
P. Fournier-Viger, A. Gomariz, A. Soltani, T. Gueniche, C. W. Wu, and V. S. Tseng, SPMF: A Java open-source pattern mining library, Journal of Machine Learning Research., vol. 15, pp. 3389 3393, 2014.
S. K. Harms, J. Deogun, and T. Tadesse, Discovering sequential association rules with constraints and time lags in multiple sequences, in Proc. 13th Int. Symp. Method. Intell. Syst., Springer, 2002, pp. 373376.
H. Minqing and B. Liu, Opinion feature extraction using class sequential rules, presented at the AAAI Spring Symp. Computational Approaches Analysing Weblogs, Palo Alto, USA, Mar. 2006.
Y. L. Hsieh, D.-L. Yang, and J. Wu, Using data mining to study upstream and downstream causal relationship in stock market, in Proc. 9th Joint Conf. Inf. Sc., ACM, 2006.
I.Jonassen, J. F. Collins, and D. G. Higgin, Finding flexible patterns in unaligned protein sequences, Protein Sci., Journal of Computational Biology, vol. 4, no. 8, pp. 15871595, 1995.
D. Lo, S.-.C. Khoo, and L. Wong, Non-redundant sequential rulesTheory and algorithm, Inf. Syst., Esevier, vol. 34, no. 4/ 5, pp. 438 453, 2009.
D. Lo, G. Ramalingam, V. P. Ranganath, and K. Vaswani,

Mining quantified temporal rules: Formalism, algorithms, and evaluation, in Proc. 16th Working Conf. Reverse Eng., 2009, pp. 6271.
S. Laxman and P. Sastry, A survey of temporal data mining, Sadhana, vol. 3, pp. 173198, 2006.
H. Mannila, H. Toivonen, and A. I. Verkano, Discovery of frequent episodes in event sequences, Data Mining Knowl. Discovery, vol. 1, no. 3, pp. 259289, 1999.
M. Nayrolles, N. Moha, and P. Valtchev, Improving SOA antipatterns detection in service based systems by mining execution traces, in Proc. 20th IEEE Working Conf. Reverse Eng., 2013, pp. 321330.
J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, Mining sequential patterns by pattern-growth: The prefixspanapproach, IEEE Transaction of Knowledge and Data Engineering, vol. 16, no. 10, pp. 117, Oct. 2004.
Pitman and M. Zanker, An empirical study of extracting multidimensional sequential rules for personalization and recommendation in online commerce, in Proc. 10th International Conference on Wirtschaftinformatik, 2011, pp. 180189.
M. A. Sartor, V. Mahavisno, V. G. Keshamouni, J. Cavalcoli, Z. Wright, A. Karnovsky, R. Kuick, H. V. Jagadish, B. Mirel, T. Weymouth, B. Athey, and G. S. Omenn, ConceptGen: A gene set enrichment and gene set relation mapping tool, Bioinformatics, vol. 26, no. 4, pp. 456463, 2010.
Y. Zhao, H. Zhang, L. Cao, C. Zhang, and H. Bohlscheid,

Mining both positive and negative impact-oriented sequential rules from transactional data, in Proc. 13th Pacific-Asia Conf. Knowl. Discovery Data Mining, Springer, 2009, pp. 656 663.
M. J. Zaki, SPADE: An efficient algorithm for mining frequent sequences, Mach. Learning, ACM, vol. 42, no. 12, pp. 3160, 2001.

An Efficient Way to Handle the High Dimensional Problem with Fuzzy Association Rule

Leave a Reply