Feasibility of Fuzzy Clustering for Improving the Objective Function of K-Means Clustering

Shubham Patil; Hiralal Solanki

doi:10.17577/IJERTV11IS070231

Volume 11, Issue 07 (July 2022)

Feasibility of Fuzzy Clustering for Improving the Objective Function of K-Means Clustering

DOI : 10.17577/IJERTV11IS070231

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 133
Authors : Shubham Patil , Hiralal Solanki
Paper ID : IJERTV11IS070231
Volume & Issue : Volume 11, Issue 07 (July 2022)
Published (First Online): 05-08-2022
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

Feasibility of Fuzzy Clustering for Improving the Objective Function of K-Means Clustering

Shubham Patil

P.G. Student

G. H. Raisoni Institute of Engineering & Management Jalgaon M.S. India

Hiralal Solanki

Assistant Professor

H. Raisoni Institute of Engineering & Management Jalgaon M.S. India

Abstract:- Clustering is a division of data items into groups of similar objects. Each group, called cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. The k-Means clustering method is one of the classical and simplest methods for data clustering. It is one of the most widely used methods in practical implementations because of its simplicity. But sometimes the resulting membership values do not always correspond well to the degrees of belonging of the data, so to overcome the problems in hard K-means (KM) clustering, the Fuzzy K-Means (FKM) clustering approach is proposed. fuzzy clustering forms clusters such that data object can belong to more than one cluster based on their membership levels, In the existing system subtraction structure objective function is used, it introduced saw tooth nature in objective function. In this paper feasibility of fuzzy partition matrix of objective function in k-means clustering is proposed, it provides smoothness in saw tooth nature in objective function, which is main reason for the improving the objective function.

Keywords : Fuzzy clustering, fuzzy partition matrix
1. INTRODUCTION
  
  Data clustering is one of the important data mining metho
  
  e same class and most dissimilarity between different classes. The large number of data objects greatly increases the challenges of comprehending and interpreting the resulting mass of data. A first step toward addressing the challenge is the use of clustering techniques. Clustering is a common descriptive task in which one seeks to identify a finite set of categories or clusters to describe the data. It groups the data objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels.
  
  The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminate analysis (supervised learning).
  
  The clustering in data mining becomes very difficult because of very large datasets with many attributes of different types. This causes to have unique computational requirements on appropriate clustering algorithms. The main concern for most of clustering algorithms is their need to know the number of clusters for which to look. Since the clustering is an unsupervised way of grouping, the user has no previous knowledge about the actual number of clusters. Apparently, dividing the dataset into smaller or larger clusters will result in merging some separate clusters or breaking down some compact ones. The process of finding an optimal number of clusters is called cluster validity
  
  In order to achieve the main aim of fuzzy k-means clustering, the drawbacks of traditional k-means clustering are studied. k-means clustering clusters the data in a crisp sense which results into empty clusters.
  
  Whereas, the proposed Fuzzy K-Means (FKM) clustering uses the membership partition matrix grades in order to express ambiguity in the assignment of data point to clusters. The proposed partition based fuzzy k-means employs fuzzy measures as the basis for membership matrix calculation and for cluster centers identification. The fuzzy measures applied to clustering helps to improve the results of fuzzy k- means clustering.
2. RELATED WORK
  
  Fuzzy logic introduced by Zadeh [12] may be viewed as an attempt at formalization of two remarkable human capabilities. The capability to perform a wide variety of physical and mental tasks without any measurements and any computations. In many real world application areas, knowledge is represented by in terms of imprecise linguistic words from a natural language. A linguistic variable means a variable whose values are words or sentences in a natural language or artificial language. For example, honesty is linguistic variable. The linguistic values of this variable can be extremely honest, not honest, sometimes honest, and very honest.
  
  Fuzzy logic is the way of representing and manipulating data that is not exact, but rather uncertain [11]. Uncertainty can be manifested in many forms: it can be fuzzy (not sharp, unclear, imprecise, approximate), it can be vague (not specific, amorphous), it can be ambiguous (too many choices, contradictory), it can be of the form of ignorance (dissonant, not knowing something), or it can be a form due to natural variability (conicting, random, chaotic, unpredictable).
3. LITERATURE SURVEY
  
  Figure 1 shows the structure of literature survey. There are two main domains of data mining Clustering and classification. The clustering is unsupervised learning and classification is supervised learning. Clustering is divided into two categories namely hard clustering and fuzzy clustering. Hard clustering clusters the data in a crisp sense. It means each data object can be a member of one and only one cluster at a time. In Hard clustering there is always at least one object in each cluster. However, the empty clusters can be obtained if not a single object is allocated to a cluster during the assignment. The Fuzzy clustering assigns the data object to more than one cluster at a time.
  
  Figure 1: Literature Survey
  M. S. Yang, in [15], have proposed fuzzy relation based clustering. Fuzzy relation based clustering includes an N- step procedure by using the composition of fuzzy relations beginning with a reflexive and symmetric fuzzy relation R in X. The data set is partitioned into the number of cluster by equivalence relation. G. S. Liang et. al., in [24], have introduced cluster analysis based on fuzzy equivalence relation. The approach used is the distance measure between two trapezoidal fuzzy numbers is used to aggregate subjects linguistic assessments. The distance measure is used to characterize the interobjects similarity. The linguistic assessment is for attributes ratings to obtain the compatibility relation. Then a fuzzy equivalence relation based on fuzzy compatibility relation is constructed.
4. FEASIBILITY OF FUZZY CLUSTERING FOR IMPROVING THE OBJECTIVE FUNCTION OF K-
  
  MEANS CLUSTERING
  
  The proposed solution focuses on text clustering using fuzzy logic based clustering in order to facilitate and improve effectiveness in an conventional hard clustering approach. Fuzzy clustering is a partition based clustering scheme and is particularly useful when there are no apparent clear groupings in the data set [34]. Partitioning schemes provide automatic detection of cluster boundaries and in case of fuzzy clustering, these cluster boundaries overlap. Every individual data entity belongs to not one but all the clusters with varying degrees of membership.
  
  The proposed system preprocess the data, then the preprocessed data is given as an input to the conventional hard clustering algorithm and proposed fuzzy portioning algorithm. Finally, the cluster formation is done and the results of both the hard clustering and fuzzy clustering are compared.
  
  Hard partition is insufficient to represent many real situations. Therefore, a fuzzy clustering method is offered to construct clusters with uncertain boundaries. Hence, this method allows that one object belongs to some overlapping clusters to some degree.
  
  Fuzzy clustering is a partition based clustering scheme and is particularly useful when there are no apparent clear groupings in the data set [34]. Partitioning schemes provide automatic detection of cluster boundaries and in case of fuzzy clustering, these cluster boundaries overlap. Every individual data entity belongs to not one but all the clusters with varying degrees of membership.
  
  Conditions for a fuzzy context partition matrix, are suggested by relaxing the constraint, this constraints suggest that each target (outlier) word is assigned to at least one of the fuzzy sense cluster with membership greater than zero
  
  [0, 1], 1 , 1 ,
  
  , > 0,
5. CONCLUSION AND FUTURE WORK
  
  The Proposed FKM formulates the objective function in terms of improving the membership assignments of an object. It assigns fuzzy memberships to data object and
  
  0 <
  
  =1
  
  < N, 1
  
  updates the centre of cluster according to the assigned
  
  FCM calculates the distance between cluster center to the data point and assigns flexible membership to each data point. More the data is near to the cluster center more is its membership towards the particular cluster center. Summation of membership of each data point should be equal to one. FCM uses fuzzy portioning approach, fuzzy partitioning is carried out through the iterative procedure
  
  memberships. The assigned memberships play a role as
  
  weight values which represent the degree to which data object belongs to more than one clusters. The degree of belongingness depends on the selection of Fuzziness Factor. The Proposed Fuzzy K-Means significantly differ depending on the choice of Fuzziness Factor. In hard K- Means a set of k initial cluster centers is chosen arbitrarily
  
  that updates the membership and cluster center
  
  equations 5.1 d and 5.1 e respectively.
  
  N
  
  c j by
  
  and each object is then assigned to the center closest to it,
  
  and the centers are recomputed. This is repeated until the process stabilizes which takes more execution time and memory. On the other hand, in Proposed Fuzzy K-Means
  
  m
  
  uij xi
  
  approach though it assigns membership to an object which
  
  , = 1,2, . ,
  
  c
  
  i 1
  
  u
  
  j N
  
  m
  
  ij
  
  i
  
  1
  
  1
  
  is inversely related to the relative distance of the object to
  
  cluster centre.
6. BIBLIOGRAPHY

[1] J. Han and M. Kamber, Data mining: concepts and techniques," 2001.

Where

uij

2

C || xi c j || m1

k 1 || xi ck ||

[2] C. C. Aggarwal and C. K. Reddy, Data clustering: algorithms and applications.

[3] P. Berkhin, Survey of clustering data mining techniques," San Jose, CA, 2002,

[4] O. M. Jafar and R. Sivakumar, A comparative study of hard and

n = number of data point

= is the cluster center

m = fuzziness index m [1, ]

c = number of cluster center

= member of data to cluster

|| || = Euclidean distance

4.1 Architecture

The architecture of the proposed system is shown in Figure

2. Input to the proposed system is the text data set [36].

Figure 2: Architecture of Proposed System

In fuzzy clustering, a data object will have an associated degree of membership for each cluster, indicating the strength of its association in that cluster. It iteratively update the membership values of a data object with the pre-defined number of clusters. Thus, a data object can be the member of all clusters with the corresponding membership values. The process of calculation of cluster centers and the assignment of points to these centers is repeated until the cluster centers stabilize [37].

fuzzy data clustering algorithms with cluster validity indices," in

Proceedings of the Elsevier International

[5] J. Daxin, C. Tang, and A. Zhang, Cluster analysis for gene expression data: a survey", IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 11, pp. 1370.

[6] A. Baraldi and P. Blonda, A survey of fuzzy clustering algorithms for pattern recognition," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 29, no. 6, pp. 778-785, october 1998.

[7] J. Hu, C. Xiong, J. Shu, X. Zhou, and J. Zhu, A novel text clustering method based on tgsom and fuzzy k-means" , in Proceedings of the 2009 First International Workshop on Education Technology and Computer Science – Volume 01, ser. ETCS '09, 2009, pp. 26-30.

[8] C. Wu, C. Ouyang, L. nan Chen, and L. Lu, A new fuzzy clustering validity index with a median factor for centroid based clustering", IEEE Transactions on Fuzzy Systems, vol. 23, no. 3, June 2015.

[9] L. Rokach and O. Maimon, Clustering methods," in Data mining and knowledge discovery handbook. Springer, 2005, pp. 321-352.

[10] N. A. M. Isa, S. Salamah, U. K. Ngah et al., Adaptive fuzzy moving k-means clustering algorithm for image segmentation," IEEE Transactions on Consumer Electronics, vol. 55, no. 4, pp. 2145-2153, 2009.

[11] T. J. Ross, Fuzzy logic with engineering applications, 2nd ed. John Wiley & Sons, 2009.

[12] L. A. Zadeh, Fuzzy sets," Information and Control, vol. 8, pp.

338-353, 1965.

[13] Zadeh, Is there a need for fuzzy logic?" Information sciences, vol.

178, no. 13, pp. 2751-2779, 2008.

[14] L. Zadeh, C. Negoita, and H. Zimmermann, Fuzzy sets as a basis for a theory of possibility," Fuzzy sets and systems, vol. 1, pp. 3- 28, 1978.

[15] M. S. Yang, A survey of fuzzy clustering," Mathematical and Computer modelling, vol. 18, no. 11, pp. 1-16, 1993.

[16] Q. Ni, Q. Pan, H. Du, C. Cao, and Y. Zhai, A novel cluster head selection algorithm based on fuzzy clustering and particle swarm optimization," IEEE/ACM Transaction on Computational Biology and Bioinformatics, no. 99, pp. 1{9, 2015.

[17] C. Fraley and A. E. Raftery, How many clusters? which clustering method? Answers via model-based cluster analysis," The computer journal, vol. 41, no. 8, pp. 578-588, 1998.

[18] S. Ayramo and T. Karkkainen, Introduction to partitioning based clustering methods with a robust example," Reports of the Department of Mathematical Information Technology Series C. Software and Computational Engineering, 2006.

[19] K. Tapas, D.M.Mount, N. Netanyahu, C. Piatko, R. Silverman, and A.Y.Wu, An efficient k-means clustering algorithm: analysis and implementation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881-892, July 2002.

[20] I. V. Cadez, S. Ga_ney, and P. Smyth, A general probabilistic framework for clustering individuals and objects," in Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2000, pp. 140-149.

[21] P. IndiraPriya and D. Ghosh, A survey on different clustering algorithms in data mining technique," International Journal of Modern Engineering Research (IJMER), vol. 3, no. 1, pp. 267-274, 2013.

[22] M. W. Berry and M. Castellanos, Survey of Text Mining:Clustering, Classification and Retrieval, 2nd ed. Springer, 2007.

[23] N. H. Park and W. S. Lee, Statistical grid-based clustering over data streams," ACM SIGMOD Record, vol. 33, no. 1, pp. 32-37, 2004.

[24] G.-S. Liang, T.-Y. Chou, and T.-C. Han, Cluster analysis based on fuzzy equivalence relation," European Journal of Operational Research, vol. 166, no. 1, pp. 160-171, June 2004.

[25] M.-S. Yang and H.-M. Shih, Cluster analysis based on fuzzy relations," Fuzzy Sets and Systems, vol. 120, no. 2, pp. 197{212, 2001.

[26] E. G. Mansoori, Frbc: a fuzzy rule-based clustering algorithm," IEEE Transactions on Fuzzy Systems, vol. 19, no. 5, pp. 960-971, 2011.

[27] M. Delgado, A. F. G_omez-Skarmeta, and F. Martin, \A fuzzy clustering-based rapid prototyping for fuzzy rule-based modeling," IEEE Transactions on Fuzzy Systems, vol. 5, no. 2, pp. 223-233, 1997.

[28] Y. Lu, T. Ma, C. Yin, X. Xie, W. Tian, and S. Zhong,

\Implementation of the fuzzy c-means clustering algorithm in meteorological data," International Journal of Database Theory and Application, vol. 6, no. 6, pp. 1-18, 2013.

[29] P. Lingras and G. Peters, Applying rough set concepts to clustering," in Rough Sets: Selected Methods and Applications in Management and Engineering. Springer, 2012,

[30] N. R. Pal and J. C. Bezdek, On cluster validity for the fuzzy c- means model," IEEE Transactions on Fuzzy Systems, vol. 3, no. 3, pp. 370-379, 1995.

[31] P. Maji and S. Paul, Rough-fuzzy clustering for grouping functionally similar genes from microarray data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 286-299, March 2013.

[32] O. Sutton, Introduction to k nearest neighbour classification and condensed nearest neighbour data reduction," University lectures,

University of Leicester, 2012

[33] R.-P. Li, M. Mukaidono, and I. B. Turksen, A fuzzy neural network for pattern classification and feature selection," Fuzzy Sets and Systems, vol. 130, no. 1, pp. 101-108,

[34] L. Zhu, F.-L. Chung, and S. Wang, Generalized fuzzy c-means clustering algorithm with improved fuzzy partitions," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no. 3, pp. 578-591, 2009.

[35] Y. Karali, D. Kodamasingh, and R. L. H. Behera, Hard and fuzzy clustering algorithms using normal distribution of data points: a comparative performance analysis," in International Journal of Engineering Research and Technology, vol. 2, no. 10 (October- 2013).

[36] Enron email dataset," https://www.cs.cmu.edu/ ./enron/ [Accessed on: May 2015].

[37] W. Pedrycz and H. Izakian, Cluster-centric fuzzy modeling," IEEE Transactions on Fuzzy Systems, vol. 22, no. 6, pp. 1585- 1597, December 2014.

[38] M. F. Porter, \An algorithm for suffix stripping," Program, vol. 14, no. 3, pp. 130{137, 1980.

[39] C. Silva and B. Ribeiro, The importance of stop word removal on recall values in text categorization," in Proceedings of the IEEE International Joint Conference on Neural Networks, vol. 3, 2003, pp. 1661-1666.

[40] M. Ramaswami and R. Bhaskaran, A study on feature selection techniques in educational data mining," journal of computing, vol. 1, no. 1, pp. 7-11, December 2009.

[41] S. Beniwal and J. Arora, Classification and feature selection techniques in data mining," International Journal of Engineering Research & Technology (IJERT), vol. 1, no. 6, pp. 1-6, 2012.

[42] A. K. Murugesan and B. J. Zhang, A new term weighting scheme for document clustering," in 7th International Conference on Data.

[43] J. W. Reed, Y. Jiao, T. E. Potok, B. Klump, M. T. Elmore, A. R. Hurson et al., Tf icf: A new term weighting scheme for clustering dynamic data streams," in proceedings of 5th IEEE International Conference on Machine Learning and Applications, 2006.

[44] G. Salton, A. Wong, and C.-S. Yang, A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613-620, 1975.