- Authors : Cina Mathew
- Paper ID : IJERTCONV7IS05006
- Volume & Issue : NCACCT – 2019 (Volume 7 – Issue 05)
- Published (First Online): 08-05-2019
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
A Survey on Privacy Preserving Data Mining Techniques
Kristu Jyothi College of Management and Technology
Abstract:- Emerging privacy concerns have become a major obstacle to storing and sharing data. The proliferation of data can be useful, but it must be managed in a way that preserves users' privacy. This is not straightforward, because the proliferated data need to be protected against several privacy threats. Various algorithms have been designed for privacy-preserving data mining, and they can be classified into three categories: privacy by policy, privacy by statistics, and privacy by cryptography. We review algorithms such as randomization, k-anonymization, and distributed privacy-preserving data mining, derive insights into how they operate, and compare their advantages and disadvantages. We also study the computational and theoretical limits of privacy preservation over high-dimensional data sets.
Keywords: PPDM, Anonymization, Perturbation, Cryptography
Recent years have seen unprecedented growth in the application of computer science to day-to-day activities. Organizations, communities and individuals increasingly store their data in the cloud. The huge amount of data collected can be used to analyze market trends and the behaviour of individuals or society. Data mining extracts knowledge from this massive pool of data, but in the process sensitive information about individuals may be disclosed, creating ethical and privacy issues. Many individuals therefore do not share their data publicly, which leaves data unavailable for analysis. The privacy of individuals should not be compromised under any circumstances. PPDM has gained popularity as a way to address these privacy concerns while data mining is carried out.
PRIVACY PRESERVING DATA MINING [PPDM]
Privacy preserving data mining is an area of data mining that protects sensitive information from unsolicited or unsanctioned disclosure. It consists of data mining techniques and methodologies that satisfy privacy constraints while preserving the utility of the data for mining. PPDM rests on a description of privacy that characterizes the attributes of the data: it specifies which attributes are sensitive and therefore subject to a confidentiality constraint [2, 3]. The block diagram of PPDM is shown in Figure 1.
Figure 1: Block diagram of PPDM
PRIVACY PRESERVING DATA MINING (PPDM)
In this section we focus on a number of methods that have recently been proposed for privacy preserving data mining. Several privacy preserving data mining technologies are surveyed in  and their pros and cons are analysed. In this paper, we present an overview of the state of the art in privacy preserving data mining. To preserve privacy, most methods apply some form of transformation to the data. Typically, such methods reduce the granularity of representation in order to protect privacy. This reduction in granularity results in some loss of effectiveness of the data and of the mining algorithms; this is the natural trade-off between information loss and privacy. Methods such as k-anonymity, l-diversity, t-closeness, and privacy-preserving classification and association rule mining are all designed to prevent the identification of individuals and so preserve the privacy of sensitive information. The application of several privacy-preserving techniques to an experimental dataset is illustrated in  and their effects on the results are reported.
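As a minimal illustration of the k-anonymity and l-diversity conditions, the Python sketch below measures both on a small, already generalized table. The table, its column names, and the helper functions are hypothetical, and the l-diversity check uses the simplest "distinct" variant (at least l different sensitive values in every group of records that share the same quasi-identifier values).

```python
from collections import defaultdict

def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same values on all quasi-identifiers."""
    classes = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        classes[key].append(rec)
    return classes

def k_of(records, quasi_identifiers):
    """Largest k for which the table is k-anonymous: size of the smallest class."""
    return min(len(c) for c in equivalence_classes(records, quasi_identifiers).values())

def l_of(records, quasi_identifiers, sensitive):
    """Largest l for distinct l-diversity: fewest distinct sensitive values in any class."""
    return min(len({r[sensitive] for r in c})
               for c in equivalence_classes(records, quasi_identifiers).values())

# Hypothetical released table: age and ZIP code are quasi-identifiers,
# disease is the sensitive attribute.
table = [
    {"age": "20-29", "zip": "476**", "disease": "flu"},
    {"age": "20-29", "zip": "476**", "disease": "flu"},
    {"age": "20-29", "zip": "476**", "disease": "cancer"},
    {"age": "30-39", "zip": "479**", "disease": "flu"},
    {"age": "30-39", "zip": "479**", "disease": "flu"},
    {"age": "30-39", "zip": "479**", "disease": "flu"},
]

qi = ["age", "zip"]
print(k_of(table, qi))             # 3: every class holds at least 3 records
print(l_of(table, qi, "disease"))  # 1: the 30-39 class has a single disease value
```

Here the table is 3-anonymous, yet every record in the 30-39 group has the same disease, so anyone who knows a person falls in that group learns the diagnosis; this homogeneity gap is what l-diversity is meant to close.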
Anonymization methods have emerged as an effective means of achieving privacy preservation. In these methods, part of the original data is transformed, for instance through generalization or compression, so that the transformed data cannot be combined with other information to infer private information about any individual. The implementation of privacy preservation concentrates on two aspects: (1) how to ensure that the data can be used without disclosing private information, and (2) how to keep the data as useful as possible. The central problem to be solved is therefore a trade-off between privacy preservation and data utility.
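Generalization and suppression can be sketched as follows, assuming a hypothetical table with age and ZIP code as quasi-identifiers and disease as the sensitive attribute. The generalization rules (decade ranges, three-digit ZIP prefixes), the function names, and the data are illustrative choices, not taken from any particular published algorithm.

```python
from collections import Counter

def generalize(record):
    """Coarsen the quasi-identifiers: age to a decade range, ZIP code to its first 3 digits."""
    decade = (record["age"] // 10) * 10
    return {
        "age": f"{decade}-{decade + 9}",
        "zip": record["zip"][:3] + "**",
        "disease": record["disease"],  # the sensitive attribute is left unchanged
    }

def anonymize(records, k):
    """Generalize every record, then suppress (drop) any record whose
    quasi-identifier combination occurs fewer than k times."""
    generalized = [generalize(r) for r in records]
    counts = Counter((r["age"], r["zip"]) for r in generalized)
    return [r for r in generalized if counts[(r["age"], r["zip"])] >= k]

# Hypothetical raw data before release.
raw = [
    {"age": 23, "zip": "47677", "disease": "flu"},
    {"age": 27, "zip": "47602", "disease": "cancer"},
    {"age": 29, "zip": "47678", "disease": "flu"},
    {"age": 35, "zip": "47905", "disease": "flu"},
    {"age": 52, "zip": "47909", "disease": "asthma"},
]

released = anonymize(raw, k=2)
for row in released:
    print(row)
```

With k = 2, the two records that remain unique after generalization are suppressed. Losing them is a small-scale instance of the trade-off between privacy preservation and data utility described above.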
Data Perturbation introduces random perturbations into individual values to preserve privacy before the data are published. These are statistically based techniques that protect confidential numerical attributes by adding random noise to them, thereby protecting the original data. Data perturbation is not encryption: in encryption, the data are modified, (typically) transmitted, received, and then decrypted back to the original values, whereas perturbed values are never restored. Instead, the intent of these techniques is to let legitimate users obtain important aggregate statistics (such as means and correlations) from the entire database while protecting the identity of each individual record.
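The sketch below illustrates additive noise perturbation on a hypothetical numerical attribute: each value receives independent zero-mean Gaussian noise, so individual values change while the mean is approximately preserved, and the original variance can be estimated by subtracting the known noise variance. The noise level `SIGMA`, the synthetic data, and the function name are assumptions made for illustration.

```python
import random
import statistics

def perturb(values, sigma, seed=None):
    """Add independent zero-mean Gaussian noise (standard deviation sigma) to each value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]

# Hypothetical confidential attribute (e.g. salaries in thousands), generated synthetically.
data_rng = random.Random(0)
salaries = [data_rng.uniform(30, 120) for _ in range(10_000)]

SIGMA = 20.0
noisy = perturb(salaries, sigma=SIGMA, seed=1)

# Aggregate statistics survive: the noise averages out over many records.
print(statistics.mean(salaries), statistics.mean(noisy))

# Because the noise is independent of the data, Var(noisy) is roughly
# Var(original) + SIGMA**2, so an analyst who knows SIGMA can estimate
# the original variance without seeing any original value.
print(statistics.variance(salaries), statistics.variance(noisy) - SIGMA**2)
```

The larger `SIGMA` is, the harder it becomes to guess an individual value, but the more records are needed for the aggregates to be estimated accurately.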
Distributed Privacy Preservation
In many cases, individual entities may wish to derive aggregate results from data sets that are partitioned across these entities. For this purpose, privacy preserving distributed data mining designs secure protocols that allow multiple parties to conduct collaborative data mining while protecting the privacy of their data. Such partitioning may be horizontal (when the records are distributed across multiple entities) or vertical (when the attributes are distributed across multiple entities). In this setting, the individual entities may consent to limited information sharing through a variety of protocols without sharing their entire data sets. The overall effect of such methods is to preserve privacy for each individual entity while deriving aggregate results over the entire data.
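One classic building block for horizontally partitioned data is the secure sum protocol, in which parties learn the total of their local values without revealing any single value. The Python sketch below simulates it in one process, assuming semi-honest parties arranged in a ring and no collusion between a party's two neighbours (colluding neighbours can recover the value of the party between them). The function name, the modulus, and the example counts are hypothetical.

```python
import random

MODULUS = 2**32  # must exceed any possible total so the sum cannot wrap around

def secure_sum(local_values, seed=None):
    """Simulate the secure-sum ring protocol in a single process.

    Party 0 (the initiator) masks its value with a random number r and sends
    the running total to party 1; every party adds its own value modulo
    MODULUS and forwards the result; finally the initiator removes r.
    Returns the total and the list of messages passed around the ring.
    """
    rng = random.Random(seed)
    r = rng.randrange(MODULUS)                 # initiator's secret mask
    running = (r + local_values[0]) % MODULUS
    messages = [running]                       # message received by party 1, and so on
    for value in local_values[1:]:
        running = (running + value) % MODULUS
        messages.append(running)
    total = (running - r) % MODULUS            # initiator unmasks the final total
    return total, messages

# Hypothetical example: three hospitals each hold a local patient count.
counts = [120, 340, 95]
total, messages = secure_sum(counts, seed=42)
print(total)  # 555
```

Each message is a partial sum shifted by a uniformly random mask, so on its own it tells the receiving party nothing about earlier values; only the initiator, who knows the mask, can recover the final total.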
The advantages and limitations of some of the PPDM techniques are tabulated in Table 1.
Technique | Advantages | Limitations
Anonymization | Secrecy of data is preserved. | More information loss.
Perturbation | Preserves various attributes independently. | Original data values cannot be regained.
Distributed Data Mining | An efficient technique; simple and supports large databases. Minimal information loss. | –
Cryptography | Data encryption and decryption using keys is accurate and improves security. | Complexity is proportional to the number of keys.
Table 1: Advantages and limitations of PPDM techniques
COMPARISON OF RECENT RESEARCH ON PPDM
Table 2 compares PPDM methods proposed for securing a data set during data mining, so that the data set can be transferred or exchanged with adequate security; it also covers approaches that are built on cryptosystems.
Authors | Year of Publication | Technique Used for PPDM | Result and Accuracy
Y. Lindell, B. Pinkas | 2000 | Sensitive data are encrypted at different levels using keys. | Complexity increases when more than a few keys are involved; the approach also does not hold good for large databases.
– | – | Information about an individual contained in a release cannot be distinguished from the information of at least k-1 other individuals. | Privacy is preserved at greater levels.
J. Vaidya and C. Clifton | 2002 | Data are vertically partitioned into segments. | –
Hillol Kargupta, Souptik Datta, Qi Wang and Krishnamoorthy Sivakumar | 2003 | Data privacy is preserved by adding random noise. | Randomization techniques are used to generate random matrices.
Charu C. Aggarwal, Philip S. Yu | 2004 | Condenses the data into multiple groups of predefined size; records within a group are not distinguishable. | Because the pseudo-data keep the original format, data mining algorithms need not be redesigned.
Slava Kisilevich, Lior Rokach, Yuval Elovici, Bracha Shapira | 2010 | Anonymization uses generalization and suppression for data hiding. | The k-anonymity algorithm does not protect an individual's sensitive values against background knowledge and homogeneity attacks.
P. Deivanai, J. Jesu Vedha Nayahi and V. Kavitha | – | A hybrid approach that combines different techniques to give an integrated result. | Uses anonymization and suppression to preserve data.
George Mathew, Zoran Obradovic | 2011 | A technical and methodological approach intended to provide judgemental knowledge. | A graph-based framework for preserving patients' sensitive information.
M. N. Kumbhar and R. Kharat | 2012 | Association rule mining over horizontally and vertically distributed data. | Different approaches in the field of association rule mining are reviewed, and all models are analyzed in terms of privacy, security and communication.
Savita Lohiya and Lata Ragha | 2012 | A combination of k-anonymity and randomization. | Higher accuracy, and the original data can be regained.
George Mathew, Zoran Obradovic | 2012 | Distributed privacy preservation: an algorithm to collaboratively build a better decision-making model. | Improves the overall accuracy of classification.
Shweta Taneja, Shashank Khanna, Sugandha Tilwalia, Ankita | 2014 | Cryptography, anonymization and perturbation: a tabular comparison of work done with different methods. | Cryptography and random data perturbation methods perform better than the other existing methods.
M. Antony Sheela, K. Vijayalakshmi | 2017 | Partition-based perturbation applied to vertically partitioned data. | When the threshold value is reached, the individual data are changed.
– | – | Anonymization-based techniques preserve privacy by reducing granularity. | Anonymization was not sufficient, so differential privacy was adopted instead.
Table 2: Review of different privacy preserving techniques of PPDM
CONCLUSION
Privacy is a major concern in protecting sensitive data in today's world. People are very anxious about sensitive information that they do not want to share. In this paper, our survey focuses on the existing literature in the field of Privacy Preserving Data Mining. The primary objective of PPDM is to develop algorithms that hide sensitive data or otherwise provide privacy in data mining. From our analysis, we found that no single PPDM technique outperforms every other technique on every criterion, such as data utility, performance, complexity, and compatibility with data mining procedures. Each method behaves differently depending on the type of data and the type of application or domain. Nevertheless, from our analysis, we conclude that Distributed data mining and Random Data Perturbation methods perform better than the other existing methods.
REFERENCES
[1] Alpa Shah and Ravi Gulati, "Privacy Preserving Data Mining: Techniques, Classification and Implications – A Survey", International Journal of Computer Applications, Vol. 137, No. 12, pp. 40-46, March 2016.
[2] AlShwaier and A. Z. Emam, "Data Privacy On E-Health Care System", International Journal of Engineering, Business and Enterprise Applications, 2013.
[3] Yang Xu, Tinghuai Ma, Meili Tang and Wei Tian, "A Survey of Privacy Preserving Data Publishing Using Generalization and Suppression", Appl. Math., Vol. 8, No. 3, pp. 1103-1116, 2014.
[4] Y. Li, B. Vinzamuri and C. K. Reddy, "Constrained Elastic Net Based Knowledge Transfer for Health Care Information Exchange", Data Mining and Knowledge Discovery, Vol. 29, No. 4, pp. 1094-1112, 2015.
[5] Jian Wang, Yongcheng Luo, Yan Zhao and Jiajin Le, "A Survey on Privacy Preserving Data Mining", First International Workshop on Database, 2009.
[6] O. Grljevic, Z. Bosnjak and R. Mekovec, "Privacy Preserving in Data Mining – Experimental Research on SMEs Data", IEEE 9th International Symposium on Intelligent Systems and Informatics (SISY), pp. 477-481, 2011.
[7] Y. Lindell and B. Pinkas, "Privacy Preserving Data Mining", Journal of Cryptology, 5(3), 2000.
[8] M. N. Kumbhar and R. Kharat, "Privacy Preserving Mining of Association Rules on Horizontally and Vertically Partitioned Data: A Review Paper", in proceedings of 978-1-46735116-4/12, IEEE 2012.
[9] L. Sweeney, "k-Anonymity: A Model for Protecting Privacy", International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002.
[10] J. Vaidya and C. Clifton, "Privacy Preserving Association Rule Mining in Vertically Partitioned Data", in The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, CA, July 2002, IEEE 2002.
[11] H. Kargupta, S. Datta, Q. Wang and K. Sivakumar, "On the Privacy Preserving Properties of Random Data Perturbation Techniques", in proceedings of the Third IEEE International Conference on Data Mining, IEEE 2003.
[12] C. Aggarwal and P. S. Yu, "A Condensation Approach to Privacy Preserving Data Mining", in proceedings of International Conference on Extending Database Technology (EDBT), pp. 183-199, 2004.
[13] A. Machanavajjhala, J. Gehrke, D. Kifer and M. Venkitasubramaniam, "l-Diversity: Privacy Beyond k-Anonymity", Proc. Int'l Conf. Data Eng. (ICDE), p. 24, 2006.
[14] Slava Kisilevich, Lior Rokach, Yuval Elovici and Bracha Shapira, "Efficient Multi-Dimensional Suppression for K-Anonymity", IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 3, pp. 334-347, March 2010.
[15] P. Deivanai, J. Jesu Vedha Nayahi and V. Kavitha, "A Hybrid Data Anonymization Integrated with Suppression for Preserving Privacy in Mining Multi Party Data", in proceedings of International Conference on Recent Trends in Information Technology, IEEE.
[16] G. Mathew and Z. Obradovic, "A Privacy-Preserving Framework for Distributed Clinical Decision Support", in proceedings of 978-1-61284852-5/11, IEEE 2011.
[17] A. Parmar, U. P. Rao and D. R. Patel, "Blocking Based Approach for Classification Rule Hiding to Preserve the Privacy in Database", in proceedings of International Symposium on Computer Science and Society, IEEE 2011.
[18] S. Lohiya and L. Ragha, "Privacy Preserving in Data Mining Using Hybrid Approach", in proceedings of 2012 Fourth International Conference on Computational Intelligence and Communication Networks, IEEE 2012.
[19] Martin Beck and Michael Marhöfer, "Privacy-Preserving Data Mining Demonstrator", in proceedings of 16th International Conference on Intelligence in Next Generation Networks, IEEE.
[20] George Mathew and Zoran Obradovic, "Distributed Privacy Preserving Decision System for Predicting Hospitalization Risk in Hospitals with Insufficient Data", in proceedings of 2012 11th International Conference on Machine Learning and Applications.
[21] Yuan Zhang and Sheng Zhong, "A Privacy-Preserving Algorithm for Distributed Training of Neural Network Ensembles", Neural Comput & Applic (2013) 22.
[22] Shweta Taneja, Shashank Khanna, Sugandha Tilwalia and Ankita, "A Review on Privacy Preserving Data Mining: Techniques and Research Challenges", International Journal of Computer Science and Information Technologies, Vol. 5 (2), pp. 2310-2315, 2014.
[23] Abel N. Kho, John P. Cashy, Karthryn L. Jackson, Adam R. Pah, Satyender Goel, Jorn Boehnke, John Eric Humphries, Scott Duke Kominers, Bala N. Hota, Shanon A. Sims, Bradley A. Malin, Dustin D. French, Theresa L. Walunas, David O. Meltzer, Erin O. Kaleba, Roderick C. Jones and William L. Galanter, "Design and Implementation of a Privacy Preserving Electronic Health Record Linkage Tool in Chicago", Journal of American Medical Informatics.
[24] V. Baby and N. Subhash Chandra, "Privacy-Preserving Distributed Data Mining Techniques: A Survey", International Journal of Computer Applications (0975-8887), Volume 143, No. 10, June.
[25] M. Antony Sheela and K. Vijayalakshmi, "Partition Based Perturbation for Privacy Preserving Distributed Data Mining", Cybernetics and Information Technologies, Volume 17, No. 2, 2017.