A Survey on Privacy Preserving Data Mining Techniques

Cina Mathew

doi:10.17577/IJERTCONV7IS05006

NCACCT - 2019 (Volume 7 - Issue 05)

A Survey on Privacy Preserving Data Mining Techniques

DOI : 10.17577/IJERTCONV7IS05006

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 610
Total Downloads : 46
Authors : Cina Mathew
Paper ID : IJERTCONV7IS05006
Volume & Issue : NCACCT – 2019 (Volume 7 – Issue 05)
Published (First Online): 08-05-2019
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

A Survey on Privacy Preserving Data Mining Techniques

Cina Mathew

Assistant Professor

Kristu Jyothi College of management and Technology

Abstract:- The emerging privacy concern has become a major obstacle in storing and sharing of data. The proliferation of data can be useful, but it must be performed in a way that preserves user's privacy. This is not straightforward, because the proliferated data need to be protected against several privacy threats. Various algorithms have been designed for privacy-preserving data mining, that can be classified into three categories i.e., privacy by policy, privacy by statistics, and privacy by cryptography. We review algorithms like; Randomization, k-anonymization, and distributed privacy- preserving data mining etc., derive insights on their operation, and compare their advantages and disadvantages. We also provide a study of the computational and hypothetical boundaries involved with privacy-preservation over high dimensional data sets.

Keywords: PPDM, Anonymization, Perturbation, Cryptography

INTRODUCTION

Recent years have seen unprecedented growth in applicability of Computer Science in day-to-day activities. Organizations, community and individuals show an augmented trend of storing their data in cloud. The huge amount of data collected can be used for analyzing trends of markets and individual or society. Data mining activities involve extracting knowledge from this massive pool of data. The sensitive information about the individuals may be disclosed creating ethical or privacy issues. Many individual therefore dont share their data publicly, creating data unavailability. Privacy of individual should not be compromised under any case. PPDM has gained popularity so as to address the privacy concerns while data mining is being carried out [1].
PRIVACY PRESERVING DATA MINING [PPDM] Privacy preserving data mining is an area of data mining

that is used to protect sensitive information from unsolicited or unsanctioned disclosure. It consists of techniques and methodologies of data mining, which would be used to fulfil privacy constraint and it also maintains the utilization of data for data mining. Privacy preserving data mining is solely based on description of privacy that defines the different attributes of data. It depicts which attribute is sensitive and hence required to ensure confidentiality constraint [2, 3]. The block diagram of PPDM is shown in figure;

Figure 1: Blockdiagram of PPDM

PRIVACY PRESERVING DATA MINING (PPDM)

METHODS

In this section we focus on number of methods that have recently been proposed for privacy preserving data mining. A survey on several privacy preserving data mining technologies are studied in [5] and the pros and cons of these technologies are analysed. In this paper, we analyse an overview of the state-of-the-art in privacy preserving data mining. In order to perform the privacy preservation most methods for computations use some form of transformation on the data. Typically, such methods reduce the granularity of representation in order to reduce the privacy. This reduction in granularity results in some loss of effectiveness of data and mining algorithms. This is the natural trade-off between information loss and privacy. Methods such as k- anonymity, l-diversity, t-closeness, classification, association rule mining are all designed to prevent identification to preserve the privacy of sensitive information. The Application of several techniques for preserving privacy on experimental dataset is illustrated in

[6] and their effects on the results are revealed.

Anonymization Algorithms

Anonymization methods have emerged as an effective means to achieve privacy preservation. In these methods some part of the original data, for instance, through generalization, compression, etc., is transformed and let the transformed data cannot be combined with other information to reason about any personal privacy information. The implementation of privacy preservation mainly concentrates on two aspects: (1) How to ensure that the data been used without privacy disclosure? (2) How to make the data to be better utilized? So, the problem to be solved urgently is a trade-off between privacy preservation and data utilization.
Perturbation Techniques

Data Perturbation introduces random perturbation to individual values to preserve privacy before data are

published. These techniques are statistically based methods that seek to protect confidential data by adding random noise to confidential, numerical attributes, thereby protecting the original data. Data Perturbation techniques are not encryption techniques, where the data is first modified, then (typically) transmitted, and then received, decrypted back to the original data. But the intent of these techniques is to allow authentic users the capability to access important aggregate statistics (such as mean, correlations, etc.) from the entire database while protecting the individual identify of a record.
Distributed Privacy Preservation

In many cases, individual entities may wish to derive aggregate results from data sets which are partitioned across these entities. For this purpose, Privacy preserving

distributed data mining is used that aims to design secure protocols which allow multiple parties to conduct collaborative data mining while protecting the privacy of their data. Such partitioning may be horizontal (when the records are distributed across multiple entities) or vertical (when the attributes are distributed across multiple entities). In this the individual entities may consent to limited information sharing with the use of a variety of protocols and may not desire to share their entire data sets. The whole effect of such methods is to preserve privacy for each individual entity, while deriving aggregate results over the entire data.

The advantages and limitations of some of the PPDM techniques are tabulated in Table 1.

Technique	Advantages	Limitations
Anonymization Technique	Secrecy of data are preserved.	More information loss
Perturbation Technique	Preserves various attributes independently.	Original data values cannot be regained.
Distributed Data Mining	It is an efficient technique. Simple and supports large databases.	Minimal information loss.
Cryptography Technique	Data encryption and decryption using keys is accurate and improves security.	Complexity and number of keys are proportional.

Table1: Advantages and limitations of PPDM techniques

COMPARISON OF RECENT RESEARCHES ON PPDM

Table 2 shows the all available PPDM methods for data mining to secure the data set. When we are transferring or

exchanging the data set with fair enough security and also these methods ensures the various approaches which are being used to obtain the cryptosystem.

S. No	Authors	Year of Publication	Technique Used for PPDM	Approach	Result and Accuracy
1.	Y.Lindell, B.Pinkas [6]	2000	Cryptographic Technique	Sensitive data are encrypted in different levels using keys.	The complexity increases when more than a few keys are involved. Also, it does not hold good for large databases.
2	L. Sweeney[7]	2002	K- Anonymity	Information about an individual contained in a release cannot be distinguished from at least k-1 individuals information.	Privacy is Preserved at greater levels.
3	J. Vaidya and C. Clifton[8]	2002	Association Rule	Data are vertically distributed into segments.	Ensures privacy.
4	HillolKargupta, Souptik Datta, Qi Wang and Krishnamoorthy Sivakumar[9]	2003	Data Perturbation	Data Privacy is preserved by adding random noise.	Randomization Techniques are used to generate random matrices.


5	Charu C Aggarwal, Philip S. Yu[10]	2004	Condensation Approach	Condenses the data into multiple groups of predefined size. The different records are not distinguishable.	The use of pseudo-data no longer requires to redesign the data mining algorithms, since they have the original format.
6	SlavaKisilevich, Lior Rokach, Yuval Elovici, BrachaShapira[12]	2010	Anonymization	Anonymization uses generalization and suppression for data hiding.	Background knowledge and Homogeneity attacks of K-Anonymity algorithm do not preserve sensitivity of an individual.
7	P.Deivanai, J. JesuVedhaNayahi andV.Kavitha1[3]	2011	Hybrid Approach	Hybrid Approach is a combination of different techniques which combine to give an integrated result.	It uses Anonymization and suppression to preserve data.
8	George Mathew, Zoran Obradovic[14]	2011	Decision Tree	An approach which is technical, methodological and should give judgemental knowledge.	A graph-based framework for preserving patients sensitive information.
9	M. N. Kumbhar and R. Kharat[16]	2012	Association Rule By Horizontal and Vertical Distribution	Different approaches in the field of Association rule is reviewed.	The performance of all models is analyzed in terms of privacy, security and communications.
10	Savita Lohiya and LataRagha[17]	2012	Hybrid Approach	A combination of K- Anonymity and Randomization.	It has more accuracy and original data can be regained.
11	George Mathew, ZoranObradovic[19]	2012	Distributed Privacy Preserving	Provides an algorithm to collaboratively build a better decision-making model	It improves the overall accuracy of a classification model
12	Shweta Taneja, Shashank Khanna, SugandhaTilwalia, Ankita[21]	2014	Cryptography, Anonymization, Perturbation	A tabular comparison of work done by different methods.	Cryptography and Random Data Perturbation methods perform better than the other existing methods.
13	M. Antony Sheela, K. Vijayalakshmi[24]	2017	Partition Based Perturbation	Applied techniques on the vertically partitioned data.	When the threshold value is reached,the individual data is changed.
14	JalpeshVasa, PanthiniModi[25]	2018	t-closeness	Anonymization based techniques used to preserve privacy by reducing the granularity.	Wasnt that perfect, so opted differential privacy.

CONCLUSION

Privacy is the major concern to protect the sensitive data in today's world. People are very much anxious about their sensitive information which they dont want to share. In this paper our survey focuses on the existing literature present in the field of Privacy Preserving Data Mining. The primary objective of PPDM is promoting algorithm to hide sensitive data or offer privacy in data mining. From our analysis, we have found that that there is no single PPDM technique in existence that outshines every other technique with relation to each possible criterion such as use of data, performance, difficulty, compatibility with procedures for data mining, and so on. All methods perform in a different way depending on the type of data as well as the type of application or domain. But still from our analysis, we can conclude that Distributed data mining and Random Data Perturbation methods perform better than the other existing methods.
REFERENCES

Alpa Shah and Ravi Gulati, Privacy Preserving Data Mining: Techniques, Classification and Implications – A Survey, International Journal of Computer Application, Vol. 137 No 12, March 2016, 40-46.
AlShwaier and A. Z. Emam, Data Privacy OnEHealth Care System, International Journal of Engineering, Business and Enterprise Applications, (2013).
Xu, Yang, Tinghuai Ma, Meili Tang, and Wei Tian. "A survey of privacy preserving data publishing using generalization and suppression." Appl. Math 8, no. 3, pp. 1103-1116, (2014).
Y.Li, B.Vinzamuri, C.K.Reddy, Constrained elastic net based knowledge transfer for health care information exchange, Data Mining Knowl. Discov. 29 (4) (2015) 10941112.
Jian Wang, YongchengLuo ; Yan Zhao ; Jiajin Le, 2009,A Survey on Privacy Preserving Data Mining, First International Workshop on Database
Grljevic, O., Bosnjak, Z., Mekovec, R. 2011, Privacy preserving in data mining – Experimental research on SMEs data, IEEE 9th International Symposium on Intelligent Systems and Informatics (SISY), 2011 , pp- 477 481.

[7]	Y. Lindell, B.Pinkas, Privacy preserving data mining, in proceedings of Journal of Cryptology, 5(3), 2000.		A Review Paper, in proceedings of 978-1-46735116- 4/12/$31.00_c, IEEE 2012.
[8]	L. Sweeney, "k-Anonymity: A Model for Protecting Privacy, in	[18]	S. Lohiya and L. Ragha, Privacy Preserving in Data Mining
	proceedings of Int'l Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 2002.		Using Hybrid Approach, in proceedings of 2012 Fourth International Conference on Computational Intelligence and
[9]	J. Vaidya and C. Clifton, Privacy preserving association rule		Communication Networks, IEEE 2012.
	mining in vertically partitioned data, in The Eighth ACM SIGKDD International conference on Knowledge Discovery and	[19]	Martin Beck and Michael MarhÂ¨ofer, Privacy-Preserving Data Mining Demonstrator, in proceedings of 16th International
	Data Mining, Edmonton, Alberta, CA, July 2002, IEEE 2002.		Conference on Intelligence in Next Generation Networks, IEEE
[10]	H. Kargupta and S. Datta, Q. Wang and K. Sivakumar, On the Privacy Preserving Properties of Random Data Perturbation	[20]	2012. George Mathew, ZoranObradovic, "Distributed Privacy Preserving
	Techniques, in proceedings of the Third IEEE International		Decision System for Predicting Hospitalization Risk in Hospitals
[11]	Conference on Data Mining, IEEE 2003. C. Aggarwal , P.S. Yu, A condensation approach to privacy		with Insufficient Data", in proceedings of 2012 11th InternationalConference on Machine Learning and Applications
	preserving data mining, in proceedings of International	[21]	Yuan Zhang , Sheng Zhong, "A privacy-preserving algorithm for
	Conference on Extending Database Technology (EDBT), pp. 183 199, 2004. 746		distributed training of neural network ensembles", Neural Comput&Applic (2013) 22
[12]	A. Machanavajjhala, J.Gehrke, D. Kifer and M.	[22]	Shweta Taneja, Shashank Khanna, SugandhaTilwalia, Ankita, "A
	Venkitasubramaniam, "I-Diversity: Privacy Beyond k- Anonymity", Proc. Int'l Con! Data Eng. (ICDE), p. 24, 2006		Review on Privacy Preserving Data Mining : Techniques and Research Challenges", International Journal of Computer Science
[13]	SlavaKisilevich, LiorRokach, Yuval Elovici, BrachaShapira,		and Information Technologies, Vol. 5 (2) , 2014, 2310-2315
	Efficient Multi-Dimensional Suppression for K-Anonymity, inproceedings of IEEE Transactions on Knowledge and Data	[23]	Abel N. Kho, John P. Cashy, Karthryn L. Jackson, Adam R. Pah, SatyenderGoel, JornBoehnke, John Eric Humphries, Scott Duke
	Engineering, Vol. 22, No. 3. (March 2010), pp. 334-347, IEEE		Kominers, Bala N. Hota, Shanon A. Sims, Bradley A. Malin,
[14]	2010. P.Deivanai, J. JesuVedhaNayahi and V.Kavitha, A Hybrid Data		Dustin D. French, Theresa L. Walunas, David O. Meltzer, Erin O. Kaleba, Roderick C. Jones, Wiliam L. Galanter,"Design and
	Anonymization integrated with Suppression for Preserving Privacy in mining multi party data in proceedings of International		implementation of a privacy preserving in electronic health record linkage tool in chicago" journal of American Medical Informatics
	Conference on Recent Trends in Information Technology, IEEE		Association,(2015) 22(5)
[15]	2011. G. Mathew, Z. Obradovic, A PrivacyPreserving Framework for	[24]	V. Baby , N. Subhash Chandra , " Privacy-Preserving Distributed Data Mining Techniques: A Survey ", International Journal of
	Distributed Clinical Decision Support, in proceedings of 978-1-		Computer Applications (0975 8887) Volume 143 No.10, June
[16]	61284852-5/11/$26.00 Â©2011 IEEE. A. Parmar, U. P. Rao, D. R. Patel, Blocking based approach for	[25]	2016 M. Antony Sheela, K. Vijayalakshmi,"Partition Based Perturbation
	classification Rule hiding to Preserve the Privacy in Database , in		for Privacy Preserving Distributed Data Mining" CYBERNETICS
	proceedings of International Symposium on Computer Science and Society, IEEE 2011.	[26]	AND INFORMATION TECHNOLOGIES ,2017Volume17, No 2 Review of different privacy preserving techniques of PPDM
[17]	M. N. Kumbhar and R. Kharat, Privacy Preserving Mining of
	Association Rules on horizontally and Vertically Partitioned Data:

A Survey on Privacy Preserving Data Mining Techniques

Leave a Reply