Preserving Data Privacy Through Secrecy views in Data Mining

DOI : 10.17577/IJERTV4IS100460




Sailekshmi B

Department of Computer Science and Engineering,

Mar Baselios College of Engineering & Technology

Kerala, India

Ashwini B.

Department of Computer Science and Engineering, Mar Baselios College of Engineering

Kerala, India

Abstract - Recent advances in communication technologies and in other fields such as biometrics have given rise to a new research area known as Privacy Preserving Data Mining (PPDM). It has been around for several years now and has been accepted because it permits the exchange of private or confidential data for study purposes. Various data mining algorithms incorporating these mechanisms have been developed that extract relevant information from massive amounts of data while hiding sensitive data from disclosure or inference. For privacy preservation, different techniques have been proposed, such as cryptographic techniques, k-anonymity, data perturbation and anonymization, but they suffer from problems such as linkage attacks, background-knowledge attacks, homogeneity attacks, integrity loss and information loss. The proposed framework is a combination of approaches for privacy preservation in data mining: it uses randomization based on a matrix of probability together with generalization, which may reduce integrity loss and information loss and prevents the attacker from identifying the pattern of the data.

Keywords Privacy Preserving Data Mining; k-anonymity; anonymization.

  1. INTRODUCTION

Nowadays different organizations, companies and industries collect and store very large amounts of data for their own needs. These huge amounts of data are then analysed to obtain relevant or useful information with the help of data mining. After this stage, the company or organization obtains useful data according to its needs, but such data may include private, confidential or sensitive information about individuals.

Privacy can be considered as the right to be let alone: the right to be free from surveillance and from unreasonable personal intrusions. Several situations resemble this condition. For example, statistical data collected by the census bureau from households and individuals is given to third parties for research purposes. If the personal details in the statistical data are not hidden, the third party can easily identify individuals and a privacy breach occurs. Privacy becomes a relevant issue whenever a data set contains confidential details of individuals.

To sort out this problem, PPDM was introduced as a method for preserving data privacy and has been widely accepted [16]. It permits private data to be shared for purposes such as analysis [1]. The main objective is to extract useful knowledge from a huge data set while at the same time protecting sensitive information. The two branches of this technique are knowledge hiding and data hiding. Both deal with hiding information: knowledge hiding focuses on hiding private knowledge, while data hiding focuses on modifying sensitive information or removing private data [3].

There exist several methods for PPDM, which can typically be branched into two: cryptographic techniques and non-cryptographic techniques [16]. Cryptographic techniques are those through which sensitive data can be encrypted. They are a preferred way of providing privacy to the information and are very successful because they give safety and security to confidential attributes. Non-cryptographic techniques include the k-anonymity technique, LKC-privacy techniques, etc. The main problem of non-cryptographic techniques is loss of information.

Privacy problems and information-loss problems are directly related, so a new way of preservation is needed to handle both. The proposed approach consists of two segments. In segment 1, randomization (data modification to provide privacy) is applied to the original data with the help of a matrix of probability. In the next segment, the values of the sensitive attribute in the randomized output are divided into two groups: confidential details and non-sensitive details [17]. Generalization is then applied only to the confidential details, to avoid over-anonymization (over-anonymization results in information loss and integrity loss). In this way, information loss and loss of integrity can be limited to a certain extent. When anonymization is done with the help of the matrix of probability and generalization, it becomes troublesome for an invader to attack.

The organization of this paper is as follows. Segment 2 lays out the literature review, Segment 3 explains the problem definition, Segment 4 covers the proposed approach, Segment 5 discusses the results, and the conclusion follows.

  2. LITERATURE REVIEW

There exist several methods for PPDM. The goal of this segment is to analyze current approaches in PPDM and to identify their drawbacks. Existing systems can be classified into cryptographic techniques and non-cryptographic techniques.

Anonymization: Anonymization is an approach for masking confidential data in the original or owner's record. It can be broadly classified into generalization and perturbation. An advantage over cryptographic methods is that it is easy to implement [6]. Its limitations are that it does not guarantee privacy and that sensitive data is not preserved properly, because knowledge attacks and homogeneity attacks on the k-anonymity algorithm can break the confidentiality of the information [6][20].

Cryptographic Technique: Cryptography is an approach through which sensitive data can be encrypted. It is the most popular approach for providing privacy to the information because it gives safety and security to sensitive attributes [1][20]. Its disadvantages are that it fails to protect the output data while computation takes place, and it does not give beneficial results when the data set is very large (it is challenging to employ this method for vast databases because, as the number of instances increases, the chance of error also increases). The final product may also break the confidentiality of a person-specific record, because the original data, though encrypted, is not otherwise protected [1][20].
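As an illustration only (the paper does not prescribe a particular cipher), the sketch below encrypts a sensitive column before the data is shared. It assumes the third-party Python `cryptography` package, and the record layout is hypothetical.

```python
# Illustrative sketch: encrypting a sensitive attribute before sharing.
# Assumes the third-party "cryptography" package; records are hypothetical.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # kept secret by the data owner, never published
cipher = Fernet(key)

records = [{"name": "Alice", "disease": "TB"},
           {"name": "Bob", "disease": "Lung Cancer"}]

for rec in records:
    # Only the ciphertext of the sensitive value is released.
    rec["disease"] = cipher.encrypt(rec["disease"].encode()).decode()

print(records)
```

As the paragraph above notes, encryption alone does not protect the output of computations performed on the data.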

Some non-cryptographic techniques, such as k-anonymity, data perturbation and l-diversity, are described here.

1. K-Anonymity

Place | DOB | Gender | Pin code | Disease
Vadassrekonam | 1956 | M | 695143 | Stomach Cancer
Puthenthope | 1982 | M | 695586 | Myocardial Infarction
Vadassrekonam | 1990 | M | 695143 | Stroke
Puthenthope | 1982 | M | 695586 | Myocardial Infarction
Vadassrekonam | 1956 | M | 695143 | Stomach Cancer
Vadassrekonam | 1990 | M | 695143 | Stroke

Table 2.1 An example for k-anonymity

The k-anonymity model can be considered as a framework for constructing algorithms and for evaluating systems and algorithms that release data. The released or publicly available data limits what can and cannot be revealed about the data entities. For example, to identify a person when the only data available is date of birth and place, there must be at least k people matching the same combination [2]; Table 2.1 shows an example.

Each data release must be such that no combination of values in any tuple can be used to identify individuals. In this method, the granularity of the data representation is reduced with the help of generalization and suppression. Advantages of this method are that data integrity (the consistency of the data) is maintained and that data granularity (the level of detail of the data) is reduced, which prevents indirect identification [2]. Its disadvantages are that dealing with a large number of quasi-identifiers can be problematic, and that it generalizes or suppresses quasi-identifier attributes (a minimal set of attributes that can be combined with additional information to re-identify individuals, e.g. demographic attributes such as pin code, age and gender) to protect the data, which reduces data quality [2]. There are also chances of temporal attacks (due to dynamic collection of data), unsorted matching attacks (due to the ordered arrangement of data) and complementary release attacks [2].
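As a minimal sketch of this idea, using rows shaped like Table 2.1, the quasi-identifiers can be generalized and then checked so that every quasi-identifier combination occurs at least k times. The particular generalization choices here (decade for DOB, 3-digit pin-code prefix, suppressed place) are illustrative assumptions, not parameters from the paper.

```python
from collections import Counter

# Rows shaped like Table 2.1: (place, dob, gender, pin_code, disease).
rows = [
    ("Vadassrekonam", 1956, "M", "695143", "Stomach Cancer"),
    ("Puthenthope",   1982, "M", "695586", "Myocardial Infarction"),
    ("Vadassrekonam", 1990, "M", "695143", "Stroke"),
    ("Puthenthope",   1982, "M", "695586", "Myocardial Infarction"),
    ("Vadassrekonam", 1956, "M", "695143", "Stomach Cancer"),
    ("Vadassrekonam", 1990, "M", "695143", "Stroke"),
]

def generalize(row):
    place, dob, gender, pin, disease = row
    # Illustrative generalization: suppress place, keep DOB decade, keep 3-digit pin prefix.
    return ("*", f"{dob // 10 * 10}s", gender, pin[:3] + "***", disease)

def is_k_anonymous(table, k):
    # Every quasi-identifier combination (all columns except the sensitive one)
    # must occur at least k times.
    counts = Counter(r[:-1] for r in table)
    return all(c >= k for c in counts.values())

generalized = [generalize(r) for r in rows]
print(is_k_anonymous(generalized, k=2))  # True for this toy table
```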

    2. Data Perturbation

Data perturbation is one of the most widely accepted approaches in privacy-preserving data mining. It customizes data using a random process: confidential data values are modified by adding or subtracting values or by applying other mathematical transformations [20]. An advantage of this family of methods is that geometric perturbation can preserve the most critical geometric properties of the data [3][18]. Its limitations are that multi-dimensional perturbation does not guarantee privacy (because it perturbs multiple columns in one transformation) and that, besides privacy preservation, accuracy preservation is also a problem [3].
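A minimal sketch of additive perturbation on a single numeric attribute is shown below, assuming zero-mean Gaussian noise with an arbitrary scale; the geometric, multi-column variants cited above work differently.

```python
import random

# Illustrative additive perturbation: add zero-mean noise to a numeric attribute.
# The noise scale (sigma) is an assumption, not a value from the paper.
ages = [33, 29, 21, 31, 60, 25]
sigma = 3.0

random.seed(1)
perturbed = [a + random.gauss(0, sigma) for a in ages]
print([round(p, 1) for p in perturbed])

# Aggregate statistics stay roughly usable while individual values are masked.
print(sum(ages) / len(ages), round(sum(perturbed) / len(perturbed), 2))
```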

    3. L-Diversity Algorithm

The l-diversity model was designed to handle some weaknesses of the k-anonymity model [19]. L-diversity provides privacy even when the data publisher does not know what information the attacker possesses [20]. Advantages of this method are that it does not require knowledge of the full distribution of the sensitive and non-sensitive attributes (which the previous method requires) and that the publisher does not need as much information as the attacker [5][20]. Its limitations are that it cannot be used for multiple sensitive attributes and that data quality is degraded when the data is high-dimensional.
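As a rough sketch, l-diversity can be checked by requiring every quasi-identifier group to contain at least l distinct sensitive values (the simplest reading of the criterion); the records below are hypothetical.

```python
from collections import defaultdict

# Illustrative l-diversity check: every quasi-identifier group must contain
# at least l distinct sensitive values. Records are (qi_tuple, sensitive_value).
records = [
    (("1950s", "M"), "Stomach Cancer"),
    (("1950s", "M"), "Stroke"),
    (("1980s", "M"), "Myocardial Infarction"),
    (("1980s", "M"), "Myocardial Infarction"),
]

def is_l_diverse(recs, l):
    groups = defaultdict(set)
    for qi, sensitive in recs:
        groups[qi].add(sensitive)
    return all(len(values) >= l for values in groups.values())

print(is_l_diverse(records, l=2))  # False: the 1980s group holds only one disease
```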

    4. Hybrid Approach

Privacy preservation is a very broad field. Many algorithms such as k-anonymity, data perturbation and the l-diversity algorithm have been proposed in order to secure data. A hybrid approach is one that combines two or more such approaches to preserve the data [20].

One hybrid technique that has been proposed is the combination of randomization and generalization [20]. In this case, the data is first randomized (with the help of random probabilities) and the modified (randomized) data is then generalized.

Another hybrid technique that has been proposed is the combination of anonymization and suppression. In this case, the data is anonymized and privacy is then given to the modified data.

Advantages of this technique are that it protects personal data with better accuracy (over-anonymization is avoided and, because of that, misrepresentation of the data is avoided); it can also rebuild the original data and provide data with no information loss [9][20].

Several other approaches can also be combined into a hybrid technique, such as data perturbation, the blocking-based method, cryptographic techniques and the condensation approach.

A limitation of the existing hybrid approaches is that they are very hard to employ for huge databases, because as the number of instances increases, the chance of error also increases [9][20].

  3. PROBLEM STATEMENT

Though different techniques exist in privacy preserving data mining, some shortcomings remain. Among them are integrity loss, information loss, temporal attacks, over-anonymization, identity linkage, attribute linkage, unsorted matching attacks, and problems due to contemporary attacks and large data sets. For example, if k-anonymity is used, it minimizes information loss but leaves chances of temporal attacks and unsorted matching attacks. To sort out these problems, a hybrid approach combining different techniques was introduced.

The main motive of this research work is to strengthen the hybrid approach of randomization and generalization, which partially removes the problems of k-anonymity, randomization and l-diversity. It increases the utility of the data and reduces the loss of information while at the same time providing privacy. The basic problem is that, since random values are used in the randomization stage, the output may be incorrect. The proposed approach offers a solution to this problem.

  4. PROPOSED APPROACH

The proposed approach is a hybrid one which uses the techniques of randomization and generalization. Roughly half of the problems mentioned above (those due to k-anonymity) can be addressed by performing the randomization with the help of a matrix of probability: when the matrix is calculated during randomization, the data patterns cannot be analysed by the invader.

Randomization is a traditional method for distorting data in privacy preserving data mining; the values in the attributes are covered up. Existing randomization methods are additive randomization, multiplicative randomization and micro-data randomization. The main idea behind them is to distort the data so that the invader cannot determine the data pattern. The proposed approach is a new way of doing randomization using a matrix of probability.

The proposed approach can be divided into two segments. In segment 1, randomization is performed on the original data with the help of the matrix of probability, which holds the probability of occurrence of each instance under given conditions. In the next segment, generalization is performed on the randomized output, i.e. the output of segment 1 is given to the next segment as input. Generalization is a process of anonymization used to hide sensitive values. A flowchart of the proposed solution is given in Fig 4.1.

Fig 4.1: Flowchart of proposed approach. (Original data set -> Identify the key attribute, sensitive attribute and quasi identifiers -> Calculate the matrix of probability and re-substitute -> Convert matrix in table format -> Classify the values of the sensitive attribute into two: confidential details and non-sensitive details -> Apply generalization to the confidential details -> Combine both tables (modified confidential details and original non-sensitive details) -> Derived table.)

1. Algorithm for segment 1
   Input: Original data set
   Output: Modified table

      Method:

      step 1. Identify the key attribute, sensitive attribute and quasi identifiers from the original table.

      step 2. Calculate the matrix of probability

1. Select quasi identifiers and sensitive attributes. (Let the number of quasi identifiers be n (n = n1, n2, ..., nn) and the number of records in the original table be j. There will be n matrices, each of size j*j.)

      2. Calculate the probability of occurrence of each instance under conditions in a matrix format.

3. Re-substitute the values.
   step 3. Convert the matrix into table format.

In segment 1, after selecting the quasi identifiers, confidential attributes and key attributes, the matrix of probability should be calculated. The matrix of probability gives the probability of occurrence of each instance under given conditions. For example, consider the following table, which is used purely as an illustration and should not be taken as an original table.

Quasi Identifiers: Age, Gender, Symptom 1, Symptom 2, Symptom 3; Sensitive Attribute: Disease

Age | Gender | Symptom 1 | Symptom 2 | Symptom 3 | Disease
33 | M | Cough | Chest pain | Breathing Trouble | Lung Cancer
29 | F | Cough | Weight loss | Coughing out blood | TB
21 | M | Cough | Fatigue | Breathing Trouble | Bronchitis
31 | M | upper abdominal pain | indigestion | vomiting | Chronic Gastritis
60 | M | Cough | Chest pain | Breathing Trouble | Lung Cancer
25 | F | upper abdominal pain | indigestion | vomiting | Gastritis

        Table 4.1 Medical data

There are five quasi identifiers: age, gender, symptom 1, symptom 2 and symptom 3. The aim here is to calculate the probability of occurrence of each instance, so each instance must be considered in turn.

Consider the 1st attribute, age, the 1st quasi identifier. Each value of age, i.e. 33, 29, 21, 31, 60 and 25, should be considered against every row, i.e. the probability of occurrence of each age should be calculated under the conditions given by each row's gender, symptom 1, symptom 2 and symptom 3. The probability of occurrence of age 33 under the conditions gender = M, symptom 1 = Cough, symptom 2 = Chest pain and symptom 3 = Breathing Trouble is calculated first. Then the next row is considered for the same age: the probability of occurrence of age 33 under the conditions gender = F, symptom 1 = Cough, symptom 2 = Weight loss and symptom 3 = Coughing out blood. Every row is treated similarly, down to the last one: the probability of occurrence of age 33 under the conditions gender = F, symptom 1 = upper abdominal pain, symptom 2 = indigestion and symptom 3 = vomiting.

Next, the 2nd value of age is considered against every row. The probability of occurrence of age 29 is calculated under the conditions gender = M, symptom 1 = Cough, symptom 2 = Chest pain and symptom 3 = Breathing Trouble, then under gender = F, symptom 1 = Cough, symptom 2 = Weight loss and symptom 3 = Coughing out blood, and so on for every row down to the last one.

Similarly, every value of age is treated against every row, up to the probability of occurrence of age 25 under the conditions gender = F, symptom 1 = upper abdominal pain, symptom 2 = indigestion and symptom 3 = vomiting. This is the procedure for calculating the matrix of probability for the 1st quasi identifier; each instance of every quasi identifier is handled in the same way.

The calculation of the probability of occurrence can be illustrated with an example. Consider the probability of occurrence of age 33 under the conditions gender = M, symptom 1 = Cough, symptom 2 = Chest pain and symptom 3 = Breathing Trouble.

Example for matrix of probability (matrix 1), entries rounded to four decimal places:

0.3333  0.3333  0.2667  0.1333  0.2667  0.0667
0.2667  0.0667  0.0667  0.2667  0.3333  0.3333
0.3333  0.1333  0.1333  0.1333  0.0667  0.0667
0.0667  0.0667  0.1333  0.0667  0.0667  0.0667
0.2000  0.0667  0.1333  0.2000  0.0667  0.0667
0.1333  0.1333  0.0667  0.0667  0.0667  0.1333

        Re-substitution

        step 1. To re-substitute the values for each quasi identifier N, consider the corresponding matrix.

step 2. Consider only the diagonal values of every matrix.
   a. Let N11 be the 1st diagonal value of the matrix: N11 = p(a) * p(b) * p(c) * ...
      For quasi identifier n1, consider p(a).
step 3. Compare it with the mapping table.
   1. If the value p(a) exists in the table, check whether that value is duplicated.
      1. If it occurs only once, simply give that answer.
      2. If not, consider all the answers corresponding to the duplicates:
         - numeric values -> generalize
         - strings -> use * instead

Example: let the matrix obtained from the selected original table be the following one:

N11  N12  N13  N14  N15  N16
N21  N22  N23  N24  N25  N26
N31  N32  N33  N34  N35  N36
N41  N42  N43  N44  N45  N46
N51  N52  N53  N54  N55  N56
N61  N62  N63  N64  N65  N66

Consider the diagonal values only, because the value corresponding to each table row lies on the diagonal of the matrix. Now consider each diagonal value. From the matrix calculation, N11 is the product of n1, n2, ..., n6, which are the probability values. For example, consider the values x1 to x6; x1 denotes the exact age.

N11 = n1 * n2 * n3 * n4 * n5 * n6
N11 = 1/6 * 4/6 * 4/6 * 2/6 * 3/6 * 2/6
p(1) = 1/6, the factor corresponding to the age value.
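As a check on this arithmetic, the sketch below computes the same diagonal product for the first row of Table 4.1, reading each factor as the relative frequency of that row's value within its column. This is one plausible reading of the construction, not necessarily the exact formula behind every entry of the example matrix above.

```python
from fractions import Fraction

# Table 4.1 rows: (age, gender, symptom1, symptom2, symptom3, disease).
rows = [
    (33, "M", "Cough", "Chest pain", "Breathing Trouble", "Lung Cancer"),
    (29, "F", "Cough", "Weight loss", "Coughing out blood", "TB"),
    (21, "M", "Cough", "Fatigue", "Breathing Trouble", "Bronchitis"),
    (31, "M", "upper abdominal pain", "indigestion", "vomiting", "Chronic Gastritis"),
    (60, "M", "Cough", "Chest pain", "Breathing Trouble", "Lung Cancer"),
    (25, "F", "upper abdominal pain", "indigestion", "vomiting", "Gastritis"),
]
n = len(rows)

def freq(col, value):
    # Relative frequency of a value within one column of the table.
    return Fraction(sum(1 for r in rows if r[col] == value), n)

def diagonal_entry(i):
    # Reading taken from the worked N11 example: the diagonal entry for row i
    # is the product of the relative frequencies of every value in that row.
    p = Fraction(1)
    for col in range(len(rows[i])):
        p *= freq(col, rows[i][col])
    return p

# Prints 1/243, i.e. 1/6 * 4/6 * 4/6 * 2/6 * 3/6 * 2/6 as in the N11 example.
print(diagonal_entry(0))
```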

Similarly, the diagonal values N22 to N66 should be calculated. Consider an example:

Let the matrix created for age be

j  k  l
m  n  o
p  r  q

Age and probability of occurrence: age 21 with probability 1/3; age 35 with probability 2/3.

Age = p(j) * p(n) * p(q)
    = p(occurrence of age 21 under the conditions gender = M and disease = HIV+) * p(occurrence of age 35 under the conditions gender = F and disease = Cancer) * p(occurrence of age 35 under the conditions gender = M and disease = HIV+)

p(j) = 1/3 * 2/3 * 2/3 -> 21
p(n) = 2/3 * 1/3 * 1/3 -> 35
p(q) = 2/3 * 2/3 * 2/3 -> 35

Similarly, calculate the matrix for gender. Then generalize the duplicated values:

Age 2/13 -> 54, 65, 21, 30 -> generalized to 20-60
Age 4/13 -> 73, 87, 90 -> generalized to 70-90
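A rough sketch of this re-substitution and generalization step follows, under the reading given in steps 2-3 above: the probability is looked up in a mapping table, a unique match is returned as it is, and duplicated matches are generalized (a min-max range for numbers, '*' for strings). The mapping data echoes the "Age 2/13" example; note that the paper rounds its ranges more coarsely (20-60, 70-90), and the helper names are ours.

```python
from fractions import Fraction

# Hypothetical mapping table: attribute value -> probability of occurrence.
mapping_table = {
    54: Fraction(2, 13), 65: Fraction(2, 13), 21: Fraction(2, 13), 30: Fraction(2, 13),
    73: Fraction(4, 13), 87: Fraction(4, 13), 90: Fraction(4, 13),
}

def resubstitute(prob):
    matches = [v for v, p in mapping_table.items() if p == prob]
    if len(matches) == 1:
        return matches[0]                        # unique: return the value itself
    if all(isinstance(v, int) for v in matches):
        return f"{min(matches)}-{max(matches)}"  # duplicated numbers: generalize to a range
    return "*"                                   # duplicated strings: suppress

print(resubstitute(Fraction(2, 13)))  # "21-65"
print(resubstitute(Fraction(4, 13)))  # "73-90"
```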

    2. Algorithm for next segment

Input: Converted table
Output: Derived table
Method:

    step 1. Select converted table.

    step 2. Classify the values of sensitive attributes into two: confidential details and non-sensitive details.

    step 3. Consider the confidential details and apply generalization to the details.

step 4. Combine both tables (modified confidential details and original non-sensitive details).

step 5. Output the derived table.

After completing segment 1, its output is given to the next segment as input. In traditional methods of PPDM, generalization is applied to the entire data set, which may result in information loss due to over-anonymization. In this work, only the highly sensitive information is considered and generalization is applied to that part alone. The rest of the data set is left as it is, because anonymization is not required there; if the entire data set were generalized, the non-confidential part would also be generalized, which is of no use. To avoid this problem, the sensitive attribute values are split into two groups: confidential details and non-confidential details [21]. Generalization is applied only to the confidential details, so that anonymization occurs only in that part. Generalization is a process of anonymization that makes records appear duplicated. In this work, generalization is done separately for strings and numeric values. For strings, each word is modified by replacing the letters at the alternate (second, fourth, ...) positions with asterisks, after finding the length of each word. For numeric values, the last three digits are hidden by replacing them with asterisks, after counting the digits in each value.
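A minimal sketch of this masking rule is given below, under the reading that "second alternative positions" means every second character of the word and that only the confidential values are passed through it; the helper names are ours.

```python
# Segment-2 generalization sketch: strings get every second letter replaced
# with '*'; numeric values get their last three digits masked. Applied only
# to confidential details.
def generalize_string(word):
    # Replace the letters at alternate (2nd, 4th, ...) positions with '*'.
    chars = list(word)
    for i in range(1, len(chars), 2):
        chars[i] = "*"
    return "".join(chars)

def generalize_number(value):
    # Hide the last three digits of the value.
    s = str(value)
    return s[:-3] + "***" if len(s) > 3 else "*" * len(s)

def generalize(value):
    return generalize_number(value) if str(value).isdigit() else generalize_string(str(value))

confidential = ["Lung Cancer", "TB", 695143]
print([generalize(v) for v in confidential])  # ['L*n* *a*c*r', 'T*', '695***']
```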

  5. RESULT AND DISCUSSION

The previous hybrid approach of randomization and generalization had a major drawback: it produced different answers for the same data set. This is because its probability matrix was created with random values that are not fixed; due to this random generation, every time a data set is checked it yields different values. Its main advantages are that the loss of information is reduced, data utility is increased and, at the same time, privacy is provided. The remaining problem is integrity loss: if the integrity and accuracy of the data are not protected correctly, the data is of no use. The figures below show the results for the same data set checked twice; the difference arises from the random value generation in the matrix of probability.

Fig 5.1 Random values of 1st output

Fig 5.2 Random values of 2nd output

From the graphs it is clear that there is a drastic change between the two runs, which finally leads to incorrect results.

This problem is sorted out by the proposed approach. Here the values of the matrix are calculated as actual probabilities: the random generation process is omitted and the original values are computed instead, so the values won't change for the same data set.

Fig 5.3 Output of Matrix of Probability for 1st and 2nd checking (for same data set)

The resulting matrix values are then re-substituted to obtain a converted table. After re-substitution, the modified (converted) table is given to segment 2 as input, where confidential and non-sensitive data are separated and generalization is applied only to the confidential data. The matrix obtained here is given below.

CONCLUSION

Privacy, confidentiality and security are primary concerns where data is involved. Nowadays society worries greatly about confidential information given to others for various reasons, and because of that many people are not ready to disclose information, which may result in false data sets. Though different techniques exist in privacy preserving data mining, some shortcomings remain, among them integrity loss, information loss, temporal attacks, over-anonymization, identity linkage, attribute linkage, unsorted matching attacks, and problems due to contemporary attacks and large data sets. For example, if k-anonymity is used, it minimizes information loss but leaves chances of temporal attacks and unsorted matching attacks. To sort out these problems, a hybrid approach combining different techniques was introduced. The main motive of this research work is to strengthen a hybrid approach of randomization and generalization which partially removes the problems of k-anonymity, randomization and l-diversity. It increases the utility of the data and reduces the loss of information while at the same time providing privacy. The remaining problem was that, since random values were used in the randomization stage, the output could be incorrect; the proposed approach offers a solution to this problem.

In future, instead of generalization in the second segment, k-anonymity, l-diversity and LKC-privacy can be applied to increase the data utility without replacing the first segment.

REFERENCES

  1. Y. Lindell and B. Pinkas, Privacy Preserving Data Mining,Journal of Cryptology, Vol. 15, No. 3, pp. 177-206, 2002.

2. L. Sweeney, "k-Anonymity: A Model for Protecting Privacy", International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems, pp. 557-570, 2002.

3. K. Chen and L. Liu, "Geometric Data Perturbation for Privacy Preserving Outsourced Data Mining", Proceedings of the International Conference on Data Mining (ICDM), IEEE, 2010.

  4. Z. Zhang and A. Mendelzon, Authorization Views and Conditional Query Containment, International Conference on Database Theory (ICDT'05), pp. 259-273, 2007.

5. A. Machanavajjhala, J. Gehrke and D. Kifer, "l-Diversity: Privacy Beyond k-Anonymity", ACM International Conference on Management of Data (SIGMOD), pp. 551-562, 2011.

  6. S.Shaik Parveen, Dr.C.Kavitha, Review on Anonymization, International Journal of Computers & Technology, Volume 3 No. 3, Nov-Dec 2012.

  7. J. Liu, J. Luo and J. Z. Huang, Rating: Privacy Preservation for Multiple Attributes with Different Sensitivity requirements, in proceedings of 11th IEEE International Conference on Data Mining Workshops, 2011.

  8. T. Jahan, G.Narsimha and C.V Guru Rao, Data Perturbation and Features Selection in Preserving Privacy in proceedings of Conference on Privacy Management of Data, 2012.

  9. H. Kargupta and S. Datta, Q. Wang and K. Sivakumar, On the Privacy Preserving Properties of Random Data Perturbation Techniques, in proceedings of the Third IEEE International Conference on Data Mining, 2003.

  10. Manish Sharma, Atul Chaudray, Manish Mathuria,Santhosh Kumar, An Efficient approach for privacy preserving in data mining, International Conference on Signal Propagations and computer technology(ICSPCT), 2014.

11. R. Agrawal and R. Srikant, "Privacy Preserving Data Mining", Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD), pp. 439-450, 2000.

12. J. Wang, Y. Luo, Y. Zha and J. Le, "A Survey on Privacy Preserving Data Mining", International Workshop on Database Technology and Applications, pp. 111-114, 2009.

13. V. S. Verykios, A. K. Elmagarmid, E. Bertino, Y. Saygin and E. Dasseni, "Association Rule Hiding", IEEE Transactions on Knowledge and Data Engineering, 16(4), pp. 434-447, 2004.

14. P. Samarati and L. Sweeney, "Protecting Privacy when Disclosing Information: k-Anonymity and its Enforcement through Generalization and Suppression", Technical Report, SRI International, 1998.

15. A. Agarwal and R. Srikant, "Privacy Preserving Data Mining", ACM SIGMOD International Conference on Knowledge Discovery and Data Mining, vol. 29, no. 2, pp. 50-57, 2004.

16. B. C. M. Fung, K. Wang, R. Chen and P. S. Yu, "Privacy-Preserving Data Publishing: A Survey of Recent Developments", ACM Computing Surveys, 2010.

17. X.-B. Li and S. Sarkar, "Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data", Information Systems Research, Sept 2006.

  18. Keke Chen. "Geometric data perturbation for privacy preserving outsourced data mining" , Knowledge and Information Systems, 2010.

19. A. Machanavajjhala, D. Kifer, J. Gehrke and M. Venkitasubramaniam, "l-Diversity: Privacy Beyond k-Anonymity", ACM Transactions on Knowledge Discovery from Data, Vol. 1, No. 1, March 2007.

  20. www.cse.psu.edu and www.ijcsit.com

21. S. Lohiya and L. Ragha, "Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining", Int. J. on Recent Trends in Engineering and Technology, Vol. 8, No. 1, Jan 2013.

  22. M. Young, The Technical Writers Handbook. Mill Valley, CA: University Science, 1989.
