Privacy Preserving of Contextual user Profiles in Search Engine Repository

S. Haripriya; R. Indumathi; V. M. Suresh

doi:10.17577/IJERTCONV3IS07026

NCICN - 2015 (Volume 3 - Issue 07)

Published (First Online): 30-07-2018
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


S. Haripriya
PG Scholar, Department of Information Technology, E.G.S. Pillay Engineering College, Nagapattinam

R. Indumathi
PG Scholar, Department of Information Technology, E.G.S. Pillay Engineering College, Nagapattinam

V. M. Suresh
Assistant Professor, Department of Information Technology, E.G.S. Pillay Engineering College, Nagapattinam

Abstract: Retrieving the most relevant information from the Web is difficult because of the huge number of documents available in various formats. One approach to satisfying user requirements is to personalize the information available on the Web, called Web personalization. Personalized web search (PWS) has been shown to improve the quality of web search services, but user privacy is the major obstacle to the wide proliferation of PWS. The proposed system implements the String Similarity Match (SSM) algorithm to improve the quality of search results. Current solutions to the privacy threat propose new mechanisms that introduce a high cost in terms of computation and communication; this paper presents a novel protocol specially designed to protect users' privacy against web search profiling. Personalized search is a promising way to improve the accuracy of web search. The approach performs runtime generalization and customization of user profiles, thus providing privacy while improving the quality of search services.

Keywords: Privacy protection, personalized web search, utility, risk, profile
1. INTRODUCTION
  
  Communication networks enable us to reach a very large volume of information in a minimal amount of time. Furthermore, that huge quantity of data can be accessed at any time and in any place with a capable device (e.g. a laptop, a PDA, etc.) and an Internet connection. Nowadays, both resources are easy to obtain, and in the future this will be even easier. However, useful information about a specific topic is hidden among all the available data, and it can be really challenging to find because it may be scattered around the World Wide Web.
  
  Web search engines (e.g. Google, Yahoo, Microsoft Live Search, etc.) are widely used to do this hard job for us. Of all Internet users, 84% have used a web search engine at least once, and for 32% web search engines are an essential tool for their everyday duties [1]. Among the different search engines, Google is the most used in the US, with 43.7% of all searches performed in 2006 [2]. Google improves its performance (it gives personalized search results) by storing a record of the sites visited and the past searches submitted by each user [3] (Web History).

  Those searches can reveal a lot of information about individual users or the institutions they work for. For example, imagine an employee of a certain company A who uses Google to obtain information about a certain technology. If company B, a direct competitor of A, learns of this, it can infer that this technology will be used in the new products offered by A. This knowledge gives B an important advantage over A. Another example arises when a person applies for a job: if the employer knows that the applicant has been looking for information about a certain disease, she can use this knowledge to choose another person for the job. In both examples, the attacker (the entity who gains an advantage over the other) benefits from the lack of a privacy-preserving mechanism between the user and the web search engine.
2. Offline privacy requirement customization,
3. Online query-topic mapping, and
4. Online generalization.

Normally, the user posts a query and retrieves the information from the server. In several systems, information is lost because of algorithmic inefficiency. The GreedyIL algorithm minimizes the information loss (IL) incurred while retrieving the information. The advantage of GreedyIL over GreedyDP is most evident in response time, because GreedyDP requires many more computations of the discriminating power (DP), which incur many logarithmic operations. The problem worsens as the query becomes more ambiguous; for instance, the average time GreedyDP needs to process queries in the ambiguous group is more than 7 seconds. In contrast, GreedyIL incurs a much smaller real-time cost and outperforms GreedyDP by two orders of magnitude. GreedyIL displays near-linear scalability and significantly outperforms GreedyDP.
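The greedy trade-off described above can be illustrated with a small sketch. It assumes, purely for illustration, that each candidate generalization step (pruning one topic from the profile) comes with a precomputed information-loss cost and a fixed, additive reduction in privacy risk. The class `PruneOp`, the function `greedy_min_il`, and the risk threshold are hypothetical names, and the actual GreedyIL metric of UPS is not reproduced here: the loop simply applies the risk-reducing step with the smallest information loss until the risk falls to the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PruneOp:
    """A hypothetical generalization step: pruning one topic node from the profile."""
    node: str
    info_loss: float       # assumed information-loss cost of pruning this node
    risk_reduction: float  # assumed (additive) drop in privacy risk after pruning


def greedy_min_il(candidates: list[PruneOp], risk: float,
                  threshold: float) -> tuple[list[str], float]:
    """Greedily prune the least-costly topics until risk <= threshold.

    Returns the pruned node names in order and the remaining risk.
    """
    # Only steps that actually lower the risk are worth taking.
    remaining = [c for c in candidates if c.risk_reduction > 0]
    pruned: list[str] = []
    while risk > threshold and remaining:
        best = min(remaining, key=lambda c: c.info_loss)  # smallest IL first
        remaining.remove(best)
        pruned.append(best.node)
        risk -= best.risk_reduction
    return pruned, risk


ops = [PruneOp("A", 0.5, 0.3), PruneOp("B", 0.1, 0.2),
       PruneOp("C", 0.2, 0.0), PruneOp("D", 0.3, 0.25)]
print(greedy_min_il(ops, risk=0.6, threshold=0.2))
```

Each iteration scans the remaining candidates once, so the cost stays low as long as the per-candidate score is cheap to compute; a selection key that is expensive to evaluate (such as one involving many logarithms) is what would slow a loop of this shape down.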

Algorithm for the Proposed System

Step 1: Detect and remove unwanted symbols.

Step 2: Compute the similarity between the user-given word and each word in the database.

Step 3: As part of this similarity calculation, extract the features in the dataset.

Step 4: Estimate the ASCII difference between the user-given word and the words in the database.

Step 5: Estimate the similarity values.

Step 6: Retrieve the most relevant documents based on the similarity values.
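The six steps above can be sketched in Python. The paper does not define the ASCII difference or how word scores are combined, so this sketch makes its own assumptions: the ASCII difference is the sum of absolute character-code differences position by position (padding the shorter word with code 0), word similarity is 1 / (1 + difference), and a document's score is the average, over query words, of each word's best match in the document. The feature extraction of Step 3 is reduced here to splitting cleaned text into words, and all function names are illustrative.

```python
import re
from itertools import zip_longest


def clean(text: str) -> list[str]:
    """Step 1: lower-case the text, replace unwanted symbols with spaces, split into words."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def ascii_difference(a: str, b: str) -> int:
    """Step 4 (assumed form): sum of per-position character-code differences.

    The shorter word is padded with NUL (code 0), so extra characters count in full.
    """
    return sum(abs(ord(x) - ord(y)) for x, y in zip_longest(a, b, fillvalue="\0"))


def word_similarity(a: str, b: str) -> float:
    """Step 5: map the difference into (0, 1]; identical words score exactly 1.0."""
    return 1.0 / (1.0 + ascii_difference(a, b))


def document_score(query_words: list[str], doc_words: list[str]) -> float:
    """Steps 2-3: average, over query words, of each word's best match in the document."""
    if not query_words or not doc_words:
        return 0.0
    best = [max(word_similarity(q, d) for d in doc_words) for q in query_words]
    return sum(best) / len(best)


def ssm_search(query: str, documents: dict[str, str],
               top_k: int = 3) -> list[tuple[str, float]]:
    """Step 6: rank documents by score and return the top_k (doc_id, score) pairs."""
    words = clean(query)
    scored = [(doc_id, document_score(words, clean(text)))
              for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


docs = {
    "d1": "Privacy in web search!",
    "d2": "Cooking recipes",
    "d3": "Personalized search engines",
}
for doc_id, score in ssm_search("web search", docs):
    print(doc_id, round(score, 3))
```

Because the comparison is position by position, a single inserted character shifts every later position and inflates the difference; an edit distance would be the usual alternative if that behaviour is undesirable.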

Fig. 2.1. Steps in Existing System

EXISTING SYSTEM

In the existing work, a client-side privacy protection framework called UPS was proposed for personalized web search. UPS could theoretically be adopted by any PWS that captures user profiles in a hierarchical taxonomy. The framework allowed users to specify customized privacy requirements via the hierarchical profiles. In addition, UPS performed online generalization on user profiles to protect personal privacy without compromising search quality. Two greedy algorithms, GreedyDP and GreedyIL, were proposed for the online generalization. The query mapping process involves several steps to compute the relevant items.
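To make the idea of a hierarchical profile and its generalization concrete, the sketch below stores a profile as a tree of topics and produces a generalized copy in which every topic the user marks as sensitive is cut off together with its subtree, so only its ancestors are exposed. This is a simplified, hypothetical model: UPS profiles also carry per-topic statistics, and its greedy algorithms decide what to prune from utility and risk rather than from a fixed list. The `Topic` class and the topic names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    """A node of a hierarchical user profile (hypothetical, simplified model)."""
    name: str
    children: list[Topic] = field(default_factory=list)


def generalize(node: Topic, sensitive: set[str]) -> Optional[Topic]:
    """Return a copy of the profile with every sensitive topic and its subtree removed."""
    if node.name in sensitive:
        return None
    kept = [g for g in (generalize(c, sensitive) for c in node.children) if g is not None]
    return Topic(node.name, kept)


def paths(node: Topic, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """List the root-to-leaf topic paths of a profile."""
    here = prefix + (node.name,)
    if not node.children:
        return [here]
    return [p for c in node.children for p in paths(c, here)]


profile = Topic("Root", [
    Topic("Arts", [Topic("Music")]),
    Topic("Health", [Topic("Fitness"), Topic("Disease", [Topic("Diabetes")])]),
])
print(paths(generalize(profile, {"Disease"})))
```

The generalized profile still reveals that the user is interested in Health, but no longer which disease; that remaining exposure is what the privacy-risk measure is meant to quantify.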

Most works on anonymization focus on relational data, where every record has the same number of sensitive attributes. A few works have taken the first step towards anonymizing set-valued or transactional data, where sensitive items or values are not clearly defined. While they could potentially be applied to user profiles, their main limitation is that they either assume a predefined set of sensitive items that need to be protected, which is hard to obtain in practice in the web context, or only guarantee the anonymity of a user without preventing the linking attack between a user and a potentially sensitive item.

Another approach to providing privacy in web searches is the use of a general-purpose anonymous web browsing mechanism. Simple mechanisms to achieve a certain level of anonymity in web browsing include (i) the use of proxies and (ii) the use of dynamic IP addresses.

The problem of privacy-preserving generalization in UPS is defined in terms of two notions, utility and risk. The former measures the personalization utility of the generalized profile, while the latter measures the privacy risk of exposing the profile.

3.5 Attack Model

Our work aims at providing protection against a typical model of privacy attack, namely eavesdropping. As shown in Fig. 3, to compromise Alice's privacy, the eavesdropper Eve successfully intercepts the communication between Alice and the PWS server by some means, such as a man-in-the-middle attack or by invading the server. Consequently, whenever Alice issues a query q, the entire copy of q together with a runtime profile G will be captured by Eve. Based on G, Eve will attempt to touch the sensitive nodes of Alice by recovering the segments hidden from the original profile H and computing a confidence for each recovered topic, relying on the background knowledge in the publicly available taxonomy repository R.

Fig. 3. Attack model of personalized web search.

Note that in our attack model, Eve is regarded as an adversary satisfying the following assumptions:

Knowledge bounded. The background knowledge of the adversary is limited to the taxonomy repository R. Both the profile H and privacy are defined based on R.

Session bounded. None of the previously captured information is available for tracing the same victim over a long duration. In other words, the eavesdropping starts and ends within a single query session.

The above assumptions seem strong but are reasonable in practice, because the majority of privacy attacks on the web are undertaken by automatic programs that send targeted (spam) advertisements to a large number of PWS users. These programs rarely act like a real person who collects extensive information about a specific victim over a long time, since the latter is much more costly.

If we consider the sensitivity of each sensitive topic as the cost of recovering it, the privacy risk can be defined as the total (probabilistic) sensitivity of the sensitive nodes that the adversary can probably recover from G. For fairness among different users, we can normalize the privacy risk by $\sum_{s \in S} \mathrm{sen}(s)$, the total sensitivity of the user's set S of sensitive nodes, which stands for the total wealth of the user. Our approach to privacy protection in personalized web search has to keep this privacy risk under control.
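Under the definitions above, the normalized risk can be written as $\mathrm{risk}(G) = \sum_{s \in S} \Pr(s \mid G)\,\mathrm{sen}(s) \,/\, \sum_{s \in S} \mathrm{sen}(s)$, where $\Pr(s \mid G)$ is the probability that the adversary recovers sensitive node $s$ from $G$. The short sketch below computes this value. How the recovery probabilities are estimated from the taxonomy repository R is not specified here, so they are passed in as an assumed input; the function name, topic names, and numbers are illustrative.

```python
def normalized_risk(sensitivity: dict[str, float],
                    recover_prob: dict[str, float]) -> float:
    """Expected sensitivity the adversary can recover, divided by the user's total wealth.

    sensitivity:  sen(s) for every sensitive node s in the user's profile.
    recover_prob: assumed probability that the adversary recovers s from G
                  (nodes missing from this dict are treated as unrecoverable).
    """
    for topic, p in recover_prob.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {topic!r} must lie in [0, 1]")
    total_wealth = sum(sensitivity.values())
    if total_wealth == 0:
        return 0.0  # nothing sensitive, so nothing at risk
    exposed = sum(sen * recover_prob.get(s, 0.0) for s, sen in sensitivity.items())
    return exposed / total_wealth


print(normalized_risk({"Diabetes": 3.0, "Debt": 1.0}, {"Diabetes": 0.5}))  # 0.375
```

As long as sensitivities are non-negative, the result lies in [0, 1] regardless of how many sensitive topics a user has, so a single risk threshold can be applied across users, which is the fairness argument made above.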
PROPOSED SYSTEM

Web search engines (e.g. Google, Yahoo, Microsoft Live Search, etc.) are widely used to find specific data among a huge amount of information in a minimal amount of time. However, these useful tools also pose a privacy threat to their users: web search engines profile their users by storing and analyzing the past searches they submit. The proposed system applies clustering algorithms to improve the quality of search results, which are retrieved using the String Similarity Match (SSM) algorithm. Current solutions to this privacy threat propose new mechanisms that introduce a high cost in terms of computation and communication. In this paper, we present a novel protocol specially designed to protect users' privacy against web search profiling.

We also try to resist adversaries with broader background knowledge, such as richer relationships among topics. Here, a richer relationship means that the user profile is generalized using the background knowledge stored in the search history, which allows the user's search results to be hidden. In the existing system, the GreedyIL and GreedyDP algorithms incur large computation and communication time.

Advantages
- It achieves better search results.
- It preserves privacy when background knowledge is applied to the user profiles.
- It requires less computation and communication time.
- It achieves better accuracy than existing works.
CONCLUSION AND FUTURE ENHANCEMENTS

Privacy protection in publishing transaction data is an important problem. This paper presented a client-side privacy protection framework called SSM for personalized web search. SSM could potentially be adopted by any PWS that captures user profiles in a hierarchical taxonomy. The framework allows users to specify customized privacy requirements via the hierarchical profiles. In addition, SSM performs online generalization on user profiles to protect personal privacy without compromising search quality. We proposed the String Similarity Matching algorithm for the online generalization. Our experimental results revealed that SSM could achieve quality search results while preserving users' customized privacy requirements. The results also confirmed the effectiveness and efficiency of our solution.

Our proposed system gives better-quality results and is more efficient. Its privacy protection is stronger than that of the existing system, which uses only the generalization technique; combining generalization and suppression achieves better privacy. Our string matching algorithm gives higher accuracy than the GreedyIL algorithm. In future work, we can implement a hierarchical divisive approach for retrieving the search results, which is expected to give better performance than our proposed system. We will also try to resist adversaries with broader background knowledge, such as richer relationships among topics (e.g., exclusiveness, sequentiality, and so on), or the capability to capture a series of queries from the victim. We will also seek more sophisticated methods to build the user profile, and better metrics to predict the performance (especially the utility) of UPS.

REFERENCES

[1] D. Fallows, "Search engine users: Internet searchers are confident, satisfied and trusting, but they are also unaware and naive," Pew Internet & American Life Project, 2005.
[2] D. Sullivan, "comScore Media Metrix search engine ratings," comScore, 2006. Available: http://searchenginewatch.com
[3] Google History, 2009. Available: http://www.google.com/history
[4] P. Agouris, J. Carswell, and A. Stefanidis, "An environment for content-based image retrieval from large spatial databases," ISPRS J. Photogramm. Remote Sens., vol. 54, no. 4, pp. 263-272, 1999.
[5] M. Atallah and K. Frikken, "Securely outsourcing linear algebra computations," in Proc. 5th ASIACCS, 2010, pp. 48-59.
[6] M. Atallah and J. Li, "Secure outsourcing of sequence comparisons," Int. J. Inf. Security, vol. 4, no. 4, pp. 277-287, 2005.
[7] M. Atallah, K. Pantazopoulos, J. Rice, and E. Spafford, "Secure outsourcing of scientific computations," Adv. Comput., vol. 54, pp. 216-272, Feb. 2001.
[8] D. Benjamin and M. Atallah, "Private and cheating-free outsourcing of algebraic computations," in Proc. Conf. PST, 2008, pp. 240-245.
[9] E. Candès, "The restricted isometry property and its implications for compressed sensing," Comptes Rendus Mathematique, vol. 346, nos. 9-10, pp. 589-592, 2008.
[10] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489-509, Feb. 2006.
[11] E. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203-4215, Dec. 2005.
[12] E. Candès and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies," IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406-5425, Dec. 2006.
[13] E. Candès and M. Wakin, "An introduction to compressive sampling," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21-30, Mar. 2008.
[14] Security Guidance for Critical Areas of Focus in Cloud Computing, 2009. [Online]. Available: http://www.cloudsecurityalliance.org
[15] K. Ramanathan, J. Giraudi, and A. Gupta, "Creating hierarchical user profiles using Wikipedia," HP Labs, 2008.
[16] K. Järvelin and J. Kekäläinen, "IR evaluation methods for retrieving highly relevant documents," in Proc. 23rd Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2000, pp. 41-48.
[17] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison Wesley Longman, 1999.
[18] X. Shen, B. Tan, and C. Zhai, "Privacy protection in personalized search," SIGIR Forum, vol. 41, no. 1, pp. 4-17, 2007.
[19] Y. Xu, K. Wang, G. Yang, and A. W.-C. Fu, "Online anonymity for personalized web services," in Proc. 18th ACM Conf. Information and Knowledge Management (CIKM), 2009, pp. 1497-1500.
[20] Y. Zhu, L. Xiong, and C. Verdery, "Anonymizing user profiles for personalized web search," in Proc. 19th Int'l Conf. World Wide Web (WWW), 2010, pp. 1225-1226.
[21] J. Castellà-Roca, A. Viejo, and J. Herrera-Joancomartí, "Preserving user's privacy in web search engines," Computer Comm., vol. 32, no. 13/14, pp. 1541-1551, 2009.
[22] A. Viejo and J. Castellà-Roca, "Using social networks to distort users' profiles generated by web search engines," Computer Networks, vol. 54, no. 9, pp. 1343-1357, 2010.
[23] X. Xiao and Y. Tao, "Personalized privacy preservation," in Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD), 2006.
[24] J. Teevan, S. T. Dumais, and D. J. Liebling, "To personalize or not to personalize: Modeling queries with variation in user intent," in Proc. 31st Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2008, pp. 163-170.
