A Review Report on Searching Techniques on Encrypted Data

Neha Goyal; Amninder Gill

doi:10.17577/IJERTV4IS070425

Volume 04, Issue 07 (July 2015)

A Review Report on Searching Techniques on Encrypted Data

DOI : 10.17577/IJERTV4IS070425

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 54
Total Downloads : 214
Authors : Neha Goyal, Amninder Gill
Paper ID : IJERTV4IS070425
Volume & Issue : Volume 04, Issue 07 (July 2015)
DOI : http://dx.doi.org/10.17577/IJERTV4IS070425
Published (First Online): 30-07-2015
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

A Review Report on Searching Techniques on Encrypted Data

Neha

PURCITM,

Department of Computer Science Punjabi University

Mohali, India

Amninder Gill

PURCITM,

Department of Computer Science Punjabi University

Mohali, India

Abstract Cloud computing has come as a storm in the streets of Information technology industry. Cloud computing provides a platform to store data on the virtual network. There are a number of cloud storage providers that are providing storage as a service to their users. But the privacy of data on this platform is the main hindrance in adopting cloud technology. To overcome this problem of data privacy, many encryption algorithms have been proposed to be implemented on the data before uploading it on the cloud. But if we talk about large database, then the problem of searching for a chunk of data from this large encrypted database is still there. A number of searching algorithm is there that are suitable for cloud platform, but they may fail in the case of progressive elliptic curve encryption. In this paper, we are comparing various searching techniques on encrypted data suitable for cloud platform.

Keywords Cloud Computing, Multi-Keyword, Ranking, Encrypted Phrase, Synonym Query, Fuzzy, Privacy-Preserving, Order-Preserving, Semantic Search.

INTRODUCTION

Cloud computing is the most innovative technology in the history of internet technology. Cloud computing gives you the power to keep your data on the cloud storage provider. But the security of your sensitive data i.e. the trustworthiness of cloud storage providers is the main hindrance in adopting this technology. Cryptography is probably the best solution to overcome this problem. There are many encryption techniques such as RSA, Elliptic curve cryptography and many more to encrypt your confidential/sensitive data before uploading this to cloud. But another problem of searching from this encrypted data is still there. Many Scientist had discovered various alternatives to search from this encrypted database. Every algorithm has its own specialty some are faster while others are powerful. To have an efficient algorithm it terms of system usability, performance and speed, it is important to go through number of searching.
RELATED WORK DONE

Searchable encryption is not a new construct. However, all current strategies have not been so successful in varied aspects that keep them from changing into common or thought. Even hierarchal word proximity searches, a quest that ranks results supported. However shut the question keywords are along, has not been totally enforced by previous analysis.

Song et al. [1] projected a searchable encoding theme supported an interchangeable key. The theme concerned demonstrably secure, question isolation for searches, controlled looking, and support hidden question. Drawbacks: Case inability, regular expression and sub matches don't seem to be supported. Speed of looking and total area needed is big.

Eu-Jin Goh [2] projected AN index looking algorithmic rule supported a bloom filter. Their theme reduced the machine overhead of loading AN index and sorting out files. Drawbacks: Bloom filters lead to false positives; change procedure lacks security analysis, Security model not satisfactory for mathematician searches, unclear experimental evaluation.

PEKS: Public encoding Keyword Search [3] makes use of public key cipher technique. This system focuses on refreshing keywords, removing secure channel and process multiple keywords. Drawbacks: List of keyword should be determined fastidiously so as to stay length of message down. Public key algorithms need giant prime numbers to be calculated so as to come up with usable keys, therefore this method is probably terribly time intense.

PKIS: sensible Keyword Index Search on cloud data centre
[4] focuses on cluster search over encrypted info. It involves 2 schemes: PKIS-1 and PKIS-2. PKIS give sensible, realistic, and secure solutions over the encrypted dB. Drawbacks: The common keywords in numerous documents surely cluster have identical index values that ends up in Brute Force attacks.

APKS: licensed personal keyword search over encrypted knowledge in cloud computing [5] deals with multi- keyword search. Multi-dimensional question are born-again to its CNF (Conjunctive traditional Form) formula and are organized in an exceedingly ranked approach. Drawbacks: APKS doesn't forestall keyword attack.

Multi-keyword hierarchal Search over Encrypted Cloud knowledge (MRSE) [6] uses "co-ordinate matching" principle

i.e. as several matches as potential, it's AN economical principle among multi-keyword linguistics to refine the result connection. Drawbacks: although keywords square measure protected by trapdoors server will do some applied mathematics analysis over search result. Server will generate trapdoor for set of any multi keyword trapdoor request.

This paper presents the novel technique that gives sub-word matching, exact-matches, regular expressions, language searches, frequency ranking, and proximity-based queries i.e. all forms of looking out that trendy search engines use and users expect to possess.
METHOD TO USE SEARCHING OVER ENCRYPTED DATA

3.1 Encrypted Phrase looking within the Cloud [7]
This technique permits each encrypted phrase searches and proximity hierarchal multi-keyword searches to encrypted datasets on untrusted cloud.
1. Overview
  
  Fig. 1: flow diagram of the planned encrypted phrase looking procedure.
  1. The consumer sends a plaintext search question to a sure client-side server.
  2. The Client-side server encrypts all keywords within the search question singly exploitation symmetric-key encryption; it then truncates the encrypted keywords to a set range of bits to improve security by permitting for collisions, and queries the untrusted cloud server for the documents containing the set of truncated encrypted keywords.
  3. The cloud server will an info question of its encrypted index and returns to the client-side server encrypted knowledge that corresponds to document methods, truncated encrypted keyword index offset, and encrypted keyword locations.
  4. The client-side server decrypts this knowledge 1st. From the fresh decrypted keyword index offset it will then verify that came results are literally for the keywords searched and that are merely collisions.
    
    It discards those collisions and filters and/or ranks the pertinent came documents supported relevant keyword locations and frequency. Finally, it sends this hierarchal listing to the first consumer.
2. Indexing
  
  Prior to sorting out a document , Associate in nursing encrypted index of the corpus should be generated by the sure client-side server. The index is then encrypted and sent to the untrusted cloud server. Every row within the encrypted index table corresponds to at least one document corpus. every row contains 2 columns: Associate in Nursing at random allotted distinctive document id (ID) and a specialized organization that contains truncated symmetric-key encrypted keywords related to encrypted versions of the keywords location in (Word Vectors). Additionally, this knowledge string contains Associate in nursing offset that's accustomed
  
  map the truncated encrypted keyword with its full version (stored on the trust client-side server). The scientific discipline keys used for each encryptions are a similar key which is used for decryption similarly. Solely the sure client-side serve has access to the price of . Once the encrypted index is transferred to the untrusted cloud server, Associate in nursing inverted index, based on the encrypted index, is generated by the cloud server to facilitate the looking speed of the index.
3. Keyword Truncation
  
  To improve the protection every encrypted keyword is truncated to a predefined range of bits . The sure client-side server creates a distinctive keyword truncation index price for every encrypted keyword. A table that's keep on the client-side server maps the truncated index price with the totally encrypted keyword. Every index is keep in conjunction with the keyword locations within the encrypted index. This string is encrypted exploitation AES and an irregular salt. As several multiple keywords currently map to a similar bits it makes any applied math frequency analysis attack way less helpful.
4. Searching
When the search begins, the consumer sends the question phrase with multiple keywords, k1 . . . , kn, to a client-side server, that concatenates the keywords to a listing, K. The client-side server then encrypts every k K exploitation in which the order of keywords is irregular. Every keyword in this list is truncated to bits to form the encrypted keyword list, K. The client-side server then transfers this encrypted question, K, to the untrusted cloud server.

The untrusted cloud server parses K into individual encrypted keywords k and, exploitation the inverted index, determines the documents, , that contain a k. The selected IDs, the keyword locations, and therefore the truncated keyword index, are sent back to the trusty client- facet server.

The client-side server parses the results that the cloud server returns, that embody the documents IDs, paths, and associated encrypted keywords, index, and encrypted locations l. Every li is decrypted to the truncation index and li. A proximity ranking perform R, hosted by the client- facet server and is employed to meaningfully rank the results.

Disadvantages:
1. Since it is not possible, previous to decoding, to see that keywords within the collision set were really being probe for, they need to all be came back, decrypted, and so filtered.
2. Doesn't support fuzzy keyword search.
3. Doesn't support word question.
4. Not appropriate for multi-user setting.
3.2.3.1 Creation of Encrypted Databases and Tables

To create an information and a table, the information application will issue the subsequent 2 statements.

Create information dbname

Create table tblname (colnm sort,… )

In the statement on top of, kind is that the information kind for the column. The statements area unit translated into the subsequent statements by the proxy. Additionally, the proxy records the schema of the created table in its information. Create information Hash (k, dbname) produce table Hash (k, tblname) (Hash (k, colnm + "EqIdx") String, Hash (k, colnm + "RngIdx") Num, Hash (k, colnm + "Enc") String,..)

That is, 3 columns area unit created for the column colnm. The column colnm + EqIdx have the kind String, since its values area unit continuously positional notation strings generated by secure hash functions. The values of column colnm + RngIdx area unit generated by our classification mechanism and have the numerical kind. The column colnm +

Enc for cipher text even have the kind String.
Data supporting word question [12]
The technique proposes a semantics-based multi- keyword graded search theme over encrypted cloud knowledge that supports word question. The search results is achieved once approved cloud customers input the synonyms of the predefined keywords, not the precise or fuzzy matching keywords, because of the attainable word substitution and/or her lack of tangible information concerning the info.

Fig. 6: design of the Multi-keyword graded search supporting word

question

Notations
- DC the plaintext document assortment, expressed as a collection of m documents DC = d1, d2, d3…, dm.
- C the encrypted type of DC hold on within the cloud server, expressed as C = c1, c2…cm.
- W the keyword wordbook, as well as n keywords, expressed as W = w1, w2…wn.
- I the searchable index tree generated from the total document set DC. (Each leaf node within the index tree is related to a document in DC.)
- Doctor of Divinity the index vector of document d for all the keywords in.
- Letter of the alphabet the question vector for the keyword set W.
- d The encrypted type of Doctor of Divinity. Rank function: In info retrieval, a ranking operate is typically wont to value relevant lots of matching files to asking. The ranking
  
  operate used here is TFÃ—IDF wherever TF (term frequency) denotes the prevalence of the term showing in the document, and military force (inverse document frequency) is usually obtained by dividing the full variety of documents by the quantity of files containing the term. That means, TF represents the importance of the term in the document and military force indicates the importance or degree of distinction in the whole document assortment. Every document is reminiscent of Associate in nursing index vector DD that stores normalized TF weight, and also the question vector letter of the alphabet stores normalized military force weight. every dimension of DD or letter of the alphabet is said to a keyword in W, and also the order is same thereupon in W, that is, Dd[i] is reminiscent of keyword Wisconsin in W. The notations used in similarity analysis operate are showed as follows:
- fd,j, the TF of keyword wj inside the document d;
- fj, the quantity of documents containing the keyword wj ;
- M, the full variety of documents within the document collection;
- N, the total variety of keywords in the keyword dictionary;
- wd ,j , the TF weight computed from fd, j;
- wq,j, the military force weight computed from N and fj; The definition of the similarity operate is as follows:
SC (Q ,Dd) = Where wq,j = 1+ ln fd,j . The normalized TF and military force weight are and severally and the vectors letter of the alphabet and DD ar unit vectors.Construction of keyword set extended by synonym:

Let N be the full variety of texts in corpus, let n be the quantity of texts containing the term i in corpus, let E1 be the quantity of texts within the largest class containing the term i, let E2 be the quantity of texts within the second largest class containing the term i. The new weight issue Cd is more to the formula of TFIDF, the improved formula is as follows:

Wik = TF*IDF*Cd = TF* *Cd = fik * *

So the keywords are extracted from every outsourced text document by mistreatment our improved methodology. All keywords extracted from identical one text type one keyword set, and every one subsets type the keyword set finally. All the outsourced text documents is expressed as follows:

We build a standard equivalent word synonym finder on the muse of the New Yankee Rogets school synonym finder (NARCT). NARCT is ablated in amount by US consistent with the subsequent 2 principles:
1. Choosing the common words;
2. Choosing the words that will be semantically substituted fully.
The made equivalent word set contains a complete of 6353 equivalent word teams when the reduction. The keyword set is extended by mistreatment our made equivalent word synonym finder. The new keyword set containing equivalent word is shown as follows:

Where s1 represents the equivalent word of kfi. If a keyword has 2 or additional synonyms, then all synonyms are more into the keyword set. The repetitive keywords are deleted to cut back the burden of storage. At last, a simplified keyword set and corresponding keyword evaluation table are made and used.

Disadvantage:

1. Doesn't support syntactical transformation, anaphora resolution and alternative linguistic communication process technology.
CONCLUSION

In this paper varied techniques to search out documents with needed keywords are mentioned. to form the search of needed documents additional correct these techniques is integrated which will give the user with easy supporting multi keyword search, fuzzy keyword search, search supported equivalent word words within the question, linguistics primarily based search. The efficient keyword serch would be provided using Ranked Search Algorithm based on proximity and similarity based ranking techniques. Also technique to deal with indexing and searching on encrypted databases and data shared between multi-users, multi- contributors scenarios is discussed in this paper. To provide confidentiality user authentication, user revocation and authentication of returned results is done using Fine- grained Owner-enforced Search Authorization. To provide better security at server technique that supports keyword truncation is used.

ACKNOWLEDGEMENTS

I would like to thank my guide Ms. Amninder Gill for helping me out in my research work.

REFERENCES

Dawn Xiaodong, Song David Wagner and Adrian Perrig, Practical Techniques for Searches on Encrypted Data in Proc. of IEEE Symposium on Security and Privacy00, 2000.
EuJin Goh, Secure indexes in the Cryptology ePrint Archive, Report 2003/216, March 2004.
Joonsang Baek, Reihaneh Safiavi-Naini, Willy Susilo, Public Key Encryption with Keyword Search Revisited, in Cryptology ePrint Archive, Report 2005/191, 2005.
Hyun-A Park, Jae Hyun Park and Dong Hoon Lee, PKIS: practical keyword index search on cloud datacenter, in EURASIP Journal on Wireless Communications and Networking 2011.
Li, M., S. Yu, N. Cao and W. Lou, Authorized private keyword search over encrypted data in cloud computing. in Proceedings of the 31st International Conference on Distributed Computing Systems, June 20-24, 2011, Minneapolis, MN, USA, pp: 383-392.
Ning Cao, Cong Wang, Ming Li, Kui Ren, and Wenjing Lou, Preserving Multi-keyword Ranked Search over Encrypted Cloud Data.
Steven Zittrower and Cliff C. Zou, Encrypted Phrase Searching in the Cloud in Globecom – Communication and Information System Security Symposium, 2012.
Dongxi Liu Shenlu Wang, Programmable Order-Preserving Secure Index for Encrypted Database Query, in IEEE Fifth International Conference on Cloud Computing, 2012.
Xingming Sun, Yanling Zhu, Zhihua Xia and Liahong Chang, Privacy- Preserving Keyword-based Semantic Search over Encrypted Cloud Data, in International Journal of Security and Its Applications Vol.8, No.3, pp.9-20, 2014.
Wenhai Sun, Ahucheng Yu, Wenjing Lou and Y. Thomas Hou, Verifiable Attribute-based Keyword Search with Fine-grained Owner-enforced Search Authorization in the Cloud, in IEEE Journal, 2013.
Bing Wang, Shucheng Yu, Wenjing Lou, Y. Thomas Hou, Privacy- Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud, in IEEE INFOCOM 2014 – IEEE Conference on Computer Communications.
Zhangjie Fu, Xingming Sun, Nigel Linge and Lu Zhou, Multi- keyword Ranked Search over Encrypted Cloud Data Supporting Synonym Query, in IEEE Transactions on Consumer Electronics, Vol. 60, No. 1, February 2014.
Ankit Doshi, Kajal Thakkar, Sahil Gupte and Anjali Yeole, A Survey on Searching and Indexing on Encrypted Data, in International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, Vol. 2 Issue 10, October 2013.
Xingming Sun, Lu Zhou, Zhangjie Fu and Jin Wang, Privacy- Preserving Multi-Keyword Ranked Search over Encrypted Data in the Cloud supporting Dynamic Update, in International Journal of Security and its Application (IJSIA), Vol.8, No.6, pp.1-16, 2014.
Zhihua Xia, Yanling Zhu, Xingming Sun, and Liahong Chen, Secure Semantic expansion based search over encrypted cloud data supporting similarity ranking, in Journal of Cloud Computing,2014.

A Review Report on Searching Techniques on Encrypted Data

Leave a Reply