A Fast, Secure, Efficient Image Retrieval Framework with User Feedback Support Based on Color Features

When designing an image retrieval framework based on content-based techniques, a critical aspect is the transfer of visual data: it opens a Pandora's box of data privacy issues and introduces a retrieval performance bottleneck due to added network transfer latency. The approach suggested here elaborates an enhanced privacy protection scheme: first, by searching on robust hash values of features extracted from images, so the original content is never revealed; second, by omitting random bits (of random length and position) from the search client's query hash to increase ambiguity for the image database server. It also lessens network latency by limiting server-client data transfer to variable-sized candidate image sets. The search algorithm is made effective by using a combination of local and global color features, so that even local spatial information is not lost. To lessen computational complexity during search, a fusion of the fuzzy color histogram with block color moments is used to decrease the color feature dimension. A basic Relevance Feedback module is incorporated to capture users' feedback on retrieval results and, in turn, return better results to users.

Keywords—Image Retrieval; Feature Extraction; Color Features; Fuzzy Histogram; Relevance Feedback; Image Hashing; Color Moment; Data Privacy.

I. INTRODUCTION
Multimedia technology and digital image databases are growing rapidly, in database size, image quality, and the variety of image sources. Hence there is an inherent demand for efficient image retrieval. There are two hurdles, though: 1. the risk of privacy leakage, and 2. computational complexity. Image retrieval should be secure and fast, i.e., relatively unaffected by network latency. These two aspects should be considered very carefully while designing any approach to image retrieval. Here I consider retrieval based on the content of the images only, i.e., Content-Based Image Retrieval, or CBIR for short. Three properties, color, shape and texture, are said to constitute the content of an image; thereby, CBIR is a strategy of recovering images similar to a supplied image w.r.t. content. In the system described in this paper, I have considered an environment where the image database owner (remote storage) and the database user (search client / query user) are different parties, not necessarily trusting each other; hence the privacy issues. The key players in this environment are: a private database, a private query, and a private CBIR technique. The common approach to the privacy problem is to store images in encrypted form and design a retrieval algorithm over the encrypted search domain [1], [2]. As such approaches rely on complicated cryptographic computation, they are costly. My approach inclines toward SRR [3]; hence it can be used with large databases, offers a privacy cover, and gives adjustable control over both privacy and computation cost. It is essentially an SRR scheme with robust hashing as a key component.
The proposed CBIR technique uses robust hashing of color features extracted from images to protect privacy. To begin with, image features are extracted, normalized, and hashed into a binary vector. Users are allowed to omit arbitrary bits of random length and positions; thereby, the query user has the option to choose a privacy vs. search speed trade-off. Once the query is sent, the image database calculates the possible candidate matches and returns them. The client then trims down the final search result based on content-similarity matching of a fusion of fuzzy features [4], decreasing computational time. I also incorporated a Relevance Feedback module to capture the users' feedback on retrieval results and then re-sort, update, and return them as final results to the users.
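The hashing and bit-omission steps described above can be sketched as follows. This is a minimal illustrative Python sketch, not the paper's exact hash: the median-threshold binarization, the wildcard encoding, and the function names are my own assumptions.

```python
import numpy as np

def robust_hash(features):
    """Binarize a feature vector by thresholding each normalized
    component against the vector's median (illustrative scheme)."""
    features = np.asarray(features, dtype=float)
    span = features.max() - features.min()
    # Normalize to [0, 1] so the threshold is scale-invariant.
    norm = (features - features.min()) / span if span > 0 else np.zeros_like(features)
    return (norm >= np.median(norm)).astype(np.uint8)

def omit_bits(hash_bits, n_omit, rng=None):
    """Replace n_omit randomly chosen bit positions with a wildcard (-1),
    producing the partial query that is sent to the server."""
    rng = rng or np.random.default_rng()
    partial = hash_bits.astype(np.int8).copy()
    positions = rng.choice(len(partial), size=n_omit, replace=False)
    partial[positions] = -1  # -1 marks an omitted (unknown) bit
    return partial, sorted(positions.tolist())

h = robust_hash([0.1, 0.9, 0.4, 0.7, 0.2, 0.8])
partial, omitted = omit_bits(h, n_omit=2)
```

A larger `n_omit` yields more ambiguity for the server (more privacy) at the cost of a larger candidate set, which is exactly the trade-off offered to the query user.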
II. RELATED WORKS
Content-based image retrieval is a much-studied topic. Its importance is felt when one considers its impact on various fields like digital image processing, medical imaging, diagnostic radiology, and defense monitoring. Most of the articles I reviewed are based on color and texture features. Some of them are discussed below:

A. On Color Features
Sharma, Rawat & Singh [5], 2011, discussed the importance of the color histogram for image database indexing and retrieval. In this process, all image pixels are counted, and color distribution is tracked by associating each quantized color value with a specific bin. They advise checking the similarity of images by comparing the obtained histograms via histogram intersection. This image descriptor is both simple to describe and easy to compute.
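The quantized histogram and intersection comparison described in [5] can be illustrated with a short Python sketch; the 8-level-per-channel quantization and the function names are illustrative assumptions, not the cited paper's exact parameters.

```python
import numpy as np

def color_histogram(pixels, bins=8):
    """Quantize each RGB channel into `bins` levels and count pixels per
    quantized color, giving a normalized global color histogram."""
    pixels = np.asarray(pixels)                   # shape (N, 3), values 0-255
    q = (pixels * bins // 256).clip(0, bins - 1)  # per-channel quantization
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()                      # normalize to a distribution

def intersection(h1, h2):
    """Histogram intersection similarity: 1.0 for identical histograms."""
    return float(np.minimum(h1, h2).sum())
```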
The work by Mangijao & Hemachandran [6], 2012, suggests improving the discriminative power of color-histogram indexing techniques by dividing the image horizontally into three equal non-overlapping regions, then extracting the first three color moments from each region, storing 27 floating-point numbers in the index of the image.
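The 27-number block index of [6] can be sketched as follows. The skewness formulation (cube root of the third central moment) follows the usual color-moment definition; the function name and region split along the height are my own illustrative choices.

```python
import numpy as np

def block_color_moments(image):
    """Split the image into three equal horizontal regions and compute the
    first three color moments (mean, std, skewness) per RGB channel:
    3 regions x 3 channels x 3 moments = a 27-dim index."""
    image = np.asarray(image, dtype=float)      # shape (H, W, 3)
    regions = np.array_split(image, 3, axis=0)  # three horizontal bands
    feats = []
    for region in regions:
        for c in range(3):
            ch = region[:, :, c].ravel()
            mean = ch.mean()
            std = ch.std()
            # Cube root of the third central moment (skewness measure).
            skew = np.cbrt(((ch - mean) ** 3).mean())
            feats.extend([mean, std, skew])
    return np.array(feats)
```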
Stricker & Orengo [7], 1995, long ago provided the algorithm to calculate color moments, and showed that an image's color distribution can be interpreted as a probability distribution characterized by its color moments.

B. On Outsourced Image Privacy Aspects
Earlier approaches supporting outsourced storage, search, and retrieval of images can be broadly divided into two classes: those based on Searchable Symmetric Encryption (SSE) and those based on Public-Key Partially-Homomorphic Encryption (PKHE).
Z. Xia et al. [14], 2015, represent the SSE-based solutions. The client encrypts the data and creates an encrypted index before outsourcing; both the encrypted index and the data are outsourced. This allows searching in an efficient and secure way. The limitations are the need to build and encrypt the index locally, entailing additional computational power, and the transfer of additional data (the encrypted index) to the cloud.
Zheng et al. [15], 2015, follow the other approach, PKHE, using schemes such as ElGamal [16] that permit homomorphic operations in the encrypted domain. The client processes images pixel by pixel under a PKHE encryption scheme, and the cloud indexes the encrypted images. The issues with this are greater time and space complexities and limited scalability.
Li Weng, Laurent Amsaleg, April Morton and Stéphane Marchand-Maillet [3], 2015, proposed a privacy-protecting framework for large-scale CBIR using robust hashing instead of encryption. My approach is built upon this very idea.

C. On Fuzzy Features
K. Konstantinidis, A. Gasteratos and I. Andreadis [17], 2005, proposed replacing classical color histogram creation with histogram linking, reducing computationally expensive 3-D histograms to a single one-dimensional histogram, though their method was based on the L*a*b* color space.
Mengzhe Li & Xiuhua Jiang [4], 2016, describe a highly effective image retrieval algorithm based on the fusion of a global fuzzy color feature algorithm and a local color algorithm in a low feature dimension.

III. PROPOSED SYSTEM & WORKING PRINCIPLE
Here, a scalable CBIR system has been considered. There are two primary entities: 1. the image data owner (search server) and 2. the search client, or query user. Elaborated in a step-by-step fashion, the discussed paradigm has six parts:

A. System Model
• The client submits a partial query to the server (details are removed to create ambiguity).
• The server creates an extended query list from the supplied partial query (calculating all possible combinations for the missing binary bits).
• The server performs a search with the extended query list and sends back all matching items (the 'Candidate list').
• The client matches against the received results using the original query and the fuzzy features.
• The client provides relevance feedback if he/she is not happy with the search performance.
• The feedback is taken into account while the similarity match check is performed again with modified parameters (here, I used a simple statistical measure, the mean feature vector of the original matches, to perform a refined search for new matches).
In this approach, the server can narrow down the search scope using the partial query, while it remains difficult for the server to infer the original query. The framework makes sure the Candidate Set is large, but the client should still be able to find the final matches. The client is presented with the option to choose how much ambiguity to introduce through the partial query; hence, the size and diversity of the Candidate Set can be controlled.
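The second and third steps, expanding a partial query into every full hash consistent with it and matching those against the database, can be sketched in Python. The wildcard encoding (-1 for an omitted bit) and the function names are illustrative assumptions.

```python
from itertools import product

def expand_partial_query(partial):
    """Enumerate every full hash consistent with a partial query,
    where -1 marks an omitted bit (2**k expansions for k omitted bits)."""
    unknown = [i for i, b in enumerate(partial) if b == -1]
    expansions = []
    for fill in product([0, 1], repeat=len(unknown)):
        full = list(partial)
        for pos, bit in zip(unknown, fill):
            full[pos] = bit
        expansions.append(tuple(full))
    return expansions

def server_search(index, partial):
    """Server-side lookup: return every database item whose hash matches
    any expansion of the partial query (the candidate list)."""
    queries = set(expand_partial_query(partial))
    return [name for name, h in index.items() if h in queries]
```

Because the server only ever sees the partial query, any of the 2**k expansions could be the client's true query, which is the source of the ambiguity described above.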

B. Attack Model
From the query client's perspective, the server 1) should not learn the original query content, and 2) should not learn the query category. Fulfilling the first requirement is harder. On the other hand, the image server should be assured that the client does not learn too much about its content or its indexing hierarchy.
There are two steps where the image server may derive something about the query: A. while receiving the query hash (the client's privacy here is denoted Pc1), B. while returning the candidate set (the client's privacy here is denoted Pc2). Server privacy is denoted Pc3. If the length of the candidate set is |A|, then the relation between privacy and |A| is:

Min. privacy requirement ≤ |A| ≤ min(power of client, |database|)

• For a good system, all of Pc1, Pc2 and Pc3 should be sufficiently large. In the designed system, the user has the option to choose how many bits to omit from the original query hash. In each case, bits are omitted across the various sub-hashes before concatenation into the final partial query hash. The options are 5, 7, 9, 11 and 13.
• Also, following Weng et al. [3], 'multi-probing' has been considered; it is performed in the discussed system using r = 5 or 6 mostly. A specific attack using majority voting has been considered, where a curious server tries to predict the query category by judging the majority presence in the candidate list. Details of the attack are listed in a later section.
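The majority voting attack can be sketched as follows; this is a hypothetical illustration of the curious server's guess, not the paper's exact procedure.

```python
from collections import Counter

def majority_vote_guess(candidate_categories):
    """Curious-server attack: guess the query's category as the most
    frequent category among the candidates returned for the query."""
    counts = Counter(candidate_categories)
    return counts.most_common(1)[0][0]

# If most candidates come from one category, the server's guess is likely
# correct, so a diverse candidate set is what protects the query category.
guess = majority_vote_guess(["bus", "bus", "bus", "horse", "beach"])
```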

C. Workflow
For easy understanding, the workflow of the discussed framework is shown from each entity's standpoint as separate flow charts: one from the query user's end, and one from the image owner's end.

V. RESULTS
My primary design goal was to create a functioning image retrieval scheme that would:
• Perform similarity retrieval.
• Establish bias, if any, between the number of bits omitted from the query and the candidate set size.
• Protect some privacy of the image data. I have focused only on content confidentiality, not on non-detectability or unlinkability.
• Provide the search client an option to submit feedback for better retrieval accuracy.
The results elaborated below were generated from experiments using Matlab R2018a on a machine with an Intel(R) Core i3-5005U CPU @ 2.00 GHz, 4 GB RAM, and 64-bit Microsoft Windows 10 OS. The paradigm has been tested on the Corel-1K image database [21], freely available on the Internet; it contains images of 10 categories, each with 100 images. The results are shown in a comparative fashion [9], [10], [11], [12], [13], [20], [21] in Table 5.

C. Privacy Performance
To focus on search performance with regard to the varying degree of ambiguity in the search query, a privacy performance analysis has been done. The same data, when plotted as a graph, also verifies that there is no apparent bias between candidate set length and the number of bits omitted (5, 7, 9, 11, 13, 15) for the different classes of images. Costs are calculated using the following equation [3]:

Cost_i = N_i * (d + l*n), for i = 1, 2.

Here N1 and N2 are the numbers of candidates returned for the public query (with no omitted bits) and for the private query with multi-probing, respectively.

Table 6: Candidate counts and transfer costs, public vs. private query
Category   | N1 (public) | Cost1 | N2 (private) | Cost2
Beach      | 33          | 6336  | 33           | 6336
Monuments  | 65          | 12480 | 65           | 12480
Bus        | 127         | 24384 | 144          | 27648
Dinosaur   | 233         | 44736 | 233          | 44736
Elephant   | 58          | 11136 | 58           | 11136
Rose       | 110         | 21120 | 120          | 23040
Horse      | 73          | 14016 | 76           | 14592
Mountain   | 33          | 6336  | 33           | 6336
Food       | 203         | 38976 | 203          | 38976

From Table 6, one can see that the costs incurred for the public and private queries are mostly close; in other words, this privacy requirement does not cost much.
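The cost equation can be sketched in Python. Note that the per-candidate cost implied by the table is 192 (e.g. 6336 / 33 for Beach); the split of that 192 into d = 32 and l*n = 32 * 5 below is a hypothetical assumption for illustration only, as the source does not state the individual values of d, l and n.

```python
def transfer_cost(n_candidates, d, l, n):
    """Cost_i = N_i * (d + l*n): total transfer cost for N_i candidates,
    each carrying a payload of size d plus n features of size l."""
    return n_candidates * (d + l * n)

# Hypothetical split of the per-candidate cost 192 into d=32, l=32, n=5.
beach_public = transfer_cost(33, 32, 32, 5)   # matches the Beach row
bus_private = transfer_cost(144, 32, 32, 5)   # matches the Bus private row
```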

D. Majority Voting Attack
To measure resilience against the majority voting attack, I guess the query category from the candidate list for some use cases. I intentionally chose some cases where the images have greater structural content and others where they have less structural content.

E. Feedback Performance
In the implemented model, the user is given an option to specify which returned images are relevant to his/her query by clicking a check box next to each returned image. These user selections are gathered as relevance feedback (through human interaction) to try to improve the search performance. The algorithm used to improve retrieval performance after feedback submission is discussed in the appropriate section; it is nothing revolutionary, just a simplified approach following the statistical analysis of feedback in CBIR suggested in [19]. I have used a metric called ROC (Rate of Convergence) [19], along with precision and recall, to check whether the proposed feedback improves retrieval performance at all. ROC is defined as the requisite number of feedback iterations after which the precision of a CBIR system remains constant or the other system parameters do not change considerably. It measures whether the most accurate results possible can be produced fast enough, another practical demand on modern CBIR systems.
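The mean-feature-vector refinement described earlier (re-searching around the mean of the images the user marked relevant) can be sketched in Python; the function name and the Euclidean distance choice are illustrative assumptions.

```python
import numpy as np

def feedback_rerank(candidates, relevant_ids):
    """Re-rank candidates by distance to the mean feature vector of the
    images the user marked relevant (the simple statistical measure
    used for refined search). `candidates` maps image id -> feature vector."""
    relevant = np.array([candidates[i] for i in relevant_ids])
    centroid = relevant.mean(axis=0)  # mean feature vector of relevant images
    scored = [(float(np.linalg.norm(np.asarray(v) - centroid)), name)
              for name, v in candidates.items()]
    return [name for _, name in sorted(scored)]
```

Each feedback round moves the centroid toward what the user considers relevant, so repeated rounds should converge, which is what the ROC metric above measures.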
Below are the results when only the least ambiguous query (5-bit omission) is considered: