Overview of Content based Image Retrieval using Map-Reduce

Tapas Bhadra; Shachi Sonar; Samruddhi Zagade

doi:10.17577/IJERTV8IS090272

Volume 08, Issue 09 (September 2019)

Overview of Content based Image Retrieval using Map-Reduce

DOI : 10.17577/IJERTV8IS090272

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 148
Total Downloads : 92
Authors : Tapas Bhadra , Shachi Sonar , Samruddhi Zagade
Paper ID : IJERTV8IS090272
Volume & Issue : Volume 08, Issue 09 (September 2019)
Published (First Online): 09-10-2019
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

Overview of Content based Image Retrieval using Map-Reduce

Tapas Bhadra, Shachi Sonar, Samruddhi Zagade Department of Information Technology K.J.Somaiya College of Engineering,

Vidyavihar

Abstract:- With the exponential increase in the amount of multimedia data, data storage and retrieval has become a big challenge. It has become increasingly difficult to query and retrieve result relevant to users demand with efficiency and accuracy. Previously, the search of image data was done by using keywords or description of data which failed to produce expected results. The distributed data model of Hadoop, an open-source software, provides us with a solution to this problem by using an image as query input and Map Reduce algorithms for processing. This paper intends to discuss the Content based Image Retrieval using Map-Reduce Algorithm.

INTRODUCTION

With the growing amount of media, especially images, produced through mobile phones, cameras and over the world wide web, the problem of storing and retrieving relevant data has become a real issue. The advances made in the field of digital technology has led to the explosion of the volume of images created.

Earlier, text-based image retrieval systems existed. Metadata in the form of captions and keywords were added to every single image in the database. While this was a viable solution earlier, the exponential growth of digital media has made this an impossible task. Feature extraction is another popular method available for the retrieval of images. This approach though appealing has its cons. Extraction of features can become extremely complicated with large databases. Furthermore, massive computational power is required to run the particular algorithms. This complexity of retrieval of images coupled with the large processing power required for the huge databases presents organizations with a tough challenge. To solve this difficulty regarding image retrieval, Hadoops Map Reduce functionality plays a vital role. Here, rather than using keywords to retrieve the desired image, certain features of the images like colour, texture, facial features are used. This entire process is known as content based retrieval. Through Hadoops Map reduce frameworks parallel functionality, the processing is sped up considerably. Along with this, through the reduction of features, large databases are stored in a compact manner for retrieval.
RELATED WORK

Dewen Zhuang et al[2] has proposed the relevance feedback method to reduce the semantic gap. Image feature dimensionality reduction was performed utilizing the linear discriminant analysis. It diminishes the semantic gap and the storage of image signatures, along with improving the retrieval efficiency and performance. However, the performance is low for a few of the categories. Furthermore, the extraction of visual effect features and measurement of

regional similarity still needs to be worked upon. CBIR for JPEG pictures has pulled in numerous individuals' attention and a sequence of algorithms directly reliant on the discrete cosine transform area have been formulated. To exploit these DCT coefficients while considering the color and texture data for the retrieval of JPEG formatted pictures, performing CBIR winds up being productive. Here, decompressing the images and after that, processing in the spatial domain is carried out. The feature vectors are then figured out from a few of the DCT coefficients. This activity is performed in the partially decoded domain. It can incredibly useful in reducing retrieval complexity.

The two methods that well be analyzing are given in the next section.
HADOOP AND MAP REDUCE

Hadoop is an open source software used for the data handling of large databases. It uses the distributed data framework for the storage as well as processing of data. The data types that can be handled by Hadoop can either be structured, unstructured or semi-structured. Hadoop used HDFS which is Hadoop Distributed File System for storage of data. For the computation of such large data, The MapReduce framework consists of two phases, Map and Reduce. In the map phase, data that is stored by splitting at various locations is provided as input to a function which then produces the key value pairs. These sets of key value pairs then form the input for the next phase and over each pair, a user-defined function is then executed in order to produce a set of intermediate key value pairs. In the reduced phase, the aforementioned intermediate key value pairs form the input and groups are determined as per the key, while values are consolidated according to the reduced algorithm that is to be provided by the user. HDFS is one of the key aspects of Hadoop. In HDFS, the storage data is split and stored as datanode and the metadata pertaining to this data is stored as namenode. Namenode is the master which stores all information regarding meadata of the data and Datanode is the slave which stores the actual data. Further, data kept in Hadoop is reliable as multiple copies of it are stored for security and backup.
SYSTEM ARCHITECTURE

Fig-1: System Architecture

The admin or the host of the system will upload the image and all the features related to the image are extracted such as the color, shape and texture. These features are stored in the feature vector with the help of Hadoop MapReduce. When the user provides a query, the features of the query are extracted and they are compared with the feature vector in the database and the one with the minimum distance is selected for the user.
1. METHODOLOGY
  
  Content Based Image Retrieval usually consists of two steps, the extraction of features as well as their mapping. While feature extraction is concerned with the accuracy achieved during retrieval, the feature matching is concerned with the efficiency and speed. As the features are found in high dimensions, so is the searching. There are numerous approaches to scan high dimensional space, for example, linear scanning, tree searching, vector quantization, and hashing. Between the mentioned techniques, hashing comes out to be the most effortless approach to limit time complexity as O(1) while devising a fuzzy search strategy. Indeed, even feature matching is upgraded by means of hashing; because of the immense volume of data for CBIR, the time complexity nevertheless persists to be extravagantly high. Primarily, in the age of the explosive expansion of digital data, the stand-alone techniques for CBIR start getting increasingly hard to maintain due to the dissatisfaction from the load of capacity and processing.
  
  Amongst every one of the available modules, Feature Extraction and Feature Matching are the most tedious and dilatory.
  
  The process is given by:
  Shape could be thought of as the surface definition of a particular image. It has certain contours and outlines. Through shapes, you can distinguish a region from its surroundings. Fourier transform and Moment Invariants are few of the available shape descriptors.
2. CONCLUSION

The constantly growing nature of digital images poses a challenge for organizations to carry out image retrieval for their problems. Methods like text based retrieval fail to keep up with this growth of media. Content Based Image Retrieval is a strong candidate as a solution to this problem of image retrieval. Hadoops ability of parallel processing not only speeds up the process, but the MapReduce framework helps with one of the most challenging aspects of Image Retrieval. By the usage of various algorithms like KMeans, Fuzzy C means Clustering, Convolutional Neural Networks, as well as Support Vector Machine, a comprehensive and complete system for Image Retrieval can be created.

REFERENCES:

KUSUMA.B, 2MEGHA.P.ARAKERI. Survey On Content Based Image Retrieval Using Map reduce Over Hadoop 5 SURVEY ON CONTENT BASED IMAGE RETRIEVAL USING MAPREDUCE OVER HADOOP
Gao Li-chun and Xu Ye-qiang. Image retrieval based on relevance feedback using blocks weighted dominant colors in MPEG-7. Journal of Computer Applications.vol.31(6), pp.1549-1551, 2011.
Chunhao Gu Yang Gao. A Content-Based Image Retrieval System Based on Hadoop and Lucene
Techniques of Content Based Image Retrieval: A Review Sheetal A. Wadhai 1 , Seema S. Kawathekar 2
A New Approach for Large-Scale Scene Image Retrieval Based on Improved Parallel -Means Algorithm in MapReduce Environment Jianfang Cao,1 Min Wang,2 Hao Shi,2 Guohua Hu,1 and Yun Tian1

Overview of Content based Image Retrieval using Map-Reduce

Leave a Reply