Content Based Image Retrieval Using Color Feature

DOI : 10.17577/IJERTV2IS4326


Ms. Shilpa P. Pant, Computer Department, CCOEW, Pune

Abstract:

The purpose of this paper is to describe the design of a Content Based Image Retrieval (CBIR) system using color features. The need for CBIR arose from the enormous growth of image database sizes and the wide deployment of images in various applications. In CBIR systems, image processing techniques are used to extract visual features such as color, texture and shape from images, so that an image is represented as a vector of extracted visual features instead of pure textual annotations. Color, which represents physical quantities of objects, is an important attribute for image matching and retrieval. Many publications focus on color indexing techniques based on global color distributions. The color correlogram and color coherence vector [6] can combine the spatial correlation of color regions with the global distribution of local spatial correlation of colors. These techniques perform better than traditional color histograms when used for content-based image retrieval, but they require very expensive computation.

Keywords: CBIR, Feature Extraction, HSV, RGB

  1. Introduction

Content-based image retrieval (CBIR) [1, 2] is a technique for extracting similar images from an image database. Advances in digital photography, storage capacity and network speeds have made it possible to store large amounts of high-quality images. Digital images are used in a wide range of applications such as medicine, virtual museums, military and security purposes, and personal photo albums. Users have difficulty organizing and searching large numbers of images in databases, so an efficient way of retrieving images is desired. To respond to this need, researchers have tried extending Information Retrieval (IR) techniques used in text retrieval to the area of image retrieval. In this approach, a set of keywords is assigned to each image. However, this approach has significant limitations. First, it is not scalable: each object needs to be manually annotated with keywords and/or textual descriptions, which is impractical for large data sets. Second, due to the subjectivity of the human annotator, the annotations may not be consistent or complete, which negatively affects retrieval performance. Furthermore, it may be infeasible to describe visual content (e.g., the shape of an object) using words alone.

    1.1 Content Based Image Retrieval

CBIR, or Content Based Image Retrieval, is the retrieval of images based on visual features such as colour, texture and shape. It was developed because, in many large image databases, traditional methods of image indexing have proven to be insufficient, laborious and extremely time-consuming [2]. These older methods, which range from storing an image in the database and associating it with a keyword or number to associating it with a categorized description, have become obsolete; such keyword indexing is not CBIR. In CBIR, each image stored in the database has its features extracted and compared to the features of the query image. There are many things to consider in the design of a system for content-based search in image databases:

Image features: What visual features are most useful in each particular case?

Image representation: How should we code the image features?

Representation storage and retrieval: The search must be fast. What are the proper searching techniques and indexing structures?

User interface: How should the user best browse and search for images?

    Figure 1-1. Diagram for content-based image retrieval system

    The main steps in CBIR are:

Feature Extraction: Features such as color, texture and shape are extracted from the images; the definitions of the features are usually pre-defined. These features are usually stored in the form of real-valued multi-dimensional vectors.

Indexing: The image database may then organize the extracted features in an indexing structure for retrieval.

Retrieval: Content-based retrieval can then be performed on the indexing structure efficiently and effectively (a small end-to-end sketch of these three steps follows).
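To make these steps concrete, here is a minimal, self-contained Python sketch of the whole pipeline, assuming a toy 64-bin intensity histogram as the feature and Euclidean distance for retrieval. All helper names here are illustrative, not from the paper:

import numpy as np

def extract_features(image):
    # Toy extractor: a normalized 64-bin intensity histogram.
    hist, _ = np.histogram(image, bins=64, range=(0, 256))
    return hist / hist.sum()

def build_index(images):
    # "Indexing": store one feature vector per database image.
    return np.stack([extract_features(img) for img in images])

def retrieve(query, index, k=3):
    # "Retrieval": rank database images by distance to the query features.
    d = np.linalg.norm(index - extract_features(query), axis=1)
    return np.argsort(d)[:k]

db = [np.random.randint(0, 256, (32, 32)) for _ in range(10)]
print(retrieve(db[0], build_index(db)))  # db[0] should rank itself first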

  2. Colour

One of the most important features that enable humans to recognize images is colour. Colour is a property that depends on the reflection of light to the eye and the processing of that information in the brain. We use colour every day to tell the difference between objects, places, and times of day [3]. Colours are usually defined in three-dimensional colour spaces, such as RGB (Red, Green, Blue), HSV (Hue, Saturation, Value) or HSB (Hue, Saturation, Brightness). The last two depend on the human perception of hue, saturation, and brightness [1].

Most image formats, such as JPEG, BMP and GIF, use the RGB colour space to store information. The RGB colour space is defined as a unit cube with red, green, and blue axes, so a vector with three coordinates represents a colour in this space. When all three coordinates are set to zero the colour perceived is black; when all three are set to one the colour perceived is white [3]. The other colour spaces operate in a similar fashion but model a different perception of colour.
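As a quick illustration of these coordinates, Python's standard colorsys module converts between the RGB unit cube and HSV (a sketch using the values from the paragraph above):

import colorsys

# Corners of the RGB unit cube: (0,0,0) is black, (1,1,1) is white.
for name, (r, g, b) in [("black", (0.0, 0.0, 0.0)),
                        ("white", (1.0, 1.0, 1.0)),
                        ("red",   (1.0, 0.0, 0.0))]:
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    print(f"{name}: RGB=({r}, {g}, {b}) -> HSV=({h:.2f}, {s:.2f}, {v:.2f})")
# black gives V=0, white gives V=1 with S=0, pure red gives H=0, S=1, V=1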

2.1 Methods of Representation

The color histogram [11] is the most widely used technique in CBIR approaches. A color histogram is a type of bar graph in which the height of each bar represents the amount of a particular color of the chosen color space present in the image. The bars in a color histogram are called bins and they form the x-axis; the number of bins depends on the number of colors in the image. The y-axis denotes the number of pixels in each bin, i.e. how many pixels in the image have that particular color.

In color histograms, quantization is the process of reducing the number of bins by taking colors that are similar to each other and placing them in the same bin. Quantization reduces both the space required to store the histogram information and the time needed to compare histograms, but it obviously also discards some information about the content of the image. This is the tradeoff between space, processing time and accuracy of results.
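For example, the 64-bin quantization used by the algorithms later in this paper can be implemented by keeping only the two most significant bits of each 8-bit RGB channel. This is a sketch of one plausible scheme, not the paper's exact code:

def quantize_rgb64(r, g, b):
    # 4 levels per channel -> 4 * 4 * 4 = 64 bins; similar colors share a bin.
    return (r // 64) * 16 + (g // 64) * 4 + (b // 64)

print(quantize_rgb64(255, 255, 255))  # 63: white falls in the top bin
print(quantize_rgb64(250, 250, 250))  # 63: near-white maps to the same bin
print(quantize_rgb64(0, 0, 0))        # 0: black falls in the lowest bin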

For color analysis the following three methods are provided:

1. Color moments

2. Local color histogram

3. Global color histogram

3. Local Color Histogram

Color histograms are classified into two types: the global color histogram (GCH) [12] and the local color histogram (LCH) [11]. The LCH approach includes information concerning the color distribution of regions. The first step is to segment the image into blocks and then obtain a color histogram for each block; the image is then represented by these histograms. A GCH takes the color histogram of the whole image and thus represents information about the whole image, without capturing the color distribution of its regions. In contrast, an LCH divides an image into fixed blocks or regions and takes the color histogram of each of those blocks. An LCH contains more information about an image, but comparing images with it is computationally more expensive.

Two images are compared by calculating distances, using their histograms, between a region in one image and the region in the same location in the other image. The Euclidean distance between color histograms can be used for the comparison. The distance metric between two images Q and I in the LCH is defined as:

d_{LCH}(Q, I) = \sum_{k=1}^{M} \sqrt{ \sum_{i=1}^{N} \left( H_Q^k[i] - H_I^k[i] \right)^2 }   (3.1)

where M is the number of segmented regions in the images, N is the number of bins in the color histograms, H_Q^k[i] is the value of bin i in the color histogram representing region k of image Q, and H_I^k[i] is the value of bin i in the color histogram representing region k of image I. In some scenarios, using LCHs achieves better retrieval effectiveness than using GCHs. However, since the LCH only compares regions in the same location, it does not work well when the image is translated or rotated.

Fig. 3.1 Using LCH to compute the distances between images A and B

Input: Query image.

Output: Set of images similar to the query image from the set of N images.

1. Initialization
1.1 gridCount ← 4
1.2 rows ← gridCount * gridCount
1.3 Declare vector localColorHistogram of size rows * 64
1.4 Declare array histogram[rows][64]
1.5 i, j, k ← 0
2. Compute image size ← height * width
3. For i = 1 to size do
3.1 k ← bin index of pixel i after quantizing the pixel values into 64 bins
3.2 localWidth ← width / gridCount
3.3 localHeight ← height / gridCount
3.4 j ← (i / width / localHeight) * gridCount + ((i mod width) / localWidth)
3.5 histogram[j][k] ← histogram[j][k] + 1.0
4. For i = 1 to rows do
4.1 For j = 1 to 64 do
4.1.1 vector ← histogram[i][j] / (size / rows)
5. Add vector to the feature vector localColorHistogram
6. Compare this feature vector with the feature vectors of the N images stored in the database using the distances between them.
7. Calculate dLCH using histogram[][].
8. According to the distances dLCH, the system returns the nearest neighbours as the query result.

Figure 3.2: Algorithm for Local Color Histogram

Fig. 3.3 Input Query Image

Fig. 3.4 Result by using Local Color Histogram Method
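A compact Python rendering of the LCH feature extraction and the Equation 3.1 distance might look as follows. This is a sketch under the same assumptions as Figure 3.2 (a 4 x 4 grid and the 64-bin quantization from Section 2.1); NumPy is assumed:

import numpy as np

def local_color_histograms(image, grid=4, bins=64):
    # Split the image into grid x grid blocks; one normalized histogram each.
    h, w, _ = image.shape
    idx = (image[..., 0] // 64) * 16 + (image[..., 1] // 64) * 4 + image[..., 2] // 64
    hists = np.zeros((grid * grid, bins))
    bh, bw = h // grid, w // grid
    for r in range(grid):
        for c in range(grid):
            block = idx[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            hists[r * grid + c] = np.bincount(block.ravel(), minlength=bins) / block.size
    return hists

def d_lch(hq, hi):
    # Equation 3.1: per-region Euclidean distances, summed over all regions.
    return np.sqrt(((hq - hi) ** 2).sum(axis=1)).sum()

q = np.random.randint(0, 256, (64, 64, 3))
i = np.random.randint(0, 256, (64, 64, 3))
print(d_lch(local_color_histograms(q), local_color_histograms(i)))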

4. Global Color Histogram

A GCH takes the color histogram of the whole image and thus represents information about the whole image, without capturing the color distribution of its regions. The GCH is known as the traditional method for color-based image retrieval. Since it does not include the color distribution of the regions, comparing two GCHs may not always give a proper result in terms of the similarity of images.

The color histogram h for an image of N pixels is defined as:

h_{r,g,b}(R, G, B) = N \cdot \mathrm{Prob}\{R = r, G = g, B = b\}   (4.1)

where R, G and B represent the three color channels and N is the number of pixels in the image. Computationally, the color histogram is formed by discretizing the colors within an image and counting the number of times each discrete color occurs in the image array. Two images are compared by calculating the distance between their histograms. Using the Euclidean distance, the distance metric (Equation 4.2) between two images Q and I in the GCH is defined as:

d_{GCH}(Q, I) = \sqrt{ \sum_{i=1}^{N} \left( H_Q[i] - H_I[i] \right)^2 }   (4.2)


where N is the number of bins in the color histograms, H_Q[i] is the value of bin i in the color histogram of image Q, and H_I[i] is the value of bin i in the color histogram of image I.

Fig. 4.1 Images A and B and their color histograms

In the sample color histograms there are three bins: black, white and gray. The color histogram of image A is {25%, 25%, 50%} and the color histogram of image B is {18.75%, 37.5%, 43.75%}. Using the Euclidean distance metric (Equation 4.2) to calculate the histogram distance, the distance between images A and B for the GCH is:

d_{GCH}(A, B) = \sqrt{ (0.25 - 0.1875)^2 + (0.25 - 0.375)^2 + (0.5 - 0.4375)^2 } = 0.153   (4.3)

The GCH is the traditional method for color-based image retrieval. However, it does not include information concerning the color distribution of the regions, so the distance between images sometimes cannot show the real difference between images.
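The arithmetic of Equation 4.3 is easy to check directly in Python:

from math import sqrt

# Three-bin (black, white, gray) histograms of images A and B from Fig. 4.1
A = [0.25, 0.25, 0.50]
B = [0.1875, 0.375, 0.4375]

d_gch = sqrt(sum((a - b) ** 2 for a, b in zip(A, B)))
print(round(d_gch, 3))  # 0.153, matching Equation 4.3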

Input: Query image.

Output: Set of images similar to the query image from the set of N images.

1. Initialization
1.1 Declare vector of size 64
1.2 Declare array histogram[] of size 64
2. Compute image size ← height * width
3. For i = 1 to 64 do
3.1 histogram[i] ← 0.0
4. For i = 1 to size do
4.1 Find the bin index k of pixel i and increment histogram[k]
5. For i = 1 to 64 do
5.1 vector ← histogram[i] / size
6. Compare this feature vector with the feature vectors of the N images stored in the database using the distances between them.
7. Calculate dGCH.
8. According to the distances dGCH, the system returns the nearest neighbours as the query result.

Figure: Algorithm for Global Color Histogram
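In Python, the algorithm above reduces to a few lines (a sketch assuming NumPy and the same 64-bin quantization; the ranking in steps 6-8 is shown with random stand-in images):

import numpy as np

def global_color_histogram(image, bins=64):
    # Steps 1-5: one normalized 64-bin histogram over the whole image.
    idx = (image[..., 0] // 64) * 16 + (image[..., 1] // 64) * 4 + image[..., 2] // 64
    return np.bincount(idx.ravel(), minlength=bins) / idx.size

def d_gch(hq, hi):
    # Equation 4.2: Euclidean distance between two global histograms.
    return np.sqrt(((hq - hi) ** 2).sum())

# Steps 6-8: rank the N database images by distance to the query.
db = [np.random.randint(0, 256, (64, 64, 3)) for _ in range(5)]
query = db[0]
dists = [d_gch(global_color_histogram(query), global_color_histogram(im)) for im in db]
print(np.argsort(dists))  # nearest neighbours first; the query ranks itself first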


Fig. 4.2 Result by using Global Color Histogram method

5. Color Moments

Color, which represents physical quantities of objects, is an important attribute for image matching and retrieval. Many publications focus on color indexing techniques based on global color distributions. However, these global distributions have limited discriminating ability because they are unable to capture local color information. The color correlogram and color coherence vector [4] can combine the spatial correlation of color regions as well as the global distribution of local spatial correlation of colors. These techniques perform better than traditional color histograms when used for content-based image retrieval, but they require very expensive computation.

Color moments [3, 5] have been successfully used in content-based image retrieval systems, especially for retrieval of images containing only the objects of the user's interest. Because most of the information can be captured by the low-order moments, i.e. the first moment (mean), the second central moment (standard deviation) and the third central moment (skewness), color moments can be used effectively as color features, and they are inexpensive to calculate.

If the value of the i-th color channel at the j-th image pixel is p_ij, then the color moments are defined as:

\mu_i = \frac{1}{n} \sum_{j=1}^{n} p_{ij}   (5.1)

\sigma_i = \left( \frac{1}{n} \sum_{j=1}^{n} (p_{ij} - \mu_i)^2 \right)^{1/2}   (5.2)

s_i = \left( \frac{1}{n} \sum_{j=1}^{n} (p_{ij} - \mu_i)^3 \right)^{1/3}   (5.3)

where n is the number of pixels in the image. These moments may be calculated in different color spaces. The mean can be understood as the average color value in the image, the standard deviation is the square root of the variance of the distribution, and the skewness is a measure of the degree of asymmetry in the distribution.

For each image, a 9-dimensional color feature vector (Equation 5.4) is obtained. These color feature vectors are defined as:

FV = [\mu_i; \sigma_i; s_i], \quad i = 1, 2, 3   (5.4)

where i indexes the three channels of the color space.

A similarity function (Equation 5.5) between two image distributions is defined as the sum of the weighted differences between the moments of the two distributions. Formally:

d_{mom}(H, I) = \sum_{i=1}^{r} \left( w_{i1} |\mu_i^H - \mu_i^I| + w_{i2} |\sigma_i^H - \sigma_i^I| + w_{i3} |s_i^H - s_i^I| \right)   (5.5)

where:

H, I are the two image distributions being compared;

i is the current channel index;

r is the number of channels (e.g. 3);

\mu_i^H, \mu_i^I are the first moments (mean) of the two image distributions;

\sigma_i^H, \sigma_i^I are the second moments (standard deviation) of the two image distributions;

s_i^H, s_i^I are the third moments (skewness) of the two image distributions;

w_i are the weights for each moment.
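Equations 5.1-5.5 translate almost directly into Python. The following is a sketch assuming NumPy, an 8-bit RGB image, and all weights set to 1 as in the worked example below:

import numpy as np

def color_moments(image):
    # Equations 5.1-5.3 per channel: mean, standard deviation, skewness.
    # Returns a 3 x 3 array: rows are moments, columns are color channels,
    # i.e. the 9-dimensional feature vector of Equation 5.4.
    px = image.reshape(-1, 3).astype(float)
    mu = px.mean(axis=0)                              # (5.1)
    sigma = np.sqrt(((px - mu) ** 2).mean(axis=0))    # (5.2)
    skew = np.cbrt(((px - mu) ** 3).mean(axis=0))     # (5.3)
    return np.vstack([mu, sigma, skew])

def d_mom(fh, fi, w=None):
    # Equation 5.5: weighted sum of absolute differences of the moments.
    w = np.ones_like(fh) if w is None else w
    return float((w * np.abs(fh - fi)).sum())

a = np.random.randint(0, 256, (32, 32, 3))
b = np.random.randint(0, 256, (32, 32, 3))
print(d_mom(color_moments(a), color_moments(b)))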


5.1 Color Moments for Color Feature Extraction

The following example illustrates how color moments are used to compare images. Consider the three images shown below.

Index Image   Test Image 1   Test Image 2

Fig.5.1. Three images for comparing color features using color moments.

Step 1: Calculate Moments for the Index Image

Calculate the three color moments for the Index Image using the formulas defined in Equations 5.1-5.3. The values are:

Index Image
Mean:       0.1016  0.1149  0.1779
Std. dev.:  0.8583  0.1139  0.0563
Skewness:   0.6416  0.2994  0.0974

The rows correspond to the moments and the columns to the color channels.

Step 2: Calculate Moments for the Test Images

Repeat the calculations for the two test images. The values are:

Test Image 1
Mean:       0.1718  0.0986  0.1400
Std. dev.:  0.7619  0.1508  0.0455
Skewness:   0.7062  0.2242  0.0772

Test Image 2
Mean:       0.1878  0.1671  0.2331
Std. dev.:  0.2462  0.2281  0.2492
Skewness:   0.6052  0.3532  0.1534

Step 3: Calculate the d_mom values

Consider all weights w_i equal to 1 and calculate d_mom(Index, Test1) and d_mom(Index, Test2) using Equation 5.5. The following values result:

d_mom(Index, Test1) = 0.5878

d_mom(Index, Test2) = 1.5585

Step 4: Rank images based on similarity

Pairs of images can be ranked based on their d_mom values: those with greater values are ranked lower and considered less similar than those with a higher rank and lower d_mom values. It should be noted that d_mom is a similarity function and not a metric; it is quite possible for two different pairs of distributions to give the same d_mom value. In practice this leads to false positives being retrieved along with truly similar images. For an image retrieval system, this drawback is considered acceptable.

Comparing the two d_mom values from above:

d_mom(Index, Test1) < d_mom(Index, Test2)

Therefore Test Image 1 is more similar to the Index Image than Test Image 2.

Conclusions

In this paper I have introduced some fundamental techniques for content-based image retrieval. Color moments, the local color histogram and the global color histogram are used for color feature extraction in content-based image retrieval. The implemented methods were tested on image databases containing 1000, 6000 and 10000 images. Indexing the images takes more time as the database grows larger.

Future Scope

The complete architecture can be extended to automatically annotate images in the database. The primary objective was to compare different methods of image analysis, and there remains a need to derive an objective performance measure that would yield absolute results. The retrieval scheme can be further extended with relevance feedback methods.

References:

[1] Philippe Henri Gosselin and Matthieu Cord, "Active Learning Methods for Interactive Image Retrieval," IEEE Trans. on Image Processing, Vol. 17, No. 7, July 2008.

[2] M. Jian, J. Dong, R. Tang, "Combining color, texture and region with objects of user's interest for CBIR," IEEE 2007, DOI 10.1109/SNPD.2007.104.

[3] P. S. Suhasini, K. Sri Rama Krishna, I. V. Murali Krishna, "CBIR Using Color Histogram Processing: An Intelligent Image Database System," Journal of Theoretical and Applied Information Technology.

[4] J. L. Shih and L. H. Chen, "Color image retrieval based on primitives of color moments," IEE Proc. Vision, Image and Signal Processing, vol. 149, no. 6, December 2002.

[5] Y. J. Choi, "Retrieval of identical clothing images based on Local Color Histogram," IEEE Computer Society, ICCIT 2008.

[6] J. Zhao, Y. K. Zhang, "2-Layer method of image retrieval based on Global Color Histogram and local color spatial features," Proceedings of the 6th International Conference on Cybernetics, Aug 2007.

[7] C. Ruberto and A. Morgera, "Moment based techniques for image retrieval," IEEE 2008, DOI 10.1109/DEXA.2008.73.
