A Comparative Study of Steganalysis using Support Vector Machines on Different Image Formats

DOI : 10.17577/IJERTV4IS030828


Rishidas S., Associate Professor, Dept. of Electronics, G.E.C. Kozhikode

Gayathri Krishnan L., Dept. of Electronics, G.E.C. Kozhikode

Sujith Kumar T P, Asst. Professor, Dept. of Electronics, G.E.C. Kozhikode

Abstract: Steganography is a widely used technique in secure communication. In image steganography, a message is secretly embedded in an image so that its existence is concealed from viewers. Steganography can be performed on different image formats. Steganalysis is the technique of detecting the presence of stego content in an image. This paper presents a comparative study of steganalysis using a Support Vector Machine classifier on different image formats embedded with stego content.

Keywords: Steganography; steganalysis; moments; Support Vector Machine; bitmap image; GIF image

  1. INTRODUCTION

    1. Steganography

      Steganography is a widely used technique for embedding a message within another file in such a manner that the existence of the message is concealed. It is a special kind of data hiding when compared with cryptography: cryptography hides the content of the information from malicious parties, while steganography conceals even the presence of the hidden information. The word steganography originates from the Greek word steganos, meaning secret or covered, and graphy, meaning writing or drawing; together they mean covered writing. The main objective of steganography is to communicate secretly. It has a wide range of applications in defence and forensics. Steganography can be performed on images, audio signals, video signals and so on; among these, image steganography is the most commonly used. The simplest steganographic technique is least significant bit (LSB) steganography, in which the least significant bit of each pixel in the image is replaced with a bit of the information to be hidden. This changes each pixel value by at most one intensity level, so the perceptual image quality is not noticeably reduced.
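The LSB replacement described above can be sketched in a few lines. This is a minimal illustration on a flat list of 8-bit grayscale values, not the embedding tool used in this study; the function names `embed_lsb` and `extract_lsb` and the sample pixel values are hypothetical.

```python
def embed_lsb(pixels, message_bits):
    """Return a copy of `pixels` whose least significant bits carry `message_bits`."""
    if len(message_bits) > len(pixels):
        raise ValueError("message is longer than the cover image")
    stego = list(pixels)
    for i, bit in enumerate(message_bits):
        # Clear the lowest bit of the pixel, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read back the first `n_bits` least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]


# Hypothetical cover pixels (8-bit grayscale) and an 8-bit secret.
cover = [154, 201, 37, 96, 255, 0, 128, 73]
secret = [1, 0, 1, 1, 0, 1, 0, 0]

stego = embed_lsb(cover, secret)
recovered = extract_lsb(stego, len(secret))
```

Each stego pixel differs from its cover pixel by at most one intensity level, which is why the change is not visible to the eye.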

    2. Steganalysis

      Steganalysis is the technique of detecting the presence of information hidden in a stego image. It is a difficult problem because the original host data is unknown. After the presence of hidden information has been detected, the image can be processed using different steganalysis techniques. Steganalysis techniques fall into two classes: blind and targeted. Blind steganalysis, also known as universal steganalysis, is a generalized approach used to detect a stego image without knowing the steganographic technique used to hide the message. Targeted steganalysis is used to detect messages embedded by a particular steganographic technique. Targeted techniques are usually more accurate than blind ones.

    3. Bitmap Image

      The bitmap image format is a widely used image format that carries the extension .bmp. A bitmap image is literally a map of bits that forms a picture when rendered to a display. In a bitmap image, each pixel is assigned a fixed number of bits that encode its colour. An RGB image contains many gradations of colour and lighting. As the number of bits used to represent each pixel increases, the colour resolution of the image also increases. Because this format stores so much information at the highest resolution, it reproduces images in fine detail. Because these images are built pixel by pixel, they can be easily edited.

    4. GIF Image

      The Graphics Interchange Format, known by the acronym GIF, is a widely used image format. It generally supports up to 8 bits per pixel, which allows an image to reference its own palette of up to 256 different colours selected from the 24-bit RGB colour space. The GIF format is well suited to simpler images such as graphics or logos with solid areas of colour. These images are compressed using the Lempel-Ziv-Welch (LZW) lossless data compression technique, which reduces the file size without degrading the visual quality.

  2. METHODOLOGY

    Two sets of images are used: an original image set and a stego image set. Each image is first divided into disjoint blocks of size 16 x 16. Each image is then transformed into different domains. This paper considers three domains: the spatial domain, the discrete cosine transform (DCT) domain and the discrete wavelet transform (DWT) domain. The first four statistical moments of each block (mean, variance, skewness and kurtosis) are extracted. These operations are performed over the whole data set. Labelling is then done: the label +1 is given to the original image set and the label -1 to the stego image set. A Support Vector Machine is trained using these data sets. In the testing phase, the feature vector of the image under test is applied to the already trained SVM to detect the presence of a stego signal.
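The block division and labelling steps can be sketched as follows. This is an illustration only: the helper name `split_into_blocks` is hypothetical, the image is a synthetic 2-D list, and dropping partial blocks at the image border is an assumption, since the text does not say how they are handled.

```python
ORIGINAL_LABEL = +1   # label for the original (cover) image set
STEGO_LABEL = -1      # label for the stego image set


def split_into_blocks(image, size=16):
    """Split a 2-D list of pixel rows into disjoint size x size blocks.

    Rows and columns that do not fill a whole block are dropped
    (assumption: the paper does not describe border handling).
    """
    rows = len(image) - len(image) % size
    cols = len(image[0]) - len(image[0]) % size
    blocks = []
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            blocks.append([row[c:c + size] for row in image[r:r + size]])
    return blocks


# Synthetic 32 x 48 grayscale image standing in for a real cover image.
image = [[(r * 48 + c) % 256 for c in range(48)] for r in range(32)]

blocks = split_into_blocks(image)
labelled = [(block, ORIGINAL_LABEL) for block in blocks]
```

A 32 x 48 image yields 2 x 3 = 6 blocks; blocks taken from a stego image would be paired with `STEGO_LABEL` instead.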

      1. Discrete Cosine Transform

        The discrete cosine transform (DCT) represents an image as a sum of sinusoids of varying magnitude and frequency. The DCT exhibits excellent energy compaction and can be computed quickly. Most of the visually significant information of an image is concentrated within a few coefficients. The 2-D DCT of an N x N image f(x, y) is given as

        $$F(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right] \tag{1}$$

        where N is the length of the sequence, $0 \le u, v \le N-1$, and $\alpha(u)$ is defined as

        $$\alpha(u) = \begin{cases} \sqrt{1/N} & \text{for } u = 0 \\ \sqrt{2/N} & \text{for } u \neq 0 \end{cases}$$

      2. Discrete Wavelet Transform

        Wavelet analysis performs local and multiresolution analysis. In the spatial domain representation of an image, adjacent pixel values are highly correlated, and hence the image contains a lot of redundant data. When the Discrete Wavelet Transform is computed, the image is represented in multiple sub-bands for different scales and frequency ranges. That is, the transform analyses the signal at different frequencies with different resolutions.

        A Haar wavelet is the simplest type of wavelet. It is related to a mathematical operation called the Haar transform, which serves as a prototype for all other wavelet transforms. This transform decomposes the signal into two sub-signals: one is the running average and the other is the running difference.

      3. Feature extraction

        Feature vectors are n-dimensional vectors that contain information describing an object's important characteristics. In image processing, features can take different forms. A simple feature representation of an image is the raw intensity value of each pixel.

        These feature vectors are extracted from the images to create a database. This database is used to train the Support Vector Machine, and a model file is created. In the testing phase, the feature vectors of the test images are extracted and compared with the model, and classification is performed.

        In the proposed work, the first four statistical moments are extracted: mean, variance, skewness and kurtosis. Consider a set of N samples $X = \{x_1, x_2, \ldots, x_N\}$.

        The mean is expressed as

        $$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{2}$$

        The mean of a set of values represents the tendency to cluster around a particular value.

        The variance is

        $$\sigma^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu)^2 \tag{3}$$

        Variance represents the variability around the mean value. The square root of the variance is the standard deviation.

        Skewness measures the degree of asymmetry of a distribution around the mean value:

        $$\text{Skewness} = \frac{1}{N} \sum_{i=1}^{N} \left(\frac{x_i - \mu}{\sigma}\right)^3 \tag{4}$$

        where $\sigma$ represents the standard deviation.

        Kurtosis measures the relative peakedness or flatness of a distribution compared with a normal distribution.
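A rough sketch of the per-block feature computation is given below, using only the Python standard library. All function names are hypothetical. Several choices are assumptions rather than details taken from the paper: the DCT is the orthonormal 2-D DCT-II, the Haar transform uses one decomposition level with 1/sqrt(2) scaling, the moments of the wavelet domain are taken over all four sub-bands pooled together, and kurtosis is computed as the fourth standardized moment (the text gives no formula for it).

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block, computed directly from its definition."""
    n = len(block)
    alpha = [math.sqrt(1.0 / n)] + [math.sqrt(2.0 / n)] * (n - 1)
    # cos_table[u][x] = cos[(2x + 1) u pi / 2N]
    cos_table = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
                 for u in range(n)]
    return [[alpha[u] * alpha[v] * sum(block[x][y] * cos_table[u][x] * cos_table[v][y]
                                       for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def haar_rows(rows):
    """Running average and running difference of adjacent pairs in each row."""
    s = math.sqrt(2.0)
    low = [[(r[i] + r[i + 1]) / s for i in range(0, len(r), 2)] for r in rows]
    high = [[(r[i] - r[i + 1]) / s for i in range(0, len(r), 2)] for r in rows]
    return low, high


def transpose(m):
    return [list(col) for col in zip(*m)]


def haar2(block):
    """One level of the 2-D Haar transform; returns the (LL, LH, HL, HH) sub-bands."""
    low, high = haar_rows(block)                      # filter along rows
    ll, lh = (transpose(m) for m in haar_rows(transpose(low)))    # then columns
    hl, hh = (transpose(m) for m in haar_rows(transpose(high)))
    return ll, lh, hl, hh


def four_moments(values):
    """Mean, variance (N - 1 denominator), skewness and kurtosis of a list."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    sigma = math.sqrt(variance)
    if sigma == 0:
        return mean, 0.0, 0.0, 0.0    # flat block: higher moments set to 0 (assumption)
    skewness = sum(((x - mean) / sigma) ** 3 for x in values) / n
    kurtosis = sum(((x - mean) / sigma) ** 4 for x in values) / n   # assumed definition
    return mean, variance, skewness, kurtosis


def flatten(m):
    return [v for row in m for v in row]


def block_features(block):
    """12 features: four moments in each of the spatial, DCT and DWT domains."""
    spatial = flatten(block)
    dct = flatten(dct2(block))
    dwt = [v for band in haar2(block) for v in flatten(band)]
    return [m for domain in (spatial, dct, dwt) for m in four_moments(domain)]


# Synthetic 16 x 16 block standing in for real image data.
block = [[(3 * r + 5 * c) % 17 for c in range(16)] for r in range(16)]
features = block_features(block)
```

The direct DCT sum is slow but mirrors the definition; in practice a library routine would normally replace it.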

      4. Principal Component Analysis

        Principal Component Analysis (PCA) is a technique used for dimensionality reduction. It identifies patterns in data and expresses the data in a way that emphasizes their similarities and differences. Once the patterns are identified, the data can easily be compressed. Principal components are found by calculating the eigenvectors and eigenvalues of the data covariance matrix.
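As a small illustration of the covariance and eigenvector step, the sketch below finds only the leading principal component, using power iteration instead of a full eigendecomposition so that it needs no external library. That substitution, the function names and the sample data are all assumptions made for illustration.

```python
import math


def covariance_matrix(samples):
    """Return the covariance matrix (N - 1 denominator) and the mean-centred data."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    centred = [[s[j] - means[j] for j in range(d)] for s in samples]
    cov = [[sum(row[a] * row[b] for row in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centred


def leading_component(cov, iterations=100):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return eigenvalue, v


# Hypothetical 2-D feature vectors lying close to the line y = 2x.
samples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]

cov, centred = covariance_matrix(samples)
eigenvalue, component = leading_component(cov)

# Dimensionality reduction: project each 2-D sample onto the leading component.
projected = [sum(c * v for c, v in zip(row, component)) for row in centred]
```

The variance of `projected` equals the leading eigenvalue, which is the sense in which the first component keeps the most information.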

      5. Classification

        A Support Vector Machine (SVM) is a classifier that performs classification by constructing hyperplanes. These decision planes define the decision boundaries and are constructed in such a way that the margin of separation between the classes is maximum. Support Vector Machines are of two kinds: linear and nonlinear.

        If the classes are not linearly separable, the data points are transformed into a higher-dimensional space where they become linearly separable. The training examples $\{x_i\}$ are mapped into the higher-dimensional space using a function $\phi$. $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ is known as the kernel function. The main kernel functions are:

        • Linear kernel: $K(x_i, x_j) = x_i^T x_j$

        • Polynomial kernel: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d$, $\gamma > 0$

        • RBF kernel: $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$, $\gamma > 0$

        • Sigmoid kernel: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$

        If the classes are not linearly separable, the Radial Basis Function (RBF) kernel is used.
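The kernel functions listed above can be written out directly. The sketch below is illustrative only: the function names and default parameter values are hypothetical, the sigmoid kernel uses the common tanh form, and no SVM solver is included.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(xi, xj):
    return dot(xi, xj)


def polynomial_kernel(xi, xj, gamma=1.0, r=0.0, d=3):
    return (gamma * dot(xi, xj) + r) ** d


def rbf_kernel(xi, xj, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(xi, xj, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(xi, xj) + r)


def gram_matrix(samples, kernel, **params):
    """Kernel value for every pair of samples; this matrix is what an SVM trains on."""
    return [[kernel(a, b, **params) for b in samples] for a in samples]


# Hypothetical 2-D feature vectors.
samples = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]
K = gram_matrix(samples, rbf_kernel, gamma=0.5)
```

The RBF Gram matrix has ones on its diagonal and is symmetric, since the kernel depends only on the distance between two samples.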

    Fig. 1. Non-linearly separable data

  3. RESULT

    There are 75 images in each class. 60% of these are used for training the SVM, 20% as the validation data set and 20% as the test data set. The SVM is trained using different kernels for different values of C. The obtained accuracies are as follows.

    Table I. Accuracy on test data when classified using SVM for the bitmap format

    | Domain  | Linear kernel | Polynomial kernel | RBF kernel |
    |---------|---------------|-------------------|------------|
    | Spatial | 33.33%        | 50%               | 50%        |
    | DCT     | 46.67%        | 50%               | 50%        |
    | DWT     | 50%           | 60%               | 60%        |

    Table II. Accuracy on test data when classified using SVM for the GIF format

    | Domain  | Linear kernel | Polynomial kernel | RBF kernel |
    |---------|---------------|-------------------|------------|
    | Spatial | 53.33%        | 60%               | 66.67%     |
    | DCT     | 53.33%        | 66.67%            | 60%        |
    | DWT     | 60%           | 60%               | 70%        |

  4. CONCLUSION

    The results above show that steganalysis using Support Vector Machines can detect stego content, with test accuracy reaching 70% for the GIF format (DWT domain, RBF kernel) and 60% for the bitmap format (DWT domain, polynomial and RBF kernels). Stego content in the GIF format is therefore more easily identified by steganalysis than stego content in the bitmap format.
