Design Of Adaptive Embedding Schemes For Digital Images

DOI : 10.17577/IJERTV1IS6465

Download Full-Text PDF Cite this Publication

Text Only Version

Design Of Adaptive Embedding Schemes For Digital Images

International Journal of Engineering Research & Technology (IJERT)

ISSN: 2278-0181

Vol. 1 Issue 6, August – 2012

VamsiKiranMekathoti 1**SaipriyaVissapragada1* ArunkumarBeyyala 1 Assistant Professor Assistant Professor Assistant Professor

VignanNirula Institute of Science and Technology for Women,



The minimum-distortion framework is dedicated to the problem of optimizing distortion functions so that they better correspond to statistical detectability as restrained by blind feature-based steganalyzers. In reality, utmost distortion functions are gotten heuristically and do not generalize well to other cover sources. Here, we restrain ourselves to independent embedding variations and present practical tools that can use for learning the embedding algorithm fora given cover source.


Batch steganography, stegosystem, Gibbs constructional,

  1. Introduction

    Our motivation for solving the problem of the cost-function design comes from the HUGO algorithm that assigns the costs of individual changes based on the pixel neighborhood. Unfortunately,this approach does not easily generalize to other cover sources, such as JPEG or colorbitmap images, neither is it clear how to optimize the design. Here we open the questionof the cost-function design and strive for a robust approach that generalizes well to unseen coverimages and unseen steganalytic features to avoid overfitting to a particular cover source and featurespace. For example, the Feature Correction Method, which is a heuristic approach to embed

    while approximately preserving the cover- image feature vector, is known to be overly sensitive to thechosen feature set and does not generalize or scale well. The work in has an alternate featurepreservation approach and also empirically considers the dynamics between steaganographer andsteganlyzer.

  2. Empirical Design of Cost Functions

    We focus on designing adaptive embedding schemes for the payload-limited sendersubjected to sequential steganalysis. In this regime, the sender decides on the number of bits hewants to hide in a given cover object, embeds his payload, and sends the stego object through apassively monitored channel. In sequential steganalysis, the warden has to decide whether agiven image is cover or stego solely based on a single object. We deliberately omit the possibility ofintentionally spreading the payload into a group of cover images a technique known as the batchsteganography. This mode can improve the security of the scheme; however, it should no longer betested with sequential steganalysis.

    A common way of testing steganographic schemes is to report a chosen detection metric (ROCcurve, accuracy, minimum error probability under equal priors PE, etc.) empirically estimatedfrom a database of cover and stego images where each stego image carries a fixed relative 1

    International Journal of Engineering Research & Technology (IJERT)

    ISSN: 2278-0181

    Vol. 1 Issue 6, August – 2012

    payload.Whenever possible, we report results obtained from cover images of roughly the same size to reducethe effect of the square root law [1].Our goal is to design a set of functions i, i 2 {1, . . . , n}, which, given the original cover image,assign the cost of changing individual cover elements to their new values. For digital images, thedependence between two cover pixels rapidly decreases with their distance. In case of gray-scalespatial-domain digital images, the cost of changing a single pixel should mainly depend on its immediate neighborhood. For this reason, we constrain ito be a real-valued function with smallsupport, i(x, yi) = (x(i), yi), where x(i) denotes cover pixels spatially close to pixel i.From practical experiments, it is possible to identify the quantity that should drive the costs.

  3. Inverse single-difference cost model

    Let 0 and Ni = {xi,!, xi,, xi,",

    . . . , xi,} be a setof eight pixels from the 3

    × 3 neighborhood of the ith pixel. We use the

    ±1 embedding operation,Ii = {xi 1, xi, xi + 1} I, and define

    At the image boundary, the set of neighboring pixels Ni is reduced accordingly. This cost assignmentpenalizes changes in textured areas less than those in smooth regions depending on the differencesbetween neighboring pixels.

  4. Blind Steganalysis

    The only way of evaluating the security of steganographic schemes for

    empirical covers is to subjectthem to a steganalysis test. According to Kerckhoffs principle, we allow the warden to know allelements of the stegosystem (the cover source statistics, the embedding algorithm and the size ofthe possible payload) except for the (possibly encrypted) message. Given a single image, the wardenhas to decide whether it is cover or stego. In this simple binary hypothesis test, the warden canmake two types of errors either detect the cover image as stego (false alarm) or recognize thestego image as cover (missed detection). The corresponding probabilities are denoted PFA and PMD,respectively. The relationship between these two errors is completely described by the ROC curveobtained by plotting 1 PMD(PFA) as a function of PFA. Unfortunately, ROC curves cannot bedirectly used for evaluating steganalyzers (embedding algorithms) as they cannot be ordered (theymay overlap).

  5. L2R_L2LOSS – soft-margin optimization criterion

    Although there exist many algorithms for binary classification, SVMs are popular for their goodability to generalize to unseen data samples. The success of SVMs lies in the optimization criterionwhich, for the case of a linear classifier, looks for the separating hyperplane maximizing the distance(often called margin) between itself and the closest data points. Intuitively, the larger the marginbetween two classes, the better they can be separated and the smaller the PE error becomes. Weuse the size of the margin for a linear SVM as the optimization criterion. Let C be the set of N cover images and S the set of N stego images obtained from C by embeddinga pseudo-random message into each image. By extracting a d- dimensional feature from each image,we obtain a set of 2N vectors {fi Rd|i2 {1, . . .

    ,2N}}. We also define the labels gi, i {1, . . 2

    International Journal of Engineering Research & Technology (IJERT)

    ISSN: 2278-0181

    Vol. 1 Issue 6, August – 2012

    . ,2N},as gi= 1 if fi was obtained from a cover image and gi= +1 otherwise. Furthermore, we normalizeall cover feature vectors fi so that the sample variance of each element is 1. This scaling is thenapplied to stego features as well. SVMs with a linear kernel [3] classify a new sample f as coverif wTf <0, where wRdis the normal vector of the decision hyper plane obtained by solving theoptimization problem:

    Here,p(w; fi, gi) is a loss function and C >0 is a penalty parameter. By minimizing (7.1.2), we maximize the margin while penalizing the misclassified samples. We focus on the so-called L2-SVMpenalty function (w; fi, gi) = max(1 giwTfi, 0)2. The optimization problem can also be formulated in its dual form [3]:

    Where , D being a

    diagonal matrix with Dii= (2C) 1, and Qij= gigjfiTfj, i, j 2{1, . . . , 2N}. Given, the

    solution to is From

    the duality, the value of h ( ), forany with i 0, bounds the optimal solution to the primal problem from below. Wecall theoptimal value of h (), the L2RL2LOSS (L2-regularized L2-loss) criterion. Thesmalle the value of this criterion, the larger the optimal value of, and the smaller thepossible margin between cover and stego samples becomes. Therefore, stegano – graphers should beinterested in minimizing L2RL2LOSS.

    We used a dual coordinate descent method [3] with 104 iterations, C = 0.1, and = 0.1 asimplemented in the LIBLINEAR

    [2] package to calculate L2RL2LOSS. Evaluating L2RL2LOSS withsecond-order SPAM features took 12 seconds for N = 80

    512 × 512 cover images on a cluster of 40CPUs when the message-embedding and feature-extraction parts were distributed using OpenMPI.

    When optimizing using L2RL2LOSS, we fix the set of cover images C and the set of pseudorandommessages we will be embedding. We did this by fixing the seeds used for choosing the coverimages and the seed used by the embedding simulator. Although L2RL2LOSS may have differentvalues when evaluated across different sets C, the minimum w.r.t. stays approximately the same.

    The figure shows the value of the L2R_L2LOSS criterion based on the CDF set when evaluated for different values of 0 in (7.1.1) and the number of images in C. We can see that even with 40images, the optimal value of is close to the value obtained from the SVM-based classifier. Because the L2R_L2LOSS criterion can be evaluated quickly, it can be minimized using numericalmethods even for a high dimensional . unfortunately, for higher dimensional , the surface obtainedby this criterion w.r.t. is not smooth enough for gradient-based optimization methods to be usedefficiently. Instead, we used the NelderMead simplex-reflection method with elements of the initial simplex generated uniformly at random in [0, 1]. Due to the non-smooth nature of the optimization criterion, we cannot guarantee that we reached a globalminimum (in fact, the solution will be most likely a local minimum). 3

    International Journal of Engineering Research & Technology (IJERT)

    ISSN: 2278-0181

    Vol. 1 Issue 6, August – 2012

    Figure: Comparison of different cost assignments in the inverse single-difference costmodel with a payload-limited sender embedding 0.5 bpp using the L2RL2LOSS (left) andMMD2 (right) optimization criteria. The results are compared with the PE error obtained from anSVM-based classifier. All results were produced using the CDF set and the BOWS2 database of512

    × 512 grayscale images.

  6. Other optimization criteria and their relevance to cost design

    Due to the non-smooth optimization surface, we may be interested in other metrics. Metrics leadingto a smooth optimization surface may produce an embedding algorithm whose cost assignmentsmay be easier to interpret. Here, we present one such metric the Maximum Mean Discrepancy(MMD) [54, 96]. MMD has been used for comparison of steganographic methods [5] and othermachine learning problems, such as feature selection [6]. Originally, MMD was designed as astatistical test for the two- sample problem to decide whether two data sets were obtained fromthe same distribution. The theoretical derivation of MMD appears in [5]. Here, we only review theconnection between MMD and binary hypothesis testing.

    Let C0 and S0 be the sets of N0 cover and stego images, respectively. We require the set of coverimages used for creating S0 to be disjoint with C0. Let ci, si2 Rd, i 2 {1, . . . ,N0}, be the featurevectors representing the ith cover and stego image, respectively. As in Section 7.1.2, we normalizeci and sito unit variance obtained from the cover features. An unbiased estimate of MMD2

    Is the Gaussian kernel with parameter 0. We set the widthof the Gaussian kernel to = 103, which closely corresponds to the

    median rule [4]. In practice,we used the set of N 2N0 cover images from which C0 and S0 were derived using a pseudo- randompermutation. For a given set of N cover images, we define the MMD2 criterion as the sample meanof MMD(C0,S0)2 calculated over M pseudo- random partitions. For the 1234-dimensional CDF set,evaluating MMD2 using N = 80 512 × 512 cover images with N0 = 40 and M

    = 105 took 4 secondson a 40-CPU computer cluster when all operations were parallelized using OpenMPI.

    The MMD2 criterion is related to binary classification using Parzen windows. Asimple binary hypothesis testing problem (deciding whether a given image is cover or stego) can besolved optimally using the Likelihood Ratio Test (LRT) once the exact probability distributions ofcover, PC, and stego feature vectors, PS, are available. Given an unknown feature vector f, the LRTcalls f cover if PC(f) > PS(f) and stego otherwise. Because neither PC or PS are available, one maywant to estimate them from a set of N cover and N stego training samples fi Rd withlabels gi,i 2 {1, . . .

    ,2N}. The Parzen estimate of PC(f) defined as 4

    International Journal of Engineering Research & Technology (IJERT)

    ISSN: 2278-0181

    Vol. 1 Issue 6, August – 2012

    counts the number of training vectors that are close to f. Here, K(fi, f) is a kernel giving largerweights to vectors closer to f. A popular choice for Kis the Gaussian kernel

    K(fi, f) = k(fi, f) =

    The Parzen estimate of PS(f), denoted PS(f), is defined in a similar way. Whenwe substitute PC(f) and PS(f) into the LRT, we obtain the Parzen window classifier. Therefore,MMD(C0, S0)2 calculates a finite-sample estimate of the average detection criterion with equal-priors:

    This obtained using the leave-one-out cross- validation. Due to the Gaussian kernelk, MMD(PC, PS)2 0 and MMD(PC, PS)2 =

    0 if and only if PC = PS. For this reason, the steganographer should minimize the MMD2 criterion when calculated from N = 80 and N

    = 40 coverimages using N0 = N/2 and M =

    105 over different values of 0. The results obtained from theSVM-based classifier are plotted for reference. Due to bootstrapping, the MMD2 criterion results in asmooth optimization surface even for a high-dimensional . We used a simple gradient descent-basedoptimization technique to minimize MMD2.

  7. Application to Digital Images in DCT Domain

    Most adaptive embedding schemes for JPEG images embed message bits while quantizingthe DCT coefficients during JPEG compression and minimize an additive distortion functionderived from the rounding errors. This approach utilizes the side- information in theform of a never- compressed image, which may not always be available. In this section, we focus ondesigning adaptive embedding schemes

    that start directly from a JPEG image and derive the costsof changing a single DCT coefficient from its neighborhood.

    We used a mother database of 6, 500 images obtained from 22 different cameras at their fullresolution in a raw format from which a database of 6, 500 grayscale JPEG cover images was created.Each raw image was first converted to grayscale, resized to a smaller size of 512 pixels using bilinearinterpolation while preserving the aspect ratio, and finally JPEG compressed using quality factor75.A common way of expressing the payload in DCT-domain steganography is the number of bitsembedded per non-zero AC DCT coefficient [8], which we denote as bpac. This is becauseessentially all embedding schemes for DCT domain never change zero coefficients and some evenavoid changing DC coefficients due to their high impact on statistical detectability. According to [8],the most secure algorithm that does not rely on any side-information is the nsF5, which minimizesthe number of changed non-zero AC DCT coefficients. Using our terminology, the nsF5 uses abinary embedding operation that decreases the absolute value of a non-zero AC DCT coefficient, i.e.,Ii = {xi, xi sign(xi)} whenever xi 6= 0 is an AC coefficient, and Ii = {xi} otherwise.The detection wasimplemented using the CDF set with a Gaussian SVM-based steganalyzer.Similar to the spatial domain, we design the costs based on the differences between DCT coefficientseither from neighboring blocks or from similar DCT modes inthe same 8 × 8 block. Thisallows us to express the context in which a single change is made. We represent a JPEG imagex in a matrix notation, where xi,j2 I , {1024, . . . , 1024} denotes the DCT element of mode(i mod 8, j mod 8) in the di/8e , dj/8eth block. The set

    {xi,j|i mod 8 6= 0 j mod 8 6= 0} describesall AC DCT coefficients in x. We 5

    International Journal of Engineering Research & Technology (IJERT)

    ISSN: 2278-0181

    Vol. 1 Issue 6, August – 2012

    define the following cost model, which we use with a ternaryembedding operation.

  8. Inter/intra-block cost model: Let = (ir, ia) 2 R(2+1)+1 ×R(2+1)+1 be the model parameters

    describing the cost of disturbing inter- and intra-block dependencies with ir = (ir,, .

    . , ir,,ir,) and ia = (ia,, . . . , ia,, ia,). The cost of changing any (even zero)

    AC DCT coefficient

    where Nir ={xi+8,j , xi,j+8, xi8,j , xi,j8} and Nia ={xi+1,j , xi,j+1, xi1,j , xi,j1} are inter- and intrablockneighborhoods, respectively. As before, ia,z= ia, and ir,z= ir, whenever |z| >. Wereduced the sum in (7.3.1) accordingly when the required element falled outside of the image boundary. Compares the performance of embedding algorithms based on the aboveinter/intra-block cost model when optimized using the L2RL2LOSS criterion with CC-PEV featuresand payload 0.5 bpac. We report the performance of two algorithms for = 6. In the firstversion, both

    ir and ia were optimized, while in the second version only the inter-block part irwas optimized while ia = (0, . . . ,0). To show that the optimized algorithms are not over-trained tothe CC-PEV features calibrated by cropping by 4×4 pixels, we report the PE error obtained from aGaussian SVM-based steganalyzer utilizing the CDF set. Similar performance results were obtainedusing the CC-PEV feature set with calibration by cropping by 2×4 pixels, which suggests that thealgorithms are not over- trained to a specific feature set. Unfortunately, the algorithm optimizedw.r.t. both inter- and intra-block parts did not achieve a better performance than the algorithmwith ia = 0, which is just a special case. This is due to the fact that the Nelder Mead algorithmconverged to a local minimum (the L2RL2LOSS criterion was smaller for the case with ia = 0).

    When compared with the non- adaptive nsF5 algorithm, both versions increased the payload for thesame level of security more than twice. All algorithms can be implemented using the multi- layeredSTCs [7] in practice. In the figure shows that the loss introduced by such a practical implementationis small when implemented using STCs with constraint height h = 10.We found out experimentally that it is more effective to optimize the cost functions w.r.t. largerpayloads. Methods optimized for smaller payloads, such as 0.1 bpac, did not achieve as high performancefor higher payloads as methods optimized for larger payloads.

  9. Conclusion

    The basic premise behind steganography designed to embed while minimizing a certain distortionfunction is that the distortion is related to statistical detectability. In the past, steganographersusedheuristically defined distortion functions and focused on the 6

    International Journal of Engineering Research & Technology (IJERT)

    ISSN: 2278-0181

    Vol. 1 Issue 6, August – 2012

    problem of embedding with minimaldistortion while no attempt was made to justify the choice of the distortion function or optimize itsdesign. Since the problem of embedding with minimal distortion has been resolved in a near- optimalfashion in Chapters 5 and 6, what remains to be done and where the biggest gain in steganographicsecurity lies is the form of the distortion function.The main contribution of this chapter is a practical methodology using which one can optimizethe distortion to design steganographic schemes with improved security. We do so by representingimages in a feature space in which we define a criterion evaluating the separabilitybetween thesets of cover and stego features. The distortion function is parametrized and the parameters arefound by optimizing them

    w.r.t. the chosen criterion on a set that is relatively small 80 coverand stego images. The result is validated on various cover sources using blind steganalyzers. Weintentionally use steganalyzers that utilize different feature spaces than the one in which we optimizeto demonstrate that our optimized design generalizes to other feature sets as well cover sources.We work with additive distortion functions that can be written as a sum of costs defined for eachpixel, while each pixel cost depends on neighboring cover pixels. After investigating three differentchoices for the criterion, we selected the margin of a linear SVM as the most suitable one that iscomputationally efficient yet still closely tied to detectability as determined by a binary classifiertrained on a large set of images.The merit of the proposed work is demonstrated by incorporating the optimized cost for the±1 embedding operation in the spatial domain and the ±1 operation for the DCT domain. Theimprovement over current state of the art is especially apparent in the DCT domain where themethods with optimized costs can

    embed more than twice as large payloads for the same detectabilityas the nsF5 algorithm. The costs are robust in the sense that the improvement can be observed even when the new method is tested with steganalyzers using a different feature set and even on aslightly different cover source.Without any doubts, better parametric models for the distortion in the DCT domain can andshould be considered. For example, the cost parameters should be dependent on the spatial frequencyof DCT coefficients. This would substantially increase the dimensionality of the parameter spacewhich would need to be balanced out by a corresponding increase of the number images. Thisappears to be a mere issue of increased complexity rather than one that would render our approachinapplicable and we might consider it in our future work.

  10. Bibliography

    1. T. Filler, A. D. Ker, and J. Fridrich. The Square Root Law of steganographic capacity forMarkov covers. In Proceedings SPIE, Electronic Imaging, Security and Forensics of MultimediaXI, volume 7254, pages 08 1 08 11, San Jose, CA, January 1821, 2009.

    2. R.-E. Fan, K.-W.Chang, C.-J.Hsieh, X.- R.Wang, and C.-J. Lin. LIBLINEAR: A library forlarge linear classification. Journal of Machine Learning Research, 9:1871 1874, 2008. Softwareavailable at

    3. C.-J. Hsieh, K.-W.Chang, C.-J.Lin, S. S. Keerthi, and S. Sundararajan.A dual coordinatedescent method for large-scale linear svm. In Proceedings of the 25th international conferenceon Machine learning, ICML 08, pages 408415. ACM, 2008. 7

      International Journal of Engineering Research & Technology (IJERT)

      ISSN: 2278-0181

      Vol. 1 Issue 6, August – 2012

    4. A. Gretton, K. M. Borgwardt, M. Rasch,

      B. Schölkopf, and A. J. Smola.A kernel methodfor the two-sample-problem. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances inNeural Information Processing Systems 19, pages 513520. MIT Press, Cambridge, MA, 2007.

    5. T. Pevný and J. Fridrich.Benchmarking for steganography. In K. Solanki, K. Sullivan, andU. Madhow, editors, Information Hiding, 10th International Workshop, volume 5284 of LectureNotes in Computer Sc., pages 251267, Santa Barbara, CA, June 1921, 2008. Springer- Verlag,New York.

    6. K. Fukumizu, F. R. Bach, and M. I. Jordan.Dimensionality reduction for supervised learningwith reproducing kernel hilbert spaces.Journal of Machine Learning Research, 5:7399, 2004.

    7. T. Filler, J. Judas, and J. Fridrich.Minimizing additive distortion in steganography usingSyndrome-Trellis Codes.IEEE Transactions on Information Forensics and Security, 2010.Submitted. See ons.php.

    8. J. Fridrich, T. Pevný, and J. Kodovský. Statistically undetectable JPEG

  11. About Authors:

steganography: Deadends, challenges, and opportunities. In Proceedings of the 9th ACM Multimedia & SecurityWorkshop, pages 314, Dallas, TX, September 2021,


  1. Y. Kim, Z. Duric, and D. Richards. Modified matrix encoding technique for minimal distortionsteganography. In Information Hiding, 8th International Workshop, volume 4437 of LectureNotes in Computer Sc., pages 314327, Alexandria, VA, July 1012, 2006.

  2. V. Sachnev, H. J. Kim, and R. Zhang.Less detectable JPEG steganography method based onheuristic optimization and BCH syndrome coding. In Proceedings of the 11th ACM Multimedia& Security Workshop, pages 131140, Princeton, NJ, Sept. 2009.

  3. P. Bas, T. Filler, and T. Pevný. "Break Our Steganographic System" the ins and outs oforganizing BOSS. In T. Filler, T. Pevný, A. D. Ker, and S. Craver, editors, Information Hiding,13th International Conference, Lecture Notes in Computer Sc., Prague, Czech Republic, May1820, 2011. Submitted.

    1. VamsikiranMekathoti, Assistant Professor, CSE Department, working in VignansNirula, Guntur, which is one of the best organizations in Andhra Pradesh under JNTUK. As Assistant Professor I awarded with best faculty of the year in 2010. Guided 10 projects of UG and PG levels.

    2. VissapragadaSaipriya, Assistant Professor, CSE Department, working in VignansNirula, Guntur, which is one of the best organizations in Andhra Pradesh under JNTUK. As part of work, published 2 International Journals based on Leaf Image Processing. Approved as Assistant Professor in different levels of classes for UG and PG courses by JNTUH. Guided 5 projects in UG and PG levels. 8

      International Journal of Engineering Research & Technology (IJERT)

      ISSN: 2278-0181

      Vol. 1 Issue 6, August – 2012

    3. ArunkumarBeyyala, Assistant Professor, CSE Department, working in VignansNirula, Guntur, which is one of the best organizations in Andhra Pradesh under JNTUK. Published 2 International Journals on Detection of disease in plants using Image Processing. Ratified as Assistant Professor by JNTUA. Guided 3 projects in UG and PG levels 9

Leave a Reply