3D Hadamard Transform Based Perceptual Video Hashing

Call for Papers - April 2019

Download Full-Text PDF Cite this Publication

Text Only Version

3D Hadamard Transform Based Perceptual Video Hashing

Navaneeth S Rao

UG student Vidyavardhaka College of Engineering

Mysuru- 570002

Dileep M K

UG student Vidyavardhaka College of Engineering

Mysuru- 570002

Shruthi S H

UG student Vidyavardhaka College of Engineering

Mysuru- 570002

Sandeep R

Associate professor Vidyavardhaka College of Engineering

Mysuru- 570002

Achutha D

UG student Vidyavardhaka College of Engineering

Mysuru- 570002

Girijamba D L

Assistant professor Vidyavardhaka College of Engineering

Mysuru- 570002

Abstract:- As there is a dynamic exchange of multimedia dataover the internet, content identification and copyright protectionhas emerged as a serious issue. Perceptual video hashing helpsto overcome this problem by providing user authenticity andsecurity to the video data stored. The perceptual video hashfunction generates a compact code called the hash using theperceptual content of the video. This hash must be robust to anycontent preserving alterations and sensitive to content changingalterations.The paper proposes a robust video hashingalgorithm using 3D Hadamard transformation. This algorithmis well suited for the hardware implementation as the basisfunctions of Hadamard transform involves only +1 and -1 values.

General Terms:- Hash, Algorithm, Simulation, Transform, Function and Co-efficients.

Keywords:- Perceptual video hashing, Hadamard transform,Near-identical videos, Indexing and video retrieval.


    The hashes computed from cryptographic hash functions are highly sensitive to every single bit of the input. These functions are extremely fragile and cannot be adopted for hashing multimedia data. In video hashing, the hashes computed must be sensitive to the content of the video rather than the actual binary representation. For instance, the original video, its brightness manipulated version, contrast manipulated version, histogram equalized version and noisy version should produce identical hashes as their perceptual content remain same although their binary representations differ. So a differentmethod is required to compute the hash values that result in same output unless the perceptual content of the video is significantly changed. Content tracking and copyright protectionis a great issue in recent days as there is a large dynamic flow of data in the internet. So it is necessary to differentiate the video uploaded by the genuine owner and the hacked versionof the same uploaded by the invader [1]. The hash must be robust to the content preserving alterations and sensitive to the content changing alterations.

    The remaining paperis structured as follows. In section 2 Literature survey is discussed in brief. In section 3, the various propertiesof the perceptual hash function is

    defined. In section 4,we discuss the mathematics related to the perceptual videohashing using 3D Hadamard transform. The simulation resultsand discussion are briefly explained in section 5. Finally insection 6, conclusion are drawn and future work is studied.


    There are various video hash algorithms proposed in the literature.

    Coskun and Sankur [2] have proposed a video hashing algorithm by projecting the luminance components of video onto a DCT basis. The input video is normalized and the projected low frequency 3D-DCT co-efficients are quantized to get the hash. Li and Monga [1] proposed a video hashing technique by modeling videos as tensors and using subspace projections of tensors, such as Low-Rank Tensor Approximations (LRTAs)via Parallel Factor Analysis (PARAFAC) for generation ofperceptual hash from the videos. A rank one was used fortensor approximations. This method showed excellent robustness to large class of content preserving distortions. Sandeepet al. [3], extracted video hash using Tucker decompositiontechnique. It decomposes a tensor into a core tensor multipliedby the matrix along each mode. It flexibly decomposes n-waydata arrays into a lower dimensionality space. The algorithmshowed average performance against malicious video attacks.Inspired from the work in [4], Sandeep and Bora [5] extractedthe hashes from the videos by projecting the normalizedsub videos onto Achilioptass random basis. The algorithm showedaverage performance against content preserving attacks andgood performance against content changing attacks. This technique is secure, computationally efficient and retains essential features in reduced dimensionality sub-space with minimal distortion. This algorithm is robust to most of single and multiple image processing attacks.

    Many hash algorithms extract hashes using certain key framesfrom the video. When the video loses the key frames duringits transmission or reception, the hash value degrades. Thus toovercome this problem, Sandeep et al.

    [unpublished], proposeda 3D Radial Projection Technique (3D-RPT) that uses allthe frames of the video to compute the hash rather thana particular set of key frames. In this technique, video istreated as an order 3 tensor. The video is broken down intorandomly overlapping sub-blocks and these sub-blocks coverthe entire video to compute the hash.

  3. PROPERTIES OF PERCEPTUAL HASH FUNCTION The notations utilized for defining the properties are shown in Table 1.

    Table 1. Notations involved in defining the properties of perceptual hash functions



    Finite video space

    Finite secret key space

    Video input

    Secret key

    Perceptual hash function

    Hamming distance

    Video perceptually similar to

    Video perceptually different from

    1, 2, 3

    Suitable thresholds

    The various properties of perceptual hash function are as follows:

    1. One-way function

      The mapping of the video to the hash must be a one-way function i.e., it should be practically infeasible to reconstruct the whole video from the hash generated.

      (, ) (1)


    2. Compactness

      The size of the hash must be very small compared to the size of the video.

      ((, )) () (2)

    3. Perceptual robustness

      The Hamming distance calculated between the hashes of perceptually similar videos using the same secret key should be very small. The probability of these measurements being close should be very high i.e., near to

      1. Mathematically, it is represented as

      { ((, ), (, )) 1} 1 (3)

    4. Diffusion

      The Hamming distance calculated between the hashes of perceptually different videos using the same secret key should be very large. The probability of these measurements being large should be very high. Mathematically, it is represented as

      { ((, ), ( , )) > 2} 1 (4)

    5. Confusion

      The Hamming distance measured between the hashes of perceptually different keys should be very large. The probability of these measured being large should be very high. Mathematically, it is represented as

      {((, 1), (, 2)) > 3} 1 (5)


    The proposed hashing metho is an extended idea of 3D DCT based video hashing proposed by Coskun and Sankur in

    [2]. Here the 3D DCT is replaced by 3D Hadamard transform.

    The elements of the basis vectors of the Hadamard transform

    takes only binary values ±1 and are therefore well suited for

    hardware implementation of the algorithm.

      1. Properties of Hadamard Transform [6]

        1. The Hadamard transform is real, symmetric and orthogonal.

        2. The Hadamard transform is a fast transform since it contains only ±values, no multiplications are required in transform calculations.

        3. The Hadamard transform has good to very good energy compaction for highly correlated images.

      2. Proposed Method

        The basic block diagram of perceptual video hashing via 3D Hadamard transform is as shown in Figure 1. The steps involved are as follows: Pre-processing and Normalization, 3D Hadamard transform and Co- efficient selection and Hash selection.

        Fig 1: Block diagram of proposed hashing method

        1. Pre-processing and Normalization

    The videos to be hashed exist in various frame dimensions and frame rates. Hence, the video data has to be standardized in terms of frame dimensions and number of frames. In-order to standardize, the input video is subjected to temporal sub-sampling followed by spatial sub- sampling. In temporal sub-sampling, the input video is subjected to standard frame dropping to obtain 128 normalized frames. These frames are input to spatial sub-

    sampling where the normalized frames are re-sized to a standard size. We have standardized the frame dimensions to 64 × 64. It is as shown in Figure 2.

    Let us use the notation (, , ) that denotes any input video where is the frame width, is the frame height and is the number of frames. In this step, the input video (, , ) is converted to standard size . This standard size was experimentallydetermined based upon the fact that smaller sizes risk losing semantic content.

    Fig 3: The 3D-Hadamard array of normalized video with dimensions W = 64, H = 64 and F = 128.

    c) Hash computation

    After the 3D transform is applied and co-efficients are selected, the hash calculation procedure remains the same irrespective of the transformation used. The selected 128 co-efficients are converted to binary string using the method of median based quantization. Let the co-efficients be denoted by where ranges from 1 to 128. The hash is binarized using the Equation 6.

    = {1: () 0: < ()

    Where n = 0, 1, 2.128



    Fig 2: Normalization before Hash extraction

    b) 3D Hadamard transform and co-efficient selection The 3D Hadamard transform is applied on the normalized video. After applying the 3D Hadamard transform on the normalized video (, , ) , we obtain 3D array of Hadamard co-efficients. The low frequency Hadamard co-efficients contain most of the energy and are robust against various signal processing attacks. In our work, we have extracted hashes from a 8 × 8 × 2 cube. It is as shown in Figure 3.

    The details of our original database used for experimentation is tabulated in Table 2. A total of 1000 videos wereselected forthe experimentation purpose.Totally 19 attackswere performed on the database. They are: Brightness increase, brightness decrease, contrast increase, contrast decrease, adding logo (of size × ) adding logo (of size × ), histogram equalization, frame dropping, framerate increase (to 60fps), frame rate decrease (to 15fps),spatial resolution increase, spatial resolution decrease, averageblurring, gaussian blurring, addition of salt and pepper noise,addition of gaussian noise, bit rate changes (to 100 kbps),frame rotation and frame cropping.

    Table 2. Database details used for experimentations





    Number of videos


    Minimum spatial resolution

    192 × 144

    Maximum spatial resolution

    640 × 480

    Minimum frame rate


    Maximum frame rate


    Minimum number of frames


    Maximum number of frames


    Video format


    A total of 20,000 videodatabase was created (including the original videos). Wenormalized these videos to × ×

    . A 128 bit hashis extracted from × × sub-cube for a single video. Itis repeated for total number of videos

    in the database and issaved in an excel spreadsheet. For any query video selected,hash is computed and is checked for matching with the hashesin the spreadsheet. In-order to compare the hashes, hammingdistance metric is used. The normalized hamming distanceis computed between the hashes of the query video and thevideos in the database. A threshold is setup for retrieval ofvideos by trial and error method. If the normalized hammingdistance is less than the threshold, the videos are said to benear identical and are retrieved. If the normalized hammingdistance is greater than the threshold, the videos are said tobe different and are not retrieved. To compare the perceptualcontent between the query video and a randomly chosen videofrom the database, a graphical user interface (GUI) is createdfor clean and easy interface between the application and theuser. It is as shown in Figure 4 and Figure 5 respectively.


  7. The Proposed video hashing technique works fine for common signalprocessing attacks but under frame insertions ordropping offrames, there is a degradation of performance. Thehardwareimplementation is easy as the basis vectors in Hadamardtransform involves only ±1. The retrieval of near identicalvideos is less complex and hardware implementation involveonly addition and subtraction operations. The performanceof the algorithm is yet to be studied using precision-recallcurves and Receiver Operating Characteristic curves (ROC).

    Fig 4: GUI showing similar videos

    Fig 5: GUI showing dissimilar videos


  1. M. Li and V. Monga, Robust video hashing via multilinear subspace projections, IEEE Transactions on Image processing, vol. 21, no. 10, pp. 4397-4409, Oct 2012.

  2. B. Coskun, B. Sankur, and N. Memon, Spatio- temporaltransform based video hashing, IEEE Transactions on Multimedia, vol. 8, no. 6, pp. 11901208, 12 2006.

  3. R. Sandeep, S. Sharma, T. Mayank, and P. K. BoraPerceptual video hashing based on tucker decomposition with application to indexing and retrieval of near-identical videos, Multimedia Tools and Applications, vol. 75, no. 13, pp. 77797797, 2016. [Online]. Available:http://dx.doi.org/10.1007/s11042-015-2695-1

  4. M. Li and V. Monga, Desynchronization resilient videofingerprinting via randomized, low-rank tensor approximations, in 2011 IEEE 13th International workshop on Multimedia Signal processing, Oct 2011, pp. 16.

  5. R. Sandeep and P. K. Bora, Perceptual video hashing based on the achlioptass random projections, in 2013 Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), Dec 2013, pp. 14.

  6. A. K. Jain, Fundamentals of Digital Image Processing, ser. Prentice-Hall Information and System Sciences Series.Prentice-Hall, 1989.

Leave a Reply

Your email address will not be published. Required fields are marked *