 Open Access
 Authors : Navaneeth S Rao, Shruthi S H, Achutha D, Dileep M K, Sandeep R, Girijamba D L
 Paper ID : IJERTCONV6IS13002
 Volume & Issue : NCESC – 2018 (Volume 6 – Issue 13)
Published (First Online): 24-04-2018
ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
3D Hadamard Transform Based Perceptual Video Hashing
Navaneeth S Rao
UG student Vidyavardhaka College of Engineering
Mysuru 570002
Dileep M K
UG student Vidyavardhaka College of Engineering
Mysuru 570002
Shruthi S H
UG student Vidyavardhaka College of Engineering
Mysuru 570002
Sandeep R
Associate professor Vidyavardhaka College of Engineering
Mysuru 570002
Achutha D
UG student Vidyavardhaka College of Engineering
Mysuru 570002
Girijamba D L
Assistant professor Vidyavardhaka College of Engineering
Mysuru 570002
Abstract: As there is a dynamic exchange of multimedia data over the internet, content identification and copyright protection have emerged as serious issues. Perceptual video hashing helps to overcome this problem by providing user authenticity and security for stored video data. A perceptual video hash function generates a compact code, called the hash, from the perceptual content of the video. This hash must be robust to content-preserving alterations and sensitive to content-changing alterations. The paper proposes a robust video hashing algorithm using the 3D Hadamard transform. This algorithm is well suited for hardware implementation, as the basis functions of the Hadamard transform involve only +1 and -1 values.
General Terms: Hash, Algorithm, Simulation, Transform, Function and Coefficients.
Keywords: Perceptual video hashing, Hadamard transform, near-identical videos, indexing and video retrieval.

INTRODUCTION
The hashes computed from cryptographic hash functions are highly sensitive to every single bit of the input. These functions are extremely fragile and cannot be adopted for hashing multimedia data. In video hashing, the computed hashes must be sensitive to the content of the video rather than to its actual binary representation. For instance, the original video and its brightness-manipulated, contrast-manipulated, histogram-equalized and noisy versions should produce identical hashes, since their perceptual content remains the same although their binary representations differ. A different method is therefore required to compute hash values that stay the same unless the perceptual content of the video is significantly changed. Content tracking and copyright protection are pressing issues today, as there is a large dynamic flow of data on the internet, so it is necessary to differentiate the video uploaded by the genuine owner from a hijacked version of the same video uploaded by an intruder [1]. The hash must be robust to content-preserving alterations and sensitive to content-changing alterations.
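The fragility of cryptographic hashes can be seen in a short illustration (not from the paper): flipping a single input bit changes roughly half the bits of a SHA-256 digest (the avalanche effect), which is exactly the behaviour a perceptual hash must avoid.

```python
import hashlib

# Two inputs differing in a single bit: a cryptographic hash treats them
# as completely unrelated, which is why such hashes are unusable for
# perceptual (content-based) video identification.
a = bytes([0b00000000]) * 16
b = bytes([0b00000001]) + bytes([0b00000000]) * 15  # one bit flipped

ha = hashlib.sha256(a).hexdigest()
hb = hashlib.sha256(b).hexdigest()

# Count hex digits that differ between the two digests.
diff = sum(x != y for x, y in zip(ha, hb))
print(diff, "of 64 hex digits differ")
```

A perceptual hash, by contrast, should map both inputs to the same (or a very close) code when the one-bit change has no perceptual effect.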
The remaining paper is structured as follows. Section 2 briefly discusses the literature survey. Section 3 defines the various properties of a perceptual hash function. Section 4 discusses the mathematics of perceptual video hashing using the 3D Hadamard transform. The simulation results and discussion are explained in Section 5. Finally, Section 6 draws conclusions and outlines future work.

LITERATURE SURVEY
There are various video hash algorithms proposed in the literature.
Coskun and Sankur [2] proposed a video hashing algorithm that projects the luminance component of the video onto a DCT basis. The input video is normalized and the projected low-frequency 3D-DCT coefficients are quantized to get the hash. Li and Monga [1] proposed a video hashing technique that models videos as tensors and uses subspace projections of tensors, namely Low-Rank Tensor Approximations (LRTAs) via Parallel Factor Analysis (PARAFAC), to generate a perceptual hash from the videos. A rank-one tensor approximation was used. This method showed excellent robustness to a large class of content-preserving distortions. Sandeep et al. [3] extracted the video hash using the Tucker decomposition technique, which decomposes a tensor into a core tensor multiplied by a matrix along each mode, flexibly decomposing n-way data arrays into a lower-dimensional space. The algorithm showed average performance against malicious video attacks. Inspired by the work in [4], Sandeep and Bora [5] extracted hashes from videos by projecting normalized sub-videos onto Achlioptas's random basis. The algorithm showed average performance against content-preserving attacks and good performance against content-changing attacks. This technique is secure, computationally efficient and retains essential features in a reduced-dimensionality subspace with minimal distortion. It is robust to most single and multiple image-processing attacks.
Many hash algorithms extract hashes using certain key frames of the video. When the video loses key frames during transmission or reception, the hash value degrades. To overcome this problem, Sandeep et al. [unpublished] proposed a 3D Radial Projection Technique (3D-RPT) that uses all the frames of the video to compute the hash rather than a particular set of key frames. In this technique, the video is treated as an order-3 tensor and broken down into randomly overlapping sub-blocks that cover the entire video, and the hash is computed from these sub-blocks.
PROPERTIES OF PERCEPTUAL HASH FUNCTION
The notations used for defining the properties are shown in Table 1.
Table 1. Notations involved in defining the properties of perceptual hash functions

Symbol | Description
𝒱 | Finite video space
𝒦 | Finite secret key space
V | Video input
K | Secret key
H(·, ·) | Perceptual hash function
d(·, ·) | Hamming distance
V_sim | Video perceptually similar to V
V_diff | Video perceptually different from V
τ1, τ2, τ3 | Suitable thresholds
The various properties of perceptual hash function are as follows:

One-way function
The mapping from the video to the hash must be a one-way function, i.e., it should be practically infeasible to reconstruct the whole video from the generated hash.

h = H(V, K)    (1)

where V ∈ 𝒱 and K ∈ 𝒦.

Compactness
The size of the hash must be very small compared to the size of the video.

size(H(V, K)) ≪ size(V)    (2)

Perceptual robustness
The Hamming distance between the hashes of perceptually similar videos computed with the same secret key should be very small, and the probability of these distances being small should be very high, i.e., close to 1. Mathematically, it is represented as

Pr{ d(H(V, K), H(V_sim, K)) ≤ τ1 } ≈ 1    (3)

Diffusion
The Hamming distance between the hashes of perceptually different videos computed with the same secret key should be very large, and the probability of these distances being large should be very high. Mathematically, it is represented as

Pr{ d(H(V, K), H(V_diff, K)) > τ2 } ≈ 1    (4)

Confusion
The Hamming distance between the hashes of the same video computed with different secret keys should be very large, and the probability of these distances being large should be very high. Mathematically, it is represented as

Pr{ d(H(V, K1), H(V, K2)) > τ3 } ≈ 1    (5)


PERCEPTUAL VIDEO HASHING USING 3D HADAMARD TRANSFORM
The proposed hashing method is an extended idea of the 3D DCT based video hashing proposed by Coskun and Sankur in [2]. Here the 3D DCT is replaced by the 3D Hadamard transform. The elements of the basis vectors of the Hadamard transform take only the binary values ±1 and are therefore well suited for hardware implementation of the algorithm.

Properties of Hadamard Transform [6]

The Hadamard transform is real, symmetric and orthogonal.

The Hadamard transform is a fast transform: since it contains only ±1 values, no multiplications are required in transform calculations.

The Hadamard transform has good to very good energy compaction for highly correlated images.
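The listed properties can be checked numerically. Below is a minimal sketch (not from the paper; NumPy assumed) that builds the order-8 Hadamard matrix by the Sylvester construction and verifies the ±1 entries, symmetry and orthogonality:

```python
import numpy as np

def hadamard(n):
    # Sylvester construction: n must be a power of two.
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

H8 = hadamard(8)

# Entries are only +1/-1, so projections need additions/subtractions only.
print(set(np.unique(H8).tolist()))            # {-1, 1}
# Real, symmetric and orthogonal (up to scaling): H H^T = n I.
print(bool((H8 == H8.T).all()))               # symmetric
print(bool((H8 @ H8.T == 8 * np.eye(8)).all()))  # orthogonal
```

The identity H H^T = n·I is what makes the transform invertible with only additions, subtractions and a final scaling.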


Proposed Method
The basic block diagram of perceptual video hashing via the 3D Hadamard transform is shown in Figure 1. The steps involved are: preprocessing and normalization; 3D Hadamard transform and coefficient selection; and hash computation.
Fig 1: Block diagram of proposed hashing method

Preprocessing and Normalization

The videos to be hashed exist in various frame dimensions and frame rates. Hence, the video data has to be standardized in terms of frame dimensions and number of frames. In order to standardize, the input video is subjected to temporal subsampling followed by spatial subsampling. In temporal subsampling, the input video undergoes standard frame dropping to obtain 128 normalized frames. These frames are input to spatial subsampling, where the normalized frames are resized to a standard size. We have standardized the frame dimensions to 64 × 64, as shown in Figure 2.
Let V(W, H, F) denote any input video, where W is the frame width, H is the frame height and F is the number of frames. In this step, the input video V(W, H, F) is converted to the standard size 64 × 64 × 128. This standard size was experimentally determined, based on the fact that smaller sizes risk losing semantic content.
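The normalization step can be sketched as follows. This is an illustrative implementation, not the authors' code: the helper name `normalize_video` and the nearest-neighbour resizing are assumptions, with both frame dropping and spatial resizing done by uniform index selection.

```python
import numpy as np

def normalize_video(video, F=128, H=64, W=64):
    # Temporal subsampling: uniform frame dropping down to F frames,
    # then spatial subsampling of each frame to H x W by picking
    # uniformly spaced rows and columns (nearest-neighbour resize).
    f, h, w = video.shape
    t_idx = np.linspace(0, f - 1, F).round().astype(int)
    r_idx = np.linspace(0, h - 1, H).round().astype(int)
    c_idx = np.linspace(0, w - 1, W).round().astype(int)
    return video[np.ix_(t_idx, r_idx, c_idx)]

# Example: a synthetic 300-frame 240x320 luminance video.
video = np.random.rand(300, 240, 320)
norm = normalize_video(video)
print(norm.shape)  # (128, 64, 64)
```

A production system would likely use proper interpolation (e.g. bilinear) for the spatial resize, but the shape contract is the same.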
Fig 3: The 3D Hadamard array of the normalized video with dimensions W = 64, H = 64 and F = 128.
c) Hash computation
After the 3D transform is applied and the coefficients are selected, the hash calculation procedure remains the same irrespective of the transform used. The selected 128 coefficients are converted to a binary string by median-based quantization. Let the coefficients be denoted by c_n, where n ranges from 1 to 128. The hash is binarized using Equation 6:

h_n = { 1, if c_n ≥ median(c);  0, if c_n < median(c) }    (6)

where n = 1, 2, ..., 128.
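The median-based quantization of Equation (6) can be sketched as below (illustrative code, not the authors' implementation; the function name `binarize` is an assumption):

```python
import numpy as np

def binarize(coeffs):
    # Each coefficient is compared against the median of all 128:
    # at or above the median -> bit 1, below -> bit 0.
    med = np.median(coeffs)
    return (coeffs >= med).astype(np.uint8)

rng = np.random.default_rng(0)
coeffs = rng.standard_normal(128)
h = binarize(coeffs)
print(int(h.sum()))  # 64: the median splits distinct values exactly in half
```

Thresholding at the median guarantees a balanced hash (half ones, half zeros for distinct coefficients), which maximizes the entropy of the 128-bit code.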


SIMULATION RESULTS AND DISCUSSION
Fig 2: Normalization before Hash extraction
b) 3D Hadamard transform and coefficient selection
The 3D Hadamard transform is applied on the normalized video V(64, 64, 128) to obtain a 3D array of Hadamard coefficients. The low-frequency Hadamard coefficients contain most of the energy and are robust against various signal-processing attacks. In our work, we extract the hash from an 8 × 8 × 2 cube of these coefficients, as shown in Figure 3.
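The separable 3D transform and coefficient selection can be sketched as follows. This is an illustrative implementation under assumptions not stated in the paper: the Sylvester (natural) ordering of the Hadamard matrix is used, and the "low-frequency" coefficients are taken as the leading 8 × 8 × 2 corner of the coefficient array; a sequency-ordered transform would arrange the coefficients differently.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction: n must be a power of two.
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def hash_coefficients(video):
    # video shape: (128 frames, 64 rows, 64 cols).
    # A 3D Hadamard transform is separable: apply the 1D transform
    # along each of the three axes in turn.
    F, R, C = video.shape
    Hf, Hs = hadamard(F), hadamard(R)          # assumes R == C
    t = np.tensordot(Hf, video, axes=(1, 0))   # along frames
    t = np.tensordot(Hs, t, axes=(1, 1)).transpose(1, 0, 2)  # along rows
    t = np.tensordot(t, Hs.T, axes=(2, 0))     # along columns
    # Keep the leading 2 x 8 x 8 corner: 128 coefficients for the hash.
    return t[:2, :8, :8].ravel()

video = np.random.rand(128, 64, 64)
c = hash_coefficients(video)
print(c.shape)  # (128,)
```

Because every entry of the transform matrices is ±1, each `tensordot` reduces to signed sums, which is what makes the scheme attractive in hardware.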
The details of the original database used for experimentation are tabulated in Table 2. A total of 1000 videos were selected for the experimentation. In total, 19 attacks were performed on the database: brightness increase, brightness decrease, contrast increase, contrast decrease, adding a logo (of two different sizes, counted as two attacks), histogram equalization, frame dropping, frame-rate increase (to 60 fps), frame-rate decrease (to 15 fps), spatial-resolution increase, spatial-resolution decrease, average blurring, Gaussian blurring, addition of salt-and-pepper noise, addition of Gaussian noise, bit-rate changes (to 100 kbps), frame rotation and frame cropping.
Table 2. Database details used for experimentation

Parameter | Description
Database | trecvid.nist.gov, openvideo.org
Number of videos | 1000
Minimum spatial resolution | 192 × 144
Maximum spatial resolution | 640 × 480
Minimum frame rate | 4
Maximum frame rate | 60
Minimum number of frames | 1
Maximum number of frames | 11415
Video format | MP4
A database of 20,000 videos was created in total (including the original videos). These videos were normalized to 64 × 64 × 128, and a 128-bit hash was extracted from the 8 × 8 × 2 sub-cube of each video. This is repeated for all videos in the database, and the hashes are saved in an Excel spreadsheet. For any query video, the hash is computed and checked for a match against the hashes in the spreadsheet. To compare hashes, the Hamming distance metric is used: the normalized Hamming distance is computed between the hash of the query video and the hashes of the videos in the database. A threshold for retrieval is set up by trial and error. If the normalized Hamming distance is less than the threshold, the videos are declared near-identical and are retrieved; if it is greater than the threshold, the videos are declared different and are not retrieved. To compare the perceptual content of the query video with a randomly chosen video from the database, a graphical user interface (GUI) was created for a clean and easy interface between the application and the user, as shown in Figure 4 and Figure 5 respectively.
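The retrieval step can be sketched as below. This is a toy example, not the authors' code: the threshold value 0.2 and the function names are assumptions, since the paper sets its threshold by trial and error.

```python
import numpy as np

def normalized_hamming(h1, h2):
    # Fraction of differing bits between two equal-length binary hashes.
    return np.count_nonzero(h1 != h2) / h1.size

rng = np.random.default_rng(1)
db = rng.integers(0, 2, size=(5, 128), dtype=np.uint8)  # toy hash database
query = db[2].copy()
query[:4] ^= 1  # mildly distorted copy: 4 of 128 bits flipped

tau = 0.2  # assumed threshold; the paper tunes it by trial and error
dists = [normalized_hamming(query, h) for h in db]
retrieved = [i for i, d in enumerate(dists) if d < tau]
print(retrieved)  # only the near-identical entry (index 2) survives
```

Unrelated random hashes differ in roughly half their bits (distance near 0.5), so they fall well above any sensible threshold, while the distorted copy stays close to its original.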

CONCLUSION AND FUTURE WORK

The proposed video hashing technique works well for common signal-processing attacks, but under frame insertion or frame dropping there is a degradation in performance. Hardware implementation is easy, as the basis vectors of the Hadamard transform involve only ±1; the retrieval of near-identical videos is of low complexity, and the hardware implementation involves only addition and subtraction operations. The performance of the algorithm is yet to be studied using precision-recall curves and Receiver Operating Characteristic (ROC) curves.
Fig 4: GUI showing similar videos
Fig 5: GUI showing dissimilar videos

REFERENCES

M. Li and V. Monga, "Robust video hashing via multilinear subspace projections," IEEE Transactions on Image Processing, vol. 21, no. 10, pp. 4397-4409, Oct. 2012.

B. Coskun, B. Sankur, and N. Memon, "Spatio-temporal transform based video hashing," IEEE Transactions on Multimedia, vol. 8, no. 6, pp. 1190-1208, Dec. 2006.

R. Sandeep, S. Sharma, T. Mayank, and P. K. Bora, "Perceptual video hashing based on Tucker decomposition with application to indexing and retrieval of near-identical videos," Multimedia Tools and Applications, vol. 75, no. 13, pp. 7779-7797, 2016. [Online]. Available: http://dx.doi.org/10.1007/s11042-015-2695-1

M. Li and V. Monga, "Desynchronization resilient video fingerprinting via randomized, low-rank tensor approximations," in 2011 IEEE 13th International Workshop on Multimedia Signal Processing, Oct. 2011, pp. 1-6.

R. Sandeep and P. K. Bora, "Perceptual video hashing based on the Achlioptas's random projections," in 2013 Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), Dec. 2013, pp. 1-4.

A. K. Jain, Fundamentals of Digital Image Processing, ser. Prentice-Hall Information and System Sciences Series. Prentice-Hall, 1989.