Investigation and Evaluation of New Filtering Approach for Efficient Video Compression

DOI : 10.17577/IJERTV1IS8161


Ms. Jyoti M. Pingalkar (Lecturer)

Asst. Prof. P. B. Kumbharkar (Head of Computer Engineering Department)

Siddhant College of Engineering, Pune


Alard College of Engineering, Pune


    In-loop filters were introduced in H.264/AVC standardization to improve the objective and subjective quality of compressed video through noise reduction. For HEVC, in-loop filters such as the deblocking filter and the adaptive loop filter (ALF) are currently being introduced. However, both of these HEVC filters work only in the spatial domain, even though video sequences exhibit strong temporal correlation. To overcome this problem, this paper presents a filter that works efficiently in both the spatial and the temporal domain. In addition, the investigation covers depth compression techniques. The proposed filter aims to deliver more efficient decoding of video streams. It requires only a small overhead at slice level: the temporal information conveyed in the bit stream is used to reconstruct the individual motion trajectory of every pixel in a frame at both the encoder and the decoder. This information is then used for pixel-wise adaptive motion-compensated temporal filtering. The implementation and evaluation study of the proposed approach is done using Java frameworks.

    Index Terms: H.264/AVC, HEVC, video compression, in-loop filter, noise, video frames, decoder, encoder.


    With the recent development of 3D multimedia and display technologies and the increasing demand for realistic multimedia, 3D video has gained more attention as one of the most dominant video formats, with a variety of applications such as 3-dimensional TV (3DTV) and free-viewpoint TV (FTV). Unlike conventional TV, where the viewpoint is determined by an acquisition camera, FTV gives users the freedom to select a viewpoint. 3DTV aims to provide users with 3D depth perception by rendering two or more views on stereoscopic and auto-stereoscopic 3D displays [9]. By synthesizing multiple views, free-viewpoint video can also be rendered at a viewpoint selected according to user preference. Many technical issues must be resolved for the successful development of 3D video systems, including capturing and analyzing stereo or multi-view images, compressing and transmitting the data, and rendering multiple images on various 3D displays.

    The main challenges of FTV and 3DTV are depth estimation, 3D video coding and virtual view synthesis. Depth maps are used at the receiver side to synthesize virtual views, so accurate depth maps must be estimated efficiently to ensure seamless view synthesis. Since the performance of 3DTV and FTV depends heavily on the number of available views, virtual view synthesis is an important technique in 3D video systems as well: synthesizing virtual views from a limited number of original views reduces the cost and bandwidth required to capture many viewpoints and to transmit the huge amount of data over a network [10] [11].

    In this paper we first present a broad study of depth maps used for video compression in Section II, and then investigate the temporal trajectory filter (TTF) within the HEVC test model (HM). In addition, it is shown how the length of individual pixel trajectories can be increased to achieve better filtering performance. The remainder of the paper is structured as follows. Section III presents a literature study of the mathematical model for the noise encountered along a pixel trajectory when the trajectory is completely known; based on this model, optimal filter coefficients are calculated. In Section IV, three previously described thresholds are revisited that allow pixel trajectories to be calculated directly from the motion vectors conveyed in the bit stream [9].


    The MPEG-FTV division, an ad hoc group of MPEG, has worked on a new standard for FTV, and a multi-view-plus-depth data format was proposed to make 3D video systems more flexible. For 3D video coding, the Joint Video Team (JVT) of the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) jointly standardized multi-view video coding (MVC) as an extension of the H.264/AVC standard. Depth maps are pre-calculated at the transmitter side and encoded together with the corresponding color images; at the receiver side, the decoded multi-view-plus-depth data are used to synthesize the virtual view. A number of methods have been proposed to compress multi-view video efficiently by exploiting inter-view redundancies, but depth video coding has not been studied as extensively. Depth video coding aims to reduce the depth bit rate as much as possible while ensuring the quality of the synthesized view. Thus, the performance of a depth map is determined not by the depth map itself but by the quality of the synthesized view. In general, the depth map contains the per-pixel distance between object and camera, usually represented by an 8-bit grayscale value. The depth map has the following unique characteristics:

    1. Depth values vary smoothly except at object boundaries or edges.
    2. Edges of the depth map usually coincide with those of the corresponding color image.
    3. Object boundaries should be preserved in order to provide a high-quality synthesized view.

    Straightforward compression of depth video with existing video coding standards such as H.264/AVC may cause serious coding artifacts along the depth discontinuities, which ultimately affect the synthesized view quality [2].
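The 8-bit grayscale representation of depth can be illustrated with a short Java sketch. It assumes the inverse-depth quantization commonly used for MPEG multi-view-plus-depth test material, in which nearer objects receive larger values; the text does not state which mapping is used, and the class name DepthQuantizer and the clipping distances zNear and zFar are hypothetical.

```java
/**
 * Hypothetical sketch: maps a camera distance Z to an 8-bit depth value with an
 * inverse-depth (1/Z) mapping between two assumed clipping distances zNear and zFar.
 */
final class DepthQuantizer {
    private final double zNear;
    private final double zFar;

    DepthQuantizer(double zNear, double zFar) {
        if (!(zNear > 0 && zFar > zNear)) {
            throw new IllegalArgumentException("need 0 < zNear < zFar");
        }
        this.zNear = zNear;
        this.zFar = zFar;
    }

    /** Nearer objects get larger gray values: zNear maps to 255, zFar maps to 0. */
    int toGray(double z) {
        double zc = Math.min(Math.max(z, zNear), zFar); // clamp to the valid range
        double v = 255.0 * (1.0 / zc - 1.0 / zFar) / (1.0 / zNear - 1.0 / zFar);
        return (int) Math.round(v);
    }

    /** Inverse mapping from an 8-bit gray value back to a distance. */
    double toDistance(int gray) {
        double g = Math.min(Math.max(gray, 0), 255) / 255.0;
        return 1.0 / (g * (1.0 / zNear - 1.0 / zFar) + 1.0 / zFar);
    }
}
```

Because the mapping is linear in 1/Z, nearby objects, whose disparity in a synthesized view is largest, receive the finest depth steps.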

    Depth video coding can be classified into two categories according to the coding algorithm: post-processing-based coding and transform-based coding. Morvan et al. proposed a platelet-based method that models depth maps by estimating piecewise-linear functions in the variable-size sub-divisions of a quadtree under a global rate-distortion constraint, and showed that the method outperforms the JPEG-2000 encoder with a 1-3 dB gain [1].

    Although these methods perform better than existing image compression methods, they are difficult to extend into the video domain to exploit temporal redundancies, and they are not compatible with conventional video coding standards such as H.264/AVC. A new intra prediction for H.264/AVC was proposed to encode depth maps using an edge-aware intra prediction scheme that reduces the prediction error in macroblocks. Maitre and Do proposed a depth compression method based on a shape-adaptive wavelet transform that generates small wavelet coefficients along depth edges [1] [9] [10]. Unlike the platelet- or wavelet-based coding methods, the edge-aware intra prediction scheme can be easily integrated with H.264/AVC. However, the performance of that coding algorithm was evaluated by measuring the depth map itself rather than the synthesized view.

    To remain compatible with the H.264/AVC standard, depth video coding algorithms have shifted their focus to reducing the compression artifacts in depth video encoded by H.264/AVC. Kim et al. proposed a new distortion metric that considers global video characteristics and camera parameters, and then used the metric in rate-distortion optimized mode selection to quantify the effects of depth video compression on the synthesized view quality [12]. Lai et al. showed that the rendering error in the synthesized view is a monotonic function of the coding error, and presented a method to suppress compression artifacts using a sparsity-based de-artifacting filter [11]. Another approach exploited an adaptive depth map upsampling algorithm guided by the corresponding color image in order to obtain a coding gain while maintaining the quality of the synthesized view. Oh et al. proposed a coding scheme based on a depth boundary reconstruction filter that considers the occurrence frequency, similarity, and closeness of pixels. Liu et al. used a trilateral filter, a variant of the bilateral filter, as an in-loop filter in H.264/AVC, and a sparse dyadic mode as an intra mode to reconstruct the depth map with sparse representations. Some approaches encode a downsampled depth map and apply a special upsampling filter after decoding to recover the depth edge information.

    The authors of [1] use weighted mode filtering (WMF), originally proposed to enhance depth video obtained from depth sensors such as Time-of-Flight (ToF) cameras, for the post-processing of the compressed depth map. A joint histogram is generated by first calculating a weight based on spatial and range kernels and then accumulating these weights into the bins given by the depth values of the input noisy depth map. The final solution is obtained by seeking the mode, i.e. the bin with the maximum value in the histogram. In addition, the authors present weighted mode filtering in a generic formulation tailored to depth image compression, describe its relation to the bilateral and trilateral filtering methods that have been used in depth video coding, and show the effectiveness of the proposed method with a variety of experiments. The main contributions of this work are as follows.

    • Analyze the relation between the WMF and existing approaches in a localized histogram framework, to theoretically justify its superior performance from the perspective of robust estimation.

    • Thoroughly evaluate the effectiveness of the WMF in the depth coding context, where the noise characteristics and the objective measure differ from those of the depth-sensor enhancement setting for which it was originally proposed.

    • Effectively use the WMF in various proposed depth coding schemes by considering important depth properties for a better synthesized view quality.
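The histogram voting of the WMF can be sketched in a few lines of Java. This is a simplified illustration, not the implementation of [1]: each neighbour of a pixel adds the product of a spatial Gaussian (on the pixel distance) and a range Gaussian (on the difference of the guiding color-image luma) to the histogram bin of its own depth value, and the output is the bin with the largest accumulated weight. The window radius, the sigma values and the hard one-bin vote are assumptions; the exact kernels used in [1] may differ.

```java
import java.util.Arrays;

/**
 * Simplified weighted mode filter for an 8-bit depth map guided by the luma of the
 * corresponding color image. Parameters and the one-bin vote are illustrative assumptions.
 */
final class WeightedModeFilter {
    private final int radius;
    private final double sigmaSpatial;
    private final double sigmaRange;

    WeightedModeFilter(int radius, double sigmaSpatial, double sigmaRange) {
        this.radius = radius;
        this.sigmaSpatial = sigmaSpatial;
        this.sigmaRange = sigmaRange;
    }

    /** depth and guide are [height][width] arrays with values in 0..255. */
    int[][] apply(int[][] depth, int[][] guide) {
        int h = depth.length, w = depth[0].length;
        int[][] out = new int[h][w];
        double[] hist = new double[256];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                Arrays.fill(hist, 0.0);
                for (int dy = -radius; dy <= radius; dy++) {
                    for (int dx = -radius; dx <= radius; dx++) {
                        int qy = y + dy, qx = x + dx;
                        if (qy < 0 || qy >= h || qx < 0 || qx >= w) continue;
                        // spatial kernel on pixel distance, range kernel on color (guide) difference
                        double ws = Math.exp(-(dx * dx + dy * dy) / (2 * sigmaSpatial * sigmaSpatial));
                        double dr = guide[y][x] - guide[qy][qx];
                        double wr = Math.exp(-(dr * dr) / (2 * sigmaRange * sigmaRange));
                        hist[depth[qy][qx]] += ws * wr; // vote for the neighbour's depth bin
                    }
                }
                // the output is the mode: the depth bin with the largest accumulated weight
                int best = 0;
                for (int d = 1; d < 256; d++) {
                    if (hist[d] > hist[best]) best = d;
                }
                out[y][x] = best;
            }
        }
        return out;
    }
}
```

Because the range kernel is computed on the color image, an isolated depth outlier inside a smooth region is voted away, while a depth edge that coincides with a color edge is kept, which matches the depth properties listed earlier.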

    In [1], the authors propose a novel scheme that compresses depth video efficiently within the framework of a conventional video codec. In particular, an efficient post-processing method for the compressed depth map is proposed in a generalized framework that considers compression artifacts, the dynamic range of the depth data, and spatial resolution. The proposed post-processing method uses additional guidance information from the corresponding color video to reconstruct the depth map while preserving the original depth edges. The depth video is encoded by a typical transform-based motion-compensated video encoder, and the compression artifacts are addressed by applying the post-processing method as an in-loop filter. In addition, the authors design a down/upsampling coding approach for both the spatial resolution and the dynamic range of the depth data. The basic idea is to reduce the bit rate by encoding the depth data at a reduced spatial resolution and a reduced depth dynamic range; the proposed post-processing filter is then used to reconstruct the depth video efficiently.
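The down/upsampling idea can be made concrete with the following hypothetical Java sketch. It assumes simple pixel dropping for spatial downsampling and a right shift for dynamic-range reduction at the encoder, and nearest-neighbour copying with mid-interval reconstruction at the decoder. In [1] the reconstruction is performed by the proposed post-processing filter instead, so restore only marks where that filter would operate.

```java
/**
 * Hypothetical sketch of coding depth at reduced spatial resolution and dynamic range.
 * The decoder-side restore is a naive placeholder for the post-processing filter of [1].
 */
final class DepthRangeResampler {
    private DepthRangeResampler() { }

    /** Keeps one pixel per factor x factor block and drops (8 - bits) low-order bits. */
    static int[][] reduce(int[][] depth, int factor, int bits) {
        int h = (depth.length + factor - 1) / factor;
        int w = (depth[0].length + factor - 1) / factor;
        int shift = 8 - bits;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out[y][x] = depth[y * factor][x * factor] >> shift;
            }
        }
        return out;
    }

    /** Naive inverse: nearest-neighbour upsampling and reconstruction at the interval centre. */
    static int[][] restore(int[][] reduced, int height, int width, int factor, int bits) {
        int shift = 8 - bits;
        int half = shift > 0 ? 1 << (shift - 1) : 0; // centre of each quantization interval
        int[][] out = new int[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                out[y][x] = Math.min(255, (reduced[y / factor][x / factor] << shift) + half);
            }
        }
        return out;
    }
}
```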


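    For any given pixel $p_0 = (x_0, y_0)$ in frame $j$, it is assumed that its locations $p_i = (x_i, y_i)$, $1 \le i < N$, in the $N-1$ previous frames are also known. If $Y_n(x, y)$ denotes the luminance component of frame $n$ at location $(x, y)$, the distorted versions of the original sample in the $N-1$ previous frames are

    $$Y_{j-i}(x_i, y_i) = Y_j(x_0, y_0) + n_i, \qquad 1 \le i < N. \tag{1}$$

    Even if the motion of the pixel is perfectly known, a noise term $n_i$ with variance $\sigma_i^2$ is introduced due to the reduced quality of the encoded sequence. It can be assumed that all $n_i$ are mutually uncorrelated. A filtered version of the original luma component can be computed as a weighted mean [1]:

    $$\hat{Y}_j(x_0, y_0) = \sum_{i=1}^{N-1} a_i \, Y_{j-i}(x_i, y_i), \tag{2}$$

    where $a_i$ are the individual weights per frame, with $\sum_{i=1}^{N-1} a_i = 1$ to make the filter unbiased. Substituting (1) into (2) leads to a new noise term for the filtered pixel:

    $$\hat{n} = \hat{Y}_j(x_0, y_0) - Y_j(x_0, y_0) = \sum_{i=1}^{N-1} a_i \, n_i. \tag{3}$$

    Since the $n_i$ are uncorrelated, the variance of the filtered noise is

    $$\sigma_{\hat{n}}^2 = \sum_{i=1}^{N-1} a_i^2 \, \sigma_i^2. \tag{4}$$

    As the filter is designed to minimize $\sigma_{\hat{n}}^2$ subject to the constraint $\sum_{i=1}^{N-1} a_i = 1$, the minimum may be found by Lagrangian minimization:

    $$\frac{\partial}{\partial a_i}\Big[\sum_{k=1}^{N-1} a_k^2 \sigma_k^2 - \lambda\Big(\sum_{k=1}^{N-1} a_k - 1\Big)\Big] = 2 a_i \sigma_i^2 - \lambda = 0 \;\Rightarrow\; a_i = \frac{1/\sigma_i^2}{\sum_{k=1}^{N-1} 1/\sigma_k^2}. \tag{5}$$

    To calculate the optimal filter weight $a_i$, the reconstruction error variance $\sigma_i^2$ of each pixel along the trajectory would be required. According to Wiegand and Girod, the distortion variance in a reconstructed frame is approximately given by [1]:

    $$\sigma_i^2 \approx \frac{\Delta_i^2}{12}, \tag{6}$$

    where $\Delta_i$ is the quantizer step size selected by the quantization parameter $QP_i$ of frame $j-i$. In both H.264/AVC and HEVC, $\Delta_i$ is roughly

    $$\Delta_i \approx 2^{(QP_i - 4)/6}. \tag{7}$$

    Combining (5)-(7), in which the constant factor $1/12$ cancels, the optimal filter weight for frame $j-i$ may be calculated from its QP as

    $$a_i = \frac{2^{-(QP_i - 4)/3}}{\sum_{k=1}^{N-1} 2^{-(QP_k - 4)/3}}. \tag{8}$$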
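The weight computation can be sketched in Java as follows, assuming one QP per trajectory sample. Each weight is proportional to the inverse noise variance, with the variance taken proportional to the squared step size $\Delta_i^2$ and $\Delta_i \approx 2^{(QP_i-4)/6}$; any constant factor cancels in the normalization. The class and method names are hypothetical.

```java
/**
 * Sketch of QP-dependent trajectory filter weights and the resulting weighted mean.
 * One QP per trajectory sample is assumed; names are hypothetical.
 */
final class TrajectoryWeights {
    private TrajectoryWeights() { }

    /** Proportional to 1 / Delta^2 with Delta = 2^((QP - 4) / 6); constant factors cancel later. */
    static double inverseVariance(int qp) {
        return Math.pow(2.0, -(qp - 4) / 3.0);
    }

    /** Normalized weights; they sum to 1, so the filter is unbiased. */
    static double[] optimalWeights(int[] qps) {
        double[] w = new double[qps.length];
        double sum = 0.0;
        for (int i = 0; i < qps.length; i++) {
            w[i] = inverseVariance(qps[i]);
            sum += w[i];
        }
        for (int i = 0; i < w.length; i++) {
            w[i] /= sum;
        }
        return w;
    }

    /** Weighted mean of the luma samples collected along one trajectory. */
    static double filter(int[] lumaSamples, int[] qps) {
        if (lumaSamples.length != qps.length || lumaSamples.length == 0) {
            throw new IllegalArgumentException("need one QP per sample and at least one sample");
        }
        double[] w = optimalWeights(qps);
        double acc = 0.0;
        for (int i = 0; i < lumaSamples.length; i++) {
            acc += w[i] * lumaSamples[i];
        }
        return acc;
    }
}
```

For example, a sample coded with QP 22 receives four times the weight of one coded with QP 28, since six QP steps double the step size and thus quadruple the noise variance.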



    In the low-delay high-efficiency setting of HEVC with an IBBB coding structure, every block is B-predicted, with two motion vectors each pointing to one of the last four encoded pictures. It is assumed that the motion vector of a given block also describes the individual motion of each pixel in that block [4].

    Figure 1

    Fig. 1. Starting at a pixel with luminance $Y_i(x, y)$ in an arbitrary B-frame $i$, possible trajectory locations are derived through the concatenation of motion vectors pointing to previously encoded B-frames.

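    For every 4×4 block, the two motion vectors form two motion vector fields for frame $i$, whose components are denoted by $(u_{i,0}(x, y), v_{i,0}(x, y))$ for reference list 0 and $(u_{i,1}(x, y), v_{i,1}(x, y))$ for reference list 1. Starting again with pixel $(x, y)$ in frame $i$, the two possible locations of the pixel in the reference frames $r_0$ and $r_1$ are therefore given by

    $$(x_{r_0}, y_{r_0}) = \big(x + u_{i,0}(x, y),\; y + v_{i,0}(x, y)\big), \qquad (x_{r_1}, y_{r_1}) = \big(x + u_{i,1}(x, y),\; y + v_{i,1}(x, y)\big).$$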

    Figure 1 shows how the concatenation of motion vectors is used to derive possible pixel locations over a GOP of four frames. Not all of these locations describe the true motion of the pixel. It is therefore necessary to discard motion vectors that were chosen purely due to rate-distortion optimization and thus may not relate to the true motion of pixels [5]. To this end, three thresholds are used. In each of the following equations, the motion vectors are scaled according to the temporal distance that they span.
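How motion vectors can be concatenated into a trajectory is sketched below in Java. The sketch only follows list-0 vectors and assumes a hypothetical data layout (one quarter-pel vector and one reference frame index per 4×4 block); it rounds positions to full-pel and applies none of the thresholds that decide which concatenated vectors are kept.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of trajectory formation by concatenating list-0 motion vectors backwards in time.
 * The data layout is a hypothetical assumption, not the HM data structures.
 */
final class TrajectoryTracer {

    /** Motion data of one decoded frame, stored per 4x4 block. */
    static final class MotionField {
        final int[][] mvx, mvy;  // [blockRow][blockCol], quarter-pel units
        final int[][] refFrame;  // referenced frame index, or -1 if there is no vector (e.g. intra)

        MotionField(int[][] mvx, int[][] mvy, int[][] refFrame) {
            this.mvx = mvx;
            this.mvy = mvy;
            this.refFrame = refFrame;
        }
    }

    /** One trajectory point: frame index and full-pel position. */
    record Point(int frame, int x, int y) { }

    /** Follows pixel (x, y) of frame 'start' for at most maxLen points; fields.get(n) belongs to frame n. */
    static List<Point> trace(List<MotionField> fields, int start, int x, int y, int maxLen) {
        List<Point> path = new ArrayList<>();
        path.add(new Point(start, x, y));
        int frame = start;
        while (path.size() < maxLen) {
            MotionField f = fields.get(frame);
            int bx = x / 4, by = y / 4;              // 4x4 block containing the pixel
            int ref = f.refFrame[by][bx];
            if (ref < 0) break;                      // no vector: trajectory ends here
            int nx = x + Math.round(f.mvx[by][bx] / 4.0f); // quarter-pel to (rounded) full-pel
            int ny = y + Math.round(f.mvy[by][bx] / 4.0f);
            int h = f.refFrame.length * 4, w = f.refFrame[0].length * 4;
            if (nx < 0 || ny < 0 || nx >= w || ny >= h) break; // pixel left the picture
            frame = ref;
            x = nx;
            y = ny;
            path.add(new Point(frame, x, y));
        }
        return path;
    }
}
```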

    1. Absolute Error along the Trajectory

      For every pixel, the absolute difference $\Delta Y$ between two consecutive luminance samples along the trajectory is calculated together with the respective chrominance differences $\Delta U$ and $\Delta V$. A sudden change in one of these differences is assumed to indicate that a motion vector no longer describes the true motion of the pixel. The trajectory is therefore only continued if these differences do not exceed the absolute-error threshold.
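A minimal Java sketch of the absolute-error test follows. Applying one threshold, here called tAbs, to the luma and both chroma differences is an assumption made for illustration, and the class name is hypothetical.

```java
/**
 * Sketch of the absolute-error test between two consecutive trajectory samples.
 * Using one threshold for Y, U and V is an assumption.
 */
final class AbsoluteErrorTest {
    private AbsoluteErrorTest() { }

    /** previous and current are {Y, U, V} triples; returns true if the trajectory may continue. */
    static boolean continues(int[] previous, int[] current, int tAbs) {
        for (int c = 0; c < 3; c++) {
            if (Math.abs(current[c] - previous[c]) > tAbs) {
                return false; // sudden change: the vector probably no longer follows true motion
            }
        }
        return true;
    }
}
```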

    2. Temporal Motion Consistency

      In addition, consecutive motion vectors are tested for similarity. A trajectory is expected to be correct as long as its motion does not change significantly over time. When a new list-0 motion vector is examined for any given pixel of the trajectory, its Euclidean distance to the vector pointing to the current location is calculated. Let $(u_{i,0}, v_{i,0})$ denote the components of the list-0 motion vector pointing from the current frame $i$ to reference frame $r_0$, and $(u_{prev}, v_{prev})$ the vector pointing to the current location, both scaled according to the temporal distance they span. The trajectory is continued as long as

      $$\sqrt{(u_{i,0} - u_{prev})^2 + (v_{i,0} - v_{prev})^2} \le T_{TC}, \qquad 0 \le T_{TC} \le 7 \text{ (quarter-pel)}.$$

      The temporal motion consistency is also checked for the vectors of reference list 1.

      Figure 2

      Fig. 2. On the left, the original motion vectors for all 4×4 blocks surrounding the trajectory's current location are shown. Vectors that span a temporal distance greater than 1 are marked in gray. After the scaling (right), the BV metric (i.e. the number of vectors differing from the current one) for the current block is 5.
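The temporal consistency check can be sketched as follows. Both vectors are first divided by the temporal distance they span, and the trajectory continues while their Euclidean distance does not exceed T_TC (0 to 7, in quarter-pel). Whether the comparison is strict and this exact scaling rule are assumptions, and the names are hypothetical.

```java
/**
 * Sketch of the temporal motion consistency test for one reference list.
 * Vectors are in quarter-pel units; the scaling rule is an assumption.
 */
final class TemporalConsistencyTest {
    private TemporalConsistencyTest() { }

    /** Scales a vector to a displacement per unit of temporal distance. */
    static double[] perFrame(int mvX, int mvY, int temporalDistance) {
        if (temporalDistance <= 0) {
            throw new IllegalArgumentException("temporal distance must be positive");
        }
        return new double[] {mvX / (double) temporalDistance, mvY / (double) temporalDistance};
    }

    /** True if the new vector stays within tTc of the vector that led to the current position. */
    static boolean continues(int[] prevMv, int prevDistance, int[] newMv, int newDistance, int tTc) {
        if (tTc < 0 || tTc > 7) {
            throw new IllegalArgumentException("T_TC must lie in 0..7");
        }
        double[] a = perFrame(prevMv[0], prevMv[1], prevDistance);
        double[] b = perFrame(newMv[0], newMv[1], newDistance);
        return Math.hypot(b[0] - a[0], b[1] - a[1]) <= tTc;
    }
}
```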

    3. Spatial Motion Consistency

      At each frame, the motion vector of a pixel on the trajectory is compared with those of its eight neighbors at 4×4 block level. In this context, the block-vote metric BV denotes the number of neighboring motion vectors that differ significantly from the current one. Aside from the temporal comparison of motion vectors, spatial similarity is also examined, since it is a measure of the reliability of a motion vector [2]. The maximum allowed difference is 30% of the original motion vector's length, or at least 0.3 quarter-pel. Both values were chosen empirically and proved to be well suited for all sequences. The scaling of motion vectors and the block-vote metric are illustrated in Figure 2. The filtering along the trajectory is continued only if the block-vote metric of the current pixel satisfies $BV \le T_{BV}$, where $T_{BV}$ is the block-vote threshold.
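The block-vote metric can be sketched as below for vectors that have already been scaled to a common temporal distance. The tolerance is 30% of the current vector's length but at least 0.3 quarter-pel, as stated above; measuring the difference as a Euclidean distance, skipping neighbours outside the picture, and the continuation rule BV ≤ tBv are assumptions of this hypothetical sketch.

```java
/**
 * Sketch of the block-vote (BV) metric on a grid of scaled 4x4-block vectors (quarter-pel).
 */
final class BlockVote {
    private BlockVote() { }

    /** Number of the (up to) eight neighbouring blocks whose vector differs significantly from block (bx, by). */
    static int count(double[][] mvx, double[][] mvy, int bx, int by) {
        double cx = mvx[by][bx], cy = mvy[by][bx];
        // allowed difference: 30% of the current vector's length, but at least 0.3 quarter-pel
        double tol = Math.max(0.3 * Math.hypot(cx, cy), 0.3);
        int votes = 0;
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                int nx = bx + dx, ny = by + dy;
                if ((dx == 0 && dy == 0) || ny < 0 || ny >= mvx.length || nx < 0 || nx >= mvx[0].length) {
                    continue; // skip the block itself and neighbours outside the picture (assumption)
                }
                if (Math.hypot(mvx[ny][nx] - cx, mvy[ny][nx] - cy) > tol) {
                    votes++;
                }
            }
        }
        return votes;
    }

    /** Filtering along the trajectory continues only while BV does not exceed tBv. */
    static boolean continues(double[][] mvx, double[][] mvy, int bx, int by, int tBv) {
        return count(mvx, mvy, bx, by) <= tBv;
    }
}
```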

    4. Long Trajectories

      In the previous implementation, a trajectory was interrupted as soon as the luminance difference exceeded the absolute-error threshold. This biases the coder towards shorter trajectories, although in theory the quality of the filtering process increases with the number of samples used. The new design therefore does not stop the trajectory formation altogether, but simply omits the luma sample in question from the filtering process and continues the trajectory. For the threshold test, the last luminance sample used for filtering and the respective chrominance samples are now used to calculate the differences.
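The difference between the old and the new design is easy to see in code. In this hypothetical Java sketch, index 0 holds the starting pixel, which serves only as the initial reference; every later sample is compared with the last accepted sample, and a sample that fails the absolute-error test is skipped instead of ending the trajectory. Using the same threshold for luma and chroma is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of sample selection for long trajectories: outliers are skipped, not fatal.
 */
final class LongTrajectorySelector {
    private LongTrajectorySelector() { }

    /** samples[k] = {Y, U, V} at trajectory position k; returns the indices k >= 1 used for filtering. */
    static List<Integer> select(int[][] samples, int tAbs) {
        List<Integer> used = new ArrayList<>();
        if (samples.length == 0) return used;
        int[] ref = samples[0]; // last accepted sample, initially the starting pixel
        for (int k = 1; k < samples.length; k++) {
            boolean ok = true;
            for (int c = 0; c < 3; c++) {
                if (Math.abs(samples[k][c] - ref[c]) > tAbs) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                used.add(k);
                ref = samples[k]; // compare later samples with this accepted one
            }
            // otherwise: omit this sample from filtering but keep following the trajectory
        }
        return used;
    }
}
```

For luma values 100, 102, 140 and 103 along a trajectory (chroma unchanged) and a threshold of 5, the old design stops at 140, while this selection keeps the samples with luma 102 and 103.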

    5. Parameter Calculation

    All possible parameter combinations can be tested simultaneously at the encoder, and the combination yielding the minimum mean square error is selected. The thresholds are transmitted to the decoder, requiring 9 additional bits per frame. A tenth bit can be used to disable the filter for the current frame altogether, in which case the thresholds are simply omitted.
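The encoder-side search and its signalling can be sketched as follows. The evaluator, which would run the filter for one threshold combination and return the MSE of the result, is left abstract; coding each of the three threshold indices with 3 bits (to match the stated 9 bits), disabling the filter when no combination beats the unfiltered frame, and the bit layout are assumptions, and all names are hypothetical.

```java
/**
 * Sketch of the exhaustive threshold search and a hypothetical 10-bit per-frame signalling.
 */
final class TtfParameterSearch {
    private TtfParameterSearch() { }

    /** Returns the MSE of the frame filtered with the given threshold indices (0..7 each). */
    @FunctionalInterface
    interface MseEvaluator {
        double mse(int absIdx, int tcIdx, int bvIdx);
    }

    /** Tests all 8 x 8 x 8 combinations; returns null if none beats the unfiltered MSE. */
    static int[] search(MseEvaluator eval, double unfilteredMse) {
        int[] best = null;
        double bestMse = unfilteredMse;
        for (int a = 0; a < 8; a++) {
            for (int t = 0; t < 8; t++) {
                for (int b = 0; b < 8; b++) {
                    double m = eval.mse(a, t, b);
                    if (m < bestMse) {
                        bestMse = m;
                        best = new int[] {a, t, b};
                    }
                }
            }
        }
        return best;
    }

    /** Returns {value, bitCount}: bit 0 = filter on, bits 1..9 = three 3-bit threshold indices. */
    static int[] pack(int[] params) {
        if (params == null) return new int[] {0, 1}; // filter off: only the flag is sent
        for (int p : params) {
            if (p < 0 || p > 7) throw new IllegalArgumentException("index must fit in 3 bits");
        }
        int v = 1 | (params[0] << 1) | (params[1] << 4) | (params[2] << 7);
        return new int[] {v, 10};
    }
}
```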

    Figure 3

    As shown in Fig. 3, the TTF is included in the local decoder loop of the encoder after the deblocking filter. For the first test the ALF was disabled; when ALF and TTF are used together, the respective frame in the TTF's buffer is updated after the ALF has been applied.


    The evaluation of the TTF method was carried out in [2]. In the practical experiment, implemented using Java, the TTF is integrated into the HEVC test model HM 3.0. Up to 32 previously decoded, unfiltered frames are kept in a buffer to be used for trajectory formation. Figure 3 above shows the practical design of this approach; in this setting the ALF (dotted connections) was disabled. Tests have been conducted for a variety of sequences listed in Table I. The exact configuration of the low-delay high-efficiency setting may be found in [3]. The results reported in [2] show that the TTF outperforms the existing method. The simplified filter achieves an average BD-rate reduction of only 0.9%, which provides evidence for the effectiveness of the weighted filtering.


    In this research study we investigated an approach for improved video compression; in the literature we also studied another approach that yields further improvement in video compression, namely depth video coding. In future work, the depth coding technique could therefore be used together with the TTF to further improve video compression. The reviewed work on the TTF shows that this filter produces an average BD-rate reduction of 1.4% when included in the HEVC test model. Additional improvements may be achieved by investigating long trajectories and weighted averaging separately. Our further work will therefore focus on possible interactions between in-loop filtering approaches, reduction of the TTF encoder complexity, and the use of depth map coding techniques.


  1. V.-A. Nguyen, D. Min, and M. N. Do, Efficient Techniques for Depth Video Compression Using Weighted Mode Filtering, 2011.

  2. M. Esche, A. Glantz, A. Krutz, M. Tok, and T. Sikora, Weighted Temporal Long Trajectory Filtering for Video Compression, 2012.

  3. T. Chujoh, N. Wada, T. Watanabe, G. Yasuda, and T. Yamakage, Specification and experimental results of quadtree-based adaptive loop filter, ITU-T SG16/Q.6 VCEG document VCEG-AK22, Apr. 2009.

  4. A. Golwelkar and J. Woods, Motion-compensated temporal filtering and motion vector coding using biorthogonal filters, IEEE TCSVT, vol. 17, no. 4, pp. 417-428, Apr. 2007.

  5. A. Glantz, A. Krutz, M. Haller, and T. Sikora, Video coding using global motion temporal filtering, Proceedings of the 16th International Conference on Image Processing (ICIP), pp. 1053-1056, Nov. 2009.

  6. M. Esche, A. Krutz, A. Glantz, and T. Sikora, A novel in-loop filter for video-compression based on temporal pixel trajectories, Proceedings of the 26th PCS, pp. 514-517, Dec. 2010.

  7. T. Wiegand and B. Girod, Lagrange multiplier selection in hybrid video coder control, Proceedings of the International Conference on Image Processing (ICIP), vol. 3, pp. 542-545, 2001.

  8. G. Bjøntegaard, Calculation of average PSNR differences between RD-curves, ITU-T SG16/Q.6 VCEG document VCEG-M33, Mar. 2001.

  9. MPEG document N9760, Text of ISO/IEC 14496-10:2008/FDAM Multiview Video Coding, Busan, Korea, Oct. 2008.

  10. A. Smolic, K. Mueller, N. Stefanoski, J. Ostermann, A. Gotchev, G. B. Akar, G. Triantafyllidis, and A. Koz, Coding Algorithms for 3DTV - A Survey, IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 11, pp. 1606-1621, 2007.

  11. MPEG document N9992, Results of 3D Video Expert Viewing, Hannover, Germany, Jul. 2008.

  12. MPEG document w11061, Applications and requirements on 3D video coding, Xian, China, Oct. 2009.
