An Efficient VLSI Architecture for High Speed 1-D Lift DWT 5/3 Filter for Image Compression

DOI : 10.17577/IJERTCONV3IS19128


Praveen M Pattar
Student, M.Tech 4th Semester,
Department of Instrumentation Technology,
Dayananda Sagar College of Engineering, Bangalore, India

Krishnamohan
Assistant Professor,
Department of Instrumentation Technology,
Dayananda Sagar College of Engineering, Bangalore, India

Abstract: The wavelet transform is a leading technique in the field of image compression. Wavelet-based coding provides substantial improvements in picture quality at higher compression ratios. Many types of wavelets exist, but most of them are lossy in practice. The CDF 5/3 wavelet, the reversible filter of JPEG 2000, supports lossless reconstruction, so it is used where signal accuracy matters. Here we propose an efficient hardware structure for a lifting-based 5/3 DWT architecture. The proposed 1-D and 2-D architectures use fewer hardware resources than existing techniques.

The coding efficiency and the quality of image restoration with the DWT are higher than those with the traditional discrete cosine transform, and it is easy to obtain a good compression ratio. As a result, the DWT is widely used in signal processing and image compression, for example in JPEG 2000. Traditional DWT architectures are based on convolution. Next-generation DWTs, which are based on lifting algorithms, have since been proposed. Compared with convolution-based architectures, lifting-based architectures not only have lower computational complexity but also require less memory. However, directly mapping these algorithms to hardware leads to a relatively long data path and low efficiency.

Index Terms: Discrete wavelet transform, CDF 5/3 wavelet, discrete cosine transform.


    The discrete wavelet transform (DWT) has become one of the most widely used techniques for signal analysis and image processing. The DWT performs a multiresolution signal analysis with adjustable locality in both the time and frequency domains. Owing to its good time-frequency characteristics, one of the most significant uses of the DWT has been image compression, as in JPEG 2000. Available DWT architectures fall broadly into two schemes: the convolution scheme and the lifting scheme. The convolution scheme is the conventional way to implement DWT filters, but it uses a large number of multipliers, so it is difficult to implement and consumes a large amount of hardware resources. The lifting scheme is used to eliminate these problems. It rearranges the basic convolution equations in such a way that the number of multipliers is drastically reduced. For this reason the lifting scheme is more widely used than the convolution scheme for chip implementation.

    The advantages of the DWT are obvious in many applications. However, its computational complexity and memory requirements are its main drawbacks. These drawbacks affect speed, power consumption, and hardware resources. Accordingly, introducing efficient, high speed DWT architectures is still an important challenge, and various architectures for different wavelet filters have been introduced to alleviate all or part of these drawbacks.
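To make the lifting idea concrete, the following is a minimal Python sketch of the reversible integer 5/3 lifting steps as they are commonly stated for JPEG 2000 (one predict step and one update step, using only additions and shifts). It is an illustration under stated assumptions, not the hardware architecture proposed in this paper: the function names `lift53_forward` and `lift53_inverse`, the even-length restriction, and the symmetric edge handling are choices made for this example.

```python
def lift53_forward(x):
    """Forward integer 5/3 lifting on an even-length list of ints.

    Returns (s, d): low pass (approximation) and high pass (detail) halves.
    Edge samples are mirrored (an assumption of this sketch).
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even number of samples"
    half = n // 2
    even, odd = x[0::2], x[1::2]
    # Predict: detail = odd sample minus floor of the mean of its even neighbours.
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
         for i in range(half)]
    # Update: approximation = even sample plus floor((d[i-1] + d[i] + 2) / 4).
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(half)]
    return s, d


def lift53_inverse(s, d):
    """Undo the lifting steps in reverse order to recover the input exactly."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x


if __name__ == "__main__":
    pixels = [12, 200, 37, 5, 90, 91, 255, 0]
    s, d = lift53_forward(pixels)
    # The round trip is exact because every step is an integer operation
    # that the inverse subtracts or adds back unchanged.
    print(lift53_inverse(s, d) == pixels)
```

Because each lifting step only adds an integer function of the other half of the samples, the inverse can remove it exactly, which is why this filter is suited to lossless coding.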

    Huang et al. [2] suggested an algorithm that shortens the critical path by eliminating the multipliers on the path from the input node to the computation node. This is achieved by flipping each computing unit with the inverse of the multiplier coefficient and connecting the units in a pipelined manner. The critical path is thereby reduced to one multiplier delay with the help of five-stage pipelining. Since the predictor/updater dominates the critical path delay, Wu et al. [7] and Lai et al. [8] merged the predictor and updater stages into a single lifting step, thereby reducing the critical path delay to one multiplier. To achieve a critical path of only one multiplier, at least four pipeline stages are required. Recently, Zhang et al. proposed a three-stage pipeline with fewer registers for storing intermediate results, achieving a critical path of one multiplier. In that proposal, the flipping scheme is modified by computing independent data concurrently. The critical path delay is reduced to one adder by replacing multipliers with shift-and-add logic.

    In this paper, a low complexity multidimensional CDF 5/3 DWT filter and its FPGA implementation are proposed. The main characteristics of the proposed architecture are:

    • Low hardware cost

    • High speed

    • Low output latency

    • Easy implementation due to being composed of similar units


    1. Proposed 1-D DWT

      • Low pass FIR Filter using only shift and add operation.

        The impulse response of the 5-tap FIR filter is taken as $h_{LPF}(n) = [-0.125, 0.25, 0.75, 0.25, -0.125]$.

        The output of the low pass FIR filter is given as

        $$Y_{LPF}(n) = \sum_{k=0}^{4} h_{LPF}(k)\, x(n-k) \tag{1}$$

        $$Y_{LPF}(n) = -\tfrac{1}{8}x[n] + \tfrac{1}{4}x[n-1] + \tfrac{3}{4}x[n-2] + \tfrac{1}{4}x[n-3] - \tfrac{1}{8}x[n-4] \tag{2}$$

        $$Y_{LPF}(n) = (\gg 3)\{-x[n] + (\ll 1)x[n-1] + ((\ll 2) + (\ll 1))x[n-2] + (\ll 1)x[n-3] - x[n-4]\} \tag{3}$$

        where $(\ll)$ and $(\gg)$ indicate left shift and right shift, respectively.

      • High Pass FIR Filter using only shift and add operation.

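        The impulse response of the 3-tap FIR filter is taken as $h_{HPF}(k) = [-0.25, 0.5, -0.25]$, where the tap index $k$ runs from 1 to 3.

        The output of the high pass FIR filter is given as

        $$Y_{HPF}(n) = \sum_{k=1}^{3} h_{HPF}(k)\, x(n-k) \tag{4}$$

        $$Y_{HPF}(n) = -\tfrac{1}{4}x[n-1] + \tfrac{1}{2}x[n-2] - \tfrac{1}{4}x[n-3]$$

        $$Y_{HPF}(n) = (\gg 2)\{-x[n-1] + (\ll 1)x[n-2] - x[n-3]\} \tag{5}$$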

        Fig.1. Proposed 1-D DWT

        Fig. 1 shows the proposed 1-D DWT. It is designed using only shifters and adders, which replace the multipliers present in existing architectures. The proposed model consists of six D flip-flops (D F/F), whose function is to hold the samples.

        The clock (clk) is applied to each flip-flop along with the input data x(n), where x(n) denotes the image pixels, which arrive serially.

        The low pass unit is designed from equation (3), the shift-and-add form of the low pass FIR filter, and produces the output LPF OUT.

        Likewise, the high pass unit is designed from equation (5), the shift-and-add form of the high pass FIR filter, and produces the output HPF OUT.

        A further block, clk_divider, divides the input clock frequency f down to f/2. The divided clock is applied to the D flip-flops, and clk_out is obtained from clk_divider.

        Since the architecture is multiplierless, the delay is reduced and hence the speed of the 1-D DWT is increased.
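As a software check of equations (3) and (5), the sketch below models the two shift-and-add units on integer samples and compares them with the fractional filter coefficients. It is a behavioural illustration only, not the VHDL design: the names `lpf_shift_add`, `hpf_shift_add`, and `exact` are introduced here, and the assumption that the final right shift is an arithmetic shift (which rounds toward minus infinity) is ours.

```python
from fractions import Fraction
from math import floor

# Filter taps as stated in the text.
H_LPF = [Fraction(-1, 8), Fraction(1, 4), Fraction(3, 4), Fraction(1, 4), Fraction(-1, 8)]
H_HPF = [Fraction(-1, 4), Fraction(1, 2), Fraction(-1, 4)]


def lpf_shift_add(window):
    """Equation (3). window = [x[n], x[n-1], x[n-2], x[n-3], x[n-4]]."""
    x0, x1, x2, x3, x4 = window
    # 3/4 = (4 + 2) / 8, so x[n-2] is added shifted by 2 and by 1.
    acc = -x0 + (x1 << 1) + ((x2 << 2) + (x2 << 1)) + (x3 << 1) - x4
    return acc >> 3  # divide by 8 (arithmetic shift, assumed)


def hpf_shift_add(window):
    """Equation (5). window = [x[n-1], x[n-2], x[n-3]]."""
    x1, x2, x3 = window
    acc = -x1 + (x2 << 1) - x3
    return acc >> 2  # divide by 4 (arithmetic shift, assumed)


def exact(taps, window):
    """Reference result using exact rational arithmetic."""
    return sum(h * x for h, x in zip(taps, window))


if __name__ == "__main__":
    lp_window = [7, 3, 9, 1, 4]
    hp_window = [5, 1, 8]
    # The shifted results equal the exact results rounded down.
    print(lpf_shift_add(lp_window), floor(exact(H_LPF, lp_window)))
    print(hpf_shift_add(hp_window), floor(exact(H_HPF, hp_window)))
```

With integer pixels the shift-and-add result matches the multiplier-based result exactly whenever the accumulated sum is divisible by 8 (low pass) or 4 (high pass); otherwise the shift discards the fractional part.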

    2. Proposed controller structure

      Fig.2. Proposed controller unit

      Fig. 2 shows the proposed controller unit, which has three counters. The first counter (counter 1) has two inputs, clk div and rst. This counter controls how the pixel addresses of an image are generated. The addresses run from 0 to 32767 and are displayed in matrix form. Once values have been written up to address 32767, the other two counters (counter 2 and counter 3) start and the read operation begins. In this phase the required transformation, row or column, is performed. The controller unit operates at a frequency of 308 MHz.
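The address sequencing described for the controller can be sketched in Python as follows. This is a hypothetical model, not the paper's design: the text does not state how counter 2 and counter 3 are combined, so the sketch assumes they act as row and column counters over a row-major memory, and the image dimensions passed in are placeholders.

```python
def write_addresses(depth):
    """Counter 1: sequential write addresses 0 .. depth - 1."""
    return list(range(depth))


def read_addresses(rows, cols, column_wise=False):
    """Counters 2 and 3, assumed here to be row and column counters.

    Row-wise reads step the column counter fastest; column-wise reads
    step the row counter fastest. Memory is assumed to be row-major.
    """
    if column_wise:
        return [r * cols + c for c in range(cols) for r in range(rows)]
    return [r * cols + c for r in range(rows) for c in range(cols)]


if __name__ == "__main__":
    # Write phase for the 32768-entry address space mentioned in the text.
    wr = write_addresses(32768)
    print(wr[0], wr[-1])
    # Read phase on a tiny 2 x 3 example so the order is easy to inspect.
    print(read_addresses(2, 3))
    print(read_addresses(2, 3, column_wise=True))
```

Switching between the two read orders is what lets the same 1-D filter serve both the row transformation and the column transformation.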

    3. Proposed memory unit

      Fig.3. Proposed memory unit

      The memory unit reads the data from the 1-D DWT. That data consists of pixel values separated into the low pass filter output and the high pass filter output. The controller determines how memory addresses are assigned to the pixels in the memory blocks.

      There are two memory blocks. The output of memory block 1 is DATA_OUT 1, which carries the low pass filter output and contains the actual image content. The output of memory block 2 is DATA_OUT 2, which carries the high pass filter output and contains the edges.
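A small behavioural model of the two memory blocks is sketched below. It assumes, following the simulation description, that a low Rd wr signal selects a write and a high Rd wr signal selects a read; the class name `MemoryBlock`, the method `access`, and the sample values are hypothetical and chosen only for this example.

```python
class MemoryBlock:
    """Single-port memory: rd_wr = 0 writes data_in, rd_wr = 1 reads."""

    def __init__(self, depth):
        self.cells = [0] * depth

    def access(self, rd_wr, addr, data_in=0):
        if rd_wr:
            return self.cells[addr]   # read: return the stored value
        self.cells[addr] = data_in    # write: store the value
        return None


if __name__ == "__main__":
    lp_out = [30, 31, 29, 28]   # illustrative low pass samples
    hp_out = [0, -3, 2, 1]      # illustrative high pass samples

    block1 = MemoryBlock(len(lp_out))   # drives DATA_OUT 1
    block2 = MemoryBlock(len(hp_out))   # drives DATA_OUT 2

    # Write phase: the controller supplies sequential addresses.
    for addr, (lp, hp) in enumerate(zip(lp_out, hp_out)):
        block1.access(0, addr, lp)
        block2.access(0, addr, hp)

    # Read phase: the same addresses return the stored sub-band samples.
    data_out_1 = [block1.access(1, a) for a in range(len(lp_out))]
    data_out_2 = [block2.access(1, a) for a in range(len(hp_out))]
    print(data_out_1, data_out_2)
```

Keeping the two sub-bands in separate blocks means each can be read back independently when the data is passed on to a later stage.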


    The 1-D DWT is designed and simulated using the Xilinx tools.

    Fig 4. Simulation result of 1D DWT

    Fig. 4 shows the 1-D DWT simulation. Initially, when the rst pin goes high, some garbage values are present; after that, the lp out and hp out data start appearing. The input clock f is divided down to f/2.

    Fig.5. Output of 1D DWT.

    The L band contains the actual image content and the H band contains the edges.

    The simulation result of the controller is shown below.

    Fig.6. Simulation result of controller unit

    In Fig. 6, when Rd wr is high the read operation is performed, and when Rd wr is low the write address operation is performed.

    The simulation result of the memory unit is shown below.

    Fig.7. Simulation result of memory unit

    As Fig. 7 shows, the data is read only when Rd wr is high. The output of this block can then be passed on to the 2-D DWT.


In this paper, an efficient hardware structure for a high speed 1-D lifting DWT architecture is proposed. The proposed architecture is verified with code written in VHDL. Compared with previous architectures in terms of hardware complexity, critical path delay, computation time, and throughput, the proposed architecture can achieve high speed with lower hardware complexity and smaller storage size.


  1. S. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, 1989.

  2. C.-T. Huang, P.-C. Tseng, and L.-G. Chen, "Flipping structure: An efficient VLSI architecture for lifting-based discrete wavelet transform," IEEE Trans. Signal Process., vol. 52, no. 4, pp. 1080-1089, Apr. 2004.

  3. G. Xing, J. Li, and Y. Q. Zhang, "Arbitrarily shaped video-object coding by wavelet," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 10, pp. 1135-1139, Oct. 2001.

  4. P. Wu and L. Chen, "An efficient architecture for two-dimensional discrete wavelet transform," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 4, pp. 536-545, Apr. 2001.

  5. J. M. Jou, Y. H. Shiau, and C. C. Liu, "Efficient VLSI architectures for the biorthogonal wavelet transform by filter bank and lifting scheme," in Proc. IEEE ISCAS, May 2001, vol. 2, pp. 529-532.

  6. G. Shi, W. Liu, and L. Zhang, "An efficient folded architecture for lifting based discrete wavelet transform," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 4, pp. 290-294, Apr. 2009.

  7. B. F. Wu and C. F. Lin, "A high-performance and memory-efficient pipeline architecture for the 5/3 and 9/7 discrete wavelet transform of JPEG2000 codec," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 12, pp. 1615-1628, Dec. 2005.

  8. Y. K. Lai, L. F. Chen, and Y. C. Shih, "A high-performance and memory-efficient VLSI architecture with parallel scanning method for 2-D lifting-based discrete wavelet transform," IEEE Trans. Consum. Electron., vol. 55, no. 2, pp. 400-407, May 2009.

  9. Sowmya KB, SavitaSonali, and M Nagabushanam, "Optimized DA Based DWT-IDWT for Image Compression," International Journal of Electrical and Electronics Engineering, vol. 1, no. 1, 2013.

  10. C. Xiong, J. Tian, and J. Liu, "Efficient architectures for two-dimensional discrete wavelet transform using lifting scheme," IEEE Trans. Image Process., vol. 16, no. 3, pp. 607-614, Mar. 2007.
