 Open Access
 Authors : Venkata Vinetha Kasturi, Y. Syamala
 Paper ID : IJERTV2IS50360
 Volume & Issue : Volume 02, Issue 05 (May 2013)
 Published (First Online): 18-05-2013
 ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
VLSI Architecture For DCT Based On Distributed Arithmetic
Venkata Vinetha Kasturi 1, Y. Syamala 2
Dept. of Electronics & Communications Engineering, Gudlavalleru Engineering College, Andhra Pradesh, India 769008
Abstract
The discrete cosine transform (DCT) is widely used in multimedia communications such as image and video, which require a high volume of data transmission. An 8×8 2D DCT is used in image and video compression standards. The DCT is a computation-intensive operation, and a direct implementation requires a large number of adders and multipliers. Multipliers consume considerable power, so distributed arithmetic (DA) is used to perform the multiplications without a hardware multiplier. This paper proposes a DA-based VLSI architecture for the 1D DCT that targets low hardware cost and low power consumption. The proposed 1D DCT architecture is implemented in the Xilinx ISE simulator.
A 2D DCT can be built from the proposed 1D DCT architecture using the row-column decomposition technique. Compared with the existing architecture, the proposed architecture reduces delay and power by up to 50%. The work can be extended by using a faster adder or multiplier to further improve area, speed and power.
Keywords: Discrete cosine transform (DCT), Distributed arithmetic (DA).

Introduction
A discrete cosine transform (DCT) expresses a sequence of finitely many data points in terms of a sum of cosine functions oscillating at different frequencies. DCTs are important to numerous applications in science and engineering, from lossy compression of audio and images (where small high-frequency components can be discarded) to spectral methods for the numerical solution of partial differential equations. The use of cosine rather than sine functions is critical in these applications. For compression, cosine functions are more efficient because fewer of them are needed to approximate a typical signal, whereas for differential equations the cosines express a particular choice of boundary conditions.
Like any Fourier-related transform, the DCT expresses a function or a signal in terms of a sum of sinusoids with different frequencies and amplitudes. Like the discrete Fourier transform (DFT), it operates on a function at a finite number of discrete data points. The DCT uses only cosine functions, whereas the DFT uses both cosines and sines (in the form of complex exponentials). However, this visible difference is merely a consequence of a deeper distinction: a DCT implies different boundary conditions than the DFT or other related transforms.
Frequency analysis of discrete-time signals is conveniently performed with the DCT. The DCT is the most popular transform technique for image compression and has been adopted in various standardized coding schemes. Some applications require real-time manipulation of digital images. Because of this, fast algorithms and dedicated circuits for the DCT have been developed. Among the methods for the two-dimensional DCT, the indirect method based on row-column decomposition is the best suited for hardware implementation.
The energy compaction property of the DCT is well suited to image compression because, in most images, the energy is concentrated in the low to middle frequencies, and the human eye is more sensitive to the middle frequencies.
A large majority of useful image content changes relatively slowly across an image, i.e., it is unusual for intensity values to go up and down several times in a small area, for example within an 8 x 8 image block. Translated into the spatial frequency domain, this means that the lower spatial frequency components generally contain more information than the high-frequency components, which often correspond to less useful details and noise.
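The energy compaction described above can be illustrated with a small numerical sketch. It assumes the common orthonormal 8-point DCT-II scaling and a made-up, slowly varying row of pixel values; the function name `dct_1d` and the sample data are illustrative only and are not taken from the paper.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence (for 8 samples the scale factor is 1/2)."""
    n = len(x)
    out = []
    for u in range(n):
        c = 1 / math.sqrt(2) if u == 0 else 1.0
        s = sum(x[i] * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                for i in range(n))
        out.append(math.sqrt(2 / n) * c * s)
    return out

# Hypothetical slowly varying intensity row (a linear ramp).
ramp = [10, 20, 30, 40, 50, 60, 70, 80]
coeffs = dct_1d(ramp)

# With orthonormal scaling the transform preserves total energy,
# so the share held by each coefficient can be read off directly.
total_energy = sum(c * c for c in coeffs)
low_energy = coeffs[0] ** 2 + coeffs[1] ** 2
low_fraction = low_energy / total_energy

print("energy in the two lowest-frequency coefficients:", round(low_fraction, 4))
```

For this ramp, more than 99% of the signal energy lands in the two lowest-frequency coefficients, which is why the remaining coefficients can be quantized coarsely or discarded.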
The discrete cosine transform converts data into a form that can be easily compressed. The characteristics of the DCT make it ideally suited to image compression algorithms, which minimize the amount of data needed to recreate a digitized image. Reducing a digitized image to the least possible amount of data has several advantages: less memory is required to store images, less time may be needed to analyze them, and channel bandwidth is used more efficiently when transmitting them.
Performing the DCT on a digitized image creates a data array that can be compressed by data compaction algorithms. Then, data can be stored or transmitted in its compacted form. The image quality depends on the amount of quantization used in the compaction algorithm. To reproduce the original image, the data is retrieved from memory, uncompacted, and an inverse DCT is performed.
Some of today's most popular image data compression applications include teleconferencing using motion-compensated video codecs; ISDN multimedia communications including voice, video, text, and images; video channel transmission using commercial geosynchronous telecommunications satellites; and digital facsimile transmission using dedicated equipment and personal computers.
Several image data compression algorithms use the DCT to remove spatial data redundancies in two-dimensional (2D) data. Images are subdivided into smaller, two-dimensional blocks. These blocks are then processed independently of the neighboring blocks. In general, the two-dimensional discrete cosine transform (2D DCT) transforms an (n x n) data array into an (n x n) result array. First the DCT transforms the columns, then it transforms the rows.
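The column-then-row procedure can be sketched in Python as follows. This is a floating-point software model under the usual 8×8 DCT-II scaling, not the hardware design; the helper names (`dct_1d`, `dct_2d_row_column`, `dct_2d_direct`) and the test block are assumptions made for illustration. The direct double-sum version is included only to check that the two passes give the same result.

```python
import math

N = 8

def c(k):
    """Normalization factor: 1/sqrt(2) for the DC term, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_1d(x):
    """8-point 1D DCT: F(u) = 1/2 * C(u) * sum_i x(i) cos((2i+1) u pi / 16)."""
    return [0.5 * c(u) * sum(x[i] * math.cos((2 * i + 1) * u * math.pi / 16)
                             for i in range(N))
            for u in range(N)]

def dct_2d_row_column(block):
    """2D DCT as two passes of the 1D DCT: columns first, then rows."""
    # Pass 1: transform every column j; cols[j][u] holds the result.
    cols = [dct_1d([block[i][j] for i in range(N)]) for j in range(N)]
    # Transpose so that intermediate[u] is a row indexed by j.
    intermediate = [[cols[j][u] for j in range(N)] for u in range(N)]
    # Pass 2: transform every row of the intermediate array.
    return [dct_1d(row) for row in intermediate]

def dct_2d_direct(block):
    """Reference: the 8x8 2D DCT evaluated as a literal double sum."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for i in range(N):
                for j in range(N):
                    s += (block[i][j]
                          * math.cos((2 * i + 1) * u * math.pi / 16)
                          * math.cos((2 * j + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

# Hypothetical test block with arbitrary integer samples.
block = [[(3 * i + 5 * j + i * j) % 17 for j in range(N)] for i in range(N)]
fast = dct_2d_row_column(block)
ref = dct_2d_direct(block)
max_diff = max(abs(fast[u][v] - ref[u][v]) for u in range(N) for v in range(N))
print("largest difference between the two methods:", max_diff)
```

The row-column version needs 16 one-dimensional transforms per block, so a single 1D DCT unit can be reused for both passes.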

Existing Method
(DCT in pipelined fashion)
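For 2D data $X(i, j)$, $0 \le i \le 7$ and $0 \le j \le 7$, the 8×8 2D DCT is given by
$$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} X(i,j) \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16} \qquad (1)$$
where $0 \le u \le 7$ and $0 \le v \le 7$, with $C(u), C(v) = 1/\sqrt{2}$ for $u, v = 0$ and $C(u), C(v) = 1$ otherwise.
The computation is reduced by decomposing the 2D DCT into two 8×1 1D DCTs, each given by
$$F(u) = \frac{1}{2}\, C(u) \sum_{i=0}^{7} X(i) \cos\frac{(2i+1)u\pi}{16} \qquad (2)$$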
Considering the periodicity and symmetry of the trigonometric functions, the following 8 equations can be inferred from equation (2). Expanding the sum in equation (2) over i = 0 to 7 for each output, the 1D DCT simplifies to
$$
\begin{aligned}
F(0) &= [X(0)+X(1)+X(2)+X(3)+X(4)+X(5)+X(6)+X(7)]\,P \\
F(1) &= [X(0)-X(7)]\,A + [X(1)-X(6)]\,B + [X(2)-X(5)]\,C + [X(3)-X(4)]\,D \\
F(2) &= [X(0)-X(3)-X(4)+X(7)]\,M + [X(1)-X(2)-X(5)+X(6)]\,N \\
F(3) &= [X(0)-X(7)]\,B + [X(1)-X(6)](-D) + [X(2)-X(5)](-A) + [X(3)-X(4)](-C) \\
F(4) &= [X(0)-X(1)-X(2)+X(3)+X(4)-X(5)-X(6)+X(7)]\,P \\
F(5) &= [X(0)-X(7)]\,C + [X(1)-X(6)](-A) + [X(2)-X(5)]\,D + [X(3)-X(4)]\,B \\
F(6) &= [X(0)-X(3)-X(4)+X(7)]\,N + [X(1)-X(2)-X(5)+X(6)](-M) \\
F(7) &= [X(0)-X(7)]\,D + [X(1)-X(6)](-C) + [X(2)-X(5)]\,B + [X(3)-X(4)](-A)
\end{aligned}
$$
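where
$$M = \tfrac{1}{2}\cos\frac{\pi}{8},\quad N = \tfrac{1}{2}\cos\frac{3\pi}{8},\quad P = \tfrac{1}{2}\cos\frac{\pi}{4},\quad A = \tfrac{1}{2}\cos\frac{\pi}{16},\quad B = \tfrac{1}{2}\cos\frac{3\pi}{16},\quad C = \tfrac{1}{2}\cos\frac{5\pi}{16},\quad D = \tfrac{1}{2}\cos\frac{7\pi}{16}.$$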
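Let the intermediate sums and differences be
$$
\begin{aligned}
a_1 &= X(0)+X(1)+X(2)+X(3)+X(4)+X(5)+X(6)+X(7), \\
a_2 &= X(0)-X(1)-X(2)+X(3)+X(4)-X(5)-X(6)+X(7), \\
b_1 &= X(0)-X(7), \\
b_2 &= X(1)-X(6), \\
b_3 &= X(2)-X(5), \\
b_4 &= X(3)-X(4), \\
c_1 &= X(0)-X(3)-X(4)+X(7), \\
c_2 &= X(1)-X(2)-X(5)+X(6).
\end{aligned}
$$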
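Substituting these terms into the 1D DCT equations gives
$$
\begin{aligned}
F(0) &= a_1 P, & F(4) &= a_2 P, \\
F(1) &= b_1 A + b_2 B + b_3 C + b_4 D, & F(3) &= b_1 B - b_2 D - b_3 A - b_4 C, \\
F(5) &= b_1 C - b_2 A + b_3 D + b_4 B, & F(7) &= b_1 D - b_2 C + b_3 B - b_4 A, \\
F(2) &= c_1 M + c_2 N, & F(6) &= c_1 N - c_2 M.
\end{aligned}
$$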
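The grouped form of the eight outputs can be checked numerically against equation (2). The sketch below is a floating-point model written for this purpose; the function names and the sample input are hypothetical. The signs of the terms follow from the symmetry of the cosine, for example cos(9π/16) = -cos(7π/16).

```python
import math

# Scaled cosine constants as defined in the text.
A, B, C, D = (0.5 * math.cos(k * math.pi / 16) for k in (1, 3, 5, 7))
M = 0.5 * math.cos(math.pi / 8)
N = 0.5 * math.cos(3 * math.pi / 8)
P = 0.5 * math.cos(math.pi / 4)

def dct8_direct(x):
    """Equation (2) evaluated literally: 8 products per output, 64 in total."""
    out = []
    for u in range(8):
        cu = 1 / math.sqrt(2) if u == 0 else 1.0
        out.append(0.5 * cu * sum(x[i] * math.cos((2 * i + 1) * u * math.pi / 16)
                                  for i in range(8)))
    return out

def dct8_grouped(x):
    """Same transform using the shared sums and differences a, b and c."""
    a1 = sum(x)
    a2 = x[0] - x[1] - x[2] + x[3] + x[4] - x[5] - x[6] + x[7]
    b1, b2, b3, b4 = x[0] - x[7], x[1] - x[6], x[2] - x[5], x[3] - x[4]
    c1 = x[0] - x[3] - x[4] + x[7]
    c2 = x[1] - x[2] - x[5] + x[6]
    return [
        a1 * P,                               # F(0)
        b1 * A + b2 * B + b3 * C + b4 * D,    # F(1)
        c1 * M + c2 * N,                      # F(2)
        b1 * B - b2 * D - b3 * A - b4 * C,    # F(3)
        a2 * P,                               # F(4)
        b1 * C - b2 * A + b3 * D + b4 * B,    # F(5)
        c1 * N - c2 * M,                      # F(6)
        b1 * D - b2 * C + b3 * B - b4 * A,    # F(7)
    ]

# Hypothetical input samples.
sample = [12, -7, 33, 0, 5, 91, -40, 8]
max_err = max(abs(g - d) for g, d in zip(dct8_grouped(sample), dct8_direct(sample)))
print("largest deviation from equation (2):", max_err)
```

Counting the products in `dct8_grouped` gives 22 multiplications per 8-point transform, against 64 for the literal evaluation, at the cost of the additions that form the a, b and c terms.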
Figure 1. VLSI architecture for computing the 8-point DCT in a pipelined manner: computation of F(0) and F(4)
Figure 2. VLSI architecture for computing the 8-point DCT in a pipelined manner: computation of F(1), F(3), F(5)
Figure 3. VLSI architecture for computing the 8-point DCT in a pipelined manner: computation of F(2) and F(6)
In Figures 1, 2 and 3, the 1D DCT architecture is implemented in a pipelined fashion. It requires a large number of multipliers and adders, so it consumes more power and area and has a longer delay.

Proposed Method
(DCT using Distributed Arithmetic)
Distributed arithmetic is a bit-level rearrangement of a multiply-accumulate operation that hides the multiplications. It is a powerful technique for reducing the size of a parallel hardware multiply-accumulate unit and is well suited to FPGA designs. It can also be extended to other sum-of-products functions such as complex multiplication and Fourier transforms. Distributed arithmetic (DA) is an effective method for computing inner products. It uses look-up tables (LUTs) and accumulators instead of multipliers.
DA finds application in Very Large Scale Integration (VLSI) implementations of Digital Signal Processing (DSP) algorithms. Most of these applications, for example Discrete Cosine Transform (DCT) calculation, are arithmetic intensive, with multiply-accumulate (MAC) being the predominant operation.
The advantage of the DA approach is that it alters the basic assumption of using multipliers and adders for computing the DCT.
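The LUT-and-accumulator idea can be sketched in software. The model below computes an inner product with fixed integer coefficients by processing the inputs one bit position at a time. It assumes two's-complement inputs of a known word length; the names `build_lut` and `da_inner_product` and the example values are illustrative and do not describe the paper's circuit.

```python
def build_lut(coeffs):
    """LUT[addr] = sum of the coefficients whose bit is set in addr.

    For K coefficients the table has 2**K entries, all precomputed,
    so no multiplication is needed at run time.
    """
    lut = []
    for addr in range(1 << len(coeffs)):
        lut.append(sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1))
    return lut

def da_inner_product(lut, xs, width):
    """Compute sum(coeffs[k] * xs[k]) using only look-ups, shifts and adds.

    xs are signed integers that fit in `width`-bit two's complement.
    """
    mask = (1 << width) - 1
    words = [x & mask for x in xs]          # two's-complement bit patterns
    acc = 0
    for n in range(width):
        # Gather bit n of every input into one LUT address.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> n) & 1) << k
        partial = lut[addr] << n            # weight of bit position n
        if n == width - 1:
            acc -= partial                  # the sign bit has negative weight
        else:
            acc += partial
    return acc

# Hypothetical coefficients and inputs.
coeffs = [3, -5, 7, 2]
xs = [10, -4, 0, -128]
lut = build_lut(coeffs)
result = da_inner_product(lut, xs, width=8)
expected = sum(c * x for c, x in zip(coeffs, xs))
print(result, expected)
```

Each loop iteration corresponds to one clock cycle of a bit-serial DA unit: one table read and one shifted add or subtract. The table grows as 2^K with the number of coefficients K, which is why grouping the DCT into short inner products keeps the tables small.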
Figure 4. Distributed Arithmetic
The DCT is a computation-intensive operation. A direct implementation requires a large number of adders and multipliers. Multipliers consume more power, and hence distributed arithmetic (DA) is used to implement the multiplications without a multiplier.
The 8 equations F(0) to F(7) above are analysed, and distributed arithmetic (DA) is used in place of the multipliers in the 1D DCT architecture.
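One possible way to map F(0) to F(7) onto DA is sketched below as a software model. It is not necessarily the architecture of Figure 5: the 12-bit coefficient precision (`FRAC`), the 12-bit word length of the intermediate terms (`WIDTH`), the one-LUT-per-output organization and all function names are assumptions made for this illustration. Inputs are taken to be signed 8-bit samples.

```python
import math

FRAC = 12    # assumed number of fractional bits for the cosine constants
WIDTH = 12   # assumed word length of the a, b and c terms (two's complement)

def q(value):
    """Quantize a real constant to a fixed-point integer."""
    return round(value * (1 << FRAC))

A, B, C, D = (q(0.5 * math.cos(k * math.pi / 16)) for k in (1, 3, 5, 7))
M, N = q(0.5 * math.cos(math.pi / 8)), q(0.5 * math.cos(3 * math.pi / 8))
P = q(0.5 * math.cos(math.pi / 4))

def build_lut(coeffs):
    """Precompute every sum of coefficients selected by an address."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]

def da_dot(lut, xs, width=WIDTH):
    """Bit-serial inner product: one LUT read and one shift-add per bit."""
    words = [x & ((1 << width) - 1) for x in xs]
    acc = 0
    for n in range(width):
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> n) & 1) << k
        partial = lut[addr] << n
        acc += -partial if n == width - 1 else partial  # sign bit is negative
    return acc

# One LUT per output; signs are folded into the stored coefficients.
LUTS = {
    0: build_lut([P]),            4: build_lut([P]),
    1: build_lut([A, B, C, D]),   3: build_lut([B, -D, -A, -C]),
    5: build_lut([C, -A, D, B]),  7: build_lut([D, -C, B, -A]),
    2: build_lut([M, N]),         6: build_lut([N, -M]),
}

def dct8_da(x):
    """8-point DCT using adders for the a, b, c terms and DA for the products."""
    a1 = sum(x)
    a2 = x[0] - x[1] - x[2] + x[3] + x[4] - x[5] - x[6] + x[7]
    b = [x[0] - x[7], x[1] - x[6], x[2] - x[5], x[3] - x[4]]
    c = [x[0] - x[3] - x[4] + x[7], x[1] - x[2] - x[5] + x[6]]
    inputs = {0: [a1], 4: [a2], 1: b, 3: b, 5: b, 7: b, 2: c, 6: c}
    return [da_dot(LUTS[u], inputs[u]) / (1 << FRAC) for u in range(8)]

def dct8_reference(x):
    """Floating-point 8-point DCT used only for comparison."""
    return [0.5 * (1 / math.sqrt(2) if u == 0 else 1.0)
            * sum(x[i] * math.cos((2 * i + 1) * u * math.pi / 16) for i in range(8))
            for u in range(8)]

# Hypothetical signed 8-bit input samples.
x = [-128, 127, 5, -7, 64, -64, 0, 33]
worst = max(abs(g - r) for g, r in zip(dct8_da(x), dct8_reference(x)))
print("largest fixed-point error:", worst)
```

In this model the largest table has 16 entries (the four-term outputs), and the only error against the floating-point reference comes from rounding the constants to 12 fractional bits.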
Figure 5. Overall architecture for DA base DCT

Simulation Results
Figure 6. Simulation of 1D DCT Architecture
Figure 7. Simulation of 1D DCT using DA architecture
In this work, VHDL code is written for the 1D DCT architecture using distributed arithmetic and implemented in the Xilinx ISE simulator.
Table 1. Device Utilization
Table 1 shows the device utilization for both the existing method and the proposed method. It is observed that the delay is reduced and the frequency is increased in the proposed method.
Table 2. Power Calculation
Table 2 shows the power calculation for the normal DCT and the DA-DCT.

Graphical Analysis for Power
Figure 8. Graphical representation for power
In Figure 8, the x-axis represents frequency and the y-axis represents power. Power is analysed for both methods at 10, 100 and 1000 MHz, and it is reduced by up to 50% for the DA-DCT compared with the normal DCT.

Conclusion
The proposed 1D DCT architecture using distributed arithmetic is designed and simulated using the Xilinx ISE simulator. The calculated power is reduced by up to 50% compared with the normal DCT. Device utilization is also compared, and the delay is reduced to a large extent.
