Implementation of Random logic in the Quaternary FPGA

DOI : 10.17577/IJERTV1IS6471


Dedeepya Nutakki#1

Nagaraju Ravada*2

N.V.G.Prasad#3

M.Tech. Student

Assistant Professor

Associate professor

#Dept of Electronics and Communication Engineering,

Sasi Institute Of Technology & Engineering, JNTU Kakinada, India.

Abstract

Field Programmable Gate Arrays (FPGAs) are often preferred over ASICs because of their low non-recurring engineering costs. However, the interconnections in an FPGA play an important role in speed, area and power consumption, since there are a large number of wires and switches. Multi-valued logic is a technique that reduces the number of signals in a device, thus reducing the impact of interconnects. The proposed work is a new FPGA structure (quaternary instead of binary) that uses a voltage-mode device to reduce wire length, the number of switches and fanout, with small delay penalties. We propose to reduce these small delays by shortening the critical path using random logic. We choose direct-form FIR filters as a demonstrator.

Keywords: Field Programmable Gate Array (FPGA), FIR, quaternary, multi-valued logic, random logic.

  1. The high integration of modern systems increases the number and length of interconnections, and hence the overall complexity of connecting these systems. Moreover, with the advent of deep sub-micron technologies, interconnections are becoming the dominant factor in circuit delay. In addition, the interconnection resistance-capacitance product increases with each technology node, leading to an increase in network delay.

    Interconnections play an even more crucial role in Field Programmable Gate Arrays (FPGAs). Recent works suggest that in modern million-gate FPGAs, as much as 90% of chip area is dedicated to interconnections. If one could reduce the FPGA area without losing logic capabilities, one could enhance the yield and reduce prices, or even increase the amount of memory available inside the FPGA. Because interconnections take up such a large share of the area, reducing the FPGA area requires reducing the interconnections.

    Multiple-valued logic (MVL) has received increased attention in recent decades because it can represent information with more than two discrete levels. Representing data in an MVL system is more compact than a binary representation, because the number of interconnections can be significantly reduced.

    Recently, a voltage-mode MVL technique was proposed that deals specifically with the power dissipation problem using a standard CMOS process, while still maintaining the logic compaction allowed by MVL. The proposed circuits are intended to reduce the number of interconnections present in existing binary-based systems without incurring power consumption penalties. The benefits of this new MVL implementation technique were considered for application in the reconfigurable domain.

    A new lookup table (LUT) structure was proposed in which the information is represented by quaternary values. In this contribution we show the first steps to tackle the challenge of low-power, high-density FPGAs by proposing a quaternary configurable logic block (CLB). Quaternary connections reduce the number of wires and switches, which are the dominant area- and power-consuming parts of current FPGAs. In addition, using random logic can increase the operating frequency. A complete arithmetic-oriented CLB is proposed, in which any logic operation can be implemented through a quaternary LUT. A fast carry look-ahead propagation unit and a register are also present in the proposed CLB.

    As a case study to validate the proposed CLB, we use a digital signal processing application focusing on Finite Impulse Response (FIR) filters. Only additions, subtractions and shift operations are used in the synthesis of the filters.

    This paper is organized as follows. Section 2 discusses the differences between binary and quaternary implementations of lookup tables. Section 3 presents the quaternary FPGA, gives details about the new arithmetic-oriented logic block and presents comparisons with the binary version. Section 4 discusses FIR filter implementations using Multiple Constant Multiplications (MCM) as the case study adopted in this work and shows how filters are deployed in the proposed FPGA structure. Experimental results are presented in Section 5. Finally, Section 6 concludes the paper and outlines future work.

  2. In general, lookup tables (LUTs) are basically memories that implement a given logic function. Values are initially stored in the lookup table structure, and once inputs are applied, the logic value in the addressed position is assigned to the output. A 4-input binary lookup table with one output is able to store 16 Boolean values. For the purpose of this work, only 1-output LUTs (n = 1) are discussed.

    1. Preliminaries

      A Binary Lookup Table (BLUT) with 4 inputs, as in Figure 1(a), can implement any one of 65,536 (2^16) different binary functions. In the quaternary case, where the radix b = 4, a QLUT with only two inputs can represent around 4 × 10^9 (4^16) functions, far more than the BLUT. Figure 1(b) illustrates a 2-input quaternary function implemented in a QLUT. Note that the function g(Y) performs exactly the same function as the two BLUTs, f0(Y) and f1(Y), depicted in Figure 1(c), where f0 represents the least significant Boolean values and f1 represents the most significant ones.

      Figure 1 Binary (BLUT) And Quaternary (QLUT) Lookup Tables And The Quaternary Function
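The function counts and the equivalence between one QLUT and two BLUTs can be checked with a short sketch. The helper names (`num_functions`, `split_quaternary`), the two-bits-per-digit encoding and the example function `g` are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import product


def num_functions(b: int, k: int) -> int:
    """Number of distinct k-input, 1-output functions over radix b."""
    return b ** (b ** k)


def split_quaternary(g):
    """Split a 2-input quaternary table into two 4-input binary tables.

    g maps (y1, y0), each in 0..3, to a value in 0..3.
    Each quaternary digit is encoded as two bits (msb, lsb).
    f0 holds the least significant output bit, f1 the most significant one.
    """
    f0, f1 = {}, {}
    for bits in product((0, 1), repeat=4):
        y1 = 2 * bits[0] + bits[1]
        y0 = 2 * bits[2] + bits[3]
        value = g[(y1, y0)]
        f0[bits] = value & 1
        f1[bits] = value >> 1
    return f0, f1


# Arbitrary example function: sum of the two digits modulo 4.
g = {(a, c): (a + c) % 4 for a in range(4) for c in range(4)}
f0, f1 = split_quaternary(g)

print(num_functions(2, 4))  # 4-input BLUT
print(num_functions(4, 2))  # 2-input QLUT
```

Recombining `2 * f1 + f0` for every input pattern gives back `g`, which is the sense in which one 2-input QLUT does the work of two 4-input BLUTs.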

    2. LUTs Implementation

      Binary and quaternary lookup tables were implemented by a set of multiplexers, as illustrated in Figure 2. The BLUT is composed of four stages, one per input. Multiplexers are responsible for propagating configuration values to the BLUT output. The multiplexers are composed of pass gates, which receive selection signals from the four BLUT inputs and associated inverters.

      Figure 2(a) 4-input BLUT

      A quaternary lookup table follows the same structure as the BLUT. However, Down Literal Circuit (DLC) structures determine which configuration value must be propagated to the output. Figure 2(b) illustrates the implementation of a 2-input QLUT (b = 4; |Y| = k = 2; |C| = 16). Due to the quaternary representation, each multiplexer has four configuration inputs, so only two multiplexer stages are required.

      Figure 2(b) 2-input QLUT

      The DLCs (gray triangles 1, 2 and 3 in Figure 2(b)) have structures similar to inverters (with 1 PMOS and 1 NMOS transistor). The transistors in each DLC have modified Vth values so that each DLC switches at a different input voltage. This way, the 3 DLCs work as a thermometer system. The DLC output values are only 0 (GND) or 3 (VDD), according to the logic value applied to their inputs. Table 1 shows the DLC output logic values as a function of the inputs.

      Table 1

      Down Literal Circuits (DLCs) Behaviour According To The Logic Value At The Input
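A behavioural sketch of the DLC thermometer code and of a two-stage QLUT is given below. It assumes that DLC i outputs 3 (VDD) while the input is below i and 0 (GND) otherwise. Only the DLC 3 case (output 0 when the input is 3) is stated explicitly in the text, so the thresholds for DLC 1 and DLC 2 are an assumption based on the thermometer description. The function names are hypothetical.

```python
def dlc(level: int, x: int) -> int:
    """Down literal circuit: 3 (VDD) while x < level, else 0 (GND). Assumed model."""
    return 3 if x < level else 0


def thermometer(x: int):
    """Outputs of DLC 1, DLC 2 and DLC 3 for a quaternary input x."""
    return tuple(dlc(level, x) for level in (1, 2, 3))


def mux4(inputs, sel: int) -> int:
    """4:1 multiplexer steered by the three DLC outputs of the select digit."""
    d1, d2, d3 = thermometer(sel)
    if d1 == 3:
        return inputs[0]  # sel == 0
    if d2 == 3:
        return inputs[1]  # sel == 1
    if d3 == 3:
        return inputs[2]  # sel == 2
    return inputs[3]      # sel == 3


def qlut(config, y1: int, y0: int) -> int:
    """2-input QLUT: 16 configuration values and two multiplexer stages."""
    if len(config) != 16:
        raise ValueError("a 2-input QLUT needs 16 configuration values")
    stage1 = [mux4(config[4 * i:4 * i + 4], y0) for i in range(4)]
    return mux4(stage1, y1)


# Example configuration: digit sum modulo 4, stored at index 4 * y1 + y0.
config = [(a + c) % 4 for a in range(4) for c in range(4)]
print([thermometer(x) for x in range(4)])
print(qlut(config, 2, 3))
```

The first stage selects one value out of each group of four using y0, and the second stage selects among those four results using y1, which mirrors the two multiplexer stages described above.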

  3. In general, FPGAs are basically sets of programmable Configurable Logic Blocks (CLBs) and interconnections. The CLBs contain LUTs to implement logic, as well as storage elements. CLBs in the Xilinx Spartan-3 FPGA family are composed of two independent groups of two slices. A slice is a logic/storage unit. Routing among logic blocks is performed through programmable switch matrices. The group of a switch matrix and CLBs is called a tile.

    1. The FPGA Logic blocks

      In this work we propose a new quaternary logic block targeting arithmetic functions. Fig 3 illustrates the structure of binary and quaternary logic blocks of the FPGA configured to implement the sum of variables X and Y. The binary logic block (Fig 3a) represents two slices of the Xilinx Spartan-3 FPGAs.

      Carry look-ahead is implemented by propagating the carry signal through two multiplexers from Cin to Cout. The carry propagation signal is defined by an XOR function implemented by the BLUT. Otherwise, the carry is generated from one of the inputs.

      Figure 3(a) Binary Logic Block
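The binary carry chain can be sketched behaviourally as follows. The propagate signal is the XOR computed by the BLUT, and the carry multiplexer passes Cin when propagate is 1 and otherwise takes one of the inputs. Forming the sum bit as a second XOR with the carry-in is an assumption about the slice, and the function names are hypothetical.

```python
def binary_slice(x: int, y: int, cin: int):
    """One bit position of the carry chain: returns (sum_bit, cout)."""
    propagate = x ^ y               # XOR implemented by the BLUT
    sum_bit = propagate ^ cin       # assumed dedicated XOR with the carry-in
    cout = cin if propagate else x  # carry mux: pass Cin, or generate from an input
    return sum_bit, cout


def ripple_add(a: int, b: int, width: int):
    """Add two unsigned `width`-bit values by chaining slices; returns (sum, carry)."""
    carry = 0
    total = 0
    for i in range(width):
        s, carry = binary_slice((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


total, carry = ripple_add(0b1011, 0b0110, 4)
print(total, carry)
```

When propagate is 0 both inputs are equal, so either one can serve as the generated carry.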

      We developed the quaternary logic block following the same idea, but considering quaternary functions (Fig 3b). The QLUT implements functions of 2 variables as a generalization of the 4-input binary LUT. Table 2 shows the signal S, implementing the sum of X and Y, and Cout as a function of the inputs X, Y, S and Cin.

      Figure 3(b) Quaternary Logic Block

      The carry propagation/generation in the quaternary element is defined by a modified multiplexer, in such a way that Cout is a function of the input signals X, Y and the QLUT output signal as well. The Quaternary Carry Propagation (QCP) logic is illustrated in Fig 4 and implements the Cout function.

      The QCP logic is divided into two parts. The first part is the carry propagation detection (i.e., generation of the function Cout = Cin). The same DLC 3 used in the QLUT (Fig 2b) is used in the QCP to generate the signal S3. S3 enables the propagation of the carry whenever the QLUT output S = 3, which implies S3 = 0. See Table 1 for further details.

      Figure 4 Quaternary Carry Propagation (QCP) Logic

    2. Interconnections

      The FPGA structure is composed of a fully programmable network connecting CLBs, IOs, and other FPGA components. In order to increase the efficiency of the FPGA routing, four types of interconnects are present in the Xilinx Spartan-3 FPGAs: long lines, hex lines, double lines and direct lines. We model the FPGA interconnections as distributed RC networks based on the Predictive Technology Model (PTM) parameters.
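To give a feel for why wire length matters in a distributed RC model, the sketch below estimates the delay of a uniform RC line with the Elmore approximation. This is a generic textbook estimate swapped in for illustration, not the paper's simulation setup, and the resistance and capacitance values are placeholders rather than PTM parameters.

```python
def elmore_delay(r_total, c_total, segments=100, r_driver=0.0, c_load=0.0):
    """Elmore delay of a uniform RC line split into equal lumped segments.

    Each segment contributes its capacitance multiplied by the total
    resistance between the driver and that segment.
    """
    r_seg = r_total / segments
    c_seg = c_total / segments
    delay = 0.0
    for i in range(1, segments + 1):
        r_upstream = r_driver + i * r_seg
        delay += r_upstream * c_seg
    delay += (r_driver + r_total) * c_load
    return delay


# Placeholder values for illustration only (ohms, farads).
full = elmore_delay(200.0, 50e-15, segments=1000)
half = elmore_delay(100.0, 25e-15, segments=1000)
print(full, half, half / full)
```

With no driver resistance and no load, the estimate approaches R·C/2 for the whole line, so halving the wire length cuts the wire delay to roughly a quarter.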

  4. Table 2

    The QLUT output S and Cout functions

    The carry generation is defined by S3 ≠ 0 and one of two other conditions, K1 and K2, which are generated by quaternary logic gates. These conditions determine whether Cout = 1 or Cout = 0. First, K1 defines Cout = 1 when X = 3 or Y = 3; second, K2 defines Cout = 1 when X ≥ 2 and Y ≥ 2. Otherwise, Cout = 0. A D-type flip-flop (FF) is also present in the quaternary logic block. The FF is composed of quaternary inverters.
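The carry rules can be checked exhaustively with a behavioural sketch. It assumes that the QLUT output is S = (X + Y) mod 4, that Cin is 0 or 1, that the carry is propagated when S = 3, and that generation through K1 and K2 applies in all other cases (S ≠ 3). The function name `qcp` is hypothetical.

```python
def qcp(x: int, y: int, cin: int):
    """Quaternary carry propagation: returns (S, Cout) for digits x, y in 0..3."""
    s = (x + y) % 4          # QLUT output, assumed to be the digit sum modulo 4
    s3 = 3 if s < 3 else 0   # DLC 3 output: 0 only when S == 3
    if s3 == 0:
        return s, cin        # propagation: Cout = Cin
    k1 = x == 3 or y == 3    # K1 condition
    k2 = x >= 2 and y >= 2   # K2 condition
    return s, 1 if (k1 or k2) else 0


# Compare against plain integer arithmetic for every input combination.
mismatches = [
    (x, y, cin)
    for x in range(4) for y in range(4) for cin in (0, 1)
    if qcp(x, y, cin)[1] != (x + y + cin) // 4
]
print(mismatches)
```

Under these assumptions the carry agrees with ordinary base-4 addition for all 32 input combinations.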

    In several computationally intensive operations, notably Finite Impulse Response (FIR) filters, the same input is multiplied by a set of constant coefficients. This operation is called Multiple Constant Multiplications (MCM). MCMs are commonly used in Digital Signal Processing (DSP) applications and are an important choice for reducing power consumption, owing to the high level of sharing of operations and the possibility of implementing multiplications using only additions, subtractions and shifts.

    Figure 5 illustrates the implementation of a filter with 4 taps, in which the sharing of partial terms can be seen. The input x is multiplied by the constants 117, 100, 13 and 36.

    Figure 5(a)

    Figure 5(b) Filter With 4 Taps

    For each set of constant coefficients there is a wide range of possible mapping solutions. In this example, instead of using two adders per coefficient (Figure 5a), the adder that generates the value 3x is shared in order to reduce the number of adders.
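The idea of sharing the partial term 3x can be illustrated with a shift-and-add sketch for the constants 117, 100, 13 and 36. The decomposition below is one possible mapping chosen for illustration and is not necessarily the one drawn in Figure 5; the function name `mcm_shared` is hypothetical.

```python
def mcm_shared(x: int):
    """Multiply x by 117, 100, 13 and 36 using only shifts, adds and subtracts."""
    x3 = (x << 1) + x            # 3x, the shared partial term
    x13 = (x << 4) - x3          # 16x - 3x
    x36 = (x3 << 3) + (x3 << 2)  # 24x + 12x
    x100 = (x3 << 5) + (x << 2)  # 96x + 4x
    x117 = (x13 << 3) + x13      # 104x + 13x, reusing 13x
    return {117: x117, 100: x100, 13: x13, 36: x36}


products = mcm_shared(5)
print(products)
```

In this sketch every product is built from one adder or subtractor on top of already available terms, so the four constants need five add/subtract operations in total.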

    Placement and routing of the filters are simple to implement in FPGAs. Operators are placed in the CLB columns in order to take advantage of the fast carry look-ahead chain. Horizontally, CLBs are placed according to the succession of operators.

Our experiments were carried out on several filters with 8-bit random coefficients. Results were obtained with the Xilinx synthesis tool. They show an important reduction of 16% in power consumption (PWR) with no penalty on timing (Freq).

The operating frequency is not much lower in the quaternary implementation; the difference is due to the number of CLBs in the critical path. In binary implementations of the filters, the number of bits may increase by only one from one adder to the next. Hence, only a slice (not a complete CLB) is inserted in the critical path. For the quaternary version, the critical path is increased by the delay of a full CLB, because it cannot be split in two as in the binary case.

The circuits show important gains due to the smaller bus width, and also because shift operations can be performed with fewer vertical connections. This way, the overall performance can be increased, since fewer switches are present in the critical path.
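The bus-width argument can be made concrete with a small sketch: an n-bit word needs only ceil(n/2) quaternary wires, and a left shift by two bit positions becomes a shift by one quaternary digit. The helper names are hypothetical and the digit order (least significant digit first) is an arbitrary choice.

```python
def to_quaternary(value: int, bits: int):
    """Split an unsigned `bits`-wide value into quaternary digits, LSD first."""
    digits = (bits + 1) // 2
    return [(value >> (2 * i)) & 3 for i in range(digits)]


def from_quaternary(digits):
    """Rebuild the integer from quaternary digits given LSD first."""
    return sum(d << (2 * i) for i, d in enumerate(digits))


word = 0b10110100
digits = to_quaternary(word, 8)
print(digits, len(digits))            # 8 binary wires become 4 quaternary wires
print(from_quaternary([0] + digits))  # shift by one digit == shift by two bits
```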

This work presents important advances in the development of multi-valued circuits through the implementation of a transistor-level, arithmetic-oriented quaternary FPGA structure. Results show that the proposed quaternary FPGA is competitive with the binary one because of the important reductions in connection sizes and in the number of switches, and their effects on power consumption and circuit performance.

In this paper we have shown that significant power reduction and increased frequency can be achieved by a quaternary device. The increased frequency is obtained by implementing random logic in the quaternary LUT, which makes it possible to reduce the total number of CLBs without increasing the number of CLBs in the critical path. The quaternary representation applied to random logic allows not only a reduction in the number and size of the connections but, more importantly, a reduction in the fanout and the load applied to the logic blocks.

The authors would like to thank Sasi Institute Of Technology and Engineering, Ashwan Kumar Karrolla of cedronics, and the reviewers.

  1. A. K. Gupta and W. J. Dally, "Topology optimization of interconnection networks," IEEE Comput. Archit. Lett., vol. 5, no. 1, p. 3, 2006.

  2. K. Banerjee, S. Souri, P. Kapur, and K. Saraswat, "3-D ICs: a novel chip design for improving deep-submicrometer interconnect performance and systems-on-chip integration," Proceedings of the IEEE, vol. 89, no. 5, pp. 602-633, May 2001.

  3. F. Li, Y. Lin, L. He, D. Chen, and J. Cong, "Power modeling and characteristics of field programmable gate arrays," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 11, pp. 1712-1724, Nov. 2005.

  4. A. Singh and M. Marek-Sadowska, "Efficient circuit clustering for area and power reduction in FPGAs," in FPGA '02: Proceedings of the 2002 ACM/SIGDA tenth international symposium on Field-programmable gate arrays. New York, NY, USA: ACM, 2002, pp. 59-66.

  5. R. da Silva, C. Lazzari, H. Boudinov, and L. Carro, "CMOS voltage-mode quaternary look-up tables for multi-valued FPGAs," Microelectronics Journal, vol. 40, no. 10, pp. 1466-1470, 2009.

  6. E. Dubrova, "Multiple-valued logic in VLSI: challenges and opportunities," in Proceedings of NORCHIP '99, 1999, pp. 340-350.

  7. T.-S. Jung, Y.-J. Choi, K.-D. Suh, B.-H. Suh, J.-K. Kim, Y.-H. Lim, Y.-N. Koh, J.-W. Park, K.-J. Lee, J.-H. Park, K.-T. Park, J.-R. Kim, J.-H. Yi, and H.-K. Lim, "A 117-mm2 3.3-V only 128-Mb multilevel NAND flash memory for mass storage applications," IEEE Journal of Solid-State Circuits, vol. 31, no. 11, pp. 1575-1583, Nov. 1996.

  8. A. Gonzalez and P. Mazumder, "Multiple-valued signed digit adder using negative differential resistance devices," IEEE Transactions on Computers, vol. 47, no. 9, pp. 947-959, Sep. 1998.

  9. T. Hanyu and M. Kameyama, "A 200 MHz pipelined multiplier using 1.5 V-supply multiple-valued MOS current-mode circuits with dual-rail source-coupled logic," IEEE Journal of Solid-State Circuits, vol. 30, no. 11, pp. 1239-1245, Nov. 1995.

  10. Z. Zilic and Z. Vranesic, "Multiple-valued logic in FPGAs," Aug. 1993, pp. 1553-1556, vol. 2.

  11. R. Cunha, H. Boudinov, and L. Carro, "A novel voltage-mode CMOS quaternary logic design," IEEE Transactions on Electron Devices, vol. 53, no. 6, pp. 1480-1483, June 2006.

  12. L. Aksoy, E. da Costa, P. Flores, and J. Monteiro, "Exact and approximate algorithms for the optimization of area and delay in multiple constant multiplications," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 6, pp. 1013-1026, June 2008.

  13. W. Zhao and Y. Cao, "New generation of predictive technology model for sub-45nm design exploration," International Symposium on Quality Electronic Design, pp. 585-590, 2006.

  14. Xilinx Inc., "Spartan-3 FPGA family data sheet," 2008. [Online]. Available: http://www.xilinx.com/support/documentation/datasheets/ds099.pdf

  15. T. Sakurai, "Closed-form expressions for interconnection delay, coupling, and crosstalk in VLSIs," IEEE Transactions on Electron Devices, vol. 40, no. 1, pp. 118-124, Jan. 1993.

  16. T. Mak, C. D'Alessandro, P. Sedcole, P. Y. K. Cheung, A. Yakovlev, and W. Luk, "Global interconnections in FPGAs: modeling and performance analysis," in SLIP '08: Proceedings of the 2008 international workshop on System level interconnect prediction. New York, NY, USA: ACM, 2008, pp. 51-58.

  17. Cristiano Lazzari, Paulo Flores, Jose Monteiro, and Luigi Carro, "A new quaternary FPGA based on voltage-mode multivalued circuit," 2010 EDAA.
