A Partial Carry-Save on-the-fly Correction Multispeculative Multiplier using with Modified CSLA

Download Full-Text PDF Cite this Publication

Text Only Version

A Partial Carry-Save on-the-fly Correction Multispeculative Multiplier using with Modified CSLA

Kaja Naga Venkata Akhil , Mr.P.Sathish Kumar

Department Of Electronics and Communication Amrita School of Engineering, Bangalore, Amrita Viswa Vidyapeetham, India

Abstract: While multispeculative multiplier radix-8 booth on the fly (OMSM-B8) architecture is makes to solve the higher order of radix booth multiplication and also it helped in using the solve 3X calculation by decoupling it and operated independently. But to improve the OMSB-8 architecture performance partial carry save adders is modified efficiently that are present in the OMSB architecture. Radix 8 Booth multispeculative On-the-Fly Correction Multipliers outperform their previous radix 4 version because they are more energy efficient than the radix 4 to radix 8 versions. So to improve the performance of the OMSB-8 architecture we have to we are replacing the architecture with the modified carry save adder. Here, in this architecture of carry save adder we are utilizing 2 pairs of ripple carry adder (RCA) in which one pair of RCA is replaced with the binary excess code (BEC) in which it helps to improve the performance of the architecture by 27%. The use of modified CSA architecture that performs in the OMSB-8 makes arithmetic tasks quicker than any other adder found in various data-processing devices. The CSLA structure shows that there is a way to decrease the latency and power consumption. The developmental characteristics will decrease the amount of space required, enhance performance, and reduce power consumption. These applications are primarily used in multimedia and signal processing applications, which helps to enhance the overall architecture's performance.

Index terms: Booth, Area-efficient, carry select adder (CSLA), Binary to excess converter (BEC)

  1. INTRODUCTION

    Addition and Multiplications are the key operators in the digital system designs and processors. Complex operations like multiplications, divisions, log operations, square-root operations are based on the basic operators.

    The easiest way to make single additions is to use initial adders. In a nutshell, they forecast incoming weights (carriers) by computing them in concurrently. The height of a partial product matrix is reduced via radix booth recoding. Because only the 0X, 1X, and 2X multiples of a product X*Y must be created, the height of the nxn multiplier is decreased from n to (n+1)/. = 2. As can be seen, they're all simple to compute utilizing shift and negation operations. Hard multiples arise for > 2, those that cannot be handled with only shifts and negations, as well as the penalty for calculating them outweighs the advantages from lowering the partial product matrix height. The intermediate results are gathered and reduced to just operands in a Partially Product Matrix That combines these two operands and

    produces the final output. To improve performance even more, this concept may be applied on any sort of adder.

    In this study, we also examine at multispeculative functional units (MSFU). To get around this, we recommend using a carry- save multiplier, which breaks n-bit adders to numerous k-bit segments and guesses the amount of the segment incoming carries[1]-[3] leading in incomplete carry-save outputs. To multiply one or all of these results with a normal two-input multiplier, they must first be changed to a not characterized by repetition form. A number of penalty cycles may arise as a result of this. It requires 2 partial take inputs but also provides a partial carry-save result.

    In terms of speed, the (OFMSM) Conventional carry choose adder performs better. Because logic circuit sharing compromises the length of the parallel path, the latency of our suggested architecture rises only little. The recommended region carry select adder leverages the same partially parallel computing to enhance adding performance, use the same technique as the present carry select adder to transform excess-1 signals (BEC). To increase performance yet further, similar concept may well be utilized with just about any kind of adder. By replacing the RCA in a standard CSLA with a BEC, we may save space and power. The BEC logic has a strong benefit over the FA structure in terms of total quantity of logic gates. The primary aim of this proposed project is to examine the speed of addition while using n-bit BEC. This technique may also be employed to enhance the capacity of every adder further in. In a traditional CSLA, we may save space and power by employing a BEC instead of RCA. The BEC logic structure has a significant advantage over the FA structure in proportion to the number of logic gates utilized. Proposed technique:

    The Region Power Saving CSLA Circuit is altered by this notion. In digital adders, the time it would take to transmit a carry via the adder limits the speed of addition. There are a few distinct adder processors and systems on the market. The time required for a transfer to travel via a computerized adder has had an effect on adding efficiency. Only when the previous bit position has been summed can another bit value be determined and a carry has been transferred to the next place in a primitive adder, the total for every bit location is generated sequentially. Any adder's main speed constraint is the generation of carries. Now, utilizing the updated carry save adder, we can increase the performance of the

    architecture. In terms of speed, a traditional CSLA outperforms. Because circuit design pooling shortens the paralleled route, the latency of our proposed design is only marginally enhanced. The proposed area-efficient CSLA, on the other hand, retains the traditional CSLA somewhat massively parallel topology. BECs are being used to increase the speed the adding operation. To improve speed even more, this logic may be used with any type of adder [5]. By using a BEC rather than an RCA in a conventional CSLA, we may save space and power. In terms of number of logic gates, the BEC logic has a significant benefit over the FA structure.

  2. RELATED WORK:

    1. With on-the-fly correction on the Radix-8 Booth multispeculative Multiplier architecture:

      The below diagram are proposed radix-8 Booth Multispeculative multipier.

      Fig:1 with on-the-fly correction on the Radix-8 Booth multispeculative Multiplier architecture.

      In the picture above, the OMSM-B8 accepts 2 parameters and the 3 times X (2X+X) then it gives the result for partial carry save mode. the same format as the input [6].Here we are having two types of vectors that are represented in the diagram as result vectors as U, V, W and carry vectors as A, B, D to compute Z=Sum + Carry. Real carries are calculated in the bottom carry calculation block utilizing create and propagate by k-bit signals. The OMSM diagram's left side is utilized to compute utilizing the Radix-8 booth encoding approach, which employs the carry choose based technique. The radix-8 booth recoder multiplier operand Y is divided in the groups in such a way that it overlaps with each other, i.e. the MSB of one bit overlaps with the LSB of another bit. Here two recoders are working in parallel B8_0 and B8_1 in which there will be a selection of B8_0 when the carry-in will be 0 and B8_1 will be selected when the carry-in will be 1.As the selection of the radix-8 booth encoder is depend upon the carry so this process of the architecture is named as carry select booth encoder and as it follows the radix-8 techniques so finally this architecture is named as the B8CSBE cell is a Radix-8 carry select booth encoder.

      Fig.2 Radix-8 carries a limited number of booth encoder cells.

      Because it may provide different LSB owing to earlier produce bits of distinct MSBT, it is first utilized to construct the real booth tuple in tp. The calculation of MSB is done by MSBU and there also be carry-in internal carries within the fragment due to B8_1. Assume the following signals are used as inputs:

      The inputs are msbp-1, Cyi-1, and ctrlp-1, while the outputs are msbp, ctrlp, and Cp.

      1. Msbp-1: This is the most important bit from the preceding cell of msbp, and it is used to choose the current cell's real select.

        The real carry calculation unit calculates the ith fragment's real carry. Each internal carry is allocated a piece. It's worth noting that because of the radix 8, k will be a multiple of three.

      2. Ctrlp-1: It is a control signal in which it is decided to select B8_0 and B8_1 as shown in fig: 6. (p-1) will decide to selecting with B8_0 or B8_1 as per the ctrl signal at which ctrl=1 as B8_1and ctrl=0 as B8_0.

      3. Msbp: That's the precise most significant bit generated either by current cell . It's critical that keep in mind that such a sign is in some way unclear uses v3p+2, v3p+1, as well as v3p, thus any changes are unlikely. The lsb waves never make it to the next cell.

      4. Cp: This is the internal carry produced in the current cell by the B8 1 recoding (p).The same remark as with msbp must be made: it is the logic AND of v3p+2, v3p+1, and v3p, and it is unaffected by the lsb.

      5. Ctrlp: This signal is created using a Boolean AND with a configurable a collection of data in reality, k would be small, but no fragment include more over three tuples. Consider the tuple t2 = v8v7v6v5 as an instance. Inner carry c1 affects bits v6, but not bit v5. v8 is affected by true carrier Cy2. As a result, four distinct options for correctly recoding this tuple are possible. In the tuple t3 = v11v10v9v8, this happens at different points, notably v8 and v9. To put it another way, unlike Figure 3.8, all of the tuples in this example are aligned [8].

      6. Cyi: The actual carry calculation unit is used for this. Every inner carry is associated with a piece. It's worth noting that due of the radix 8, k will be a multiple of three.

      The theoretical justification for this design decision may be found in the work's appendix, there's also a proof. Due to the usage of radix 8, the remainder of the article should focus on fragment sizes that are multiples of three for the data points to be matched this section explains how well the package was made that computes the Multispeculative Tripler (MST) is depicted in Figure 8, which has three main stages: operand creation, reduction, and MSADD. The MST is used to compute T =W +

      D with the input X = U + A. We divide 3X into four operands, each of which is straightforward to calculate because to the tiny shifts required

      The preceding combinations must be computed in radix-8 Booth recoding with the 3X combinations being the most complex to compute. 3X is usually calculated by multiplying 2X + X by 3X, which increases the its multiplier's main route. Designers generate those 3X multiples in our study by exploiting the presence of spare phases with in data stream. Designers by radix-8 computation rather than radix 4-based recoding in this method of compromising the radix 8 Booth

      multiplication key route. Our solution surpasses Booth's radix 4 and radix 8 techniques, according to tests.

      Expressions for 4 bit BEC are:

      Y 0

      = ~A0

      (1)

      Y 1

      = A0^A1

      (2)

      Y 2

      = A2^ (A0

      & A1)

      (3)

      Y 3

      = A3^ (A0

      & A1 &A2)

      (4)

      Expressions for 4 bit BEC are:

      Y 0

      = ~A0

      (1)

      Y 1

      = A0^A1

      (2)

      Y 2

      = A2^ (A0

      & A1)

      (3)

      Y 3

      = A3^ (A0

      & A1 &A2)

      (4)

      Fig. 3 a) B8CSEL fragment with k=3 bits. b) A B8CSEL fragment with k=4 bits. d) A B8CSEL fragment with k=6 bits.

    2. Basic adder blocks delay and area evaluation

    To improve speed even more, this logic may be utilized with any sort of adder. We can save area and power in a conventional CSLA by using a BEC rather than an RCA. The major benefit of the whole BEC reasoning is that it has fewer logic gates and takes up the same amount of space as a Full Adder (FA) structure. The number of gates in a logic block's longest path that cause the most delay is then put together [7]. Calculating the maximum count of AOI gates needed for every logic block gives the size. Technique is used to evaluate the 2:1 mux, Half Adder (HA), and FA CSLA adder blocks.

    Fig 4: Delay and Area evaluation of an XOR gate.

  3. PROPOSED DESIGN:

  1. Basic structure of BEC logic:

    The fundamental component of an arithmetic unit is an adder, and a complicated digital signal processing system contains numerous adders. The CSLA is used in many computing systems to reduce carry propagation delay by creating multiple carries independently and then picking one just to produce the aggregate. The architecture of an RCA was straightforward, even though main issue is delay in carry propagation (CPD).

    The carry select adder (CSLA) seems to be the fastest of the classic adder architectures. In a conventional carry select adder, an RCA arrangement produces a pair consecutive total words. One of every pairing is chosen again eventual sum and carry. A traditional CSLA has a lower CPD than that of an RCA, but the dual RCA has a higher CPD makes the appearance unattractive [29]. After so many unsuccessful attempts, it was decided to use one RCA and one add-one circuit rather than two RCAs in the CSLA design. In terms of speed, a traditional carry choose adder outperforms. The latency of our suggested architecture is only minimally improved since logic circuit processing decreases the number of concurrent paths An region carry select adder, on the other hand, uses the same amount of space and power as the classic carry select adder while maintaining the same partly parallel processing architecture.

    Table.I

    Function table of the 4-bit BEC

    000

    001

    001

    010

    010

    011

    011

    100

    100

    101

    101

    110

    110

    111

    111

    000

    1. CSLA basic structure in 16-bits

      In terms of speed, the 16-b standard modified CS traditional CSLA beats out the 16-b standard modified CS ordinary CSLA. Since logic circuit pooling shortens the concurrent path, the latency of our proposed design is only marginally enhanced.

      The suggested area-efficient CSLA, but at the other side, retains massively parallel design of the standard carry select adder.

      In group2, one set of 2-b RCA has two FA for Cin=1, while the other set has one FA and one HA for Cin=0.

      Gate Count = 57 (Full adder +Half Adder +Mux)

      (5)

      Full adder=39(3*13)

      (6)

      Half adder=6(1*6)

      (7)

      Mux=12(3*4)

      (8)

    2. CSLA delay and area evaluation with the BEC converter To optimize space and power, Figure 6 depicts the design of a intended 16-b modified CSLA with BEC for RCA and carry in=1. Yet again, we separated the framework into five categories. In terms of speed, the stages leading up to the traditional carry choose adder perform better. The latency of

our suggested architecture is only marginally improved since logic circuit shared shortens the parallel path.

The suggested area-efficient CSLA, but at the other side, maintains the partially massively parallel design of the traditional CSLA, which is based on s3 with multiplexer with incomplete c3 and mux. Sum2's performance is influenced by both carry in and Mux.

Fig: 5 CSLA circuit using BEC Converter

The mux's arrival time for the rest of the party. For the remaining groups, the arrival rate of mux choice input is always greater than that of the ready queue of information inputs from the BECs [14]. As a result, the arrival time of mux selection input and the mux delay determine the delay of the remaining groups.

The following method is used to calculate the area count of group 2:

Gate count =43(Full Adder + Half Adder + Multiplexer + BEC)…. (12) Full adder= 13(1*13) .. (13) Half Adder=6(1*6)… (14)

AND=NOT=1.. (15)

XOR=10(2*5).. … (16)

Multiplexer=12(3*4) (17)

XOR=10(2*5).. (18)

Table:II

Comparing the synthesis result using OMSM-B8 using with CSA and modified CSA

Logic

Used

Available

Utilization

Used

Available

Utilization

Utilization

Using

Using

Using CSA

Modified

Modified

Modified

CSA

CSA

CSA

CSA

CSA

Number

of

732

4656

15%

560

4656

12%

slices

Number

of

24

9312

18%

978

9312

10%

slices flip-

flops

Number of 4

1234

9312

13%

568

9312

7%

input LUTs

Number

of

98

232

42%

82

232

35%

bonded

IOBs

Number

of

1

24

4%

1

24

4%

GCLKs

IV SIMULATION RESULTS

Fig: 6 Simulation results of the modified OMSM using modified CSLA

The above simulation shows the correct result that is occurred after the modification of the architecture that where two inputs 000101 and 000011 and the results produced is 1111

V CONCLUSION

To improve the performance of OMSM-B8 addition, we chose a carry choose adder to replace the modified carry save adder, which uses binary to BEC. To improve performance even more, this concept may be applied on any sort of adder in a typical CSLA, we may achieve lower space and power consumption by utilizing a BEC instead of RCA. In terms of the number of logic gates, the BEC logic provides a good benefit and over FA structure. As a result, VLSI hardware is small, low-power, easy to use, and efficient. And if we observe in the flops section from the synthesis report we can say that almost 8% slices utilized are reduced and LUTs utilized are also very less and number of bonded IOBs Utilized are very less almost 8% has been reduced. By overall comparison we can say that as area consumed is 1.983W (1983mW) that of modified CSA is 1.977W (1977mW) and the power consumption reduced to almost 0.6%

REFERENCES

  1. A Combined Arithmetic High-Level Synthesis Solution to Deploy Partial Carry-Save Radix-8 Booth Multipliers in Data paths Alberto

    1. Del Barrio;Roman;Seda Ogrenci Memik IEEE Transactions on Circuits and Systems I: Regular Papers Year:2019 Volume:66, Issue: 2 | Journal|Article

      |Publisher: IEEECitedby: Papers(3)(base paper)

      DOI: 10.1109/TCSI.2018.2866172

  2. G. D. Micheli, Synthesis and Optimization of Digital Circuits, 1st Ed. New York, NY, USA: McGraw-Hill, 1994.

  3. S. Gupta, A. Nicolau, N. D. Dutt, and R. K. Gupta, SPARK: A Parallelizing Approach to the High-Level Synthesis of Digital Circuits. Norwell, MA, USA: Kluwer, 2004.

  4. P. P. Coussy and A. Morawiec, High-Level Synthesis: From Algorithm to Digital Circuit, 1st ed. Dordrecht, the Netherlands: Springer, 2008. [Online]. Available:

  5. Analysis of High Speed Radix-4 Serial Multiplier B.V.N Tarun Kumar;Aravind Chitiprolu;G Hemanth Kumar Reddy;Sonali Agrawal 2020 Third International Conference on Smart Systems and

    Inventive Technology (ICSSIT) Year: 2020 | Conference Paper | Publisher: IEEE

  6. Prabhu E. and Reddy, B. Madhukar, An Efficient 16-Bit Carry Select Adder With Optimized Power and Delay, International Journal of Applied Engineering Research, vol. 10, 2015

  7. High Speed Low Power Radix 4 Approximate Booth Multiplier Nivya Rose Varghese;Swaminadhan Rajula 2019 3rd International Conference on Electronics,Materials Engineering

& Nano-Technology

[14] A. A. Del Barrio and R. Hermida, A slack-based approach to efficiently deploy radix 8 booth multipliers, in Proc. Design, Automat. Test Eur. (DATE), 2017, pp. 11531158.(IEMENTech)

Year: 2019 | Conference Paper

|Publisher: IEEE Cited by: Papers (2)

  1. Design of high speed multiplier using modified booth algorithm with hybrid carry look-ahead adder R Balakumaran;E Prabhu 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT) Year: 2016 | Conference Paper | Publisher: IEEE Cited by: Papers (18)

  2. A Delay Efficient Vedic Multiplier E. Prabhu, H. Mangalam & P. R. Gokul Proceedings of the National Academy of Sciences, India Section A: Physical Sciences volume 89, pages257268 (2019)

  3. Design of high speed multiplier using modified booth algorithm with hybrid carry look-ahead adder R Balakumaran;E Prabhu 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT) Year: 2016 | Conference Paper | Publisher: IEEE

  4. Performance analysis of Wallace and radix-4 Booth-Wallace multipliers Shahzad Asif;Yinan Kong 2015 Electronic System Level Synthesis Conference (ESLsyn) Year: 2015 | Conference Paper |

    Publisher: IEEE

  5. S. Shah and Swaminathan, R., Design of FIR Filter Architecture for Fixed and Reconfigurable Applications using Highly Efficient Carry Select Adder, in International Conference on Soft Computing and Signal Processing (ICSCSP-2018) .

  6. D. De Caro, E. Napoli, D. Esposito, G. Castellano, N. Petra, and A.

G. M. Stroll, Minimizing coefficients word length for piecewise polynomial hardware function evaluation with exact or faithful rounding, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 64, no. 5, pp. 1187 1200, May 2017.

[15]H. H. Saleh, B. S. Mhammad, and E. E. Swartzlander, The optimum Booth radix for low power integer multipliers, in Proc. 8th Int. IEEE Design Test Symp. (IDT), Dec. 2013, pp. 14

Leave a Reply

Your email address will not be published. Required fields are marked *