A Partial Carry-Save On-the-Fly Correction Multispeculative Multiplier Using a Modified CSLA

DOI : 10.17577/IJERTV10IS090186

Kaja Naga Venkata Akhil, Mr. P. Sathish Kumar

Department of Electronics and Communication, Amrita School of Engineering, Bangalore, Amrita Viswa Vidyapeetham, India

Index terms: Booth, area-efficient, carry select adder (CSLA), binary to excess-1 converter (BEC)

1. INTRODUCTION

Addition and multiplication are the key operations in digital system designs and processors. Complex operations such as multiplication, division, logarithm, and square root are built on these basic operators.

The easiest way to speed up single additions is to use speculative adders. In a nutshell, they predict the incoming carries so that the segments of an addition can be computed concurrently. The height of the partial product matrix is reduced via higher-radix Booth recoding. With radix-4 recoding (r = 2), only the 0X, 1X, and 2X multiples of X must be generated for a product X*Y, and the height of the n x n multiplier decreases from n to (n+1)/r. These multiples are simple to compute using shift and negation operations. For r > 2, hard multiples arise, i.e. those that cannot be obtained with shifts and negations alone, and the penalty for calculating them can outweigh the advantage of lowering the partial product matrix height. The partial products are gathered in the partial product matrix and reduced to just two operands, and a final adder combines these two operands and produces the final output. To improve performance even more, this concept may be applied to any sort of adder.
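As a rough illustration of this trade-off, the sketch below computes the partial product matrix height and the hard multiples for a radix of 2^r. The function names are hypothetical, and rounding (n+1)/r up to the next integer is an assumption, since the text gives only the quotient.

```python
def ppm_height(n: int, r: int) -> int:
    """Rows of the partial product matrix for an n x n multiplier with
    radix-2**r Booth recoding. Rounding up is an assumption."""
    return -(-(n + 1) // r)  # integer ceiling of (n + 1) / r


def hard_multiples(r: int) -> list[int]:
    """Odd multiples above 1 needed by radix-2**r Booth recoding.
    They cannot be produced from X by shifting and negating alone."""
    largest = 1 << (r - 1)  # digit set is -largest .. +largest
    return [m for m in range(3, largest + 1, 2)]


for r in (2, 3):
    print(f"radix {1 << r}: height {ppm_height(16, r)}, "
          f"hard multiples {hard_multiples(r)}")
```

For a 16 x 16 multiplier, radix 4 gives nine rows and no hard multiples, while radix 8 gives six rows at the cost of the 3X multiple.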

In this study, we also examine multispeculative functional units (MSFUs). A multispeculative adder breaks an n-bit adder into several k-bit segments and predicts the carry-in of each segment [1]-[3], which leads to partial carry-save outputs. To multiply one or both of these results with a conventional two-input multiplier, they must first be converted to a non-redundant form, and a number of penalty cycles may arise as a result. To get around this, we use a partial carry-save multiplier, which accepts two partial carry-save inputs and also provides a partial carry-save result.
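The partial carry-save form can be sketched as follows. This is a minimal behavioural model, not the MSFU of the base work: the function name is hypothetical, and every segment carry-in is simply speculated as 0, whereas a real design would use a carry predictor.

```python
def fragmented_add(a: int, b: int, n: int, k: int) -> tuple[int, int]:
    """Add two n-bit values in independent k-bit segments.

    Returns (sum_vec, carry_vec) with a + b == sum_vec + carry_vec.
    carry_vec holds at most one bit per segment boundary, which is what
    makes the result a partial (rather than full) carry-save number.
    """
    mask = (1 << k) - 1
    sum_vec = 0
    carry_vec = 0
    for pos in range(0, n, k):
        # Each segment is added on its own; the carry-in is assumed to be 0.
        s = ((a >> pos) & mask) + ((b >> pos) & mask)
        sum_vec |= (s & mask) << pos
        carry_vec |= (s >> k) << (pos + k)  # carry-out of this segment
    return sum_vec, carry_vec


s, c = fragmented_add(0b1111, 0b0001, n=4, k=2)
print(bin(s), bin(c), s + c)
```

Converting the pair back to a single binary number requires one more carry-propagating addition, which is the penalty the partial carry-save multiplier avoids.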

This idea is applied here to the area- and power-efficient CSLA circuit. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder, and several adder architectures exist to address this. In an elementary adder, the sum for every bit position is generated sequentially: a bit position can be summed only after the previous position has been summed and its carry has been propagated to the next position. The generation of carries is therefore the main speed constraint of any adder. Using the modified CSLA, we can improve the performance of the architecture. In terms of speed, the conventional CSLA is faster; because logic sharing shortens the parallel path, the delay of our proposed design increases only marginally. The proposed area-efficient CSLA, on the other hand, retains the partially parallel structure of the conventional CSLA. BECs are used to speed up the addition. To improve speed even more, this logic may be used with any type of adder [5]. By using a BEC rather than an RCA in a conventional CSLA, we save area and power. In terms of the number of logic gates, the BEC logic has a significant advantage over the FA structure.

2. RELATED WORK:

1. Radix-8 Booth multispeculative multiplier architecture with on-the-fly correction:

The diagram below shows the proposed radix-8 Booth multispeculative multiplier.

Fig. 1 Radix-8 Booth multispeculative multiplier architecture with on-the-fly correction.

In the figure above, the OMSM-B8 accepts two operands together with the triple of X (3X = 2X + X) and delivers its result in partial carry-save form, the same format as the inputs [6]. Two kinds of vectors are represented in the diagram: sum (result) vectors U, V, and W, and carry vectors A, B, and D; a value is recovered as Z = Sum + Carry. The real carries are calculated in the carry computation block at the bottom using the k-bit generate and propagate signals. The left side of the OMSM diagram performs the radix-8 Booth encoding, which employs a carry-select technique. The radix-8 Booth recoder divides the multiplier operand Y into groups that overlap, i.e. the MSB of one group overlaps with the LSB of the next. Two recoders, B8_0 and B8_1, work in parallel: B8_0 is selected when the carry-in is 0 and B8_1 is selected when the carry-in is 1. Because the selection of the radix-8 Booth encoder depends on the carry, this cell is named the radix-8 carry-select Booth encoder (B8CSBE).
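The carry-select idea can be illustrated with a simplified behavioural model. It assumes an ordinary two's-complement multiplier operand (not the partial carry-save operand of the OMSM-B8), so the only "carry-in" of a group is the overlap bit from the group below. The function names are hypothetical.

```python
def booth8_digits(y: int, n: int) -> list[int]:
    """Radix-8 Booth digits (each in -4..4) of an n-bit two's-complement y."""
    groups = -(-n // 3)            # one digit per 3-bit group
    width = 3 * groups
    y &= (1 << n) - 1
    if (y >> (n - 1)) & 1:         # sign-extend y to a multiple of 3 bits
        y |= ((1 << width) - 1) ^ ((1 << n) - 1)
    digits = []
    carry_in = 0                   # overlap bit below the least significant group
    for j in range(groups):
        b = (y >> (3 * j)) & 0b111
        b0, b1, b2 = b & 1, (b >> 1) & 1, (b >> 2) & 1
        base = -4 * b2 + 2 * b1 + b0
        d0 = base                  # B8_0: recoding assuming carry-in = 0
        d1 = base + 1              # B8_1: recoding assuming carry-in = 1
        digits.append(d1 if carry_in else d0)   # carry-select step
        carry_in = b2              # MSB of this group overlaps the next one
    return digits


def booth8_multiply(x: int, y: int, n: int) -> int:
    """Sum the selected multiples of x, each weighted by 8**j."""
    return sum(d * x * 8 ** j for j, d in enumerate(booth8_digits(y, n)))


print(booth8_digits(0b000011, 6))
print(booth8_multiply(0b000101, 0b000011, 6))
```

Both candidate digits of a group are available before the carry-in is known, so the selection is the only step left once the carry arrives.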

Fig. 2 Radix-8 carry-select Booth encoder cells.

Because the LSB of a tuple may differ depending on bits produced earlier by a different MSBT, the cell is first used to construct the real Booth tuple t_p. The MSB is calculated by the MSBU, which also accounts for the internal carries that B8_1 introduces within the fragment. Assume the following signals:

The inputs are msb_{p-1}, Cy_{i-1}, and ctrl_{p-1}, while the outputs are msb_p, ctrl_p, and C_p.

1. msb_{p-1}: The real most significant bit produced by the preceding cell. It is used to make the real selection in the current cell.

The real carry computation unit calculates the real carry of the i-th fragment. Each internal carry is associated with a fragment. Note that, because of radix 8, k will be a multiple of three.

2. ctrl_{p-1}: The control signal that selects between B8_0 and B8_1, as shown in fig: 6. The value from cell (p-1) decides the selection: ctrl = 1 selects B8_1 and ctrl = 0 selects B8_0.

3. msb_p: The real most significant bit produced by the current cell. Note that this signal depends only on v_{3p+2}, v_{3p+1}, and v_{3p}, so no rippling occurs: the lsb never propagates to the next cell.

4. C_p: The internal carry produced in the current cell (p) by the B8_1 recoding. The same remark as for msb_p applies: it is the logic AND of v_{3p+2}, v_{3p+1}, and v_{3p}, and it is unaffected by the lsb.

5. ctrl_p: This signal is generated by a Boolean AND with a variable number of inputs. In practice k is small, so no fragment includes more than three tuples. Consider the tuple t_2 = v_8 v_7 v_6 v_5 as an example. Internal carry c_1 affects bit v_6 but not bit v_5, while v_8 is affected by the real carry Cy_2. As a result, there are four possible ways of recoding this tuple correctly. In the tuple t_3 = v_{11} v_{10} v_9 v_8, the same happens at different positions, namely v_8 and v_9. In other words, unlike Figure 3.8, all of the tuples in this example are aligned [8].

6. Cy_i: The real carry of fragment i, produced by the real carry computation unit. Every internal carry is associated with a fragment, and because of radix 8, k is a multiple of three.
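The statement that C_p is the logic AND of the three bits of the cell can be checked exhaustively. The sketch below assumes that the B8_1 recoding corresponds to adding a carry-in of 1 to the 3-bit group; under that reading, a carry leaves the group only when all three bits are 1. The function name is hypothetical.

```python
def internal_carry(v2: int, v1: int, v0: int) -> int:
    """C_p modelled as the AND of the three bits of the group."""
    return v2 & v1 & v0


for v in range(8):
    v0, v1, v2 = v & 1, (v >> 1) & 1, (v >> 2) & 1
    carry_out = (v + 1) >> 3       # carry leaving the group when carry-in = 1
    assert internal_carry(v2, v1, v0) == carry_out
    print(f"{v:03b} + 1 -> carry {carry_out}")
```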

The theoretical justification for this design decision, together with a proof, can be found in the appendix of the base work. Because radix 8 is used, the remainder of the discussion considers fragment sizes that are multiples of three, so that the tuples stay aligned. The unit that computes the triple, the Multispeculative Tripler (MST), is depicted in Figure 8 and has three main stages: operand generation, reduction, and MSADD. The MST computes T = W + D from the input X = U + A. We divide 3X into four operands, each of which is straightforward to calculate because only small shifts are required.
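One decomposition consistent with this description is 3X = 3(U + A) = U + 2U + A + 2A. The text does not list the four operands, so this particular choice is an assumption, and the plain sum below stands in for the reduction and MSADD stages. The function names are hypothetical.

```python
def tripler_operands(u: int, a: int) -> list[int]:
    """Four operands whose sum is 3 * (u + a); each needs at most a 1-bit shift."""
    return [u, u << 1, a, a << 1]


def triple(u: int, a: int) -> int:
    # Reduction and final addition are modelled by an ordinary sum.
    return sum(tripler_operands(u, a))


u, a = 0b0011, 0b0100              # X = U + A = 7
print(tripler_operands(u, a), triple(u, a))
```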

The multiples above must be computed in radix-8 Booth recoding, with the 3X multiple being the most complex to compute. 3X is usually calculated as 2X + X, which lengthens the critical path of the multiplier. In our study, the 3X multiples are generated by exploiting the slack available in the data path. In this way, radix-8 recoding can be used instead of radix-4 recoding without compromising the critical path of the radix-8 Booth multiplication. According to tests, this solution outperforms both the radix-4 and radix-8 Booth techniques.

Expressions for the 4-bit BEC are:

Y0 = ~A0 (1)

Y1 = A0 ^ A1 (2)

Y2 = A2 ^ (A0 & A1) (3)

Y3 = A3 ^ (A0 & A1 & A2) (4)
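The four BEC expressions can be checked against plain increment-by-one arithmetic. The following sketch evaluates them bit by bit; the function name is hypothetical, and the output wraps around at 16 because no carry-out is modelled.

```python
def bec4(a: int) -> int:
    """4-bit binary to excess-1 converter built from expressions (1)-(4)."""
    a0, a1, a2, a3 = ((a >> i) & 1 for i in range(4))
    y0 = a0 ^ 1                    # (1) Y0 = ~A0
    y1 = a0 ^ a1                   # (2) Y1 = A0 ^ A1
    y2 = a2 ^ (a0 & a1)            # (3) Y2 = A2 ^ (A0 & A1)
    y3 = a3 ^ (a0 & a1 & a2)       # (4) Y3 = A3 ^ (A0 & A1 & A2)
    return y0 | (y1 << 1) | (y2 << 2) | (y3 << 3)


for a in range(16):
    assert bec4(a) == (a + 1) % 16
print(f"{bec4(0b0111):04b}")
```

Each output bit needs at most one XOR and a small AND chain, which is where the gate saving over a full-adder chain comes from.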

Fig. 3 a) B8CSEL fragment with k=3 bits. b) A B8CSEL fragment with k=4 bits. d) A B8CSEL fragment with k=6 bits.

2. Basic adder blocks delay and area evaluation

To improve speed even more, this logic may be utilized with any sort of adder. We can save area and power in a conventional CSLA by using a BEC rather than an RCA. The major benefit of the BEC logic is that it has fewer logic gates than a Full Adder (FA) structure. The delay of a logic block is obtained by adding up the number of gates in its longest path [7]. The area is obtained by counting the total number of AOI gates required for each logic block. This technique is used to evaluate the CSLA adder blocks: the 2:1 mux, the Half Adder (HA), and the FA.

Fig. 4 Delay and area evaluation of an XOR gate.

3. PROPOSED DESIGN:

1. Basic structure of BEC logic:

The fundamental component of an arithmetic unit is an adder, and a complex digital signal processing system contains numerous adders. The CSLA is used in many computing systems to reduce the carry propagation delay by generating multiple carries independently and then selecting one to produce the sum. The architecture of an RCA is straightforward, but its main issue is the carry propagation delay (CPD).

The carry select adder (CSLA) is among the fastest of the classic adder architectures. In a conventional carry select adder, an RCA arrangement produces a pair of sum words and carry bits, one for each possible carry-in, and one of each pair is selected as the final sum and carry. A conventional CSLA has a lower CPD than an RCA, but its use of a dual RCA makes the design unattractive [29]. To avoid the dual RCA, it was proposed to use one RCA and one add-one circuit rather than two RCAs in the CSLA design. In terms of speed, the conventional carry select adder is faster; the delay of our proposed architecture increases only slightly, since logic sharing shortens the parallel path. The area-efficient carry select adder, on the other hand, uses less area and power than the conventional carry select adder while maintaining the same partially parallel architecture.
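A behavioural sketch of a CSLA in which the second RCA of each group is replaced by an add-one (BEC) circuit is given below. It is a software model under stated assumptions, not the synthesized design: the function names are hypothetical, the group sizes (2, 2, 3, 4, 5) are an assumed split of 16 bits into five groups, and the first group is treated like the others with a carry-in of 0.

```python
def rca(a: int, b: int, k: int) -> int:
    """k-bit ripple-carry addition with carry-in 0; returns k+1 bits."""
    carry, out = 0, 0
    for i in range(k):
        x, y = (a >> i) & 1, (b >> i) & 1
        out |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return out | (carry << k)


def bec(v: int, width: int) -> int:
    """Binary to excess-1 converter: v + 1 over `width` bits (XOR/AND chain)."""
    out, lower_all_ones = 0, 1
    for i in range(width):
        bit = (v >> i) & 1
        out |= (bit ^ lower_all_ones) << i   # flip while all lower bits are 1
        lower_all_ones &= bit
    return out


def csla_bec(a: int, b: int, groups=(2, 2, 3, 4, 5)) -> int:
    """Carry select addition: one RCA (Cin = 0) plus one BEC (Cin = 1) per group."""
    result, carry, pos = 0, 0, 0
    for k in groups:
        mask = (1 << k) - 1
        s0 = rca((a >> pos) & mask, (b >> pos) & mask, k)  # result for Cin = 0
        s1 = bec(s0, k + 1)                                # result for Cin = 1
        sel = s1 if carry else s0                          # mux on the real carry
        result |= (sel & mask) << pos
        carry = sel >> k
        pos += k
    return result | (carry << pos)


print(csla_bec(12345, 54321))
```

Both candidate results of a group are ready before the carry from the previous group arrives, so the critical path runs through the chain of multiplexers rather than through every full adder.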

Table I

Function table of the 3-bit BEC

Input | Output
000 | 001
001 | 010
010 | 011
011 | 100
100 | 101
101 | 110
110 | 111
111 | 000
1. CSLA basic structure in 16-bits

In terms of speed, the regular 16-b CSLA outperforms the modified 16-b CSLA. Since logic sharing shortens the parallel path, the delay of our proposed design increases only marginally.

The proposed area-efficient CSLA, on the other hand, retains the partially parallel structure of the regular carry select adder.

In group2, one set of 2-b RCA has two FA for Cin=1, while the other set has one FA and one HA for Cin=0.

2. CSLA delay and area evaluation with the BEC converter

Figure 5 depicts the structure of the proposed 16-b modified CSLA, which uses a BEC in place of the RCA with carry-in = 1 to optimize area and power. Again, we split the structure into five groups. In terms of speed, the conventional carry select adder performs better; the delay of our proposed architecture increases only marginally, since logic sharing shortens the parallel path.

The proposed area-efficient CSLA, on the other hand, maintains the partially parallel structure of the conventional CSLA. In group 2, the outputs depend on s3 and the multiplexer, and on the partial c3 and the multiplexer, respectively. Sum2 depends on both the carry-in and the mux.

Fig. 5 CSLA circuit using BEC converter

For the remaining groups, the arrival time of the mux selection input is always greater than the arrival time of the data inputs from the BECs [14]. As a result, the delay of the remaining groups is determined by the arrival time of the mux selection input and the mux delay.

The following method is used to calculate the area count of group 2:

AND = NOT = 1 (15)

XOR = 10 (2 * 5) (16)

Multiplexer = 12 (3 * 4) (17)

XOR = 10 (2 * 5) (18)
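The tally behind these figures can be written out as a short sketch. The per-gate AOI areas (XOR = 5, multiplexer = 4, AND = NOT = 1) and the gate counts (2 XORs, 3 multiplexers) are read off the products above. Each gate type is counted once, and the total covers only the BEC and multiplexer entries listed here, not the full adder and half adder that the group also contains.

```python
# AOI area units per gate, inferred from the products 2 * 5 and 3 * 4.
AOI_AREA = {"AND": 1, "NOT": 1, "XOR": 5, "MUX": 4}

# Number of gates of each type among the listed entries for group 2.
GATE_COUNT = {"AND": 1, "NOT": 1, "XOR": 2, "MUX": 3}

area = {gate: GATE_COUNT[gate] * AOI_AREA[gate] for gate in GATE_COUNT}
total = sum(area.values())

for gate, units in area.items():
    print(f"{gate}: {units}")
print("listed entries total:", total)
```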

Table II

Comparison of synthesis results for OMSM-B8 using CSA and modified CSA

Logic utilization | CSA: used | CSA: available | CSA: utilization | Modified CSA: used | Modified CSA: available | Modified CSA: utilization
Number of slices | 732 | 4656 | 15% | 560 | 4656 | 12%
Number of slice flip-flops | 24 | 9312 | 18% | 978 | 9312 | 10%
Number of 4-input LUTs | 1234 | 9312 | 13% | 568 | 9312 | 7%
Number of bonded IOBs | 98 | 232 | 42% | 82 | 232 | 35%
Number of GCLKs | 1 | 24 | 4% | 1 | 24 | 4%

4. SIMULATION RESULTS

Fig. 6 Simulation results of the modified OMSM using the modified CSLA

The simulation above shows that the modified architecture produces the correct result: for the two inputs 000101 and 000011 (5 and 3), the result produced is 1111 (15).

5. CONCLUSION

To improve the performance of the OMSM-B8 addition, we replaced the carry save adder with a modified carry select adder, which uses a binary to excess-1 converter (BEC). To improve performance even more, this concept may be applied to any sort of adder. In a typical CSLA, we achieve lower area and power consumption by using a BEC instead of an RCA. In terms of the number of logic gates, the BEC logic provides a clear benefit over the FA structure. As a result, the VLSI hardware is compact, low-power, simple, and efficient. The synthesis report shows that slice flip-flop utilization falls from 18% to 10%, 4-input LUT utilization from 13% to 7%, and bonded IOB utilization from 42% to 35%. Overall, the power consumed with the CSA is 1.983 W (1983 mW) while that with the modified CSA is 1.977 W (1977 mW), a reduction of 6 mW, or about 0.3%.

REFERENCES

[1] Alberto A. Del Barrio, Roman, and Seda Ogrenci Memik, A Combined Arithmetic High-Level Synthesis Solution to Deploy Partial Carry-Save Radix-8 Booth Multipliers in Datapaths, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 2, 2019. DOI: 10.1109/TCSI.2018.2866172 (base paper)

[2] G. D. Micheli, Synthesis and Optimization of Digital Circuits, 1st ed. New York, NY, USA: McGraw-Hill, 1994.

[3] S. Gupta, A. Nicolau, N. D. Dutt, and R. K. Gupta, SPARK: A Parallelizing Approach to the High-Level Synthesis of Digital Circuits. Norwell, MA, USA: Kluwer, 2004.

[4] P. P. Coussy and A. Morawiec, High-Level Synthesis: From Algorithm to Digital Circuit, 1st ed. Dordrecht, the Netherlands: Springer, 2008. [Online]. Available:

[5] B. V. N. Tarun Kumar, Aravind Chitiprolu, G. Hemanth Kumar Reddy, and Sonali Agrawal, Analysis of High Speed Radix-4 Serial Multiplier, in 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), IEEE, 2020.

[6] E. Prabhu and B. Madhukar Reddy, An Efficient 16-Bit Carry Select Adder With Optimized Power and Delay, International Journal of Applied Engineering Research, vol. 10, 2015.

[7] Nivya Rose Varghese and Swaminadhan Rajula, High Speed Low Power Radix 4 Approximate Booth Multiplier, in 2019 3rd International Conference on Electronics, Materials Engineering & Nano-Technology (IEMENTech), IEEE, 2019.

[8] R. Balakumaran and E. Prabhu, Design of high speed multiplier using modified booth algorithm with hybrid carry look-ahead adder, in 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT), IEEE, 2016.

[9] E. Prabhu, H. Mangalam, and P. R. Gokul, A Delay Efficient Vedic Multiplier, Proceedings of the National Academy of Sciences, India Section A: Physical Sciences, vol. 89, pp. 257-268, 2019.

[10] R. Balakumaran and E. Prabhu, Design of high speed multiplier using modified booth algorithm with hybrid carry look-ahead adder, in 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT), IEEE, 2016.

[11] Shahzad Asif and Yinan Kong, Performance analysis of Wallace and radix-4 Booth-Wallace multipliers, in 2015 Electronic System Level Synthesis Conference (ESLsyn), IEEE, 2015.

[12] S. Shah and R. Swaminathan, Design of FIR Filter Architecture for Fixed and Reconfigurable Applications using Highly Efficient Carry Select Adder, in International Conference on Soft Computing and Signal Processing (ICSCSP-2018).

[13] D. De Caro, E. Napoli, D. Esposito, G. Castellano, N. Petra, and A. G. M. Stroll, Minimizing coefficients word length for piecewise polynomial hardware function evaluation with exact or faithful rounding, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 64, no. 5, pp. 1187-1200, May 2017.

[14] A. A. Del Barrio and R. Hermida, A slack-based approach to efficiently deploy radix 8 booth multipliers, in Proc. Design, Automat. Test Eur. (DATE), 2017, pp. 1153-1158.

[15] H. H. Saleh, B. S. Mhammad, and E. E. Swartzlander, The optimum Booth radix for low power integer multipliers, in Proc. 8th Int. IEEE Design Test Symp. (IDT), Dec. 2013, pp. 1-4.