FPGA Implementation of Single Cycle Signed Multiplier using Booth Recoding Algorithm

Piyush Pati

doi:10.17577/IJERTV12IS030115

Volume 12, Issue 03 (March 2023)

FPGA Implementation of Single Cycle Signed Multiplier using Booth Recoding Algorithm

DOI : 10.17577/IJERTV12IS030115

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 789
Authors : Piyush Pati
Paper ID : IJERTV12IS030115
Volume & Issue : Volume 12, Issue 03 (March 2023)
Published (First Online): 25-03-2023
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

FPGA Implementation of Single Cycle Signed Multiplier using Booth Recoding Algorithm

Piyush Pati

Electronics and Instrumentation Engineering Odisha University of Technology and Research, Bhubaneswar, Odisha, India

AbstractThe focus of this paper is on the implementation of a single cycle signed multiplier through use of the booth recoding algorithm on an FPGA. By utilizing fewer partial products, this implementation offers benefits such as reduced delay, power consumption, and usage of hardware resources. Additionally, this signed multiplier is capable of performing multiplication of both signed and unsigned numbers. The paper presents a comparative analysis of the 32-bit multiplier's performance in terms of power consumption and FPGA hardware resource utilization. The proposed 32-bit multiplier is designed using Verilog HDL and implemented through Xilinx Vivado 2022.2 software for Xilinx Virtex-7 FPGA.

KeywordsBooth recoding; Signed Multiplier; FPGA; Verilog HDL.

INTRODUCTION

A

B

Y

0

0

0

0

1

0

1

0

0

1

1

1

The process for binary multiplication is analogous to that of the decimal system. The regulations governing binary multiplication are illustrated in the following table.

Binary multiplication offers several benefits. It involves adding the multiplicand to itself, after a suitable shift based on the multiplier, which simplifies the process into a series of shifting and adding steps. These steps should be repeated until the MSB of the multiplier has been shifted, and the final addition has been performed.

The paper is organized as follows. Section II presents the design of booth recoding multiplier. Section III contains the sub-modules of implemented booth recoding multiplier. Section IV describes partial product addition unit. Experimental results and comparison are given in Section V. Finally, Section VI concludes the paper.

DESIGN OF BOOTH RECODING MULTIPLIER

32×32 bit multiplier(Heading 2)

The Booth recoding multiplier requires two 32-bit inputs representing the multiplier and multiplicand, respectively. The output of the multiplier is 64-bit. Additionally, two control signals are included to specify whether the multiplier and multiplicand are signed or unsigned.

Table 1:1-bit binary multiplication

In binary arithmetic, when multiplying two numbers, each bit of the multiplier is multiplied by the multiplicand one at a time, similar to how it's done in the decimal system. The resulting partial products are arranged such that the least significant bit (LSB) is positioned under the corresponding bit in the multiplier.

Once the partial products have been calculated through binary multiplication, they are summed together to form the final product.

It is important to keep in mind that any multiplication by zero would result in all bits of the partial product being zero, which can be skipped in intermediate steps.

Furthermore, when multiplied by 1, the bits of the

32-bit booth recoding multiplier

mplier mplier_s_u

mplicand mplicand_s_u

Signal Name	Width	Source	Description
mplier	32	input	Top module multiplier input
mplier_s_u	1	input	1= multiplier is signed, 0 =multiplier is unsigned
mplicand	32	input	Top module multiplicand input
mplicand_s_u	1	input	1 = multiplier is signed, 0 =multiplier is unsigned
prod	64	output	Output from the multiplier block

Figure 1: 32-bit booth recoding multiplier top module

Table 2: Signal Description of top module

prod

multiplicand remain unaltered, but they are shifted one bit position to the left. Performing intermediate sums of partial products simplifies the multiplication process of binary numbers.

Mathematical Representation of Signed Number

2s complement representation of A

= 23131 + 23030 + (2 1)2929 + 22828 + (2 1)2727 + 22626 +

(2 1)2525 + . + 222 + (2 1)11 + 200 + 201

where 1 0

21(22 + 1 + 0) +

21(20 + 1 + 2)

Where 1 = 2 0

= 23131 + 23030 + 23029

22929 + 22828 + 22827

=

16

=0

221(22 + 21 + 22)

22727 + 22626 + 22625

=

16

=0

2212

(2)

233 + 222 + 221

211 + 200 + 201

21 = 22 + 21 + 22

Comparing equation 1 and 2 it is apparent that one can

process the integer A before re-coding basic unsigned and signed integer.

+				(+/)	(x1/0)	(x2/0)
0	0	0	0	0	0	0
0	0	1	1	0	1	0
0	1	0	1	0	1	0
0	1	1	2	0	1	1
1	0	0	-2	1	1	1
1	0	1	-1	1	1	0
1	1	0	-1	1	1	0
1	1	1	0	0	0	0

where 1 0

= 230(23131 + 23030 + 23029) 228(22929 + 22828 + 22827)

226(22727 + 22626 + 22625)

22(233 + 222 + 221)

20(211 + 200 + 201)

where 1 0

=

15

=0

22(22+1 + 2 + 21)

=

15

=0

22 2 (1)

2 = 22+1 + 2 + 21

Mathematical Representation of Unsigned Number

= 23232 + 23131 + 23130

23030 + 22929 + 22928

22828 + 22727 + 22726

222 + 211 + 210

200 + 211 + 212

Where 1 = 2 0

= 231(232 + 31 + 30) + 229(230 + 29 + 28) +

Table 3:Booth recoding truth table

SUB-MODULES OF IMPLEMENTED BOOTH RECODING MULTIPLIER

33rd bit extension unit

Our implementation involves using a signed multiplier to perform an unsigned operation. To cover the entire range of unsigned multiplication, we add an extra bit at the MS. For unsigned numbers, the 33rd bit is zero, while for signed numbers, the 33rd bit is sign-extended.

F8	{9, 8, 7}
F10	{11, 10, 9}
F12	{13, 12, 11}
F14	{15, 14, 13}
F16	{17, 16, 15}
F18	{19, 18, 17}
F20	{21, 20, 19}
F22	{23, 22, 21}
F24	{25, 24, 23}
F26	{27, 26, 25}
F28	{29, 28, 27}
F30	{31, 30, 29}
F32	{32, 32, 31}

mplier_s_u

mplier [31]

mplier [31:0]

mplicand [31:0]

mplicand [31]

0

mux_out

1

mux_out

0

a [32:0]

b [32:0]

mplicand_s_u

Figure 2: 33rd bit extension unit block diagram

Preprocessing/F block formation unit
- During the pre-processing phase, we will append a "0" to the least significant bit (LSB) of "a", which is a 32-bit input to the pre-processing block, and serves as the output of the 33rd bit extension unit.
- Following that, we need to create blocks consisting of
  
  Table 5: F Block Formation
Partial product generation

There will be 16 rows of partial products for each of the

16 "F" blocks. Table 3 was used to generate , 1 and 2

which are then used for the required bit manipulation to

generate the necessary partial products. The hardware

implementation for , 1 and 2 is provided below.

21

2

three bits each, where the least significant bit (LSB) of the current block becomes the most significant bit (MSB) of the previous block.
Since we are forming blocks in the 33-bit number "a", adding a "0" to the least significant bit results in a shortage of one bit required to form the F block.
To address this situation, we will add a number to the most significant bit (MSB) that is the same as the 33rd bit, regardless of whether we are generating signed or unsigned partial products.

2+1

21

2

2+1

21

2

1

0

1 2

Extra bit added for block formation

Extended 33-bit number (output of 33rd bit extension unit)

0 added in LSB

a [32]

a [32:0]

0

2+1

Table 4:Extended 34-bit multiplier

After preprocessing total width of the number is 34-bit.

Note: preprocessing of number a is nothing but just rewiring.

Figure 3: Hardware realization for , 1 and 2

Note: 1. is high when 2 is -ve, low otherwise.
1. 1 is high for 2 0 ,low otherwise.
2. 2 is high for 2 = Â±2 ,low otherwise.
We need to process all the F block in this hardware to get

F0	{1, 0, 0}
F2	{3, 2, 1}
F4	{5, 4, 3}
F6	{7, 6, 5}

corresponding , 1 and 2. For example, if we process F0 block we can get the 0, 01 and 02. Similarly, we

can get the values for 2, 2

and 2 ——32, 32

1 2 1

and 322by processing corresponding F blocks.

As we are using booth recoding algorithm, so our partial

product will reduce. There will be 17 rows of partial

product which we need to add. For generating ROW#0

partial product we need to 0, 01 and 02 in b (33-bit multiplicand, output of preprocessing unit). Similarly for

ROW#2 we need to use 2, 21 and 22 on b and so

on for other ROWS

PARTIAL PRODUCT ADDITION UNIT

The process in this unit involves the addition of all partial products, namely ROW#0, ROW#2, …, ROW#32. To achieve this, we use full adders and half adders. The carries are propagated diagonally to the left and downward to achieve superior speed. However, when processing the last row,

b [32]
b [32]
0 1

b [31]
02

b [1]
0 1

b [0]
0 1

02

01

0

ROW#32, there is no row below it, and so we propagate the carries horizontally instead.

Note: Horizontal carry propagation is feasible because the augend is unoccupied, given the absence of a row beneath it.

It is worth noting that if the value of 2 is negative, we must

compute the 2's complement of the number "b". However, the

structure depicted in figures 4 and 5 merely perform XOR operations. Additionally, we need to add 1 to the LSB position, aligned with the binary values of the respective ROWs. To address this, we introduce an extra ROW, known as ROW#-1, which will facilitate the addition of 1 to the aligned LSB position for the relevant ROWs.

row0[63]
row0[33] row0[32] row0[34]
row0[1]
row0[0]
ROW#-1= {32b0, 32, 0 , 30, 0 , 28, 0 , 26, 0

,24, 0 ,22, 0 ,20, 0 , 18, 0 , 16, 0 , 14, 0 , 12,

0 ,10, 0 ,8, 0 ,6, 0 , 4, 0 , 2, 0 , 0}

A. Addition Stages

During the initial stage of addition, we will add ROW#-1, ROW#0, and ROW#2 (shifted left by 2 positions). In the

Figure 4: ROW#0 calculation using 0, 01 and 02

Similar structure can be used for calculation of other ROWs.

subsequent stage, we will add the sum obtained from the previous stage, ROW#4 (shifted left by 4 positions), and the carry generated by the first stage (shifted left by 1 position). This procedure is repeated in a similar manner for the succeeding

b [32]
b [32]
b [31]
b [1]
b [0]
stages of addition.

0 1 02 0 1

R#-1[0]
R#0[0]

R#-1[1]
R#0[1]

R#2[0]
R#-1[2]
R#0[2]

R#2[61]
R#-1[63] R#0[63]

0 1 02

01

——–

0

C#2[63]
S#2[63]
C#2[2]
S#2[2]
C#2[1]
S#2[1]
C#2[0]
S#2[0]
1b0

S#2[0]

C#2[0]
S#2[1]

R#4[0]
C#2[3]
S#2[4]

R#4[59]
C#2[62] S#2[63]

Figure 6 : First stage addition block

row2[33] row2[32] row2[34]
row2[1]
row2[0]
——–

——–

row2[63]
Figure 5: ROW#2 calculation using 2, 21 and 22

C#4[63]
S#4[63]
C#4[4]
S#4[4]
C#4[1]
S#4[1]
C#4[0]
S#4[0]
Figure 7: Second stage addition block

1b0 S#32[0]

C#32[0]
S#32[1]

C#32[16]
S#32[17]

C#32[62]
S#32[63]

carry

——–

carry

——–

Figure 11 : On Chip Power

M[63]
M[17]
M[1]

Parameter

Utilization

Bonded IOB

130

Slice

SLICEL

223

SLICEM

144

LUT as Logic

using o5 output only

0

using o6 output only

919

using o5 and 06

386

M[0]
Figure 8: Final Stage addition block
RESULTS AND COMPARISON

A test bench program has bee developed to verify the results of the implemented single-cycle signed multiplier using the Booth recoding algorithm. The code was written and executed using Xilinx Vivado 2022.2. The synthesized design was targeted towards the Virtex-7 FPGA, specifically the Device-XC7V585T with Package-FFG1157 and speed-1. Table-6 presents a comparison between the implemented single-cycle signed multiplier and the 32-bit complex Vedic multiplier [2].

Parameter

32-bit Vedic Multiplier

Implemented 32-bit Signed Multiplier

Improvement

Slice LUTs

7874

1305

83.43%

Bonded IOBs

258

130

49.61%

Table 6: Comparison with Vedic multiplier

Figure 9 : Simulation Waveform

Figure 10 : RTL schematic of Implemented multiplier

Table 7:FPGA hardware utilization
CONCLUSION

The architecture we propose in this paper can multiply two 32-bit signed numbers using the Booth recoding algorithm. Compared to conventional multipliers, our multiplier algorithm requires only half of the partial product addition. We examine the implemented multiplier's performance using various parameters, including power and hardware utilization. Compared to the 32-bit complex Vedic multiplier [2], the slice LUTs and bonded IOBs utilizations are significantly better, with improvements of 83.43% and 49.61%, respectively.

REFERENCES

[1] Saha, P., Banerjee, A., Bhattacharyya, P., and Dandapat, A. (2011, January). High speed ASIC design of complex multiplier using vedicmathematics. In Students Technology Symposium (TechSym), 2011 IEEE (pp. 237-241). IEEE.

[2] Ankush Nikam, Swati Salunke, Sweta Bhurse. Design and Implementation of 32bit Complex Multiplier using Vedic Algorithm IJERT ,2015 March,Vol 4.

[3] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay, A 1.2-ns16Ã—16- Bit Binary Multiplier Using High Speed Compressors, International Journal of Electrical and Electronics Engineering, 2010.

[4] M. Morris Mano, Digital Design,3rd edition, Prentice Hall,2002.

Extra bit added for block formation	Extended 33-bit number (output of 33rd bit extension unit)	0 added in LSB
a [32]	a [32:0]	0

Parameter		Utilization
Bonded IOB		130
Slice	SLICEL	223
Slice	SLICEM	144
LUT as Logic	using o5 output only	0
	using o6 output only	919
	using o5 and 06	386

Parameter	32-bit Vedic Multiplier	Implemented 32-bit Signed Multiplier	Improvement
Slice LUTs	7874	1305	83.43%
Bonded IOBs	258	130	49.61%