FPGA Implementation of Single Cycle Signed Multiplier using Booth Recoding Algorithm

DOI : 10.17577/IJERTV12IS030115

Download Full-Text PDF Cite this Publication

Text Only Version

FPGA Implementation of Single Cycle Signed Multiplier using Booth Recoding Algorithm

Piyush Pati

Electronics and Instrumentation Engineering Odisha University of Technology and Research, Bhubaneswar, Odisha, India

AbstractThe focus of this paper is on the implementation of a single cycle signed multiplier through use of the booth recoding algorithm on an FPGA. By utilizing fewer partial products, this implementation offers benefits such as reduced delay, power consumption, and usage of hardware resources. Additionally, this signed multiplier is capable of performing multiplication of both signed and unsigned numbers. The paper presents a comparative analysis of the 32-bit multiplier's performance in terms of power consumption and FPGA hardware resource utilization. The proposed 32-bit multiplier is designed using Verilog HDL and implemented through Xilinx Vivado 2022.2 software for Xilinx Virtex-7 FPGA.

KeywordsBooth recoding; Signed Multiplier; FPGA; Verilog HDL.

  1. INTRODUCTION

    A

    B

    Y

    0

    0

    0

    0

    1

    0

    1

    0

    0

    1

    1

    1

    The process for binary multiplication is analogous to that of the decimal system. The regulations governing binary multiplication are illustrated in the following table.

    Binary multiplication offers several benefits. It involves adding the multiplicand to itself, after a suitable shift based on the multiplier, which simplifies the process into a series of shifting and adding steps. These steps should be repeated until the MSB of the multiplier has been shifted, and the final addition has been performed.

    The paper is organized as follows. Section II presents the design of booth recoding multiplier. Section III contains the sub-modules of implemented booth recoding multiplier. Section IV describes partial product addition unit. Experimental results and comparison are given in Section V. Finally, Section VI concludes the paper.

  2. DESIGN OF BOOTH RECODING MULTIPLIER

    1. 32×32 bit multiplier(Heading 2)

      The Booth recoding multiplier requires two 32-bit inputs representing the multiplier and multiplicand, respectively. The output of the multiplier is 64-bit. Additionally, two control signals are included to specify whether the multiplier and multiplicand are signed or unsigned.

      Table 1:1-bit binary multiplication

      In binary arithmetic, when multiplying two numbers, each bit of the multiplier is multiplied by the multiplicand one at a time, similar to how it's done in the decimal system. The resulting partial products are arranged such that the least significant bit (LSB) is positioned under the corresponding bit in the multiplier.

      Once the partial products have been calculated through binary multiplication, they are summed together to form the final product.

      It is important to keep in mind that any multiplication by zero would result in all bits of the partial product being zero, which can be skipped in intermediate steps.

      Furthermore, when multiplied by 1, the bits of the

      32-bit booth recoding multiplier

      mplier mplier_s_u

      mplicand mplicand_s_u

      Signal Name

      Width

      Source

      Description

      mplier

      32

      input

      Top module multiplier input

      mplier_s_u

      1

      input

      1= multiplier is signed, 0 =multiplier is unsigned

      mplicand

      32

      input

      Top module multiplicand input

      mplicand_s_u

      1

      input

      1 = multiplier is signed, 0 =multiplier is unsigned

      prod

      64

      output

      Output from the multiplier block

      Figure 1: 32-bit booth recoding multiplier top module

      Table 2: Signal Description of top module

      prod

      multiplicand remain unaltered, but they are shifted one bit position to the left. Performing intermediate sums of partial products simplifies the multiplication process of binary numbers.

    2. Mathematical Representation of Signed Number

      2s complement representation of A

      = 23131 + 23030 + (2 1)2929 + 22828 + (2 1)2727 + 22626 +

      (2 1)2525 + . + 222 + (2 1)11 + 200 + 201

      where 1 0

      21(22 + 1 + 0) +

      21(20 + 1 + 2)

      Where 1 = 2 0

      = 23131 + 23030 + 23029

      22929 + 22828 + 22827

      =

      16

      =0

      221(22 + 21 + 22)

      22727 + 22626 + 22625

      =

      16

      =0

      2212

      (2)

      233 + 222 + 221

      211 + 200 + 201

      21 = 22 + 21 + 22

      Comparing equation 1 and 2 it is apparent that one can

      process the integer A before re-coding basic unsigned and signed integer.

      +

      (+/)

      (x1/0)

      (x2/0)

      0

      0

      0

      0

      0

      0

      0

      0

      0

      1

      1

      0

      1

      0

      0

      1

      0

      1

      0

      1

      0

      0

      1

      1

      2

      0

      1

      1

      1

      0

      0

      -2

      1

      1

      1

      1

      0

      1

      -1

      1

      1

      0

      1

      1

      0

      -1

      1

      1

      0

      1

      1

      1

      0

      0

      0

      0

      where 1 0

      = 230(23131 + 23030 + 23029) 228(22929 + 22828 + 22827)

      226(22727 + 22626 + 22625)

      22(233 + 222 + 221)

      20(211 + 200 + 201)

      where 1 0

      =

      15

      =0

      22(22+1 + 2 + 21)

      =

      15

      =0

      22 2 (1)

      2 = 22+1 + 2 + 21

    3. Mathematical Representation of Unsigned Number

    = 23232 + 23131 + 23130

    23030 + 22929 + 22928

    22828 + 22727 + 22726

    222 + 211 + 210

    200 + 211 + 212

    Where 1 = 2 0

    = 231(232 + 31 + 30) + 229(230 + 29 + 28) +

    Table 3:Booth recoding truth table

  3. SUB-MODULES OF IMPLEMENTED BOOTH RECODING MULTIPLIER

    1. 33rd bit extension unit

      Our implementation involves using a signed multiplier to perform an unsigned operation. To cover the entire range of unsigned multiplication, we add an extra bit at the MS. For unsigned numbers, the 33rd bit is zero, while for signed numbers, the 33rd bit is sign-extended.

      F8

      {9, 8, 7}

      F10

      {11, 10, 9}

      F12

      {13, 12, 11}

      F14

      {15, 14, 13}

      F16

      {17, 16, 15}

      F18

      {19, 18, 17}

      F20

      {21, 20, 19}

      F22

      {23, 22, 21}

      F24

      {25, 24, 23}

      F26

      {27, 26, 25}

      F28

      {29, 28, 27}

      F30

      {31, 30, 29}

      F32

      {32, 32, 31}

      mplier_s_u

      mplier [31]

      mplier [31:0]

      mplicand [31:0]

      mplicand [31]

      0

      mux_out

      1

      1

      mux_out

      0

      a [32:0]

      b [32:0]

      mplicand_s_u

      Figure 2: 33rd bit extension unit block diagram

    2. Preprocessing/F block formation unit

      • During the pre-processing phase, we will append a "0" to the least significant bit (LSB) of "a", which is a 32-bit input to the pre-processing block, and serves as the output of the 33rd bit extension unit.

      • Following that, we need to create blocks consisting of

        Table 5: F Block Formation

    3. Partial product generation

    • There will be 16 rows of partial products for each of the

      16 "F" blocks. Table 3 was used to generate , 1 and 2

      which are then used for the required bit manipulation to

      generate the necessary partial products. The hardware

      implementation for , 1 and 2 is provided below.

      21

      2

      three bits each, where the least significant bit (LSB) of the current block becomes the most significant bit (MSB) of the previous block.

    • Since we are forming blocks in the 33-bit number "a", adding a "0" to the least significant bit results in a shortage of one bit required to form the F block.

    • To address this situation, we will add a number to the most significant bit (MSB) that is the same as the 33rd bit, regardless of whether we are generating signed or unsigned partial products.

      2+1

      21

      2

      2+1

      21

      2

      1

      0

      1 2

      Extra bit added for block formation

      Extended 33-bit number (output of 33rd bit extension unit)

      0 added in LSB

      a [32]

      a [32:0]

      0

      2+1

      Table 4:Extended 34-bit multiplier

      After preprocessing total width of the number is 34-bit.

      Note: preprocessing of number a is nothing but just rewiring.

      Figure 3: Hardware realization for , 1 and 2

      Note: 1. is high when 2 is -ve, low otherwise.

      1. 1 is high for 2 0 ,low otherwise.

      2. 2 is high for 2 = ±2 ,low otherwise.

    • We need to process all the F block in this hardware to get

    F0

    {1, 0, 0}

    F2

    {3, 2, 1}

    F4

    {5, 4, 3}

    F6

    {7, 6, 5}

    corresponding , 1 and 2. For example, if we process F0 block we can get the 0, 01 and 02. Similarly, we

    can get the values for 2, 2

    and 2 ——32, 32

    1 2 1

    and 322by processing corresponding F blocks.

    • As we are using booth recoding algorithm, so our partial

    product will reduce. There will be 17 rows of partial

    product which we need to add. For generating ROW#0

    partial product we need to 0, 01 and 02 in b (33-bit multiplicand, output of preprocessing unit). Similarly for

    ROW#2 we need to use 2, 21 and 22 on b and so

    on for other ROWS

  4. PARTIAL PRODUCT ADDITION UNIT

    The process in this unit involves the addition of all partial products, namely ROW#0, ROW#2, …, ROW#32. To achieve this, we use full adders and half adders. The carries are propagated diagonally to the left and downward to achieve superior speed. However, when processing the last row,

    b [32]

    b [32]

    0 1

    b [31]

    02

    b [1]

    0 1

    b [0]

    0 1

    02

    01

    0

    ROW#32, there is no row below it, and so we propagate the carries horizontally instead.

    Note: Horizontal carry propagation is feasible because the augend is unoccupied, given the absence of a row beneath it.

    It is worth noting that if the value of 2 is negative, we must

    compute the 2's complement of the number "b". However, the

    structure depicted in figures 4 and 5 merely perform XOR operations. Additionally, we need to add 1 to the LSB position, aligned with the binary values of the respective ROWs. To address this, we introduce an extra ROW, known as ROW#-1, which will facilitate the addition of 1 to the aligned LSB position for the relevant ROWs.

    row0[63]

    row0[33] row0[32] row0[34]

    row0[1]

    row0[0]

    ROW#-1= {32b0, 32, 0 , 30, 0 , 28, 0 , 26, 0

    ,24, 0 ,22, 0 ,20, 0 , 18, 0 , 16, 0 , 14, 0 , 12,

    0 ,10, 0 ,8, 0 ,6, 0 , 4, 0 , 2, 0 , 0}

    A. Addition Stages

    During the initial stage of addition, we will add ROW#-1, ROW#0, and ROW#2 (shifted left by 2 positions). In the

    Figure 4: ROW#0 calculation using 0, 01 and 02

    Similar structure can be used for calculation of other ROWs.

    subsequent stage, we will add the sum obtained from the previous stage, ROW#4 (shifted left by 4 positions), and the carry generated by the first stage (shifted left by 1 position). This procedure is repeated in a similar manner for the succeeding

    b [32]

    b [32]

    b [31]

    b [1]

    b [0]

    stages of addition.

    0 1 02 0 1

    R#-1[0]

    R#0[0]

    R#-1[1]

    R#0[1]

    R#2[0]

    R#-1[2]

    R#0[2]

    R#2[61]

    R#-1[63] R#0[63]

    0 1 02

    01

    ——–

    0

    C#2[63]

    S#2[63]

    C#2[2]

    S#2[2]

    C#2[1]

    S#2[1]

    C#2[0]

    S#2[0]

    1b0

    S#2[0]

    C#2[0]

    S#2[1]

    R#4[0]

    C#2[3]

    S#2[4]

    R#4[59]

    C#2[62] S#2[63]

    Figure 6 : First stage addition block

    row2[33] row2[32] row2[34]

    row2[1]

    row2[0]

    ——–

    ——–

    row2[63]

    Figure 5: ROW#2 calculation using 2, 21 and 22

    C#4[63]

    S#4[63]

    C#4[4]

    S#4[4]

    C#4[1]

    S#4[1]

    C#4[0]

    S#4[0]

    Figure 7: Second stage addition block

    1b0 S#32[0]

    C#32[0]

    S#32[1]

    C#32[16]

    S#32[17]

    C#32[62]

    S#32[63]

    carry

    ——–

    carry

    ——–

    Figure 11 : On Chip Power

    M[63]

    M[17]

    M[1]

    Parameter

    Utilization

    Bonded IOB

    130

    Slice

    SLICEL

    223

    SLICEM

    144

    LUT as Logic

    using o5 output only

    0

    using o6 output only

    919

    using o5 and 06

    386

    M[0]

    Figure 8: Final Stage addition block

  5. RESULTS AND COMPARISON

    A test bench program has bee developed to verify the results of the implemented single-cycle signed multiplier using the Booth recoding algorithm. The code was written and executed using Xilinx Vivado 2022.2. The synthesized design was targeted towards the Virtex-7 FPGA, specifically the Device-XC7V585T with Package-FFG1157 and speed-1. Table-6 presents a comparison between the implemented single-cycle signed multiplier and the 32-bit complex Vedic multiplier [2].

    Parameter

    32-bit Vedic Multiplier

    Implemented 32-bit Signed Multiplier

    Improvement

    Slice LUTs

    7874

    1305

    83.43%

    Bonded IOBs

    258

    130

    49.61%

    Table 6: Comparison with Vedic multiplier

    Figure 9 : Simulation Waveform

    Figure 10 : RTL schematic of Implemented multiplier

    Table 7:FPGA hardware utilization

  6. CONCLUSION

The architecture we propose in this paper can multiply two 32-bit signed numbers using the Booth recoding algorithm. Compared to conventional multipliers, our multiplier algorithm requires only half of the partial product addition. We examine the implemented multiplier's performance using various parameters, including power and hardware utilization. Compared to the 32-bit complex Vedic multiplier [2], the slice LUTs and bonded IOBs utilizations are significantly better, with improvements of 83.43% and 49.61%, respectively.

REFERENCES

[1] Saha, P., Banerjee, A., Bhattacharyya, P., and Dandapat, A. (2011, January). High speed ASIC design of complex multiplier using vedicmathematics. In Students Technology Symposium (TechSym), 2011 IEEE (pp. 237-241). IEEE.

[2] Ankush Nikam, Swati Salunke, Sweta Bhurse. Design and Implementation of 32bit Complex Multiplier using Vedic Algorithm IJERT ,2015 March,Vol 4.

[3] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay, A 1.2-ns16×16- Bit Binary Multiplier Using High Speed Compressors, International Journal of Electrical and Electronics Engineering, 2010.

[4] M. Morris Mano, Digital Design,3rd edition, Prentice Hall,2002.