 Open Access
Authors : Ch. Nirosha, P. Sunitha
Paper ID : IJERTV1IS7256
Volume & Issue : Volume 01, Issue 07 (September 2012)
Published (First Online): 25-09-2012
ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Area Efficient High Bit Rate Serial-Serial Multiplier with 1's Asynchronous Counters
ECE Department Pragati Engineering College, Surampalem, India
This paper presents a serial-serial multiplication technique that completes partial product formation in n cycles. The proposed technique forms the partial product matrix of an n x n multiplication in just n cycles, instead of the at least 2n cycles needed by a traditional serial-serial multiplier. The design can process input data at Gb/s rates without buffering and with a reduced number of computational cycles. The partial products are formed from two serial inputs, one fed starting from the LSB and the other from the MSB. The architecture uses a series of asynchronous 1's counters instead of 5:3 counters, so the critical path is limited to an AND gate and a D flip-flop. The proposed multiplier consists of a serial-serial data accumulator and a CSA, and is designed to carry out both signed and unsigned multiplication. The area of the proposed multiplier can be reduced further by using a CLA instead of an RCA in the CSA.

Serial-serial multiplication techniques have been in use for many years. An earlier proposal used a structure based on 1's counters, which computed the N bits of the product of an N x N multiplication in N clock cycles, using N processing cells. Multipliers are fundamental and essential building blocks of VLSI systems. The design and implementation approaches of multipliers contribute substantially to the area, speed and power consumption of computation-intensive VLSI systems. Hardware implementation of a multiplication operation consists of three stages: the generation of partial products (PPs), the reduction of partial products, and the final carry propagation addition. The partial products can be generated either in parallel or serially, depending on the target application and the availability of input data. The partial products are reduced by carry-save adders (CSA) using an array or tree structure. Carry propagation addition is inevitable when the number of partial products is reduced to two rows. This final adder can be a simple ripple carry adder (RCA) for low power or a carry look-ahead adder (CLA) for high speed. As the height of the PP tree increases linearly with the word length of the multiplier, it aggravates the area, delay and power dissipation of the two subsequent stages.
Therefore, it is highly desirable to reduce the number of partial products before the CSA stage. In the proposed method, partial product formation is revamped using the serial-serial algorithm, which is explained in the following sections. The generated partial products are passed to a group of asynchronous 1's counters for accumulation. The counters count the number of ones in each column of the partial products, and these counts are used for addition. The following sections propose an approach to the design of a serial multiplier that is capable of processing input data without input buffering and with a reduced total number of computational cycles.

REVIEW OF SERIAL MULTIPLIERS
In a serial-serial multiplier, both operands are loaded in a bit-serial fashion, reducing the data input pads to two. Serial multipliers are popular for their low area and power. Bit-serial processing can result in efficient communication, both within and between VLSI chips, because of the reduced number of interconnections required. Serial multiplier designs are particularly suitable for applications where input data are presented sequentially. The operating speeds are determined mainly by the propagation delays along the critical path within the processing elements. A major advantage of bit-serial processors is that, when the operands are available only one bit at a time, the processing speed can be improved using bit-serial arithmetic elements. The structures that use this approach can achieve moderate speeds with comparatively small area. The existing method of multiplication is the CSAS (carry-save add-shift) multiplier. The CSAS architecture has an FA, a D flip-flop and an AND gate in the critical path for unsigned multiplication, and an additional XOR gate for signed multiplication. This architecture takes 2n cycles for partial product formation.
Parallel multipliers are popular because size is less critical due to technology scaling. However, due to the emerging development of on-chip serial-link bus architectures, serial-serial multipliers could find roles in the new generation of SoCs and FPGAs. The following sections propose an approach to the design of a serial multiplier that is capable of processing input data at Gb/s without input buffering and with a reduced total number of computational cycles.

CONCEPT OF SERIAL ALGORITHM:
This paper presents an algorithm, referred to as the serial algorithm, that reduces the computation time of the partial products so that they can be formed in just n cycles for an n x n multiplier. In this algorithm, the row and column structure of the partial products is rearranged. Figures 1 and 2 show the partial product formation of the conventional and the proposed multiplier, respectively.
Figure 1: Conventional PP Formation
Figure 2: Proposed PP Formation
The number of ones in the partial products so formed is counted column-wise.
Figure 3: Hardware architecture of a 3-bit 1's counter.
The counters corresponding to the columns that have a 1 input are incremented. The counters can be clocked at a high frequency, and all the operands will have been accumulated at the end of the clock. The final outputs of the counters need to be further reduced to only two rows of partial products by a CSA tree.
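The behaviour of one such counter can be sketched in software. The model below is a minimal illustration, assuming the asynchronous 1's counter is an ordinary ripple counter built from toggle stages; the class name `RippleOnesCounter` and its methods are hypothetical and are not taken from the paper's VHDL.

```python
class RippleOnesCounter:
    """Behavioural model of an asynchronous (ripple) counter that counts 1s.

    Assumption: each stage toggles when the previous stage falls from 1 to 0,
    and the first stage toggles whenever a 1 arrives on the data input.
    """

    def __init__(self, width: int = 3):
        self.bits = [0] * width  # bits[0] is the least significant stage

    def clock(self, data_bit: int) -> None:
        """Advance the counter only when the incoming column bit is 1."""
        if not data_bit:
            return
        for k in range(len(self.bits)):
            self.bits[k] ^= 1        # this stage toggles
            if self.bits[k] == 1:    # 0 -> 1: the ripple stops here
                break
            # 1 -> 0: the falling edge triggers the next stage

    @property
    def value(self) -> int:
        return sum(bit << k for k, bit in enumerate(self.bits))


# Count the ones arriving on a single column over six cycles.
counter = RippleOnesCounter(width=3)
for column_bit in [1, 0, 1, 1, 0, 1]:
    counter.clock(column_bit)
assert counter.value == 4
```

A 3-bit counter of this kind wraps around after seven ones, so the counter width has to be chosen to cover the tallest column it serves.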

CONCEPT OF SERIAL ACCUMULATOR:
Accumulation is an integral part of serial multiplier design. A typical accumulator is simply an adder that successively adds the current input with the value stored in its internal register.
Generally, the adder can be a simple RCA, but the speed of accumulation is then limited by the carry propagation chain. The accumulation can be sped up by using a CSA with two registers to store the intermediate sum and carry vectors, but a more complex fast vector-merging adder is needed to add the final outputs of these registers. In either case, the basic functional unit is an FA cell. A new approach to serial accumulation of data is suggested here, using asynchronous counters that essentially count the number of 1s in the respective input sequences (columns).
Figure 4: Architecture of Accumulator
For a multiplication of 8-bit operands, the counters count the number of ones in the columns, and these count values are then positioned and arranged for addition.
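How the count values might be positioned and added can be sketched as follows. This is an illustration under stated assumptions, not the authors' circuit: bit b of the counter for column w is taken to carry weight 2^(w+b), rows are reduced with generic 3:2 carry-save steps, and all function names are hypothetical.

```python
def column_counts(x: int, y: int, n: int) -> list[int]:
    """Hypothetical helper: number of 1s in each column of the n x n PP matrix."""
    counts = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            counts[i + j] += (x >> i) & (y >> j) & 1
    return counts


def counts_to_rows(counts: list[int]) -> list[int]:
    """Place bit b of the counter for column w at weight 2**(w + b).

    All counter bits with the same b form one row, so the number of rows
    equals the counter width instead of the operand width.
    """
    width = max(counts).bit_length()
    rows = []
    for b in range(width):
        row = 0
        for w, count in enumerate(counts):
            row |= ((count >> b) & 1) << (w + b)
        rows.append(row)
    return rows


def carry_save(a: int, b: int, c: int) -> tuple[int, int]:
    """One 3:2 carry-save step: a + b + c == sum_vector + carry_vector."""
    sum_vector = a ^ b ^ c
    carry_vector = ((a & b) | (a & c) | (b & c)) << 1
    return sum_vector, carry_vector


def reduce_to_two(rows: list[int]) -> tuple[int, int]:
    """Reduce any number of rows to two without propagating carries."""
    rows = list(rows)
    while len(rows) > 2:
        s, cy = carry_save(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, cy]
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


rows = counts_to_rows(column_counts(200, 123, 8))
row_a, row_b = reduce_to_two(rows)
assert row_a + row_b == 200 * 123  # the final carry-propagate addition
```

For 8-bit operands the tallest column holds eight ones, so four counter bits per column suffice and at most four rows enter the carry-save stage.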

PROPOSED SERIALSERIAL MULTIPLIER:
This section describes an unsigned multiplier in which the operands are fed serially, one starting from the LSB and the other from the MSB. Using this feeding sequence and the proposed counter-based accumulation, it takes only n cycles to complete the entire partial product generation for an n x n multiplication.
The product of two unsigned numbers X and Y can be written as

$$P = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} x_i \, y_j \, 2^{i+j} \tag{1}$$

where $x_i$ and $y_j$ are the ith and jth bits of X and Y, with bit 0 being the LSB.
Reversing the sequence of index i and rearranging, the above equation can be written as

$$P = \sum_{r=0}^{n-1} PP_r = \sum_{r=0}^{n-1} \left( PP_r^{C} + PP_r^{L} + PP_r^{R} \right) \tag{2}$$

where

$$PP_r^{C} = x_{n-r-1} \, y_r \, 2^{n-1}, \qquad PP_r^{L} = \sum_{k=0}^{r-1} x_{n-k-1} \, y_r \, 2^{n-k+r-1}, \qquad PP_r^{R} = \sum_{k=0}^{r-1} x_{n-r-1} \, y_{r-k-1} \, 2^{n-k-2}$$

The partial product row $PP_r$ can be generated in the rth cycle. If X is fed MSB first (bit n-1) and Y is fed LSB first (bit 0), then in the rth cycle $PP_r^{C}$ is the partial product bit generated by the current input bits $x_{n-r-1}$ and $y_r$; $PP_r^{L}$ are the partial product bits of the current input bit $y_r$ and each of the preceding input bits of X, i.e. $x_{n-k-1}$ for k = 0, 1, ..., r-1; and $PP_r^{R}$ are the partial product bits of the current input bit $x_{n-r-1}$ and each of the preceding input bits of Y, i.e. $y_{r-k-1}$ for k = 0, 1, ..., r-1. By appropriately sequencing the input bits of X and Y into a shift register, one partial product row ($PP_r$) can be generated in each cycle. As a result, P can be obtained in n cycles.
Figure 5 illustrates the PP generation of an 8 x 8 multiplier for unsigned numbers using the serial algorithm.
According to the algorithm, the partial products are formed in the pyramid-like sequence shown in Figure 2. In the first cycle of operation, the partial product x7y0 is formed. In the next cycle, x7 is shifted one position to the left and y0 is shifted one position to the right, and at the same time x6 and y1 enter. In the following cycle, x7 and x6 are again shifted one position to the left and y0 and y1 one position to the right. The process continues in the same way until the last row of partial products has been formed. Once the partial products are formed, they are accumulated using the asynchronous 1's counters explained in the previous section.
Figure 5: Proposed Architecture for 8×8 Serial-Serial Unsigned Multiplication
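The n-cycle feeding scheme described in this section can be illustrated with a short behavioural model. It is only a sketch: the function name and the history lists standing in for the shift registers are assumptions, and the column counters are modelled as plain integers rather than asynchronous hardware counters.

```python
def serial_serial_multiply(x: int, y: int, n: int) -> int:
    """Multiply two n-bit unsigned numbers in n cycles.

    X is fed MSB first and Y is fed LSB first. In cycle r the row PP_r is
    formed from the two current bits and every bit received earlier, and
    each partial product bit increments the counter of its column.
    """
    counters = [0] * (2 * n - 1)   # one counter per column weight 2**0 .. 2**(2n-2)
    x_history = []                 # (bit index, bit value) of X bits already received
    y_history = []                 # (bit index, bit value) of Y bits already received

    for r in range(n):
        xi, yj = n - 1 - r, r      # indices of the bits arriving in cycle r
        x_bit = (x >> xi) & 1
        y_bit = (y >> yj) & 1

        # Centre bit: current X bit AND current Y bit (always column n-1).
        counters[xi + yj] += x_bit & y_bit
        # Left bits: current Y bit AND each earlier (more significant) X bit.
        for i, earlier_x in x_history:
            counters[i + yj] += earlier_x & y_bit
        # Right bits: current X bit AND each earlier (less significant) Y bit.
        for j, earlier_y in y_history:
            counters[xi + j] += x_bit & earlier_y

        x_history.append((xi, x_bit))
        y_history.append((yj, y_bit))

    # Weighted sum of the column counts gives the product.
    return sum(count << weight for weight, count in enumerate(counters))


assert serial_serial_multiply(0b10110111, 0b01011100, 8) == 183 * 92
```

Row r contains 2r + 1 partial product bits, so the n rows together cover all n x n bit pairs exactly once, which is why no further cycles are needed.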

EXTENDING THE LOGIC TO SIGNED NUMBERS
The architecture described above can be modified so that it can also be used for the multiplication of signed numbers. Signed multiplication is carried out with the Baugh-Wooley algorithm. By combining the Baugh-Wooley algorithm and the serial algorithm, the architecture can be modified to perform signed multiplication. Using the Baugh-Wooley algorithm, the equation for the multiplication of two signed numbers in two's complement form can be written as
[Equation (3)]
Using the proposed architecture, the multiplication of the signed numbers can be written as
[Equation (4)]
The difference between (3) and (4) can be written as
... $x_{n-i-1}$ ... $+\, 2^{n} - 2^{2n-1}$ (5)
The above expression can be simplified as
[Equation (6)]
To extend the unsigned multiplier architecture to signed multiplication without introducing a high overhead, the difference expressed in (6) must be simplified. Since $\sum_{i=0}^{n-2} 2^{i} = \sum_{j=0}^{n-2} 2^{j}$, the summation terms embedded in (6) can be simplified by the closed-form expression of a geometric progression. Therefore
[Equation (7)]
The difference is added to the result of the proposed architecture so that the architecture can be used for signed multiplication. Thus we can write
P = P + [difference term] (8)
In the proposed PP generation method, $y_{n-1}$ arrives only in cycle n-1. The generation of the terms that depend on $y_{n-1}$ therefore has to be delayed until cycle n-1. The remaining terms can be computed during the initial n-1 cycles. Hence, the difference can be corrected in the CSA tree. Adding the difference requires an (n-1)-bit shift register, a NAND gate and several FAs.
Figure 6: Proposed Architecture for 8×8 Serial-Serial Signed Multiplication
The architecture of the proposed 2's complement serial-serial multiplier is depicted in Figure 6. A control input is required to latch $x_{n-1}$ in the first clock cycle so that the terms involving $x_{n-1}$ can be generated serially in n-1 cycles. The bits of the difference to be added in the CSA tree are shown in the figure.
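The idea of correcting an unsigned product to obtain the signed one can be checked numerically. The sketch below assumes the standard modified Baugh-Wooley form of the two's complement product and a correction term D = -2^n (x_{n-1} Y' + y_{n-1} X'), where X' and Y' are the operands without their sign bits. This D is derived here for illustration from the two's complement definition and may differ in form from equations (3) to (8); all function names are hypothetical.

```python
def to_signed(value: int, n: int) -> int:
    """Interpret an n-bit pattern as a two's complement number."""
    return value - (1 << n) if (value >> (n - 1)) & 1 else value


def baugh_wooley(x: int, y: int, n: int) -> int:
    """2n-bit product pattern from the modified Baugh-Wooley expression (assumed form)."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]
    p = (xb[n - 1] & yb[n - 1]) << (2 * n - 2)
    for i in range(n - 1):
        for j in range(n - 1):
            p += (xb[i] & yb[j]) << (i + j)
    for j in range(n - 1):                       # complemented x_{n-1} y_j terms
        p += (1 - (xb[n - 1] & yb[j])) << (n - 1 + j)
    for i in range(n - 1):                       # complemented x_i y_{n-1} terms
        p += (1 - (xb[i] & yb[n - 1])) << (n - 1 + i)
    p += (1 << n) - (1 << (2 * n - 1))           # constant terms 2**n - 2**(2n-1)
    return p % (1 << (2 * n))


def signed_from_unsigned(x: int, y: int, n: int) -> int:
    """Unsigned product of the bit patterns plus the correction term D."""
    low_mask = (1 << (n - 1)) - 1
    x_sign, y_sign = (x >> (n - 1)) & 1, (y >> (n - 1)) & 1
    correction = -((x_sign * (y & low_mask) + y_sign * (x & low_mask)) << n)
    return (x * y + correction) % (1 << (2 * n))


# Both routes agree with ordinary signed multiplication for every 4-bit pair.
for a in range(16):
    for b in range(16):
        expected = (to_signed(a, 4) * to_signed(b, 4)) % (1 << 8)
        assert baugh_wooley(a, b, 4) == expected
        assert signed_from_unsigned(a, b, 4) == expected
```

The correction only involves products with the two sign bits, which is consistent with the text's observation that the terms depending on the last-arriving bit of Y must wait until the final cycle.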

CARRY PROPAGATION ADDITION USING CLA IN CSA
The carry look-ahead algorithm speeds up addition because the carry into each stage is calculated in advance from the input signals. In a CLA, the carry propagation time is reduced by using a tree-like circuit to compute the carries rapidly. The CLA exploits the fact that the carry generated by a bit position depends on the three inputs to that position.
If X and Y are the two input bits of a position, then
if X = Y = 1, a carry is generated independently of the carry from the previous bit position;
if X = Y = 0, no carry is generated;
if X ≠ Y, a carry is generated if and only if the previous bit position generates a carry.
The area of the multiplier architecture can be reduced by using a CLA rather than an RCA in the carry propagation addition stage of the CSA.
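The three cases above correspond to the usual generate and propagate signals of a carry look-ahead adder. The following sketch, with hypothetical function names, computes each carry directly from those signals and the carry-in instead of waiting for the previous carry; it illustrates the principle only and says nothing about the area of a hardware implementation.

```python
def lookahead_carry(g: list[int], p: list[int], c0: int, i: int) -> int:
    """Carry out of bit i, written as the flattened look-ahead expression:

    c[i+1] = g[i] | p[i]g[i-1] | p[i]p[i-1]g[i-2] | ... | p[i]...p[0]c0
    """
    carry = g[i]
    propagate_chain = p[i]
    for k in range(i - 1, -1, -1):
        carry |= propagate_chain & g[k]
        propagate_chain &= p[k]
    return carry | (propagate_chain & c0)


def cla_add(x: int, y: int, n: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers; returns (n-bit sum, carry out)."""
    g = [(x >> i) & (y >> i) & 1 for i in range(n)]    # X = Y = 1: generate
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]  # X != Y: propagate
    # carries[i] is the carry into bit i; each one depends only on g, p and carry_in.
    carries = [carry_in] + [lookahead_carry(g, p, carry_in, i) for i in range(n)]
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, carries[n]


total, carry_out = cla_add(0b10110111, 0b01011100, 8)
assert (carry_out << 8) | total == 0b10110111 + 0b01011100
```

When X = Y = 0 both signals are 0, so that position neither creates a carry nor passes one on, matching the second case in the list.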

PERFORMANCE COMPARISON AND RESULTS
Fig: Simulation results of 8 x 8 unsigned multiplication using RCA.
Fig: Simulation results of 8 x 8 signed multiplication using RCA.
Fig: Simulation results of 8 x 8 unsigned multiplication using CLA.
Fig: Simulation results of 8 x 8 signed multiplication using CLA.
In this section, the results of the proposed work are compared with the existing carry-save adder multiplier using RCA and CLA. The architecture is implemented in VHDL and simulated using Xilinx tools.
Method     Bit Rate   Operating Mode   Number of Slices
                                       Using RCA   Using CLA
Proposed   8 x 8      Unsigned         91          86
Proposed   8 x 8      Signed           166         165
Table 1: Area comparison of the proposed serial-serial multiplier
International Journal of Engineering Research & Technology (IJERT)
ISSN: 2278-0181
Vol. 1 Issue 7, September – 2012
The proposed work can be made more area efficient by using a CLA rather than an RCA for PP addition. The synthesis report summarized in Table 1 shows that the number of slices is reduced when a CLA is used instead of an RCA.

CONCLUSION:
In this paper, a new method of serial-serial multiplication using low-complexity asynchronous counters is introduced. By exploiting the relationship among the bits of the partial product matrix, it is possible to generate all the rows serially in just n cycles for an n x n multiplication. Employing counters to count the number of ones in each column allows the partial product bits to be generated on the fly and partially accumulated in place, with a critical path delay of only an AND gate and a D flip-flop. The counter-based accumulation reduces the partial product height logarithmically and makes it possible to achieve an effective reduction rate. The proposed method outperforms many serial-serial and serial-parallel multipliers in speed. This approach has the clear advantage of a low I/O requirement and hence is most suitable for complex SoCs, advanced FPGAs and high-speed bit-serial applications.
REFERENCES:

P. Ienne and M. A. Viredaz, Bit-serial multipliers and squarers, IEEE Trans. Comput., vol. 43, no. 12, pp. 1445-1450, Dec. 1994.

A. Aggoun, A. Ashur, and M. K. Ibrahim, Area-time efficient serial-serial multipliers, in Proc. IEEE Conf. Circuits Syst. (ISCAS), Geneva, Switzerland, 2000, pp. 585-588.

A high bit rate serial-serial multiplier [online].

R. Gnanasekaran, On a bit-serial input and bit-serial output multiplier, IEEE Trans. Comput., vol. C-32, no. 9, pp. 878-880, 1983.

O. Nibouche, A. Bouridane, and M. Nibouche, New architectures for serial-serial multiplication, in Proc. IEEE Conf. Circuits Syst. (ISCAS), Sydney, Australia, 2001, vol. 2, pp. 705-708.
