 Open Access
 Total Downloads : 18
 Authors : R. Ponnagan, C. Archana, R. Bhuvaneswari, R. Boovitha
 Paper ID : IJERTCONV5IS13150
 Volume & Issue : ICONNECT – 2017 (Volume 5 – Issue 13)
 Published (First Online): 24042018
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
LowDelay HighPerformance VLSI Architecture using Montgomery Modular Multiplication
R. Ponnagan1
Assistant Professor, Dept. of ECE K.Ramakrishnan College of Technology Samayapuram, Trichy.
C. Archana2, R. Bhuvaneswari3, R. Boovitha4
Department of ECE K.Ramakrishnan College of Technology
Samayapuram, Trichy.
AbstractThere is more advancement in VLSI technology for designing combinational and sequential circuits. Among various techniques we have chosen a CMOS (Complementary Metal Oxide Semiconductor) and Verilog coding technique to design the combinational circuits such as full adder and ripple
4. qi = (S[i]0+Ai*B0) mod 2;
5. S[i +1] = (S[i] + Ai * B + qi * N) / 2;
6. }

return S[k];
carry adder. Full adder is the basic combinational circuit which performs arithmetic operations. Multiple full adder
circuits can be cascaded to add an Nbit number i.e. Ripple carry adder. This paper proposes a simple and an efficient modular multiplication using a Carry Skip Adder. The proposed algorithm improves the efficiency of the multiplication. We made power analysis comparisons by using several types of adders and found that Carry skip adder consumes less power than other adders. The simulation result shows that the proposed combinational circuits have better performance and it maintains low complexity of the design.
Index Terms CarrySkip addition, lowpower architecture, Montgomery modular multiplier.

INTRODUCTION
IN MANY systems, large integers in modular multiplication (MM) is the most critical and timeconsuming operation. Therefore, many algorithms have been presented to carry out the modular multiplication more fastly, among that Montgomerys algorithm is one of the efficient MM algorithm. Depending on the least significant digit of the operands, Montgomerys algorithm determines the quotient and replaces the complex division in terms of conventional modular multiplication with a series of shifting modular additions to produce the result as S = A x B X R1 (mod N), where N is the Kbit modulus, R1 is the inverse of R modulo N and R=2Kmod N. In order to speed up the encryption/decryption process, it is implemented into the Very Large Scale Integration (VLSI) circuits. Let us see the Montgomerys algorithm by using three operand for addition. Fig.1 represents, for large operands in binary representation requires long carry propagation.
Algorithm MM:
Radix2 Montgomery modular multiplication
Inputs: A, B, N (modulus)
Output: S[K] 1. S [0] = 0;
2. For i=0 to k1 3. {
Fig. 1 MM algorithm

MODULAR MULTIPLICATIONALGORITHMS

Montgomery Multiplication
Montgomery multiplication, is a method for performing fast modular multiplication. However, when many products are required, as in modular exponentiation , the conversion to Montgomery form is a negligible fraction of the time of the computation, and performing the computation by Montgomery multiplication is faster than the available alternatives. Many important cryptosystems such as RSA exchange is based on arithmetic operations modulo a large number, and for these cryptosystems, the increased speed afforded by Montgomery multiplication can be important in practice. Fig. 1 shows the Montgomery MM algorithm (denoted as MM algorithm) for radix2 version. As mentioned earlier, S = A Ã— B Ã— R1 (mod N) represents the Montgomery modular product S of A and B, where R1 is the inverse R modulo N. where, R Ã— R1 = 1 (mod N). Note that, the notation Ai in Fig. 1 shows the ith bit of A in binary representation. In addition, the notation Ai: j indicates a segment of A from the ith to jth bit. Ss convergence range in MM algorithm is 0 S<2N, S = S N is an additional operation is required, if SN which remove the oversize residue. step 6 is to eliminate the final comparison and subtraction which represented in Fig. 1.
Walter [2] changed the value of R to k+ 2 and 2k+2 mod N and the number of iterations , respectively. Nevertheless, the very large operand addition still restricts the performance of MM algorithm for long carry propagation. , the intermediate result S of shifting modular addition kept in the carrysave presentation (SS, SC), in order to avoid the long carry propagation. To remove the final comparison and subtraction [2] the number of iterations been changed from k to k +2. The format conversion from the carrysave format into its binary format is needed. Therefore, the 32bit will take 32 clock cycles to complete the format conversion of a 1024bit in based Montgomery multiplication. The extra clock cycles enlarges the critical path and area of the
multiplier. The works in [6] and [7] states that the computation of AiÃ—B+qiÃ—N which is pre computed D=B+N can be simplified into one selection operation. . One of the operands will be chosen among 0, N, B, and D if ( Ai , qi)=(0, 0), (0, 1), (1, 0), and (1, 1), respectively. In this multiplier, only onelevel CSA architecture is required to perform the carryskip at addition the expense of one extra 4to1 multiplexer. Operand D value can be store in an additional register.


CARRY SAVEADDER
In many cases we need to add several operands together, carry save adders are ideal for this type of addition. A carry save adder consists of a ladder of standalone full adders, which carries out a number of partial additions in every steps. The principal idea is that the carry has a higher power of 2 and thus is routed to the next column. Doing additions with Carry save adder saves time and logic.
In this method, for the first 3 numbers a row of full adders are used in Fig. 3.4. Then a row of full adders is added for each additional number. The final results, SUM and CARRY in the form of two numbers, are then summed up with a carry propagate adder or any other adder.
Fig 3.1 Carry Save Adder for 4 Bit Operand
When n bit adder is implemented, still carry is move to the next stage, of same row, even though inputs to the lower next stage is ready, it kept in wait state, until the carry and sum output didnt come from the above stage. This induces delay, now it can be optimized, if carry out is passed diagonally to the lower next stage, instead of rippling to the next stage of same row. Carry computation is not performed, but it is saved up to last row, in the last bottom row, carry is rippled however, where the results are obtained finally, but it significantly reduce the amount of delay occurred due to rippling operation.

PROPOSED MONTGOMERY MULTIPLICATION
In this section, we propose a Montgomery MM algorithm for carry skip adder to reduce the critical path delay of Montgomery multiplier. In addition, the disadvantage of more clock cycles for completing one multiplication can be rectified while maintaining the advantages of short critical path delay and low hardware complexity. The time delay can also be reduced while perform the multiplication operation on the carry skip adder. Carry skip adder plays a
vital role among all other adders in the area of power, time and area.
CARRYSKIP ADDER
In binary system, carry is either can be 0 or 1.only two possibility of value provides feasibility for choosing between two. So if there is a mechanism to select carry, or better say, skipping carry through several stages as shown in Fig. 4.1, then delay minimization can be better obtained. In the carry skip adder to speedup operation, carry propagation is skipped to position i. A carryskip adder reduces the carrypropagation time by skipping over groups of consecutive adder stges.
Fig. 4.1 Formal working of Carry Skip Adder
The carryskip adder is usually speed when compare to the carry lookahead technique, but it requires less chip area and consumes less power. To implement carryskip adder, stages are divided into rbit blocks of simple carry scheme. In each block, a ripple carry adder is utilized to produce the sum and carry for each block. Every block generates a block propagate and block generate signal. Also for the given column, CSKA uses the carry out equation in terms of the carry in signal.
Ck+1= Gk + Pk .Ck.carry out of ith stage
Also Generate and Propagate signals used by the carry skip
Gk = Ak .Bk
Pk = AkBk .carry propagate of ith stage
From this equation, it can be seen that setting the carryin signal to zero causes the carry out to serve as a block generate signal. Therefore, an N bit AND gate is also used to form block propagate signal. The block generates and propagates signals produce the input carry to the next block. To implement carry skip adder, stages are divided into blocks
Carry skip adder reduces the time needed to propagate carry by skipping over groups of consecutive adder stages. It requires less chip area and consumes less power than other adders. It has an execution time proportional to n.
SIMPLE CARRYSKIP ADDERS
SOME IMPORTANT INTERPRETATION ABOUT CARRY:
From the full adder truth table we can easily interpret, carry out is killed or deleted when both of the inputs to the full adder are zero, irrespective of carry in whether it is 1 or 0. The 2nd observation is carry which is feed by the previous stage propagates to the next stage when either of the inputs to the full adder at present stage is 1(high).The another observation states carry is generated at present stage and will be feed to next stage when both of the inputs to the full adders are high. Hence in the expression cout=ai.bi+(ai+bi).ci, 1st term tells about carry generation at the present stage, and 2nd term states about carry propagation to the next stage(look we used XOR gate there in the design of full adder, in place of or gate in 2nd term here, which gives is the original carryout expression, just to avoid use of extra OR gate in design, as it does effect either XOR gate or OR gate is used for carry out since the only difference caused between 2 gates is for when both of the inputs are high ,that even is compensated, because we consider carry generation by 1st term in above expression to rise carry out high.
A carryskip adder in Fig. 4.2, reduces the carrypropagation time by skipping over groups of consecutive adder stages. The carryskip adder is usually speed to the carry look ahead technique, but it requires less chip area and consumes less power.
Fig.4.2 MM1 Multiplier
In the carryskip adder, any adder stage can be skipped for which Pm = A xor B = 1, where Pm indicates the mth carry propagate. The adder structure is divided into blocks of consecutive stages with a simple ripplecarry scheme. Every block also generates a carrypropagate signal that equals 1 if all stages internal to the block satisfy Pm = 1. This signal can be used to allow an incoming carry to skip all the stages within the block and generate a blockcarryout. Fig. 8 shows an example block consisting of k bit positions j, j+1… j+k1.

OUTPUT AND ITS DESCRIPTION
In this section, we first analysis the delay and power of the proposed carry skip adder. Here the delay and power are compared with the different adders. In addition, we simulate the adder and provide the result for the modular multiplication. Finally several Montgomery multipliers are implemented and synthesized to demonstrate the efficiency of the proposed approach.
A.SIMULATION OF SCSMM MULTIPLIER
Fig. 5.1 Proposed system i.e. having two carry skip adder
The above Fig.5.1 show that Proposed system i.e. having two carry skip adder which perform repetitive addition in order to get a multiplication result by using Montgomery Modular Multiplication.
Fig. 5.2 Overall simulation of MMM algorithm
Fig. 5.3 Continuation of Overall simulation of MMM algorithm
B.POWER AND TIMING ANALYSIS
The Time Quest analyzer uses industrystandard Synopsys Design Constraint (SDC) methodology for constraining designs and reporting results.. Shown in fig. 5.4.
Fig. 5.4 Power analysis of SCSMM multiplier
The Power Play Power Analyzer performs post fitting power analysis and produces a power report that highlights, by block type and entity, the power consumed. Shown in fig. 5.5
Fig. 5.5 Timing analysis of SCSMM multiplier
These are the power and timing analysis of SCS based Montgomery modular multiplication by using carry skip adder.

CONCLUSION

CSA based Montgomery modular multiplication maintain a format leading to fewer clock cycle and also had a large delay. To improve the performance of the Montgomery modular multiplication, this paper has been modified the low delay high performance Montgomery modular multiplier. The proposed multiplier used carry skip adder which skipped the unnecessary addition operation. Finally the experimental result showed that the proposed approaches are indeed capable of enhancing the performance and reducing the delay of the Montgomery modular multiplier maintaining low hardware complexity.






REFERENCE

R. L. Rivest, A. Shamir and L. Adleman A method for obtaining digital signatures and public key cryptosystems, Published in 2010.

C. D. Walter, Montgomery exponentiation needs no final subtractions, Electron. Lett., vol. 35, no. 21, pp. 18311832, Oct. 1999.

T. Dhinesh Kumar & S. K. Srivatsa An effectiveMontgomery algorithm using multiplier circuits, Published in 2011.

M. Navya, A. Sireesha, B. Ramanjaneyulu High performance Montgomery modular multipliers for RSA cryptosystem, Published in 2014.

S. Ashwini, P. ThirapaiahA high speed Montgomery modular multiplication algorithm to reduce the energy consumption based on RSA cryptosystems, Published in 2015.

A. Akilavathi, A. VijayaPrabu Design and implementation of energy efficient and high throughput vedic multiplier, Published in 2016.

Aleixandre F. Tenca, Member, IEEE, and Cetin K. KocÂ¸ Senior Member, IEEE A Scalable Architecture for Modular Multiplication Based on Montgomerys Algorithm, Published in 2003.

YuanYang Zhang, Zheng Li, Lei Yang, ShaoWu Zhang An efficient CSA architecture for Montgomery modular multiplication, Published in 2007.

Jun Han, Shuai Wang, Wei Huang, Zhiyi Yu Parallelization of Radix2 Montgomery Multiplication on Multicore Platform, Published in 2013.