RNS And Adders Based Radix-8 2n -1 Modulo Multiplier

DOI : 10.17577/IJERTV1IS10284

Download Full-Text PDF Cite this Publication

Text Only Version

RNS And Adders Based Radix-8 2n -1 Modulo Multiplier

Reshma Janardhan

M.Tech Student, LIET, Hyderabad, AP ,India


The modulo 2n-1 multiplier is usually the non critical data path among all modulo multipliers in such high-DR RNS multiplier. This timing slack can be exploited to reduce the system area and power consumption without compromising the system performance. Several commercial processors have selected the radix-8 multiplier architecture to increase their speed, thereby reducing the number of partial products. Radix-8 encoding reduces the digit number length in a signed digit representation. Its performance bottleneck is the generation of the term 3X, also referred to as hard multiple. The modified booth encoder will reduce the number of partial products generated by a factor of 2 using radix-4 but by using radix-8 the partial products reduces by a factor of n/3.

A special moduli set of forms {2n-1, 2n, 2n

+1} are preferred over the generic moduli due to the ease of hardware implementation of modulo arithmetic functions as well as system-level inter- modulo operations, such as RNS-to-binary conversion and sign detections. To facilitate design of high-speed full-adder based modulo arithmetic units, it is worthwhile to keep the moduli of a high- DR RNS in forms of {2n-1, 2n, 2n +1}.The modulo 2n-1 multiplier is usually the non critical data path among all modulo multipliers in such high-DR RNS multiplier. With this precept, a family of radix-8 Booth encoded modulo 2n-1 multipliers, with delay adaptable to the RNS multiplier delay, is proposed. The modulo 2n-1 multiplier delay is made scalable by controlling the word-length of the ripple carry adder, employed for radix-8 hard multiple generation.

Keywords- Residue number system(RNS), RSA, Radix 8 booth encoding, moduli set, prefix structure.

  1. Introduction

    Two Algorithms that are mostly established in public Key cryptography are RSA (Rivest and Shamir) and ECC (Elliptical curve cryptography)[1]- [3]. For these algorithms Encryption and decryptions are performed by modulo multiplications. During encryption and decryption process of RSA ECC key sizes ranges from 512 1024 bits and 106 512 bits[3]- [6]. Due to this it becomes difficult to implement hardware because of Long carry propagation of large inter multiplies. To make it easy for hardware implementation Residue Number system has come into existence to design low power and faster multipliers.

    RNS (Residue Number System) is based on generic moduli. But we are going to prepare special moduli form of 2n-1 over generic moduli for the hardware implementation. There is triple moduli set

    {2n-1, 2n, 2n+1} And the speed of RNS processor of 2n-1 is increasingly dominated by the residue arithmetic operation rather than one time forward and one time back ward or reverse conversion .[7]

    1. Modulo 2n-1 Multiplier Architecture

      Modulo multiplier are used effectively for the area and time purpose. The design of 2n-1 modulo multiplier consists of three as shown in figure 1, steps one is modulo particial product generation then second slip for addition of those partial products and the third step for the addition of carry and propagate bits of the adder. The fig. 1 represents the modulo multiplier architecture. The modulo multiplication is extensively used in Residue number system (RNS) based digital signal processing and cryptography units.

      Figure 1: Modulo (2n-1) Multiplier Architecture.

  2. Review

    Radix-8 is the advanced technique than that of Radix4. In Radix 4 the multiplier the partial product generation the multiplicand is taken and is divided into digits denoted by Di where Di is one of


    Radix 4 are used for generation of soft multiples shown in table 1, where as Radix 8 is used for generation of both hard and soft multiples as shown in table 4.

    Table 1:Modulo Reduced Multiples For The Radix 4 Booth Encoding

    Table 2: Radix 4 Recoding.


    Signed values

    Xi+2 Xi+1 Xi

    0 0 0


    0 0 1


    0 1 0


    0 1 1


    1 0 0


    1 0 1


    1 1 0


    1 1 1


    In Radix 4 booth algorithm it starts by appending a zero to the right of X0 and three bits are take once as an input and they are denoted by Di.

    Example: 01000111-multiplicand 01001011-multiplier

    The number 1X represent the same number, 0X represents all digits are zero and 2x represents circular left shift by one position. The negative symbol for ix i.e 1x means 2's complement & -2x means 2's complement and circular left shift by one position.


    |di.X| 2n










    CLS( X ,1)






    Whereas for Radix 8 for the particle product generation the 8 bit multiplicand is considered and a zero bit is added at MSB LSB of it.

    Unlike the Radix 4 in Radix 8 we are grouping 4 bits each as one group and hence we obtain three partial products. Thus reduci8ng one of the partial product when compared with Radix4.

    Table 3: Modulo Reduced Multiples For Radix 8 Booth Encoding.

    Tables 4: Radix 8 Recoding

  3. Proposed Radix-8 Multiplier

    1. Generation Of Hard Multiple

      In RNS multiplier the carry propagation length in the hard multiple generation should not exceed 'n' bits so the carry propagation through the half adders as shown in figure 2 can be eliminated by making end around carry bit c7 a partially product to be accumulated in the CSA (carry save accumulator) tree in figure 3. so this technique reduces the carry propagation length to n bits as shown in figure below.

      Figure 2: Generation Of Hard Multiple Architecture

      2 -1

      |+3X| n

      Figure 3: Modified Architecture Of Hard Multiplier Using K Bit RCA's

      The carry out of RCA0 i.e.



      3 is not

      In Radix 8 we are having hard multiple generation i.e 3x. and this 3x is obtained by 2x +x

      propagated to the input of RCA 1 but is taken as one of the partial product bit to be accumulated in CSA

      i.e. circular left shift of the multiplicand added with

      the multiplier. Here, the Di ranges between {-4, -3, –

      tree. The output of the RCA1 i.e.



      3 is been

      2, -1, 0, 1, 2, 3, 4}

      The radix8 booth encoding reduces the partial products to (n/3)+1 from that of radix 4.

      After obtaining the partial products from the multiplier and the multiplicand they are given it to adder. the result of 2n-1 multiplier is the reduced output i.e we obtain |x.y|2n-1. And in this project the maximum range of the output is of only 255 because its designed for only 8 bit number.

      reduced before being accumulated. Therefore the partially redundant for of |+3X|2n-1 is given by


      And the partially redundant form of |-3X|2n-1 is given by


      To avoid many long string of ones in C an appropriate bias B is added to hard multiple so that C

      and C are sparse.

      M 1

      B 2

      K . j



      j 0 k k

      The block diagram for |B+3X|2n-1 is as shown in figure 4 and its generated by simple XNOR, OR operations on the bit positions 2kj.

      2 -1

      Figure 4: Generation Of Partial Redundant Of |B+3X| n

    2. Generation Of Soft Multiples

      Since the hard multiple can't be predicted at design time, all the multiples must be similarly represented. So the bias is added to the soft multiples also. It is represented as |B+0|2n-1, |B+X|2n-1,

      |B+2X|2n-1, |B+4X|2n-1 in the partial redundant form for 8-bit. the figure 5 represents the simple multiples.

      Figure 5: Generation Of Partially Redundant Of Simple Multiples

    3. Generation Of Partial Product

      To generate partial product it requires booth selector and booth encoder. to generate each partial product it requires one booth encoder and ten booth selector. in Radix 8 we can generate three partial products pp0 , pp1, pp2. the booth encoder produces signed one hot encoded digit from adjacent overlapping multiplier bits. The signed one hot encoded digit is then used to select the correct multiple to generate ppi. The block diagram of booth encoder and booth selector is as shown in figure 6.

      Figure 6.a: Booth Encoder

      Figure 6.b: Booth Selector

      In booth encoder the selected inputs i.e Di is given and we obtain any one as output from X,2X,3X,4X with respect to its sign value and the outputs are then again given as input to booth selector.where as to booth selector we give inputs as output of booth encoder and also the Di bits and we obtain the output as the partial product ppij.In partial product generation we obtain PPij and qij values. Each PPij consists of an n-bit vector, ppi7…….ppi1pp10 and a vector of n/k = 2 and redundancy carry bits qi1qi0 are the carryout bitsas in figure 7 and 8.After we obtain the partial product and redundancy carry bits we add compensation constant for the bias introduced in partial redundant representation

      Figure 7: Reduced Partial Product Generation

      Figure 8: Modulo Reduced Partial Product And CC For

      2 -1

      |X.Y| n

    4. Addition Of Partial Products

      2 -1

      The obtained partial product are accumulated in the carry save adder and also the partial carry redundant bit along with bias is given to the parallel prefix adder as in figure 9. Thus the output obtained is |X.Y| n

      Figure 9: modulo reduced partial product accumulation

    5. Modification Of Adder

      Now instead of using parallel prefix adder after accumulation we are using carry look ahead adder. So let us consider the case where the underlying adder is a one-level carry -look ahead adder The logic equation for such an adder are


      Pi= A i Bi + Ai B i =Ai Bi Ci =Gi+PiCi-1

      Si =Pi Ci-1

      Ci Gi+PiGi-1+PiPi-1Gi-2+….PiPi-1….P1G0Gn-1PiPi-1

      …P0 +Pn-1Gn-2PiPi-1……P0+Gi+1PiPi-1….. P0Pn-1…. Pi+2

      when the output is obtained by parallel prefix adder the number of the gate count is less than that of CLA. where as instead of parallel prefix adder if we give input to the carry look ahead adder the no of gate count is less when compared to previous. The implies that the area is reduced in CLA adder than that of parallel prefix adder.

  4. Result

    Figure 10 : Multiplier Simulation Results

    Figure 11:Radix-8 Synthesis Report (Gate Count) using parallel prefix adder

    Figure 12 :Radix-8 Synthesis Report (Gate Count) using carry look ahead adder

  5. Conclusion

    A new approach for modulo multiplication of 2n-1 has been approached and the partial products, hard and soft multiples are accomplished by the AND XNOR and OR gates. Thus a family of low area and low power modulo 2n-1 multiplies was proposed The decay of the proposed multiplies is controlled by word length of the small parallel RAS which is used to compute hard multiples From synthesis result its Clearly shown that the area is used by CLA is less when compare to parallel prefix And also its useful for dynamic range for RNS multiplication.

  6. Reference

  1. R. Rivest, A. Shamir, and L.Adle-man, A method for obtaining digital signatures and public key cryptosy-stems Commun. ACM, vol. 21, no.2, pp. 120126, Feb. 1978.

  2. V. Miller, Use of elliptic curves in cryptography in Proc. Advancesin Cryptology-CRYPTO85, Lecture Notes in Computer Science, 1986,vol. 218, pp. 417426.

  3. N. Koblitz, Elliptic curve cryptosys-tems, Mathemat. of Comput., vol.48, no. 177, pp. 203209, Jan. 1987.

  4. National Institute of Standards and Technology. Available: http:// csrc.nist.gov/publications/PubsSPs


  5. A. K. Lenstra and E. R. Verheul, Selecting cryptographic key sizes,J. Cryptol., vol. 14, no. 4, pp. 255293, Aug. 2001.

  6. C. McIvor, M. McLoone, and J. V. McCanny, Modified Montgomery modular multiplication and RSA

    exponentiation techniques, IEE Proc. Comput. and Dig. Techniq., vol. 151, no. 6, pp. 402408, Nov.2004.

  7. D. J. Soudris, V. Paliouras, T. Stouraitis, and C. E. Goutis, A VLSI design methodology for RNS full adder- based inner product architectures, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol.44, no. 4, pp. 315318, Apr. 1997.

Leave a Reply