Modelling of Elliptic Curve Scalar Multiplier on Lut-based FPGAS for Area and Speed

DOI : 10.17577/IJERTCONV2IS05065

Download Full-Text PDF Cite this Publication

Text Only Version

Modelling of Elliptic Curve Scalar Multiplier on Lut-based FPGAS for Area and Speed


Sakthi shree.E1 ,



Student,Dept. Of ECE

Student,Dept. Of ECE

Assistant professor,Dept.Of ECE

Sathyabama University,

Sathyabama University

Sathyabama University,




Keywords-ECSMA, Karatsuba multiplier, Montgomery ladder,Itoh-tsujii algorithm

  1. The rapid advances in information technology in the past few decades have led to intensive research on information security. Many technologies and cryptographical systems have been developed, all to secure information and protect it from unauthorized invaders. Public-key cryptography has been widely studied and used since 1975 when Rivest, Shamir, and Adleman invented RSA public key cryptography. This system heavily depends on integer factorization problem (IFP) using big key bits such as 1024 bits and 2048 bits. Later on Deffie-Hellman in developed the public key exchange algorithm using the discrete logarithm problem (DLP). ElGamal also used DLP in encryption and digital signature scheme. In 1985, Koblitz and Miller independently used EC in cryptography using elliptic curves discrete logarithm problem (ECDLP). In recent years, researchers have given more attention to develop the proposed ECC algorithms and improve their

    efficiency. Improving the efficiency of scalar multiplication in EC is one of the main interests of many researchers in the field of cryptology.

    The techniques proposed so far use different methods for representing the scalar k, which clearly shows different levels of computation speed and security. Binary representation is extended to signed binary representation, and its Non-Adjacent Form (NAF) algorithm . Other well-known techniques such as the window methods and the Montgomery method have brought about much improvement in terms of the efficiency of EC arithmetic. When doubling one point P to obtain 2P = R as a new point on E, extra field squaring over prime fields is required, but it is the same cost as in point adding if the curve has been defined over the binary fields. Adding two points P and Q on the same elliptic curve E, requires solving three equations, that involving one field inversion, one field squaring, and two field multiplications, which are costly operations in EC implementation. Some other operations involved in adding two points are addition, subtraction, and multiplication of small integers, which are in most cases neglectible operations. The proposed algorithms offers many different formulas for finding point multiplication from the given point P. One can use many doublings 2(…(2(2P))) with one or more extra point additions. The reverse operation also lead to find points such as P from (2P), which is called point halving.

    Elliptic curve cryptography is rapidly becoming the standard for public-key ciphers because of the large amount of security provided per key bit. Several accreditation bodies have migrated to ECC for their public-key cryptographic requirements. To match the speed requirements for real-time applications, hardware acceleration of ECC is a necessity. Field- programmable gate arrays Forman ideal platform for hardware implementations of security algorithms such as ECC. However, because FPGAs are resource- constrained, there are many challenges in developing designs to ensure better resource utilization, keeping the

    critical delay minimal. In this paper, we present the design of a high-speed ECC processor for binary fields on FPGA platforms. This paper proposes an efficient implementation of the field multiplier, which is the most important primitive in ECC. The elliptic curve scalar multiplier architecture presented uses a pipelined bit- parallel Karatsuba multiplier, with the objective of improving the utilization of the FPGAs lookup tables. For the inversion, another important field primitive, the ItohTsujii algorithm, is designed to use optimal exponentiation circuits specific to the LUT size of the underlying FPGA platform. These field primitives are combined to realize the elliptic curve scalar multiplier. In the next part, this paper explores opportunities for pipelining the design for high-speed.

  2. Basically Karatsuba stated that if we have to multiply two n-digit numbers x and y, this can be done with the following operations, assuming that B is the base of and m < n.

    First both numbers x and y can be represented as x1,x2 and y1,y2 with the following formula.

    x=x1 * B^m + x2 ; y= y1 * B^m + y2 —-(1)

    Obviously now xy will become as the following product.

    Xy=(x1 * B^m + x2)(y1 * B^m +y2) =>

    Lets see this formula by example.

    47 x 78

    x = 47

    x = 4 * 10 + 7

    x1 = 4

    x2 = 7

    y = 78

    y = 7 * 10 + 8

    y1 = 7

    y2 = 8

    a = x1 * y1 = 4 * 7 = 28 c = x2 * y2 = 7 * 8 = 56

    b = (x1 + x2)(y1 + y2) – a – c = 11 * 15 – 28 –


    Now the thing is that 11 * 15 its again a multiplication between 2-digit numbers,but fortunately we can apply the same rules two them. This makes the algorithm of Karatsuba a perfect example of the divide and conquer algorithm.

      1. fficiency analysis:

        Karatsuba's basic step works for any base B and any m, but the recursive algorithm is most efficient when m is equal to n/2, rounded up. In particular, if n is 2k, for some integer k, and the recursion stops only when n is 1, then the number of single-digit multiplications is 3k, which is nc where c = log23.

        a= x1 * y1 ;

        Since one can extend any inputs with zero digits until their length is a power of two, it follows that the number of elementary multiplications, for any n, is

        at most 3[log n] 3nlog 3.

        2 2

        b = x1 * y2 + x2 * y1 ;

        c = x2 * y2 —–(2)

        Finally xy will become:

        xy = a * B^2m + b * B^m + c

        However a, b and c can be computed at least with four multiplication, which isnt a big optimization. That is why Karatsuba came up with the brilliant idea to calculate b with the following formula:

        b = (x1 + x2)(y1 + y2) – a c —-


        That make use of only three multiplications to get xy.

        Since the additions, subtractions, and digit shifts (multiplications by powers of B) in Karatsuba's basic step take time proportional to n, their cost becomes negligible as n increases. More precisely, if t(n) denotes the total number of elementary operations that the algorithm performs when multiplying two n- digit numbers, then

        t(n)=3t([n2]) + cn +d —-(4)


        for some constants c and d. For this recurrence relation, the master theorem gives the asymptotic bound t(n) = (nlog 3).

        It follows that, for sufficiently large n, Karatsuba's algorithm will perform fewer shifts and single-digit additions than longhand multiplication, even though its basic step uses more additions and shifts than the straightforward formula. For small values of n, however, the extra shift and add operations may make it run slower than the longhand method. The point of positive return depends on the computer platform and

        context. As a rule of thumb, Karatsuba is usually fater when the multiplicands are longer than 320640 bits.

In polynomial basis representation of binary fields, addition is performed by bitwise EXCLUSIVE OR operation. For fixed irreducible polynomials, squaring circuits can be hardwired, thus making squaring operations easy. Operations such as multiplications and inversions are complex and should be implemented efficiently. This section describes the implementation of the field multiplication and inversion units in the ECSMA.

    1. ield multiplication

      Multiplication in binary fields involves a polynomial multiplication followed by a modular reduction. For high-speed implementations, bit-parallel multipliers are preferred to serial and word-parallel multipliers. Several algorithms for bit parallel field multiplication are available in the literature. The ECSMA in this paper uses a Karatsuba multiplier because of its sub quadratic complexity. It is shown in fig 3.1.

      1. The key to our Montgomery multiplier design is the recursive Karatsuba algorithm. This allows us to compute modular multiplication with a complexity approaching. Our design uses multiple-precision

        arithmetic techniques so that the critical path delay is independent of the multipliers bit-width. Unless stated otherwise, we assume we are multiplying two 2k-bit unsigned integers and the limb-widths of all components are w. The number of limbs in a 2k-bit word. We use either a coarse-grained carry-save technique or a pipelined multiple-precision technique in all of our adders and subtracters. The critical path of the circuit primarily depends upon the limb-width.

      In this section, we describe the construction of the ECSMA, which uses the left-to-right double and add algorithm with binary signed digit representation of the scalar Algorithm 1. The scalar s is the input to the processor and the output is the scalar product sP. Point arithmetic is done in the LD projective coordinate system. At every clock cycle, the register bank feeds the arithmetic unit through six buses. At the end of the clock cycle, the results of the computation are stored in the registers through buses C0, C1, C2, and Qout. Control signals are generated at every clock depending on the state of the machine and key digit. This section elaborates the working of the ECSMA shown in Fig 5.1

      The ECSMA discussed in the previous section suffers from low operating frequency because of the long combinational paths. The operating frequency can be increased by splitting the critical paths using the pipeline strategy. Data dependencies present in the point arithmetic steps may introduce bubbles in the pipeline, resulting in an increase in the clock cycle requirement. The number of bubbles during execution is likely to

      increase with the number of pipeline stages. Thus, with the increase in pipeline stages, though the delay reduces, the clock cycle requirement increases. Therefore designing a pipeline would require a balance between clock cycle requirement and critical delay. This section first identifies the critical paths in the ECSMA and then theoretically estimates the best pipeline strategy for the design.

        1. Itoh-Tsujii algorithm

          The Itoh-Tsujii inversion algorithm is used to invert elements in a finite field. It was introduced in 1988 and first used over GF(2m) using the normal basis representation of elements, however the algorithm is generic and can be used for other bases, such as the polynomial basis. It can also be used in any finite field, GF(pm).

          The algorithm is as follows:

          Input: A GF(pm)

          Output: A1

          1. r (pm 1)/(p 1)

          1. compute Ar 1 in GF(pm)

          2. compute Ar = Ar 1 · A

          3. compute (Ar)1 in GF(p)

          4. compute A1 = (Ar)1 · Ar 1

          5. return A1

        This algorithm is fast because steps 3 and 5 both involve operations in the subfield GF(p). Similarly, if a small value of p is used a lookup table can be used for inversion in step 4. The majority of time spent in this algorithm is in step 2, the first exponentiation. This is one reason why this algorithm is well-suited for the normal basis, since squaring and exponentiation are relatively easy in that basis.

  1. .Montgomery ladder

Points on an elliptic curve E, defined over a finite field GF (q), along with a special point called infinity, and a group operation known as point additio n, form a commutative finite group. If P is a point on the curve E, and k is a positive integer computing(eqn 5)

kP = P + P + P + · · · + P —-(5)

is called scalar multiplication. The result of scalar multiplication is another point Q on the curve E. It is normally expressed as Q = kP . If E is an elliptic curve

defined over GF (q), the number of points in E(GF (q)) is called the order of E over GF (q), denoted by

#E(GF (q)). For cryptographic applications #E(GF (q))

= rh where r is prime and h is a small integer and P and Q have order r. Scalars such as k are random integers where 1 < k < r 1. Since r q, the binary representation of k = ki2i has n bits where n m = log2 q. Scalar multiplication is the most dominant computation part of elliptic curve cryptography.

Algorithm A shows the Montgomery scalar multiplication scheme for non-supersingular elliptic curves over binary fields as it was introduced . In this algorithm Madd(X1,Z1,X2,X2), Mdouble(X1,Z1) and Mxy(X1,Z1,X2,X2) are functions for point addition, point doubling and conversion of projective coordinates to affine coordinates.

  1. Point multiplication technique

    Point Multiplication is the basic computation primitive of elliptic curve cryptography. The definition of corresponding operations depends on a particular field, but they always amount to combinations of arithmetic operation.

    kP = P + P + P + · · · + P —-(6)

    Where P is a point on an elliptic curve E and k is an integer in a range 1 k < order (P). Accordingly, the elliptic curve point multiplication means that the point P is added to itself k times. The order of the point P is n0 if and only if P multiplied with n0 results in the point at infinity.

  2. Elliptic curve point addition & doubling Point addition:

To add two distinct points P and Q on an elliptic curve.

Point doubling:

The point-doubling operation amounts to squaring operations of any binary number.

E..Finite field arithmetic

A field F (finite field) is equipped with two operations, addition and multiplication. Subtraction of field elements is defined in terms of addition.

  1. The simulation results of Karatsuba Multiplier is shown below. The result in figure 8.1 and figure 8.2 shows the area utilized in the existing system.

    Figure 8.3 shows the top level module implementation (RTL Schematic) of the Multiplier.

    Figure 8.6 shows the device utilization summary of proposed system. Figure 8.7 shows the detailed analysis of the proposed system.

    Fig 8.6 Device Utilization summary of proposed system

    The simulation results of Montgomery ladder is shown in the figure 8.8 and karatsuba multiplier is shown in the figure 8.9.

    Table 8.1 shows the comparison of existing and proposed system.



This proposed elliptic curve scalar multiplier used for high-speed and area-constrained applications. The implementation uses a novel pipelined bit-parallel Karatsuba multiplier that has sub quadratic complexity. A theoretical model was used to analyze delay of critical paths in the ECSMA and to obtain optimal pipelining. Efficient choice of scalar multiplication algorithm, optimized field primitives, balanced pipeline stages, and

enhanced scheduling of point arithmetic resulted in a high-speed architecture with a significantly small area.

In future te power analysis can be done by using tools like TSpice and it can be reduced.


  1. Wollinger.T, Guajardo.J, and Paar.C, Security on FPGAs: State-of the-art implementations and attacks, ACM Trans. Embedded Comput. Syst., vol. 3, no. 3, pp. 534574, 2004.

  2. Roy.S.S , Rebeiro.C., and Mukho padhyay.D, Theoretical modeling of the Itoh- Tsujii inversion algorithm for enhanced performance on k-LUT based FPGAs, in Proc. Design, Autom., Test Eur. Conf. Exhibit., Mar.2011, pp. 16.

  3. López.J and Dahab.R., Improved algorithms for elliptic curve arithmetician GF(2n), in Proc. Sel. Areas Cryptography, 1999, pp. 201212.

  4. Hankerson.D, Menezes.A.J, and Vanstone.S., Guide to Elliptic Curve Cryptography. Secaucus, NJ: Springer- Verlag, 2003.

  5. Joye.M. and Yen.S.M., Optimal left-to-right binary signed-digit recoding, IEEE Trans. Comput., vol. 49, no. 7, pp. 740748, Jul. 2000.

  6. López.J. and Dahab.R., Fast multiplication on elliptic curves over GF (2m) without pre computation, in Proc. 1st Int. Workshop Cryptographic Hardw. Embedded Syst., 1999, pp. 316327.

  7. Orlando.G. and Paar.C., A high performance reconfigurable elliptic curve processor for GF (2m), in Proc. 2nd Int. Workshop Cryptographic Hardw. Embedded Syst., 2000, pp. 4156.

  8. Song.L. and Parhi.K.K, Low-energy digit- serial/parallel finite field multipliers, J. VLSI Signal Process., vol. 19, no. 2, pp. 149166, Jul.1998.

  9. Gura.N., Shantz.S.C., Eberle.H,, Gupta.S., Gupta.V., Finchel stein.D.,Goupy.E., and Stebila.D., An end-to- end systems approach to elliptic curve cryptography, in Proc. 4th Int. Workshop Cryptographic Hardw. Embedded Syst., 2003, pp. 349365.

  10. Järvinen.K., Optimized FPGA-based elliptic curve cryptography processor for high speed applications, Integr., VLSI J., vol. 44, no. 4,pp. 270279, Sep. 2011.

  11. Solin.J.A as, Efficient arithmetic on Koblitz curves, Designs, Codes Cryptography, vol. 19, nos. 2 3, pp. 195249, 2000.

  12. Järvinen.K. and Skytta.J., On parallelization of high-speed processors for elliptic curve cryptography, IEEE Trans. Very Large Scale Integr.(VLSI) Syst., vol. 16, no. 9, pp. 11621175, Sep. 2008.

  13. Azarderakhsh.R. and Reyhani-Masoleh.J., Efficient FPGA implementations of point multiplication on binary Edwards and generalized Hessian curves using Gaussian normal basis, IEEE Trans. Very LargeScale Integr. (VLSI) Syst., vol. PP, no. 99, p. 1, Jun. 2011.

  14. Saqib.N.A., Rodríguez-Henríquez.F., and Diaz- Perez.A., A parallel architecture for fast computation of elliptic curve scalar multiplication over GF(2m), in Proc. 18th Int. Parallel Distrib. Process. Symp., Apr.2004, pp. 144151.

  15. Pu.Q. and Huang.J., A micro coded elliptic curve processor for GF(2m)using FPGA technology, in Proc. Int. Conf. Commun., Circuits Syst.,vol. 4. Jun. 2006, pp. 27712775.

Leave a Reply