Hardware Acceleration of Elliptic Curve Cryptographic (ECC) Scalar Multiplication Unit over Binary Polynomial based Galois Field GF (2m) using Verilog HDL

Mohammad Abdul Momen; Md.    Ishtiaque Mahmood; Iqbalur Rahman Rokon

doi:10.17577/IJERTV4IS050986

Volume 04, Issue 05 (May 2015)

Hardware Acceleration of Elliptic Curve Cryptographic (ECC) Scalar Multiplication Unit over Binary Polynomial based Galois Field GF (2m) using Verilog HDL

DOI : 10.17577/IJERTV4IS050986

Download Full-Text PDF Cite this Publication

Open Access
[post-views]
Total Downloads : 288
Authors : Mohammad Abdul Momen, Md. Ishtiaque Mahmood, Iqbalur Rahman Rokon
Paper ID : IJERTV4IS050986
Volume & Issue : Volume 04, Issue 05 (May 2015)
Published (First Online): 25-05-2015
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

Hardware Acceleration of Elliptic Curve Cryptographic (ECC) Scalar Multiplication Unit over Binary Polynomial based Galois Field GF (2m) using Verilog HDL

Iqbalur Rahman Rokon ,Mohammad Abdul Momen, Md. Ishtiaque Mahmood

Department of Electrical and Computer Engineering North South University

Dhaka, Bangladesh

Abstract- This paper describes algorithms and implementation of those algorithms that will hardware accelerate scalar multiplication unit of ECC over binary polynomial based Galois fields in the particular case of the K-163 NIST- recommended curve. The hardware/circuit design has been done in Verilog and synthesized and simulated in Altera Quantus-II and Modelsim, respectively. In finite field operations, GF Division is used instead of GF Inversion which makes the division operation in finite field more independent and faster. Furthermore, instead of double-and-add algorithm, Frobenius Map algorithm is used which makes the hardware faster.

Keywords- Elliptic Curve Cryptography, Frobenius Map, Hardware Acceleration, Galois Field.

INTRODUCTION

requires consumption of both smaller area and less computational time, being m the degree of the irreducible polynomial (m = 163). After implementing the design of EC scalar multiplication, computational time was reduced from 46.6 Âµs [1] to 14.6 Âµs.
ELLIPTIC CURVE

Suppose K is finite field and elliptic curve E over K is defined by a non-singular Weierstrass equation [2, 3] y2+ a1xy +a3y = x3 + a2x2 + a4x + a6 where a1, a2, a3, a4, a6 belong toK. GivenL of K is an extension field,the following relation defines the corresponding

elliptic curve E(L):

E(L) = {(x,y) L x L:

2 3

y + a1xy+ a3y = x + a2x2 + a4x + a6} {}

In several cryptographic algorithms, signature schemes, public-key encryption or symmetric key generation Elliptic curve (EC) scalar multiplication is basic operation. Traditionally, scalar multiplication is implemented in software using generalpurpose processors or on digital signal processors. In some cases software time constraints cannot met with instruction-set processors and as a result specific hardware or circuit must be designed for executing very complex operations which will take much less time than software.

Now-a-days for developing specific circuits Field Programmable Gate Arrays (FPGA) is used instead of Application Specific Integrated Circuits (ASIC) because of reprogrammable option, small production quantities and much lower engineering cost than ASICs.

This paper describes algorithms and implementation of those algorithms that will hardware accelerate scalar multiplication unit of ECC over binary polynomial based Galois fields in the particular case of one of the NIST- recommended Koblitzcurves, namely K-163.. The circuit design has been done in Verilog and synthesized and

is point at infinity, which is an additional point.Given an elliptic curve E modulo p, the number of points of L on the curve is denoted by E (L) [4] and is bounded by:

p + 1 2 p E (L) p + 1 + 2 p (1)

Number of points is approximately equal to thenumber of field elements:

#E(L) p(2)

Equation of an elliptic curve E(L) over K is

y2 +xy=x3 +x2 +1 (3)

GF (2163) is the extension field L and Reduction Polynomial representation of GF (2163) is used.

F(x) = x163 + x7 + x6 + x3 + 1(4)

E(L) can be defined as addition operation. Here neutral element is point at infinity and the point addition is defined as follows:

Let elements of E(L) be P (x1, yl) and Q(x2, y2); then

P+ = +P =P, (x y)+(x, x+y) = ; if P Q and P -Q, then P+Q =(x3, y3) where

simulated in Altera Quantus-II and Modelsim, respectively.

x = 2++x +x +a

1 + 2

In the design, efficient bit-series algorithms are compared 3 1 2

= (5)

1 + 2

and implemented for Galois Field Operations and EC Scalar Multiplication Operation considering among speed, cost and area constrains, so that the proposed circuit

y3 = (x1+ x3)+x3+y1

if P = Q and P -P, then P + P = (x3, y3) where x3= 2+ +x1+x2+a = 1 + 2(6)

1 + 2

y3 = (x1+ x3)+x3+y1

For the basic operation of ECC We consider a primitive elementP and another element T. kPis the scalar product of a natural number k by a curve point P can be defined as

T = kP = P+ P+ Â· Â· Â· + P (K times)

ECC Scalar Multiplicationis based on Binary Polynomial Based Galois Field arithmetic. Figure 1 depicts this hierarchical structure of arithmetic operations usedfor elliptic curve cryptography over finite fields.

Scalar Multiplication

Point Doubling (if P= Q) Q = P + P

Point Addition (if PQ) Q = P + Q

Elliptic Curve Cryptographic Operation

= 0 + 1 + + 1 1 = 0 + 12 + +

1 (9)

Using the fact that = + 1 1 + + 1 +

0 we have = 0 + 1 + + 1 1, where are the coefficient of the irreducible polynomial. Substituting this expression in equation (8) we obtain

= 0 + 1 + + 1 1(10)

Where,

0 = 10

= 1 + 1 = 1,2, , 1 (11)

Assume that the function, functionProduct_alpha_A(a,f: poly_vector) return poly_vector

implementing Eq. (9) according to Eq. (10) & Eq. (11) and therefore the polynomial () has been

Galois Field Operation

Addition

`

Multiplication

Square

Inversion

Figure 1: Basic Hierarchy of Elliptic Curve.

defined, where _ is a bit vector from 0 to

1.

Assume also that the functions

function m2abv(x: bit; y: poly_vector) return poly_vector

function m2xvv(x, y: poly_vector)
GALOIS FIELD ARITHMETIC

EC over field2 includes arithmetic of integer with length m bits. The binary string can be declared as polynomial: Binary String: (1 10)

Polynomial: 1 1 + 2 2 + + 22 +

1 + 0where = 0
1. Addition over GF(2m)
  
  Addition of field elements is performed bitwise, and the sum of () and () given as
  
  = + = 1( + )(7)
  
  return poly_vector
  
  In a least-significant-bit (LSB) multiplier, the coefficients of b(x) are processed starting from the least-significant bit
  
  0and continue with the remaining coefficients one at a time in ascending order. Thus multiplication according to this scheme is performed in the following way:
  
  =
  
  = 0 + 1 + 2 2 +
  
  + 1 1 ()
  
  = 0 + 1( ) + 2( 2) +
  
  =0
  
  + 1( 1) ()
  
  The bit additions in + is performed by 2 and translate to an () operation.
2. Multiplication over GF(2m)
  
  Multiplication of two elements , in (2 ) can be expressed as
  
  =
  
  1
  
  =
  
  =0
  
  1
  
  = ()
  
  =0
  
  Therefore, the product can be computed as
  
  = 0 + 1 + 2 2 + + 2 +
  
  + 1 1 () (8)
  
  In order to compute above equation a quantity of the form
  
  where = 1 1 + + 1 + 0 , with
  
  (2) has to be reduced modulo (). The product
  
  = 0 + 1( ) + 2( ) + +
  
  1( 2 ) (12)
  
  Algorithm 1: LSB-first multiplier
  
  for i in 0 . . m-1 loop c(i) := 0; end loop; for i in 0 .. m-1 loop
  
  c := m2xvv(m2abv (b(i) , a,), c) a := Product_alpha_A(a,f )
  
  <>end loop;
3. Squaring over GF(2m)
2 computation is done br a specific synthesized circuit [8]. It can be shown that2 = + + ,

= 162 162 + + 1 + 0

= +163 2 ,

= 2 (13)

= 162 162 + + 1 + 0

= 0 < 7,

7 = 82 ,

= () can be computed as follows:

= +156 2 & 8,

= +157 2 & 8,(14)

= 162 162 + + 1 + 0

0 = 160

1 = 160 + 162

2 = 161

3 = 160 + 161

4 = 82 + 160

5 = 161 + 162

6 = 83 + 160 + 161

7 = 0

8 = 84 + 160 + 161

9 = 0

10 = 85 + 161 + 162

11 = 0

12 = 86 + 162

= 0 > 12 &,

= +160 2 > 12 (15)

All outputs 2 , but 2 6 , 2 8 and 210 , are Boolean

d := d + 1;

else

if s(m) = 1 then

s := m2xvvm(s,r);

v := m2xvvm(v,u); end if;

s := rshiftm(s); if d = 0 then

auxm := s; s := r; r := auxm; auxm := v; v := u;

u := rshiftm(auxm); d := 1;

else u := lshiftm(u); d := d – 1; end if; end if;

end loop;

E. Division over 2

Given three polynomials

functions of less than five variables, while 2 6 , 2 8 and

=

1 +

2 + + +

210 are five-variable Boolean functions. Thus, the

1 2 1 0

= 1 + 2

computation time of 2 is approximately equal to the

1

= 1

1

2

+ 2

+ + 1 + 0

2 + + 1 + 0

computation time of a five-variable Boolean function.

D. Inversion over GF(2m)

Extended Euclidean algorithm for polynomials

The greatest common divisor (GCD) of and (and are binary polynomials and they are not zero), are denoted by = (, ). is the largest common divisor. In the classical Euclidean algorithm, ()

()computes the of binary polynomials. is divided by to obtain a quotient . A remainder satisfying = + and () < ().

In such state, the problem in determining (, )reduces the computation of (, ), where (, )have lower degrees than (, ). The process should be repeated until one of the arguments reaches to zero. Then the result is immediately obtained since (0, ) = . The algorithm is reached and ended since the degrees of the remainders decrease. The Euclidean algorithm can be extended to find binary polynomials and satisfying + =

where = ( , ).

1 + 1 = 1 (16)

2 + 2 = 2 (17)

The algorithm ends when u value reaches zero, in the case of 2 = gcd(, )& 2 + 2 = . The next algorithm is used to compute (, ) (Hankerson, et al. 2004) (Liu 2007).

Algorithm 2: Inversion in F2m using the extended Euclidean algorithm

for i in 0 .. m loop

s(i) := f(i); r(i) := a(i); v(i) := 0; u(i) := 0; auxm(i) := 0;

end loop;

u(0) := 1; d := 0;

for i in 1 .. 2*m loop if r(m) = 0 then

r := rshiftm(r); u := rshiftm(u);

The quotient = 1 can be computed with an algorithm based on the following properties( =

): given two polynomials

where is not divisible by, that is 0 = 1, then

, 0 = 0, (, ) =

(/, ) (18)

, 0 =

1, (, ) = (( + )/, ) = (( +

)/, )(19)

The following formal algorithm, in which the function divides by (, ) computes, 1 , generates the quotient = 1

Algorithm 3: Division of polynomials modulof (binary algorithm)

a := f; b := h; c := zero; d := g; alpha := m; beta := m-1; while beta >= 0 loop

if b(0) = 0 then

b := shift_one(b); d := divide_by_x(d, f); beta :=beta – 1; else

old_b := b; old_d := d; old_beta := beta; b := shift_one(add(a, b));

d := divide_by_x(add(c, d),f); if alpha > beta then

a := old_b; c := old_d; beta := alpha – 1; alpha :=old_beta; else beta := beta – 1;

end if; end if; end loop;

if b(0) = 0 then z := d; else z := c; end if;

The sum of two binary polynomials amounts to the component-by-component , and the division by to a one-bit left shift. The more complex primitive is the division by It is computed as follows:

1 = 0 1 + 1 + 01 2 +

2 + 02 3 + + 1 + 01(20)

The circuit corresponding to the computation of 1 or

1 , according to the value of0 , is the more time consuming operation). The number of steps of algorithm 3 is smaller than2. Thus, the total computation time of = 1 is smaller than 2 times the computation time of the circuit, that is 2 times the computation time of a five-variable Boolean function if is assumed to be constant.

In this paper instead of using GF Inversion, we used GF Division for division operation. By using GF Inversion(Figure 2)for a division operation first the denominator (y) had to inverted (y-1) and then it had to multiply (x *{y-1}) to get division output which makes the system slow and dependable. By using GF Division(Figure 3), now division operation first of all independent and secondly requires less computation time.

X

B. Frobenius Map

The chosen curve is a Koblitz curve for which an interesting property can be used. Define the Frobenius map
[11] X from E(L) to E(L):

() = () ,(x, y) = (x2, y2) (22)

It can be demonstrated thatP +P = 2(P) + (P) with = 1 if c = 1 and = 1 if c = 0

More generally, it is possible to express kP under the form:

kP= rt 1t 1(P) + rt 2t 2(P) + . . . + r1(P) + r0Pwith ki

{ 1, 0, 1} (23)

to which correspond the formal algorithms 2 in which frobenius is a function computing relations (8), which in turn includes three computation primitives: adding (whenki

= 1), subtracting (when ki = -1) and . The difference with the basic algorithm is that point doubling has been substituted by the Frobenius map computation, that is

Input Y

Y-1 X * Y-1

Output

squaring, an easy operation over a binary field. Obviously it remains to express kP under the form (4). Given two integers a and b, define an application cc = a + bt from

Figure 2: Division Using GF Inversion

X / Y

Input Output

Figure 3: GF Division
POINT MULTIPLICATION

The parameter sets of the K-163 binary koblitz curves standardized by NIST [10] for ECC is below (hexadecimal)

p(t) = 800000000000000000000000000000000000

000C9, a = 1, G_x =2fe13c0537bbc11acaa07d793d e4e6d5e5c94eee8, G_y = 289070fb05d38ff58321f 2e800536d538ccdaa3d9, r = 584600654932361167

2814741753598448348329118574063

where p(t) is the reduction polynomial, a is the curve coefficient, G_x and G_y is the x and y coordinates of the base generator point G, r is the base point's order.

A. Double-and-Add (basic) Algorithm

Let the binary representation of k be (km-1, km-2,,k0) that is k

= km-1*2m-1 + km-2*2m-2 + … + k0*20.Then according to the following schemeKP can be computed (right to left)

kp = k0p + k1(2p) + k2(22p) + … + km-1(2m-1p) (21)

(21) can be implemented with algorithm 1 which consists of m iteration steps, each of them including atmost two function calls point adding (Q = Q+P) and/or point doubling(P = P+P).

Algorithm 4: Scalar multiplication (Q = kP)

Q:= point at infinity; for i in .. m-1 loop

P := P + P;

if k(i) 1 then Q := Q + P; end if; end loop;

E(L) to E(L): (P) = aP + b (P). Then look for two integers a' and b' such that

(P) = ' ((P)) + rP

where ' = a' + b' and r {-1, 0, 1}(24)

To summarize:
1. If a is odd and a1b0 = 0, then r = 1, b' = (a 1)/2, and a' = b b' = b + (a 1)/2.
2. If a is odd and a1b0 = 1, then r = 1, b' = (a + 1)/2 and a'= b b' = b + (a + 1)/2.
Equation defines a kind of integer division of by , that is = ' + r with r {1, 0, 1}

By repeatedly using the previous relation, an expression of can be computed:

= 1 + r0 1 = 2 + r1

. . .

t 1 = t + rt 1(25)

withri { 1, 0, 1}. Thus (multiply the second equation by , the third one by 2, and so on, and sum up the t equations)

= r0 + r1 + . . . + rt 1t 1 + t t(26)

Algorithm 5: Point multiplication (Q = kP), Koblitz curve Q :=point_at_infinity;

for i in 0 .. t-1 loop

if r(i) = 1 then Q := Q + P; elsif r(i) = -1 then Q := Q-P;

end if;

P := frobenius(P); end loop;

Assume that algorithm 5 is used. The coefficients ki can be computed in parallel with the other operations of algorithm
1. As the value of t is not known in advance, the computation is performed as long as aj = aj + bjt 0, that is aj 0 and b j 0. Initially a0 = k, that is a0 = k and b0 = 0:
  
  Algorithm 6:Point multiplication (Q = kP), Koblitz curve,
  
  -ary representation
  
  Q := point_at_infinity; a := k; b := 0; if a /= O then
  
  loop
  
  if a(0) = 0 then r_i := 0;
  
  elsif (a(1) + b(0)) mod 2 = 0 then r_i := 1; Q := Q + P;
  
  else r_i := -1; Q := Q – P; end if; old_a := a; a := b + (old_a -r i)/2; b (r i – old a)/2;
  
  if a = 0 and b = 0 then exit; end loop;
  
  Start
  
  Square GF (2m)
  
  square_x = (xxP)2
  
  xxP = Px yyP = Py a = k
  
  b = 0 (zero) Q_infinity = 1
  
  Input: P (Px, Py), KPrivate Key
  
  Square GF (2m)
  
  square_y = (yyP)2
  
  a[0] = 0 No a[1] = b[0] No
  
  Calculate updated_a and updated_b
  
  Yes Yes
  
  Calculate updated_a and updated_b
  
  Calculate updated_a and updated_b
  
  end if;
  
  To summarize, doubling has been substituted by squaring, an operation executable in one clock cycle. Furthermore, among two successive coefficients ki, at least one is equal to 0, so that the number of non-zero coefficients Ki is
  
  Q_infinity =1
  
  Yes
  
  No P,Q
  
  Qx = xR Qy = yR
  
  Qx = xxP Qy = yyP
  
  Q_infinity = 0
  
  Point Addition
  
  R
  
  Q_infinity =1
  
  No
  
  Point Addition
  
  P,Q
  
  Qx = xR Qy = yR
  
  R
  
  Yes
  
  yyP
  
  (xxP yyP)
  
  Qx = xxP
  
  smaller than m. Thus, the computation of KP includes at most m operations (adding or subtracting), so that the total computation time should be roughly half the computation time of the basic algorithm.
  
  The algorithm 7, deduced from algorithms 5 and 6,
  
  a = 0 &
  
  No b = 0
  
  Yes
  
  Qy = (xxP yyP) Q_infinity = 0
  
  xxP = square_x yyP = square_y a = updated_a b = updated_b
  
  computes Q = kP, with k < n and P of order n. At the end of step number i of algorithm 5, Q = k0P + kl (P) + k22
  
  (P) +…+ ki-1 i-1(P). If can be shown that, unless all coefficients k0, k1, … , ki-1, are equal to 0, Q . As before, instead of defining a specific representation for 0o, Boolean flags Q infinity and R infinity are used. Figure 4 reflects full operation of algorithm 7.
  
  Algorithm 7: Point multiplication (k < 2m), Frobenius map
  
  Q_infinity true; xxP =xP; yyP = yP; a = k; b= zero;
  
  while ((a /= zero) or (b /= zero)) loop if a(0) = then r_i = 0;
  
  elsif a(l) = b(0) then r_i = 1; if Q infinity then
  
  (xQ,yQ) =(xxP,yyP); Q_infinity = false;
  
  else (xQ,yQ) = adding ((xxP , yyP) , (xQ , yQ)); end if;
  
  else r_i = -1;
  
  if Q_infinity then xQ = xxP
  
  yQ = xxP + yyP; Q_infinity = false;
  
  else (xQ,yQ) = adding ((xxP , xxP + yyP) , (xQ , yQ));
  
  end if; end if;
  
  xxP = square(xxP); yyP = square(yyP); old a = a; a = b + (old_a – r_i)/2; b = (r_i – old_a)/2;
  
  if a = 0and b = 0 then exit end loop;
  
  Output: APublic Key = KPrivate Key * Q (Qx, Qy) Stop
  
  Figure 4: Flowchart Point multiplication (k < 2m), Frobenius map
  
  Scalar Multiplication (Frobenius Method)
  
  Point Addition
  
  Elliptic Curve Cryptographic Operation
  
  Galois Field Operation
  
  Addition
  
  Multiplication
  
  Square
  
  Division
  
  Figure 5: Design Hierarchy of Point multiplication (Frobeniusmap)
  
  After applying Frobenius map as a point multiplication algorithm and above stated Galois Field algorithms the design hierarchy changes drastically (Figure 5). Only for Frobenius map speed has boosted about 50% just because of we are using GF Square instead of point doubling. Figure 5 Design Hierarchy of Point multiplication (Frobenius map).

HARDWARE DESIGN OF THE SYSTEM

Top View

Figure 6 shows the block diagram of Scalar Multiplication

and Table I is the pin description Scalar

Multiplicationblock.

start reset

start (POINT_ADDITION)

clk

start

(GF Addition)

done

(GF Addition)

Flag: Squaring output non zero

xP [162:0]

yP [162:0]

k [162:0]

Scalar Multiplication

xQ [162:0]

yQ [162:0]

reset

rx [162:0]

ry [162:0]

done (POINT_ADDITION)

POINT_ADDITION

Control Unit

start (GF Multiplication) done (GF Multiplication)

clk

done

start

(GF Division)

done (GF Division)

Figure 6: Block Diagram of Scalar Multiplication Table I

Pin Name	Input/Output	Description
xP [162:0]	Input	Curve Generator Point Co- ordinate x
yP [162:0]		Curve Generator Point Co- ordinate y
K [162:0]		Private Key
reset		Reset Flag
start		Start Flag
clk		System Clock
xQ [162:0]	Output	Public Key/Scalar Multiplication Co-ordinate x
yQ [162:0]		Public Key/Scalar Multiplication Co-ordinate y
done		Done Flag

Pin Description of Top Block

Figure 9: Point Addition Control Unit

EXPERIMENTAL RESULTS AND DISCUSSION

Synthesis Result

Altera Quantus-II was used to Analyze and synthesize the design. Synthesis report is shown in Table II, III and IV. Figure 10 shows the RTL view of Altera Quantus-II synthesis tool.

clk

RESET

xP [162:0]

yP [162:0]

k [162:0]

start xQ [162:0]

xQ [162:0]

done

Scalar Multiplication Control

logic Unit

Square GF(2m)

Addition GF(2m)

Square GF(2m)

Addition GF(2m)

Division GF(2m)

Addition GF(2m)

Multiplication GF(2m)

Figure 10: Register Transfer Levelviewof Scalar Multiplication from Altera Quantus-II

Point Addition Control Unit

Flow Status
Quartus II 64-Bit Version	14.1.0 Build 186 12/03/2014 SJ Web Edition
Revision Name	scalar_multiplication
Top-level Entity Name	scalar_multiplication
Family	Cyclone V
Device	5CSEMA5F31C6
Timing Models	Final
Logic utilization (in ALMs)	3,982
Total registers		6576
Total pins	819

Table II Quantus II Flow Summary

Addition GF(2m)

Point Addition Scalar Multiplication

7: Internal Block diagram of Scalar Multiplication

Figure

Figure 7 displays the entire hardware internal block diagram of Scalar Multiplication Unit. In this diagram, how the input/output pins are connected to Scalar Multiplication Control Logic Unit (SMCLU) (Figure 8)and the SMCLU is connected to the Point Addition and GF Square unit is shows. Furthermore, displays how the input/output connections of Point Addition and GF Arithmetic Operations are controller by Point Addition Control Unit(Figure 9).

Table III Resource Usage summary

xP [162:0]

yP [162:0]

k [162:0] start (SCALAR_MULTIPLICATIPON)

reset clk

xQ [162:0]

xQ [162:0] done (SCALAR_MULTIPLICATIPON)

SCALAR_MULTIPLICATION

Control Logic Unit

px [162:0]

Resource	Usage
Estimate of Logic utilization (ALMs needed)	3585
Combinational ALUT usage for logic	4932
— 7 input functions	1
— 6 input functions	181
— 5 input functions	846
— 4 input functions	1492
— <=3 input functions	2412
Dedicated logic registers	6576
I/O pins	819

py [162:0]

qx [162:0]

qy [162:0]

start (POINT_ADDITION)

rx [162:0]

ry [162:0]

done (POINT_ADDITION)

Figure 8: Scalar Multiplication Control Logic Unit

Table IV

General Register Statistics

CONCLUSION

Resource	Usage
Maximum fan-out node	clk~in put
Maximum fan-out	6576
Total fan-out	40417
Average fan-out	3.07

Statistic	Value
Total registers	6576
Number of registers using Synchronous Clear	2612
Number of registers using Synchronous Load	1469
Number of registers using Clock Enable	3770

This paper represents an implementation of hardware accelerated Elliptic Curve co-processor components, Scalar Multiplication and Point addition. By only using Frobenius Mapfor Scalar Multiplication, the computation is accelerated about 50%. Furthermore, in Galois Field operations most efficient algorithms are used for GF Addition, GF Multiplication, GF Square and GF Division. As a result, after implementing the algorithms, computational time for Scalar Multiplication has been reduced from 46.6 Âµs [1] to 14.6 Âµs which is about 68.67% faster.In order to further accelerate the computation process more efficient algorithms are required perform the field arithmetic operations and Point/Scalar Multiplicationoperation. At the circuit level, some optimizations are required which will even accelerate the process.

Simulation Result

ModelSimwas used to simulate the design. Figure 11 and

12 gives a full view of the simulation of Scalar Multiplicationand Point Addition, respectively.

Figure 11: Simulation Scalar multiplication

Figure 12: Simulation Point Addition

ACKNOWLEDGMENT

Authors would like to express their gratitude towards Almighty for giving them the opportunity to do this work. They would also like to thank Dr. Arshad M. Chowdhury, Chairman of the ECE Department at North South University.

REFERENCE

[1]. MubarekKedir (2008) Hardware Acceleration of Elliptic Curve Based Cryptographic Algorithms: Design and Simulation

[2]. D.Hankerson, A.J.Menezes, and S.Vanstone, Guide to Elliptic Curve Cryptography, Springer-Verlag, 2004.

[3]. F.Rodriguez-Henriquez, N.A.Saqib, A.Diaz Perez, and Q.K.Koq, Cryptographic Algorithms on Reconfigurable Hardware, Springer, 2007.

[4]. I.VBlake, G.Seroussi, and N.Smart, Elliptic Curves in Cryptography, Cambridge University Press, 2002.

[5]. N.Koblitz, A course in Number Theory and Cryptography,Springer, 1994.

[6]. Ch.-H.Wu, Ch.-M.Wu, M.-D.Shieh, and Y.-T.Hwang, "Novel Algorithms and VLSI Design for Division over GF(2m)", IEICE Transactions Fundamentals, vol.E85-A, no 5, May 2002, pp. 1129-

1139.

[7]. J.-P.Deschamps and G.Sutter, "Hardware Implementation of Finite- Field Division", ActaApplicandaeMathematicae, vol.93, no 1-3, September 2006, pp.119-147.

[8]. J.-P.Deschamps, "Squaring over GF(2163)", UniversityRovira i Virgili, Inter Research Report DESS21, Oct 2006.

[9]. Paar, C., &Pelzl, J. (2009). Understanding cryptography a textbook for students and practitioners. Berlin: Springer.

[10]. NIST Koblitz Curves Parameters. (n.d.). Retrieved May 16, 2015, from http://en.wikisource.org/wiki/NIST_Koblitz_Curves_Parameters

[11]. Deschamps, J., & A, J. (2009). Hardware implementation of finite-

field arithmetic. New York: McGraw-Hill.

Hardware Acceleration of Elliptic Curve Cryptographic (ECC) Scalar Multiplication Unit over Binary Polynomial based Galois Field GF (2m) using Verilog HDL

POINT_ADDITION

Control Unit

Leave a Reply