 Open Access
 Total Downloads : 191
 Authors : Nahala Basheer, Ms. Nisha Lali R, Mr. Rajeev S K
 Paper ID : IJERTV2IS50088
 Volume & Issue : Volume 02, Issue 05 (May 2013)
 Published (First Online): 25052013
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Fault Detection For AES Cryptographic Design
Nahala Basheer1
PG Scholar
ECE Department
Younus College of Engineering and Technology
Kerala
Ms. Nisha Lali R2
Asst. Professor ECE Department
Younus College of Engineering and Technology
Kerala
Mr. Rajeev S K3 Head of the Department ECE Department
Younus College of Engineering and Technology
Kerala
Abstract Cryptography is a method that has been developed to ensure the secrecy of messages and transfer data securely. Advanced Encryption Standard (AES) has been made as the first choice for many critical applications because of the high level of security and the fast hardware and software implementations, many of which are power and resource constrained and requires reliable and efficient hardware implementations. Naturally occurring and maliciously injected faults reduce the reliability of Advanced Encryption Standard (AES) and may leak confidential information. In this paper, a lightweight concurrent fault detection scheme for the AES is presented. In the proposed approach, the composite field Sbox and inverse Sbox are divided into blocks and the predicted parities of these blocks are obtained. For high speed applications, Sbox implementation based on lookup tables is avoided. Instead, logic gate implementations based on composite fields are utilized. A compact architecture for the AES Mix columns operation and its inverse is also presented. This paritybased fault detection scheme reaches the maximum fault coverage when compared to other methods of fault detection. The proposed fault detection technique for AES encryption and decryption has the least area and power consumption compared to their counterparts with similar fault detection capabilities.
Index terms AES, composite fields, parity prediction, fault detection, Sbox.

INTRODUCTION
The Advanced Encryption Standard (AES) has been accepted by NIST [1] as the symmetric key standard as a replacement for the previous standards because of its good characteristics in terms of security, cost, and efficient implementations for encryption and decryption of blocks of data. In encryption, under the influence of a key, a 128bit block is encrypted by transforming it in a unique way into a new block of the same size. AES is symmetric since the same key is used for encryption and the reverse transformation, decryption. The only secret necessary to keep for security is the key. AES may be configured to use different keylengths, the standard defines 3 lengths
and the resulting algorithms are named AES128, AES192 and AES256 respectively to indicate the length in bits of the key. After 10 rounds, the cipher text is generated where each encryption round (except for the final round) consists of four transformations. The four transformations of round of encryption are explained below.
The 128 bits of input (and output) of each transformation are considered as a four by four matrix, called state, whose entries are eight bits. Except for the last round, the first transformation in each round is the bytes substitution, called SubBytes, which is implemented by 16 Sboxes. ShiftRows is the second transformation in which the four bytes of the last three rows of the input state are cyclically shifted. The third transformation is Mixcolumns in which the columns are considered as polynomials over GF(28) and multiplied by a fixed polynomial. The final transformation is AddRoundKey in which a roundkey is added to the input by 128 twoinput XOR gates.
Among the transformations in the AES, the Sboxes in the encryption and the inverse Sboxes in the decryption are alone nonlinear. Fault detection in the AES hardware implementation is important in order to make the standard robust to the internal and malicious faults. There exists various fault detection schemes for the AES hardware implementation. For fault detection of the encryption or decryption in AES redundant units may be used [12], [14], where algorithmlevel, roundlevel and operationlevel concurrent error detection for the AES is used. A number of fault detection schemes based on the error detecting codes, also exists. For high performance AES implementations, using ROMs may not be preferable. The proposed fault detection approach is applied to the composite field AES encryption and decryption. There exist a number of fault detection approaches which are specific to composite field Sboxes and inverse Sboxes. In the scheme of [13], the fault detection of the multiplicative inversion of the Sbox is considered. In [12], predicted parities have been used for the multiplicative inversion of a specific S box using composite field and polynomial basis. Furthermore, the transformation matrices are also considered. In [12] and [6], the composite field S boxes and inverse Sboxes (using polynomial basis) have been divided into subblocks and parity predictions are used for their fault detection.
In the schemes proposed in [15] and [22], all the search space of composite fields is considered for presenting optimum lightweight fault detection schemes. The scheme presented in [8] is for all the transformations in the AES encryption/decryption independent of the ways these transformations are implemented. Moreover, the scheme presented in [7] uses doubledatarate computation for counteracting
the fault attacks. Additionally, a fault detection scheme based on the Hamming and ReedSolomon codes for protecting the storage elements within the AES is proposed in [11]. It is also noted that, for the logic elements, the scheme in [2] and the use of the partial duplication of the most vulnerable elements are proposed in [11].
Fig. 1. The Sbox (the inverse Sbox) using composite fields and polynomial basis and their fault detection blocks.
Fig. 2. The Sbox (the inverse Sbox) using composite fields and normal basis and their fault detection blocks.
All the Sboxes (respectively the inverse Sboxes) occupy much of the total AES encryption (respectively decryption) area and their power consumption is around three fourths of that of the entire AES [16]. LUTs can be utilized for the AES S boxes and inverse Sboxes in hardware implementation. This work involves lowarea implementation of the AES encryption and decryption using composite fields.
The contributions of this paper are as follows.

The Sbox and the inverse Sbox has been designed to obtain low power and low area.

An alternative lightweight design for both forward and inverse mixcolumns operation required in the AES hardware implementation is also presented.

A lowcost paritybased fault detection scheme for the Sbox and the inverse Sbox using composite fields is presented, for increasing the error coverage. The predicted parities of the five blocks of the Sbox and the inverse Sbox are
multiplicative inversion and two for the transformation and affine matrices).

The actual parity is obtained from the blocks using XOR gates. The predicted parity is compared with the actual parity. The error gets indicated using the error indication flag.

The proposed fault detection scheme is simulated and maximum error coverage is obtained compared to existing methods.
It is shown that the power and area of the proposed technique is least compared to the schemes that have the same fault detection capabilities.


SBOX AND INVERSE SBOX IN COMPOSITE FIELDS
=0
=0
In this section, the Sbox and the inverse Sbox operations and their compositefield ealizations are described. The Sbox and the Inverse Sbox are nonlinear operations which take 8bit inputs and generate 8bit outputs. In the Sbox, the irreducible polynomial of P(x) = x8 + x4 + x3 + x +1 is used to
obtained (three predicted parities for the
construct the binary field GF(28). Let = 7 ii
=0
=0
(28) and = 7 ii (28) be the input and
the output of the Sbox, respectively, where is a root of P(x), i.e. P()=0. Then, the Sbox consists of the multiplicative inversion, i.e., X1 GF(28), followed
by an affine transformation. Moreover, let Y1
defined by the following matrix multiplication on State.
02 03 01 01 0,0 s0,1 0,2 0.3
01 02 03 01
01 02 03 01
1.0 1,1 1,2 1,3
GF(28) and X1 GF(28) be the input and the output of the Inverse Sbox, respectively. Then, the Inverse Sbox consists of an inverse affine transformation followed by the multiplicative inversion.
01 01 02 03
03 01 01 02
2,0 2,1 2,2 2,3
3,0 3,1 3,2 3,3
0,0 s0,1 0,2 0.3
1.0 1,1 1,2 1,3
= 2,0 2,1 2,2 2,3
3,0 3,1 3,2 3,3
The composite fields can be represented using normal basis or polynomial basis. The Sbox and inverse Sbox for the polynomial and normal bases are shown in Figs. 1 and 2, respectively. For the Sbox using polynomial basis (respectively normal basis), the transformation matrix (respectively ') transforms a field element X in the binary field GF(28) to the corresponding representation in the composite fields GF(28) / GF(24). It is noted that the result of X= h u+ l is obtained using the irreducible polynomial
u2+ u + for polynomial basis method in Fig.1 and
X= 'h u16+ 'lu is obtained using the irreducible polynomial u2+ 'u + for normal basis method in Fig.2.
The multiplicative inversion in Fig.1 consists of composite field multiplications, additions and an inversion in the subfield GF(24) over GF(2) / x4 + x + 1.The decomposition can be further applied to represent GF(24) as a linear polynomial over GF(22) and then GF(2) using the irreducible polynomial 2++ and w2+w+1, respectively. As a result, it is understood that the implementation of the multiplicative inversion can be performed using the field represented by GF((24)2) or the field represented by GF(((22)2)2). For normal basis, the decomposition is performed using the irreducible polynomials of 2 + ' + ' and w2 + w + 1, respectively.
For calculating the multiplicative inversion, the most efficient choice is to let = = 1 (' = ' = 1) in the above irreducible polynomials. Then, the multiplicative inversion of the Sbox using polynomial basis and normal basis are respectively,
h
h
( h u + l )1 = [(( h+ l ) + 2 )1 h] u
2 1
2 1
+ (( h+ l ) + h ) ( h+ l )
(1)
2 2 1
2 2 1
2 2 1
2 2 1
( 'h u16 + 'l u )1 = [( 'h 'l + ('h + 'l ) ') ) 'h] u16
+ [( 'h 'l + ('h + 'l ) ') ) 'l] u
(2)
It is noted that one can replace (') with (') to obtain (1) and (2) for the inverse Sbox.

MIX COLUMN IMPLEMENTATION USING
POLYNOMIALS
The forward mix column transformation (in encryption process), called mix columns, operates on each column individually. Each byte of a column is mapped into a new value that is a function of all four bytes in that column. The transformation can be
Each element in the product matrix is the sum of products of elements of one row and one column. In this case, the individual additions and multiplications are performed in GF (28). The mix columns transformation on a single column j (0 j 3) of State can be expressed as
0, = 2 0, 3 1, 2, 3,
1, = 0, 2 1, (3*2, ) 3,
2, = 0, 1, 2 2, 3 3,
3, = 3 0, 1, 2, (2 3, )
(3)
As mix columns only requires multiplication by {02} and {03}, which, as we have seen, involved simple shifts, conditional XORs, and XORs. This can be implemented in a more efficient way that eliminates the shifts and conditional XORs. Equation Set (3) shows the equations for the mix columns transformation on a single column. Using the identity
{03} x = ({02} x) x, we can rewrite equation Set
(3) as follows:
= 0, 1, 2, 3,
0, = 0, [ 2 0, 1, ]
1, = 1, [ 2 1, 2, ]
2, = 2, [ 2 2, 3, ]
3, = 3, [ 2 3, 0, ]
(4)
Multiplication by 02 equivalents to multiply by x [2]. The gate count of this implementation (using combinational circuits only) is as shown in fig.(4) is as follows: 8 XORs to calculate ( s0,j s1,j) in equation (4.1), so 32 XORs are required for the same calculations in equations 4.
Additional 8 XORs are needed to calculate Tmp. 3 XORs are required to calculate 2*(s0,j s1,j) in equation (4.1) so we need 12 XORs for the same calculations in equations 4. Finally we need an 8 XORs (with 3 inputs) OR 16 XORs (with 2 inputs) to calculate (s0,j) in equation (4.1), so we need 32 XORs (with 3 inputs) OR 64 XORs (with 2 inputs) to calculate equations 4. Finally we can implement Forward mix columns transformation using 32+8+12+64 = 116 XORs with 2 inputs, OR (52 XORs with 2 inputs + 32 XORs with 3 inputs with total 84 XORs).
Fig. 3. Forward mix columns operation
Additional 8 XORs are needed to calculate Tmp.
3 XORs are required to calculate 2*(s0,j s1,j) in equation (4.1) so we need 12 XORs for the same calculations in equations 4. Finally we need an 8 XORs (with 3 inputs) OR 16 XORs (with 2 inputs) to calculate (s0,j) in equation (4.1), so we need 32 XORs (with 3 inputs) OR 64 XORs (with 2 inputs) to calculate equations 4. Finally we can implement Forward mix columns transformation using 32+8+12+64 = 116 XORs with 2 inputs, OR (52 XORs with 2 inputs + 32 XORs with 3 inputs with total 84 XORs).
In fig. 3, the block labeled Mul by (2) means multiply its input by 2 using the implementation shown in [2]. Each arrow represent 8 bits and each block such as s1,j represent 8 wires holds values of s1,j. The inverse mix column transformation (in decryption process), called InvMix Columns, is defined by the following matrix multiplication:
0 0 0 09 0,0 s0,1 0,2 0.3
09 0 0 0
09 0 0 0
1.0 1,1 1,2 1,3
0, = 0, 2 0, 2, [2
0, j 1, ]
1, = 1, 2 1, 3, [2
1, j 2, ]
2, = 2, 2 0, 2, [2
2, j 3, ]
3, = 3, 2 1, 3, [2
3, j 0,
(6)
As shown in fig. (4) the gate count of this implementation (using combinational circuits only) is as follows: We need 8 XORs to calculate ( s0,js1,j ) in equation (6.1), so 32 XORs are required for equations set 6. We need 3 XORs to calculate 2*( s0,js1,j ) in equation (6.1), so 12 XORs are required for the same calculations in equations 6. Additional 8 XORs are required to calculate ( s0,js2,j ) in equation (6.1), so we need 16 XORs for the same calculations in equations 6.
We need 16 XORs for the same calculations in equations 6. We need additional 3 XORs to calculate 2*(s0,js2,j) in equation (6.1), so 6 XORs are required for the same calculations in equations 6. We need additional 3 XORs to calculate 2*(2*(s0,j s2,j)) in equation (6.1) so 6 XORs are required for the same calculations in equations 6. We need additional 3 XORs to calculate 2*(2*(2*( s0,js2,j ))) in equation (6.1), so 6 XORs are required for the same calculations in equations 6. Additional 8 XORs are required to calculate 09*(s0,j s2,j), 8 XORs to calculae 09*(s1,j s3,j), and 8 XORs to calculate Tmp. Finally we need 24 XORs to calculate s'0,j in equation (6.1), and 96 XORs for the same calculations in equations 6. Implementing inverse mix columns
transformation uses 32+12+16+6+6+6+16+8+96 =
0 09 0 0
0 0 09 0
2,0 2,1 2,2 2,3
3,0 3,1 3,2 3,3
0,0 s0,1 0,2 0.3
1.0 1,1 1,2 1,3
= 2,0 2,1 2,2 2,3
3,0 3,1 3,2 3,3
198 XOR. Implementing forward and inverse mix columns transformation uses 116 +198 = 314 XOR gates.
Each element in the product matrix is the sum of products of elements of one row and one column. In this case, the individual additions and multiplications are performed in GF (28). The mix Columns transformation on a single column j (0 j 3) of State can be expressed as:
0, = 0 0, 0 1, 0 2, (09
3, )
1, = 09 0, 0 1, 0 2, (0
3, )
2, = 0 0, 09 1, 0 2, (0
3, )
3, = 0 0, 0 1, 09 2, (0
3, ) (5)
Equation set (5) is formulated to simplify its hardware implementation as follows:
= 09 (0, 1, 2, 3, )
Fig. 4. Inverse mix columns operation

FAULT DETECTION SCHEMES
The Sbox and the inverse Sbox structures are divided into five blocks as shown in Fig.1 and 2 to obtain the lowoverhead parities. In these figures, the modulo2 additions, consisting of 4 XOR gates, are shown by two concentric circles with a plus inside. Furthermore, the multiplications in GF (24) are shown by rectangles with crosses inside. Let bi be the output of block i denoted by dots in Fig.1 and 2 for Sbox. The outputs of the five blocks for Sbox using polynomial basis in Fig.1 are represented as b1 = h+ l , b2 = , b3 =, b4 = and b5 = Y. Similarly, for Fig.2 b1 = 'h+ 'l , b2 = ', b3 =', b4 = ' and b5 = Y. One can replace (') with (') and X with Y for the inverse Sbox. In the following, the least overhead parity are denoted by b1 – b5 in Figs. 1 and 2.
A. The SBox and the Inverse SBox using Polynomial Basis
The implementation complexities of different blocks of the Sbox and the Inverse Sbox and those for their predicted parities are dependent on the choice of the coefficients GF(24) and GF(22) in the irreducible polynomials u2 + u + and v2 + v + used for the composite fields. The goal in the following is to find GF(24) and GF(22) for the composite fields GF(((22)2)2) and GF(24) for the composite fields GF((24)2) so that the area complexity of the entire fault detection implementations becomes optimum. According to [19], 16 the possible combinations for GF(24) and GF(22) exist. Moreover, for the composite fields GF((24)2), the possible choices for making the polynomial x2 + x
+ irreducible has been exhaustively searched and found.
The blocks are explained below:
Blocks 1 and 5: Based on the possible values of and in GF(((22)2)2) ( in GF((24)2) ), the transformation matrices in Fig. 1 in blocks 1 and 5 of the Sbox and the inverse Sbox can be constructed using the algorithm presented in [21]. Using an exhaustive search, eight base elements in GF(((22)2)2)
( or GF((24)2) ) to which eight base elements of GF(28) are mapped, are found to construct the transformation matrix.
expression sharing is suggested for obtaining the low complexity implementations for the Sbox. Here, we have also considered these transformation matrices for = {11}2 as well as the transformation matrices for the inverse Sbox for different values of and and have derived their area and delay complexities. Moreover, the gate count and the critical path delay for blocks 1 and 5 and their predicted parities, i.e., b1 and b5, of the Sbox and the inverse Sbox in have been obtained.
Blocks 2 and 4: As shown in Fig. 1, block 2 of the S box and the inverse Sbox consists of a multiplication, an addition, a squaring and a multiplication by constant in GF((24)2) .The following lemma is presented for deriving the predicted parity of the multiplication in GF((24)2), using which the predicted parities of blocks 2 and 4 in Fig. 1 are obtained.
Lemma 1: Let = ( 3,2,1,0 ) and = ( 3,2,1,0) be the inputs of the multiplier in GF((22)2). The predicted parities of the result of the multiplication of and in GF((22)2) for = {10}2 and = {11}2 are as follows, respectively
= 3( 3+2+0 )+ 2( 3+2+0 )+ 1( 2+0 )+ 0( 3+ 0 +2+0) if = {10}2 (7)
= 3( 3 +0 )+ 2( 2+1+0 )+ 1( 2+0 )+
0( 3+ 0 +2+0) if = {11}2 (8)
The predicted parity of block 2 of the Sbox and the inverse Sbox, i.e., in Fig.1, depends on the choice of the coefficients GF ((22)2) and GF(22) []. Using Lemma 1, we have derived the complexity of the predicted parity of block 2 for these coefficients. Furthermore, for block 4 in Fig.1, which consists of two multiplications in GF ((22)2), one can also use Lemma 1 to derive the predicted parity. For block 2 of the Sbox (respectively the inverse Sbox) over GF((24)2) in Fig. 1, only the multiplication by constant is affected for different values of s. For this block, we have exhaustively searched for and obtained the optimum implementation for different values of s. Moreover, block 4 in Fig. 1 is independent of the value of . Therefore, the complexity of the predicted parity for this block is the same for all possible s.
Block 3: We present the following theorem for block 3 of the
Sbox and the inverse Sbox over GF((22)2) in Fig. 1.
Theorem 1: Let = (3,2,1,0) be the input and =(3,2,1,0 ) be the output of an inverter in GF((22)2). The predicted parities of the result of the inversion in GF((22)2), i.e., b3 , for = {10}2 and =
{11}2 are as follows, respectively
In [22], the Hamming weights, i.e., the number of nonzero elements, of the transformation matrices for the case = {10}2 and different values of in
GF(((22)2)2) are obtained. It is noted that in [21],
instead of considering the Hamming weights, sub
(9)
(10)
= (21)0+(1+0)3 if = {10}2
= (2 1 0) + 3 1 if = {10}2
where, represents OR operation. It is noted that the inversion in GF(24) [] in Fig. 1 is independent of the value of . Therefore, the complexity of the predicted parity for this block is the same for any possible s.

The SBox and the Inverse SBox Using Normal Basis
The optimum fault detection Sbox using normal basis in Fig. 2 is derived. Here an exhaustive search for finding the optimum predicted parities based on the choice of the coefficients ' GF(24) and ' GF(22) and for the five blocks of the inverse Sbox using normal basis. We have exhaustively searched for the least overhead transformation matrices and their parity predictions combined for the inverse Sbox and have derived the total complexity for the predicted parities of blocks 1 and 5, i.e., b1 and b5, and the delays associated with them. These are used to obtain the optimum Sbox inverse Sbox and its parity predictions in this section. It is also noted that as shown in Fig. 2, blocks 2, 3, and 4 of the Sbox and the inverse Sbox are the same. Therefore, considering [15], the predicted parities of these blocks can be obtained for the inverse Sbox.

Optimum parity prediction techniques

For polynomial basis:
Considering the discussions presented for different combinations of and for polynomial basis, the following optimum parity prediction technique is presented.
The fault detection SBox using composite fields
GF(((22)2)2) has the least area complexity for =
having = {1001}2 yields to the lowest area complexity architecture. It is noted that blocks 3 and 4 have the same predicted parities as the Sbox by swapping ad . For other blocks of the optimum inverse Sbox (PB1) we have:
b1= x0
b2 = 3(7+4)+2(7+ Ph)+1(6+4)+0 P h+6+7
b3=(210)+13
b4 =3(3+0)+2(P+3)+1(2+0)+0P
b5 =7+5+3+2+0
Additionally, for the optimum inverse Sbox (PB2) we have:
b1=x 7+x 0
b2 = 34+2(5+4)+1(P h+7)+0 P h+ Ph +4
b3= 320+0(1( 2+ 3))
b4 =30+2(1+0)+1(P+3)+0P
b5 = 0

For normal basis:
For different combinations of ' and ' for normal basis, for the Sbox and the inverse Sbox, ' = {10}2 and ' = {0001}2 have the least area for the operations and their fault detection circuits combined. The following is the predicted parities for the Sbox:
b1= x7+x5
b2 = ('7'3)+('6'2)+('4 '0)+'5'1
b3= '2'0('3+ '1)+ '3 '1('2+ '0)
= (' +' ) ' +(' +' ) ' +(' +' )
{11}2 and = {1010}2. For this optimum SBox (PB1),
b4 7 3
'1+('4+'0) '0
3 6 2
2 5 1
the following predicted parities for the five blocks in Fig.1 are as given below:
b1= x0
b2 = 3( 7+4)+2( 7+ Ph)+1( 6+4)+0 P h+6+7
b3=(210)+13
b4 =3(3+0)+2(P+3)+1(2+0)+0P
b5 =7+5+3+2+0
where, h=7+6+5+4 and =3+2+1+0. Additionally, among all the possible values for using composite fields GF ((24)2), = {1010}2 yields to the least complexity architecture for the optimum S box (PB2), respectively. Then, for the Sbox we have:
b1= x7+x0
b2 = 34+2(5+4)+1(P h+7)+0 P h+ Ph +4
b3= 320+0(1( 2+ 3))
b4 =30+2(1+0)+1(P+3)+0P
b5 =4+3+2+1+0
Furthermore, for the inverse Sbox the following method is used. For the inverse Sbox using composite field GF(((22)2)2), choosing = {11}2 and = {1011}2 and for the one using composite field GF((24)2)
b5 ='7+'5+'4+'3+'2
Moreover, for the inverse Sbox, b2 – b4 are the same as those for the Sbox by swapping ' and '. For the other blocks, the predicted parities are given as: b1= y7+y6+ y2+y1 and b5 ='7+'5+'4+'3+'2.
It is noted that the area overhead of the proposed scheme for the optimum structures consists of those of the optimum parity predictions. In addition, 23 XORs for the actual parities (3 XORs for adding the coordinates of each of 'h+'l, ' and ' and 7 XORs each for those of ' and Y ) are utilized. Moreover, the delay overhead of the predicted parities of five blocks can overlap the delays for the implementations of five blocks in Figs. 1 and 2. The only delay overhead for this scheme is the delay of the actual parity of block 5, which is 3TX, where TX, is the delay of an XOR gate.


Error indication
In order to develop a fault detection structure, the predicted parity can be compared with the actual parity in order to obtain the error indication flag of the corresponding block . By ORing five indication flags of five blocks, the error indication of the entire Sbox is obtained [15].


SIMULATION RESULTS
First the Sbox and the inverse Sbox using composite fields is constructed for low area and low power dissipation. Also the time delay is reduced compared to other techniques. Here SBox and inverse SBox are constructed using both polynomial and normal basis. Then, single StruckAtFaults have been introduced to the Sbox and the Inverse Sbox and the corresponding output simulation is obtained. After that the circuit is tested for multiple StruckAtFaults. If exactly one bit error appears at the output of the Sbox (respectively inverse Sbox), the presented fault detection scheme is able to detect it and the error coverage is about 100%. This is because in this case, the error indication flag of the corresponding block alarms the error. However, due to the technological constraints, single stuckat error may not be applicable for an attacker to gain more information [23]. Thus, multiple bits will actually be flipped and hence multiple stuckat errors are also considered in this paper covering both natural faults and fault attacks [23].
Here a lightweight Mixcolumns is also implemented using logic gates. The total number of gates required for implementing mix columns operation in the proposed design is 116+198 =314 XOR gates[],[]. Since our design is implemented using combinational circuits only, each resultant mix column takes a single clock cycle. The proposed mix column implementation takes four clock cycles compared to 28 clock cycles in [4]. The circuit for both AES encryption and decryption is designed. Xilinx ISE 8.1 and ModelSim are the simulation tools used here. The target device is XC2S600E. Finally, the error coverage has been calculated from the obtained results. The design is also simulated for power, delay and area calculations. From the simulation result the following is inferred.
Operation
Architecture
No. of 4input LUTs
No. of slices
SBox
LUT
250
158
PB
87
31
NB
83
31
Inverse SBox
LUT
250
158
PB
84
31
NB
73
31
Operation
Architecture
No. of 4input LUTs
No. of slices
SBox
LUT
250
158
PB
87
31
NB
83
31
Inverse SBox
LUT
250
158
PB
84
31
NB
73
31
TABLE I COMPARISON OF LUTS AND SLICES

Low area and Low power
From the synthesis report, the number of LUTs and slices needed to design the Sbox and the Inverse S
box is calculated. Table I gives the comparison of the number of LUTs and slices used for the design of S box and Inverse Sbox using various techniques.
From the Table I the number of LUTs and Slices used for Sbox and Inverse Sbox using composite fields is less when compared to Sbox based on LUTs
.
Table II illustrates the comparison results based on simulation in terms of power.
TABLE II COMPARISON OF POWER
Operation
Architecture
Power (mW)
SBox
LUT
56
PB
34
NB
34
Inverse SBox
LUT
56
PB
34
NB
34

Fault detection
The proposed architecture for the Sbox and Inverse Sbox is able to find all the single StruckAt faults. Faults are injected randomly on the input and output nodes of the logic gates. In the case of multiple StruckAt faults in Sbox, also the faults have been identified. We have performed error simulations for the Sboxes and the inverse Sboxes using the optimum composite field obtained in the previous section to confirm our above theoretical computation. In our simulations, we use stuckat error model at theoutputs of the five blocks forcing one or multiple nodes to be stuck at logic one (for stuckat one) or zero (for stuckat zero) independent of the errorfree values. We use Fibonacci implementation of the LFSRs for injecting random multiple errors, where, the numbers, the locations and the types of the errors are randomly chosen. In this regard, the maximum sequence length polynomial for the feedback is selected. The injected errors are transient, i.e., they last for one clock cycle. However, the results would be the same if permanent errors are considered. The results of the error simulation [] is shown in Table III. It is noted that in these tables, the optimum polynomial basis (PB) and normal basis (NB) are presented. As shown in the table, using five parity bits of the five blocks, the error coverage for random faults reaches 97% which is the same as our theoretical computation in this section. This error coverage will be increased if the outputs of more than one Sbox (respectively inverse Sbox) of the AES implementation are erroneous. In this case, the errors detected in any of 16 Sboxes (respectively inverse S boxes) contribute to the total error coverage. Thus, error coverage of very close to 100% is achieved.
The optimum Sbox and inverse Sbox using normal basis have the least hardware complexity with the fault detection scheme. Moreover, the optimum structures using composite fields and polynomial basis have the least post place and route timing overhead among other schemes. It is noted that using sub pipelining for the presented fault detection scheme in this paper, one can reach much faster hardware implementations of the composite field fault detection structures. The AES encryption and decryption presented here using composite fields and forward mix columns method has least area compared to its counterparts.
TABLE III
ERROR SIMULATION RESULTS
Operation
Architecture
Error coverage
SBox (Inverse SBox)
PB
97.008
NB
97.003


CONCLUSION
In this paper, low power AES encryption and decryption has been designed. Parity based fault detection scheme for the low power Sbox and the Inverse Sbox are presented in order to find the faults in the hardware implementation of the Sbox and the Inverse Sbox. Instead of using the lookup table approach for the implementation of the Sbox and its parity prediction, the composite field arithmetic with logical gates is used. Simulation results show that very high error coverage for the presented scheme is obtained when compared to other fault detection schemes like those based on LUTs and redundant units. Also low power and low area is achieved when compared to previous methods. An alternative lightweight design for both forward and inverse mix columns operation required also included in the AES hardware implementation. The comparisons indicate that the proposed mixcolumn design have less complexity than previous relevant work in gate size and no. of clock cycles.
ACKNOWLEDGMENT
The authors would like to thank the management, and Faculty Members of Department of Electronics and Communication Engineering, Younus College of Engineering and Technology, Kollam, Kerala for many insightful discussions and the facilities extended for completing the task.
REFERENCES

Mehran MozaffariKermani and Arash ReyhaniMasoleh, A lightweight HighPerformance Fault Detection Scheme for the Advanced Encryption Standard Using Composite Fields IEEE Trans. On Very Large Scale Integration (VLSI) Syestems, vol. 19, no. 1, pp. 85591,January 2011.

National Institute of Standards and Technologies, Announcing the Advanced Encryption Standard (AES) FIPS 197, Nov. 2001.

R. Karri, K. Wu, P. Mishra, and K. Yongkook, Faultbased sidechannel cryptanalysis tolerant Rijndael symmetric block cipher architecture, in Proc. DFT, Oct. 2001, pp. 418426..

R. Karri, K. Wu, P. Mishra, and Y. Kim, Concurrent error detection schemes for faultbased sidechannel cryptanalysis of symmetric block
ciphers, IEEE Trans. Comput.Aided Des. Integr. Circuits Syst., vol. 21, no. 12, pp. 15091517, Dec. 2002.

G. Bertoni, L. Breveglieri, I. Koren, P. Maistri, and V. Piuri,
Error analysis and detection procedures for a hardware implementation of the advanced encryption standard, IEEE Trans. Computers, vol. 52, no. 4, pp. 492505, Apr. 2003.

G. Bertoni, L. Breveglieri, I. Koren, P. Maistri and V. Piuri,
A parity code based fault detection for an implementation of the advanced encryption standard, Proc. of IEEE Intl S mp., Defect and Fault Toler n e in VLSI S stems (DFT 02), pp. 5159, Nov. 2002.

M. Mozaffari Kermani, Fault Detection Schemes for High Performance VLSI Implementations of the Advanced Encryption Standard, M.E.Sc. Thesis, Department of Electrical and Computer Engineering, The University of Western Ontario, London,Ontario, Canada, April 2007.

M. MozaffariKermani and A. ReyhaniMasoleh,
Concurrent StructureIndependent Fault Detection Schemes for the Advanced Encryption Standard, IEEE Trans. Computers, vol. 59, no. 5, pp. 608622, May 2010.

C. H. Yen and B. F.Wu, Simple error detection methods for hardware implementation of advanced encryption standard, IEEE Trans. Computers, vol. 55, no. 6, pp. 720731, Jun. 2006.

G. Bertoni, L. Breveglieri, I. Koren, P. Maistri, and V. Piuri,
A parity code based fault detection for an implementation of the advanced encryption standard, in Proc. DFT, Nov. 2002, pp. 5159.

G. Bertoni, L. Breveglieri, I. Koren, P. Maistri, and V. Piuri,
Error analysis and detection procedures for a hardware implementation of the advanced encryption standard, IEEE Trans. Computers, vol. 52, no. 4, pp. 492505, Apr. 2003.

C. Moratelli, F. Ghellar, E. Cota, and M. Lubaszewski, A faulttolerant DFAresistant AES core, in Proc. ISCAS, 2008, pp. 244247.

M. MozaffariKermani and A. ReyhaniMasoleh, Parity based fault detection architecture of Sbox for advanced encryption standard, in Proc. DFT, Oct. 2006, pp. 572580.

S.Y. Wu and H.T. Yen, On the Sbox architectures with concurrent error detection for the advanced encryption standard, IEICE Trans. Fundam. Electron., Commun. Comput. Sci., vol. E89A, no. 10, pp. 25832588, Oct. 2006.

R. Karri, K. Wu, P. Mishra, and K. Yongkook, Faultbased SideChannel Cryptanalysis Tolerant Rijndael Symmetric Block Cipher Architecture, ro . IEEE Intl Symp. Defect and Fault Toler n e in VLSI S stems (DFT 01), pp. 418426,
Oct.2001

M. MozaffariKermani and A. ReyhaniMasoleh, A lightweight concurrent fault detection scheme for the AES S boxes using normal
basis, in Proc. CHES, Aug. 2008, pp. 113129.

D. Canright, A very compact Sbox for AES, in Proc. CHES, Aug. 2005, pp. 441455.

A. Satoh, S. Morioka, K. Takano, and S. Munetoh, A compact Rijndael hardware architecture with Sbox optimization, in Proc. ASIACRYPT, Dec. 2001, pp. 239 254.

J.Wolkerstorfer, E. Oswald, and M. Lamberger, An ASIC implementation of the AES SBoxes, in Proc. CTRSA, 2002, pp. 6778.

V. Rijmen, Dept. ESAT, Katholieke Universiteit Leuven, Leuven, Belgium, Efficient Implementation of the Rijndael SBox, 2000.

X. Zhang and K. K. Parhi, Highspeed VLSI architectures for the AES algorithm, IEEE Trans. Very Large Scale
Integr. (VLSI) Syst., vol. VLSI12, no. 9, pp. 957967, Sep. 2004.

X. Zhang and K. K. Parhi, On the optimum constructions of composite field for the AES algorithm, IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 10, pp. 11531157, Oct. 2006.

N. Mentens, L. Batina, B. Peneel, and I. Verbauwhede, A systematic evaluation of compact hardware implementations
for the RijndaelSbox, in Proc. CTRSA, Feb. 2005, pp. 323333.

L. Breveglieri, I. Koren, and P. Maistri, An operation centered approach to fault detection in symmetric cryptography ciphers, IEEE Trans. Computers, vol. C56, no. 5, pp. 534540, May 2007.

Xilinx [Online]. Available: http://www.xilinx.com/