- Open Access
- Authors : Kirat Pal Singh, Shivani Parmar
- Paper ID : IJERTV1IS3044
- Volume & Issue : Volume 01, Issue 03 (May 2012)
- Published (First Online): 30-05-2012
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Design of High Performance MIPS Cryptography Processor Based on T-DES Algorithm
Kirat Pal Singh1, Shivani Parmar2
1Research Fellow, 2Assistant Professor
1Academic and Consultancy Services Division, 2 ECE Department
1Centre for Development of Advanced Computing (C-DAC), 2GGS College
1, 2 Mohali-160071, India
This paper describes the design of a high-performance MIPS cryptography processor based on the triple data encryption standard (T-DES). The pipeline stages are organized so that the pipeline can be clocked at a high frequency. The encryption and decryption blocks of the T-DES cryptosystem and the dependencies among them are explained in detail with the help of block diagrams. To increase the processor's functionality and performance, especially for security applications, we add three new 32-bit instructions: LKLW, LKUW and CRYPT. The design has been synthesized for a 40 nm process technology, targeting a Xilinx Virtex-6 device. The overall MIPS crypto processor works at 209 MHz.
ALU, register file, pipeline, memory, T-DES, throughput
In today's digital world, cryptography is the art and science that deals with the principles and methods for keeping messages secure. Encryption is emerging as an integral part of all communication networks and information processing systems that involve the transmission of data. Encryption is the transformation of plain data (known as plaintext) into unintelligible data (known as ciphertext) through an algorithm referred to as a cipher. The MIPS architecture is employed in a wide range of applications. The architecture remains the same for all MIPS-based processors, while the implementations may differ. The proposed design has the feature of a 32-bit asymmetric and symmetric cryptography system as a security application. A 16-bit RSA cryptography MIPS cryptosystem has been designed previously. Small adjustments and minor improvements are made to the MIPS pipelined architecture design to protect data transmission over insecure media using cryptographic devices such as the data encryption standard (DES), Triple-DES and the advanced encryption standard (AES). These cryptographic devices use an identical key on the sender side and the receiver side. Our design mainly integrates the symmetric cryptosystem into the MIPS pipeline stages, which makes it suitable for encrypting large amounts of data at high speed.
MIPS (Microprocessor without Interlocked Pipeline Stages; the acronym is also used for the performance measure millions of instructions per second) is one of the best-known RISC (Reduced Instruction Set Computer) processor designs. High-speed MIPS processors use a pipelined architecture to speed up processing and to increase the clock frequency and performance of the processor. A MIPS-based RISC processor has been described previously. It consists of the five basic pipeline stages: Instruction Fetch, Instruction Decode, Instruction Execution, Memory Access and Write Back. These five pipeline stages introduce a processing latency of five clock cycles and give rise to several hazards during operation. Pipeline hazards can be eliminated by inserting NOP (No Operation Performed) instructions, which add some delay so that instructions execute properly. Pipeline hazards are of three types: data, structural and control hazards. They are handled in the MIPS processor by a forwarding unit, a pre-fetching or hazard detection unit, and a branch and jump prediction unit. The forwarding unit prevents data hazards: it detects dependencies and forwards the required data from the running instruction to the dependent instructions. Stalls occur in the pipelined architecture when consecutive instructions use the same operand; the dependent instruction then needs more clock cycles to execute, which reduces performance. To overcome this, an instruction pre-fetching unit is used, which reduces stalls and improves performance. Control hazards occur when a branch prediction is wrong or, in general, when the system has no mechanism for handling them. Control hazards are handled by two mechanisms: the flush mechanism and the delayed jump mechanism. The branch and jump prediction unit uses these two mechanisms to prevent control hazards. The flush mechanism runs the instructions after a branch and flushes the pipeline after a misprediction. Frequent flushing may increase the number of clock cycles and reduce performance. The delayed jump mechanism handles the control hazard by filling the pipeline after the jump instruction with a specific number of NOPs. The placement of the branch and jump prediction unit in the pipeline may affect the critical (longest) path. Detecting the longest path and improving the hardware on it to minimize the clock period is the standard method of increasing processor performance.
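The forwarding decision described above can be sketched in a few lines. This is a minimal model of the textbook forwarding check, not the authors' VHDL; the `PipeReg` type, its field names and the returned labels are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Hypothetical snapshot of the write-back info held in a pipeline register."""
    reg_write: bool  # True if the instruction in this stage writes a register
    rd: int          # destination register number of that instruction


def forward_select(rs: int, ex_mem: PipeReg, mem_wb: PipeReg) -> str:
    """Return the source that should feed an ALU operand reading register rs."""
    # The instruction one stage ahead holds the newest value, so check it first.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == rs:
        return "EX/MEM"
    # Otherwise fall back to the instruction two stages ahead.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == rs:
        return "MEM/WB"
    # No dependency in flight: the register file value is already correct.
    return "REGFILE"
```

Register 0 is excluded because it is hard-wired to zero in MIPS, so a write to it must never be forwarded.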
To further speed up the processor and minimize the clock period, the design incorporates a high-speed hybrid adder, which employs both carry-skip and carry-select techniques, within the ALU to handle additions. This paper is organized as follows. The system architecture, hardware design and implementation are explained in Section II, whose sub-sections describe the MIPS instruction set, including the new instructions, in detail with corresponding diagrams. The hardware implementation methodology is explained in Section III. The experimental results of the pipeline stages are presented in Section IV, whose sub-sections describe the simulation results of the encrypted MIPS pipeline processor and the verification and synthesis report. The conclusions of the paper are given in the final section.
The global architecture of the encrypted and decrypted pipelined processor is shown in Fig. 1. It contains five basic pipeline stages: Instruction Fetch (IF), Instruction Decode (ID), Instruction Execution (EXE), Memory Access (MEM) and Write Back (WB). These pipeline stages operate concurrently, using the synchronization signals Clock and Reset. The MIPS architecture employs three different instruction formats: R-type, I-type and J-type. To build a MIPS processor for encryption/decryption, we insert a cryptography module, such as the data encryption standard (DES), triple data encryption standard (T-DES) or advanced encryption standard (AES), into the pipeline stages. Only a single cryptographic module is used in a given hardware implementation. In this design, we insert the T-DES crypto core inside the instruction fetch stage and the memory access stage, for instruction fetching and data storing respectively.
Figure 1. Block diagram of Encrypted/Decrypted MIPS processor
Encrypted MIPS Processor
The 32-bit encrypted MIPS processor is based on the MIPS architecture. The pipelined MIPS architecture is modified so that it executes encrypted instructions. The function of the instruction fetch unit is to obtain an instruction from the instruction memory using the current value of the PC, increment the PC for the next instruction, and place the result in the IF register. The instruction fetch unit of the encrypted MIPS processor contains the program counter (PC), the instruction memory, a T-DES decryption core and a MUX. The instruction memory reads the address from the PC and holds the instruction value at the address pointed to by the PC. The instruction memory sends the encrypted instruction to the MUX and to the decryption core. The decryption core produces the decrypted instruction and sends it to the MUX, and the output of the MUX is fed to the IF register. The MUX control signal comes from the control unit. The instruction decode unit contains the register file and the key registers. The key registers store the key data of the encryption/decryption core. The key address and key data come from the write back stage. Once the key data has been stored in the register file, it remains the same for the execution of all program instructions. The control unit provides the control signals to the other stages. The execute unit operates on the register file output data and performs the operation determined by the ALU. The ALU output data is sent to the EXE register. The memory access unit contains a T-DES encryption core, a T-DES decryption core, the data memory, a MUX and a DEMUX. The second register operand from the register file is fed to the encryption core and also to the MUX. Here the crypt signal enables or disables the encryption operation. The read/write signal of the data memory determines whether a read or a write is performed. The output of the data memory passes through the DEMUX; one DEMUX output goes to the decryption core and the other goes to the MEM register. The unencrypted memory data and the decrypted data are stored temporarily in the MEM register.
The MEM output is fed to the write back data MUX and, according to the control signal, the output of the MUX goes to the register file.
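The selection performed by the MUX and DEMUX around the data memory can be illustrated with a small behavioural model. This is only a sketch: `toy_cipher` is a placeholder XOR transform standing in for the T-DES core (it is not DES and not secure), and all function names are assumptions made for the illustration.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def toy_cipher(block: int, key: int) -> int:
    """Placeholder for the T-DES core: XOR is its own inverse and is NOT secure."""
    return (block ^ key) & MASK64


def mem_stage_write(data: int, crypt: bool, key: int) -> int:
    """MUX in front of the data memory: store encrypted or plain data."""
    return toy_cipher(data, key) if crypt else data


def mem_stage_read(word: int, crypt: bool, key: int) -> int:
    """DEMUX behind the data memory: decrypt the word or pass it straight to MEM."""
    return toy_cipher(word, key) if crypt else word


KEY = 0x4B4952415450414C  # third key value quoted in the paper
stored = mem_stage_write(0x38, True, KEY)      # value as it would sit in memory
restored = mem_stage_read(stored, True, KEY)   # value delivered to the MEM register
```

With the crypt signal disabled both functions act as a bypass, which corresponds to the unencrypted path to the MEM register.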
Decrypted MIPS Processor
The 32-bit decrypted MIPS processor is based on the MIPS architecture. The pipelined MIPS architecture is modified so that it executes decrypted instructions. The instruction fetch unit of the decrypted MIPS processor contains the program counter (PC), the instruction memory, a T-DES encryption core and a MUX. The instruction memory reads the address from the PC and holds the instruction value at the address pointed to by the PC. The instruction memory sends the decrypted instruction to both the MUX and the encryption core. The encryption core produces the encrypted instruction and sends it to the MUX, and the output of the MUX is fed to the IF register. The MUX control signal comes from the control unit. The instruction decode unit contains the register file and the key registers. The key registers store the key data of the encryption/decryption core. The key address and key data come from the write back stage. Once the key data has been stored in the register file, it remains the same for the execution of all program instructions. The control unit provides the control signals to the other stages. The execute unit operates on the register file output data and performs the operation determined by the ALU. The ALU output data is sent to the EXE register. The memory access unit contains a T-DES encryption core, a T-DES decryption core, the data memory, a MUX and a DEMUX. The second register operand from the register file is fed to the decryption core and also to the MUX. Here the crypt signal enables or disables the decryption operation. The read/write signal of the data memory determines whether a read or a write is performed. The output of the data memory passes through the DEMUX; one DEMUX output goes to the encryption core and the other goes to the MEM register. The unencrypted memory data and the encrypted data are stored temporarily in the MEM register. The MEM output is fed to the write back data MUX and, according to the control signal, the output of the MUX goes to the register file.
MIPS Instruction Set
The operational mode of the MIPS crypto processor is controlled by a RESET signal. When the RESET signal is at logic 0, the crypto processor is in reset mode and the processing unit writes the memory and register contents using the 32-bit bidirectional data bus, the 10-bit address bus and four control signals. The keys are kept in the key registers of the register file of the crypto processor, where they are available to the other stages of the processor. The MIPS instruction set is straightforward, like other RISC designs. MIPS is a load/store architecture, which means that only load and store instructions access memory. Other instructions can only operate on values in registers. Generally, the MIPS instructions can be divided into three classes: memory-reference instructions, arithmetic-logical instructions and branch instructions. There are also three different instruction formats in the MIPS architecture: R-type, I-type and J-type instructions, as shown in Fig. 2.
Figure 2. MIPS Instruction Type
The MIPS instruction fields are described in Table 1. Three new instructions support encrypted and decrypted operation: load key upper word (LKUW), load key lower word (LKLW) and encryption mode (CRYPT). These instructions use arbitrarily chosen opcodes in the hardware implementation. LKLW and LKUW are I-type instructions and variants of load word (LW). These two instructions do not need to specify a destination register in the assembly code. CRYPT is a J-type instruction; instead of an address it takes a single argument, a Boolean value that enables or disables the encryption and decryption process. Any nonzero value enables the encryption/decryption process and a zero value disables it.
Table 1. MIPS instruction fields
op is a 6-bit operation code
rs is a 5-bit source register specifier
rt is a 5-bit target (source/destination) register or branch condition
immediate is a 16-bit immediate, branch displacement or address displacement
target is a 26-bit jump target address
rd is a 5-bit destination register specifier
shamt is a 5-bit shift amount
funct is a 6-bit function field
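As an illustration of how the fields in Table 1 are packed into a 32-bit word, the following sketch encodes the three instruction formats. The opcodes for `addi` and `add` are the standard MIPS values; the opcodes `OP_LKLW`, `OP_LKUW` and `OP_CRYPT` are purely hypothetical, since the paper states only that the new instructions use arbitrarily chosen opcodes.

```python
def encode_r(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """R-type: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op: int, rs: int, rt: int, imm: int) -> int:
    """I-type: op(6) rs(5) rt(5) immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(op: int, target: int) -> int:
    """J-type: op(6) target(26)."""
    return (op << 26) | (target & 0x3FFFFFF)


def decode_fields(word: int) -> dict:
    """Split a 32-bit word into every field; the format decides which ones apply."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "immediate": word & 0xFFFF,
        "target": word & 0x3FFFFFF,
    }


# Hypothetical opcodes for the new instructions (NOT taken from the paper).
OP_LKLW, OP_LKUW, OP_CRYPT = 0x3C, 0x3D, 0x3E

addi_word = encode_i(0x08, 0, 1, 104)         # addi $r1, $r0, 104
add_word = encode_r(0, 2, 2, 5, 0, 0x20)      # add  $r5, $r2, $r2
lklw_word = encode_i(OP_LKLW, 1, 0, 0)        # lklw 0($r1), no destination register
crypt_word = encode_j(OP_CRYPT, 1)            # crypt 1, nonzero argument enables
```

In this sketch the unused rt field of LKLW/LKUW is simply left at zero, and the Boolean argument of CRYPT occupies the 26-bit target field; both choices are assumptions.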
IMPLEMENTATION METHODOLOGY
Current applications demand high-speed processors for transmitting large amounts of data in real time. Compared with software alternatives, hardware implementations provide highly secure algorithms and fast solutions for high-performance applications. Software approaches could be a good choice, but they have limitations such as low performance and speed. The main advantages of software are low cost and short time to market, but software is unacceptable in terms of high speed and performance specifications. Hardware alternatives are therefore selected for implementing the MIPS crypto processor architecture.
Hardware implementation supports both Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs) at high data rates. Such designs have high performance but are more time-consuming and expensive than software alternatives. The detailed comparison of hardware and software solutions for implementing the MIPS crypto processor architecture is shown in Table 2. Based on the comparison, the hardware solution is the better choice in most cases because of its high performance. Among the hardware alternatives, the main advantage of FPGAs is their low density and low area consumption.
Logic integration, size and density are the major drawbacks of ASICs, but they offer higher performance than FPGAs.
We use a T-DES crypto core that supports both encryption and decryption. The T-DES core has a 64-bit plaintext input, three 64-bit key inputs, a start signal, an encryption/decryption enable signal and a 64-bit ciphertext output. The 32-bit encrypted processor packs two 32-bit MIPS instructions into a single 64-bit instruction block for the DES encryption/decryption process and breaks the block out into the individual instructions in hardware. In this processor, some unencrypted instructions are stored in data memory as zero-padded 64-bit words. The program counter increments by 8 instead of 4 because instructions are loaded in 64-bit blocks. Both the data memory and the instruction memory read 64 bits at a time.
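The packing of two 32-bit instructions into one 64-bit block, and the resulting PC step of 8, can be sketched as follows. The placement of the first instruction in the upper half of the block and the zero padding of an odd-length program are assumptions made for this illustration; the paper does not specify either.

```python
def pack_pair(first: int, second: int) -> int:
    """Pack two 32-bit instructions into one 64-bit block (first in the upper half)."""
    return ((first & 0xFFFFFFFF) << 32) | (second & 0xFFFFFFFF)


def unpack_pair(block: int) -> tuple:
    """Break a 64-bit block back out into its two 32-bit instructions."""
    return (block >> 32) & 0xFFFFFFFF, block & 0xFFFFFFFF


def fetch_blocks(program: list):
    """Yield (pc, block) pairs; the PC advances by 8 bytes per 64-bit block."""
    if len(program) % 2:
        program = program + [0x00000000]  # pad with an all-zero word (a MIPS NOP)
    pc = 0
    for i in range(0, len(program), 2):
        yield pc, pack_pair(program[i], program[i + 1])
        pc += 8


blocks = list(fetch_blocks([0x20010068, 0x00422820, 0x00000001]))
```

Each 64-bit block is then the unit handed to the cipher core, which is why the fetch stage steps through memory in 8-byte increments.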
Table 2. Hardware vs. software alternatives for crypto processor
Before an encrypted instruction is fetched, the key must be loaded from memory, and an instruction dependency may arise that causes a hazard. To overcome this dependency, we use NOP instructions, which deactivate the control signals of the current instruction. The forwarding unit is also modified to handle the load key instructions. It is sufficient to insert NOPs between the load key instructions and the CRYPT instruction. This is illustrated by an example.
Example:
      addi $r1, $r0, 104    # base address of the key data
      lklw 0($r1)           # load key, lower word
      addi $r1, $r1, 8
      lkuw 0($r1)           # load key, upper word
      addi $r2, $r1, 8
      lklw 0($r2)
      addi $r2, $r2, 8
      lkuw 0($r2)
      addi $r3, $r0, 8
      lklw 0($r3)
      addi $r3, $r3, 8
      lkuw 0($r3)
      nop                   # wait for the keys to be loaded
      nop
      crypt 1               # enable encryption/decryption
      addi $r1, $r0, 7
      add  $r2, $r0, $r0
      addi $r3, $r0, 0
      addi $r4, $r0, 0
Loop: add  $r5, $r2, $r2
      add  $r5, $r5, $r5
      add  $r5, $r5, $r5
      add  $r5, $r5, $r3
      lw   $r6, 0($r5)
      add  $r4, $r4, $r6
      addi $r2, $r2, 1
      slt  $r7, $r2, $r1
      j    Loop
Exit: sw   $r4, 56($r0)     # store the result at memory location 56
The example shows MIPS pipeline instructions in assembly. The first instruction loads the base address of the key. The second instruction loads the lower word of the key from the address held in the register set by the preceding instruction. The third instruction increments the base address of the key. The fourth instruction loads the upper word of the key. Similarly, the next eight instructions load the remaining key data into their registers. The next two instructions are NOPs, which provide a delay of two clock cycles for the keys to be loaded. The CRYPT 1 instruction enables the encryption process. The remaining instructions form a simple MIPS program whose output is stored at memory location 56. All MIPS instructions after CRYPT are encrypted if used in the encrypted MIPS processor and decrypted if used in the decrypted MIPS processor.
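The arithmetic of the program after CRYPT can be traced with a short register-level model. Two things here are assumptions and not taken from the paper: the array values (seven words of value 8, chosen only so that the sum equals the 0x38 reported later) and the exit test on `$r7`, since the listing shows no conditional branch between `slt` and `j Loop`.

```python
def sum_array(memory: dict, count: int = 7, base: int = 0) -> int:
    """Register-level trace of the summation loop in the example program."""
    r1, r2, r3, r4 = count, 0, base, 0   # element count, index, base address, sum
    while True:
        r5 = r2 + r2                     # index * 2
        r5 = r5 + r5                     # index * 4
        r5 = r5 + r5                     # index * 8: one 64-bit word per element
        r5 = r5 + r3                     # add the base address
        r6 = memory[r5]                  # lw  $r6, 0($r5)
        r4 = r4 + r6                     # accumulate the sum
        r2 = r2 + 1                      # next index
        r7 = 1 if r2 < r1 else 0         # slt $r7, $r2, $r1
        if r7 == 0:                      # ASSUMED exit branch, absent from the listing
            break
    memory[56] = r4                      # sw  $r4, 56($r0)
    return r4


# Hypothetical data memory: seven words at addresses 0, 8, ..., 48.
data = {addr: 8 for addr in range(0, 56, 8)}
total = sum_array(data)
```

The three successive doublings compute the byte offset without a shift or multiply instruction, and the accessed addresses 0 to 48 fall inside the 0 to 55 range the paper gives for the data memory contents.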
EXPERIMENTAL RESULTS
All pipeline processor stages are modelled in VHDL. The syntax of the RTL design is checked using the Xilinx tool. For functional verification of the design, the MIPS processor is modelled in a hardware description language. The design is verified at both the block level and the top level. Test cases for the block level are generated in VHDL in both directed and random ways. The result shows the corresponding symbol and architectural body in the RTL view. For top-level verification, assembly programs are written and the corresponding hex code from the assembler is fed to both the RTL design and the model; the checker module captures and compares the signals from both and displays a message on any mismatch of digital values.
The complete design, along with all timing constraints, area utilization and optimization options, is described in the synthesis report. The design has been synthesized targeting a 40 nm triple-oxide process technology using a Xilinx Virtex-6 FPGA (xc6vlx240t-3ff1156). The Virtex-6 family is the latest and fastest FPGA family, and aims to provide up to 15% lower dynamic and static power and 15% improved performance compared with the previous generation. There is a trade-off between maximum clock frequency and area utilization (number of slice LUTs), because the basic programmable part of the FPGA is the slice, which contains four LUTs (look-up tables) and eight flip-flops. Some slices can use their LUTs as distributed RAM.
Different capabilities and features of VHDL lead to various implementations of the design in terms of performance and speed. All simulation results are obtained with the Xilinx ISE tool, using the test bench waveform generator. All individual waveforms of both the encrypted and decrypted MIPS processors are simulated for the Virtex-6 FPGA device.
Encrypted MIPS Processor Result
Fig. 4 shows the default encrypted instruction memory contents (imemcontents) inside the instruction memory. These encrypted imemcontents values occupy memory addresses 0 to 287. The default decrypted data memory contents (dmemcontents) are held inside the data memory and occupy memory addresses 0 to 55. The three 64-bit unencrypted key values are stored at addresses 104 to 151 inside the data memory. The first two key values are zero, and the third key value is KIRATPAL (a string of 8 ASCII characters), i.e. 0x4b4952415450414c in hexadecimal, stored inside the data memory at addresses 136 to 151. The figure also shows the resultant waveform generated by the 32-bit encrypted MIPS processor. The inputs are a clock with a 4 ps period and an active-high reset, which initializes all processor subunits to zero. After one 4 ps clock period, with reset low, all encrypted instructions are loaded and executed. With an input clock of 16232 ps and reset low, the resultant encrypted output is obtained as cipher data. The output register values are shown at 2688 ps, after all encrypted instructions from the instruction memory have been executed. The plaintext (input data) is stored in register (4), i.e. 0x00000038, and the output cipher data value is stored at memory address 56 (decimal). Before 16232 ps the memory location is empty, as shown in this figure. The resultant value is obtained after 16232 ps, as shown in Fig. 8 above. The lower byte of the encrypted cipher data is stored at memory location 56 (decimal) and the upper byte at 63 (decimal). Hence, a total 64-bit cipher data value is obtained.
Input/plain text 0x00000038
Key1 0x0000000000000000
Key2 0x0000000000000000
Key3 0x4b4952415450414c
Output/cipher text 0x2542b17039a61551
Figure 3. Input and output values of encrypted MIPS processor and their resultant waveform
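One property of the key values used in this test is worth illustrating. If the T-DES core follows the common encrypt-decrypt-encrypt (EDE) composition, which is an assumption here because the paper does not state the order of its three stages, then identical Key1 and Key2 cancel each other and the result equals a single encryption under Key3. The sketch below demonstrates this with `toy_enc`/`toy_dec`, an invertible stand-in transform that is not DES and offers no security; it will not reproduce the cipher text reported above.

```python
MASK = (1 << 64) - 1


def rotl(x: int, n: int) -> int:
    """Rotate a 64-bit value left by n bits."""
    return ((x << n) | (x >> (64 - n))) & MASK


def rotr(x: int, n: int) -> int:
    """Rotate a 64-bit value right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK


def toy_enc(block: int, key: int) -> int:
    """Stand-in for single DES encryption (NOT DES, illustration only)."""
    return (rotl(block ^ key, 13) + key) & MASK


def toy_dec(block: int, key: int) -> int:
    """Exact inverse of toy_enc."""
    return rotr((block - key) & MASK, 13) ^ key


def ede_encrypt(p: int, k1: int, k2: int, k3: int) -> int:
    """Assumed EDE order: encrypt with k1, decrypt with k2, encrypt with k3."""
    return toy_enc(toy_dec(toy_enc(p, k1), k2), k3)


def ede_decrypt(c: int, k1: int, k2: int, k3: int) -> int:
    """Inverse of ede_encrypt: decrypt with k3, encrypt with k2, decrypt with k1."""
    return toy_dec(toy_enc(toy_dec(c, k3), k2), k1)


K1, K2, K3 = 0x0, 0x0, 0x4B4952415450414C   # key values listed in the paper
cipher = ede_encrypt(0x38, K1, K2, K3)
```

Under that assumption, a test with Key1 equal to Key2 exercises the data path of all three stages but provides only single-key strength, so distinct keys would be needed to evaluate the full triple-key mode.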
Decrypted MIPS Processor Result
Fig. 5 shows the default encrypted instruction memory contents (imemcontents) inside the instruction memory. These encrypted imemcontents values occupy memory addresses 0 to 287. The default decrypted data memory contents (dmemcontents) are held inside the data memory and occupy memory addresses 0 to 55. The three 64-bit unencrypted key values are stored at addresses 104 to 151 inside the data memory. The first two key values are zero, and the third key value is KIRATPAL (a string of 8 ASCII characters), i.e. 0x4b4952415450414c in hexadecimal, stored inside the data memory at addresses 136 to 151. The figure also shows the resultant waveform generated by the 32-bit decrypted MIPS processor. The inputs are a clock with a 4 ps period and an active-high reset, which initializes all processor subunits to zero. After one 4 ps clock period, with reset low, all encrypted instructions are loaded and executed. With an input clock of 16232 ps and reset low, the resultant encrypted output is obtained as cipher data. The output register values are shown at 16232 ps, after all encrypted instructions from the instruction memory have been executed. The plaintext (input data) is stored in register (4), i.e. 0x00000038. The output cipher data value is stored at memory address 56 (decimal). Before 16232 ps the memory location is empty, as shown in this figure. The resultant value is obtained after 16232 ps, as shown in the figure. The lower byte of the encrypted cipher data is stored at memory location 56 (decimal) and the upper byte at 63 (decimal). Hence, a total 64-bit cipher data value is obtained. The decrypted MIPS processor found the correct value for the sum of the array, i.e. 0x00000038, which is stored in register (4).
Input/cipher text 0x00000038
Key1 0x0000000000000000
Key2 0x0000000000000000
Key3 0x4b4952415450414c
Output/plain text 0x2c824fe86704fd6e
Figure 4. Input and output values of decrypted MIPS processor and their resultant waveform
4.2 Verification and Synthesis
The design is verified at both the block level and the top level. As system verification, we successfully execute an encrypted MIPS program and T-DES encryption and decryption. Test cases for the block level are generated in VHDL in both directed and random ways.
The synthesis and mapping results of the encrypted MIPS pipeline processor design are summarized in Table 3. The speed performance of the processor is affected by the hardware (i.e. the clock rate), the instruction set and the compiler. The timing report of the encrypted and decrypted MIPS processors shows that our processor works with a 4.77 ns clock period at synthesis level and an 8.116 ns clock period at simulation level. The synthesis report shows the area utilization and timing summary. The area utilization is the same for both the encrypted and the decrypted MIPS processor. Both processors can work at 123 MHz (at simulation level) while fully executing all instructions.
Table 3. Synthesis report
Target FPGA Device Virtex-6 (XC6vlx240t-3ff1156)
Process Technology 40nm
Optimization Goal Speed
Max. operating frequency (hardware) 209MHz (synthesis level)
Max. operating frequency (software) 123MHz (simulation level)
Number of slice registers 10411
Number of slice LUTs 69148
Number of fully used LUT flip
Number of bonded IOBs 598
Instruction throughput Latency
Key Length Data Length
21 cycles per instruction 64-bits
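The reported clock periods and operating frequencies are related by f = 1/T. The short check below converts the two periods quoted in the paper (4.77 ns at synthesis level, 8.116 ns at simulation level) to megahertz; the function name is introduced only for this illustration.

```python
def period_ns_to_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in megahertz."""
    return 1000.0 / period_ns


synthesis_mhz = period_ns_to_mhz(4.77)     # about 209.6 MHz
simulation_mhz = period_ns_to_mhz(8.116)   # about 123.2 MHz
```

Truncated to whole megahertz, these values match the 209 MHz and 123 MHz figures quoted for the design.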
We have proposed a high-performance 32-bit encrypted and decrypted MIPS processor based on the Triple Data Encryption Standard. It executes encrypted/decrypted instructions, reads and decrypts encrypted data from the memory unit, and writes encrypted data back to memory. The processor uses a symmetric block cipher that processes 64-bit plaintext blocks with three 64-bit keys and produces 64-bit cipher data. The design has been modelled in VHDL, and functional verification policies have been adopted for it. Optimization and synthesis of the design are carried out on the latest and fastest FPGA device, Virtex-6, which improves performance. Each program instruction is tested with some of the vectors provided by MIPS. The high performance and high flexibility of the crypto processor design make it applicable to various security applications. We conclude that the system implementation reaches a maximum frequency of 209 MHz after synthesis at 40 nm process technology and 123 MHz at simulation level.
Gautham P, Parthasarathy R, Karthi Balasubramanian. 2009, Low-power pipelined MIPS processor design, International Symposium on Integrated Circuits (ISIC 2009), pp. 462-465.
Zulkifli, Yudhanto, Soetharyo and Adinono. 2009, Reduced Stall MIPS architecture using Pre-fetching accelerator, International Conference on Electrical Engineering and Informatics, pp. 611-616, ISBN: 978-1-4244-4913-2, IEEE, Aug. 2009.
Pravin B. Ghewari, Mrs. Jaymala K. Patil, Amit B. Chougule. 2010, Efficient hardware design and implementation of AES cryptosystem, International Journal of Engineering Science and Technology, Vol. 2(3), 2010, pp. 213-219, ISSN: 0975-5462.
D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, 2005.
Pejman Lotfi, Ali-Asghar Salehpour, Amir-Mohammad Rahmani, Ali Afzali-Kusha, and Zainalabedin Navabi. 2011, Dynamic power reduction of stalls in pipelined architecture processors, International Journal of Design, Analysis and Tools for Circuits and Systems, June 2011, Vol. 1, No. 1, pp. 9-15.
Rupali S. Balpande, Rashmi S. Keote. 2011, Design of FPGA based Instruction fetch & decode Module of 32-bit RISC (MIPS) processor, International Conference on Communication Systems and Network Technologies, 2011, pp. 409-413, ISBN: 978-0-7695-4437-3, IEEE.
Saeid Taherkhani, Enver Ever and Orhan Gemikonakli. 2010, Implementation of Non-pipelined and pipelined data encryption standard (DES) using Xilinx Virtex-6 technology, 10th IEEE International Conference on Computer and Information Technology (CIT 2010), pp. 1257-1262.
Z. Navabi. 2007, VHDL: Modular design and synthesis of cores and systems, pp. 283-291, ISBN: 978-0-07-147545-7, McGraw-Hill, 2007.
T. L. Floyd. 2003, Digital Fundamentals with VHDL, pp. 362-368, ISBN: 0-13-099527-4, Pearson Education, 2003.
Xilinx, ISE Simulator, [Online] Available: http://www.xilinx.com/tools/isim.htm.
Xilinx, XST Synthesis, [Online] Available: http://www.xilinx.com/tools/xst.htm.
Xilinx, ISE In-Depth Tutorial, pp. 95-120, Jun. 2009, [Online] Available: http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/ise11tut.pdf.