 Open Access
 Authors : Jivin M, Anas A. S.
 Paper ID : IJERTV3IS20559
 Volume & Issue : Volume 03, Issue 02 (February 2014)
 Published (First Online): 22-02-2014
 ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
FPGA Based RISC and DSP System Design
Jivin M
PG student, VLSI & Embedded Systems, ECE Department TKM Institute of Technology
Karuvelil P.O, Kollam, Kerala-691505, India
Anas A. S.
Assistant professor, ECE Department TKM Institute of Technology
Karuvelil P.O, Kollam, Kerala-691505, India
Abstract: Most microprocessor and microcontroller designs today are based on a Reduced Instruction Set Computer (RISC) core, while operations such as the Discrete Cosine Transform (DCT), Inverse DCT (IDCT), Discrete Fourier Transform (DFT) and Inverse Discrete Fourier Transform (IDFT) are performed by a DSP system. The RISC architecture attempts to reduce execution time by simplifying the instruction set of the computer, and a Digital Signal Processor is a specialized microprocessor with an architecture developed for the fast operational needs of digital signal processing. A RISC and DSP system which can perform Arithmetic, Logic and DSP operations is proposed. The processor uses a 4-bit opcode and can perform 15 different operations, which include Arithmetic, Logic and DSP operations such as DCT, IDCT, DFT and IDFT. The RISC machine fetches an instruction from memory. The instruction is 20 bits wide: bits 0-3 hold the opcode, which decides the operation to be performed, and bits 4-11 and 12-19 represent the registers holding the values used by the instruction. The output is an 8-bit value. The design is coded in VHDL, synthesized using Xilinx ISE 13.2 and simulated using ISim.
Keywords: RISC, DSP, FPGA, DFT, IDFT, DCT, IDCT, opcode.
1. INTRODUCTION
Reduced Instruction Set Computer (RISC) architectures represent an important innovation in the area of computer organization. This architecture attempts to produce more CPU power by simplifying the instruction set of the CPU. RISC processors use fewer instructions with simple constructs, so the instructions can be executed much faster within the CPU without having to access memory as often. The concept of RISC architecture involves an attempt to reduce execution time by simplifying the instruction set of the computer [4].
Main features of a RISC processor are:

Relatively few instructions.

Most instructions are register based.

Relatively few addressing modes.
Addressing modes are usually register, direct, register indirect and displacement.

Better compilation.

Fixed-length, easily decoded instruction format.
Fixed-length instructions are easier to decode than variable-length instructions, and use fast, inexpensive memory to execute a larger piece of code. Decoding is simplified because opcode and address fields are located in the same position for all instructions [4].

All operations are done within the registers of the CPU.

Efficient, optimized instruction pipeline.

Better suited to parallel, pipelined and superscalar architectures.

Hardwired control of instructions (as opposed to microcoded instructions).
The most important feature of the RISC instruction format is that it is easy to decode. A RISC processor can execute one instruction per cycle. This is done by overlapping the fetch, decode, and execute phases of two or three instructions, a procedure referred to as pipelining. Instructions have a fixed number of bytes and take a fixed amount of time to execute [13]. Because of its reduced instruction set, RISC implements each instruction in a single cycle using hardwired control, with less circuitry and therefore lower power dissipation.
A Digital Signal Processor is a specialized microprocessor with an architecture developed for the fast operational needs of digital signal processing. It can also support the features of an applications processor or microcontroller. DSP operations process continuous signals and data. Digital signal processing is used in many aspects of industry. Examples of applications include speech synthesis, speech recognition, and high-speed modems. The main advantage of digital processing over analog processing is its ability both to process data and to control data based on earlier results [10]. The most important feature of a DSP is its ability to support repetitive and numerically intensive tasks. This ability is used in the calculation of Fourier transforms, multi-filter systems and correlations. The key is the ability to perform a multiply-accumulate operation in a single clock cycle; the multiply-accumulator is integrated into the data path. Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly and repeatedly on a set of data. Signals are constantly converted from analog to digital, manipulated digitally, and then converted back to analog form. Many DSP applications have constraints on latency; that is, for the system to work, the DSP operation must be completed within some fixed time.
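The multiply-accumulate operation mentioned above can be illustrated with a minimal Python sketch. The function name `mac_dot` and the sample values are assumptions chosen for illustration; in a DSP each pass through the loop body corresponds to one hardware multiply-accumulate step.

```python
def mac_dot(samples, coeffs):
    """Accumulate the products of paired samples and coefficients."""
    acc = 0
    for x, c in zip(samples, coeffs):
        acc += x * c  # one multiply-accumulate (MAC) step
    return acc


# Hypothetical 4-tap example: one output sample of an FIR filter.
result = mac_dot([1, 2, 3, 4], [4, 3, 2, 1])
print(result)
```

The same loop structure underlies filtering, correlation and transform calculations, which is why a single-cycle MAC matters for latency-constrained applications.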
1.1 Project Overview
Most microprocessor and microcontroller designs are based on a Reduced Instruction Set Computer (RISC) core, while operations such as the Discrete Cosine Transform (DCT), Inverse DCT, Discrete Fourier Transform (DFT) and Inverse Discrete Fourier Transform (IDFT) are performed by a DSP system. The RISC architecture attempts to reduce execution time by simplifying the instruction set of the computer, and a Digital Signal Processor is a specialized microprocessor with an architecture developed for the fast operational needs of digital signal processing. The project focuses on developing a 20-bit RISC and DSP system, described in VHDL, which can perform Arithmetic, Logic and DSP operations.
2. LITERATURE SURVEY
The microprocessor is a semiconductor device (integrated circuit) manufactured by the VLSI (Very Large Scale Integration) technique. The basic functional blocks of a microprocessor are the ALU, register array, Program Counter (PC), instruction decoding unit and control unit. The basic block diagram of a microprocessor is shown in Figure: 2.1. The ALU is the computational unit of the microprocessor, which performs arithmetic and logical operations on binary data [14]. The register array is the internal storage device and so is also called internal memory. The input data for the ALU, the output data of the ALU (results of computations) and any other binary information needed for processing are stored in the register array. Every microprocessor has a set of instructions defined by its manufacturer. To do any useful work with the microprocessor, a program must be written using these instructions and stored in a memory device external to the microprocessor [13]. The instruction pointer generates the address of the instruction to be fetched and sends it to the memory through the address bus. The memory sends the instruction codes and data through the data bus. The instruction codes are decoded by the decoding unit, which sends information to the timing and control unit. The data is stored in the register array for processing by the ALU. The control unit generates the necessary control signals for the internal and external operations of the microprocessor.
The RISC ideas were developed mostly in the early 1980s and became popular in the second half of that decade. RISC architectures came, in part, as a reaction to the direction that computer architecture had taken in the 1970s.
Figure: 2.1 Basic block diagram of a microprocessor (blocks: ALU, Register Array, Instruction Decoding Unit, Timing and Control Unit, Program Counter; buses: Data Bus, Address Bus, Control Bus)
RISC is considered to be the basis for designing high-performance processors at almost any price level. Reduced Instruction Set Computer (RISC) architectures represent an important innovation in the area of computer organization [13]. This architecture attempts to produce more CPU power by simplifying the instruction set of the CPU. Reduced instruction set computing is a CPU design strategy based on the insight that simplified (as opposed to complex) instructions can provide higher performance if this simplicity enables much faster execution of each instruction.
Figure: 2.2 Instruction cycle in RISC system
The opposing architecture is called the complex instruction set computer (CISC). In a RISC system, complex operations are executed as a sequence of simple instructions, whereas in a CISC system they are executed as one or a few complex instructions [4]. An instruction cycle (fetch-decode-execute cycle) is the basic operation cycle of a computer. It is the process by which a computer retrieves a program instruction from its memory, determines what actions the instruction requires, and carries out those actions. This cycle is repeated continuously by the central processing unit (CPU) [5].
A Digital Signal Processor is a specialized microprocessor with an architecture developed for the fast operational needs of digital signal processing [9].
Main features of a DSP processor are:

Special arithmetic operations, such as multiply-accumulate (MAC).

Performs DCT (Discrete Cosine Transform) and IDCT (Inverse Discrete Cosine Transform).

Performs FFT (Fast Fourier Transform) and IFFT (Inverse Fast Fourier Transform).

Digital signal processing can be implemented on general-purpose computers or on embedded processors that may or may not include specialized microprocessors called digital signal processors.

Uses VLIW (Very Long Instruction Word) techniques so that each instruction drives multiple arithmetic units in parallel.
Digital signal processing (DSP) is the mathematical manipulation of an information signal to modify or improve it in some way. It is characterized by the representation of discrete-time, discrete-frequency, or other discrete-domain signals by a sequence of numbers or symbols and the processing of these signals. The goal of DSP is usually to measure, filter and/or compress continuous real-world analog signals. A typical DSP system block diagram is shown in Figure: 2.3. The first step is usually to convert the signal from analog to digital form, by sampling and then digitizing it using an analog-to-digital converter (ADC), which turns the analog signal into a stream of numbers. Often, however, the required output is another analog signal, which requires a digital-to-analog converter (DAC) [9]. Even though this process is more complex than analog processing and has a discrete value range, the application of computational power to digital signal processing gives it many advantages over analog processing in many applications, such as error detection and correction in transmission as well as data compression.
Figure: 2.3 Block diagram of DSP System (signal path: unfiltered analog signal, ADC, sampled digitized signal, PROCESSOR, digitally filtered signal, DAC, filtered analog signal)
Signal processing is a method of extracting information from a signal, which in turn depends on the type of signal and the nature of the information it carries. Therefore, signal processing is concerned with representing signals in mathematical terms and extracting the information by carrying out algorithmic operations on the signal. A signal can be mathematically expressed in terms of basic functions in the original domain of the independent variable, or in terms of basic functions in a transformed domain. DSP operations include the discrete Fourier transform, inverse discrete Fourier transform, discrete cosine transform, inverse discrete cosine transform, etc. [1]. The Discrete Fourier Transform is a fundamental mathematical operation in digital signal processing. It allows the user to analyse, synthesise and modify signals in a digital environment [2]. A transform converts a signal from one domain to another, that is, from the time domain to the frequency domain or from the frequency domain to the time domain, without loss of information. A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies [12].
3. PROPOSED SYSTEM
Reduced Instruction Set Computer (RISC) architectures represent an important innovation in the area of computer organization. This architecture attempts to produce more CPU power by simplifying the instruction set of the CPU.
Figure: 3.1 shows the block diagram of the RISC system [4]. It includes a decoder, a fetch machine, an arithmetic and logic machine, and a register set.
Figure: 3.1 Block Diagram of RISC System (blocks: INSTRUCTION FETCH MACHINE, DECODER, ARITHMETIC LOGIC MACHINE, REGISTER SET; labels: OPCODE, EXECUTION)
The Register Set of this system contains the following registers:

Instruction Register (IR) – holds the current instruction.

Program Counter (PC) – holds the address of the next instruction.
The Instruction Fetch Machine fetches an instruction from external memory, and upon completion of the instruction fetch cycle this machine signals the decoder to decode the instruction. The decoder reads bits 3 down to 0 of the Instruction Register, decides which of the sixteen operations the CPU needs to perform, and signals one of the next states to begin its operation. The ALU can perform Arithmetic and Logic operations based on the opcode obtained by decoding the instruction. The data is taken from two GPRs and is moved to the ALU. The result is stored in a GPR. For operations that involve one operand, a GPR can be specified to store the result.
The RISC system mainly consists of two parts, shown in Figure: 3.2.

Control Unit
A Control Unit coordinates the behavior of the Data Path by issuing appropriate control signals that guarantee the correct sequence of operations. It is typically designed as a single FSM or as several cooperating FSMs.

Data Path
A Data Path is a collection of interconnected modules that perform all the relevant computation on the data; it can use both combinational and sequential components. The Data Path includes Registers, Memory, the ALU, etc. The Data Path is controlled by the control signals generated by the Control Unit.
Figure: 3.2 Control Unit and Data Path

Block Diagram
Figure: 3.3 shows the architecture of the RISC and DSP System, which can perform Arithmetic, Logic and DSP operations. The RISC & DSP System mainly consists of two parts: the control unit and the data path. The Data Path is controlled by the control signals generated by the Control Unit. A data path is a collection of functional units, such as Arithmetic Logic Units or multipliers, that perform data processing operations. It is a central part of many central processing units (CPUs) along with the control unit, which largely regulates interaction between the data path and the data itself, usually stored in registers or main memory.

Components of architecture in RISC & DSP System
The main components of RISC & DSP System architecture are:

Register Set
The Register Set of this system contains Instruction Register (IR), Program Counter (PC) and four General Purpose Registers (R0, R1, R2, R3).
Figure: 3.3 Architecture of RISC & DSP System

Memory (RAM)
This is memory that can be written to as well as read. This type of memory is volatile, meaning that when the computer is turned off anything in it is lost. RAM is used to store data while the computer is turned on. It is closer to the processor than other types of memory and is therefore faster. RAM stores the data relating to programs and files that are currently open on the computer, including software and user files, so that they can be accessed quickly. The two main types of RAM are static RAM and dynamic RAM.

Multiplexer
A multiplexer (or MUX) is a circuit that accepts inputs from several different channels and forwards one of them at a time, chosen by its select lines, to a single output channel.
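A 4-to-1 selection, such as might be used to choose among the four general purpose registers, can be sketched in Python as follows. The name `mux4` and the 2-bit select width are assumptions made for illustration and are not taken from the design.

```python
def mux4(inputs, select):
    """Return the input chosen by a 2-bit select value (0 to 3)."""
    if len(inputs) != 4:
        raise ValueError("mux4 expects exactly four inputs")
    return inputs[select & 0b11]  # only the two low select bits are used


# Hypothetical register contents R0..R3; select = 10 picks R2.
chosen = mux4([0x81, 0xE1, 0x09, 0xFF], 0b10)
print(format(chosen, "08b"))
```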

Arithmetic Logic Unit (ALU)

An arithmetic and logic unit is a digital circuit that performs arithmetic and logic operations. The ALU is the fundamental building block of the central processing unit of a computer. Arithmetic operations include addition, subtraction and shifting, and logical operations include Boolean operations such as AND, OR, XOR, NOT, NAND, NOR and XNOR. Besides adding and subtracting numbers, ALUs often handle the multiplication of two integers, since the result is also an integer.

Working

Fetching the Instruction
The instruction is fetched from the memory using the address currently stored in the program counter (PC), and is stored in the instruction register (IR). At the end of the fetch operation, the PC points to the next instruction, which will be read in the next cycle.

Decode the instruction
The decoder interprets the instruction. During this cycle the instruction inside the IR (instruction register) gets decoded.

Execute the instruction
The control unit of the CPU passes the decoded information as a sequence of control signals to the relevant function units of the CPU to perform the actions required by the instruction, such as reading values from registers, passing them to the ALU to perform mathematical or logic functions on them, and writing the result back to a register. If the ALU is involved, it sends a condition signal back to the Control Unit. Initially the Program Counter is at 00000000. During the fetch machine cycle, the instruction is fetched from memory location 00000000. The Program Counter is incremented by 1, becoming 00000001, and the fetched instruction is stored in the Instruction Register. This is the fetch cycle. The instruction stored in the Instruction Register is then decoded to obtain the opcode, which is given to the ALU. On the basis of the opcode, the ALU can perform Arithmetic, Logic and DSP operations. If the opcode denotes a DSP operation, the input data is taken from the DSP memory by enabling the RISC_DSP signal.
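The fetch cycle described above can be modelled with a short Python sketch. The class name `FetchUnit`, the 8-bit wrap of the Program Counter and the two memory words are assumptions for illustration; the sketch models only the PC and IR behaviour, not the control signals of the actual VHDL design.

```python
class FetchUnit:
    """Toy model of the fetch cycle: PC addresses memory, IR latches the word."""

    def __init__(self, memory):
        self.memory = memory
        self.pc = 0   # Program Counter starts at 00000000
        self.ir = 0   # Instruction Register

    def fetch(self):
        self.ir = self.memory[self.pc]   # read the word addressed by PC
        self.pc = (self.pc + 1) & 0xFF   # PC now points to the next word
        return self.ir


unit = FetchUnit([0b00001111, 0b10000001])
first = unit.fetch()
print(format(first, "08b"), format(unit.pc, "08b"))
```

After one call to `fetch`, the Instruction Register holds the first word and the Program Counter has advanced by one, matching the sequence in the text.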


Instruction Format
The RISC machine fetches an instruction from the memory. Each instruction is decoded by the internal decoder, and each instruction is 8 bits wide. Bits 0 to 3 hold the opcode, which decides the operation to be performed. Figure: 3.4 shows the instruction format for the RISC processor.
Field       Destination Register   Source Register   Opcode
Bits        7 to 6                 5 to 4            3 to 0
Width       2 bits                 2 bits            4 bits
Figure: 3.4 Instruction Format
Bits 4 to 5 represent the Source Register and bits 6 to 7 represent the Destination Register. The Source Register and Destination Register are selected as follows:
IF 00, R0 is selected
IF 01, R1 is selected
IF 10, R2 is selected
IF 11, R3 is selected
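The field layout of the 8-bit instruction can be expressed as a small Python sketch. The function name `decode` is an assumption for illustration; the bit positions follow the instruction format given above.

```python
def decode(instruction):
    """Split an 8-bit instruction into (destination, source, opcode)."""
    dest = (instruction >> 6) & 0b11   # bits 7..6 select the destination register
    src = (instruction >> 4) & 0b11    # bits 5..4 select the source register
    opcode = instruction & 0b1111      # bits 3..0 hold the opcode
    return dest, src, opcode


dest, src, opcode = decode(0b00010000)
print(f"R{dest}", f"R{src}", format(opcode, "04b"))
```

For the instruction 00010000 this yields destination R0, source R1 and opcode 0000.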
The instruction format for output is shown in Figure: 3.5.
Figure:3.5 Instruction Format for output

Instruction Sets
This RISC & DSP System performs 16 operations: 11 Arithmetic and Logic operations, 4 DSP operations and a READ operation. The processor uses a 4-bit opcode to select among them. Table: 3.1 shows the instruction set for the RISC & DSP system.
Table: 3.1 Instruction Sets for RISC & DSP System
Instruction    Opcode
OR             0000
AND            0001
NAND           0010
NOR            0011
XOR            0100
XNOR           0101
ADD            0110
SUBTRACT       0111
NOT            1000
INCREMENT      1001
DECREMENT      1010
FFT            1011
IFFT           1100
DCT            1101
IDCT           1110
READ           1111
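The arithmetic and logic opcodes in Table 3.1 can be modelled in Python as a lookup table. This is a minimal sketch under stated assumptions: the 8-bit data width, the wrap-around on overflow, the operand order of SUBTRACT (`a - b`) and the use of the first operand for single-operand instructions are all assumptions, as are the names `ALU_OPS` and `alu`.

```python
MASK = 0xFF  # assumed 8-bit data width

# Opcode values follow Table 3.1; operand conventions are assumptions.
ALU_OPS = {
    0b0000: lambda a, b: a | b,      # OR
    0b0001: lambda a, b: a & b,      # AND
    0b0010: lambda a, b: ~(a & b),   # NAND
    0b0011: lambda a, b: ~(a | b),   # NOR
    0b0100: lambda a, b: a ^ b,      # XOR
    0b0101: lambda a, b: ~(a ^ b),   # XNOR
    0b0110: lambda a, b: a + b,      # ADD
    0b0111: lambda a, b: a - b,      # SUBTRACT
    0b1000: lambda a, b: ~a,         # NOT
    0b1001: lambda a, b: a + 1,      # INCREMENT
    0b1010: lambda a, b: a - 1,      # DECREMENT
}


def alu(opcode, a, b=0):
    """Apply an arithmetic or logic opcode and wrap the result to 8 bits."""
    return ALU_OPS[opcode](a, b) & MASK


print(format(alu(0b0000, 0b10000001, 0b11100001), "08b"))
```

The DSP opcodes (FFT, IFFT, DCT, IDCT) and READ are deliberately left out of this table, since they are not simple two-operand functions.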

DSP OPERATIONS
The important DSP operations performed by this system are the Fast Fourier Transform (FFT), Inverse FFT (IFFT), Discrete Cosine Transform (DCT) and Inverse DCT (IDCT).

Fast Fourier Transform (FFT) & Inverse FFT
The Discrete Fourier Transform (DFT) plays an important role in the analysis, design and implementation of discrete-time signal processing algorithms and systems. It is used to convert samples in the time domain to the frequency domain. The Fast Fourier Transform (FFT) is simply a fast (computationally efficient) way to calculate the DFT.
$$X(K) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n K / N}, \qquad K = 0, 1, \ldots, N-1 \qquad (3.1)$$
Table: 5.1 Instruction or Data in various addresses of RAM
RAM Address    Instruction or Data in That Address
0000           00001111
0001           10000001
0010           01001111
0011           11100001
0100           10001111
0101           00001001
0110           11001111
0111           11111111
1000           00010000
1001           10110001
1010           01000010
1011           11000011
1100           11100001
The direct evaluation of X(K) requires N^2 complex multiplications and N(N-1) complex additions. In order to reduce this complexity, the radix-2 DIT FFT algorithm is used. The flow graph of the 8-point radix-2 DIT FFT algorithm is shown in Figure: 3.6.
Figure: 3.6 Flow graph of 8-point radix-2 DIT FFT algorithm
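The decimation-in-time idea can be illustrated with a short recursive Python sketch. It is a software formulation under assumptions (the function name `fft_radix2` and the recursive structure are illustrative) and is not the hardware butterfly network of the design, although each loop iteration computes one butterfly of the flow graph.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])   # FFT of the even-indexed samples
    odd = fft_radix2(x[1::2])    # FFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle            # butterfly, upper output
        out[k + n // 2] = even[k] - twiddle   # butterfly, lower output
    return out


spectrum = fft_radix2([0, 1, 2, 3, 4, 5, 6, 7])
for value in spectrum:
    print(f"({value.real:.2f}, {value.imag:.2f})")
```

Splitting the input into even and odd halves at every level is what reduces the work from the order of N^2 to the order of N log2 N complex multiplications.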
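In a communication system that uses an FFT algorithm there is also a need for an IFFT algorithm to compute the IDFT. The IDFT of an N-point sequence X(K), K = 0, 1, 2, ..., N-1 is defined as
$$x(n) = \frac{1}{N} \sum_{K=0}^{N-1} X(K)\, e^{j 2\pi n K / N}, \qquad n = 0, 1, \ldots, N-1$$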
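For comparison with the fast algorithm, the DFT of equation (3.1) and its inverse can be evaluated directly from their definitions. The sketch below assumes the standard IDFT with 1/N scaling; the names `dft` and `idft` and the test signal are illustrative.

```python
import cmath


def dft(x):
    """Direct N-point DFT: N complex multiplications per output bin."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def idft(big_x):
    """Direct N-point IDFT, including the 1/N scaling."""
    n_points = len(big_x)
    return [
        sum(big_x[k] * cmath.exp(2j * cmath.pi * n * k / n_points)
            for k in range(n_points)) / n_points
        for n in range(n_points)
    ]


signal = [0, 1, 2, 3]
recovered = idft(dft(signal))
# The round trip should reproduce the input up to rounding error.
print(all(abs(r - s) < 1e-9 for r, s in zip(recovered, signal)))
```

Each of the N outputs needs N multiplications, which is the N^2 cost that the radix-2 algorithm avoids.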

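Discrete Cosine Transform (DCT) & Inverse DCT
A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies.
The N-point 1-D DCT is defined as:
$$X(k) = \alpha(k) \sum_{n=0}^{N-1} x(n) \cos\left[\frac{(2n+1) k \pi}{2N}\right], \qquad k = 0, 1, \ldots, N-1$$
The N-point 1-D IDCT is defined as:
$$x(n) = \sum_{k=0}^{N-1} \alpha(k)\, X(k) \cos\left[\frac{(2n+1) k \pi}{2N}\right], \qquad n = 0, 1, \ldots, N-1$$
where $\alpha(k) = \sqrt{1/N}$ for k = 0 and $\alpha(k) = \sqrt{2/N}$ for k ≠ 0.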
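A direct evaluation of the DCT and IDCT can be sketched in Python. The sketch assumes the commonly used orthonormal DCT-II convention, with a scaling factor of sqrt(1/N) for k = 0 and sqrt(2/N) otherwise; the function names `alpha`, `dct` and `idct` are illustrative, and the convention used in the actual design may differ.

```python
import math


def alpha(k, n_points):
    """Orthonormal scaling factor (assumed convention)."""
    return math.sqrt(1.0 / n_points) if k == 0 else math.sqrt(2.0 / n_points)


def dct(x):
    """Direct N-point 1-D DCT (DCT-II, orthonormal scaling)."""
    n_points = len(x)
    return [
        alpha(k, n_points) * sum(
            x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_points))
            for n in range(n_points))
        for k in range(n_points)
    ]


def idct(big_x):
    """Direct N-point 1-D IDCT, the inverse of dct() above."""
    n_points = len(big_x)
    return [
        sum(alpha(k, n_points) * big_x[k]
            * math.cos((2 * n + 1) * k * math.pi / (2 * n_points))
            for k in range(n_points))
        for n in range(n_points)
    ]


data = [1.0, 2.0, 3.0, 4.0]
restored = idct(dct(data))
print(all(abs(r - d) < 1e-9 for r, d in zip(restored, data)))
```

With this scaling the forward and inverse transforms use the same cosine kernel, so a constant input concentrates all of its energy in the k = 0 coefficient.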

SIMULATION RESULTS
The design entry is modelled using VHDL in Xilinx ISE Design Suite 12.1, and the simulation of the design is performed using ModelSim SE 6.2c to verify the functionality of the design.

Simulation of RISC System
The RISC system performs various Arithmetic and Logic operations based on the data stored in RAM. When reset=0, the program counter becomes 00000000 and all other signals are set to 0. When reset=1 and rd_wb=1, the system starts to work by reading data from the RAM. The program counter (PC) points to position 00000000 of the RAM and the instruction is taken from that position. Here the instruction is 00001111. The program counter is then incremented by 1 and the instruction is stored in the instruction register. The instruction stored in the instruction register is then decoded to obtain the opcode, which here is 1111. The opcode 1111 indicates that the next word in memory is data, which is read and loaded into the corresponding register. After incrementing, the PC becomes 00000001. The PC addresses the memory, the data (10000001) is read, the PC is incremented again (00000010), and the data is loaded into the corresponding register. The register is selected by the first two bits of the instruction, which represent the destination register. Here the first two bits are 00, which selects R0. Then R0=10000001.
Figure: 4.1 Simulation result of READ operation in RISC system.
Simulation results of various Arithmetic and Logic operations performed by the RISC system are shown in Figure: 4.2. Now the PC is 00001000. The PC addresses the RAM and the instruction is taken from that position. Here the instruction is 00010000. The program counter is then incremented by 1 and the instruction is stored in the instruction register. The instruction stored in the instruction register is then decoded to obtain the opcode.
Figure: 4.2(a) Simulation result of OR operation (opcode=0000) in RISC system.
Thus the opcode obtained as a result of decoding is 0000. The opcode 0000 represents the OR operation between two registers, with the result stored in the destination register. In the instruction 00010000, 01 selects the source register (R1) and 00 selects the destination register (R0). Here the OR operation takes place between R1 and R0 and the result is stored in the destination register (R0). The simulation result of the OR operation is shown in Figure: 4.2(a), and the simulation results of AND and ADD are shown in Figure: 4.2(b) and (c).
Figure: 4.2(b) Simulation result of AND operation (opcode=0001) in RISC system
Figure: 4.2(c) Simulation result of ADD operation (opcode=0110) in RISC system
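The READ and OR sequence traced above can be replayed with a small Python model. It is a sketch under assumptions: the first nine words of Table 5.1 are read sequentially as instruction and data pairs, only the READ and OR opcodes are modelled, and the names `RAM` and `run` are illustrative rather than taken from the VHDL design.

```python
# First nine words of the RAM listing in Table 5.1 (addresses 0000 to 1000).
RAM = [
    0b00001111, 0b10000001,  # READ into R0, followed by its data word
    0b01001111, 0b11100001,  # READ into R1, followed by its data word
    0b10001111, 0b00001001,  # READ into R2, followed by its data word
    0b11001111, 0b11111111,  # READ into R3, followed by its data word
    0b00010000,              # OR: R0 = R0 OR R1
]

READ, OR = 0b1111, 0b0000


def run(ram):
    """Execute READ and OR instructions only; other opcodes are rejected."""
    regs = [0, 0, 0, 0]
    pc = 0
    while pc < len(ram):
        ir = ram[pc]                 # fetch
        pc += 1
        dest, src, opcode = (ir >> 6) & 3, (ir >> 4) & 3, ir & 0xF  # decode
        if opcode == READ:
            regs[dest] = ram[pc]     # the next word is data
            pc += 1
        elif opcode == OR:
            regs[dest] = regs[dest] | regs[src]
        else:
            raise NotImplementedError(f"opcode {opcode:04b}")
    return regs


regs = run(RAM)
print([format(r, "08b") for r in regs])
```

Under these assumptions R0 ends up holding the OR of 10000001 and 11100001, while R1, R2 and R3 keep the values loaded by their READ instructions.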


Simulation of 8-point FFT
The RISC & DSP System performs 4 DSP operations: FFT, IFFT, DCT and IDCT. Among these, the 8-point FFT is designed and simulated using the decimation-in-time radix-2 algorithm. To design the 8-point FFT, a 2-point FFT is first designed with the help of the butterfly diagram. A 4-point FFT is then designed using two 2-point FFTs, and so on. The simulation result of the 8-point FFT is shown in Figure: 4.3. The input is given as ((0.0,0.0),(1.0,0.0),(2.0,0.0),(3.0,0.0),(4.0,0.0),(5.0,0.0),(6.0,0.0),(7.0,0.0)) and the output obtained is ((28,0),(-4,9.6),(-4,4),(-4,1.65),(-4,0),(-4,-1.65),(-4,-4),(-4,-9.65)).
Figure: 4.3 Simulation result of 8-point FFT

CONCLUSIONS
This project designs a 20-bit RISC & DSP system which can perform Arithmetic, Logic and DSP operations. The system uses a 4-bit opcode, with which 15 different operations (Arithmetic, Logic and DSP) can be performed. The DSP operations performed by the system are FFT, IFFT, DCT and IDCT. The instruction is 20 bits wide: bits 0-3 hold the opcode, which decides the operation to be performed, and bits 4-11 and 12-19 represent the registers holding the values used by the instruction. The output is an 8-bit value. The submodules of the system are designed and simulated, and by combining these submodules the various Arithmetic, Logic and DSP operations performed by the RISC system are designed and simulated.

REFERENCES

[1] Deepak Kumar, K. Anusudha, "Implementation of DSP System for Discrete Transforms using VHDL", International Journal of Computer Applications (4245), Volume 69, No. 26, May 2013.
[2] Asmita Haveliya, "Design and Simulation of 32-Point FFT Using Radix-2 Algorithm for FPGA Implementation", Second International Conference on Advanced Computing & Communication Technologies, 2012.
[3] Sneha N. Kherde, Meghana Hasamnis, "Efficient Design and Implementation of FFT", International Journal of Engineering Science and Technology (IJEST), ISSN: 0975-5462, NCICT Special Issue, Feb 2011.
[4] Ryszard Gal, Adam Golda, Maciej Frankiewicz, Andrzej Kos, "FPGA implementation of 8-bit RISC Microcontroller for Embedded System", MIXDES, pp. 323-328, 2011.
[5] M. R.S. Balpande, M.R.S. Keota, "Design of FPGA based Instruction Fetch & Decode Module of 32-bit RISC (MIPS) Processor", Proc. ICCSNT, p. 409, 2011.
[6] S. Belkouch, M. El Aakif and A. Ait Ouahman, "Improved Implementation of a Modified Discrete Cosine Transform on Low Cost FPGA", IEEE 5th International Symposium on I/V Communications and Mobile Network, Oct 2010, Rabat, Morocco.
[7] ZiWei Zheng and Zhe Ren, "Efficient Design of Fast Fourier Transform Processor Using FPGA Technology", IEEE International Conference on Electrical and Control Engineering, pp. 5195-5198, Aug 2010, Wuhan, China.
[8] LI Xiaofeng, Chen Long, Wang Shihu, "The Implementation of High-speed FFT processor based on FPGA", IEEE International Conference on Computer, Mechatronics, Control & Electronic Engineering (CMCE), Vol. 2, pp. 236-239, June 2010, Changchun, China.
[9] K. Anand and S. Gupta, "Designing Of Customized Digital Signal Processor", B.T. Thesis, Department of Electrical and Electronics, Indian Institute of Technology, Delhi, 2007.
[10] Jarrod D. Luker and Vinod B. Prasad, "RISC System Design in an FPGA", IEEE Conference Publication, Vol. 2, pp. 532-536, Aug 2001, Dayton, USA.
[11] Raj Kamal, "Architecture, Programming, Interfacing and System Design", Pearson Education, Dorling Kindersley (India), 2007.