- Open Access
- Total Downloads : 724
- Authors : Mr. Ravi H. Bailmare, Mr. Pravin V. Kinge, Prof. S. J. Honade
- Paper ID : IJERTV2IS110887
- Volume & Issue : Volume 02, Issue 11 (November 2013)
- Published (First Online): 30-11-2013
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
A Review paper on Design and Implementation of Adaptive FIR filter using Systolic Architecture
Mr. Ravi H. Bailmare, Student of M. E. Electronics & Telecommunication
Mr. Pravin V. Kinge, Student of M. E. Electronics & Telecommunication
Prof. S. J. Honade, Faculty of Electronics & Telecommunication
Department of Electronics & Telecommunication Engineering, G. H. Raisoni College of Engineering, Amravati, Maharashtra.
The tremendous growth of computer and Internet technology demands that data be processed at high speed and with high computational power. In such a complex environment, conventional methods of performing multiplication are not well suited, and parallel computing is used instead. The DLMS adaptive algorithm approximately minimizes the mean square error by recursively altering the weight vector at each sampling instant. To obtain the minimum mean square error and the updated weight vector efficiently, a systolic architecture is used. A systolic architecture is an arrangement of processors in which data flows synchronously across the array elements.
This project demonstrates an effective design for an adaptive filter using a systolic architecture for the DLMS algorithm, described in Very High Speed Integrated Circuit Hardware Description Language (VHDL), synthesized and simulated with the Xilinx ISE Project Navigator tool, and targeted at Field Programmable Gate Arrays (FPGAs). Combining pipelining and parallel processing in the systolic architecture increases the computing speed.
Keywords- Systolic Architecture, DLMS algorithm, VHDL, FPGA, Xilinx ISE
[I] INTRODUCTION
In computer architecture, a systolic architecture is a pipelined network arrangement of Processing Elements (PEs) called cells. It is a specialized form of parallel computing in which each cell independently computes on the data arriving at its input and stores the result. A systolic architecture is a network of PEs that rhythmically compute and pass data through the system; the PEs regularly pump data in and out so that a regular flow of data is maintained. As a result, systolic arrays feature modularity and regularity, which are important properties for VLSI design. A systolic array may be used as a coprocessor in combination with a host computer: data received from the host passes through the PEs and the final result is returned to the host computer (see Fig. 1).
To meet the high-speed and low-power demands of DSP applications, parallel array multipliers are widely used. In DSP applications, most of the power is consumed by the multipliers. Hence, low-power multipliers must be designed in order to reduce the power dissipation.
Figure: 1 Basic principle of a systolic system
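As an illustration of this kind of data flow, the following Python sketch models a small systolic FIR filter in software. It is only a behavioral model under stated assumptions, not the hardware design reviewed in this paper: the function name `systolic_fir`, the choice of stationary weights with input samples moving at half the speed of the partial sums, and the register layout are all assumptions made for the example.

```python
def systolic_fir(weights, samples):
    """Behavioral model of a linear systolic array computing an FIR filter.

    Assumed arrangement (hypothetical, for illustration only):
    - PE k permanently holds weights[k].
    - Input samples move left to right, two registers per PE.
    - Partial sums move left to right, one register per PE.
    Each clock, every PE only uses values from its own and its left
    neighbour's registers.
    """
    n = len(weights)
    x_line = [0.0] * (2 * n - 1)   # x_line[i] holds the sample injected i cycles ago
    y_line = [0.0] * n             # y_line[k] is the partial sum leaving PE k
    outputs = []
    for t in range(len(samples) + n - 1):
        x_in = samples[t] if t < len(samples) else 0.0
        x_line = [x_in] + x_line[:-1]          # shift the input registers
        y_prev = y_line
        # PE k adds weights[k] * (its current sample) to the sum from PE k-1.
        y_line = [
            (y_prev[k - 1] if k > 0 else 0.0) + weights[k] * x_line[2 * k]
            for k in range(n)
        ]
        if t >= n - 1:                          # pipeline latency of n-1 cycles
            outputs.append(y_line[-1])
    return outputs


if __name__ == "__main__":
    print(systolic_fir([1.0, 2.0], [1.0, 1.0, 1.0]))  # [1.0, 3.0, 3.0]
```

In this model the first valid output appears after a latency of N-1 clock cycles, after which one output is produced per cycle.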
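The LMS adaptive algorithm approximately minimizes the mean-square error by recursively altering the weight vector at each sampling instant. An adaptive FIR digital filter driven by the LMS algorithm can thus be described in vector form as
$$y(n) = \mathbf{w}^T(n)\,\mathbf{x}(n)$$
$$e(n) = d(n) - y(n)$$
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, e(n)\, \mathbf{x}(n)$$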
Figure: 2 Block diagram of an adaptive FIR digital filter driven by LMS algorithm.
where d(n) and y(n) denote the desired signal and the output signal, respectively. The step size µ is used for adaptation of the weight vector, and e(n) is the feedback error. In the above equations, the tap-weight vector w(n) and the tap-input vector x(n) are defined as
$$\mathbf{w}(n) = [w_0(n), w_1(n), \ldots, w_{N-1}(n)]^T$$
$$\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-N+1)]^T$$
where N is the length of the FIR digital filter and T denotes the transpose operator. The block diagram of the LMS adaptive FIR digital filter is depicted in Fig. 2, where the delay symbol denotes the unit delay element. The coefficient update of an N-tap adaptive FIR digital filter using the DLMS algorithm is given by
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, e(n-D)\, \mathbf{x}(n-D)$$
where D is the delay in weight adaptation.
[II] RELATED WORK
One important implementation of the systolic array architecture is by Bairu K. Saptalakar, Deepak kale, Mahesh Rachannavar and Pavankumar M. K. Their work presents a 4-bit systolic array multiplier designed in both structural and behavioral styles and implemented and tested on a Spartan-3 FPGA board. In the structural model, the multiplier is divided into three sections (upper, middle and lower), all of which operate on the data simultaneously. Full adders and AND gates are the basic building blocks of the multiplier; each section has 4 full adders and the associated AND gates. The behavioral description is written from the behavior of the systolic array multiplier. A timing analysis tool is then applied to the object module to determine the maximum operating speed. The proposed 4-bit systolic array multiplier was found to be better optimized in the structural style than in the behavioral style.
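To make the gate-level structure concrete, the sketch below models a 4-bit array multiplier in Python using only AND gates and full adders, arranged as three rows of four full adders. This is one plausible arrangement chosen for illustration; it is an assumption that it resembles the upper, middle and lower sections of the cited design, and the names `full_adder` and `array_multiply_4bit` are hypothetical. The model is purely functional and does not capture the timing or the simultaneous operation of the sections.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def array_multiply_4bit(x, y):
    """Multiply two 4-bit unsigned integers with AND gates and full adders.

    Assumed structure (illustrative only): 16 AND gates form the partial
    products, and three rows of four full adders accumulate them.
    """
    a = [(x >> i) & 1 for i in range(4)]   # bits of x, LSB first
    b = [(y >> i) & 1 for i in range(4)]   # bits of y, LSB first
    # pp[i][j] is the AND of multiplicand bit j and multiplier bit i.
    pp = [[a[j] & b[i] for j in range(4)] for i in range(4)]

    product_bits = [pp[0][0]]              # product bit 0 needs no adder
    running = pp[0][1:] + [0]              # remaining bits of row 0, 4 bits wide
    for i in range(1, 4):                  # three adder rows
        carry = 0
        sums = []
        for j in range(4):                 # four full adders per row
            s, carry = full_adder(running[j], pp[i][j], carry)
            sums.append(s)
        product_bits.append(sums[0])       # lowest sum bit is a final product bit
        running = sums[1:] + [carry]       # pass the rest to the next row
    product_bits.extend(running)           # top four product bits
    return sum(bit << k for k, bit in enumerate(product_bits))


if __name__ == "__main__":
    print(array_multiply_4bit(13, 11))  # 143
```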
Another important implementation is the low-power systolic-based adaptive filter by Purushothaman A and Dr. C. Vijaykumar. They design a systolic architecture for the RLS algorithm using FPGA technology with clock gating, using a systolic array in place of discrete adders, subtracters and multipliers. Systolic arrays speed up processing through parallel calculation, but the circuit scale becomes extremely large when the number of elements is large. The clock gating technique is used extensively in low-power circuit design; it involves dynamically shutting off the clock to portions of a design that are idle or not performing useful computation. The RLS algorithm was implemented on an FPGA using systolic arrays, adopting a reusable configuration and clock gating for the internal cells. For both two and four antenna elements, the number of clock cycles required to process the sampling data was 20, which indicates that the weights are updated in a very small number of clock cycles. The total numbers of clock cycles required before convergence for the two- and four-element cases were 320 and 540, respectively.
Another important implementation is the systolic array architecture for matrix multiplication by Mahendra Vucha and Arvind Rajawat. The matrix multiplication algorithm can be implemented in two ways: (1) the conventional method, without pipelining and parallel processing, and (2) a systolic architecture, with pipelining and parallel processing. In this work, each PE is replaced with a Multiplication and Accumulation (MAC) unit to enhance the speed and reduce the complexity of the systolic architecture. Matrix multiplication is implemented on an FPGA by both methods. The RTL code is written in Verilog HDL, and logic verification and simulation are done in ModelSim XE 6.4b. The input matrices A and B and the output matrix C are each 3×3, with matrix elements of 4 bits each. The simulation results show that the systolic implementation requires fewer clock cycles than the conventional method. Parallel processing and pipelining are introduced into the proposed systolic architecture to enhance the speed and reduce the complexity of the matrix multiplier.
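The idea of a MAC-based systolic matrix multiplier can be sketched in Python as follows. This is a behavioral model under assumptions that are not taken from the cited paper: an output-stationary n×n grid in which PE(i, j) accumulates C[i][j], rows of A entering from the left and columns of B entering from the top with a one-cycle skew per row or column. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle model of an n x n output-stationary systolic array.

    Assumptions (illustrative only): square matrices, PE(i, j) holds the
    accumulator for C[i][j], A values move right and B values move down by
    one PE per clock, and row i / column j are fed with a delay of i / j cycles.
    Returns the product and the number of clock cycles used.
    """
    n = len(A)
    a_reg = [[0] * n for _ in range(n)]   # A value currently inside each PE
    b_reg = [[0] * n for _ in range(n)]   # B value currently inside each PE
    C = [[0] * n for _ in range(n)]
    cycles = 3 * n - 2                    # last PE finishes at t = 3n - 3
    for t in range(cycles):
        for i in range(n):
            # Shift row i one PE to the right, then inject the skewed input.
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
        for j in range(n):
            # Shift column j one PE down, then inject the skewed input.
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0
        # Every PE performs one multiply-accumulate in parallel.
        for i in range(n):
            for j in range(n):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C, cycles


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # ([[19, 22], [43, 50]], 4)
```

In this model a 3×3 product completes in 3n-2 = 7 array cycles, whereas a single MAC unit working sequentially would need n³ = 27 multiply-accumulate steps. The cycle counts reported in the cited work may differ, since they depend on the actual hardware design.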
Another important work is by Feifei Dong, Sihan Zhang and Cheng Chen, "Improved Design and Analyze of Parallel Matrix Multiplication on Systolic Array Matrix". The paper presents an improved algorithm for traditional parallel matrix multiplication on a systolic array, which is widely applied in Central Processing Unit (CPU) architecture. It analyzes and proves the merits of the improvement, and introduces an improved systolic matrix-vector based algorithm to maximize the utilization of the parallel processors.
Another important work is by H. Herzberg, R. Haimi-Cohen and Y. Beery, "A systolic array realization of an LMS adaptive filter and the effects of delayed adaptation". This paper presents a systolic array design of an adaptive filter. The filter is based on the least mean square (LMS) algorithm, but because of problems in implementing it on a systolic array, a modified algorithm, a special case of the delayed LMS (DLMS), is used. The convergence and steady-state behavior of the systolic array are analyzed.
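The delayed update, w(n+1) = w(n) + µ e(n-D) x(n-D), can be tried out in software with the following minimal Python sketch. It is not the systolic design of the cited paper: it is a plain sequential simulation in which an adaptive filter identifies a known FIR system, and the function name `dlms_identify`, the uniform random input, the step size and the noise-free setup are all assumptions made for the example.

```python
import random


def dlms_identify(unknown, mu=0.05, delay=0, steps=2000, seed=0):
    """Identify an FIR system with the delayed LMS (DLMS) update.

    Implements w(n+1) = w(n) + mu * e(n-D) * x(n-D) with D = delay.
    Setting delay=0 gives the standard LMS update.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_taps = len(unknown)
    w = [0.0] * n_taps
    x_vec = [0.0] * n_taps          # tap-input vector x(n)
    pending = []                    # (e, x) pairs waiting D cycles to be used
    for _ in range(steps):
        x_vec = [rng.uniform(-1.0, 1.0)] + x_vec[:-1]
        d = sum(h * x for h, x in zip(unknown, x_vec))   # desired signal d(n)
        y = sum(wk * x for wk, x in zip(w, x_vec))       # filter output y(n)
        e = d - y                                        # error e(n)
        pending.append((e, x_vec))
        if len(pending) > delay:
            # Use the error and input vector from D cycles ago.
            e_old, x_old = pending.pop(0)
            w = [wk + mu * e_old * xk for wk, xk in zip(w, x_old)]
    return w


if __name__ == "__main__":
    target = [0.5, -0.3, 0.2, 0.1]
    print(dlms_identify(target, delay=0))   # standard LMS
    print(dlms_identify(target, delay=4))   # DLMS with D = 4
```

With a small step size both runs approach the target coefficients in this noise-free setup. A larger delay generally narrows the range of step sizes for which the update remains stable.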
Another important work is by M. D. Meyer and D. P. Agrawal, "A high sampling rate delayed LMS filter architecture". The authors address the problem of implementing a high-sampling-rate transversal adaptive filter and investigate a highly pipelined systolic architecture as an alternative to the conventional LMS adaptive filter. The resulting filter structure can accommodate very high sampling rates. The system is evaluated in terms of maximum sampling rate, computational speed and the effect of adaptation delay on algorithm convergence. The pipelined filter structure, a modular and highly parallel alternative to the conventional LMS algorithm, implements a parallel version of the DLMS algorithm and uses multiple identical processing modules to achieve a computational speedup proportional to the order of the filter.
Another important work is by R. D. Poltmann, "Conversion of the delayed LMS algorithm into the LMS algorithm". The author shows how the delayed LMS (DLMS) algorithm can be transformed into the standard LMS algorithm at only slightly increased computational expense.
From the review of these papers we conclude that systolic architectures are used to design multipliers, matrix multipliers and many DSP applications, such as the RLS algorithm, the LMS algorithm and FIR filters. The reviewed LMS architectures report improved results in comparison with the conventional method.
Therefore, at this stage, a systolic architecture is a better solution for designing an adaptive FIR (finite impulse response) filter using the DLMS algorithm. The aim is to obtain improved results compared with the conventional method in terms of the time required for convergence.
Bairu K. Saptalakar, Deepak kale, Mahesh Rachannavar, Pavankumar M. K., "Design and Implementation of VLSI Systolic Array Multiplier for DSP Applications", International Journal of Scientific Engineering and Technology (ISSN: 2277-1581), Volume 2, Issue 3, pp. 156-159, 1 April 2013.
Purushothaman A, Dr. C. Vijaykumar, "Implementation of Low power systolic based RLS adaptive Filter using FPGA", IEEE, 2011.
Mahendra Vucha, Arvind Rajawat, "Design and FPGA Implementation of Systolic Array Architecture for Matrix Multiplication", International Journal of Computer Applications (0975-8887), Volume 26, No. 3, July 2011.
C. Wenyang, L. Yanda and J. Yue, "Systolic Realization for 2D Convolution Using Configurable Functional Method in VLSI Parallel Array Designs", Proc. IEEE, Computer and Digital Technology, Vol. 138, No. 5, pp. 361-370, Sept. 1991.
Peter Mc Curry, Fearghal Morgan, Liam Kilmartin, "Xilinx FPGA implementation of a pixel processor for object detection applications", Proc. Irish Signals and Systems Conference, Volume 3, pp. 346-349, Oct. 2001.
Feifei Dong, Sihan Zhang and Cheng Chen, "Improved Design and Analyze of Parallel Matrix Multiplication on Systolic Array Matrix", IEEE, 2009.
H. Herzberg, R. Haimi-Cohen and Y. Beery, "A systolic array realization of an LMS adaptive filter and the effects of delayed adaptation", IEEE Trans. Signal Processing, vol. 40, pp. 2799-2803, Nov. 1992.
M. D. Meyer and D. P. Agrawal, "A high sampling rate delayed LMS filter architecture", IEEE Trans. Circuits Syst. II, vol. 40, pp. 727-729, Nov. 1993.
H. T. Kung, "Why systolic architectures?", IEEE Computer, vol. 15, pp. 37, Jan. 1982.
R. D. Poltmann, "Conversion of the delayed LMS algorithm into the LMS algorithm", IEEE Signal Processing Lett., vol. 2, p. 223, Dec. 1995.