 Open Access
 Authors : Parvathy M, Dr. Ganesan. R
 Paper ID : IJERTV2IS121283
 Volume & Issue : Volume 02, Issue 12 (December 2013)
Published (First Online): 30-12-2013
ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
A Novel Turbo Decoder Architecture for High Throughput WSN using LUT-Log-BCJR Algorithm
Parvathy M (a), Dr. Ganesan. R (b)
(a) Research Scholar, NICE University, Thuckalay
(b) Professor, NICE University, Thuckalay
Abstract
This review paper describes in depth the aspects of implementing a high-throughput turbo decoder. It considers an FPGA architecture that enables two decoders to operate during the same time interval. This simple yet effective modification yields almost double the throughput of a single BCJR decoder. The paper also surveys various works on turbo decoder architectures and their hardware implementation using FPGAs.
KEYWORDS: Turbo decoder, FPGA, LUT-Log-BCJR, ACS unit.

Introduction
WIRELESS SENSOR NETWORKS (WSNs) can be considered energy-constrained wireless scenarios, since the sensors are operated for extended periods of time while relying on batteries that are small, lightweight and inexpensive.
Recent application-specific integrated circuit (ASIC) based turbo decoder architectures [5]-[7] have been designed to achieve a high transmission throughput rather than a low transmission energy. For example, turbo codes have facilitated transmission throughputs in excess of 50 Mbit/s in cellular standards such as 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE), and recent ASIC turbo decoder architectures have been designed for throughputs in excess of 100 Mbit/s [5], [6]. This has been achieved by employing the Max-Log-BCJR turbo decoding algorithm, which is a low-complexity approximation of the optimal Logarithmic Bahl-Cocke-Jelinek-Raviv (Log-BCJR) algorithm [8]. The Max-Log-BCJR algorithm appears to lend itself both to high-throughput scenarios and to the above-mentioned energy-constrained scenarios, because its low complexity implies a low turbo decoder energy consumption.
However, this is achieved at the cost of degrading the coding gain by 0.5 dB compared with the optimal Log-BCJR algorithm [9], which increases the required transmission energy by about 10%. As we shall demonstrate in Section IV, this disadvantage of the Max-Log-BCJR outweighs its attractively low complexity when optimizing the overall energy consumption of sensor nodes that are separated by dozens of meters.
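A coding-gain loss in dB translates into extra transmit energy through a standard decibel-to-linear conversion. The short sketch below shows that step. Computed exactly, a 0.5 dB penalty corresponds to a factor of about 1.12, which is the same order as the roughly 10% figure quoted above.

```python
def db_to_linear(db):
    """Convert a power/energy ratio expressed in decibels to a linear factor."""
    return 10 ** (db / 10)


# A 0.5 dB loss in coding gain must be compensated by a 0.5 dB higher
# transmit energy per bit, i.e. by the linear factor computed here.
penalty = db_to_linear(0.5)
print(f"extra transmit energy: {penalty - 1:.1%}")  # prints "extra transmit energy: 12.2%"
```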
This motivates the use of the lookup-table Log-BCJR (LUT-Log-BCJR) algorithm [8] in energy-constrained scenarios. It approximates the optimal Log-BCJR more closely than the Max-Log-BCJR does and therefore does not suffer from the associated coding gain degradation. However, to the best of our knowledge, no LUT-Log-BCJR ASICs have been specifically designed for energy-constrained scenarios.
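The three algorithms named above differ only in how they evaluate the max* operator, ln(e^a + e^b) = max(a, b) + ln(1 + e^-|a-b|):

- The Log-BCJR evaluates the correction term exactly.
- The Max-Log-BCJR drops it, which gives an error of up to ln 2.
- The LUT-Log-BCJR reads it from a small table.

The sketch below compares the three. The table size (8 entries) and step (0.5) are hypothetical illustration values, not taken from any of the cited designs.

```python
import math

# Hypothetical LUT parameters (not taken from any cited design):
STEP = 0.5        # quantisation step of |a - b|
N_ENTRIES = 8     # table covers |a - b| in [0, 4); beyond that the correction is 0
# Each entry holds ln(1 + e^-x) sampled at the centre of its bin.
LUT = [math.log1p(math.exp(-(i + 0.5) * STEP)) for i in range(N_ENTRIES)]


def max_star_exact(a, b):
    """Jacobian logarithm ln(e^a + e^b), as used by the Log-BCJR."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_star_maxlog(a, b):
    """Max-Log approximation: the correction term is dropped entirely."""
    return max(a, b)


def max_star_lut(a, b):
    """LUT approximation: the correction term is read from a small table."""
    idx = int(abs(a - b) / STEP)
    correction = LUT[idx] if idx < N_ENTRIES else 0.0
    return max(a, b) + correction


for a, b in [(1.0, 1.0), (2.0, 0.5), (3.0, -2.0)]:
    print(a, b, max_star_exact(a, b), max_star_lut(a, b), max_star_maxlog(a, b))
```

With this table, the LUT error stays below about 0.125, whereas the Max-Log error reaches ln 2 ≈ 0.693 when a = b.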
Previous LUT-Log-BCJR turbo decoder designs [10]-[13] were developed as part of the ongoing drive for ever higher processing throughputs, although their throughputs have since been eclipsed by Max-Log-BCJR architectures. This opens the door for a new generation of LUT-Log-BCJR ASICs that trade processing throughput for energy efficiency.
In order to implement an efficient turbo decoder, a suitable decoding algorithm has to be chosen. Turbo codes were originally decoded with the BCJR (Bahl, Cocke, Jelinek, Raviv) algorithm [2]. However, this algorithm performs complex mathematical operations such as multiplication, division and logarithm calculations. Engineers have therefore avoided implementing it directly. Instead, they have preferred suboptimal derivatives of the BCJR (MAP) algorithm, such as the Log-MAP and the Max-Log-MAP algorithms, which are much simpler to implement but yield worse BER performance [3].
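The forward-backward structure of the BCJR algorithm in the log domain is easier to follow on a concrete trellis. The sketch below is a minimal version under the following assumptions:

- The code is a toy 2-state accumulator (parity = input XOR previous parity), chosen only for brevity. Real turbo constituent codes, such as the 8-state 3GPP code, have more states.
- The trellis starts in state 0 and is not terminated.
- LLRs are defined as ln P(0)/P(1).

The sketch uses the exact max*. Replacing `max_star` with plain `max` turns it into the Max-Log variant. A brute-force marginalisation over all input sequences is included only to check the recursion.

```python
import itertools
import math

NEG_INF = float("-inf")


def max_star(a, b):
    """Exact Jacobian logarithm ln(e^a + e^b); use max(a, b) for Max-Log-BCJR."""
    if a == NEG_INF and b == NEG_INF:
        return NEG_INF
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def step(state, u):
    """Toy 2-state accumulator code: parity p = u XOR state, next state = p."""
    p = u ^ state
    return p, p  # (parity bit, next state)


def gamma(k, state, u, llr_sys, llr_par):
    """Log-domain branch metric; LLRs are ln P(bit=0)/P(bit=1), bit 0 -> +1."""
    p, _ = step(state, u)
    x_u = 1.0 if u == 0 else -1.0
    x_p = 1.0 if p == 0 else -1.0
    return 0.5 * (x_u * llr_sys[k] + x_p * llr_par[k])


def log_bcjr(llr_sys, llr_par):
    K = len(llr_sys)
    # Forward recursion; the encoder starts in state 0.
    alpha = [[0.0, NEG_INF]]
    for k in range(K):
        nxt = [NEG_INF, NEG_INF]
        for s in (0, 1):
            for u in (0, 1):
                _, s2 = step(s, u)
                nxt[s2] = max_star(nxt[s2], alpha[k][s] + gamma(k, s, u, llr_sys, llr_par))
        ref = max(nxt)
        alpha.append([a - ref for a in nxt])  # normalise to avoid metric growth
    # Backward recursion; unterminated trellis, so all end states are equally likely.
    beta = [None] * K + [[0.0, 0.0]]
    for k in range(K - 1, -1, -1):
        cur = [NEG_INF, NEG_INF]
        for s in (0, 1):
            for u in (0, 1):
                _, s2 = step(s, u)
                cur[s] = max_star(cur[s], gamma(k, s, u, llr_sys, llr_par) + beta[k + 1][s2])
        ref = max(cur)
        beta[k] = [b - ref for b in cur]
    # A-posteriori LLR of each information bit.
    llrs = []
    for k in range(K):
        acc = [NEG_INF, NEG_INF]  # indexed by the information bit u
        for s in (0, 1):
            for u in (0, 1):
                _, s2 = step(s, u)
                metric = alpha[k][s] + gamma(k, s, u, llr_sys, llr_par) + beta[k + 1][s2]
                acc[u] = max_star(acc[u], metric)
        llrs.append(acc[0] - acc[1])
    return llrs


def brute_force_llrs(llr_sys, llr_par):
    """Reference check: marginalise over all 2^K input sequences (tiny K only)."""
    K = len(llr_sys)
    acc = [[NEG_INF, NEG_INF] for _ in range(K)]
    for bits in itertools.product((0, 1), repeat=K):
        state, metric = 0, 0.0
        for k, u in enumerate(bits):
            metric += gamma(k, state, u, llr_sys, llr_par)
            _, state = step(state, u)
        for k, u in enumerate(bits):
            acc[k][u] = max_star(acc[k][u], metric)
    return [a0 - a1 for a0, a1 in acc]


llr_sys = [1.2, -0.4, 0.3, -2.0, 0.8]
llr_par = [-0.5, 0.9, 1.1, 0.2, -1.3]
print(log_bcjr(llr_sys, llr_par))
print(brute_force_llrs(llr_sys, llr_par))
```

The per-step normalisation subtracts the same constant from every state metric. It therefore cancels in each output LLR, while keeping the metrics bounded as they would need to be in fixed-point hardware.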
With advances in technology, it has become possible to implement the BCJR algorithm on a single FPGA. The details of this approach, together with detailed information about turbo encoders and decoders, are given in [4].

Literature survey
Many recent works in this area have been reported in the literature; they are surveyed below.
In 2013, in "Design and Implementation of a High Speed MAP Decoder Architecture for Turbo Decoding," R. Shrestha and R. Paily note that the maximum a posteriori probability (MAP) decoder is an integral part of turbo decoders, one of the most exciting classes of error-correcting decoders. A high-speed MAP decoder architecture is essential for designing the high-throughput turbo decoders that are widely used in recent wireless communication standards. The authors propose a new sliding-window approach for the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm used in the MAP decoder, together with a corresponding MAP decoder architecture. The proposed architecture is implemented on a field programmable gate array (FPGA). It operates at a maximum frequency of 346 MHz and is compared with state-of-the-art MAP decoder implementations. Finally, the bit error rate (BER) performance of the implemented MAP decoder is measured in a communication environment.
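The general idea behind sliding-window BCJR decoding can be sketched as follows. This is a generic scheme, not necessarily the specific new approach of Shrestha and Paily:

- The block is split into windows.
- Forward metrics are stored only for the current window.
- Each window's backward recursion is warmed up (acquisition) over a few extra steps beyond the window.

The window length, warm-up length and 10-bit metric width used below are hypothetical. The 6144-bit block is the largest LTE turbo block size.

```python
def sliding_windows(block_len, window, warmup):
    """Yield (start, end, acq_end) for each window of a sliding-window BCJR.

    The backward recursion for window [start, end) starts at acq_end from
    equiprobable state metrics; the steps in [end, acq_end) only serve as
    acquisition (warm-up) and their metrics are discarded. In the last window
    acq_end == block_len, where the real end-of-block metrics can be used.
    """
    for start in range(0, block_len, window):
        end = min(start + window, block_len)
        acq_end = min(end + warmup, block_len)
        yield start, end, acq_end


def alpha_memory_bits(length, states, bits):
    """Storage needed to keep forward metrics for `length` trellis steps."""
    return length * states * bits


# Hypothetical figures: 6144-bit block, 8 states, 10-bit metrics, 32-step windows.
print(alpha_memory_bits(6144, 8, 10), alpha_memory_bits(32, 8, 10))  # prints 491520 2560
print(list(sliding_windows(10, 4, 2)))  # prints [(0, 4, 6), (4, 8, 10), (8, 10, 10)]
```

In exchange for this memory reduction, each window spends extra warm-up steps, and the acquisition length must be large enough that the approximate backward metrics converge.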
In 2011, Christoph Studer [6] presented design and implementation aspects of parallel turbo decoders that reach the 326.4 Mb/s LTE peak data rate using multiple soft-input soft-output (SISO) decoders operating in parallel. To highlight the effectiveness of this design approach, he realized a 3.57 mm² radix-4-based 8× parallel turbo decoder ASIC in 0.13 µm CMOS technology, achieving 390 Mb/s. He also details a radix-4-based SISO decoder architecture that enables high-throughput turbo decoding. As a proof of concept, he shows that the 8× parallel ASIC prototype achieves the LTE peak data rate and the 100 Mb/s milestone at low power. Finally, he compares its key characteristics with those of other measured turbo decoder ASICs.
In 2010, Matthias May [5] presented a 3GPP LTE compliant turbo code decoder that provides a throughput of 150 Mbit/s at 6.5 decoding iterations and a 300 MHz clock frequency, with a power consumption of about 300 mW. The decoder has been integrated in an industrial SDR chip in a 65 nm low-power CMOS process, and the architecture scales well to further throughput demands. Special emphasis was put on the problem of acquisition in highly punctured LTE turbo codes with code rates of up to 0.95. The high acquisition length needed for this code rate was considerably reduced by implementing next-iteration initialization (NII) in addition to the acquisition.
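Power and throughput figures such as those above are often compared via the energy spent per decoded bit. Dividing milliwatts by megabits per second gives nanojoules per bit directly. This is a crude metric that ignores differences in technology node, iteration count and code rate.

```python
def energy_per_bit_nj(power_mw, throughput_mbps):
    """Energy per decoded bit in nJ: mW / (Mbit/s) = 1e-3 J/s / 1e6 bit/s = 1e-9 J/bit."""
    return power_mw / throughput_mbps


# Figures quoted above for the 3GPP LTE decoder by May [5]: ~300 mW at 150 Mbit/s.
print(f"{energy_per_bit_nj(300, 150):.2f} nJ/bit")  # prints "2.00 nJ/bit"
```

The same helper can be applied to the other ASICs in this survey. For example, 32 mW at 10.8 Mb/s gives about 2.96 nJ/bit.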
In January 2009, C. Benkeser, A. Burg, T. Cupaiuolo, and Q. Huang, in their paper "Design and optimization of an HSDPA turbo decoder ASIC," observed that the turbo decoder is the most challenging component of a digital HSDPA receiver in terms of computational requirements and power consumption. The large block size and the recursive algorithm prevent pipelining or parallelism. High-Speed Downlink Packet Access (HSDPA) is an enhanced 3G (third-generation) mobile telephony communications protocol in the High-Speed Packet Access (HSPA) family, also dubbed 3.5G, 3G+ or turbo 3G. It allows networks based on the Universal Mobile Telecommunications System (UMTS) to have higher data-transfer speeds and capacity. HSDPA deployments can support downlink speeds of up to 42.2 Mbit/s. HSPA+ offers further speed increases, providing speeds of up to 337.5 Mbit/s with Release 11 of the 3GPP standards. The paper mainly addresses complexity and power consumption issues at the algorithmic, arithmetic and gate levels of ASIC design. The aim is to bring the power consumption and die area of turbo decoders to a level commensurate with wireless applications. Realized in 0.13 µm CMOS technology, the turbo decoder ASIC measures 1.2 mm² excluding pads and achieves a throughput of 10.8 Mb/s while consuming only 32 mW.
In 2009, P. M. Chau and a co-author affiliated with Xenotran LLC (Crownsville, MD, USA) published "Improved architectures for the add-compare-select operation in long constraint length Viterbi decoding." They noted that turbo decoders have received much recent attention for their extraordinary coding gains, but inherently suffer latency limitations that are unacceptable in most telephony applications. Long constraint length (LCL) Viterbi decoding (VD) techniques hold promise for significant coding gains at low latencies. The authors put forward novel architectures for the add-compare-select (ACS) unit of an LCL VD. The derived bit-serial circuits are shown to be more efficient than traditional bit-serial methods. One solution is 24% more efficient than traditional approaches and requires only half the I/O. Finally, a hardware Viterbi decoder was designed, built, and tested.
"High throughput low energy FEC/ARQ technique for short frame turbo codes," by Zhipei Chi, Zhongfeng Wang, and K. K. Parhi in 2009, addresses how to protect short frames with turbo codes, which is a challenging problem. First, a scalable and easily implementable interleaver design is proposed, since good random interleavers for long frames are not guaranteed to perform well for short frames. Second, an efficient tail-biting encoding/decoding scheme is proposed that does not sacrifice performance but significantly increases decoding throughput compared with existing methods. Finally, a novel error detection method that takes advantage of a set of decoding metrics (DMs) is developed to reduce the number of cyclic redundancy check (CRC) bits used for error detection. The authors conclude that, for a frame size of 49, up to 12% of the transmission throughput and 21.5% of the turbo decoder energy consumption can be saved.
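The add-compare-select (ACS) operation at the heart of the Viterbi decoder discussed by Chau and co-author is summarised in the word-level Python sketch below. It does not model their bit-serial circuits. Metrics are treated as log-likelihoods (larger is better), and the 2-state trellis in the usage line is hypothetical. Adding a correction offset to the selected value would turn this ACS into the max* operation used by Log-MAP decoders.

```python
def acs(pm_a, bm_a, pm_b, bm_b):
    """Add-compare-select for one state with two incoming branches.

    Metrics are log-likelihoods (larger is better). Returns the surviving
    path metric and a decision bit: 0 -> branch from a, 1 -> branch from b.
    """
    cand_a = pm_a + bm_a          # add
    cand_b = pm_b + bm_b
    if cand_a >= cand_b:          # compare
        return cand_a, 0          # select
    return cand_b, 1


def acs_step(path_metrics, predecessors, branch_metrics):
    """Apply ACS to every state of one trellis step.

    predecessors[s] = (a, b) are the two states leading into state s and
    branch_metrics[s] = (bm_a, bm_b) the metrics of those two branches.
    The decision bits would be written to the survivor memory for traceback.
    """
    new_metrics, decisions = [], []
    for s, (a, b) in enumerate(predecessors):
        bm_a, bm_b = branch_metrics[s]
        pm, d = acs(path_metrics[a], bm_a, path_metrics[b], bm_b)
        new_metrics.append(pm)
        decisions.append(d)
    return new_metrics, decisions


# Hypothetical 2-state example.
print(acs_step([0.0, -1.0], [(0, 1), (0, 1)], [(0.5, 2.0), (1.0, 0.0)]))  # prints ([1.0, 1.0], [1, 0])
```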
F. M. Li, C. H. Lin, and A. Y. Wu, in "Unified convolutional/turbo decoder design using tile-based timing analysis of VA/MAP kernel" (Oct. 2008), addressed the design of a unified convolutional/turbo decoder. Such a decoder is needed because convolutional and turbo codes may coexist to satisfy advanced forward error correction (FEC) requirements. They analyze the timing charts of both the Viterbi algorithm (VA) and the MAP algorithm and introduce three techniques, namely the Distribution, Pointer, and Parallel schemes. These can be used as flexible tools in timing-chart analysis, either to reduce memory size or to increase the throughput rate. A tile-based methodology is also proposed to analyze key features of timing charts, such as computing/memory units and hardware utilization. On the basis of this timing analysis, they developed a VA/MAP timing chart with three modes (VA mode, MAP mode, and concurrent VA/MAP mode) by filling the idle time of both the VA and MAP decoding procedures. The combined timing analysis allows a unified component decoder to be built with nearly 100% utilization of the processing element (PE) in both VA and MAP decoding. The result is a triple-mode FEC kernel that can perform both convolutional and turbo decoding seamlessly for different communication systems. By integrating the FEC kernel with memories of different sizes, they constructed four types of FEC decoders for different application scenarios: 1) a standalone convolutional decoder (VA mode); 2) a standalone turbo decoder (MAP mode); 3) a dual-mode convolutional/turbo decoder (VA mode and MAP mode); and 4) a triple-mode convolutional/turbo decoder (VA mode, MAP mode, and concurrent VA/MAP mode). Finally, a prototype FEC kernel processor compliant with the 3GPP standard was verified as a triple-mode FEC decoder in a TSMC 0.18 µm CMOS process.
In June 2007, L. Hanzo, J. P. Woodard, and P. Robertson, in "Turbo decoding and detection for wireless applications," gave a historical perspective on turbo coding and on turbo transceivers inspired by the generic turbo principle, tracing it back to Shannon's visionary predictions. They reviewed the classic maximum a posteriori probability decoder. They then studied the effect of a range of system parameters systematically, in order to gauge their performance ramifications. Next, they focused on the family of iterative receivers designed for wireless communication systems, which were partly inspired by the invention of turbo codes. They concluded by highlighting iteratively detected joint coding and modulation schemes, turbo equalization, concatenated space-time and channel coding arrangements, as well as multi-user detection and three-stage multimedia systems.
In 2007, in "High-Speed Recursion Architectures for MAP-Based Turbo Decoders," Zhongfeng Wang noted that the maximum a posteriori probability (MAP) algorithm has been widely used in turbo decoding for its outstanding performance. However, it is very challenging to design high-speed MAP decoders because of the inherent recursive computations. The paper presents two novel high-speed recursion architectures for MAP-based turbo decoders. Algorithmic transformation, approximation, and architectural optimization are incorporated in the proposed designs to reduce the critical path. Simulations show that neither of the proposed designs has an observable decoding performance loss compared with the true MAP algorithm when applied in turbo decoding. Synthesis results show that the proposed Radix-2 recursion architecture can achieve a processing speed comparable to that of the state-of-the-art Radix-4 recursion architecture with significantly lower complexity. The proposed Radix-4 architecture is 32% faster than the best existing design.
In June 2007, E. Boutillon, C. Douillard, and G. Montorsi, in "Iterative decoding of concatenated convolutional codes: Implementation issues," addressed the implementation of turbo decoders. Here the term "turbo" generally refers to iterative decoders for both parallel and serially concatenated convolutional codes. They describe the general structure of iterative decoders and the main features of the soft-input soft-output (SISO) algorithm that forms their heart. A SISO decoder is a type of soft-decision decoder used with error-correcting codes. "Soft-in" refers to the fact that the incoming data may take on values other than 0 or 1, in order to indicate reliability. "Soft-out" refers to the fact that each bit in the decoded output also takes on a value indicating reliability. Typically, the soft output is used as the soft input to an outer decoder in a system using concatenated codes. It may also modify the input to a further decoding iteration, as in the decoding of turbo codes. Examples include the BCJR algorithm and the soft-output Viterbi algorithm. The paper presents very efficient parallel architectures, applicable to all types of turbo decoders, that allow high-speed implementations. Implementation aspects such as quantization and stopping rules used together with buffering to increase throughput are also considered. Finally, the authors evaluate the complexity of turbo decoders as a function of the main parameters of the code.
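The "soft" values exchanged by SISO decoders are usually log-likelihood ratios (LLRs). The sign of an LLR gives the bit decision and its magnitude gives the reliability. The sketch below assumes the convention L = ln(P(bit=0)/P(bit=1)). It also shows a common formulation, assumed here rather than taken from the cited paper, of the extrinsic information that one constituent decoder passes to the other: the a-posteriori LLR minus the channel and a-priori contributions.

```python
import math


def llr_to_prob_one(llr):
    """P(bit = 1) for an LLR defined as ln(P(bit=0) / P(bit=1))."""
    return 1.0 / (1.0 + math.exp(llr))


def hard_decision(llr):
    """Sign of the LLR gives the bit; its magnitude gives the reliability."""
    return 0 if llr >= 0 else 1


def extrinsic(l_app, l_channel, l_apriori):
    """Part of the a-posteriori LLR that is passed to the other SISO decoder."""
    return l_app - l_channel - l_apriori


for llr in (4.0, 0.2, -3.0):
    print(llr, hard_decision(llr), round(llr_to_prob_one(llr), 3))
```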
N. Sadeghi, S. Howard, S. Kasnavi, K. I. V. C. Gaudet, and C. Schlegel, in "Analysis of error control code use in ultra-low-power wireless sensor networks" (2006), stressed the importance of high-speed wireless sensor networks in industrial, medical, environmental and security scenarios. Because embedded batteries have a limited lifetime, ultra-low-power circuitry is needed in the sensors and processors to support increased transmission rates. Error control coding (ECC) can reduce the transmit power required for reliable communication, but the higher decoder complexity increases the required processing energy. This trade-off is used to analyze the importance of ECC in systems with high transmission power. The four most energy-efficient decoders are analog decoders, and the best analog decoder becomes energy-efficient at about 1/4 the distance of the best digital implementation.
In March 2005, D. Vogrig, A. Gerosa, A. Neviani, A. Graell i Amat, G. Montorsi, and S. Benedetto, in "A 0.35-µm CMOS Analog Turbo Decoder for the 40-bit Rate 1/3 UMTS Channel Code," presented a prototype fully integrated in a three-metal double-poly 0.35 µm CMOS technology. It includes an I/O interface that maximizes the decoder throughput. This was the first reported prototype of an analog decoder for a realistic error-correcting code in CMOS technology. The decoder was successfully tested at the maximum data rate defined in the standard (2 Mb/s). Overall power consumption was 10.3 mW at 3.3 V, falling to 7.6 mW with the decoder core operated at 2 V, and the energy per decoded bit and trellis state was extremely low (0.85 nJ for the decoder core alone).
In 2005, R. Dobkin, M. Peleg, and R. Ginosar, in their paper "Parallel interleaver design and VLSI architecture for low-latency MAP turbo decoders," noted that standard VLSI implementations of turbo decoding require substantial memory and incur a long latency, which cannot be tolerated in some applications. They present a parallel VLSI architecture for low-latency turbo decoding, comprising multiple soft-input soft-output (SISO) elements operating jointly on one turbo-coded block, and compare it with sequential architectures. A parallel interleaver is essential to process multiple concurrent SISO outputs. A novel parallel interleaver and an algorithm for its design are presented, achieving the same error correction performance as the standard architecture. Compared with sequential decoders using the same silicon area, latency is reduced by up to 20 times and throughput for large blocks is increased up to sixfold, while a very high coding gain is maintained. The parallel architecture scales favorably: latency and throughput improve with increased block size and chip area.
In 2004, M. J. Thul and N. Wehn, in "FPGA implementation of parallel turbo-decoders," noted that wireless communication penetrates more and more areas of our everyday lives. Turbo codes provide good forward error correction to improve data transfer reliability. They are used in current standards, and future system designers consider them promising candidates. Dedicated hardware, however, is too expensive for a new and still rapidly changing system because of non-recurring engineering and mask costs. The authors therefore present a scalable turbo decoder architecture targeted at FPGA implementation for low-volume devices. It allows the available FPGA hardware resources to be exploited optimally to match the desired system throughput. The design is ported to the Xilinx Virtex-II family. On the Virtex-II 3000, it achieves a maximum throughput of 26 Mbit/s at 84 MHz with a latency of 185 µs.
In 2003, M. Bickerstaff [11] proposed a radix-4 Log-MAP turbo decoder for high-speed 3G mobile data terminals. It processes 3GPP data streams, including High-Speed Downlink Packet Access (HSDPA) [1], with up to 16 decoder iterations. Higher user data rates, up to 24 Mb/s, are supported with 3GPP-compliant interleaving. The Log-MAP core processes two received symbols per clock cycle using a windowed radix-4 architecture. This doubles the throughput for a given clock rate compared with a similar radix-2 architecture. A reduced-complexity radix-4 log-sum unit combines fast operation with only a 0.04 dB turbo decoding loss. The chip is fabricated in 0.18 µm CMOS and operates at a peak clock frequency of 145 MHz at 1.8 V. It dissipates 956 mW when decoding continuous 10.8 Mb/s HSDPA data streams. Power is reduced, to as low as 189 mW at 10.8 Mb/s, by using the 1/2-iteration Hard-Decision-Assisted (HDA) stopping criterion [2]. The rate-1/3 decoder has an energy efficiency of 10.0 nJ/b/iteration.
In August 2003, in "An efficient hardware interleaver for 3G turbo decoding," P. Ampadu (Cornell Broadband Communications Research Laboratories, Ithaca, NY, USA) and K. Kornegay described an energy-efficient approach for the VLSI implementation of the 3rd Generation Partnership Project (3GPP) turbo coding interleaver algorithm. Unlike previous implementations, this interleaver uses a two-stage dedicated hardware data path that exploits the iterative nature of the decoding process to compute addresses on the fly. This eliminates the overhead associated with programmable processors and precomputed address storage. Because the interleaving process is split into two stages, the preparatory phase can be turned off during iterations while the decoder uses only the real-time address computation phase, further reducing power consumption.
In 2002, M. A. Bickerstaff, D. Garrett, T. Prokop, C. Thomas, B. Widdup, G. Zhou, L. M. Davis, G. Woodward, C. Nicol, and R.-H. Yan, in their paper "A unified turbo/Viterbi channel decoder for 3GPP mobile wireless in 0.18-µm CMOS," described a channel decoder chip compliant with the 3GPP mobile wireless standard. It supports both data and voice calls simultaneously in a unified turbo/Viterbi decoder architecture. For voice services, the decoder can process over 128 voice channels encoded with rate 1/2 or 1/3, constraint length 9 convolutional codes. For data services, the turbo decoder can process any mix of rate 1/3, constraint length 4 turbo-encoded data streams. The aggregate data rate reaches up to 2.5 Mb/s with 10 iterations per block, or 4.1 Mb/s with six iterations. The turbo decoder uses the Log-MAP algorithm with a programmable log-sum correction table. It features an interleaver address processor that computes the 3GPP interleaver addresses for all block sizes, enabling it to switch context quickly to support different data services for several users. The decoder also contains the 3GPP first channel de-interleaving function and a post-decoder bit error rate estimation unit. The chip is fabricated in a 0.18 µm six-layer-metal CMOS technology and has an active area of 9 mm². Its peak clock frequency is 110.8 MHz at 1.8 V (nominal). The power consumption is 306 mW when turbo decoding a 2 Mb/s data stream with ten iterations per block while handling eight voice calls simultaneously.
In 2002, G. Masera, M. Mazza, G. Piccinini, F. Viglione, and M. Zamboni, in their article "Architectural strategies for low-power VLSI turbo decoders," noted that turbo codes have been proposed for several applications. These include wireless systems, where highly reliable transmission is required at very low signal-to-noise ratios (SNR). In recent years, how to extract the best coding gains has been investigated extensively. Implementing turbo decoders in hardware nevertheless remains difficult, mainly because the iterative decoding process demands an operating frequency much higher than the data rate. In wireless applications, low-cost and low-power requirements make the design constraints even more complex. The paper first presents a new decoder core architecture with improved area and power dissipation. It then applies partitioning techniques to reduce the power consumption of the decoder memories. Since most of the power is dissipated by the large RAM units required by the decoder, this technique is very efficient: on a set of analyzed architectures, it gives an average power saving of 70% with an area overhead of 23%.
In 2002, Chun Ling Kei and Wai Ho Mow, in "A class of switching turbo decoders against severe SNR mismatch," observed that in mobile communication applications the fading channel characteristics may vary very rapidly, resulting in a severe SNR mismatch in the receiver. On the one hand, the performance of a Log-MAP-based turbo decoder degrades significantly in the presence of large SNR underestimation errors. On the other hand, the very robust Max-Log-MAP-based turbo decoder has an SNR loss of about 0.5 dB relative to the former decoder in the absence of SNR mismatch. The authors propose a class of switching turbo decoders, each specified by a parameter S. This parameter can be selected to trade performance without SNR mismatch against robustness to severe SNR mismatch. Results show that, with a proper value of S, a switching turbo decoder can be robust against severe SNR mismatch while still performing reasonably well compared with the Log-MAP-based turbo decoder without SNR mismatch. The switching operation can easily be implemented by turning off all correction-term-related operations in the Log-MAP component decoder. To further increase robustness, an auto-switching decoder is also proposed, which automatically switches from Log-MAP to Max-Log-MAP. Finally, the authors anticipate that the proposed scheme can be fruitfully applied to serially concatenated convolutional codes.
Chien-Ming Wu, Ming-Der Shieh, and Chien-Hsing Wu, in "Memory arrangements in turbo decoders using sliding window BCJR algorithm," note that turbo coding is a powerful encoding and decoding technique that can provide highly reliable data transmission at extremely low signal-to-noise ratios. Because of the computational complexity of the employed decoding algorithm, turbo decoders usually require a large amount of memory and may have a long decoding delay. An efficient memory management strategy is therefore one of the key factors in successfully designing turbo decoders. The paper develops general formulas for efficient turbo decoders, and the results provide useful, general information for practical turbo decoder implementations.
In 2001, M. C. Valenti and J. Sun, in "The UMTS turbo code and an efficient decoder implementation suitable for software-defined radios," discussed some critical issues involved in developing a turbo decoder, using the UMTS specification as a concrete example. The discussion applies whether the decoder is implemented in software, in hardware, or in both. Three twists on the decoding algorithm are proposed: (1) a linear approximation of the correction function used by the max* operator, which reduces complexity with only a negligible loss in BER performance; (2) a method for normalizing the backward recursion, which yields a 12.5% saving in memory usage; and (3) a simple method for halting the decoder iterations based only on the log-likelihood ratios. The authors conclude by examining these parameters.
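Twist (1) can be illustrated by replacing the correction term ln(1 + e^-z) of max* (with z = |x - y|) by a straight line that is clipped to zero beyond a threshold T. The slope and threshold below are an assumption for illustration: they are believed to be close to the values reported by Valenti and Sun, but should be checked against the original paper before reuse. The sketch also measures how far the linear correction deviates from the exact one.

```python
import math

# Slope and threshold of the linear correction. These values are an
# assumption for illustration (believed close to those reported by
# Valenti and Sun; check the original paper before reuse).
A = -0.24904
T = 2.5068


def correction_exact(z):
    """Exact correction term ln(1 + e^-z) of the max* operator, z = |x - y|."""
    return math.log1p(math.exp(-z))


def correction_linear(z):
    """Piecewise-linear approximation: A*(z - T) below the threshold, else 0."""
    return A * (z - T) if z <= T else 0.0


def max_star_linear(x, y):
    """max* with the linear correction instead of the exact logarithm."""
    return max(x, y) + correction_linear(abs(x - y))


worst = max(abs(correction_exact(i / 100) - correction_linear(i / 100)) for i in range(1001))
print(f"worst-case correction error on [0, 10]: {worst:.3f}")
```

The linear form needs only a multiply, a subtraction and a comparison instead of an exponential and a logarithm. That is why it reduces complexity relative to the exact Log-MAP, while staying much closer to it than Max-Log-MAP, whose error reaches ln 2.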
In February 2001, C. Schurgers, F. Catthoor, and M. Engels, in "Memory optimization of MAP turbo decoder algorithms," pointed out that turbo codes are the most recent breakthrough in coding theory, but the decoder's implementation cost limits their incorporation in commercial systems. Although the decoding algorithm is highly data-dominated, no true memory optimization study had yet been performed. The paper presents an extensive and systematic investigation of different memory optimizations for the maximum a posteriori (MAP) class of decoding algorithms. It concludes that no single decoder structure can be presented as optimal.
In 2001, W. P. Ang and H. K. Garg, in "A new iterative channel estimator for the log-MAP & max-log-MAP turbo decoder in Rayleigh fading channel," studied a new iterative channel estimator for turbo decoding over flat fading channels. They used both the optimum Log-MAP turbo decoder and the reduced-complexity suboptimum Max-Log-MAP turbo decoder. Initially, pilot symbols are used to estimate the complex channel gain and the noise variance. After each decoding iteration, only the detected message bits are fed back to the channel estimator to improve the channel estimates. The moving average filter, the FIR filter and the FFT filter are studied and compared with the optimum Wiener filter. At a very slow fading rate of f_dT_s = 0.005, the various filters perform closely to one another. At a faster fading rate of f_dT_s = 0.02, the FFT filter and the FIR filter come within about 1/2 dB and 1 dB, respectively, of the optimum Wiener filter at a BER of 3 × 10^-4.
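The simplest of the filters compared above, the moving average, can be sketched as follows. Per-pilot least-squares estimates h_k = y_k / x_k are smoothed with a centred window. The window span and the sample values are hypothetical. The iterative feedback of detected bits described in the paper is not modelled here.

```python
def ls_pilot_estimates(received, pilots):
    """Least-squares channel estimate per pilot symbol: h_k = y_k / x_k."""
    return [y / x for y, x in zip(received, pilots)]


def moving_average(estimates, span):
    """Centred moving average of (complex) channel estimates.

    Near the block edges the window is truncated instead of zero-padded.
    """
    half = span // 2
    smoothed = []
    for k in range(len(estimates)):
        lo, hi = max(0, k - half), min(len(estimates), k + half + 1)
        window = estimates[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical noisy observations of a slowly varying gain, all pilots = 1.
rx = [0.9 + 0.1j, 1.1 - 0.05j, 1.0 + 0.02j, 0.95 - 0.1j, 1.05 + 0.03j]
print(moving_average(ls_pilot_estimates(rx, [1] * len(rx)), span=3))
```

A longer span averages out more noise but tracks fast fading less well. This is consistent with the filters performing similarly at f_dT_s = 0.005 and diverging at f_dT_s = 0.02.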
In 2000, K. K. Parhi and Zhongfeng Wang, in "Decoding metrics and their applications in VLSI turbo decoders," introduced decoding metrics (DMs): a set of variables that can be easily computed during iterative turbo decoding. The DMs measured after each iteration reveal much more than the signal-to-noise ratio (SNR) of the received bits. They indicate, for example, how good or bad the current block is and how close the current decoding iteration is to convergence. Detailed discussions explain why these variables are chosen. From the DMs measured after the first iteration, an approximate SNR-related variable (Lc) can be obtained for MAP-based turbo decoders. Simulation results show almost no performance degradation when approximated Lc values are used instead of exact values. It is also shown that adaptive decoding using DMs is more efficient than existing methods in terms of both hardware and latency. Other applications of DMs are pointed out at the end.
In 2000, Zhongfeng Wang, H. Suzuki, and K. K. Parhi, in "Efficient approaches to improving performance of VLSI SOVA-based turbo decoders," put forward two VLSI-applicable approaches to improving the performance of soft-output Viterbi algorithm (SOVA)-based turbo decoders. In the first approach, a pseudo-median filter modifies the soft outputs of each SOVA-based constituent decoder. Compared with conventional SOVA-based turbo decoders, this gives an extra coding gain of 0.2 dB over a wide range of target bit error rates (BER). In the second approach, an easily obtainable variable and a simple mapping function avoid the complex computation of the scaling factor for extrinsic information in SOVA-based turbo decoders. This generally yields an extra coding gain of 0.3 to 0.5 dB. Unlike the original method, this approach does not require signal-to-noise ratio (SNR) related information.
S. M. Karim, in "Design of efficient high throughput pipelined parallel turbo decoder using QPP interleaver," put forward a novel energy-efficient turbo decoder architecture using a quadratic permutation polynomial (QPP) interleaver. The add-compare-select-offset (ACSO) unit of the maximum a posteriori probability (MAP) decoder has been pipelined to a depth of four. This reduces the critical path delay and consequently increases the operating clock frequency and throughput. The architecture also benefits from a contention-free QPP-based interleaver, whose complexity has been considerably reduced by judicious memory partitioning.
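A QPP interleaver maps index i to π(i) = (f1·i + f2·i²) mod K. Its addresses can also be generated with additions only, which suits hardware. The sketch below shows both forms and checks that they agree. The parameters K = 40, f1 = 3, f2 = 10 are the smallest entry of the LTE QPP parameter table as we recall it; treat them as an assumption and verify against 3GPP TS 36.212. The contention-free memory mapping used by Karim is not modelled.

```python
def qpp_interleaver(K, f1, f2):
    """Direct QPP evaluation: pi(i) = (f1*i + f2*i^2) mod K."""
    return [(f1 * i + f2 * i * i) % K for i in range(K)]


def qpp_interleaver_recursive(K, f1, f2):
    """Same addresses using only additions, as is convenient in hardware.

    pi(i+1) = pi(i) + g(i) and g(i+1) = g(i) + 2*f2, all modulo K,
    where g(i) = f1 + f2 + 2*f2*i.
    """
    addresses = []
    pi, g = 0, (f1 + f2) % K
    for _ in range(K):
        addresses.append(pi)
        pi = (pi + g) % K
        g = (g + 2 * f2) % K
    return addresses


# Assumed LTE parameters for K = 40 (verify against 3GPP TS 36.212).
perm = qpp_interleaver(40, 3, 10)
print(perm[:8])  # prints [0, 13, 6, 19, 12, 25, 18, 31]
```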
In 1997, P. Robertson, P. Hoeher, and E. Villebrun, in their paper "Optimal and sub-optimal maximum a posteriori algorithms suitable for turbo decoding," noted that the symbol-by-symbol MAP algorithm is optimal for estimating the states or outputs of a Markov process. However, even in its recursive form, this algorithm poses technical difficulties: numerical representation problems, the need for non-linear functions, and a high number of additions and multiplications. MAP-like algorithms operating in the logarithmic domain, presented earlier, solve the numerical problem and reduce the computational complexity. They are, however, suboptimal, especially at low SNR; a common example is the Max-Log-MAP, because of its use of the max function.
A further simplification yields the soft-output Viterbi algorithm (SOVA). The authors present a Log-MAP algorithm that avoids the approximations of the Max-Log-MAP algorithm and is therefore equivalent to the true MAP, but without its major disadvantages. They compare the (Log-)MAP, Max-Log-MAP and SOVA from a theoretical point of view to illuminate their commonalities and differences. As a practical example forming the basis for simulations, they consider turbo decoding, where recursive systematic convolutional component codes are decoded with the three algorithms. They also demonstrate the practical suitability of the Log-MAP by including quantization effects. At a BER of 10^-4, the SOVA is approximately 0.7 dB inferior to the (Log-)MAP, with the Max-Log-MAP lying roughly in between. Complexity comparisons lead to the conclusion that the three algorithms increase in complexity in the order of their optimality.
In 2009, Z. Chi, Z. Wang and K. K. Parhi, in "High throughput low energy FEC/ARQ technique for short frame turbo codes", addressed the challenging problem of protecting short frames with turbo codes. First, since random interleavers that perform well for long turbo-coded frames are not guaranteed to perform well for short frames, they proposed a scalable and easily implementable interleaver design. Second, they proposed an efficient tail-biting encoding/decoding scheme that does not sacrifice performance but significantly increases the decoding throughput compared with existing methods. Finally, they developed a novel error detection method that exploits a set of decoding metrics (DMs) to reduce the number of cyclic redundancy check (CRC) bits used for error detection. They concluded that, for a frame size of 49, their approach yields savings of up to 12% in transmission throughput and of 21.5% in the energy consumption of the turbo decoder.
In "Noncoherent iterative (turbo) decoding" (Sep. 2000), G. Colavolpe, G. Ferrari and R. Raheli start from noncoherent sequence detection schemes for coded linear and continuous phase modulations, which deliver hard decisions by means of a Viterbi algorithm. Motivated by the current trend towards iterative decoding algorithms in digital transmission systems, they propose two soft-output solutions. The first has a structure similar to that of the well-known algorithm by Bahl et al. (1974), whereas the second is based on noncoherent sequence detection and a reduced-state soft-output Viterbi algorithm. Applications to noncoherent iterative decoding of turbo codes and of serially concatenated interleaved codes are also considered. The proposed noncoherent detection schemes exhibit a moderate performance loss with respect to the corresponding coherent schemes and are very robust to phase and frequency instabilities.
In 1998, A. J. Viterbi, in "An intuitive justification and a simplified implementation of the MAP decoder for convolutional codes", showed that an approximation of the MAP decoder reduces to a dual-maxima computation combined with forward and backward recursions of the Viterbi algorithm. Conversely, if a correction term is added to the approximation, the exact MAP algorithm is recovered. He concluded by showing how the memory required by the MAP decoder can be drastically reduced.
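Viterbi's correction term, ln(1 + e^-|a-b|), is the quantity that a LUT-Log-BCJR decoder approximates with a small lookup table. The sketch below assumes a hypothetical 8-entry table with a step of 0.5 on |a - b| (real designs choose the table size and quantization to suit their fixed-point format) and measures how far the table-based max* deviates from the exact value. All names are illustrative.

```python
import math

STEP = 0.5    # hypothetical quantization step for |a - b|
ENTRIES = 8   # hypothetical table size; |a - b| >= STEP * ENTRIES uses 0

# Correction ln(1 + e^-x) sampled at the centre of each quantization bin.
LUT = [math.log1p(math.exp(-(k + 0.5) * STEP)) for k in range(ENTRIES)]


def correction_lut(diff: float) -> float:
    """Look up the correction term for a metric difference diff = a - b."""
    k = int(abs(diff) / STEP)
    return LUT[k] if k < ENTRIES else 0.0


def max_star_lut(a: float, b: float) -> float:
    """Dual-maxima (max) plus the table-based correction term."""
    return max(a, b) + correction_lut(a - b)


def max_star_exact(a: float, b: float) -> float:
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


# Worst-case deviation from the exact Jacobian logarithm on a dense grid.
grid = [i * 0.01 for i in range(-1000, 1001)]
max_err = max(abs(max_star_lut(0.0, d) - max_star_exact(0.0, d)) for d in grid)
print(f"max |error| = {max_err:.3f}")
```

With these assumed parameters the worst error occurs at a - b = 0 and stays well below the ln 2 ≈ 0.69 error of plain max-log; in hardware the index would simply be taken from the upper bits of the fixed-point difference.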

Algorithms and Methods in Turbo Decoders
A. Turbo Encoder and Decoder Scheme Using the Sliding-Window Technique
A turbo encoder comprises a parallel concatenation of two convolutional encoders, each of which contains $m$ memory elements. Each encoder converts an uncoded bit sequence $\mathbf{b}_1 = \{b_{1,j}\}_{j=1}^{N}$ into the corresponding encoded bit sequence $\mathbf{b}_2 = \{b_{2,j}\}_{j=1}^{N}$, where $N$ is the length of the input bit sequences. Fig. 2.1 depicts a turbo decoder [15], [16], which comprises a parallel concatenation of two decoders that employ the LUT-Log-BCJR algorithm. Rather than operating on bits, each LUT-Log-BCJR decoder processes logarithmic likelihood ratios (LLRs), where the LLR $\tilde{b} = \ln \frac{P(b=0)}{P(b=1)}$ quantifies the decoder's confidence in its estimate of a bit $b$ from the bit sequences $\mathbf{b}_1$ and $\mathbf{b}_2$.
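The encoder structure described above can be illustrated with the following Python sketch. It assumes m = 3 memory elements and the LTE constituent-code generators (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3), omits trellis termination and puncturing, and accepts any interleaver permutation; the function names are hypothetical. The last helper shows how a channel LLR with the sign convention used here (positive favours b = 0) is obtained for BPSK over an AWGN channel, assuming equiprobable bits.

```python
def rsc_encode(bits):
    """8-state recursive systematic convolutional (RSC) encoder, m = 3.

    Assumed generators (those of the LTE constituent code):
    feedback 1 + D^2 + D^3, feedforward 1 + D + D^3.
    Starts in the all-zero state; trellis termination is omitted.
    Returns only the parity bits (the systematic bits equal the input).
    """
    s1 = s2 = s3 = 0                 # shift-register contents, s1 most recent
    parity = []
    for u in bits:
        a = u ^ s2 ^ s3              # feedback taps at D^2 and D^3
        parity.append(a ^ s1 ^ s3)   # feedforward taps at 1, D and D^3
        s1, s2, s3 = a, s1, s2       # shift the register
    return parity


def turbo_encode(bits, perm):
    """Parallel concatenation of two identical RSC encoders.

    perm is any interleaver permutation: position i of the interleaved
    sequence reads bits[perm[i]]. Output rate is 1/3 (no puncturing).
    """
    interleaved = [bits[p] for p in perm]
    return list(bits), rsc_encode(bits), rsc_encode(interleaved)


def channel_llr(y, noise_var):
    """LLR ln P(b=0|y)/P(b=1|y) for BPSK (0 -> +1, 1 -> -1) over AWGN,
    assuming equiprobable bits: 2*y / noise_var."""
    return 2.0 * y / noise_var


bits = [1, 0, 1, 1, 0, 0, 1, 0]
perm = [3, 7, 0, 4, 1, 5, 2, 6]          # arbitrary example permutation
systematic, parity1, parity2 = turbo_encode(bits, perm)
print(systematic, parity1, parity2)
print(channel_llr(0.5, 1.0))              # 1.0: bit 0 is more likely
```

In a real decoder, the two parity streams and the systematic stream would be converted to LLRs in this way and fed to the two LUT-Log-BCJR decoders.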
Fig. 2.1 Turbo encoder and decoder scheme.
B. Conventional LUT-Log-BCJR Architecture
Each LUT-Log-BCJR decoder converts two a priori LLR sequences into an extrinsic LLR sequence. This extrinsic LLR sequence is iteratively exchanged with that generated by the other LUT-Log-BCJR decoder, where it is used as the a priori LLR sequence in the next iteration. Fig. 2.2 depicts the conventional LUT-Log-BCJR architecture, which employs the sliding-window technique [18], [19] to generate the extrinsic LLR sequence as a concatenation of equal-length subsequences, called windows. Each window is generated separately, using a forward, a pre-backward and a backward recursion, as shown in Fig. 2.2. These three recursions are performed concurrently on three different windows. This schedule completes the windows in their natural order, starting with the window containing the first LLR and ending with the one containing the last LLR. The forward recursion of the LUT-Log-BCJR algorithm can be performed in two pipelined steps using the corresponding dedicated hardware components of Fig. 2.2.
Fig. 2.2 Conventional LUT-Log-BCJR architecture
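The idea of running the three recursions concurrently on different windows while completing the windows in natural order can be made concrete with a small scheduling sketch. The particular assignment below (forward on window t, pre-backward on window t + 1 and backward on window t - 1 in time slot t) is an assumption chosen to satisfy the data dependencies, not necessarily the exact schedule of the architectures in [18], [19]; the function name is hypothetical.

```python
def sliding_window_schedule(num_windows):
    """Assign the three recursions to windows in each time slot.

    Assumed arrangement (one common choice): in slot t the forward
    recursion processes window t, the pre-backward recursion window t+1
    and the backward recursion window t-1. The pre-backward recursion on
    window w+1 supplies the starting state metrics of the backward
    recursion on window w; the forward metrics of window w are buffered so
    that its LLRs can be output while the backward recursion runs.
    Assuming a terminated trellis, the last window needs no pre-backward
    pass because its backward recursion starts from the known final state.
    """
    schedule = []
    for t in range(num_windows + 1):
        slot = {}
        if t < num_windows:
            slot["forward"] = t
        if t + 1 < num_windows:
            slot["pre-backward"] = t + 1
        if t >= 1:
            slot["backward"] = t - 1
        schedule.append(slot)
    return schedule


for t, slot in enumerate(sliding_window_schedule(4)):
    print(t, slot)
```

Each slot touches at most three distinct windows, and the backward recursion, which produces the LLRs, visits the windows in the order 0, 1, 2, 3.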
This conventional architecture achieves a high throughput, provided that it can be operated at a high clock frequency. However, the recursions involve calculations that must be performed in series. Therefore, conventional architectures typically employ additional hardware during synthesis to achieve a short critical path, a high clock frequency and a high throughput. When such an architecture is instead clocked slowly to suit a low-throughput application, this additional hardware still consumes energy. In summary, slowing down the conventional LUT-Log-BCJR architecture results in energy wastage, which cannot be avoided without completely redesigning the architecture.

Previous Energy-Efficient LUT-Log-BCJR Architecture
Inspired by this analysis, the energy-efficient LUT-Log-BCJR architecture shown in Fig. 2.3 was proposed.
Fig. 2.3 Proposed energy-efficient LUT-Log-BCJR architecture
Unlike conventional architectures, it does not use separate dedicated hardware for the three recursions shown in Fig. 2.2. Instead, this architecture implements the entire algorithm using $2^m$ ACS units in parallel, each of which performs one ACS operation per clock cycle. The architecture employs a twin-level register structure to minimize the energy-hungry main-memory access operations. At the first register level, each ACS unit is paired with a set of general-purpose registers R1, R2 and R3, which store results that the same ACS unit reuses in consecutive clock cycles.
The second register level comprises REG bank 1 and REG bank 2 of Fig. 2.3, which temporarily store the LUT-Log-BCJR variables between consecutive values of the bit index during the recursions. REG bank 1 contains a priori LLRs and some dummy registers. The main memory stores all the a priori and extrinsic LLR sequences required during the decoding process, together with the state metrics from the previous window, which facilitates the processing of the entire LUT-Log-BCJR algorithm. Since the architecture of Fig. 2.3 is a fully parallel arrangement of an arbitrary number of ACS units, it may be readily applied to any LUT-Log-BCJR decoder.
In contrast to the different-length data paths of Fig. 2.2, the $2^m$ parallel data paths shown in Fig. 2.3 are identical and have equal lengths, which avoids energy wastage. However, this design also targets a low throughput. The main idea of the present work is to increase the throughput by doubling the ACS units, thereby reducing the number of clock cycles while increasing the overall energy efficiency of the decoder.
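The decomposition of each trellis step into $2^m$ independent ACS operations can be sketched as follows. This Python illustration assumes m = 3 and an LTE-style 8-state RSC trellis (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3), folds the systematic and a priori LLRs into a single input l_sys, and uses the exact correction term where a LUT-Log-BCJR decoder would use its lookup table. It computes one step of the forward recursion; the loop over states stands in for the parallel ACS units, and all names are hypothetical.

```python
import math

M = 3                   # memory elements (assumed, as in the LTE constituent code)
NUM_STATES = 2 ** M     # one ACS unit per trellis state
NEG = -1e9              # stands in for -infinity (avoids inf - inf = nan)


def step(state, u):
    """Next state and parity bit of the assumed RSC code."""
    s1, s2, s3 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    a = u ^ s2 ^ s3
    return (a << 2) | (s1 << 1) | s2, a ^ s1 ^ s3


# For every next state, its two incoming (previous state, input, parity) branches.
PREDECESSORS = {s: [] for s in range(NUM_STATES)}
for s in range(NUM_STATES):
    for u in (0, 1):
        nxt, p = step(s, u)
        PREDECESSORS[nxt].append((s, u, p))


def gamma(u, p, l_sys, l_par):
    """Branch metric from LLRs with LLR = ln P(0)/P(1) (constant terms dropped)."""
    return 0.5 * ((1 - 2 * u) * l_sys + (1 - 2 * p) * l_par)


def max_star(a, b):
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def acs(alpha, s_next, l_sys, l_par):
    """One ACS operation: add branch metrics, compare/select, add correction."""
    (sa, ua, pa), (sb, ub, pb) = PREDECESSORS[s_next]
    ma = alpha[sa] + gamma(ua, pa, l_sys, l_par)
    mb = alpha[sb] + gamma(ub, pb, l_sys, l_par)
    return max_star(ma, mb)


def forward_step(alpha, l_sys, l_par):
    """Advance the forward recursion by one bit; the 2^m ACS operations
    are independent, so hardware can run them in the same clock cycle."""
    new = [acs(alpha, s, l_sys, l_par) for s in range(NUM_STATES)]
    top = max(new)
    return [a - top for a in new]    # normalisation keeps metrics bounded


alpha0 = [0.0] + [NEG] * (NUM_STATES - 1)   # encoder starts in state 0
alpha1 = forward_step(alpha0, l_sys=2.0, l_par=1.0)
print([round(a, 3) for a in alpha1[:5]])
```

Since every state has exactly two incoming branches, each ACS unit needs the same two adders, one comparator and one correction lookup, which is why the data paths can be made identical.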


Proposed Work

Doubling the ACS Units
The BCJR algorithm is computationally complex. Implementing complex mathematical operations such as multiplication and division significantly increases the usage of hardware elements, so a BCJR decoder uses more hardware and runs more slowly in hardware. This disadvantage is mitigated by using lookup tables for the complex operations other than multiplication.
From the literature survey of research on turbo decoding, it is inferred that an efficient coding technique is needed. In this paper, we demonstrated that, when aiming for a high throughput, conventional LUT-Log-BCJR architectures may have wasteful designs requiring large chip areas and hence high energy consumption. However, in energy-constrained applications, achieving a low energy consumption has a higher priority than achieving a high throughput. This motivated our low-complexity energy-efficient architecture, which achieves a small area and hence a low energy consumption by decomposing the LUT-Log-BCJR algorithm into its most fundamental ACS operations. Multiple decoders (as many as needed) can then be instantiated on the same platform and provided as a monolithic solution at reasonable cost. This suits power-constrained applications in which the desired BER must be achieved at a low SNR, that is, with low power consumption.
Fig. 3.1 Redesigned LUT-Log-BCJR architecture
Despite its superior BER performance, the proposed BCJR turbo decoder has a clear throughput disadvantage. For this reason, the decoder has been duplicated: a second BCJR turbo decoder is instantiated on the same FPGA platform, so that two decoders operate concurrently. This simple yet effective modification nearly doubles the throughput compared with a single BCJR decoder.
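Why duplicating the decoder yields almost, rather than exactly, twice the throughput can be seen from a simple throughput model. The sketch below is idealized: it assumes each instance decodes whole frames independently, and the frame length, clock frequency, cycle count and per-frame overhead are hypothetical numbers, not measurements from this work.

```python
def throughput_mbps(frame_bits, clock_mhz, cycles_per_frame):
    """Throughput of one decoder instance in Mbit/s (clock given in MHz)."""
    return frame_bits * clock_mhz / cycles_per_frame


def aggregate_throughput_mbps(instances, frame_bits, clock_mhz,
                              cycles_per_frame, overhead_cycles=0):
    """Idealized model of several decoder instances on one FPGA.

    Each instance decodes whole frames independently; overhead_cycles is a
    hypothetical per-frame cost (e.g. shared input/output) that keeps the
    speed-up slightly below the number of instances.
    """
    return instances * frame_bits * clock_mhz / (cycles_per_frame + overhead_cycles)


# Hypothetical numbers, not measurements from this work.
single = throughput_mbps(1024, 100.0, 20_000)
dual = aggregate_throughput_mbps(2, 1024, 100.0, 20_000, overhead_cycles=500)
print(single, dual, dual / single)
```

With these assumed values the second instance raises the throughput by a factor of about 1.95; with zero overhead the factor would be exactly 2.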
For applications in which both throughput and BER are important design considerations, the proposed approach can be extended to multiple turbo decoding engines operating in parallel, which can provide a very high throughput while retaining the already low BER.
Comparison of Implemented Turbo Decoders
Publication: Proposed, [1], [10], [11], [13], [5], [6]
Algorithm: LUT-Log, LUT-Log, LUT-Log, LUT-Log, LUT-Log, Max-Log, Max-Log
Gate count: 75k, 85k, 410k, 65k, 0, 553k
Area (mm²): 0.35, 9, 14.5, 8.2, 2.1, 3.57
Memory required (kbit): 188, 239, 450, 161, 0, 129
Clock frequency (MHz): 333, 111, 145, 100, 300, 390.6
Throughput (Mbit/s): 1.03, 2, 10.8, 4.17, 150, 390.6
Power consumption (mW): 4.17, 292, 956, 320, 300, 788.9
Supply voltage (V): 1, 1.8, 1.8, 1.8, 0, 1.2
Energy consumption: 0.4, 14.6, 11.1, 12.7, 0.31, 0.37
TABLE 1 Performance of turbo decoders


Conclusion
The turbo decoding structure of a previous work is implemented in this work. As indicated in the introduction, the LUT-Log-BCJR turbo decoder using ACS units is compared with the Xilinx LUT-Log-BCJR turbo decoder using the sliding-window technique. As expected, the BCJR turbo decoder yields a better BER performance than the Xilinx LUT-Log-BCJR turbo decoder. Despite this superior BER performance, implementation of the BCJR algorithm was avoided in the past because of its complexity relative to the VLSI technology of the time. Modern VLSI technology, however, allows this algorithm to be implemented at reasonable cost. The prospective application area of the proposed implementation is applications that require a low BER but have relaxed throughput requirements. However, the proposed BCJR turbo decoder has a clear throughput disadvantage. For this reason, the decoder has been duplicated by instantiating a second BCJR turbo decoder on the same FPGA platform, so that two decoders operate concurrently. This simple yet effective modification nearly doubles the throughput compared with a single BCJR decoder. It also implies that multiple decoders (as many as needed) can be instantiated on the same platform and provided as a monolithic solution at reasonable cost.

References

P. Corke, T. Wark, R. Jurdak, H. Wen, P. Valencia, and D. Moore, "Environmental wireless sensor networks," Proc. IEEE, vol. 98, no. 11, pp. 1903–1917, Nov. 2010.

S. L. Howard, C. Schlegel, and K. Iniewski, "Error control coding in low-power wireless sensor networks: When is ECC energy-efficient?," EURASIP J. Wirel. Commun. Netw., vol. 2006, pp. 1–14, 2006.

L. Li, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, "An energy-efficient error correction scheme for IEEE 802.15.4 wireless sensor networks," IEEE Trans. Circuits Syst. II, vol. 57, no. 3, pp. 233–237, 2010.

M. May, T. Ilnseher, N. Wehn, and W. Raab, "A 150 Mbit/s 3GPP LTE turbo code decoder," in Proc. Design, Autom. Test in Euro. Conf. Exhib. (DATE), 2010, pp. 1420–1425.

C. Studer, C. Benkeser, S. Belfanti, and Q. Huang, "Design and implementation of a parallel turbo-decoder ASIC for 3GPP-LTE," IEEE J. Solid-State Circuits, vol. 46, pp. 8–17, 2011.

C. Wong, Y. Lee, and H. Chang, "A 188-size 2.1 mm² reconfigurable turbo decoder chip with parallel architecture for 3GPP LTE system," in Proc. Symp. VLSI Circuits, 2009, pp. 288–289.

P. Robertson, P. Hoeher, and E. Villebrun, "Optimal and sub-optimal maximum a posteriori algorithms suitable for turbo decoding," Euro. Trans. Telecommun., vol. 8, no. 2, pp. 119–125, 1997.

W.-P. Ang and H. K. Garg, "A new iterative channel estimator for the log-MAP & max-log-MAP turbo decoder in Rayleigh fading channel," in Proc. Global Telecommun. Conf., 2001, vol. 6, pp. 3252–3256.

M. A. Bickerstaff, D. Garrett, T. Prokop, C. Thomas, B. Widdup, G. Zhou, L. M. Davis, G. Woodward, C. Nicol, and R.-H. Yan, "A unified turbo/Viterbi channel decoder for 3GPP mobile wireless in 0.18-µm CMOS," IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1555–1564, Nov. 2002.

M. Bickerstaff, L. Davis, C. Thomas, D. Garrett, and C. Nicol, "A 24Mb/s radix-4 logMAP turbo decoder for 3GPP-HSDPA mobile wireless," in Proc. IEEE Int. Solid-State Circuits Conf., 2003, pp. 150–484.

Z. Wang, "High-speed recursion architectures for MAP-based turbo decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 4, pp. 470–474, Apr. 2007.

F.-M. Li, C.-H. Lin, and A.-Y. Wu, "Unified convolutional/turbo decoder design using tile-based timing analysis of VA/MAP kernel," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 10, pp. 1063-8210, Oct. 2008.

C. Schurgers, F. Catthoor, and M. Engels, "Memory optimization of MAP turbo decoder algorithms," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 2, pp. 305–312, Feb. 2001.

G. Masera, M. Mazza, G. Piccinini, F. Viglione, and M. Zamboni, "Architectural strategies for low-power VLSI turbo decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 3, pp. 279–285, Mar. 2002.

C.-M. Wu, M.-D. Shieh, C.-H. Wu, Y.-T. Hwang, and J.-H. Chen, "VLSI architectural design tradeoffs for sliding-window log-MAP decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 4, pp. 439–447, Apr. 2005.

C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes," in Proc. IEEE Int. Conf. Commun., 1993, pp. 1064–1070.

L. Hanzo, T. H. Liew, B. L. Yeap, R. Tee, and S. X. Ng, Turbo Coding, Turbo Equalisation and Space-Time Coding. New York: Wiley, 2011.

L. Hanzo, J. P. Woodard, and P. Robertson, "Turbo decoding and detection for wireless applications," Proc. IEEE, vol. 95, no. 6, pp. 1178–1200, Jun. 2007.

P. Robertson, E. Villebrun, and P. Hoeher, "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain," in Proc. IEEE Int. Conf. Commun., 1995, pp. 1009–1013.

C. Schurgers, F. Catthoor, and M. Engels, "Memory optimization of MAP turbo decoder algorithms," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 2, pp. 305–312, Feb. 2001.

G. Masera, M. Mazza, G. Piccinini, F. Viglione, and M. Zamboni, "Architectural strategies for low-power VLSI turbo decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 3, pp. 279–285, Mar. 2002.

C.-M. Wu, M.-D. Shieh, C.-H. Wu, Y.-T. Hwang, and J.-H. Chen, "VLSI architectural design tradeoffs for sliding-window log-MAP decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 4, pp. 439–447, Apr. 2005.

A. J. Viterbi, "An intuitive justification and a simplified implementation of the MAP decoder for convolutional codes," IEEE J. Sel. Areas in Commun., vol. 16, no. 2, pp. 260–264, 1998.

L. Li, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, "Design of fixed-point processing based turbo codes using extrinsic information transfer charts," in Proc. IEEE Veh. Technol. Conf., 2010, pp. 1–5.

E. Boutillon, C. Douillard, and G. Montorsi, "Iterative decoding of concatenated convolutional codes: Implementation issues," Proc. IEEE, vol. 95, no. 6, pp. 1201–1227, Jun. 2007.

Y. Zhang and K. K. Parhi, "High-throughput radix-4 logMAP turbo decoder architecture," in Proc. Asilomar Conf. Signals, Syst., Comput., 2006, pp. 1711–1715.

Z. He, P. Fortier, and S. Roy, "Highly-parallel decoding architectures for convolutional turbo codes," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 10, pp. 1063-8210, Oct. 2006.

M. C. Valenti and J. Sun, "The UMTS turbo code and an efficient decoder implementation suitable for software-defined radios," Int. J. Wirel. Inform. Netw., vol. 8, no. 4, pp. 203–215, 2001.

C. Benkeser, A. Burg, T. Cupaiuolo, and Q. Huang, "Design and optimization of an HSDPA turbo decoder ASIC," IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 98–106, Jan. 2009.

S.-G. Lee, C.-H. Wang, and W.-H. Sheen, "Architecture design of QPP interleaver for parallel turbo decoding," in Proc. IEEE Veh. Technol. Conf., 2010, pp. 1–5.

Y. Sun and J. R. Cavallaro, "Efficient hardware implementation of a highly-parallel 3GPP LTE/LTE-advance turbo decoder," Integr., VLSI J., vol. 44, no. 1, pp. 1–11, 2010.

G. J. Pottie and W. J. Kaiser, "Wireless integrated network sensors," Commun. ACM, vol. 43, no. 5, pp. 51–58, May 2000.

N. Sadeghi, S. Howard, S. Kasnavi, K. Iniewski, V. C. Gaudet, and C. Schlegel, "Analysis of error control code use in ultra-low-power wireless sensor networks," in Proc. Int. Symp. Circuits Syst., 2006, pp. 3558–3561.

G. Barrenetxea, F. Ingelrest, G. Schaefer, and M. Vetterli, "Wireless sensor networks for environmental monitoring: The SensorScope experience," in Proc. IEEE Int. Zurich Seminar Commun., 2008, pp. 98–101.

N. Miyamoto, K. Kotani, and H. Fujisawa, "WiMAX turbo decoder with tail-biting BIP architecture," 2009.

G. Colavolpe, G. Ferrari, and R. Raheli, "Noncoherent iterative (turbo) decoding," IEEE Trans. Commun., vol. 48, no. 9, Sep. 2000.

C.-M. Wu, M.-D. Shieh, and C.-H. Wu, "Memory arrangements in turbo decoders using sliding-window BCJR algorithm," 2002, vol. 5, pp. V-557–V-560.

K. K. Parhi and Z. Wang, "Decoding metrics and their applications in VLSI turbo decoders," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '00), 2000, vol. 6.

Z. Wang, H. Suzuki, and K. K. Parhi, "Efficient approaches to improving performance of VLSI SOVA-based turbo decoders," in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS 2000), Geneva, 2000, vol. 1.

Z. Wang, "High-speed recursion architectures for MAP-based turbo decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 4.

R. Dobkin, M. Peleg, and R. Ginosar, "Parallel interleaver design and VLSI architecture for low-latency MAP turbo decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 4, Apr. 2005.

M. J. Thul and N. Wehn, "FPGA implementation of parallel turbo-decoders," in Proc. Integrated Circuits and Systems Design (SBCCI), 2004.

C.-L. Kei and W. H. Mow, "A class of switching turbo decoders against severe SNR mismatch," in Proc. IEEE 56th Vehicular Technology Conf. (VTC 2002-Fall), 2002, vol. 4.