 Open Access
 Total Downloads : 76
 Authors : Roshan Anna Joseph , Hanna Mathew
 Paper ID : IJERTV8IS050547
 Volume & Issue : Volume 08, Issue 05 (May 2019)
 Published (First Online): 31052019
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Programmable CRC Computation on High Speed Parallel LFSR Architectures based on Improved State Space Structure
Roshan Anna Josepp
Department of Electronics & Communication Engineering Saintgits College of Engineering,
Kottayam, India
Hanna Mathew2
Department of Electronics & Communication Engineering Saintgits College of Engineering,
Kottayam, India
Abstract Linear Feedback Shift Register (LFSR) is a shift register with its input bit as a linear operation of its previous state. Most commonly, this function is a Boolean exclusive OR(XOR) .LFSRs are as such used in cyclic error correcting codes like BCH and CRC encoders. Parallelization of LFSR increases the system throughput rate, thus makes it useful for highspeed communication. Statespace transformation technique reduces the circuit complexity of parallel architectures and the encoder consists of matrix multiplications, with searching of transformation matrices. The new technique is the construction of a modified transformation matrix and the corresponding searching algorithms for the desirable transformation matrix which has both speed and area advantages. The register with a well chosen feedback produces sequence of bits which exhibits psuedorandom behavior thus making it applicable for very fast generation of pseudorandom sequence such as in cryptography, digital broadcasting, fast counters, testpattern generation, radio jamming systems etc.
Keywords Component, BCH, CRC, LFSR, statespace transformation.
INTRODUCTION
In digital processing, a Linear Feedback Shift Register (LFSR) is a shift register whose input bit is a direct capacity of its past state. The most generally utilized straight capacity of single bits is exclusive OR (XOR)[1]. Accordingly, an LFSR is regularly a shift register whose input bit is driven by the XOR of certain bits of the general register blocks. LFSRs are widely applied as such in BCH and CRC encoders, generally to compute the remainder polynomials. A traditional encoder or a sequential LFSR architecture can work at extremely high frequency, but it experiences the innate sequential in and sequential out restriction. At the point when the throughput of this sequential/serial architecture can't get up with the framework information rate, parallel processing must be considered to meet the general necessities of fast interchanges and will provide a higher throughput rate. For example, as in optical transmission, rates beyond Gigabits per second are needed.[2]
Many sorts of parallel LFSR structures have been proposed in the related writings for BCH and CRC encoders. In [3], a parallel CRC encoder has been proposed through scientific finding. Treeorganized calculation and sub expressions sharing are used to improve the cascaded rationale parts. Zhang and Parhi [4] proposed an improved
fast BCH encoder intended to dispense with the fanout bottleneck condition. This parallel LFSR is effective as far as it accelerates the calculation, but its equipment cost is high. A sort of statespace transformation technique is applied in

and [6] to decrease the multifaceted nature of the traditional parallel CRC circuits. By embracing straight network change, a full speedup factor can be accomplished at the expense of an extra hardware outside the criticism circle. Parhi [7] and Jung [8] proposed another sort of parallel LFSR dependent on IIR channel topology what's more, its leverage is that the pipeline method can be connected to accomplish some improvement in the equipment proficiency of LFSR.
In statespace transformation, the encoder is made out of some grid multiplications, with the most exertion committed to searching of transformation networks that may accomplish the minimalarea circuits. In this short, another change matrix development and an inexact searching strategy are proposed. Utilizing this surmised calculation, the transformation can be found in an a lot shorter time than thorough searching. A low multifaceted nature and rapid parallel LFSR engineering can be inferred by applying the statespace change with the alluring arrangement. Accordingly, the equipment complexity can be viably decreased without yielding execution or speed
The initial value of the LFSR is called the seed and since the operation of the register is deterministic, the stream of values produced by the register is completely created by the register is totally dictated by its current (or past) state. Similarly, in light of the fact that the register has a finite number of possible states, it should inevitably enter a repeating cycle. In any case, an LFSR with a wellpicked feedback function can produce a sequence of bits that shows up random nature and has a long cycle. The mathematics of cyclic redundancy check, utilized to give a brisk check against transmission errors, are firmly identified with those of an LFSR.[5] LFSRs can be implemented in hardware, and this makes them helpful in applications that require a quick generation of a pseudo irregular and random sequence. LFSRs have additionally been utilized for producing an estimate of repetitive sound in different programmable sound generators. [8] Complete LFSR are commonly used as pattern generators for exhaustive testing, since they cover all possible inputs for an ninput circuit. In builtin selftest (BIST) techniques, it is accomplished with a multipleinput signature register (MISR or MSR), which is a type of LFSR.
In this concise, a new transformation matrix development and an approximate looking strategy are proposed. Utilizing this estimated calculation, the desirable transformation matrix framework can be found in a lot shorter time than thorough pursuits. A low complexity and rapid parallel LFSR architecture can be inferred by applying the statespace change with the new arrangement. Therefore, the hardware complexity can be viably decreased without relinquishing execution or speed.The rest of this brief is organized as follows. In Section I,the basic serial and parallel LFSR designs are reviewed briefly. Statespace transformation and IIR filter based architectures are mentioned as parallel schemes. Section II briefs about the basic searching algorithm for the statepace method the and problems with existing architecture. Section III introduces a modified algorithm for the transformation and CRC check for the modified transformation matrix, and the corresponding searching algorithms. Section IV gives the experimental results and makes a comparison. Conclusions are drawn in Section V.

BASIC LFSR ARCHITECTURES
LFSRs can be implemented based on serial and parallel architectures. The initial value of the LFSR is called the seed and since the operation of the register is deterministic, the stream of values produced by the register is completely created by the register is totally dictated by its current (or past) state. Similarly, in light of the fact that the register has a finite number of possible states, it should inevitably enter a repeating cycle.The mathematics of cyclic redundancy check, utilized to give a brisk check against transmission errors, are firmly identified with those of an LFSR.[5]Few LFSR architectures are discussed below.

Serial LFSR Architecture
A conventional serial LFSR architecture for BCH(n, k) code is illustrated in figure 1. There is XOR operation and finite field multiply operation.[1] In binary codes, the coefficient of the generator polynomial equals either 0 or 1, then the corresponding multiply can be simplified to a connection or disconnection. Registers represent the remainder polynomial.Two kinds of tpical parallel LFSR architectures are introduced next.
Figure 1: Conventional serial LFSR architecture
Although a serial LFSR architecture can operate at very high frequency, it suffers from the inherent serialin and serialout limitation.[4][5] When the throughput of this serial architecture cannot catch up with the system data rate, parallel processing must be considered to meet the general requirements of highspeed communications and realize a higher throughput rate. There are many kinds of parallel
LFSR architectures. A kind of statespace transformation can reduce the complexity of the conventional parallel CRC.

Parallel LFSR Architecture based on statespace transformation
In statespace transformation, the encoder is composed of some matrix multiplications, with the most effort devoted to searching of transformation matrices that may achieve the minimalarea circuits.[1] In this brief, a new transformation matrix construction and anapproximatesearchingmethodare proposed. Using this approximate algorithm, the desirable transformation matrix can be found in a much shorter time than exhaustive searches. A lowcomplexity and highspeed parallel LFSR architecture can be derived by applying the statespace transformation with the desirable solution. As a result, the hardware complexity can be effectively reduced without sacrificing performance or speed.[5] Supposing an nk dimensional state vector R(t) = [rnk1(t), r1(t), r0(t)], where ri(t) represents the state of the ith remainder registers at time t and u(t) represents the tth singlebit input. R(t + 1) can be expressed as R(t + 1) = A * R(t) + b *u(t) where matrix A is
and vector b is b = (g(nk1), . . . , g1, g0)T .R(t+1) can be derived from recursive deductions. Assuming p message bits are input at the same time,
R(t + 1) = Ap R(t) + Bp U p(t).
In the case of parallel input, where matrix Bp is an n k, p
matrix,
Bp = (A(p 1)b, …, Ab, b).
To reduce the complexity of the system,a linear transformation of R(t) is applied through a non singular rmatrix T, R(t) = T RT (t).Then the state
equation can be rewritten as ,
RT (t + 1) = ApT RT (t) + BpT Up(t)
where,
R(t + 1) = T RT (t + 1)
ApT = T 1 ApT BpT = T1 Bp.
Figure 2: Block diagram for parallel LFSR architecture.
The figure 2 shows block diagram of the system described by this method. The transformation matrices are chosen such that matrix ApT is a companion matrix, which will simplify the feedback loop of the parallel architecture. Unfortunately, when Ap is transformed into a companion matrix ApT , matrix T and BpT may become complicated and dense with many 1s. Even the feedback loop is fast
and of low complexity as expected, the other parts of the encoder may have a longer critical path with high complexity. After applying pipelining and retiming techniques to reduce the critical path, the datapath timing is still not satisfying and brings extra hardware cost as well.To further optimize the hardware implementations with statespace transformations and to find a better transformation matrix, some constraints are needed to be defined.

The transformation matrix T must be reversible to make the statespace transformation workable.

The total number of 1s in matrices T , ApT , and BpT should be as small as possible. The number of 1s in these matrices determines the number of XOR gates of the encoder directly.

The method to find the transformation matrix needs to be efficient. The whole searching space is too huge for exhaustive search.
Based on these three constraints, a new method to construct the transformation matrix is introduced in the next section.


Parallel LFSR Architecture based on IIR filter method
A kind of parallel LFSR architecture can be implemented based on IIR filter, to mitigate the impractical hardware complexity of the parallel architecture.[7] The serial LFSR encoder shown in figure 1 is treated as an IIR filter, and the loop update equations can be derived by adopting the look ahead scheme. A parallel BCH encoder can be implemented as in figure 3 based on these equations. The past input and output values are combined to generate the output y(n).[7]
.
Figure 3: Example for an LFSR using IIR filter method


BASIC SEARCHING ALGORITHM FOR STATE SPACE MODEL OF LFSR
Few CRC or BCH codes are to be referenced . Their generator polynomials are to be considered. In order to find the desirable transformation matrix T , an exhaustive search is performed in the vector space of b. The parallel level p or the input size is chosen as the degree of generator polynomials for a clear comparison with previous architectures. The proposed method can be applied at any parallel level. Since the matrices T, ApT, and BpT actually imply the connection of the coupling circuits, it follows that the hardware complexity is related to the total number of 1s in these three matrices, which is denoted as TN in the following. Consequently, it is needed to find a vector b2 that minimizes TN. The basic searching algorithm to be used is as follows.

First, the coupling matrices A and Bp can be computed for the generator polynomials with input size p = nk.

Then traverse all the possible values of b2 to construct transformation matrix T, and subject it to statespace
transformation to obtain transformed coupling matrices ApT and BpT .

Afterward, the TN can be counted. The least one indicates the lowest complexity of hardware and the corresponding vector b2 can be regarded as the best solution. TN is regarded as the number of XOR gates.

For n k = 32 or more, the vector space of b is too enormous and this exhaustive searching algorithm requires more than one month to obtain the results. This restricts the application of this searching algorithm for highorder generator polynomials.
A. Problems With The Existing Architecture.
The transformation matrix used in [4] is chosen such that matrix ApT is a companion matrix, that simplifies the feedback loop of the parallel architecture. When Ap is transformed into a companion matrix ApT, matrix T and BpT may become complicated and dense with many 1s. Even the feedback loop is fast and of low complexity as expected. The other parts of the encoder may have a longer critical path with high com plexity. After applying pipelining and retiming techniques to reduce the critical path, the datapath timing is still not satisfying and brings extra hardware cost as well. As well the number of clocks needed will be much larger and only synchronous clock could be provided for an optimal output. This difference indicates that only a minority of possibilities of T are considered if using b to generate T. The transformation matrix derived from optimal b may not be the optimal one among all the possibilities of T [11]. Since there are different kinds of matrices that may change the circuits into increasingly proficient structures, a few endeavors can be committed to improve the technique for developing matrix T and to additionally advance the equip ment usage with state space changes. So as to locate a superior change network, a few imperatives need to be characterized here. To start with, the change in matrix T must be reversible to make the state space change useful. [5] Second, the all out number of 1s in networks T , ApT , and BpT ought to be as little as could reasonably be expected. The number of 1s in these networks decides the quantity of XOR entryways of the encoder straight forwardly. Third, the strategy to observe the change framework should be effective. The entire scanning space is unreasonably tremendous for comprehensive look [11].
In view of these limitations, another strategy to choose the transformation matrix is proposed as pursues. In this, encoder is composed of some matrix multiplications. Most effort is devted to the searching of transformation matrices. and it helps to achieve the minimalarea circuits. An alternative searching algorithm is designed to obtain the desired transformation matrix more quickly.


MODIFIED SEARCHING ALGORITHM FOR STATESPACE MODEL OF LFSR
Based on the approximate searching algorithm a modified state space model can be implemented with a different choices for T matrix. The generation of T matrix is an area of research. Efficient choice of T matrix yields in generating ApT BpT transformed matrices very faster. The generated T must
be reversible. Number of 1s in matrices be small, which in turn determines number of XOR gates in the architecture. So the method to find T needs to be efficient. The table 4.1 shows a few suggestions for T matrix and corresponding values for ApT and BPT.
Table 1: Few suggestions for T Matrix
Choice1
Choice2
Choice3
T=I
ApT = Ap
BpT = Bp
T=[ p]1
A
p 1
ApT = [A ]
BpT = ApBp
T=[Bp]1 ApT = I
BpT = Bp p1
A

Implementation of the Proposed Method In CRC Encoding Linear feedback shift registers (LFSRs) are applied as such in BCH and CRC encoders to compute the remainder polynomial. Many types of parallel LFSR architectures have already been presented for BCH and CRC encoders. In this, a parallel CRC implementation designed through mathematical deduction using the modified statespace model. Tree structured computation and subexpressions are used to
optimize the cascaded logic parts[12].
Table 2: Generator polynomials referenced for the programmable CRC.
CRC Name
Polynomials
CRC12
x12 + x11 + x3 + x2 + 1
CRC16
x16 + x15 + x2 + 1
CRC32
x32 + x23 + x22 + x16 + x12 + x11 + x10 + x8
+ x7 + x5 + x4 + x2 + x + 1

Programmable CRC for16 and 32bit
The generator polynomial CRC16 and CRC32 is given in table 2. A CRC16 bit code, detects single and double bit errors in the message. The burst errors are of length 16 bit or less. All errors are with an odd number of bits and most errors are for longer bursts. It is used in Ethernet IEEE 802.3 standard. Remainder polynomial is computed to min imize errors, get random values and lengthy pattern. The initial value, called as the seed, can be chosen to all 1 or 0.At the reset condition every flipflop must be set to high.
Table 3: CRC16: Hexadecimal output of some random values.
Input
CRCOutput
0000ABCD
F8A4
0000E3DD
48C2
0000A82D
70E7
00008A25
3CD4
000042CD
8EA8
00003DFD
8C0E
Table 4: CRC32: Hexadecimal output of some random values.
Input
CRCOutput
ABCD1234
3005573B
1234ABCD
5C380B83
AFCF1374
A9DEF39B
1A35ABD3
D4954474
A programmable CRC checker is implemented using the modified StateSpace trans formation model which selects between CRC16 and CRC32.Asynchronous clock is given for the reset condition, and for the modulo2 operations. The implemented CRC checker generates the CRC coded output for each input bits. The tables shows the corresponding CRC outputs of 16 and 32 bit CRC input values. The values were cross checked using Online CRC Calculator.


RESULTS AND DISCUSSION
The proposed, programmable CRC computation on high speed parallel LFSR architectures based on improved state space structure have been designed using Xilinx. The result of simulation is shown in the following sections. For the experimental evaluation, a Spartan3a Starter board xc3s700a fg484 was used. The Verilog code of the modified State Space methodology was generated and simulated with the Model Sim PE 5.5e to extract the output codes for the CRC bits. Following simulation, methodology is synthesized with ISE Design Suite 14.7. The Xilinx XPower analyzer was then used to determine power consumption, using the design netlist, the design constraints, and the simulation activity.

Simulation Results
The output values of CRC16 and CRC32 obtained through the simulation of Verilog code was crosschecked with values in table 4.5 which was obtained from online CRC calculator(www.ghi.de).

Analysis of the modified programmable crc with existing architectures
Outputs of CRC checksum were crosschecked using Online CRC checker for different values. Synthesis results shows this architecture outperforms the previous designs. Post place & route timing analysis shows that the clock needed to setup on destination clock=2.461 ns. Maximum clocking frequency, assuming 50 % duty cycle, T= 2.461nS was found to be 406.33888 MHz . CRC computations per second, assuming 4 clocks per CRC computation is 101584721 CRC. Throughput, assuming 4 clocks per CRC computation is calculated as 3.0274Gbps. Latency time (assuming 4 clocks per CRC computation),time taken for the input to reach the output is found as 9.844nS. Number of clocks needed were very less compared to the conventional structures. Asynchronous clock was given for the programmable CRC. Clocks were provided for the reset condition and loading the transformation matrix values. Delay was significantly reduced in the modified programmable structure. The table 5.2 shows the delay,
number of clocks, and power analysis of the existing architectures and the modified programmable CRC.
Model
CRC16
series
CRC32
series
CRC16
parallel
CRC32
parallel
PCC
Using Modified SST
Delay
8.158 ns
15.705 ns
2.885 ns
4.795 ns
5.202
ns(total)
Clock (no.s)
17
33
4
4
4
(asynch.)
Power
73.62 mW
92.41mW
13.73 mW
17.42 mW
20.35
mW
Model
CRC16
series
CRC32
series
CRC16
parallel
CRC32
parallel
PCC
Using Modified SST
Delay
8.158 ns
15.705 ns
2.885 ns
4.795 ns
5.202
ns(total)
Clock (no.s)
17
33
4
4
4
(asynch.)
Power
73.62 mW
92.41mW
13.73 mW
17.42 mW
20.35
mW
Table 5: Analysis of various LFSR architectures with the modified architecture using CRC Check.


CONCLUSION

The A linear feedback shift register (LFSR) is the shift register including input bit which is the direct XOR operation of its old block. The LFSR including wellchosen feedback operation is able to generate the series of bits that looks random that includes lengthy cycle. Through this project the basic serial and parallel LFSR designs were reviewed and analyzed. The parallel architectures can achieve hardware dsigns with the minimal XOR gates and the same CPD. LFSRs can be implemented in hard ware, and this makes them useful in applications that require very fast generation of a pseudorandom sequence, such as directsequence spread spectrum radio. LFSRs have also been used for generating an approximation of white noise in various programmable sound generators. The LFSR including wellchosen feedback operation is able to generate the series of bits that looks random that includes lengthy cycle. Modified statespace transformation architectures and the corresponding searching algorithms were introduced.
Through this project a modified method to choose the transformation matrix in the statespace model was computed. A CRC check for the modified transformation ma trix, and the corresponding searching algorithms were done. A programmable CRC checker was implemented based on the modified StateSpace transformation technique of parallel LFSR. Based on this improved model, reducedcomplexity parallel architectures is obtained. Experimental results shows that the throughput of the modified parallel architecture is much higher than those of serial LFSR architecture. Choosing of the transformation matrices is an area of research. Number of clocks were reduced which reduces the overall delay. There is a deterministic improvement in the throughput and speed and a significant reduction in power consumed.
REFERENCES

G. Hu, J. Sha and Z. Wang,HighSpeed Parallel LFSR Architectures Based on Improved StateSpace Transformations,IEEE Transaction on Very Large Scale In tegration (VLSI) Systems,vol.25, 1159 1163,2017 .

M. Y. Hsiao and K. Y. Sih, Serialtoparallel transformation of linear feedback shiftregister circuits,Trans. Electron. Comput.,vol.6,738 740,2007.

M. Ayinala and K. K. Parhi,Highspeed parallel architectures for linear feedback shift registers,IEEE Trans. Signal Process.,vol.59,44594469,2011. [8] M. Ayinala and K. K. Parhi, Efficient Parallel VLSI Architecture for Linear Feedback Shift Registers, in Proc. IEEE Workshop on SiPS, Oct. 2010, pp. 5257.

J. Jung, H. Yoo, Y. Lee, and I.C. Park,Efficient parallel architecture for lin ear feedback shift registers, IEEE Trans. Circuits Syst. II, Express Briefs,vol.62, 10681072,2015.

K. K. Parhi, Eliminating the fanout bottleneck in parallel long BCH en coders,IEEE Trans. Circuits Syst. I, Reg. Papers,vol.51,512 516,2004.

T.B. Pei and C. Zukowski,Highspeed parallel CRC circuits in VLSI, IEEE Trans.Commun.,vol.40,653657,2004.

K. K. Parhi and D. G. Messerschmitt, Pipeline interleaving and parallelism in re cursive digital filters. II. Pipelined incremental block filtering,IEEE Trans. Acoust., Speech, Signal Process.,vol.37,1118 1134,2000.

J. H. Derby,Highspeed CRC computation using statespace transformations, in Proc. IEEE GLOBECOM, 166170,2001.

X. Zhang and K. K. Parhi, Highspeed architectures for parallel long BCH en coders,(IEEE Trans. Very Large Scale Integr. VLSI Syst.),13,872877,2005.

C. Kennedy and A. ReyhaniMasoleh, Highspeed parallel CRC circuits, (in Proc. 42nd Annu. Asilomar Conf. Signals, Syst., Comput),18231829,2008.

L. Song, M.L. Yu, and M. Shaffer, 10 and 40Gb/s forward error correction de vices for optical communications,IEEE J. SolidState Circuits,11,15651573,2002.

Y.M. Lin, C.H. Yang, C.H. Hsu, H.C. Chang, and C.Y. Lee, A MPCNbased parallel architecture in BCH decoders for NAND flash memories ,IEEE Trans. Cir cuits and Sys. II,58,682686,2011.

H.C. Chang, C.Y. Lee, Y.M. Lin, and C.H. Yang, Apparatus and method of processing cyclic codes, U.S. Patent 20 110 292 681 A1,12, 2011.

H. Chen, CRTBased highspeed parallel architecture for long BCH encoding,IEEE Trans. Circuits Syst. II, Exp. Briefs,56,684686,20

R. Cherukuri, Agile encoder architecture for strengthadaptive long BCH codes,in Proc. IEEE GLOBECOM,19001904,2010.

S. Lin and D. J. Costello, Error control coding: Fundamentals and Applica tions,2nd ed. Englewood Cliffs, NJ: PrenticeHall Inc.,2004.

H. Yoo, J. Jung, J. Jo, and I.C. Park, AreaEfficient Multimode Encoding Ar chitecture for Long BCH Codes,IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60,12,872876,2013.

S. Gregori, A. Cabrini, O. Khouri, and G. Torelli, Onchip error correction tech niques for newgeneration flash memories,Proc. IEEE,91,602616,2003.

M. Sprachmann,Automatic Generation of Parallel CRC Circuits,IEEE Des.Test. Comput., vol. 18,3,108114,2001.

J. Zhang, Z. Wang, Q, Hu, and J. Xiao, Optimized Design for High Speed Parallel BCH Encoder, in Proc. IEEE Int. Workshop VLSI Des. Video Tech nol.(IWVDVT),179182,2005.