Design and VLSI Implementation of an Area-Efficient Pipelined AES Accelerator for 5G NR Security

Sushma G; Prof. Pallavi Patil

doi:10.5281/zenodo.20441047

Volume 15, Issue 05 (May 2026)

Design and VLSI Implementation of an Area-Efficient Pipelined AES Accelerator for 5G NR Security

DOI : 10.5281/zenodo.20441047

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 23
Authors : Sushma G, Prof. Pallavi Patil
Paper ID : IJERTV15IS052153
Volume & Issue : Volume 15, Issue 05 , May – 2026
Published (First Online): 29-05-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

Design and VLSI Implementation of an Area-Efficient Pipelined AES Accelerator for 5G NR Security

Sushma G

Dept of ECE, VTU CPGS KALABURAGI, INDIA

Prof. Pallavi Patil

Dept of ECE, VTU CPGS KALABURAGI, INDIA

Abstract As the 5G NR speeds get higher, the need for quick and efficient security increases proportionally. Traditional software-based cryptographic algorithms fail to meet required latency requirements in real-time applications, leading to an inevitable transition to hardware-based cryptography. This paper proposes a design that meets this criterion: an area-efficient, pipelined AES-128 hardware accelerator optimized for future communication systems.The proposed design employs a 10-stage fine-grain pipeline, enabling simultaneous processing of multiple data blocks. As such, high throughput is achieved alongside reduced latency. In addition, to minimize hardware size, a hardware-efficient Composite Field Arithmetic (CFA)-based design replaces conventional Look-Up Table (LUT) based S-boxes, thereby reducing the number of gates and power consumption with minimal impact on cryptographic security.The design will be implemented using the Verilog Hardware Description Language (HDL) and then synthesized using the Xilinx Vivado tool. Simulations will be carried out using the ModelSim or Vivado simulator. Validation tests will involve testing the design’s functionality using standard AES test vectors. Experiments prove that the proposed approach provides better throughput, area utilization, and power consumption than conventional AES implementations.

Index Terms AES-128, Pipelined Architecture, 5G Security, Verilog HDL, Composite Field Arithmetic (CFA), Hardware Acceleration, Area Efficiency, High Throughput, Cryptographic Systems, VLSI Design.

Introduction

The emergence of modern communication technologies, headed by 5G New Radio technology, has fueled the need for ultrafast data transmission and reliable security. As the data rates grow beyond 20 Gbps, maintaining the confidentiality and integrity of the transmitted information becomes a critical challenge. The Advanced Encryption Standard (AES) has emerged as a backbone technology for safeguarding sensitive data in wireless communications, mobile networks, and Internet of Things (IoT). However, classic software AES algorithms cannot support real-time encryption due to their high latency, low throughput, and increasing computational overhead. These obstacles hinder applications such as streaming video, autonomous systems, mobile payments, and automated industry from achieving maximum performance. Thus, the need arises for a hardware

solution to AES that would maximize throughput and minimize computational overhead.

Several research works have been conducted in this field. Zhang and Parhi developed high-performance VLSI designs of the AES algorithm with pipelining capabilities. In their footsteps, Zhu et al. and Joshi et al. refined the SubBytes (S-box) stage using efficient combinational logic and optimized hardware design. Recently, Ueno et al. and Kumar et al. have focused on high-throughput AES architectures that are resistant to side-channel attacks. Nannipieri et al. and Ahmad & Hasan have furthered advances in advanced AES cryptoprocessors and ASIC accelerators. Furthermore, Subramanian et al. made energy-saving possible using clock-gated S-box designs.

In 2022, notable progress has been achieved in terms of throughput and area efficiency. J.-S. Ng et al. designed a highly secure FPGA-based AES accelerator immune to side-channel attacks, whereas Ramya et al. provided an efficient VLSI design of the AES algorithm. Padmavathi et al. proposed a memory-based architecture of AES to increase processing efficiency, and Teng et al. created an efficient area implementation of the S-box using Composite Field Arithmetic (CFA).

Nevertheless, most solutions focus on the improvement of one particular characteristic rather than a balanced discipline. To overcome this problem, the current project proposes designing and implementing a highly area-efficient and high-throughput AES-128 algorithm accelerator using VLSI methodology. The proposed accelerator will feature

10 stages of pipelining for simultaneous processing of several data blocks to significantly enhance its throughput and reduce latency, making it suitable for real-time applications in 5G networks. One of the main issues with hardware AES implementation lies in the excessive amount of hardware area consumed by lookup table-based S-box implementations. The current project utilizes Composite Field Arithmetic (CFA) to substitute time-consuming memory accesses with mathematical computations, which was previously investigated in other papers.
LITERATURE SURVEY

In recent years, significant strides have been made in terms of achieving security, speed, and hardware efficiency in AES hardware design. For instance, in 2022, J.-S. Ng et al. developed an advanced FPGA-based AES acceleration circuitry utilizing dual-hiding asynchronous logic for protecting from side-channel attacks by masking both power and timing signals to render secret-key extraction extremely difficult albeit resulting in increased hardware and resources consumption.

Further, Ramya et al. proposed a VLSI-based AES architecture for fast encryption and overall system improvement in terms of performance. While throughput was increased, partial power optimization has been achieved in this work that can be crucial for mobile and embedded applications.

Additionally, Padmavathi et al. put forward an AES-in-Memory (AESIM) architecture wherein encryption process is integrated with memory to reduce transfer overheads which results in higher speed and efficiency while heavy reliance on memories increases the size of the chip.

Moreover, Teng et al. introduced an area-efficient S-box based on Composite Field Arithmetic (CFA) that replaces traditional lookup tables with calculations in order to reduce hardware costs and energy consumption, albeit adding design complexity through sophisticated mathematics.

On the other hand, Nannipieri et al. (on behalf of the European Processor Initiative) introduced an advanced AES cryptoprocessor for modern CPUs offering enhanced performance. However, this design has relatively high complexity and cost associated with it, making it suitable only for large-scope projects.

Furthermore, Ahmad et al. offered an ASIC-based AES accelerator that provides performance and energy efficiency improvements, but flexibility is sacrificed as compared to FPGA-based designs. Also, Subramanian et al. developed a clock-gated S-box for power reduction through circuitry deactivation when inactive. In turn, this gating can adversely affect performance.

Likewise, Kumar et al. developed a side-channel attack resistant AES-128 by employing masking and dual rail logic in order to increase security level, but adding to implementation complexity. On top of that, Ueno et al. introduced datapath compression for raising throughput and reducing gate counts in the process, thus sacrificing some flexibility and performance. Moreover, Joshi et al. worked solely on efficient S-box implementations without any focus on power and speed. Additionally, Zhu designed AES SubBytes implementation using combinational logic in order to decrease memory usage, though the propagation delay was somewhat increased.

Zhang and Parhi explored pipelined and parallel architectures for fast AES implementation, albeit requiring more hardware space.

From all of the above, it is evident thatsubstantial strides were made in terms of enhancing performance, area, or security of AES accelerators. Nevertheless, most papers focus primarily on improving one or two parameters while ignoring the others, thus failing to offer balanced approaches. Accordingly, the proposed solution aims to strike a perfect balance between performance, power, and hardware resources by combining pipelining for enhanced throughput with CFA for hardware efficiency. As demonstrated above, such approach is justified by past research in terms of [2] and [12].

Methodology

The design philosophy behind this proposed system is one that relies on a systematic approach towards developing and implementing a pipeline AES-128 accelerator that is mindful of area optimization while at the same time delivering blazing-fast speeds in its operation.

A: Stages of Methodology

System Design Overview

This technique is based on the following key decisions:
- – Using AES in hardware via Verilog HDL
- – Using pipelining in order to increase throughput
- – Applying CFA in order to decrease silicon area
- – Optimizing implementation at the gate level
AES Module Development

The AES algorithm can be divided into four crucial units, which are designed as individual hardware blocks:
1. SubBytes (S-Box)
  - Implemented using Composite Field Arithmetic
  - Reduces LUT-based memory requirements
  - Increases power and area efficiency
2. ShiftRows
  - Involves shifting rows cyclically
  - Designed using simple wiring circuitry
3. MixColumns
  - Makes use of GF (2) arithmetic
  - Executed via XOR and shifts operations
4. AddRoundKey
  - XORing of state and round key
  - Keeps encryption secure
Pipelined Architecture Design
- A 10-stage pipeline is used to represent 10 rounds of AES.
- Registers are inserted between each stage.
- Enables parallel processing of multiple data blocks.
  1. Outcome:
    - High throughput (>20 Gbps)
    - Reduced latency
    - Efficient real-time processing
Data Processing Flow
1. Plaintext input is provided to the AES module
2. Each pipeline stage performs:
  - SubBytes ShiftRows MixColumns
    
    AddRoundKey
3. Intermediate outputs are stored in registers
4. Final ciphertext is obtained after 10 stages
Optimization Techniques

To enhance efficiency, the following optimizations are applied:
- Composite Field Arithmetic (CFA) was employed to reduce the size of the chip
- Gate-level optimization for minimizing delay
- Proper register allocation in modules for increasing the clock rate
- Resource sharing between encryption and key generation processes
- Power gating in simulation to minimize power consumption
Simulation and Verification
- Simulation is performed using ModelSim / Vivado Simulator
- Outputs are verified using:
  - Standard AES test vectors (NIST)
  - MATLAB reference results
    1. Ensures:
      - Correct encryption
      - Reliable hardware behavior
Synthesis and Performance Analysis

Using Xilinx Vivado, the design is evaluated through:
- Utilization Report Area (LUTs, Flip-Flops)
- Timing Report Speed (Clock frequency, delay)
- Power Report Energy consumption
Final Implementation Output
- Verilog design files
- Simulation waveforms
- RTL schematic
- Area, timing, and power reports

B: Manual CFA Calculation

In order to verify that the CFA method embedded in SubBytes (S-Box) operates correctly, let us go through some manual computations grounded on finite fields transformation.

The Concept of Composite Field Arithmetic
- In regular AES, the S-Box depends on finding the inverse in GF(2^8).
- The CFA reduces complexity by breaking up GF(2^8) into:
  
  o GF((2^4)^2)
  
  o GF(((2^2)^2)^2)
  
  As a result, multiplications and inversion become simpler.
Field Transformation

The input byte undergoes:
- Isomorphism (): GF(2^8) transformation to the composite field
- Inversion within subfields
- Reverse isomorphism (¹): going back to GF(2^8)

Manual Inversion Example For an input byte (example: 0x53): Step-by-step:
1. Convert hexadecimal to binary
  
  0x53 = 01010011
2. Apply isomorphic mapping ()
  
  Transform into composite field representation
3. Perform multiplicative inverse in GF((2)²)
  - Break into sub-elements
  - Apply:
    - Squaring
    - Multiplication
    - Inversion in GF(2)
4. Apply inverse mapping (¹)
  
  Convert result back to GF(2)
5. Apply affine transformation
Final S-Box output (for 0x53 0xED)
Mathematical Representation

The inversion operation follows:

S(x)=Ax1+b Where:
x1= multiplicative inverse in GF(2)
A = constant matrix
b = constant vector

Fig 7 represents the internal Composite Field Arithmetic (CFA) structure using a coordinate-based grid, where each labeled block (X0Y0 to X1Y2) corresponds to a subfield operation or intermediate value in the decomposition of GF(2).

Coordinate Interpretation (XY Mapping)
X-axis (X0, X1) Represents data partitions (lower/upper nibbles)
Y-axis (Y0, Y1, Y2) Represents stages of computation in CFA

So each coordinate (e.g., X1Y2) corresponds to: A specific subfield element at a particular computation stage

Mapping GF(2) Composite Field

In CFA, one byte AGF(28) is decomposed as:
A=a1xa0, where a1,a0GF(24) This directly maps to:
X0 a0 (lower 4 bits)

X1 a1 (upper 4 bits)

TABLE 1: Meaning of Each Coordinate Block

Coordinate	GF Meaning	Operation Role
X0Y0	a0	Input lower nibble
X1Y0	a1	Input upper nibble
X0Y1	Intermediate result of a0	After transformation/multiplication
X1Y1	Intermediate result of a1	After transformation/multiplication
X0Y2	Processed a01	After inversion stage
X1Y2	Processed a11	After inversion stage

Mathematical Flow Across Coordinates

The computation progresses vertically (Y-direction):

Stage Y0 (Input Stage):

(X0Y0,X1Y0)=(a0,a1)

Stage Y1 (Transformation Stage):
Perform operations such as:
- Squaring: a2
- Multiplication: a0a1
- Constant multiplication in GF(2)
  
  Stage Y2 (Inversion Stage):
Compute multiplicative inverse using:
- (a1x+a0)1 in GF((2)²)
Results stored as:
- X0Y2,X1Y2
  
  Relation to Hardware Implementation
Each coordinate block corresponds to a hardware module or logic cluster:
- XOR gates GF addition
- Shift operations Multiplication by x
- Small LUTs / logic GF(2) inversion
Data flows:

This forms Stage 0 of the pipeline and ensures immediate key dependency.

10-Stage AES Pipeline Architecture

The design uses a fully pipelined architecture with 10 stages, corresponding to AES-128 rounds:

TABLE 2: 10-Stage AES Pipeline Architecture

Stage

Operation

Stage 0

Initial AddRoundKey

Stage 19

Full AES rounds (SubBytes ShiftRows

MixColumns AddRoundKey)

Stage 10

Final round (SubBytes ShiftRows

AddRoundKey)

o

o

Key Insight

Horizontally (X direction) parallel

nibble processing

Vertically (Y direction) pipeline stages
1. Pipeline Depth = 10 Stages
  - Each stage is separated by registers
  - Each stage processes one block per clock cycle
Pipeline Working Principle

The coordinate system simplifies understanding: (X, Y) = (data partition, computation stage)

Thus:
- X0Y0 raw input
- X0Y1 transformed value
- X0Y2 final inverted component

System Architecture

Fig 1: System Architecture
1. Input: Plaintext + Key
  
  The system begins with:
  - 128-bit plaintext
  - 128-bit secret key
    
    These are fed into the encryption pipeline as the initial inputs.
2. Initial AddRoundKey (Stage 0)
  
  The plaintext is XORed with the key:
  - State=PlaintextKey
  - At cycle 1 First plaintext enters Stage 0
  - At cycle 2 Moves to Stage 1, new input enters Stage 0
  - This continues
    
    a) After 10 clock cycles (initial latency):
  - First ciphertext output is produced
    1. After pipeline is filled:
  - One ciphertext is produced per clock cycle
1. Throughput Calculation (64 Gbps)
  
  Once the pipeline reaches steady state:
  - Block size = 128 bits
  - Output rate = 1 block per clock cycle
    
    If clock frequency = 500 MHz, then:
    
    Throughput=128×500×106
    
    =64×109 bits/sec=64 Gbps
2. Key Insight: Latency vs Throughput
  - Latency = 10 cycles (time for first output)
  - Throughput = 1 block/cycle after fill (continuous high-speed output)
    
    This is why pipelining is powerful:
  - Latency remains fixed
  - Throughput increases significantly
3. Role of Each Pipeline Stage
  
  Each stage internally performs:
  1. SubBytes
    - Non-linear substitution
    - Implemented using Composite Field Arithmetic (CFA)
    - Reduces LUT usage and area
  2. ShiftRows
    - Cyclic row shifting
    - Provides diffusion
  3. MixColumns
    - Uses Galois Field GF(2) arithmetic
    - Implemented using XOR and shifts
  4. AddRoundKey
  - XOR with round key
  - Ensures cryptographic strength
4. Final Round (Stage 10)
  - Same as other rounds but:
    - No MixColumns
  - Produces final encrypted output
5. Ciphertext Output
  - Final output: 128-bit ciphertext
  - After initial 10-cycle delay:
    - Continuous output every cycle
    - Enables real-time 5G data encryption
6. Performance Summary
  
  Table 3: Performnace Summary
  
  Parameter
  
  Value
  
  Pipeline Depth
  
  10 stages
  
  Initial Latency
  
  10 cycles
  
  Throughput
  
  64 Gbps
  
  Block Size
  
  128 bits
  
  Technique
  
  CFA + Pipelining
7. Final Insight
  
  This architecture achieves ultra-high throughput (64 Gbps) by:
  - Parallel processing of multiple blocks
  - Eliminating idle cycles after pipeline fill
  - Using CFA to reduce area and improve speed

Results And Discussions

Fig 2: Home Screen

Fig 3: Input Configuration Screen

Figure 4: AES pipeline Flow

This graph titled AES Pipeline Flow is showing how data moves through a pipelined AES (Advanced Encryption Standard) hardware design over time (clock cycles).

axis (horizontal) Clock cycles / time steps

Y-axis (vetical) Pipeline stages (or processing levels) Rising diagonal line (from cycle 1 to ~9)

This indicates the pipeline is filling up.

Each clock cycle, a new data block enters the AES pipeline. At the same time, previous blocks move to the next stage. So, multiple blocks are being processed simultaneously (this is the key idea of pipelining).

Peak point (~cycle 9, stage 10) The pipeline is now fully utilized.

All stages are busy maximum throughput achieved. Sudden drop (~cycle 10 back to stage 1)

This represents pipeline reset or flush. Possible reasons:

End of a data batch Reinitialization Key change in AES

The pipeline empties quickly.

Second rising diagonal (cycle 10 to 18)

The pipeline starts filling again with new data blocks. Same process repeats.

Fig 5: Latency Vs throughput

This chart shows a clear relationship between the speed of individual blocks, or latency, and the number of blocks that can be processed per unit of time, which is throughput. High latency corresponds to low throughput, where the system is mostly inactive, allowing for just a few blocks to pass through the pipeline. This causes a lot of stages to go unused. However, as throughput increases, many blocks are fed into the pipeline, resulting in parallelism among the different stages. The efficiency goes up, reducing the latency dramatically. However, at some stage, the efficiency

begins to level off, showing that peak throughput has been achieved. Beyond that point, throughput may not offer much improvement in latency since each block must traverse all the stages. In the case of AES encryption, the pipeline offers increased throughput, where one block can be processed per clock cycle in full operation. However, latency remains relatively constant.

Fig 6: AES Encryption Simulation Waveform (Plaintext to Ciphertext)

The image represents the simulation result of the AES module implementation (using simulation software like Modelsim and Vivado), allowing one to observe the behavior of inputs and outputs over time.
1. Signals in the waveform
  - clk (clock)
    - Value: 1
    - It controls the entire AES process, with each change representing one clock cycle.
  - rst (reset)
    - Value: 0
    - Reset is inactive, so the AES module is operating normally.
  - plaintext [127:0]
    - This is a 128-bit input block (standard AES input).
  - ciphertext [127:0]
    - This is the encrypted output produced by the AES module.

Fig 7: Composite Field Arithmetic (CFA) Coordinate Mapping in RTL Layout

In this figure, we are shown the internal structure of the AES S-Box, which is a fundamental component of the SubBytes process of the AES cipher. This chart is segregated into blocks named with coordinate points such as X0Y0, X1Y0, X0Y1, X1Y1, X0Y2, and X1Y2. This sort of

layout is widely used in hardware implementations (VLSI/FPGA). Here, the S-Box is optimized using composite field arithmetic instead of the standard look-up

table approach. This figure demonstrates that each block represents a specific point in the process or function of computing the AES S-Box, allowing for parallelism and hardware simplicity. The division between the two columns shows how the input bits are split up and processed in various logic units. Overall, from this figure, one can understand how to implement the AES S-Box in hardware by decomposing it into coordinate-wise computation blocks.

Fig 8: RTL Schematic Showing AES Input Connection with 128 -bit Test Vector

This figure represents a block diagram of the AES-128 hardware architecture that includes the input of plain text going into the device under test (DUT). At the center of the diagram, we have the aes128_top component, which is the core AES encryption hardware that receives inputs like plaintext[127:0], clk (clock), and rst (reset). To the left side, there is the input plain text with a fixed 128-bit test vector: 00112233445566778899AABBCCDDEEFF. These are

supplied by a signal source to ensure a consistent test vector entry into the AES block. There is a straightforward wiring from the input plain text to the DUT as well as the connection of the reset to the ground (inactive). A clock operates the sequential circuitry of the AES hardware.

Table 4: Performance Comparison with Existing Work

Parameter	Proposed Work	Existing Work 1	Existing Work 2
AES Type	AES-128	AES-128	AES-128
Architecture	Fully Pipelined (10-stage)	Partially Pipelined	Iterative / Loop-based
Pipeline Depth	10 stages	46 stages	No pipeline
Clock Frequency	~500 MHz	~220 MHz	~100 MHz
Throughput	64 Gbps	~28 Gbps	~12.8 Gbps
Latency	10 cycles (initial)	Moderate	High
S-Box Design	Composite Field Arithmetic (CFA)	LUT-based	LUT-based
Area Efficiency	High (reduced LUT usage)	Moderate	Low
Power	Optimized	Moderate	Higher

Parameter

Proposed Work

Existing

Work 1

Existing

Work 2

Consumption

(power gating)

Parallel Processing

Yes (1

block/cycle)

Limited

No

Conclusion

The proposed design offers a fast, small AES-128 hardware accelerator that suits the challenging environment of 5G. The use of hardware accelerators will improve the encryption process since they can handle multiple data blocks simultaneously. For example, the 10-stage pipeline architecture allows processing several data blocks concurrently by leveraging Composite Field Arithmetic (CFA) in the S-box stage, which reduces the chip area and power consumption significantly. Therefore, the design is suitable for small devices such as mobile phones and Internet of Things (IoT) devices.

The design was created using Verilog HDL, tested on Xilinx Vivado simulations, and verified by synthesizing. The system design meets all specified design parameters as indicated in the results section. Compared to other conventional AES designs, the proposed design exhibits better performance measures in terms of area and throughput efficiency. Overall, the system demonstrates remarkable efficiency and reliability in achieving fast, low-power, and compact hardware designs for 5G security. Extensions to AES-192/256, full decryption process, and Application-Specific Integrated Circuits (ASICs) are possible.

Future Enhancement

Future improvements will focus on optimizing power consumption, developing countermeasures against side channel attacks, and integrating the design with the 5G/IoT security framework.

References

J. -S. Ng, J. Chen, K. -S. Chong, J. S. Chang and B. -H. Gwee, A Highly Secure FPGA-Based Dual-Hiding Asynchronous-Logic AES Accelerator Against Side-Channel Attacks, in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vl. 30, no. 9, pp. 1144 – 1157, Sept. 2022, doi: 10.1109/TVLSI.2022.3175180.
Zhang, Xinmiao, and Keshab K. Parhi. High-speed VLSI architectures for the AES algorithm. IEEE transactions on very large scale integration (VLSI) systems 12, no. 9 ( 2004 ): 957 – 967.
Zhu, Minling, Xi Wang, Jinghong Rao, and Ai He. SubByte for the AES using combinational logic. In 2011 International Conference on Electronics, Communications and Control (ICECC), pp. 1064 – 1067. IEEE, 2011.
Joshi, Arundhati, P. K. Dakhole, and Ajay Thatere. Implementation of S-Box for advanced encryption standard. In 2015 IEEE International Conference on Engineering and Technology (ICETECH), pp. 1 – 5. IEEE, 2015.
Ramya, T., G. Ramya, Karthik Raju, J. Ravi, and Deepak Verma. An Efficient AES Algorithm for Cryptography Using VLSI. ECS Transactions 107, no. 1 ( 2022 ): 5605.

CrossRef

Google Scholar
Nannipieri, Pietro, Stefano Di Matteo, Luca Baldanzi, Luca Crocetti, Luca Zulberti, Sergio Saponara, and Luca Fanucci. VLSI design of Advanced-Features AES CryptoProcessor in the framework of the European Processor Initiative. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 30, no. 2 ( 2021 ): 177 – 186.
Ahmad, Nabihah, and SM Rezaul Hasan. A new ASIC implementation of an AES crypto-hardware accelerator. Microelectronics Journal 117 ( 2021 ): 105255.
Padmavathi, R. Anusha, and K. S. Dhanalakshmi. An advanced encryption standard in memory (aesim) efficient, high performance s-box based aes encryption and decryption architecture on vlsi. Wireless Personal Communications 123, no. 4 ( 2022 ): 3081 – 3101.
Subramanian, K., M. Venkatachalam, and M. Saroja. Adaptive counter clock gated S-Box transformation based AES algorithm of low power consumption and dissipation in VLSI system design. In Journal of Physics: Conference Series, vol. 1979, no. 1, p. 012066. IOP Publishing, 2021.
Kumar, Raghavan, Vikram Suresh, Monodeep Kar, Sudhir Satpathy, Mark A. Anders, Himanshu Kaul, Amit Agarwal et al. A 4900 – m 2 839-Mb/s Side-Channel Attack-Resistant AES-128 in 14-nm CMOS With Heterogeneous Sboxes, Linear Masked MixColumns, and Dual-Rail Key Addition. IEEE Journal of Solid-State Circuits 55, no. 4 ( 2020 ): 945 – 955.
Ueno, Rei, Sumio Morioka, Noriyuki Miura, Kohei Matsuda, Makoto Nagata, Shivam Bhasin, Yves Mathieu, Tarik Graba, Jean-Luc Danger, and Naofumi Homma. High throughput/gate AES hardware architectures based on datapath compression. IEEE Transactions on Computers 69, no. 4 ( 2019 ): 534 – 548.

r
Y. -T. Teng, W. -L. Chin, D. -K. Chang, P. -Y. Chen and P. -W. Chen, VLSI Architecture of S-Box With High Area Efficiency Based on Composite Field Arithmetic, in IEEE Access, vol. 10, pp. 2721 – 2728, 2022, doi: 10.1109/ACCESS.2021.3139040.

Stage	Operation
Stage 0	Initial AddRoundKey
Stage 19	Full AES rounds (SubBytes ShiftRows MixColumns AddRoundKey)
Stage 10	Final round (SubBytes ShiftRows AddRoundKey)

Design and VLSI Implementation of an Area-Efficient Pipelined AES Accelerator for 5G NR Security

LITERATURE SURVEY

TABLE 1: Meaning of Each Coordinate Block

TABLE 2: 10-Stage AES Pipeline Architecture

Table 3: Performnace Summary

Table 4: Performance Comparison with Existing Work