Design of 256-Bit Block Cipher and Enhanced Non-Linear Substitution

doi:https://doi.org/10.5281/zenodo.18863312

Volume 15, Issue 02 (February 2026)


Open Access
Authors : Prasant Pradhan, Pushkar Raj, Upashna Darjee, Ronit Subba
Paper ID : IJERTV15IS020840
Volume & Issue : Volume 15, Issue 02 , February – 2026
Published (First Online): 04-03-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Prasant Pradhan *1, Pushkar Raj *2, Upashna Darjee *3, Ronit Subba*4

1 Assistant Professor & Head, Department of Computer Engineering; 2, 3, 4 Student, Department of Computer Engineering, Sikkim Institute of Science and Technology, South Sikkim, Sikkim, India

Abstract:- In today's digital era, advanced encryption techniques are essential for secure and resilient information exchange. This project proposes a custom 256-bit block cipher, inspired by the Advanced Encryption Standard (AES), with structural and functional enhancements aimed at emerging security challenges. The cipher operates on a 256-bit data block structured as a 4×8 matrix, doubling the block size of standard AES, and introduces compatibility with MIMD (Multiple Instruction, Multiple Data) architectures, in contrast to AES's SIMD-friendly nature. While preserving AES's core operations (SubBytes, ShiftRows, MixColumns, and AddRoundKey), the design incorporates multiple nonlinear S-boxes, row-wise substitution, and parallelism-friendly optimizations. These modifications are intended to enhance both confusion and diffusion, aiming to offer improved resistance against classical and quantum cryptanalytic techniques. The proposed cipher represents a step toward future-ready symmetric encryption aligned with post-quantum security requirements.

Keywords: AES, MIMD, SIMD, SubBytes, ShiftRows, MixColumns, AddRoundKey, S-box, birthday bound.

INTRODUCTION:

With the advent of the post-quantum era and the exponential growth in computational power, traditional cryptographic standards face increased risks from statistical and brute-force attacks. The widely adopted Advanced Encryption Standard (AES), while still considered secure, operates on a fixed 128-bit block size, which may no longer provide sufficient protection in the face of future computational capabilities. Additionally, AES's SIMD-friendly processing model limits its scalability and efficiency on modern multi-core and heterogeneous computing architectures. To address these challenges, this project introduces a novel 256-bit block cipher inspired by AES but enhanced for future-proof security and parallel performance. By doubling the block size from 128 to 256 bits and organizing the data as a 4×8 matrix, the proposed design increases throughput per round and strengthens resistance to statistical and differential attacks. The cipher architecture introduces several key innovations. First, it supports MIMD (Multiple Instruction, Multiple Data) execution by assigning a distinct S-box to each row of the state matrix, allowing simultaneous and independent substitution operations. This improves both encryption speed and hardware efficiency. Second, the design includes a spatial substitution step before the traditional ShiftRows operation, introducing additional non-linearity early in the encryption process. This enhances diffusion and reduces the number of rounds needed without compromising security.

LITERATURE REVIEW:

Recent advancements in cryptographic research indicate a growing concern regarding the long-term security and scalability of conventional block ciphers, particularly the Advanced Encryption Standard (AES) [1]. While AES remains computationally secure, its fixed 128-bit block size presents limitations against birthday-bound attacks and emerging quantum threats. Furthermore, AES was primarily designed for SIMD-oriented execution [4], which restricts its efficiency on modern multi-core and heterogeneous computing platforms. Several studies have focused on improving AES through enhanced key scheduling, optimized substitution layers, and parallel execution strategies. Research on AES key schedule representations has improved the understanding of diffusion but remains confined to 128-bit architectures. Parallel AES implementations have demonstrated performance improvements through pipelining and concurrent MixColumns operations [10]; however, these approaches do not extend naturally to wider state matrices such as the 4×8 configuration required for a 256-bit block size. S-box design has been widely explored to strengthen resistance against differential and linear cryptanalysis. While multiple works propose S-boxes with high non-linearity and low differential uniformity, most approaches rely on a single substitution function per round [4]. Limited research investigates the cryptographic and architectural implications of employing multiple independent S-boxes within the same encryption round. Additionally, lightweight and next-generation block ciphers emphasize reduced rounds and memory efficiency but often lack support for MIMD-friendly designs and advanced substitution diversity [13]. Overall, the surveyed literature highlights a clear research gap in the design of block ciphers that simultaneously support larger block sizes, multiple nonlinear substitution layers, and MIMD-based parallel execution.
These limitations motivate the development of a 256-bit AES-inspired block cipher incorporating multiple S-boxes and scalable parallelization techniques to enhance both security and performance in post-quantum computing environments.

ALGORITHM ARCHITECTURE:

ENCRYPTION:

Fig: 1 Algorithm Architecture for Encryption
DECRYPTION:

Fig: 2 Algorithm Architecture for Decryption

S-BOX ANALYSIS AND SELECTION:

After analysing dynamic key-dependent S-boxes [6][7], static key-dependent S-boxes, and various other S-boxes based on the BAT algorithm [8], genetic algorithms, etc., we inferred that the AES mathematical structure is the most secure to date, although various other 8-bit S-boxes offer comparable security.

S_BOX-0

For its 0th row, the proposed system uses the 8-bit AES S-box used in AES-128, which is based on the following mathematical structure as specified in AES-128 [4].

Fig 4. Forward Substitution Box

- Compute the multiplicative inverse in GF(2^8) using the irreducible polynomial $p(x) = x^8 + x^4 + x^3 + x + 1$ (a lookup table is provided in Fig. 3).
- Affine transformation: the value is then affine transformed as $b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i$, where $c$ = 0x63.

For the inverse S-box we use $x = (A^{-1}(y \oplus c))^{-1}$. For decryption, the affine transformation is defined by $b'_i = b_{(i+2) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus d_i$, where $d$ = 0x05.

Fig 3: Multiplicative inverses for GF(2^8)

Fig 5. Inverse Substitution Box

S_BOX-1

For its 1st row, the proposed system uses a key-dependent AES S-box, but the key is fixed to the static value 0xA5, which makes the S-box static. We did not use a dynamic key, in order to avoid on-the-fly calculation, which may be more expensive than lookups in terms of overhead. This SBOXA5 uses the same mathematical structure as AES-128 [4], but with a small tweak that results in a different permutation (the calculation is explained below). Using S-boxes with different permutations in different rows increases the workload for a side-channel attacker.

- The multiplicative inverse in GF(2^8) is computed using the same irreducible polynomial $p(x)$ [14].
- The multiplicative inverse is XORed with the key 0xA5.
- The result is then affine transformed using the same $A$ and $c$ [14]: $b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i$, where $c$ = 0x63.

The inverse S-box uses the same inverse affine transformation as S_BOX-0: $b'_i = b_{(i+2) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus d_i$.
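The construction of S_BOX-0 and S_BOX-1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the helper names (`gf_mul`, `gf_inv`, `affine`, `build_sbox`, `invert`) are hypothetical, and the inverse table is obtained here by inverting the lookup table rather than by evaluating the inverse affine map.

```python
AES_POLY = 0x11B  # p(x) = x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b, poly=AES_POLY):
    """Multiply two bytes in GF(2^8) modulo `poly` (shift-and-add)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return result


def gf_inv(a):
    """Multiplicative inverse as a^254; 0 maps to 0 by convention."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def affine(x, c=0x63):
    """b'_i = b_i ^ b_(i+4) ^ b_(i+5) ^ b_(i+6) ^ b_(i+7) ^ c_i (indices mod 8)."""
    out = 0
    for i in range(8):
        bit = (x >> i) ^ (c >> i)
        for offset in (4, 5, 6, 7):
            bit ^= x >> ((i + offset) % 8)
        out |= (bit & 1) << i
    return out


def build_sbox(key=0x00):
    """key=0x00 gives the AES S-box; key=0xA5 follows the SBOXA5 recipe."""
    return [affine(gf_inv(x) ^ key) for x in range(256)]


def invert(sbox):
    """Build the inverse lookup table of a bijective S-box."""
    inverse = [0] * 256
    for x, y in enumerate(sbox):
        inverse[y] = x
    return inverse


AES_SBOX = build_sbox()
SBOX_A5 = build_sbox(0xA5)
INV_SBOX_A5 = invert(SBOX_A5)
```

Because inversion, XOR with a constant, and the affine map are each bijective on bytes, the keyed table remains a permutation, so table inversion is always possible.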

Fig 6: Forward SBOXA5

Fig 7: Inverse SBOXA5

S_BOX -2

For its 2nd row, the proposed system uses the S-box derived in [9], whose construction consists of four key steps. First, the action of the modular group, the projective special linear group PSL(2, ℤ), on the projective line PL(F7) over the finite field F7 yields a permutation group G. Next, a coset diagram is drawn for the permutation group G obtained from the action of PSL(2, ℤ) on PL(F7). Then, an adjacency matrix corresponding to the coset diagram is generated. Finally, this adjacency matrix is used to apply an affine transformation on the Galois field elements, followed by the addition of an 8-bit number, to generate the final S-box [9]. The S-box S4 obtained by this method has two fixed points (0x00 and 0x23), as shown below, which is not a desirable property in secure S-boxes [4]. To eliminate the fixed points, we XOR it with the key 0x02, which yields S402, an S-box with no fixed points.
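The fixed-point check described above can be sketched as follows. The S4 table itself is not reproduced here, so the example runs on a toy permutation (the identity map), and it assumes the key is XORed onto the S-box output; both the helper names and that reading of the XOR step are assumptions of this sketch.

```python
def fixed_points(sbox):
    """Return every input x for which sbox[x] == x."""
    return [x for x, y in enumerate(sbox) if x == y]


def xor_output(sbox, key):
    """XOR a constant onto every S-box output (assumed reading of the tweak)."""
    return [y ^ key for y in sbox]


# Toy stand-in for S4: the identity map, where every input is a fixed point.
toy_sbox = list(range(256))
tweaked = xor_output(toy_sbox, 0x02)

print(len(fixed_points(toy_sbox)))
print(len(fixed_points(tweaked)))
```

An output XOR removes an existing fixed point but creates a new one wherever S(x) = x XOR key held before the tweak, so the check has to be rerun on the tweaked table.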

Fig.8: S4 having fixed points

Fig.9: Forward S402

Fig.9: Inverse S402

S_BOX-3

For the 3rd and last row we use the SM4 S-box [14], which is calculated similarly to the AES S-box. The algorithm to compute it is as follows:

- Compute the multiplicative inverse in GF(2^8) using the irreducible polynomial $P(x) = x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1$ [29].
- Then apply an affine transformation: $B = A \cdot x + C$ [29].

The forward and inverse SM4 S-boxes are:

Fig 10: Forward SM4 SBOX

Fig 11: Inverse SM4 SBOX

THEORETICAL SECURITY ANALYSIS:

Fig.12: Theoretical Security Analysis for Sbox-0, Sbox-1, Sbox-2 and Sbox-3

MATHEMATICAL MODELLING FOR MIX-COLUMN OPERATION AND PARALLELIZATION:

The mathematical timing framework and pipelined round decomposition used in this work are based on the framework presented in [10] for the AES-128 architecture with a 128-bit block and 10 rounds. This work extends that framework to the proposed AES-256-style architecture with a 256-bit block and 14 rounds, incorporating a parallel sub_byte step. We measure performance using classical parallel performance metrics such as speedup and improvement (%) [11][12].

- $N_b$: number of state columns
- $N_r$: number of rounds
- $T_{xor}$: time to execute one bytewise XOR operation
- $T_{shift}$: time to execute one bit-shift operation
- $T_{sub\_bytes}$: time to perform the sub_byte step in a round
- $T_{pipeline}$: time taken by the last block to get encrypted
- $t_1$: time for the initial stage; $t_2$: time for any ith stage; $t_3$: time for the final round

To model and compare mathematically, we assume [13] that $T_{xor}$ is six times as complex as $T_{shift}$.

Sequential processing: for $N_r = 14$,

$T_{ES} = 2395\,T_{xor} + 14\,T_{sub\_bytes}$ (1)

For $L$ blocks,

$T_{ES} = L\,(2395\,T_{xor} + 14\,T_{sub\_bytes})$ (2)

Pipelining (temporal parallelism): let there be $n$ PEs working on each round, so that $N = 15n$ is the total number of processors.

$T_{Add\_rk} = \frac{32}{n}\,T_{xor}$

$T_{Mix\_col} = \frac{64}{n}\,[\max(T_{PE_i}, T_{PE_{i+1}}) + T]$ (3)

where $T$ is the time for sending and receiving operations.

Parallelising Mix_col and Add_rk:

$T_{Add\_rk} = \frac{32}{n}\,T_{xor}$ (4)

$T_{Mix\_col} = \frac{64}{n}\,[\max(T_{PE_i}, T_{PE_{i+1}}) + T]$ (5)

The effect of parallelising Add_round_key and Mix_column, together with pipelining, on encryption:

$\text{Speedup} = \dfrac{L \cdot 7184/3}{112 + 32\,(L + 289/3 + 26)/n}$

PERFORMANCE ANALYSIS:

Fig 13: S v/s L for τ = 0

Fig 14: S v/s L for τ = 1 and n = 8

MATHEMATICAL MODELLING FOR INVERSE MIXCOLUMN OPERATION:

The mathematical timing framework and pipelined round decomposition used in this work are based on the framework presented in [10] for the AES-128 architecture with a 128-bit block and 10 rounds. This work extends that framework to the proposed AES-256-style architecture with a 256-bit block and 14 rounds, incorporating a parallel sub_byte step. We measure performance using classical parallel performance metrics such as speedup and improvement (%) [11][12].

- $N_b$: number of state columns
- $N_r$: number of rounds
- $T_{xor}$: time to execute one bytewise XOR operation
- $T_{shift}$: time to execute one bit-shift operation
- $T_{sub\_bytes}$: time to perform the sub_byte step in a round

To model and compare mathematically, we assume [14] that $T_{xor}$ is six times as complex as $T_{shift}$.

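Sequential processing: for $L$ blocks,

$T_{DS} = 5584\,L\,T_{xor}$ (6)

Pipelining the AES decryption round:

$T_{pipeline} = 32\,L\,T_{xor} + 5552\,T_{xor}$ (7)

Parallelising Add_rk and inv_Mix_col:

$T_{Add\_rk} = \frac{32}{n}\,T_{xor}$ (8)

$T_{Inv\_Mix\_column} = \frac{128}{n}\,[\max(T_{PE_k}, T_{PE_{k+1}}, T_{PE_{k+2}}, T_{PE_{k+3}}) + T_{xor}]$ (9)

where $T_{PE_k} = 3\,T_{shift} + 4\,T_{xor}$, $T_{PE_{k+1}} = 3\,T_{shift} + 2\,T_{xor}$, $T_{PE_{k+2}} = 3\,T_{shift} + 3\,T_{xor}$, $T_{PE_{k+3}} = 3\,T_{shift} + T_{xor}$, and $T$ is the overhead time.

Effect of parallelisation and pipelining:

$\text{Speedup} = \dfrac{5584\,L}{560 + \frac{32}{n}\,\{L + 234 + 52\}}$ (10)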
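As a worked illustration of the pipelined decryption model, the sketch below takes the ratio of the sequential time in Eq. (6) to the pipelined time in Eq. (7). Treating this ratio as the pipelining-only speedup is an assumption of the sketch; it ignores the intra-round parallelisation of Eqs. (8) and (9) and any communication overhead, and the function names are hypothetical.

```python
def sequential_time(blocks, t_xor=1.0):
    """Eq. (6): T_DS = 5584 * L * T_xor."""
    return 5584 * blocks * t_xor


def pipelined_time(blocks, t_xor=1.0):
    """Eq. (7): T_pipeline = 32 * L * T_xor + 5552 * T_xor."""
    return 32 * blocks * t_xor + 5552 * t_xor


def pipeline_speedup(blocks):
    """Ratio of Eq. (6) to Eq. (7); T_xor cancels out."""
    return sequential_time(blocks) / pipelined_time(blocks)


for blocks in (1, 10, 100, 10_000):
    print(blocks, round(pipeline_speedup(blocks), 2))
```

For a single block the two expressions coincide, so the ratio is 1; as L grows the fixed 5552 term is amortised and the ratio approaches 5584/32 = 174.5.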

PERFORMANCE ANALYSIS:

Fig 15: Degree of improvement for τ = 0 and n = 4

Fig 16: Degree of improvement for τ = 1 and n = 4

ShiftRows:

The ShiftRows operation is the same as in Rijndael-256, to provide maximum diffusion [14].
- Row 0 will be shifted by 0 bytes
- Row 1 will be cyclic shifted to the left by 1 byte
- Row 2 will be cyclic shifted to the left by 3 bytes
- Row 3 will be cyclic shifted to the left by 4 bytes
  
InvShiftRows:

InvShiftRows is the inverse of ShiftRows.
- Row 0 will be shifted by 0 bytes
- Row 1 will be cyclic shifted to the right by 1 byte
- Row 2 will be cyclic shifted to the right by 3 bytes
- Row 3 will be cyclic shifted to the right by 4 bytes
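The row offsets listed above can be sketched on a 4×8 state as follows. The state is modelled here as a list of four 8-byte rows, which is an assumption about the in-memory layout, and the function names are hypothetical.

```python
SHIFT_OFFSETS = (0, 1, 3, 4)  # left-shift amounts for rows 0..3


def rotate_left(row, count):
    """Cyclically rotate a row to the left by `count` positions."""
    count %= len(row)
    return row[count:] + row[:count]


def shift_rows(state):
    """Apply the per-row left rotations to a 4x8 state."""
    return [rotate_left(row, off) for row, off in zip(state, SHIFT_OFFSETS)]


def inv_shift_rows(state):
    """Undo shift_rows by rotating each row right by the same amount."""
    return [rotate_left(row, -off) for row, off in zip(state, SHIFT_OFFSETS)]


# Demo state: byte value r*8 + c at row r, column c.
state = [[r * 8 + c for c in range(8)] for r in range(4)]
shifted = shift_rows(state)
restored = inv_shift_rows(shifted)
```

A right rotation is expressed as a negative left rotation, so the two functions share one helper and `inv_shift_rows(shift_rows(s))` returns the original state.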

Key Schedule:

The key schedule uses the 4 S-boxes to generate multiple round keys, with a structure similar to the AES key schedule [4]. Each round key is used in one round to provide key-dependent diffusion. The initial 256-bit key is expanded into 15 round keys of 256 bits each.

The pseudocode for the above is as follows [13].

KeyExpansion(byte key[4*Nk], word w[Nb*(Nr+1)], Nk)
begin
    word temp
    i = 0
    while (i < Nk)
        w[i] = word(key[4*i], key[4*i+1], key[4*i+2], key[4*i+3])
        i = i + 1
    end while
    i = Nk
    while (i < Nb * (Nr+1))
        temp = w[i-1]
        if (i mod Nk = 0)
            temp = SubWord(RotWord(temp)) xor Rcon[i/Nk]
        else if (Nk > 6 and i mod Nk = 4)
            temp = SubWord(temp)
        end if
        w[i] = w[i-Nk] xor temp
        i = i + 1
    end while
end

where Nr = 14, Nb = 8 and Nk = 8; Nr is the number of rounds, Nb is the number of columns in the state, and Nk is the number of words in the key.

SubWord():

It is a function that takes a four-byte input word and applies the row S-boxes (S-box[i] for the ith row) to the four bytes to produce an output word. The function RotWord() takes a word [a0, a1, a2, a3] as input, performs a cyclic permutation, and returns the word [a1, a2, a3, a0]. The round constant word array, Rcon[i], contains the values given by [x^(i-1), {00}, {00}, {00}], with x^(i-1) being powers of x (x is denoted as {02}) in the field GF(2^8).
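The key expansion with per-row S-boxes can be sketched in Python as follows. This is an illustration under stated assumptions: byte j of a word is substituted with the S-box of row j (one reading of the SubWord description), and the AES S-box is used for all four rows as a stand-in because the S402 and SM4 tables are not reproduced here. The names `key_expansion`, `aes_sbox` and `gf_mul` are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
    return result


def aes_sbox():
    """AES S-box: field inverse (0 maps to 0) followed by the affine map."""
    box = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        out = inv
        for s in (1, 2, 3, 4):  # XOR of the four left rotations of inv
            out ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        box.append(out ^ 0x63)
    return box


def key_expansion(key, sboxes, nk=8, nb=8, nr=14):
    """Expand a 4*nk-byte key into nb*(nr+1) 32-bit words."""
    def sub_word(word):
        out = 0
        for j in range(4):  # byte j uses the S-box of row j (assumption)
            byte = (word >> (24 - 8 * j)) & 0xFF
            out = (out << 8) | sboxes[j][byte]
        return out

    def rot_word(word):
        return ((word << 8) & 0xFFFFFFFF) | (word >> 24)

    total = nb * (nr + 1)
    rcon = [0, 1]  # rcon[i] = x^(i-1); index 0 is unused
    while len(rcon) <= (total - 1) // nk:
        rcon.append(gf_mul(rcon[-1], 2))

    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    for i in range(nk, total):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (rcon[i // nk] << 24)
        elif nk > 6 and i % nk == 4:
            temp = sub_word(temp)
        w.append(w[i - nk] ^ temp)
    return w


# Stand-in configuration: the same AES S-box in every row.
words = key_expansion(bytes(range(32)), [aes_sbox()] * 4)
round_keys = [words[8 * r:8 * r + 8] for r in range(15)]
```

With Nb = 8 and Nr = 14 the expansion yields 120 words, which the last line groups into 15 round keys of eight 32-bit words (256 bits) each.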

WORKING DESIGN OF PARALLELIZATION OF MIXCOLUMN:

The parallelisation framework proposed in [10] for AES-128, with a 128-bit block size and 10 rounds, which uses a processor-pair architecture to distribute the computation of MixColumn, has been extended and redefined for the proposed modified AES with a 256-bit block size and 14 rounds.

Let there be N blocks and let $B_i$ be the input state to MixColumns; the output $C_i$ is then computed by the following matrix multiplication. We exploit the four functions below, which are permutations of the coefficients 2, 3, 1, 1, and pair up processors to compute a single row.

$C_{i,1} = 2b_{i,1} \oplus b_{i,9} \oplus b_{i,17} \oplus 3b_{i,25}$

$C_{i,9} = 3b_{i,1} \oplus 2b_{i,9} \oplus b_{i,17} \oplus b_{i,25}$

$C_{i,17} = b_{i,1} \oplus 3b_{i,9} \oplus 2b_{i,17} \oplus b_{i,25}$

$C_{i,25} = b_{i,1} \oplus b_{i,9} \oplus 3b_{i,17} \oplus 2b_{i,25}$

All bytes of row 1 are computed in the same way as $C_{i,1}$, all bytes of row 2 in the same way as $C_{i,9}$, and so forth.
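The row-wise decomposition above can be sketched as follows, using the coefficients exactly as printed in the four equations. The thread pool only illustrates that the four output rows are independent of one another; it is not a model of the processor-pair hardware, and the names `COEFFS`, `mix_row` and `mix_columns` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Coefficient rows taken from the equations for C_1, C_9, C_17 and C_25.
COEFFS = (
    (2, 1, 1, 3),
    (3, 2, 1, 1),
    (1, 3, 2, 1),
    (1, 1, 3, 2),
)


def xtime(b):
    """Multiply a byte by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    return ((b << 1) ^ 0x1B) & 0xFF if b & 0x80 else b << 1


def mul(coef, b):
    """Multiply byte b by a coefficient in {1, 2, 3}."""
    if coef == 1:
        return b
    if coef == 2:
        return xtime(b)
    return xtime(b) ^ b  # coef == 3


def mix_row(state, r):
    """Compute all 8 bytes of output row r from the 4x8 input state."""
    row = []
    for c in range(8):
        acc = 0
        for k in range(4):
            acc ^= mul(COEFFS[r][k], state[k][c])
        row.append(acc)
    return row


def mix_columns(state):
    """Compute the four output rows as independent tasks."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda r: mix_row(state, r), range(4)))


# Demo: a single nonzero byte in row 0, column 0.
state = [[0] * 8 for _ in range(4)]
state[0][0] = 0x01
mixed = mix_columns(state)
```

Each call to `mix_row` reads the whole input state but writes only its own output row, which is the property that lets the rows be assigned to separate processing elements.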

The hardware architecture used here is taken from [14], where it was defined for 128-bit AES, and is redefined for this 256-bit cipher.

HARDWARE ARCHITECTURE:

Fig.17: hardware architecture for parallelization of MixColumn

Similarly, the other 3 rows are calculated in parallel.

Fig. 19: Degree of Improvement (%) for Pipelined Model (τ = 0) for Encryption

WORKING DESIGN OF PARALLELIZATION OF INVMIXCOLUMN:

The parallelisation framework proposed in [10] for AES-128, with a 128-bit block size and 10 rounds, which uses a processor-pair architecture to distribute the computation of MixColumn, has been extended and redefined for the proposed modified AES with a 256-bit block size and 14 rounds.

HARDWARE ARCHITECTURE:

Fig.18: hardware architecture for parallelization of InvMixColumn

Similarly, the other 3 rows are calculated in parallel.

Fig. 20: Degree of Improvement (%) for Pipelined Model (τ = 0) for Encryption

Test Vector:

Plain text: 6920616D20686170707920202020202020202020202020202020202020202020

Ciphertext: 94501CE577B8E62259F8ADFDBB437375F35D3BBC6C6628ED2A9D311F27820321

Plain text: 796F752061726520616E20616D617A6F6E20706572736F6E2020202020202020

Ciphertext: 108246D27E4E74E98E7550E172F832E65DE45E301A4DDAC4C1CB5F9312784E69

Plain text: 6162652079616172206B7961206B617275206D61692020202020202020202020

Ciphertext: 2ACCB71956693E648AD9434AF9D343FEC64920F66606C0DCA418807DE26C81

CONCLUSION:

The proposed multi-S-box 256-bit block cipher based on AES introduces a structural extension of the standard AES design by increasing the block size to 256 bits and employing multiple S-boxes, one per state row, to explore improved diffusion and flexibility. The design retains the core AES principles, including the SubBytes, ShiftRows, MixColumns, and AddRoundKey operations, while adapting them to a larger state matrix and leveraging 4 S-boxes. This work also demonstrates a parallel architecture for MixColumn with a degree of improvement above 98 percent. Overall, it demonstrates a feasible framework for extending AES to a larger block size of 256 bits with multiple S-box configurations.

Future research will focus on rigorous security evaluation and performance optimization to assess its practical applicability in modern cryptographic systems.

REFERENCES:

[1] National Institute of Standards and Technology. (2024, December 24). NIST proposes to standardize a wider variant of AES.
[2] Nie, J., Lin, J., & Wang, Y. (2022, May). A variational quantum attack for AES-like symmetric cryptography. arXiv.
[3] Nyberg, K. (1991). Perfect nonlinear S-boxes. In Advances in Cryptology, EUROCRYPT '91 (Vol. 547, pp. 378–386). Springer.
[4] Daemen, J., & Rijmen, V. (2013). The design of Rijndael: AES, the advanced encryption standard. Springer.
[5] Shannon, C. E. (1949). Communication theory of secrecy systems. Bell System Technical Journal.
[6] I. Abd-ElGhafar, A. Rohiem, A. Diaa and F. Mohammed, Generation of AES Key Dependent S-Boxes using RC4 Algorithm, 13th International Conference on Aerospace Sciences & Aviation Technology (ASAT-13), May 26–28, 2009, Military Technical College, Kobry Elkobbah, Cairo, Egypt, 2009.
[7] J. Juremi, R. Mahmod, S. Sulaiman and J. Ramli, Enhancing Advanced Encryption Standard S-Box Generation Based on Round Key, International Journal of Cyber-Security and Digital Forensics (IJCSDF), Vol. 1, No. 3, pp. 183-188, 2012.
[8] Maiya Din, Saibal K. Pal, S. K. Muttoo, and Sushila Madan, A New S-Box Design by Applying Bat Algorithm-Based Technique, Journal of Information Technology Management, Vol. 15, Issue 3, pp. 85-98.
[9] Nasir Siddiqui, Fahim Yousaf, Fiza Murtaza, Muhammad Ehatisham-ul-Haq, M Usman Ashraf Ahmed, M Alghamdi, Ahmed S Alfakeeh, A highly nonlinear substitution box (S-box) design using action of modular group on a projective line over a finite field, PLoS One. 2020 Nov 12;15(11):e0241890.
[10] M. Rasslan, H. K. Aslan, G. Elkabbany (2014). A design of a fast parallel-pipelined architecture for the AES algorithm. International Journal of Computer Science & Information Technology (IJCSIT), 6(6), 47–61.
[11] Hennessy, J. L., & Patterson, D. A. (2019). Computer architecture: A quantitative approach (6th Ed.). Morgan Kaufmann.
[12] Grama, A., Gupta, A., Karypis, G., & Kumar, V. (2003). Introduction to parallel computing (2nd Ed.). Addison-Wesley.
[13] Abdellatif, R., Benslimane, A., & Gerndt, M. (2023). Next-generation block ciphers: Achieving superior memory efficiency, security, and performance. Cryptography, 7(4), 47.
[14] Standards Press of China. (2016). GB/T 32907-2016, SM4 block cipher algorithm (English Version). Beijing: Standards Press of China.