Asmcodes: LEA-128 Block Cipher


LEA is a 128-bit block cipher with support for 128, 192 and 256-bit keys published in 2014. It was designed by Deukjo Hong, Jung-Keun Lee, Dong-Chan Kim, Daesung Kwon, Kwon Ho Ryu, and Dong-Geon Lee.

The only operations used for encryption and the key schedule are 32-bit Addition, eXclusive OR and Rotation (ARX structure): the designers state “the usage of 32-bit and 64-bit processors will grow rapidly compared to 8-bit and 16-bit ones”

Today I’ll just focus on an implementation using 128-bit keys referred to as LEA-128. This just about fits onto the 32-bit x86 architecture. The 256-bit version requires additional registers and is probably better suited for 64-bit mode.

Key Schedule

During generation of subkeys, a number of predefined constants are used.

\textbf{Constants. } \text{The key schedule uses several constants for generating round keys, which are defined as}

\delta[0] = 0xc3efe9db, \quad \delta[1] = 0x44626b02,
\delta[2] = 0x79e27c8a, \quad \delta[3] = 0x78df30ec,
\delta[4] = 0x715ea49e, \quad \delta[5] = 0xc785da0a,
\delta[6] = 0xe04ef22a, \quad \delta[7] = 0xe5c40957.

\text{They are obtained from hexadecimal expression of } \sqrt{766995}, \text{ where 76, 69 and 95 are ASCII codes of 'L', 'E' and 'A'.}

There are 3 different key schedule functions but I only focus on the 128-bit variant for now.

\textbf{Key Schedule with a 128-Bit Key. } \text{Let } K = (K[0], K[1], K[2], K[3]) \text{ be a 128-bit key. We set } T[i] = K[i]\: for\: 0\leq i < 4.\\   \text{ Round key} RK_i = (RK_i[0], RK_i[1],\dots , RK_i[5])\: for\: 0\leq i < 24 \text{ are produced through the following relations:}

T[0]\leftarrow ROL_1(T[0]\oplus ROL_i(\delta[i\:mod\: 4])),\\  T[1]\leftarrow ROL_3(T[1]\oplus ROL_{i+1}(\delta[i\:mod\: 4])),\\  T[2]\leftarrow ROL_6(T[2]\oplus ROL_{i+2}(\delta[i\:mod\: 4])),\\  T[3]\leftarrow ROL_{11}(T[3]\oplus ROL_{i+3}(\delta[i\:mod\: 4])),\\  RK_i\leftarrow (T[0], T[1], T[2], T[3], T[1]).


This is a really simple algorithm to implement and there’s not much to say that can’t be found in the specification published by the authors.

So here’s the function in C to perform encryption on 128-bits of plaintext using 128-bit key in one single call. I’d suggest using something like this with counter (CTR) mode so it doesn’t require the decryption procedure.

Full x86 assembly

You might notice the constants are different from C sources. For whatever reason, the last 3 are rotated a number of bits left before entering the encryption loop as you can see above.

Obviously a compiler will be smart enough to see this and automatically optimize but for assembly code, we must rotate them manually. They’re stored on the stack using PUSHAD. So, EDI, ESI, EBP and ESP are used for TD array.

ESP has to be initialized after the PUSHAD for obvious reasons. We don’t want to cause an exception.

bits 32
    %ifndef BIN
      global lea128_encryptx
      global _lea128_encryptx
struc pushad_t
  _edi resd 1
  _esi resd 1
  _ebp resd 1
  _esp resd 1
  _ebx resd 1
  _edx resd 1
  _ecx resd 1
  _eax resd 1
%define k0 ebx
%define k1 edx
%define k2 edi
%define k3 ebp
%define p0 [esi+4*0]     
%define p1 [esi+4*1]     
%define p2 [esi+4*2]     
%define p3 [esi+4*3]
%define t ecx
%define x eax
%define y ecx
%define LEA128_RNDS 24
    ; initialize 4 constants
    mov    edi, 0xc3efe9db   ; td[0]
    mov    esi, 0x88c4d604   ; td[1]
    mov    ebp, 0xe789f229   ; td[2]
    mov    dword[esp+_esp], 0xc6f98763   ; td[3] 
    mov    esi, [esp+64+4]   ; esi = key
    ; load key
    xchg   eax, k0
    xchg   eax, k1
    xchg   eax, k2
    xchg   eax, k3
    mov    esi, [esp+64+8]   ; esi = block
    xor    eax, eax          ; i = 0    
    push   eax
    and    al, 3
    mov    t, [esp+eax*4+4]  ; t = td[i % 4]
    rol    t, 4
    mov    [esp+eax*4+4], t  ; td[i & 3] = ROTL32(t, 4);
    ror    t, 4
    ; **************************************
    ; create subkey
    ; **************************************
    ; k0 = ROTL32(k0 + t, 1);
    add    k0, t             
    rol    k0, 1
    ; k1 = ROTL32(k1 + ROTL32(t, 1),  3);
    rol    t, 1              
    add    k1, t
    rol    k1, 3
    ; k2 = ROTL32(k2 + ROTL32(t, 2),  6);
    rol    t, 1              
    add    k2, t
    rol    k2, 6
    ; k3 = ROTL32(k3 + ROTL32(t, 3), 11);
    rol    t, 1              
    add    k3, t
    rol    k3, 11
    ; **************************************
    ; encrypt block
    ; **************************************
    ; p3 = ROTR32((p2 ^ k3) + (p3 ^ k1), 3);
    mov    x, p2             
    mov    y, p3
    xor    x, k3
    xor    y, k1
    add    x, y
    ror    x, 3
    mov    p3, x
    ; p2 = ROTR32((p1 ^ k2) + (p2 ^ k1), 5);
    mov    x, p1             
    mov    y, p2
    xor    x, k2
    xor    y, k1
    add    x, y
    ror    x, 5
    mov    p2, x
    ; p1 = ROTL32((p0 ^ k0) + (p1 ^ k1), 9);
    mov    x, p0             
    mov    y, p1
    xor    x, k0
    xor    y, k1
    add    x, y
    rol    x, 9
    mov    p1, x
    ; rotate block 32-bits
    mov    t, p3
    xchg   t, p2             ; XCHG(p0, p1); 
    xchg   t, p1             ; XCHG(p1, p2);
    xchg   t, p0             ; XCHG(p2, p3);
    mov    p3, t
    pop    eax
    inc    eax               ; i++
    cmp    al, LEA128_RNDS
    jnz    lea_l0


The assembly generated by MSVC using /O2 /Os is 232 bytes for the single call (encryption only) and 266 for 2 separate functions. The hand written x86 assembly for just the single call is currently 161 bytes.

See lx.asm for x86 assembly or lea.c for C source

Thanks to 0x4d_ for \LaTeX formulas.

This entry was posted in assembly, cryptography, encryption, programming and tagged , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s