Gimli: a cross-platform permutation function

Introduction

Gimli, named after the Lord Of The Rings character, is a 384-bit permutation function, designed specifically to be lightweight, high performance, and secure across multiple architectures.

It was designed by Daniel J. Bernstein, Stefan Kölbl, Stefan Lucks, Pedro Maat Costa Massolino, Florian Mendel, Kashif Nawaz, Tobias Schneider, Peter Schwabe, François-Xavier Standaert, Yosuke Todo, and Benoît Viguier.

Gimli can easily be used to build high-security block ciphers, tweakable block ciphers, stream ciphers, message-authentication codes, authenticated ciphers and hash functions.

What follows here is a compact implementation in C and x86 assembly. For performance enhanced implementations, please look at reference code written by Frank Denis here.

So happy was Frank with this function, that the LibHydrogen library is built atop Gimli.

However, Mike Hamburg points out in Cryptanalysis of 22 1/2 rounds of Gimli, there are weaknesses in how the Linear layer is applied which may lead to an update of this function in the future.

The Gimli team responded to Hamburg in Statement regarding “Cryptanalysis of 22 1/2 rounds of Gimli”

Mike responded further here.

C code

The only thing to mention here is the cyclic left shift by 24 bits is replaced with cyclic right shift by 8 bits. Same outcome, different direction.

void gimli(void *state)
{
  int      r, j;
  uint32_t t, x, y, z;
  uint32_t *s=(uint32_t*)state;
  
  for (r=0x9e377918; r!=0x9e377900; r--) {
    for (j=0; j<4; j++) {
      x = ROTR32(s[    j], 8);
      y = ROTL32(s[4 + j], 9);
      z =        s[8 + j];

      s[8 + j] = x ^ (z << 1) ^ ((y & z) << 2);
      s[4 + j] = y ^ x        ^ ((x | z) << 1);
      s[j]     = z ^ y        ^ ((x & y) << 3);
    }

    t = r & 3;
    
    // small swap
    if (t == 0) {
      XCHG(s[0], s[1]);
      XCHG(s[2], s[3]);
      
      // add constant      
      s[0] ^= r;
    }   
    // big swap
    if (t == 2) {
      XCHG(s[0], s[2]);
      XCHG(s[1], s[3]);
    }
  }
}

x86 Assembly

The following code includes optmizations from Peter Ferrie It’s currently 112 bytes.

%define j  ebx
%define x  eax
%define y  ebp
%define z  edx

%define s  esi

%define t0 ebp 
%define t1 edi

%define r  ecx

%define s0 eax
%define s1 ebx
%define s2 ebp
%define s3 esi

gimlix:
_gimlix:
    pushad  
    mov    r, 0x9e377900 + 24
g_l0:
    mov    s, [esp+32+4]        ; esi = s 
    push   s
    push   4
    pop    j
g_l1:
    ; x = ROTR32(s[    j], 8);
    lodsd  ;mov    x, [s]  
    ror    x, 8  
    
    ; y = ROTL32(s[4 + j], 9);
    mov    y, [s + (4*3)]   
    rol    y, 9
    
    ; z = s[8 + j];
    mov    z, [s + (4*7)]
    
    ; s[8 + j] = x ^ (z << 1) ^ ((y & z) << 2);
    push   x
    push   y
    lea    t1, [z + z]
    and    y, z
    shl    y, 2
    xor    t1, y
    xor    x, t1    
    mov    [s + (7*4)], x
    pop    y
    pop    x
    
    ; s[4 + j] = y ^ x        ^ ((x | z) << 1);
    push   x
    push   y
    xor    y, x
    or     x, z
    shl    x, 1
    xor    y, x
    mov    [s + (3*4)], y
    pop    y
    pop    x
    
    ; s[j]     = z ^ y        ^ ((x & y) << 3);    
    xor    z, y
    and    x, y
    shl    x, 3
    xor    z, x
    push   z
    
    dec    j
    jnz    g_l1

    pop    s3
    pop    s2
    pop    s1
    pop    s0

    pop    t1

    mov    dl, cl
    and    dl, 3
    jnz    g_l2
    
    ; XCHG (s[0], s[1]);
    xchg   s0, s1
    ; XCHG (s[2], s[3]);
    xchg   s2, s3
    ; s[0] ^= 0x9e377900 ^ r;
    xor    s0, r    
g_l2:
    cmp    dl, 2
    jnz    g_l3  
    ; XCHG (s[0], s[2]);
    xchg   s0, s2
    ; XCHG (s[1], s[3]);
    xchg   s1, s3
g_l3:
    stosd
    xchg   eax, s1
    stosd
    xchg   eax, s2
    stosd
    xchg   eax, s3
    stosd    
    dec    cl   
    jnz    g_l0    
    popad
    ret

See sources here

Advertisements
This entry was posted in assembly, cryptography, encryption, programming, security and tagged , , , , , . Bookmark the permalink.

3 Responses to Gimli: a cross-platform permutation function

  1. David wong says:

    Careful when implementing experimental crypto 🙂 https://eprint.iacr.org/2017/743

    I’m building a small project and I think I’ll go for Keccak-f[400, 12] which is a 400-bit permutation using 12 rounds of the keccak-f iterated permutation.

    Liked by 1 person

  2. Official statement from the Gimli team regarding Mike Hamburg’s cryptanalysis: https://gimli.cr.yp.to/statement.html

    Liked by 1 person

  3. Pingback: XooDoo Permutation Function | crypto on x86 and ARM

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s