## Gimli: a cross-platform permutation function

Introduction

Gimli, named after the Lord Of The Rings character, is a 384-bit permutation function, designed specifically to be lightweight, high performance, and secure across multiple architectures.

Gimli can easily be used to build high-security block ciphers, tweakable block ciphers, stream ciphers, message-authentication codes, authenticated ciphers and hash functions.

What follows here is a compact implementation in C and x86 assembly. For performance enhanced implementations, please look at reference code written by Frank Denis here.

So happy was Frank with this function, that the LibHydrogen library is built atop Gimli.

However, Mike Hamburg points out in Cryptanalysis of 22 1/2 rounds of Gimli, there are weaknesses in how the Linear layer is applied which may lead to an update of this function in the future.

The Gimli team responded to Hamburg in Statement regarding “Cryptanalysis of 22 1/2 rounds of Gimli”

## C code

The only thing to mention here is the cyclic left shift by 24 bits is replaced with cyclic right shift by 8 bits. Same outcome, different direction.

```void gimli(void *state)
{
int      r, j;
uint32_t t, x, y, z;
uint32_t *s=(uint32_t*)state;

for (r=0x9e377918; r!=0x9e377900; r--) {
for (j=0; j<4; j++) {
x = ROTR32(s[    j], 8);
y = ROTL32(s[4 + j], 9);
z =        s[8 + j];

s[8 + j] = x ^ (z << 1) ^ ((y & z) << 2);
s[4 + j] = y ^ x        ^ ((x | z) << 1);
s[j]     = z ^ y        ^ ((x & y) << 3);
}

t = r & 3;

// small swap
if (t == 0) {
XCHG(s[0], s[1]);
XCHG(s[2], s[3]);

s[0] ^= r;
}
// big swap
if (t == 2) {
XCHG(s[0], s[2]);
XCHG(s[1], s[3]);
}
}
}
```

## x86 Assembly

The following code includes optmizations from Peter Ferrie It’s currently 112 bytes.

```%define j  ebx
%define x  eax
%define y  ebp
%define z  edx

%define s  esi

%define t0 ebp
%define t1 edi

%define r  ecx

%define s0 eax
%define s1 ebx
%define s2 ebp
%define s3 esi

gimlix:
_gimlix:
mov    r, 0x9e377900 + 24
g_l0:
mov    s, [esp+32+4]        ; esi = s
push   s
push   4
pop    j
g_l1:
; x = ROTR32(s[    j], 8);
lodsd  ;mov    x, [s]
ror    x, 8

; y = ROTL32(s[4 + j], 9);
mov    y, [s + (4*3)]
rol    y, 9

; z = s[8 + j];
mov    z, [s + (4*7)]

; s[8 + j] = x ^ (z << 1) ^ ((y & z) << 2);
push   x
push   y
lea    t1, [z + z]
and    y, z
shl    y, 2
xor    t1, y
xor    x, t1
mov    [s + (7*4)], x
pop    y
pop    x

; s[4 + j] = y ^ x        ^ ((x | z) << 1);
push   x
push   y
xor    y, x
or     x, z
shl    x, 1
xor    y, x
mov    [s + (3*4)], y
pop    y
pop    x

; s[j]     = z ^ y        ^ ((x & y) << 3);
xor    z, y
and    x, y
shl    x, 3
xor    z, x
push   z

dec    j
jnz    g_l1

pop    s3
pop    s2
pop    s1
pop    s0

pop    t1

mov    dl, cl
and    dl, 3
jnz    g_l2

; XCHG (s[0], s[1]);
xchg   s0, s1
; XCHG (s[2], s[3]);
xchg   s2, s3
; s[0] ^= 0x9e377900 ^ r;
xor    s0, r
g_l2:
cmp    dl, 2
jnz    g_l3
; XCHG (s[0], s[2]);
xchg   s0, s2
; XCHG (s[1], s[3]);
xchg   s1, s3
g_l3:
stosd
xchg   eax, s1
stosd
xchg   eax, s2
stosd
xchg   eax, s3
stosd
dec    cl
jnz    g_l0
ret
```

See sources here

This entry was posted in assembly, cryptography, encryption, programming, security and tagged , , , , , . Bookmark the permalink.

### 3 Responses to Gimli: a cross-platform permutation function

1. David wong says:

Careful when implementing experimental crypto 🙂 https://eprint.iacr.org/2017/743

I’m building a small project and I think I’ll go for Keccak-f[400, 12] which is a 400-bit permutation using 12 rounds of the keccak-f iterated permutation.

Liked by 2 people

2. Official statement from the Gimli team regarding Mike Hamburg’s cryptanalysis: https://gimli.cr.yp.to/statement.html

Liked by 2 people