### Introduction

The Chaskey cipher is a 128-bit block, 128-bit key symmetric encryption algorithm which is the underlying function used for the Chaskey Message Authentication Code (MAC).

It’s based on an Even-Mansour construction, which makes it very simple to implement. Its permutation function is derived from SipHash and uses only ARX (add, rotate, XOR) instructions, which makes it well suited to many architectures.

Shimon Even and Yishay Mansour published a paper in 1997 titled "A Construction of a Cipher From a Single Pseudorandom Permutation", which suggested an incredibly simple but provably secure design for a cryptographic algorithm.

In this design, key whitening, a technique invented by Ron Rivest in 1984, is performed before and after the application of F, which is a publicly known permutation function.

Key whitening simply XORs the key with the data. It is intended to increase resistance to brute-force attacks and is used in many modern ciphers.
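The construction can be sketched in a few lines of C. This is only an illustration: `toy_perm` and `toy_perm_inv` are hypothetical stand-ins for F (they are not Chaskey's permutation and offer no security), the block is shrunk to 32 bits, and the same key is assumed for both whitening steps, as in the implementation in this post.

```c
#include <stdint.h>

// Hypothetical toy permutation standing in for F. Any public, invertible
// function would do for illustration; this one is NOT secure.
static uint32_t toy_perm(uint32_t x)
{
    x += 0x9E3779B9u;                 // add a constant
    return (x << 7) | (x >> 25);      // rotate left by 7
}

// Inverse of toy_perm: undo the rotate, then undo the add.
static uint32_t toy_perm_inv(uint32_t x)
{
    x = (x >> 7) | (x << 25);         // rotate right by 7
    return x - 0x9E3779B9u;
}

// Even-Mansour with a single key: C = F(M ^ K) ^ K
static uint32_t em_encrypt(uint32_t m, uint32_t k)
{
    return toy_perm(m ^ k) ^ k;       // pre-whiten, permute, post-whiten
}

// Decryption: M = F^-1(C ^ K) ^ K
static uint32_t em_decrypt(uint32_t c, uint32_t k)
{
    return toy_perm_inv(c ^ k) ^ k;
}
```

Because XOR is its own inverse, decryption uses exactly the same whitening steps and only swaps F for its inverse.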

### F function

The permutation function uses 16 rounds of ADD/ROL/XOR (ARX) instructions for encryption and is derived from the SipHash algorithm.

SipHash is a fast short-input pseudorandom function (PRF) designed and published in 2012 by Jean-Philippe Aumasson and Daniel J. Bernstein.

Decryption simply reverses the permutation function, undoing each step in the opposite order with SUB/ROR/XOR.

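```c
#include <stdint.h>

// 32-bit rotates (assumed definitions; n must be in the range 1..31)
#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define ROTR32(v, n) (((v) >> (n)) | ((v) << (32 - (n))))

// enc: non-zero to encrypt, zero to decrypt
// key: 128-bit key, buf: 128-bit block processed in place
void chas_encrypt(int enc, void *key, void *buf)
{
    int      i;
    uint32_t *v=(uint32_t*)buf;
    uint32_t *k=(uint32_t*)key;

    // pre-whiten
    for (i=0; i<4; i++) {
        v[i] ^= k[i];
    }

    // apply permutation function
    for (i=0; i<16; i++) {
        if (enc) {
            // encrypt: ADD/ROL/XOR
            v[0] += v[1];
            v[1]=ROTL32(v[1], 5);
            v[1] ^= v[0];
            v[0]=ROTL32(v[0],16);
            v[2] += v[3];
            v[3]=ROTL32(v[3], 8);
            v[3] ^= v[2];
            v[0] += v[3];
            v[3]=ROTL32(v[3],13);
            v[3] ^= v[0];
            v[2] += v[1];
            v[1]=ROTL32(v[1], 7);
            v[1] ^= v[2];
            v[2]=ROTL32(v[2],16);
        } else {
            // decrypt: the same steps undone in reverse order
            v[2]=ROTR32(v[2],16);
            v[1] ^= v[2];
            v[1]=ROTR32(v[1], 7);
            v[2] -= v[1];
            v[3] ^= v[0];
            v[3]=ROTR32(v[3],13);
            v[0] -= v[3];
            v[3] ^= v[2];
            v[3]=ROTR32(v[3], 8);
            v[2] -= v[3];
            v[0]=ROTR32(v[0],16);
            v[1] ^= v[0];
            v[1]=ROTR32(v[1], 5);
            v[0] -= v[1];
        }
    }
    // post-whiten
    for (i=0; i<4; i++) {
        v[i] ^= k[i];
    }
}
```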
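To see why reversing the permutation works, it may help to look at one small piece in isolation. The sketch below takes the first three operations of the encryption round and undoes them in reverse order. The helper names `arx_step`, `arx_step_inv` and `arx_roundtrip_ok` are made up for this illustration, and the rotate macros are an assumed, common definition.

```c
#include <stdint.h>

// Assumed rotate macros; n must be in the range 1..31.
#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define ROTR32(v, n) (((v) >> (n)) | ((v) << (32 - (n))))

// Forward: the first three operations of the encryption round.
static void arx_step(uint32_t *a, uint32_t *b)
{
    *a += *b;                 // ADD
    *b  = ROTL32(*b, 5);      // ROL
    *b ^= *a;                 // XOR
}

// Inverse: each operation undone, last one first.
static void arx_step_inv(uint32_t *a, uint32_t *b)
{
    *b ^= *a;                 // XOR is its own inverse
    *b  = ROTR32(*b, 5);      // ROR undoes ROL
    *a -= *b;                 // SUB undoes ADD
}

// Returns 1 if the step followed by its inverse restores both inputs.
static int arx_roundtrip_ok(uint32_t a, uint32_t b)
{
    uint32_t x = a, y = b;
    arx_step(&x, &y);
    arx_step_inv(&x, &y);
    return x == a && y == b;
}
```

The full round is just a longer chain of such steps, so the same argument applies to all 16 rounds.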

The assembly is straightforward. We load buf into ESI, key into EDI and enc into ECX, then load four 32-bit registers with the 128-bit block and apply pre-whitening with the 128-bit key. We test ECX for zero and save the flag status with PUSHFD at the start of each round. Since MOV does not modify the flags, this frees ECX for use as a loop counter, which is set to 16 (for LTS).

After each round of the permutation, we restore the flag status with POPFD and keep looping until ECX is zero. Finally, we apply post-whitening with the 128-bit key, save the result to the buffer and return.

```nasm
%define v0 eax
%define v1 ebx
%define v2 edx
%define v3 ebp

chas_encryptx:
_chas_encryptx:
    pushad                    ; save registers (32 bytes), hence esp+32+4
    lea     esi, [esp+32+4]   ; esi = pointer to first argument
    lodsd
    xchg    ecx, eax          ; ecx = enc
    lodsd
    xchg    edi, eax          ; edi = key
    lodsd
    xchg    eax, esi          ; esi = buf
    push    esi               ; save buf pointer for the final store
    lodsd
    xchg    eax, v3           ; park buf[0] in v3 while eax is used by lodsd
    lodsd
    xchg    eax, v1
    lodsd
    xchg    eax, v2
    lodsd
    xchg    eax, v3           ; v3 = buf[3], v0 = buf[0]
    ; pre-whiten
    xor     v0, [edi   ]
    xor     v1, [edi+ 4]
    xor     v2, [edi+ 8]
    xor     v3, [edi+12]
    test    ecx, ecx          ; ZF = 1 means decrypt
    mov     cl, 16            ; 16 rounds; mov leaves the flags intact
ck_l0:
    pushfd                    ; save ZF across the round
    jz      ck_l1
    ; encrypt
    add     v0, v1
    rol     v1, 5
    xor     v1, v0
    rol     v0, 16
    add     v2, v3
    rol     v3, 8
    xor     v3, v2
    add     v0, v3
    rol     v3, 13
    xor     v3, v0
    add     v2, v1
    rol     v1, 7
    xor     v1, v2
    rol     v2, 16
    jmp     ck_l2
ck_l1:
    ; decrypt
    ror     v2, 16
    xor     v1, v2
    ror     v1, 7
    sub     v2, v1
    xor     v3, v0
    ror     v3, 13
    sub     v0, v3
    xor     v3, v2
    ror     v3, 8
    sub     v2, v3
    ror     v0, 16
    xor     v1, v0
    ror     v1, 5
    sub     v0, v1
ck_l2:
    popfd                     ; restore ZF for the next round
    loop    ck_l0
ck_l3:
    ; post-whiten
    xor     v0, [edi   ]
    xor     v1, [edi+ 4]
    xor     v2, [edi+ 8]
    xor     v3, [edi+12]
    pop     edi               ; edi = buf
    ; save buf
    stosd
    xchg    eax, v1
    stosd
    xchg    eax, v2
    stosd
    xchg    eax, v3
    stosd
    popad                     ; restore registers saved by pushad