HC-256 Stream Cipher

Introduction

HC-256 is a software-efficient stream cipher designed and written by Wu Hongjun in 2004. In 2008, it was selected for the final portfolio of eSTREAM, the stream cipher project of the European Network of Excellence for Cryptology (ECRYPT, 2004-2008).

HC-256 was designed to demonstrate that strong stream cipher can be built from nonlinear feedback function and nonlinear output function. The large secret states of the two ciphers are updated in a nonlinear way, and table lookup (with changing tables) is used in the generation of keystream.

Stack exceptions problems

One of the negative features of this cipher is the large RAM requirement when generating key dependent sboxes. It requires approx. between 16-20KB but you cannot just allocate 20KB of stack in your procedure using SUB, ADD or ENTER.

The key generating function uses the following code

;
    xor    ecx, ecx          ; ecx=0
    mul    ecx               ; eax=0, edx=0
    
    mov    cl, 5             ; ecx=5
    mov    dh, 16            ; edx=4096
    ; allocate stack memory in 4096 byte blocks
    ; 4 x 4096 = 16384 bytes, 
    ; additional 4096 bytes just in case (not needed?)
xalloca:
    sub    esp, edx          ; subtract page size
    test   [esp], esp        ; page probe
                             ; causes pages of memory to be 
                             ; allocated via the guard page 
                             ; scheme (if possible)
    loop   xalloca           ; raises exception if 
                             ; unable to allocate

If you don’t perform the page probe after allocating the space required, an exception will occur once you try to use that address of memory. I never imagined this could be an issue… but again, it’s one of those things you need to consider when using lots of stack space.

Key Initialization

The initialization process of HC-256 consists of expanding the key and initialization vector into P and Q (similar to the message setup in SHA-256) and running the cipher 4096 steps without generating output.

Steps 4 and 5 are obviously from the SHA-256 hash algorithm.

#define SIG0(x)(ROTR32((x),  7) ^ ROTR32((x), 18) ^ ((x) >>  3))
#define SIG1(x)(ROTR32((x), 17) ^ ROTR32((x), 19) ^ ((x) >> 10))

// both key and iv must be 32 bytes / 256-bits!
void hc256_setkeyx(hc_ctx *c, void *kiv)
{
    uint32_t W[2560], i;
    uint32_t *pq, *wp;
    
    // 1. set counter
    c->ctr = 0;
    
    // 2. zero initialize key and iv buffers
    for (i=0; i<16; i++) {
      c->kiv[i] = 0;
    }
    
    // 3. setup the key and iv
    for (i=0; i<64; i++) {
      c->key[i>>2] = c->key[i>>2] | ((uint8_t*)kiv)[i];
      c->key[i>>2] = ROTL32(c->key[i>>2], 8);
    }
    
    // 4. copy 16 words to work buffer
    for (i=0; i<16; i++) { 
      W[i] = c->kiv[i];
    }

    // 5. expand buffer using SHA-256 macros
    for (i=16; i<2560; i++) {
      W[i] = SIG1(W[i-2])  + W[i-7]  + 
             SIG0(W[i-15]) + W[i-16] + i; 
    }
    
    pq = c->T;
    wp = &W[512];
    
    // 6. set the P and Q tables
    for (i=0; i<2048; i++) { 
      *pq++ = *wp++;
    }
    
    // 7. run cipher 4096 iterations before generating output
    for (i=0; i<4096; i++) {
      generate(c);
    }
}
hc256_setkeyx:
_hc256_setkeyx:
    pushad
    mov    edi, [esp+32+4]   ; edi=c
    mov    esi, [esp+32+8]   ; esi=kiv
    
    xor    ecx, ecx          ; ecx=0
    mul    ecx               ; eax=0, edx=0
    
    mov    cl, 5             ; ecx=5
    mov    dh, 16            ; edx=4096
    ; allocate stack memory in 4096 byte blocks
    ; 4 x 4096 = 16384 bytes, 
    ; additional 4096 bytes just in case (not needed?)
xalloca:
    sub    esp, edx          ; subtract page size
    test   [esp], esp        ; page probe
                             ; causes pages of memory to be 
                             ; allocated via the guard page scheme (if possible)
    loop   xalloca           ; raises exception if unable to allocate
    mov    ebx, esp          ; ebx=W
    
    push   edx               ; save 4096    
    push   edi               ; save ptr to c
    stosd                    ; c->ctr=0
    push   edx               ; save 4096
    push   edi               ; save ptr to c->T
    push   edx               ; save 4096
    
    ; 2. copy 512-bits of key/iv to workspace
    mov    cl, 64
    mov    edi, ebx          ; edi=W
    rep    movsb
    
    mov    esi, ebx          ; esi=W
    mov    cl, 16
expand_key:
    ; eax = SIG0(W[i-15])
    mov    eax, [edi - 15*4]
    mov    edx, eax
    mov    ebp, eax
    ror    eax, 7
    ror    edx, 18
    shr    ebp, 3
    xor    eax, edx
    xor    eax, ebp
    ; ebx = SIG1(W[i-2])
    mov    ebx, [edi - 2*4]
    mov    edx, ebx
    mov    ebp, ebx
    ror    ebx, 17
    ror    edx, 19
    shr    ebp, 10
    xor    ebx, edx
    xor    ebx, ebp
    ; W[i] = ebx + W[i-16] + eax + w[i-7] + i
    add    eax, [edi - 16*4]
    add    ebx, [edi -  7*4]
    add    eax, ebx
    add    eax, ecx
    stosd
    inc    ecx
    cmp    ecx, [esp]        ; 4096 words
    jnz    expand_key
    
    pop    ecx               ; ecx=4096
    pop    edi               ; edi=c->T
    shr    ecx, 1            ; /=2 for 2048 words
    add    esi, ecx          ; add 512*4
    rep    movsd
    
    pop    ecx               ; ecx=4096
    pop    edi               ; edi=ctx
sk_l3:
    call   hc256_generatex
    loop   sk_l3
    
    pop    eax              ; eax=4096
    lea    esp, [esp+eax*4] ; free stack
    add    esp, eax
    popad
    ret

Keystream Generation

At each step, one element of a table is updated and one 32-bit output is generated. An S-box is used to generate only 1024 outputs, then it is updated in the next 1024 steps. The keystream generation process of HC-256 is given below.

uint32_t h(uint32_t T[], uint32_t x) {
  return T[x & 255]                 + 
         T[256 + ((x >>  8) & 255)] + 
         T[512 + ((x >> 16) & 255)] + 
         T[768 + ((x >> 24) & 255)];  
}

// step function
uint32_t generate(hc_ctx* c)
{
    uint32_t r, i, i3, i10, i12, i1023;
    uint32_t *x0, *x1, *t;
    
    // 1. obtain indices based on counter
    i     = c->ctr     & 0x3ff;
    i3    = (i - 3)    & 0x3ff;
    i10   = (i - 10)   & 0x3ff;
    i12   = (i - 12)   & 0x3ff;
    i1023 = (i - 1023) & 0x3ff;

    if (c->ctr < 1024) {
      x0=c->P;
      x1=c->Q;
    } else {
      x0=c->Q;
      x1=c->P;
    }
    
    // 2.
    x0[i] = x0[i]   + 
            x0[i10] + 
            (ROTR32(x0[i3], 10) ^ ROTR32(x0[i1023], 23)) + 
            x1[(x0[i3] ^ x0[i1023]) & 0x3ff];
         
    // 3. 
    r = h(x1, x0[i12]) ^ x0[i];
    
    // 4. update counter modulo 2048
    c->ctr = (c->ctr+1) & 0x7ff;
    
    return r;
}
; expects ctx in edi
hc256_generatex:
_hc256_generatex:
    pushad
    xor    edx, edx
    mov    dh, 8               ; edx = 2048
    mov    esi, edi            ; esi = c
    lodsd                      ; eax = c->ctr
    push   eax                 ; save  c->ctr
    inc    eax                 ; c->ctr++
    dec    edx                 ; edx = 2047
    and    eax, edx            ; c->ctr &= 2047
    stosd                      ; save new c->ctr
    pop    eax                 ; restore old c->ctr
    lea    edi, [esi+edx*2+2]  ; x0  = c->Q
    shr    edx, 1              ; edx = 1023
    push   edx                 ; save 1023
    cmp    eax, edx            ; c->ctr > 1023
    jbe    gen_l0
    xchg   esi, edi            ; swap Q and P ptrs
gen_l0:
    and    eax, edx            ; i &= 1023
    
    lea    ebx, [eax-3]        ; i3 = (i - 3) & 1023;
    and    ebx, edx
    
    lea    ecx, [eax-10]       ; i10 = (i - 10) & 1023;
    and    ecx, edx
    
    lea    ebp, [eax-12]       ; i12 = (i - 12) & 1023;
    and    ebp, edx
    mov    ebp, [esi+ebp*4]
    push   ebp                 ; save i12
    
    mov    ebp, eax            ; i1023 = (i - 1023) & 1023;
    sub    ebp, edx
    and    ebp, edx
    
    push   eax                 ; save i
    mov    eax, [esi+eax*4]    ; eax  = x0[i]
    add    eax, [esi+ecx*4]    ; eax += x0[i10]
    
    mov    ebp, [esi+ebp*4]    ; ebp  = x0[i1023]
    mov    ebx, [esi+ebx*4]    ; ebx  = x0[i3]
    mov    ecx, ebx            ; ecx  = x0[i3]
    xor    ecx, ebp            ; ecx ^= x0[i1023]
    and    ecx, edx            ; ecx &= 0x3ff
    add    eax, [edi+ecx*4]    ; ecx  = x1[(x0[i3] ^ x0[i1023]) & 1023]
    ror    ebx, 10             ; ebx  = ROTR32(x0[i3], 10)
    rol    ebp, 9              ; ebp  = ROTL32(x0[i1023], 9)
    xor    ebx, ebp            ;
    add    eax, ebx            ; 
    pop    ebx                 ; ebx = i
    mov    [esi+ebx*4], eax    ; eax = (x0[i] += eax)
    pop    edx                 ; edx = i12
    
    pop    ebp                 ; ebp=1023
    inc    ebp                 ; ebp=1024
    xchg   eax, ecx
    xor    eax, eax            ; r=0
gen_l1:
    movzx  ebx, dl
    add    eax, [edi+ebx*4]
    add    edi, ebp            ; x1 += 1024/4
    shr    edx, 8              ; w1 >>= 8
    jnz    gen_l1
    
    xor    eax, ecx            ; r ^= w0;
    mov    [esp+_eax], eax     ; return r;
    popad
    ret

Encryption and Decryption

The keystream is XORed with the message for encryption. The decryption is to XOR the keystream with the ciphertext.

// encrypt/decrypt data in place
void hc256_crypt(hc_ctx *c, void *in, uint32_t inlen)
{
  uint32_t i, w;
  uint8_t *p=(uint8_t*)in;
  
  // encrypt all bytes
  for (i=0; i<inlen;) {
    w = generate(c);
    // encrypt 4 bytes or until i equals inlen
    do {
      p[i++] ^= (w & 255);
      w >>= 8;
    } while (i < inlen && w != 0);
  }
}
hc256_cryptx:
_hc256_cryptx:
    pushad
    lea    esi, [esp+32+4]
    lodsd
    xchg   edi, eax          ; edi=ctx
    lodsd
    xchg   edx, eax          ; edx=in
    lodsd
    xchg   ecx, eax          ; ecx=len
hc_l0:                       ; .repeat
    jecxz  hc_l2             ; .break .if ecx == 0
    call   hc256_generatex
hc_l1:
    xor    [edx], al         ; *in ^= (w0 & 0xFF)
    inc    edx               ; in++
    shr    eax, 8            ; w0 >>= 8
    loopnz hc_l1             ; .while ecx != 0 && eax != 0
    jmp    hc_l0
hc_l2:
    popad
    ret

Test Vector results

HC256 test #1 - OK
5b, 07, 89, 85, d8, f6, f3, 0d,
42, c5, c0, 2f, a6, b6, 79, 51,
53, f0, 65, 34, 80, 1f, 89, f2,
4e, 74, 24, 8b, 72, 0b, 48, 18,

HC256 test #2 - OK
af, e2, a2, bf, 4f, 17, ce, e9,
fe, c2, 05, 8b, d1, b1, 8b, b1,
5f, c0, 42, ee, 71, 2b, 31, 01,
dd, 50, 1f, c6, 0b, 08, 2a, 50,

HC256 test #3 - OK
1c, 40, 4a, fe, 4f, e2, 5f, ed,
95, 8f, 9a, d1, ae, 36, c0, 6f,
88, a6, 5a, 3c, c0, ab, e2, 23,
ae, b3, 90, 2f, 42, 0e, d3, a8,

Summary

architecture size
x86 272
x64 n/a

Since it was chosen for the final portfolio of eSTREAM, it’s likely to have undergone extensive cryptanalysis. It’s unpatented, suitable for 32-bit CPUs and the assembly output of hc256x.c is approx. 454 bytes which is just a little bigger than ChaCha20 (424 bytes) a close relative of Salsa20 which was also selected for the final portfolio.

Please refer to here for sources, documentation and any future updates that may not be shown here.

Known attacks

Improved Distinguishing Attacks on HC-256 by Gautham Sekar and Bart Preneel
A Cache Timing Analysis of HC-256 by Erik Zenner

Advertisements
This entry was posted in assembly, cryptography, encryption, programming, security and tagged , , , , . Bookmark the permalink.

One Response to HC-256 Stream Cipher

  1. Pingback: Asmcodes: Rabbit Stream Cipher | x86 crypto

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s