MD5 T and SHA-256 K Constants

Introduction

During the process of optimizing MD5 and SHA-256, I wasn’t sure how much an impact it would have to generate the T and K constants during compression. Obviously it would be much slower but how much space could be saved?

They appear to the uninformed eye as nothing more than random integers but are in fact derived from sine and cube roots of prime numbers converted to 32-bit integers. In SHA-512, they are 64-bit integers but I won’t discuss that here as it isn’t suitable for x86 architecture.

The constants used in the hash compression function occupy 1024 bytes which is unimportant for most systems today but I was curious how much space could be saved using FPU code.

MD5 Constants

Section 3.4 of the specification published by Rivest in 1992 describes how T constants are created.

This step uses a 64-element table T[1 … 64] constructed from the sine function. Let T[i] denote the i-th element of the table, which is equal to the integer part of 4294967296 times abs(sin(i)), where i is in radians. The elements of the table are given in the appendix.

The following piece of code will generate a T constant when compiled with MSVC.

#pragma intrinsic (fabs, pow, sin)

uint32_t tc (uint32_t i)
{
  uint32_t r;
  r = (uint32_t)(fabs(sin(i)*pow(2,32)));
  return r;
}

The above function when provided 1-64 yields the following values. (note that some are removed)

  D76AA478 E8C7B756 242070DB C1BDCEEE F57C0FAF 
  698098D8 8B44F7AF FFFF5BB1 895CD7BE 6B901122 
  F61E2562 C040B340 265E5A51 E9B6C7AA D62F105D 
  21E1CDE6 C33707D6 F4D50D87 455A14ED A9E3E905 
  FFFA3942 8771F681 6D9D6122 FDE5380C A4BEEA44 
  289B7EC6 EAA127FA D4EF3085 04881D05 D9D4D039 
  F4292244 432AFF97 AB9423A7 FC93A039 655B59C3 
  6FA87E4F FE2CE6E0 A3014314 4E0811A1 F7537E82 

Achieving the same results with x86 assembly can be done with FPU instructions. Since the FPU by default rounds upward and C Math functions round down, we must first set the control word bit otherwise we’ll get incorrect values.

init_fpu proc fastcall
    ; round numbers down
    push   eax
    fstcw  [esp]            ; store control word
    pop    eax
    or     ax, 00400H       ; set round down bit
    and    ax, 0F7FFH       ; clear bit
    push   eax
    fldcw  [esp]            ; load control word
    pop    eax
    ret
init_fpu endp

Then, from 1-64 is processed with sin2int to obtain the 32-bit constant.

    ; get sine of number
    ; return 32-bit fractional part as integer
sin2int proc fastcall integer:dword
    push   1
    push   0
    fild   qword ptr[esp]   ; load 2^32 or 4294967296UL
    push   integer
    fild   dword ptr[esp]
    fsin                    ; fsin (integer)
    fabs
    fmul                    ; fabs(fsin (integer)) * pow(2,32)
    fistp  qword ptr[esp]
    pop    eax
    add    esp, 2*4
    ret
sin2int endp

The above code requires 45 bytes and while it does save a lot of space when compared to using array, it will undoubtedly run much slower.

SHA-256 Constants

This time using cubic root of prime numbers, the specification describes generation.

Both use the same sequence of sixty-four constant 32-bit words, K0{256}, K1{256}, …,K63{256} These words represent the first thirty-two bits of the fractional parts of the cube roots of the first sixty-four prime numbers. In hex, these constant words are (from left to right)

I tried using a simple primality test function but in the end, I used 64 prime numbers stored as array.

#pragma intrinsic(fabs,pow,sqrt)

// cube root of integer
// return fractional part as integer
uint32_t cbr2int (uint32_t x) {
  uint32_t r;
  r = (uint32_t)(fabs(pow((double)p[x],1.0/3.0))*pow(2,32));
  return r;
}

When the above function is called in sequence of SHA-256 round, it will generate following values (some are removed)

  428A2F98 71374491 B5C0FBCF E9B5DBA5 3956C25B 
  D807AA98 12835B01 243185BE 550C7DC3 72BE5D74 
  E49B69C1 EFBE4786 0FC19DC6 240CA1CC 2DE92C6F 
  983E5152 A831C66D B00327C8 BF597FC7 C6E00BF3 
  27B70A85 2E1B2138 4D2C6DFC 53380D13 650A7354 
  A2BFE8A1 A81A664B C24B8B70 C76C51A3 D192E819 
  19A4C116 1E376C08 2748774C 34B0BCB5 391C0CB3 
  748F82EE 78A5636F 84C87814 8CC70208 90BEFFFA

Converting cube roots to integer in assembly using the FPU

; get cubic root of number 
    ; return 32-bit fractional part as integer
cbr2int:
    pushad
    call   cbr_primes
cbr_primes:
    pop    esi
    lea    esi, [esi+(primes-cbr_primes)]
    push   1
    push   0
    fild   qword[esp]   ; load 2^32
    push   1
    fild   dword[esp]
    push   3
    fild   dword[esp]
    fdivp                   ; 1.0/3.0
    movzx  eax, word[esi+2*i]
    push   eax
    fild   dword[esp]   ;
    fyl2x                   ; ->log2(Src1)*exponent
    fld    st0               ; copy the logarithm
    frndint                 ; keep only the characteristic
    fsub   st1, st0        ; keeps only the mantissa
    fxch   st1                 ; get the mantissa on top
    f2xm1                   ; ->2^(mantissa)-1
    fscale
    fstp   st1            ; copy result over and "pop" it
    fmulp  st1, st0                 ; * 2^32
    fistp  qword[esp]   ; save integer
    pop    eax
    add    esp, 4*4         ; release stack
    mov    [esp+1ch], eax
    popad
    ret

SHA-256 Initialization

Not only are the K constants generated from primes but also the initialization values.
This time however, it is square roots of the first eight prime numbers.

// square root of integer
// return fractional part as integer
uint32_t sqrt2int (uint32_t x) {
  uint32_t r;
  r = (uint32_t)(fabs(sqrt((double)primes[x]))*pow(2,32));
  return r;
}

The following values (last 3 are removed)

6A09E667 BB67AE85 3C6EF372 A54FF53A 510E527F ...

And in assembly, the following achieves the same result.

sqrt2int:
    pushad
    call   ld_primes

primes dw 2, 3, 5, 7, 11, 13, 17, 19, 
       dw 23, 29, 31, 37, 41, 43, 47, 53
       dw 59, 61, 67, 71, 73, 79, 83, 89
       dw 97, 101, 103, 107, 109, 113, 127, 131
       dw 137, 139, 149, 151, 157, 163, 167, 173 
       dw 179, 181, 191, 193, 197, 199, 211, 223 
       dw 227, 229, 233, 239, 241, 251, 257, 263
       dw 269, 271, 277, 281, 283, 293, 307, 311
primes_len equ $-primes

ld_primes:
    pop    esi
    ; get square root of number in eax 
    ; return 32-bit fractional part as integer
    push   1
    push   0
    fild   qword[esp]   ; load 2^32
    movzx  eax, word[esi+2*i]
    push   eax
    fild   dword[esp]   ; load integer
    push   eax
    fsqrt
    fld1                ; load 1 
    fsubp  st1, st0     ; subtract to get fractional part
    fmulp  st1, st0     ; multiply fractional part by 2^32
    frndint
    fistp  qword[esp]
    pop    eax 
    add    esp, 3*4         ; release 2^32 on stack
    mov    [esp+1ch], eax
    popad
    ret

Summary

For both assembly versions of MD5 and SHA-256, space is definitely saved but both MSVC and GCC use external functions for floating point arithmetic so there’s no advantage there. You could link custom assembler routines but most people want a function with high performance. Neither of these algorithms are suitable for small devices which is probably another reason to consider SHA-3.

Advertisements
This entry was posted in assembly, cryptography, encryption, programming and tagged , , , , , , , . Bookmark the permalink.

6 Responses to MD5 T and SHA-256 K Constants

  1. Pingback: Asmcodes: SHA-2 | Odzhan

  2. Pingback: Asmcodes: SHA-256 | Odzhan

  3. Pingback: Asmcodes: BLAKE2 Crypto Hash and MAC | Odzhan

  4. Pingback: Asmcodes: SM3 Cryptographic Hash Algorithm (Chinese Standard) | x86 crypto

  5. Pingback: BLAKE2 cryptographic hash | x86 crypto

  6. Pingback: SHA-256 cryptographic hash | x86 crypto

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s