Asmcodes: Diffie Hellman and Modular Exponentiation

Introduction

The Story Of Non-Secret Encryption published in 1997 by the now deceased J.H.Ellis documents early research by GCHQ employees into what we now refer to as PKC (Public Key Cryptography).

Below is a list of papers/reports referenced by Ellis in his 1997 paper.

paper/report authors date
Final Report on Project C43 Bell Telephone Laboratory 1944
The Possibility of Secure Non-Secret Digital Encryption James H. Ellis (GCHQ) January 1970
A Note on Non-Secret Encryption Clifford C. Cocks (GCHQ) November 1973
Non-Secret Encryption Using a Finite Field M J Williamson (GCHQ) January 1974
Thoughts on Cheaper Non-Secret Encryption M J Williamson (GCHQ) August 1976
New Directions in Cryptography W Diffie and M E Hellman November 1976
A Method for Obtaining Digital Signatures and Public-key Cryptosystems R L Rivest, A Shamir, L Adleman February 1978

Ellis states his first thoughts about NSE began in the late 60s which he finally wrote about in 1970 while working at the GCHQ. His idea was based on a Bell Laboratory report published in 1944.

The story begins in the 60’s. The management of vast quantities of key material needed for secure communication was a headache for the armed forces. It was obvious to everyone, including me, that no secure communication was possible without secret key, some other secret knowledge, or at least some way in which the recipient was in a different position from an interceptor. After all, if they were in identical situations how could one possibly be able to receive what the other could not? Thus there was no incentive to look for something so clearly impossible.

At that time, he was unable to devise and implement a practical solution for his theory and it wasn’t until 1973 when another cryptographer and mathematician working at GCHQ by the name of Clifford Cocks provided a mathematical proof of concept. This was all classified of course until the death of Ellis in 1997.

Publicly however, the concept of PKC was first proposed by Ralph Merkle in his 1974 paper Secure Communications over Insecure Channels

This later became the basis of the Diffie-Hellman key exchange protocol described by Whitfield Diffie and Martin Hellman in their paper New Directions In Cryptography

Incidentally, another GCHQ cryptographer M.J. Williamson wrote about the same key exchange algorithm 3 months before the publication of the Diffie/Hellman paper.

The method was published by Diffie and Hellman. This was identical to Williamson’s version, except that they restricted q to be prime. — Ellis

What most early and modern day PKC systems have in common is their use of Modular Exponentiation and what follows is a tiny implementation in x86 assembler, optimized for size, so if you require a version optimized for speed, consider using a library such as PolarSSL or OpenSSL

The security of this function depends upon the inability of an adversary/interceptor to solve the (DLP) Discrete Logarithm Problem which is believed to be difficult when the numbers involved are large. More recently in 2015, cryptographers published Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice which suggests feasible attacks against DH which may be used by the NSA and GCHQ.

Simple implementation

If p and g have already been chosen (which they normally are, I’ll use predefined oakley prime numbers from RFC2409) then you only need modular exponentiation and a cryptographically secure random number generator.

diffie

Just to illustrate the basic functionality of modular exponentiation, here are 3 functions that perform the calculation over 64-bit field.

Obviously 64-bit integers are insufficient for security purposes but it’s simply to show what mathematical operators are used. These are then implemented in assembly to work with arbitrary sized numbers.

In order for this to work, the base and exponent must be in range of the modulus for addmod() to return correct results. The base and exponent values must be less than the modulus. This is how Diffie-Hellman and RSA work anyway so I don’t see it as a limitation. (open to correction)

// Modular Addition
uint64_t addmod (uint64_t a, uint64_t b, uint64_t m)
{
  a = a + b;
  if (a >= m) {
    a = a - m;
  }
  return a;
}

// Modular multiplication
uint64_t mulmod (uint64_t a, uint64_t b, uint64_t m)
{
  uint64_t r=0, t=a;
  
  while (b > 0) {
    if (b & 1) {
      r = addmod (r, t, m);
    }
    t = addmod (t, t, m);
    b >>= 1;
  }
  return r;
}

// Modular exponentiation
uint64_t expmod (uint64_t b, uint64_t e, uint64_t m)
{
  uint64_t r=1, t=b;

  while (e > 0) {
    if (e & 1) {
      r = mulmod (r, t, m);
    }
    t = mulmod (t, t, m);
    e >>= 1;
  }
  return r;
}

You can see that e is shifted to the right by one, this is essentially dividing by 2.
Implementing a function to perform this is possible but we’d need to make a copy of the exponent due to modification and it’s not necessary. The BT instruction sets the carry flag if a bit at specific position in memory is set and is much more efficient than shifting right using RCR instruction.

The following is just some simple code to demonstrate using the above functions.
What isn’t shown is how p and g are generated.

// Alice and Bob agree to use same p and g for key exchange
  // Both generate a random integer to establish session key.
  x=rand_range (p);
  y=rand_range (p);
  
  // Alice computes g ^ x mod p
  A=modexp (g, x, p);
  
  // Bob computes g ^ y mod p
  B=modexp (g, y, p);
  
  // Alice and Bob exchange A and B values
  
  // Alice computes session key using B ^ x mod p
  S1 = modexp (B, x, p);
  printf ("nn  Alice S=%llu", S1);
  
  // Bob computes session key using A ^ x mod p
  S2 = modexp (A, y, p);
  printf ("n  Bob   S=%llu", S2);
  
  // S1 and S2 should be equal
  printf ("nn  Key exchange %sn", 
    S1 == S2 ? "succeeded" : "failed");

Getting highest bit

The following will obtain the highest bit of large number.

; 
; Find highest bit in big memory
; Return in eax
;
; in: eax = 0, 
;     ecx = max size of memory in bytes, 
;     ebx = pointer to memory
;
; out: eax = last bit in array
;
cntbits proc
    lea   eax, [eax+8*ecx]
    dec   eax
ghb:
    bt    [ebx], eax
    jc    end_cnt
    dec   eax
    jnz   ghb
end_cnt:
    ret
cntbits endp

Next function is modular addition which is used in the modular multiplication function. Why modular addition using subtraction? Normally we need a full modulo function but it isn’t required here because we know b and e are less than m, or at least they should be for diffie-hellman key exchange. This is obviously much faster too.

Modular Addition

// Modular Addition
uint64_t addmod (uint64_t a, uint64_t b, uint64_t m)
{
  a = a + b;
  if (a >= m) {
    a = a - m;
  }
  return a;
}

In assembly, we make use of ADC (Add with Carry) and SBB (Subtract with Borrow)
The numbers are compared from top to bottom.

; **************************************************
; r = r + t; if (r >= m) r -= m;
; t = t + t; if (t >= m) t -= m;
; **************************************************
addmod proc c
    pushad
    cmovnc  edi, esi
    clc
    pushad
add_loop:
    lodsb
    adc    al, [edi]
    stosb
    loop   add_loop
    popad
    mov    esi, edi
    mov    edi, ebp
    pushad
cmp_loop:
    mov    al, [esi+ecx-1]
    cmp    al, [edi+ecx-1]
    loope  cmp_loop
    popad
    jb     end_am
sub_loop:
    lodsb
    sbb    al, [edi]
    mov    [esi-1], al
    lea    edi, [edi+1]
    loop   sub_loop
end_am:
    popad
    ret
addmod endp

Modular Multiplication

Essentially you are multiplying two numbers together and applying modulo operator like (a * b) % m. However, because we expect numbers to be much larger than 64-bits, we use binary multiplication.

uint64_t mulmod (uint64_t num1, uint64_t num2, 
    uint64_t modulus)
{
  uint64_t r=0, b=num1, e=num2, m=modulus;

  while (e > 0) {
    if (e & 1) {
      r = addmod (r, b, m);
    }
    b = addmod (b, b, m);
    e >>= 1;
  }
  return r;
}
mulmod proc c
    pushad

    ; if (cf==1) do mulmod (r, b, m) else do mulmod (b, b, m)
    mov    eax, esi ; eax = b
    cmovc  eax, edi ; eax = r
    push   eax      ; where to store result
   
    ; e = b/num2
    sub    esp, ecx
    mov    edi, esp
    pushad
    rep    movsb
    popad
    mov    ebx, esp
    
    ; b = b|r/num1
    sub   esp, ecx
    mov   edi, esp
    xchg  eax, esi
    pushad
    rep   movsb
    popad
    mov   esi, esp
    
    ; r = 0
    xor    eax, eax
    sub    esp, ecx
    mov    edi, esp
    pushad
    rep    stosb
    popad
    
    sub    esp, ecx
    
    cdq
    call  cntbits
mul_loop:
    ; e & 1
    bt    [ebx], edx
am_loop:
    pushfd
    call  addmod
    popfd
    cmc
    jnc   am_loop
    
    inc   edx
    dec   eax
    jns   mul_loop

    lea   esp, [esp+4*ecx]

    ; return r
    mov   esi, edi
    pop   edi
    rep   movsb
    
    popad
    ret
mulmod endp

Finally, the main function used in Diffie-Hellman key exchange and various other PKC.

Modular Exponentiation

uint64_t modexp (uint64_t b, uint64_t e, uint64_t m)
{
  uint64_t r=1, t=b;
  int n, i;
  
  n=bits (e);
  
  for (i=0; i<=n; i++) { 
    if (e & 1) { 
      r = mulmod (r, t, m); 
    } 
    t = mulmod (t, t, m); 
    e >>= 1;
  }
  return r;
}
; int modexp (int bitlen, void *base, void 
    *exponent, void *modulus, void *result);

modexp proc c
    pushad
    
    mov    ecx, [esp+32+ 4] ; maxbits
    mov    esi, [esp+32+ 8] ; base
    mov    ebx, [esp+32+12] ; exponent
    mov    ebp, [esp+32+16] ; modulus
    mov    eax, [esp+32+20] ; result

    shr    ecx, 3
    
    ; b = base
    sub    esp, ecx
    mov    edi, esp
    pushad
    rep    movsb
    popad
    mov    esi, esp
    
    ; e = exponent
    sub    esp, ecx
    mov    edi, esp
    pushad
    mov    esi, ebx
    rep    movsb
    popad
    mov    ebx, esp
    
    ; result = 1
    xchg   eax, edi
    xor    eax, eax
    cdq
    pushad
    rep    stosb
    popad
    inc    byte ptr[edi]
    
    ; n = bits(e)
    call   cntbits
exp_loop:
    ; e & 1
    bt     [ebx], edx
mm_loop:
    pushfd
    call   mulmod
    popfd
    cmc
    jnc   mm_loop
    
    inc   edx
    dec   eax
    jns   exp_loop
    
    lea   esp, [esp+2*ecx]
    popad
    ret
modexp endp

Demonstration

The following uses first and second oakley groups from RFC 2409 to simulate Diffie-Hellman key exchange. The modexp function is called from assembly itself but so long as maxbits parameter is aligned by 32, you can call from any language that supports cdecl convention.

Testing

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>

#include <openssl/bn.h>
#include <openssl/rand.h>

const char OAKLEY_PRIME_MODP768[]=
	"FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD1"
	"29024E088A67CC74020BBEA63B139B22514A08798E3404DD"
	"EF9519B3CD3A431B302B0A6DF25F14374FE1356D6D51C245"
	"E485B576625E7EC6F44C42E9A63A3620FFFFFFFFFFFFFFFF";
  
const char OAKLEY_PRIME_MODP1024[]=
	"FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD1"
	"29024E088A67CC74020BBEA63B139B22514A08798E3404DD"
	"EF9519B3CD3A431B302B0A6DF25F14374FE1356D6D51C245"
	"E485B576625E7EC6F44C42E9A637ED6B0BFF5CB6F406B7ED"
	"EE386BFB5A899FA5AE9F24117C4B1FE649286651ECE65381"
	"FFFFFFFFFFFFFFFF";
    
extern void dh_asm (uint32_t numlen, void *p, 
  void *g, void *x, void *y);
  
void dh (void)
{
  BIGNUM *p, *g, *x, *y, *A, *B, *s1, *s2;
  BN_CTX *ctx;
  uint32_t maxbits;
  
  // initialize context
  ctx = BN_CTX_new ();
  BN_CTX_init (ctx);
  
  p  = BN_new ();
  g  = BN_new ();
  x  = BN_new ();
  y  = BN_new ();
  A  = BN_new ();
  B  = BN_new ();
  s1 = BN_new ();
  s2 = BN_new ();
  
  BN_hex2bn (&p, OAKLEY_PRIME_MODP768);
  BN_hex2bn (&g, "2");
  
  // generate 2 random integers in range of p
  BN_rand_range (x, p);
  BN_rand_range (y, p);
  
  printf ("nP is %i-bits", BN_num_bits (p));
  
  printf ("np = %s", BN_bn2hex (p));
  printf ("ng = %s", BN_bn2hex (g));
  printf ("nx = %s", BN_bn2hex (x));
  printf ("ny = %s", BN_bn2hex (y));
  
  // Alice does g ^ x mod p
  BN_mod_exp (A, g, x, p, ctx);

  printf ("nA = %s", BN_bn2hex (A));
  
  // Bob does g ^ y mod p
  BN_mod_exp (B, g, y, p, ctx);

  printf ("nB = %s", BN_bn2hex (B));
  
  // *************************************
  // Bob and Alice exchange A and B values
  // *************************************
  
  // Alice computes session key
  BN_mod_exp (s1, B, x, p, ctx);

  printf ("ns1 = %s", BN_bn2hex (s1));
  
  // Bob computes session key
  BN_mod_exp (s2, A, y, p, ctx);
  
  printf ("ns2 = %s", BN_bn2hex (s2));
  
  // check if both keys match
  if (BN_cmp (s1, s2) == 0) {
    printf ("nnKey exchange succeeded");
    printf ("nnSession key = %sn", BN_bn2hex (s1));
  } else {
    printf ("nnKey exchange failedn");
  }
  
  // call the assembler function
  dh_asm (BN_num_bits(p), &p->d[0], &g->d[0], 
      &x->d[0], &y->d[0]);
  
  BN_free (s2);
  BN_free (s1);
  BN_free (p);
  BN_free (g);
  BN_free (y);
  BN_free (x);
  BN_free (B);
  BN_free (A);
  
  // release context
  BN_CTX_free (ctx);
}

The following should be assembled with JWASM/Win32 include files and linked with C code above.

.686
.model flat, c

    option casemap:none
    .nolist
    .nocref
    WIN32_LEAN_AND_MEAN equ 1
    include <windows.inc>
    include <stdio.inc>
    include <stdlib.inc>
    
    ;includelib <msvcrt.lib>
    includelib <kernel32.lib>
    .list
    .cref
.data

MAX_NUM equ 256 ; 2048 bytes

CStr macro szText:VARARG
  local dbText
.data
dbText db szText, 0
.code
exitm <offset dbText>
endm

modexp proto C numlen:dword, base:dword, exponent:dword, 
  modulus:dword, result:dword

.code

my_memcpy proto :dword, :dword, :dword

my_memcpy proc dst:dword, src:dword, len:dword
  pushad
  mov    ecx, [len]
  mov    esi, [src]
  mov    edi, [dst]
  rep    movsb
  popad
  ret
my_memcpy endp

my_memset proc x:dword, b:dword, len:dword
  pushad
  mov    ecx, [len]
  mov    edi, [x]
  mov    eax, [b]
  rep    stosb
  popad
  ret
my_memset endp

dump_hex proc uses esi edi ebx txt:DWORD, 
  buf:DWORD, len:DWORD
    mov   esi, [buf]
    mov   ebx, [len]
    invoke printf, txt
    .while ebx > 0
      movzx eax, byte ptr[esi+ebx-1]
      invoke printf, CStr("%02X"), eax
      dec ebx
    .endw
    ret
dump_hex endp

BN_LEN equ 1024

DH_ARGS struct
  p db BN_LEN dup (?)
  g db BN_LEN dup (?)
  
  x db BN_LEN dup (?)
  y db BN_LEN dup (?)
  
  A db BN_LEN dup (?)
  B db BN_LEN dup (?)
  
  s1 db BN_LEN dup (?)
  s2 db BN_LEN dup (?)
DH_ARGS ends

.data
dhargs DH_ARGS <>
maxbits dd ?
maxbytes dd ?
.code
dh_asm proc uses esi ebx edi numlen:dword, p:DWORD, 
  g:DWORD, x:DWORD, y:DWORD
    
    mov eax, [numlen]
    shr eax, 3
    mov [maxbytes], eax
    
    mov eax, [numlen]
    and eax, -32
    add eax, 32
    mov [maxbits], eax
    
    ; copy p, g, x and y
    invoke my_memcpy, addr dhargs.p, [p], [maxbytes]
    invoke my_memcpy, addr dhargs.g, [g], 1
    invoke my_memcpy, addr dhargs.x, [x], [maxbytes]
    invoke my_memcpy, addr dhargs.y, [y], [maxbytes]

      invoke dump_hex, CStr(10, 10, "p = "), 
        addr dhargs.p, [maxbytes]
        
      invoke dump_hex, CStr(10, 10, "g = "), 
        addr dhargs.g, 1
   
      invoke dump_hex, CStr(10, 10, "x = "), 
        addr dhargs.x, [maxbytes]
        
      invoke dump_hex, CStr(10, 10, "y = "), 
        addr dhargs.y, [maxbytes]
        
    ;int 3
    ; Alice does g ^ x mod p
    invoke modexp, [maxbits], addr dhargs.g, addr dhargs.x, 
      addr dhargs.p, addr dhargs.A
        
    ; Bob does g ^ y mod p
    invoke modexp, [maxbits], addr dhargs.g, addr dhargs.y, 
      addr dhargs.p, addr dhargs.B
    
    ; *************************************
    ; Bob and Alice exchange A and B values
    ; *************************************
    
    ; Alice computes session key
    invoke modexp, [maxbits], addr dhargs.B, addr dhargs.x, 
      addr dhargs.p, addr dhargs.s1
    
    ; Bob computes session key
    invoke modexp, [maxbits], addr dhargs.A, addr dhargs.y, 
      addr dhargs.p, addr dhargs.s2
    
      invoke dump_hex, CStr(10, 10, "A = "), 
        addr dhargs.A, [maxbytes]
        
      invoke dump_hex, CStr(10, 10, "B = "), 
        addr dhargs.B, [maxbytes]
        
      invoke dump_hex, CStr(10, 10, "s1 = "), 
        addr dhargs.s1, [maxbytes]
        
      invoke dump_hex, CStr(10, 10, "s2 = "), 
        addr dhargs.s2, [maxbytes]
        
    ; s1 + s2 should be equal
    invoke memcmp, addr dhargs.s1, addr dhargs.s2, [maxbytes]
    .if eax == 0
      ; use a KDF to generate symmetric key using s1 as input
      invoke printf, CStr(10, 10, "Key exchange succeeded")
      invoke dump_hex, CStr(10, 10, "Session key = "), 
        addr dhargs.s1, [maxbytes]
    .else
      ; something went wrong..
      invoke printf, CStr(10, "Key exchange failed")
    .endif
    ret
dh_asm endp

Results

architecture size
x86 213
x64 n/a

Conclusion

Although there are now effective attacks against Diffie-Hellman when using short groups in the range of 1024-bits, this doesn’t render the modular exponentiation itself obsolete for facilitating secure key exchange, we just need to use larger numbers in range of 4096-bits or more.

sources here

Advertisements
This entry was posted in assembly, cryptography, key exchange protocol, programming, security and tagged , , , , , , , . Bookmark the permalink.

3 Responses to Asmcodes: Diffie Hellman and Modular Exponentiation

  1. Pingback: Asmcodes: modexp | Odzhan

  2. Pingback: Windows: Interactive shells Part 4 |

  3. Pingback: Asmcodes: Modular Exponentiation | x86 crypto

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s