**Introduction**

*The Story Of Non-Secret Encryption* published in 1997 by the now deceased **J.H.Ellis** documents early research by GCHQ employees into what we now refer to as PKC (Public Key Cryptography).

Below is a list of papers/reports referenced by Ellis in his 1997 paper.

paper/report | authors | date |
---|---|---|

Final Report on Project C43 | Bell Telephone Laboratory | 1944 |

The Possibility of Secure Non-Secret Digital Encryption | James H. Ellis (GCHQ) | January 1970 |

A Note on Non-Secret Encryption | Clifford C. Cocks (GCHQ) | November 1973 |

Non-Secret Encryption Using a Finite Field | M J Williamson (GCHQ) | January 1974 |

Thoughts on Cheaper Non-Secret Encryption | M J Williamson (GCHQ) | August 1976 |

New Directions in Cryptography | W Diffie and M E Hellman | November 1976 |

A Method for Obtaining Digital Signatures and Public-key Cryptosystems | R L Rivest, A Shamir, L Adleman | February 1978 |

**Ellis** states his first thoughts about NSE began in the late 60s which he finally wrote about in 1970 while working at the GCHQ. His idea was based on a Bell Laboratory report published in 1944.

The story begins in the 60’s. The management of vast quantities of key material needed for secure communication was a headache for the armed forces. It was obvious to everyone, including me, that no secure communication was possible without secret key, some other secret knowledge, or at least some way in which the recipient was in a different position from an interceptor. After all, if they were in identical situations how could one possibly be able to receive what the other could not? Thus there was no incentive to look for something so clearly impossible.

At that time, he was unable to devise and implement a practical solution for his theory and it wasn’t until 1973 when another cryptographer and mathematician working at GCHQ by the name of **Clifford Cocks** provided a mathematical proof of concept. This was all classified of course until the death of **Ellis** in 1997.

Publicly however, the concept of PKC was first proposed by **Ralph Merkle** in his 1974 paper Secure Communications over Insecure Channels

This later became the basis of the Diffie-Hellman key exchange protocol described by **Whitfield Diffie and Martin Hellman** in their paper New Directions In Cryptography

Incidentally, another GCHQ cryptographer **M.J. Williamson** wrote about the same key exchange algorithm 3 months before the publication of the Diffie/Hellman paper.

The method was published by Diffie and Hellman. This was identical to Williamson’s version, except that they restricted q to be prime. — Ellis

What most early and modern day PKC systems have in common is their use of Modular Exponentiation and what follows is a tiny implementation in x86 assembler, optimized for size, so if you require a version optimized for speed, consider using a library such as PolarSSL or OpenSSL

The security of this function depends upon the inability of an adversary/interceptor to solve the (DLP) *Discrete Logarithm Problem* which is believed to be difficult when the numbers involved are large. More recently in 2015, cryptographers published Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice which suggests feasible attacks against DH which may be used by the NSA and GCHQ.

**Simple implementation**

If p and g have already been chosen (which they normally are, I’ll use predefined oakley prime numbers from RFC2409) then you only need modular exponentiation and a cryptographically secure random number generator.

Just to illustrate the basic functionality of modular exponentiation, here are 3 functions that perform the calculation over 64-bit field.

Obviously 64-bit integers are insufficient for security purposes but it’s simply to show what mathematical operators are used. These are then implemented in assembly to work with arbitrary sized numbers.

In order for this to work, the **b**ase and **e**xponent must be in range of the **m**odulus for addmod() to return correct results. The base and exponent values must be less than the modulus. This is how Diffie-Hellman and RSA work anyway so I don’t see it as a limitation. (open to correction)

// Modular Addition uint64_t addmod (uint64_t a, uint64_t b, uint64_t m) { a = a + b; if (a >= m) { a = a - m; } return a; } // Modular multiplication uint64_t mulmod (uint64_t a, uint64_t b, uint64_t m) { uint64_t r=0, t=a; while (b > 0) { if (b & 1) { r = addmod (r, t, m); } t = addmod (t, t, m); b >>= 1; } return r; } // Modular exponentiation uint64_t expmod (uint64_t b, uint64_t e, uint64_t m) { uint64_t r=1, t=b; while (e > 0) { if (e & 1) { r = mulmod (r, t, m); } t = mulmod (t, t, m); e >>= 1; } return r; }

You can see that e is shifted to the right by one, this is essentially dividing by 2.

Implementing a function to perform this is possible but we’d need to make a copy of the exponent due to modification and it’s not necessary. The BT instruction sets the carry flag if a bit at specific position in memory is set and is much more efficient than shifting right using RCR instruction.

The following is just some simple code to demonstrate using the above functions.

What isn’t shown is how p and g are generated.

// Alice and Bob agree to use same p and g for key exchange // Both generate a random integer to establish session key. x=rand_range (p); y=rand_range (p); // Alice computes g ^ x mod p A=modexp (g, x, p); // Bob computes g ^ y mod p B=modexp (g, y, p); // Alice and Bob exchange A and B values // Alice computes session key using B ^ x mod p S1 = modexp (B, x, p); printf ("nn Alice S=%llu", S1); // Bob computes session key using A ^ x mod p S2 = modexp (A, y, p); printf ("n Bob S=%llu", S2); // S1 and S2 should be equal printf ("nn Key exchange %sn", S1 == S2 ? "succeeded" : "failed");

**Getting highest bit**

The following will obtain the highest bit of large number.

; ; Find highest bit in big memory ; Return in eax ; ; in: eax = 0, ; ecx = max size of memory in bytes, ; ebx = pointer to memory ; ; out: eax = last bit in array ; cntbits proc lea eax, [eax+8*ecx] dec eax ghb: bt [ebx], eax jc end_cnt dec eax jnz ghb end_cnt: ret cntbits endp

Next function is modular addition which is used in the modular multiplication function. Why modular addition using subtraction? Normally we need a full modulo function but it isn’t required here because we know b and e are less than m, or at least they should be for diffie-hellman key exchange. This is obviously much faster too.

**Modular Addition**

// Modular Addition uint64_t addmod (uint64_t a, uint64_t b, uint64_t m) { a = a + b; if (a >= m) { a = a - m; } return a; }

In assembly, we make use of ADC (Add with Carry) and SBB (Subtract with Borrow)

The numbers are compared from top to bottom.

; ************************************************** ; r = r + t; if (r >= m) r -= m; ; t = t + t; if (t >= m) t -= m; ; ************************************************** addmod proc c pushad cmovnc edi, esi clc pushad add_loop: lodsb adc al, [edi] stosb loop add_loop popad mov esi, edi mov edi, ebp pushad cmp_loop: mov al, [esi+ecx-1] cmp al, [edi+ecx-1] loope cmp_loop popad jb end_am sub_loop: lodsb sbb al, [edi] mov [esi-1], al lea edi, [edi+1] loop sub_loop end_am: popad ret addmod endp

**Modular Multiplication**

Essentially you are multiplying two numbers together and applying modulo operator like **(a * b) % m**. However, because we expect numbers to be much larger than 64-bits, we use binary multiplication.

uint64_t mulmod (uint64_t num1, uint64_t num2, uint64_t modulus) { uint64_t r=0, b=num1, e=num2, m=modulus; while (e > 0) { if (e & 1) { r = addmod (r, b, m); } b = addmod (b, b, m); e >>= 1; } return r; }

mulmod proc c pushad ; if (cf==1) do mulmod (r, b, m) else do mulmod (b, b, m) mov eax, esi ; eax = b cmovc eax, edi ; eax = r push eax ; where to store result ; e = b/num2 sub esp, ecx mov edi, esp pushad rep movsb popad mov ebx, esp ; b = b|r/num1 sub esp, ecx mov edi, esp xchg eax, esi pushad rep movsb popad mov esi, esp ; r = 0 xor eax, eax sub esp, ecx mov edi, esp pushad rep stosb popad sub esp, ecx cdq call cntbits mul_loop: ; e & 1 bt [ebx], edx am_loop: pushfd call addmod popfd cmc jnc am_loop inc edx dec eax jns mul_loop lea esp, [esp+4*ecx] ; return r mov esi, edi pop edi rep movsb popad ret mulmod endp

Finally, the main function used in Diffie-Hellman key exchange and various other PKC.

**Modular Exponentiation**

uint64_t modexp (uint64_t b, uint64_t e, uint64_t m) { uint64_t r=1, t=b; int n, i; n=bits (e); for (i=0; i<=n; i++) { if (e & 1) { r = mulmod (r, t, m); } t = mulmod (t, t, m); e >>= 1; } return r; }

; int modexp (int bitlen, void *base, void *exponent, void *modulus, void *result); modexp proc c pushad mov ecx, [esp+32+ 4] ; maxbits mov esi, [esp+32+ 8] ; base mov ebx, [esp+32+12] ; exponent mov ebp, [esp+32+16] ; modulus mov eax, [esp+32+20] ; result shr ecx, 3 ; b = base sub esp, ecx mov edi, esp pushad rep movsb popad mov esi, esp ; e = exponent sub esp, ecx mov edi, esp pushad mov esi, ebx rep movsb popad mov ebx, esp ; result = 1 xchg eax, edi xor eax, eax cdq pushad rep stosb popad inc byte ptr[edi] ; n = bits(e) call cntbits exp_loop: ; e & 1 bt [ebx], edx mm_loop: pushfd call mulmod popfd cmc jnc mm_loop inc edx dec eax jns exp_loop lea esp, [esp+2*ecx] popad ret modexp endp

**Demonstration**

The following uses first and second oakley groups from RFC 2409 to simulate Diffie-Hellman key exchange. The modexp function is called from assembly itself but so long as maxbits parameter is aligned by 32, you can call from any language that supports cdecl convention.

**Testing**

#include <stdio.h> #include <string.h> #include <stdlib.h> #include <stdint.h> #include <openssl/bn.h> #include <openssl/rand.h> const char OAKLEY_PRIME_MODP768[]= "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD1" "29024E088A67CC74020BBEA63B139B22514A08798E3404DD" "EF9519B3CD3A431B302B0A6DF25F14374FE1356D6D51C245" "E485B576625E7EC6F44C42E9A63A3620FFFFFFFFFFFFFFFF"; const char OAKLEY_PRIME_MODP1024[]= "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD1" "29024E088A67CC74020BBEA63B139B22514A08798E3404DD" "EF9519B3CD3A431B302B0A6DF25F14374FE1356D6D51C245" "E485B576625E7EC6F44C42E9A637ED6B0BFF5CB6F406B7ED" "EE386BFB5A899FA5AE9F24117C4B1FE649286651ECE65381" "FFFFFFFFFFFFFFFF"; extern void dh_asm (uint32_t numlen, void *p, void *g, void *x, void *y); void dh (void) { BIGNUM *p, *g, *x, *y, *A, *B, *s1, *s2; BN_CTX *ctx; uint32_t maxbits; // initialize context ctx = BN_CTX_new (); BN_CTX_init (ctx); p = BN_new (); g = BN_new (); x = BN_new (); y = BN_new (); A = BN_new (); B = BN_new (); s1 = BN_new (); s2 = BN_new (); BN_hex2bn (&p, OAKLEY_PRIME_MODP768); BN_hex2bn (&g, "2"); // generate 2 random integers in range of p BN_rand_range (x, p); BN_rand_range (y, p); printf ("nP is %i-bits", BN_num_bits (p)); printf ("np = %s", BN_bn2hex (p)); printf ("ng = %s", BN_bn2hex (g)); printf ("nx = %s", BN_bn2hex (x)); printf ("ny = %s", BN_bn2hex (y)); // Alice does g ^ x mod p BN_mod_exp (A, g, x, p, ctx); printf ("nA = %s", BN_bn2hex (A)); // Bob does g ^ y mod p BN_mod_exp (B, g, y, p, ctx); printf ("nB = %s", BN_bn2hex (B)); // ************************************* // Bob and Alice exchange A and B values // ************************************* // Alice computes session key BN_mod_exp (s1, B, x, p, ctx); printf ("ns1 = %s", BN_bn2hex (s1)); // Bob computes session key BN_mod_exp (s2, A, y, p, ctx); printf ("ns2 = %s", BN_bn2hex (s2)); // check if both keys match if (BN_cmp (s1, s2) == 0) { printf ("nnKey exchange succeeded"); printf ("nnSession key = %sn", BN_bn2hex (s1)); } else { printf ("nnKey exchange failedn"); } // call the assembler function dh_asm (BN_num_bits(p), &p->d[0], &g->d[0], &x->d[0], &y->d[0]); BN_free (s2); BN_free (s1); BN_free (p); BN_free (g); BN_free (y); BN_free (x); BN_free (B); BN_free (A); // release context BN_CTX_free (ctx); }

The following should be assembled with JWASM/Win32 include files and linked with C code above.

.686 .model flat, c option casemap:none .nolist .nocref WIN32_LEAN_AND_MEAN equ 1 include <windows.inc> include <stdio.inc> include <stdlib.inc> ;includelib <msvcrt.lib> includelib <kernel32.lib> .list .cref .data MAX_NUM equ 256 ; 2048 bytes CStr macro szText:VARARG local dbText .data dbText db szText, 0 .code exitm <offset dbText> endm modexp proto C numlen:dword, base:dword, exponent:dword, modulus:dword, result:dword .code my_memcpy proto :dword, :dword, :dword my_memcpy proc dst:dword, src:dword, len:dword pushad mov ecx, [len] mov esi, [src] mov edi, [dst] rep movsb popad ret my_memcpy endp my_memset proc x:dword, b:dword, len:dword pushad mov ecx, [len] mov edi, [x] mov eax, [b] rep stosb popad ret my_memset endp dump_hex proc uses esi edi ebx txt:DWORD, buf:DWORD, len:DWORD mov esi, [buf] mov ebx, [len] invoke printf, txt .while ebx > 0 movzx eax, byte ptr[esi+ebx-1] invoke printf, CStr("%02X"), eax dec ebx .endw ret dump_hex endp BN_LEN equ 1024 DH_ARGS struct p db BN_LEN dup (?) g db BN_LEN dup (?) x db BN_LEN dup (?) y db BN_LEN dup (?) A db BN_LEN dup (?) B db BN_LEN dup (?) s1 db BN_LEN dup (?) s2 db BN_LEN dup (?) DH_ARGS ends .data dhargs DH_ARGS <> maxbits dd ? maxbytes dd ? .code dh_asm proc uses esi ebx edi numlen:dword, p:DWORD, g:DWORD, x:DWORD, y:DWORD mov eax, [numlen] shr eax, 3 mov [maxbytes], eax mov eax, [numlen] and eax, -32 add eax, 32 mov [maxbits], eax ; copy p, g, x and y invoke my_memcpy, addr dhargs.p, [p], [maxbytes] invoke my_memcpy, addr dhargs.g, [g], 1 invoke my_memcpy, addr dhargs.x, [x], [maxbytes] invoke my_memcpy, addr dhargs.y, [y], [maxbytes] invoke dump_hex, CStr(10, 10, "p = "), addr dhargs.p, [maxbytes] invoke dump_hex, CStr(10, 10, "g = "), addr dhargs.g, 1 invoke dump_hex, CStr(10, 10, "x = "), addr dhargs.x, [maxbytes] invoke dump_hex, CStr(10, 10, "y = "), addr dhargs.y, [maxbytes] ;int 3 ; Alice does g ^ x mod p invoke modexp, [maxbits], addr dhargs.g, addr dhargs.x, addr dhargs.p, addr dhargs.A ; Bob does g ^ y mod p invoke modexp, [maxbits], addr dhargs.g, addr dhargs.y, addr dhargs.p, addr dhargs.B ; ************************************* ; Bob and Alice exchange A and B values ; ************************************* ; Alice computes session key invoke modexp, [maxbits], addr dhargs.B, addr dhargs.x, addr dhargs.p, addr dhargs.s1 ; Bob computes session key invoke modexp, [maxbits], addr dhargs.A, addr dhargs.y, addr dhargs.p, addr dhargs.s2 invoke dump_hex, CStr(10, 10, "A = "), addr dhargs.A, [maxbytes] invoke dump_hex, CStr(10, 10, "B = "), addr dhargs.B, [maxbytes] invoke dump_hex, CStr(10, 10, "s1 = "), addr dhargs.s1, [maxbytes] invoke dump_hex, CStr(10, 10, "s2 = "), addr dhargs.s2, [maxbytes] ; s1 + s2 should be equal invoke memcmp, addr dhargs.s1, addr dhargs.s2, [maxbytes] .if eax == 0 ; use a KDF to generate symmetric key using s1 as input invoke printf, CStr(10, 10, "Key exchange succeeded") invoke dump_hex, CStr(10, 10, "Session key = "), addr dhargs.s1, [maxbytes] .else ; something went wrong.. invoke printf, CStr(10, "Key exchange failed") .endif ret dh_asm endp

**Results**

architecture | size |
---|---|

x86 | 213 |

x64 | n/a |

**Conclusion**

Although there are now effective attacks against Diffie-Hellman when using short groups in the range of 1024-bits, this doesn’t render the modular exponentiation itself obsolete for facilitating secure key exchange, we just need to use larger numbers in range of 4096-bits or more.

Pingback: Asmcodes: modexp | Odzhan

Pingback: Windows: Interactive shells Part 4 |

Pingback: Asmcodes: Modular Exponentiation | x86 crypto