TEA block cipher

Introduction

Tiny Encryption Algorithm was published in November 1994 by Cambridge computer scientists, David John Wheeler and Roger Needham

TEA is not subject to any patents. This along with its simplicity made it attractive to various developers, particularly in the field of microelectronics where memory is limited. (well, I’m sure it probably was 20 years ago)

Others just wanted a simple encryption algorithm free of license restrictions. It was popular enough that even Microsoft developers used it to secure firmware on the first xbox consoles as one hacker noted in 2007.

After reading Bruce Schneier’s book on crypto, we learned that TEA was a really bad choice as a hash. The book says that TEA must never be used as a hash, because it is insecure if used this way. If you flip both bit 16 and 31 of a 32 bit word, the hash will be the same. We could easily patch a jump in the second bootloader so that it would not be recognized. This modified jump lead us directly into flash memory.

But why did they make this mistake? Obviously the designers knew nothing about crypto – again! – and just added code without understanding it and without even reading the most basic books on the topic. A possible explanation why they chose TEA would be that they might have searched the internet for a “tiny” encryption algorithm – and got TEA.

What follows is a tiny implementation of TEA in x86 assembly. It is not written for speed and I do not endorse it as a secure block cipher. This isn’t about promoting any specific encryption algorithm, just exploring the ways in which they can be optimized for size.

First, the C implementations provided by the authors with some slight modifications. Key size is 128-bit, block size is 64-bit with 32 rounds.

#define TEA_ENCRYPT 0
#define TEA_DECRYPT 1

#define TEA_BLK_LEN 8
#define TEA_KEY_LEN 16

#define TEA_ROUNDS 32

typedef union tea_blk_t {
  uint8_t b[8];
  uint16_t w[4];
  uint32_t d[2];
  uint64_t q;
} tea_blk;

typedef union tea_key_t {
  uint8_t b[16];
  uint16_t w[8];
  uint32_t d[4];
  uint64_t q[2];  
} tea_key;

Encryption

void TEA_Encrypt (void *data, void *key, int enc)
{
  tea_blk *v=(tea_blk*)data;
  tea_key *k=(tea_key*)key;
  
  uint32_t v0, v1, i, sum=0x9e3779b9;
  
  if (enc==TEA_DECRYPT) {
    sum <<= 5;
  }
  
  v0 = (v->d[0]); 
  v1 = (v->d[1]);
  
  if (enc==TEA_ENCRYPT)
  {
    for (i=0; i<TEA_ROUNDS; i++) 
    {         
      v0 += ((v1 << 4) + k->d[0]) ^ 
             (v1 + sum) ^ 
            ((v1 >> 5) + k->d[1]);
            
      v1 += ((v0 << 4) + k->d[2]) ^ 
             (v0 + sum) ^ 
            ((v0 >> 5) + k->d[3]);
            
      sum += 0x9e3779b9;
    }
  } else {
    for (i=0; i<TEA_ROUNDS; i++) 
    {
      v1 -= ((v0 << 4) + k->d[2]) ^ 
             (v0 + sum) ^ 
            ((v0 >> 5) + k->d[3]);
              
      v0 -= ((v1 << 4) + k->d[0]) ^ 
             (v1 + sum) ^ 
            ((v1 >> 5) + k->d[1]);
            
      sum -= 0x9e3779b9;
    }  
  }
  v->d[0] = (v0); 
  v->d[1] = (v1);
}

Integrating Encryption and Decryption

There were 2 approaches to this and both depend on an input flag of zero or one defined as simply TEA_ENCRYPT and TEA_DECRYPT

The main parameters required for both functions are loaded and then the flag is checked. If TEA_DECRYPT is specified, code jumps to the decrypt routine and this is fairly common in most crypto libraries already.

The second approach is similar but instead of jumping to a decrypt routine, we negate the flag and use it to invert bits of one register for subtraction which is what happens when decrypting a block of ciphertext. XOR operation against zero does nothing so if flag is TEA_ENCRYPT it just adds value of 2 registers.

If you know how adders work, subtraction is simply addition with bits of 1 addend inverted. Take the following 2 functions which are identical except in subtraction where the first parameter is inverted before and after the addition.

// add y to x
uint32_t add (uint32_t x, uint32_t y)
{
  uint32_t a=x, b=y, c;
  
  while (b != 0) {
    c = a & b;  
    a = a ^ b;
    b = c << 1;
  }
  return a;
}

// subtract y from x
uint32_t subtract (uint32_t x, uint32_t y)
{
  uint32_t a=x, b=y, c;
  a = ~a;
  
  while (b != 0) {
    c = a & b;  
    a = a ^ b; 
    b = c << 1;
  }
  return ~a;
}

The same can be achieved with the following code where we want to either add or subtract two numbers.

;
  mov    eax, [esp+ 4]         ; num1
  mov    ebx, [esp+ 8]         ; num2
  mov    ecx, [esp+12]         ; 1 or 0
  neg    ecx                   ; 1 becomes -1, 0 becomes 0
  xor    eax, ecx              ; invert for subtraction
  add    eax, ebx              ; regular add
  xor    eax, ecx              ; invert

There may be a better way to do all this but that’s what I finished up with. (let me know in comments)

The only other problem really is the order of execution for each statement in the cycle. For encryption, v0 is modified first, for decryption, it’s v1 so again the input flag determines what settings are applied. I use XCHG to swap values beforehand.

// encryption
  for (i=0; i<TEA_ROUNDS; i++) 
  {         
    v0 += ((v1 << 4) + k->v32[0]) ^ 
           (v1 + sum) ^ 
          ((v1 >> 5) + k->v32[1]);
          
    v1 += ((v0 << 4) + k->v32[2]) ^ 
           (v0 + sum) ^ 
          ((v0 >> 5) + k->v32[3]);
          
    sum += 0x9e3779b9;
  }

There’s a sequential order for encryption when reading the key but not decryption so the 128-bit key is rotated right by 64-bits. I tried using MMX/XMM but this generated too much code. Swapping the lower 64-bits with upper 64 works fine.

This means having to cache the key on stack which unfortunately then requires more code. You could obviously trash the key space but I decided to store locally before modifying.

//
    v1 -= ((v0 << 4) + k->v32[2]) ^ 
           (v0 + sum) ^ 
          ((v0 >> 5) + k->v32[3]);
            
    v0 -= ((v1 << 4) + k->v32[0]) ^ 
           (v1 + sum) ^ 
          ((v1 >> 5) + k->v32[1]);
          
    sum -= 0x9e3779b9;

Last thing to mention is the above 2 statements. They are basically the same, (just using different values) so the carry flag is cleared in preparation for a double loop.

After the first loop, the flag is flipped and then goes again one more time until cleared. During this, both v0 and v1 are exchanged. This is a clever way to loop 2 times without using up other registers.

A traditional approach to looping twice might be something like (just for example)

;
   push    2
   pop    ecx
add_loop:
   lodsd
   add   eax, [edi]
   stosd
   loop  add_loop

You could replace it while preserving ecx with

;
  clc
add_loop:
  pushfd
  lodsd
  add   eax, [edi]
  stosd
  popfd
  cmc
  jc  add_loop

Normally you wouldn’t use a loop for something that simple but it’s just to illustrate how it would work.

It’s usually very effective when ecx and other registers are reserved for outer loop or some other purpose.
Peter Ferrie showed me a another way to loop twice using the Parity Flag.

;
    xor   edx, edx
add_loop:
    lodsd
    add   eax, [edi]
    stosd
    dec   edx
    jnp   add_loop

For decryption, the sum variable is intialized to delta times 32 so we skip over this part when encrypting. The final version for now is approx. 134 bytes.

Final Version

bits 32
    
%define TEA_ROUNDS 32
%define TEA_DELTA  09e3779b9h

%ifndef BIN
  global TEA_Encryptx
  global _TEA_Encryptx
%endif
  
_TEA_Encryptx:
TEA_Encryptx:
    pushad
    lea    esi, [esp+32+4]
    lodsd
    xchg   eax, edx          ; edx = data 
    lodsd
    push   eax               ; save key
    lodsd
    xchg   eax, edx          ; edx = flag
    pop    esi               ; esi = key
    mov    ecx, esp 
    pushad                   ; allocate 16 bytes for key
    sub    ecx, esp          ; ecx = 16
    mov    edi, esp          ; move stack pointer into edi
    rep    movsb             ; move key onto stack
    mov    edi, esp          ; esi = key
    mov    esi, esp          ; edi = key
  
    neg    edx               ; negate input flag
    pushfd                   ; save cpu flags
    jz     load_data         ; if neg edx was zero, we're encrypting
  
    mov    cl, 2             ; else we're decrypting and need to 
    pushad                   ; rotate key right 64-bits
load_key:                    ; this is because of how key is loaded
    lodsd                    ; load 32-bits of key
    xchg   eax, [edi+8]      ; swap with upper 64-bits
    stosd                    ; save upper 32-bits
    loop   load_key
    popad
    ; ------------------------
load_data:
    popfd                    ; restore register flags
    xchg   eax, esi          ; eax now contains data to process
    push   esi               ; save pointer to it
    push   edx               ; save result of neg edx for add/sub
    lodsd                    ; eax=v[0]
    xchg   eax, ebx          ; store in ebx
    lodsd                    ; ebx=v[1]
    xchg   eax, ebx          ; store in ebx
    ; for (i=32; i>0; i--)
    mov    cl, TEA_ROUNDS    ; default is 32 rounds
    ; delta = 0x9e3779b9
    mov    ebp, 09e3779b9h   ; ebp = delta
    jz     t_el              ; neg edx == 0? skip shift for decryption
    shl    ebp, 5            ; *= 32 for decryption
    xchg   eax, ebx          ; and swap v0 + v1
t_el:
    clc                      ; clear carry flag   
    push   edi
t_ev:
    pushfd
    mov    edx, ebx          ; v << 4 + k
    shl    edx, 4
    add    edx, [edi]        ; add key from edi
    scasd                    ; advance edi 4 bytes
    mov    esi, ebx          ; v + sum
    add    esi, ebp
    xor    edx, esi
    mov    esi, ebx          ; v1 >> 5 + k1
    shr    esi, 5
    add    esi, [edi]        ; add key from edi
    scasd                    ; advance edi 4 bytes
    xor    edx, esi
    ; v += ((v << 4) + k) ^
    ;        (v + sum) ^
    ;       ((v >> 5) + k)
    xor    eax, [esp+8]
    add    eax, edx
    xor    eax, [esp+8]
  
    popfd                    ; restore flags
    xchg   eax, ebx          ; swap v0, v1
    cmc                      ; flip carry flag
    jc     t_ev              ; keep going if carry
    pop    edi               ; restore pointer to key
  
    ; sum += delta or sum -= delta
    ;
    ; the following performs addition or subtraction
    ; depending on flag which we expect to be 1 or 0
    ; the negate operation earlier will change it to -1 if true
    ; but remain 0 if false.
    xor    ebp, [esp]        ; ebp ^= -flag
    add    ebp, 09e3779b9h   ; performs normal addition
    xor    ebp, [esp]        ; ebp ^= -flag
  
    loop   t_el
save_r:
    pop    ecx               ; restore -flag
    inc    ecx               ; -flag++
    jnz    $+3               ; if not zero, it means encryption
    xchg   eax, ebx          ; else decryption so swap again
    pop    edi
    stosd                    ; v[0] = v0
    xchg   eax, ebx
    stosd                    ; v[1] = v1
    popad                    ; release stack
    popad
    ret

An earlier version was simpler and smaller but I prefer this one despite all the additional code.

sources here

Advertisements
This entry was posted in assembly, cryptography, encryption, programming, security and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s