**Introduction**

Tiny Encryption Algorithm was published in November 1994 by Cambridge computer scientists, David John Wheeler and Roger Needham

TEA is not subject to any patents. This along with its simplicity made it attractive to various developers, particularly in the field of microelectronics where memory is limited. (well, I’m sure it probably was 20 years ago)

Others just wanted a simple encryption algorithm free of license restrictions. It was popular enough that even Microsoft developers used it to secure firmware on the first xbox consoles as one hacker noted in 2007.

After reading Bruce Schneier’s book on crypto, we learned that TEA was a really bad choice as a hash. The book says that TEA must never be used as a hash, because it is insecure if used this way. If you flip both bit 16 and 31 of a 32 bit word, the hash will be the same. We could easily patch a jump in the second bootloader so that it would not be recognized. This modified jump lead us directly into flash memory.

But why did they make this mistake? Obviously the designers knew nothing about crypto – again! – and just added code without understanding it and without even reading the most basic books on the topic. A possible explanation why they chose TEA would be that they might have searched the internet for a “tiny” encryption algorithm – and got TEA.

What follows is a tiny implementation of TEA in x86 assembly. It is not written for speed and I do not endorse it as a secure block cipher. This isn’t about promoting any specific encryption algorithm, just exploring the ways in which they can be optimized for size.

First, the C implementations provided by the authors with some slight modifications. Key size is 128-bit, block size is 64-bit with 32 rounds.

#define TEA_ENCRYPT 0 #define TEA_DECRYPT 1 #define TEA_BLK_LEN 8 #define TEA_KEY_LEN 16 #define TEA_ROUNDS 32 typedef union tea_blk_t { uint8_t b[8]; uint16_t w[4]; uint32_t d[2]; uint64_t q; } tea_blk; typedef union tea_key_t { uint8_t b[16]; uint16_t w[8]; uint32_t d[4]; uint64_t q[2]; } tea_key;

**Encryption**

void TEA_Encrypt (void *data, void *key, int enc) { tea_blk *v=(tea_blk*)data; tea_key *k=(tea_key*)key; uint32_t v0, v1, i, sum=0x9e3779b9; if (enc==TEA_DECRYPT) { sum <<= 5; } v0 = (v->d[0]); v1 = (v->d[1]); if (enc==TEA_ENCRYPT) { for (i=0; i<TEA_ROUNDS; i++) { v0 += ((v1 << 4) + k->d[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k->d[1]); v1 += ((v0 << 4) + k->d[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k->d[3]); sum += 0x9e3779b9; } } else { for (i=0; i<TEA_ROUNDS; i++) { v1 -= ((v0 << 4) + k->d[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k->d[3]); v0 -= ((v1 << 4) + k->d[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k->d[1]); sum -= 0x9e3779b9; } } v->d[0] = (v0); v->d[1] = (v1); }

**Integrating Encryption and Decryption**

There were 2 approaches to this and both depend on an input flag of zero or one defined as simply **TEA_ENCRYPT **and **TEA_DECRYPT**

The main parameters required for both functions are loaded and then the flag is checked. If **TEA_DECRYPT** is specified, code jumps to the decrypt routine and this is fairly common in most crypto libraries already.

The second approach is similar but instead of jumping to a decrypt routine, we negate the flag and use it to invert bits of one register for subtraction which is what happens when decrypting a block of ciphertext. XOR operation against zero does nothing so if flag is **TEA_ENCRYPT** it just adds value of 2 registers.

If you know how adders work, subtraction is simply addition with bits of 1 addend inverted. Take the following 2 functions which are identical except in subtraction where the first parameter is inverted before and after the addition.

// add y to x uint32_t add (uint32_t x, uint32_t y) { uint32_t a=x, b=y, c; while (b != 0) { c = a & b; a = a ^ b; b = c << 1; } return a; } // subtract y from x uint32_t subtract (uint32_t x, uint32_t y) { uint32_t a=x, b=y, c; a = ~a; while (b != 0) { c = a & b; a = a ^ b; b = c << 1; } return ~a; }

The same can be achieved with the following code where we want to either add or subtract two numbers.

; mov eax, [esp+ 4] ; num1 mov ebx, [esp+ 8] ; num2 mov ecx, [esp+12] ; 1 or 0 neg ecx ; 1 becomes -1, 0 becomes 0 xor eax, ecx ; invert for subtraction add eax, ebx ; regular add xor eax, ecx ; invert

There may be a better way to do all this but that’s what I finished up with. (let me know in comments)

The only other problem really is the order of execution for each statement in the cycle. For encryption, *v0* is modified first, for decryption, it’s *v1* so again the input flag determines what settings are applied. I use XCHG to swap values beforehand.

// encryption for (i=0; i<TEA_ROUNDS; i++) { v0 += ((v1 << 4) + k->v32[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k->v32[1]); v1 += ((v0 << 4) + k->v32[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k->v32[3]); sum += 0x9e3779b9; }

There’s a sequential order for encryption when reading the key but not decryption so the 128-bit key is rotated right by 64-bits. I tried using MMX/XMM but this generated too much code. Swapping the lower 64-bits with upper 64 works fine.

This means having to cache the key on stack which unfortunately then requires more code. You could obviously trash the key space but I decided to store locally before modifying.

// v1 -= ((v0 << 4) + k->v32[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k->v32[3]); v0 -= ((v1 << 4) + k->v32[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k->v32[1]); sum -= 0x9e3779b9;

Last thing to mention is the above 2 statements. They are basically the same, (just using different values) so the carry flag is cleared in preparation for a double loop.

After the first loop, the flag is flipped and then goes again one more time until cleared. During this, both *v0* and *v1* are exchanged. This is a clever way to loop 2 times without using up other registers.

A traditional approach to looping twice might be something like (just for example)

; push 2 pop ecx add_loop: lodsd add eax, [edi] stosd loop add_loop

You could replace it while preserving ecx with

; clc add_loop: pushfd lodsd add eax, [edi] stosd popfd cmc jc add_loop

Normally you wouldn’t use a loop for something that simple but it’s just to illustrate how it would work.

It’s usually very effective when ecx and other registers are reserved for outer loop or some other purpose.

Peter Ferrie showed me a another way to loop twice using the Parity Flag.

; xor edx, edx add_loop: lodsd add eax, [edi] stosd dec edx jnp add_loop

For decryption, the sum variable is intialized to delta times 32 so we skip over this part when encrypting. The final version for now is approx. 134 bytes.

**Final Version**

bits 32 %define TEA_ROUNDS 32 %define TEA_DELTA 09e3779b9h %ifndef BIN global TEA_Encryptx global _TEA_Encryptx %endif _TEA_Encryptx: TEA_Encryptx: pushad lea esi, [esp+32+4] lodsd xchg eax, edx ; edx = data lodsd push eax ; save key lodsd xchg eax, edx ; edx = flag pop esi ; esi = key mov ecx, esp pushad ; allocate 16 bytes for key sub ecx, esp ; ecx = 16 mov edi, esp ; move stack pointer into edi rep movsb ; move key onto stack mov edi, esp ; esi = key mov esi, esp ; edi = key neg edx ; negate input flag pushfd ; save cpu flags jz load_data ; if neg edx was zero, we're encrypting mov cl, 2 ; else we're decrypting and need to pushad ; rotate key right 64-bits load_key: ; this is because of how key is loaded lodsd ; load 32-bits of key xchg eax, [edi+8] ; swap with upper 64-bits stosd ; save upper 32-bits loop load_key popad ; ------------------------ load_data: popfd ; restore register flags xchg eax, esi ; eax now contains data to process push esi ; save pointer to it push edx ; save result of neg edx for add/sub lodsd ; eax=v[0] xchg eax, ebx ; store in ebx lodsd ; ebx=v[1] xchg eax, ebx ; store in ebx ; for (i=32; i>0; i--) mov cl, TEA_ROUNDS ; default is 32 rounds ; delta = 0x9e3779b9 mov ebp, 09e3779b9h ; ebp = delta jz t_el ; neg edx == 0? skip shift for decryption shl ebp, 5 ; *= 32 for decryption xchg eax, ebx ; and swap v0 + v1 t_el: clc ; clear carry flag push edi t_ev: pushfd mov edx, ebx ; v << 4 + k shl edx, 4 add edx, [edi] ; add key from edi scasd ; advance edi 4 bytes mov esi, ebx ; v + sum add esi, ebp xor edx, esi mov esi, ebx ; v1 >> 5 + k1 shr esi, 5 add esi, [edi] ; add key from edi scasd ; advance edi 4 bytes xor edx, esi ; v += ((v << 4) + k) ^ ; (v + sum) ^ ; ((v >> 5) + k) xor eax, [esp+8] add eax, edx xor eax, [esp+8] popfd ; restore flags xchg eax, ebx ; swap v0, v1 cmc ; flip carry flag jc t_ev ; keep going if carry pop edi ; restore pointer to key ; sum += delta or sum -= delta ; ; the following performs addition or subtraction ; depending on flag which we expect to be 1 or 0 ; the negate operation earlier will change it to -1 if true ; but remain 0 if false. xor ebp, [esp] ; ebp ^= -flag add ebp, 09e3779b9h ; performs normal addition xor ebp, [esp] ; ebp ^= -flag loop t_el save_r: pop ecx ; restore -flag inc ecx ; -flag++ jnz $+3 ; if not zero, it means encryption xchg eax, ebx ; else decryption so swap again pop edi stosd ; v[0] = v0 xchg eax, ebx stosd ; v[1] = v1 popad ; release stack popad ret

An earlier version was simpler and smaller but I prefer this one despite all the additional code.