Asmcodes: Light Message Authentication Code (LightMAC)


A Message Authentication Code or MAC is a cryptographic primitive that enables two parties using a shared secret key to authenticate messages exchanged between them.

MACs are used daily to secure online communications and will inevitably become a critical component of embedded systems in the future that have wireless functionality; specifically passive, low-power and IoT devices.

Some MACs are constructed from cryptographic hash algorithms. The Hash-based message authentication code (HMAC) published in 1996 for example recommended using MD5 or SHA-1. However, due to resources required for these hash functions, they were never suitable for microcontrollers.

Other MACs are constructed from block ciphers and this is more ideal for a developer where ROM and RAM are in short supply since there’s already a number of lightweight block ciphers available.

There’s no shortage of cryptographic solutions for desktop computers and mobile devices, however there still remains a significant problem with some of the current standardized algorithms when attempting to implement for resource constrained software or hardware devices.

A draft Report on Lightweight Cryptography published by NIST in August 2016 states that none of the NIST approved hash functions are suitable for resource constrained environments based on the findings in A Study on RAM Requirements of Various SHA-3 Candidates on Low-cost 8-bit CPUs

There’s currently an effort on part of engineers, cryptographers and organizations to design, evaluate and standardize lightweight cryptographic primitives for the embedded industry.

Whether we agree or disagree on the need to embed wireless devices in everyday objects, there will be many that have networking or wireless functionality in future which will require robust encryption solutions.


A MAC Mode for Lightweight Block Ciphers was designed by Atul Luykx, Bart Preneel, Elmar Tischhauser, Kan Yasuda and published in 2016.

\begin{array}{l}  \\\hline   \textbf{Algorithm 1: }\text{LightMAC} _{K_1,K_2}(M)  \\\hline  \quad\textbf{Input: } K_1, K_2\in \{0, 1\}^k,\: M\in \{0,1\}^{\leq 2^s(n-s)}\\  \quad\textbf{Output: } T\in\{0, 1\}^t\\  \textbf{\scriptsize{1}}\;\; V\leftarrow 0^n\in\{0, 1\}^n\\  \textbf{\scriptsize{2}}\;\; M[1]M[2]\cdots M[\ell]\xleftarrow{n-s}M\\  \textbf{\scriptsize{3}}\;\; \textbf{for } i = 1\textbf{ to } \ell - 1\textbf{ do}\\  \textbf{\scriptsize{4}}\;\; \mid V\leftarrow V\oplus E_{K_1}(i_s M[i])\\  \textbf{\scriptsize{5}}\;\; \textbf{end}\\  \textbf{\scriptsize{6}}\;\; V\leftarrow V\oplus (M[\ell]10^\ast)\\  \textbf{\scriptsize{7}}\;\; T\leftarrow \lfloor E_{K_2}(V)\rfloor_t\\  \textbf{\scriptsize{8}}\;\; \textbf{return } T    \\\hline  \end{array}

The other lightweight MAC algorithms under evaluation by NIST are Chaskey and TuLp. I’ve already covered Chaskey in a previous post.

All code and details of the algorithm shown here are derived from reference implementations provided by Atul here on github

Any developers that require a reference implementation should use those source codes instead of what’s shown here.

Instead of using PRESENT block cipher, I’ll use SPECK64/128 due to the ARX design which is much easier to implement on x86 architecture.

Block Cipher Parameters

LightMAC is a mode of operation and depends directly on an external block cipher.

The only other block ciphers that come close to the size of Speck would be XTEA or Chaskey. The following is a list of parameters recommended for use with Speck.

LightMAC Parameters

The authors define the following in reference material which are based on the block cipher.

  • Length of the protected counter sum in bytes. Not greater than N/2.

  • Length of tag in bytes. Should be at least 64-bits but not greater than N.

  • Length of block in bytes. Same as N.

  • Length of block cipher key. The MAC key is twice this length.

The following table shows some example parameters using existing lightweight block ciphers.

Cipher (E) Block Length (N) Cipher Key Length MAC Key Length (K) Counter Length (S) Tag Length (T)
PRESENT 64-bits 128-bits 256-bits 32-bits 64-bits
SPECK 64-bits 128-bits 256-bits 32-bits 64-bits
AES 128-bits 128-bits 256-bits 32-bits 128-bits


In the specification, V denotes an intermediate/local variable of cipher block size N. It is initialized to zero and updated after every encryption using an XOR operation with ciphertext before returning the result in T (truncated if required)

But in my own implementation, I assume T to be of cipher block size N and initialize it to zero. I then update T instead which is why I prefer readers use the reference implementation instead of what’s shown here. 🙂

My reason for not allocating space for V and using T directly is simply to reduce the amount of code required for the assembly code.


The update process is very similar to what you see used in cryptographic hash algorithms. I was gonna have a more detailed description here but I think comments should be clear enough.


An end bit (0x80) is appended to M buffer along with any data remaining or none if the input length was a multiple of the block cipher length.

This is then XORed with any previous cipher block state before being encrypted with the 2nd key before returning.

x86 Assembly code

First, here’s the SPECK block cipher using 64-bit block and 128-bit key.

%define SPECK_RNDS    27
; *****************************************
; Light MAC parameters based on SPECK64-128
; N = 64-bits
; K = 128-bits
%define COUNTER_LENGTH 4  ; should be <= N/2
%define BLOCK_LENGTH   8  ; equal to N
%define TAG_LENGTH     8  ; >= 64 && <= N
%define BC_KEY_LENGTH 16  ; K

%define ENCRYPT speck64_encryptx
%define k0 edi    
%define k1 ebp    
%define k2 ecx    
%define k3 esi

%define x0 ebx    
%define x1 edx

; esi = input
; ebp = key


      push    esi            ; save M
      lodsd                  ; x0 = x->w[0]
      xchg    eax, x0
      lodsd                  ; x1 = x->w[1]
      xchg    eax, x1
      mov     esi, ebp       ; esi = key
      xchg    eax, k0        ; k0 = key[0] 
      xchg    eax, k1        ; k1 = key[1]
      xchg    eax, k2        ; k2 = key[2]
      xchg    eax, k3        ; k3 = key[3]    
      xor     eax, eax       ; i = 0
      ; x0 = (ROTR32(x0, 8) + x1) ^ k0;
      ror     x0, 8
      add     x0, x1
      xor     x0, k0
      ; x1 = ROTL32(x1, 3) ^ x0;
      rol     x1, 3
      xor     x1, x0
      ; k1 = (ROTR32(k1, 8) + k0) ^ i;
      ror     k1, 8
      add     k1, k0
      xor     k1, eax
      ; k0 = ROTL32(k0, 3) ^ k1;
      rol     k0, 3
      xor     k0, k1    
      xchg    k3, k2
      xchg    k3, k1
      ; i++
      inc     eax
      cmp     al, SPECK_RNDS    
      jnz     spk_el
      pop     edi    
      xchg    eax, x0        ; x->w[0] = x0
      xchg    eax, x1        ; x->w[1] = x1

Now LightMAC.

You might notice how ctr and idx variables are initialized to zero at the same time using CDQ instruction. Once PUSHAD is executed, it preserves EDX on the stack and is then used as the protected counter sum S.

Although we convert the counter to big endian format before saving in block buffer, it wouldn’t affect the security to skip this. I’ve retained it for compatibility with reference but might remove it later.

; void lightmac_tag(const void *msg, uint32_t msglen, 
;     void *tag, void* mkey) 
      lea     esi, [esp+32+4]; esi = argv
      xchg    eax, ebx       ; ebx = msg
      cdq                    ; ctr = 0, idx = 0, 
      xchg    eax, ecx       ; ecx = msglen
      xchg    eax, edi       ; edi = tag
      xchg    eax, ebp       ; ebp = mkey      
      pushad                 ; allocate N-bytes for M
      ; zero initialize T
      mov     [edi+0], edx   ; t->w[0] = 0;
      mov     [edi+4], edx   ; t->w[1] = 0;
      ; while we have msg data
      mov     esi, esp       ; esi = M
      jecxz   lmx_l2         ; exit loop if msglen == 0
      ; add byte to M
      mov     al, [ebx]      ; al = *data++
      inc     ebx
      mov     [esi+edx+COUNTER_LENGTH], al           
      inc     edx            ; idx++
      ; M filled?
      ; --msglen
      loopne  lmx_l1
      jne     lmx_l2
      ; add S counter in big endian format
      inc     dword[esp+_edx]; ctr++
      mov     eax, [esp+_edx]
      ; reset index
      cdq                    ; idx = 0
      bswap   eax            ; m.ctr = SWAP32(ctr)
      mov     [esi], eax
      ; encrypt M with E using K1
      call    ENCRYPT
      ; update T
      lodsd                  ; t->w[0] ^= m.w[0];
      xor     [edi+0], eax      
      lodsd                  ; t->w[1] ^= m.w[1];
      xor     [edi+4], eax         
      jmp     lmx_l0         ; keep going
      ; add the end bit
      mov     byte[esi+edx+COUNTER_LENGTH], 0x80
      xchg    esi, edi       ; swap T and M
      ; update T with any msg data remaining    
      mov     al, [edi+edx+COUNTER_LENGTH]
      xor     [esi+edx], al
      dec     edx
      jns     lmx_l3
      ; advance key to K2
      add     ebp, BC_KEY_LENGTH
      ; encrypt T with E using K2
      call    ENCRYPT
      popad                  ; release memory for M
      popad                  ; restore registers


The x86 assembly generated by MSVC using /O2 /Os is 238 bytes. The x86 assembly written by hand is 152 bytes.

In order for developers to benefit from LightMAC on microcontrollers, they should choose a lightweight block cipher but not necessarily SPECK. It’s only used here for illustration.

See lmx32.asm and lightmac.c for any future updates.

This entry was posted in assembly, cryptography, encryption, programming and tagged , , , , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s