## SM3 Cryptographic Hash Algorithm (Chinese Standard)

### Introduction

SM3 is 256-bit cryptographic hash algorithm derived from SHA-2 designed by the NSA. It was designed by Xiaoyun Wang who is responsible for discovering attacks against many cryptographic hash functions, most notably MD5 and SHA-1. At the CRYPTO 2004 conference, she and co-authors demonstrated collision attacks against MD5 and SHA-0. Attacks against SHA-1 were published later in 2005. SM3 was published in December 2007 by the Chinese National Cryptographic Administration Bureau as part of the Trusted Computing framework in China. SM3 uses a Merkle-Damgard construction like MD4, MD5, SHA-1 and SHA-2. SM means commercial cipher. The full list of cryptographic standards for commercial purposes are as follows.

• SM2 – Elliptic curve algorithms for digital signatures and key exchange.
• SM3 – A cryptographic hash.
• SM4 – A block cipher.
• SM9 – A set of identity-based cryptographic schemes from pairings.

### Macros and data types

#define R(v,n)(((v)<<(n))|((v)>>(32-(n))))
#define F(n)for(i=0;i<n;i++)

#define rev32(x) __builtin_bswap32(x)
#define rev64(x) __builtin_bswap64(x)

typedef unsigned long long Q;
typedef unsigned int W;
typedef unsigned char B;

typedef struct _sm3_ctx {
W s[8];
union {
B b[64];
W w[16];
Q q[8];
}x;
Q len;
}sm3_ctx;


### Initialization

In the original SHA-2 specification, the 8 initialization values are fractional parts of square roots for the first 8 primes 2..19). See here for more detailed description on how to create these.

$H_0^{(0)}\: =\: 6a09e667$
$H_1^{(0)}\: =\: bb67ae85$
$H_2^{(0)}\: =\: 3c6ef372$
$H_3^{(0)}\: =\: a54ff53a$
$H_4^{(0)}\: =\: 510e527f$
$H_5^{(0)}\: =\: 9b05688c$
$H_6^{(0)}\: =\: 1f83d9ab$
$H_7^{(0)}\: =\: 5be0cd19$

In SM3, the following 8 values are used:

$H_0^{(0)}\: =\: 7380166f$
$H_1^{(0)}\: =\: 4914b2b9$
$H_2^{(0)}\: =\: 172442d7$
$H_3^{(0)}\: =\: da8a0600$
$H_4^{(0)}\: =\: a96f30bc$
$H_5^{(0)}\: =\: 163138aa$
$H_6^{(0)}\: =\: e38dee4d$
$H_7^{(0)}\: =\: b0fb0e4e$

Unfortunately, there’s no explanation for how, or why these values were selected.

void sm3_init(sm3_ctx*c) {
c->s[0]=0x7380166f;
c->s[1]=0x4914b2b9;
c->s[2]=0x172442d7;
c->s[3]=0xda8a0600;
c->s[4]=0xa96f30bc;
c->s[5]=0x163138aa;
c->s[6]=0xe38dee4d;
c->s[7]=0xb0fb0e4e;
c->len =0;
}


### Updating context

Updating the buffer and state is exactly the same as SHA-2 that is based on the original design for MD4 by Ron Rivest. Once the buffer has 64-bytes of data, it’s processed using sm3_compress

void sm3_update(sm3_ctx*c,const void*in,W len) {
B *p=(B*)in;
W i, idx;

idx = c->len & 63;
c->len += len;

for (i=0;i<len;i++) {
c->x.b[idx]=p[i]; idx++;
if(idx==64) {
sm3_compress(c);
idx=0;
}
}
}


### Finalization

This step is also the exact same as SHA-2.

void sm3_final(void*h,sm3_ctx*c) {
W i,len,*p=h;

i = len = c->len & 63;
while(i<64) c->x.b[i++]=0;
c->x.b[len]=0x80;

if(len>=56) {
sm3_compress(c);
F(16)c->x.w[i]=0;
}
c->x.q[7]=rev64((Q)c->len*8);
sm3_compress(c);
F(8)p[i]=rev32(c->s[i]);
}


### Compression

The main difference between SHA-2 and SM3 is in the compression function. In SHA-2, the 512-bit message is expanded into 64 32-bit words. For SM3, it’s expanded into 68 32-bit words. The following two linear functions are defined:

$P_1$ for expansion, $P_0$ for compression

$P_0(X) = X\oplus (X\lll 9)\oplus (X\lll 17)$
$P_1(X) = X\oplus (X\lll 15)\oplus (X\lll 23)$

#define P0(x)x^R(x,9)^R(x,17)
#define P1(x)x^R(x,15)^R(x,23)


The first part of expansion is similar to SHA-2, but uses different offsets.

$\text{for}\: i=16\: \text{to}\: 67\\ \hphantom{---}W_i\leftarrow P_1(W_{i-16}\oplus W_{i-9}\oplus(W_{i-3}\lll 15))\oplus (W_{i-13}\lll 7)\oplus W_{i-6}\\ \text{endfor}$

$\text{for}\: i = 0\: \text{to}\: 63\\ \hphantom{---}W_i'=W_i\oplus W_{i+4}\\ \text{endfor}$

The following code performs the first loop of the expansion. The second loop is inlined with the main loop.

    for(i=16;i<68;i++)
w[i]=P1(w[i-16]^w[i-9]^R(w[i-3],15))^R(w[i-13],7)^w[i- 6];


The additional 4 words in SM3 are required for a separate expansion that I’ve inlined with the compression function to try minimize code usage.

$W_i'=W_i\oplus W_{i+4}\: 0\leq i < 64.$

You can see in main compression loop while creating TT1 and TT2.

For SHA-2, 64 words are used during compression which are described here in detail.

Both use the same sequence of sixty-four constant 32-bit words, K0{256}, K1{256}, …,K63{256} These words represent the first thirty-two bits of the fractional parts of the cube roots of the first sixty-four prime numbers. In hex, these constant words are (from left to right)

For SM3, only 2 constants are used. One is selected depending on the position of loop counter, then a circular shift by loop counter is applied before adding into the state. This certainly helps make SM3 more compact than SHA-2 although I can’t say if it strengthens in some way.

$T_i = \begin{cases} 79cc4519 & 0\leq i \leq 15 \\[1ex] 7a879d8a & 16\leq i \leq 63 \end{cases}$

t = (i < 16) ? 0x79cc4519 : 0x7a879d8a;


There are 3 boolean operations used during the compression which are similar to bitwise majority MAJ and bitwise choice CH operations in SHA-2.

$FF_i(X,Y,Z) = \begin{cases} X\oplus Y\oplus Z & 0\leq i \leq 15 \\[1ex] (X\wedge Y)\vee (X\wedge Z)\vee (Y\wedge Z) & 16\leq i \leq 63 \end{cases}$

$GG_i(X,Y,Z) = \begin{cases} X\oplus Y\oplus Z & 0\leq i \leq 15 \\[1ex] (X\wedge Y) \vee (\neg X \wedge Z) & 16\leq i \leq 63 \end{cases}$

#define F1(x,y,z)(((x)^(y)^(z)))
#define FF(x,y,z)(((x)&(y))^((x)&(z))^((y)&(z)))
#define GG(x,y,z)(((x)&(y))^(~(x)&(z)))


$ABCDEFGH\leftarrow V^{(i)}$
\begin{aligned} \text{for}\; & i= 0\; \text{to}\: 63\\ & SS_1\leftarrow ((A\lll 12)+E+(T_i\lll i))\lll 7\\ & SS_2\leftarrow SS_1\oplus(A\lll 12)\\ & TT_1\leftarrow FF_i(A,B,C)+D+SS_2+W_i'\\ & TT_2\leftarrow GG_i(E,F,G)+H+SS_1+W_i\\ & D\leftarrow C\\ & C\leftarrow B\lll 9\\ & B\leftarrow A\\ & A\leftarrow TT_1\\ & H\leftarrow G\\ & G\leftarrow F\lll 19\\ & F\leftarrow E\\ & E\leftarrow P_0(TT_2)\\ \text{endfor}&\; \end{aligned}

### Updating state

At the end of compression function in SHA-2, the input hash is updated by adding the output from compression.

$H_0^{(i)}=a+H_0^{(i-1)}$
$H_1^{(i)}=b+H_1^{(i-1)}$
$H_2^{(i)}=c+H_2^{(i-1)}$
$H_3^{(i)}=d+H_3^{(i-1)}$
$H_4^{(i)}=e+H_4^{(i-1)}$
$H_5^{(i)}=f+H_5^{(i-1)}$
$H_6^{(i)}=g+H_6^{(i-1)}$
$H_7^{(i)}=h+H_7^{(i-1)}$

In SM3, it is instead XOR’d which follows a Davies-Meyer idea for compression functions.

$H_0^{(i)}=a\oplus H_0^{(i-1)}$
$H_1^{(i)}=b\oplus H_1^{(i-1)}$
$H_2^{(i)}=c\oplus H_2^{(i-1)}$
$H_3^{(i)}=d\oplus H_3^{(i-1)}$
$H_4^{(i)}=e\oplus H_4^{(i-1)}$
$H_5^{(i)}=f\oplus H_5^{(i-1)}$
$H_6^{(i)}=g\oplus H_6^{(i-1)}$
$H_7^{(i)}=h\oplus H_7^{(i-1)}$

The output state from compression is XORed with the previous hash value to produce the next hash value. In the first round when there is no previous hash value it uses a constant pre-specified initial values.

$H_i=E_{m_i}(H_{i-1})\oplus H_{i-1}.$

Here’s the full compression function.

void sm3_compress(sm3_ctx*c) {
W t1,t2,i,j,t,s1,s2,x[8],w[68];

F(16)w[i]=rev32(c->x.w[i]);
// expand
for(i=16;i<68;i++)
w[i]=P1(w[i-16]^w[i-9]^R(w[i-3],15))^R(w[i-13],7)^w[i- 6];

// load internal state
F(8)x[i]=c->s[i];

// compress data
F(64) {
t=(i<16)?0x79cc4519:0x7a879d8a;
s2=R(x[0],12);
s1=R(s2+e+R(t,i),7);
s2^=s1;
if(i<16) {
t1=F1(x[0],x[1],x[2])+x[3]+s2+(w[i]^w[i+4]);
t2=F1(x[4],x[5],x[6])+x[7]+s1+w[i];
} else {
t1=FF(x[0],x[1],x[2])+x[3]+s2+(w[i]^w[i+4]);
t2=GG(x[4],x[5],x[6])+x[7]+s1+w[i];
}
x[3]=x[2];x[2]=R(x[1],9);x[1]=x[0];x[0]=t1;
x[7]=x[6];x[6]=R(x[5],19);x[5]=x[4];x[4]=P0(t2);
}
// update internal state
F(8)c->s[i]^=x[i];
}


Sources here. Thanks to 0x4d_ for submitting $\LaTeX$ equations.