hoping for no cyber bullshit...

Solving a WASM Crackme staticallylun. 09 juillet 2018

tags : NuitduHack / challenge / WASM /

What you need to know at least before starting

WASM exists in two forms of representation, the binary one, which is the one directly loaded by our browsers, and the textual one, readable by a human. Basically, this second form was made especially for humans to be able to directly code some WASM, and to use debugging/developper tools on it. I won't present here the binary format, because we will only work on the textual one.

The basic unit for WASM code is what is called an S-expression. Basically, it's just a parenthesized expression like this : ( ..... ). As in lisp, this parenthesized expression defines a node, then nodes can then be assembled like a tree for code to be evaluated (think of it as a parsed code). For a real code to be fonctionnal, it needs to define one module. So the real basics of a complete WASM code is the following expression : (module). Then, one can for example defines a function in it: (module (func))... A function is defined like this in WASM : ( func <params> <result> <locals> <body> ), where params defines the parameters of the function, return the return type and variable, locals defines every local variables of the function, and body defines the code of it.

For example, let's define a WASM module that has a simple function adding its only integer parameter to a constant equal to 3 then return the result. We will see how to define the body after the declaration :

      (param i32)
      (result i32)
      (local i32)

To write the body of this function, what you have to know is that WASM is a stack-based language. That means that every operation of the language is manipulating the stack of the module, taking zero, one or more parameters on the stack, then pushing the result on it. In the case of an adding operation, the idea is to push the two operands on the stack, then get those for the operation, and finally push the result back to stack. Also, local variables and parameters are accessed through their index. So, if you want to get the first parameter of your function, you will use get_local 0 and if you want to write into let's say the local variable which is the fourth one, then you will use set_local 4.

The body of our function is so the following :

get_local 0    | pushes parameter on stack
i32.const 3    | pushes constant 3 on stack
i32.add        | get the two values on stack(3 and our parameter), then calculates the addition, then push result back on stack
               | stack is (3+param)

Finally, here are a bunch of some good resources on WASM :

The challenge

"Diablo, Fighting Lazarus" was a WASM Crackme proposed at Nuit du Hack Jeopardy CTF. It presented itself as an archive, decompressed into diablo.html, diablo.wasm and diablo.js files. When we open the diablo.html file, here is what we get :


This challenge asks for an attack to kill Lazarus, and as any crackme, if the attack entered is correct, then we should have a "Win" message.

Challenge files, as well as the non compiled version can be found there.

A solution has been written by the Aperikube team here. Their solution uses the debugger shipped in principal browsers in order to single step through the file and understand what's going on. As some points of this challenge were still a bit uncleared to the majority of players, i decided to make a writeup that uses purely static techniques to solve the challenge. The idea is so to introduce and explain some concepts of WebAssembly along the resolution.

Let's start

I will solve this challenge using only static techniques. So, to start, i take the wasm module shipped with the challenge, and transform it to text representation. For doing such, one can use the wasm2wat tool. After processing, we are greeted with an almost 20000 lines long module. One can do his first assumption here : this was module may have been created from compilation of a C/C++ file. And indeed it was.!

One of the most blocking point i have seen for challengers was to determine which function is indeed the one used to check for the flag. To do so, there is the intelligent solution and the hardcore one.

The first one is to check the html file of the challenge, and inspect its source code. The first thing to see is that when clicking the button to check for validation, we are indeed calling the PassValue javascript function. Indeed, we have the following in the source : <input type="button" value="Attack him!" onclick="PassValue();" /> . After digging a little bit more, one can check what this function is doing (comments were left as is in the code) :

function PassValue(){
  Module.ccall('Attack', // name of C function
  null, //return type 
  ['string'],//argument types 

So basically, this function is calling the function named Attack in the web assembly module, passing the password entered in the txtValue textbox as a string argument. What you need to know here is that we are calling a function on a conditional event. That means it's not the main function of our webassembly module, which is called when the module is loaded. For such a thing to work, the WASM module needs to export this Attack Function, and define a corresponding symbol for javascript to call this function. That means the name Attack should be exported. Let's load the text representation of our module in our favorite ext editor, and search for Attack. You should see something similar to this :


This line basically says that our Attack function is the function 21. Let's search for the reference to function 21, which in WASM is obtained through ;;:


And here we are, we can now analyze our function.

The other solution, the hardcore one, would have been to do the following to find our function : create a C program that manipulates strings (because we are asking for a password that will certainly be manipulated). The idea will be so to include "string.h" and "stdio.h". Let's now compile this program using emcc to a webassembly module. You can now make a generic diffing of the functions there. Indeed, our library functions will be included and compiled in the wasm modules. We can so determine which functions are not the library one. Of course, our function 21 won't be defined in your module.

Understanding memory management

As usual in a lot of crackmes, it's better to first find the "fail"/"win" messages, usually at the end, in order to then reverse the logic behind it. At the end of our function, we have the following :


What we see here is that we have the constants "1922" and "1597" that are pushed on the stack before calling function 67. There is two solutions here : those constants are used as is, or they are pointers to something else, especially memory that should reside somewhere in the module.

In WASM, there is basically only one way to have consistent memory not directly created on the stack. This way is to use a so-called "data" section, that defines basically every constants of our module (like a .rodata section in a binary). Let's search for this in the module :


So, there is indeed a data section, and this data section contains several interesting strings like our "win"/"fail" messages. The thing to understand here is that basically, the "(i32.const 1024)" written at the start of this section defines in fact the base pointer of this section. That means if you want to access the "n" which is the first byte of this section, then you can access it using the pointer of address 1024. If you want to get the corresponding pointers of interesting data, you can so just calculate the offset from the base address of this section. Let's do it for our "win" message :

data_section=<our data string where "\" has been replaced with "\x">
>>> data_section.find("\"So, ")
>>> 1024+573

Here we are, we know that the pointer to our "win" message is 1597. By doing the same, you are then able to get a clearer view of the function.

Studying our logic

Let's go back to our logic. By starting from the end, here is what we have:

get_local 45
set_local 66   
get_local 66    --> here we are basically putting value of variable 45 into variable 66 and then use variable 66 (ty compiler...)
i32.const 0
i32.eq --> is equal to zero?
set_local 68
get_local 68
if  ;; label = @1
  i32.const 1597 --> "win"
  get_local 72
  call 67        --> printf
  get_local 75
  set_global 12
  return --> end of function
  i32.const 1922  --> "fail"
  get_local 73
  call 67       --> printf
  get_local 75
  set_global 12

This block is basically saying "if the value contained in variable 45 is zero, then prints win". So let's search what is indeed contained in variable 45, from the last time it was set (searching set_local 45). We basically obtain this kind of block:

i32.const 0
set_local 23
loop  ;; label = @1
  block  ;; label = @2
    get_local 23
    get_local 61
    if  ;; label = @3
      get_local 45
      set_local 62
      get_local 62
      i32.const 1
      set_local 63
      get_local 63
      set_local 45                 -----> here the value is set!
    get_local 23
    set_local 64
    get_local 64
    i32.const 1
    set_local 65
    get_local 65
    set_local 23
    br 1 (;@1;)

So, let's understand what is done there, by annotating the full block:

i32.const 0                                 |
set_local 23                                | $var23 = 0  
loop  ;; label = @1                         | loop1:
  block  ;; label = @2                      |    
    get_local 23                            |
    set_local 46                            |
    get_local 46                            |
    i32.const 9                             |
    i32.lt_s                                |      
    set_local 47                            |    $var47 = 9 - $var23
    get_local 47                            |
    i32.eqz                                 |    $var47 == 0 ?
    if  ;; label = @3                       |    if yes then exit
      br 1 (;@2;)                           |
    end                                     |    
    get_local 23                            |
    set_local 48                            |
    get_local 48                            |
    i32.const 3                             |   
    i32.mul                                 |
    set_local 49                            |    $var49 = $var23 * 3
    get_local 49                            |
    i32.const 2                             |
    i32.add                                 |
    set_local 50                            |
    get_local 50                            |
    set_local 34                            |    $var34 = $var49 + 2
    get_local 1                             |  
    set_local 51                            |    
    get_local 34                            |    
    set_local 52                            |
    get_local 51                            |
    get_local 52                            |
    i32.add                                 |
    set_local 53                            |    $var53 = $var1 + $var34
    get_local 53                            |
    i32.load8_s                             |    
    set_local 54                            |    $var54 = value at memory address ($var1 + $var34)
    get_local 54                            |
    i32.const 24                            |
    i32.shl                                 |
    i32.const 24                            |
    i32.shr_s                               |
    set_local 55                            |
    get_local 55                            |
    i32.const 66                            |
    i32.xor                                 |
    set_local 57                            |    $var57 = $var54 ^ 66
    get_local 23                            |    
    set_local 58                            |    
    get_local 69                            |    
    get_local 58                            |    
    i32.const 2                             |    
    i32.shl                                 |
    i32.add                                 |
    set_local 59                            |    $var59 = ($var23<<2)+$var69
    get_local 59                            |
    i32.load                                |
    set_local 60                            |    $var60 = value at memory address ($var59) 
    get_local 57                            |
    get_local 60                            |
    i32.ne                                  |    if $var60 != $var57
    set_local 61                            |
    get_local 61                            |
    if  ;; label = @3                       |    alors $var45 += 1 
      get_local 45                          |
      set_local 62                          |
      get_local 62                          |
      i32.const 1                           |
      i32.add                               |
      set_local 63                          |
      get_local 63                          |
      set_local 45                          |
    end                                     |
    get_local 23                            |   
    set_local 64                            |
    get_local 64                            |
    i32.const 1                             |
    i32.add                                 |   $var23 += 1
    set_local 65                            |
    get_local 65                            |
    set_local 23                            |
    br 1 (;@1;)                             |   goto loop1
  end                                       |
end                                         |

This part can so be rewritten like that in pseudo code:

for (int i=0;i<9;++i){
   int p1 = ($var1 + i*3 + 2)^66;
   int p2 = $var69 + i<<2;
   if (p1 != p2)
       $var45 +=1;

So from our previous analysis, we know that we must verify that $var45 is zero, so that here p1 and p2 are equal. Let's search what there is in $var1 (by searching for set_local 1) :


So here we see that $var1 is in fact the same that $var0, that is our entered password. So, $p1 is password[i*3+2]^66. Let's now understand what is inside $var69 :

get_global 12   | if we search, we have: " (global (;12;) (mut i32) (get_global 5)) ", and "   (import "env" "STACKTOP" (global (;5;) i32)) "
set_local 75    | so $var75 is basically the stack top
<snip>          |
get_local 75    |
i32.const 20    | 
i32.add         |
set_local 69    |and $var69 is basically the stack top+20.

So, $var69 is basically an offset on the stack, where data is loaded afterwards. Is there anyway to know statically what is loaded there? What we can assume is that we will see something like that : get value of $var69, add an offset, get the value we want, then push it to $var69+offset. With a little bit of searching, you will be able to see a section of the analysed function with the following :

get_local 69           | get $var69 value
i32.const 1096         | 
i64.load align=4       | get the value at pointer 1096, with alignment 4
i64.store align=4      | store the previous value got at $var69 offset of the stack
get_local 69           | do the same than before but with $var69+8 and pointer 1096+8...
i32.const 8            |
i32.add                |
i32.const 1096         |
i32.const 8            |
i32.add                |
i64.load align=4       |
i64.store align=4      |
get_local 69           | still do the same than before but with $var69+16 and pointer 1096+16
i32.const 16           |
i32.add                |
i32.const 1096         |
i32.const 16           |
i32.add                |
i64.load align=4       |
i64.store align=4      |
get_local 69           | ....
i32.const 24           |
i32.add                |
i32.const 1096         |
i32.const 24           |
i32.add                |
i64.load align=4       |
i64.store align=4      |
get_local 69           |
i32.const 32           |
i32.add                |
i32.const 1096         |
i32.const 32           |
i32.add                |
i32.load               |
i32.store              |

So basically here, we are getting data in our data section, and push it onto the stack. Let's see what is at pointer 1096:

>>> data_section[1096-1024:1096-1024+8]

So if we come back to our pseudo code where p1 and p2 are compared, we are putting in p2 the value of every 4 bytes consecutively from pointer 1096. By gathering the values in the data section, we have :

>>> data = '%#\x1d.*s\x1d6\x09'
>>> result = ''.join(chr(ord(c)^66) for c in data)
>>> result

So our password is something like : **g**a**_**l**h**1**_**t**K. You can then do the same with the two other loop blocks (that are the same logic). At the end, you finally get that the password is M4g1karP_SPl4sh_l1ke_4ttacK.


to the top