Both challenge involves a custom VM interpretar that was used to run a different binary file depending on the level. At first the challenge was only a Mach-o binary (where I did most of the vermu1 challenge). However, during the CTF the organizers dicided to give also an ELF binary with the same functionallity as the Mach-o one. (Not everyone has a mac these days :) ).

Vermu 1 [250 pts]

The idea was to understand how the internal vm was operating with a simple vm code file. Those are the files given to the participants:

If we run the vm interpreter with the level1 file and give it 32 bytes of data, we get a reponse: So as you can see, the vm is capable of using STDIN and STDOUT.

$ ./vermu level1
nope :(

I initially started by looking into the actual level1 binary with the objective of identifying potentially already known opcodes for different architectures. However, this was not the case. So I decided to go for the actual interpreter.

On the right side of the branch, we are calling 2 functions that I renamed as prepare_vm and start_vm_loop.

alt text

The prepare_vm does the following:

  • Allocates a large buffer of 0x81000 bytes.
  • Sets the allocated buffer with '0x00'.
  • Calls 3 internal functions and stores them on the allocated buffer + 0x80000.

alt text

Those functions are responsible for setting up all the associated functions to the corresponding opcode values as well as the handlers.

Checking how those functions are then called inside the vm_start_loop function, shows that the values are inserted in a function table starting at 0x80010. This address corresponds to the opcode 00, meaning that all the opcodes can be translated to the corresponding offset with 0x80010 + (opcode - 0x10) * 8. For example, the opcode 0x20 will be at 0x8090 from the buffer:

alt text

From all the previous information and by manually checking some of the opcodes I came to the conclusion that this vm implementation corresponds to an stack machine. And the memory layout for it is the following:

alt text

The code is copied at address 0x0 of allocated buffer (from now on, vmstate). And the stack is just below the code data and starts at 0x80000 and moves to lower addresses on push. (Yes, stack can override code :) ).

All the opcodes implementations are either pushing or poping values from/to the stack and operating over them. Some opcodes such as the 0x10 are designed to read values from memory (including code). For example the opcode 0x20 is used to write to memory by first poping the value to write and then the address to write to. The operations can be categorized on the following groups:

  • 0x10: Push value from code (after the instruction) to stack
  • 0x2x: Write memory values (indirect reference push and write)
  • 0x3x: Arithmetical operations such as, add, mul, xor, cmp, not.
  • 0x4x: Jump operations, such as jmp not/equal/greater/smaller (used together with cmp)
  • 0x50: Some handlers functions for IO. (STDIN if 0 is pushed and STDOUT if 1 is pushed together with the address to read/write and the length).

So my solution was the following:

  • Dynamically break at the address where the functions are called using the function table.
  • Check the function that is going to be called and statically analyze the functionallity.
  • Confirm that the operation did what I expected.
  • Write a decompiler and emulator in parallel.

This way, I only had to reverse the executed opcodes and not all of them.

The decompiler and emulator writen to solve this challenge can be found here:

This script is capable of emulating the entire code even the STDIN functionallity. (Please, don't judge the code it was written fast for this task only).

After executing the emulator, we can see that the input is xored in word chunks with some predefined values on the binary by negating them first. For example the first 4 bytes are xored with the negation of 0xab8e6bc and compared against 0x9a242b31.

READING @ 0x1008 (32 bytes)
PUSH 0x56
JMP @ 0x56
PUSH 0x1
PUSH 0x1004
STORE @ 0x1004 -> 0x1
PUSH 0x1008
PUSH 0x1000
PUSH [0x1000] -> 0x0
PUSH 0x4
ADD 0x0 + 0x1008
PUSH [0x1008] -> 0x41414141
PUSH 0x16
PUSH 0x1000
PUSH [0x1000] -> 0x0
PUSH 0x4
ADD 0x0 + 0x16
PUSH [0x16] -> 0xab8e6bc
NEGATE 0xab8e6bc -> 0xf5471943
XOR 0xf5471943 ^ 0x41414141 -> 0xb4065802
PUSH 0x36
PUSH 0x1000
PUSH [0x1000] -> 0x0
PUSH 0x4
ADD 0x0 + 0x36
PUSH [0x36] -> 0x9a242b31
CMP 0x9a242b31 and 0xb4065802 -> 1

I wrote an script that extracts those values from the level1 binary and performs the operations to pass all checks:

b = bytearray(open("level1", 'rb').read())

key = b[0x16:0x16+32]
result = b[0x36:0x36+32]

sol = ''

for k,r in zip(key, result):
    sol += chr(((~k) & 0xff) ^ r)

print sol

Vermu 2 [400 pts]

Those are the files given to the participants for the second challenge:

I first checked if the interpreter code was different:

$ shasum vermu2_elf
ea4a12ac64a496083388f77b3bc9d9a12f0a1af8 vermu2_elf
$ shasum vermu1_elf
ee5b6774a329fb3448045654d50857d532d8a257 vermu1_elf

Sadly, the two binaries hashes are different. However, while reversing the opcodes for this new version and comparing them against the first version I saw that they were the same.

Executing the same decompiler with this new level showed that some new opcodes were used. I manually reversed those new opcodes and added them to the emulator script. However, I noticed that some opcodes were not even registered on the prepare_vm function. After spending some time looking at the output of the decompiler I saw that the code was mutating itself, as the addresses written to are from the code.

0000044f PUSH 0x4
00000454 MULTIPLY 4 * 0
00000455 ADD 0x0 + 0x4ac -> 0x4ac
00000456 PUSH [0x4ac] -> 0x5f4f4f5f
00000457 PUSH 0x8d1
0000045c PUSH [0x8d1] -> 0x4f4f4f4f
0000045d XOR 0x4f4f4f4f ^ 0x5f4f4f5f -> 0x10000010
0000045e PUSH 0x4ac
00000463 PUSH 0x1000
00000468 PUSH [0x1000] -> 0x0
00000469 PUSH 0x4
0000046e MULTIPLY 4 * 0
0000046f ADD 0x0 + 0x4ac -> 0x4ac
00000470 STORE @ 0x4ac -> 0x10000010

I modified the emulator code to be able to modify itself by copying the opcodes to the beggining of the memory and using the opcode extraction from memory instead of the binary itself:

r = open('level2','rb').read()

b = bytearray(r)

mem = bytearray([0] * 0x100000)
mem[0:len(b)] = b[0:len(b)]

# ...

def parse_instruction(b):
    global pc
    if mem[pc] == 0:
        pc += 1
# ...

I fixed some more unknown opcodes, all of them trivial and rerun the emulator. It worked on the first try and outputed 1593 lines of emulated code and finally printed the nope :( to the screen:

0000077b JMP NZ CONDITIONAL 0 to addr 0x4ac
0000077c PUSH 0x1008
00000781 PUSH [0x1008] -> 0x0
00000782 PUSH 0x1
00000787 CMP 0x1 and 0x0 -> 1
00000788 PUSH 0x89e
0000078d JMP Z CONDITIONAL 1 to addr 0x89e
0000078e PUSH 0x8c9
00000793 PUSH 0x8
00000798 PUSH 0x1
0000079d PRINTING from @ 0x8c9 (8 bytes)
0000079d nope :(

The emulator and decompiler can be found here:

I saw that my input was again xored with some values stored in-memory. However, this time, the memory was mutated during the initial execution steps.

000004be PUSH [0x100c] -> 0x41414141
000004bf PUSH 0x4fe
000004c4 PUSH 0x1004
000004c9 PUSH [0x1004] -> 0x0
000004ca PUSH 0x4
000004cf MULTIPLY 4 * 0
000004d0 ADD 0x0 + 0x4fe -> 0x4fe
000004d1 PUSH [0x4fe] -> 0xaa2e1cc
000004d2 XOR 0xaa2e1cc ^ 0x41414141 -> 0x4be3a08d
000004d3 PUSH 0x51e
000004d8 PUSH 0x1004
000004dd PUSH [0x1004] -> 0x0
000004de PUSH 0x4
000004e3 MULTIPLY 4 * 0
000004e4 ADD 0x0 + 0x51e -> 0x51e
000004e5 PUSH [0x51e] -> 0x65c1d3be
000004e6 CMP 0x65c1d3be and 0x4be3a08d -> 1

After extracting those values from the emulated code for every word I wrote the following solution script:

from pwn import *

w1 = 0xaa2e1cc ^ 0x65c1d3be
w2 = 0xbab1daba ^ 0xd6d2a1d4
w3 = 0xa12dd7c ^ 0x557cbc48
w4 = 0x571b1cb2 ^ 0x24447dd6

w5 = 0x9721c57f ^ 0xfc42f10b
w6 = 0x48025565 ^ 0x2e31373a
w7 = 0x80df1593 ^ 0xdfba67fc
w8 = 0x7d7c2915 ^ 0xc447f

sol = p32(w1) + p32(w2) + p32(w3) + \
      p32(w4) + p32(w5) + p32(w6) + \
      p32(w7) + p32(w8)

print sol