Both challenge involves a custom VM interpretar that was used to run a different binary file depending on the level. At first the challenge was only a
Mach-o binary (where I did most of the
vermu1 challenge). However, during the
CTF the organizers dicided to give also an
ELF binary with the same functionallity as the
Mach-o one. (Not everyone has a mac these days :) ).
Vermu 1 [250 pts]
The idea was to understand how the internal vm was operating with a simple vm code file. Those are the files given to the participants:
If we run the
vm interpreter with the
level1 file and give it
32 bytes of data, we get a reponse: So as you can see, the
vm is capable of using
$ ./vermu level1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA nope :(
I initially started by looking into the actual
level1 binary with the objective of identifying potentially already known opcodes for different architectures. However, this was not the case. So I decided to go for the actual interpreter.
On the right side of the branch, we are calling 2 functions that I renamed as
prepare_vm does the following:
- Allocates a large buffer of
- Sets the allocated buffer with '0x00'.
- Calls 3 internal functions and stores them on the allocated buffer +
Those functions are responsible for setting up all the associated functions to the corresponding opcode values as well as the handlers.
Checking how those functions are then called inside the
vm_start_loop function, shows that the values are inserted in a function table starting at
0x80010. This address corresponds to the opcode
00, meaning that all the opcodes can be translated to the corresponding offset with
0x80010 + (opcode - 0x10) * 8. For example, the opcode
0x20 will be at
0x8090 from the buffer:
From all the previous information and by manually checking some of the opcodes I came to the conclusion that this vm implementation corresponds to an stack machine. And the memory layout for it is the following:
The code is copied at address
0x0 of allocated buffer (from now on,
vmstate). And the stack is just below the code data and starts at
0x80000 and moves to lower addresses on push. (Yes, stack can override code :) ).
All the opcodes implementations are either pushing or poping values from/to the stack and operating over them. Some opcodes such as the
0x10 are designed to read values from memory (including code). For example the opcode
0x20 is used to write to memory by first poping the value to write and then the address to write to. The operations can be categorized on the following groups:
0x10: Push value from code (after the instruction) to stack
0x2x: Write memory values (indirect reference push and write)
0x3x: Arithmetical operations such as, add, mul, xor, cmp, not.
0x4x: Jump operations, such as jmp not/equal/greater/smaller (used together with
0x50: Some handlers functions for
0is pushed and
1is pushed together with the address to read/write and the length).
So my solution was the following:
- Dynamically break at the address where the functions are called using the function table.
- Check the function that is going to be called and statically analyze the functionallity.
- Confirm that the operation did what I expected.
- Write a decompiler and emulator in parallel.
This way, I only had to reverse the executed opcodes and not all of them.
The decompiler and emulator writen to solve this challenge can be found here:
This script is capable of emulating the entire code even the
STDIN functionallity. (Please, don't judge the code it was written fast for this task only).
After executing the emulator, we can see that the input is xored in word chunks with some predefined values on the binary by negating them first. For example the first 4 bytes are xored with the negation of
0xab8e6bc and compared against
READING @ 0x1008 (32 bytes) READING 32 bytes: AAAABBBBCCCC PUSH 0x56 JMP @ 0x56 PUSH 0x1 PUSH 0x1004 STORE @ 0x1004 -> 0x1 PUSH 0x1008 PUSH 0x1000 PUSH [0x1000] -> 0x0 PUSH 0x4 MULTIPLY 4 * 0 ADD 0x0 + 0x1008 PUSH [0x1008] -> 0x41414141 PUSH 0x16 PUSH 0x1000 PUSH [0x1000] -> 0x0 PUSH 0x4 MULTIPLY 4 * 0 ADD 0x0 + 0x16 PUSH [0x16] -> 0xab8e6bc NEGATE 0xab8e6bc -> 0xf5471943 XOR 0xf5471943 ^ 0x41414141 -> 0xb4065802 PUSH 0x36 PUSH 0x1000 PUSH [0x1000] -> 0x0 PUSH 0x4 MULTIPLY 4 * 0 ADD 0x0 + 0x36 PUSH [0x36] -> 0x9a242b31 CMP 0x9a242b31 and 0xb4065802 -> 1
I wrote an script that extracts those values from the
level1 binary and performs the operations to pass all checks:
b = bytearray(open("level1", 'rb').read()) key = b[0x16:0x16+32] result = b[0x36:0x36+32] sol = '' for k,r in zip(key, result): sol += chr(((~k) & 0xff) ^ r) print sol
Vermu 2 [400 pts]
Those are the files given to the participants for the second challenge:
I first checked if the interpreter code was different:
$ shasum vermu2_elf ea4a12ac64a496083388f77b3bc9d9a12f0a1af8 vermu2_elf $ shasum vermu1_elf ee5b6774a329fb3448045654d50857d532d8a257 vermu1_elf
Sadly, the two binaries hashes are different. However, while reversing the opcodes for this new version and comparing them against the first version I saw that they were the same.
Executing the same decompiler with this new level showed that some new opcodes were used. I manually reversed those new opcodes and added them to the emulator script. However, I noticed that some opcodes were not even registered on the
prepare_vm function. After spending some time looking at the output of the decompiler I saw that the code was mutating itself, as the addresses written to are from the code.
0000044f PUSH 0x4 00000454 MULTIPLY 4 * 0 00000455 ADD 0x0 + 0x4ac -> 0x4ac 00000456 PUSH [0x4ac] -> 0x5f4f4f5f 00000457 PUSH 0x8d1 0000045c PUSH [0x8d1] -> 0x4f4f4f4f 0000045d XOR 0x4f4f4f4f ^ 0x5f4f4f5f -> 0x10000010 0000045e PUSH 0x4ac 00000463 PUSH 0x1000 00000468 PUSH [0x1000] -> 0x0 00000469 PUSH 0x4 0000046e MULTIPLY 4 * 0 0000046f ADD 0x0 + 0x4ac -> 0x4ac 00000470 STORE @ 0x4ac -> 0x10000010 ... [UNK OPCODES]
I modified the emulator code to be able to modify itself by copying the opcodes to the beggining of the memory and using the opcode extraction from memory instead of the binary itself:
r = open('level2','rb').read() b = bytearray(r) mem = bytearray( * 0x100000) mem[0:len(b)] = b[0:len(b)] # ... def parse_instruction(b): global pc if mem[pc] == 0: print_pc("NOP") pc += 1 # ...
I fixed some more unknown opcodes, all of them trivial and rerun the emulator. It worked on the first try and outputed 1593 lines of emulated code and finally printed the
nope :( to the screen:
0000077b JMP NZ CONDITIONAL 0 to addr 0x4ac 0000077c PUSH 0x1008 00000781 PUSH [0x1008] -> 0x0 00000782 PUSH 0x1 00000787 CMP 0x1 and 0x0 -> 1 00000788 PUSH 0x89e 0000078d JMP Z CONDITIONAL 1 to addr 0x89e 0000078e PUSH 0x8c9 00000793 PUSH 0x8 00000798 PUSH 0x1 0000079d PRINTING from @ 0x8c9 (8 bytes) 0000079d nope :(
The emulator and decompiler can be found here:
I saw that my input was again xored with some values stored in-memory. However, this time, the memory was mutated during the initial execution steps.
# input AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 000004be PUSH [0x100c] -> 0x41414141 000004bf PUSH 0x4fe 000004c4 PUSH 0x1004 000004c9 PUSH [0x1004] -> 0x0 000004ca PUSH 0x4 000004cf MULTIPLY 4 * 0 000004d0 ADD 0x0 + 0x4fe -> 0x4fe 000004d1 PUSH [0x4fe] -> 0xaa2e1cc 000004d2 XOR 0xaa2e1cc ^ 0x41414141 -> 0x4be3a08d 000004d3 PUSH 0x51e 000004d8 PUSH 0x1004 000004dd PUSH [0x1004] -> 0x0 000004de PUSH 0x4 000004e3 MULTIPLY 4 * 0 000004e4 ADD 0x0 + 0x51e -> 0x51e 000004e5 PUSH [0x51e] -> 0x65c1d3be 000004e6 CMP 0x65c1d3be and 0x4be3a08d -> 1
After extracting those values from the emulated code for every word I wrote the following solution script:
from pwn import * w1 = 0xaa2e1cc ^ 0x65c1d3be w2 = 0xbab1daba ^ 0xd6d2a1d4 w3 = 0xa12dd7c ^ 0x557cbc48 w4 = 0x571b1cb2 ^ 0x24447dd6 w5 = 0x9721c57f ^ 0xfc42f10b w6 = 0x48025565 ^ 0x2e31373a w7 = 0x80df1593 ^ 0xdfba67fc w8 = 0x7d7c2915 ^ 0xc447f sol = p32(w1) + p32(w2) + p32(w3) + \ p32(w4) + p32(w5) + p32(w6) + \ p32(w7) + p32(w8) print sol