Challenge 9 - serpentine

We are given an x64 executable. Running it shows that it is simply a flag checker.

In the main function

First, it registers an ExceptionFilter, but the handler does nothing more than print a simple error message and exit.

The key is copied to a hardcoded address and then passed to the shellcode at lpAddress, which is called. Looking up the references to lpAddress shows that it is used in a TLS_Callback.

There, it calls VirtualAlloc to allocate 0x800000 bytes with RWX permissions, then copies the same number of bytes from 0x140097AF0 into that region. Jumping to where the shellcode is located, things look odd: the first instruction is hlt, a privileged instruction that can only run in kernel mode. Executed in user mode, it raises EXCEPTION_PRIV_INSTRUCTION.

As analyzed before, the handler registered in main just prints an error message and exits, so there must be another mechanism that handles the exception, and it may be set up even before main is called. IDA can locate the entry point for us: after the TLS_Callback runs, the program's entry point is called.

At start, it jumps to scrt_common_main_seh, and inside this function we find what we are looking for: before main is called, an array of functions (the CRT initializer table, from First to Last) is called.

sub_140001090: Get ntdll handle
sub_140002000: Find RtlInstallFunctionTableCallback address
sub_140001030: Call RtlInstallFunctionTableCallback

According to MSDN, this API adds a dynamic function table to the dynamic function table list. Why does that matter here? MSDN's documentation of x64 exception handling and an excellent article on x64 exception handling explain it all.

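So when an exception occurs inside the registered range, the system calls the PGET_RUNTIME_FUNCTION_CALLBACK callback, which generates the RUNTIME_FUNCTION used to handle the exception. A RUNTIME_FUNCTION consists of three 32-bit RVAs: BeginAddress, EndAddress and UnwindInfoAddress (the address of the UNWIND_INFO).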
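For reference, here is a minimal Python sketch of the x64 RUNTIME_FUNCTION layout (IMAGE_RUNTIME_FUNCTION_ENTRY in winnt.h) using the standard struct module: three little-endian 32-bit fields, 12 bytes in total. The class and helper names are illustrative, not taken from the challenge.

```python
import struct
from typing import NamedTuple


class RuntimeFunction(NamedTuple):
    """x64 RUNTIME_FUNCTION: three image-relative (RVA) DWORDs."""
    begin_address: int        # RVA of the first byte of the function
    end_address: int          # RVA one past the last byte of the function
    unwind_info_address: int  # RVA of the UNWIND_INFO structure


RUNTIME_FUNCTION_FMT = "<III"  # little-endian, three unsigned 32-bit fields


def pack_runtime_function(rf: RuntimeFunction) -> bytes:
    return struct.pack(RUNTIME_FUNCTION_FMT, *rf)


def unpack_runtime_function(raw: bytes) -> RuntimeFunction:
    return RuntimeFunction(*struct.unpack(RUNTIME_FUNCTION_FMT, raw))


rf = RuntimeFunction(0x1000, 0x1001, 0x1010)
raw = pack_runtime_function(rf)
print(len(raw), unpack_runtime_function(raw) == rf)  # 12 True
```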

Jumping into the callback, we find the logic that generates the RUNTIME_FUNCTION:

FunctionStart = exception_offset  # offset where the exception occurred
FunctionEnd = FunctionStart + 1
UnwindInfo = FunctionEnd + idc.get_wide_byte(base + FunctionEnd) + 1
UnwindInfo += (UnwindInfo & 1)
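The same computation can be reproduced outside IDA on a raw dump of the shellcode. In this sketch, `code` is assumed to be a bytes object holding the dumped region with offsets relative to its start, and runtime_function_for is a hypothetical helper name.

```python
from typing import Tuple


def runtime_function_for(code: bytes, exception_offset: int) -> Tuple[int, int, int]:
    """Mirror the callback: the 'function' is just the faulting byte, and the
    byte right after it tells how far away the UNWIND_INFO starts."""
    function_start = exception_offset      # offset of the faulting hlt
    function_end = function_start + 1      # the function covers one byte
    unwind_info = function_end + code[function_end] + 1
    unwind_info += unwind_info & 1         # round up to a 2-byte boundary
    return function_start, function_end, unwind_info


# hlt (0xF4) followed by a skip length of 5: 1 + 5 + 1 = 7, aligned up to 8
print(runtime_function_for(bytes([0xF4, 0x05]) + bytes(20), 0))  # (0, 1, 8)
```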

The UNWIND_INFO holds the information needed to handle the exception, including the unwind codes and the address of the exception handler.
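Per Microsoft's x64 exception-handling documentation, UNWIND_INFO starts with a 4-byte header (Version in the low 3 bits and Flags in the high 5 bits of the first byte, then SizeOfProlog, CountOfCodes, and FrameRegister/FrameOffset as two nibbles). The UNWIND_CODE array follows, padded to an even number of 2-byte slots, and the handler RVA comes after it when UNW_FLAG_EHANDLER or UNW_FLAG_UHANDLER is set. A parsing sketch over raw bytes (the names are mine):

```python
from typing import NamedTuple, Optional

UNW_FLAG_EHANDLER = 0x1
UNW_FLAG_UHANDLER = 0x2


class UnwindInfoHeader(NamedTuple):
    version: int
    flags: int
    size_of_prolog: int
    count_of_codes: int
    frame_register: int
    frame_offset: int        # scaled by 16 when used
    handler_rva_offset: int  # offset of the handler RVA from the UNWIND_INFO start


def parse_unwind_info(data: bytes, off: int) -> UnwindInfoHeader:
    b0, size_of_prolog, count, frame = data[off:off + 4]
    # the code array is padded to an even number of 2-byte slots
    padded_slots = (count + 1) & ~1
    return UnwindInfoHeader(
        version=b0 & 0x7,
        flags=b0 >> 3,
        size_of_prolog=size_of_prolog,
        count_of_codes=count,
        frame_register=frame & 0xF,
        frame_offset=frame >> 4,
        handler_rva_offset=4 + 2 * padded_slots,
    )


def handler_rva(data: bytes, off: int) -> Optional[int]:
    """Return the exception handler RVA, or None if no handler flag is set."""
    hdr = parse_unwind_info(data, off)
    if not hdr.flags & (UNW_FLAG_EHANDLER | UNW_FLAG_UHANDLER):
        return None
    pos = off + hdr.handler_rva_offset
    return int.from_bytes(data[pos:pos + 4], "little")
```

With three unwind codes, for example, the array is padded to four slots, so the handler RVA sits 4 + 8 = 12 bytes into the UNWIND_INFO. This matches the `countofcode % 4` padding in the script below.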

At first, thinking naively, I wrote a simple script that found the handler and jumped directly from where the exception occurs to the handler. That gave me the overall flow of the code, but many stubs still did not make sense. For example, before the exception R8 holds the address of the input key, but in the handler the R8 value in the context no longer holds it. After a while, rereading MSDN, I found that the unwind code array was the reason I had lost track. It works much like VM obfuscation: each unwind code normally just cleans up the stack, but here the codes do some "magic" work on the register context. Based on the ReactOS source, I wrote a script that simulates the effect of the unwind codes (it still has some bugs, but it helped me recover the lost track).

from idc import *

# x64 unwind operation codes (names as in ReactOS / winnt.h)
UWOP_PUSH_NONVOL = 0
UWOP_ALLOC_LARGE = 1
UWOP_ALLOC_SMALL = 2
UWOP_SET_FPREG = 3
UWOP_SAVE_NONVOL = 4
UWOP_SAVE_NONVOL_FAR = 5
UWOP_EPILOG = 6
UWOP_SPARE_CODE = 7
UWOP_SAVE_XMM128 = 8
UWOP_SAVE_XMM128_FAR = 9
UWOP_PUSH_MACHFRAME = 10

# register numbers used by OpInfo and FrameRegister
regs = ['rax', 'rcx', 'rdx', 'rbx', 'rsp', 'rbp', 'rsi', 'rdi',
        'r8', 'r9', 'r10', 'r11', 'r12', 'r13', 'r14', 'r15']

# patch_asm(ea, text) is a helper (not shown) that assembles `text` at `ea`
# and returns the length of the emitted instruction.

def handle(base, exception_offset):
    def emit(insn):
        # write the equivalent instruction and advance the patch cursor
        nonlocal exception_offset
        exception_offset += patch_asm(base + exception_offset, insn)

    # rebuild the RUNTIME_FUNCTION exactly like the callback does
    start_offset = exception_offset
    end_offset = start_offset + 1
    unwind_offset = end_offset + get_wide_byte(base + end_offset) + 1
    unwind_offset += unwind_offset & 1
    unwind_info = base + unwind_offset
    # UNWIND_INFO header
    flags = (get_wide_byte(unwind_info) & 0xf8) >> 3
    sizeofprolog = get_wide_byte(unwind_info + 1)
    countofcode = get_wide_byte(unwind_info + 2)
    frame = get_wide_byte(unwind_info + 3)
    fram_reg = frame & 0xf
    framoffset = (frame & 0xf0) >> 4
    unwindcode = unwind_info + 4
    i = 0
    while i < countofcode:
        code_offset = get_wide_byte(unwindcode)
        code = get_wide_byte(unwindcode + 1)
        opcode = code & 0xf
        opinfo = (code & 0xf0) >> 4
        if opcode == UWOP_PUSH_NONVOL:
            emit('pop ' + regs[opinfo])
            k = 1
        elif opcode == UWOP_ALLOC_LARGE:
            if opinfo != 0:
                # unscaled 32-bit size in the next two slots
                emit('add rsp, ' + hex(get_wide_dword(unwindcode + 2)))
                k = 3
            else:
                # size / 8 in the next slot
                emit('add rsp, ' + hex(get_wide_word(unwindcode + 2) * 8))
                k = 2
        elif opcode == UWOP_ALLOC_SMALL:
            emit('add rsp, ' + hex(8 * (opinfo + 1)))
            k = 1
        elif opcode == UWOP_SET_FPREG:
            # RSP = frame register - FrameOffset * 16
            reg = regs[fram_reg]
            delta = framoffset * 16
            if delta:
                emit('lea rsp, [' + reg + ' - ' + hex(delta) + ']')
            else:
                emit('mov rsp, ' + reg)
            k = 1
        elif opcode == UWOP_SAVE_NONVOL:
            # offset / 8 is stored in the next slot
            offset = get_wide_word(unwindcode + 2) * 8
            emit('mov ' + regs[opinfo] + ', [rsp + ' + hex(offset) + ']')
            k = 2
        elif opcode == UWOP_SAVE_NONVOL_FAR:
            # unscaled 32-bit offset in the next two slots
            offset = get_wide_dword(unwindcode + 2)
            emit('mov ' + regs[opinfo] + ', [rsp + ' + hex(offset) + ']')
            k = 3
        elif opcode in (UWOP_EPILOG, UWOP_SAVE_XMM128):
            k = 2  # no general-purpose register effect simulated
        elif opcode in (UWOP_SPARE_CODE, UWOP_SAVE_XMM128_FAR):
            k = 3
        elif opcode == UWOP_PUSH_MACHFRAME:
            # skip the optional error code, then pop the machine frame:
            # RIP = [rsp] (cannot be assembled), RSP = [rsp + 0x18]
            emit('add rsp, ' + hex(opinfo * 8))
            emit('mov rsp, [rsp + 0x18]')
            k = 1
        else:
            raise ValueError('unknown unwind opcode ' + hex(opcode))
        i += k
        unwindcode += k * 2
    # the handler RVA follows the code array, padded to an even slot count
    countofcode *= 2
    countofcode += countofcode % 4
    handler = base + get_wide_dword(unwind_info + 4 + countofcode)
    # delta = handler - base - exception_offset - 5
    # patch_byte(base + exception_offset, 0xe9)
    # patch_dword(base + exception_offset + 1, delta & 0xffffffff)
    # create_insn(base + exception_offset)
    set_cmt(base + start_offset, hex(unwind_info) + ', ' + hex(handler), 0)
    return handler
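The slot bookkeeping in handle() is the easiest part to get wrong, because an unwind code can occupy one, two or three 2-byte slots. Below is a standalone decoder sketch that runs without IDA. The slot counts come from my reading of Microsoft's documentation and the ReactOS unwinder; treat the counts for UWOP_EPILOG and UWOP_SPARE_CODE as an assumption.

```python
UWOP_NAMES = [
    "UWOP_PUSH_NONVOL", "UWOP_ALLOC_LARGE", "UWOP_ALLOC_SMALL",
    "UWOP_SET_FPREG", "UWOP_SAVE_NONVOL", "UWOP_SAVE_NONVOL_FAR",
    "UWOP_EPILOG", "UWOP_SPARE_CODE", "UWOP_SAVE_XMM128",
    "UWOP_SAVE_XMM128_FAR", "UWOP_PUSH_MACHFRAME",
]
# slots occupied by each opcode (UWOP_ALLOC_LARGE depends on OpInfo)
SLOTS = [1, None, 1, 1, 2, 3, 2, 3, 2, 3, 1]


def slots_for(op, info):
    if op == 1:  # ALLOC_LARGE: info 0 -> 16-bit size/8, info 1 -> 32-bit size
        return 2 if info == 0 else 3
    return SLOTS[op]


def decode_unwind_codes(codes, count):
    """Return (code_offset, name, op_info, raw_operand) for each unwind code.
    raw_operand is the little-endian value of the extra slots, unscaled."""
    decoded = []
    i = 0
    while i < count:
        code_offset = codes[2 * i]
        op = codes[2 * i + 1] & 0xF
        info = codes[2 * i + 1] >> 4
        if op >= len(UWOP_NAMES):
            raise ValueError(f"invalid unwind opcode {op}")
        n = slots_for(op, info)
        extra = codes[2 * (i + 1):2 * (i + n)]
        operand = int.from_bytes(extra, "little") if extra else None
        decoded.append((code_offset, UWOP_NAMES[op], info, operand))
        i += n
    return decoded


# SAVE_NONVOL rbx (operand 8 -> offset 0x40), ALLOC_SMALL 0x20, PUSH_NONVOL rbp
codes = bytes([0x08, 0x34, 0x08, 0x00, 0x04, 0x32, 0x01, 0x50])
for entry in decode_unwind_codes(codes, 4):
    print(entry)
```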

Besides the obfuscation built on the x64 exception mechanism, the code in each exception handler is itself obfuscated.

The code inside each call always follows the same logic.

First, it pops the return address into a hardcoded address. Then it decrypts the real instruction by writing its opcode bytes into place, and finally the return address is calculated. In effect, each stub executes just one real instruction at a time: the decrypted one. To deobfuscate these calls, I used this simple script:

from idc import *

def patch_call(call, start_call, ret):
    ea = start_call
    create_insn(ea)                      # pop [hardcoded address]
    poped_addr = get_operand_value(ea, 0)
    patch_qword(poped_addr, ret)         # emulate the pop of the return address
    ea += 0xe                            # point to the mov
    insn_len = create_insn(ea)
    ah = get_wide_byte(get_operand_value(ea, 1)) << 8
    ea += insn_len                       # point to the lea
    insn_len = create_insn(ea)
    eax = ah + get_operand_value(ea, 1)
    ea += insn_len                       # point to the mov that writes the opcode
    addr_patch = get_operand_value(ea, 0)
    patch_dword(addr_patch, eax)         # decrypt the real instruction in place
    ea = addr_patch
    real_insn = create_insn(ea)
    if print_insn_mnem(ea) == 'jmp':
        # replace the call with a direct jmp to the same target
        jmp = get_operand_value(ea, 0)
        delta = jmp - call - 5
        patch_byte(call, 0xe9)
        patch_dword(call + 1, delta & 0xffffffff)
        return jmp
    # in case it is not a jmp to hlt: copy the real instruction over the call
    for i in range(real_insn):
        patch_byte(call + i, get_wide_byte(ea + i))
    create_insn(call)
    ea = poped_addr + 8
    create_insn(ea)
    delta = get_operand_value(ea, 1)     # get the real return value
    return ret + delta
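The jmp patch in patch_call builds a 5-byte `E9 rel32` jump, whose displacement is relative to the end of the instruction; that is where the `- 5` comes from. A standalone sketch of that encoding (encode_jmp_rel32 is a hypothetical helper name):

```python
import struct


def encode_jmp_rel32(src: int, dst: int) -> bytes:
    """Encode 'jmp rel32' (E9 xx xx xx xx) placed at src and landing on dst.
    The displacement is measured from the next instruction, i.e. src + 5."""
    delta = dst - (src + 5)
    if not -2**31 <= delta < 2**31:
        raise ValueError("target out of rel32 range")
    return b"\xE9" + struct.pack("<i", delta)  # signed little-endian dword


print(encode_jmp_rel32(0x1000, 0x1010).hex())  # e90b000000
```

Packing with the signed `<i` format handles backward jumps (negative displacements) without the manual `& 0xffffffff` mask.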

After dumping the deobfuscated shellcode to a file and studying it, we found the following key points:

There are 32 checks. Whenever a check fails, execution jumps to sub_1400011F0, which prints the "Wrong key" message and terminates the program.

Each check is an equation built from add, sub, xor and mul operations, and the input key only enters through the multiplications. Each equation takes 8 characters of the input key (for the first check, every fourth index from 0 to 0x1c), multiplies each one by a hardcoded number, and combines the product with the running result, which is then xored with or added to another hardcoded number.
The hardcoded values do not appear directly in the code; they are hidden behind simple arithmetic (push a number on the stack, add/sub another number, and the result is the actual value, …). These can be removed quickly by matching the pattern and replacing it with the corresponding value.

Most values (except the factors multiplied with the input characters) are constructed byte by byte. Each byte of a value comes from an array, corresponding to one of two operation types, xor and add. The final values can be collected with an IDAPython script, but compared with the actual values observed while debugging, some of them are slightly different.

At this point, we decided to debug and log the result of every operation, then convert each result back to the "actual value" used in that operation. After debugging the first check, we could reimplement it in Python like this:

# data: the 32-byte input key (bytes or a list of ints)
def c1(data):
    result = data[0x4] * 0xef7a8c
    result = result + 0x9d865d8d
    result = result - (data[0x18] * 0x45b53c)
    result = result + 0x18baee57
    result = result - (data[0x0] * 0xe4cf8b)
    result = result + 0x6ec04422
    result = result - (data[0x8] * 0xf5c990)
    result = result + 0x6bfaa656
    result = result ^ (data[0x14] * 0x733178)
    result = result ^ 0x61e3db3b
    result = result ^ (data[0x10] * 0x9a17b8)
    result = result + 0x35d7fb4f
    result = result ^ (data[0xc] * 0x773850)
    result = result ^ 0x5a6f68be
    result = result ^ (data[0x1c] * 0xe21d3d)
    result = result ^ 0x5c911c23
    result = result + 0x7e9b8687
    result &= 0xffffffffffffffff  # the shellcode works on 64-bit registers
    return result
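Since all 32 checks share this shape, a table-driven form makes it easier to plug in values dumped by script instead of hand-writing each function. This is a sketch that assumes each step is encoded as (op, key_index, constant), where a key_index of None means the constant is used on its own. The encoding and names are hypothetical, and C1_STEPS is simply c1 rewritten in that form.

```python
MASK64 = (1 << 64) - 1


def run_check(steps, data):
    """Evaluate one check: each step combines the running result with either
    a constant or data[key_index] * constant, using add, sub or xor."""
    result = 0
    for op, idx, const in steps:
        operand = const if idx is None else data[idx] * const
        if op == "add":
            result += operand
        elif op == "sub":
            result -= operand
        elif op == "xor":
            result ^= operand
        else:
            raise ValueError(f"unknown op {op!r}")
    # Python ints are unbounded; reducing at the end matches 64-bit wraparound
    return result & MASK64


C1_STEPS = [
    ("add", 0x04, 0xef7a8c), ("add", None, 0x9d865d8d),
    ("sub", 0x18, 0x45b53c), ("add", None, 0x18baee57),
    ("sub", 0x00, 0xe4cf8b), ("add", None, 0x6ec04422),
    ("sub", 0x08, 0xf5c990), ("add", None, 0x6bfaa656),
    ("xor", 0x14, 0x733178), ("xor", None, 0x61e3db3b),
    ("xor", 0x10, 0x9a17b8), ("add", None, 0x35d7fb4f),
    ("xor", 0x0c, 0x773850), ("xor", None, 0x5a6f68be),
    ("xor", 0x1c, 0xe21d3d), ("xor", None, 0x5c911c23),
    ("add", None, 0x7e9b8687),
]
```

Reducing only once at the end is safe because add, sub, mul and xor all agree with 64-bit wraparound on the low 64 bits.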

We still have 31 functions like this left, and while debugging each one is possible, it would be extremely time-consuming. I decided to automate the process using IDAPython. My plan was to dump the values of each operation and then back-calculate them to find the hardcoded numbers. I patched the input to \x00 * 32 to make the calculation easier. With my deobfuscated shellcode dump file, it is easy to locate the offset of any instruction. First, I found that there are only 32 test instructions, each one checking the final value of one function.

I’ll set breakpoints at each of these 32 test instructions.

Each of the 32 checks ends with a conditional jump to the function that prints "Wrong" if the check fails. To automate the debugging, RIP has to be adjusted automatically after each failed check. To handle this, I also set 32 additional breakpoints, one on each cmovnz instruction. Every time one of them is hit, the script moves RIP forward to the correct jmp instruction (skipping the cmovnz and the instruction after it), so debugging continues smoothly. Below is my script to dump the values:


import ida_dbg
import idaapi
from idc import *

# offsets (in the dumped shellcode) of the 32 test instructions
test_array =[0x179fd,0x2f386,0x4751a,0x5d2cd,0x7230b,0x8917b,0xa0de8,0xb7d00,0xd0742,0xe7b7b,0xff3a3,0x1164ea,0x12ce52,0x14492b,0x15ec1e,0x176fbd,0x190732,0x1a5a58,0x1bc75d,0x1d572b,0x1ecdf1,0x205671,0x21b636,0x22f442,0x243e47,0x25a19e,0x26ec62,0x285d0b,0x29e558,0x2b5f19,0x2cd429,0x2e4b8f]
# offsets of the 32 cmovnz instructions that select the "Wrong" path
cmove_array = [0x17a07, 0x2f390, 0x47524, 0x5d2d7, 0x72315, 0x89185, 0xa0df2, 0xb7d0a, 0xd074c, 0xe7b85, 0xff3ad, 0x1164f4, 0x12ce5c, 0x144935, 0x15ec28, 0x176fc7, 0x19073c, 0x1a5a62, 0x1bc767, 0x1d5735, 0x1ecdfb, 0x20567b, 0x21b640, 0x22f44c, 0x243e51, 0x25a1a8, 0x26ec6c, 0x285d15, 0x29e562, 0x2b5f23, 0x2cd433, 0x2e4b99]
base = get_qword(0x14089B8E0)   # shellcode base address stored in this global
for off in test_array + cmove_array:
    idaapi.create_insn(base + off)
    add_bpt(base + off, 0, BPT_DEFAULT)
for _ in range(len(test_array) + len(cmove_array)):
    ida_dbg.continue_process()
    ida_dbg.wait_for_next_event(ida_dbg.WFNE_SUSP, -1)
    rip = get_reg_value("rip")
    if print_insn_mnem(rip) == "cmovnz":
        # skip the cmovnz and the instruction after it so a failed check
        # does not divert execution to the "Wrong" path
        cmov_len = idaapi.create_insn(rip)
        next_len = idaapi.create_insn(rip + cmov_len)
        set_reg_value(rip + cmov_len + next_len, "rip")
    else:
        # at a test instruction: its first operand holds the final value
        value_final = get_reg_value(print_operand(rip, 0))
        print(hex(value_final), end=",")

This gives us the 32 final values, one per check function, for the input \x00 * 32:

0x5be3e290,0x62a2a0fb,0x3ade6641,0x18d62e9b4,0xffffffffd01904a1,0xffffffffbbe5233e,0x8478c40,0xffffffffb1b9939b,0xfffffffff31da2f5,0x8c8bc76,0xffffffffd45091a6,0x1a35812a,0xffffffffd8bc839e,0xffffffffe724ad90,0x1a1d901a,0xffffffff81514d31,0xffffffff530d9146,0x5db36e01,0x6c511b62,0xffffffff4f1727a1,0x213022fe,0xfffffffff4d5f043,0xffffffff92a2ef25,0xfffffffedf1a2ab0,0xffffffff750e3b65,0x448addb5,0xfffffffffe691b35,0xffffffffed2a9d03,0x224e42c5,0xfffffffda375005b,0xfffffffe71ef098c,0xffffffffbcc44d66

Similarly, I set breakpoints on the mul instructions to dump the factor values referenced through rsp.

After the first mul, the product is pushed onto the stack. Starting from the second mul, the previous result is combined with the new product using xor, add, or sub. The offsets of these instructions are easy to compute from the mul instruction's address (they appear just a few instructions after the mul) using my deobfuscated shellcode dump file. The output for the first three check functions looks like this.

Now, we have dumped the sign and factors of each multiplication operation. Next, I set a breakpoint on the ldmxcsr instruction to dump the hardcoded value used for calculations between the multiplication operations (mul).

My dump:

The final piece of data we need is the sign of the operations between the multiplication steps. Through debugging, I found a pattern: if the operation is an xor, there is an array containing the xor values, indexed accordingly:

For example, if you want to XOR 0x5 with 0x32, there is an array pre-calculated with the XOR results. By taking the address of the value 0x32 and adding 5, you get the result of 0x32 ^ 0x5 = 0x37. The array looks like this:
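A small sketch of that lookup-table XOR. It assumes a row-major 256x256 table in which the row for value a starts at offset a * 256; the real table's base address and layout in the binary may differ.

```python
def build_xor_table() -> bytes:
    """Flattened 256x256 table: entry a * 256 + b holds a ^ b."""
    return bytes(a ^ b for a in range(256) for b in range(256))


def table_xor(table: bytes, a: int, b: int) -> int:
    # start of the row for value a, plus b, gives a ^ b
    return table[a * 256 + b]


table = build_xor_table()
print(hex(table_xor(table, 0x32, 0x05)))  # 0x37
```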

From all of the above, I was able to reimplement all 32 check functions and use Z3 to find the flag. Here is the flag: $$_4lway5_k3ep_mov1ng_and_m0ving@flare-on.com

Challenge 10 - Catbert Ransomware


Initial analysis

The given files include a UEFI boot image and an encrypted disk image. First, I tried booting it with QEMU using this command:

qemu-system-x86_64 -bios bios.bin -drive file=disk.img,format=raw

The disk contains three encrypted images and an EFI file.

After some exploration, I found the command decrypt_file, which can be used to decrypt the three .c4tb files on the disk.

Next, I extracted the bios.bin file using UEFITool with the unpack option, which yielded numerous files. Since I am not very familiar with UEFI and wasn't sure what to do next, I used a "super" tool, strings | grep, to search all the extracted files for the decrypt_file string, hoping to locate the file that processes this command. And boom! Look what we found: a PE file!

Shell_body.bin

It's a PE file running in UEFI

Let’s load it into IDA with the efiXplorer plugin. It is easy to find sub_31BC4(), the function that handles the decrypt_file command.

By running the efiXplorer plugin, we can easily resolve many library function names

First, it checks whether the signature of the encrypted file is C4TB.

I created two structs and renamed some parameters to make it easier to read

struct c4tb
{
  DWORD signature;
  DWORD enc_length;
  DWORD vmcode_offset;
  DWORD length_vmcode;
  char *enc_data_offset;
};

struct x
{
  DWORD signature;
  DWORD encrypted_data_length;
  DWORD vmcode_offset;
  DWORD len_vmcode;
  __int64 decrypt_buffer;
  __int64 enc_buffer;
};

This code snippet reads the c4tb file and copies the input to vmcode:

c4tb File Format Structure
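Based on the structs above, the header appears to be four little-endian DWORDs. Here is a parsing sketch using the standard struct module. It assumes that the signature is stored as the ASCII bytes C4TB, that the encrypted data immediately follows the 16-byte header, and that vmcode_offset is relative to the start of the file.

```python
import struct
from typing import NamedTuple

# signature, enc_length, vmcode_offset, length_vmcode
HEADER = struct.Struct("<4sIII")


class C4tb(NamedTuple):
    enc_length: int
    vmcode_offset: int
    vmcode_length: int
    enc_data: bytes
    vmcode: bytes


def parse_c4tb(blob: bytes) -> C4tb:
    signature, enc_length, vm_off, vm_len = HEADER.unpack_from(blob, 0)
    if signature != b"C4TB":
        raise ValueError("bad signature")
    enc_data = blob[HEADER.size:HEADER.size + enc_length]  # assumed right after header
    vmcode = blob[vm_off:vm_off + vm_len]
    return C4tb(enc_length, vm_off, vm_len, enc_data, vmcode)
```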

It then checks the key we input against the vmcode in the CheckKey() function (we’ll discuss this later). If the key is correct, it decrypts the image using the RC4 algorithm.
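RC4 itself is short. Here is a textbook implementation for reference (key scheduling, then keystream generation). Whether the shell passes the entered key to RC4 unchanged is an assumption, not something established above.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4; encryption and decryption are the same operation."""
    # key-scheduling algorithm (KSA)
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # pseudo-random generation algorithm (PRGA), XORed into the data
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


print(rc4(b"Key", b"Plaintext").hex())  # bbf316e8d940af0ad3
```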

After all three images are successfully decrypted, it decrypts the EFI using the key that was used to decrypt images 1 and 3.

So, our task now is to find the key to decrypt the three images.

VMCode Interpreter

Now, let’s go back to the CheckKey() function. It is an interpreter for the vmcode, whose opcodes range from 0x00 to 0x26 and perform operations similar to push, pop, xor, and, shl, and others.
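To illustrate the structure of such an interpreter, here is a toy stack-VM dispatch loop. The opcode numbers, the 16-bit little-endian immediate and the store semantics are all hypothetical. They were chosen only to mirror the push / push / MOV STORE pattern in the trace that follows, not the real Catbert encoding.

```python
# Hypothetical opcode numbers: NOT the real Catbert encoding.
OP_PUSH, OP_STORE, OP_LOAD, OP_XOR, OP_SHL, OP_EQ, OP_HALT = range(7)
MASK64 = (1 << 64) - 1


def run(program, store_size=16):
    """Execute bytecode; return (top of stack or None, store)."""
    stack, store, pc = [], [0] * store_size, 0
    while True:
        op = program[pc]
        pc += 1
        if op == OP_PUSH:       # push the 16-bit immediate that follows
            stack.append(program[pc] | (program[pc + 1] << 8))
            pc += 2
        elif op == OP_STORE:    # pop value, pop index: store[index] = value
            value = stack.pop()
            index = stack.pop()
            store[index] = value
        elif op == OP_LOAD:     # pop index, push store[index]
            stack.append(store[stack.pop()])
        elif op == OP_XOR:
            b, a = stack.pop(), stack.pop()
            stack.append(a ^ b)
        elif op == OP_SHL:
            b, a = stack.pop(), stack.pop()
            stack.append((a << b) & MASK64)
        elif op == OP_EQ:       # push 1 if the two top values are equal
            b, a = stack.pop(), stack.pop()
            stack.append(int(a == b))
        elif op == OP_HALT:
            return (stack.pop() if stack else None), store
        else:
            raise ValueError(f"bad opcode {op:#x} at {pc - 1:#x}")


# push 0; push 0x6161; STORE[0] = 0x6161; then compare STORE[0] with 0x6161
prog = [OP_PUSH, 0x00, 0x00, OP_PUSH, 0x61, 0x61, OP_STORE,
        OP_PUSH, 0x00, 0x00, OP_LOAD, OP_PUSH, 0x61, 0x61, OP_EQ, OP_HALT]
print(run(prog)[0])  # 1
```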

I have reimplemented it in Python (here). With an input of aaaaaaaaaaaaaaaa, it will print out as follows:

push 0x0
push 0x6161
MOV STORE[0x0],0x6161
push 0x1
push 0x6161
MOV STORE[0x1],0x6161
push 0x2
push 0x6161
MOV STORE[0x2],0x6161
push 0x3
push 0x6161
MOV STORE[0x3],0x6161
push 0x4
push 0x6161
MOV STORE[0x4],0x6161
push 0x5
push 0x6161
...
...
...

Image 1

The vmcode that checks the key for Image 1 performs operations like this:

input = [0x44,0x61,0x43,0x75,0x62,0x69,0x63,0x6c,0x65,0x4c,0x69,0x66,0x65,0x31,0x30,0x31]
cipher = [0x44,0x61,0x34,0x75,0x62,0x69,0x63,0x6c,0x65,0x31,0x69,0x66,0x65,0x62,0x30,0x62]
def lose():
    print("wrong")
    exit()
def win():
    print("correct")
    exit()
for i in range(16):
    if i == 2:
        if(input[i]!=0xff&((cipher[i] >> 0x4) | (cipher[i] <<0x4))):
            lose()
        continue
    if i == 9:
        if(input[i]!=0xff&((cipher[i] << 0x6) | (cipher[i] >> 0x2))):
            lose()
        continue
    if i == 13 or i == 15:
        if(input[i]!=0xff&((cipher[i] >> 0x1) | (cipher[i] << 0x7))):
            lose()
        continue
    if (input[i] != cipher[i]):
        lose()
win()

The first key is easy to find: DaCubicleLife101
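Because each check compares input[i] with a fixed transform of cipher[i], the key can be computed directly instead of guessed. The expressions (x >> 4) | (x << 4), (x << 6) | (x >> 2) and (x >> 1) | (x << 7) in the check above are 8-bit left rotations by 4, 6 and 7. A sketch (rol8 and ROTATE are hypothetical names):

```python
def rol8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


cipher = [0x44, 0x61, 0x34, 0x75, 0x62, 0x69, 0x63, 0x6c,
          0x65, 0x31, 0x69, 0x66, 0x65, 0x62, 0x30, 0x62]
# rotate-left amount applied by the check at each special index
ROTATE = {2: 4, 9: 6, 13: 7, 15: 7}

key1 = bytes(rol8(c, ROTATE.get(i, 0)) for i, c in enumerate(cipher))
print(key1)  # b'DaCubicleLife101'
```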

We’ve obtained the first part of the flag

Image 2

Key 2 verification algorithm

input = [0x47, 0x33, 0x74, 0x44, 0x61, 0x4a, 0x30, 0x62, 0x44, 0x30, 0x6e, 0x65, 0x4d, 0x34, 0x74, 0x65]
value = [0x1e, 0x93, 0x39, 0x2e, 0x42, 0x94, 0xf0, 0x46, 0xa6, 0x54, 0xdf, 0x3c, 0x4a, 0x46, 0x28, 0x1a]
cipher = [0x59, 0xa0, 0x4d, 0x6a, 0x23, 0xde, 0xc0, 0x24, 0xe2, 0x64, 0xb1, 0x59, 0x07, 0x72, 0x5c, 0x7f]

def lose():
    print("wrong")
    exit()
def win():
    print("correct")
    exit()

for i in range(len(input)):
    if(input[i]^value[i]!=cipher[i]):
        lose()
win()

Decrypt image 2 with the key G3tDaJ0bD0neM4te.
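XOR is its own inverse, so the key is simply value XOR cipher, byte by byte:

```python
value = [0x1e, 0x93, 0x39, 0x2e, 0x42, 0x94, 0xf0, 0x46,
         0xa6, 0x54, 0xdf, 0x3c, 0x4a, 0x46, 0x28, 0x1a]
cipher = [0x59, 0xa0, 0x4d, 0x6a, 0x23, 0xde, 0xc0, 0x24,
          0xe2, 0x64, 0xb1, 0x59, 0x07, 0x72, 0x5c, 0x7f]

# input ^ value == cipher  <=>  input == cipher ^ value
key2 = bytes(v ^ c for v, c in zip(value, cipher))
print(key2)  # b'G3tDaJ0bD0neM4te'
```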

Image 3

The first 4 characters of the key are checked using the DJB2 hash algorithm:

def djb2_hash(string):
    hash_value = 0x1505
    for char in string:
        hash_value = (hash_value *33) + ord(char)
    return (hash_value & 0xFFFFFFFF) == 0x7c8df4cb
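The brute force does not need to try all four characters. djb2 adds the last character after the final multiply, so for every 3-character prefix the required fourth byte can be solved directly modulo 2**32. A sketch, assuming an alphanumeric alphabet:

```python
import string
from itertools import product

TARGET = 0x7C8DF4CB
ALPHABET = string.ascii_letters + string.digits


def djb2_candidates(target=TARGET, alphabet=ALPHABET):
    """All 4-char strings over `alphabet` whose djb2 hash mod 2**32 is target."""
    allowed = {ord(c) for c in alphabet}
    found = []
    for prefix in product(alphabet, repeat=3):
        h = 0x1505
        for ch in prefix:
            h = (h * 33 + ord(ch)) & 0xFFFFFFFF
        # final step is h * 33 + last, so last is determined mod 2**32
        last = (target - h * 33) & 0xFFFFFFFF
        if last in allowed:
            found.append("".join(prefix) + chr(last))
    return found


candidates = djb2_candidates()
print(len(candidates), "VerY" in candidates)
```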

I brute-forced it and got many plausible results, but only VerY looked meaningful. The next 4 characters are checked with a ror13-add hash: for each character, the accumulator is rotated right by 13 bits and then the character is added:

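def ror(val, r_bits, max_bits=32):
    # rotate val right by r_bits within a max_bits-wide word
    mask = (1 << max_bits) - 1
    val &= mask
    return ((val >> r_bits) | (val << (max_bits - r_bits))) & mask

def ror13AddHash32(data):
    # data is a bytes object, so iterating yields integer byte values
    val = 0
    for i in data:
        val = ror(val, 0xd, 32)
        val = (val + i) & 0xffffffff
    return val == 0x8b681d82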
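The same trick applies to the ror13-add hash: the last character is added after the final rotation, so for each 3-character prefix the fourth byte follows directly. A sketch, assuming an alphanumeric alphabet:

```python
import string
from itertools import product

TARGET = 0x8B681D82
MASK32 = 0xFFFFFFFF


def ror32(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK32


def ror13_candidates(target=TARGET, alphabet=string.ascii_letters + string.digits):
    """All 4-char strings over `alphabet` whose ror13-add hash equals target."""
    allowed = {ord(c) for c in alphabet}
    found = []
    for prefix in product(alphabet, repeat=3):
        val = 0
        for ch in prefix:
            val = (ror32(val, 13) + ord(ch)) & MASK32
        # final step is ror32(val, 13) + last
        last = (target - ror32(val, 13)) & MASK32
        if last in allowed:
            found.append("".join(prefix) + chr(last))
    return found


print("DumB" in ror13_candidates())  # True
```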

It is easy to find that DumB satisfies this condition. The last 8 characters are checked using Adler-32:

def adler32(input_vals):
    MOD_ADLER = 0xFFF1
    s1 = 1
    s2 = 0

    for byte in input_vals:
        s1 = (s1 + byte) % MOD_ADLER
        s2 = (s2 + s1) % MOD_ADLER

    return (s2 << 16) | s1 == 0xf910374
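The check is standard Adler-32 (65521 is the largest prime below 2**16), so the hand-written version can be cross-checked against zlib.adler32 from Python's standard library. For 8 printable characters neither sum can reach 65521, so the target splits directly into s2 = 0x0f91 and s1 = 0x0374.

```python
import zlib

MOD_ADLER = 0xFFF1  # 65521


def adler32_value(data: bytes) -> int:
    """Adler-32 as in the check above, returning the checksum itself."""
    s1, s2 = 1, 0
    for byte in data:
        s1 = (s1 + byte) % MOD_ADLER
        s2 = (s2 + s1) % MOD_ADLER
    return (s2 << 16) | s1


TARGET = 0xF910374
target_s2, target_s1 = TARGET >> 16, TARGET & 0xFFFF  # high and low 16-bit sums

for sample in (b"", b"Wikipedia", b"DumBVerY"):
    assert adler32_value(sample) == zlib.adler32(sample)
```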