This blogpost shows some Python techniques I personally like to use.
Unlike my other publications, this is an "opinion" or "usage" blogpost, so take what I write here as my favorite techniques rather than things set in stone.
Of course, I could be talking about pwntools (which I slightly covered in my introduction to pwn blogpost) or any other module, but I thought I'd show things that are more unique.
In the first part of this blogpost, we'll "move from C to Python": assuming we already have native code execution, we'll show how to execute Python code from it.
In the second part, we'll do the opposite and use ctypes to interact with low-level APIs from Python.
Python is awesome. These days there's a lot of debate on how much we should be using Python due to its performance compared to C, but in my opinion that's a fallacy.
I use Python when I need to use Python, and I use C when I need to use C. Also, last time I checked, scripting languages such as PowerShell on Windows or AppleScript on macOS are used by real malware authors all the time...
In any case, Python's advantages are clear:
- It's popular (it has tons of libraries to use and a rich community of developers).
- It's open-source.
- It has amazing integration with C (more on that later).
- It's very easy to express ideas quickly into code without worrying about strong typing or freeing memory (in most cases).
Now that we got that out of the way, I'd like to show a quick introduction on how to move from C to Python execution and vice-versa.
One thing I sometimes might do is compile CPython myself, add functionality etc. and ship it, if I need a quick implant.
This is quite easy to do but might end up with a large binary, which might not be advantageous, although it is a bit harder for defenders to write signatures for.
Another trick I use, and haven't seen anyone else use, is loading Python dynamically on systems that already have Python (commonly Linux, some macOS).
You can easily find your dynamic library path like this:
from sysconfig import get_config_var
import os
print(os.path.join(get_config_var('LIBDIR'), get_config_var('LDLIBRARY')))
For example, on my Linux box, I got /usr/lib/x86_64-linux-gnu/libpython3.10.so.1.0.
We could use this library to execute arbitrary Python code straight from C! Here's why:
jbo@hax:~$ readelf -sW /usr/lib/x86_64-linux-gnu/libpython3.10.so | grep Run_SimpleString
1118: 000000000020d660 130 FUNC GLOBAL DEFAULT 14 PyRun_SimpleStringFlags
1214: 000000000020d6f0 11 FUNC GLOBAL DEFAULT 14 PyRun_SimpleString
The documentation for it is clear:
int PyRun_SimpleString(const char *command);
If we try to execute it as-is we'll get a segmentation fault, because we first need to "initialize an environment" by calling the Py_Initialize function.
We will also call the corresponding Py_Finalize function to clean the environment up, and then we can run any code we want!
#include <stdio.h>
#include <dlfcn.h>
#include <stdbool.h>
typedef void (*py_init_t)();
typedef void (*py_fini_t)();
typedef int (*py_exec_t)(const char*);
int
main(
    int argc,
    char** argv
)
{
    int result = -1;
    char* libpython_path = NULL;
    char* python_code = NULL;
    void* handle = NULL;
    py_init_t py_init = NULL;
    py_fini_t py_fini = NULL;
    py_exec_t py_exec = NULL;
    bool should_finalize = false;
    // Validate argument(s)
    if (3 > argc)
    {
        fprintf(stderr, "Missing arguments.\nUsage: %s [LIBPYTHON_PATH] [PYTHON_CODE]\n", argv[0]);
        goto cleanup;
    }
    // Get arguments
    libpython_path = argv[1];
    python_code = argv[2];
    // Load Python
    handle = dlopen(libpython_path, RTLD_LAZY);
    if (NULL == handle)
    {
        fprintf(stderr, "Error loading Python library.\n");
        goto cleanup;
    }
    // Resolve symbols
    py_init = dlsym(handle, "Py_Initialize");
    if (NULL == py_init)
    {
        fprintf(stderr, "Error resolving symbol \"Py_Initialize\".\n");
        goto cleanup;
    }
    py_fini = dlsym(handle, "Py_Finalize");
    if (NULL == py_fini)
    {
        fprintf(stderr, "Error resolving symbol \"Py_Finalize\".\n");
        goto cleanup;
    }
    py_exec = dlsym(handle, "PyRun_SimpleString");
    if (NULL == py_exec)
    {
        fprintf(stderr, "Error resolving symbol \"PyRun_SimpleString\".\n");
        goto cleanup;
    }
    // Initialize environment
    py_init();
    should_finalize = true;
    // Execute command and propagate result
    result = py_exec(python_code);
cleanup:
    // Free resources
    if (should_finalize)
    {
        py_fini();
    }
    if (NULL != handle)
    {
        dlclose(handle);
        handle = NULL;
    }
    // Return result
    return result;
}
Note how simple that is:
jbo@hax:~$ gcc -oc2py ./c2py.c
jbo@hax:~$ ./c2py /usr/lib/x86_64-linux-gnu/libpython3.10.so "import os;os.system('id');"
uid=1000(jbo) gid=1000(jbo) groups=1000(jbo),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),122(lpadmin),135(lxd),136(sambashare),142(libvirt)
Of course, you can make this code stealthier, but my point is that the Python library can be used as a living-off-the-land technique.
On macOS, the situation is even worse for defenders, since EDRs might rely on seeing a Python command line or a script on disk, and neither exists here.
I've included the source of c2py in this repository for you to download and experiment with.
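If you want to get a feel for PyRun_SimpleString before writing any C, the same export can be called from an interpreter that is already running, through ctypes.pythonapi. This is only a sketch for experimentation: the interpreter is already initialized here, so Py_Initialize and Py_Finalize are not needed, and the helper name run_snippet is mine rather than part of any API.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running interpreter and keeps
# the GIL held during calls, which PyRun_SimpleString requires.
run_simple_string = ctypes.pythonapi.PyRun_SimpleString
run_simple_string.argtypes = [ctypes.c_char_p]
run_simple_string.restype = ctypes.c_int

def run_snippet(code: str) -> int:
    """Hypothetical helper: run Python source through the C API and return its status."""
    return run_simple_string(code.encode('utf-8'))

# 0 means the snippet ran, -1 means it raised an exception
status = run_snippet('import os; print(os.getpid())')
print(status)
```

The return value is the same integer that c2py propagates as its exit code.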
This transition might seem odd - we already have the power of Python, why would we need to work in native code?
Well, sometimes you might want to do low-level work, interacting with memory, working with shellcodes and so on.
The advantage of working with Python here is clear - we benefit from running in a process that can run anything, with its logic not present on disk.
So, for this blogpost, we'll do an interesting exercise - write a DLL injector for Windows!
As a reminder, here's what a C-coded injector would do (I've shown how to implement that exactly in a past blogpost):
- Given a process name, map it to a process ID by calling the CreateToolhelp32Snapshot API and calling Process32First and Process32Next APIs.
- Once a PID is found - call OpenProcess on it to get a handle.
- Now we'd like to write the DLL name in the foreign process address space - we first allocate memory via the VirtualAllocEx API and then write the DLL path there with WriteProcessMemory.
- Finally, we call the CreateRemoteThread API with the address of kernel32!LoadLibraryW as the start routine, and the foreign process memory address we allocated and wrote to as its argument.
So, we will do all of that one step at a time, with the best Python module - ctypes!
The ctypes module comes pre-shipped with Python and is capable of interacting with shared libraries, raw memory and so on.
It's quite a heavy module in terms of the functionality it exposes, so we'll learn by example.
The first part we'd want to do is convert the process name into an ID, and use the CreateToolhelp32Snapshot, Process32First and Process32Next APIs. For that, we'll need a handle to kernel32.dll, which is the DLL that implements all of those APIs.
In ctypes, we have the capability to load libraries:
- cdll is used for the cdecl calling convention, and is quite useful on Linux and macOS.
- windll is used for the stdcall calling convention, which is the calling convention used by most Windows APIs.
Thus, we can get a handle to kernel32.dll easily:
import ctypes
kernel32 = ctypes.windll.LoadLibrary('kernel32.dll')
We can already refer to functions exported by kernel32, but we need to help Python understand the structures, argument types and return types of those functions.
Let's start with CreateToolhelp32Snapshot:
HANDLE CreateToolhelp32Snapshot(
[in] DWORD dwFlags,
[in] DWORD th32ProcessID
);
We need to declare that the function takes two DWORDs and returns a HANDLE, and we can do it easily:
import ctypes
from ctypes import wintypes
# Get kernel32
kernel32 = ctypes.windll.LoadLibrary('kernel32.dll')
# Prototype - kernel32!CreateToolhelp32Snapshot
kernel32.CreateToolhelp32Snapshot.argtypes = [ wintypes.DWORD, wintypes.DWORD ]
kernel32.CreateToolhelp32Snapshot.restype = wintypes.HANDLE
Note that I use wintypes, which is imported from ctypes - if you're dealing with raw C, you can use the ctypes types directly:
| C type | ctypes type |
|---|---|
| bool | ctypes.c_bool |
| unsigned char | ctypes.c_ubyte |
| char | ctypes.c_char |
| char* | ctypes.c_char_p |
| double | ctypes.c_double |
| int | ctypes.c_int |
| int64_t | ctypes.c_int64 |
| uint64_t | ctypes.c_uint64 |
| long | ctypes.c_long |
| size_t | ctypes.c_size_t |
| void* | ctypes.c_void_p |
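To try these types out away from Windows, here is a small sketch that prototypes strlen from the C runtime. It assumes a POSIX system (Linux or macOS), where ctypes.CDLL(None) exposes the symbols already loaded into the process; picking strlen is purely for illustration.

```python
import ctypes

# Assumption: POSIX. CDLL(None) gives access to symbols already loaded
# into the current process, which includes the C runtime.
libc = ctypes.CDLL(None)

# Prototype - size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b'kernel32.dll')
print(length)
```

Without an explicit restype, ctypes assumes the function returns a C int, which silently truncates pointers and size_t values on 64-bit systems. This is why every prototype in this blogpost sets both argtypes and restype.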
The wintypes types are just nice definitions of existing ctypes types - for example:
HANDLE = ctypes.c_void_p
Let's continue to the next API - Process32First:
BOOL Process32First(
[in] HANDLE hSnapshot,
[in, out] LPPROCESSENTRY32 lppe
);
First, note that MSDN "lies" - Process32First comes in both an ANSI and a wide version. We'd really like to work with the wide version, since this is how Windows works internally and Python strings are wide by default. So, we'll refer to the API as kernel32.Process32FirstW.
The other problem is the data structure PROCESSENTRY32W (note that I took the wide version of the structure!), which is luckily documented:
typedef struct tagPROCESSENTRY32W {
DWORD dwSize;
DWORD cntUsage;
DWORD th32ProcessID;
ULONG_PTR th32DefaultHeapID;
DWORD th32ModuleID;
DWORD cntThreads;
DWORD th32ParentProcessID;
LONG pcPriClassBase;
DWORD dwFlags;
WCHAR szExeFile[MAX_PATH];
} PROCESSENTRY32W;
We can easily define that structure in ctypes!
MAX_PATH = 260
ULONG_PTR = wintypes.LPVOID
# Define the PROCESSENTRY32W structure
class PROCESSENTRY32W(ctypes.Structure):
    _fields_ = [
        ('dwSize', wintypes.DWORD),
        ('cntUsage', wintypes.DWORD),
        ('th32ProcessID', wintypes.DWORD),
        ('th32DefaultHeapID', ULONG_PTR),
        ('th32ModuleID', wintypes.DWORD),
        ('cntThreads', wintypes.DWORD),
        ('th32ParentProcessID', wintypes.DWORD),
        ('pcPriClassBase', wintypes.LONG),
        ('dwFlags', wintypes.DWORD),
        ('szExeFile', wintypes.WCHAR * MAX_PATH)
    ]
There are a few interesting takeaways here:
- Note that each such structure is a Python class. Inheriting from ctypes.Structure means Python will look for a member named _fields_, which is a list of 2-tuples of the form (name, type).
- Note how the last member has a type of wintypes.WCHAR * MAX_PATH - yes, ctypes allows defining an array type like that (equivalent to WCHAR[260]).
- In C, structures might have padding by default (unless defined as packed) - ctypes takes care of that too, and thus the PROCESSENTRY32W class is binary-compatible with the Windows PROCESSENTRY32W structure!
- I defined ULONG_PTR manually since it doesn't exist in wintypes.
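To see the padding and array behavior for yourself, here is a minimal sketch using a made-up structure. The Record class and its fields are hypothetical and exist only to show the layout rules; it uses plain ctypes types, so it runs on any platform.

```python
import ctypes

class Record(ctypes.Structure):
    # Hypothetical structure, only here to show the layout rules
    _fields_ = [
        ('tag', ctypes.c_uint8),        # 1 byte, followed by 3 bytes of padding
        ('value', ctypes.c_uint32),     # aligned to a 4-byte boundary
        ('name', ctypes.c_wchar * 16),  # fixed-size array, like WCHAR[16]
    ]

record = Record()
record.name = 'Notepad.EXE'

# Field descriptors expose the offset ctypes chose for each member
print(Record.tag.offset, Record.value.offset)
# The structure is larger than the sum of its members because of padding
print(ctypes.sizeof(Record))
# Wide character arrays read back as regular Python strings
print(record.name.lower())
```

The exact sizeof value depends on the platform's wchar_t width, so treat the printed size as something to inspect rather than a constant.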
Now we can easily declare the prototype for the functions:
kernel32.Process32FirstW.argtypes = [ wintypes.HANDLE, ctypes.POINTER(PROCESSENTRY32W) ]
kernel32.Process32FirstW.restype = wintypes.BOOL
kernel32.Process32NextW.argtypes = [ wintypes.HANDLE, ctypes.POINTER(PROCESSENTRY32W) ]
kernel32.Process32NextW.restype = wintypes.BOOL
Note how you can use ctypes.POINTER to define a pointer to an existing type - you can also use the builtin types (e.g. ctypes.POINTER(ctypes.c_int) is equivalent to C's int*).
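Here is a small sketch of the same pointer mechanics using a function that exists outside Windows: frexp from the C math library, which returns a mantissa and writes the exponent through an int*. It assumes a POSIX system, and frexp is my choice of example only because it has a simple out-parameter.

```python
import ctypes
import ctypes.util

# Assumption: POSIX. find_library returns None when it cannot locate the
# library; CDLL(None) then falls back to symbols already loaded in the process.
libm = ctypes.CDLL(ctypes.util.find_library('m'))

# Prototype - double frexp(double x, int *exp);
libm.frexp.argtypes = [ctypes.c_double, ctypes.POINTER(ctypes.c_int)]
libm.frexp.restype = ctypes.c_double

# Pass the out-parameter by reference, just like &exponent in C
exponent = ctypes.c_int()
mantissa = libm.frexp(8.0, ctypes.byref(exponent))
print(mantissa, exponent.value)

# ctypes.pointer builds a real POINTER(c_int) instance that can be dereferenced
ptr = ctypes.pointer(exponent)
print(ptr.contents.value)
```

ctypes.byref is the lighter option when the pointer is only needed for the duration of a call, while ctypes.pointer creates an object you can keep and dereference later.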
Now we can write our first function that gets a process name and finds its PID:
import ctypes
from ctypes import wintypes
# Windows definitions
MAX_PATH = 260
TH32CS_SNAPPROCESS = 0x00000002
# HANDLE return values arrive as plain integers, so keep the integer value for comparisons
INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value
SIZE_T = ctypes.c_size_t
ULONG_PTR = wintypes.LPVOID
# Get kernel32
kernel32 = ctypes.windll.LoadLibrary('kernel32.dll')
# Define the PROCESSENTRY32W structure
class PROCESSENTRY32W(ctypes.Structure):
    _fields_ = [
        ('dwSize', wintypes.DWORD),
        ('cntUsage', wintypes.DWORD),
        ('th32ProcessID', wintypes.DWORD),
        ('th32DefaultHeapID', ULONG_PTR),
        ('th32ModuleID', wintypes.DWORD),
        ('cntThreads', wintypes.DWORD),
        ('th32ParentProcessID', wintypes.DWORD),
        ('pcPriClassBase', wintypes.LONG),
        ('dwFlags', wintypes.DWORD),
        ('szExeFile', wintypes.WCHAR * MAX_PATH)
    ]
# Prototype - kernel32!CreateToolhelp32Snapshot
kernel32.CreateToolhelp32Snapshot.argtypes = [ wintypes.DWORD, wintypes.DWORD ]
kernel32.CreateToolhelp32Snapshot.restype = wintypes.HANDLE
# Prototype - kernel32!CloseHandle
kernel32.CloseHandle.argtypes = [ wintypes.HANDLE ]
kernel32.CloseHandle.restype = wintypes.BOOL
# Prototype - kernel32!Process32FirstW
kernel32.Process32FirstW.argtypes = [ wintypes.HANDLE, ctypes.POINTER(PROCESSENTRY32W) ]
kernel32.Process32FirstW.restype = wintypes.BOOL
# Prototype - kernel32!Process32NextW
kernel32.Process32NextW.argtypes = [ wintypes.HANDLE, ctypes.POINTER(PROCESSENTRY32W) ]
kernel32.Process32NextW.restype = wintypes.BOOL
def find_pid(process_name:str) -> int:
    """
    Finds a process by its name and returns its PID.
    """
    # Save the lowercase process name
    process_name_lower = process_name.lower()
    # Handle exceptions
    snapshot = INVALID_HANDLE_VALUE
    try:
        # Get a snapshot
        snapshot = kernel32.CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0)
        assert snapshot != INVALID_HANDLE_VALUE, Exception('kernel32!CreateToolhelp32Snapshot failed')
        # Get the first process
        entry = PROCESSENTRY32W()
        entry.dwSize = ctypes.sizeof(entry)
        assert kernel32.Process32FirstW(snapshot, ctypes.byref(entry)), Exception('kernel32!Process32FirstW failed')
        # Iterate the snapshot
        while True:
            # Compare entry name (case insensitive) and return the PID if found
            if entry.szExeFile.lower() == process_name_lower:
                return entry.th32ProcessID
            # Continue to the next entry
            assert kernel32.Process32NextW(snapshot, ctypes.byref(entry)), Exception(f'Process name "{process_name}" not found')
    # Cleanup
    finally:
        # Free resources
        if snapshot != INVALID_HANDLE_VALUE:
            kernel32.CloseHandle(snapshot)
Note some interesting insights:
- I defined the kernel32!CloseHandle prototype as well, because we need to clean up Windows handles when we're done (here, the snapshot handle).
- Note how we use try..finally to clean up the snapshot - this is a common Python pattern for cleaning up resources.
- Note how we access PROCESSENTRY32W members using Python literals and syntax, including treating szExeFile as a Python string!
- Note the use of ctypes.byref to pass a variable by reference (just like the & operator you'd use in C).
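If you find yourself repeating the try..finally cleanup for many handles, one possible alternative is a small context manager. This is a sketch of my own rather than part of the injector: owned_handle and its parameters are hypothetical names, and list.append stands in for kernel32.CloseHandle so it runs on any platform.

```python
from contextlib import contextmanager

@contextmanager
def owned_handle(handle, close_func, invalid_values=(None,)):
    """
    Hypothetical helper: yields a handle and closes it on exit,
    unless the handle is one of the known-invalid values.
    """
    try:
        yield handle
    finally:
        if handle not in invalid_values:
            close_func(handle)

# Stand-in for kernel32.CloseHandle so the sketch runs anywhere
closed = []
with owned_handle(0x1234, closed.append) as snapshot:
    print(f'working with handle {snapshot:#x}')
print(closed)
```

On Windows, you would pass kernel32.CloseHandle as close_func and include the integer value of INVALID_HANDLE_VALUE in invalid_values for APIs that use it to signal failure.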
Similarly, we'll define the main prototypes we require for injecting the DLL:
# Type - SIZE_T
SIZE_T = ctypes.c_size_t
# Prototype - kernel32!OpenProcess
kernel32.OpenProcess.argtypes = [ wintypes.DWORD, wintypes.BOOL, wintypes.DWORD ]
kernel32.OpenProcess.restype = wintypes.HANDLE
# Prototype - kernel32!VirtualAllocEx
kernel32.VirtualAllocEx.argtypes = [ wintypes.HANDLE, wintypes.LPVOID, SIZE_T, wintypes.DWORD, wintypes.DWORD ]
kernel32.VirtualAllocEx.restype = wintypes.LPVOID
# Prototype - kernel32!WriteProcessMemory
kernel32.WriteProcessMemory.argtypes = [ wintypes.HANDLE, wintypes.LPVOID, wintypes.LPVOID, SIZE_T, ctypes.POINTER(SIZE_T) ]
kernel32.WriteProcessMemory.restype = wintypes.BOOL
# Prototype - kernel32!GetProcAddress
kernel32.GetProcAddress.argtypes = [ ctypes.wintypes.HMODULE, ctypes.wintypes.LPCSTR ]
kernel32.GetProcAddress.restype = wintypes.LPVOID
# Prototype - kernel32!CreateRemoteThread
kernel32.CreateRemoteThread.argtypes = [ wintypes.HANDLE, wintypes.LPVOID, SIZE_T, wintypes.LPVOID, wintypes.LPVOID, wintypes.DWORD, ctypes.POINTER(wintypes.DWORD) ]
kernel32.CreateRemoteThread.restype = wintypes.HANDLE
- Note how in CreateRemoteThread I skipped defining the structure for SECURITY_ATTRIBUTES and used wintypes.LPVOID instead - since I know I will be supplying None (the Python way of supplying a C NULL), there's no real need to define it as a structure.
- I also skipped defining THREAD_START_ROUTINE. In ctypes, you can easily do that with ctypes.WINFUNCTYPE, but since I will simply be supplying the address of kernel32!LoadLibraryW, I will not be needing it.
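For completeness, here is what defining a function pointer type looks like. ctypes.WINFUNCTYPE only exists on Windows, so this sketch uses its cdecl counterpart, ctypes.CFUNCTYPE, with qsort from the C runtime. It assumes a POSIX system, and the names COMPARE_FUNC and compare_ints are mine.

```python
import ctypes

# Assumption: POSIX, where CDLL(None) exposes the C runtime's qsort
libc = ctypes.CDLL(None)

# Callback type - int (*compar)(const int *, const int *)
COMPARE_FUNC = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int))

# Prototype - void qsort(void *base, size_t nmemb, size_t size, compar)
libc.qsort.argtypes = [ ctypes.POINTER(ctypes.c_int), ctypes.c_size_t, ctypes.c_size_t, COMPARE_FUNC ]
libc.qsort.restype = None

def compare_ints(left, right):
    # Both arguments arrive as POINTER(c_int); index 0 dereferences them
    return (left[0] > right[0]) - (left[0] < right[0])

values = (ctypes.c_int * 5)(33, 5, 99, 1, 7)
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), COMPARE_FUNC(compare_ints))
print(list(values))
```

The callback object must stay alive for as long as native code may call it. Here it lives for the duration of the qsort call, which is enough.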
Now let's get to the injection itself:
PROCESS_CREATE_THREAD = 0x0002
PROCESS_VM_OPERATION = 0x0008
PROCESS_VM_READ = 0x0010
PROCESS_VM_WRITE = 0x0020
MEM_COMMIT = 0x00001000
MEM_RESERVE = 0x00002000
PAGE_READWRITE = 0x04
def inject_to_pid(pid:int, dll_path:str):
    """
    Injects a DLL to the given PID.
    """
    # Handle exceptions
    proc = None
    remote_thread = None
    try:
        # Get the address of kernel32!LoadLibraryW
        load_library_w_pfn = kernel32.GetProcAddress(kernel32._handle, b'LoadLibraryW\0')
        assert load_library_w_pfn is not None, Exception('kernel32!GetProcAddress failed')
        # Open the process
        proc = kernel32.OpenProcess(PROCESS_CREATE_THREAD | PROCESS_VM_OPERATION | PROCESS_VM_READ | PROCESS_VM_WRITE, False, pid)
        assert proc is not None, Exception('kernel32!OpenProcess failed')
        # Create a wide buffer containing the DLL path including the NUL terminator
        dll_buffer = ctypes.create_unicode_buffer(dll_path)
        # Allocate memory in foreign process (a NULL LPVOID is returned to Python as None)
        remote_addr = kernel32.VirtualAllocEx(proc, None, ctypes.sizeof(dll_buffer), MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE)
        assert remote_addr is not None, Exception('kernel32!VirtualAllocEx failed')
        # Copy the wide buffer
        written = SIZE_T()
        total = 0
        while total < ctypes.sizeof(dll_buffer):
            assert kernel32.WriteProcessMemory(proc, remote_addr + total, ctypes.addressof(dll_buffer) + total, ctypes.sizeof(dll_buffer) - total, ctypes.byref(written)), Exception('kernel32!WriteProcessMemory failed')
            total += written.value
        # Create the remote thread
        remote_thread = kernel32.CreateRemoteThread(proc, None, 0, load_library_w_pfn, remote_addr, 0, None)
        assert remote_thread is not None, Exception('kernel32!CreateRemoteThread failed')
    # Cleanup
    finally:
        # Free resources
        if remote_thread is not None:
            kernel32.CloseHandle(remote_thread)
        if proc is not None:
            kernel32.CloseHandle(proc)
- Note how I use ctypes.addressof to get the address of a variable - I use that in case WriteProcessMemory only wrote a partial buffer.
- Note ctypes.create_unicode_buffer, which also takes care of the NUL terminator for me (in C you'd have to remember to copy the NUL terminator, in principle).
- Note I used kernel32._handle to get the base address of kernel32 (I could have used kernel32!GetModuleHandleW, but I already have a handle to kernel32 anyway).
- Again, note the nice cleanup in a try..finally pattern.
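The buffer handling can be tried out without touching another process. The sketch below replays the copy loop from inject_to_pid against a local destination buffer. partial_write is a stand-in I made up for WriteProcessMemory that deliberately copies only a few bytes per call, and the DLL path is a placeholder.

```python
import ctypes

CHUNK_SIZE = 8  # Hypothetical cap, to force several partial writes

def partial_write(dst_addr, src_addr, size, written):
    """Stand-in for WriteProcessMemory that copies at most CHUNK_SIZE bytes."""
    count = min(size, CHUNK_SIZE)
    ctypes.memmove(dst_addr, src_addr, count)
    written.value = count
    return True

dll_path = 'C:\\temp\\payload.dll'  # Placeholder path
source = ctypes.create_unicode_buffer(dll_path)
target = ctypes.create_unicode_buffer(len(source))  # Same capacity, zero-filled

# The buffer is one character longer than the string: the NUL terminator
print(len(dll_path), len(source))

# The stand-in is plain Python, so it receives the c_size_t object itself
# (the real API would get ctypes.byref(written) instead)
written = ctypes.c_size_t()
total = 0
while total < ctypes.sizeof(source):
    assert partial_write(ctypes.addressof(target) + total, ctypes.addressof(source) + total, ctypes.sizeof(source) - total, written)
    total += written.value

print(target.value)
```

Since ctypes.sizeof counts bytes rather than characters, the loop keeps working whatever the platform's wide character width is.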
I have uploaded the source code of the complete Python Windows DLL injector to injector.py.
The C-to-Python angle might be a bit easier to digest, but some of you might still be asking: when going from Python to C, why go through all of this trouble when you can just compile your code?
Well, from my perspective, after you've done that enough times, you accumulate enough code that wraps useful APIs (or you can use an LLM, if you trust them).
I've done similar things for COM, WinAPI, much of the Linux glibc and some macOS API from both libSystem.dylib as well as some private frameworks.
In any case, I hope this blogpost has been useful.
Stay tuned!
Jonathan Bar Or