WerWolv

Blog Component Zoo

WerWolv — 2026-04-21T00:00:00.000Z

This is a collection of reusable components that I have created for my blog.

This component calculates the age based on the birth date and the current date.

import Age from '@/Age'

Hello, I am years old.
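
The underlying calculation is simple enough to sketch. The snippet below is not the component's source, just a minimal C++ illustration of the usual rule: subtract the years and take one off if this year's birthday hasn't happened yet. The `Date` struct and `age_in_years` are names made up for this example.

```cpp
// Hypothetical type and function names, not the actual component code
struct Date {
  int year;
  int month; // 1-12
  int day;   // 1-31
};

// Number of full years that have passed between `birth` and `today`
constexpr auto age_in_years(Date birth, Date today) -> int {
  int years = today.year - birth.year;

  // If this year's birthday is still ahead, the last year isn't complete yet
  bool before_birthday =
    today.month < birth.month ||
    (today.month == birth.month && today.day < birth.day);

  return before_birthday ? years - 1 : years;
}

// The day before the birthday still counts as the previous age
static_assert(age_in_years(Date{1990, 6, 15}, Date{2026, 6, 14}) == 35);
static_assert(age_in_years(Date{1990, 6, 15}, Date{2026, 6, 15}) == 36);
```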

This component creates a link to an archived version of a webpage.

import ArchivedLink from '@/ArchivedLink'

My old blog from 2020 is archived on the Wayback Machine.

This component creates a horizontal divider with an icon in the middle.

import Divider from '@/Divider'

This component adds a fold-out panel that provides extra information about something without cluttering the main text.

import ExtraInformation from '@/ExtraInformation'

The 
  The **USB Device Descriptor** is a data structure that provides information about a USB device, such as its vendor ID, product ID, and supported USB version. It is used by the host to identify and configure the device.
 is a crucial part of the USB protocol.

This component creates a figure with an image and a caption.

import Figure from '@/Figure'

This component renders a file tree structure based on a given data structure.

import FileTree from '@/FileTree'


- src
  - components
    - Button.tsx
    - Card.tsx
  - pages
    - index.astro
    - about.astro

This component creates a horizontal layout for its children, allowing for side-by-side content.

import Horizontal from '@/Horizontal'

Left Content

Right Content

This component displays a given time in the user’s local timezone.

import LocalTime from '@/LocalTime'

`12:00` my time is  for you.

This component creates a styled panel for notes or important information.

import NotePanel from '@/NotePanel'

This is an information.
This is a note.
This is a hint.
This is an alert.
This is an idea.
Something blew up.
This will take some time.
Take a look somewhere else.

This component renders digital signal waveforms, useful for visualizing binary data or timing diagrams.

import Oscilloscope from '@/Oscilloscope'

This component embeds a link to a PDF document with an optional fragment.

import PdfDocument from '@/PdfDocument'
import datasheet from './assets/datasheet.pdf'

Datasheet for the STM32F446 microcontroller.

This component renders a hand-drawn pencil underline effect for emphasizing text.

import PencilMarked from '@/PencilMarked'

This is a pencil underline effect. Every single one will look different, just like if you would draw it by hand.

This component renders a styled blockquote for highlighting quotes or important text.

import Quote from '@/Quote'

Goats are like mushrooms. If you shoot a duck, I am scared of toasters.

This component hides its content behind a spoiler warning, which can be revealed by the user.

import Spoiler from '@/Spoiler'

In case you didn’t know, Snape kills Dumbledore.

This component renders a step-by-step guide or process flow.

import Steps from '@/Steps'

### Step 1: Do this

First, you need to do this thing. It's very important to get this right.

### Step 2: Do that

Next, you should do that thing. Make sure to follow the instructions carefully.

### Step 3: Finish up

Finally, finish up by doing this last thing. Congratulations, you've completed the process!

This component highlights technical terms and can provide additional information when a term is clicked.

import TechnicalTerm from '@/TechnicalTerm'

The  is responsible for executing instructions in a computer.

The Central Processing Unit, the brain of the computer is responsible for executing instructions in a computer.

This component renders a terminal emulator. It is mostly useful when embedded into another component, where it can display dynamic terminal output.

import Terminal from '@/Terminal'

This component embeds a video player for a given video source.

import Video from '@/Video'
import test_video from './assets/video.mp4'

This component embeds a YouTube video based on a given video ID.

import YouTube from '@/YouTube'

Adding Stack Traces to C++ Exceptions

WerWolv — 2026-04-18T00:00:00.000Z

Compared to languages like Python or Java, C++ does not provide built-in support for stack traces in exceptions. This can make debugging somewhat difficult when all you end up with is a crash log stating what(): bad optional access. With some linker tricks and a bit of code, it is possible though to add clean stack traces to C++ exceptions, including standard library ones.

This post focuses specifically on GCC and Clang using the Itanium C++ ABI. It will work on Linux and macOS out of the box and also on Windows when using MinGW with `libstdc++`.

For MSVC, a similar approach can be taken by hooking different functions that are used for exception handling in the MSVC runtime, but the details will be different and are not covered in this post.

The simplest way to get stack traces working for exceptions is to create a custom exception type that captures the stack trace in its constructor and returns it as part of the what() message.

#include <exception>
#include <format>
#include <print>
#include <stacktrace>
#include <string>

class ExceptionWithStackTrace : public std::exception {
public:
  ExceptionWithStackTrace(std::string what) {
    m_what = std::format(
      "what(): {}\n{}", 
      what,
      // Capture the stack trace
      // skipping the current frame which is the constructor itself
      std::stacktrace::current(1)
    );
  }

  auto what() const noexcept -> const char* override {
    return m_what.c_str();
  }

private:
  std::string m_what;
};

auto some_function() -> void {
  throw ExceptionWithStackTrace("Test");
}

auto main() -> int {
  try {
    some_function();
  } catch (const std::exception &e) {
    std::println("{}", e.what());
  }
}

This works great if all you need is stack traces for code you wrote yourself:

$ ./custom_exceptions

what(): Test
   0#  some_function() at main.cpp:20 [0x6520e5502718]
   1#  main at main.cpp:25 [0x6520e5502793]
   2#   [0x785ae3c29d8f]
   3#  __libc_start_main [0x785ae3c29e3f]
   4#  _start [0x6520e5502604]

For exceptions of any other type though, including standard library exceptions, we still won’t get a stack trace.

To support stack traces for all exceptions, we need to somehow inject this code into the existing exceptions. However, there are quite a few of them and if libraries add their own, it would need to be added there as well. Luckily, the Itanium C++ ABI provides a simple common entry point for all exceptions that can be used.

Exceptions in the Itanium ABI are implemented using a set of functions that are automatically inserted by the compiler when you throw or catch an exception.

main:
    ; ...
    ; try {
        ; printf("Throwing\n");
        call    puts@PLT
        ; ...
        ; auto exception = std::runtime_error("My exception");
        call    __cxa_allocate_exception@PLT
        call    std::runtime_error::runtime_error(char const*)@PLT
        ; ...
        ; throw exception;
        call    __cxa_throw@PLT
        ; ...
        jmp     .catch_statement
        ; ...
        call    __cxa_free_exception@PLT
    ; }
.catch_statement:
    ; ...
    ; catch (const std::exception& e) {
        call    __cxa_begin_catch@PLT
        ; ...
        ; printf("Caught exception\n");
        call    puts@PLT
        call    __cxa_end_catch@PLT
    ; }
    ; ...
    ret

As you can see, the compiler generates calls to __cxa_allocate_exception and __cxa_throw at the call site of the throw statement and calls __cxa_begin_catch and __cxa_end_catch at the start and end of the catch block respectively. __cxa_throw is the function that actually starts the whole unwinding process so it is the perfect place to inject our stack trace code into. By hooking this function, we can capture the stack trace at the point where the exception is thrown and then we can store that stack trace in a thread-local variable that can be accessed later in the catch block.

Hooking `__cxa_throw`

To hook __cxa_throw, we can make use of the Linker to redirect all calls made to __cxa_throw to our own function. This can be done by using the -Wl,--wrap=__cxa_throw flag when linking the application.

The linker will then resolve all calls to __cxa_throw to __wrap___cxa_throw instead, and it will make the original __cxa_throw function available under the name __real___cxa_throw.

Note the triple underscore here: the linker prepends `__wrap_` and `__real_` to the function name and `__cxa_throw` already starts with two underscores...

We can now implement our own `__wrap___cxa_throw` function that captures the stack trace and then calls the original `__cxa_throw` function.

#include <array>
#include <exception>
#include <print>
#include <stacktrace>
#include <typeinfo>

// Storage to store up to 5 nested stack traces for exceptions
//   should be enough for most cases but can be increased if needed
thread_local std::array<std::stacktrace, 5> s_stacktraces;

// Forward declaration for the original __cxa_throw function
//   that will be called from our wrapper
extern "C" auto __real___cxa_throw(
  void *thrown_object, 
  std::type_info *tinfo, 
  void (*dest)(void *)
) -> void;

// Wrapper function for __cxa_throw that will be called
//   instead of the original one
extern "C" auto __wrap___cxa_throw(
  void *thrown_object, 
  std::type_info *tinfo, 
  void (*dest)(void *)
) -> void {
  // Called when an exception is thrown, at the call site of the `throw`

  // std::uncaught_exceptions() returns the number of currently
  //   active exceptions that have been thrown but not yet caught
  auto exception_count = std::uncaught_exceptions();

  // If there's still some space on the exception stack,
  //   capture a new stacktrace and add it to the stack
  if (exception_count < ssize_t(s_stacktraces.size())) {
    // Add the current stack trace to the end of the
    //   stack trace storage array skipping the current
    //   frame which is the __wrap___cxa_throw function itself
    s_stacktraces[exception_count] = std::stacktrace::current(1);
  }

  // Forward to original __cxa_throw()
  __real___cxa_throw(thrown_object, tinfo, dest);
}

// Simple helper function to print the stack trace
//   of the exception that's currently being caught
auto print_stacktrace() -> void {
  auto exception_count = std::uncaught_exceptions();
  if (exception_count < 0) {
    std::println("No active exception");
    return;
  } else if (exception_count >= s_stacktraces.size()) {
    std::println("Too many active exceptions");
    return;
  }

  std::print("Stacktrace:\n{}", s_stacktraces[exception_count]);
}
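
Why does the same index work in both the wrapper and print_stacktrace()? A small experiment, independent of the code above, shows how std::uncaught_exceptions() behaves around a throw. `UnwindProbe` and `observe_exception_counts` are names invented for this sketch, and the comment about what the wrapper sees assumes the wrapper runs before the real __cxa_throw registers the new exception, as it does in the code above.

```cpp
#include <exception>
#include <stdexcept>
#include <vector>

// Values of std::uncaught_exceptions() recorded at each point of interest
std::vector<int> observed;

// Its destructor runs while the stack is being unwound
struct UnwindProbe {
  ~UnwindProbe() {
    observed.push_back(std::uncaught_exceptions());
  }
};

auto observe_exception_counts() -> std::vector<int> {
  observed.clear();

  try {
    UnwindProbe probe;

    // Right before the throw: the value a __cxa_throw wrapper would see
    observed.push_back(std::uncaught_exceptions());
    throw std::runtime_error("test");
  } catch (const std::exception &) {
    // Inside the handler the exception no longer counts as uncaught
    observed.push_back(std::uncaught_exceptions());
  }

  return observed;
}
```

The recorded values are 0, 1 and 0. The count right before the throw matches the count inside the handler, which is why the slot written at throw time is the same slot that gets read in the catch block.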

With this in place, code like the following automatically gets stack traces for all exceptions, including standard library ones, without any further modifications to existing code. The only thing needed is to add the -Wl,--wrap=__cxa_throw flag when linking the final executable and all shared libraries that need this functionality.

#include <exception>
#include <optional>
#include <print>

auto main() -> int {
  try {
    // A non-existent int
    std::optional<int> nothing;

    // This will throw a std::bad_optional_access exception
    nothing.value();
  } catch (const std::exception &e) {
    // Print the exception what() message
    std::println("Caught exception: {}", e.what());

    // Print the stack trace
    print_stacktrace();
  }
}

$ ./wrapped_exceptions

Caught exception: bad optional access
Stacktrace:
   0#  std::__throw_bad_optional_access() at optional:126 [0x5e5c24b3cf94]
   1#  std::optional<int>::value() & at optional:1279 [0x5e5c24b3ccb2]
   2#  main at main.cpp:9 [0x5e5c24b3c8a4]
   3#   [0x7b3a63229d8f]
   4#  __libc_start_main [0x7b3a63229e3f]
   5#  _start [0x5e5c24b3c644]

There are a few downsides to this approach that I’d like to mention.

The first one is obviously that this is highly implementation defined. It works out of the box on the latest GCC and Clang at the time of writing, both using libstdc++ as their C++ standard library.

LLVM’s libc++ also uses the same exception mechanism; however, it does not currently support <stacktrace>, so something like libbacktrace or boost::stacktrace would need to be used instead.

If your application throws a lot of exceptions that are caught and handled silently, this may add quite a substantial overhead as now every throw has to generate a stack trace, no matter if it’s being used or not. Definitely do some profiling if you’re using this in production code.

Finally, due to the linker magic needed here, this only works for binaries you’re linking yourself. If you rely on external shared libraries that you do not compile yourself, the exceptions there will not generate any stack traces. This includes exceptions thrown from the pre-compiled C++ standard library too. libstdc++ seems to be doing this in a few places to prevent including exception headers in various libraries, libc++ doesn’t do this as often. Exceptions from there may be intercepted as well by using dynamic linker magic like a LD_PRELOAD library that defines the __cxa_throw symbol. For statically linked code, this is not necessary though.

How to bring up the Linux Kernel on a new platform

WerWolv — 2026-04-17T00:00:00.000Z

Many embedded systems these days run Linux as their operating system. Generally because it’s a great foundation to run anything you like on top of and because many great developers and manufacturers already took care of writing drivers for all kinds of hardware components.

Even though this makes building the finished project much easier, if you’re dealing with custom hardware, you will most likely still have to bring up the Linux kernel on your own initially. This post is about just that: setting up the bare minimum to get Linux running on a new platform. In this case, the new platform is not a new PCB but a minimal, emulated RISC-V CPU.

Even though Linux is a very complex piece of software, it doesn’t actually have that many requirements to run. The only things it really needs in terms of hardware are:

- A CPU with a Memory Management Unit (MMU): a piece of hardware on the SoC that handles translating virtual addresses to physical addresses, allowing for isolation between multiple different userspace processes and the Kernel
- Enough RAM to, at least, load the Kernel, a Device Tree and an initramfs
- A system clock timer that elapses periodically
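
To give an idea of what translating a virtual address involves: in the Sv32 scheme from the RISC-V privileged specification, a 32-bit virtual address is split into two 10-bit page table indices and a 12-bit page offset. A minimal sketch of that split follows; `Sv32VirtualAddress` and `split_sv32` are names I made up for illustration and are not taken from the emulator.

```cpp
#include <cstdint>

// The three fields of an Sv32 virtual address
struct Sv32VirtualAddress {
  std::uint32_t vpn1;   // bits 31..22, index into the first-level page table
  std::uint32_t vpn0;   // bits 21..12, index into the second-level page table
  std::uint32_t offset; // bits 11..0, byte offset inside the 4 KiB page
};

constexpr auto split_sv32(std::uint32_t va) -> Sv32VirtualAddress {
  return {
    (va >> 22) & 0x3FF,
    (va >> 12) & 0x3FF,
    va & 0xFFF
  };
}

static_assert(split_sv32(0xC0123ABC).vpn1 == 0x300);
static_assert(split_sv32(0xC0123ABC).vpn0 == 0x123);
static_assert(split_sv32(0xC0123ABC).offset == 0xABC);
```

A real MMU then walks the page tables using the two indices and checks permission bits along the way, which is left out here.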

These things are what I ended up implementing in about 2000 lines of C++ code. For the CPU I implemented the RV32IMA instruction set which has all the things Linux needs to run. For the hardware, I implemented a simple RAM peripheral, the Supervisor Binary Interface for timers (basically a syscall-like interface that allows the Kernel to send requests to Machine Mode, the CPU itself, to configure built-in functionality) and a UART peripheral based on the WD8250 chip, which is well supported and really simple to implement.

Documenting the entire process of writing the emulator is a bit out of scope for this post but the full implementation can be found here: WerWolv/riscv-emulator.

On a real system, none of this would be necessary of course since the hardware would already be there. Instead of mapping the peripherals into the CPU's address space, you would instead read the datasheet of your SoC to figure out where the peripherals are mapped and then go from there to the next step. Also instead of simply being able to load the Kernel, Device Tree and initramfs into the RAM, you'd generally use a bootloader such as u-boot to load everything from some non-volatile storage medium such as a NOR Flash or an eMMC.

Using this emulator, the final hardware definition looks like this:

struct Hardware {
  Hardware() {
    // Map peripherals into the address space
    cpu.address_space().map(0x0000'0000, &ram);
    cpu.address_space().map(0xF400'0000, &uart8250);

    // Hook up a RISC-V compatible MMU as the address translator
    cpu.address_space().add_address_translator(&riscv_mmu);

    // Configure the UART peripheral to print output to the terminal
    uart8250.output_callback([](std::uint8_t c) {
      putchar(c);
      fflush(stdout);
    });

    // Load the Linux Kernel to the start of the RAM
    ram.write(0x00, LinuxKernel);

    // Load the Device Tree to the end of the RAM
    constexpr static auto DeviceTreeBlobLoadAddress = 512_MiB - 1_MiB;
    ram.write(DeviceTreeBlobLoadAddress, DeviceTreeBlob);
    // Put the device tree address into register a1 to emulate the way
    // the bootloader would do it on a real system
    emulator.cores()[0].a1() = DeviceTreeBlobLoadAddress;

    // Load the initramfs in front of the device tree
    // but at a fixed address we can use later
    constexpr static auto InitRamFsLoadAddress = 0x1F700000;
    ram.write(InitRamFsLoadAddress, InitRamFs);

    // Start up cpu
    cpu.power_up();
  }

  auto step() -> void {
    cpu.step();
  }

private:
  riscv::Cpu<1> cpu;
  dev::riscv::MMU riscv_mmu;

  dev::Ram ram{512_MiB};
  dev::UART8250 uart8250;
};
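
The load addresses in the constructor are easy to get subtly wrong, so it can help to let the compiler check them. The following is only a sketch: `Region`, `end_of` and `fits_before` are hypothetical helpers, `_MiB` is re-defined here so the snippet stands alone, and the 8 MiB initramfs window is an assumption derived from the linux,initrd-end value used in the Device Tree further down.

```cpp
#include <cstdint>

// Stand-in for the emulator's _MiB literal so this snippet is self-contained
constexpr auto operator""_MiB(unsigned long long value) -> std::uint64_t {
  return value * 1024 * 1024;
}

struct Region {
  std::uint64_t start;
  std::uint64_t size;
};

// First address after the region
constexpr auto end_of(Region region) -> std::uint64_t {
  return region.start + region.size;
}

// True if `first` ends at or before the start of `second`
constexpr auto fits_before(Region first, Region second) -> bool {
  return end_of(first) <= second.start;
}

constexpr std::uint64_t RamSize = 512_MiB;
constexpr Region DeviceTree = { 512_MiB - 1_MiB, 1_MiB };
constexpr Region InitRamFs  = { 0x1F700000, 8_MiB }; // size is an assumption

static_assert(DeviceTree.start == 0x1FF00000);
static_assert(fits_before(InitRamFs, DeviceTree));
static_assert(end_of(DeviceTree) == RamSize);
```

The Kernel image is left out because its size depends on the configuration, but a similar check against the start of the initramfs region would work the same way.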

This, of course, currently does absolutely nothing since there’s no code for it to execute. To get Linux running, we need to first configure and compile the Linux Kernel for our platform and then populate those LinuxKernel, DeviceTreeBlob and InitRamFs arrays with the correct data.

To compile anything to run on the target, the first thing we need is a toolchain. Compiling it by hand is generally a bit of a pain since you need to get the correct version of various different tools and libraries and then configure and compile them in the correct order (sometimes even multiple times with different configurations!). Luckily for us though, the crosstool-ng Project exists and handles all of that for us. With it we can simply configure the toolchain we need and then let it do all the work.

Since we’re planning on building an application that later on runs on 32-bit RISC-V Linux, we need a toolchain that targets riscv32-unknown-linux-gnu. This will give us a GCC compiler that can compile C code for our target platform as well as glibc as the C standard library implementation to interface with the Kernel.

$ ./bootstrap
INFO  :: *** Generating package version descriptions
...
INFO  :: *** Done!


$ make
gmake[1]: Entering directory 'crosstool-ng'
...
GEN      ct-ng



$ ./ct-ng riscv32-unknown-elf
                                                    
...
Now configured for "riscv32-unknown-elf"


$ ./ct-ng menuconfig

This will open a visual configuration menu where we can change various options for the toolchain.

The main things that need to be changed here are:

- Enable Target options -> Use the MMU
- Set Operating System -> Target OS to Linux
- Set C library -> C library to glibc
- Enable C compiler -> C++

Save the configuration and then run ./ct-ng build to start the compilation process. This can very well take up to an hour or more depending on your machine, on mine it took about 11 minutes. After it’s done, you should have a fully working riscv32-unknown-linux-gnu toolchain installed in ~/x-tools/riscv32-unknown-linux-gnu that you can use to compile the Linux Kernel and any other software for your target platform.

Now that we have a toolchain, we can move on to compiling the Linux Kernel. After cloning the torvalds/linux repository, the first thing we need to do is to create a configuration for our platform. Since our platform is quite minimal and doesn’t have any special hardware, we can just use the default RISC-V configuration as a base and then modify it to fit our needs. To do that, we can use ARCH=riscv make menuconfig to once again open a visual configuration menu.

The main thing we need to change here is to get the Kernel to be built for 32-bit RISC-V instead of 64-bit. For that, the following options need to be changed:

- Platform type -> Allow configurations that result in non-portable kernels option needs to be enabled
  - Now the Base ISA option below can be set to RV32I
- All the extension support options need to be disabled as they’re not implemented in the emulator.
- Boot options -> UEFI runtime support needs to be disabled
  - This is necessary only so the Emit compressed instructions when building Linux option can be disabled as our emulator doesn’t support the compressed instruction set.
  - Under Target options, the Generate code for the specific ABI option needs to be set to ilp32 and Architecture Level to rv32ima to prevent the compiler from using compressed instructions when building glibc and our init program later on.
- Under Kernel Features, I had to enable the SBI v0.1 support for our Timer and Build a relocatable kernel since that allows us to load the Kernel at any address in memory which is quite nice for our use case.
- Finally we can disable Virtualization, Enable loadable module support, Networking support and Cryptographic API since we don’t need any of that and it just bloats the Kernel and increases boot time.
- If you like, many of the Device Drivers can also be disabled since we don’t have any of that hardware. I left everything standard there though since it’s quite a few options to go through.
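
Whether a given ISA string includes the compressed extension can be checked mechanically. Here is a rough sketch, where `RiscvIsa` and `parse_isa` are hypothetical names and only the simple form of `rv32` or `rv64` followed by single-letter extensions is handled (anything after a `_` is ignored):

```cpp
#include <string_view>

struct RiscvIsa {
  int xlen;                 // 32 or 64, 0 if the string could not be parsed
  std::string_view letters; // base ISA letter plus single-letter extensions, e.g. "ima"

  constexpr auto has(char extension) const -> bool {
    return letters.find(extension) != std::string_view::npos;
  }
};

constexpr auto parse_isa(std::string_view isa) -> RiscvIsa {
  // Expect at least "rv", a two-digit width and one letter
  if (isa.size() < 5 || isa.substr(0, 2) != "rv")
    return { 0, {} };

  auto width = isa.substr(2, 2);
  int xlen = width == "32" ? 32 : width == "64" ? 64 : 0;
  if (xlen == 0)
    return { 0, {} };

  // Only keep the single-letter part, multi-letter extensions start after '_'
  auto letters = isa.substr(4);
  letters = letters.substr(0, letters.find('_'));

  return { xlen, letters };
}

static_assert(parse_isa("rv32ima").xlen == 32);
static_assert(parse_isa("rv32ima").has('a'));
static_assert(!parse_isa("rv32ima").has('c')); // no compressed instructions
```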

Everything else can be left as it is. After saving the configuration, we can compile the Kernel using the following command:

$ make -j    \
  ARCH=riscv \
  CROSS_COMPILE=/path/to/riscv32-unknown-linux-gnu-

If the make process asks whether you want to enable or disable any other options, you can simply answer n to all of them.

After the compilation is done, you should have an arch/riscv/boot/Image file that contains the compiled Kernel. This is the final file that can be executed.

Compiling an `init` program

Using the same compiler we can also compile a simple init program that will be executed by the Kernel as soon as it has booted and reached userspace. In a real-world embedded system, this would generally be something like the Busybox init binary, but it can really be any ELF executable. (Busybox is a very small implementation of the basic GNU utilities that most Linux systems use. It’s highly configurable but is generally used to provide tools like sh, ls, cd or many others. You can find out more about it here: busybox.net.) I simply wrote a small C program that prints some text to the console.

#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
  // Open /dev/kmsg to write logs directly to the Kernel Log
  int kmsg = open("/dev/kmsg", O_WRONLY);
  if (kmsg == -1) {
    return EXIT_FAILURE;
  }
  
  // Print some message
  dprintf(kmsg, "Hello World from \033[32mLinux Userspace\033[0m!\n");
  
  // Loop forever to prevent the init process from exiting
  while (true) {
    sleep(1);
  }
  
  // Close the file handle
  close(kmsg);

  return EXIT_SUCCESS;
}

To make creating the initramfs a bit easier, we can compile the init program as a statically linked binary. This way we don’t have to worry about any shared library dependencies and can just put the binary itself into the initramfs without needing to include any additional files.

$ riscv32-unknown-linux-gnu-gcc -static init.c -o init

An initramfs or Initial RAM File System is usually just a cpio archive that contains all files and directories that should be available to the system after it boots. It is mounted by the Kernel to / and usually contains things like /bin, /dev and /proc but it can really contain anything you want.

You can create a simple initramfs yourself by just creating a directory with the structure you want and then using the find and cpio commands to create the archive. However, building a reproducible initramfs this way can be difficult due to permissions, timestamps and other metadata that cpio includes in the archive. Instead, we can use a tool that was built automatically when we compiled the Kernel: gen_init_cpio. This tool takes in a textual description of the initramfs structure and content and then generates a cpio archive from it. The description file looks something like this:

# /bin
dir /bin 755 0 0
file /bin/init init 755 0 0


# /dev
dir /dev 755 0 0
nod /dev/kmsg 644 0 0 c 1 11
nod /dev/initrd 644 0 0 b 1 250
nod /dev/console 600 0 0 c 5 1
nod /dev/null 666 0 0 c 1 3
nod /dev/ram 644 0 0 b 1 0
nod /dev/root 644 0 0 b 4 0
nod /dev/ttyS0 660 0 0 c 4 64

# /proc
dir /proc 755 0 0
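
Each `nod` line follows the pattern name, mode, uid, gid, device type, major and minor number. To make the format a bit more tangible, here is a small sketch that parses such a line; `DeviceNode` and `parse_nod_line` are names invented for this example and have nothing to do with how gen_init_cpio is actually implemented.

```cpp
#include <optional>
#include <sstream>
#include <string>

struct DeviceNode {
  std::string path;
  unsigned mode;      // permission bits, written in octal in the file
  unsigned uid;
  unsigned gid;
  char type;          // 'c' = character device, 'b' = block device
  unsigned dev_major;
  unsigned dev_minor;
};

auto parse_nod_line(const std::string &line) -> std::optional<DeviceNode> {
  std::istringstream in(line);
  std::string keyword;
  DeviceNode node{};

  // Only "nod" entries are handled here
  if (!(in >> keyword) || keyword != "nod")
    return std::nullopt;

  // The mode is octal, every other number is decimal
  if (!(in >> node.path >> std::oct >> node.mode >> std::dec
           >> node.uid >> node.gid >> node.type
           >> node.dev_major >> node.dev_minor))
    return std::nullopt;

  if (node.type != 'c' && node.type != 'b')
    return std::nullopt;

  return node;
}
```

Parsing `nod /dev/ttyS0 660 0 0 c 4 64` this way yields a character device with major number 4 and minor number 64, which is the first serial port on Linux.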

This file can then finally be compiled into a cpio archive using the gen_init_cpio tool like this:

$ gen_init_cpio initramfs.txt -i initramfs.cpio

A device tree is a data structure that describes the hardware of the system to the Linux Kernel. It is used by the Kernel to know which drivers to load and how to configure them.

Let’s start with the bare minimum and build up from there.

/dts-v1/;


/ {
  #address-cells = <0x01>;
  #size-cells = <0x01>;

  model = "WerWolv's Emulator";


  cpus {
    #address-cells = <0x01>;
    #size-cells = <0x00>;

    timebase-frequency = <65000000>;


    cpu@0 {
      device_type = "cpu";
      compatible = "riscv";
      reg = <0x00>;
      riscv,isa = "rv32ima";
      mmu-type = "riscv,sv32";


      interrupt_controller: interrupt-controller {
        #interrupt-cells = <0x01>;
        #address-cells = <0x00>;
        compatible = "riscv,cpu-intc";

        interrupt-controller;
      };
    };
  };


  memory@0 {
    device_type = "memory";
    reg = <0x00 (512 * 1024 * 1024)>;
  };
};

This configuration is enough to describe the CPU and its timer to the Kernel. However, what’s still missing is all the peripherals. These don’t go under the cpus node but instead under the soc node:

/ {
  // ...



  soc@F0000000 {
    #address-cells = <0x01>;
    #size-cells = <0x01>;
    compatible = "simple-bus";
    ranges = <0x00 0xF0000000 0x10000000>;



    serial@4000000 {


      compatible = "ns8250", "ns16550";
      reg = <0x4000000 0x100000>;
      interrupt-parent = <&interrupt_controller>;
      interrupts = <0x01>;
      no-loopback-test;
      clock-frequency = <5000000>;
    };
  };
};
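
The ranges property is what ties the serial node's address back to the address the UART was mapped at in the emulator: it maps a window of child addresses onto CPU addresses. A minimal sketch of that translation, assuming a single range entry like the one above (`Range` and `translate` are hypothetical names):

```cpp
#include <cstdint>
#include <optional>

// One entry of a ranges property: <child_base parent_base size>
struct Range {
  std::uint32_t child_base;
  std::uint32_t parent_base;
  std::uint32_t size;
};

// Translates an address on the child bus into a CPU address,
// or returns nothing if the address is outside of the window
constexpr auto translate(
  const Range &range,
  std::uint32_t child_address
) -> std::optional<std::uint32_t> {
  if (child_address < range.child_base)
    return std::nullopt;

  auto offset = child_address - range.child_base;
  if (offset >= range.size)
    return std::nullopt;

  return range.parent_base + offset;
}

// ranges = <0x00 0xF0000000 0x10000000>;
constexpr Range SocRange = { 0x00, 0xF0000000, 0x10000000 };

// serial@4000000 ends up at the address the UART was mapped to
static_assert(*translate(SocRange, 0x4000000) == 0xF4000000);
```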

The final thing that’s needed is configuration for the Kernel itself:

/ {
  // ...


  aliases {
    serial0 = "/soc@F0000000/serial@4000000";
  };


  chosen {
    bootargs = "earlycon console=/dev/ttyS0 rdinit=/bin/init root=/dev/initrd";
    stdout-path = "serial0";
    linux,initrd-start = <0x1F700000>;
    linux,initrd-end = <0x1FEFFFFF>;
  };
};

The important bits here are the values in the chosen node.

The bootargs property defines the Kernel command line. The same way you can pass arguments to a program, you can also pass arguments to the Kernel itself. Usually it’s used to pass any configuration properties from the Bootloader to the Kernel. Run cat /proc/cmdline on your machine to see what arguments have been set by e.g. grub.

Here we tell the Kernel that we want to use the earlycon feature to get log output as early as possible. It will do that by sending all logs to the serial0 device, which is the alias we defined for our UART. We also tell the Kernel to execute our /bin/init program from the initramfs as the init process as soon as it has booted and reached userspace.

linux,initrd-start and linux,initrd-end define the location of the initramfs in memory. That’s where the emulator copied the initramfs we built previously.
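
The bootargs string itself has a simple shape: parameters separated by spaces, each either a bare flag or a key=value pair. As a simplified sketch of how such a string can be taken apart (the Kernel's own parser handles quoting and more, and `parse_bootargs` is just a name I picked for this example):

```cpp
#include <map>
#include <sstream>
#include <string>

// Splits a kernel command line into parameters.
// Flags without a value (like "earlycon") map to an empty string.
auto parse_bootargs(const std::string &bootargs)
  -> std::map<std::string, std::string> {
  std::map<std::string, std::string> result;

  std::istringstream in(bootargs);
  std::string token;
  while (in >> token) {
    auto separator = token.find('=');
    if (separator == std::string::npos) {
      result[token] = "";
    } else {
      result[token.substr(0, separator)] = token.substr(separator + 1);
    }
  }

  return result;
}
```

Feeding it the bootargs from above gives four entries, with rdinit mapping to /bin/init and earlycon mapping to an empty value.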

Now that we have the full Device Tree ready, we can compile it to a Device Tree Blob (.dtb file) using the dtc compiler. This is the final file that can be loaded by the Kernel.

$ dtc -I dts -O dtb -o riscv-emulator.dtb riscv-emulator.dts

The 3 files we built now can be bundled into the emulator binary and then we can finally run the emulator and see Linux booting!

constexpr std::uint8_t LinuxKernel[] = {
    #embed "Image"
};

constexpr std::uint8_t DeviceTreeBlob[] = {
    #embed "device_tree.dtb"
};

constexpr std::uint8_t InitRamFs[] = {
    #embed "initramfs.cpio"
};

$ ./command

[    0.000000] Booting Linux on hartid 0
[    0.000000] Linux version 7.0.0-00166-g0f0013213293 (werwolv@fedora) ...
[    0.000000] Machine model: WerWolv's Emulator
...
[   54.748992] Run /bin/init as init process
[   54.814251] Hello World from Linux Userspace!

Now that we have a self-contained executable that we can run to get the Linux Kernel to boot, there’s nothing stopping us from also getting it to run in a Browser. For that, we can cross-compile the binary to WebAssembly and call into it from JavaScript.

It might take up to a minute or so for the Kernel to reach userspace and execute the `init` program since this is not exactly a greatly optimized emulator in a greatly optimized environment.

Click on the Startup Machine button below to boot up the emulator.

We now have a working Linux system running on our fantasy machine! While a few things here might feel a bit contrived, the general process is exactly the same for a real piece of hardware. Though instead of just choosing where things are mapped into memory, you’d consult the datasheet of your SoC.

Also things like the Kernel or the initramfs of course can’t just exist in the memory magically but need to be put there instead by a Bootloader. On embedded systems this is usually something like u-boot or a proprietary thing provided by the chip manufacturer. There, you might need to further bundle up the Kernel, Device Tree and initramfs in some custom file format that the Bootloader can understand then. Check out the Android Bootloader Manual if you’d like to read about a real-world example of this.

As always, the code that was used to get all of this running can be found on my GitHub Page:

WerWolv/riscv-emulator

USB for Software Developers: An introduction to writing userspace USB drivers

WerWolv — 2026-04-07T00:00:00.000Z

Say you’re being handed a USB device and told to write a driver for it. Seems like a daunting task at first, right? Writing drivers means you have to write Kernel code, and writing Kernel code is hard, low level, hard to debug and so on.

None of this is actually true though. Writing a driver for a USB device is actually not much more difficult than writing an application that uses Sockets.

This post aims to be a high level introduction to using USB for people who may not have worked with Hardware too much yet and just want to use the technology. There are amazing resources out there such as USB in a NutShell that go into a lot of detail about how USB precisely works (check them out if you want more information), they are however not really approachable for somebody who has never worked with USB before and doesn’t have a certain background in Hardware. You don’t need to be an Embedded Systems Engineer to use USB the same way you don’t need to be a Network Specialist to use Sockets and the Internet.

The device we’ll be using is an Android phone in Bootloader mode. The reasons for this are:

- It’s a device you can easily get your hands on
- The protocol it uses is well documented and incredibly simple
- Drivers for it are generally not pre-installed on your system so the OS will not interfere with our experiments

Getting the phone into Bootloader mode is different for every device, but usually involves holding down a combination of buttons while the phone is starting up. In my case it’s holding the volume down button while powering on the phone.

Enumeration refers to the process of the host asking the device for information about itself. This happens automatically when you plug in the device and it’s where the OS normally decides which driver to load for the device. For most standard devices, the OS will look at the USB Device Class and load a driver that supports that class. For vendor specific devices, you generally install a driver made by the manufacturer which will look at the VID (Vendor ID) and PID (Product ID) instead to detect whether or not it should handle the device.

Even without a driver, plugging the phone into your computer will still make it get recognized as a USB device. That’s because the USB specification defines a standard way for devices to identify themselves to the host. More on how exactly that works in a bit though.

On Linux, we can use the handy lsusb tool to see what the device identified itself as:

$ lsusb
...
Bus 008 Device 014: ID 18d1:4ee0 Google Inc. Nexus/Pixel Device (fastboot)
...

Bus and Device just identify where the device currently sits in the system: Bus is the USB bus (host controller) it’s attached to and Device is the address the host assigned to it when it was plugged in. They will most likely differ on your system. ID is the most interesting part here. The first part 18d1 is the Vendor ID (VID) and the second part 4ee0 is the Product ID (PID). These are identifiers that the device sends to the host to identify itself. The VID is assigned by the USB-IF to companies that pay them a lot of money, in this case Google, and the PID is assigned by the company to a specific product, in this case the Nexus/Pixel Bootloader.
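To make the structure of that line a bit more tangible, here is a minimal Python sketch that splits it into its parts. The regular expression and the `parse_lsusb_line` helper are assumptions based purely on the sample output above, not an official description of the `lsusb` format.

```python
import re

# Pattern for one line of lsusb output as shown above (an assumption based
# on that sample, not an official format description).
LSUSB_LINE = re.compile(
    r"Bus (?P<bus>\d+) Device (?P<device>\d+): "
    r"ID (?P<vid>[0-9a-fA-F]{4}):(?P<pid>[0-9a-fA-F]{4})\s*(?P<name>.*)"
)

def parse_lsusb_line(line):
    """Split one lsusb line into bus, device address, VID, PID and name."""
    match = LSUSB_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an lsusb line: {line!r}")
    return {
        "bus": int(match["bus"]),
        "device": int(match["device"]),
        "vid": int(match["vid"], 16),   # Vendor ID is printed as hex
        "pid": int(match["pid"], 16),   # Product ID is printed as hex
        "name": match["name"],
    }

info = parse_lsusb_line(
    "Bus 008 Device 014: ID 18d1:4ee0 Google Inc. Nexus/Pixel Device (fastboot)"
)
print(f"VID=0x{info['vid']:04x} PID=0x{info['pid']:04x}")
```

The two integers that come out of this are the same values that get passed to libusb as `0x18d1` and `0x4ee0` later on.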

Using the lsusb -t command we can also see the device’s USB class and what driver is currently handling it:

$ lsusb -t
...
/:  Bus 008.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/1p, 480M
    |__ Port 001: Dev 002, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 003: Dev 003, If 0, Class=Hub, Driver=hub/4p, 480M
            |__ Port 002: Dev 014, If 0, Class=Vendor Specific Class, Driver=[none], 480M
...

This shows the entire tree of USB devices connected to the system. The bottommost one in this part of the tree is our device (Bus 008, Device 014 as reported in the previous command). The Class=Vendor Specific Class part specifies that the device does not use any of the standard USB classes (e.g. HID, Mass Storage or Audio) but instead uses a custom protocol defined by the manufacturer. The Driver=[none] part simply tells us that the OS didn’t load a driver for the device, which is good for us since we want to write our own.

If you're on Windows, you won't have `lsusb` but you can still find most of this information using the Device Manager or tools like USB Device Tree Viewer.

We will also go after the VID and PID since they are the only real identifying information we have. The Device Class is not very useful here since it’s just Vendor Specific Class, which any manufacturer can use for any device. Instead of doing all of this in the Kernel though, we can write a Userspace application that does the same thing. This is much easier to write and debug (and is arguably the correct place for drivers to live anyway but that’s a different topic). To do this, we can use the libusb library which provides a simple API for communicating with USB devices from Userspace. It achieves this by providing a generic driver that can be loaded for any device and then provides a way for Userspace applications to claim the device and talk to it directly.

Enumerating the device with `libusb`

The same thing we just did manually can also be done in software though. The following program initializes libusb, registers a hotplug event handler for devices matching the 18d1:4ee0 VendorId / ProductId combination and then waits for that device to be plugged into the host.

#include <libusb-1.0/libusb.h>
#include <print>

auto hotplug_callback(
    libusb_context *ctx, 
    libusb_device *device, 
    libusb_hotplug_event event, 
    void *user_data
) -> int {
    std::println("Device plugged in!\n");

    return 0;
}

auto main() -> int {
    // Create a context so we can interact with the libusb driver
    libusb_context *context = nullptr;
    libusb_init(&context);

    // Register a hotplug event handler to wait for our device to be plugged in
    libusb_hotplug_callback_handle hotplug_callback_handle;
    libusb_hotplug_register_callback(
        context,
        LIBUSB_HOTPLUG_EVENT_DEVICE_ARRIVED, // Device plugged in event
        LIBUSB_HOTPLUG_ENUMERATE,  // Fire event for already plugged in devices
        0x18d1, 0x4ee0,            // The VID and PID we found previously
        LIBUSB_HOTPLUG_MATCH_ANY,  // Match any USB Class
        hotplug_callback, nullptr, // The callback to call
        &hotplug_callback_handle
    );

    // Handle the libusb events
    while (true) {
        if (libusb_handle_events(context) < 0)
            break;
    }

    // Clean up
    libusb_hotplug_deregister_callback(context, hotplug_callback_handle);
    libusb_exit(context);
}

If you compile and run this, plugging in the device should result in the following output:

$ ./libusb_enumerate

Device plugged in!

Congrats! You have a program now that can detect your device without ever having to touch any Kernel code at all.

On Linux, all of this will generally just work. If for any reason a driver is being loaded anyway, you can forcefully detach it using `libusb_detach_kernel_driver()`.

On Windows, things may look different. If you're lucky, the device has a `Microsoft OS Descriptor` that tells Windows to load the `Winusb.sys` driver for your device. In that case, `libusb` can talk to it directly. However if no driver was loaded (the device shows up in the Device Manager with a little ⚠️ icon), you might need to use Zadig to force-replace the driver of the device with `Winusb.sys` or another supported driver. More information can be found here: libusb Wiki

# Talking to the device

Next step, getting any answer from the device. The easiest way to do that for now is by using the standardized Control endpoint. This endpoint is always on ID 0x00 and has a standardized protocol. This endpoint is also what the OS previously used to identify the device and get its VID:PID.

We're getting a bit ahead of ourselves here since we don't even know what endpoints are but it will all make sense in a bit, I promise. For now, simply think of Endpoints as Ports of a Device on the network with a specific number that we send data to.

The way we use this endpoint is with yet another libusb function that’s made specifically to send requests to that endpoint. So we can extend our hotplug event handler using the following code:

// Open the device so we can communicate with it
libusb_device_handle *handle = nullptr;
libusb_open(device, &handle);

std::vector<uint8_t> data(0xFF);

// Do a Control transfer
const auto result = libusb_control_transfer(
    handle,
    uint8_t(LIBUSB_ENDPOINT_IN)      | // Ask for data from the device...
        LIBUSB_RECIPIENT_DEVICE      | //   about the device as a whole...
        LIBUSB_REQUEST_TYPE_STANDARD,  //   using a standard request.
    LIBUSB_REQUEST_GET_STATUS,         // Send a GET_STATUS request
    0x00,                              // wValue value of 0x00
    0x00,                              // wIndex value of 0x00
    data.data(), data.size(),          // Buffer to read the data into
    1000                               // 1000ms timeout
);

// Print the data returned by the device if there was no error
if (result >= 0)
    print_bytes(std::span(data).subspan(0, result));

// Close device again
libusb_close(handle);

This code will now send a GET_STATUS request to the device as soon as it’s plugged in and prints out the data it sends back to the console.

$ ./libusb_enumerate
Addr  00 01 02 03 04 05 06 07  08 09 0A 0B 0C 0D 0E 0F

0000: 01 00

Those bytes came from the device itself! Decoding them using the USB Specification tells us that they form a 16-bit status word: the lowest bit tells us whether or not the device is Self-Powered (1 means it is, which makes sense since the device has a battery) and the next bit being 0 means Remote Wakeup is not enabled (meaning it cannot wake up the host).
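As a small sketch of that decoding step: the reply to a device-level GET_STATUS is a little-endian 16-bit word in which the USB 2.0 specification defines bit 0 as Self Powered and bit 1 as Remote Wakeup. The helper name `decode_device_status` and the constant names are made up for this example.

```python
import struct

SELF_POWERED = 1 << 0    # bit 0 of the device status word
REMOTE_WAKEUP = 1 << 1   # bit 1 of the device status word

def decode_device_status(data):
    """Decode the 2-byte reply to a device-level GET_STATUS request."""
    # The status is transmitted as a little-endian 16-bit value
    (status,) = struct.unpack("<H", bytes(data[:2]))
    return {
        "self_powered": bool(status & SELF_POWERED),
        "remote_wakeup": bool(status & REMOTE_WAKEUP),
    }

# The two bytes our device answered with
status = decode_device_status([0x01, 0x00])
print(status)
```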

There are a few more standardized request types (and some devices even add their own for simple things!) but the main one we (and the OS too) are interested in is the GET_DESCRIPTOR request.

Descriptors are binary structures that are generally hardcoded into the firmware of a USB device. They are what tells the host exactly what the device is, what it’s capable of and what driver it would like the OS to load. So when you plug in a device, the host simply sends multiple GET_DESCRIPTOR requests to the standardized Control Endpoint at ID 0x00 to get back a struct that gives it all the information it needs for enumeration. And the cool thing is, we can do that too!
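For the curious, this is roughly what such a request looks like on the wire. Every control transfer starts with an 8-byte setup packet, and the sketch below packs the one for a GET_DESCRIPTOR request for the Device Descriptor. libusb builds this packet internally, so this is purely illustrative; the helper name `build_setup_packet` and the constant names are my own, while the numeric values come from the USB 2.0 specification.

```python
import struct

# Standard request code and descriptor type from the USB 2.0 specification
GET_DESCRIPTOR = 0x06
DT_DEVICE = 0x01

# bmRequestType bit fields
DIR_IN = 0x80             # data flows from the device to the host
TYPE_STANDARD = 0x00      # standard request
RECIPIENT_DEVICE = 0x00   # addressed to the device as a whole

def build_setup_packet(request_type, request, value, index, length):
    """Pack the 8-byte setup packet that starts every control transfer."""
    # Layout: bmRequestType, bRequest, wValue, wIndex, wLength (little-endian)
    return struct.pack("<BBHHH", request_type, request, value, index, length)

setup = build_setup_packet(
    DIR_IN | TYPE_STANDARD | RECIPIENT_DEVICE,
    GET_DESCRIPTOR,
    (DT_DEVICE << 8) | 0,   # descriptor type in the high byte, index in the low byte
    0x0000,                 # wIndex, unused for the device descriptor
    18,                     # wLength: the device descriptor is 18 bytes long
)
print(setup.hex(" "))
```

The arguments map one to one onto the parameters of `libusb_control_transfer`, which is why that function takes a request type, a request, a value and an index.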

Instead of a GET_STATUS request, we now send a GET_DESCRIPTOR request:

const auto result = libusb_control_transfer(
    handle,
    uint8_t(LIBUSB_ENDPOINT_IN)      | // Ask for data from the device...
        LIBUSB_RECIPIENT_DEVICE      | //   about the device as a whole...
        LIBUSB_REQUEST_TYPE_STANDARD,  //   using a standard request.
    LIBUSB_REQUEST_GET_DESCRIPTOR,     // Send a GET_DESCRIPTOR request
    (LIBUSB_DT_DEVICE << 8) | 0,       // Request the 0th Device Descriptor
    0x00,                              // Language ID, can be ignored here
    data.data(), data.size(),          // Buffer to read the data into
    1000                               // 1000ms timeout
);

This now instead returns the following data:

$ ./libusb_enumerate
Addr  00 01 02 03 04 05 06 07  08 09 0A 0B 0C 0D 0E 0F

0000: 12 01 00 02 00 00 00 40  D1 18 E0 4E 99 99 01 02 
0010: 00 01

Now to decode this data, we need to look at the USB specification in Chapter 9.6.1 (Device). There we can find that the format looks as follows:

struct DeviceDescriptor {
    u8  bLength;
    u8  bDescriptorType;
    u16 bcdUSB;
    u8  bDeviceClass;
    u8  bDeviceSubClass;
    u8  bDeviceProtocol;
    u8  bMaxPacketSize0;
    u16 idVendor;
    u16 idProduct;
    u16 bcdDevice;
    u8  iManufacturer;
    u8  iProduct;
    u8  iSerialNumber;
    u8  bNumConfigurations;
};

Throwing the data into ImHex and giving its Pattern Language this structure definition yields the following result:

And there we have it! idVendor and idProduct correspond to the values we found previously using lsusb.
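If you don’t have ImHex at hand, the same decoding can be sketched in a few lines of Python using the `struct` module. The field list follows the 18-byte layout from the USB 2.0 specification (including `bcdDevice`, which sits between `idProduct` and `iManufacturer`); the variable names are just for illustration.

```python
import struct

# The 18 bytes returned by the GET_DESCRIPTOR request above
raw = bytes.fromhex(
    "12 01 00 02 00 00 00 40 D1 18 E0 4E 99 99 01 02 00 01"
)

FIELDS = (
    "bLength", "bDescriptorType", "bcdUSB", "bDeviceClass", "bDeviceSubClass",
    "bDeviceProtocol", "bMaxPacketSize0", "idVendor", "idProduct", "bcdDevice",
    "iManufacturer", "iProduct", "iSerialNumber", "bNumConfigurations",
)

# "<" = little-endian without padding, B = u8, H = u16
descriptor = dict(zip(FIELDS, struct.unpack("<BBHBBBBHHHBBBB", raw)))

print(f"{descriptor['idVendor']:04x}:{descriptor['idProduct']:04x}")
print(f"bMaxPacketSize0 = {descriptor['bMaxPacketSize0']}")
```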

There’s more than just the device descriptor though. There’s also Configuration, Interface, Endpoint, String and a couple of other descriptors. These can all be read using the same GET_DESCRIPTOR request on the control endpoint. We could still do this all by hand but luckily for us, lsusb has an option that can do that for us already!

$ lsusb -d 18d1:4ee0 -v

Bus 001 Device 012: ID 18d1:4ee0 Google Inc. Nexus/Pixel Device (fastboot)
Negotiated speed: High Speed (480Mbps)
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 [unknown]
  bDeviceSubClass         0 [unknown]
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x18d1 Google Inc.
  idProduct          0x4ee0 Nexus/Pixel Device (fastboot)
  bcdDevice           99.99
  iManufacturer           1 Synaptics
  iProduct                2 USB download gadget
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x0020
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          2 USB download gadget
    bmAttributes         0xc0
      Self Powered
    MaxPower                2mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass     66 [unknown]
      bInterfaceProtocol      3
      iInterface              3 Android Fastboot
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            0 [unknown]
  bDeviceSubClass         0 [unknown]
  bDeviceProtocol         0
  bMaxPacketSize0        64
  bNumConfigurations      1
Device Status:     0x0001
  Self Powered

This output shows us a few more of the descriptors the device has. Specifically, it has a single Configuration Descriptor that contains an Interface Descriptor for the Android Fastboot interface. And that interface now contains two Endpoints. This is where the device tells the host about all the other endpoints, besides the Control endpoint, and these will be the ones we’ll be using in the next step to actually finally send data to the device’s Fastboot interface!

Let’s talk a bit more about endpoints first though. We already learned about the Control endpoint on address 0x00. Endpoints are basically the equivalent to ports that a device on the network opened for us to send data back and forth. The device specifies in its descriptor which kind of endpoints it has and then services these in its firmware. So we don’t even need to do port scanning or know that SSH just runs on port 22 usually, we have a nice way of finding out what interfaces the device has, what language they speak and how we can speak to them. Looking at the descriptors above, the Control endpoint is not listed there though. Instead, there are two others of a different type.

There’s exactly one Control endpoint per device and it’s always fixed on Endpoint Address 0x00. It’s what is used to do initial configuration and request information about the device.

The main purpose of the Control endpoint is to solve the chicken-and-egg problem where you couldn’t communicate with a device without knowing its endpoints but to know its endpoints you’d need to communicate with it. That’s also why it doesn’t even appear in the descriptors. It’s not part of any interface but the device itself. And we know about its existence thanks to the spec, without it having to be advertised.

It’s made for setting simple configuration values or requesting small amounts of data. The function in libusb doesn’t even allow you to set the endpoint address to make a control request to because there’s only ever one control endpoint and it’s always on address 0x00.

Bulk Endpoints are what’s used when you want to transfer larger amounts of data. They’re used when you have large amounts of non-time-sensitive data that you just want to send over the wire.
This is what’s used for things like the Mass Storage Class, CDC-ACM (Serial Port over USB) and RNDIS (Ethernet over USB).

One detail: Data sent over Bulk endpoints is high bandwidth but low priority. This means, Bulk data will always just fill up the remaining bandwidth. Any Interrupt and Isochronous transfers (further detail below) have a higher priority so if you’re sending both Bulk and Isochronous data over the same connection, the bandwidth of the Bulk transmission will be lowered until the Isochronous one can transmit its data in the requested timeframe.

Interrupt Endpoints are the opposite of Bulk Endpoints. They allow you to send small amounts of data with very low latency. For example, Keyboards and Mice use this transfer type under the HID Class to poll for button presses 1000+ times per second. If no button was pressed, the transfer fails immediately without sending back a full failure message (only a NAK). Only when something actually changed will you get back a description of what happened.

The important fact here is, even though these are called interrupt endpoints, there’s no interrupts happening. The Device still does not talk to the Host without being asked. The Host just polls so frequently that it acts as if it’s an interrupt.
The functions in libusb that handle interrupt transfers also abstract this behaviour away further. You can start an interrupt transfer and the function will block until the device sends back a full response.

Isochronous Endpoints are somewhat special. They’re used for bigger amounts of data that are really timing critical. They’re mainly used for streaming interfaces such as Audio or Video where any latency or delay will be immediately noticeable through stuttering or desyncs. In libusb, these work asynchronously. You can set up multiple transfers at once and they will be queued and you’ll get back an event once data has arrived so you can process it and queue further requests.
This type is generally not used very often outside of the Audio and Video classes.

Besides the Transfer Type, endpoints also have a direction. Keep in mind, USB is a full master-slave oriented interface. The Host is the only one ever making any requests and the Device will never answer unless addressed by the Host. This means, the device cannot actually send any data directly to the Host. Instead the Host needs to ask the Device to please send the data over.

This is what the direction is for.

IN endpoints are for when the Host wants to receive some data. It makes a request on an IN endpoint and waits for the device to respond back with the data.
OUT endpoints are for when the Host wants to transmit some data. It makes a request on an OUT endpoint and then immediately transfers the data it wants to send over. The Device in this case only acknowledges (ACK) that it received the data but won’t send any additional data back.

The way I remember the directions is using that master-slave analogy. The master is very self-centered and always refers to everything from its perspective.

- `IN`: I want to get data in
- `OUT`: I want to send data out

Contrary to the transfer type, the direction is encoded in the endpoint address instead. If the topmost bit (MSB) is set to 1, it’s an IN endpoint, if it’s set to 0 it’s an OUT endpoint. (If you’re into Hardware, you might recognize this same concept from the I2C interface.)

That means two things:

- You can have a maximum of $2 \cdot (2^4 - 1) = 30$ custom endpoints available at once
  - $2^4$ because only the lower 4 bits of the address hold the endpoint number (the 3 bits between them and the direction bit are reserved in USB 2.0)
  - $-1$ because we always have the control endpoint that’s on the fixed address 0x00
  - $2 \cdot$ because every endpoint number can exist once as an IN and once as an OUT endpoint
- Endpoints are entirely unidirectional. Either you’re using an endpoint to request data or to transmit data, it cannot do both at once
  - That’s also the reason why our Fastboot interface has two Bulk endpoints: one is dedicated to listening to requests the Host sends over and the other one is for responding to those same requests
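To tie this back to the `lsusb` output from earlier, here is a minimal Python sketch that decodes the `bEndpointAddress` and `bmAttributes` fields of an Endpoint Descriptor. According to the USB 2.0 specification, bit 7 of the address is the direction, bits 0 to 3 are the endpoint number and the lowest two bits of `bmAttributes` select the transfer type. The function name `decode_endpoint` is made up for this example.

```python
# Transfer types in the order of their 2-bit encoding in bmAttributes
TRANSFER_TYPES = ("Control", "Isochronous", "Bulk", "Interrupt")

def decode_endpoint(address, attributes):
    """Return (endpoint number, direction, transfer type) of an endpoint."""
    direction = "IN" if address & 0x80 else "OUT"   # topmost bit
    number = address & 0x0F                         # lower 4 bits
    transfer_type = TRANSFER_TYPES[attributes & 0x03]
    return number, direction, transfer_type

# The two endpoints of the Fastboot interface as reported by lsusb
for address, attributes in ((0x81, 2), (0x02, 2)):
    number, direction, transfer_type = decode_endpoint(address, attributes)
    print(f"0x{address:02x}: EP {number} {direction}, {transfer_type}")
```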

Now that we have all this information about USB, let’s look into the Fastboot protocol. The best documentation for this is the u-boot Source Code as well as its Documentation.

According to the documentation, the protocol really is incredibly simple. The Host sends a string command and the device responds with a 4 character status code followed by some data.

Host:    "getvar:version"        request version variable

Client:  "OKAY0.4"               return version "0.4"

Host:    "getvar:nonexistant"    request some undefined variable

Client:  "OKAY"                  return value ""

Let’s update our code to do just that then:

// Open the device so we can communicate with it
libusb_device_handle *handle = nullptr;
libusb_open(device, &handle);

// Claim the interface to let libusb know which interface
// we're sending data to
libusb_claim_interface(handle, 0);

// Setup a 64 byte buffer for our request and response
// The documentation specifies packet sizes of 64 bytes for full-speed
// and 512 bytes for high-speed. Our command and its response are
// short, so 64 bytes are plenty here.
std::vector<uint8_t> bytes(64);

// Copy the command "getvar:version"
// to the start of the buffer
std::ranges::copy(
    "getvar:version", 
    bytes.begin()
);

// Do a Bulk transfer of that data on the OUT Endpoint 0x02
int num_bytes_transferred = 0;
libusb_bulk_transfer(
    handle,                     // Device handle
    LIBUSB_ENDPOINT_OUT | 0x02, // Endpoint OUT 0x02
    bytes.data(), bytes.size(), // Data to send
    &num_bytes_transferred,     // Number of bytes sent
    1000                        // 1000ms timeout
);

// Print the transmitted data
std::println("Request: {}", 
    std::string_view(
        reinterpret_cast<const char *>(bytes.data()),
        num_bytes_transferred
    )
);

// Clear the buffer
std::ranges::fill(bytes, 0x00);
num_bytes_transferred = 0;

// Do a Bulk transfer on the IN Endpoint 0x01
libusb_bulk_transfer(
    handle,                     // Device handle
    LIBUSB_ENDPOINT_IN | 0x01,  // Endpoint IN 0x81
    bytes.data(), bytes.size(), // Buffer to receive into
    &num_bytes_transferred,     // Number of bytes received
    1000                        // 1000ms timeout
);

// Print the returned characters
std::println("Response: {}", 
    std::string_view(
        reinterpret_cast<const char *>(bytes.data()),
        num_bytes_transferred
    )
);

// Release the interface again
libusb_release_interface(handle, 0);

// Close the device handle
libusb_close(handle);

Plugging the device in now prints the following message to the terminal:

$ ./libusb_enumerate
Request:  getvar:version
Response: OKAY0.4

That seems to match the documentation!
- The first 4 bytes are OKAY, specifying that the request was executed successfully
- The rest of the data after that is 0.4, which corresponds to the implemented Fastboot Version in the Documentation: v0.4
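Splitting a response like this is simple enough to sketch in a few lines of Python. The helper name `parse_fastboot_response` is hypothetical and the sketch only assumes what was described above: a 4 character status code followed by optional data.

```python
def parse_fastboot_response(packet):
    """Split a fastboot response into its 4 character status and payload."""
    if len(packet) < 4:
        raise ValueError("response is shorter than the 4 byte status code")
    status = packet[:4].decode("ascii")
    # Strip zero bytes in case the receive buffer was larger than the reply
    payload = packet[4:].rstrip(b"\x00").decode("ascii", errors="replace")
    return status, payload

status, payload = parse_fastboot_response(b"OKAY0.4")
print(status, payload)
```

A real driver would branch on the status code at this point and treat anything other than OKAY according to the protocol documentation.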

And that’s it! You successfully made your first USB driver from scratch without ever touching the Kernel.

All these same principles apply to all USB drivers out there. The underlying protocol may be significantly more complex than the fastboot protocol (I was pulling my hair out before over the atrocity that the MTP protocol is) but everything around it stays identical. Not much more complex than TCP over sockets, is it? :)

Thermal Printer BLE Protocol reverse engineering

WerWolv — 2021-04-30T00:00:00.000Z


Thermal Printers are amazing for quickly printing out notes and todo lists since all they need is paper and some power. Printing is done by heating up the paper to color it black without needing any sort of ink that can run out. Unfortunately the model I got here only supports Bluetooth and the only official way to talk to it is through the horrible iPrint app.

Two weeks ago I ordered this thermal printer from AliExpress hoping it would be the same as the one a friend got a while ago. Their printer has an STM32 MCU on it as well as a super nicely labeled UART header. Unfortunately for me, mine has some weird ass controller with no information online whatsoever and no UART or USB support. Great.


#	Description
A	LDO to generate lower voltages required to drive for example the MCU
B	H-Bridge which controls the stepper motor of the printer
C	Step-up 2-cell Lithium Battery charger IC
D	Weird ass MCU apparently nobody has ever heard of. It has an integrated bluetooth PHY

The only thing it does have is Bluetooth, or rather BLE. And the only way to talk to it is through a proprietary, shady, Chinese app you have to download from some apk mirroring site because it’s not even on the Play Store (update from the future, it now is). It’s called iPrint and has some basic functions to let you print out photos, text and some weird built-in frames and images.

But it works pretty well :)


The main problem though is, having to use my Phone to print those notes isn’t really all that great. Being able to quickly generate notes on the computer and printing them out would be so much more useful! Unfortunately nobody’s done anything like this for that printer already so I guess I have to do it.

Since the App is the only thing that really has the ability to talk to the printer, let’s start there. There’s probably better ways to do this but I simply googled for apk decompiler, clicked on the first link and used that. It’s a site called javadecompilers.com where I uploaded the iprint apk I downloaded earlier and after a few minutes the fully decompiled project was ready to be downloaded.

At first I was shocked, the decompiled code was almost 200MB with 14k files but looking into the project a bit I noticed, most of them are just libraries they bundled into the app. Looking around a bit I found a promising looking file called com.blueUtils.PrintDataUtils.java. In there are multiple functions that format provided input data into a byte array of “cmds” that will end up being sent to the printer over BLE.

Let’s first take a look at a function named public byte[] eachLinePixToCmdB(byte[] bArr, int i, int i2). The decompilation isn’t that great but there are multiple similar looking sections in there like this one:

LogUtils.m1960e(Integer.valueOf(getEneragy()));
bArr2 = new byte[((i7 * length) + BluetoothOrder.print_text.length + 9 + 10)];
byte[] bArr5 = new byte[10];
bArr5[0] = 81;
bArr5[1] = 120;
bArr5[2] = -81;
bArr5[3] = 0;
bArr5[4] = 2;
bArr5[5] = 0;
bArr5[6] = ConvertUtils.hexString2Bytes(Integer.toHexString(getEneragy()))[1];
bArr5[7] = ConvertUtils.hexString2Bytes(Integer.toHexString(getEneragy()))[0];
bArr5[8] = BluetoothOrder.calcCrc8(bArr5, 6, 2);
bArr5[9] = -1;
System.arraycopy(bArr5, 0, bArr2, 0, bArr5.length);
this.packageLength += 10;

(There’s grammatical and logical errors in the naming of functions everywhere. The developers absolutely weren’t native English speakers.)

Since Java doesn’t really know unsigned variables, there’s some negative values in there. However they can simply be converted to unsigned representation by using the 2’s complement rules. Comparing all the sections that look like this, I concluded that the command protocol must look something like this:

Magic0:         0x51
Magic1:         0x78
CommandID:      0x00 - 0xFF
AlwaysZero0:    0x00
Data Size:      0x00 - 0xFF
AlwaysZero1:    0x00
Data:           [ Array of bytes with the length provided before, Big Endian ]
DataCRC8:       0x00 - 0xFF
Magic4:         0xFF
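Here is a minimal Python sketch of how the signed Java bytes map onto this layout. The first six bytes and the trailing -1 are taken from the decompiled snippet above; the two payload bytes and the checksum byte in the example are placeholders, since the real values depend on `getEneragy()` at runtime. The names `to_unsigned` and `parse_packet` are made up for this example.

```python
def to_unsigned(value):
    """Map a signed Java byte (-128..127) to its unsigned value (0..255)."""
    return value & 0xFF

def parse_packet(packet):
    """Split a command packet into its fields following the layout above."""
    packet = [to_unsigned(b) for b in packet]
    if len(packet) < 8 or packet[0:2] != [0x51, 0x78] or packet[-1] != 0xFF:
        raise ValueError("missing magic bytes")
    size = packet[4]
    if len(packet) != 6 + size + 2:
        raise ValueError("data size does not match packet length")
    return {
        "command": packet[2],
        "data": packet[6:6 + size],
        "crc8": packet[6 + size],
    }

# Header and end marker from the Java code; payload (0x12, 0x34) and
# checksum (0x00) are placeholders and NOT real values.
fields = parse_packet([81, 120, -81, 0, 2, 0, 0x12, 0x34, 0x00, -1])
print(f"command=0x{fields['command']:02X} data={fields['data']}")
```

The -81 from the Java code turns into 0xAF this way, and the -1 at the end into 0xFF.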

Of course, after I reverse engineered all of this manually, I randomly stumbled over BluetoothOrder.java.

It contains a list of hardcoded commands used by the app as well as the CRC8 lookup table used for the calculation. The table is pretty much the default one though. With this file I could conclude the following list of command IDs:

RetractPaper:     0xA0
FeedPaper:        0xA1
DrawBitmap:       0xA2
SetDrawingMode:   0xBE
SetEnergyLevel:   0xAF
SetQuality:       0xA4

Perfect, everything that’s needed to start talking to the printer!

It took me a while to find a library (and language) that supported talking to BLE on Windows but ultimately I ended up using Python with the Bleak library. Looking at their example suggested I need to provide the device’s Bluetooth MAC address as well as some UUID. I ended up reading through some of the BLE specs and found out that this UUID was an identifier for a so-called Characteristic the printer provides. It’s basically like specifying what port to send the data to for the printer to properly receive it.

Finding the device address was pretty simple: I downloaded some Bluetooth analysis app from the Play Store and the MAC address was displayed there right away. But how on earth do I find the characteristic UUID? My first thought was to look at the app again since it needs to be hardcoded there somewhere. Searching for UUIDs in general yielded a lot of results so I quickly stopped there. Instead I downloaded a free app from the Microsoft Store (lol) called Bluetooth LE Lab which was amazingly helpful for this. Selecting the device brings you to this screen which displays all Services and Characteristics of the device.

There’s two WriteWithoutResponse, two Notify, one Indicate and one Read, Write characteristic. Since they have commands in their app that read data from the printer, the only one that really worked was the Read, Write one: 0000AE10-0000-1000-8000-00805F9B34FB. The app even allowed me to directly send data to the device. I converted the paper command array found in the app to hexadecimal and 🎉, paper was ejected from the printer!

Now that I knew that the commands worked, I started to reimplement it in Python. It took a while to get the whole thing installed on Windows and to get it to talk to the computer’s Bluetooth module but in the end I got the same command working from Python.

import asyncio

from bleak import BleakClient, BleakScanner

crc8_table = [
    0x00, 0x07, 0x0e, 0x09, ... # Rest of array
]

def crc8(data):
    crc = 0
    for byte in data:
        crc = crc8_table[(crc ^ byte) & 0xFF]
    return crc & 0xFF

def formatMessage(command, data):
    # Magic bytes, command ID, zero byte, data size, zero byte, data, CRC8 of the data, end marker
    message = [0x51, 0x78, command, 0x00, len(data), 0x00] + data + [crc8(data), 0xFF]
    # Bleak expects a bytes-like object rather than a list of ints
    return bytearray(message)

PrinterAddress = "93:2A:BB:C4:95:8D"
PrinterCharacteristic = "0000AE01-0000-1000-8000-00805F9B34FB"

FeedPaper = 0xA1

async def feedPaper():
    # Look for the printer and connect to it once it has been found
    device = await BleakScanner.find_device_by_address(PrinterAddress, timeout=20.0)
    async with BleakClient(device) as client:
        await client.write_gatt_char(PrinterCharacteristic, formatMessage(FeedPaper, [0x70, 0x00]))

asyncio.run(feedPaper())
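The lookup table is elided in the listing above. Its first four entries (0x00, 0x07, 0x0e, 0x09) match the table produced by the common CRC-8 polynomial 0x07, so assuming the rest of the app’s table is standard as well (which this sketch does not verify against the full table), it could be generated instead of copied. The function name `make_crc8_table` is made up for this example.

```python
def make_crc8_table(polynomial=0x07):
    """Generate the 256 entry lookup table for an MSB-first CRC-8."""
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            # Shift left and fold in the polynomial whenever a 1 falls out
            if crc & 0x80:
                crc = ((crc << 1) ^ polynomial) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
        table.append(crc)
    return table

crc8_table = make_crc8_table()

def crc8(data):
    """Table driven CRC-8 with an initial value of 0, as in the listing above."""
    crc = 0
    for byte in data:
        crc = crc8_table[(crc ^ byte) & 0xFF]
    return crc

print([hex(entry) for entry in crc8_table[:4]])
```

If the printer rejects packets built with this table, the app’s table differs from the standard one somewhere and should be copied from BluetoothOrder.java instead.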

Looking through the App once again, I noticed this printer doesn’t really support printing text directly. Instead, the app renders whatever the user enters using HTML, turns that into a bitmap and then sends that bitmap to the printer. The DrawBitmap command 0xA2 takes an array of bytes where each bit represents one pixel in the image. If the bit is a 1, the printer will burn the paper at that position, if it’s a 0 it won’t. I found this out by simply sending some patterns to the printer: 0xFF led to an opaque line, 0xAA led to every second pixel being drawn.

Knowing that, I wrote a quick function that loaded in an image using the PIL library and turned it into a byte array line-by-line. To print a full image now though, we need to print multiple lines. This is done by drawing a single line and then advancing one step using the FeedPaper command 0xA1 and so on. After many attempts I finally managed to get it right.
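The packing step can be sketched without PIL by starting from a plain list of 0/1 pixels for one line. Note that the bit order within each byte is an assumption here (first pixel in the lowest bit); the text above doesn’t state it, so it may need to be flipped for the real printer. The function name `pack_row` is made up for this example.

```python
def pack_row(pixels):
    """Pack a row of 0/1 pixels into bytes, eight pixels per byte."""
    row = bytearray((len(pixels) + 7) // 8)
    for x, pixel in enumerate(pixels):
        if pixel:
            # Assumption: the first pixel lands in the lowest bit of each byte
            row[x // 8] |= 1 << (x % 8)
    return bytes(row)

# A fully black stretch of 8 pixels and one with every second pixel set
print(pack_row([1] * 8).hex())
print(pack_row([1, 0] * 4).hex())
```

Each packed row would then be sent as the data of one DrawBitmap command, followed by a FeedPaper command to advance to the next line.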

The whole project took me about 6 hours spread out over the course of two days. When I first got the printer and took it apart I was really disappointed to not find any UART or USB interface but now I’m really happy I had to use BLE for it. It made the whole thing really fun and easy to use now since it doesn’t need any extra hardware (besides a computer with Bluetooth support). Finally though, the result. Fucking worth it.


I published all my example code here on my GitHub:

WerWolv

PythonCatPrinter

There’s still a ton to do though. Printing right now is really slow because sending data too fast causes the Printer to jam up and refuse to do anything until it’s restarted. The App does have some sort of compression for the data but I did not yet manage to figure out how it works. The next step probably will be to get text rendering and printing to work. I’ll update this blog once I know more.

Reverse Engineering the Surface Book 2's proprietary IOCTL commands

WerWolv — 2020-07-30T00:00:00.000Z

import Figure from ’@/Figure’;

The Surface Book 2 is one of Microsoft’s self-made notebooks. What makes it different from other laptops is its deep integration of the drawing pen into Windows and the ability to detach the entire screen from its base by pressing a button on the keyboard or using their pre-installed SurfaceDTX tool.

Since Windows and many other operating systems run on a ton of different hardware, it’s impossible to bundle support for every device directly into the kernel. However, userspace programs may still want to communicate with hardware installed in the computer. Instead of adding custom system calls for every device ever built, most OSes support loading kernel extensions at runtime (kernel modules on Linux, device drivers on Windows) together with a unified way to communicate with these extensions, called ioctl.

The reason ioctl and device drivers are necessary in the first place is security. On startup all hardware devices found on, or connected to, the computer’s mainboard are mapped into the kernel’s address space and have to be controlled from there using extensions that live in the kernel’s address space as well. The kernel’s address space cannot be directly accessed by userspace applications, so the kernel may allow access to certain devices through syscalls while denying access to others.

The greatness of ioctl comes from its simplicity. On Windows a single syscall called NtDeviceIoControlFile is used, together with its wrapper function DeviceIoControl. It takes the following arguments:

A HANDLE to the device, usually obtained by using the NtCreateFile syscall
A control code describing the operation the device driver should execute. These codes consist of several fields:
- Device type (CD-ROM, mouse, network, printer, etc.). Values of 0x8000 or greater are vendor-specific devices (for example the latch mechanism on the Surface Book 2)
- Function code describing the command the device driver should execute. These are arbitrary values defined in the individual drivers.
- Transfer type specifying if the data gets buffered or not
- Required access for operations, either read-only, write-only or read-write
A pointer to an in-buffer sent to the device driver
The size of the in-buffer
A pointer to an out-buffer which will be filled with data returned from the driver
The size of the out-buffer
A pointer to a uint32_t where the received data size will be written to
An optional pointer to an OVERLAPPED struct for async operations

When calling DeviceIoControl, a syscall handler in the kernel gets called. That handler uses the passed device handle to find the right device driver to be called. The in-buffer then gets copied from user space into kernel space and the driver’s DEVICE_CONTROL callback gets called with the control code and pointers to the in- and out-buffers. The control code is used to figure out what operation should be executed, the in-buffer is used for parameters and the out-buffer for possible returned values.

To find out how the latch driver works, there are two possible approaches. Either we reverse engineer the device driver directly and analyze the DEVICE_CONTROL callback or we use the already implemented latch control tool Microsoft built to find the correct driver and control codes.

I decided to go for the latter since the tool was trivial to find (by simply looking at the task manager) and even better, it was written in C# and contained full symbol information. To analyze the .NET application, I used JetBrains dotPeek. Simply looking through the different namespaces in dotPeek quickly made me discover a promising class called DriverLatch.cs.

Conveniently, at the very start of the file, the latch interface GUID and all the different control codes were specified.

private static readonly Guid g_latchInterfaceId = new Guid("f49e75f6-f869-4346-9eb8-ded248275916");

private static readonly IOControlCode g_latchCommandIoctl = new IOControlCode((ushort) 32768, (ushort) 2065, (IOControlAccessMode) 2, (IOControlBufferingMethod) 0);
private static readonly IOControlCode g_latchChangedIoctl = new IOControlCode((ushort) 32768, (ushort) 2066, (IOControlAccessMode) 1, (IOControlBufferingMethod) 0);
private static readonly IOControlCode g_latchStatusIoctl = new IOControlCode((ushort) 32768, (ushort) 2064, (IOControlAccessMode) 1, (IOControlBufferingMethod) 0);
private static readonly IOControlCode g_detachChangedIoctl = new IOControlCode((ushort) 32768, (ushort) 2067, (IOControlAccessMode) 1, (IOControlBufferingMethod) 0);
private static readonly IOControlCode g_detachStateIoctl = new IOControlCode((ushort) 32768, (ushort) 2068, (IOControlAccessMode) 1, (IOControlBufferingMethod) 0);

The only interesting one here is g_latchCommandIoctl though, since the *ChangedIoctl control codes are callbacks and the *StatusIoctl and *StateIoctl control codes are there to query information about the current latch state.
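To see how the four constructor arguments end up in a single 32 bit control code, here is a small sketch that mirrors the bit layout of the Windows SDK’s CTL_CODE macro. The function names are made up for illustration. Note that the C# constructor above takes the access mode before the buffering method, while CTL_CODE takes them the other way round.

```python
METHOD_BUFFERED = 0
FILE_READ_ACCESS = 1
FILE_WRITE_ACCESS = 2


def ctl_code(device_type, function, method, access):
    """Combine the four fields the same way the CTL_CODE macro does."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


def split_ctl_code(code):
    """Split a 32 bit control code back into its four fields."""
    return {
        "device_type": (code >> 16) & 0xFFFF,
        "access": (code >> 14) & 0x3,
        "function": (code >> 2) & 0xFFF,
        "method": code & 0x3,
    }


# g_latchCommandIoctl: device type 32768 (0x8000), function 2065, write access, buffered
latch_command = ctl_code(0x8000, 2065, METHOD_BUFFERED, FILE_WRITE_ACCESS)
print(hex(latch_command))  # 0x8000a044
```

Going the other way with `split_ctl_code` is handy when a control code only shows up as a raw number, for example in a debugger or an API monitor.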

Looking further through the class led to a method conveniently named void OpenLatch(uint cancelAfterMs). It does exactly what the name implies: it sends an ioctl command through the .NET Windows API, opening the latch. It does not return any values but it takes a struct of data as its input buffer:

private enum LatchCommandType
{
    Invalid,
    Open,
    Close_DEPRECATED,
    ButtonPress,
    Cancel,
    MaximumValue,
}

[StructLayout(LayoutKind.Sequential, Pack = 1)]
private struct LatchCommandInArgs
{
    public DriverLatch.LatchCommandType LatchCommand;
    public uint TimeoutMs;
}

Again, very conveniently labeled :)
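Because the struct is declared with Pack = 1 and sequential layout, the in-buffer is just two 32 bit values back to back. The following sketch shows the resulting bytes; it assumes the enum uses C#’s default 32 bit underlying type and little-endian byte order, and the helper name is made up.

```python
import struct

# Position of ButtonPress in LatchCommandType, counting from Invalid = 0
LATCH_COMMAND_BUTTON_PRESS = 3


def build_latch_command(command, timeout_ms):
    """Serialize LatchCommandInArgs: two little-endian 32 bit fields, no padding."""
    return struct.pack("<II", command, timeout_ms)


in_buffer = build_latch_command(LATCH_COMMAND_BUTTON_PRESS, 5000)
print(in_buffer.hex())  # 0300000088130000
```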

In order to send data to the driver, a handle is required, which is returned by the NtCreateFile syscall. The issue, though, is how to get the device path. I couldn’t figure this out at first, and Microsoft’s documentation didn’t help much either. The way I came up with is sadly not super great but it did the trick. The path always contains the GUID found previously in the source code, and for the .NET tool to communicate with the driver it needs to have the full path in memory somewhere. So why not use Cheat Engine’s string search tool to search for the GUID string in memory and look around a bit to find the rest of the string? Important to note: since this is a .NET application, all strings are stored in UTF-16. After some fiddling around, this is what turned up:

Or in plain text: \\?\ACPI#MSHW0133#2&daba3ff&1#{f49e75f6-f869-4346-9eb8-ded248275916}
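The UTF-16 detail matters because a plain ASCII search for the GUID will not match anything in the process memory. A minimal sketch of the same kind of search over a raw memory dump could look like this; the dump here is a made-up stand-in, a real one would have to come from the target process.

```python
def find_utf16(haystack, text):
    """Return every offset at which `text` occurs as UTF-16LE in `haystack`."""
    needle = text.encode("utf-16-le")
    offsets = []
    position = haystack.find(needle)
    while position != -1:
        offsets.append(position)
        position = haystack.find(needle, position + 1)
    return offsets


GUID = "f49e75f6-f869-4346-9eb8-ded248275916"

# Stand-in for a memory dump: some padding around a UTF-16 encoded path fragment
dump = b"\x00" * 16 + ("#{" + GUID + "}").encode("utf-16-le") + b"\x00" * 16
hits = find_utf16(dump, GUID)
```

Once a hit is found, the bytes before it can be decoded as UTF-16 as well to recover the rest of the path.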

To finish off, I wanted to write a program in C/C++ which simply unlocks the latch when executed. Having all the information required from the binary, this was rather trivial:

#include <windows.h>
#include <cstdint>

enum class LatchCommandType : std::uint32_t {
    Invalid,
    Open,
    Close_DEPRECATED,
    ButtonPress,
    Cancel,
    MaximumValue
};

struct [[gnu::packed]] LatchCommandInArgs {
    LatchCommandType LatchCommand;
    std::uint32_t TimeoutMs;
};

// Latch command ioctl control code
constexpr DWORD latchCommandIoctl = CTL_CODE(0x8000, 2065, METHOD_BUFFERED, FILE_WRITE_ACCESS);

int main() {
    // Open a handle to the latch device driver
    HANDLE ioctlLatchFile = CreateFileW (
        L"\\\\?\\ACPI#MSHW0133#2&daba3ff&1#{f49e75f6-f869-4346-9eb8-ded248275916}",
        GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        nullptr,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL,
        nullptr
    );
    if (ioctlLatchFile == INVALID_HANDLE_VALUE)
        return 1; // The latch device could not be opened

    // Specify the device driver arguments sent through the in-buffer
    LatchCommandInArgs args = { .LatchCommand = LatchCommandType::ButtonPress, .TimeoutMs = 5000 };
    DWORD readSize = 0;

    // Make the ioctl call, opening the latch
    const BOOL success = DeviceIoControl(ioctlLatchFile, latchCommandIoctl, &args, sizeof(LatchCommandInArgs), nullptr, 0, &readSize, nullptr);

    CloseHandle(ioctlLatchFile);

    return success ? 0 : 1;
}

An open-source implementation of SurfaceDTX written in C# can be found on my GitHub repository:

WerWolv

SurfaceAlwaysDTX

Getting a home-made OS running on a STM32MP1 based development board

WerWolv — 2020-07-24T00:00:00.000Z

import Steps from ’@/Steps’; import PdfDocument from ’@/PdfDocument’; import Figure from ’@/Figure’;

The STM32MP157C-DK2 is one of the latest dev boards by STMicroelectronics. It features an STM32MP1 SoC with two ARM Cortex-A7 cores and one Cortex-M4 co-processor core. The intended way of using this SoC according to STM is to run their custom, pre-built Linux distribution together with their toolchain, SDK and proprietary flash tools. Their wiki does a great job at not telling you a lot of important details because STM’s Linux distro “takes care of that”. This post however deals with all the dirty details of how the entire boot process works and how to bring the DK2 board into a state where a custom kernel can be loaded.

import stpmic1_datasheet from ’./assets/stpmic1.pdf’; import board_schematics from ’./assets/p574376-en.MB1272-DK2-C01_Schematic.pdf’;

When the board gets plugged in, the first thing that happens is the STPMIC1 power management IC initializing its output voltages to the default settings. According to its Datasheet and the Board Schematics, this means the core gets powered with 1.2V, the DDR RAM with 1.1V and the rest of the SoC peripherals with 3.3V.

Once the voltage has stabilized, the Boot ROM starts running and determines where to boot from by reading either the BOOT0 and BOOT1 pins or a value burnt into the OTP eFuses. Possible boot sources are NAND and NOR flash, eMMC, Serial and SD cards. The DK2 uses the SD card by default as no other sources can be found on the board.

The Boot ROM now tries to boot from the SD card. The SD card must contain a GPT containing two partitions named fsbl1 and fsbl2. These are two, hopefully identical, copies of the First Stage bootloader the Boot ROM will later execute.

For the Boot ROM to recognize and load the FSBL (or in fact any binary), a special format is used. It is described as the STM32 header for binary files and consists of a 256-byte header followed by the binary data to load.

The following code is a Python script which generates a correct header for any given FSBL binary:

import struct

# payload holds the raw bytes of the FSBL binary (a bytes object)
header = struct.pack("<4sQQQQQQQQIIIIIIIIIIQQQQQQQQ83xb",
  # Header magic
  b"STM\x32",
  # ECDSA signature, unsigned here
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  # Checksum of payload, sum of all bytes truncated to 32 bits
  sum(payload) & 0xFFFFFFFF,
  # Header version 1.0
  0x00010000,
  # Length of payload
  len(payload),
  # Entrypoint address. SYSRAM + 0x2400 (BROM data) + 0x100 (header)
  0x2FFC0000 + 0x2400 + 0x100,
  # Reserved
  0x00,
  # Load address of image, unused
  0x2FFC0000 + 0x2400 + 0x100,
  # Reserved
  0x00,
  # Image version
  0x00,
  # Option flags, disable signature verification
  0x01,
  # ECDSA algorithm set to P-256 NIST, unused
  0x01,
  # ECDSA public key, unused here
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  # Binary type: U-Boot
  0x00
)

Some information about the header:

Since the SoC is not in production mode currently, the ECDSA signature is optional and not used here (bit 0 in options flag set)
The checksum is calculated by summing up all bytes in the payload and truncating the result to 32 bits (the sum modulo 2^32, i.e. masked with 0xFFFF'FFFF)
The binary always gets loaded to address 0x2FFC'2400, no matter what load address was specified in the header.
- The STM32MP1’s so called SYSRAM starts at address 0x2FFC'0000
- The Boot ROM will use the first 0x2400 bytes of the SYSRAM for its own data segment. Besides the Boot ROM’s execution data, it also stores multiple structs in there containing various boot information such as boot device, retries, etc.
The entry point value is the 32 bit address the Boot ROM will jump to if validation succeeded.
- For this binary, execution should start at the beginning of the .text segment which is located directly after the header. Therefore the entry point is at 0x2FFC0000 + 0x2400 + 0x100 (addressof(SYSRAM) + sizeof(.data_BootROM) + sizeof(STM32Header))
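The checks from this list can be turned around into a small sanity check for a generated image. This is only a sketch under the assumption that the field layout matches the generator script above; it is not what the Boot ROM actually runs, and the name `check_image` is made up.

```python
import struct

# Assumed field layout, derived from the generator script above
HEADER_FORMAT = "<4s64sIIIIIIIIII64s83xB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 256 bytes


def check_image(image):
    """Return (entry_point, payload) if the image looks valid, else raise ValueError."""
    if len(image) < HEADER_SIZE:
        raise ValueError("image is shorter than the header")

    (magic, _signature, checksum, _header_version, length, entry_point,
     _reserved1, _load_address, _reserved2, _image_version, _options,
     _ecdsa_algorithm, _public_key, _binary_type) = struct.unpack_from(HEADER_FORMAT, image)

    if magic != b"STM\x32":
        raise ValueError("bad magic")

    payload = image[HEADER_SIZE:HEADER_SIZE + length]
    if len(payload) != length:
        raise ValueError("payload is truncated")
    if (sum(payload) & 0xFFFFFFFF) != checksum:
        raise ValueError("checksum mismatch")

    return entry_point, payload
```

Running this on the build server before flashing catches a wrong length or checksum without needing a trip to the SD card.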

Once the image was verified, copied and the Boot ROM jumped to the start of the FSBL’s .text section, the real fun starts. Important to note is that the Boot ROM does not have a proper ELF loader or anything. The binary is simply memcpy’d into SYSRAM. Therefore things like .bss segments will not be expanded automatically. These sections need to be expanded statically beforehand to work properly.

The only setup to do now is to update the SP to point to 0x3000'0000, the end of SYSRAM.

For the longest time of this project my testing cycle looked as follows:

Writing code

Compiling source code with `make`

Generating an MBR SD card image with `genimage`

Downloading that image from my build server

Flashing it to an SD card using balenaEtcher

Inserting the SD card into the board

Resetting the board

Hoping that one of the two LEDs used for debugging would light up

This process usually took between 2 and 3 minutes and the only way to debug things was extracting two bits of information at a time using the two on-board LEDs. This time is luckily over thanks to the amazing OpenOCD project. OpenOCD is an open source embedded debugging software providing the ability to interface with many different debugging interfaces such as JTAG and ST-Link. Additionally it integrates a gdb-server which can be used to control, program and debug the connected SoC through GDB. This ultimately allows for one click building, flashing and debugging right within VSCode:

Getting OpenOCD to run properly is really easy thanks to a bunch of pre-made scripts.

$ openocd -f $OPENOCD_SCRIPTS/board/stm32mp15x_dk2.cfg -c "gdb_flash_program enable"

The first part is the debug config for the dk2 board. When using a custom board with an MP1 on board, -f $OPENOCD_SCRIPTS/target/stm32mp15x.cfg -f $OPENOCD_SCRIPTS/interface/stlink-dap.cfg can be used instead. At the time of writing this, they are not in the latest release and require OpenOCD to be built from sources (or using the AUR package openocd-git). -c "gdb_flash_program enable" is necessary for gdb to be able to flash the executable to the board.

Using the Native Debug extension for VSCode, the following task and launch config can be used to directly load the fsbl ELF executable into SYSRAM, execute it and start debugging:

// launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug",
      "type": "gdb",
      "request": "launch",
      "cwd": "${workspaceRoot}",
      "target": "${workspaceRoot}/fsbl/build/fsbl.elf",
      "gdbpath": "/bin/arm-none-eabi-gdb",
      "preLaunchTask": "build", // Before debugging, build changes
      "autorun": [
        "target remote tcp:localhost:3333", // Connect to OpenOCD's gdb-server
        "load ./fsbl/build/fsbl.elf", // Flash fsbl.elf to target
        "file ./fsbl/build/fsbl.elf", // Load symbols from fsbl.elf
        "b main", // Set a breakpoint at the start of main()
        "j _start" // Jump to _start, the beginning of the crt0
      ]
    }
  ]
}

// tasks.json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "build",
      "type": "shell", // Run a command
      "command": "make", // Run make
      "problemMatcher": []
    }
  ]
}

All of this allows for instant loading of your changes without having to go through the SD card at all. Additionally variables can be inspected, breakpoints set, code stepped, the stack trace inspected and more. It cuts the dev cycle from 2 to 3 minutes down to 20 seconds with tons of cool extra features.

The easiest way to indicate a sign of life is always to light up an LED. The STM32MP157C-DK2 conveniently has two user LEDs on board, a blue one and an orange one. Checking the board schematics shows that they are connected to GPIO pins D11 and H7 respectively.

To set up the GPIO pins now, the following steps need to be done.

First, the GPIO clock needs to be enabled. This is done by setting the respective bit in the right RCC_MP_AXXXENSETR register. The reference manual shows that RCC_MP_AHB4ENSETR contains all the bits for the GPIOA to GPIOK clocks.

Next, the GPIO pins need to be configured. The relevant registers for this are GPIOX_MODER, GPIOX_OTYPER, GPIOX_OSPEEDR and GPIOX_PUPDR.

GPIOX_MODER contains the mode of all pins in this port. Since the pin needs to be able to drive an LED, this should be set to Output mode.

GPIOX_OTYPER contains the output type of that pin. It may be either Push-Pull or Open-Drain. Looking at the schematics once again, the LED’s anode is connected to the GPIO pin and the cathode directly to ground. This means that to light the LED up, a current needs to flow out of the GPIO pin, through the LED and into ground. This can only be achieved using the Push-Pull configuration.

GPIOX_OSPEEDR is the speed at which the pin needs to be able to respond. Higher values here will cause a higher current flow and possible reflections if the line is not properly matched. For an LED though, this all doesn’t really matter so it can be safely set to Low or Medium speed.

Finally GPIOX_PUPDR defines whether a pull-up, pull-down or no resistor at all should be used. In Push-Pull mode, this is generally unwanted and should be set to no pull-up and no pull-down.

This finishes up the configuration of the GPIO pin. To make the LEDs light up now, the bit in the GPIOX_ODR register corresponding to that pin needs to be set to pull the pin to VDD voltage. And that’s it!
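The register updates boil down to replacing a small bit field per pin. The sketch below models that arithmetic on plain integers instead of real memory-mapped registers. The field widths and encodings (two bits per pin in MODER, OSPEEDR and PUPDR, one bit per pin in OTYPER and ODR, 0b01 for output mode) are the ones commonly found on STM32 GPIO blocks and are an assumption here, so they should be checked against the reference manual. The starting values are made up.

```python
def set_field(register, pin, width, value):
    """Return `register` with the `width`-bit field belonging to `pin` set to `value`."""
    shift = pin * width
    mask = ((1 << width) - 1) << shift
    return (register & ~mask) | ((value << shift) & mask)


def configure_led_pin(regs, pin):
    """Configure `pin` as a slow push-pull output without pull resistors and drive it high.

    `regs` is a plain dict standing in for the memory-mapped GPIOx registers.
    """
    regs["MODER"] = set_field(regs["MODER"], pin, 2, 0b01)      # general purpose output
    regs["OTYPER"] = set_field(regs["OTYPER"], pin, 1, 0)       # push-pull
    regs["OSPEEDR"] = set_field(regs["OSPEEDR"], pin, 2, 0b00)  # low speed
    regs["PUPDR"] = set_field(regs["PUPDR"], pin, 2, 0b00)      # no pull-up, no pull-down
    regs["ODR"] = set_field(regs["ODR"], pin, 1, 1)             # drive the pin high
    return regs


# Hypothetical starting values; MODER is all ones here
gpiod = {"MODER": 0xFFFFFFFF, "OTYPER": 0, "OSPEEDR": 0, "PUPDR": 0, "ODR": 0}
configure_led_pin(gpiod, 11)  # pin D11
```

In the actual firmware the same read-modify-write would be done on volatile pointers to the GPIO registers, after the port’s clock has been enabled.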

Code is being executed but it’s running in a very limited environment right now. To get a kernel up and running, the things still necessary are the following:

Write an I2C driver to communicate with the STPMIC1 power management IC and increase the voltage delivered to the onboard DDR3 RAM.
Write a RAM interface driver to initialize and map the DDR3 RAM into memory.
Write an SDMMC interface driver and mount the SD card again.
Use, for example, FatFs to load an SSBL from a FAT32 partition on the SD card.
Write a proper ELF loader to load and map the SSBL into the DDR RAM.
Write an SSBL that loads a kernel image from the SD card.
Write a kernel.

This blog post and list will be updated as I go along and finish more of the boot chain. The current progress can be found here:

WerWolv

STM32MP1OS

One of the professors at my university emailed me and a friend about projects next year and asked if we wanted to work on a project including the STM32MP1, to possibly use it in the future to teach new students about low-level C, Embedded Linux and Asynchronous multiprocessing. This not only means we don’t have to worry about getting a good project next year, it also means I can continue working on the same board during my Bachelor thesis, which is absolutely amazing.

The project we ended up doing using this board was a demo application showcasing a hardware accelerated GUI running on the A7 Cores under Linux controlling a real-time, bare metal firmware running on the M4 coprocessor. The coprocessor is running an RTOS which uses a PID controller to regulate the height of a ball inside a tube using a ToF sensor to measure the ball’s height and a fan to blow the ball up.

Our professor just told us that we’re getting an A for the work we’ve done using the STM32MP157C-DK2 and has high hopes that we’ll do as well during the bachelor thesis. During the bachelor thesis we’ll be developing a modular development board for students to replace the current (old and incredibly large and heavy) development boards for the Embedded Systems, Linux and Android classes as well as the FPGA/SoC design in VHDL class. This is going to be amazing :)

import Divider from ‘packages/mdx-blog-components/src/Divider’;

These are the notes I took while reading the wiki and reverse engineering U-Boot, ST’s drivers and the DK2’s schematic. Everything in all its unfinished and messy glory. There is some more information here about U-Boot and how U-Boot finds and loads the Linux kernel; however, this has not much to do with the bare metal OS mentioned above. It is left here in the hope that it helps people understand the thought processes I went through when looking at the official SD card image, U-Boot, the reference manual and the schematics.

MBR
- First (actually fourth) partition marked as active / bootable, this is the rootfs
- CHS address : 0x001E'0D00
  - H Head: 30
  - S Sector: 13
  - C Cylinder: 0
  - LBA = (C * HPC + H) * SPT + (S - 1) = (0 * 256 + 30) * 63 + (13 - 1) = 30 * 63 + 12 = 1902
    - HPC: Heads per Cylinder = 256
    - SPT: Sectors per Track = 63
    - Block / Sector Size = 512
  - Address of first partition: 1902 * 512 = 0x000E'DC00
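The CHS arithmetic from these notes as a small sketch, using the geometry values listed above as defaults. The function name is made up.

```python
SECTOR_SIZE = 512


def chs_to_lba(cylinder, head, sector, heads_per_cylinder=256, sectors_per_track=63):
    """Convert a CHS tuple to a logical block address (sectors are 1-based)."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


lba = chs_to_lba(cylinder=0, head=30, sector=13)
byte_offset = lba * SECTOR_SIZE
print(lba, hex(byte_offset))  # 1902 0xedc00
```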
GPT
- First EFI Partition Entry called fsbl1 = "First Stage Bootloader Copy 1"
  - FirstLBA: 0x22 -> 0x22 * 512 = 0x4400 -> Start of first stage bootloader with custom STM32 header
- Second EFI Partition Entry called fsbl2 = "First Stage Bootloader Copy 2"
  - FirstLBA: 0xB7 -> 0xB7 * 512 = 0x16E00 -> Start of safety copy of the first stage bootloader
STM32 FSBL (SPL)
- Loaded into SYSRAM at address 0x2FFC'2400
- Header specifies entry point at 0x2FFC'2500 -> load address + 0x100
  - STM32 header is exactly 0x100 bytes long, so execution starts off at the beginning of the binary right after the header
- Header specifies U-boot FSBL
- STM32MP157C-DK2 Schematics
- Enables BUCK3 of the STPMIC1 Power Management IC over an I2C u-boot driver
  - This down steps the 5V from the USB-C socket and applies it to the VDD net
    - Applies VDD to PDR_ON and PDR_ON_CORE (Power On Reset Enable)
    - Powers the XTAL oscillator
    - Powers up the STM32MP1’s peripheral interfaces
    - Brings up the DDR RAM
- Chooses U-boot’s MMC loader to load the next stage of U-Boot into the DDR RAM at 0xC020'0000 ref
  - Boot target device
SSBL (U-Boot)
- Voltage Regulators
  - Sets the VDD_DDR voltage to 1.35V for DDR3 RAM, 1.25V for LPDDR RAM in 32 bit mode or 1.2V for LPDDR in 16 bit mode (BUCK2)
  - Sets VTT_DDR voltage to 1.8V (LDO3)
- Enable UART
  - Only for debug mode
  - Enables GPIOG clocks (UART4_TX on PG11)
- Checks the device specified here with id for a suitable partition
  - They specify auto as the partition so it looks for the first bootable partition and if there’s none, it falls back to the first valid partition. (reference)
  - Loads configuration from rootfs:/boot/extlinux/extlinux.conf found here
    - Added here
    - Specified here
    - It parses this file to figure out where the zImage and dtb file is located
  - Loads the Kernel from rootfs:/boot/zImage
    - Loaded to 0xC200'0000 (Into DDR RAM going from 0xC000'0000 to 0xDFFF'FFFF)
      - Configuration found here
  - Loads the Device Tree Blob from rootfs:/boot/stm32mp157c-dk2.dtb
- Jumps to Kernel
  - Cleanup
  - Switch to EL2
  - reference
Drivers

U-Boot 2020.04 (Jul 03 2020 - 20:11:15 +0200)

CPU: STM32MP157CAC Rev.B
Model: STMicroelectronics STM32MP157C-DK2 Discovery Board
Board: stm32mp1 in basic mode (st,stm32mp157c-dk2)
Board: MB1272 Var2 Rev.C-01
DRAM:  512 MiB
Clocks:
- MPU : 650 MHz
- MCU : 208.878 MHz
- AXI : 266.500 MHz
- PER : 24 MHz
- DDR : 533 MHz
NAND:  0 MiB
MMC:   STM32 SDMMC2: 0
Loading Environment from EXT4... OK
In:    serial
Out:   serial
Err:   serial
****************************************************
*       WARNING 1.5mA power supply detected        *
*     Current too low, use a 3A power supply!      *
****************************************************

Net:   eth0: ethernet@5800a000
Hit any key to stop autoboot:  0
Boot over mmc0!
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:4...
Found /boot/extlinux/extlinux.conf
Retrieving file: /boot/extlinux/extlinux.conf
131 bytes read in 22 ms (4.9 KiB/s)
1:      stm32mp157c-dk2-buildroot
Retrieving file: /boot/zImage
4171640 bytes read in 202 ms (19.7 MiB/s)
append: root=/dev/mmcblk0p4 rootwait
Retrieving file: /boot/stm32mp157c-dk2.dtb
49532 bytes read in 24 ms (2 MiB/s)
## Flattened Device Tree blob at c4000000
   Booting using the fdt blob at 0xc4000000
   Loading Device Tree to cfff0000, end cffff17b ... OK

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 5.7.1 (werwolv@werwolv-vmwarevirtualplatform) (gcc version 10.1.0 (Buildroot 2020.08-git-00490-gf50086e59f), GNU ld (GNU Binutils) 2.33.1) #1 SMP PREEMPT Fri Jul 3 18:44:10 CEST 2020
[    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
[    0.000000] CPU: div instructions available: patching division code
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] OF: fdt: Machine model: STMicroelectronics STM32MP157C-DK2 Discovery Board
[    0.000000] Memory policy: Data cache writealloc
[    0.000000] Reserved memory: created DMA memory pool at 0x10000000, size 0 MiB
[    0.000000] OF: reserved mem: initialized node mcuram2@10000000, compatible id shared-dma-pool
[    0.000000] Reserved memory: created DMA memory pool at 0x10040000, size 0 MiB
[    0.000000] OF: reserved mem: initialized node vdev0vring0@10040000, compatible id shared-dma-pool
[    0.000000] Reserved memory: created DMA memory pool at 0x10041000, size 0 MiB
[    0.000000] OF: reserved mem: initialized node vdev0vring1@10041000, compatible id shared-dma-pool
[    0.000000] Reserved memory: created DMA memory pool at 0x10042000, size 0 MiB
[    0.000000] OF: reserved mem: initialized node vdev0buffer@10042000, compatible id shared-dma-pool
[    0.000000] Reserved memory: created DMA memory pool at 0x30000000, size 0 MiB
[    0.000000] OF: reserved mem: initialized node mcuram@30000000, compatible id shared-dma-pool
[    0.000000] Reserved memory: created DMA memory pool at 0x38000000, size 0 MiB
[    0.000000] OF: reserved mem: initialized node retram@38000000, compatible id shared-dma-pool
[    0.000000] cma: Reserved 128 MiB at 0xd8000000
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.0 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] psci: SMC Calling Convention v1.0
[    0.000000] percpu: Embedded 15 pages/cpu s30028 r8192 d23220 u61440
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 113664
[    0.000000] Kernel command line: root=/dev/mmcblk0p4 rootwait
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 313508K/458752K available (6144K kernel code, 188K rwdata, 1540K rodata, 1024K init, 171K bss, 14172K reserved, 131072K cma-reserved, 0K highmem)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
[    0.000000]  Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[    0.000000] random: get_random_bytes called from start_kernel+0x320/0x4b0 with crng_init=0
[    0.000000] arch_timer: cp15 timer(s) running at 24.00MHz (virt).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x588fe9dc0, max_idle_ns: 440795202592 ns
[    0.000008] sched_clock: 56 bits at 24MHz, resolution 41ns, wraps every 4398046511097ns
[    0.000024] Switching to timer-based delay loop, resolution 41ns
[    0.000806] Console: colour dummy device 80x30
[    0.001850] printk: console [tty0] enabled
[    0.001905] Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=240000)
[    0.001954] pid_max: default: 32768 minimum: 301
[    0.002157] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.002202] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.003023] CPU: Testing write buffer coherency: ok
[    0.003373] CPU0: update cpu_capacity 1024
[    0.003412] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[    0.004120] Setting up static identity map for 0xc0100000 - 0xc0100060
[    0.004303] rcu: Hierarchical SRCU implementation.
[    0.004743] smp: Bringing up secondary CPUs ...
[    0.005409] CPU1: update cpu_capacity 1024
[    0.005420] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
[    0.005587] smp: Brought up 1 node, 2 CPUs
[    0.005663] SMP: Total of 2 processors activated (96.00 BogoMIPS).
[    0.005689] CPU: All CPU(s) started in SVC mode.
[    0.006297] devtmpfs: initialized
[    0.022241] VFP support v0.3: implementor 41 architecture 2 part 30 variant 7 rev 5
[    0.022755] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.022825] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[    0.028600] pinctrl core: initialized pinctrl subsystem
[    0.029641] NET: Registered protocol family 16
[    0.032494] DMA: preallocated 256 KiB pool for atomic coherent allocations
[    0.039443] /soc/interrupt-controller@5000d000: bank0
[    0.039494] /soc/interrupt-controller@5000d000: bank1
[    0.039524] /soc/interrupt-controller@5000d000: bank2
[    0.043618] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOA bank added
[    0.044033] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOB bank added
[    0.044387] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOC bank added
[    0.044727] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOD bank added
[    0.045067] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOE bank added
[    0.045391] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOF bank added
[    0.045727] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOG bank added
[    0.046052] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOH bank added
[    0.046403] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOI bank added
[    0.046513] stm32mp157-pinctrl soc:pin-controller@50002000: Pinctrl STM32 initialized
[    0.047280] stm32mp157-pinctrl soc:pin-controller-z@54004000: GPIOZ bank added
[    0.047335] stm32mp157-pinctrl soc:pin-controller-z@54004000: Pinctrl STM32 initialized
[    0.058799] usbcore: registered new interface driver usbfs
[    0.058906] usbcore: registered new interface driver hub
[    0.059026] usbcore: registered new device driver usb
[    0.059281] pps_core: LinuxPPS API ver. 1 registered
[    0.059310] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti 
[    0.059368] PTP clock support registered
[    0.059683] Advanced Linux Sound Architecture Driver Initialized.
[    0.060887] clocksource: Switched to clocksource arch_sys_counter
[    0.071787] NET: Registered protocol family 2
[    0.072549] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes, linear)
[    0.072626] TCP established hash table entries: 4096 (order: 2, 16384 bytes, linear)
[    0.072718] TCP bind hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.072833] TCP: Hash tables configured (established 4096 bind 4096)
[    0.072991] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.073057] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.073306] NET: Registered protocol family 1
[    0.074665] workingset: timestamp_bits=30 max_order=17 bucket_order=0
[    0.083817] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248)
[    0.083875] io scheduler mq-deadline registered
[    0.083900] io scheduler kyber registered
[    0.153081] STM32 USART driver initialized
[    0.153532] stm32-usart 40010000.serial: IRQ index 1 not found
[    0.153638] 40010000.serial: ttySTM0 at MMIO 0x40010000 (irq = 21, base_baud = 4000000) is a stm32-usart
[    0.853240] printk: console [ttySTM0] enabled
[    0.858169] stm32-usart 40010000.serial: rx dma alloc failed
[    0.863329] stm32-usart 40010000.serial: interrupt mode used for rx (no dma)
[    0.870383] stm32-usart 40010000.serial: tx dma alloc failed
[    0.876091] stm32-usart 40010000.serial: interrupt mode used for tx (no dma)
[    0.906704] random: fast init done
[    0.908119] brd: module loaded
[    0.912034] random: crng init done
[    0.922703] loop: module loaded
[    0.925951] libphy: Fixed MDIO Bus: probed
[    0.928672] CAN device driver interface
[    0.933588] stm32-dwmac 5800a000.ethernet: IRQ eth_wake_irq not found
[    0.938970] stm32-dwmac 5800a000.ethernet: IRQ eth_lpi not found
[    0.945195] stm32-dwmac 5800a000.ethernet: PTP uses main clock
[    0.950903] stm32-dwmac 5800a000.ethernet: no reset control found
[    0.960987] stm32-dwmac 5800a000.ethernet: User ID: 0x40, Synopsys ID: 0x42
[    0.966577] stm32-dwmac 5800a000.ethernet:   DWMAC4/5
[    0.971617] stm32-dwmac 5800a000.ethernet: DMA HW capability register supported
[    0.978910] stm32-dwmac 5800a000.ethernet: RX Checksum Offload Engine supported
[    0.986294] stm32-dwmac 5800a000.ethernet: TX Checksum insertion supported
[    0.993201] stm32-dwmac 5800a000.ethernet: Wake-Up On Lan supported
[    0.999541] stm32-dwmac 5800a000.ethernet: TSO supported
[    1.004858] stm32-dwmac 5800a000.ethernet: Enable RX Mitigation via HW Watchdog Timer
[    1.012753] stm32-dwmac 5800a000.ethernet: Enabled Flow TC (entries=2)
[    1.019296] stm32-dwmac 5800a000.ethernet: TSO feature enabled
[    1.025188] stm32-dwmac 5800a000.ethernet: Using 32 bits DMA width
[    1.032121] libphy: stmmac: probed
[    1.037644] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    1.042821] ehci-platform: EHCI generic platform driver
[    1.048394] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    1.054322] ohci-platform: OHCI generic platform driver
[    1.061849] stm32_rtc 5c004000.rtc: IRQ index 1 not found
[    1.065839] stm32_rtc 5c004000.rtc: alarm can't wake up the system: -6
[    1.073068] stm32_rtc 5c004000.rtc: registered as rtc0
[    1.077625] stm32_rtc 5c004000.rtc: setting system clock to 2000-01-01T00:00:22 UTC (946684822)
[    1.086648] stm32_rtc 5c004000.rtc: Date/Time must be initialized
[    1.092543] stm32_rtc 5c004000.rtc: registered rev:1.2
[    1.097792] i2c /dev entries driver
[    1.121614] stm32f7-i2c 40012000.i2c: can't use DMA
[    1.129085] i2c i2c-0: Added multiplexed i2c bus 1
[    1.133191] edt_ft5x06 0-0038: supply vcc not found, using dummy regulator
[    1.145478] input: generic ft5x06 (11) as /devices/platform/soc/40012000.i2c/i2c-0/0-0038/input/input0
[    1.153880] stm32f7-i2c 40012000.i2c: STM32F7 I2C-0 bus adapter
[    1.182290] stm32f7-i2c 5c002000.i2c: can't use DMA
[    1.186728] stpmic1 2-0033: PMIC Chip Version: 0x10
[    1.192618] BUCK1: supplied by regulator-dummy
[    1.198691] BUCK2: supplied by regulator-dummy
[    1.204581] BUCK3: supplied by regulator-dummy
[    1.210553] BUCK4: supplied by regulator-dummy
[    1.216397] LDO1: supplied by v3v3
[    1.221977] LDO2: supplied by regulator-dummy
[    1.227973] LDO3: supplied by vdd_ddr
[    1.233149] LDO4: supplied by regulator-dummy
[    1.236660] LDO5: supplied by regulator-dummy
[    1.243498] LDO6: supplied by v3v3
[    1.248521] VREF_DDR: supplied by regulator-dummy
[    1.254543] BOOST: supplied by regulator-dummy
[    1.258117] VBUS_OTG: supplied by bst_out
[    1.262218] SW_OUT: supplied by bst_out
[    1.267892] input: pmic_onkey as /devices/platform/soc/5c002000.i2c/i2c-2/2-0033/5c002000.i2c:stpmic@33:onkey/input/input1
[    1.278222] stm32f7-i2c 5c002000.i2c: STM32F7 I2C-2 bus adapter
[    1.286594] mmci-pl18x 58005000.sdmmc: Got CD GPIO
[    1.290542] mmci-pl18x 58005000.sdmmc: mmc0: PL180 manf 53 rev1 at 0x58005000 irq 46,0 (pio)
[    1.325558] sdhci: Secure Digital Host Controller Interface driver
[    1.330375] sdhci: Copyright(c) Pierre Ossman
[    1.335820] Synopsys Designware Multimedia Card Interface Driver
[    1.341056] sdhci-pltfm: SDHCI platform and OF driver helper
[    1.348863] usbcore: registered new interface driver usbhid
[    1.353093] usbhid: USB HID core driver
[    1.357656] stm32-ipcc 4c001000.mailbox: ipcc rev:1.0 enabled, 6 chans, proc 0
[    1.364638] OF: Can't handle multiple dma-ranges with different offsets on node(/ahb)
[    1.373179] OF: Can't handle multiple dma-ranges with different offsets on node(/ahb)
[    1.380360] stm32-rproc 10000000.m4: wdg irq registered
[    1.385339] stm32-rproc 10000000.m4: failed to get pdds
[    1.390643] remoteproc remoteproc0: m4 is available
[    1.398481] NET: Registered protocol family 10
[    1.402931] Segment Routing with IPv6
[    1.405275] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    1.412137] NET: Registered protocol family 17
[    1.415651] can: controller area network core (rev 20170425 abi 9)
[    1.422004] NET: Registered protocol family 29
[    1.426323] can: raw protocol (rev 20170425)
[    1.430606] can: broadcast manager protocol (rev 20170425 t)
[    1.436392] can: netlink gateway (rev 20190810) max_hops=1
[    1.442153] ThumbEE CPU extension supported.
[    1.446136] Registering SWP/SWPB emulation handler
[    1.452705] stm32-dma 48000000.dma-controller: STM32 DMA driver registered
[    1.459654] stm32-dma 48001000.dma-controller: STM32 DMA driver registered
[    1.466288] mmc0: new high speed SDHC card at address 0007
[    1.468382] stm32-mdma 58000000.dma-controller: STM32 MDMA driver registered
[    1.478263] mmcblk0: mmc0:0007 SD8GB 7.42 GiB
[    1.478435] reg11: supplied by vdd
[    1.485844] reg18: supplied by vdd
[    1.489331] stm32-usbphyc 5a006000.usbphyc: registered rev:1.0
[    1.500002] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.506266] [drm] Initialized stm 1.0.0 20170330 for 5a001000.display-controller on minor 0
[    1.514711] GPT:Primary header thinks Alt. header is not at the end of the disk.
[    1.521174] GPT:247694 != 15564799
[    1.524492] GPT:Alternate GPT header not at the end of the disk.
[    1.530542] GPT:247694 != 15564799
[    1.533981] GPT: Use GNU Parted to correct GPT errors.
[    1.539199]  mmcblk0: p1 p2 p3 p4
[    1.993335] Console: switching to colour frame buffer device 60x50
[    2.018194] stm32-display 5a001000.display-controller: fb0: stmdrmfb frame buffer device
[    2.026848] dwc2 49000000.usb-otg: supply vusb_d not found, using dummy regulator
[    2.034054] dwc2 49000000.usb-otg: supply vusb_a not found, using dummy regulator
[    2.042060] dwc2 49000000.usb-otg: Configuration mismatch. dr_mode forced to host
[    2.055120] usb33: supplied by vdd_usb
[    2.058079] dwc2 49000000.usb-otg: DWC OTG Controller
[    2.062719] dwc2 49000000.usb-otg: new USB bus registered, assigned bus number 1
[    2.070208] dwc2 49000000.usb-otg: irq 42, io mem 0x49000000
[    2.076920] hub 1-0:1.0: USB hub found
[    2.079557] hub 1-0:1.0: 1 port detected
[    2.084452] ehci-platform 5800d000.usbh-ehci: EHCI Host Controller
[    2.089839] ehci-platform 5800d000.usbh-ehci: new USB bus registered, assigned bus number 2
[    2.105114] ehci-platform 5800d000.usbh-ehci: irq 48, io mem 0x5800d000
[    2.140919] ehci-platform 5800d000.usbh-ehci: USB 2.0 started, EHCI 1.00
[    2.154036] hub 2-0:1.0: USB hub found
[    2.159806] hub 2-0:1.0: 2 ports detected
[    2.167463] ALSA device list:
[    2.172483]   No soundcards found.
[    2.182411] EXT4-fs (mmcblk0p4): INFO: recovery required on readonly filesystem
[    2.195322] EXT4-fs (mmcblk0p4): write access will be enabled during recovery
[    2.496866] EXT4-fs (mmcblk0p4): recovery complete
[    2.508272] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Opts: (null)
[    2.522013] VFS: Mounted root (ext4 filesystem) readonly on device 179:4.
[    2.534466] usb 2-1: new high-speed USB device number 2 using ehci-platform
[    2.548092] devtmpfs: mounted
[    2.555314] Freeing unused kernel memory: 1024K
[    2.562151] Run /sbin/init as init process
[    2.691199] EXT4-fs (mmcblk0p4): re-mounted. Opts: (null)
[    2.742529] hub 2-1:1.0: USB hub found
[    2.748854] hub 2-1:1.0: 4 ports detected

Boot Process of the GCW0, RG350 and similar devices

WerWolv — 2020-06-05T00:00:00.000Z

The GCW0 as well as the RG350 are small handheld retro emulation and homebrew devices running the OpenDingux operating system. Although the RG350 was released 6 years after the GCW0, they both use the exact same Ingenic JZ4770 SoC. This post focuses on how the RG350’s system image is structured, how the JZ4770 loads data from it and how it ultimately jumps to the OpenDingux Linux kernel.

The layout of the system image looks as follows:


MBR
First Stage Bootloader
System image (containing the Kernel)
Data image (containing the rootfs)

When the JZ4770 is powered up, it first starts executing its Boot ROM. The Boot ROM reads the boot select pins and, on the RG350, determines that it’s supposed to boot from an MMC device, here an SD card.

The Boot ROM proceeds by copying the first 0x2000 bytes of the SD card into the SoC’s dcache and then deinitializes the MMC interface again. To validate that the data was read correctly, the Boot ROM skips past the MBR, right to the first stage bootloader, and checks for the magic value MSPL. If it is valid, the 0x2000 bytes previously loaded into dcache are copied into icache, and the Boot ROM jumps there. This is possible because, unusually, the JZ4770’s cache is mapped into the address space at address 0x8000'0000.

The data found in the icache now is the MBR immediately followed by the first stage bootloader and the Program Counter now points to address 0x8000'0000. This is the start of the MBR which looks as follows in memory.

The MBR starts with the so-called bootstrap code, a 440-byte section containing code to set up the environment and jump into the actual first stage bootloader. On the GCW0 and the RG350, this is a single instruction: 80 00 00 10, which is the MIPS instruction B PC+#0x204. It jumps past the MBR (0x200 bytes) and the MSPL magic (4 bytes). Congratulations, we’re now in our bootloader!

In red, the branch instruction is shown, in blue the `MSPL` magic in little endian.
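The encoding can be double-checked by hand. The following is a minimal sketch that reads the four bytes as a little-endian MIPS32 word and computes how far the branch jumps. It relies on the standard MIPS rule that a branch offset is a signed 16-bit word count relative to the instruction after the branch (PC + 4). The helper names are made up for this illustration and are not taken from any bootloader source.

```cpp
#include <cstdint>

// Assemble a 32-bit word from four bytes stored in little-endian order.
constexpr std::uint32_t loadLe32(const std::uint8_t (&bytes)[4]) {
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}

// The primary opcode lives in the top 6 bits. 0x04 is BEQ, and
// "BEQ $zero, $zero, offset" is what assemblers emit for "B offset".
constexpr std::uint32_t opcode(std::uint32_t insn) {
    return insn >> 26;
}

// Distance in bytes from the branch instruction itself to its target:
// the sign-extended 16-bit immediate times 4, counted from PC + 4.
constexpr std::int32_t branchDistance(std::uint32_t insn) {
    const auto imm = static_cast<std::int16_t>(insn & 0xFFFF);
    return 4 + static_cast<std::int32_t>(imm) * 4;
}

// The bootstrap code bytes as they appear at the start of the MBR.
constexpr std::uint8_t bootstrap[4] = { 0x80, 0x00, 0x00, 0x10 };

static_assert(loadLe32(bootstrap) == 0x10000080, "little-endian word");
static_assert(opcode(loadLe32(bootstrap)) == 0x04, "BEQ, used as unconditional B");
static_assert(branchDistance(loadLe32(bootstrap)) == 0x204, "0x200 bytes of MBR + 4 bytes of magic");
```

The immediate 0x80 counts words, so it covers 0x200 bytes. The extra 4 bytes come from the offset being relative to PC + 4, which is how the branch ends up just behind the MSPL magic.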

The first thing the bootloader does is disable all interrupts and clear their flags. This is important since, in the current state, interrupts cannot be handled and would cause the processor to jump to invalid addresses. They will be re-enabled later on by the Linux kernel. Next, the stack is set up by initializing the stack pointer to point to a free area in memory. This is necessary to properly support C function calls and local variables, and for the next step: the jump to main().

Execution has now reached the actual loader code of the bootloader. First, the MBR, which can still be found at address 0x8000'0000 in the memory-mapped dcache, is parsed in order to find the offsets and sizes of the Linux kernel and rootfs partitions. The MMC interface is then initialized once again and the Linux kernel is read from the SD card into KSEG1. This is a 512MB region of the address space which is not cached. This is very important as the currently executing code still lives inside dcache and icache. Any read or write to a cacheable region would trash the cache, overwriting the bootloader with garbage data and corrupting the boot environment.
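To give an idea of what parsing the MBR involves, here is a minimal sketch based on the classic MBR layout: four 16-byte partition entries starting at offset 0x1BE, each holding the first sector and the sector count as little-endian 32-bit values, followed by the 0x55 0xAA signature. The `Partition` struct and the function names are assumptions for this example. The real bootloader is not shown in this post and may well do this differently. The sketch uses C++17 for `std::optional`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// One primary partition as described by an MBR partition table entry.
struct Partition {
    std::uint8_t  type;         // Partition type byte, 0x00 means "unused"
    std::uint32_t firstSector;  // LBA of the first sector
    std::uint32_t sectorCount;  // Length of the partition in sectors
};

constexpr std::size_t PartitionTableOffset = 0x1BE;
constexpr std::size_t PartitionEntrySize   = 16;

// Read a little-endian 32-bit value byte by byte, which also works
// when the source pointer is not 4-byte aligned.
inline std::uint32_t readLe32(const std::uint8_t *p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Parse the four primary partition entries of a 512-byte MBR.
// Returns nothing if the boot signature at the end of the sector is missing.
inline std::optional<std::array<Partition, 4>> parseMbr(const std::uint8_t *mbr) {
    if (mbr[0x1FE] != 0x55 || mbr[0x1FF] != 0xAA)
        return std::nullopt;

    std::array<Partition, 4> partitions{};
    for (std::size_t i = 0; i < partitions.size(); i++) {
        const std::uint8_t *entry = mbr + PartitionTableOffset + i * PartitionEntrySize;

        // Byte 4 is the type, bytes 8..11 the start LBA, bytes 12..15 the length
        partitions[i] = { entry[4], readLe32(entry + 8), readLe32(entry + 12) };
    }

    return partitions;
}
```

Multiplying `firstSector` and `sectorCount` by the sector size (512 bytes on SD cards) gives the byte offset and size that a loader would hand to its MMC read routine.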

Once the Linux kernel is loaded into memory, the Linux configuration struct is set up so that Linux later knows, for example, where to load the rootfs from. After this, the bootloader is done and can jump to the address the kernel was loaded at via a function call that passes in the kernel parameter config struct.

The rest of the boot process is now the same as on any other Linux system and is described very thoroughly here.

pcercuei for answering many of my questions about the system
circuits for answering many more of my questions and providing a pretty title image for this post
All the nice people on the Retro Gaming Handhelds Discord for being awesome to talk to

ImGui Game Overlays using DLL injection

WerWolv — 2020-04-19T00:00:00.000Z

I started this project mainly because I wanted a proper XP tracker I didn’t have to pay extra for since the price felt really unjustifiable to me.

This page aims to explain how I used Visual C++ to write a DLL injector tool for RuneScape’s new NXT Client as well as a DLL that hooks into RuneScape’s OpenGL draw calls to render a full Dear ImGui overlay on top of it.

For the DLL Injector, I used a basic C++ Console App from Visual Studio. We don’t need anything fancy for this PoC.

The principle of DLL injection is the following:

Find the PID of the process the DLL should be injected to.

Use the Windows API to get a Handle for that Process.

Allocate some memory in the target process and copy the DLL's path into it.

Start a new Thread in the target process with `LoadLibraryA` as the start routine.

In code, this looks as follows:

std::uint32_t pid = 0;

// Loop infinitely till RuneScape is launched
while (pid == 0) {
    pid = getPID(L"rs2client.exe");
    Sleep(1000);
}

#include <Windows.h>
#include <TlHelp32.h>
#include <cstdint>
#include <string>

std::uint32_t getPID(const std::wstring &processName) {
    std::uint32_t pid = 0;

    // Create snapshot
    HANDLE hSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);

    // Check if the snapshot is valid, otherwise bail out
    if (hSnap == INVALID_HANDLE_VALUE)
        return 0;

    PROCESSENTRY32 procEntry{};
    procEntry.dwSize = sizeof(PROCESSENTRY32);

    // Iterate over all processes in the snapshot
    if (Process32First(hSnap, &procEntry)) {
        do {
            // Check if current process name is the same
            // as the passed in process name
            if (_wcsicmp(procEntry.szExeFile, processName.c_str()) == 0) {
                pid = procEntry.th32ProcessID;
                break;
            }
        } while (Process32Next(hSnap, &procEntry));
    }

    // Cleanup
    CloseHandle(hSnap);

    return pid;
}

This code uses the Windows API found in the Windows.h and TlHelp32.h headers to first create a snapshot of all running processes using CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0) and then iterates over it by repeatedly calling Process32Next(hSnap, &procEntry) to get the next entry in the list. This is done until the process name of the current entry matches the passed in name, in our case rs2client.exe.

This is super simple and straight-forward. We can just use the Windows API again.

HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);

This will give us a handle based on the PID with full access to that process. It’s how we interact with the RuneScape client.

This part is done in preparation for the next step. It allocates memory in the remote process and places the DLL path string inside of it. This is needed because we have to call the LoadLibrary function there, which takes the path of the DLL to load.

// Allocate memory in the remote process
void *injectDllPathRemote = VirtualAllocEx(hProc, 0x00,
    MAX_PATH, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

// If allocation failed, bail out
if (injectDllPathRemote == nullptr)
    return 1;

// Write DLL path to the memory we just allocated
constexpr auto DllPath = "C:\\path\\to\\inject.dll";
WriteProcessMemory(hProc, injectDllPathRemote, DllPath, strlen(DllPath) + 1, 0);

Now we’re putting everything together: starting a thread in the RuneScape process using CreateRemoteThread(...) with LoadLibraryA as the thread routine. This only works because of the nice coincidence that the signature of an LPTHREAD_START_ROUTINE is very similar to that of LoadLibraryA. Both functions take a single pointer as their argument and return a value that fits into a register (a DWORD for the thread routine, an HMODULE for LoadLibraryA). If there were more parameters to this function, it would get a lot more difficult to do DLL injection.

// Create a thread in the RuneScape process which
// runs LoadLibraryA("C:\\path\\to\\inject.dll")
HANDLE hRemoteThread = CreateRemoteThread(hProc, nullptr, 0,
    (LPTHREAD_START_ROUTINE)LoadLibraryA, injectDllPathRemote, 0, nullptr);

// Check if we succeeded
if (hRemoteThread != nullptr && hRemoteThread != INVALID_HANDLE_VALUE)
    CloseHandle(hRemoteThread);
else
    printf("[*] Error starting thread! Error Code: %x\n", GetLastError());

If CreateRemoteThread succeeds, we can now execute our code in the context of the target process allowing us to read its memory directly, patch code and insert hooks. Now we have to make a DLL that handles the hooking.

A DLL can be made by creating a Dynamic-Link Library (DLL) project in Visual Studio which sets up all the build configuration correctly and includes a basic template containing the DLL’s DllMain(...) entry-point function.

To actually run our code in the game, we need to start yet another thread. This is generally a bad idea since, while DllMain runs, Windows’ loader lock is held. This means many Windows calls that use the loader will cause the application to deadlock. We don’t use any of these functions here though, so we’re safe for the most part. In our case, a thread is definitely needed since DllMain blocks both our injector and API calls elsewhere in the target application. Therefore it has to run and finish quickly without blocking us.

Our DLL simply creates a new thread that runs our code when being loaded.

BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved) {
    switch (ul_reason_for_call) {
    case DLL_PROCESS_ATTACH: {
        HANDLE hThread = CreateThread(nullptr, 0,
            (LPTHREAD_START_ROUTINE)patcherThread, hModule, 0, 0);
        if (hThread != nullptr)
            CloseHandle(hThread);
        break;
    }
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }

    return TRUE;
}

For debugging purposes it’s usually useful to have some sort of logging in our application. Luckily, the Windows API once again has us covered. Using the AllocConsole function allows us to create a new console window in the current program. Make sure not to close this window though, as closing it will act as a SIGINT exception, potentially crashing the game if exceptions don’t get handled properly.

AllocConsole();                             // Open a new console window
FILE *f = nullptr;                          // Receives the reopened stream
freopen_s(&f, "CONOUT$", "w", stdout);      // Redirect stdout to CONOUT$, the
                                            // current console window.

printf("[*] Running under RuneScape!\n");   // Console works!

The secret of drawing an overlay in any process is hooking the graphics library’s “Frame End” function. In case of OpenGL this function is called wglSwapBuffers, in case of Direct3D 9 it’s the device’s EndScene method. We simply let the game draw all its content and, when it calls the function to end the current frame, we draw our overlay on top before calling the actual end frame function.

An easy way to find out what the game you want to hook uses is to load the game executable into Ghidra and check its imports in the Symbol Tree.

(RuneScape imports both opengl32.dll AND d3d9.dll here but according to the wiki, it only uses Direct3D if OpenGL is not working)

But how does hooking even work?

A hook works by overwriting some instruction(s) in a function’s code with a jmp instruction.

This instruction will redirect execution flow to a trampoline routine which first executes the instruction(s) we overwrote with the jump. Then we have to save the current context. We don’t know how our code modifies the registers, but what we do know is that after our code has run and execution gets back to the hooked function, they need to be in the same state as before our hook ran. Otherwise the original function might end up doing unpredictable things or just crash outright. This is usually done by pushing all registers onto the stack, executing the hook, popping all values back into the right registers and then jumping back to right after the injected jmp instruction in the original function.

PUBLIC wglSwapBuffersTrampoline

wglSwapBuffersTrampoline PROC
	mov [rsp + 20], rsi         ; Execute the instruction that
                                ; was overwritten by our hook patch.

	push rax                    ; Save the current context.
	push rbx                    ; Pushing all the registers is probably
	push rcx                    ; overkill but better safe than sorry.
	push rdx
	push rsi
	push rdi
	push rbp
	push rsp
	push r8
	push r9
	push r10
	push r11
	push r12
	push r13
	push r14
	push r15

	call wglSwapBuffers_hook;    ; Call our hook

	pop r15                     ; Restore the context in reverse order
	pop r14                     ; as a Stack is a FILO buffer (First in last out)
	pop r13
	pop r12
	pop r11
	pop r10
	pop r9
	pop r8
	pop rsp
	pop rbp
	pop rdi
	pop rsi
	pop rdx
	pop rcx
	pop rbx
	pop rax

	jmp wglSwapBuffers_return   ; Jump back to the original function.
                                ; This is the address of where our jmp
                                ; instruction was inserted + 5, so immediately
                                ; after it.

wglSwapBuffersTrampoline ENDP

Now that we have a place for our hook to jump to, we need to insert it into the function we want to hook. The following functions take care of removing code page write restrictions, inserting the hook and restoring the original restrictions again.

namespace mem {

    // Read a value of type T from the given address
    template<typename T>
    T read(DWORD64 addr) {
        return *((T *)addr);
    }

    // Write a value of type T to the given address
    template<typename T>
    void write(DWORD64 addr, T value) {
        *((T *)addr) = value;
    }

    // Change the protection of the page(s) containing a T at the given
    // address and return the previous protection flags
    template<typename T>
    DWORD64 protect(DWORD64 addr, DWORD protection) {
        DWORD oldProtection = 0;
        VirtualProtect((LPVOID)addr, sizeof(T), protection, &oldProtection);

        return oldProtection;
    }

    DWORD64 hookFunction(DWORD64 hookAt, DWORD64 newFunc, unsigned int size) {
        // -5 since the jump is relative to the next instruction
        DWORD64 newOffset = newFunc - hookAt - 5;

        auto oldProtection = mem::protect<DWORD>(hookAt + 1, PAGE_EXECUTE_READWRITE);

        // Opcode of the jmp instruction (1 byte) followed by
        // its relative offset (4 bytes)
        mem::write<BYTE>(hookAt, 0xE9);
        mem::write<DWORD>(hookAt + 1, (DWORD)newOffset);

        // nop extra bytes to avoid corrupting the overwritten opcode
        for (unsigned int i = 5; i < size; i++)
            mem::write<BYTE>(hookAt + i, 0x90);

        mem::protect<DWORD>(hookAt + 1, (DWORD)oldProtection);

        return hookAt + 5;
    }
}

Using this, a hook can be inserted as follows:

using wglSwapBuffers_t = void(*)(_In_ HDC hDc);
extern "C" wglSwapBuffers_t wglSwapBuffers_return = nullptr;
extern "C" void wglSwapBuffersTrampoline();

// ...

// Get a handle to the opengl32.dll
HMODULE hOpengl32 = GetModuleHandle(L"opengl32.dll");

if (hOpengl32 != nullptr) {
    // Get the address of wglSwapBuffers
    DWORD64 wglSwapBuffersHookAddr = (DWORD64)GetProcAddress(
        hOpengl32,
        "wglSwapBuffers"
    );

    // Insert a hook to our trampoline at the start of wglSwapBuffers,
    // returns the address to return to
    wglSwapBuffers_return = (wglSwapBuffers_t) mem::hookFunction(
        wglSwapBuffersHookAddr, 
        (DWORD64)wglSwapBuffersTrampoline, 
        5
    );
}
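The offset arithmetic behind the patch can be tried out without touching any process memory. The following sketch builds the five patch bytes into a plain array instead of writing them to a code page. The function name `encodeJmpRel32` is hypothetical and not part of the code above. It also adds a range check that the code above does not have: a `jmp rel32` can only reach targets within roughly ±2 GiB, and in a 64-bit process there is no guarantee that the injected DLL ends up that close to opengl32.dll. The sketch uses C++17 for `std::optional`.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Build the 5-byte "jmp rel32" instruction that, when placed at address
// `from`, continues execution at address `to`.
// Returns nothing if the distance doesn't fit into a signed 32-bit offset.
inline std::optional<std::array<std::uint8_t, 5>> encodeJmpRel32(std::uint64_t from, std::uint64_t to) {
    // The offset is relative to the end of the 5-byte jmp instruction
    const auto distance = static_cast<std::int64_t>(to - (from + 5));
    if (distance < INT32_MIN || distance > INT32_MAX)
        return std::nullopt;

    // Keep the low 32 bits; negative distances become their
    // two's complement representation
    const auto rel = static_cast<std::uint32_t>(static_cast<std::int32_t>(distance));

    return std::array<std::uint8_t, 5>{
        0xE9,                                    // Opcode of jmp rel32
        static_cast<std::uint8_t>(rel),          // Offset, least significant byte first
        static_cast<std::uint8_t>(rel >> 8),
        static_cast<std::uint8_t>(rel >> 16),
        static_cast<std::uint8_t>(rel >> 24)
    };
}
```

For example, a jump from 0x1000 to 0x2000 encodes as E9 FB 0F 00 00, because 0x2000 - 0x1000 - 5 is 0xFFB. If the function returns nothing, the target is out of reach for this kind of patch and a different, longer jump sequence would be needed.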

Finally, after all this work, we can start drawing the ImGui overlay. For this, we just download the ImGui source code and compile it together with the rest of the DLL code. ImGui’s OpenGL backend also needs an OpenGL loader to compile; I used GLEW for this. For it to compile, both opengl32.lib and glew32s.lib (note the s for static linking) have to be linked into the DLL. opengl32.lib gets dynamically linked as it’s already been loaded by RuneScape, but GLEW HAS to be linked statically since we can’t load another DLL within the injected DLL without risking a deadlock.

Using the ImGui impl files for win32 and opengl3 found in the examples folder of its repo, a simple overlay can be created. I used imgui_impl_win32.h and imgui_impl_opengl3.h for RuneScape, but this depends heavily on what your game uses.

During initialization of our graphics stuff, we also hook the game’s wndProc callback. Its use for us is to capture keyboard and mouse events and direct them to ImGui. It also allows us to toggle the overlay and to take input focus away from the game while the overlay is shown.

HWND hGameWindow;
WNDPROC hGameWindowProc;
bool menuShown = true;

LRESULT CALLBACK windowProc_hook(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam) {
    // Toggle the overlay using the delete key
    if (uMsg == WM_KEYDOWN && wParam == VK_DELETE) {
        menuShown = !menuShown;
        return false;
    }

    // If the overlay is shown, direct input to the overlay only
    if (menuShown) {
        CallWindowProc(ImGui_ImplWin32_WndProcHandler, hWnd, uMsg, wParam, lParam);
        return true;
    }

    // Otherwise call the game's wndProc function
    return CallWindowProc(hGameWindowProc, hWnd, uMsg, wParam, lParam);
}

extern "C" void wglSwapBuffers_hook(HDC hDc) {
    // Initialize GLEW and ImGui but only once
    static bool imGuiInitialized = false;
    if (!imGuiInitialized) {
        imGuiInitialized = true;

        // Get the game's window from its device context
        hGameWindow = WindowFromDC(hDc);

        // Overwrite the game's wndProc function
        hGameWindowProc = (WNDPROC)SetWindowLongPtr(hGameWindow,
            GWLP_WNDPROC, (LONG_PTR)windowProc_hook);

        // Init GLEW, create ImGui context, init ImGui
        glewInit();
        ImGui::CreateContext();
        ImGui_ImplWin32_Init(hGameWindow);
        ImGui_ImplOpenGL3_Init();
        ImGui::StyleColorsDark();
        ImGui::GetStyle().AntiAliasedFill = false;
        ImGui::GetStyle().AntiAliasedLines = false;
        ImGui::CaptureMouseFromApp();
        ImGui::GetStyle().WindowTitleAlign = ImVec2(0.5f, 0.5f);
    }

    // If the menu is shown, start a new frame and draw the demo window
    if (menuShown) {
        ImGui_ImplOpenGL3_NewFrame();
        ImGui_ImplWin32_NewFrame();
        ImGui::NewFrame();
        ImGui::ShowDemoWindow();
        ImGui::Render();

        // Draw the overlay
        ImGui_ImplOpenGL3_RenderDrawData(ImGui::GetDrawData());
    }
}

While it’s pretty simple to inject a DLL, there are a lot of things that can go wrong. Here are some of the issues I faced when writing this and how I got around them:

This happened after adding GLEW to the DLL. I tried to dynamically link to glew.dll by loading it in my own DLL. This does not work as my DLL now depended on GLEW already being loaded. I fixed it by simply linking GLEW statically.

This was because I forgot to push/pop one register in the trampoline, causing the context to be tainted when returning to the original function. I also originally didn’t replicate the instruction I overwrote with the jump, which caused the stack to be corrupted when the hooked function tried to return.

StackOverflow suggested using SetWindowLong to overwrite the wndProc function. This did not work and my hook was never called, most likely because SetWindowLong only handles 32-bit values, which is not enough for a function pointer in a 64-bit process. Switching to SetWindowLongPtr instead fixed the issue.
