shinian98 edited this page Apr 4, 2025 · 16 revisions

TaintEMU-VMI User Guide

TaintEMU: Decoupling Tracking from Functional Domains for Architecture-Agnostic and Efficient Whole-System Taint Tracking

Quick Example

1. Prepare the Prerequisites

First, create a directory named xtaint and place TaintEMU inside it. Next, prepare a guest image (for example, Debian on ARM64, which can be downloaded from DQIB; the image used by this wiki can be downloaded from this LINK) and place it in the xtaint directory.

Figure 1-1 xtaint Directory

As shown in Figure 1-1, after this step the xtaint directory contains two folders: dqib_arm64-virt and TaintEMU-VMI. The dqib_arm64-virt folder contains the ARM64 Debian system and its configuration files, while the TaintEMU-VMI folder contains the source code downloaded from this Git repository.

2. Install QEMU Compilation Dependencies

On Ubuntu 22.04, run the following commands:

sudo apt update
sudo apt install build-essential ninja-build make git bison flex gawk libpixman-1-dev libsdl2-dev libslirp-dev python3 python3-pip rlwrap socat 

pip3 install meson

Note: Make sure to add ~/.local/bin to your PATH in order to use meson.

3. Compile TaintEMU

Navigate to the xtaint/TaintEMU-VMI folder and run the following commands:

mkdir build
cd build
../configure --target-list=aarch64-softmmu --enable-taint-engine
make -j8

After compilation, the build directory will contain the executable file qemu-system-aarch64.

Figure 1-2 TaintEMU Compilation Completed

Figure 1-2 shows the build directory after a successful compilation.

4. Use TaintEMU to Start the ARM64 Debian System

4.1 Start TaintEMU

Open a new terminal (Terminal 1) and navigate to the xtaint/dqib_arm64-virt folder, then run:

./start_x_taint.sh

Figure 1-3 Start TaintEMU

As shown in Figure 1-3, TaintEMU has started successfully.

4.2 Connect to the Emulated USB Serial Port

Open another terminal (Terminal 2), navigate to the xtaint/dqib_arm64-virt folder, and run:

./serial.sh

Figure 1-4 Connect to Emulated USB Serial Port

Figure 1-5 TaintEMU Running Debian

After the serial port is connected, TaintEMU runs the Debian operating system.

5. Communicate with TaintEMU Using the QMP Protocol

Open another terminal (Terminal 3), navigate to the xtaint/dqib_arm64-virt folder, and run:

./qmp.sh

Figure 1-6 Connect to QMP

Once the QMP connection is established, Terminal 3 shows the response sent by QEMU.

6. Configure Virtual Machine Introspection

6.1 Get the Kernel Symbol Table

Wait until Debian has finished booting and the login prompt appears.

Figure 1-7 Debian Boot Completed

Open another terminal (Terminal 4), navigate to the xtaint/dqib_arm64-virt folder, and run:

python3 copy_out_sysmap.py

Once the script prints "done.", the guest kernel symbol table has been copied to the host machine.

Figure 1-8 Copying Symbol Table

6.2 Set Up Virtual Machine Introspection for TaintEMU

In Terminal 3, run the following commands:

{ "execute": "qmp_capabilities" }
{ "execute": "setup-vmi", "arguments":{"path":"config.json"} }

The first command completes QMP capability negotiation, which QEMU requires before it accepts any other command. The second command configures virtual machine introspection using config.json.

Figure 1-9 Setting Up Virtual Machine Introspection

7. Test Virtual Machine Introspection Functionality

In Terminal 3, run the command:

{ "execute": "x-ray-ps" }

The process list will be returned, indicating that the VMI function is working properly.

Figure 1-10 Returning Process List

8. Test Dynamic Information Flow Tracking Functionality

8.1 Log in as Root in Terminal 1 and Read Data from USB Serial 0

Log in as the root user:

Username: root
Password: root

Run the following command:

cat /dev/ttyUSB0 | grep hello

Figure 1-11 Reading Data from USB Serial 0 (Terminal 1)

8.2 Send Data to Serial and Track It

In Terminal 2, type:

hello,world

Figure 1-12 Sending Data to USB Serial (Terminal 2)

Terminal 1 will echo the tracking information.

Figure 1-13 Dynamic Information Flow Tracking (Terminal 1)

This concludes the example.

Feature Overview

TaintEMU is a QEMU-based dynamic information flow tracking tool designed for efficient, architecture-agnostic whole-system tracking across instruction sets. Its functionality is split into two main modules: virtual machine introspection (VMI) and dynamic information flow tracking.

The virtual machine introspection module follows an event-driven programming model based on the publish-subscribe pattern. The event source is a guest function call, and users register an event handler for each call they want to observe. The dynamic information flow tracking module provides a set of interfaces for reading and writing data labels. With these interfaces, users can build the whole-system dynamic information flow tracking they need.
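The publish-subscribe model can be illustrated with a small, self-contained C sketch. This is a toy model and not TaintEMU code: toy_callback_t, toy_add_hook, and toy_publish_call are hypothetical names, the callback signature is an assumption (the real xray_callback_t may differ), and do_sys_open is used only as an example function name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives the name of the function that fired. */
typedef void (*toy_callback_t)(const char *func_name);

#define TOY_MAX_HOOKS 16

struct toy_hook {
    const char *name;   /* guest function the subscriber is interested in */
    toy_callback_t cb;  /* handler to run when that function is called */
};

static struct toy_hook toy_hooks[TOY_MAX_HOOKS];
static size_t toy_hook_count;

/* Subscribe: remember (name, cb). Returns 0 on success, -1 if the table is full. */
static int toy_add_hook(const char *name, toy_callback_t cb)
{
    assert(name != NULL && cb != NULL);
    if (toy_hook_count == TOY_MAX_HOOKS) {
        return -1;
    }
    toy_hooks[toy_hook_count].name = name;
    toy_hooks[toy_hook_count].cb = cb;
    toy_hook_count++;
    return 0;
}

/* Publish: called by the toy emulator when the guest enters func_name.
 * Runs every matching handler and returns how many were run. */
static int toy_publish_call(const char *func_name)
{
    int fired = 0;
    for (size_t i = 0; i < toy_hook_count; i++) {
        if (strcmp(toy_hooks[i].name, func_name) == 0) {
            toy_hooks[i].cb(func_name);
            fired++;
        }
    }
    return fired;
}

/* Example subscriber: counts how often it has been invoked. */
static int toy_open_calls;

static void toy_on_open(const char *func_name)
{
    (void)func_name; /* the name is not needed for counting */
    toy_open_calls++;
}
```

A subscriber would call toy_add_hook("do_sys_open", toy_on_open) once, after which every toy_publish_call("do_sys_open") runs the handler, while calls published under other names are ignored. In TaintEMU the publishing side is the emulator itself, so users only write the subscriber side.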

Programming Interfaces

Virtual Machine Introspection Functionality

Include the header file "sysemu/x-ray.h"

int x_ray_add_kernel_hook (const char *name, xray_callback_t cb);

Description: Set a hook by function name; when the guest executes the kernel function called name, the callback function cb is invoked.

Parameters:

  • @name: The name of the kernel function.
  • @cb: The callback function.

int x_ray_add_process_hook (TVM_task_struct *task, uint64_t ptr, xray_callback_t cb);

Description: Set a hook function based on the process descriptor and memory address; when the guest executes the process described by task and reaches the address ptr, the callback function cb is called.

Parameters:

  • @task: The process descriptor.
  • @ptr: The memory address.
  • @cb: The callback function.

TVM_task_struct* x_ray_get_current_task (CPUState *cpu);

Description: Return the descriptor of the process currently running on the CPU identified by the given CPUState.

Parameters:

  • @cpu: The specified CPUState.

Dynamic Information Flow Tracking Functionality

  1. Data Label Read/Write Interfaces

Include the header file "exec/cpu_ldst.h"

uint32_t cpu_ldub_taint(CPUArchState *env, abi_ptr ptr); 
int cpu_ldsb_taint(CPUArchState *env, abi_ptr ptr); 
uint32_t cpu_lduw_be_taint(CPUArchState *env, abi_ptr ptr); 
int cpu_ldsw_be_taint(CPUArchState *env, abi_ptr ptr); 
uint32_t cpu_ldl_be_taint(CPUArchState *env, abi_ptr ptr); 
uint64_t cpu_ldq_be_taint(CPUArchState *env, abi_ptr ptr); 
uint32_t cpu_lduw_le_taint(CPUArchState *env, abi_ptr ptr); 
int cpu_ldsw_le_taint(CPUArchState *env, abi_ptr ptr); 
uint32_t cpu_ldl_le_taint(CPUArchState *env, abi_ptr ptr); 
uint64_t cpu_ldq_le_taint(CPUArchState *env, abi_ptr ptr); 
void cpu_stb_taint(CPUArchState *env, abi_ptr ptr, uint32_t val); 
void cpu_stw_be_taint(CPUArchState *env, abi_ptr ptr, uint32_t val);
void cpu_stl_be_taint(CPUArchState *env, abi_ptr ptr, uint32_t val); 
void cpu_stq_be_taint(CPUArchState *env, abi_ptr ptr, uint64_t val); 
void cpu_stw_le_taint(CPUArchState *env, abi_ptr ptr, uint32_t val); 
void cpu_stl_le_taint(CPUArchState *env, abi_ptr ptr, uint32_t val); 
void cpu_stq_le_taint(CPUArchState *env, abi_ptr ptr, uint64_t val); 

These interfaces mirror QEMU's native cpu_ld*/cpu_st* guest memory accessors, but operate on data labels instead of data values. For usage details, refer to the QEMU API Documentation.
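The accessor names follow QEMU's naming convention for guest memory helpers: ld or st, then u or s for unsigned or signed loads, then b, w, l, or q for 8, 16, 32, or 64 bits, then be or le for the byte order. The sketch below illustrates what that convention means for labels. It is a toy model under stated assumptions and not TaintEMU's implementation: it assumes one byte of label per byte of guest memory, kept in a hypothetical array toy_shadow, and every toy_* function is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MEM_SIZE 64

/* Toy shadow memory: toy_shadow[i] holds the label bits of guest byte i. */
static uint8_t toy_shadow[TOY_MEM_SIZE];

/* "ldub": load the label of one unsigned byte. */
static uint32_t toy_ldub_taint(size_t addr)
{
    assert(addr < TOY_MEM_SIZE);
    return toy_shadow[addr];
}

/* "lduw_le": load a 16-bit label, lowest address = least significant byte. */
static uint32_t toy_lduw_le_taint(size_t addr)
{
    return toy_ldub_taint(addr) | (toy_ldub_taint(addr + 1) << 8);
}

/* "lduw_be": load a 16-bit label, lowest address = most significant byte. */
static uint32_t toy_lduw_be_taint(size_t addr)
{
    return (toy_ldub_taint(addr) << 8) | toy_ldub_taint(addr + 1);
}

/* "stb": store the label of one byte (only the low 8 bits of val are kept). */
static void toy_stb_taint(size_t addr, uint32_t val)
{
    assert(addr < TOY_MEM_SIZE);
    toy_shadow[addr] = (uint8_t)val;
}

/* "stl_le": store a 32-bit label, least significant byte first. */
static void toy_stl_le_taint(size_t addr, uint32_t val)
{
    for (size_t i = 0; i < 4; i++) {
        toy_stb_taint(addr + i, val >> (8 * i));
    }
}
```

For example, after toy_stl_le_taint(8, 0x000000FF) only the byte at address 8 carries a label, so toy_lduw_le_taint(8) yields 0x00FF while toy_lduw_be_taint(8) yields 0xFF00. The same two bytes are read in both cases; only the order in which their labels are combined differs.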

  2. Callback Interfaces

Include the header file "tcg/tcg-taint.h"

void taint_write_notify (uint64_t addr, uint64_t taint, uint64_t val, CPUArchState *env);

Description: Callback invoked when labeled data is written to memory.

Parameters:

  • @addr: The address of the memory being written to.
  • @taint: The label value.
  • @val: The value being written.
  • @env: The guest CPU architecture state.

void taint_read_notify (uint64_t addr, uint64_t taint, uint64_t val, CPUArchState *env);

Description: Callback invoked when labeled data is read from memory.

Parameters:

  • @addr: The address of the labeled data in memory.
  • @taint: The label value.
  • @val: The value being read.
  • @env: The guest CPU architecture state.

void taint_exec_notify (uint64_t addr, uint64_t taint);

Description: Callback invoked when labeled data is executed as an instruction.

Parameters:

  • @addr: The address of the labeled data in memory.
  • @taint: The label value.
  • @val: The instruction value.
  • @env: The guest CPU architecture state.
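As a rough illustration of what a callback body might do, the sketch below records labeled writes that fall inside a watched address range. It is a standalone example under assumptions: example_watch, example_write_notify, and the counters are hypothetical names, CPUArchState is declared as an opaque stand-in so the file compiles without QEMU headers, and only the parameter list is taken from taint_write_notify above. How such logic is wired into TaintEMU is not shown here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for QEMU's CPUArchState so the sketch compiles on its own. */
typedef struct CPUArchState CPUArchState;

/* Hypothetical bookkeeping kept by the analysis. */
static uint64_t example_watch_base;
static uint64_t example_watch_len;
static uint64_t example_tainted_writes;
static uint64_t example_last_write_addr;

/* Select the address range [base, base + len) to monitor. */
static void example_watch(uint64_t base, uint64_t len)
{
    assert(len > 0 && base + len > base); /* non-empty, no wrap-around */
    example_watch_base = base;
    example_watch_len = len;
}

/* Same parameter list as taint_write_notify; the body is only an example. */
static void example_write_notify(uint64_t addr, uint64_t taint, uint64_t val,
                                 CPUArchState *env)
{
    (void)env; /* unused in this sketch */

    if (taint == 0) {
        return; /* defensive: ignore accesses that carry no label */
    }
    /* Unsigned subtraction wraps for addr < base, so one compare covers
     * both ends of the watched range. */
    if (addr - example_watch_base >= example_watch_len) {
        return;
    }

    example_tainted_writes++;
    example_last_write_addr = addr;
    fprintf(stderr, "tainted write: addr=0x%" PRIx64 " label=0x%" PRIx64
            " val=0x%" PRIx64 "\n", addr, taint, val);
}
```

Filtering by address range keeps the handler cheap, which matters because a write callback can fire very frequently during whole-system tracking.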