<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.0.1">Jekyll</generator><link href="https://mcejp.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://mcejp.github.io/" rel="alternate" type="text/html" /><updated>2026-01-25T16:51:59+01:00</updated><id>https://mcejp.github.io/feed.xml</id><title type="html">Reinventing The Wheel</title><entry><title type="html">Extracting regions from ESA WorldCover</title><link href="https://mcejp.github.io/2026/01/25/worldcover.html" rel="alternate" type="text/html" title="Extracting regions from ESA WorldCover" /><published>2026-01-25T00:00:00+01:00</published><updated>2026-01-25T00:00:00+01:00</updated><id>https://mcejp.github.io/2026/01/25/worldcover</id><content type="html" xml:base="https://mcejp.github.io/2026/01/25/worldcover.html"><![CDATA[<p>ESA WorldCover is a categorical dataset with 11 classes of land cover (such as water, trees, built-up etc.) at a 10-meter resolution covering most of the world.
In this short post I show how to extract a specific region given its geographic coordinates.</p>

<h3 id="acquiring-the-data">Acquiring the data</h3>

<p>The dataset is delivered in tiles covering 3x3 degrees each, bundled into <em>macrotiles</em> of 60x60 degrees. The macrotiles can be freely downloaded <a href="https://worldcover2021.esa.int/downloader">on the project’s website</a>.</p>

<p><img src="../../../images/2026/worldcover/downloader.jpg" alt="screenshot of the macrotile picker interface" /></p>

<p>Assuming you have downloaded and extracted the correct macrotile, you can use the <code class="highlighter-rouge">rasterio</code> package to load the tile corresponding to the given geographic coordinates:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">rasterio</span>

<span class="k">def</span> <span class="nf">load</span><span class="p">(</span><span class="n">lat_lon</span><span class="p">):</span>
    <span class="c1"># dataset is provided in the form of tiles 3x3 degrees in size
</span>    <span class="c1"># compute tile coordinates
</span>    <span class="n">tile_coords</span> <span class="o">=</span> <span class="p">(</span><span class="nb">int</span><span class="p">((</span><span class="n">lat_lon</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">//</span> <span class="mi">3</span><span class="p">)</span> <span class="o">*</span> <span class="mi">3</span><span class="p">),</span> <span class="nb">int</span><span class="p">((</span><span class="n">lat_lon</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">//</span> <span class="mi">3</span><span class="p">)</span> <span class="o">*</span> <span class="mi">3</span><span class="p">))</span>
    <span class="n">tile_extent</span> <span class="o">=</span> <span class="p">(</span><span class="n">tile_coords</span><span class="p">,</span> <span class="p">(</span><span class="n">tile_coords</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="mi">3</span><span class="p">,</span> <span class="n">tile_coords</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="mi">3</span><span class="p">))</span>

    <span class="c1"># build file name
</span>    <span class="n">lat_str</span> <span class="o">=</span> <span class="p">(</span><span class="sa">f</span><span class="s">"N</span><span class="si">{</span><span class="nb">abs</span><span class="p">(</span><span class="n">tile_coords</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span><span class="si">:</span><span class="mi">02</span><span class="n">d</span><span class="si">}</span><span class="s">"</span> <span class="k">if</span> <span class="n">tile_coords</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="k">else</span>
               <span class="sa">f</span><span class="s">"S</span><span class="si">{</span><span class="nb">abs</span><span class="p">(</span><span class="n">tile_coords</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span><span class="si">:</span><span class="mi">02</span><span class="n">d</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="n">lon_str</span> <span class="o">=</span> <span class="p">(</span><span class="sa">f</span><span class="s">"E</span><span class="si">{</span><span class="n">tile_coords</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="si">:</span><span class="mi">03</span><span class="n">d</span><span class="si">}</span><span class="s">"</span> <span class="k">if</span> <span class="mi">0</span> <span class="o">&lt;=</span> <span class="n">tile_coords</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&lt;</span> <span class="mi">180</span> <span class="k">else</span>
               <span class="sa">f</span><span class="s">"W</span><span class="si">{</span><span class="p">(</span><span class="mi">360</span> <span class="o">-</span> <span class="n">tile_coords</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="o">%</span> <span class="mi">360</span><span class="si">:</span><span class="mi">03</span><span class="n">d</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>

    <span class="n">filename</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"ESA_WorldCover_10m_2021_V200_</span><span class="si">{</span><span class="n">lat_str</span><span class="si">}{</span><span class="n">lon_str</span><span class="si">}</span><span class="s">_Map.tif"</span>

    <span class="k">with</span> <span class="n">rasterio</span><span class="p">.</span><span class="nb">open</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span> <span class="k">as</span> <span class="n">dataset</span><span class="p">:</span>
        <span class="n">data</span> <span class="o">=</span> <span class="n">dataset</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># read first band
</span>
    <span class="k">return</span> <span class="n">data</span><span class="p">,</span> <span class="n">tile_extent</span>
</code></pre></div></div>
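
<p>The file-name logic is the part most likely to go wrong, so here is a minimal, file-free sketch of just that step. It assumes signed ISO 6709 coordinates (west and south negative); the helper name <code class="highlighter-rouge">tile_filename</code> is made up for this illustration, and the name pattern is copied from the function above.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math

def tile_filename(lat, lon):
    """Return the file name of the 3x3 degree tile containing (lat, lon)."""
    # snap down to the 3-degree grid, as load() does with floor division
    tile_lat = math.floor(lat / 3) * 3
    tile_lon = math.floor(lon / 3) * 3

    # hemisphere letter plus zero-padded absolute value
    lat_str = f"N{tile_lat:02d}" if tile_lat &gt;= 0 else f"S{-tile_lat:02d}"
    lon_str = f"E{tile_lon:03d}" if tile_lon &gt;= 0 else f"W{-tile_lon:03d}"
    return f"ESA_WorldCover_10m_2021_V200_{lat_str}{lon_str}_Map.tif"

print(tile_filename(50.08, 14.42))   # ESA_WorldCover_10m_2021_V200_N48E012_Map.tif
print(tile_filename(-33.9, -70.6))   # ESA_WorldCover_10m_2021_V200_S36W072_Map.tif
</code></pre></div></div>

<p>Note how negative coordinates snap <em>away</em> from zero: the tile containing 33.9° S starts at 36° S, because flooring always moves towards the south-west corner.</p>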

<h3 id="extracting-the-region-of-interest">Extracting the region of interest</h3>

<p>Now, suppose we have a <em>region of interest</em> specified as a pair of corners: <em>(min-latitude, min-longitude)</em> and <em>(max-latitude, max-longitude)</em>. We need to “cut” this area out of the full tile.</p>

<p>The key to this is a function called <code class="highlighter-rouge">discretize_extent</code>. It takes 3 parameters:</p>
<ul>
  <li>extent of the tile we have loaded</li>
  <li>shape of the tile (number of rows and columns, i.e. Y &amp; X resolution)</li>
  <li>the requested region</li>
</ul>

<p>…and returns the raster (integer) coordinates corresponding to the region within the tile. Note that by snapping to the integer coordinates, the geographic coordinates will change slightly as well and must be recomputed. Given the high resolution of the dataset, this inaccuracy will usually be negligible, but we like to do things <em>properly</em>, so I also recompute the updated extent.</p>

<p>We follow the convention given by <a href="https://en.wikipedia.org/wiki/ISO_6709">ISO 6709</a>:</p>
<ol>
  <li>Latitude comes before longitude</li>
  <li>North latitude is positive</li>
  <li>East longitude is positive</li>
  <li>Coordinates are represented as decimal degrees</li>
</ol>

<p>On the other hand, rows in the dataset are ordered north-to-south, which means that at some point, we need to flip the Y coordinate.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">math</span>

<span class="k">def</span> <span class="nf">discretize_extent</span><span class="p">(</span><span class="n">tile_extent</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="nb">float</span><span class="p">],</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="nb">float</span><span class="p">]],</span>
                      <span class="n">tile_shape</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span>
                      <span class="n">requested_extent</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="nb">float</span><span class="p">],</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="nb">float</span><span class="p">]]):</span>
    <span class="c1"># axis 0 is latitude north-&gt;south (!)
</span>    <span class="c1"># axis 1 is longitude west-&gt;east
</span>
    <span class="p">(</span><span class="n">tile_min_lat</span><span class="p">,</span> <span class="n">tile_min_lon</span><span class="p">),</span> <span class="p">(</span><span class="n">tile_max_lat</span><span class="p">,</span> <span class="n">tile_max_lon</span><span class="p">)</span> <span class="o">=</span> <span class="n">tile_extent</span>
    <span class="p">(</span><span class="n">req_min_lat</span><span class="p">,</span> <span class="n">req_min_lon</span><span class="p">),</span> <span class="p">(</span><span class="n">req_max_lat</span><span class="p">,</span> <span class="n">req_max_lon</span><span class="p">)</span> <span class="o">=</span> <span class="n">requested_extent</span>
    <span class="n">relative_extent</span> <span class="o">=</span> <span class="p">((</span><span class="n">req_min_lat</span> <span class="o">-</span> <span class="n">tile_min_lat</span><span class="p">,</span>
                        <span class="n">req_min_lon</span> <span class="o">-</span> <span class="n">tile_min_lon</span><span class="p">),</span>
                       <span class="p">(</span><span class="n">req_max_lat</span> <span class="o">-</span> <span class="n">tile_min_lat</span><span class="p">,</span>
                        <span class="n">req_max_lon</span> <span class="o">-</span> <span class="n">tile_min_lon</span><span class="p">))</span>

    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">relative_extent</span><span class="o">=</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>

    <span class="c1"># project user extents into raster space
</span>    <span class="n">px_per_deg_lat</span> <span class="o">=</span> <span class="n">tile_shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">/</span> <span class="p">(</span><span class="n">tile_max_lat</span> <span class="o">-</span> <span class="n">tile_min_lat</span><span class="p">)</span>
    <span class="n">px_per_deg_lon</span> <span class="o">=</span> <span class="n">tile_shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">/</span> <span class="p">(</span><span class="n">tile_max_lon</span> <span class="o">-</span> <span class="n">tile_min_lon</span><span class="p">)</span>

    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">px_per_deg_lat</span><span class="o">=</span><span class="si">}</span><span class="s"> </span><span class="si">{</span><span class="n">px_per_deg_lon</span><span class="o">=</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>

    <span class="c1"># snap "outwards" to integer coordinates
</span>    <span class="n">min_y</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">math</span><span class="p">.</span><span class="n">floor</span><span class="p">(</span><span class="n">relative_extent</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">px_per_deg_lat</span><span class="p">))</span>
    <span class="n">min_x</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">math</span><span class="p">.</span><span class="n">floor</span><span class="p">(</span><span class="n">relative_extent</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">px_per_deg_lon</span><span class="p">))</span>
    <span class="n">max_y</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">math</span><span class="p">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">relative_extent</span><span class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">px_per_deg_lat</span><span class="p">))</span>
    <span class="n">max_x</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">math</span><span class="p">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">relative_extent</span><span class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">px_per_deg_lon</span><span class="p">))</span>

    <span class="c1"># sanity check :)
</span>    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">min_y</span><span class="o">=</span><span class="si">}</span><span class="s"> </span><span class="si">{</span><span class="n">min_x</span><span class="o">=</span><span class="si">}</span><span class="s"> </span><span class="si">{</span><span class="n">max_y</span><span class="o">=</span><span class="si">}</span><span class="s"> </span><span class="si">{</span><span class="n">max_x</span><span class="o">=</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">min_y</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">min_y</span> <span class="o">&lt;=</span> <span class="n">tile_shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
    <span class="k">assert</span> <span class="n">min_x</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">min_x</span> <span class="o">&lt;=</span> <span class="n">tile_shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
    <span class="k">assert</span> <span class="n">max_y</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">max_y</span> <span class="o">&lt;=</span> <span class="n">tile_shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
    <span class="k">assert</span> <span class="n">max_x</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">max_x</span> <span class="o">&lt;=</span> <span class="n">tile_shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>

    <span class="c1"># reconstruct geo coordinates
</span>    <span class="n">discretized_extent</span> <span class="o">=</span> <span class="p">((</span><span class="n">tile_min_lat</span> <span class="o">+</span> <span class="n">min_y</span> <span class="o">/</span> <span class="n">px_per_deg_lat</span><span class="p">,</span>
                           <span class="n">tile_min_lon</span> <span class="o">+</span> <span class="n">min_x</span> <span class="o">/</span> <span class="n">px_per_deg_lon</span><span class="p">),</span>
                          <span class="p">(</span><span class="n">tile_min_lat</span> <span class="o">+</span> <span class="n">max_y</span> <span class="o">/</span> <span class="n">px_per_deg_lat</span><span class="p">,</span>
                           <span class="n">tile_min_lon</span> <span class="o">+</span> <span class="n">max_x</span> <span class="o">/</span> <span class="n">px_per_deg_lon</span><span class="p">))</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">discretized_extent</span><span class="o">=</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>

    <span class="c1"># flip Y
</span>    <span class="n">min_y_corr</span> <span class="o">=</span> <span class="n">tile_shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">max_y</span>
    <span class="n">max_y_corr</span> <span class="o">=</span> <span class="n">tile_shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">min_y</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">min_y_corr</span><span class="o">=</span><span class="si">}</span><span class="s"> </span><span class="si">{</span><span class="n">max_y_corr</span><span class="o">=</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">min_y_corr</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">min_y_corr</span> <span class="o">&lt;=</span> <span class="n">tile_shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
    <span class="k">assert</span> <span class="n">max_y_corr</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">max_y_corr</span> <span class="o">&lt;=</span> <span class="n">tile_shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>

    <span class="k">return</span> <span class="p">((</span><span class="n">min_y_corr</span><span class="p">,</span> <span class="n">max_y_corr</span><span class="p">),</span> <span class="p">(</span><span class="n">min_x</span><span class="p">,</span> <span class="n">max_x</span><span class="p">)),</span> <span class="n">discretized_extent</span>
</code></pre></div></div>
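
<p>To see what the returned index ranges mean, here is a small sketch on a toy tile of 6x6 pixels (2 pixels per degree) instead of a real one. The index ranges and the extent are hard-coded to what <code class="highlighter-rouge">discretize_extent</code> returns for this toy input; the <code class="highlighter-rouge">crop</code> helper is hypothetical and uses plain lists so that the example runs without any dependencies.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Toy stand-in for a loaded tile: 6 rows x 6 columns covering latitudes 48..51
# and longitudes 12..15, i.e. 2 pixels per degree. Row 0 is the northernmost.
ROWS, COLS, PX_PER_DEG = 6, 6, 2
TILE_MIN_LAT, TILE_MIN_LON = 48, 12

# every cell stores the (lat, lon) of its south-west corner, for easy checking
tile = [[(TILE_MIN_LAT + (ROWS - 1 - row) / PX_PER_DEG,
          TILE_MIN_LON + col / PX_PER_DEG)
         for col in range(COLS)]
        for row in range(ROWS)]

# For tile_extent = ((48, 12), (51, 15)), tile_shape = (6, 6) and
# requested_extent = ((49.2, 12.6), (50.1, 13.9)), discretize_extent returns:
y_range, x_range = (1, 4), (1, 4)
discretized_extent = ((49.0, 12.5), (50.5, 14.0))

def crop(grid, y_range, x_range):
    """Cut out rows y_range[0]..y_range[1]-1 and columns x_range[0]..x_range[1]-1."""
    return [row[x_range[0]:x_range[1]] for row in grid[y_range[0]:y_range[1]]]

region = crop(tile, y_range, x_range)
print(region[0][0])    # north-west cell: (50.0, 12.5)
print(region[-1][0])   # south-west cell: (49.0, 12.5)
</code></pre></div></div>

<p>The south-west cell starts exactly at the minimum corner of the discretized extent, and the northernmost row (starting at 50.0°, half a degree tall) ends at its maximum latitude of 50.5°. With the NumPy array returned by <code class="highlighter-rouge">dataset.read(1)</code>, the same cut is simply <code class="highlighter-rouge">data[y_range[0]:y_range[1], x_range[0]:x_range[1]]</code>.</p>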

<p>In practice you might also want to down-sample the extract for performance or other reasons. Since the data is categorical, there is no point in trying to interpolate the sample values; nearest-neighbor sampling must be used:</p>
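
<p>A minimal sketch of the simplest variant, plain decimation (keeping every <em>n</em>-th row and column), again on a toy list-of-lists rather than real data. The function name is made up for this example; the class codes 10 and 80 are WorldCover's values for tree cover and permanent water bodies.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def downsample_nearest(grid, factor):
    """Keep every factor-th row and column; values are copied, never blended."""
    return [row[::factor] for row in grid[::factor]]

# 4x4 toy extract: 10 = tree cover, 80 = permanent water bodies
extract = [
    [10, 10, 80, 80],
    [10, 10, 80, 80],
    [10, 80, 80, 80],
    [80, 80, 80, 80],
]

small = downsample_nearest(extract, 2)
print(small)  # [[10, 80], [10, 80]]
</code></pre></div></div>

<p>Every output value is still a valid class code. Averaging a tree pixel with a water pixel would instead yield 45, which does not correspond to any land cover class. For a NumPy array the equivalent is <code class="highlighter-rouge">data[::factor, ::factor]</code>.</p>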

<p><img src="../../../images/2026/worldcover/output.png" alt="plot of the extracted and down-sampled data" /></p>]]></content><author><name></name></author><summary type="html"><![CDATA[ESA WorldCover is a categorical dataset with 11 classes of land cover (such as water, trees, built-up etc.) at a 10-meter resolution covering most of the world. In this short post I show how to extract a specific region given its geographic coordinates.]]></summary></entry><entry><title type="html">STAK: My weird programming language for DOS games</title><link href="https://mcejp.github.io/2025/10/29/stak.html" rel="alternate" type="text/html" title="STAK: My weird programming language for DOS games" /><published>2025-10-29T00:00:00+01:00</published><updated>2025-10-29T00:00:00+01:00</updated><id>https://mcejp.github.io/2025/10/29/stak</id><content type="html" xml:base="https://mcejp.github.io/2025/10/29/stak.html"><![CDATA[<p><img src="../../../images/2025/stak/pt8086.jpg" alt="screenshot" /></p>

<p>The inspiration for this project came from two main sources.</p>

<p>The first is Zork, a text adventure released back in 1977. The original version ran on the PDP-10 mainframe and was coded in MDL, a Lisp derivative. When the time came for a microcomputer version, however, MDL was just too big to fit. The game was reimplemented in a custom language called ZIL – <em>Zork Implementation Language</em>. Incredibly, the source code has been preserved and is now available publicly. It looks like this:</p>

<pre><code class="language-zil">&lt;ROUTINE TRAP-DOOR-EXIT ()
         &lt;COND (,RUG-MOVED
                &lt;COND (&lt;FSET? ,TRAP-DOOR ,OPENBIT&gt;
                       &lt;RETURN ,CELLAR&gt;)
                      (T
                       &lt;TELL "The trap door is closed." CR&gt;
                       &lt;THIS-IS-IT ,TRAP-DOOR&gt;
                       &lt;RFALSE&gt;)&gt;)
               (T
                &lt;TELL "You can't go that way." CR&gt;
                &lt;RFALSE&gt;)&gt;&gt;
</code></pre>

<p>You can clearly see the Lispy origins, but this language compiles to a bytecode for a quite rudimentary <a href="https://web.archive.org/web/20131204205854/http://www.gnelson.demon.co.uk/zspec/">stack-based virtual machine</a> (keep in mind it had to run on 8-bit machines like the <a href="https://en.wikipedia.org/wiki/TRS-80">TRS-80</a> or the <a href="https://en.wikipedia.org/wiki/Apple_II">Apple II</a>).</p>

<p>The second impulse came from reading about the Amiga game Another World. It is likewise built on top of a custom virtual machine and, as far as I understood, its graphics are based completely on vector drawing commands. This game was programmed directly in the bytecode “assembly language” – apparently without even variable names!</p>

<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>start   jsr       init
        setvec    60        flip10
        seti      v255      4
        seti      v246      1
        seti      v227      0
        break
        play      106       20    0    2
        play      106       20    0    3
        jsr       initiz
        jsr       setmaz
        setvec    28        nag1
        setvec    21        nag2
        seti      v99       2
        si        v4        =     48   suit
</code></pre></div></div>

<p><em>(this snippet has been edited for brevity, but it should give a pretty accurate impression)</em></p>

<p>The idea of using strictly vector graphics stuck with me in particular. In theory, it means a single game build could work on various devices with different screen resolutions – this was usually not the case with classic bitmapped graphics.</p>

<p><a href="https://www.youtube.com/watch?v=J0gv2bV9ok4">In an interview with Eric Chahi</a>, the game’s author, he mentions that during development, he would make changes and instantly see them in the running game. This would be tricky to do when you use a language like C, which compiles to optimized native code, and an off-the-shelf compiler: even though GCC and Clang are fully open-source, it would take considerable effort to make all the necessary changes.</p>

<p>One solution is to leave existing languages and compilers behind and re-invent the world from scratch. As the target, I chose a 286-era DOS PC with VGA graphics (the 320x200x8bpp <a href="https://en.wikipedia.org/wiki/Mode_13h">Mode 13h</a> in particular). I consider the 286 to be the last truly 16-bit CPU in the x86 series; the subsequent 386, while fully backwards compatible, is natively 32-bit.</p>

<h3 id="creating-a-language">Creating a language</h3>

<p>Having implemented language parsing many times before, I knew I wanted to avoid inventing a new grammar this time around. Fortunately, there is an existing grammar that has been successfully used for programming languages for decades – you guessed it, I am talking about <a href="https://en.wikipedia.org/wiki/S-expression">S-expressions</a>, aka LISP syntax.</p>

<p>My language is quite minimalist, with expressive power on par with early BASICs. It takes many shortcuts, often at the cost of run-time efficiency. It knows only one data type: the signed 16-bit integer. There are two variable scopes: global and function. There are no structures, no arrays and no text strings. On the other hand, there are a few convenience features that made the cut because they were easy to implement; one example is the possibility for functions to return multiple values, somewhat offsetting the absence of a tuple/vector type.</p>

<p>Since the language does not include text output (after all, we want our games to be visual!), a Hello World example consists of filling the screen with a solid color and pausing for a few seconds before exiting:</p>

<div class="language-scheme highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">main</span><span class="p">)</span>
  <span class="p">(</span><span class="nf">fill-rect</span> <span class="nv">COLOR:WHITE</span> <span class="mi">0</span> <span class="mi">0</span> <span class="nv">W</span> <span class="nv">H</span><span class="p">)</span>
  <span class="p">(</span><span class="nf">pause-frames</span> <span class="mi">100</span><span class="p">))</span>
</code></pre></div></div>

<p>The second example demonstrates user-defined functions, variables and looping:</p>

<div class="language-scheme highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">clear-screen</span> <span class="nv">color</span><span class="p">)</span>
  <span class="p">(</span><span class="nf">fill-rect</span> <span class="nv">color</span> <span class="mi">0</span> <span class="mi">0</span> <span class="nv">W</span> <span class="nv">H</span><span class="p">))</span>

<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">main</span><span class="p">)</span>
  <span class="p">(</span><span class="k">define</span> <span class="nv">foo</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="k">define</span> <span class="nv">bar</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">foo</span> <span class="mi">20</span><span class="p">))</span>
  <span class="p">(</span><span class="k">set!</span> <span class="nv">foo</span> <span class="mi">40</span><span class="p">)</span>

  <span class="p">(</span><span class="nf">dotimes</span> <span class="p">(</span><span class="nf">color</span> <span class="nv">COLOR:COUNT</span><span class="p">)</span>
    <span class="p">(</span><span class="nf">clear-screen</span> <span class="nv">color</span><span class="p">)</span>
    <span class="p">(</span><span class="nf">pause-frames</span> <span class="nv">foo</span><span class="p">)))</span>
</code></pre></div></div>

<p>Those UPPERCASE symbols are built-in constants, by the way. I don’t want to dwell on the language too much, as it is not particularly innovative and I imagine the (parenthesis-heavy syntax will ((be off-putting) to most)). Several <a href="https://github.com/mcejp/STAK/blob/main/flower.scm">example programs</a> can be found in the repository.</p>

<h3 id="designing-the-virtual-machine">Designing the virtual machine</h3>

<figure>
<p><img src="../../../images/2025/stak/stak-vm.png" alt="Screenshot" width="60%" /></p>
  <figcaption>At startup, a compiled program is loaded from a file. The interpreter steps through the bytecode. Input and output are performed through a library of built-in functions.</figcaption>
</figure>

<p>The VM is a textbook stack machine. It comes with a library of built-in functions, ranging from basic arithmetic and logic (<code class="highlighter-rouge">+</code>, <code class="highlighter-rouge">or</code>) through utility functions (<code class="highlighter-rouge">random</code>) to input/output primitives (<code class="highlighter-rouge">fill-rect</code>, <code class="highlighter-rouge">key-held?</code>). Each of these gets a dedicated opcode, which improves bytecode density. In its disassembled form, a snippet of bytecode might look like this:</p>

<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>05 02         getlocal 2
00 40 01      pushconst 320
05 01         getlocal 1
82            *
00 09 00      pushconst 9
83            /
00 C8 00      pushconst 200
05 05         getlocal 5
81            -
00 40 01      pushconst 320
00 09 00      pushconst 9
83            /
05 05         getlocal 5
B1            fill-rect
</code></pre></div></div>
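
<p>To make the execution model concrete, here is a toy Python interpreter for just the opcodes that appear in this listing. It is a sketch, not the real VM (which is written in C): the opcode values and operand sizes are read off the disassembly above, while the operand order, the truncating division, the 16-bit wrap-around and the recording of <code class="highlighter-rouge">fill-rect</code> calls instead of drawing are all assumptions made for the illustration.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def to_int16(value):
    """Wrap an integer to the signed 16-bit range (assumed overflow behaviour)."""
    return (value + 0x8000) % 0x10000 - 0x8000

def run(code, local_vars):
    """Interpret a tiny subset of the bytecode; return the recorded fill-rect calls."""
    stack, calls, pc = [], [], 0
    while pc &lt; len(code):
        op = code[pc]
        if op == 0x00:    # pushconst: 16-bit little-endian immediate
            stack.append(int.from_bytes(code[pc + 1:pc + 3], "little", signed=True))
            pc += 3
        elif op == 0x05:  # getlocal: 1-byte local variable index
            stack.append(local_vars[code[pc + 1]])
            pc += 2
        elif op in (0x81, 0x82, 0x83):  # binary operators: - * /
            b, a = stack.pop(), stack.pop()  # right operand is on top
            if op == 0x81:
                result = a - b
            elif op == 0x82:
                result = a * b
            else:
                result = int(a / b)  # assumption: division truncates toward zero
            stack.append(to_int16(result))
            pc += 1
        elif op == 0xB1:  # fill-rect: takes color, x, y, w, h (pushed in that order)
            args = stack[-5:]
            del stack[-5:]
            calls.append(tuple(args))  # record the call instead of drawing
            pc += 1
        else:
            raise ValueError(f"unknown opcode {op:#04x}")
    return calls

# the exact bytes from the listing above
code = bytes.fromhex("05 02 00 40 01 05 01 82 00 09 00 83 "
                     "00 C8 00 05 05 81 00 40 01 00 09 00 83 05 05 B1")

# locals 1, 2 and 5 hold 3, 7 and 50; the others are unused here
print(run(code, [0, 3, 7, 0, 0, 50]))  # [(7, 106, 150, 35, 50)]
</code></pre></div></div>

<p>Read this way, the snippet draws a rectangle with color 7 at x = 320·3/9, y = 200 − 50, with a width of 320/9 and a height of 50 pixels.</p>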

<p>The VM is implemented in C, with some assembly parts where it really matters (graphics routines). By the way, in old compilers – I am using Open Watcom – inline assembly is actually fun to use! It’s nothing like modern GCC, where it feels like you are filling out a tax form whenever you just want to insert a few instructions. For example, this is how you just casually call a BIOS interrupt:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">video_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">_asm</span> <span class="p">{</span>
        <span class="n">mov</span> <span class="n">ax</span><span class="p">,</span> <span class="mi">13</span><span class="n">h</span>
        <span class="kt">int</span> <span class="mi">10</span><span class="n">h</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The bytecode is produced by a compiler <a href="https://hylang.org">written in Hy</a>, a Lisp dialect embedded in Python. This means that the toolchain must run on a reasonably modern machine – certainly not a 286. That’s okay, and not much different from how a dedicated game console would be programmed: you have a beefy <strong>host</strong> machine with your favorite editor/IDE, source control and the build environment, and only binary code/data is transferred to the <strong>target</strong> for execution.</p>

<h3 id="the-juicy-parts">The juicy parts</h3>

<h4 id="live-reload">Live reload</h4>

<p>Something that gets annoying when cross-compiling for another machine is having to swap storage media (a floppy, a flash cart or, worse, a tape) every time you make a change to your program, unless you’re lucky enough to have a hardware debugger. Right from the beginning, I dreamed of some kind of debug console, which would continuously watch the source file and apply the changes at every edit. The utopia would be to make live changes on the level of <em>individual statements</em> without disturbing the program state; for now, we will settle for automatic recompile and restart of the whole program.</p>

<p>Step one is to somehow connect the host and the target. A straightforward way is to use a USB-to-RS232 adapter cable. Then, on the target, the STAK VM can be started in “listen mode”. On the host you run an interactive interpreter. This interpreter lets you enter individual statements (REPL – read, eval, print, loop) or load an entire file, optionally continuing to watch for changes. Whenever new code is entered, or the file changes, the compiler and linker are invoked in the background, aware of the previous program state, and they produce a set of binary “patches” which are sent to the listening VM for execution.</p>

<figure>
<p><img src="../../../images/2025/stak/listener.png" alt="Screenshot" /></p>
  <figcaption></figcaption>
</figure>

<p>I will not go into details of the debug protocol and state machine (it’s straightforward and you can see the implementation <a href="https://github.com/mcejp/STAK/blob/main/interp/debug.c">in this file</a>), but I would like to give a shout-out to <a href="https://ascii.mcejp.com/posts/Diary-Framing-in-STAK-listener-protocol.html">HDLC framing</a>. In the protocol it was necessary to somehow delimit the various <em>frames</em> that are sent through the serial port, in a way that allows recovery in case of desynchronization, such as if the REPL process gets terminated and restarted. HDLC reserves a few byte values for control signals and provides an elegant way of escaping those if they appear in the actual data. It is not the most efficient scheme in terms of overhead, but is easy to implement and does not require any look-ahead.</p>
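
<p>To make the escaping idea concrete, here is a minimal Python sketch of HDLC-style byte stuffing. It is an illustration, not the STAK implementation: the reserved values (<code class="highlighter-rouge">0x7E</code> as the frame flag, <code class="highlighter-rouge">0x7D</code> as the escape byte, XOR with <code class="highlighter-rouge">0x20</code>) are the ones conventionally used in asynchronous HDLC-like framing, and the names <code class="highlighter-rouge">encode_frame</code> and <code class="highlighter-rouge">Decoder</code> are made up for this example.</p>

```python
# Assumed constants: the conventional values from asynchronous HDLC-like framing.
FLAG = 0x7E      # marks the start and end of every frame
ESCAPE = 0x7D    # announces that the next byte was altered
XOR_MASK = 0x20  # applied to a reserved byte when it is escaped


def encode_frame(payload: bytes) -> bytes:
    """Wrap payload in flag bytes, escaping reserved values inside it."""
    out = bytearray([FLAG])
    for byte in payload:
        if byte in (FLAG, ESCAPE):
            out.append(ESCAPE)
            out.append(byte ^ XOR_MASK)
        else:
            out.append(byte)
    out.append(FLAG)
    return bytes(out)


class Decoder:
    """Byte-at-a-time decoder; it never needs to look ahead."""

    def __init__(self):
        self.buffer = bytearray()
        self.escaped = False

    def feed(self, byte: int):
        """Consume one byte; return a finished payload, or None."""
        if byte == FLAG:
            # A flag always resets the state, which is what allows
            # recovery after a desynchronization.
            frame = bytes(self.buffer)
            self.buffer.clear()
            self.escaped = False
            return frame if frame else None
        if byte == ESCAPE:
            self.escaped = True
            return None
        if self.escaped:
            byte ^= XOR_MASK
            self.escaped = False
        self.buffer.append(byte)
        return None
```

<p>Because a flag byte unconditionally resets the decoder, a receiver that joins mid-stream recovers at the next frame boundary. A truncated frame may still be delivered before that happens, so a real protocol would probably also carry a length or checksum field to reject it.</p>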

<h4 id="faking-floats">Faking floats</h4>

<p>There are no floating-point values in the language, in part because I didn’t want to give up the dramatic simplicity of a single data type and in part due to the limitations of the target platform (having an FPU was not very common in the 286 era). This, however, is at odds with the stated goal of supporting game prototyping/development! Floats are very convenient for any kind of math that models our world – something that most games do to some extent.</p>

<p>The old-school work-around is to use fixed-point values; for example, you could say that the upper 8 bits of a 16-bit value shall represent the integer part and the lower 8 bits are dedicated to the fractional part (in increments of 2<sup>-8</sup>, or approximately 0.004). It’s far from perfect: first, you need to allocate your bits <em>very carefully</em>; the trade-off is between the range of the integer part and the precision of the fractional part. The second issue is that fixed-point numbers are <em>clunky</em>. Many operations – such as anything that involves multiplication – require inserting shift operations to adjust the decimal point of the result. Real floats effectively provide a layer of abstraction, letting you chain many operations without giving much thought to the representation (even if that comes with <a href="https://en.wikipedia.org/wiki/Floating-point_arithmetic#Accuracy_problems">many caveats</a>).</p>

<figure>
<p><img src="../../../images/2025/stak/fixed-point.png" alt="Screenshot" /></p>
  <figcaption>Comparison of two signed 16-bit fixed-point formats. As with signed integers, the highest bit determines the sign.</figcaption>
</figure>

<p>Still, I was not willing to concede on linguistic complexity by adding a separate data type, but I was willing to expand the built-in function library. After some experimentation, I chose a signed 10.6 format. It has just enough range to represent screen coordinates (which range from 0 to 319 on the X-axis), and just enough resolution to use sin/cos to compute them without visible loss of precision. I added functions like <code class="highlighter-rouge">mul@</code>, <code class="highlighter-rouge">sin@</code> and <code class="highlighter-rouge">cos@</code>. The <code class="highlighter-rouge">@</code> denotes that they operate on the fixed-point format; it is all a matter of convention, since the compiler, ignorant of data types, cannot verify that the program is using the appropriate function.</p>
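
<p>As an illustration of what a function like <code class="highlighter-rouge">mul@</code> has to do, here is a small Python sketch of signed 10.6 arithmetic. The helper names are hypothetical, and the rounding and overflow behavior of the real built-ins may differ; this sketch assumes truncation and 16-bit wraparound.</p>

```python
FRAC_BITS = 6
ONE = 2 ** FRAC_BITS  # 64 is the fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a real number to its nearest 10.6 representation."""
    return round(x * ONE)


def to_float(v: int) -> float:
    """Convert a 10.6 value back to a real number."""
    return v / ONE


def wrap16(v: int) -> int:
    """Wrap to the signed 16-bit range, as a 16-bit register would."""
    return (v + 32768) % 65536 - 32768


def mul_fixed(a: int, b: int) -> int:
    """Multiply two 10.6 values (assumption: truncating, wrapping)."""
    # The raw product carries 12 fractional bits; shifting right by 6
    # brings it back to the 10.6 format.
    return wrap16((a * b) >> FRAC_BITS)
```

<p>With these definitions, 1.5 × 2.0 becomes <code class="highlighter-rouge">mul_fixed(96, 128)</code>, which yields 192, the representation of 3.0. A plain integer multiplication would produce 12288 instead, which is why the convention needs a dedicated function. The format as a whole covers values from -512 to just under 512.</p>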

<h4 id="transformations">Transformations</h4>

<p>To provide a little more syntactic sugar without complicating the compiler any further, a layer of <em>transformations</em> is applied before compiling each form. If you are familiar with Lisp, think of macros – except you don’t get to define your own. The <code class="highlighter-rouge">dotimes</code> loop shown above, for example, is first transformed to a <code class="highlighter-rouge">while</code> loop before being compiled down to bytecode:</p>

<div class="language-scheme highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">do</span>
  <span class="p">(</span><span class="k">define</span> <span class="nv">color</span> <span class="mi">0</span><span class="p">)</span>
  <span class="p">(</span><span class="nf">while</span> <span class="p">(</span><span class="nb">&lt;</span> <span class="nv">color</span> <span class="nv">COLOR:COUNT</span><span class="p">)</span>
    <span class="p">(</span><span class="nf">clear-screen</span> <span class="nv">color</span><span class="p">)</span>
    <span class="p">(</span><span class="nf">pause-frames</span> <span class="mi">10</span><span class="p">)</span>
    <span class="p">(</span><span class="k">set!</span> <span class="nv">color</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">color</span> <span class="mi">1</span><span class="p">))))</span>
</code></pre></div></div>

<p>It is a rather simplistic substitution which comes with plenty of <a href="https://en.wikipedia.org/wiki/Hygienic_macro#The_hygiene_problem">well-known issues</a>. It does provide a tremendous bang for the buck, though.</p>
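
<p>For a rough idea of how little machinery such a substitution needs, here is a Python sketch that rewrites <code class="highlighter-rouge">dotimes</code> into the <code class="highlighter-rouge">while</code> form shown above. It assumes forms are represented as nested Python lists with strings for symbols; the function name <code class="highlighter-rouge">transform</code> and this representation are illustrative and not taken from the actual compiler.</p>

```python
def transform(form):
    """Recursively rewrite (dotimes (var count) body...) as a while loop."""
    if not isinstance(form, list) or not form:
        return form  # atoms and empty lists pass through unchanged
    form = [transform(sub) for sub in form]
    if form[0] == "dotimes":
        (var, count), body = form[1], form[2:]
        return ["do",
                ["define", var, 0],
                ["while", ["<", var, count],
                 *body,
                 ["set!", var, ["+", var, 1]]]]
    return form
```

<p>The sketch also exhibits some of the pitfalls: the <code class="highlighter-rouge">count</code> expression is evaluated again on every iteration, and nothing prevents the loop body from clobbering the loop variable.</p>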

<h3 id="now-to-build-something-with-it">Now to build something with it…</h3>

<p>As the flagship demo program I decided to build a clone of the legendary GORILLA.BAS, an example program from Microsoft QBasic.</p>

<figure>
<video width="720" height="720" controls="">
  <source src="../../../images/2025/stak/output.webm" type="video/webm" />
  Your browser does not support the video tag.
</video>
  <figcaption>Gorillas running on a <a href="https://512pixels.net/2025/01/pocket-8086/">Pocket 8086</a> (NEC V30 CPU @ 10 MHz). The flickering of the crosshairs and banana is due to the absence of double buffering; the machine is simply too slow for that.</figcaption>
</figure>

<p>Despite already reducing the scope with respect to the original game, the language’s limitations made the process frustrating to the point that I almost gave up.
For example, I needed to randomly generate building heights at the beginning of the round and store them for later. A reasonable solution would be to add arrays to the language, at least in some rudimentary form. Not for me. <em>You want an array? Implement it in userspace!</em> And so I did.</p>

<div class="language-scheme highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">define</span> <span class="nv">NUM-BLDGS</span> <span class="mi">5</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">building-h-0</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">building-h-1</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">building-h-2</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">building-h-3</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">building-h-4</span> <span class="mi">0</span><span class="p">)</span>

<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">set-building-h!</span> <span class="nv">n</span> <span class="nv">value</span><span class="p">)</span>
  <span class="p">(</span><span class="k">cond</span> <span class="p">(</span><span class="nb">=</span> <span class="nv">n</span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="k">set!</span> <span class="nv">building-h-0</span> <span class="nv">value</span><span class="p">)</span>
        <span class="p">(</span><span class="nb">=</span> <span class="nv">n</span> <span class="mi">1</span><span class="p">)</span> <span class="p">(</span><span class="k">set!</span> <span class="nv">building-h-1</span> <span class="nv">value</span><span class="p">)</span>
        <span class="p">(</span><span class="nb">=</span> <span class="nv">n</span> <span class="mi">2</span><span class="p">)</span> <span class="p">(</span><span class="k">set!</span> <span class="nv">building-h-2</span> <span class="nv">value</span><span class="p">)</span>
        <span class="p">(</span><span class="nb">=</span> <span class="nv">n</span> <span class="mi">3</span><span class="p">)</span> <span class="p">(</span><span class="k">set!</span> <span class="nv">building-h-3</span> <span class="nv">value</span><span class="p">)</span>
              <span class="mi">1</span> <span class="p">(</span><span class="k">set!</span> <span class="nv">building-h-4</span> <span class="nv">value</span><span class="p">)))</span>

<span class="o">...</span>

<span class="p">(</span><span class="nf">dotimes</span> <span class="p">(</span><span class="nf">i</span> <span class="nv">NUM-BLDGS</span><span class="p">)</span>
  <span class="p">(</span><span class="k">define</span> <span class="nv">MIN-H</span> <span class="mi">20</span><span class="p">)</span>
  <span class="p">(</span><span class="k">define</span> <span class="nv">MAX-H</span> <span class="mi">120</span><span class="p">)</span>
  <span class="p">(</span><span class="nf">set-building-h!</span> <span class="nv">i</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">MIN-H</span> <span class="p">(</span><span class="nf">%</span> <span class="p">(</span><span class="nf">random</span><span class="p">)</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">MAX-H</span> <span class="nv">MIN-H</span><span class="p">)))))</span>
</code></pre></div></div>

<p>It is clear that this would not scale beyond a trivial number of tiny arrays, and ultimately beyond a rather simplistic game. I do think, though, that there is value in this kind of exercise where you impose some axiomatic constraints at the beginning and then carry them through until reaching the point of absurdity. Maybe it’s not productive, but it’s certainly an interesting challenge that forces one to think outside the box and thoroughly question all previous assumptions.</p>

<h3 id="conclusion">Conclusion</h3>

<p>I think the project has fulfilled its purpose now. I am, of course, <a href="https://github.com/mcejp/STAK">publishing everything as open-source</a>. One day I would like to revisit the idea of finer-grained live program modification. I’m sure it has been done in some form before, but it seems worth reinventing.</p>

<p>If you want to try it out yourself, you can use DOSBox or a real DOS machine. It is also possible to build the VM natively (with SDL). Instructions can be found <a href="https://github.com/mcejp/STAK/blob/main/README.md">in the README</a>. It is not complicated, but there are some prerequisites that need to be installed first. Polishing the user experience was just not high in the list of priorities. Sorry about that.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">How I failed to make a game</title><link href="https://mcejp.github.io/2024/10/16/gba-raycasting.html" rel="alternate" type="text/html" title="How I failed to make a game" /><published>2024-10-16T00:00:00+02:00</published><updated>2024-10-16T00:00:00+02:00</updated><id>https://mcejp.github.io/2024/10/16/gba-raycasting</id><content type="html" xml:base="https://mcejp.github.io/2024/10/16/gba-raycasting.html"><![CDATA[<p>Today I am releasing my raycasting tech demo for the GameBoy Advance. Although it falls short of the goals I set out in the beginning – which included releasing a playable game –, I think there are some lessons worth sharing with the world.</p>

<p>This project started essentially as a challenge: inspired by the impressive work of <a href="https://www.youtube.com/@3DSage/videos">3DSage</a>, I wanted to see if I could build a raycasting “2.5D” engine that ran well enough to play a game.</p>

<p>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/5rjSpQ_rC7I?si=1zJRQxStlLgq8e_1" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>
</p>

<p>I explained in <a href="/2023/10/30/gba-benchmarking.html">a previous post</a> why I find the GBA a really nice platform to develop for. I have always enjoyed game programming, but pretty much from the beginning of my C++ journey I have been stuck in the <a href="https://geometrian.com/programming/tutorials/write-games-not-engines/index.php">Game Engine Trap</a>. Due to that realization, and based on experience from my other gamedev projects, it was really important to avoid getting carried away by tasks that are fun, but ultimately unimportant. To that end I established some firm rules:</p>

<ul>
  <li>Build a <em>game</em>, not a <em>game engine</em></li>
  <li>Don’t get stuck on assets. Make a quick placeholder when necessary</li>
  <li>Don’t get distracted by languages. Use straightforward C++ and, if necessary, Python to generate code. Want fancy LISP-compiled-to-bytecode AI scripting? Put it in another project.</li>
  <li>Stick to one design direction. Don’t start branching out in the middle of development.</li>
</ul>

<p>And <em>it worked</em> – until I realized that tech development is the really interesting part for me, and while it is fun to think about the final result, with all the captivating environments and assets and mechanics, actually getting there eventually turns into a chore. Therefore, after not touching the project in several months, I declare time for a post-mortem!</p>

<h3 id="what-worked-out-well">What worked out well</h3>

<p>My tech stack was fixed more-or-less from the start. The main language for run-time code was C++, technically <strong>C++20</strong>, but I use barely any new language features. I was determined to use <strong>CMake</strong>, partly because it is the native project format of my favorite C++ IDE (CLion), partly because <a href="https://ascii.mcejp.com/posts/CMake.html">I just like it</a>. As the toolchain, I used devkitARM at first, but moved to the official <strong>ARM GNU Toolchain</strong> with <a href="https://www.coranac.com/man/tonclib/"><strong>Tonclib</strong></a> to access the GBA hardware. Tonclib is really great. I don’t think it gets enough praise, and I think that more libraries should follow the same kind of spartan philosophy.</p>

<p>For really hot code paths, namely sprite scaling code, I also generate what are basically unrolled loops. They’re still compiled as C++; GCC does a good job of optimizing them. I didn’t want to write any assembly code except as a last resort at the end of the project, since it would be time-consuming to maintain.</p>

<p>Most games cannot exist without a considerable amount of <strong>assets</strong>. Having previously built somewhat sophisticated asset pipelines, this time I wanted something dead simple, knowing my game <a href="https://github.com/mcejp/GBA-raycaster/blob/master/doc/content-guidelines.rst#content-system-assumptions">would be very small</a>. I had recently finally grasped how input/output dependencies work for custom steps in CMake projects, so it seemed kind of obvious to use these for asset compilation, including the management of any dependencies between assets (such as between sprites and palettes). On GBA, the entirety of the cartridge ROM shows up as addressable memory, so instead of embedding custom binary formats and filesystem images, I just decided to <strong>generate a C++ header with the compiled data</strong> for each asset. Here’s an example snippet from the animation data for a spider enemy:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="p">...</span>
    <span class="c1">// col 30</span>
    <span class="p">{</span><span class="mi">25</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="n">ani_spider_dead_1_data_30</span><span class="p">},</span>
    <span class="c1">// col 31</span>
    <span class="p">{</span><span class="mi">25</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="n">ani_spider_dead_1_data_31</span><span class="p">},</span>
<span class="p">};</span>

<span class="k">static</span> <span class="k">const</span> <span class="n">AnimFrame</span> <span class="n">ani_spider_dead_frames</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
    <span class="p">{</span> <span class="p">.</span><span class="n">spans</span> <span class="o">=</span> <span class="n">ani_spider_dead_0_spans</span> <span class="p">},</span>
    <span class="p">{</span> <span class="p">.</span><span class="n">spans</span> <span class="o">=</span> <span class="n">ani_spider_dead_1_spans</span> <span class="p">},</span>
<span class="p">};</span>

<span class="k">static</span> <span class="k">const</span> <span class="n">AnimImage</span> <span class="n">ani_spider_anims</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
    <span class="p">{</span> <span class="p">.</span><span class="n">frames</span> <span class="o">=</span> <span class="n">ani_spider_idle_frames</span><span class="p">,</span> <span class="p">.</span><span class="n">num_frames</span> <span class="o">=</span> <span class="mi">2</span> <span class="p">},</span>
    <span class="p">{</span> <span class="p">.</span><span class="n">frames</span> <span class="o">=</span> <span class="n">ani_spider_walk_frames</span><span class="p">,</span> <span class="p">.</span><span class="n">num_frames</span> <span class="o">=</span> <span class="mi">2</span> <span class="p">},</span>
    <span class="p">{</span> <span class="p">.</span><span class="n">frames</span> <span class="o">=</span> <span class="n">ani_spider_attk_frames</span><span class="p">,</span> <span class="p">.</span><span class="n">num_frames</span> <span class="o">=</span> <span class="mi">2</span> <span class="p">},</span>
    <span class="p">{</span> <span class="p">.</span><span class="n">frames</span> <span class="o">=</span> <span class="n">ani_spider_dead_frames</span><span class="p">,</span> <span class="p">.</span><span class="n">num_frames</span> <span class="o">=</span> <span class="mi">2</span> <span class="p">},</span>
<span class="p">};</span>

<span class="k">static</span> <span class="k">const</span> <span class="n">SpriteImage</span> <span class="n">ani_spider</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">anims</span> <span class="o">=</span> <span class="n">ani_spider_anims</span><span class="p">,</span> <span class="p">};</span>
</code></pre></div></div>

<p>I cannot overstate how satisfied I am with this solution. Again, this might only work for games up to a certain size. Rumor has it, though, that there is leaked Pokémon R/S/E code out there which does the exact same thing.</p>
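
<p>The core of such an asset compiler can be very small. The following Python sketch turns raw bytes into a C++ header defining a constant array; the function name <code class="highlighter-rouge">emit_header</code> and the output layout are assumptions made for illustration, and the real compilers emit richer structures like the ones shown above.</p>

```python
def emit_header(name: str, data: bytes, per_line: int = 12) -> str:
    """Render raw bytes as a C++ header that defines a constant array."""
    lines = [
        "#pragma once",
        "",
        f"static const unsigned char {name}[{len(data)}] = {{",
    ]
    # Emit the data in rows of per_line hexadecimal byte literals.
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    return "\n".join(lines) + "\n"
```

<p>A script built around a function like this can be hooked into the build with CMake’s <code class="highlighter-rouge">add_custom_command</code>, listing the source asset under <code class="highlighter-rouge">DEPENDS</code> and the generated header under <code class="highlighter-rouge">OUTPUT</code>, so that the header is regenerated only when the asset changes.</p>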

<p>The custom asset compilers are all written in Python with only the most essential external dependencies (NumPy and <a href="https://pillow.readthedocs.io/">Pillow</a>). Python dependency management in this context is still an unsolved problem for me; in principle, the CMake script could set up a <em>venv</em> and install arbitrary dependencies at configuration time, but such a setup sounds rather fragile. So I just require those two packages to be already installed in whatever interpreter is used to configure the project.</p>

<p>To create and edit maps I opted for the venerable <a href="https://www.mapeditor.org/">Tiled</a> editor and it didn’t disappoint. About the only point of friction was the difference in coordinate systems between the editor and the game, so the conversion script has to include some transformation math, which took a few iterations to get right:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># correct for rotation
# Tiled anchors sprites in bottom left of the cell, but in-game sprite origin
# (incl. for rotation) is at the center
# therefore un-rotate the vector from corner to center
</span><span class="n">xxx</span> <span class="o">=</span> <span class="n">obj</span><span class="p">[</span><span class="s">"width"</span><span class="p">]</span> <span class="o">/</span> <span class="mi">2</span>
<span class="n">yyy</span> <span class="o">=</span> <span class="n">obj</span><span class="p">[</span><span class="s">"height"</span><span class="p">]</span> <span class="o">/</span> <span class="mi">2</span>
<span class="n">aaa</span> <span class="o">=</span> <span class="o">-</span><span class="n">obj</span><span class="p">[</span><span class="s">"rotation"</span><span class="p">]</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">math</span><span class="p">.</span><span class="n">pi</span> <span class="o">/</span> <span class="mi">360</span>
<span class="n">xxxx</span> <span class="o">=</span> <span class="n">xxx</span> <span class="o">*</span> <span class="n">math</span><span class="p">.</span><span class="n">cos</span><span class="p">(</span><span class="n">aaa</span><span class="p">)</span> <span class="o">-</span> <span class="n">yyy</span> <span class="o">*</span> <span class="n">math</span><span class="p">.</span><span class="n">sin</span><span class="p">(</span><span class="n">aaa</span><span class="p">)</span>
<span class="n">yyyy</span> <span class="o">=</span> <span class="n">xxx</span> <span class="o">*</span> <span class="n">math</span><span class="p">.</span><span class="n">sin</span><span class="p">(</span><span class="n">aaa</span><span class="p">)</span> <span class="o">+</span> <span class="n">yyy</span> <span class="o">*</span> <span class="n">math</span><span class="p">.</span><span class="n">cos</span><span class="p">(</span><span class="n">aaa</span><span class="p">)</span>
<span class="n">obj_x</span> <span class="o">=</span> <span class="n">obj</span><span class="p">[</span><span class="s">"x"</span><span class="p">]</span> <span class="o">+</span> <span class="n">xxxx</span>
<span class="n">obj_y</span> <span class="o">=</span> <span class="n">obj</span><span class="p">[</span><span class="s">"y"</span><span class="p">]</span> <span class="o">-</span> <span class="n">yyyy</span>

<span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">tiled_to_world_coords</span><span class="p">(</span><span class="n">obj_x</span><span class="p">,</span> <span class="n">obj_y</span><span class="p">)</span>
</code></pre></div></div>
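
<p>Math like this is easy to get subtly wrong, so it can help to wrap it in a function and check a couple of known cases. The sketch below restates the snippet above with descriptive names; <code class="highlighter-rouge">tiled_object_center</code> is a hypothetical name, and it assumes that Tiled reports rotation in degrees clockwise around the object’s anchor, with the Y axis pointing down.</p>

```python
import math


def tiled_object_center(obj: dict) -> tuple:
    """Return the center of a Tiled object anchored at its bottom-left corner."""
    half_w = obj["width"] / 2
    half_h = obj["height"] / 2
    # Negate the angle to un-rotate the corner-to-center vector.
    angle = -math.radians(obj["rotation"])
    dx = half_w * math.cos(angle) - half_h * math.sin(angle)
    dy = half_w * math.sin(angle) + half_h * math.cos(angle)
    # Y grows downward in Tiled, hence the subtraction.
    return obj["x"] + dx, obj["y"] - dy
```

<p>For an unrotated 16x16 object at (32, 48) this gives a center of (40, 40). Rotated by 90 degrees, the center moves to (40, 56), that is, below and to the right of the anchor.</p>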

<p>Benchmarking was an important part of the development process. I wrote about it at length in <a href="/2023/10/30/gba-benchmarking.html">a previous article</a>. Unless you are willing to spend a lot of time experimenting and guessing, I would now say that having strong benchmarking capability built in is <strong>essential</strong> in a project this sensitive to “detail optimization” (I don’t know if there is an established term for this – what I mean is that changing e.g. the alignment of some function or structure can make a 1% performance difference, which quickly adds up. A related idea is <a href="https://youtu.be/Ca1hHC2EctY">“performance lottery”</a>, which should really be called <em>performance chaos theory</em>, whereby a minor change to the codebase can have performance impacts elsewhere by changing the layout of the compiled code).</p>

<figure>
<p><img src="../../../images/2024/gba-raycasting/benchmarks.png" alt="Benchmarking results are processed by CI and rendered to a webpage" /></p>
</figure>

<h3 id="what-didnt-work">What didn’t work</h3>

<p>Content. Brace for an obvious statement: it is one thing to imagine a grand, Daggerfall-scale game in your head, and a completely different thing to actually start building the pieces that make it up. I am not a graphic artist and I get quickly frustrated playing one. (If you haven’t figured it out by now, I am writing this article as a reminder/deterrent to future me.) I could commission the graphics and sounds, but that gets expensive for a game with no prospect of commercialization. Making maps was more interesting, but I often felt like I was missing the right textures for the environments I wanted to build.</p>

<p>Controls were an issue from the beginning. I should have done more research into how other FPS games solved this. What I settled on was “tank controls” on the D-pad, A/B to strafe and R to shoot.</p>

<p>The screen on the GBA is notoriously bad. I don’t know how we managed to stare at it for hours as kids. The gamma curve is crazy, the colors are kinda washed out, and the contrast is a joke. The right answer here is probably to simply use a GBA SP, but I don’t have any emotional connection to it.</p>

<p>In any case, it always feels great to see your code run on real hardware, but after already being somewhat burned out with assets, it really drove home the latter two points above, providing a final straw to kill the project.</p>

<h3 id="just-show-me-the-code">Just show me the code</h3>

<p>The release is as complete as was possible; unfortunately, I do not have redistribution rights for the third-party assets. I have blacked them out in the published files; hopefully, keeping just the silhouettes can be considered fair use.</p>

<p>And here it is: <a href="https://github.com/mcejp/GBA-raycaster">mcejp/GBA-raycaster</a></p>]]></content><author><name></name></author><summary type="html"><![CDATA[Today I am releasing my raycasting tech demo for the GameBoy Advance. Although it falls short of the goals I set out in the beginning – which included releasing a playable game –, I think there are some lessons worth sharing with the world.]]></summary></entry><entry><title type="html">sys/iosupport.h: No such file or directory</title><link href="https://mcejp.github.io/2024/06/02/iosupport-h.html" rel="alternate" type="text/html" title="sys/iosupport.h: No such file or directory" /><published>2024-06-02T00:00:00+02:00</published><updated>2024-06-02T00:00:00+02:00</updated><id>https://mcejp.github.io/2024/06/02/iosupport-h</id><content type="html" xml:base="https://mcejp.github.io/2024/06/02/iosupport-h.html"><![CDATA[<p>I ran into this issue in the process of moving away from devkitARM to a vanilla ARM GNU Toolchain.</p>

<p>It boils down to a library called <em>libsysbase</em>, which was proposed for merging into Newlib <a href="https://sourceware.org/pipermail/newlib/2006/004569.html">all the way back in 2006</a>, but <a href="https://sourceware.org/pipermail/newlib/2010/008623.html">never made it in</a>, so it remains a <a href="https://github.com/devkitPro/buildscripts/blob/23afd9959c86396054d7856c7e0bd9f137416f53/dkarm-eabi/patches/newlib-4.4.0.20231231.patch">devkitARM addition</a>.</p>

<p>In Newlib, the way to implement one’s own STDOUT handler for functions like <code class="highlighter-rouge">printf</code> is <a href="https://stackoverflow.com/q/55014043">to implement</a> a set of low-level, “syscall-like” functions; the most important of those is <code class="highlighter-rouge">_write</code>. On the other hand, <em>libsysbase</em> already implements these, providing a layer of abstraction on top, having the concept of <em>devices</em> for dealing with the file system.</p>

<p>If you only care about STDOUT, it’s quite easy to migrate from <em>libsysbase</em> to vanilla Newlib. Instead of declaring <code class="highlighter-rouge">devoptab_t</code> entries, implement the <code class="highlighter-rouge">_write</code> function directly.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[I ran into this issue in the process of moving away from devkitARM to a vanilla ARM GNU Toolchain.]]></summary></entry><entry><title type="html">Automated benchmarking in GameBoy Advance homebrew</title><link href="https://mcejp.github.io/2023/10/30/gba-benchmarking.html" rel="alternate" type="text/html" title="Automated benchmarking in GameBoy Advance homebrew" /><published>2023-10-30T00:00:00+01:00</published><updated>2023-10-30T00:00:00+01:00</updated><id>https://mcejp.github.io/2023/10/30/gba-benchmarking</id><content type="html" xml:base="https://mcejp.github.io/2023/10/30/gba-benchmarking.html"><![CDATA[<p>The GBA is a delightful platform to develop for. It is straightforward enough to understand thoroughly
– a single 32-bit CPU, no OS, no built-in wireless features –
but also sufficiently advanced to allow an ergonomic workflow based on modern languages and tools
like C++20 and CMake (or Rust).
Still, it is not particularly powerful in terms of raw computation and when writing rendering
code, even a simple ray-caster, some way of benchmarking the performance is essentially a must.</p>

<figure>
<p><img src="../../../images/2023/gba-benchmarking/raycaster.png" alt="Screenshot of an untextured raycaster" /></p>
</figure>

<p>It is, of course, possible to add FPS (or better, cycle) counters on the screen and
check them after every code change, but we seek a more rigorous approach.
It should be completely automatic, so that it is trivial to execute both locally and in a CI pipeline.</p>

<p>Here is the rough idea:</p>

<ol>
  <li>Implement benchmarking in the game, using hardware resources built into the GBA.
In my project I am using the TONC library which provides
<a href="https://www.coranac.com/man/tonclib/group__grpTimer.htm">rudimentary cycle counting</a> using GBA Timers 2 &amp; 3 in a cascade mode.</li>
  <li>(optional) Record a sequence of inputs to replay during the benchmark, to move around 
the scene and get more statistically relevant results.</li>
  <li>Execute a GBA emulator in headless mode (no GUI) and have it output the measured
performance metrics.</li>
  <li>Capture this output for further processing.</li>
</ol>

<p>This leaves me with 2 problems to solve: find a cycle-accurate emulator that can
run headless and somehow exfiltrate the measurements from the emulated console.</p>

<p>It turns out that the GBA’s <em>SIO</em> peripheral supports a <a href="https://problemkaputt.de/gbatek.htm#siouartmode">UART mode</a>,
which in essence means very simple character output.
Perfect, now just to find an emulator that can forward the UART to the host operating
system.</p>

<h3 id="mgba">mGBA</h3>

<p>As of writing, <a href="https://mgba.io">mGBA</a> is the only GBA emulator “confidently recommended” by the
<a href="https://emulation.gametechwiki.com/index.php/Game_Boy_Advance_emulators">Emulation General Wiki</a>.
Not to imply that this is somehow the authoritative source on all things emulation,
but I have gotten good advice from it in the past.
mGBA is being actively developed and its <a href="https://mgba.io/2014/12/28/classic-nes/">technical blog</a> provides some
excellent reading.
Does it fit the bill, though?</p>

<p>mGBA implements <a href="https://mgba.io/docs/scripting.html">Lua scripting support</a> that makes it possible to introspect
the emulated system rather deeply.
Unfortunately, it <a href="https://github.com/mgba-emu/mgba/issues/3053">does not implement</a> UART mode.
However, there is an alternative – albeit non-standard – way to get stuff out of the
emulator.
This mechanism consists of several I/O registers in the 0x04FFxxxx address space of the
emulated system.
Through these registers, the ROM has access to the emulator’s own logging facilities.</p>

<p>mGBA comes with <a href="https://github.com/mgba-emu/mgba/blob/0.10.2/opt/libgba/mgba.c">example code for using these</a>, assuming a libgba runtime.
As mentioned earlier, my project is based on TONC;
fortunately, porting the example code is trivial – in mgba.c,
<code class="highlighter-rouge">#include &lt;gba_types.h&gt;</code> just needs to be changed to <code class="highlighter-rouge">#include &lt;tonc_types.h&gt;</code>.</p>

<p>Next, I was looking for a way to run mGBA headless and for a limited duration.
At first it looked like some level of source code hackery would be necessary,
but then I discovered one of the built-in test utilities, <em>mgba-rom-test</em>.
It can execute a ROM, without any user interface, and while it cannot be told to quit
after a fixed interval of time (like <em>mgba-perf</em> can),
it can exit once the game calls an SWI specified by the user.</p>

<p>Calls a <em>what</em>? SWI (software interrupt) instructions are normally used to invoke
<a href="https://problemkaputt.de/gbatek.htm#biosfunctions">functions built into the GBA BIOS</a>;
we can therefore either re-purpose a function that would never be used during the benchmark,
or find an unallocated SWI number to claim for our purposes.
In absence of convincing reasons for either option, I went with the former, appropriating
the <code class="highlighter-rouge">Stop</code> call (<code class="highlighter-rouge">swi 0x03</code>).</p>

<h3 id="putting-it-together">Putting it together</h3>

<p>A minimal, but complete example then looks like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;tonc.h&gt;</span><span class="cp">
</span>
<span class="cp">#include</span> <span class="cpf">"mgba/mgba.h"</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">i</span><span class="p">;</span>

    <span class="n">mgba_open</span><span class="p">();</span>

    <span class="n">profile_start</span><span class="p">();</span>

    <span class="c1">// waste some time</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">Div</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">uint</span> <span class="n">duration</span> <span class="o">=</span> <span class="n">profile_stop</span><span class="p">();</span>
    <span class="n">mgba_printf</span><span class="p">(</span><span class="n">MGBA_LOG_INFO</span><span class="p">,</span> <span class="s">"BENCHMARK: %d cycles"</span><span class="p">,</span> <span class="n">duration</span><span class="p">);</span>

    <span class="n">Stop</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <em>mgba-rom-test</em> executable is not distributed with mGBA releases; in fact, it is not even
compiled by default. We will need to build mGBA from source with some custom flags.
Since we don’t care about GUI or fancy features, we can use additional options to minimize
the build time and dependencies.
The following configuration worked well for me:</p>

<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cmake -DBUILD_QT=OFF \
        -DBUILD_ROM_TEST=ON \             &lt;-- secret sauce
        -DBUILD_SDL=OFF \
        -DUSE_EDITLINE=OFF \
        -DUSE_ELF=OFF \
        -DUSE_EPOXY=OFF \
        -DUSE_FFMPEG=OFF \
        -DUSE_LIBZIP=OFF \
        -DUSE_MINIZIP=OFF \
        -DUSE_PNG=OFF \
        -DUSE_SQLITE3=OFF \
        -DUSE_ZLIB=OFF \
        -G Ninja ..

$ ninja mgba-rom-test
</code></pre></div></div>

<p>If everything goes well, we run it like this:</p>

<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./test/mgba-rom-test -S 0x03 --log-level 15 /path/to/ROM.gba
GBA Debug: BENCHMARK: 1751 cycles
</code></pre></div></div>

<p><code class="highlighter-rouge">--log-level</code> is a bit field whose documentation leaves it rather mysterious, but it seems
to correspond to <a href="https://github.com/mgba-emu/mgba/blob/0.10.2/include/mgba/core/log.h#L15"><code class="highlighter-rouge">enum mLogLevel</code></a> in the code base.
A value of 15, or 0x0F, then corresponds to all levels from FATAL down to INFO, but excluding
DEBUG and lower.
Without the flag, mGBA’s implementation of the GBA BIOS emits a message every time a BIOS
function is used, which can be annoying.</p>
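
<p>To make the bit-field arithmetic concrete, here is a small sketch in plain C. The enumerator values reflect my reading of the header linked above; the <code class="highlighter-rouge">LOG_*</code> names and <code class="highlighter-rouge">log_mask_down_to_info</code> are made up for this illustration, so check them against the mGBA version you are building.</p>

```c
/* Values as I understand them from enum mLogLevel in mGBA 0.10.2
 * (include/mgba/core/log.h); the names here are shortened for the example. */
enum log_level {
    LOG_FATAL      = 0x01,
    LOG_ERROR      = 0x02,
    LOG_WARN       = 0x04,
    LOG_INFO       = 0x08,
    LOG_DEBUG      = 0x10,
    LOG_STUB       = 0x20,
    LOG_GAME_ERROR = 0x40
};

/* Hypothetical helper: the mask that enables FATAL, ERROR, WARN and INFO,
 * i.e. the value passed to --log-level in the command above. */
static int log_mask_down_to_info(void)
{
    return LOG_FATAL | LOG_ERROR | LOG_WARN | LOG_INFO;
}
```

<p>With these values the helper returns 15 (0x0F), and the DEBUG bit (0x10) stays cleared.</p>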

<h3 id="multiple-scenarios">Multiple scenarios</h3>

<p>At the beginning I alluded to recording inputs for later playback.
It doesn’t seem to be implemented as a native feature of mGBA at this time, so let’s come
back to this topic in the future.</p>

<p>Let’s start with a presumably easier problem, which is how to select one out of multiple
test scenarios to execute.
There exists a trivial solution which is to use compile-time flags and build a number of
different ROMs, one per scenario.
That feels rather wasteful, and a potential nightmare to manage as the number of test cases
goes up.
Can we, instead, bake everything into a single ROM and make the choice at runtime?</p>

<p>Having previously solved the problem of <em>extracting</em> data from mGBA, let’s now look at the ways
to <em>inject</em> data at the start.
A cursory glance reveals a number of entry vectors:</p>

<ul>
  <li>the ROM file itself</li>
  <li>the built-in command-line debugger (<code class="highlighter-rouge">-d</code>)</li>
  <li>the built-in GDB server (<code class="highlighter-rouge">-g</code>)</li>
  <li>IPS patches (<code class="highlighter-rouge">-p</code>)</li>
  <li>Lua scripts</li>
  <li>save states (<code class="highlighter-rouge">-t</code>)</li>
  <li>cheat codes (<code class="highlighter-rouge">-c</code>)</li>
</ul>

<p>Some of these are out – for example, the scripting engine is not accessible in <code class="highlighter-rouge">mgba-rom-test</code>.
Patching a fixed address inside the ROM is always an option, but let’s look for something more
elegant.
Save states are very powerful, but they use a proprietary binary format, which might even
change between versions of the emulator.</p>

<p>What about cheat codes? They operate on the principle of hooking the game code and allowing
pretty much arbitrary memory modifications. That sounds interesting, to say the least!</p>

<figure>
<p><img src="../../../images/2023/gba-benchmarking/gameshark.jpg" alt="Photo" /></p>
  <figcaption>GameShark for Game Boy Advance</figcaption>
</figure>

<p>Of course, things cannot be <em>too</em> simple. The cheat devices for the GBA are a mess, with the
most famous ones (Action Replay and GameShark Advance) <a href="https://wunkolo.tumblr.com/post/144418662792">encrypting their codes</a>.
While the encryption was broken open years ago, it would be preferable to avoid such
a complication altogether.
There is <a href="https://doc.kodewerx.org/hacking_gba.html#cba">Codebreaker, where encryption is optional</a>.
Finally, mGBA supports a “VBA” cheat file format (presumably pioneered by VisualBoyAdvance),
which is the most straightforward of all: it’s just a list of address-value pairs.</p>

<p>In order to do their dirty work, the classic cheat devices work by hooking the game code
and hijacking a suitable branch instruction,
whose address is encoded as part of the cheat “master code”.
With VBA cheats, this is not necessary; the memory modifications are applied at the end of each
emulated frame.
This has the downside that the game has to wait for one frame to pass before checking the memory
location (~5 extra lines of code including clean-up).
It still seems like a better trade-off than having to look for a Thumb branch instruction that
is on the main code path and guaranteed to be stable across builds.</p>

<p>Having sorted out the mechanism, we still need a place to put our magic cookie.
For a proof of concept, let’s just put it at the very end of EWRAM, which spans the 256 KiB from
0x0200 0000 to 0x0203 FFFF.
The linker script should ideally be adjusted to make sure that the compiler will not interfere
with our chosen special location.</p>
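
<p>As a sanity check of the address arithmetic, here is a minimal sketch. The macro and function names are hypothetical and not part of TONC; the only assumptions are the EWRAM base and size quoted above and a 16-bit cookie.</p>

```c
#include <stdint.h>

#define EWRAM_BASE 0x02000000u      /* first byte of EWRAM */
#define EWRAM_SIZE (256u * 1024u)   /* 256 KiB */

/* Address of the last 16-bit halfword in EWRAM, where the cookie lives. */
static uint32_t cookie_address(void)
{
    return EWRAM_BASE + EWRAM_SIZE - (uint32_t) sizeof(uint16_t);
}
```

<p>The result is 0x0203FFFE, which is halfword-aligned and therefore safe to read as a <code class="highlighter-rouge">u16</code>.</p>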

<p>The cheat file itself then boils down to a single line:</p>

<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0203FFFE:beef
</code></pre></div></div>
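
<p>If the scenario is picked by a script, the cheat file can just as well be generated on the fly. The following host-side sketch assumes that the address:value layout shown above is all that is needed; <code class="highlighter-rouge">format_cheat_line</code> is a hypothetical helper, not something provided by mGBA.</p>

```c
#include <stdint.h>
#include <stdio.h>

/* Format one cheat line as 8 hex digits of address, a colon, and
 * 4 hex digits of value, followed by a newline.
 * Returns the number of characters snprintf wanted to write. */
static int format_cheat_line(char* buf, size_t size, uint32_t address, uint16_t value)
{
    return snprintf(buf, size, "%08X:%04x\n", (unsigned) address, (unsigned) value);
}
```

<p>Calling it with the address 0x0203FFFE and the value 0xBEEF reproduces the line from the cheat file above.</p>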

<p>And reading the value from inside the ROM is straightforward, too:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// wait for 1 frame to pass</span>
<span class="n">irq_init</span><span class="p">(</span><span class="nb">NULL</span><span class="p">);</span>
<span class="n">irq_enable</span><span class="p">(</span><span class="n">II_VBLANK</span><span class="p">);</span>
<span class="n">VBlankIntrWait</span><span class="p">();</span>

<span class="n">mgba_printf</span><span class="p">(</span><span class="n">MGBA_LOG_INFO</span><span class="p">,</span> <span class="s">"Requested test case: %04Xh"</span><span class="p">,</span> <span class="o">*</span><span class="p">(</span><span class="n">u16</span><span class="o">*</span><span class="p">)</span><span class="mh">0x0203FFFE</span><span class="p">);</span>
</code></pre></div></div>
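
<p>Once the cookie has been read, picking the scenario can be a simple table lookup. This is only a sketch of one possible approach: the scenario functions, their IDs and <code class="highlighter-rouge">find_scenario</code> are all invented for the example, and the real benchmarks would take their place.</p>

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*scenario_fn)(void);

struct scenario {
    uint16_t    id;   /* value expected in the magic cookie */
    scenario_fn run;  /* benchmark to execute */
};

/* Placeholder scenarios; real ones would run the code being measured. */
static int scenario_division(void) { return 1; }
static int scenario_render(void)   { return 2; }

static const struct scenario scenarios[] = {
    { 0x0001, scenario_division },
    { 0xBEEF, scenario_render   },
};

/* Return the scenario matching the cookie, or NULL if there is none. */
static scenario_fn find_scenario(uint16_t cookie)
{
    for (size_t i = 0; i < sizeof scenarios / sizeof scenarios[0]; i++) {
        if (scenarios[i].id == cookie) {
            return scenarios[i].run;
        }
    }
    return NULL;
}
```

<p>Returning <code class="highlighter-rouge">NULL</code> for an unknown cookie lets the ROM log an error and stop, instead of silently benchmarking the wrong thing.</p>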

<p>Let’s see it in action:</p>

<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./test/mgba-rom-test -S 0x03 --cheats cheatfile.txt --log-level 15 /path/to/ROM.gba
GBA Debug: Requested test case: BEEFh
GBA Debug: BENCHMARK: 1751 cycles
</code></pre></div></div>

<h3 id="gitlab-ci">GitLab CI</h3>

<figure>
<p><img src="../../../images/2023/gba-benchmarking/pipeline.png" alt="CI pipeline with 'build' and 'test' steps" /></p>
</figure>

<p>We can now take it one step further, and have the benchmark run automatically in a CI pipeline.
It makes sense to separate the ROM build step from the benchmarking step,
since the former needs the GBA toolchain to build, while the latter builds mGBA for the host
platform (or whatever container image we run it in),
before executing the test proper.
The complete example is a bit too long to reproduce here in full, but you can 
<a href="https://gitlab.com/mcejp/tonc-mgba-ci-demo/-/blob/master/.gitlab-ci.yml">check it out here</a>.</p>

<p>In <a href="/2022/06/20/builds.html">a previous post</a>, I showed how you could accumulate
these results in a database and track long-term trends, generate fancy badges and so on.
This time, it is left as an exercise to the reader :)</p>]]></content><author><name></name></author><summary type="html"><![CDATA[The GBA is a delightful platform to develop for. It is straightforward enough to understand thoroughly – a single 32-bit CPU, no OS, no built-in wireless features – but also sufficiently advanced to allow an ergonomic workflow based on modern languages and tools like C++20 and CMake (or Rust). Still, it is not particularly powerful in terms of raw computation and when writing rendering code, even a simple ray-caster, some way of benchmarking the performance is essentially a must.]]></summary></entry><entry><title type="html">Generating core dumps for bare-metal AArch64 programs</title><link href="https://mcejp.github.io/2023/07/30/aarch64-coredump.html" rel="alternate" type="text/html" title="Generating core dumps for bare-metal AArch64 programs" /><published>2023-07-30T00:00:00+02:00</published><updated>2023-07-30T00:00:00+02:00</updated><id>https://mcejp.github.io/2023/07/30/aarch64-coredump</id><content type="html" xml:base="https://mcejp.github.io/2023/07/30/aarch64-coredump.html"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>Bare-metal 64-bit ARM programming is a strange niche: small, power-efficient microcontrollers usually implement the – considerably simpler – 32-bit version of the architecture. And on larger chips, one would typically run their application under a full-blown OS, namely Linux.
Yet, there are cases where one needs the raw performance of an advanced 64-bit CPU, but a standard OS, despite all efforts to tune it, would bring in too much timing uncertainty for real-time process control.</p>

<p>Welcome to CERN, where standard approaches don’t quite cut it, and <em>state of the art</em> is yesterday’s news. The FGC4, a new digital controller in development by the <a href="https://sy-dep-epc.web.cern.ch/">Electrical Power Converters</a> group, has <a href="https://www.ipac23.org/preproc/pdf/WEPM080.pdf">exactly these kinds of requirements</a>.</p>

<p>Debugging these kinds of systems can be… <em>interesting</em>. There is a whole spectrum of mechanisms that one might have at their disposal, depending on choices made in the system design – from the simplicity of the blinking LED, the versatility of a serial output, to the sophistication of a JTAG adapter.
But once the device is out in the field, you cannot count on being present when an error occurs. The best you can do is to log it in the maximum detail possible, and attempt to understand the issue later.
Fortunately, there is a standard mechanism for that – the core dump. Unfortunately, it is not readily available in environments like the one described in this post.</p>

<h2 id="what-is-a-core-dump">What is a core dump?</h2>

<p>A core dump is a snapshot of the state of a process, usually at the time it crashed.
The idea is that you can take this snapshot, load it up in a debugger, and inspect the process as if the error had occurred just moments ago.
The most precious piece of knowledge to recover is perhaps the stack trace; however, the ability to inspect the program’s variables can certainly be useful as well.</p>

<p>Unfortunately, while a core dump is the de-facto standard way of capturing process state, its format is actually dependent on the host operating system (as is the case for executables).
But on bare metal, there is no OS. No OS, no core dump – right?
Well, consider this: what if we could <em>pretend</em> to be running an OS, and synthesize the core dump file accordingly?
Then we could use all the standard tools to analyze it and extract useful information.</p>

<h3 id="anatomy-of-a-core-dump">Anatomy of a core dump</h3>

<p>Interestingly, on Linux the same ELF format used for executables and libraries is used as a container for the core dump.
They even use the same <a href="http://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html">program headers</a> to describe memory segments, although things begin to diverge beyond that.
Before diving deeper into the specifics, let’s take a look at what such a dump contains:</p>

<ol>
  <li>A data structure giving basic information about the process, such as its PID or argument list</li>
  <li>A snapshot of the process’ memory, including code, global variables, thread stacks and the heap</li>
  <li>For each thread:
    <ul>
      <li>A structure giving basic information</li>
      <li>A copy of the CPU state, including general-purpose and floating-point registers</li>
    </ul>
  </li>
</ol>

<p>For better illustration, let’s draw a comparison between a core dump and a standard executable. An ELF executable usually includes these sections:</p>

<ul>
  <li><code class="highlighter-rouge">.text</code> (executable code)</li>
  <li><code class="highlighter-rouge">.rodata</code> (read-only data)</li>
  <li><code class="highlighter-rouge">.data</code> (initialized data)</li>
  <li><code class="highlighter-rouge">.bss</code> (zero-initialized data); since the contents of this section are known to be all zeroes, it is not necessary to physically include them in the file</li>
</ul>

<p>When you use GDB to save a core dump, it will contain a copy of all of these, plus the program heap and stacks of all threads.</p>

<figure>
<p><img src="../../../images/2023/aarch64-coredump/executable-vs-coredump.png" alt="Screenshot" /></p>
  <figcaption>Comparison of the memory sections between executables and core dumps – both in ELF format. Note that each file contains additional sections <em>not</em> corresponding to program memory</figcaption>
</figure>

<p>This has been a simplification, and to correctly synthesize a core dump, we have to be a bit more precise:
The core dump will indeed contain a snapshot of memory corresponding to the sections mentioned above,
however, this snapshot is described by one or more <code class="highlighter-rouge">PT_LOAD</code> segments rather than sections; names and other attributes of sections are therefore lost.
This is not a problem, because we can extract section information from the original executable file.</p>

<h4 id="process--thread-information">Process &amp; thread information</h4>

<p>An ELF-encoded core dump also contains a <code class="highlighter-rouge">PT_NOTE</code> segment providing some general information about the process and its threads.</p>

<figure>
<p><img src="../../../images/2023/aarch64-coredump/pt_note.png" alt="Screenshot" /></p>
  <figcaption>Structure of the <code class="highlighter-rouge">PT_NOTE</code> segment</figcaption>
</figure>
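
<p>Each entry inside a note segment consists of a fixed 12-byte header, followed by a NUL-terminated name and a descriptor (the payload), both padded to a multiple of 4 bytes. The sketch below computes the on-disk size of one entry. It defines its own <code class="highlighter-rouge">note_header</code> struct mirroring the three 32-bit fields of <code class="highlighter-rouge">Elf64_Nhdr</code> so that it compiles without <code class="highlighter-rouge">elf.h</code>; the helper names are hypothetical, and the name <code class="highlighter-rouge">"CORE"</code> is the one Linux conventionally uses for these notes.</p>

```c
#include <stdint.h>
#include <string.h>

/* Same layout as Elf64_Nhdr: three 32-bit words. */
struct note_header {
    uint32_t n_namesz;  /* length of the name, including the terminating NUL */
    uint32_t n_descsz;  /* length of the descriptor (payload) */
    uint32_t n_type;    /* note type */
};

/* Round n up to the next multiple of 4. */
static uint32_t pad4(uint32_t n)
{
    return (n + 3u) & ~3u;
}

/* Total size of one note entry as stored in the file. */
static uint32_t note_size(const char* name, uint32_t descsz)
{
    uint32_t namesz = (uint32_t) strlen(name) + 1;
    return (uint32_t) sizeof(struct note_header) + pad4(namesz) + pad4(descsz);
}
```

<p>For the name <code class="highlighter-rouge">"CORE"</code>, the five bytes including the terminator are padded to 8, so an entry takes 12 + 8 bytes plus the padded payload.</p>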

<h3 id="inspecting-a-core-dump">Inspecting a core dump</h3>

<p>To extract useful information from a core dump, the original ELF file of the program is also required. This is because the core dump does not contain information about symbols, let alone mappings from compiled code to source locations.
With a core file in hand, we can execute</p>

<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gdb program.elf core
</code></pre></div></div>

<p>This will start GDB, load the program ELF file and combine it with the information found in the core file.
This article is not meant as a GDB tutorial, but in case you need a rundown, <a href="https://www.cse.unsw.edu.au/~learn/debugging/modules/gdb_coredumps/">this one</a> is quite nice.
What matters to us here is the observation that information contained in the core dump is “overlaid” on top of the original executable.</p>

<h2 id="writing-our-own">Writing our own</h2>

<h3 id="impedance-mismatch">Impedance mismatch</h3>

<p>There are some discrepancies between the Linux model and a bare-metal application.
For example, there is no concept of Process ID or a command line. We also assume a single thread of execution.
That might not always be the case; one might, for example, want to capture the state of a multi-threaded FreeRTOS application.</p>

<p>Another difference is that of address spaces: Linux processes always execute in virtual memory, while a bare-metal program would use physical memory directly.
In practice, it’s not really a problem. All that matters is that the addresses agree between the ELF of the program and the dumped image. As long as we do not move the program around when loading it, this will indeed be the case.</p>

<h3 id="collecting-information">Collecting information</h3>

<p>A core dump would usually be emitted in response to a program crash. Under an operating system with memory protection, it is well-defined what a program is allowed to do. On bare metal, the possibilities are much wider and there are footguns aplenty. Without going too much into detail, one symptom of a critical problem on AArch64 can be a <a href="https://developer.arm.com/documentation/den0024/a/AArch64-Exception-Handling/Synchronous-and-asynchronous-exceptions">Synchronous exception</a>. If the goal is to produce a dump, it is important to save <em>all</em> general-purpose registers (GPRs) on exception entry instead of just caller-saved ones, as is often done before calling an exception handler written in C/C++.
You can see an example <a href="https://gist.github.com/mcejp/56c26608cf90e2120cdf887ea961b851">here</a>, but it will probably require adapting to your specific application.</p>

<p>Besides the GPRs, we also need to gather the contents of floating-point registers, and finally, we need to know which region(s) of memory are relevant to the program.</p>

<h3 id="writing-the-core-file">Writing the core file</h3>

<p>With all the inputs on hand, we can move on to assembling the actual file.
ELF is not the simplest format to write manually (without the use of any libraries), but in this case the structure will be simple enough. To begin, we need a copy of <a href="https://github.com/torvalds/linux/blob/master/include/uapi/linux/elf.h">elf.h</a> (careful about the license, though) to provide the structures and constants.
A slight complication here lies in the fact that we have to precompute all the offsets and sizes. Let’s begin by visualizing the physical layout of the file we are going to write:</p>

<figure>
<p><img src="../../../images/2023/aarch64-coredump/physical-layout.png" alt="Screenshot" /></p>
  <figcaption>Arrangement of data structures comprising the core file</figcaption>
</figure>

<p>First comes the ELF file header. Nothing much surprising here. Note the file type of <code class="highlighter-rouge">ET_CORE</code>.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">FILE</span><span class="o">*</span> <span class="n">elf</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="s">"core"</span><span class="p">,</span> <span class="s">"wb"</span><span class="p">);</span>

<span class="c1">// ELF header</span>
<span class="n">Elf64_Ehdr</span> <span class="n">ehdr</span> <span class="p">{};</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_ident</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>   <span class="o">=</span> <span class="n">ELFMAG0</span><span class="p">;</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_ident</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>   <span class="o">=</span> <span class="n">ELFMAG1</span><span class="p">;</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_ident</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>   <span class="o">=</span> <span class="n">ELFMAG2</span><span class="p">;</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_ident</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span>   <span class="o">=</span> <span class="n">ELFMAG3</span><span class="p">;</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_ident</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span>   <span class="o">=</span> <span class="n">ELFCLASS64</span><span class="p">;</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_ident</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span>   <span class="o">=</span> <span class="n">ELFDATA2LSB</span><span class="p">;</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_ident</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span>   <span class="o">=</span> <span class="n">EV_CURRENT</span><span class="p">;</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_type</span>       <span class="o">=</span> <span class="n">ET_CORE</span><span class="p">;</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_machine</span>    <span class="o">=</span> <span class="n">EM_AARCH64</span><span class="p">;</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_version</span>    <span class="o">=</span> <span class="n">EV_CURRENT</span><span class="p">;</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_phoff</span>      <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ehdr</span><span class="p">);</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_ehsize</span>     <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ehdr</span><span class="p">);</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_phentsize</span>  <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">Elf64_Phdr</span><span class="p">);</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_phnum</span>      <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">ehdr</span><span class="p">.</span><span class="n">e_shentsize</span>  <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">Elf64_Shdr</span><span class="p">);</span>
<span class="n">fwrite</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ehdr</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ehdr</span><span class="p">),</span> <span class="n">elf</span><span class="p">);</span>
</code></pre></div></div>

<p>Next, we will need to write two program headers: one for the memory snapshot (it can, in fact, come in multiple segments, but we’re keeping things simple) and one for the <code class="highlighter-rouge">PT_NOTE</code> segment described previously. Some complexity comes from the computation of the segment size and the need to align the snapshot to page size.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Elf64_Phdr</span> <span class="n">phdr</span> <span class="p">{};</span>

<span class="c1">// NOTE segment</span>
<span class="n">phdr</span><span class="p">.</span><span class="n">p_type</span>     <span class="o">=</span> <span class="n">PT_NOTE</span><span class="p">;</span>
<span class="n">phdr</span><span class="p">.</span><span class="n">p_offset</span>   <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">Elf64_Ehdr</span><span class="p">)</span> <span class="o">+</span> <span class="n">ehdr</span><span class="p">.</span><span class="n">e_phnum</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">phdr</span><span class="p">);</span>
<span class="n">phdr</span><span class="p">.</span><span class="n">p_filesz</span>   <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">Elf64_Nhdr</span><span class="p">)</span> <span class="o">+</span> <span class="mi">8</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">elf_prpsinfo</span><span class="p">)</span> <span class="o">+</span>
                  <span class="k">sizeof</span><span class="p">(</span><span class="n">Elf64_Nhdr</span><span class="p">)</span> <span class="o">+</span> <span class="mi">8</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">elf_prstatus</span><span class="p">)</span> <span class="o">+</span>
                  <span class="k">sizeof</span><span class="p">(</span><span class="n">Elf64_Nhdr</span><span class="p">)</span> <span class="o">+</span> <span class="mi">8</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">elf_fpregset_t</span><span class="p">);</span>
<span class="n">fwrite</span><span class="p">(</span><span class="o">&amp;</span><span class="n">phdr</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">phdr</span><span class="p">),</span> <span class="n">elf</span><span class="p">);</span>

<span class="c1">// LOAD segment (memory image)</span>
<span class="c1">// First, compute alignment after previous segment</span>
<span class="n">phdr</span><span class="p">.</span><span class="n">p_align</span>    <span class="o">=</span> <span class="mi">4096</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">note_align</span> <span class="o">=</span> <span class="n">phdr</span><span class="p">.</span><span class="n">p_align</span> <span class="o">-</span> <span class="p">((</span><span class="n">phdr</span><span class="p">.</span><span class="n">p_offset</span> <span class="o">+</span> <span class="n">phdr</span><span class="p">.</span><span class="n">p_filesz</span><span class="p">)</span> <span class="o">%</span> <span class="n">phdr</span><span class="p">.</span><span class="n">p_align</span><span class="p">);</span>

<span class="k">if</span> <span class="p">(</span><span class="n">note_align</span> <span class="o">==</span> <span class="n">phdr</span><span class="p">.</span><span class="n">p_align</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">note_align</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>

<span class="n">phdr</span><span class="p">.</span><span class="n">p_type</span>     <span class="o">=</span> <span class="n">PT_LOAD</span><span class="p">;</span>
<span class="n">phdr</span><span class="p">.</span><span class="n">p_flags</span>    <span class="o">=</span> <span class="n">PF_R</span> <span class="o">|</span> <span class="n">PF_X</span> <span class="o">|</span> <span class="n">PF_W</span><span class="p">;</span>
<span class="n">phdr</span><span class="p">.</span><span class="n">p_offset</span>  <span class="o">+=</span> <span class="n">phdr</span><span class="p">.</span><span class="n">p_filesz</span> <span class="o">+</span> <span class="n">note_align</span><span class="p">;</span>
<span class="n">phdr</span><span class="p">.</span><span class="n">p_vaddr</span>    <span class="o">=</span> <span class="n">MEMORY_SNAPSHOT_ADDR</span><span class="p">;</span>
<span class="n">phdr</span><span class="p">.</span><span class="n">p_paddr</span>    <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">phdr</span><span class="p">.</span><span class="n">p_filesz</span>   <span class="o">=</span> <span class="n">MEMORY_SNAPSHOT_SIZE</span><span class="p">;</span>
<span class="n">phdr</span><span class="p">.</span><span class="n">p_memsz</span>    <span class="o">=</span> <span class="n">MEMORY_SNAPSHOT_SIZE</span><span class="p">;</span>
<span class="n">fwrite</span><span class="p">(</span><span class="o">&amp;</span><span class="n">phdr</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">phdr</span><span class="p">),</span> <span class="n">elf</span><span class="p">);</span>
</code></pre></div></div>
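
<p>The <code class="highlighter-rouge">note_align</code> computation above can also be written without the special case by taking the modulo twice. This is merely an equivalent formulation, shown in plain C with hypothetical helper names:</p>

```c
#include <stdint.h>

/* Number of padding bytes needed to bring offset up to a multiple of align.
 * The outer modulo maps the "already aligned" case to 0 instead of align. */
static uint64_t padding_to_align(uint64_t offset, uint64_t align)
{
    return (align - offset % align) % align;
}

/* Smallest multiple of align that is greater than or equal to offset. */
static uint64_t align_up(uint64_t offset, uint64_t align)
{
    return offset + padding_to_align(offset, align);
}
```

<p>For example, with 4096-byte pages an offset of 4096 needs no padding, while an offset of 4097 is moved up to 8192.</p>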

<p>We don’t need to write any sections, so after the program headers we immediately proceed with the note segment. The alignment/padding convention here justifies writing a couple of helper functions first:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="kt">size_t</span> <span class="n">alignment</span><span class="p">&gt;</span>
<span class="k">static</span> <span class="k">auto</span> <span class="nf">make_padding_span</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">length</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">byte</span> <span class="n">zeros</span><span class="p">[</span><span class="n">alignment</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="p">{};</span>
    <span class="k">auto</span> <span class="n">padding_needed</span> <span class="o">=</span> <span class="p">(</span><span class="n">length</span> <span class="o">%</span> <span class="n">alignment</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">?</span> <span class="mi">0</span> <span class="o">:</span> <span class="p">(</span><span class="n">alignment</span> <span class="o">-</span> <span class="n">length</span> <span class="o">%</span> <span class="n">alignment</span><span class="p">);</span>

    <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">span</span><span class="p">{</span><span class="n">zeros</span><span class="p">,</span> <span class="n">padding_needed</span><span class="p">};</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">bool</span> <span class="n">write_note</span><span class="p">(</span><span class="kt">FILE</span><span class="o">*</span> <span class="n">f</span><span class="p">,</span>
                       <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">name</span><span class="p">,</span>
                       <span class="n">Elf64_Word</span> <span class="n">type</span><span class="p">,</span>
                       <span class="n">std</span><span class="o">::</span><span class="n">span</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">byte</span> <span class="k">const</span><span class="o">&gt;</span> <span class="n">desc</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">auto</span> <span class="n">terminated_name_len</span> <span class="o">=</span> <span class="n">strlen</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
    <span class="k">auto</span> <span class="n">nhdr</span> <span class="o">=</span> <span class="n">Elf64_Nhdr</span> <span class="p">{</span> <span class="p">.</span><span class="n">n_namesz</span> <span class="o">=</span> <span class="p">(</span><span class="n">Elf64_Word</span><span class="p">)</span> <span class="n">terminated_name_len</span><span class="p">,</span>
                             <span class="p">.</span><span class="n">n_descsz</span> <span class="o">=</span> <span class="p">(</span><span class="n">Elf64_Word</span><span class="p">)</span> <span class="n">desc</span><span class="p">.</span><span class="n">size</span><span class="p">(),</span>
                             <span class="p">.</span><span class="n">n_type</span> <span class="o">=</span> <span class="n">type</span> <span class="p">};</span>

    <span class="k">auto</span> <span class="n">name_padding</span> <span class="o">=</span> <span class="n">make_padding_span</span><span class="o">&lt;</span><span class="mi">4</span><span class="o">&gt;</span><span class="p">(</span><span class="n">terminated_name_len</span><span class="p">);</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">fwrite</span><span class="p">(</span><span class="o">&amp;</span><span class="n">nhdr</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">nhdr</span><span class="p">),</span> <span class="n">f</span><span class="p">)</span> <span class="o">!=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">nhdr</span><span class="p">)</span>
            <span class="o">||</span> <span class="n">fwrite</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">terminated_name_len</span><span class="p">,</span> <span class="n">f</span><span class="p">)</span> <span class="o">!=</span> <span class="n">terminated_name_len</span>
            <span class="o">||</span> <span class="n">fwrite</span><span class="p">(</span><span class="n">name_padding</span><span class="p">.</span><span class="n">data</span><span class="p">(),</span> <span class="mi">1</span><span class="p">,</span> <span class="n">name_padding</span><span class="p">.</span><span class="n">size</span><span class="p">(),</span> <span class="n">f</span><span class="p">)</span> <span class="o">!=</span> <span class="n">name_padding</span><span class="p">.</span><span class="n">size</span><span class="p">()</span>
            <span class="o">||</span> <span class="n">fwrite</span><span class="p">(</span><span class="n">desc</span><span class="p">.</span><span class="n">data</span><span class="p">(),</span> <span class="mi">1</span><span class="p">,</span> <span class="n">desc</span><span class="p">.</span><span class="n">size</span><span class="p">(),</span> <span class="n">f</span><span class="p">)</span> <span class="o">!=</span> <span class="n">desc</span><span class="p">.</span><span class="n">size</span><span class="p">())</span>
    <span class="p">{</span>
        <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
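
<p>For reference, the same note layout can be sketched in a few lines of Python. The names <code class="highlighter-rouge">pad4</code> and <code class="highlighter-rouge">build_note</code> are made up for illustration and little-endian byte order is assumed. Unlike the C++ helper above, this sketch also pads the descriptor to a multiple of 4 bytes; the C++ version skips that step, which should be harmless as long as every descriptor size is already a multiple of 4.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def pad4(length):
    """Number of zero bytes needed to round length up to a multiple of 4."""
    return -length % 4


def build_note(name, note_type, desc):
    """Serialize one ELF note: header, padded name, padded descriptor."""
    name_bytes = name.encode("ascii") + b"\x00"  # the name is stored NUL-terminated
    # Elf64_Nhdr: n_namesz, n_descsz, n_type, each a 32-bit word (little-endian assumed)
    header = b"".join(
        value.to_bytes(4, "little")
        for value in (len(name_bytes), len(desc), note_type)
    )
    return (
        header
        + name_bytes + bytes(pad4(len(name_bytes)))
        + desc + bytes(pad4(len(desc)))
    )


# 3 is NT_PRPSINFO in elf.h; the 3-byte descriptor is a dummy payload
note = build_note("CORE", 3, b"\x01\x02\x03")
print(len(note))  # 12 (header) + 8 (padded name) + 4 (padded descriptor) = 24
</code></pre></div></div>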

<p>Now for the main show:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">//  Process information (we leave most fields set to zero)</span>
<span class="n">elf_prpsinfo</span> <span class="n">prpsinfo</span> <span class="p">{};</span>
<span class="n">strncpy</span><span class="p">(</span><span class="n">prpsinfo</span><span class="p">.</span><span class="n">pr_psargs</span><span class="p">,</span> <span class="s">"bare-metal application"</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">prpsinfo</span><span class="p">.</span><span class="n">pr_psargs</span><span class="p">));</span>
<span class="n">write_note</span><span class="p">(</span><span class="n">elf</span><span class="p">,</span> <span class="s">"CORE"</span><span class="p">,</span> <span class="n">NT_PRPSINFO</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">as_bytes</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">span</span><span class="p">{</span><span class="o">&amp;</span><span class="n">prpsinfo</span><span class="p">,</span> <span class="mi">1</span><span class="p">}));</span>

<span class="c1">// Thread status and integer registers</span>
<span class="n">elf_prstatus</span> <span class="n">prstatus</span> <span class="p">{};</span>
<span class="n">prstatus</span><span class="p">.</span><span class="n">pr_pid</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&amp;</span><span class="n">prstatus</span><span class="p">.</span><span class="n">pr_reg</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">saved_gp_registers</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">saved_gp_registers</span><span class="p">));</span>
<span class="n">write_note</span><span class="p">(</span><span class="n">elf</span><span class="p">,</span> <span class="s">"CORE"</span><span class="p">,</span> <span class="n">NT_PRSTATUS</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">as_bytes</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">span</span><span class="p">{</span><span class="o">&amp;</span><span class="n">prstatus</span><span class="p">,</span> <span class="mi">1</span><span class="p">}));</span>

<span class="c1">// FPU registers</span>
<span class="n">write_note</span><span class="p">(</span><span class="n">elf</span><span class="p">,</span> <span class="s">"CORE"</span><span class="p">,</span> <span class="n">NT_FPREGSET</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">as_bytes</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">span</span><span class="p">{</span><span class="o">&amp;</span><span class="n">saved_fp_registers</span><span class="p">,</span> <span class="mi">1</span><span class="p">}));</span>
</code></pre></div></div>

<p>Finally, write the memory image, respecting the alignment calculated earlier:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">note_align</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">char</span> <span class="n">scratch</span><span class="p">[</span><span class="n">note_align</span><span class="p">];</span>
    <span class="n">memset</span><span class="p">(</span><span class="n">scratch</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">scratch</span><span class="p">));</span>
    <span class="n">fwrite</span><span class="p">(</span><span class="n">scratch</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">scratch</span><span class="p">),</span> <span class="n">elf</span><span class="p">);</span>
<span class="p">}</span>

<span class="n">fwrite</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="n">MEMORY_SNAPSHOT_ADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">MEMORY_SNAPSHOT_SIZE</span><span class="p">,</span> <span class="n">elf</span><span class="p">);</span>
</code></pre></div></div>
<p>And that’s it! You can find the complete code <a href="https://gist.github.com/mcejp/2f6b4405589d6507a68eb893a8a6700d">in this Gist</a>.</p>

<p>As for testing, here is a tip: GDB can be used to extract information from a core dump non-interactively – useful for unit tests, et cetera:</p>

<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gdb --batch -n -ex bt &lt;program&gt; &lt;core&gt;

warning: core file may not match specified executable file.
[New LWP 1]
Core was generated by `bare-metal application'.
#0  0x0000000078020d8c in access_invalid_memory () at access_violation.cpp:8
#1  0x0000000078020db0 in main (argc=&lt;optimized out&gt;, argv=&lt;optimized out&gt;) at access_violation.cpp:23
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
</code></pre></div></div>
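
<p>In a Python-based test suite, this could be wrapped roughly as follows. This is only a sketch: the helper names and the regular expression are assumptions for illustration, derived from nothing more than the output format shown above, and <code class="highlighter-rouge">gdb</code> is assumed to be on the <code class="highlighter-rouge">PATH</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re
import subprocess

# Matches backtrace lines such as "#0  0x0000... in some_function () at file.cpp:8"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(", re.MULTILINE)


def parse_frames(gdb_output):
    """Return the function names of all backtrace frames, innermost first."""
    return [name for _, name in FRAME_RE.findall(gdb_output)]


def backtrace(program, core):
    """Run GDB in batch mode on a core dump and return the parsed frames."""
    result = subprocess.run(
        ["gdb", "--batch", "-n", "-ex", "bt", program, core],
        capture_output=True, text=True, check=True,
    )
    return parse_frames(result.stdout)
</code></pre></div></div>

<p>A test can then assert on the result, for example that the first element returned by <code class="highlighter-rouge">backtrace</code> is <code class="highlighter-rouge">access_invalid_memory</code>.</p>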

<h2 id="see-also">See also</h2>

<ul>
  <li>Linux core dump <del>specification</del> haha! <a href="https://elixir.bootlin.com/linux/latest/source/fs/binfmt_elf.c">code</a></li>
  <li><a href="https://github.com/anatol/google-coredumper">Google Coredumper</a></li>
  <li><a href="https://refspecs.linuxfoundation.org/elf/elf.pdf">Executable and Linking Format (ELF) Specification</a></li>
  <li><a href="https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst">ELF for the Arm® 64-bit Architecture (AArch64)</a></li>
</ul>]]></content><author><name></name></author><summary type="html"><![CDATA[Introduction]]></summary></entry><entry><title type="html">Generating Makefile-like workflows with Python</title><link href="https://mcejp.github.io/2022/09/23/goeiedag.html" rel="alternate" type="text/html" title="Generating Makefile-like workflows with Python" /><published>2022-09-23T00:00:00+02:00</published><updated>2022-09-23T00:00:00+02:00</updated><id>https://mcejp.github.io/2022/09/23/goeiedag</id><content type="html" xml:base="https://mcejp.github.io/2022/09/23/goeiedag.html"><![CDATA[<p>Every now and then I feel a need to programmatically generate and execute a Makefile: for example, to execute a graph-based procedural generation workflow, or to compile some game assets that are discovered dynamically by a tool – in general, all these problems can be represented as <a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">directed acyclic graphs</a> of files and tasks.</p>

<p>At some point it gets quite tiring to write the same code over and over, so I went ahead and made it into a little library. It’s pretty straightforward. First, you create a graph.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">goeiedag</span>

<span class="n">graph</span> <span class="o">=</span> <span class="n">goeiedag</span><span class="p">.</span><span class="n">CommandGraph</span><span class="p">()</span>
</code></pre></div></div>

<p>Afterwards, you add tasks to it…</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>

<span class="c1"># Get username
</span><span class="n">graph</span><span class="p">.</span><span class="n">add</span><span class="p">([</span><span class="s">"whoami"</span><span class="p">,</span> <span class="s">"&gt;"</span><span class="p">,</span> <span class="s">"username.txt"</span><span class="p">],</span>
          <span class="n">inputs</span><span class="o">=</span><span class="p">[],</span>
          <span class="n">outputs</span><span class="o">=</span><span class="p">[</span><span class="s">"username.txt"</span><span class="p">])</span>
</code></pre></div></div>

<p>…and execute it.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">goeiedag</span><span class="p">.</span><span class="n">build_all</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">Path</span><span class="p">())</span>
</code></pre></div></div>

<p>Note that if you run this code twice, it will not re-execute the <code class="highlighter-rouge">whoami</code> command, since the output already exists and is considered up-to-date.</p>

<p>For commands that have inputs (or <em>dependencies</em>), the build tool needs to know about these, so that it is able to determine when the output has become obsolete and needs to be rebuilt. Of course, one command can depend on outputs from other commands, as long as there are no <a href="https://en.wikipedia.org/wiki/Circular_dependency">circular dependencies</a>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Extract OS name from /etc/os-release
</span><span class="n">graph</span><span class="p">.</span><span class="n">add</span><span class="p">([</span><span class="s">"grep"</span><span class="p">,</span> <span class="s">"^NAME="</span><span class="p">,</span> <span class="s">"/etc/os-release"</span><span class="p">,</span> <span class="s">"&gt;"</span><span class="p">,</span> <span class="s">"os-name.txt"</span><span class="p">],</span>
          <span class="n">inputs</span><span class="o">=</span><span class="p">[</span><span class="s">"/etc/os-release"</span><span class="p">],</span>
          <span class="n">outputs</span><span class="o">=</span><span class="p">[</span><span class="s">"os-name.txt"</span><span class="p">])</span>
</code></pre></div></div>

<p>To make usage more convenient and avoid repetition, the library provides some special symbols to let you refer to the declared inputs and outputs when building up the command.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">goeiedag</span> <span class="kn">import</span> <span class="n">ALL_INPUTS</span><span class="p">,</span> <span class="n">INPUT</span><span class="p">,</span> <span class="n">OUTPUT</span>

<span class="c1"># Extract OS name from /etc/os-release
</span><span class="n">graph</span><span class="p">.</span><span class="n">add</span><span class="p">([</span><span class="s">"grep"</span><span class="p">,</span> <span class="s">"^NAME="</span><span class="p">,</span> <span class="n">INPUT</span><span class="p">,</span> <span class="s">"&gt;"</span><span class="p">,</span> <span class="n">OUTPUT</span><span class="p">],</span>
          <span class="n">inputs</span><span class="o">=</span><span class="p">[</span><span class="s">"/etc/os-release"</span><span class="p">],</span>
          <span class="n">outputs</span><span class="o">=</span><span class="p">[</span><span class="s">"os-name.txt"</span><span class="p">])</span>
<span class="c1"># Get username
</span><span class="n">graph</span><span class="p">.</span><span class="n">add</span><span class="p">([</span><span class="s">"whoami"</span><span class="p">,</span> <span class="s">"&gt;"</span><span class="p">,</span> <span class="n">OUTPUT</span><span class="p">],</span>
          <span class="n">inputs</span><span class="o">=</span><span class="p">[],</span>
          <span class="n">outputs</span><span class="o">=</span><span class="p">[</span><span class="s">"username.txt"</span><span class="p">])</span>
<span class="c1"># Glue together to produce output
</span><span class="n">graph</span><span class="p">.</span><span class="n">add</span><span class="p">([</span><span class="s">"cat"</span><span class="p">,</span> <span class="n">ALL_INPUTS</span><span class="p">,</span> <span class="s">"&gt;"</span><span class="p">,</span> <span class="n">OUTPUT</span><span class="p">.</span><span class="n">result</span><span class="p">],</span>
          <span class="n">inputs</span><span class="o">=</span><span class="p">[</span><span class="s">"os-name.txt"</span><span class="p">,</span> <span class="s">"username.txt"</span><span class="p">],</span>
          <span class="n">outputs</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span><span class="n">result</span><span class="o">=</span><span class="s">"result.txt"</span><span class="p">))</span>
</code></pre></div></div>
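
<p>To illustrate why the absence of circular dependencies matters, here is a small sketch using <code class="highlighter-rouge">graphlib</code> from the Python standard library (3.9 or newer). It is not how goeieDAG works internally; the <code class="highlighter-rouge">deps</code> mapping and the <code class="highlighter-rouge">build_order</code> helper are written by hand for this example. A valid execution order exists only if the graph is acyclic:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from graphlib import CycleError, TopologicalSorter

# Each output file mapped to the files it is built from (mirrors the tasks above)
deps = {
    "os-name.txt": {"/etc/os-release"},
    "username.txt": set(),
    "result.txt": {"os-name.txt", "username.txt"},
}


def build_order(deps):
    """Return the files in an order where every file comes after its inputs."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as error:
        # The second element of args holds the files that form the cycle
        raise ValueError(f"circular dependency: {error.args[1]}") from error


order = build_order(deps)
print(order)  # result.txt always comes after os-name.txt and username.txt
</code></pre></div></div>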

<p>When <code class="highlighter-rouge">goeiedag.build_all</code> is called, the library will generate a <a href="https://ninja-build.org/">Ninja build</a> file and execute it. I didn’t feel a need to re-implement the logic of orchestrating and executing the DAG efficiently, so the aim was to take advantage of an existing build executor. Compared to Make, Ninja has more pleasant tooling, better performance when dealing with complex builds, and is also easier to generate code for (for one, it cleanly supports tasks that generate multiple outputs).</p>
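
<p>To give a rough idea of what such generated input looks like, the following sketch renders a single task as a Ninja <code class="highlighter-rouge">rule</code> plus <code class="highlighter-rouge">build</code> statement. The function <code class="highlighter-rouge">ninja_snippet</code>, the rule name and the file names are hypothetical, the escaping that Ninja requires for special characters in paths is left out, and the file that goeieDAG actually produces may well look different.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def ninja_snippet(rule_name, command, inputs, outputs):
    """Render one rule and one build statement in Ninja syntax (no escaping)."""
    return "\n".join([
        f"rule {rule_name}",
        f"  command = {' '.join(command)}",
        # Ninja build statement: "build OUTPUTS: RULE INPUTS"
        f"build {' '.join(outputs)}: {rule_name} {' '.join(inputs)}".rstrip(),
        "",
    ])


print(ninja_snippet(
    "copy_os_release",
    ["cp", "/etc/os-release", "os-release.copy"],
    inputs=["/etc/os-release"],
    outputs=["os-release.copy"],
))
</code></pre></div></div>

<p>Because a build statement can list several outputs before the colon, tasks with multiple outputs need no special treatment.</p>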

<p>There exist many similar libraries with somewhat different paradigms; for example, <a href="https://docs.dask.org/en/latest/graphs.html">Dask</a> and <a href="https://pypi.org/project/taskgraph/">TaskGraph</a> use plain Python functions, rather than shell commands, as a unit of execution. This is advantageous if your flow is Python-heavy, but it locks you out of taking advantage of high-quality build executors like Ninja. <a href="https://snakemake.readthedocs.io/en/stable/">Snakemake</a> does work with commands, but AFAIU doesn’t help you build the task graph programmatically – the input has to be provided in Snakemake’s text format.</p>

<p>The package is <a href="https://pypi.org/project/goeieDAG/">available on PyPI</a> and <a href="https://github.com/mcejp/goeieDAG">GitHub</a>. Let me know what you think!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Every now and then I feel a need to programmatically generate and execute a Makefile: for example, to execute a graph-based procedural generation workflow, or to compile some game assets that are discovered dynamically by a tool – in general, all these problems can be represented as directed acyclic graphs of files and tasks.]]></summary></entry><entry><title type="html">Macro expansion in Hy-based custom languages</title><link href="https://mcejp.github.io/2022/08/18/hy-macroexpand.html" rel="alternate" type="text/html" title="Macro expansion in Hy-based custom languages" /><published>2022-08-18T00:00:00+02:00</published><updated>2022-08-18T00:00:00+02:00</updated><id>https://mcejp.github.io/2022/08/18/hy-macroexpand</id><content type="html" xml:base="https://mcejp.github.io/2022/08/18/hy-macroexpand.html"><![CDATA[<p>This seems like a very obvious thing to do, but I could not find a simple example anywhere.</p>

<p>Suppose we want to define a very simple S-expression-based language, with no variables or functions, just literals and a single core form – <code class="highlighter-rouge">print</code>:</p>

<div class="language-hy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nf">print</span><span class="w"> </span><span class="p">[</span><span class="mi">1</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="mi">3</span><span class="p">])</span><span class="w">
</span></code></pre></div></div>

<p>Outputting:</p>

<div class="language-hy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">'</span><span class="p">[</span><span class="mi">1</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="mi">3</span><span class="p">]</span><span class="w">
</span></code></pre></div></div>

<p>However, to allow the user to abstract, we want to permit the usage of Hy macros – with the full Hy language available at expansion time:</p>

<div class="language-hy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span><span class="w"> </span><span class="n">iota</span><span class="w"> </span><span class="p">[</span><span class="n">max</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="nf">list</span><span class="w"> </span><span class="p">(</span><span class="nb">range</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">max</span><span class="p">))))</span><span class="w">

</span><span class="c1">; pointless macro just to demonstrate repeated expansion</span><span class="w">
</span><span class="p">(</span><span class="nb">defmacro</span><span class="w"> </span><span class="nb">identity</span><span class="w"> </span><span class="p">[</span><span class="n">expr</span><span class="p">]</span><span class="w"> </span><span class="n">expr</span><span class="p">)</span><span class="w">

</span><span class="p">(</span><span class="nf">print</span><span class="w"> </span><span class="p">(</span><span class="nb">identity</span><span class="w"> </span><span class="p">(</span><span class="nf">iota</span><span class="w"> </span><span class="mi">3</span><span class="p">)))</span><span class="w">
</span></code></pre></div></div>

<p>Output:</p>

<div class="language-hy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">'</span><span class="p">[</span><span class="mi">1</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="mi">3</span><span class="p">]</span><span class="w">
</span></code></pre></div></div>
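
<p>The idea of repeated expansion can be modelled in plain Python, independently of Hy. In this toy sketch (all names are made up, and it makes no attempt to mirror Hy’s actual machinery), tuples stand for call forms, lists stand for list literals, and a macro is simply a function from unexpanded arguments to a new form:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Toy macro table: each macro maps its unexpanded arguments to a new form
MACROS = {
    "iota": lambda n: list(range(1, n + 1)),
    "identity": lambda expr: expr,
}


def expand(form):
    """Expand macros in a form until none are left."""
    # Only non-empty tuples are call forms; everything else is a literal
    if not isinstance(form, tuple) or not form:
        return form
    head, *args = form
    if head in MACROS:
        # The result of a macro may itself contain macro calls, so expand again
        return expand(MACROS[head](*args))
    # Not a macro (for example the core form "print"): expand the arguments only
    return (head, *[expand(arg) for arg in args])


print(expand(("print", ("identity", ("iota", 3)))))
</code></pre></div></div>

<p>After expansion, only the core form <code class="highlighter-rouge">print</code> and the literal list remain, which is all the evaluator needs to understand.</p>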

<p>Here is the Hy source of an interpreter for this language. This is as simple as I managed to get:</p>

<div class="language-hy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">#</span><span class="n">!/usr/bin/env</span><span class="w"> </span><span class="n">hy</span><span class="w">

</span><span class="p">(</span><span class="k">import</span><span class="w"> </span><span class="n">hyrule</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="k">import</span><span class="w"> </span><span class="n">os</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="k">import</span><span class="w"> </span><span class="n">sys</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="k">import</span><span class="w"> </span><span class="n">types</span><span class="p">)</span><span class="w">

</span><span class="p">(</span><span class="nf">with</span><span class="w"> </span><span class="p">[</span><span class="n">f</span><span class="w"> </span><span class="p">(</span><span class="nf">open</span><span class="w"> </span><span class="s">"minimal.hy"</span><span class="p">)]</span><span class="w"> </span><span class="p">(</span><span class="nb">do</span><span class="w">
    </span><span class="p">(</span><span class="nb">setv</span><span class="w"> </span><span class="n">module</span><span class="w"> </span><span class="p">(</span><span class="nf">types.ModuleType</span><span class="w"> </span><span class="s">"minimal"</span><span class="p">))</span><span class="w">
    </span><span class="c1">; without the following, evaluation of defmacro triggers an error</span><span class="w">
    </span><span class="p">(</span><span class="nb">setv</span><span class="w"> </span><span class="p">(</span><span class="nb">get</span><span class="w"> </span><span class="n">sys.modules</span><span class="w"> </span><span class="n">module.__name__</span><span class="p">)</span><span class="w"> </span><span class="n">module</span><span class="p">)</span><span class="w">

    </span><span class="p">(</span><span class="nb">setv</span><span class="w"> </span><span class="n">compiler</span><span class="w"> </span><span class="p">(</span><span class="nf">hy.compiler.HyASTCompiler</span><span class="w"> </span><span class="n">module</span><span class="w"> </span><span class="n">module.__name__</span><span class="p">))</span><span class="w">

    </span><span class="p">(</span><span class="k">for</span><span class="w"> </span><span class="p">[</span><span class="n">form</span><span class="w"> </span><span class="p">(</span><span class="nf">hy.read-many</span><span class="w"> </span><span class="n">f</span><span class="p">)]</span><span class="w">
        </span><span class="c1">; expand macros -- this includes processing of "defmacro" itself</span><span class="w">
        </span><span class="p">(</span><span class="nb">setv</span><span class="w"> </span><span class="n">exp</span><span class="w"> </span><span class="p">(</span><span class="nf">hyrule.macrotools.macroexpand-all</span><span class="w">
                </span><span class="no">:form</span><span class="w"> </span><span class="n">form</span><span class="w">
                </span><span class="no">:ast-compiler</span><span class="w"> </span><span class="n">compiler</span><span class="w">
                </span><span class="p">))</span><span class="w">
        </span><span class="c1">;(print "EVAL" (hy.repr exp))</span><span class="w">

        </span><span class="c1">; evaluate form</span><span class="w">
        </span><span class="p">(</span><span class="nb">cond</span><span class="w">
            </span><span class="p">(</span><span class="nb">=</span><span class="w"> </span><span class="p">(</span><span class="nb">get</span><span class="w"> </span><span class="n">exp</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nf">hy.models.Symbol</span><span class="w"> </span><span class="s">"defmacro"</span><span class="p">))</span><span class="w">
                </span><span class="c1">; at evaluation time ignore defmacro</span><span class="w">
                </span><span class="k">None</span><span class="w">
            </span><span class="p">(</span><span class="nb">=</span><span class="w"> </span><span class="p">(</span><span class="nb">get</span><span class="w"> </span><span class="n">exp</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nf">hy.models.Symbol</span><span class="w"> </span><span class="s">"print"</span><span class="p">))</span><span class="w">
                </span><span class="c1">; this is our core form</span><span class="w">
                </span><span class="p">(</span><span class="nf">print</span><span class="w"> </span><span class="p">(</span><span class="nf">hy.repr</span><span class="w"> </span><span class="p">(</span><span class="nb">get</span><span class="w"> </span><span class="n">exp</span><span class="w"> </span><span class="mi">1</span><span class="p">)))</span><span class="w">
            </span><span class="k">True</span><span class="w">
                </span><span class="p">(</span><span class="k">raise</span><span class="w"> </span><span class="p">(</span><span class="nf">Exception</span><span class="w"> </span><span class="s">"invalid form"</span><span class="p">))</span><span class="w">
        </span><span class="p">)</span><span class="w">
    </span><span class="p">)</span><span class="w">
</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>

<p>One could argue that this is a stupid thing to do, that I should have simply defined my <code class="highlighter-rouge">print</code> as a new macro, and used the standard <code class="highlighter-rouge">hy.eval</code>. Well, what if my set of core forms <em>is not known ahead of time</em>? Consider another language, one where each core form executes a shell command of the same name:</p>

<div class="language-hy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span><span class="w"> </span><span class="n">get-filename</span><span class="w"> </span><span class="p">[]</span><span class="w"> </span><span class="s">"hello.txt"</span><span class="p">)</span><span class="w">

</span><span class="p">(</span><span class="nf">echo</span><span class="w"> </span><span class="s">"Hello"</span><span class="w"> </span><span class="s">"World"</span><span class="w"> </span><span class="s">"&gt;"</span><span class="w"> </span><span class="p">(</span><span class="nf">get-filename</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nf">cat</span><span class="w"> </span><span class="p">(</span><span class="nf">get-filename</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nf">uname</span><span class="w"> </span><span class="s">"-a"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>Well, with this structure, you can do that! The evaluation block just changes to this:</p>

<div class="language-hy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="w">        </span><span class="c1">; evaluate form</span><span class="w">
        </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">!=</span><span class="w"> </span><span class="p">(</span><span class="nb">get</span><span class="w"> </span><span class="n">exp</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nf">hy.models.Symbol</span><span class="w"> </span><span class="s">"defmacro"</span><span class="p">))</span><span class="w">
            </span><span class="p">(</span><span class="nf">os.system</span><span class="w"> </span><span class="p">(</span><span class="nf">.join</span><span class="w"> </span><span class="s">" "</span><span class="w"> </span><span class="p">(</span><span class="nf">list</span><span class="w"> </span><span class="n">exp</span><span class="p">)))</span><span class="w">
            </span><span class="k">None</span><span class="w">
        </span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>This seems quite useful, doesn’t it – imagine the possibilities! I would be curious to see the <a href="https://racket-lang.org/">Racket</a> equivalent, since the juggling of modules and scopes seemed quite a bit more involved there. On the other hand, Racket has first-class custom language support, which might help quite a bit. And then there is <a href="https://docs.racket-lang.org/ee-lib/index.html">ee-lib</a>, which I have yet to explore.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[This seems like a very obvious thing to do, but I could not find a simple example anywhere.]]></summary></entry><entry><title type="html">ARMv8: Cache coherency between code running in different exception levels</title><link href="https://mcejp.github.io/2022/08/05/cache-coherency.html" rel="alternate" type="text/html" title="ARMv8: Cache coherency between code running in different exception levels" /><published>2022-08-05T00:00:00+02:00</published><updated>2022-08-05T00:00:00+02:00</updated><id>https://mcejp.github.io/2022/08/05/cache-coherency</id><content type="html" xml:base="https://mcejp.github.io/2022/08/05/cache-coherency.html"><![CDATA[<p>At its heart, the Xilinx UltraScale+ SOC has a multi-core Cortex-A53 CPU. This is not the fastest ARM out there, but it’s still plenty capable. One interesting feature is its built-in Snoop Control Unit (SCU). This enables transparent synchronization of L1 caches among the individual cores. There is one pitfall that you might fall into when running bare-metal code: the distinction between secure and non-secure memory access.</p>

<p>If one of your cores operates in a secure exception level (such as EL3) and another runs in a non-secure exception level (EL1), by default they see different memory spaces. The AXI bus has a signal called <code class="highlighter-rouge">AxPROT</code> that distinguishes secure and non-secure access. Once your data reaches a “stupid” memory such as DDR SDRAM, this distinction vanishes; however, the data cache does take it into account and effectively treats these two as separate address spaces. Thus, your precious shared memory buffer will not be coherent until flushed on one side and invalidated on the other.</p>

<p>Fortunately, there is a simple solution. The page tables that you set up to configure the MMU (a prerequisite for using the snooper) <a href="https://developer.arm.com/documentation/ddi0406/cb/System-Level-Architecture/Virtual-Memory-System-Architecture--VMSA-/Long-descriptor-translation-table-format/Memory-attributes-in-the-Long-descriptor-translation-table-format-descriptors?lang=en">have a bit called <code class="highlighter-rouge">NS</code> (Non-secure)</a>. Setting this bit forces all accesses to be treated as Non-secure even when running in a secure EL. The converse (forcing secure access from EL1) is of course not possible, because it would completely break the security model.</p>

<p><code class="highlighter-rouge">NS</code> is bit 5 of the <em>Lower attributes</em>, so if you’re using the Xilinx-provided template code (<code class="highlighter-rouge">translation_table.S</code>) and you don’t really care about this type of security, you might want to simply change the line which says</p>

<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.set Memory,	0x405 | (3 &lt;&lt; 8) | (0x0)		/* normal writeback write allocate inner shared read write */
</code></pre></div></div>

<p>to</p>

<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.set Memory,	0x425 | (3 &lt;&lt; 8) | (0x0)		/* normal writeback write allocate inner shared read write (forced non-secure) */
</code></pre></div></div>
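
<p>To make the arithmetic explicit, here is a minimal Python sketch (not part of the Xilinx template; the variable names are my own) showing that the two constants differ only in bit 5:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>NS_BIT = 2 ** 5  # bit 5 of the lower attributes, i.e. 0x20

secure_attrs = 0x405                      # value from the original template line
non_secure_attrs = secure_attrs | NS_BIT  # force the NS bit on

print(hex(non_secure_attrs))  # prints 0x425
</code></pre></div></div>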

<p>Alternatively, this can be done at runtime.</p>

<p>Thanks to <a href="https://community.arm.com/support-forums/f/architectures-and-processors-forum/52365/caches-el3-and-el1-on-a53-clusters">this thread</a> on the ARM Support forums.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[At its heart, the Xilinx UltraScale+ SOC has a multi-core Cortex-A53 CPU. This is not the fastest ARM out there, but it’s still plenty capable. One interesting feature is its built-in Snoop Control Unit (SCU). This enables transparent synchronization of L1 caches among the individual cores. There is one pitfall that you might fall into when running bare-metal code: the distinction between secure and non-secure memory access.]]></summary></entry><entry><title type="html">Tracking FPGA design build metrics</title><link href="https://mcejp.github.io/2022/06/20/builds.html" rel="alternate" type="text/html" title="Tracking FPGA design build metrics" /><published>2022-06-20T00:00:00+02:00</published><updated>2022-06-20T00:00:00+02:00</updated><id>https://mcejp.github.io/2022/06/20/builds</id><content type="html" xml:base="https://mcejp.github.io/2022/06/20/builds.html"><![CDATA[<h3 id="with-low-infrastructure-footprint">…with low infrastructure footprint</h3>

<p><img src="../../../images/2022-06-20-builds/screenshot.png" alt="screenshot" /></p>

<p>For my <a href="https://github.com/mcejp/Poly94">latest FPGA toy project</a>, I was looking for a way to have an overview of builds with performance metrics – f<sub>max</sub> and resource usage, but also results of simulation and benchmarks.</p>

<p>GitLab CI has a convenient feature whereby one can specify a regular expression to extract test coverage from CI logs, and this value is then displayed on the job list page. Unfortunately it does not offer any flexibility beyond that.</p>

<p>Despite searching far and wide, I didn’t find any satisfactory solution (at least not for free), so I set out to build my own.</p>

<p>The rough idea was the following:</p>

<ol>
  <li>collect key metrics from CI jobs:
    <ul>
      <li>timing results</li>
      <li>resource usage</li>
      <li>test results under simulation</li>
      <li>benchmarks under simulation</li>
    </ul>
  </li>
  <li>accumulate data across builds</li>
  <li>present in a spreadsheet for easy viewing</li>
</ol>

<p><img src="../../../images/2022-06-20-builds/diagram.png" alt="schematic" /></p>

<p>Since I am already quite invested in GitLab CI, I wanted to maximally reuse the facilities it provides. However, to accumulate data across many builds, I still needed some kind of database. I opted for <a href="https://www.postgresql.org/">PostgreSQL</a>, motivated by the existence of <a href="https://supabase.com/">Supabase</a>, which provides a 500 MB cloud-hosted database for free. The choice doesn’t really matter; a private MariaDB or some NoSQL solution would do the job as well. The only requirement is reachability from the CI runners.</p>

<p>The next question was one of the database schema. I decided to hardcode only a few very general columns which might need to be indexed later, for purposes of filtering and sorting:</p>

<ul>
  <li>commit hash &amp; title</li>
  <li>timestamps of commit + pipeline</li>
  <li>CI pipeline URL</li>
  <li>branch name</li>
</ul>

<p>The rest of the build data is shoved into a JSON object to allow maximum flexibility and easy schema evolution.</p>
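
<p>To give an idea of what such an object might look like, here is a small sketch. The key layout follows the snippets later in this post, but the concrete values and the clock name <code class="highlighter-rouge">clk_sys</code> are made up for illustration:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json

# Illustrative values only; "clk_sys" is a hypothetical clock name.
results = {
    "commit_title": "Fix reset synchronizer",
    "build": {"fmax": {"clk_sys": {"achieved": 52.3}}},
    "sim": {"result": "fail", "failed_testcases": ["test_cpu:test_reset"]},
}

# This string is what would end up in the JSON column.
attributes = json.dumps(results)

# Reading it back yields the same nested structure, so a new metric
# can be added later without touching the table definition.
restored = json.loads(attributes)
print(restored["sim"]["result"])
</code></pre></div></div>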

<p>To present the data, a final CI job generates a static, self-contained HTML page and deploys it via GitLab Pages. Simple!</p>

<h2 id="implementation">Implementation</h2>

<p>Let’s take a walk through the code now.</p>

<h3 id="db-schema">DB schema</h3>

<p>We start by creating a table. (Recall that <code class="highlighter-rouge">public</code> is the default schema in a new PostgreSQL database.)</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="k">public</span><span class="p">.</span><span class="n">builds</span> <span class="p">(</span>
    <span class="n">id</span> <span class="nb">serial</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">pipeline_timestamp</span> <span class="nb">timestamp</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">pipeline_url</span> <span class="nb">varchar</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">branch</span> <span class="nb">varchar</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">commit_sha1</span> <span class="nb">varchar</span><span class="p">(</span><span class="mi">40</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">commit_timestamp</span> <span class="nb">timestamp</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="nv">"attributes"</span> <span class="n">json</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="k">CONSTRAINT</span> <span class="n">builds_pk</span> <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="p">);</span>
</code></pre></div></div>

<h3 id="configuring-the-jobs-to-always-save-artifacts">Configuring the jobs to always save artifacts</h3>

<p>Builds may succeed or fail, but in any case, we want the logs to be saved for later processing. This applies to all build and simulation jobs.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">build_yosys</span><span class="pi">:</span>
  <span class="na">script</span><span class="pi">:</span> <span class="s">...</span>

  <span class="na">artifacts</span><span class="pi">:</span>
    <span class="na">paths</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">build/nextpnr-report.json</span>
      <span class="pi">-</span> <span class="s">ulx3s.bit</span>
      <span class="pi">-</span> <span class="s2">"</span><span class="s">*.log"</span>
    <span class="na">when</span><span class="pi">:</span> <span class="s">always</span>
</code></pre></div></div>

<p>Note that if an entry under <code class="highlighter-rouge">paths</code> refers to a non-existent file, GitLab Runner will complain a bit, but ultimately will go about its day, without triggering a job failure or skipping the remaining artifacts.</p>

<h3 id="collecting-the-results">Collecting the results</h3>

<p>The first custom job is tasked with collecting the results of all builds/tests, extracting key metrics, and storing them into the database.</p>

<p>For this reason, it needs to depend on all the previous jobs and their artifacts.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">reports</span><span class="pi">:</span>
  <span class="na">stage</span><span class="pi">:</span> <span class="s">upload</span>
  <span class="na">needs</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">job</span><span class="pi">:</span> <span class="s">build_ulx3s</span>
    <span class="na">artifacts</span><span class="pi">:</span> <span class="no">true</span>
  <span class="pi">-</span> <span class="na">job</span><span class="pi">:</span> <span class="s">test_cocotb</span>
    <span class="na">artifacts</span><span class="pi">:</span> <span class="no">true</span>
  <span class="pi">-</span> <span class="na">job</span><span class="pi">:</span> <span class="s">test_verilator</span>
    <span class="na">artifacts</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">when</span><span class="pi">:</span> <span class="s">always</span>

  <span class="na">image</span><span class="pi">:</span> <span class="s">python:3.10</span>

  <span class="na">script</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">pip install junitparser "psycopg&gt;=3"</span>
    <span class="pi">-</span> <span class="s">./tools/ci/save_build_stats.py</span>
</code></pre></div></div>

<p><a href="https://gitlab.com/mcejp/Poly94/-/blob/develop/tools/ci/save_build_stats.py">The body of this script is</a>, for the most part, unremarkable. We begin by preparing a dictionary and collecting the first pieces of metadata:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">results</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">results</span><span class="p">[</span><span class="s">"commit_title"</span><span class="p">]</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"CI_COMMIT_TITLE"</span><span class="p">]</span>
</code></pre></div></div>

<p>The results extraction is mostly tool-specific, so I will only reproduce one example here, which is that of parsing Cocotb results in JUnit format:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">exists</span><span class="p">(</span><span class="s">"results.xml"</span><span class="p">):</span>
    <span class="n">xml</span> <span class="o">=</span> <span class="n">JUnitXml</span><span class="p">.</span><span class="n">fromfile</span><span class="p">(</span><span class="s">"results.xml"</span><span class="p">)</span>

    <span class="n">failures</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">for</span> <span class="n">suite</span> <span class="ow">in</span> <span class="n">xml</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">case</span> <span class="ow">in</span> <span class="n">suite</span><span class="p">:</span>
            <span class="c1"># Failures are reported by a &lt;failure /&gt; node under the test case,
</span>            <span class="c1"># while passing tests don't carry any result at all.
</span>            <span class="c1"># To be determined whether this is the JUnit convention,
</span>            <span class="c1"># or a cocotb idiosyncrasy.
</span>            <span class="k">if</span> <span class="nb">any</span><span class="p">(</span><span class="n">r</span><span class="p">.</span><span class="n">_tag</span> <span class="o">==</span> <span class="s">"failure"</span> <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">case</span><span class="p">.</span><span class="n">result</span><span class="p">):</span>
                <span class="n">failures</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">case</span><span class="p">.</span><span class="n">classname</span> <span class="o">+</span> <span class="s">":"</span> <span class="o">+</span> <span class="n">case</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>

    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">failures</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">results</span><span class="p">[</span><span class="s">"sim"</span><span class="p">]</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">result</span><span class="o">=</span><span class="s">"fail"</span><span class="p">,</span> <span class="n">failed_testcases</span><span class="o">=</span><span class="n">failures</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">results</span><span class="p">[</span><span class="s">"sim"</span><span class="p">]</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">result</span><span class="o">=</span><span class="s">"pass"</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
    <span class="n">results</span><span class="p">[</span><span class="s">"sim"</span><span class="p">]</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">result</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>   <span class="c1"># result unknown, maybe design failed to compile
</span></code></pre></div></div>
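
<p>The failure rule itself can be tried without a report file or any third-party package. The following is only a sketch: it swaps <code class="highlighter-rouge">junitparser</code> for the standard library’s <code class="highlighter-rouge">xml.etree.ElementTree</code>, builds a tiny JUnit-style tree in memory, and uses hypothetical test names:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import xml.etree.ElementTree as ET

# Build a tiny JUnit-style report in memory (hypothetical test names).
root = ET.Element("testsuites")
suite = ET.SubElement(root, "testsuite", name="all")
ET.SubElement(suite, "testcase", classname="test_cpu", name="test_reset")
failing = ET.SubElement(suite, "testcase", classname="test_cpu", name="test_irq")
ET.SubElement(failing, "failure")

# A test case counts as failed if it has at least one failure child node;
# passing test cases have no result node at all.
failures = [
    case.get("classname") + ":" + case.get("name")
    for case in root.iter("testcase")
    if case.find("failure") is not None
]

print(failures)  # prints ['test_cpu:test_irq']
</code></pre></div></div>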

<p>Finally, we collect the indexable metadata and shove everything into the table:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">psycopg</span><span class="p">.</span><span class="n">connect</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"POSTGRES_CONN_STRING"</span><span class="p">])</span> <span class="k">as</span> <span class="n">conn</span><span class="p">:</span>
    <span class="n">cursor</span> <span class="o">=</span> <span class="n">conn</span><span class="p">.</span><span class="n">cursor</span><span class="p">()</span>
    <span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'INSERT INTO builds(pipeline_timestamp, pipeline_url, branch, '</span>
                   <span class="s">'commit_sha1, commit_timestamp, "attributes") '</span>
                   <span class="s">'VALUES (%s, %s, %s, %s, %s, %s)'</span><span class="p">,</span> <span class="p">(</span>
        <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"CI_PIPELINE_CREATED_AT"</span><span class="p">],</span>
        <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"CI_PIPELINE_URL"</span><span class="p">],</span>
        <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"CI_COMMIT_BRANCH"</span><span class="p">],</span>
        <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"CI_COMMIT_SHA"</span><span class="p">],</span>
        <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"CI_COMMIT_TIMESTAMP"</span><span class="p">],</span>
        <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">results</span><span class="p">))</span>
    <span class="p">)</span>
</code></pre></div></div>

<p>The environment variable <code class="highlighter-rouge">POSTGRES_CONN_STRING</code> must be defined in the project’s CI settings. Normally your database host will provide the connection string readily. It should follow this template: <code class="highlighter-rouge">postgresql://username:password@hostspec/dbname</code>. Don’t forget to include the password!</p>
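
<p>If you assemble the connection string yourself, keep in mind that special characters in the user name or password need to be percent-encoded. A minimal sketch, with a hypothetical helper name and made-up credentials:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from urllib.parse import quote


def make_conn_string(username, password, hostspec, dbname):
    """Fill in the template above, percent-encoding the credentials."""
    return "postgresql://{}:{}@{}/{}".format(
        quote(username, safe=""),
        quote(password, safe=""),
        hostspec,
        dbname,
    )


# Made-up values; the "@" and "/" in the password would otherwise break the URI.
conn_string = make_conn_string("ci_user", "p@ss/word", "db.example.com:5432", "builds")
print(conn_string)
</code></pre></div></div>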

<h3 id="presentation">Presentation</h3>

<p><a href="https://gitlab.com/mcejp/Poly94/-/blob/develop/tools/ci/present_build_stats.py">A second script</a> takes care of fetching all historical records and presenting them on a webpage.</p>

<p>To deploy to GitLab Pages, the job must be named literally <code class="highlighter-rouge">pages</code> and it needs to upload a directory called <code class="highlighter-rouge">public</code> as an artifact. Otherwise, the configuration is straightforward: it just needs to wait for the <code class="highlighter-rouge">reports</code> job to finish, in order to always work with the most up-to-date data.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">pages</span><span class="pi">:</span>
  <span class="na">stage</span><span class="pi">:</span> <span class="s">upload</span>
  <span class="na">needs</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">reports</span><span class="pi">]</span>

  <span class="na">image</span><span class="pi">:</span> <span class="s">python:3.10</span>

  <span class="na">script</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">mkdir public</span>
    <span class="pi">-</span> <span class="s">cd public</span>
    <span class="pi">-</span> <span class="s">pip install Jinja2 "psycopg&gt;=3"</span>
    <span class="pi">-</span> <span class="s">../tools/ci/present_build_stats.py</span>

  <span class="na">artifacts</span><span class="pi">:</span>
    <span class="na">paths</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">public</span>
</code></pre></div></div>

<p>The Python part starts by fetching all the DB records and passing them onto a <a href="https://gitlab.com/mcejp/Poly94/-/blob/develop/tools/ci/build_stats.html">Jinja template</a>.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">psycopg</span><span class="p">.</span><span class="n">connect</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"POSTGRES_CONN_STRING"</span><span class="p">])</span> <span class="k">as</span> <span class="n">conn</span><span class="p">:</span>
    <span class="n">cursor</span> <span class="o">=</span> <span class="n">conn</span><span class="p">.</span><span class="n">cursor</span><span class="p">()</span>
    <span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'SELECT id, pipeline_timestamp, pipeline_url, branch, commit_sha1, '</span>
                   <span class="s">'commit_timestamp, "attributes" '</span>
                   <span class="s">'FROM builds ORDER BY pipeline_timestamp DESC, id DESC'</span><span class="p">)</span>

    <span class="n">builds</span> <span class="o">=</span> <span class="p">[</span><span class="nb">dict</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span>
                   <span class="n">pipeline_timestamp</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span>
                   <span class="n">pipeline_url</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span>
                   <span class="n">branch</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span>
                   <span class="n">commit_sha1</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="mi">4</span><span class="p">],</span>
                   <span class="n">commit_timestamp</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="mi">5</span><span class="p">],</span>
                   <span class="o">**</span><span class="n">row</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span>   <span class="c1"># unpack all other attributes into the dictionary
</span>                   <span class="p">)</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">cursor</span><span class="p">.</span><span class="n">fetchall</span><span class="p">()]</span>

<span class="n">env</span> <span class="o">=</span> <span class="n">jinja2</span><span class="p">.</span><span class="n">Environment</span><span class="p">(</span><span class="n">loader</span><span class="o">=</span><span class="n">jinja2</span><span class="p">.</span><span class="n">FileSystemLoader</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">__file__</span><span class="p">)))</span>
<span class="n">template</span> <span class="o">=</span> <span class="n">env</span><span class="p">.</span><span class="n">get_template</span><span class="p">(</span><span class="s">"build_stats.html"</span><span class="p">)</span>

<span class="n">Path</span><span class="p">(</span><span class="s">"builds.html"</span><span class="p">).</span><span class="n">write_text</span><span class="p">(</span><span class="n">template</span><span class="p">.</span><span class="n">render</span><span class="p">(</span>
    <span class="n">builds</span><span class="o">=</span><span class="n">builds</span><span class="p">,</span>
    <span class="n">project_url</span><span class="o">=</span><span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"CI_PROJECT_URL"</span><span class="p">]</span>
    <span class="p">))</span>
</code></pre></div></div>

<p><a href="https://mcejp.gitlab.io/Poly94/builds.html">Here</a> you can see the result live.</p>

<p>For more flavor, we also generate some badges. The idea here is to generate a publicly accessible JSON file that can be fed into <a href="https://shields.io">shields.io</a> <em>Endpoint</em> mode.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fmax_str</span> <span class="o">=</span> <span class="s">"%.1f MHz"</span> <span class="o">%</span> <span class="n">reference_build</span><span class="p">[</span><span class="s">"build"</span><span class="p">][</span><span class="s">"fmax"</span><span class="p">][</span><span class="n">reference_clk</span><span class="p">][</span><span class="s">"achieved"</span><span class="p">]</span>
<span class="n">Path</span><span class="p">(</span><span class="s">"fmax.json"</span><span class="p">).</span><span class="n">write_text</span><span class="p">(</span><span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">(</span><span class="nb">dict</span><span class="p">(</span><span class="n">schemaVersion</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
                                             <span class="n">label</span><span class="o">=</span><span class="s">"Fmax"</span><span class="p">,</span>
                                             <span class="n">message</span><span class="o">=</span><span class="n">fmax_str</span><span class="p">,</span>
                                             <span class="n">color</span><span class="o">=</span><span class="s">"orange"</span><span class="p">)))</span>
<span class="c1"># --&gt; render via https://img.shields.io/endpoint?url=https://MY_GITLAB_PAGES.io/fmax.json
</span></code></pre></div></div>
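
<p>The endpoint URL in the final comment carries another URL as a query parameter, so percent-encoding it is the safe choice. A small sketch, assuming a hypothetical Pages address:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from urllib.parse import urlencode

# Hypothetical GitLab Pages address of the generated JSON file.
json_url = "https://example.gitlab.io/project/fmax.json"

# urlencode() percent-encodes the nested URL so it survives as one parameter.
badge_url = "https://img.shields.io/endpoint?" + urlencode({"url": json_url})
print(badge_url)
</code></pre></div></div>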

<p>This produces a static URL that can be added to the project website and the image will always reflect the result of the most recent build: <img src="../../../images/2022-06-20-builds/fmax-badge.svg" alt="Fmax badge" /></p>

<p>Magic? No – <em>science</em>!</p>

<h3 id="a-note-about-security">A note about security</h3>

<p>The database connection string is supplied through an environment variable, which allows us to store it in the GitLab project with reasonable security – <a href="https://docs.gitlab.com/ee/ci/variables/#limit-the-environment-scope-of-a-cicd-variable">as long as an untrusted party cannot execute an <code class="highlighter-rouge">echo</code> command in our CI</a>. Admittedly, this is a weakness of the chosen approach. Securely collecting results from pipelines triggered by third parties (which includes all merge requests, for example) would probably require a separate, trusted pipeline or some kind of scraper task running somewhere in the cloud.</p>

<h3 id="scalability">Scalability</h3>

<p>You might have noticed that there is no pagination mechanism; in fact, the presentation job has <em>O(n)</em> complexity with respect to the number of historic builds. This will not scale once there are hundreds and thousands of builds, and a more efficient approach will be required.</p>
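
<p>One possible direction, which is not implemented in the linked scripts, would be to fetch the builds one page at a time. The sketch below only prepares the query text and its parameters; the function name and the page size are assumptions of mine:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def page_query(page, page_size=50):
    """Return SQL text and parameters for one page of builds (page 0 is the newest)."""
    query = ('SELECT id, pipeline_timestamp, pipeline_url, branch, commit_sha1, '
             'commit_timestamp, "attributes" '
             'FROM builds ORDER BY pipeline_timestamp DESC, id DESC '
             'LIMIT %s OFFSET %s')
    return query, (page_size, page * page_size)


query, params = page_query(page=2)
print(params)  # prints (50, 100)
</code></pre></div></div>

<p>The pair can then be passed to <code class="highlighter-rouge">cursor.execute(query, params)</code>, with one static HTML page generated per database page.</p>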

<p>The collection and presentation jobs also add some time to the overall runtime of the pipeline. Unfortunately, it seems that most of this time is just overhead of spinning the runner up, thus quite difficult to get rid of.</p>

<h2 id="final-thoughs">Final thoughts</h2>

<p>At this point, it might seem a logical next step to generalize the presented solution into something more flexible, an off-the-shelf tool that could suit other teams. For now, I have decided against that for reasons of complexity. In the world of build automation and complex FPGA designs, different projects have wildly different needs; and while a couple of hard-coded scripts are easy to understand and maintain, a useful generic framework would need a lot of flexibility, thus incurring a high upfront cost in terms of complexity.</p>

<p>Therefore, my recommendation would be that you just copy the code, and adapt it to your specific needs.</p>

<p><strong>UPDATE (2023-11-01):</strong> Originally, the post recommended bit.io as the cloud Postgres database. Unfortunately, that service <a href="https://blog.bit.io/whats-next-for-bit-io-joining-databricks-ace9a40bce0d">shut down earlier in 2023</a>. Supabase seems like a competent alternative.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[…with low infrastructure footprint]]></summary></entry></feed>