<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://blog.ethantwardy.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.ethantwardy.com/" rel="alternate" type="text/html" /><updated>2026-04-22T12:09:42+00:00</updated><id>https://blog.ethantwardy.com/feed.xml</id><title type="html">Software Development Blog</title><subtitle>I believe in human communication, which is why this site is free of AI-generated content. All of these posts have been carefully crafted with love by a human.</subtitle><entry><title type="html">Learning OCaml by Parsing JSON</title><link href="https://blog.ethantwardy.com/languages/2025/11/23/learning-ocaml.html" rel="alternate" type="text/html" title="Learning OCaml by Parsing JSON" /><published>2025-11-23T19:00:00+00:00</published><updated>2025-11-23T19:00:00+00:00</updated><id>https://blog.ethantwardy.com/languages/2025/11/23/learning-ocaml</id><content type="html" xml:base="https://blog.ethantwardy.com/languages/2025/11/23/learning-ocaml.html"><![CDATA[<p>Two weeks ago, I started on a journey to learn how to design programming
languages. After thorough research, I decided to learn and use OCaml for this.</p>

<p>For starters, it’s very important to me that the programming languages I create
are supported by formal mathematics and type theory. The Coq proof assistant
(which now prefers to be called Rocq) enables logicians to write their compiler
alongside a proof of the language’s interesting properties. More on this in a
future post. Rocq extracts to OCaml. It extracts to other languages, as well,
like Haskell, but since the official installation instructions reference
<code>opam</code>, I thought that I would have a better time learning and using OCaml than
fighting with the Rocq compiler to extract to workable Haskell. I’ll probably
never learn whether that hypothesis holds water.</p>

<p>The other books I’ve selected for my journey also make heavy use of OCaml.
Besides this, I haven’t yet been able to claim an ML dialect as one of my
competencies, so I thought this would be an opportunity to check that box.</p>

<p>I selected the book <em>Real World OCaml</em> to teach myself the language. The second
edition is available for free online from Cambridge, but I like to touch paper,
so I ordered a used paperback on Thriftbooks. I read Part I from start to
finish, only skimming the chapters on objects and classes, and picked a
smattering of content from Part II–the chapters on testing, command line
arguments and parsing. I skipped Part III altogether.</p>

<h1 id="reflections-on-the-module-system">Reflections on the module system</h1>

<p>So far, I’ve been pleasantly impressed with the language. Features like named
parameters and the module system set it apart from other mainstream languages.
For a file <code>foo.ml</code> containing this definition:</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>type</span> <span class='ts-type'>t</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-type-builtin'>int</span>
</code></pre></div>

<p>We can name this type in client modules with <code>Foo.t</code>, where <code>Foo</code> is the name
that OCaml assigns to the module from its filename. If I prefer to nest
modules, I might make <code>Foo</code> a submodule in <code>bar.ml</code>, where the name of <code>t</code>
would be <code>Bar.Foo.t</code>:</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>module</span> <span class='ts-module'>Foo</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-keyword'>struct</span>
  <span class='ts-keyword'>type</span> <span class='ts-type'>t</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-type-builtin'>int</span>
<span class='ts-keyword'>end</span>
</code></pre></div>
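
<p>As a small illustration, a client could then refer to the nested type by its
full path. This sketch inlines <code>Bar</code> as a module so that it is self-contained;
with a real <code>bar.ml</code>, the outer <code>module Bar = struct ... end</code> wrapper is implied
by the filename. The names <code>answer</code> and <code>double</code> are made up for the example.</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code >(* Stand-in for the contents of bar.ml, inlined for the example. *)
module Bar = struct
  module Foo = struct
    type t = int
  end
end

(* Client code names the type by its module path. *)
let answer : Bar.Foo.t = 21
let double (x : Bar.Foo.t) : Bar.Foo.t = x * 2
</code></pre></div>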

<p>The rules for forming the name of a definition are clear and predictable. This
is something I miss when going back to C++, where <code>namespaces</code> are completely
orthogonal to file structure. A C++ development team has to discuss and align
on a set of rules, and then somehow enforce and maintain them, either with
tools or code inspection<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>.</p>

<p>Coming from non-ML languages, however, I find it a little strange that modules
are <em>the</em> mechanism for constructing types. OCaml does not have type classes,
interfaces, traits or templates. In place of these, it has only modules.</p>

<h2 id="monads-in-haskell-and-ocaml">Monads in Haskell and OCaml</h2>

<p>Let’s consider the definition of <code>Monad</code> in Haskell:</p>

<div class="language-haskell highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>class</span> <span class='ts-type'>Applicative</span> <span class='ts-type'>m</span> <span class='ts-operator'>=&gt;</span> <span class='ts-type'>Monad</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-type'>m</span> <span class='ts-operator'>::</span> <span class='ts-type'>Type</span> <span class='ts-operator'>-&gt;</span> <span class='ts-type'>Type</span><span class='ts-punctuation-bracket'>)</span> <span class='ts-keyword'>where</span>
  <span class='ts-punctuation-bracket'>(</span><span class='ts-operator'>&gt;&gt;=</span><span class='ts-punctuation-bracket'>)</span> <span class='ts-operator'>::</span> <span class='ts-type'>m</span> <span class='ts-type'>a</span> <span class='ts-operator'>-&gt;</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-type'>a</span> <span class='ts-operator'>-&gt;</span> <span class='ts-type'>m</span> <span class='ts-type'>b</span><span class='ts-punctuation-bracket'>)</span> <span class='ts-operator'>-&gt;</span> <span class='ts-type'>m</span> <span class='ts-type'>b</span>
  <span class='ts-punctuation-bracket'>(</span><span class='ts-operator'>&gt;&gt;</span><span class='ts-punctuation-bracket'>)</span> <span class='ts-operator'>::</span> <span class='ts-type'>m</span> <span class='ts-type'>a</span> <span class='ts-operator'>-&gt;</span> <span class='ts-type'>m</span> <span class='ts-type'>b</span> <span class='ts-operator'>-&gt;</span> <span class='ts-type'>m</span> <span class='ts-type'>b</span>
  <span class='ts-type'>return</span> <span class='ts-operator'>::</span> <span class='ts-type'>a</span> <span class='ts-operator'>-&gt;</span> <span class='ts-type'>m</span> <span class='ts-type'>a</span>
</code></pre></div>

<p>This clearly belongs to the type system. We can export it from a module, and import it:</p>

<div class="language-haskell highlighter-tree-sitter"><pre><code ><span class='ts-spell'>-- In Monad.hs</span>
<span class='ts-keyword-import'>module</span> <span class='ts-module'>Control</span><span class='ts-operator'>.</span><span class='ts-module'>Monad</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-type'>Monad</span><span class='ts-punctuation-bracket'>)</span> <span class='ts-keyword'>where</span>

<span class='ts-spell'>-- In MyApp.hs</span>
<span class='ts-keyword-import'>import</span> <span class='ts-module'>Control</span><span class='ts-operator'>.</span><span class='ts-module'>Monad</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-type'>Monad</span><span class='ts-punctuation-bracket'>)</span>
</code></pre></div>

<p>Here, the module system is clearly separate from the type system, and it’s an
organizational structure–not a logical one. Let’s compare this to the
definition of <code>Monad</code> in OCaml:</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>module</span> <span class='ts-keyword'>type</span> <span class='ts-module'>Monad</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-keyword'>sig</span>
  <span class='ts-keyword'>type</span> <span class='ts-variable'>&#39;a</span> <span class='ts-type'>t</span>
  <span class='ts-keyword'>val</span> <span class='ts-function'>return</span> <span class='ts-punctuation-delimiter'>:</span> <span class='ts-variable'>&#39;a</span> <span class='ts-punctuation-delimiter'>-&gt;</span> <span class='ts-variable'>&#39;a</span> <span class='ts-type'>t</span>
  <span class='ts-keyword'>val</span> <span class='ts-punctuation-bracket'>(</span> <span class='ts-operator'>&gt;&gt;=</span> <span class='ts-punctuation-bracket'>)</span> <span class='ts-punctuation-delimiter'>:</span> <span class='ts-variable'>&#39;a</span> <span class='ts-type'>t</span> <span class='ts-punctuation-delimiter'>-&gt;</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-variable'>&#39;a</span> <span class='ts-punctuation-delimiter'>-&gt;</span> <span class='ts-variable'>&#39;b</span> <span class='ts-type'>t</span><span class='ts-punctuation-bracket'>)</span> <span class='ts-punctuation-delimiter'>-&gt;</span> <span class='ts-variable'>&#39;b</span> <span class='ts-type'>t</span>
<span class='ts-keyword'>end</span>
</code></pre></div>

<p>There are some obvious and superficial syntactic differences. For example, the
type parameters appear here on the left side of the type constructor, whereas
in Haskell they appear on the right. This type, however, is inextricably
connected to the filesystem. The language facility for scoping lexical names
<em>is the same as the one used to construct a higher order type abstraction</em>.</p>

<h2 id="implementing-monads">Implementing Monads</h2>

<p>We’re going to take this two steps further. To implement <code>Monad</code> for your type
in Haskell, you declare it as an <code>instance</code>:</p>

<div class="language-haskell highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>instance</span> <span class='ts-type'>Monad</span> <span class='ts-type'>Maybe</span> <span class='ts-keyword'>where</span>
  <span class='ts-variable-parameter'><span class='ts-punctuation-bracket'>(</span><span class='ts-constructor'>Just</span> <span class='ts-type'>x</span><span class='ts-punctuation-bracket'>)</span></span> <span class='ts-operator'>&gt;&gt;=</span> <span class='ts-type'>k</span> <span class='ts-operator'>=</span> <span class='ts-type'>k</span> <span class='ts-type'>x</span>
  <span class='ts-constructor'>Nothing</span>  <span class='ts-operator'>&gt;&gt;=</span> <span class='ts-operator'>_</span> <span class='ts-operator'>=</span> <span class='ts-constructor'>Nothing</span>

  <span class='ts-punctuation-bracket'>(</span><span class='ts-operator'>&gt;&gt;</span><span class='ts-punctuation-bracket'>)</span> <span class='ts-operator'>=</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-operator'>*&gt;</span><span class='ts-punctuation-bracket'>)</span>
</code></pre></div>

<p>This is copied directly from <a href="https://hackage.haskell.org/package/ghc-internal-9.1201.0/docs/src/GHC.Internal.Base.html#line-1574">the source in <code>Base</code></a>. If the monad laws are
upheld, <code>return</code> is definitionally equal to <code>pure</code> from the <code>Applicative</code>
typeclass, and the right sequence operator is equivalent to the same operator
in <code>Applicative</code>.</p>

<p>To implement <code>Monad</code> in OCaml, you <em>could</em> create a module that satisfies the
<code>Monad</code> module signature:</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>module</span> <span class='ts-module'>MOption</span> <span class='ts-punctuation-delimiter'>:</span> <span class='ts-module'>Monad</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-keyword'>struct</span>
  <span class='ts-keyword'>type</span> <span class='ts-variable'>&#39;a</span> <span class='ts-type'>t</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-variable'>&#39;a</span> <span class='ts-type-builtin'>option</span>
  <span class='ts-keyword'>let</span> <span class='ts-function'>return</span> <span class='ts-variable-parameter'>x</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-constructor'>Some</span> <span class='ts-variable-parameter'>x</span>
  <span class='ts-keyword'>let</span> <span class='ts-punctuation-bracket'>(</span> <span class='ts-operator'>&gt;&gt;=</span> <span class='ts-punctuation-bracket'>)</span> <span class='ts-variable-parameter'>m</span> <span class='ts-variable-parameter'>f</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-module'>Option</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-function'>bind</span> <span class='ts-variable-parameter'>m</span> <span class='ts-variable-parameter'>f</span>
<span class='ts-keyword'>end</span>
</code></pre></div>
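
<p>One caveat is worth sketching here. Annotating the module with plain
<code>: Monad</code> makes <code>MOption.t</code> abstract, so ordinary <code>option</code> values are not
accepted by its functions from outside the module. Adding a <code>with type</code>
constraint keeps the representation visible. The names <code>Option_impl</code>,
<code>MOption_exposed</code> and <code>three</code> below are hypothetical, chosen only for this
illustration:</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code >module type Monad = sig
  type 'a t
  val return : 'a -&gt; 'a t
  val ( &gt;&gt;= ) : 'a t -&gt; ('a -&gt; 'b t) -&gt; 'b t
end

(* Unsealed implementation, shared by both modules below. *)
module Option_impl = struct
  type 'a t = 'a option
  let return x = Some x
  let ( &gt;&gt;= ) m f = Option.bind m f
end

(* Sealed as in the text: 'a MOption.t is abstract to clients. *)
module MOption : Monad = Option_impl

(* Sealed with the representation exposed: 'a t is known to be 'a option. *)
module MOption_exposed : Monad with type 'a t = 'a option = Option_impl

(* Accepted, because Some 1 has type int MOption_exposed.t. *)
let three = MOption_exposed.(Some 1 &gt;&gt;= fun x -&gt; return (x + 2))
</code></pre></div>

<p>The same expression written with <code>MOption</code> is rejected by the type checker,
because <code>int option</code> and <code>int MOption.t</code> are no longer known to be equal.</p>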

<p>I should point out another syntactic difference: unlike in Haskell, data
constructors in OCaml are not first-class functions and must be applied to
their arguments, so this pointfree definition is rejected by the compiler:
<code>let return = Some</code>.</p>
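
<p>To narrow that down, the restriction concerns constructors specifically.
A short sketch, using only the standard library (<code>Option.some</code> needs OCaml 4.08
or later); the names <code>increment_all</code> and <code>return'</code> are made up for the example:</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code >(* A pointfree definition built from ordinary functions is accepted. *)
let increment_all : int list -&gt; int list = List.map succ

(* A constructor has to be applied, so wrap Some in a function... *)
let return x = Some x

(* ...or reuse the function the standard library already provides. *)
let return' = Option.some
</code></pre></div>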

<p>In practice, however, no one writes OCaml this way. There is no abstraction for
monads. The interface is enforced for common monads in <code>Base</code> by convention.</p>

<h2 id="programming-with-monads">Programming With Monads</h2>

<p>To program using monads in Haskell, we often don’t use the typeclass functions
directly. Instead, we use <code>do</code> notation, which allows us to write code that
reads like an imperative program:</p>

<div class="language-haskell highlighter-tree-sitter"><pre><code ><span class='ts-function'><span class='ts-function'><span class='ts-type'>maybeSum</span> <span class='ts-operator'>::</span> <span class='ts-type'>Maybe</span> <span class='ts-type'>Int</span> <span class='ts-operator'>-&gt;</span> <span class='ts-type'>Maybe</span> <span class='ts-type'>Int</span> <span class='ts-operator'>-&gt;</span> <span class='ts-type'>Maybe</span> <span class='ts-type'>Int</span></span></span>
<span class='ts-type'>maybeSum</span> <span class='ts-type'>a</span> <span class='ts-type'>b</span> <span class='ts-operator'>=</span> <span class='ts-keyword'>do</span>
  <span class='ts-type'>a&#39;</span> <span class='ts-operator'>&lt;-</span> <span class='ts-type'>a</span>
  <span class='ts-type'>b&#39;</span> <span class='ts-operator'>&lt;-</span> <span class='ts-type'>b</span>
  <span class='ts-type'>return</span> <span class='ts-operator'>$</span> <span class='ts-type'>a&#39;</span> <span class='ts-operator'>+</span> <span class='ts-type'>b&#39;</span>
</code></pre></div>

<p>GHC transforms this into an equivalent expression that uses the typeclass
functions during compilation:</p>

<div class="language-haskell highlighter-tree-sitter"><pre><code ><span class='ts-type'>maybeSum</span> <span class='ts-type'>a</span> <span class='ts-type'>b</span> <span class='ts-operator'>=</span>
  <span class='ts-type'>a</span> <span class='ts-operator'>&gt;&gt;=</span> <span class='ts-operator'>\</span><span class='ts-variable'><span class='ts-variable-parameter'><span class='ts-type'>a&#39;</span></span></span> <span class='ts-operator'>-&gt;</span>
    <span class='ts-type'>b</span> <span class='ts-operator'>&gt;&gt;=</span> <span class='ts-operator'>\</span><span class='ts-variable'><span class='ts-variable-parameter'><span class='ts-type'>b&#39;</span></span></span> <span class='ts-operator'>-&gt;</span>
      <span class='ts-type'>return</span> <span class='ts-operator'>$</span> <span class='ts-variable'>a&#39;</span> <span class='ts-operator'>+</span> <span class='ts-variable'>b&#39;</span>
</code></pre></div>

<p>OCaml has two solutions that provide similar ergonomics. The most mature and
common tool is <code>ppx_let</code> (a Jane Street invention), but OCaml 4.08 introduced
<em>binding operators</em> which might someday provide the same level of convenience
for common monads. This example writes our <code>maybeSum</code> function using <code>ppx_let</code>:</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>open</span> <span class='ts-module'>Base</span>

<span class='ts-keyword'>let</span> <span class='ts-function'>maybe_sum</span> <span class='ts-variable-parameter'>a</span> <span class='ts-variable-parameter'>b</span> <span class='ts-punctuation-delimiter'>=</span>
  <span class='ts-keyword'>let</span> <span class='ts-keyword'>open</span> <span class='ts-module'>Option</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-module'>Let_syntax</span> <span class='ts-keyword'>in</span>
  <span class='ts-keyword'>let</span><span class='ts-punctuation-special'>%</span><span class='ts-tag'>bind</span> <span class='ts-variable'>a&#39;</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-variable-parameter'>a</span> <span class='ts-keyword'>in</span>
  <span class='ts-keyword'>let</span><span class='ts-punctuation-special'>%</span><span class='ts-tag'>bind</span> <span class='ts-variable'>b&#39;</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-variable-parameter'>b</span> <span class='ts-keyword'>in</span>
  <span class='ts-constructor'>Some</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-variable'>a&#39;</span> <span class='ts-operator'>+</span> <span class='ts-variable'>b&#39;</span><span class='ts-punctuation-bracket'>)</span>
</code></pre></div>

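<p>Both <code>ppx_let</code> and <code>do</code>-notation are syntactic conveniences: each is rewritten
into equivalent calls to the bind operation, <code>do</code>-notation by GHC itself and
<code>ppx_let</code> by a preprocessor that runs before the OCaml compiler. Both functions
are written directly against one specific monad: <code>Maybe</code> in Haskell, <code>Option</code>
in OCaml.</p>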
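
<p>For comparison, here is a sketch of the same function using binding
operators, with only the standard library (OCaml 4.08 or later) rather than
<code>Base</code>. The operator name <code>let*</code> is a common convention that is assumed
here; the standard library’s <code>Option</code> module does not define it, so the first
line does:</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code >(* Define the binding operator in terms of Option.bind. *)
let ( let* ) = Option.bind

let maybe_sum a b =
  (* Each let* short-circuits to None if its right-hand side is None. *)
  let* a' = a in
  let* b' = b in
  Some (a' + b')
</code></pre></div>

<p>With this definition, <code>maybe_sum (Some 1) (Some 2)</code> evaluates to <code>Some 3</code>, and
passing <code>None</code> for either argument yields <code>None</code>.</p>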

<p>Monads are, of course, very powerful, and we can express all sorts of
interesting logic on them. The functional reactive programming library Yampa in
Haskell provides <code>reactimate</code>, a function for running an arrow function on a
monad given a pair of “input sensing” and “actuation” monadic actions:</p>

<div class="language-haskell highlighter-tree-sitter"><pre><code ><span class='ts-type'>reactimate</span>
  <span class='ts-operator'>::</span> <span class='ts-type'>Monad</span> <span class='ts-type'>m</span>
  <span class='ts-operator'>=&gt;</span> <span class='ts-type'>m</span> <span class='ts-type'>a</span>                            <span class='ts-spell'>-- Initializing action</span>
  <span class='ts-operator'>-&gt;</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-type'>Bool</span> <span class='ts-operator'>-&gt;</span> <span class='ts-type'>m</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-type'>DTime</span><span class='ts-punctuation-delimiter'>,</span> <span class='ts-type'>Maybe</span> <span class='ts-type'>a</span><span class='ts-punctuation-bracket'>)</span><span class='ts-punctuation-bracket'>)</span>   <span class='ts-spell'>-- Input sensing action</span>
  <span class='ts-operator'>-&gt;</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-type'>Bool</span> <span class='ts-operator'>-&gt;</span> <span class='ts-type'>b</span> <span class='ts-operator'>-&gt;</span> <span class='ts-type'>m</span> <span class='ts-type'>Bool</span><span class='ts-punctuation-bracket'>)</span>          <span class='ts-spell'>-- Actuation (output) action</span>
  <span class='ts-operator'>-&gt;</span> <span class='ts-type'>SF</span> <span class='ts-type'>a</span> <span class='ts-type'>b</span>                         <span class='ts-spell'>-- The arrow function</span>
  <span class='ts-operator'>-&gt;</span> <span class='ts-type'>m</span> <span class='ts-string-special-symbol'><span class='ts-punctuation-bracket'>(</span><span class='ts-punctuation-bracket'>)</span></span>
</code></pre></div>

<p>In practice, this is often run on a side-effect-producing monadic action, like
<code>IO</code>. However, it’s polymorphic in the monad <code>m</code>, so it could just as easily be
run on a <code>Maybe</code> or a list.</p>

<p>As far as I’m aware, there’s no easy way to write this function in OCaml. If we
were to create a <code>Monad</code> module, like above, we could implement this as an
OCaml functor (a function on modules). However, that module does not exist in
the standard library, so any solution would be inconsistent with the rest of
the ecosystem. I’m interested to hear if I’m missing something.</p>
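
<p>A minimal sketch of the functor approach described above, assuming a
hand-rolled <code>MONAD</code> signature (every module name here is hypothetical). It is
far simpler than <code>reactimate</code>: <code>Sum</code> only adds two monadic integers, but it is
written once and then instantiated for both <code>option</code> and <code>list</code>.
<code>List.concat_map</code> needs OCaml 4.10 or later.</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code >module type MONAD = sig
  type 'a t
  val return : 'a -&gt; 'a t
  val ( &gt;&gt;= ) : 'a t -&gt; ('a -&gt; 'b t) -&gt; 'b t
end

module Option_monad : MONAD with type 'a t = 'a option = struct
  type 'a t = 'a option
  let return x = Some x
  let ( &gt;&gt;= ) = Option.bind
end

module List_monad : MONAD with type 'a t = 'a list = struct
  type 'a t = 'a list
  let return x = [ x ]
  let ( &gt;&gt;= ) m f = List.concat_map f m
end

(* Code written against the signature only, polymorphic in the monad. *)
module Sum (M : MONAD) = struct
  open M
  let sum a b = a &gt;&gt;= fun a' -&gt; b &gt;&gt;= fun b' -&gt; return (a' + b')
end

module Option_sum = Sum (Option_monad)
module List_sum = Sum (List_monad)
</code></pre></div>

<p>Here <code>Option_sum.sum (Some 1) (Some 2)</code> evaluates to <code>Some 3</code>, and
<code>List_sum.sum [1; 2] [10; 20]</code> to <code>[11; 21; 12; 22]</code>. The cost compared to
Haskell is that every use site has to pick and apply the functor explicitly.</p>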

<h1 id="parsing-json">Parsing JSON</h1>

<p>I only completed one of the exercises in the book–the <a href="https://dev.realworldocaml.org/parsing-with-ocamllex-and-menhir.html">JSON parser</a>
exercise built using OCamllex and Menhir. This was the one I thought would be
directly applicable to developing compilers.</p>

<p>Since the book is available online for free, I won’t reproduce the code for the
parser here. Instead, I just want to point out a couple of interesting things I
ran into while developing it.</p>

<h2 id="testing">Testing</h2>

<p>I wanted to try <code>ppx_expect</code>, so I pulled it in and wrote an expect test for the parser:</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>let</span><span class='ts-punctuation-special'>%</span><span class='ts-tag'>expect_test</span> <span class='ts-variable'>_</span> <span class='ts-punctuation-delimiter'>=</span>
  <span class='ts-keyword'>let</span> <span class='ts-variable'>lexbuf</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-module'>Lexing</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-function'>from_string</span> <span class='ts-string'><span class='ts-string'>{</span>|{&quot;obj&quot;:&quot;\&quot;foo\&quot;&quot;}|<span class='ts-string'>}</span></span> <span class='ts-keyword'>in</span>
  <span class='ts-keyword'>let</span> <span class='ts-variable'>result</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-module'>Option</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-function'>get</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-module'>Json_parser</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-function'>parse_with_error</span> <span class='ts-variable'>lexbuf</span><span class='ts-punctuation-bracket'>)</span> <span class='ts-keyword'>in</span>
  <span class='ts-module'>Json_parser</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-function'>output_value</span> <span class='ts-variable'>stdout</span> <span class='ts-variable'>result</span><span class='ts-punctuation-delimiter'>;</span>
  <span class='ts-punctuation-special'>[%</span><span class='ts-tag'>expect</span> <span class='ts-string'><span class='ts-string'>{</span>| |<span class='ts-string'>}</span></span><span class='ts-punctuation-special'>]</span><span class='ts-punctuation-delimiter'>;</span>
</code></pre></div>

<p>I followed the advice of the book and established a “test only” library, separate from my parser library. Supposedly, this prevents code bloat. It does require me to deviate from the Dune project template a little, so here’s my <code>test/dune</code> file:</p>

<pre><code>(library
 (name test_json)
 (inline_tests)
 (preprocess (pps ppx_inline_test ppx_expect))
 (libraries json_parser))
</code></pre>

<p>This test is written to expect empty output, so we fully expect it to fail, and
it does. But a failing expect test prints a diff that, if accepted (for example
with <code>dune promote</code>), would make the test pass:</p>

<div class="language-diff highlighter-tree-sitter"><pre><code class="highlight">[json_parser]$ opam exec -- dune runtest
<span class="p">File "test/test_json.ml", line 1, characters 0-0:
</span><span class="gh">diff --git a/_build/default/test/test_json.ml b/_build/.sandbox/713d6c9a80082f32d86b6de371e3845a/default/test/test_json.ml.corrected
index 435e144..884c113 100644
</span><span class="gd">--- a/_build/default/test/test_json.ml
</span><span class="gi">+++ b/_build/.sandbox/713d6c9a80082f32d86b6de371e3845a/default/test/test_json.ml.corrected
</span><span class="p">@@ -3,4 +3,4 @@</span> let%expect_test _ =
   let lexbuf = Lexing.from_string {|{"obj":"\"foo\""}|} in
   let result = Option.get (Json_parser.parse_with_error lexbuf) in
   Json_parser.output_value stdout result;
<span class="gd">-  [%expect {| |}];
</span><span class="gi">+  [%expect {| {"obj":"\"foo\""} |}];
</span></code></pre></div>

<p>This is called <em>Exploratory Programming</em>, and I’m very excited to use it when writing my compiler.</p>

<h2 id="escaping-quotes-in-string-literals">Escaping Quotes in String Literals</h2>

<p>The authors wrote a function for printing a <code>Json.t</code> back to a string, which is
not rendered in the output. I copied it from the <a href="https://github.com/realworldocaml/book/blob/master/book/parsing-with-ocamllex-and-menhir/examples/correct/parsing-test/json.ml#L15">book sources on GitHub</a>,
but changed mine to output “minified” JSON, without whitespace between tokens. I
figured this would be easier to write expect tests against.</p>

<p>When given a string literal that contains escaped quotes, the parser from the
book fails with a syntax error. The cause turns out to be that the book’s lexer
is missing a rule for escaped double quotes in string literals. Here’s my updated
definition of the <code>read_string</code> rule:</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code ><span class='ts-function'>and</span> <span class='ts-variable'>read_string</span> <span class='ts-variable'>buf</span> <span class='ts-operator'>=</span>
  <span class='ts-variable'>parse</span>
  <span class='ts-punctuation-delimiter'>|</span> &#39;<span class='ts-string'>&quot;&#39;       { STRING (Buffer.contents buf) }</span>
<span class='ts-string'>  | &#39;<span class='ts-escape'>\\</span>&#39; &#39;/&#39;  { Buffer.add_char buf &#39;/&#39;; read_string buf lexbuf }</span>
<span class='ts-string'>  | &#39;<span class='ts-escape'>\\</span>&#39; &#39;<span class='ts-escape'>\\</span>&#39; { Buffer.add_char buf &#39;<span class='ts-escape'>\\</span>&#39;; read_string buf lexbuf }</span>
<span class='ts-string'>  | &#39;<span class='ts-escape'>\\</span>&#39; &#39;b&#39;  { Buffer.add_char buf &#39;<span class='ts-escape'>\b</span>&#39;; read_string buf lexbuf }</span>
<span class='ts-string'>  | &#39;<span class='ts-escape'>\\</span>&#39; &#39;f&#39;  { Buffer.add_char buf &#39;<span class='ts-escape'>\012</span>&#39;; read_string buf lexbuf }</span>
<span class='ts-string'>  | &#39;<span class='ts-escape'>\\</span>&#39; &#39;n&#39;  { Buffer.add_char buf &#39;<span class='ts-escape'>\n</span>&#39;; read_string buf lexbuf }</span>
<span class='ts-string'>  | &#39;<span class='ts-escape'>\\</span>&#39; &#39;r&#39;  { Buffer.add_char buf &#39;<span class='ts-escape'>\r</span>&#39;; read_string buf lexbuf }</span>
<span class='ts-string'>  | &#39;<span class='ts-escape'>\\</span>&#39; &#39;t&#39;  { Buffer.add_char buf &#39;<span class='ts-escape'>\t</span>&#39;; read_string buf lexbuf }</span>
<span class='ts-string'>  | &#39;<span class='ts-escape'>\\</span>&#39; &#39;&quot;</span>&#39;  <span class='ts-punctuation-bracket'>{</span> <span class='ts-module'>Buffer</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-variable'>add_char</span> <span class='ts-variable'>buf</span> <span class='ts-string'>&#39;&quot;&#39;</span><span class='ts-punctuation-delimiter'>;</span> <span class='ts-function'>read_string</span> <span class='ts-variable'>buf</span> <span class='ts-variable'>lexbuf</span> <span class='ts-punctuation-bracket'>}</span>
  <span class='ts-punctuation-delimiter'>|</span> <span class='ts-punctuation-bracket'>[</span><span class='ts-operator'>^</span> <span class='ts-string'>&#39;&quot;&#39;</span> <span class='ts-string'>&#39;<span class='ts-escape'>\\</span>&#39;</span><span class='ts-punctuation-bracket'>]</span><span class='ts-punctuation-delimiter'>+</span>
    <span class='ts-punctuation-bracket'>{</span> <span class='ts-module'>Buffer</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-property'>add_string</span><span class='ts-punctuation-bracket'></span> <span class='ts-variable'>buf</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-module'>Lexing</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-function'>lexeme</span> <span class='ts-variable'>lexbuf</span><span class='ts-punctuation-bracket'>)</span><span class='ts-punctuation-delimiter'>;</span>
      <span class='ts-function'>read_string</span> <span class='ts-variable'>buf</span> <span class='ts-variable'>lexbuf</span>
    <span class='ts-punctuation-bracket'>}</span>
  <span class='ts-punctuation-delimiter'>|</span> _ <span class='ts-punctuation-bracket'>{</span> raise <span class='ts-punctuation-bracket'>(</span><span class='ts-constructor'>SyntaxError</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-string'>&quot;Illegal string character: &quot;</span> <span class='ts-operator'>^</span> <span class='ts-module'>Lexing</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-function'>lexeme</span> <span class='ts-variable'>lexbuf</span><span class='ts-punctuation-bracket'>)</span><span class='ts-punctuation-bracket'>)</span> <span class='ts-punctuation-bracket'>}</span>
  <span class='ts-punctuation-delimiter'>|</span> eof <span class='ts-punctuation-bracket'>{</span> raise <span class='ts-punctuation-bracket'>(</span><span class='ts-constructor'>SyntaxError</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-string'>&quot;String is not terminated&quot;</span><span class='ts-punctuation-bracket'>)</span><span class='ts-punctuation-bracket'>)</span> <span class='ts-punctuation-bracket'>}</span>
</code></pre></div>
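
<p>To see what these rules do without running ocamllex, the same escape table
can be sketched as a plain function over strings. <code>unescape</code> is a hypothetical
helper written for this illustration; it is not part of the book’s code or of
the lexer above, and it reports errors with <code>failwith</code> rather than
<code>SyntaxError</code>:</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code >(* Decode the body of a JSON string literal (without the outer quotes),
   applying the same escapes as the read_string rule. *)
let unescape (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec go i =
    if i &gt;= n then Buffer.contents buf
    else if s.[i] = '\\' then begin
      if i + 1 &gt;= n then failwith "dangling backslash";
      let c =
        match s.[i + 1] with
        | '/' -&gt; '/'
        | '\\' -&gt; '\\'
        | 'b' -&gt; '\b'
        | 'f' -&gt; '\012'
        | 'n' -&gt; '\n'
        | 'r' -&gt; '\r'
        | 't' -&gt; '\t'
        | '"' -&gt; '"'
        | other -&gt; failwith (Printf.sprintf "illegal escape: \\%c" other)
      in
      Buffer.add_char buf c;
      go (i + 2)
    end else begin
      (* Any other character is copied through unchanged. *)
      Buffer.add_char buf s.[i];
      go (i + 1)
    end
  in
  go 0
</code></pre></div>

<p>For example, <code>unescape {|\"foo\"|}</code> evaluates to the five-character string
<code>"foo"</code>, quotes included, which is the value the expect test above stores
under the <code>obj</code> key.</p>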

<p>I also had to change the <code>output_value</code> function to escape double-quote
characters when serializing string literals. Here’s that fragment. The rest is
the same:</p>

<div class="language-ocaml highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>let</span> <span class='ts-keyword'>rec</span> <span class='ts-function'>output_value</span> <span class='ts-variable-parameter'>outc</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-keyword'>function</span>
  <span class='ts-punctuation-delimiter'>|</span> <span class='ts-constructor'>`Assoc</span> <span class='ts-variable-parameter'>obj</span> <span class='ts-punctuation-delimiter'>-&gt;</span> <span class='ts-function'>print_assoc</span> <span class='ts-variable-parameter'>outc</span> <span class='ts-variable-parameter'>obj</span>
  <span class='ts-punctuation-delimiter'>|</span> <span class='ts-constructor'>`List</span> <span class='ts-variable-parameter'>l</span> <span class='ts-punctuation-delimiter'>-&gt;</span> <span class='ts-function'>print_list</span> <span class='ts-variable-parameter'>outc</span> <span class='ts-variable-parameter'>l</span>
  <span class='ts-punctuation-delimiter'>|</span> <span class='ts-constructor'>`String</span> <span class='ts-variable-parameter'>s</span> <span class='ts-punctuation-delimiter'>-&gt;</span> <span class='ts-function'>print_string</span> <span class='ts-variable-parameter'>outc</span> <span class='ts-variable-parameter'>s</span>
  <span class='ts-punctuation-delimiter'>|</span> <span class='ts-constructor'>`Int</span> <span class='ts-variable-parameter'>i</span> <span class='ts-punctuation-delimiter'>-&gt;</span> <span class='ts-function'>printf</span> <span class='ts-string'>&quot;<span class='ts-string-special'>%d</span>&quot;</span> <span class='ts-variable-parameter'>i</span>
  <span class='ts-punctuation-delimiter'>|</span> <span class='ts-constructor'>`Float</span> <span class='ts-variable-parameter'>x</span> <span class='ts-punctuation-delimiter'>-&gt;</span> <span class='ts-function'>printf</span> <span class='ts-string'>&quot;<span class='ts-string-special'>%f</span>&quot;</span> <span class='ts-variable-parameter'>x</span>
  <span class='ts-punctuation-delimiter'>|</span> <span class='ts-constructor'>`Bool</span> <span class='ts-constant'>true</span> <span class='ts-punctuation-delimiter'>-&gt;</span> <span class='ts-module'>Out_channel</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-function'>output_string</span> <span class='ts-variable-parameter'>outc</span> <span class='ts-string'>&quot;true&quot;</span>
  <span class='ts-punctuation-delimiter'>|</span> <span class='ts-constructor'>`Bool</span> <span class='ts-constant'>false</span> <span class='ts-punctuation-delimiter'>-&gt;</span> <span class='ts-module'>Out_channel</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-function'>output_string</span> <span class='ts-variable-parameter'>outc</span> <span class='ts-string'>&quot;false&quot;</span>
  <span class='ts-punctuation-delimiter'>|</span> <span class='ts-constructor'>`Null</span> <span class='ts-punctuation-delimiter'>-&gt;</span> <span class='ts-module'>Out_channel</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-function'>output_string</span> <span class='ts-variable-parameter'>outc</span> <span class='ts-string'>&quot;null&quot;</span>

<span class='ts-keyword'>and</span> <span class='ts-function'>print_string</span> <span class='ts-variable-parameter'>outc</span> <span class='ts-variable-parameter'>s</span> <span class='ts-punctuation-delimiter'>=</span>
  <span class='ts-keyword'>let</span> <span class='ts-variable'>escaped</span> <span class='ts-punctuation-delimiter'>=</span> <span class='ts-module'>CCString</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-function'>replace</span> <span class='ts-punctuation-delimiter'>~</span><span class='ts-property'>sub</span><span class='ts-punctuation-delimiter'>:</span><span class='ts-string'>&quot;<span class='ts-escape'>\&quot;</span>&quot;</span> <span class='ts-punctuation-delimiter'>~</span><span class='ts-property'>by</span><span class='ts-punctuation-delimiter'>:</span><span class='ts-string'>&quot;<span class='ts-escape'>\\</span><span class='ts-escape'>\&quot;</span>&quot;</span> <span class='ts-variable-parameter'>s</span> <span class='ts-keyword'>in</span>
  <span class='ts-module'>Out_channel</span><span class='ts-punctuation-delimiter'>.</span><span class='ts-function'>output_string</span> <span class='ts-variable-parameter'>outc</span> <span class='ts-punctuation-bracket'>(</span><span class='ts-string'>&quot;<span class='ts-escape'>\&quot;</span>&quot;</span> <span class='ts-operator'>^</span> <span class='ts-variable'>escaped</span> <span class='ts-operator'>^</span> <span class='ts-string'>&quot;<span class='ts-escape'>\&quot;</span>&quot;</span><span class='ts-punctuation-bracket'>)</span>
</code></pre></div>

<h1 id="whats-next">What’s Next?</h1>

<p>I have so far enjoyed programming in OCaml, even though it seems that OCaml has
less support for categorical abstractions than Haskell does. I’m now moving on
to reading <em>Types and Programming Languages</em>, by Benjamin C. Pierce, which
develops type checking algorithms in OCaml. My next post is likely to include
the results of experimenting with these.</p>

<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>Of course, C++ is mostly alone in these troubles–modern languages like
Rust also have clear and predictable rules for paths of types and
definitions, which are derived from the file structure. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="languages" /><summary type="html"><![CDATA[Two weeks ago, I started on a journey to learn how to design programming languages. After thorough research, I decided to learn and use OCaml for this.]]></summary></entry><entry><title type="html">Pinned Places in C++</title><link href="https://blog.ethantwardy.com/2025/04/05/Pinning-in-C++.html" rel="alternate" type="text/html" title="Pinned Places in C++" /><published>2025-04-05T08:00:00+00:00</published><updated>2025-04-05T08:00:00+00:00</updated><id>https://blog.ethantwardy.com/2025/04/05/Pinning-in-C++</id><content type="html" xml:base="https://blog.ethantwardy.com/2025/04/05/Pinning-in-C++.html"><![CDATA[<p>In C++, move semantics become very difficult to express and constrain
accurately without dynamic memory allocation. This is an unfortunate feature of
the language, especially when working in environments that don’t have a heap.</p>

<h2 id="storage-durations">Storage Durations</h2>

<p>As a brief reminder, all variables in C++ have a storage duration, and there
are <a href="https://en.cppreference.com/book/storage_durations">four storage duration classes</a>: automatic, static, thread and dynamic.
Memory with dynamic storage duration is allocated on the heap, of course.
Block-scope (local) variables have automatic storage duration, unless they are
declared <code>static</code>, <code>extern</code>, or <code>thread_local</code>.</p>
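
<p>As a rough sketch of how the four classes show up in code (the names
<code>global_counter</code>, <code>per_thread</code>, <code>next_id</code> and <code>heap_value</code> are made up for
illustration and are not part of the driver discussed below):</p>

```cpp
#include <cassert>
#include <memory>

int global_counter = 0;           // static storage duration (namespace scope)
thread_local int per_thread = 0;  // thread storage duration

int next_id() {
    static int id = 0;  // static storage duration, even though it is local
    int step = 1;       // automatic storage duration
    id += step;
    return id;
}

int heap_value() {
    // The int owned by `boxed` has dynamic storage duration;
    // `boxed` itself is an automatic variable.
    auto boxed = std::make_unique<int>(42);
    assert(boxed != nullptr);
    return *boxed;
}
```

<p>Only <code>step</code> and <code>boxed</code> live on the stack here, which is the case the rest of
this post cares about.</p>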

<h2 id="the-hypothetical-adc-driver">The Hypothetical ADC Driver</h2>

<p>Imagine, for example, that I have a class that represents a SPI bus, and that
I’m writing a driver for a specific ADC device that’s connected to my SPI bus.
So far, that may look something like this:</p>

<div class="language-cpp highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>enum</span> <span class='ts-keyword'>class</span> <span class='ts-type'>SpiError</span> { <span class='ts-comment'>/* Enumeration Literals... */</span> }<span class='ts-delimiter'>;</span>

<span class='ts-keyword'>class</span> <span class='ts-type'>Spi</span> {
<span class='ts-keyword'>public</span>:
  <span class='ts-function'>Spi</span>() <span class='ts-operator'>=</span> <span class='ts-keyword'>default</span><span class='ts-delimiter'>;</span>
  <span class='ts-comment'>// Destructor _may_ actually do something interesting.</span>
  ~<span class='ts-variable'>Spi</span>()<span class='ts-delimiter'>;</span>

  std::<span class='ts-type'>expected</span><span class='ts-operator'>&lt;</span>std::<span class='ts-type'>monostate</span>, <span class='ts-type'>SpiError</span><span class='ts-operator'>&gt;</span> <span class='ts-function'>write</span>(
    std::<span class='ts-type'>span</span><span class='ts-operator'>&lt;</span><span class='ts-keyword'>const</span> <span class='ts-type'>uint8_t</span><span class='ts-operator'>&gt;</span> <span class='ts-variable'>data</span>,
    <span class='ts-type'>uint8_t</span> <span class='ts-variable'>chip_select</span>)<span class='ts-delimiter'>;</span>

  <span class='ts-comment'>// Delete the copy constructor</span>
  <span class='ts-function'>Spi</span>(<span class='ts-keyword'>const</span> <span class='ts-type'>Spi</span><span class='ts-operator'>&amp;</span>) <span class='ts-operator'>=</span> <span class='ts-keyword'>delete</span><span class='ts-delimiter'>;</span>
  <span class='ts-type'>Spi</span><span class='ts-operator'>&amp;</span> operator<span class='ts-operator'>=</span>(<span class='ts-keyword'>const</span> <span class='ts-type'>Spi</span><span class='ts-operator'>&amp;</span>) <span class='ts-operator'>=</span> <span class='ts-keyword'>delete</span><span class='ts-delimiter'>;</span>

  <span class='ts-comment'>// Moving is okay, though.</span>
  <span class='ts-function'>Spi</span>(<span class='ts-type'>Spi</span><span class='ts-operator'>&amp;&amp;</span>) <span class='ts-operator'>=</span> <span class='ts-keyword'>default</span><span class='ts-delimiter'>;</span>
  <span class='ts-type'>Spi</span><span class='ts-operator'>&amp;</span> operator<span class='ts-operator'>=</span>(<span class='ts-type'>Spi</span><span class='ts-operator'>&amp;&amp;</span>) <span class='ts-operator'>=</span> <span class='ts-keyword'>default</span><span class='ts-delimiter'>;</span>

<span class='ts-keyword'>private</span>:
  <span class='ts-comment'>// Instance data...</span>
}<span class='ts-delimiter'>;</span>

<span class='ts-keyword'>class</span> <span class='ts-type'>Adc</span> {
<span class='ts-keyword'>public</span>:
  <span class='ts-function'>Adc</span>(<span class='ts-type'>Spi</span><span class='ts-operator'>*</span> <span class='ts-variable'>spi</span>, <span class='ts-type'>uint8_t</span> <span class='ts-variable'>chip_select</span>) : <span class='ts-property'>spi_</span>{<span class='ts-variable'>spi</span>}, <span class='ts-property'>chip_select_</span>{<span class='ts-variable'>chip_select</span>} {}
  <span class='ts-type'>uint32_t</span> <span class='ts-function'>read_channel</span>(<span class='ts-type'>uint8_t</span> <span class='ts-variable'>channel_index</span>)<span class='ts-delimiter'>;</span>

<span class='ts-keyword'>private</span>:
  <span class='ts-type'>Spi</span><span class='ts-operator'>*</span> <span class='ts-property'>spi_</span><span class='ts-delimiter'>;</span>
  <span class='ts-type'>uint8_t</span> <span class='ts-property'>chip_select_</span><span class='ts-delimiter'>;</span>
}<span class='ts-delimiter'>;</span>
</code></pre></div>

<p>We don’t want <code>Adc</code> to own an instance of <code>Spi</code> by value, because we need to
share the <code>Spi</code> instance between multiple devices. <code>Spi</code> needs to implement
resource locking and ensure that concurrent access to the bus is impossible.</p>

<p>There’s no problem with these classes yet. <code>Adc</code> can keep its default move and
copy constructors–after all, it doesn’t own any resources, so it’s valid to
move or copy an instance of it.</p>

<p>…Until I do this:</p>

<div class="language-cpp highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>class</span> <span class='ts-type'>ApplicationState</span> {
<span class='ts-keyword'>public</span>:
  <span class='ts-comment'>// The chip-select indices (0 and 1) are illustrative.</span>
  <span class='ts-function'>ApplicationState</span>() : <span class='ts-property'>spi_</span>{}, <span class='ts-property'>left_adc_</span>{<span class='ts-operator'>&amp;</span><span class='ts-variable'>spi_</span>, <span class='ts-number'>0</span>}, <span class='ts-property'>right_adc_</span>{<span class='ts-operator'>&amp;</span><span class='ts-variable'>spi_</span>, <span class='ts-number'>1</span>} {}

<span class='ts-keyword'>private</span>:
  <span class='ts-type'>Spi</span> <span class='ts-property'>spi_</span><span class='ts-delimiter'>;</span>
  <span class='ts-type'>Adc</span> <span class='ts-property'>left_adc_</span><span class='ts-delimiter'>;</span>
  <span class='ts-type'>Adc</span> <span class='ts-property'>right_adc_</span><span class='ts-delimiter'>;</span>
}<span class='ts-delimiter'>;</span>
</code></pre></div>

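<p>The move semantics of <code>ApplicationState</code> are constrained, but the compiler
doesn’t enforce that constraint automatically. If an <code>ApplicationState</code> is
moved, the <code>Adc</code> members of the new object still point at the <code>spi_</code> member of
the old one. When I create a <code>Spi*</code> member variable in the <code>Adc</code> class, I’m
introducing a new class invariant on <code>Adc</code>: a borrowed lifetime. This isn’t
Rust, but if it were, we would be forced to add a generic lifetime parameter on
<code>Adc</code>, like this:</p>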

<div class="language-rust highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>struct</span> <span class='ts-type'>Adc</span><span class='ts-punctuation-bracket'>&lt;</span><span class='ts-operator'>&#39;</span><span class='ts-label'>a</span><span class='ts-punctuation-bracket'>&gt;</span> <span class='ts-punctuation-bracket'>{</span>
  <span class='ts-property'>spi</span><span class='ts-punctuation-delimiter'>:</span> <span class='ts-operator'>&amp;</span><span class='ts-operator'>&#39;</span><span class='ts-label'>a</span> <span class='ts-type'>Spi</span><span class='ts-punctuation-delimiter'>,</span>
  <span class='ts-property'>chip_select</span><span class='ts-punctuation-delimiter'>:</span> <span class='ts-type-builtin'>u8</span><span class='ts-punctuation-delimiter'>,</span>
<span class='ts-punctuation-bracket'>}</span>
</code></pre></div>

<p>The borrow checker could then ensure that we are free of temporal memory safety
issues. C++ has nothing like this. Granted, this kind of thing becomes easier
to spot if you’re used to reviewing code for this (or perhaps if you’re a Rust
programmer). It becomes harder at scale–when there are many member variables,
or when we aren’t already familiar with the invariants on <code>Spi</code> and <code>Adc</code>–for
example, if we didn’t write them.</p>
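
<p>To make the hazard concrete, here is a minimal sketch with stripped-down
stand-ins for the classes above. The simplified <code>Spi</code>, the single-argument
<code>Adc</code> constructor, and the helper names <code>bus</code>, <code>adc_borrows_own_spi</code> and
<code>moved_state_is_consistent</code> are assumptions made for this example only:</p>

```cpp
#include <cassert>
#include <utility>

// Simplified stand-in: no real bus access, just an object with an address.
struct Spi {};

class Adc {
public:
    explicit Adc(Spi* spi) : spi_{spi} {}
    const Spi* bus() const { return spi_; }

private:
    Spi* spi_;  // borrowed, not owned
};

class ApplicationState {
public:
    ApplicationState() : spi_{}, adc_{&spi_} {}

    // True only while adc_ still points at this object's own spi_.
    bool adc_borrows_own_spi() const { return adc_.bus() == &spi_; }

private:
    Spi spi_;
    Adc adc_;
};

bool moved_state_is_consistent() {
    ApplicationState original;
    assert(original.adc_borrows_own_spi());

    // The implicit move constructor copies the raw pointer member-wise,
    // so moved.adc_ keeps pointing into `original`.
    ApplicationState moved{std::move(original)};
    return moved.adc_borrows_own_spi();
}
```

<p>Under these assumptions <code>moved_state_is_consistent()</code> returns <code>false</code>: the
code compiles without a warning, yet the invariant is already broken. Once
<code>original</code> goes out of scope, the pointer inside <code>moved</code> dangles.</p>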

<h2 id="pinned-places">Pinned Places</h2>

<p>The issue is that the class invariant is placed on the <em>instance data</em> of
<code>Adc</code>. We could make the class <code>Spi</code> immovable, but that’s not really correct.
There’s nothing in the type <code>Spi</code> that makes it immovable. We may be able to
coerce the C++ type system into helping us create a type that enforces this
invariant. The concept of a <a href="https://without.boats/blog/pin/">pinned place</a> may help us here.</p>

<p>We’ll start by introducing our type, <code>Pin</code>:</p>

<div class="language-cpp highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>template</span><span class='ts-operator'>&lt;</span><span class='ts-keyword'>typename</span> <span class='ts-type'>T</span><span class='ts-operator'>&gt;</span>
<span class='ts-keyword'>concept</span> <span class='ts-variable'>ValueType</span> <span class='ts-operator'>=</span> std::<span class='ts-function'>is_same_v</span><span class='ts-operator'>&lt;</span>std::<span class='ts-type'>remove_reference_t</span><span class='ts-operator'>&lt;</span>std::<span class='ts-type'>remove_pointer_t</span><span class='ts-operator'>&lt;</span><span class='ts-type'>T</span><span class='ts-operator'>&gt;</span><span class='ts-operator'>&gt;</span>, <span class='ts-type'>T</span><span class='ts-operator'>&gt;</span><span class='ts-delimiter'>;</span>

<span class='ts-keyword'>template</span><span class='ts-operator'>&lt;</span><span class='ts-type'>ValueType</span> <span class='ts-constant'>T</span><span class='ts-operator'>&gt;</span>
<span class='ts-keyword'>class</span> <span class='ts-type'>Pin</span> {
<span class='ts-keyword'>public</span>:
    <span class='ts-keyword'>using</span> <span class='ts-type'>reference_type</span> <span class='ts-operator'>=</span> std::<span class='ts-type'>add_lvalue_reference_t</span><span class='ts-operator'>&lt;</span><span class='ts-type'>T</span><span class='ts-operator'>&gt;</span><span class='ts-delimiter'>;</span>
    <span class='ts-keyword'>using</span> <span class='ts-type'>pointer_type</span> <span class='ts-operator'>=</span> gsl::<span class='ts-type'>not_null</span><span class='ts-operator'>&lt;</span><span class='ts-type'>T</span><span class='ts-operator'>*</span><span class='ts-operator'>&gt;</span><span class='ts-delimiter'>;</span>

    <span class='ts-keyword'>template</span><span class='ts-operator'>&lt;</span><span class='ts-keyword'>typename</span>... <span class='ts-type'>Args</span><span class='ts-operator'>&gt;</span>
    <span class='ts-function'>Pin</span>(<span class='ts-type'>Args</span><span class='ts-operator'>&amp;&amp;</span>... <span class='ts-variable'>args</span>) : <span class='ts-property'>m_value</span>{std::<span class='ts-function'>forward</span><span class='ts-operator'>&lt;</span><span class='ts-type'>Args</span><span class='ts-operator'>&gt;</span>(<span class='ts-variable'>args</span>)...} {}
    ~<span class='ts-variable'>Pin</span>() <span class='ts-operator'>=</span> <span class='ts-keyword'>default</span><span class='ts-delimiter'>;</span>

    <span class='ts-comment'>// Pinned objects intentionally have non-copyable/non-movable semantics.</span>
    <span class='ts-comment'>// Strictly speaking, copy semantics ought to be definable if T is</span>
    <span class='ts-comment'>// copyable, but default-ing them would restrict Pin to copyable types T.</span>
    <span class='ts-comment'>// Semantically, there is no reason why we should be able to copy a pinned</span>
    <span class='ts-comment'>// object, so we are safe to delete this.</span>
    <span class='ts-function'>Pin</span>(<span class='ts-keyword'>const</span> <span class='ts-type'>Pin</span><span class='ts-operator'>&amp;</span>) <span class='ts-operator'>=</span> <span class='ts-keyword'>delete</span><span class='ts-delimiter'>;</span>
    <span class='ts-type'>Pin</span><span class='ts-operator'>&amp;</span> operator<span class='ts-operator'>=</span>(<span class='ts-keyword'>const</span> <span class='ts-type'>Pin</span><span class='ts-operator'>&amp;</span>) <span class='ts-operator'>=</span> <span class='ts-keyword'>delete</span><span class='ts-delimiter'>;</span>
    <span class='ts-function'>Pin</span>(<span class='ts-type'>Pin</span><span class='ts-operator'>&amp;&amp;</span>) <span class='ts-operator'>=</span> <span class='ts-keyword'>delete</span><span class='ts-delimiter'>;</span>
    <span class='ts-type'>Pin</span><span class='ts-operator'>&amp;</span> operator<span class='ts-operator'>=</span>(<span class='ts-type'>Pin</span><span class='ts-operator'>&amp;&amp;</span>) <span class='ts-operator'>=</span> <span class='ts-keyword'>delete</span><span class='ts-delimiter'>;</span>

    <span class='ts-type'>reference_type</span> operator<span class='ts-operator'>*</span>() <span class='ts-keyword'>noexcept</span> { <span class='ts-keyword'>return</span> <span class='ts-variable'>m_value</span><span class='ts-delimiter'>;</span> }
    <span class='ts-type'>pointer_type</span> operator<span class='ts-operator'>-&gt;</span>() <span class='ts-keyword'>noexcept</span> { <span class='ts-keyword'>return</span> <span class='ts-operator'>&amp;</span><span class='ts-variable'>m_value</span><span class='ts-delimiter'>;</span> }

<span class='ts-keyword'>private</span>:
    <span class='ts-type'>T</span> <span class='ts-property'>m_value</span><span class='ts-delimiter'>;</span>
}<span class='ts-delimiter'>;</span>
</code></pre></div>

<p>A <code>Pin&lt;T&gt;</code> object wraps an instance of a movable type <code>T</code> in an immovable
container. For variables with automatic storage duration, this has the effect
of “pinning” them on the stack. The <code>noexcept</code> specifiers on the operators are
not necessary for this discussion, but they let callers rely on the fact that
accessing the pinned value never throws, which is
nice. Similarly, we use <code>gsl::not_null</code> for a little bit of extra safety. We
never intend to return a null pointer, so it’s helpful to annotate that.
Finally, you may notice that I created the concept <code>ValueType</code> to ensure that <code>T</code>
is not a pointer or reference type. This would make the nested type
declarations more complicated, and after all, “pinning” a reference type has no
semantic value, so we disallow it.</p>

<p>Now, we need a type that will allow classes to require their callers to uphold
the immovable invariant on owned instance data. We’ll call it <code>PinPtr</code>:</p>

<div class="language-cpp highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>template</span><span class='ts-operator'>&lt;</span><span class='ts-type'>ValueType</span> <span class='ts-constant'>T</span><span class='ts-operator'>&gt;</span>
<span class='ts-keyword'>class</span> <span class='ts-type'>PinPtr</span> {
<span class='ts-keyword'>public</span>:
    <span class='ts-keyword'>using</span> <span class='ts-type'>reference_type</span> <span class='ts-operator'>=</span> std::<span class='ts-type'>add_lvalue_reference_t</span><span class='ts-operator'>&lt;</span><span class='ts-type'>T</span><span class='ts-operator'>&gt;</span><span class='ts-delimiter'>;</span>
    <span class='ts-keyword'>using</span> <span class='ts-type'>pointer_type</span> <span class='ts-operator'>=</span> gsl::<span class='ts-type'>not_null</span><span class='ts-operator'>&lt;</span><span class='ts-type'>T</span><span class='ts-operator'>*</span><span class='ts-operator'>&gt;</span><span class='ts-delimiter'>;</span>

    <span class='ts-comment'>// TODO: Constructors?</span>

    <span class='ts-type'>reference_type</span> operator<span class='ts-operator'>*</span>() <span class='ts-keyword'>const</span>
        <span class='ts-keyword'>noexcept</span>(<span class='ts-function'>noexcept</span>(<span class='ts-operator'>*</span>std::<span class='ts-function'>declval</span><span class='ts-operator'>&lt;</span><span class='ts-type'>pointer_type</span><span class='ts-operator'>&gt;</span>())) {
      <span class='ts-keyword'>return</span> <span class='ts-operator'>*</span><span class='ts-variable'>m_value</span><span class='ts-delimiter'>;</span>
    }
    <span class='ts-type'>pointer_type</span> operator<span class='ts-operator'>-&gt;</span>() <span class='ts-keyword'>const</span> <span class='ts-keyword'>noexcept</span> { <span class='ts-keyword'>return</span> <span class='ts-variable'>m_value</span><span class='ts-delimiter'>;</span> }

<span class='ts-keyword'>private</span>:
    <span class='ts-type'>pointer_type</span> <span class='ts-property'>m_value</span><span class='ts-delimiter'>;</span>
}<span class='ts-delimiter'>;</span>
</code></pre></div>

<p>We’ll come back to the constructors in a moment. We can now rewrite <code>Adc</code> like
this, and <code>PinPtr</code> acts just like any smart pointer type:</p>

<div class="language-cpp highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>class</span> <span class='ts-type'>Adc</span> {
<span class='ts-keyword'>public</span>:
  <span class='ts-function'>Adc</span>(<span class='ts-type'>PinPtr</span><span class='ts-operator'>&lt;</span><span class='ts-type'>Spi</span><span class='ts-operator'>&gt;</span> <span class='ts-variable'>spi</span>, <span class='ts-type'>uint8_t</span> <span class='ts-variable'>chip_select</span>) : <span class='ts-property'>spi_</span>{<span class='ts-variable'>spi</span>}, <span class='ts-property'>chip_select_</span>{<span class='ts-variable'>chip_select</span>} {}
  <span class='ts-type'>uint32_t</span> <span class='ts-function'>read_channel</span>(<span class='ts-type'>uint8_t</span> <span class='ts-variable'>channel_index</span>) {
    <span class='ts-keyword'>static</span> <span class='ts-keyword'>constexpr</span> std::<span class='ts-type'>array</span><span class='ts-operator'>&lt;</span><span class='ts-type'>uint8_t</span>, <span class='ts-number'>2</span><span class='ts-operator'>&gt;</span> <span class='ts-variable'>command</span> <span class='ts-operator'>=</span> {<span class='ts-number'>0x08</span>, <span class='ts-number'>0x00</span>}<span class='ts-delimiter'>;</span>
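    <span class='ts-variable'>spi_</span><span class='ts-operator'>-&gt;</span><span class='ts-function'>write</span>(<span class='ts-variable'>command</span>, <span class='ts-variable'>chip_select_</span>)<span class='ts-delimiter'>;</span>
    <span class='ts-comment'>// Placeholder: reading the conversion result back is omitted here.</span>
    <span class='ts-keyword'>return</span> <span class='ts-number'>0</span><span class='ts-delimiter'>;</span>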
  }

<span class='ts-keyword'>private</span>:
  <span class='ts-type'>PinPtr</span><span class='ts-operator'>&lt;</span><span class='ts-type'>Spi</span><span class='ts-operator'>&gt;</span> <span class='ts-property'>spi_</span><span class='ts-delimiter'>;</span>
  <span class='ts-type'>uint8_t</span> <span class='ts-property'>chip_select_</span><span class='ts-delimiter'>;</span>
}<span class='ts-delimiter'>;</span>
</code></pre></div>

<p><code>Adc</code> is still movable, and so is <code>Spi</code>. But the idea is that now, when I write
<code>ApplicationState</code>:</p>

<div class="language-cpp highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>class</span> <span class='ts-type'>ApplicationState</span> {
<span class='ts-keyword'>public</span>:
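  <span class='ts-comment'>// The chip-select indices (0 and 1) are illustrative.</span>
  <span class='ts-function'>ApplicationState</span>() : <span class='ts-property'>spi_</span>{}, <span class='ts-property'>left_adc_</span>{<span class='ts-variable'>spi_</span>, <span class='ts-number'>0</span>}, <span class='ts-property'>right_adc_</span>{<span class='ts-variable'>spi_</span>, <span class='ts-number'>1</span>} {}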

<span class='ts-keyword'>private</span>:
  <span class='ts-type'>Pin</span><span class='ts-operator'>&lt;</span><span class='ts-type'>Spi</span><span class='ts-operator'>&gt;</span> <span class='ts-property'>spi_</span><span class='ts-delimiter'>;</span>
  <span class='ts-type'>Adc</span> <span class='ts-property'>left_adc_</span><span class='ts-delimiter'>;</span>
  <span class='ts-type'>Adc</span> <span class='ts-property'>right_adc_</span><span class='ts-delimiter'>;</span>
}<span class='ts-delimiter'>;</span>
</code></pre></div>

<p>The move and copy constructors for <code>ApplicationState</code> are automatically
deleted!</p>
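
<p>One way to check that claim is with type traits. The sketch below uses a
simplified <code>Pin</code> that returns a raw pointer instead of <code>gsl::not_null</code> (an
assumption, to keep the example free of dependencies), and the variable name
<code>application_state_is_pinned</code> is made up for illustration:</p>

```cpp
#include <type_traits>
#include <utility>

// Simplified Pin: same deleted copy/move operations as above,
// but without the gsl::not_null dependency.
template <typename T>
class Pin {
public:
    template <typename... Args>
    Pin(Args&&... args) : value_{std::forward<Args>(args)...} {}

    Pin(const Pin&) = delete;
    Pin& operator=(const Pin&) = delete;
    Pin(Pin&&) = delete;
    Pin& operator=(Pin&&) = delete;

    T& operator*() noexcept { return value_; }
    T* operator->() noexcept { return &value_; }

private:
    T value_;
};

struct Spi {};

class ApplicationState {
public:
    ApplicationState() : spi_{} {}

private:
    Pin<Spi> spi_;  // the pinned member makes the whole class immovable
};

// No copy or move operation of ApplicationState is usable.
constexpr bool application_state_is_pinned =
    !std::is_copy_constructible_v<ApplicationState> &&
    !std::is_move_constructible_v<ApplicationState> &&
    !std::is_copy_assignable_v<ApplicationState> &&
    !std::is_move_assignable_v<ApplicationState>;

static_assert(application_state_is_pinned,
              "a Pin member should delete copy and move operations");
```

<p>Nothing was written in <code>ApplicationState</code> to get this behavior; it falls out
of the deleted operations on its <code>Pin</code> member.</p>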

<h2 id="constructing-a-pinptr">Constructing a <code>PinPtr</code></h2>

<p>The last thing is to ensure that it’s only possible to construct a <code>PinPtr</code>
from a valid <code>Pin&lt;T&gt;</code>. For this, we need to recall how <a href="https://en.cppreference.com/w/cpp/language/value_category">value categories</a>
work. Remember that an rvalue is either a prvalue (a temporary value that has
no address) or an xvalue (an object that is “expiring”, such as the result of
<code>std::move</code>). An lvalue is something that has an address, and is neither of
these. How these are represented in the type system,
however, may be surprising:</p>

<ul>
  <li><code>T&amp;</code> can only represent an lvalue.</li>
  <li><code>T&amp;&amp;</code> can only represent an rvalue.</li>
  <li><code>const T&amp;</code> can represent <em>either</em> an lvalue or an rvalue.</li>
</ul>
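
<p>These three rules can be checked with a small overload set. The names
<code>Bound</code>, <code>bound_as</code>, <code>length_of</code>, <code>needs_lvalue</code> and <code>check_binding_rules</code> are
hypothetical helpers for this sketch, and <code>std::string</code> simply stands in for
any type <code>T</code>:</p>

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

enum class Bound { Lvalue, Rvalue };

// Overload resolution picks the reference type that matches the argument.
inline Bound bound_as(std::string&) { return Bound::Lvalue; }
inline Bound bound_as(std::string&&) { return Bound::Rvalue; }

// A const lvalue reference binds to lvalues and rvalues alike.
inline std::size_t length_of(const std::string& s) { return s.size(); }

// A function taking T& cannot be called with a temporary at all.
inline void needs_lvalue(std::string&) {}
static_assert(std::is_invocable_v<decltype(&needs_lvalue), std::string&>);
static_assert(!std::is_invocable_v<decltype(&needs_lvalue), std::string>);

// Exercises the three rules; every assertion holds.
inline void check_binding_rules() {
    std::string name = "spi";
    assert(bound_as(name) == Bound::Lvalue);
    assert(bound_as(std::move(name)) == Bound::Rvalue);  // nothing is moved from
    assert(bound_as(std::string{"tmp"}) == Bound::Rvalue);
    assert(length_of(name) == 3);
    assert(length_of(std::string{"tmp"}) == 3);
}
```

<p>The second <code>static_assert</code> is the property a <code>PinPtr</code> constructor needs: a
parameter of type <code>T&amp;</code> rejects temporaries at compile time, whereas
<code>const T&amp;</code> would silently accept them.</p>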

<p>The last point is critical! Objects of <code>Pin&lt;T&gt;</code> are only valid as lvalues, and
so it’s only valid to construct a <code>PinPtr&lt;T&gt;</code> from a <code>Pin&lt;T&gt;&amp;</code>–this is the
only way that we can ensure the pinned object will remain valid after the
constructor runs. With this in mind, we can add our constructors:</p>

<div class="language-cpp highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>template</span><span class='ts-operator'>&lt;</span><span class='ts-type'>ValueType</span> <span class='ts-constant'>T</span><span class='ts-operator'>&gt;</span>
<span class='ts-keyword'>class</span> <span class='ts-type'>PinPtr</span> {
<span class='ts-keyword'>public</span>:
  <span class='ts-keyword'>template</span><span class='ts-operator'>&lt;</span><span class='ts-keyword'>typename</span> <span class='ts-type'>U</span><span class='ts-operator'>&gt;</span>
  <span class='ts-function'>PinPtr</span>(<span class='ts-type'>Pin</span><span class='ts-operator'>&lt;</span><span class='ts-type'>U</span><span class='ts-operator'>&gt;</span><span class='ts-operator'>&amp;</span> <span class='ts-variable'>pin</span>) : <span class='ts-property'>m_value</span>{<span class='ts-function'>pin</span><span class='ts-delimiter'>.</span>operator<span class='ts-operator'>-&gt;</span>()} {}
}<span class='ts-delimiter'>;</span>
</code></pre></div>

<p>We template the constructor to allow polymorphic pointers. We can construct a
<code>PinPtr&lt;T&gt;</code> from a <code>Pin&lt;U&gt;</code> if <code>U</code> is a derived class of <code>T</code>. Also recall that
declaring this constructor suppresses the implicitly-declared default
constructor, which is critical to enforcing this invariant.</p>

<p>Prevailing wisdom would have us declare the single-arg constructor as
<code>explicit</code>, but I think that’s not necessary here. There is one–and only
one–way to construct a <code>PinPtr</code>. Requiring an explicit constructor call would
not add clarity, and would only add visual noise.</p>

<h2 id="does-it-really-work-though">Does It Really Work, Though?</h2>

<p>The answer seems to be yes!</p>

<div class="language-cpp highlighter-tree-sitter"><pre><code ><span class='ts-keyword'>class</span> <span class='ts-type'>Resource</span> {}<span class='ts-delimiter'>;</span>

<span class='ts-keyword'>class</span> <span class='ts-type'>Borrows</span> {
<span class='ts-keyword'>public</span>:
    <span class='ts-function'>Borrows</span>(<span class='ts-type'>PinPtr</span><span class='ts-operator'>&lt;</span><span class='ts-type'>Resource</span><span class='ts-operator'>&gt;</span> <span class='ts-variable'>resource</span>) : <span class='ts-property'>resource_</span>{<span class='ts-variable'>resource</span>} {}
<span class='ts-keyword'>private</span>:
    <span class='ts-type'>PinPtr</span><span class='ts-operator'>&lt;</span><span class='ts-type'>Resource</span><span class='ts-operator'>&gt;</span> <span class='ts-property'>resource_</span><span class='ts-delimiter'>;</span>
}<span class='ts-delimiter'>;</span>

<span class='ts-type'>int</span> <span class='ts-function'>main</span>() {
    <span class='ts-type'>Resource</span> <span class='ts-variable'>r</span><span class='ts-delimiter'>;</span>
    <span class='ts-type'>Pin</span><span class='ts-operator'>&lt;</span><span class='ts-type'>Resource</span><span class='ts-operator'>&gt;</span> <span class='ts-variable'>pinned</span><span class='ts-delimiter'>;</span>

    <span class='ts-comment'>// These fail to compile:</span>
    <span class='ts-comment'>// Borrows borrower{Pin&lt;Resource&gt;{Resource{}}};</span>
    <span class='ts-comment'>// Borrows borrower{&amp;r};</span>
    <span class='ts-comment'>// Borrows borrower{Resource{}};</span>

    <span class='ts-comment'>// This is the only way to construct a Borrows:</span>
    <span class='ts-type'>Borrows</span> <span class='ts-variable'>borrower</span>{<span class='ts-variable'>pinned</span>}<span class='ts-delimiter'>;</span>
}
</code></pre></div>

<h2 id="conclusion">Conclusion</h2>

<p>It’s not a perfect bolt-on solution for temporal memory safety. In the previous
example, <code>PinPtr</code> cannot ensure that <code>pinned</code> outlives <code>borrower</code>. Now that
we’re protected from pointer invalidation by moves, however, other temporal
memory safety issues are <em>theoretically</em> harder to trigger <em>accidentally</em>,
and may be easier to locate during code inspection.</p>

<p>Whether this adds value, though, or visual noise, is up to you. It may feel odd
to represent a new semantic value category using a vocabulary type. If that’s
the case, this likely isn’t for you! I think it has the potential to prevent a
number of nasty UB issues, however, so I think I’m a fan of it! After trying to
use it, I’ll give an update to see if that turned out to be the case.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[In C++, move semantics become very difficult to express and constrain accurately without dynamic memory allocation. This is an unfortunate feature of the language, especially when working in environments that don’t have a heap.]]></summary></entry><entry><title type="html">Running LineageOS for the First Time</title><link href="https://blog.ethantwardy.com/2025/02/17/First-time-running-LineageOS.html" rel="alternate" type="text/html" title="Running LineageOS for the First Time" /><published>2025-02-17T20:00:00+00:00</published><updated>2025-02-17T20:00:00+00:00</updated><id>https://blog.ethantwardy.com/2025/02/17/First-time-running-LineageOS</id><content type="html" xml:base="https://blog.ethantwardy.com/2025/02/17/First-time-running-LineageOS.html"><![CDATA[<p>Lately, I’ve been consuming a lot of literature that comes exclusively in
digital form. Academic papers, PhD theses, blogs, journals, and creative
commons digital books. I find it to be really uncomfortable to spend long hours
reading material on my laptop, so I decided to buy my first tablet.</p>

<p>I poked around on eBay to see what I could find cheaply available. The iPads
were enticing, but I decided I would look for one with LineageOS support. If I
only spent a few dollars, I wouldn’t mind if I brick the thing. I <em>have</em> been a
professional Android engineer in my very short career, but I’ve never engaged
with the LineageOS ecosystem, so this felt like a good opportunity.</p>

<p>I selected the Samsung Galaxy Tab S2 9.7 (Wi-Fi), which is about 9 years old at
the time of this writing. I spent $60, shipping included (I probably overpaid),
and when it arrived a week later I dubiously factory reset the thing. It was
pretty speedy, and the battery life is not terrible, considering the device’s
age. Additionally, it was very comfortable to read with for extended periods of
time, and the screen is roomy.</p>

<p>The LineageOS team stopped supporting this device after LineageOS 16 (which
seems to have been around 2019?), but the <a href="https://wiki.lineageos.org/devices/gts210vewifi/install/#">instructions</a> are still easily
found on Google. I was able to enable the “OEM Unlock” switch in the Developer
Settings without any trouble at all, which cannot be said for my Samsung
Galaxy A11. The instructions are fairly simple at a high level: install
Heimdall, download a TWRP recovery image (another product I had no previous
familiarity with), and sideload the Lineage image using ADB. The linked version
of Heimdall didn’t provide a release bundle for arm64 Linux (which I’m not
surprised by), but it seemed to run perfectly under <code>box64</code>.</p>

<p>I had almost no trouble at all after that, until step 4: <em>Build a LineageOS
installation package.</em> I was not prepared to build. Thankfully, again, the
<a href="https://wiki.lineageos.org/devices/gts210vewifi/build/">build instructions</a> are also linked nearby. One thing that can definitely
be said for LineageOS is that their instructions are <em>very</em> clear, almost
without exception.</p>

<p>The build instructions seem stock-standard for Android, with a couple of cute
oddities (the <code>lunch</code> command replaced by <code>brunch</code>, etc.). The standard build
dependencies are mostly the same, but Lineage 16 was the last version to
require Python 2, which is no longer available in the Debian testing
repositories. It <em>is</em> still available in <code>nixpkgs</code>, although <code>nix-env</code> refuses
to install until this fragment is added to <code>~/.config/nixpkgs/config.nix</code>:</p>

<pre><code>{
  permittedInsecurePackages = [
    "python-2.7.18.8"
  ];
}
</code></pre>

<p>From there, I was able to set up a Python 2 <code>virtualenv</code>:</p>

<pre><code>nix-env -iA nixpkgs.python2
python2.7 -m ensurepip --user --default-pip
python2.7 -m pip install --user virtualenv
virtualenv --python=python2.7 .lineage_venv
</code></pre>

<p>Which I can activate in every shell with <code>. ./.lineage_venv/bin/activate</code>.</p>

<p>The build instructions then ask the user to run the <code>extract-files.sh</code> script
within the build tree, which seems to extract proprietary blobs from the
running device using ADB. The script worked great–however, my device is
apparently missing some of the proprietary blobs that are necessary, because
the build system bailed out <em>immediately</em> when I tried to run it. Luckily, I
was able to find an old <a href="https://lineage-archive.timschumi.net/#">prebuilt image for my device</a> on the unofficial
LineageOS archive, and the LineageOS wiki contains <a href="https://wiki.lineageos.org/extracting_blobs_from_zips">instructions for extracting
files from prebuilt images</a>, so I was able to supplement my losses.</p>

<p>That got me a little further, but the build was now failing on the first
compilation step, because the prebuilt clang that came from the repo manifest
links to two libraries, <code>libtinfo.so.5</code> and <code>libncurses.so.5</code>, which aren’t
installed on my machine. Naturally, <em>these</em> aren’t available in the Debian
testing repositories either, but the build instructions indicated I might be
able to install them if I downloaded them from an older release’s repositories.
These versions were still being updated as of <code>buster</code>, so I clicked around
until I found the download link, and the manual install worked!</p>

<pre><code>curl -LO http://ftp.us.debian.org/debian/pool/main/n/ncurses/libtinfo5_6.4-4_amd64.deb
curl -LO http://ftp.us.debian.org/debian/pool/main/n/ncurses/libncurses5_6.4-4_amd64.deb
dpkg -i ./libtinfo5_6.4-4_amd64.deb
dpkg -i ./libncurses5_6.4-4_amd64.deb
</code></pre>

<p>Now a little bit further, and onto a <code>make</code> error:</p>

<pre><code>Makefile:791: *** multiple target patterns. Stop.
</code></pre>

<p>This one stumped me. I’ve never seen this error from GNU Make before, and it’s
not an immediately googleable problem. A little bit of poking around, and I did
find one person who reported an error like this when trying to build Ubuntu
Touch for their Xperia Z5 Compact. Apparently, setting <code>USE_HOST_LEX=yes</code> in
the environment fixed it. To my shock and horror, it worked for me as well. In
the future, I might like to look into that a little further to see what was
actually causing it and why that would fix it.</p>

<p>At this point, I got about 25 build steps into the process, when I got an
obscure error about <code>cannot exec .../prebuilts/clang: File not found</code>. I’d seen
this enough to know it was either a dynamic linker error or a shebang error,
and the file turned out to be a Python script with a <code>#!/usr/bin/python</code>
shebang at the top. These pesky developers apparently never planned for me to
want to use a Python other than the system Python to build the image.
Unfortunately, the fix for this was to symlink <code>/usr/bin/python</code> to the Python 2
installation in the virtual environment:</p>

<pre><code>sudo ln -s /home/edtwardy/Git/lineageos/.lineage_venv/bin/python /usr/bin/python
</code></pre>

<p>Luckily, Python never returned to using that path for Python 3 installations
after the Python 2 end-of-life, so this was a (relatively) non-intrusive
change.</p>

<p>Now, after a long 45-minute wait, it looked like the build was going to
succeed. At the <em>last</em> step, however, I got a Python exception trace that ended
with:</p>

<pre><code>AssertionError: compression of system.new.dat failed.
</code></pre>

<p>Nice. That was frustrating. The next day, I took a look at
<code>build/make/tools/releasetools/common.py</code>, the path mentioned in the stack
trace. It looks like they were trying to perform a <code>brotli</code> compression, which
should have been obvious from my earlier experience extracting files from the
prebuilt brotli-compressed ext2 system image.</p>

<p>I made a small code change to try to get more information:</p>

<div class="language-diff highlighter-tree-sitter"><pre><code class="highlight"><span class="gh">diff --git a/tools/releasetools/common.py b/tools/releasetools/common.py
index f7ab11cd8..c9ba9fc45 100644
</span><span class="gd">--- a/tools/releasetools/common.py
</span><span class="gi">+++ b/tools/releasetools/common.py
</span><span class="p">@@ -1755,10 +1755,10 @@</span> class BlockDifference(object):
                     '--output={}.new.dat.br'.format(self.path),
                     '{}.new.dat'.format(self.path)]
       print("Compressing {}.new.dat with brotli".format(self.partition))
<span class="gd">-      p = Run(brotli_cmd, stdout=subprocess.PIPE)
-      p.communicate()
</span><span class="gi">+      p = Run(brotli_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+      _, err = p.communicate()
</span>       assert p.returncode == 0,\
<span class="gd">-          'compression of {}.new.dat failed'.format(self.partition)
</span><span class="gi">+            'compression of {}.new.dat failed: {}'.format(self.partition, err.strip())
</span> 
       new_data_name = '{}.new.dat.br'.format(self.partition)
       ZipWrite(output_zip,
</code></pre></div>

<p>This change captures <code>stderr</code> from the child process and includes it in the
assertion message, which gave me all I needed to know:</p>

<pre><code>Compressing system.new.dat with brotli
  running:  brotli --quality=6 --output=/tmp/tmpz4Bykj/system.new.dat.br /tmp/tmpz4Bykj/system.new.dat
Traceback (most recent call last):
  File "build/make/tools/releasetools/ota_from_target_files", line 2051, in &lt;module&gt;
    main(sys.argv[1:])
  File "build/make/tools/releasetools/ota_from_target_files", line 2025, in main
    output_file=args[1])
  File "build/make/tools/releasetools/ota_from_target_files", line 858, in WriteFullOTAPackage
    system_diff.WriteScript(script, output_zip)
  File "/home/edtwardy/Git/lineageos/android/lineage/build/make/tools/releasetools/common.py", line 1606, in WriteScript
    self._WriteUpdate(script, output_zip)
  File "/home/edtwardy/Git/lineageos/android/lineage/build/make/tools/releasetools/common.py", line 1761, in _WriteUpdate
    'compression of {}.new.dat failed: {}'.format(self.partition, err.strip())
AssertionError: compression of system.new.dat failed: failed to write output [/tmp/tmpz4Bykj/system.new.dat.br]: No space left on device
ninja: build stopped: subcommand failed.
06:29:59 ninja failed with: exit status 1
</code></pre>

<p>No Android development effort is complete without failing builds caused by a
disk space shortage! It’s unclear why the developers would choose to use <code>/tmp</code>
for this, when there’s a long history of system administrators putting <code>/tmp</code>
on a different (and smaller) device. I’m embarrassed to say that I am (was) one
of those admins. Luckily, it was an easy fix to mount a <code>tmpfs</code> over top of
<code>/tmp</code>:</p>

<pre><code>sudo mount -t tmpfs -o size=16g tmpfs /tmp
</code></pre>

<p>Let me tell you, it was such an adrenaline rush to see the build finally
succeed:</p>

<pre><code>
#### build completed successfully (05:03 (mm:ss)) ####

</code></pre>

<p>I know that LineageOS 16 is getting up there in age now, but I’ve been
impressed with the look and feel of it. I get nostalgia back to my first-ever
Android program as a professional developer, which was also based on Android 9.
Unfortunately, the camera doesn’t work. But I can live with that.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Lately, I’ve been consuming a lot of literature that comes exclusively in digital form. Academic papers, PhD theses, blogs, journals, and creative commons digital books. I find it to be really uncomfortable to spend long hours reading material on my laptop, so I decided to buy my first tablet.]]></summary></entry><entry><title type="html">Implementing the batch-sequential architecture style in Rust</title><link href="https://blog.ethantwardy.com/design/2024/12/11/Implementing-the-batch-sequential-architecture-style-in-Rust.html" rel="alternate" type="text/html" title="Implementing the batch-sequential architecture style in Rust" /><published>2024-12-11T09:00:00+00:00</published><updated>2024-12-11T09:00:00+00:00</updated><id>https://blog.ethantwardy.com/design/2024/12/11/Implementing-the-batch-sequential-architecture-style-in-Rust</id><content type="html" xml:base="https://blog.ethantwardy.com/design/2024/12/11/Implementing-the-batch-sequential-architecture-style-in-Rust.html"><![CDATA[<p>As promised, this post is a follow up to my <a href="/design/2024/12/11/New-Patterns-for-Redfish-Codegen.html">previous post</a> with the details of how we’ve implemented a batch-sequential architecture pattern for the redfish-codegen project.</p>

<h1 id="types-that-do-work">Types that do work</h1>

<p>The goal is to write code like this:</p>

<pre><code>struct Hello;
impl Process&lt;()&gt; for Hello {
    type Output = String;
    fn process(self, input: ()) -&gt; Self::Output {
        "Hello, world!".to_string()
    }
}

fn print_message(input: String) {
    println!("{}", &amp;input);
}

Pipeline::builder()
    .stage(Hello)
    .stage(print_message)
    .execute();
</code></pre>

<p>Here, we can construct a pipeline, and then execute it. We don’t care whether a
stage consists of a free function or a type <code>impl</code>. The pipeline ensures,
statically, that the input type of a stage is compatible with the output type
of the previous stage. To achieve this, we start with the trait <code>Process</code>:</p>

<pre><code>pub trait Process&lt;Input&gt; {
    type Output;
    fn process(self, input: Input) -&gt; Self::Output;
}
</code></pre>

<p>This trait represents a procedure from an input value to an output that
consumes <code>self</code> (that is, it terminates). We can provide a blanket
implementation for anything callable as <code>FnOnce</code>, which covers both closures
and free functions:</p>

<pre><code>impl&lt;F, In, Out&gt; Process&lt;In&gt; for F
where
    F: FnOnce(In) -&gt; Out,
{
    type Output = Out;
    fn process(self, input: In) -&gt; Out {
        self(input)
    }
}
</code></pre>
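
<p>To see what the blanket implementation buys us, here is a minimal sketch. The
trait and blanket <code>impl</code> are repeated from above so the snippet stands alone;
<code>Doubler</code>, <code>describe</code>, and <code>run_on_21</code> are hypothetical names invented for
this illustration and are not part of redfish-codegen:</p>

<pre><code>// Trait and blanket impl, repeated from the post.
pub trait Process&lt;Input&gt; {
    type Output;
    fn process(self, input: Input) -&gt; Self::Output;
}

impl&lt;F, In, Out&gt; Process&lt;In&gt; for F
where
    F: FnOnce(In) -&gt; Out,
{
    type Output = Out;
    fn process(self, input: In) -&gt; Out {
        self(input)
    }
}

// Hypothetical stage implemented as a type.
pub struct Doubler;
impl Process&lt;u32&gt; for Doubler {
    type Output = u32;
    fn process(self, input: u32) -&gt; u32 {
        input * 2
    }
}

// Hypothetical stage implemented as a free function. It picks up
// `Process&lt;u32&gt;` through the blanket impl, with no extra code.
pub fn describe(n: u32) -&gt; String {
    format!("got {}", n)
}

// Hypothetical helper: accepts any stage that can process a `u32`,
// without knowing whether it is a type, a function, or a closure.
pub fn run_on_21&lt;P: Process&lt;u32&gt;&gt;(stage: P) -&gt; P::Output {
    stage.process(21)
}
</code></pre>

<p>Under these assumptions, <code>run_on_21(Doubler)</code>, <code>run_on_21(describe)</code>, and
<code>run_on_21(|n: u32| n + 1)</code> all go through the same call site, which is the
property the pipeline relies on.</p>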

<h1 id="composing-a-pipeline">Composing a Pipeline</h1>

<p>The next trait is <code>Stage</code>, and we can use it to combine stages to form our
pipeline:</p>

<pre><code>pub trait Stage&lt;P, In, Out&gt;: private::Sealed {
    type Result&lt;R&gt;;
    fn stage&lt;Q&gt;(self, process: Q) -&gt; Self::Result&lt;Q&gt;
    where
        Q: Process&lt;Out&gt;;
}
</code></pre>

<p>Note that I’ve <a href="https://predr.ag/blog/definitive-guide-to-sealed-traits-in-rust/">sealed</a> this trait. Other areas of the code should not be
implementing this trait. We’ll implement this on a type <code>Pipeline</code>. Here’s the
definition of <code>Pipeline</code>, with some of the cruft removed:</p>

<pre><code>pub struct Pipeline&lt;Proc, PreviousStage&gt; {
    pub(super) process: Proc,
    pub(super) previous: PreviousStage,
}

impl&lt;P, S, Input, Output&gt; Stage&lt;P, Input, Output&gt; for Pipeline&lt;P, S&gt;
where
    P: Process&lt;Input, Output = Output&gt;,
{
    type Result&lt;R&gt; = Pipeline&lt;R, Self&gt;;
    fn stage&lt;Q&gt;(self, process: Q) -&gt; Self::Result&lt;Q&gt;
    where
        Q: Process&lt;Output&gt;,
    {
        Pipeline {
            process,
            previous: self,
        }
    }
}
</code></pre>

<p>All this does is compose a new <code>Pipeline</code> containing the previous <code>Pipeline</code>.
You might refer to this type as <em>telescoping</em>, because its real type (as known
to the compiler) contains the type of its sub-pipeline…</p>

<p>…which of course, contains the type of its sub-pipeline (recursively).</p>
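
<p>As a rough illustration of that nesting, here is the type of a three-stage
pipeline written out by hand. This is only a sketch: <code>Parse</code>, <code>Transform</code>,
<code>Emit</code>, <code>ThreeStages</code>, and <code>build</code> are hypothetical names, the fields are made
public so the snippet fits in one file, and I’m assuming the unit type sits at
the head of the chain:</p>

<pre><code>// Simplified stand-in for the post's `Pipeline`.
pub struct Pipeline&lt;Proc, PreviousStage&gt; {
    pub process: Proc,
    pub previous: PreviousStage,
}

// Hypothetical unit-struct stages; they carry no data.
pub struct Parse;
pub struct Transform;
pub struct Emit;

// What the compiler sees after chaining three stages: the second
// parameter of each layer is the full type of everything before it.
pub type ThreeStages =
    Pipeline&lt;Emit, Pipeline&lt;Transform, Pipeline&lt;Parse, ()&gt;&gt;&gt;;

// Building the value by hand mirrors the nesting in the type.
pub fn build() -&gt; ThreeStages {
    Pipeline {
        process: Emit,
        previous: Pipeline {
            process: Transform,
            previous: Pipeline {
                process: Parse,
                previous: (),
            },
        },
    }
}
</code></pre>

<p>The last stage added is the outermost layer, and the first stage is buried
at the centre. Since the stages in this sketch are unit structs, the nested
value takes up no memory; the structure lives entirely in the type.</p>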

<h1 id="executing-a-pipeline">Executing a Pipeline</h1>

<p>We know that we need a trait that can allow us to execute a stage, so let’s
begin there:</p>

<pre><code>trait RunStage {
    type Output;
    fn run_stage(self) -&gt; Self::Output;
}
</code></pre>

<p>If <code>Process</code> is the category of types with a self-consuming function from an
input to an output, <code>RunStage</code> is the category of types that can execute a
<code>Process</code> and propagate its output value. I’ll leave this for now, and we’ll
come back to it in a moment.</p>

<p>Executing a pipeline requires executing each sub-pipeline. Earlier, while
showing the <code>Stage</code> trait, I left out one critical piece of information. What
is the nature of the pipeline’s first stage? I’ve got a type called
<code>PipelineBuilder</code>. The name of this type should conjure an accurate depiction
of its actual qualities–it contains nothing interesting, except for an
implementation of <code>Stage</code>:</p>

<pre><code>impl Stage&lt;(), (), ()&gt; for PipelineBuilder {
    type Result&lt;R&gt; = Pipeline&lt;R, ()&gt;;
    fn stage&lt;Q&gt;(self, process: Q) -&gt; Self::Result&lt;Q&gt;
    where
        Q: Process&lt;()&gt;,
    {
        Pipeline {
            process,
            previous: (),
        }
    }
}
</code></pre>

<p>So with this, if there exists a function <code>Pipeline::builder()</code> which returns a
<code>PipelineBuilder</code>, we can construct a <code>Pipeline</code>, given a process whose input
is the unit type <code>()</code>, and whose previous stage is also the unit type. This is
nice–it requires that the initial stage of a pipeline take no inputs.</p>

<p>What does this have to do with executing a pipeline, though? Since we know the
“head” of the pipeline is always the unit type, we can use this as a kind of
recursive “base case” and provide an <code>impl</code> for it:</p>

<pre><code>impl RunStage for () {
    type Output = ();
    fn run_stage(self) -&gt; Self::Output {
        ()
    }
}
</code></pre>

<p>You might have noticed that the output of this <code>impl</code> is <em>also</em> the unit type,
which happens to <em>also</em> be the input type of the next stage. Isn’t that
beautiful? Finally, we can provide an implementation for <code>Pipeline</code>.</p>

<pre><code>impl&lt;P, R&gt; RunStage for Pipeline&lt;P, R&gt;
where
    P: Process&lt;&lt;R as RunStage&gt;::Output&gt;,
    R: RunStage,
{
    type Output = P::Output;
    fn run_stage(self) -&gt; Self::Output {
        let Self { process, previous } = self;
        process.process(previous.run_stage())
    }
}
</code></pre>

<p>The one thing I’ve left out is the trait <code>Execute</code>, which can be implemented in
terms of <code>RunStage</code>. It’s so simple that I don’t see any value in including it
here. I’ll leave this and other details as an exercise for the reader, or you
can go and look at the <a href="https://github.com/AmateurECE/redfish-codegen/tree/feature/sequential-batch-processing">feature branch</a> where I’ve implemented some of this
stuff. Cheater!</p>
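
<p>For anyone who wants to watch the recursion run end to end, here is a
self-contained sketch. <code>Process</code>, <code>Pipeline</code>, and <code>RunStage</code> are repeated from
above. The <code>Execute</code> trait is only my guess at one possible shape, not a copy
of the feature branch, and <code>greet</code>, <code>count_chars</code>, and <code>demo</code> are hypothetical
names:</p>

<pre><code>pub trait Process&lt;Input&gt; {
    type Output;
    fn process(self, input: Input) -&gt; Self::Output;
}

impl&lt;F, In, Out&gt; Process&lt;In&gt; for F
where
    F: FnOnce(In) -&gt; Out,
{
    type Output = Out;
    fn process(self, input: In) -&gt; Out {
        self(input)
    }
}

pub struct Pipeline&lt;Proc, PreviousStage&gt; {
    process: Proc,
    previous: PreviousStage,
}

pub trait RunStage {
    type Output;
    fn run_stage(self) -&gt; Self::Output;
}

// Base case: the head of every pipeline is the unit type.
impl RunStage for () {
    type Output = ();
    fn run_stage(self) -&gt; Self::Output {
        ()
    }
}

// Recursive case: run everything before this stage, then this stage.
impl&lt;P, R&gt; RunStage for Pipeline&lt;P, R&gt;
where
    P: Process&lt;&lt;R as RunStage&gt;::Output&gt;,
    R: RunStage,
{
    type Output = P::Output;
    fn run_stage(self) -&gt; Self::Output {
        let Self { process, previous } = self;
        process.process(previous.run_stage())
    }
}

// ASSUMPTION: one possible shape for `Execute`; the real trait may differ.
pub trait Execute {
    type Output;
    fn execute(self) -&gt; Self::Output;
}

impl&lt;T: RunStage&gt; Execute for T {
    type Output = T::Output;
    fn execute(self) -&gt; Self::Output {
        self.run_stage()
    }
}

// Hypothetical stages, built by hand instead of through `stage()`.
fn greet(_: ()) -&gt; String {
    "Hello, world!".to_string()
}

fn count_chars(message: String) -&gt; usize {
    message.chars().count()
}

pub fn demo() -&gt; usize {
    let pipeline = Pipeline {
        process: count_chars,
        previous: Pipeline {
            process: greet,
            previous: (),
        },
    };
    // Runs `()`, then `greet`, then `count_chars`.
    pipeline.execute()
}
</code></pre>

<p>Calling <code>execute()</code> on the outermost layer recurses down to the unit type
first, so the stages run in the order they were added even though the last
stage is the outermost type.</p>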

<p>With this, the redfish-codegen project should be in a good place to begin
applying this pattern in future work.</p>

<p>Thanks for reading!</p>]]></content><author><name></name></author><category term="design" /><summary type="html"><![CDATA[As promised, this post is a follow up to my previous post with the details of how we’ve implemented a batch-sequential architecture pattern for the redfish-codegen project.]]></summary></entry><entry><title type="html">New Patterns for Redfish-Codegen</title><link href="https://blog.ethantwardy.com/design/2024/12/11/New-Patterns-for-Redfish-Codegen.html" rel="alternate" type="text/html" title="New Patterns for Redfish-Codegen" /><published>2024-12-11T08:00:00+00:00</published><updated>2024-12-11T08:00:00+00:00</updated><id>https://blog.ethantwardy.com/design/2024/12/11/New-Patterns-for-Redfish-Codegen</id><content type="html" xml:base="https://blog.ethantwardy.com/design/2024/12/11/New-Patterns-for-Redfish-Codegen.html"><![CDATA[<p>I haven’t spent very much time working on the Redfish-Codegen project lately,
unfortunately. There’s an open PR out there that’s festering, and I’m long
overdue for a release. I’ve been fixating on more important (read: vain) issues
that I haven’t found a path forward on, and I’m experiencing some executive
dysfunction.</p>

<p>The first problem is the technical complexity of the code generator, relative
to the size of the application. The constructor for the main class of the code
generator application is 113 lines long, and while I am pleased that we managed
to keep <em>expert information</em> localized in this area of the codebase for so
long, it’s become quite unwieldy. Mustache templates for code generation have
become cumbersome and bug-prone to maintain, and there’s a monstrous 8.81 MiB
patch file under version control that performs a simple transformation on <em>all</em>
of the input schemas. Naturally, this patch is applied using <code>quilt(1)</code>, and
it’s generated by a Python script, which is called from a shell script. There
are a lot of skills that a contributor needs to be effective.</p>

<p>Contributors to this codebase know Rust–we can count on that, or else they
wouldn’t have been considering this solution. Anything else is extra. So, I’d
like to gradually rewrite the code generator in Rust.</p>

<p>The ultimate pattern for gradual replacement of legacy systems is the
<a href="https://martinfowler.com/bliki/StranglerFigApplication.html">Strangler Fig pattern</a>. In this pattern, rewrites can proceed as long as
they don’t cross the <em>seam</em> between two components. Rewriting a component is
done wholesale, but as long as components are isolated, we are saved from the
trap of the rewrite spiraling out of control. In order to apply this pattern,
though, we need <em>seams</em>. That’s where my next pattern comes in.</p>

<p>Since the beginning, the code generator has employed a kind of batch-processing
technique, similar to the Batch-sequential processing architecture pattern. In
this pattern, connectors pass data between <em>stages</em>, which transform the data
from one form to the next. Ideally, stages are decoupled from the
implementation of adjacent stages, coupled only to the representation of the
data that they receive. Stages receive their input in its entirety, and they
produce their entire output before the next stage begins. This is a common
architecture style for compilers and code generators of all kinds. In the
canonical implementation, stages are allowed (or expected) to terminate when
they have completed their processing.</p>

<p>This isn’t formalized in the architecture, however, so code for one “stage” is
mixed with code from another stage. When reading the code, one has to stare for
a long time to figure out <em>when</em> a given transformation might be applied during
code generation.</p>

<p>This sucks.</p>

<p>Keeping all of the expert information localized to the main class was very
useful towards discovering this, however, since it made the problem painfully
obvious as soon as it appeared.</p>

<p>So, I implemented a few types in Rust that will allow us to begin
reconstructing the pipeline in Rust, with first-class abstractions that
represent our architecture pattern, which I’ll describe in my <a href="/design/2024/12/11/Implementing-the-batch-sequential-architecture-style-in-Rust.html">next post</a>!.</p>]]></content><author><name></name></author><category term="design" /><summary type="html"><![CDATA[I haven’t spent very much time working on the Redfish-Codegen project lately, unfortunately. There’s an open PR out there that’s festering, and I’m long overdue for a release. I’ve been fixating on more important (read: vain) issues that I haven’t found a path forward on, and I’m experiencing some executive dysfunction.]]></summary></entry><entry><title type="html">Fixing Git Clone Errors</title><link href="https://blog.ethantwardy.com/server/2024/11/28/Chasing-git-fetch-failures.html" rel="alternate" type="text/html" title="Fixing Git Clone Errors" /><published>2024-11-28T00:00:00+00:00</published><updated>2024-11-28T00:00:00+00:00</updated><id>https://blog.ethantwardy.com/server/2024/11/28/Chasing-git-fetch-failures</id><content type="html" xml:base="https://blog.ethantwardy.com/server/2024/11/28/Chasing-git-fetch-failures.html"><![CDATA[<p>Just before the holiday, I was working on a Yocto-based distribution for a
Raspberry Pi. I’ve been using the gadget to stream music to my stereo over
Bluetooth. I’d just finished pulling a bunch of junk out of the image when, to
my disappointment, the <code>do_fetch</code> task for <code>linux-raspberrypi</code> failed. So, I
tried again, and it failed again. I had been able to fetch successfully not
twenty minutes earlier. Inspecting the log, it looks like <code>git index-pack</code>
generated an invalid index file for one of the received pack archives:</p>

<pre><code>...
Receiving objects: 100% (74709/74709), 26.56 MiB | 7.52 MiB/s, done.
fatal: local object e0a447351623bfa2df5a7e7429e1479826bc9a7a is corrupt
fatal: fetch-pack: invalid index-pack output
</code></pre>

<p>I’m not fluent in git internals, so at the time, this meant nothing to me. My
immediate suspicion was a network error. In the past, I’ve seen repeatable
problems with git clone magically disappear after Europe goes to sleep, so I
assumed this was another such fluke. It was already late by this time, so I
went to bed.</p>

<p>As you might imagine, it <em>did not</em> resolve itself in the morning. I tried
setting <code>BB_SHALLOW_CLONE</code> and <code>BB_SHALLOW_CLONE_DEPTH</code> in my kas file to see
if I could work around the issue by trying to minimize data transfer. No such
luck. Strangely, I had not seen this with any other repository in my distro.</p>

<p>I tried the clone manually–the same branch, from the same GitHub repository.
Here, I was able to get through a shallow clone, but trying to deepen the clone
with <code>git fetch --unshallow</code> produced the same errors as were in the BitBake
log.</p>

<p>So, I scripted an interaction to incrementally deepen the clone, to see how far
I could get:</p>

<pre><code>$ while true; do git fetch --deepen=1; done
</code></pre>

<p>This worked for a little while, until I got to a region of the history that
retrying wouldn’t seem to get through. It wasn’t a terribly large transaction,
only about 50 MiB. It gets more interesting, though–the error message isn’t
consistent. There are a few patterns that I could pull out, in addition to the
one shown above:</p>

<pre><code>Receiving objects: 100% (130810/130810), 48.85 MiB | 7.40 MiB/s, done.
fatal: SHA1 COLLISION FOUND WITH c8fdd0d03907f9d11d2080ec77d94add9f144916 !
fatal: fetch-pack: invalid index-pack output
</code></pre>

<pre><code>Receiving objects: 100% (130810/130810), 48.85 MiB | 8.33 MiB/s, done.
error: inflate: data stream error (incorrect data check)
</code></pre>

<p>In a situation like this, it often helps me to view the system from a high
level and work on ranking failure modes for each component. In this scenario,
I’m cloning the repository on my AMD machine running Debian testing. This
operation goes out to the network, and copies a bunch of data from a server to
disk. So, these are the major components:</p>

<ol>
  <li>The Git remote (GitHub)</li>
  <li>The network</li>
  <li>My installation of Git</li>
  <li>My server’s RAM</li>
  <li>My SSD</li>
</ol>

<p>Let’s move down the list. GitHub wasn’t reporting an outage, and since I hadn’t
had any other network troubles, it seemed unlikely to be something outside of
my box. A bad DIMM might fit the bill, but I would expect to see other kinds of
system instability–processes crashing and unrecoverable kernel panics at
runtime, etc.</p>

<p>Next is the installation of git. The reported version is 2.45.2, and that
matches the version of the installed package from <code>dpkg -l</code>. When I looked to
see if there was an upgrade available, <code>apt</code> took the liberty of reminding me
about an issue I’ve been ignoring for a month:</p>

<pre><code>  WARNING: Device /dev/sdb5 has size of 911755265 sectors which is smaller than corresponding PV size of 911757312 sectors. Was device resized?
  WARNING: One or more devices used as PVs in VG edtwardy-vg have changed sizes.
</code></pre>

<p>The partition <code>/dev/sdb5</code> is the only physical volume in the LVM2 volume group
that contains my home directory and root filesystem. This warning is telling us
that the LVM2 physical volume is configured for a size 2047 sectors (at 512
bytes each, exactly 1023.5 KiB) larger than the partition that actually
contains it. I’m not exactly sure how
that happened. Recently, I was setting up a btrfs filesystem on a neighboring
partition. It’s likely that I made an arithmetic error when I was resizing
everything.</p>

<p>I procrastinate fixing things like this because my partitioning solution is
extremely complicated in its current state, <em>and</em> I never have a Debian Live CD
around when I need it. After booting into a live image, I fixed the issue by
freeing up 1 logical extent (about 4 MiB) from the volume containing my <code>/var</code>
partition and reallocating a couple of extents to make free space at the end of
the physical volume. This allowed me to reduce the size of the PV to the size
of the partition.</p>

<p>Apt no longer reports the above error, and a test shows that I can clone the
Linux kernel. Even better, it still works <em>the second time</em>. It bothers me that
I’m not sure why this may have been the cause of the problem. I know that git
makes some temporary files in <code>/var/tmp</code>; perhaps the invalid logical extent
lived somewhere in that partition. I don’t exactly know what writing to that
region would do, but I’m not surprised that it wouldn’t work. I suppose I’m
more surprised that I didn’t see something about this in <code>dmesg</code> first.</p>

<h1 id="december-update">December Update</h1>

<p>I never saw the failing Git clone errors again, but I <em>did</em> start seeing other
kinds of system instability. I saw segfaults in GCC, crashes in pseudo, and
finally, ext4 corruption. This all prompted me to run memtest86+, and sure
enough, I had about 2049 bad addresses. A new pair of DIMMs passed a memtest
out of the box, and I haven’t seen the problems since! It’s entirely possible
this <em>was</em> caused by the bad RAM. But the LVM2 size issue was another ticking
time bomb that needed action, so I can’t complain that both of them are now
resolved.</p>]]></content><author><name></name></author><category term="server" /><summary type="html"><![CDATA[Just before the holiday, I was working on a Yocto-based distribution for a Raspberry Pi. I’ve been using the gadget to stream music to my stereo over Bluetooth. I’d just finished pulling a bunch of junk out of the image when, to my disappointment, the do_fetch task for linux-raspberrypi failed. So, I tried again, and it failed again. I had been able to fetch successfully not twenty minutes earlier. Inspecting the log, it looks like git index-pack generated an invalid index file for one of the received pack archives:]]></summary></entry><entry><title type="html">From Problem to Solution</title><link href="https://blog.ethantwardy.com/design/2024/11/11/From-Problem-to-Solution.html" rel="alternate" type="text/html" title="From Problem to Solution" /><published>2024-11-11T00:00:00+00:00</published><updated>2024-11-11T00:00:00+00:00</updated><id>https://blog.ethantwardy.com/design/2024/11/11/From-Problem-to-Solution</id><content type="html" xml:base="https://blog.ethantwardy.com/design/2024/11/11/From-Problem-to-Solution.html"><![CDATA[<h1 id="from-problem-to-solution">From Problem to Solution</h1>

<p>I often struggle to organize and prioritize my ideas. Time is precious, and I’d
prefer to spend it with loved ones, relaxing, and practicing self-care, but my
time spent tinkering is also very important to me. Today, I did some semantic
modeling to understand the motivation for the work I do in my free time, and I
came up with this model:</p>

<p><img src="/assets/images/2024-11-11/module.svg" alt="Semantic Model of Problem Solving and Motivation" /></p>

<p>In this model, a <em>Feature</em> is an increment of work: a thing that I spend time
to accomplish. My takeaway was that there are three motivations for completing
<em>Features</em>:</p>

<ol>
  <li>To enable a <em>Use Case</em>.</li>
  <li>To mitigate a <em>Risk</em>.</li>
  <li>To advance a <em>Goal</em>.</li>
</ol>

<p>I think that completely captures the solution space. I won’t explain every
relationship in the diagram, since the relationships speak for themselves.
There are a couple of details not represented in this diagram that deserve a
little explaining, however.</p>

<h1 id="solvable-problems">Solvable Problems</h1>

<p>This model exists so that I can focus my creativity to develop solutions to
problems. The starting point is to recognize when a need arises and to identify
the next step. The fundamental assumption of this process is that the <em>Problem</em>
statement is the genesis of the product–that there exists some hypothetical
product or process which is capable of addressing the need underlying the
problem statement. Obviously, some problems can’t be solved with products or
processes. I’m sure you have no trouble thinking of one. <em>Some</em> of these
problems can be decomposed further into problem statements that <em>do</em> uphold
this assumption, however.</p>

<h1 id="the-genesis-of-product">The Genesis of Product</h1>

<p>In my last post, I wrote about the source of requirements. You might notice
that I only listed two sources of requirements there, but there’s a third thing
here that motivates my work–<em>Goals</em>, which are driven by <em>Values</em>. This is a
different kind of value than the value-driven design I proposed in my last
post. I hope you won’t find me naive if I say that values are generally <em>not</em> a
motivating factor in industry. My employer defines a set of values, but they
only inform <em>how</em> I accomplish my work. They do not dictate <em>what</em> I work on.
Conversely, I as an individual can form a value statement around the
accessibility and quality of open source software. That’s enough justification
to make contributions to the Linux kernel. I’d be surprised if businesses were
making decisions in the same way. That’s why I included it here.</p>

<h1 id="use-case-subtypes">Use Case Subtypes</h1>

<p>I often work with a subtype of <em>Use Cases</em> with which any software developer
ought to be familiar: <em>User Stories</em>. I chose to omit it from the diagram to
minimize visual noise. Here, I consider a user story to be a subtype of a use
case because it has more constraints than a use case often does: a user story
contains the use case built into its statement, and includes the user’s
motivation. The latter is usually omitted from a canonical Use Case
description. After capturing a problem statement, the next logical step may be
to conceive a <em>User Story</em> that addresses the
underlying need, especially if the solution to the problem involves a great
amount of technical detail not suitable for a high-level use case, or if the
motivation isn’t inherently clear from the solution.</p>

<h1 id="risk-subtypes">Risk Subtypes</h1>

<p>There are also two implicit subtypes of <em>Risk</em> not captured in the model.</p>

<p>A <em>Program Risk</em> is a risk that I will fail to implement my objective. For
example, if I’m working with a new technology to implement a feature, there may
be a chance that I misunderstand the technology and fail to implement the
product in a useful manner. I may choose to mitigate this risk by doing some
investigation or prototyping to burn down the risk.</p>

<p>I’ll refer to the other type of risk as <em>System Risk</em>. This is the risk that a
user, while acting on my product to achieve their goal, will fail or experience
harm. These are usually mitigated with tools like <em>invariants</em>, which enforce
rules on the ways that a user can interact with a product based on its state,
or with <em>architectural features</em>, which support a qualitative architectural
characteristic, such as quality or reliability.</p>

<p>Both kinds of risk impede my <em>Goals</em> and <em>Use Cases</em>, which is why I didn’t
draw the distinction in the model.</p>

<h1 id="conclusion">Conclusion</h1>

<p>My plan is to refer back to this model from time to time, to calibrate my
process for tinkering. The goal here was not to create a new process, but to
model the process I’m already using, to be able to understand and reason about
it more effectively in the future. I’ll provide updates in the context of my
projects in future posts!</p>]]></content><author><name></name></author><category term="design" /><summary type="html"><![CDATA[From Problem to Solution]]></summary></entry><entry><title type="html">Value- and Risk-driven Design</title><link href="https://blog.ethantwardy.com/design/2024/10/31/Value-and-Risk-Driven-Design-Methods.html" rel="alternate" type="text/html" title="Value- and Risk-driven Design" /><published>2024-10-31T00:00:00+00:00</published><updated>2024-10-31T00:00:00+00:00</updated><id>https://blog.ethantwardy.com/design/2024/10/31/Value-and-Risk-Driven-Design-Methods</id><content type="html" xml:base="https://blog.ethantwardy.com/design/2024/10/31/Value-and-Risk-Driven-Design-Methods.html"><![CDATA[<p>I’ve been favoring design methods lately that I would consider to be “Value-“
and/or “Risk-“ driven. “Risk-driven” design methods, as I refer to them, are
well-documented. The book <em>Just Enough Software Architecture: A Risk-driven
Approach</em> by George Fairbanks stresses a risk-driven method for software
architecture. There are international standards for various industries that
describe risk management processes proven to be successful in their target
industry:</p>

<ul>
  <li>IEC 61508 (safety-critical industrial applications)</li>
  <li>ISO 14971 (medical devices)</li>
  <li>ISO 26262 (typically automotive applications)</li>
</ul>

<p>Usually, this involves identifying hazards (situations that cause harm) and
failure modes (events that can cause hazardous situations). An engineer then
assigns a risk level based on the probability of the failure mode occurring and
the severity of the harm. If the risk level is not acceptable, the engineer
identifies mitigations that reduce the risk to an acceptable level. There are
many tools to aid this process, including fault tree analysis, reliability
testing and hypothesis testing.</p>
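
<p>As a rough illustration of that scoring step, here is a minimal shell sketch.
The 1-to-5 scales, the multiplication, and the thresholds are assumptions chosen
purely for the example; none of them come from the standards listed above, each
of which defines its own classification scheme.</p>

<pre><code># Hypothetical risk matrix: probability and severity are ordinal scores
# from 1 (lowest) to 5 (highest). The thresholds are illustrative only.
risk_level() {
    probability=$1
    severity=$2
    score=$((probability * severity))
    if [ "$score" -ge 15 ]; then
        echo high
    elif [ "$score" -ge 6 ]; then
        echo medium
    else
        echo low
    fi
}

risk_level 4 4   # prints "high": likely failure mode, serious harm
risk_level 1 4   # prints "low": same harm after a mitigation cuts the probability
</code></pre>

<p>The second call shows the effect of a mitigation in this toy model: the
severity of the harm is unchanged, but lowering the probability of the failure
mode brings the risk down to a level that might be considered acceptable.</p>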

<p>Value-driven design methods are not well documented, but they are universally
understood; probably, a number of things come immediately to mind. Here, the
term refers to design methods that either increase the inherent “value” of a
product or service, or decrease its cost. This can involve optimizing unit cost
or reducing non-recurring expenses. Often, it means planning to deliver the
highest value features within the stakeholder’s financial constraints.</p>

<p>How do we identify the highest value work?</p>

<h1 id="the-source-of-requirements">The Source of Requirements</h1>

<p>Let’s forget about business requirements for this conversation. I find there
are two sources for the genesis of product requirements:</p>

<ol>
  <li>Suffering</li>
  <li>Risk</li>
</ol>

<p>In that order.</p>

<p>Really, <em>use cases</em> are the primary source of requirements. But what generates
use cases? Ultimately, people make choices to lessen their suffering (or the
suffering of others; I do believe in empathy). So <em>suffering</em>, I think, is the
root cause of a use case.</p>

<p>We can capture and model a person’s choices through <em>functionality scenarios</em>,
which describe the course taken by an actor with a goal as they navigate from
problem to solution (Fairbanks, 2010). Generally, these don’t capture the pain
point that motivated an actor, and they also don’t capture the system of
interest–they simply trace an actor’s steps. Though, perhaps they <em>should</em>
capture motivation. If they did, it may be easier for us to develop empathy for
our customers and end users. Where I work, in the engineering services
industry, empathy is what brings work through the door.</p>

<p>Functionality scenarios can be used to identify and quantify use cases and
domain concepts. From there, we can begin to identify features and facets of
the system of interest, and visions of the solution space may begin to dance
in our heads. Use cases lend themselves to requirements, and requirements lend
themselves to code. We <em>know</em> this process, and yet we frequently fail to apply
it.</p>

<p>On the other hand, though, engineering <em>generates</em> risk. Our product may
provide services that ease suffering, but it likely also introduces <em>new</em> ways
of creating suffering. What happens if our actor uses the product wrong, or the
product fails? Is the user better or worse off than they were to begin with?
What about our business? If our product fails, our name may become tarnished
and our employees may worry about feeding their families.</p>

<p>Risk analysis is the activity that removes the barriers to joy. It helps us to
implement mitigations that protect our users, reduce technical risk, and finish
faster with a better product. Risk mitigations reveal non-functional
requirements. They also lend themselves naturally to architecture solutions.
Risk mitigations tend to be <em>intensional</em>–as in, related to design <em>intent</em>,
rather than solutions we can apply directly to our code and assemblies.
Consider, for example, a heterogeneous redundant solution in a high-SIL
application. We can <em>read</em> the code and <em>see</em> that two software units have a
similar function, but the average reader might jump to the conclusion that one
unit is dead code–a relic from a prototype. However, an engineer could create
an architectural <em>view</em> that traces the redundancy directly to a risk
mitigation.</p>

<p>In the medical and aerospace industries, outputs of risk analysis activities
feed into product requirements and architecture. I consider
cybersecurity-related activities to also fall into this category. Planning to
apply a cybersecurity process only at the end of a project is planning to fail.</p>

<p>I imagine the forces of suffering and risk as being on either side of a
see-saw. On Monday, we may discover a new use case. On Wednesday, we look at
our product invariants, affordances, and our new architecture, and consider all
the new ways we’ve just constructed to fail.</p>

<h1 id="designing-the-design-process">Designing the Design Process</h1>

<p><em>How</em> we tackle a problem is more important than the problem itself. Lately,
I’ve had two questions ringing in my head:</p>

<blockquote>
  <p>Do I know everything I need to know to succeed?
What can I do today to be more sure of my success tomorrow?</p>
</blockquote>

<p>This forces me to think about technical and project management risk. But it
also forces me to think about the problem statement, and the design process.
What’s my customer’s greatest pain point? What’s the problem they’re trying to
solve? How can I make sure I know the <em>right</em> answer to these questions? How
will I make sure the patient is safe, even if my code fails?</p>

<p>This, I think, is the fundamental principle underpinning the design methods I’m
referring to: not applying a rote development process, not indifferently
applying a canned architecture style. Continuously evaluating the present to
ensure I’m solving the right problem today.</p>

<h1 id="conclusion">Conclusion</h1>

<p>I’m currently trying to apply this strategy on a program at work, where I’m
serving the team as their software architect. I’m also trying to apply this to
two projects at home: designing a backup system for my server, and building a
financial tool.</p>

<p>In the past, I’ve failed at developing solutions for these because I would
either fall victim to a form of analysis paralysis, and end up designing an
ivory tower, or repeatedly prototype something that doesn’t solve the problem.
Do I really need a tool with a hundred views that graphs data in real time? Do
I really need that Redfish-enabled off-site RAID array? The answer would turn
out to be no, of course. Now, I’m hoping that this fresh perspective will help
me to apply just enough design to the right problem.</p>

<p>If you know about any books on this topic, reach out to me. This is an area
where I want to read and learn more.</p>]]></content><author><name></name></author><category term="design" /><summary type="html"><![CDATA[I’ve been favoring design methods lately that I would consider to be “Value-“ and/or “Risk-“ driven. “Risk-driven” design methods, as I refer to them, are well-documented. The book Just Enough Software Architecture: A Risk-driven Approach by George Fairbanks stresses a risk-driven method for software architecture. There are international standards for various industries that describe risk management processes proven to be successful in their target industry:]]></summary></entry><entry><title type="html">The Last Nine Months</title><link href="https://blog.ethantwardy.com/misc/2024/10/30/The-Last-Nine-Months.html" rel="alternate" type="text/html" title="The Last Nine Months" /><published>2024-10-30T00:00:00+00:00</published><updated>2024-10-30T00:00:00+00:00</updated><id>https://blog.ethantwardy.com/misc/2024/10/30/The-Last-Nine-Months</id><content type="html" xml:base="https://blog.ethantwardy.com/misc/2024/10/30/The-Last-Nine-Months.html"><![CDATA[<h1 id="the-last-nine-months">The Last Nine Months</h1>

<p>It’s been a while since I last posted. I’ve been reading. <em>A lot</em>. I’ve also
been trying to catch up on some projects that I’ve been neglecting, and start
an initiative around Rust at work. And keep up on my relationships. And
practice hobbies that aren’t programming.</p>

<p>In the time since my last post, I’ve spun up an instance of <a href="https://miniflux.app/"><code>miniflux</code></a> on
my server–an absolutely fantastic application, by the way. It’s a simple,
self-hosted RSS aggregator. I deploy it in the same fashion as <a href="https://github.com/AmateurECE/twardyece-services/commit/c30c0d2a607120bfddedf61ba6d2f155dc75263d">all of my other
applications</a>, including this blog. I’ve been reading the blogs of some
really smart people, and I’ve learned that I shouldn’t feel like I need a lot
of words to say what I want to say. The book <a href="https://www.goodreads.com/book/show/13155290-several-short-sentences-about-writing">Several Short Sentences About
Writing</a> by Verlyn Klinkenborg has also helped in that area.</p>

<p>So, I’m going to try to commit to shorter posts from now on. I’m writing this
one while I cook dinner. Hopefully, this will help me to post more in the
future.</p>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[The Last Nine Months]]></summary></entry><entry><title type="html">(Nearly) Immutable Jenkins Deployments</title><link href="https://blog.ethantwardy.com/server/2024/01/20/Nearly-Immutable-Jenkins.html" rel="alternate" type="text/html" title="(Nearly) Immutable Jenkins Deployments" /><published>2024-01-20T00:00:00+00:00</published><updated>2024-01-20T00:00:00+00:00</updated><id>https://blog.ethantwardy.com/server/2024/01/20/Nearly-Immutable-Jenkins</id><content type="html" xml:base="https://blog.ethantwardy.com/server/2024/01/20/Nearly-Immutable-Jenkins.html"><![CDATA[<h1 id="in-search-of-immutable-deployments">In Search of Immutable Deployments</h1>

<p>This blog, and the rest of my applications, are served from a single computer
running Debian in my closet. Among my deployments is an instance of Jenkins,
which I use to continuously build and deploy my two static websites–this blog,
and my <a href="https://twardyece.com/repository/">quick-reference documentation</a>. The setup was inspired by GitHub
Pages, and I could do this same thing much more easily if I fully embraced the
GitHub solutions, but that doesn’t align with my romance for self-hosting. Git
is the only thing I’m too afraid to self-host for the moment (the right
combination of drive failures and my life’s work is gone forever), so that
significantly narrows the landscape of CI solutions that are available to me.</p>

<p>I originally introduced Jenkins in August of 2021 (<a href="https://github.com/AmateurECE/twardyece-services/commit/1ee4d6cc657e7258adb06bbdad50d016a8be1c80">commit</a>), where I was
running my Jenkins controller and a single agent in Podman containers. Very
quickly, the setup became unwieldy–there was no way to track what had been
changed, and it was difficult to remember what to do when I wanted to recreate
my agent or add a job. Additionally, the limitations of the configuration began
to conflict with the design goals of my other deployments:</p>

<ol>
  <li>Every application or site should be split into its own Debian package.</li>
  <li>Installing a package should bring a site up, and removing the package should
remove all site data and bring the site down gracefully.</li>
</ol>

<p>This is difficult to do when adding a site means I have to log in to my Jenkins
instance, click around on the GUI and chant some incantations in order to set up
a job for a new application. Thus spawned the need for configuration as code,
and the creation of an <em>immutable</em> deployment for Jenkins.</p>

<p><em>Immutability</em> in the context of application deployments means that the
application and all its associated data can be destroyed and programmatically
recreated anew–at will. Obviously, <strong>configuration as code</strong> is a big part of
immutable deployments. After that problem is solved, there is <strong>data
management</strong>–minimizing the data that needs to be persisted between
deployments, and maximizing the data that can be destroyed and recreated
deterministically at will. Finally, there is <strong>configuration discovery</strong>–how
can dependent applications configure the Jenkins instance programmatically,
e.g. by adding jobs?</p>

<p>Jenkins is simultaneously very slow moving and very fast moving. Certain
defects and critical feature requests go ignored for years, but plugins
introduce breaking changes constantly. Additionally, there aren’t a lot of
great resources on the management of Jenkins deployments. The world, it seems,
is moving away from Jenkins, towards CI solutions that are highly integrated
with source control solutions–GitLab CI, Bamboo, GitHub Actions, etc. Most of this
process was achieved by reading issues on GitHub and Jira created by people who
had encountered the same problems as I, and crafting the ultimate solution from
the breadcrumbs.</p>

<h1 id="configuration-as-code">Configuration as Code</h1>

<p>Thankfully, there is a plugin for this–and it works quite well. The Jenkins
configuration as code plugin can even export the configuration of a running
instance into YAML, to provide a starting point. I installed the plugin, and
navigated to <strong>Manage Jenkins &gt; Configuration as Code &gt; View Configuration</strong>.</p>

<p>The exported configuration was quite long, and there were many options that I
didn’t understand, but I decided that minimizing the configuration could wait
until after I had a running setup.</p>

<p>To inject this configuration into my container, I decided to do something very
similar to what I had recently implemented for my Nginx configuration–I would
install the configuration file to somewhere in my root filesystem, then have a
<a href="https://github.com/AmateurECE/twardyece-services/blob/77d95e9670864aefba1c89de13d5474874aa3bbb/debian/twardyece-jenkins.postinst">dpkg trigger</a> that would produce a squashfs image at installation time, and
finally a <a href="https://github.com/AmateurECE/twardyece-services/blob/77d95e9670864aefba1c89de13d5474874aa3bbb/jenkins/jenkins-controller-config.volume">Quadlet volume configuration</a> file that would mount the squashfs
image into the container. This would provide the mechanism for other packages
(applications) to install configuration fragments later that could be picked up
by the dpkg trigger. Configuring Jenkins to use this configuration file is as
easy as following the <a href="https://github.com/jenkinsci/configuration-as-code-plugin">README for the plugin</a>.</p>

<p>At first, the Jenkins instance was failing to read the mounted configuration
file from the squashfs image. Thankfully, the container entrypoint remains
running after this failure, so it’s easy to <code>exec</code> into the container to poke
around. Obviously, it was a permissions issue with the mountpoint. Since the
Jenkins instance runs as a non-root user in the container, I had to change the
volume configuration to mount the volume as owned by the <code>jenkins</code> user:</p>

<pre><code>[Volume]
# Other options...
Options=allow_other
User=1000
Group=1000
</code></pre>

<p>The <code>allow_other</code> option is surprisingly required. Without it, the image is
mounted as owned by UID 1000, but non-root users cannot access <em>the mountpoint
itself</em>. It took me a while to figure this out.</p>

<h2 id="bootstrapping-plugins">Bootstrapping Plugins</h2>

<p>The official Jenkins image comes with no plugins installed, so we have to
create a derived container image that comes with all the plugins we need. The
Jenkins configuration as code documentation points us to <a href="https://github.com/jenkinsci/docker/#preinstalling-plugins">a page in the
official Jenkins docs</a> that describes how to do this using the Jenkins
plugin CLI:</p>

<pre><code>FROM jenkins/jenkins:lts-jdk17
COPY --chown=jenkins:jenkins plugins.txt /usr/share/jenkins/ref/plugins.txt
RUN jenkins-plugin-cli -f /usr/share/jenkins/ref/plugins.txt
</code></pre>

<p>That same page gives us a mechanism for getting the list of currently installed
plugins with a cURL command:</p>

<pre><code>JENKINS_HOST=username:password@myhost.com:port
curl -sSL "http://$JENKINS_HOST/pluginManager/api/xml?depth=1&amp;xpath=/*/*/shortName|/*/*/version&amp;wrapper=plugins" | perl -pe 's/.*?&lt;shortName&gt;([\w-]+).*?&lt;version&gt;([^&lt;]+)()(&lt;\/\w+&gt;)+/\1 \2\n/g'|sed 's/ /:/'
</code></pre>

<h1 id="data-management">Data Management</h1>

<p>Since we’re going for immutability, I want to persist as little as possible. In
the old configuration, there was a Jenkins “config” volume that persisted
everything under <code>/var/jenkins_home</code>. This ended up being pretty much
everything–secrets, plugin binaries, and of course, the configuration itself.</p>

<p>The ideal scenario is that no volumes are required–the container creates all
the data needed for the running instance of Jenkins, and all of that data is
destroyed when the instance stops. When I tried removing this volume, however,
the Jenkins agent failed to connect. After some poking around, it became clear
that this is because the Jenkins JNLP secrets used to connect agents to
controllers are not deterministically generated. If I were running my agents in
a k8s cluster, I could configure the controller to dynamically spin up agents
for job processing as necessary. However, single-node k8s is still not an
option in 2024 without virtual machines, and I just don’t care to introduce a
new heavy-handed virtualization mechanism on my poor Ryzen 3 CPU.</p>

<p>Planning for a future where k8s is an option, the current best option is to
create agents through the GUI, and then persist their secrets into a volume.
After some googling, I stumbled upon <a href="https://github.com/jenkinsci/configuration-as-code-plugin/issues/2250">this GitHub issue</a>, which recommends
provisioning the contents of <code>/var/jenkins_home/secrets</code> before starting the
container. This is a flat directory, containing only a few small files, so I’ll
choose to persist this with a btrfs volume:</p>

<pre><code>[Volume]
PodmanArgs=--driver=local
Type=btrfs
Options=subvol=@jenkins_controller-secrets
Device=/dev/disk/by-uuid/05599193-00bc-4a81-9550-54623b2ec8c4
</code></pre>

<p>This also demonstrates the new strategy I’ve taken for all of my container
volumes recently, which is to persist the actual data in a btrfs subvolume,
which I then create a Quadlet volume unit for. This creates a named volume that
mounts the subvolume into the container at runtime, so I can safely run <code>podman
volume prune</code> without worrying about a loss of data!</p>

<h1 id="configuration-discovery">Configuration Discovery</h1>

<p>The penultimate issue that needs solving is the discovery of configuration. How
do dependent applications programmatically create jobs in the controller? We
can use the Jenkins <code>job-dsl</code> plugin for this. The mechanism we’ll implement
allows packages to install individual Groovy files to a known location, which
will be picked up by our dpkg trigger and used to regenerate the configuration
for the Jenkins controller. The configuration fragment we need to generate from
the Groovy files will look something like <a href="https://github.com/jenkinsci/configuration-as-code-plugin/blob/master/demos/jobs/multibranch-github.yaml">this demo</a> from the
configuration-as-code project.</p>

<p>This requires some changes to our dpkg hook to generate a single configuration
fragment file, called <code>jobs.yaml</code>, containing the <code>job-dsl</code> scripts from the
individual Groovy files:</p>

<div class="language-patch highlighter-tree-sitter"><pre><code class="highlight"><span class="gh">diff --git a/debian/twardyece-jenkins.postinst b/debian/twardyece-jenkins.postinst
index ce18fbb..0cb3675 100644
</span><span class="gd">--- a/debian/twardyece-jenkins.postinst
</span><span class="gi">+++ b/debian/twardyece-jenkins.postinst
</span><span class="p">@@ -10,8 +10,23 @@</span> update_config() {
     rm -rf $LOCAL_STATE_DIR/casc
     mkdir -p $LOCAL_STATE_DIR/casc
 
<span class="gi">+    # Copy all YAML files directly to the config directory
+    cp $DATA_DIR/*.yaml $LOCAL_STATE_DIR/casc
+
+    # Emit all groovy files into a YAML fragment as job-dsl scripts
+    local IFS_SAVE=$IFS
+    IFS=$'\n'
+    printf '%s\n' "jobs:" &gt; $LOCAL_STATE_DIR/casc/jobs.yaml
+    for f in $DATA_DIR/*.groovy; do
+           printf '  - script: &gt;\n' &gt;&gt; $LOCAL_STATE_DIR/casc/jobs.yaml
+           for line in $(cat $f); do
+                   printf '      %s\n' "$line" &gt;&gt; $LOCAL_STATE_DIR/casc/jobs.yaml
+           done
+    done
+    IFS=$IFS_SAVE
+
</span>     # Create a volume image from the configuration files
<span class="gd">-    mksquashfs $DATA_DIR $LOCAL_STATE_DIR/$IMAGE -noappend
</span><span class="gi">+    mksquashfs $LOCAL_STATE_DIR/casc $LOCAL_STATE_DIR/$IMAGE -noappend
</span> }
 
 case "$1" in
</code></pre></div>
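
<p>To make the shape of the generated fragment easier to see, here is a small
standalone sketch of the same idea that writes to standard output instead of a
file. The function name <code>emit_jobs_yaml</code> is hypothetical, and the
sketch deliberately differs from the patch in two ways: it indents each script
with <code>sed</code> rather than a loop, and it uses a literal block scalar
(<code>|</code>), which keeps every line break in the script exactly as written.</p>

<pre><code># Hypothetical helper: print a "jobs:" fragment containing one job-dsl
# script entry for every .groovy file found in the directory given as $1.
emit_jobs_yaml() {
    printf '%s\n' 'jobs:'
    for f in "$1"/*.groovy; do
        # Skip the unexpanded pattern when the directory has no Groovy files
        [ -e "$f" ] || continue
        printf '  - script: |\n'
        # Indent the script body so that it nests under the "script" key
        sed 's/^/      /' "$f"
    done
}
</code></pre>

<p>Calling <code>emit_jobs_yaml /usr/share/twardyece-jenkins</code> would print
one <code>script</code> entry per installed Groovy file, which can be inspected
before it is ever packed into the squashfs image.</p>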

<p>Now, we can create the job that builds this blog as its own Groovy file and
install it to <code>/usr/share/twardyece-jenkins</code> to be automatically picked up at
package installation time:</p>

<pre><code>multibranchPipelineJob('Blog') {
  branchSources {
    git {
      id('blog-trunk')
      remote('https://github.com/AmateurECE/twardyece-blog.git')
      includes('trunk')
    }
  }
}
</code></pre>

<h1 id="agent-dependency-management">Agent Dependency Management</h1>

<p>In the old setup, I had all the dependencies necessary to build my Jenkins jobs
installed in the image that the agent was running. However, this creates
another point of tight coupling between my jobs and my agents. At the company
where I work, our agents launch Docker containers that contain our build
environments when a new job is triggered. I tried multiple scenarios to achieve
a similar kind of thing:</p>

<ol>
  <li>The Jenkins Kubernetes plugin can dynamically launch agents, but for reasons
already stated, this wasn’t an option for me.</li>
  <li>The docker-workflow plugin allows the Jenkinsfile to specify a container
image in which the build should be run. However, this doesn’t work when
using docker-in-docker with Podman.</li>
  <li>The docker-plugin allows the Jenkins controller to spin up agents from
container images dynamically as jobs are run, but it seems to struggle when
using docker-in-docker when the controller is running in a container.
Additionally, it uses docker-java, which requires the Docker socket to be
mounted, and the controller container to be run with <code>--privileged</code>, and
since this controller is open to the wide internet, that’s not something I
was willing to consider.</li>
</ol>

<p>In the end, I decided that all Jenkins jobs would need to use Nix flakes to
manage their dependencies. I created an agent image that had Nix installed for
the <code>jenkins</code> user, and a wrapper that allows executing a bash script within a
devShell derived from a Nix flake. <a href="https://github.com/AmateurECE/twardyece-services/blob/77d95e9670864aefba1c89de13d5474874aa3bbb/jenkins/flake-run.sh">The wrapper</a> needed to include some
annoying workarounds, because apparently Jenkins does not set the <code>USER</code>
environment variable for a job (well, it sets the <code>user</code> environment variable,
which is obviously not the same). From a Jenkinsfile, I can conveniently use
this wrapper in the shebang:</p>

<pre><code>pipeline {
  // ...
  stages {
    stage('Build') {
      steps {
        // ...
        sh '''#!/usr/bin/flake-run
        bundle install
        bundle exec jekyll build
        '''
      }
    }
  }
}
</code></pre>
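
<p>The linked wrapper is the real implementation. The following is only a sketch
of the general shape such a wrapper could take, assuming that Nix flakes are
enabled for the <code>jenkins</code> user and that the job’s working directory
contains a flake with a default devShell. The function name
<code>flake_run</code> and the <code>NIX</code> override are hypothetical and
exist only to make the sketch easy to try out.</p>

<pre><code># Hypothetical sketch of a shebang wrapper. When a script starts with
# "#!/usr/bin/flake-run", the kernel passes the script path as $1.
flake_run() {
    script=$1
    # Jenkins leaves USER unset, so fall back to the effective user name
    USER=${USER:-$(id -un)}
    export USER
    # Run the script with bash inside the devShell of the flake in the
    # current directory. Set NIX=echo to preview the command instead.
    "${NIX:-nix}" develop --command bash "$script"
}
</code></pre>

<p>A wrapper built this way would end with <code>flake_run "$1"</code>. Because
bash treats the shebang line as a comment, the script body runs unchanged once
the devShell environment is in place.</p>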

<h1 id="conclusion">Conclusion</h1>

<p>That’s about as close as we can get to a fully immutable Jenkins deployment
without moving to Kubernetes. Now, there are only two volumes that store actual
data: one to store the secrets for the controller, and one to store the secrets
for the inbound agent. In reality, this took me about a week to accomplish in
my free time–spending a few moments here and there. Hopefully this will be
helpful for someone else who decides to traverse a similar path!</p>]]></content><author><name></name></author><category term="server" /><summary type="html"><![CDATA[In Search of Immutable Deployments]]></summary></entry></feed>