forge

hunting concurrency bugs in vinext

2026-03-13

i wanted to contribute to something bigger than my own projects. vinext caught my eye - it's cloudflare's experiment in reimplementing the entire next.js api surface on top of vite. almost every line was written by ai, and they're very open about that. the codebase is genuinely interesting to read through.

the issue i picked up was #478 - "phase 2/2: finalize als rollout with parity tests, cleanup, and docs". the previous pr (#450) had just landed a massive refactor that consolidated 5-6 separate AsyncLocalStorage instances into a single unified request context. my job was to prove it actually works under concurrent load, document the architecture, and clean up the leftovers.

what is asynclocalstorage and why does it matter here

quick context if you haven't worked with this before. AsyncLocalStorage (als) is a node.js api that lets you store data that follows an async call chain - think of it like thread-local storage but for javascript's single-threaded async model. when request a comes in and kicks off some async work, and request b arrives before a finishes, als makes sure each request sees its own data.

in a framework like next.js (or vinext), every request needs its own headers, cookies, navigation state, router context, cache tags, etc. without als, concurrent requests on something like cloudflare workers would stomp on each other's state. request a's title shows up in request b's response. that kind of thing.
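
a minimal sketch of that isolation, assuming node's AsyncLocalStorage from node:async_hooks. the RequestState shape and the handle function are made up for illustration and are not vinext's actual request context:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// hypothetical per-request state; vinext's real context holds much more
interface RequestState {
  title: string;
}

const requestContext = new AsyncLocalStorage<RequestState>();

// simulates a handler that yields to the event loop before reading its state
async function handle(id: number): Promise<string> {
  return requestContext.run({ title: `req-${id}` }, async () => {
    // other "requests" get a chance to run while this one waits
    await new Promise<void>((resolve) => setTimeout(resolve, 5 - id));
    // getStore() returns the store of the run() call this async chain started in
    return requestContext.getStore()!.title;
  });
}

// starts three overlapping requests and collects what each one saw
async function demo(): Promise<string[]> {
  return Promise.all([0, 1, 2].map((id) => handle(id)));
}
```

even though the three handlers overlap and finish in reverse order, each one reads back the title it was started with.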

the setup

vinext is a pnpm monorepo. the key commands:

pnpm test tests/some-file.test.ts   # targeted tests (always do this, not the full suite)
pnpm run typecheck                    # tsgo
pnpm run lint                         # oxlint
pnpm run fmt:check                    # oxfmt

they use conventional commits (fix:, feat:, test:, docs:), and every pr gets reviewed by an ai agent called bigbonk (claude opus with max thinking). their bias is towards merging, which is refreshing.

writing the concurrency tests

the first thing i needed was fixture pages that expose request-scoped state in the html output. i created two pages in the test fixture:

concurrent-head.tsx - takes a ?id=N query param via getServerSideProps and sets a <title> and <meta> tag with that id:

export default function ConcurrentHeadPage({ reqId }: Props) {
  return (
    <div>
      <Head>
        <title>{`req-${reqId}`}</title>
        {/* <meta> tag carrying the same id */}
      </Head>
      {/* body markup that prints req-id={reqId} */}
    </div>
  );
}

concurrent-router.tsx - echoes back the ssr pathname and query from getServerSideProps plus useRouter().
the test fires 15 concurrent requests at each page and verifies every response contains only its own data. if head state leaks between requests, request 0's response would have request 14's title.
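
the shape of that check can be sketched roughly like this. fetchPage stands in for a real http request to the fixture page, and findLeaks, leakyPage and isolatedPage are hypothetical names, not the helpers used in the actual test file:

```typescript
// stand-in for "GET /concurrent-head?id=N and return the html"
type FetchPage = (id: number) => Promise<string>;

// fires `count` requests at once and returns the ids whose <title> was not their own
async function findLeaks(count: number, fetchPage: FetchPage): Promise<number[]> {
  const ids = Array.from({ length: count }, (_, i) => i);
  // Promise.all starts every request before any of them is awaited
  const bodies = await Promise.all(ids.map((id) => fetchPage(id)));
  return ids.filter((id) => {
    const match = bodies[id].match(/<title>(.*?)<\/title>/);
    return match === null || match[1] !== `req-${id}`;
  });
}

// a deliberately broken renderer: every request writes to one shared variable
let sharedTitle = "";
const leakyPage: FetchPage = async (id) => {
  sharedTitle = `req-${id}`;
  await Promise.resolve(); // yield so the other requests can overwrite sharedTitle
  return `<title>${sharedTitle}</title>`;
};

// a correct renderer: state lives in a local variable per call
const isolatedPage: FetchPage = async (id) => {
  const title = `req-${id}`;
  await Promise.resolve();
  return `<title>${title}</title>`;
};
```

with 15 requests, the leaky renderer reports ids 0 through 13 as leaked (they all end up with the last request's title), while the isolated one reports none.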
small gotcha i hit along the way - <title>req-{reqId}</title> in jsx produces children as an array ["req-", "42"], not a single string. vinext's head shim only serializes string children for title tags, so the title rendered empty. switching to {`req-${reqId}`} produces a single string child and works fine. not a bug i introduced - it's pre-existing in the head shim - but it tripped me up for a bit.
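
to make the difference concrete, here is a small stand-in for how react's createElement stores children (a single child is kept as-is, several become an array). collectChildren and titleText are illustrative helpers, not react's or vinext's code:

```typescript
// mimics createElement's handling of children: one child stays bare, many become an array
function collectChildren(...children: unknown[]): unknown {
  return children.length === 1 ? children[0] : children;
}

const reqId = 42;

// <title>req-{reqId}</title> passes two children: the text and the expression
const splitChildren = collectChildren("req-", reqId);

// <title>{`req-${reqId}`}</title> passes one child: the finished string
const joinedChildren = collectChildren(`req-${reqId}`);

// a serializer that only accepts string children, like the head shim described above
function titleText(children: unknown): string {
  return typeof children === "string" ? children : "";
}
```

titleText(splitChildren) comes back empty because it receives an array, while titleText(joinedChildren) returns "req-42".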
finding a real bug
the router isolation test passed immediately. the head isolation test did not.
expected 'req-13' to be 'req-0'

the body content was correct (request 0 showed req-id=0), but the <title> showed req-13 - another request's head state had leaked into this response. this is exactly the kind of bug the issue asked me to verify.

the root cause

this one took some digging. the architecture has a registration pattern - head.ts (the next/head shim) has module-level defaults for collecting ssr head children:

let _ssrHeadChildren: React.ReactNode[] = [];
let _getSSRHeadChildren = (): React.ReactNode[] => _ssrHeadChildren;

and head-state.ts registers als-backed replacements:

_registerHeadStateAccessors({
  getSSRHeadChildren(): React.ReactNode[] {
    return _getState().ssrHeadChildren; // reads from per-request ALS scope
  },
  // ...
});

the problem: vite's dev server has separate module graphs for different environments. the dev-server imports head-state.ts as a static import (node context), which registers the als accessors on the node context's copy of head.ts. but the Head react component runs during ssr rendering in vite's ssr module graph - a completely different module instance. that ssr instance of head.ts never had _registerHeadStateAccessors called on it, so it was still using the shared module-level _ssrHeadChildren array.

every concurrent request's Head component was pushing elements into the same array.

this is the kind of thing that doesn't surface in serial tests. you need real concurrent load to catch it.
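
the failure mode can be modelled without vite at all. in this sketch each call to loadHeadModule stands in for one module graph evaluating its own copy of head.ts; the function and variable names are hypothetical and only mirror the registration pattern described above:

```typescript
// each call models one module graph loading a separate copy of the head shim
function loadHeadModule() {
  const sharedChildren: string[] = []; // module-level default
  let getChildren = (): string[] => sharedChildren;
  return {
    // swaps the default accessor for a request-scoped one
    registerAccessors(accessor: () => string[]): void {
      getChildren = accessor;
    },
    push(child: string): void {
      getChildren().push(child);
    },
    // true while this copy still writes to the shared module-level array
    usesSharedArray: (): boolean => getChildren() === sharedChildren,
  };
}

const nodeGraphCopy = loadHeadModule(); // the copy the dev server imports
const ssrGraphCopy = loadHeadModule(); // the copy the rendered page imports

// registration only reaches the copy it was called on
const perRequestChildren: string[] = [];
nodeGraphCopy.registerAccessors(() => perRequestChildren);
```

after registration, nodeGraphCopy.usesSharedArray() is false but ssrGraphCopy.usesSharedArray() is still true, which is the state the leaking Head component was in.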

the fix

two lines:

await server.ssrLoadModule("vinext/head-state");
await server.ssrLoadModule("vinext/router-state");

added right after the unified request context is created, before any rendering happens. this loads the state modules in vite's ssr module graph, which triggers the accessor registration on the correct module instance. the same pattern was already used for vinext/i18n-state - just nobody had done it for head and router state.

the prod server doesn't have this problem because everything gets compiled into one bundle where the imports resolve to the same module instance.

i also checked the other server files for parity (agents.md is very clear about this - if you touch one server file, check all four). the app router rsc entry doesn't use pages router head/router state, and the generated prod entry already has the correct imports. the bug was dev-only.

prod concurrency tests

the prod build is a different beast. in dev, vite has separate module graphs for different environments (node vs ssr). in prod, everything gets compiled into a single bundle - so the module-level singleton problem that caused the head leak in dev doesn't exist.

but we still need to prove that getServerSideProps data is isolated between concurrent requests. the test builds the fixture to a temp directory, starts the prod server on a random port, and fires the same 15 concurrent requests.

setting this up was its own adventure. the pages-basic fixture has an alias-test.tsx page with a @/components/heavy import that breaks when building outside the original directory structure. filtered it out during the temp dir copy - it's not relevant to what we're testing.

interesting discovery: the prod server has some known limitations compared to dev. <Head> component children don't get injected into the html <head> section, and useRouter().pathname returns / instead of the actual route during ssr. these aren't concurrency bugs - they're consistent behavior regardless of load. so the prod tests focus on what matters: verifying getServerSideProps data and ssr props don't leak between requests.

all four tests pass - head isolation in dev, router isolation in dev, data isolation in prod (both pages).

things i learned

vite's multi-environment module graphs mean you can have the same module loaded multiple times with completely different state. this is by design for rsc/ssr/client separation, but it creates subtle bugs when server-side code assumes module singletons.
AsyncLocalStorage works great for request isolation, but only if the als-backed accessors are registered in every module instance that needs them.
jsx <title>text-{variable}</title> produces an array of children, not a string. {`text-${variable}`} produces a single string. matters when the consumer only handles string children.
writing concurrency tests that actually catch isolation bugs requires real parallel Promise.all with enough requests to trigger interleaving. serial tests will never catch these.



svdex: svd image compression
2026-03-09
i wanted to understand how linear algebra compresses images. not the "read a wikipedia article" kind of understanding - the "build it from scratch and watch it work" kind. so i wrote svdex, a little rust cli that compresses images using singular value decomposition.
here's what i learned along the way.
the big idea
every matrix can be broken into three pieces. this is svd:
$$A = U \Sigma V^T$$
$U$ and $V^T$ are orthogonal matrices (rotations, possibly with a reflection). $\Sigma$ is a diagonal matrix of "singular values" - numbers that tell you how important each component is. they come sorted from biggest to smallest: $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$.
the insight that makes compression possible: the first few singular values are usually way bigger than the rest. most of the information lives in a small number of components.
so what if we just... kept the top $k$ and threw the rest away?
$$A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^T$$
turns out this is provably optimal. the eckart-young theorem says this rank-$k$ approximation is the best you can do - no other method using $k$ components will get you closer to the original. that's not a vague claim, it's a mathematical guarantee.
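stated a bit more precisely, with "closer" measured in the frobenius norm (the square root of the summed squared entries), the error of the truncation is exactly the energy in the discarded singular values, and no matrix $B$ of rank at most $k$ does better:
$$\min_{\operatorname{rank}(B) \leq k} \|A - B\|_F = \|A - A_k\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}$$
the same $A_k$ is also optimal in the spectral norm, where the error is $\sigma_{k+1}$.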
i got a lot of my initial intuition from zerobone's post on svd image compression, which does a great job of connecting the math to what's actually happening with pixels.
so how does this compress an image?
an image is just three matrices stacked on top of each other - one for red, one for green, one for blue. each entry is a pixel value from 0 to 255.
the plan is simple:

pull apart the rgb channels
run svd on each one
keep only the top $k$ singular values
put it back together

in code, the truncation step looks like this:
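use ndarray::{s, Array2};

// rank-k reconstruction of one channel: U_k * diag(s_k) * V_k^T
pub fn low_rank_approx(svd: &SvdResult, k: usize) -> Array2<f64> {
    // never ask for more components than the decomposition has
    let k = k.min(svd.s.len());

    let u_k = svd.u.slice(s![.., ..k]).to_owned(); // first k columns of U
    let s_k = &svd.s.slice(s![..k]); // top k singular values
    let vt_k = svd.vt.slice(s![..k, ..]).to_owned(); // first k rows of V^T

    // broadcasting scales column i of U_k by sigma_i, i.e. U_k * diag(s_k)
    let u_scaled = &u_k * s_k;
    u_scaled.dot(&vt_k)
}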

three slices and a dot product. that's the whole thing.
one gotcha - the reconstructed values can land outside $[0, 255]$. if you don't clamp them before saving, you get weird wrapping artifacts. learned that one the hard way.
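a small illustration of the difference between clamping and wrapping. this uses typescript typed arrays purely to show the effect; how svdex itself converts values to bytes is not shown in this post, and toPixel is a hypothetical helper:

```typescript
// round and clamp a reconstructed channel value into the valid 8-bit range
function toPixel(value: number): number {
  return Math.min(255, Math.max(0, Math.round(value)));
}

// Uint8Array stores values modulo 256 instead of clamping them
const wrapped = new Uint8Array([300, -5]); // holds 44 and 251

// clamping first keeps bright pixels bright and dark pixels dark
const clamped = [300, -5].map(toPixel); // [255, 0]
```

a slightly-too-bright 300 wrapping around to 44 is what turns highlights into dark speckles.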
how much space do we actually save?
original storage: $3 \cdot h \cdot w$ values (three full matrices).
compressed storage per channel: $h \times k$ for the truncated $U$, $k$ for the singular values, $k \times w$ for the truncated $V^T$. so:
$$\text{ratio} = \frac{3hw}{3k(h + 1 + w)}$$
for a $1200 \times 797$ image at rank 50, that's about $9.6\times$ compression. not bad for some matrix math.
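the ratio formula above is easy to check numerically. a minimal sketch, with compressionRatio as an illustrative name:

```typescript
// uncompressed values divided by values kept at rank k, for three channels
function compressionRatio(h: number, w: number, k: number): number {
  const original = 3 * h * w;
  const compressed = 3 * k * (h + 1 + w);
  return original / compressed;
}

const atRank50 = compressionRatio(797, 1200, 50); // about 9.57
const atRank1 = compressionRatio(797, 1200, 1); // about 478.68
```

the formula is symmetric in $h$ and $w$, so it doesn't matter which of 1200 and 797 is the height.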
the numbers
i ran experiments across a bunch of ranks to see what actually happens:
rank   ratio     mse       psnr
1      478.68x   2937.52   13.45 dB
5      95.74x    1265.33   17.11 dB
10     47.87x    878.57    18.69 dB
20     23.93x    620.23    20.21 dB
50     9.57x     361.29    22.55 dB
100    4.79x     212.83    24.85 dB
200    2.39x     92.52     28.47 dB

(mse is mean squared error - lower is better. psnr is peak signal-to-noise ratio in decibels - higher is better.)
$$\text{MSE} = \frac{1}{N} \sum_{i} (x_i - \hat{x}_i)^2 \qquad \text{PSNR} = 10 \cdot \log_{10}\left(\frac{255^2}{\text{MSE}}\right)$$
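both metrics are a few lines each. a sketch over flat arrays of pixel values, with mse and psnr as illustrative names rather than svdex's own functions:

```typescript
// mean squared error between two equally sized arrays of pixel values
function mse(original: number[], reconstructed: number[]): number {
  if (original.length === 0 || original.length !== reconstructed.length) {
    throw new Error("arrays must be non-empty and the same length");
  }
  let sum = 0;
  for (let i = 0; i < original.length; i++) {
    const diff = original[i] - reconstructed[i];
    sum += diff * diff;
  }
  return sum / original.length;
}

// peak signal-to-noise ratio in dB for 8-bit data (peak value 255)
function psnr(mseValue: number, peak: number = 255): number {
  return 10 * Math.log10((peak * peak) / mseValue);
}
```

plugging the rank-50 row's mse of 361.29 into psnr gives about 22.55 dB, matching the table.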
some things jumped out at me.
the first few components do most of the heavy lifting. going from rank 1 to rank 20 cuts the error by almost $5\times$. going from rank 100 to rank 200 only halves it. the early gains are massive, then you hit diminishing returns fast.
there's no magic number. i kept expecting to find some rank where the image suddenly "clicks" into looking good. that doesn't happen. quality improves smoothly - you just pick where on the curve you're comfortable.
below 20 dB it looks rough. around 25 dB it starts looking fine for most purposes. the rank 50-100 range is the sweet spot for this image.
the decay curve tells the whole story
svdex plots the singular values for all three channels. the shape is always the same - a steep initial drop, then a long flat tail.
the red channel's first singular value was ~79,000. by the 10th, it dropped to ~7,400. blue started highest at ~128,000 (lots of sky in the test image). by the time you're past the first hundred or so values, everything is close to zero.
this decay is the entire reason svd compression works. if singular values were spread out evenly, throwing any of them away would hurt equally and compression would be pointless. the steep drop means most of them barely matter.
one implementation detail that matters
when running experiments across multiple ranks, compute svd once and reuse it:
pub fn compress_with_svds(svds: &[SvdResult; 3], k: usize) -> [Array2<f64>; 3] {
    [
        low_rank_approx(&svds[0], k),
        low_rank_approx(&svds[1], k),
        low_rank_approx(&svds[2], k),
    ]
}

the svd factorization is $O(\min(m, n) \cdot mn)$ - that's the expensive part. truncation is just slicing arrays. computing svd nine times instead of three because you forgot to cache it is a mistake you only make once (i made it once).
what stuck with me
the eckart-young theorem went from "cool abstract result" to something i can see. you look at a rank-50 compressed image and know - mathematically, provably - that no other rank-50 approximation has lower squared error. that's wild.
and the singular value decay curve is the single most informative thing about any matrix you're trying to compress. everything about the quality-size tradeoff is encoded in that shape.
the source code is at github.com/aidantrabs/svdex.



saba: production vpc infrastructure with terraform
2026-03-09
the problem
you need to run workloads on aws. you could click through the console and manually create a vpc, subnets, route tables, security groups - but then what? you can't reproduce it, you can't version it, and you definitely can't tear it down and rebuild it with confidence.
terraform solves this. you describe your infrastructure as code, and terraform figures out how to make reality match your description. saba is a terraform project that provisions a production-ready, multi-az vpc on aws - the kind of network architecture you'd actually use in a real environment.
terraform fundamentals
before building anything, three concepts matter:
state is how terraform tracks what it manages. it maps your config to real aws resources. lose the state file and terraform has no idea those resources belong to it - you'd have to import them manually or risk creating duplicates. this is why remote state backends exist.
providers are plugins that teach terraform how to talk to specific apis. terraform core is just an engine - it knows nothing about aws, azure, or anything else until you configure a provider.
resources vs data sources - resources are things terraform manages (create, update, delete). data sources are read-only lookups of things that already exist outside your config.
networking from first principles
cidr blocks and ip addressing
a cidr block defines a range of ip addresses. 10.0.0.0/16 means the first 16 bits are the network prefix, leaving 16 bits for host addresses - that's 65,536 addresses.
a vpc needs a cidr block to define what ip range is available. subnets carve that range into smaller pieces:
VPC: 10.0.0.0/16 (65,536 addresses)
├── public subnet a:  10.0.1.0/24  (256 addresses)
├── public subnet b:  10.0.2.0/24  (256 addresses)
├── private subnet a: 10.0.10.0/24 (256 addresses)
└── private subnet b: 10.0.20.0/24 (256 addresses)
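
the address counts follow directly from the prefix length. a small sketch for ipv4 only, with addressCount, ipToInt and contains as illustrative helpers (note that aws reserves a few addresses in every subnet, so the usable count is slightly lower):

```typescript
// number of addresses in an ipv4 cidr block: 2^(32 - prefix)
function addressCount(cidr: string): number {
  const prefix = Number(cidr.split("/")[1]);
  if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`invalid cidr: ${cidr}`);
  }
  return 2 ** (32 - prefix);
}

// dotted quad to a plain number, e.g. "10.0.1.0" -> 167772416
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// true when the inner block lies entirely inside the outer block
function contains(outer: string, inner: string): boolean {
  const outerSize = addressCount(outer);
  const innerSize = addressCount(inner);
  const outerStart = Math.floor(ipToInt(outer.split("/")[0]) / outerSize) * outerSize;
  const innerStart = ipToInt(inner.split("/")[0]);
  return innerStart >= outerStart && innerStart + innerSize <= outerStart + outerSize;
}
```

addressCount("10.0.0.0/16") is 65536 and addressCount("10.0.1.0/24") is 256, and contains confirms that each of the four subnets above sits inside the vpc range.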

public vs private subnets
the distinction is purely about routing:

a public subnet has a route table that points 0.0.0.0/0 to an internet gateway. resources get public ips and can communicate with the internet bidirectionally.
a private subnet has no route to an internet gateway. resources can't be reached from the internet.

anything that doesn't need to face the public internet belongs in a private subnet - application servers, databases, internal services. this is the most basic form of network isolation.
routing - igw vs nat gateway
public subnets route to an internet gateway (igw) so the internet can reach them. private subnets route to a nat gateway so they can reach the internet but the internet can't reach them.
the nat gateway translates private ips to its own public ip and only allows responses to outbound requests back in. the traffic flow looks like:
public:  Internet ⇄ IGW ⇄ EC2 (public IP)
private: Private EC2 → NAT → IGW → Internet
         Internet ✖→ Private EC2

private resources still need outbound internet access - pulling updates, container images, calling external apis. the nat gateway enables this without exposing them to inbound traffic.
a public nat gateway needs an elastic ip (eip), which gives it a static public address. managing the eip as its own resource means the address can outlive the nat gateway; allocating a fresh one on every rebuild would change the ip and break any ip-based allowlists.
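since the public/private distinction comes down to where the default route points, it can be written as a tiny classifier. this is a simplified model with made-up types (Route, classifySubnet), not the aws api or terraform's schema:

```typescript
// hypothetical, simplified view of a route table entry
type RouteTarget = "local" | "internet-gateway" | "nat-gateway";

interface Route {
  destination: string; // cidr block, e.g. "0.0.0.0/0"
  target: RouteTarget;
}

type SubnetKind = "public" | "private-with-egress" | "isolated";

// a subnet's kind is decided by the target of its default route
function classifySubnet(routes: Route[]): SubnetKind {
  const defaultRoute = routes.find((route) => route.destination === "0.0.0.0/0");
  if (defaultRoute === undefined) return "isolated";
  if (defaultRoute.target === "internet-gateway") return "public";
  if (defaultRoute.target === "nat-gateway") return "private-with-egress";
  return "isolated";
}

const publicRoutes: Route[] = [
  { destination: "10.0.0.0/16", target: "local" },
  { destination: "0.0.0.0/0", target: "internet-gateway" },
];
const privateRoutes: Route[] = [
  { destination: "10.0.0.0/16", target: "local" },
  { destination: "0.0.0.0/0", target: "nat-gateway" },
];
```

the two route tables differ in a single entry, which is the whole difference between the public and private subnets in this project.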
building the vpc module
the networking module creates everything: vpc, internet gateway, four subnets across two availability zones, elastic ips, nat gateways, and all the route tables and associations.
the root module orchestrates the child modules:
module "networking" {
    source      = "./modules/networking"
    environment = var.environment
    vpc_cidr    = var.vpc_cidr
    az_a        = var.az_a
    az_b        = var.az_b
}

module "bastion" {
    source           = "./modules/bastion"
    environment      = var.environment
    vpc_id           = module.networking.vpc_id
    subnet_id        = module.networking.public_subnet_a_id
    instance_type    = var.instance_type
    public_key_path  = var.public_key_path
    allowed_ssh_cidr = var.allowed_ssh_cidr
}

notice how the bastion module references module.networking.vpc_id and module.networking.public_subnet_a_id. terraform builds a dependency graph from these references and handles sequencing automatically - vpc first, then subnets, then anything that depends on them.
multi-az for high availability
subnets are placed in two availability zones (us-east-1a and us-east-1b). if one az has an outage, resources in the other az continue running. each az gets its own nat gateway so private subnets aren't sharing a single point of failure:
┌─────────────────────┐    ┌─────────────────────┐
│   us-east-1a        │    │   us-east-1b        │
│                     │    │                     │
│  ┌───────────────┐  │    │  ┌───────────────┐  │
│  │  public_a     │  │    │  │  public_b     │  │
│  │  NAT-A + EIP  │  │    │  │  NAT-B + EIP  │  │
│  └───────────────┘  │    │  └───────────────┘  │
│         ▲           │    │         ▲           │
│  ┌───────────────┐  │    │  ┌───────────────┐  │
│  │  private_a    │  │    │  │  private_b    │  │
│  │  routes here  │  │    │  │  routes here  │  │
│  └───────────────┘  │    │  └───────────────┘  │
└─────────────────────┘    └─────────────────────┘

the bastion host
a vpc with no compute is just an empty network. the bastion host is an ec2 instance in the public subnet that acts as a secure jump point to reach private resources.
security groups
security groups are virtual firewalls attached to individual resources. they're stateful - if you allow inbound traffic on a port, the return traffic is automatically allowed without an explicit outbound rule.
this differs from network acls (nacls), which operate at the subnet level, support both allow and deny rules, and are stateless.
the bastion's security group allows inbound ssh (port 22) and all outbound traffic:
resource "aws_security_group" "bastion" {
    name        = "${var.environment}-bastion-sg"
    vpc_id      = var.vpc_id

    ingress {
        from_port   = 22
        to_port     = 22
        protocol    = "tcp"
        cidr_blocks = [var.allowed_ssh_cidr]
    }

    egress {
        from_port   = 0
        to_port     = 0
        protocol    = "-1"
        cidr_blocks = ["0.0.0.0/0"]
    }
}

in production, you'd restrict allowed_ssh_cidr to your ip rather than 0.0.0.0/0.
dynamic ami lookup
rather than hardcoding an ami id (which varies by region and changes over time), the bastion uses a data source to find the latest amazon linux 2023 image dynamically:
data "aws_ami" "bastion" {
    most_recent = true
    owners      = [var.ami_owner]

    filter {
        name   = "name"
        values = [var.ami_name_filter]
    }
}

this also makes it easy to swap operating systems by overriding the variables:
# amazon linux 2023 (default)
terraform apply

# ubuntu 24.04
terraform apply \
    -var="ami_name_filter=ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-*" \
    -var="ami_owner=099720109477"

connecting
ssh into the bastion, then jump to private resources:
# direct connection to bastion
ssh -i ~/.ssh/bastion-key ec2-user@$(terraform output -raw bastion_public_ip)

# jump through bastion to a private instance
ssh -i ~/.ssh/bastion-key -J ec2-user@<bastion-public-ip> ec2-user@<private-ip>

the terraform lifecycle
running plan, apply, and destroy is the full lifecycle. a few things stood out while working through it.
dependency resolution is automatic. terraform inferred that the subnet depends on the vpc from vpc_id = aws_vpc.main.id. on create, it builds the vpc first. on destroy, it tears down the subnet first. you never specify ordering manually.
(known after apply) values are things aws generates at creation time - arns, ids, availability zones. they can't be known until the resource exists, so terraform marks them as pending.
state is everything. after apply, terraform writes a terraform.tfstate file. this is how it knows what to update or destroy. the state file is the single source of truth for what terraform manages.
credentials matter early. my first terraform plan failed because i was logged into the terraform cli but not aws. the error was clear enough - no valid credential sources found. logging into aws fixed it immediately.
what i learned
this project was about understanding vpc networking from first principles rather than clicking through the aws console. the key takeaways:

network isolation is just routing. public vs private is determined by whether a route table points to an igw or a nat gateway
nat gateways are the bridge that lets private resources reach the internet without being reachable from it
multi-az is about eliminating single points of failure, including having one nat gateway per az
security groups are stateful firewalls at the resource level. nacls are stateless firewalls at the subnet level
terraform's dependency graph handles sequencing automatically from resource references
state files are critical infrastructure - treat them accordingly

the source code is at github.com/aidantrabs/saba.



designing a feature flag control plane
2026-03-09
i've been thinking about feature flags a lot lately. not the "just use an if statement" kind - the kind where you need to roll out a payment flow to 5% of users in canada on the premium plan, watch it for a week, then crank it to 50% without touching a deploy pipeline. the kind where someone on your team can flip a kill switch at 2am when something goes sideways.
so i'm building switchboard - a feature flag control plane. this post is the research and design thinking before i write a single line of code.
why build one
the obvious question. launchdarkly exists. unleash exists. flagsmith exists. why build another one?
partly because i want to understand the problem space deeply - the same way you don't really understand databases until you've tried to write one. partly because most feature flag systems are either too simple (a json file you check into your repo and pray) or too complex (a whole platform with pricing tiers and a sales team). i want to find the middle ground: something an engineering team could self-host, understand completely, and extend when they need to.
the target is internal platform teams. the kind of team that runs a handful of microservices and wants centralized flag management without sending their evaluation data to a third-party saas. the kind of team where "we need to be able to run this air-gapped" is a real requirement and not just a checkbox on a compliance form.
the evaluation problem
i started by researching how flag evaluation actually works under the hood. it sounds simple - "is this flag on?" - until you start layering requirements.
here's the evaluation order i've landed on after reading through how launchdarkly, unleash, and openfeature approach it:

if the flag is globally disabled, return the default variant. easy.
evaluate targeting rules in priority order. each rule has conditions (AND logic) and a served variant. first match wins.
if no rules match but there's a rollout percentage, hash the user into a deterministic bucket and check if they're under the threshold.
if nothing hits, return the default variant.
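
the four steps above can be sketched as one function. switchboard isn't written yet (and is planned in java), so every type and field name here is hypothetical, including the assumption that a lower priority number means the rule is checked earlier. the bucketing function is passed in so the sketch stays independent of any particular hash:

```typescript
interface EvalContext {
  userId: string;
  attributes: Record<string, string>;
}

interface Rule {
  priority: number; // assumption: lower number is evaluated first
  conditions: Record<string, string>; // all must match (AND logic)
  variant: string;
}

interface Flag {
  key: string;
  enabled: boolean;
  defaultVariant: string;
  rules: Rule[];
  rollout?: { percentage: number; variant: string };
}

// maps (flag, user) to a stable bucket in 0..99
type BucketOf = (flagKey: string, userId: string) => number;

function evaluate(flag: Flag, ctx: EvalContext, bucketOf: BucketOf): string {
  // 1. globally disabled: default variant
  if (!flag.enabled) return flag.defaultVariant;

  // 2. targeting rules in priority order, first match wins
  const ordered = [...flag.rules].sort((a, b) => a.priority - b.priority);
  for (const rule of ordered) {
    const matches = Object.keys(rule.conditions).every(
      (attr) => ctx.attributes[attr] === rule.conditions[attr],
    );
    if (matches) return rule.variant;
  }

  // 3. percentage rollout via a deterministic bucket
  if (flag.rollout && bucketOf(flag.key, ctx.userId) < flag.rollout.percentage) {
    return flag.rollout.variant;
  }

  // 4. nothing hit
  return flag.defaultVariant;
}
```

keeping the steps as early returns makes the precedence explicit: a matching rule always beats the rollout, and the kill switch beats everything.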

the rollout hashing is the part that tripped me up. you can't use Math.random() - the same user needs to get the same result every time for the same flag. otherwise you get someone who sees the new checkout flow, refreshes the page, and gets the old one. that's worse than not having flags at all.
the standard approach is deterministic hashing. take the flag key + user id, run it through something like murmurhash3, mod 100, check if the result lands under your rollout percentage:
hash("new-checkout" + "user-123") mod 100 = 37
rollout = 50%
37 < 50 → user gets the "on" variant

same inputs, same output, every time. and here's the property that took me a minute to appreciate: if you change the rollout from 50% to 60%, user-123 still gets "on" - you're only adding new users, never removing existing ones. that monotonicity matters a lot more than i initially realized.
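a runnable sketch of that bucketing. to stay dependency-free it swaps murmurhash3 for a small fnv-1a-style 32-bit hash over the string's utf-16 code units, and it joins the flag key and user id with a ":" separator; both choices are assumptions made for the example, as are the function names:

```typescript
// fnv-1a-style 32-bit hash over utf-16 code units (stand-in for murmurhash3)
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// stable bucket in 0..99 for a (flag, user) pair
function bucket(flagKey: string, userId: string): number {
  return fnv1a32(`${flagKey}:${userId}`) % 100;
}

// a user is in the rollout when their bucket is below the percentage
function inRollout(flagKey: string, userId: string, percentage: number): boolean {
  return bucket(flagKey, userId) < percentage;
}
```

because a user's bucket never changes, raising the percentage can only add users: anyone with inRollout(..., 50) true also has inRollout(..., 60) true. including the flag key in the hash input also means a user's bucket differs from flag to flag, so the same people aren't always first in line.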
hexagonal architecture (or: am i overengineering this?)
i'm going with spring boot for the server, but i want to try something i've been reading about for a while - hexagonal architecture. the idea is that your business logic lives in a core that knows nothing about the outside world. no spring annotations, no jpa, no kafka. just plain java.
domain/          ← pure java. no framework imports. ever.
application/
  port/
    input/       ← interfaces: what the system can do
    output/      ← interfaces: what the system needs
  service/       ← use case implementations
adapter/
  input/rest/    ← spring controllers
  output/
    persistence/ ← jpa (implements output ports)
    messaging/   ← kafka (implements output ports)
    cache/       ← redis (implements output ports)

the domain says "i need to save a flag" by defining an interface. a jpa adapter implements that interface. the domain never knows jpa exists. want to swap postgres for something else? write a new adapter, domain doesn't change.
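the port/adapter split is small enough to show end to end. the real project is planned in java with spring and jpa; this is only a language-neutral sketch of the shape, and FlagRepository, ToggleFlagService and InMemoryFlagRepository are hypothetical names:

```typescript
// domain: plain data, no framework imports
interface DomainFlag {
  key: string;
  enabled: boolean;
}

// output port: what the application needs from the outside world
interface FlagRepository {
  save(flag: DomainFlag): void;
  findByKey(key: string): DomainFlag | undefined;
}

// application service: depends only on the port, never on a concrete store
class ToggleFlagService {
  private readonly repository: FlagRepository;

  constructor(repository: FlagRepository) {
    this.repository = repository;
  }

  toggle(key: string): DomainFlag {
    const current = this.repository.findByKey(key);
    if (current === undefined) throw new Error(`unknown flag: ${key}`);
    const updated: DomainFlag = { key: current.key, enabled: !current.enabled };
    this.repository.save(updated);
    return updated;
  }
}

// adapter: one possible implementation of the port (a jpa adapter would be another)
class InMemoryFlagRepository implements FlagRepository {
  private readonly flags = new Map<string, DomainFlag>();

  save(flag: DomainFlag): void {
    this.flags.set(flag.key, flag);
  }

  findByKey(key: string): DomainFlag | undefined {
    return this.flags.get(key);
  }
}
```

swapping the in-memory adapter for a database-backed one changes nothing in ToggleFlagService, and the in-memory version doubles as a test fake.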
i'll be honest - part of me thinks this is overkill for a project i'm building from scratch. "just put @Entity on your domain class, it's fine." but i keep reading post-mortems from teams that started that way and regretted it two years later when their domain was welded to hibernate. i'd rather pay the cost of indirection now while the codebase is small and i can actually understand the boundaries.
the plan is to enforce this with archunit tests - if someone (me, inevitably) accidentally imports a spring annotation in the domain layer, the build fails. trust but verify, especially when you don't trust yourself.
real-time updates: kafka or bust (or maybe not)
when someone toggles a flag, every service consuming that flag needs to know. the naive approach is polling - every sdk hits the server every few seconds asking "anything change?" this works, it's simple, but it's wasteful and adds latency.
after looking at how other systems handle this, i'm planning to use kafka:

flag gets toggled → database updated
application service publishes a change event
kafka carries it to a topic per project/environment
sdks consuming that topic update their local cache immediately

but here's my concern: kafka is heavy. for a small team just trying out feature flags, "also run kafka" is a tough ask. so the sdk needs a fallback - polling on a configurable interval if kafka isn't available. and if the server itself is unreachable, use the last known cached state. graceful degradation at every level.
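one way to picture the degradation chain is a cache that tries its sources in order and keeps the last good snapshot when all of them fail. this deliberately flattens a push-based kafka consumer and a polling client into the same "give me a snapshot" shape, which is a simplification; FlagCache, Source and FlagSnapshot are hypothetical names:

```typescript
type FlagSnapshot = Record<string, boolean>;

// one way of obtaining a full snapshot (stream catch-up, http poll, ...)
type Source = () => Promise<FlagSnapshot>;

class FlagCache {
  private lastKnown: FlagSnapshot = {};

  // tries each source in order; on total failure keeps serving the last known state
  async refresh(sources: Source[]): Promise<FlagSnapshot> {
    for (const source of sources) {
      try {
        this.lastKnown = await source();
        return this.lastKnown;
      } catch (error) {
        // this source is unavailable, fall through to the next one
      }
    }
    return this.lastKnown;
  }

  // evaluation reads only the local snapshot, so it never hits the network
  isEnabled(key: string): boolean {
    return this.lastKnown[key] === true;
  }
}
```

the important property is in isEnabled: flag checks on the hot path read local memory, and the network is only involved in refresh.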
this is the part of the design i'm least confident about. distributed cache invalidation is one of those problems that sounds straightforward and then eats your weekend. but the alternative - a network round-trip for every flag evaluation in a hot code path - isn't acceptable.
the sdk layer cake
i want the sdk to work in three modes, layered on top of each other:
pure java sdk: zero spring dependencies. construct a client with a builder, pass in your api url, call isEnabled("flag-key", context). works in any jvm application - spring, dropwizard, plain old public static void main.
spring boot starter: wraps the sdk with auto-configuration. one dependency in your build.gradle.kts, two lines in application.yml, and you get a wired-up client bean, health indicators, metrics, the works. zero boilerplate.
openfeature provider: for teams that don't want to couple to a proprietary api. openfeature is an emerging standard for feature flag evaluation - you code against the standard interface, swap providers behind it. switchboard becomes just another provider you can plug in or rip out.
and then there's local mode. this one i feel strongly about. for development and testing, the sdk should load flags from a json file:
{
    "flags": {
        "new-checkout": { "enabled": true, "variant": "on" },
        "dark-mode": { "enabled": false, "variant": "off" }
    }
}

check it into your repo, use it in tests, run in ci with no external dependencies. if your feature flag system requires a running server to run unit tests, something has gone wrong.
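reading that file shape takes very little code. a sketch that parses the json above from a string (file reading is left out to keep it self-contained); parseLocalFlags and isEnabled are illustrative names, and treating unknown flags as disabled is an assumption:

```typescript
interface LocalFlag {
  enabled: boolean;
  variant: string;
}

interface LocalFlagFile {
  flags: Record<string, LocalFlag>;
}

// parses the local-mode json and checks the top-level shape
function parseLocalFlags(json: string): LocalFlagFile {
  const parsed = JSON.parse(json) as LocalFlagFile;
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("flag file must be a json object");
  }
  if (typeof parsed.flags !== "object" || parsed.flags === null) {
    throw new Error('flag file must have a "flags" object');
  }
  return parsed;
}

// assumption: a flag missing from the file counts as disabled
function isEnabled(file: LocalFlagFile, key: string): boolean {
  const flag = file.flags[key];
  return flag !== undefined && flag.enabled === true;
}
```

in a unit test this means constructing the client from a string or fixture file, with no server, broker or network involved.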
the data model question
this took me a few iterations on paper. the key realization: a flag's definition is project-scoped, but its state is per-environment.
"new-checkout" exists once as a concept - it has a key, a name, some variants. but it can be enabled in dev, 50% rolled out in staging, and disabled in production, each with completely different targeting rules. this is how teams actually work. you don't want a flag that's either globally on or globally off everywhere.
so the model splits into FeatureFlag (the definition) and FlagEnvironmentConfig (the per-environment state). targeting rules hang off the environment config, not the flag itself. this felt weird at first but the more i thought about it the more it made sense - you target differently in dev vs production.
i'm also planning for four flag types: release (ship a feature incrementally), experiment (a/b testing), operational (circuit breakers, maintenance mode), and permission (entitlement gating). they all evaluate the same way mechanically, but the type gives you metadata for lifecycle management - release flags should eventually be cleaned up, operational flags might live forever.
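a sketch of how that split might look as types. only the FeatureFlag and FlagEnvironmentConfig names and the four flag types come from the design above; every field name, TargetingRule, and configFor are guesses for illustration:

```typescript
type FlagType = "release" | "experiment" | "operational" | "permission";

// project-scoped definition: exists once per flag
interface FeatureFlag {
  projectId: string;
  key: string;
  name: string;
  type: FlagType;
  variants: string[];
}

interface TargetingRule {
  priority: number;
  conditions: Record<string, string>;
  variant: string;
}

// per-environment state: one per (flag, environment) pair
interface FlagEnvironmentConfig {
  flagKey: string;
  environment: string;
  enabled: boolean;
  rolloutPercentage: number;
  rules: TargetingRule[];
}

// looks up the state of one flag in one environment
function configFor(
  configs: FlagEnvironmentConfig[],
  flagKey: string,
  environment: string,
): FlagEnvironmentConfig | undefined {
  return configs.find((c) => c.flagKey === flagKey && c.environment === environment);
}

const newCheckout: FeatureFlag = {
  projectId: "shop",
  key: "new-checkout",
  name: "new checkout flow",
  type: "release",
  variants: ["on", "off"],
};

const newCheckoutConfigs: FlagEnvironmentConfig[] = [
  { flagKey: "new-checkout", environment: "dev", enabled: true, rolloutPercentage: 100, rules: [] },
  { flagKey: "new-checkout", environment: "staging", enabled: true, rolloutPercentage: 50, rules: [] },
  { flagKey: "new-checkout", environment: "production", enabled: false, rolloutPercentage: 0, rules: [] },
];
```

one definition, three independent states: exactly the "enabled in dev, 50% in staging, disabled in production" situation described above.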
four interfaces, one source of truth
the system needs to be operable through:

rest api: the source of truth. write endpoints for management, read endpoints for sdks. separated so they could theoretically scale independently.
java sdk: how services consume flags. evaluates locally from cache, syncs in the background.
cli: for terminal-first workflows. switchboard flags toggle new-checkout --env production. json output for scripting.
dashboard: a react spa for visual management. tanstack router, tanstack query, tanstack table, tailwind. it's a pure client-side app that talks to the rest api.

the dashboard being optional is deliberate. the api and cli are primary. if your team lives in the terminal, you never need to open a browser. the dashboard is there for the people who want to see a rollout slider and a toggle switch.
for the dashboard stack specifically - i'm going heavy on tanstack. router gives type-safe routing with inferred params (no manual type assertions). query handles all the server state with caching and background refetching. table gives headless primitives for the flag lists and audit logs. it's a lot of one ecosystem but they're designed to work together and it avoids the usual glue code.
things i haven't figured out yet
stale flag detection. flags accumulate. teams create them for a release, ship it, forget to clean up. i want to surface warnings when flags haven't been modified in a while, but the ux of "hey this flag might be dead" without being annoying is an unsolved problem in my head.
audit log growth. every write operation should produce an audit entry with before/after state as json. great for debugging, but the table grows unboundedly. partitioning by time and project is the obvious answer but i haven't thought through the query patterns enough yet.
the kafka question. i keep going back and forth. kafka gives me real-time propagation but it's a heavy dependency. server-sent events would be simpler for small deployments. maybe i support both and let teams pick. or maybe i start with polling and add kafka later. this is the kind of decision that's hard to reverse so i want to get it right.
how small can each commit actually be? i'm planning to build this in tiny increments - domain model first, then ports, then adapters, then sdk, then cli, then dashboard. each step should be a handful of files. i've never actually tried to be this disciplined about it on a project this size. we'll see if i can stick to it.
the end goal
clone the repo, run docker compose up, and have a fully working feature flag platform - server, dashboard, database, cache, message broker, and a demo service showing flags being evaluated in real time. under two minutes from git clone to toggling your first flag.
that's the bar. if it takes longer than that to evaluate switchboard, the developer experience has failed.
the source code is at github.com/aidantrabs/switchboard.



bitgrid: elementary cellular automata
2026-03-08
what is a cellular automaton?
a cellular automaton is a discrete computational system. you have a row of cells, each in one of a finite number of states, and a rule that determines how each cell updates based on its neighborhood.
an elementary cellular automaton is the simplest nontrivial case:

the grid is one-dimensional - a row of cells
each cell has exactly two states: 0 or 1
each cell's next state depends on three cells: itself and its two immediate neighbors (left, center, right)

at each time step, every cell reads the triple (left, self, right) and produces a new value. that's the entire system.
why exactly 256 rules?
this is pure combinatorics.
a neighborhood is a triple of binary values. each value is 0 or 1, so there are:
$$2^3 = 8 \text{ possible neighborhood patterns}$$
the 8 patterns, ordered from 111 down to 000:
pattern   111  110  101  100  011  010  001  000
index       7    6    5    4    3    2    1    0

a rule assigns an output bit (0 or 1) to each of these 8 patterns. a rule is a function:
$$f: \{0,1\}^3 \rightarrow \{0,1\}$$
the number of such functions is:
$$2^8 = 256$$
there are exactly 256 elementary cellular automata. no more, no less.
binary rule encoding
wolfram's naming scheme is elegant. the rule number, expressed in binary, directly encodes the output table.
take rule 30:
$$30_{10} = 00011110_2$$
each bit corresponds to one neighborhood pattern:
pattern          111  110  101  100  011  010  001  000
bit index          7    6    5    4    3    2    1    0
rule 30 output     0    0    0    1    1    1    1    0

to compute the output for a neighborhood (l, c, r):

interpret the triple as a 3-bit number: $i = l \cdot 4 + c \cdot 2 + r$
extract bit $i$ from the rule number: $\text{output} = (\text{rule} \gg i) \;\&\; 1$

the entire rule engine fits in one expression. in rust:
pub fn apply(&self, left: u8, center: u8, right: u8) -> u8 {
    let index = (left << 2) | (center << 1) | right;
    (self.number >> index) & 1
}

two lines. no lookup table, no conditionals. the rule number is the lookup table, and bitwise operations are the query.
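to see that the rule number really is the table, here is a small sketch that unpacks any rule number back into its 8 outputs. `output_table` and `describe` are hypothetical helper names, not part of bitgrid.

```rust
/// Unpack a rule number into its output table.
/// table[i] is the output for the neighborhood whose 3-bit value is i.
pub fn output_table(rule: u8) -> [u8; 8] {
    let mut table = [0u8; 8];
    for i in 0..8 {
        table[i] = (rule >> i) & 1;
    }
    table
}

/// Render the table as "pattern -> output" lines, ordered from 111 down to 000.
pub fn describe(rule: u8) -> Vec<String> {
    let table = output_table(rule);
    (0..8)
        .rev()
        .map(|i| format!("{:03b} -> {}", i, table[i]))
        .collect()
}
```

for rule 30, `describe(30)` starts with "111 -> 0" and its fourth line is "100 -> 1", matching the table above.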
running a simulation
the automaton starts with a row of zeros and a single 1 in the center. each generation, we apply the rule to every cell using its left and right neighbors:
pub fn step(&mut self) {
    let len = self.cells.len();
    let prev = self.cells.clone();
    for i in 0..len {
        let left = if i == 0 { 0 } else { prev[i - 1] };
        let center = prev[i];
        let right = if i == len - 1 { 0 } else { prev[i + 1] };
        self.cells[i] = self.rule.apply(left, center, right);
    }
}

boundary cells see 0 beyond the edge. the previous state is cloned so updates within a generation don't interfere with each other.
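the two snippets above leave out the surrounding types, so here is a self-contained sketch that wires them together and renders rows as text. the `Automaton` struct, `single_seed`, `render`, and `evolve` are assumptions made for this example; only `apply` and `step` follow the code shown above.

```rust
pub struct Rule {
    number: u8,
}

impl Rule {
    pub fn new(number: u8) -> Self {
        Rule { number }
    }

    /// Look up the output bit for a (left, center, right) neighborhood.
    pub fn apply(&self, left: u8, center: u8, right: u8) -> u8 {
        let index = (left << 2) | (center << 1) | right;
        (self.number >> index) & 1
    }
}

pub struct Automaton {
    rule: Rule,
    cells: Vec<u8>,
}

impl Automaton {
    /// All zeros with a single 1 in the center. Assumes width > 0.
    pub fn single_seed(rule_number: u8, width: usize) -> Self {
        let mut cells = vec![0u8; width];
        cells[width / 2] = 1;
        Automaton { rule: Rule::new(rule_number), cells }
    }

    /// Advance one generation; cells beyond either edge count as 0.
    pub fn step(&mut self) {
        let len = self.cells.len();
        let prev = self.cells.clone();
        for i in 0..len {
            let left = if i == 0 { 0 } else { prev[i - 1] };
            let center = prev[i];
            let right = if i == len - 1 { 0 } else { prev[i + 1] };
            self.cells[i] = self.rule.apply(left, center, right);
        }
    }

    /// One row of output: live cells as '█', dead cells as ' '.
    pub fn render(&self) -> String {
        self.cells.iter().map(|&c| if c == 1 { '█' } else { ' ' }).collect()
    }
}

/// Return `generations` rendered rows, starting with the initial row.
pub fn evolve(rule_number: u8, width: usize, generations: usize) -> Vec<String> {
    let mut automaton = Automaton::single_seed(rule_number, width);
    let mut rows = Vec::with_capacity(generations);
    for _ in 0..generations {
        rows.push(automaton.render());
        automaton.step();
    }
    rows
}
```

printing each row of `evolve(30, 41, 10)` should give a triangle like the rule 30 picture below, as long as the width is large enough that the pattern never reaches the zero boundary.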
three rules, three behaviors
starting from a single cell, we evolve three rules and get radically different results.
rule 30 - chaos
                    █
                   ███
                  ██  █
                 ██ ████
                ██  █   █
               ██ ████ ███
              ██  █    █  █
             ██ ████  ██████
            ██  █   ███     █
           ██ ████ ██  █   ███

rule 30 produces chaotic, aperiodic structure. the left side shows faint regularity while the right side appears disordered. despite being fully deterministic, its center column looks random enough that wolfram proposed using it as a pseudorandom number generator. no repeating period has been found in the center column.
rule 90 - the sierpinski triangle
                    █
                   █ █
                  █   █
                 █ █ █ █
                █       █
               █ █     █ █
              █   █   █   █
             █ █ █ █ █ █ █ █
            █               █
           █ █             █ █

rule 90 is equivalent to xor(left, right) - the center cell doesn't even matter. this trivial operation produces the sierpinski triangle, a well-known fractal with hausdorff dimension $\log_2(3) \approx 1.585$.
the self-similarity is exact: zoom into any triangular region and you find a smaller copy of the whole pattern. a one-bit local operation generating a fractal is one of the most striking results in cellular automata.
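where the $\log_2(3)$ comes from, as a quick sketch using the standard similarity-dimension argument (not something bitgrid computes): the triangle is made of $N = 3$ copies of itself, each scaled down by a factor of $2$, and the dimension $D$ is the exponent that makes the copy count match the scale factor:
$$N = 2^D \;\Rightarrow\; 3 = 2^D \;\Rightarrow\; D = \frac{\log 3}{\log 2} = \log_2(3) \approx 1.585$$
a filled triangle would need $4$ half-size copies, giving $D = \log_2(4) = 2$. the missing middle copy is what pulls the dimension below 2.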
we can verify the xor equivalence exhaustively:
#[test]
fn rule_90_is_xor() {
    let rule = Rule::new(90);
    for l in 0..=1u8 {
        for c in 0..=1u8 {
            for r in 0..=1u8 {
                assert_eq!(rule.apply(l, c, r), l ^ r);
            }
        }
    }
}

all 8 inputs confirm it. the center cell is irrelevant.
rule 110 - turing completeness
                    █
                   ██
                  ███
                 ██ █
                █████
               ██   █
              ███  ██
             ██ █ ███
            ███████ █
           ██     ███

rule 110 grows asymmetrically to the left with complex interacting structures. in 2004, matthew cook published a proof that rule 110 is turing complete - it can simulate any computation given the right initial conditions.
this is one of the most profound results in cellular automata theory. a one-dimensional row of bits with a trivial 8-entry lookup table is capable of universal computation.
measuring complexity
population density
population density is the fraction of live cells in a generation:
$$\rho(t) = \frac{1}{N} \sum_{i=0}^{N-1} c_i(t)$$
shannon entropy
shannon entropy measures the information content of a generation from the frequency of 0s and 1s:
$$H = -\sum_{i} p_i \log_2(p_i)$$
maximum entropy $H = 1.0$ means equal proportions of 0s and 1s. minimum $H = 0.0$ means all cells are in the same state.
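both measures are a few lines each. a minimal sketch, assuming a generation is stored as a slice of 0/1 bytes; the function names are mine, and the empty-slice behavior (return 0) is an arbitrary choice:

```rust
/// Fraction of live cells in a generation. Returns 0.0 for an empty row.
pub fn density(cells: &[u8]) -> f64 {
    if cells.is_empty() {
        return 0.0;
    }
    let live = cells.iter().filter(|&&c| c == 1).count();
    live as f64 / cells.len() as f64
}

/// One term of the entropy sum, using the convention 0 * log2(0) = 0.
fn entropy_term(p: f64) -> f64 {
    if p > 0.0 {
        -p * p.log2()
    } else {
        0.0
    }
}

/// Shannon entropy (in bits) of the 0/1 frequencies in a generation.
pub fn shannon_entropy(cells: &[u8]) -> f64 {
    let p1 = density(cells);
    let p0 = 1.0 - p1;
    entropy_term(p0) + entropy_term(p1)
}
```

the `p > 0.0` guard matters: without it, an all-dead or all-live row would evaluate `0 * log2(0)` and produce NaN instead of 0.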
comparing rules
running 100 generations on a 201-cell grid:
rule  behavior  final density  mean density  final entropy  mean entropy
30    chaotic   0.5473         0.2585        0.9935         0.7334
90    fractal   0.0796         0.0622        0.4008         0.3047
110   complex   0.2537         0.1431        0.8171         0.5503
184   simple    0.0050         0.0050        0.0452         0.0452
0     trivial   0.0000         0.0000        0.0000         0.0005
255   trivial   1.0000         0.9900        0.0000         0.0005

the numbers reveal the behavioral classes:

rule 30 converges to density ~0.5 with entropy near 1.0 - maximum disorder
rule 90 maintains low density with periodic entropy oscillations tied to powers of 2
rule 110 sits between order and chaos - moderate density, high but not maximal entropy
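rules 0 and 255 are degenerate - they die out or saturate immediately. rule 184 just shifts the single live cell one step to the right each generation, so its density and entropy never change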

wolfram classified elementary cellular automata into four classes:

class 1 - evolves to a uniform state (rule 0, rule 255)
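class 2 - evolves to periodic or stable structures (rule 184)
class 3 - chaotic, aperiodic behavior (rule 30; rule 90 is usually placed here too, since it stays random-looking from random initial conditions even though a single seed gives a nested fractal)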
class 4 - complex structures, long-lived transients (rule 110)

class 4, sitting between the order of class 2 and the chaos of class 3, is where computation lives.
what's next
this is the foundation. from here we can explore:

spatial entropy using block decomposition rather than global frequency
mutual information between successive generations
langton's lambda parameter as a predictor of behavioral class
the full atlas of all 256 rules

the code is minimal - a rule encoder, a grid stepper, analysis functions, and a renderer. everything follows directly from the mathematics. the entire system is deterministic, pure, and fits in a few hundred lines of rust.
the source code is at github.com/aidantrabs/bitgrid.



hello world
2026-03-08
this is the first post built with forge.
why forge
because every developer needs to build their own blog engine at least once.
fn main() {
    println!("hello from forge");
}




building forge
2026-03-08
every developer eventually builds their own blog engine. this is mine.
the stack
forge is a two-part system:

a rust cli that parses markdown, applies templates, and outputs a static site
a minimal frontend built with vite, vanilla typescript, and tailwind v4

the goal was simple: fast builds, tiny output, zero runtime complexity.
why rust

i'm a wannabe rustacean

rust gives us:

fast compilation of markdown to html
zero-cost abstractions for template rendering
a single binary with no runtime dependencies

let posts = load_posts(Path::new("content"));
let renderer = Renderer::new(Path::new("templates"));

for post in &posts {
    let html = renderer.render_post(post, &config);
    fs::write(post_dir.join("index.html"), html)?;
}

the frontend
the entire javascript runtime is under 1kb. it handles:

dark mode toggle via a pull-string ui element
scroll reveal animations with IntersectionObserver
font loading with fout prevention


the best javascript is the javascript you don't ship.

what's next

wasm-powered client-side search
image optimization pipeline
incremental builds
