Field Notes · The Dev Cycle

The Loop Is Coming Apart

jj, Crabbox, Depot, Blacksmith — a quiet contest over the inner dev loop. Every player makes the same promise: zero distance between changing your code and knowing if it works, on compute that isn't yours and isn't precious.

For about fifty years the inner dev loop has been one thing happening in one place. You edit a file. You save it. You build and run it. All on the machine under your hands, against a checkout that lives on its disk. Git made the history distributed in 2005, but the loop itself never left the laptop.

A cluster of recent projects is prying it apart — a version control system, a remote-execution control plane, two CI acceleration companies, and a field guide that refuses to pick a winner. None of them set out to do the same thing. Put them side by side and they are all making the same promise.

The promise

Strip the marketing off any of these tools and the same sentence is underneath: the speed of your feedback should not depend on the machine under your hands.

Every diff should get a warm, isolated, fast machine on demand. The gap between "I changed something" and "I know if it works" should collapse toward zero — and it should collapse on compute that is disposable, so a bad run costs nothing and a parallel run costs nothing more. That's the promise. It is genuinely new. And the moment you take it seriously, the loop has to come apart, because no single laptop can be simultaneously your editor, a fleet of fresh runners, and a warm cache the size of your whole dependency graph.

The interesting part is where the promise is being fought over, and what keeps breaking when you try to keep it.

The venue: GitHub Actions, and the cache nobody could see

The contest is happening in CI/CD, because that's where the loop already half-left the laptop. GitHub Actions is the incumbent venue; GitLab CI/CD is the other big one. This is the arena. And the contested ground inside it turns out to be the most boring-sounding thing imaginable: the cache.

Here is the detail that gives the whole game away. Two different companies — Depot and Blacksmith — independently reverse-engineered the GitHub Actions cache protocol, and each wrote a blog post about it with nearly the same title. Depot's: "We reverse-engineered the GitHub Actions cache so you don't have to." Blacksmith's: "Reverse engineering GitHub Actions cache to make it fast."

Why would two startups burn engineering on the same undocumented protocol? Because GitHub's hosted runners get roughly 1 Gbps of network throughput — about 125 MB/s — and on a fresh runner, restoring the cache is the long pole of the entire build. The throttle is the bottleneck. When your cache restore dominates wall-clock time, warmth becomes the product. So an entire cottage industry formed to sell it back to you, faster, by routing around the cache that GitHub never meant you to see.
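To make that arithmetic concrete, here is a minimal Python sketch of best-case restore time at the 1 Gbps figure above. The cache sizes are hypothetical, picked only to show the scale, and the function ignores compression, latency, and extraction time, so treat the results as a floor rather than a measurement.

```python
def restore_seconds(cache_mb: float, link_gbps: float) -> float:
    """Best-case seconds to pull cache_mb megabytes over a link_gbps link."""
    mb_per_second = link_gbps * 1000 / 8  # 1 Gbps = 1000 Mbit/s = 125 MB/s
    return cache_mb / mb_per_second


# Hypothetical cache sizes, chosen only for illustration.
for size_mb in (500, 2_000, 10_000):
    seconds = restore_seconds(size_mb, 1.0)
    print(f"{size_mb:>6} MB -> {seconds:.0f} s at 1 Gbps")
```

Even under these generous assumptions, a multi-gigabyte cache spends tens of seconds on the wire before the build does any work, which is the gap the faster caches are selling into.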

Depot and Blacksmith: warmth, sold as a service

The two reverse-engineering crews built different answers to the same question.

Depot (depot.dev) launches ephemeral, single-tenant EC2 runners via webhook, backs them with a distributed S3 cache, and colocates your BuildKit container builders in the same private network as the runners. Cache moves at up to 1000 MiB/s over 12.5 Gbps — roughly 10x GitHub's throughput — and the headline claim is up to 55x faster builds at half the cost.

Blacksmith (blacksmith.sh) — built by three engineers out of CockroachDB and Faire — runs Actions on bare-metal gaming CPUs with about 2x the single-thread performance of GitHub's servers, colocates a warm cache for ~4x faster reads and writes (up to 10x for some), and persists Docker layers on NVMe for 2x–40x faster image builds. The integration is a one-line change: swap runs-on: ubuntu-latest for runs-on: blacksmith-2vcpu-ubuntu-2404.

Above both of them sits the honest meta-truth, and it's exactly what the GitHub Actions cache field guide was written to say: CI caching is not one cache. Native dependency caches win on warm workspaces. Build caches like Incredibuild win when runners are fresh, ephemeral, or churned by timestamps — strongest for C/C++ and Rust. BuildKit is for Docker; Go and JavaScript usually want their tool-specific cache first. There is no single knob. The field guide's refusal to crown a winner is the correct response to a venue where warmth is contested ground and the right move depends entirely on whether your runner is hot or cold.
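One way to read the paragraph above is as a small decision rule. The sketch below is my paraphrase of it in Python, not the field guide's own logic: the Job type, the ecosystem labels, and the returned strings are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    ecosystem: str      # hypothetical labels: "docker", "go", "javascript", "rust", "cpp", ...
    fresh_runner: bool  # True when the runner starts cold, with no warm workspace


def first_cache_to_try(job: Job) -> str:
    """Paraphrase of the rules of thumb quoted above; a starting point, not a verdict."""
    if job.ecosystem == "docker":
        return "BuildKit layer cache"
    if job.ecosystem in {"go", "javascript"}:
        return "tool-specific cache"
    if job.fresh_runner:
        # The text says this is strongest for C/C++ and Rust.
        return "build cache"
    return "native dependency cache"


print(first_cache_to_try(Job("rust", fresh_runner=True)))
print(first_cache_to_try(Job("rust", fresh_runner=False)))
```

The same Rust job gets two different answers depending only on whether the runner is hot or cold, which is the whole point: the input that matters most is not the language.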

jj: the diff stops being a verb

So compute and warmth have left the laptop. What about the code itself?

Jujutsu (jj) speaks Git's storage format but discards its mental model. The headline feature sounds like a footnote: there is no staging area, and your working copy is itself a commit. jj snapshots the working copy at the start of every command, so a saved edit becomes part of a commit the next time you run anything in jj, with no add or stash step in between. It has an ID; it's in the graph.

In Git, your dirty checkout is a verb: a thing in motion that you must freeze — add, stash, commit — before you can do anything with it. In jj, the dirty checkout is a noun: a first-class, addressable, syncable object that exists whether or not you've decided you're done. That distinction is the hinge of everything that follows. To ship a diff to a remote machine, the diff first has to be a thing.
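A toy model makes the verb-versus-noun distinction easier to see. The Python sketch below is not how jj is implemented: ToyRepo and its methods are hypothetical, and it collapses jj's separate change and commit IDs into a single content hash. It only illustrates the idea that every command snapshots first, so the dirty tree always has an address.

```python
import hashlib


class ToyRepo:
    """Toy model: every command first snapshots the working copy into a commit."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}               # path -> contents (the working copy)
        self.commits: dict[str, dict[str, str]] = {}  # commit id -> frozen copy of files
        self.working_commit = self._snapshot()

    def _snapshot(self) -> str:
        # Hash paths and contents in sorted order so the id is deterministic.
        payload = "\0".join(f"{path}\0{text}" for path, text in sorted(self.files.items()))
        commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.commits[commit_id] = dict(self.files)
        self.working_commit = commit_id
        return commit_id

    def run(self, command: str) -> str:
        """Any command snapshots first, so there is no separate 'add' step."""
        commit_id = self._snapshot()
        return f"{command}: working copy is commit {commit_id}"


repo = ToyRepo()
before = repo.working_commit
repo.files["app.py"] = "print('hi')\n"  # "save a file"; nothing is staged
print(repo.run("log"))
print(before != repo.working_commit)
```

After the edit and one command, the dirty state is retrievable by ID from repo.commits, which is the property a remote sync needs.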

Crabbox: the loop, made local-first again

Crabbox — by Peter Steinberger (@steipete), creator of OpenClaw — is where the pieces snap together. Its tagline: "A short-lived box for every run." Its promise, verbatim: "Crabbox gives maintainers and agents a fast local loop on shared cloud capacity: lease, sync, run, release."

crabbox run -- pnpm test          # lease a box, sync your dirty tree, run, tear down
crabbox warmup                    # provision and keep it warm
crabbox run --id blue-lobster --  # reuse the warm box

You keep your editor and your git workflow; Crabbox rsyncs your dirty checkout to a leased remote box — it is "local-first and does not require a clean checkout." It seeds remote Git from your base ref, overlays dirty files, skips no-op syncs by fingerprint, and guards against suspicious mass deletions. It does not ask you to commit first. That's jj's addressable diff, made portable.
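Two of those behaviors, skipping no-op syncs by fingerprint and refusing suspicious mass deletions, are easy to sketch. What follows is a minimal Python illustration under my own assumptions, not Crabbox's actual implementation: the in-memory tree representation, the function names, and the 50% deletion threshold are all hypothetical.

```python
import hashlib


def fingerprint(tree: dict[str, bytes]) -> str:
    """Stable digest of a tree given as {relative_path: file_bytes}."""
    digest = hashlib.sha256()
    for path in sorted(tree):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(hashlib.sha256(tree[path]).digest())
    return digest.hexdigest()


def plan_sync(local: dict[str, bytes], last_synced: dict[str, bytes],
              max_delete_fraction: float = 0.5) -> str:
    """Return 'skip', 'refuse', or 'sync' for a local tree versus the last synced tree."""
    if fingerprint(local) == fingerprint(last_synced):
        return "skip"  # nothing changed since the last sync
    deleted = [path for path in last_synced if path not in local]
    if last_synced and len(deleted) / len(last_synced) > max_delete_fraction:
        return "refuse"  # hypothetical guard: too many files vanished at once
    return "sync"


base = {"src/main.py": b"v1", "README": b"hello", "tests/test_main.py": b"t"}
print(plan_sync(base, base))
print(plan_sync({**base, "src/main.py": b"v2"}, base))
print(plan_sync({"README": b"hello"}, base))
```

A real tool would more likely store the last fingerprint than the last tree, and would pick its deletion threshold with more care; the shape of the check is what matters here.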

But here's the move that makes Crabbox more than another runner: it doesn't compete with Depot and Blacksmith — it targets them. Crabbox is a control plane with a provider list, and that list reads like a census of this whole essay: Blacksmith Testbox, E2B, Sprites, Modal, Daytona, Tensorlake, AWS, Hetzner — and islo.dev. Crabbox 0.3.0 literally shipped a "Blacksmith Testbox wrap." It is the layer that makes all of that disposable cloud compute feel local again.

And the convergence runs both ways. Blacksmith's own Testbox is the exact same move from the runner vendor's side: Linux microVMs that warm up and then run, syncing with rsync --delete --checksum, with subsequent runs taking 1–3 seconds, used to reproduce flaky tests across dozens of boxes in parallel. Two ends of the market independently arrived at the same sequence: warm a box, sync the diff, run the suite.
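The shape both tools converged on is a lease with a guaranteed release around a sync and a run. Here is a minimal Python sketch of that shape, with everything simulated in memory: leased_box, run_once, and the log are hypothetical stand-ins, and nothing here calls Crabbox or Testbox.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def leased_box(name: str, log: list[str]) -> Iterator[str]:
    """Simulated lease: records acquisition and always records release."""
    log.append(f"lease {name}")
    try:
        yield name
    finally:
        log.append(f"release {name}")  # runs even if sync or run raises


def run_once(name: str, log: list[str],
             sync: Callable[[str], None], run: Callable[[str], int]) -> int:
    """Lease a box, sync the tree to it, run the command, then release."""
    with leased_box(name, log) as box:
        sync(box)
        return run(box)


log: list[str] = []


def fake_sync(box: str) -> None:
    log.append(f"sync {box}")


def fake_run(box: str) -> int:
    log.append(f"run {box}")
    return 0  # simulated exit code


exit_code = run_once("blue-lobster", log, fake_sync, fake_run)
print(exit_code, log)
```

Putting the release in a finally block is the design choice that makes the box disposable in practice: a failed run cannot leave a lease behind.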

Haven't we been here before?

Fair question — we have, twice, and it's worth being honest about it. Over the last decade the cloud IDEs (Cloud9, then Gitpod and GitHub Codespaces) tried to lift the loop off the laptop, and Bazel's remote execution and devcontainers tried it for builds. None of them ate the world. The reason is the same for all of them: they asked you to move in. Relocate your whole environment to the cloud, give up your local editor and your shell and fifteen years of muscle memory, and trust that the latency won't drive you insane. The switching cost was the headline feature and the fatal flaw.

This wave inverts the deal. You keep the laptop, the editor, the dirty tree, the muscle memory — and ship only the diff, on demand, to a box that dies when the command exits. It's rsync, not relocation. The commitment is one command, not a migration. That's why both crabbox run and blacksmith testbox start from "your local checkout, as-is" instead of "first, recreate your world over here."

And there's a demand-side reason it's happening now and not in 2016: agents. A coding agent has no laptop to move into and no muscle memory to preserve. It produces dirty diffs at machine speed, needs somewhere isolated to run them where a mistake costs nothing, and needs to leave evidence a human can audit afterward. For the first time the economics point the same direction for the person and the machine — both want a cheap, warm, throwaway box for a single diff. The cloud IDEs were a lifestyle change nobody asked for. A short-lived box per diff is a primitive that, it turns out, everybody needs.

The loop, unbundled

Stack it all up and the inner dev loop has come apart into concerns that no longer have to live on the same machine:

Concern   | Used to be             | Becoming                      | Who's building it
Edit      | local file             | local file (still)            | you
Version   | freeze the diff first  | the diff is already an object | jj
Transport | n/a (it's right here)  | sync the diff to compute      | Crabbox
Compute   | your laptop            | leased ephemeral runner       | Depot, Blacksmith
Warmth    | whatever your disk had | engineered cache strategy     | Depot, Blacksmith, field guide
Isolation | none (same OS)         | a fresh sandbox per run       | islo.dev, Testbox, E2B

Notice the bottom two rows, because they're the ones that used to be free. Isolation was nothing: your code ran as you, on your OS, and "oops" meant a manual cleanup. Once the box is disposable and the runner is shared, isolation stops being optional and becomes the floor. And once an agent is the thing driving the loop, a requirement appears that no human ever needed: evidence. Terminal output that scrolls away is fine when a person watched it happen live; it's useless when an agent did. Crabbox's "proof for every run" — screenshots, video, JUnit summaries, logs, lease metadata — is the loop growing a memory, because the participant now closing it can't be trusted to remember.
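What a durable record of a run might look like is worth sketching. The Python below is an assumption-laden illustration, not Crabbox's format: the field names, the sample JUnit XML, and the lease metadata are all hypothetical. It condenses a JUnit report into counts and bundles them with the command and exit code as JSON.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical JUnit report, invented for this example.
SAMPLE = """<testsuite name="unit">
  <testcase name="adds"/>
  <testcase name="divides"><failure message="ZeroDivisionError"/></testcase>
  <testcase name="slow"><skipped/></testcase>
</testsuite>"""


def summarize_junit(xml_text: str) -> dict[str, int]:
    """Count total, failed, skipped, and passed test cases in a JUnit XML report."""
    cases = list(ET.fromstring(xml_text).iter("testcase"))
    failed = sum(1 for case in cases
                 if case.find("failure") is not None or case.find("error") is not None)
    skipped = sum(1 for case in cases if case.find("skipped") is not None)
    return {"total": len(cases), "failed": failed, "skipped": skipped,
            "passed": len(cases) - failed - skipped}


def evidence_record(command: list[str], exit_code: int,
                    junit_xml: str, lease: dict[str, str]) -> str:
    """Bundle what happened into one JSON document a human can audit later."""
    record = {"command": command, "exit_code": exit_code,
              "tests": summarize_junit(junit_xml), "lease": lease}
    return json.dumps(record, sort_keys=True, indent=2)


print(evidence_record(["pnpm", "test"], 1, SAMPLE, {"box": "blue-lobster"}))
```

The record outlives the terminal session, which is the property that matters when nobody watched the run happen.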

Where this lands

I work on sandboxed runtimes at islo.dev, and we're already one of the providers in that Crabbox list — so I'll say the thing the rest of the field only implies: once the loop is unbundled, the sandbox is the substrate everything else sits on. The diff is mobile, the compute is disposable, the cache is engineered — and the only question left that matters is whether the box your code lands in is fast to start, safe by construction, and warm enough to be useful.

That's also where the promise gets honest. "Zero distance between change and knowing" is only true if the box is already warm and already isolated when your diff arrives. Cold boxes break the promise. Shared boxes break it differently. The reason two companies reverse-engineered the same opaque cache is that warmth is the hard part, and the reason Crabbox and Testbox both reinvented warmup → sync → run is that isolation has to be cheap enough to throw away.

jj made the diff addressable. Crabbox made it portable. Depot and Blacksmith made it fast. The field guide made warmth honest. The remaining job — give every mobile diff a clean, fast, isolated machine to land on — is the one worth getting right.

The loop isn't dying. It's just stopped fitting on one desk.