Document AA-01 · Subject: agent-driven data movement · Status: verified by execution

The agent plans. Crabbox runs. Airbyte moves. Evidence decides.

An execution model for agents that own data movement — with no rows and no secrets ever entering the prompt. As proof: the agent wrote a job spec, Crabbox (the run dispatcher) leased an isolated islo sandbox, and an Airbyte-contract worker moved 50,000 rows from ClickHouse to DuckDB — four parity checks, one byte-exact checksum. The architecture below is what generalizes it.

The cast — who runs what

driver

The agent

Claude Code, Codex, or your harness. Decides the next bounded run. Never carries rows.

runner

Crabbox

Turns a job spec into an auditable run boundary — lease, scoped env, artifacts, run id.

provider

islo sandbox

A fresh, repo-defined microVM. This proof used islo; any provider can sit behind Crabbox.

mover

Airbyte worker

Reads the source, writes the target — inside the box, outside the model context.

judge

The evidence

Logs, JUnit, counts, checksums. The only thing the agent is allowed to reason from.

§01 the loop

Intent becomes evidence, one boundary at a time.

Each step below names what happens, who owns it, and what crosses the boundary.

[Figure: Agentic Airbyte execution flow, six boundaries. Goal and policy enter the agent; the agent writes the spec and calls Crabbox; Crabbox leases a worker and injects a scoped credential profile; Airbyte moves data from source to target inside the worker; evidence (logs, JUnit, counts, checksums) returns to the agent; the agent chooses the next action.]
01 · owner · the agent

The agent compresses intent into a job spec.

It reads the goal, repo state, schemas, previous evidence, and policy. It emits one bounded run. It does not move data.

In: goal + policy + repo state
Out: job JSON + Crabbox command
02 · owner · Crabbox

Crabbox creates the execution boundary.

It finds or creates a ready worker, hydrates the repo, attaches cache, and starts the requested command under a durable run id.

In: pool id + command + artifact rules
Out: worker lease + run id
03 · owner · profile + Crabbox

Credentials enter as scoped environment.

The prompt carries only a profile name. Crabbox resolves it into allowed variables inside the worker — never back into the model context.

In: credential_profile + allow_env
Out: worker env + redacted env report
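
The allowlist step can be pictured as a glob filter over variable names. The following is a minimal sketch, assuming `allow_env` patterns behave like shell globs; the function names and the sample profile are hypothetical and are not Crabbox's implementation.

```python
from fnmatch import fnmatchcase


def scope_env(profile_env: dict, allow_env: list) -> dict:
    """Keep only variables whose names match one of the allowlist globs."""
    return {
        name: value
        for name, value in profile_env.items()
        if any(fnmatchcase(name, pattern) for pattern in allow_env)
    }


def redacted_report(worker_env: dict) -> dict:
    """List the injected variable names without revealing any value."""
    return {name: "<redacted>" for name in sorted(worker_env)}


# Hypothetical resolved profile. In the flow described above these values
# exist only inside the worker, never in the prompt.
profile = {
    "SOURCE_API_TOKEN": "s3cr3t",
    "TARGET_DSN": "duckdb:///warehouse.db",
    "AWS_SECRET_ACCESS_KEY": "not-on-the-allowlist",
}
allow = ["AIRBYTE_*", "SOURCE_*", "TARGET_*"]

worker_env = scope_env(profile, allow)
report = redacted_report(worker_env)
print(report)
```

Only the report, which carries names and no values, is the kind of artifact that would be safe to hand back to the agent.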
04 · owner · Airbyte

Airbyte moves data where the agent cannot see it.

The connector reads the source and writes the target inside the worker. Rows never pass through a prompt.

In: source ref + target ref + connector config
Out: target writes + sync status
05 · owner · Crabbox + worker

The run returns evidence, not guesses.

Logs, metrics, JUnit, counts, and redacted config come back under one run id. The next decision becomes auditable.

In: exit code + reports + artifacts
Out: structured evidence bundle
06 · owner · the agent

The agent repairs from the failing boundary.

Finish, retry, repair, or alert. The next command changes one bounded input and keeps the same audit shape. Then the loop closes.

In: evidence + failure owner
Out: finish · retry · repair job · alert
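
One way to read this last step is as a small decision function over the evidence bundle. The sketch below is illustrative only: the `Evidence` fields and the ordering of the branches are assumptions, and the retry policy takes the shape `{"max_attempts": ..., "when": [...]}` used in the job spec in §03.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    exit_code: int
    checks_passed: bool
    failure_reason: Optional[str]  # hypothetical field, e.g. "rate_limit"
    attempt: int                   # 1-based attempt counter


def next_action(evidence: Evidence, retry_policy: dict) -> str:
    """Map one evidence bundle to finish, retry, repair job, or alert."""
    if evidence.exit_code == 0 and evidence.checks_passed:
        return "finish"
    retryable = evidence.failure_reason in retry_policy["when"]
    if retryable and evidence.attempt < retry_policy["max_attempts"]:
        return "retry"
    if evidence.failure_reason is None:
        # No identifiable failing boundary: escalate instead of guessing.
        return "alert"
    return "repair job"


policy = {"max_attempts": 2, "when": ["rate_limit", "transient_network"]}
print(next_action(Evidence(1, False, "rate_limit", attempt=1), policy))
```

The function never inspects rows or secrets; it reasons only from fields an evidence bundle could plausibly carry.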
§02 the evidence

A real run, audited four ways.

“It ran” is not proof. After the move, the worker compares source and destination — and exits non-zero unless all four checks pass. This is the actual output of the run on the receipt.

exhibit a · four parity checks

1 · Row count

PASS

Every record made it across. No drops, no duplicates.

source: 50,000 · destination: 50,000

2 · Revenue sum

PASS

The decimal aggregate survives the type boundary exactly.

source: $611,815.02 · destination: $611,815.02

3 · Per-type tally

PASS

Group-by counts agree for all five event types.

page_view: 29,958 = 29,958 · search: 7,598 = 7,598 · add_to_cart: 6,014 = 6,014 · checkout: 3,928 = 3,928 · purchase: 2,502 = 2,502

4 · Content SHA-256

PASS

Byte-exact: a hash of every (event_id, type, revenue) tuple, sorted. The strongest check.

source: a82239cc…c73ebcb · destination: a82239cc…c73ebcb
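
The four checks can be sketched over in-memory rows as follows. This is an illustration under stated assumptions, not the worker's code: the row shape `(event_id, event_type, revenue)`, the `|`-joined serialization, and the function names are all hypothetical, so the hash it produces will not match the one on the receipt.

```python
import hashlib
from collections import Counter
from decimal import Decimal

CENT = Decimal("0.01")


def content_sha256(rows):
    """Hash every (event_id, event_type, revenue) tuple in sorted order."""
    digest = hashlib.sha256()
    for event_id, event_type, revenue in sorted(rows):
        # Quantize so 19.9 and 19.90 serialize to the same bytes.
        line = f"{event_id}|{event_type}|{revenue.quantize(CENT)}\n"
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()


def parity_checks(source, destination):
    """Return one boolean per check; every one must be True to pass."""
    return {
        "row_count": len(source) == len(destination),
        "revenue_sum": sum(r[2] for r in source) == sum(r[2] for r in destination),
        "per_type_tally": Counter(r[1] for r in source) == Counter(r[1] for r in destination),
        "content_sha256": content_sha256(source) == content_sha256(destination),
    }


def exit_code(results):
    """Non-zero unless all checks pass, mirroring the rule stated above."""
    return 0 if all(results.values()) else 1


source = [
    (1, "page_view", Decimal("0.00")),
    (2, "purchase", Decimal("19.99")),
    (3, "search", Decimal("0.00")),
]
destination = list(reversed(source))  # arrival order should not matter
results = parity_checks(source, destination)
print(results, exit_code(results))
```

Sorting before hashing makes the checksum independent of batch order, which is why it can serve as the strictest of the four checks.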
exhibit b · schema map

The schema crossed intact.

The worker reads the source catalog and maps each ClickHouse type to a DuckDB type before a single row moves — decimals stay decimals, datetimes stay timestamps.

column | ClickHouse · source | DuckDB · destination
event_id | UInt64 | BIGINT
user_id | UInt32 | BIGINT
session_id | UInt32 | BIGINT
event_type | LowCardinality(String) | VARCHAR
channel | LowCardinality(String) | VARCHAR
device | LowCardinality(String) | VARCHAR
country | LowCardinality(String) | VARCHAR
url | String | VARCHAR
revenue | Decimal(12, 2) | DECIMAL(18,2)
ts | DateTime | TIMESTAMP
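
A type map like this one can be expressed as a small lookup. The sketch below reproduces only the pairs shown in the table. The rule behind widening Decimal(12, 2) to DECIMAL(18,2) is not stated, so fixing the precision at 18 is an assumption, and `to_duckdb` is a hypothetical name.

```python
import re

# Direct pairs taken from the schema map above.
_SIMPLE = {
    "UInt64": "BIGINT",
    "UInt32": "BIGINT",
    "String": "VARCHAR",
    "DateTime": "TIMESTAMP",
}
_LOW_CARDINALITY = re.compile(r"^LowCardinality\((.+)\)$")
_DECIMAL = re.compile(r"^Decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)$")


def to_duckdb(clickhouse_type: str) -> str:
    """Map one ClickHouse column type to a DuckDB type, or raise."""
    clickhouse_type = clickhouse_type.strip()
    wrapped = _LOW_CARDINALITY.match(clickhouse_type)
    if wrapped:
        # LowCardinality is a storage hint; map the inner type.
        return to_duckdb(wrapped.group(1))
    decimal = _DECIMAL.match(clickhouse_type)
    if decimal:
        precision, scale = int(decimal.group(1)), int(decimal.group(2))
        if precision > 18:
            raise ValueError(f"precision {precision} exceeds this sketch's limit")
        return f"DECIMAL(18,{scale})"  # assumption: precision fixed at 18
    if clickhouse_type not in _SIMPLE:
        # Fail before any row moves instead of guessing a type.
        raise ValueError(f"unmapped ClickHouse type: {clickhouse_type}")
    return _SIMPLE[clickhouse_type]


for source_type in ("UInt64", "LowCardinality(String)", "Decimal(12, 2)", "DateTime"):
    print(source_type, "->", to_duckdb(source_type))
```

Raising on an unknown type keeps the failure at the discover phase, before a destination table exists.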
exhibit c · provenance

Where it ran — full provenance.

Not a laptop. A fresh, repo-defined islo microVM, captured live and then torn down.

sandbox id: 019ea238-1f81-7950-80a9-1b80a5e0b556
image: docker.io/library/python:3.12
kernel: Linux 6.16.9+ · x86_64
vCPU / memory: 4 vCPU · 3930 MB
compute region: ca.compute.islo.dev (Canada)
source engine: ClickHouse 26.6.1.472
destination engine: DuckDB 1.5.3
bytes read: 9,450,617 (~9.0 MiB)
batches: 10 × 5,000 records
read / write split: 0.186 s read · 0.145 s write
exhibit d · the run, phase by phase
1 · bootstrap · ClickHouse binary + Python venv into the fresh box
2 · boot_clickhouse · local server up, answering on HTTP
3 · seed · 50,000 deterministic events, the system of record
4 · discover · source catalog read, types mapped
5 · write_setup · typed destination table created from the mapped catalog
6 · sync · RECORD batches out, bulk-load in (0.332 s)
7 · verify · four parity checks; non-zero exit unless all pass
8 · analytics · queries on the destination prove it's usable
9 · emit · metrics.json + STATE, evidence for the loop

Nine ::CRABBOX_PHASE:: markers split the job into steps the orchestrator can time, attach evidence to, and reason over.
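
Splitting a log on these markers takes only a few lines. The sketch below assumes each marker sits at the start of its own line, as in the raw tail; `split_phases` is a hypothetical helper, not part of Crabbox.

```python
MARKER = "::CRABBOX_PHASE::"


def split_phases(log_text: str) -> dict:
    """Group log lines under the most recent phase marker, in order."""
    phases = {}
    current = None
    for line in log_text.splitlines():
        if line.startswith(MARKER):
            current = line[len(MARKER):].strip()
            phases[current] = []
        elif current is not None:
            phases[current].append(line)
    return phases


sample = """::CRABBOX_PHASE::bootstrap
[e2e] installing clickhouse static binary
::CRABBOX_PHASE::boot_clickhouse
[e2e] ClickHouse up: 26.6.1.472
::CRABBOX_PHASE::verify
::CRABBOX_PHASE::emit
EXIT=0
"""

phases = split_phases(sample)
print(list(phases))
```

A phase with no lines, such as verify in the sample, still appears as a key, so an orchestrator could tell "ran silently" apart from "never started".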

exhibit e · raw log

The raw tail, unedited.

Straight from the sandbox. Full artifacts live in the repo.

airbyte-etl · /workspace/agentic-airbyte/poc · islo · exit 0
::CRABBOX_PHASE::bootstrap
[e2e] installing clickhouse static binary
::CRABBOX_PHASE::boot_clickhouse
[e2e] ClickHouse up: 26.6.1.472
::CRABBOX_PHASE::seed
[seed] analytics.events ready: rows=50000 total_revenue=611815.02
::CRABBOX_PHASE::discover
{"type":"LOG","log":{"level":"INFO","message":"discovered stream 'events'"}}
::CRABBOX_PHASE::write_setup
::CRABBOX_PHASE::sync
{"type":"LOG","log":{"level":"INFO","message":"synced 5000/50000 records"}}
          … 10 batches …
{"type":"LOG","log":{"level":"INFO","message":"synced 50000/50000 records"}}
::CRABBOX_PHASE::verify
::CRABBOX_PHASE::analytics
::CRABBOX_PHASE::emit
{"type":"STATE","state":{"records_moved":50000,"status":"SUCCEEDED"}}
{"type":"LOG","log":{"message":"sync SUCCEEDED: moved 50000 rows in 0.549s (150700.2 rows/s); checks_passed=True"}}
EXIT=0
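
The JSON lines in this tail carry a `type` field (LOG, STATE), so a harness could pull the final state out without reading anything else. A minimal sketch, assuming one JSON object per line; the helper names are hypothetical.

```python
import json


def read_messages(log_text: str) -> list:
    """Return the JSON messages in a log, skipping markers and plain text."""
    messages = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or non-JSON lines
        if isinstance(message, dict) and "type" in message:
            messages.append(message)
    return messages


def final_state(messages: list):
    """Return the payload of the last STATE message, or None."""
    states = [m["state"] for m in messages if m["type"] == "STATE"]
    return states[-1] if states else None


tail = """::CRABBOX_PHASE::discover
{"type":"LOG","log":{"level":"INFO","message":"discovered stream 'events'"}}
::CRABBOX_PHASE::emit
{"type":"STATE","state":{"records_moved":50000,"status":"SUCCEEDED"}}
EXIT=0
"""

state = final_state(read_messages(tail))
print(state)
```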

What this proves — and what it doesn't.

This run uses the Airbyte source→destination contract on a custom-connector (Airbyte CDK) path, not a full packaged connector deployment — that's what lets it run self-contained in a sandbox in under a second. What it does prove is the part that matters for agentic data movement: a goal-driven worker can be dispatched into an isolated box, move real typed data end-to-end, and return evidence strong enough — a byte-exact checksum — for a harness to trust the result and decide what to do next. The full proof appendix walks every phase.

§03 the contracts

A useful agent output is not prose.

It is three contracts: a spec the agent writes, a handoff a runner can execute, and a repair rule that keeps the proof shape intact.

ai-agent-dispatch.sh
# Goal: sync CRM accounts into the warehouse safely.

crabbox pool ensure example-org/data-movement/main/provider/linux/etl \
  --min-ready 3 --create -- --cache-volume airbyte-etl

cat > .crabbox/generated/accounts-sync.json <<'JSON'
{
  "movement": "source_to_target",
  "source_ref": "source.crm.accounts",
  "target_ref": "warehouse.analytics.accounts",
  "credential_profile": "etl-warehouse",
  "allow_env": ["AIRBYTE_*", "SOURCE_*", "TARGET_*"],
  "idempotency_key": "accounts_sync:daily",
  "retry": { "max_attempts": 2, "when": ["rate_limit", "transient_network"] },
  "validation": ["row_count", "schema_drift", "freshness"],
  "artifacts": ["reports/**", "metrics.json", "redacted-config.json"],
  "redact": ["password", "token", "secret"]
}
JSON

crabbox run --pool example-org/data-movement/main/provider/linux/etl \
  --shell 'python -m workers.airbyte_sync --config .crabbox/generated/accounts-sync.json' \
  --allow-env 'AIRBYTE_*,SOURCE_*,TARGET_*' \
  --env-from-profile etl-warehouse \
  --artifact-glob 'reports/**,metrics.json,redacted-config.json' \
  --junit reports/

crabbox results <run-id> --json
crabbox artifacts download <run-id> --out evidence/<run-id>

plan · The spec names things.

The agent writes references and rules — source, target, profile name, allowlists, validation, retry, redaction, artifact globs. It never writes secret values or row payloads.

Avoid: raw secrets · copied rows · prompt transcripts
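
A spec like this can be linted before dispatch. The sketch below is one possible check, not a Crabbox feature: `REQUIRED_KEYS` is a hypothetical minimum set, and the rule "no key name may contain a redact word" is an assumption about what a raw secret in a spec would look like.

```python
REQUIRED_KEYS = {
    "source_ref", "target_ref", "credential_profile",
    "allow_env", "validation", "artifacts", "redact",
}


def lint_spec(spec: dict) -> list:
    """Return a list of problems; an empty list means the spec looks safe."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - spec.keys())]
    redact_words = [word.lower() for word in spec.get("redact", [])]

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                if any(word in key.lower() for word in redact_words):
                    problems.append(f"{path}.{key}: key name suggests a raw secret")
                walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}[{index}]")

    walk(spec, "spec")
    return problems


spec = {
    "movement": "source_to_target",
    "source_ref": "source.crm.accounts",
    "target_ref": "warehouse.analytics.accounts",
    "credential_profile": "etl-warehouse",
    "allow_env": ["AIRBYTE_*", "SOURCE_*", "TARGET_*"],
    "idempotency_key": "accounts_sync:daily",
    "retry": {"max_attempts": 2, "when": ["rate_limit", "transient_network"]},
    "validation": ["row_count", "schema_drift", "freshness"],
    "artifacts": ["reports/**", "metrics.json", "redacted-config.json"],
    "redact": ["password", "token", "secret"],
}

print(lint_spec(spec))
print(lint_spec({**spec, "source_password": "hunter2"}))
```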

lease · The handoff is executable.

Crabbox receives a pool id, command, profile name, and artifact contract. It returns a run id and a bounded evidence bundle — nothing else crosses.

Avoid: unbounded shell · missing run id · missing artifact capture

repair · Repair preserves the proof shape.

The next run changes one bounded input tied to the failing owner. Validation, redaction, and artifact capture stay on, so every attempt stays comparable.

Avoid: changing many inputs at once · retrying partial writes without idempotency
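
The repair rule can be enforced mechanically by diffing two specs. A minimal sketch, assuming "one bounded input" means exactly one top-level key differs and that "stay on" means the `validation`, `redact`, and `artifacts` lists remain non-empty; both readings are assumptions, and `check_repair` is a hypothetical helper.

```python
PROOF_KEYS = ("validation", "redact", "artifacts")


def changed_keys(previous: dict, repaired: dict) -> list:
    """Top-level keys whose values differ between two specs."""
    keys = previous.keys() | repaired.keys()
    return sorted(k for k in keys if previous.get(k) != repaired.get(k))


def check_repair(previous: dict, repaired: dict) -> str:
    """Return the single changed key, or raise if the proof shape breaks."""
    changed = changed_keys(previous, repaired)
    if len(changed) != 1:
        raise ValueError(f"expected exactly one changed input, got {changed}")
    switched_off = [k for k in PROOF_KEYS if not repaired.get(k)]
    if switched_off:
        raise ValueError(f"proof steps switched off: {switched_off}")
    return changed[0]


previous = {
    "source_ref": "source.crm.accounts",
    "retry": {"max_attempts": 2, "when": ["rate_limit"]},
    "validation": ["row_count"],
    "redact": ["token"],
    "artifacts": ["reports/**"],
}
repaired = {**previous, "retry": {"max_attempts": 3, "when": ["rate_limit"]}}
print(check_repair(previous, repaired))
```

Because every accepted repair differs from its predecessor in one key, two consecutive evidence bundles stay directly comparable.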
§04 the failure map

First find the owner. Then read the signal.

Failures are not mysteries — they are boundary breaks. Each class tells you where to look first and what you are allowed to change.

class | owner | signal to read | smallest repair
F1 · Plan (before lease) | the agent | spec diff | fix refs, profile, validation, or retry policy, then rerun
F2 · Capacity (before command) | Crabbox | pool + lease status | fix pool capacity, image, cache, or repo hydration
F3 · Credentials (before sync) | profile | redacted env report | fix the profile mapping or allow_env
F4 · Connector (during sync) | Airbyte / source | connector log | fix auth scope, schema, API limit, or cursor
F5 · Partial write (after write) | worker / target | target counts + sync state | verify idempotency before any retry
F6 · Validation (after proof) | the next plan | JUnit + metrics | compile a repair job from the failing checks, or alert

triage rule — owner + signal = the one bounded input the next run is allowed to change.
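
The failure map reads naturally as a lookup table. The sketch below encodes the six rows as data so a harness could resolve a class code to its owner, signal, and smallest repair; the `FailureClass` type and `triage` function are hypothetical names.

```python
from typing import NamedTuple


class FailureClass(NamedTuple):
    name: str
    when: str
    owner: str
    signal: str
    repair: str


FAILURE_MAP = {
    "F1": FailureClass("Plan", "before lease", "the agent", "spec diff",
                       "fix refs, profile, validation, or retry policy, then rerun"),
    "F2": FailureClass("Capacity", "before command", "Crabbox", "pool + lease status",
                       "fix pool capacity, image, cache, or repo hydration"),
    "F3": FailureClass("Credentials", "before sync", "profile", "redacted env report",
                       "fix the profile mapping or allow_env"),
    "F4": FailureClass("Connector", "during sync", "Airbyte / source", "connector log",
                       "fix auth scope, schema, API limit, or cursor"),
    "F5": FailureClass("Partial write", "after write", "worker / target",
                       "target counts + sync state",
                       "verify idempotency before any retry"),
    "F6": FailureClass("Validation", "after proof", "the next plan", "JUnit + metrics",
                       "compile a repair job from the failing checks, or alert"),
}


def triage(code: str) -> FailureClass:
    """Resolve a failure class code; unknown codes are an error, not a guess."""
    try:
        return FAILURE_MAP[code]
    except KeyError:
        raise ValueError(f"unknown failure class: {code}") from None


hit = triage("F3")
print(f"{hit.name}: read the {hit.signal}, then {hit.repair}")
```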

§05 run it yourself

One command. Same bytes, every time.

The seed is deterministic, so the SHA-256 on the receipt is reproducible. Borrow a box, hydrate it from the repo, run the proof, tear it down.

The islo way — persistent box

Lease a sandbox, hydrate from the repo, run the proof.

your machine
islo use airbyte-etl \
  --config poc/islo.yaml \
  --source github://zozo123/agentic-airbyte \
  -- bash poc/run_e2e.sh

The crabbox way — ephemeral worker

Dispatch it as a governed run with evidence capture.

your harness
crabbox run --pool org/data-movement/main/... \
  --shell 'bash poc/run_e2e.sh' \
  --artifact-glob 'poc/reports/**' \
  --junit poc/reports/

The loop is simple because the boundaries are hard.

Agent plans. Crabbox runs. Airbyte moves. Evidence returns. Repeat only when the evidence says what changed.

0 rows through the model · 1 run id per attempt · 4 lanes kept apart · 6 failure classes with owners