The agent compresses intent into a job spec.
An execution model for agents that own data movement — with no rows and no secrets ever entering the prompt. As proof: the agent wrote a job spec, Crabbox (the run dispatcher) leased an isolated islo sandbox, and an Airbyte-contract worker moved 50,000 rows from ClickHouse to DuckDB — four parity checks, one byte-exact checksum. The architecture below is what generalizes it.
Agent: Claude Code, Codex, or your harness. Decides the next bounded run. Never carries rows.
Crabbox: Turns a job spec into an auditable run boundary: lease, scoped env, artifacts, run id.
Sandbox: A fresh, repo-defined microVM. This proof used islo; any provider can sit behind Crabbox.
Worker: Reads the source and writes the target, inside the box and outside the model context.
Evidence: Logs, JUnit, counts, checksums. The only thing the agent is allowed to reason from.
It reads the goal, repo state, schemas, previous evidence, and policy. It emits one bounded run. It does not move data.
It finds or creates a ready worker, hydrates the repo, attaches cache, and starts the requested command under a durable run id.
The prompt carries only a profile name. Crabbox resolves it into allowed variables inside the worker — never back into the model context.
The connector reads the source and writes the target inside the worker. Rows never pass through a prompt.
Logs, metrics, JUnit, counts, and redacted config come back under one run id. The next decision becomes auditable.
Finish, retry, repair, or alert. The next command changes one bounded input and keeps the same audit shape. Then the loop closes.
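The closing step of the loop can be sketched as a pure function from evidence to the next action. This is a minimal sketch, not Crabbox's or the harness's actual logic: the `Evidence` fields and `decide_next` are hypothetical names, and the retryable classes simply mirror the `rate_limit` and `transient_network` entries in the job spec's `retry.when` list.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical evidence shape; the field names are assumptions for illustration.
@dataclass(frozen=True)
class Evidence:
    run_id: str
    exit_code: int
    checks_passed: bool
    failure_class: Optional[str] = None  # e.g. "rate_limit"
    attempt: int = 1

# Assumed to mirror the spec's retry.when list.
RETRYABLE = {"rate_limit", "transient_network"}

def decide_next(ev: Evidence, max_attempts: int = 2) -> str:
    """Return one of: finish, retry, repair, alert."""
    if ev.exit_code == 0 and ev.checks_passed:
        return "finish"
    if ev.failure_class in RETRYABLE:
        # Retry only while the attempt budget lasts; otherwise escalate.
        return "retry" if ev.attempt < max_attempts else "alert"
    if ev.failure_class is not None:
        # A known, non-transient failure gets one bounded repair run.
        return "repair"
    # No classified failure to reason from: hand off to a human.
    return "alert"
```

The function sees only the evidence record, never rows or secrets, which is the property the loop depends on.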
“It ran” is not proof. After the move, the worker compares source and destination — and exits non-zero unless all four checks pass. This is the actual output of the run on the receipt.
Every record made it across. No drops, no duplicates.
The decimal aggregate survives the type boundary exactly.
Group-by counts agree for all five event types.
Byte-exact: a hash of every (event_id, type, revenue) tuple, sorted. The strongest check.
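The four checks above can be sketched over two in-memory row sets. This is an illustrative sketch, not the worker's code: the `(event_id, event_type, revenue)` row shape follows the checksum description, but the function names, the check keys, and the exact serialization fed to SHA-256 are assumptions.

```python
import hashlib
from collections import Counter
from decimal import Decimal

# Assumed row shape: (event_id, event_type, revenue).
Row = tuple[int, str, Decimal]

def checksum(rows: list[Row]) -> str:
    """SHA-256 over every tuple, sorted, so row order does not matter."""
    h = hashlib.sha256()
    for event_id, event_type, revenue in sorted(rows):
        # Fixed two-decimal formatting keeps 1.5 and 1.50 byte-identical.
        h.update(f"{event_id}|{event_type}|{revenue:.2f}\n".encode())
    return h.hexdigest()

def parity_checks(source: list[Row], target: list[Row]) -> dict[str, bool]:
    """Compare source and destination; every value must be True to pass."""
    return {
        "row_count": len(source) == len(target),
        "revenue_sum": sum((r[2] for r in source), Decimal(0))
        == sum((r[2] for r in target), Decimal(0)),
        "event_type_counts": Counter(r[1] for r in source)
        == Counter(r[1] for r in target),
        "checksum": checksum(source) == checksum(target),
    }
```

A worker built this way would exit non-zero unless `all(parity_checks(src, dst).values())` holds. Using `Decimal` rather than `float` is what lets the revenue sum compare exactly.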
The worker reads the source catalog and maps each ClickHouse type to a DuckDB type before a single row moves — decimals stay decimals, datetimes stay timestamps.
| column | ClickHouse · source | DuckDB · destination |
|---|---|---|
| event_id | UInt64 | BIGINT |
| user_id | UInt32 | BIGINT |
| session_id | UInt32 | BIGINT |
| event_type | LowCardinality(String) | VARCHAR |
| channel | LowCardinality(String) | VARCHAR |
| device | LowCardinality(String) | VARCHAR |
| country | LowCardinality(String) | VARCHAR |
| url | String | VARCHAR |
| revenue | Decimal(12, 2) | DECIMAL(18,2) |
| ts | DateTime | TIMESTAMP |
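The mapping in the table above can be sketched as a small function. This is a hedged sketch: `to_duckdb_type` is a hypothetical name, only the types listed in the table are covered, and the rule "widen decimal precision to at least 18" is an assumption inferred from the single `Decimal(12, 2)` row.

```python
import re

# Direct mappings copied from the table; anything else raises instead of guessing.
_SIMPLE = {
    "UInt64": "BIGINT",
    "UInt32": "BIGINT",
    "String": "VARCHAR",
    "DateTime": "TIMESTAMP",
}
_LOW_CARDINALITY = re.compile(r"^LowCardinality\((.+)\)$")
_DECIMAL = re.compile(r"^Decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)$")

def to_duckdb_type(ch_type: str) -> str:
    """Map one ClickHouse column type to a DuckDB column type."""
    ch_type = ch_type.strip()
    wrapped = _LOW_CARDINALITY.match(ch_type)
    if wrapped:
        # LowCardinality is a storage hint; map the inner type.
        return to_duckdb_type(wrapped.group(1))
    dec = _DECIMAL.match(ch_type)
    if dec:
        # Assumption: precision is widened to at least 18, scale is preserved.
        precision = max(int(dec.group(1)), 18)
        return f"DECIMAL({precision},{dec.group(2)})"
    if ch_type not in _SIMPLE:
        raise ValueError(f"no mapping for ClickHouse type {ch_type!r}")
    return _SIMPLE[ch_type]
```

One caveat worth checking before reuse: DuckDB's `BIGINT` is a signed 64-bit integer, so `UInt64` values above 2^63 - 1 would not fit; DuckDB's `UBIGINT` covers the full unsigned range.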
Not a laptop. A fresh, repo-defined islo microVM, captured live and then torn down.
| property | value |
|---|---|
| sandbox id | 019ea238-1f81-7950-80a9-1b80a5e0b556 |
| image | docker.io/library/python:3.12 |
| kernel | Linux 6.16.9+ · x86_64 |
| vCPU / memory | 4 vCPU · 3930 MB |
| compute region | ca.compute.islo.dev (Canada) |
| source engine | ClickHouse 26.6.1.472 |
| destination engine | DuckDB 1.5.3 |
| bytes read | 9,450,617 (~9.0 MiB) |
| batches | 10 × 5,000 records |
| read / write split | 0.186 s read · 0.145 s write |
Nine ::CRABBOX_PHASE:: markers split the job into steps the orchestrator can time, attach evidence to, and reason over.
Straight from the sandbox. Full artifacts live in the repo.
```
::CRABBOX_PHASE::bootstrap
[e2e] installing clickhouse static binary
::CRABBOX_PHASE::boot_clickhouse
[e2e] ClickHouse up: 26.6.1.472
::CRABBOX_PHASE::seed
[seed] analytics.events ready: rows=50000 total_revenue=611815.02
::CRABBOX_PHASE::discover
{"type":"LOG","log":{"level":"INFO","message":"discovered stream 'events'"}}
::CRABBOX_PHASE::write_setup
::CRABBOX_PHASE::sync
{"type":"LOG","log":{"level":"INFO","message":"synced 5000/50000 records"}}
… 10 batches …
{"type":"LOG","log":{"level":"INFO","message":"synced 50000/50000 records"}}
::CRABBOX_PHASE::verify
::CRABBOX_PHASE::analytics
::CRABBOX_PHASE::emit
{"type":"STATE","state":{"records_moved":50000,"status":"SUCCEEDED"}}
{"type":"LOG","log":{"message":"sync SUCCEEDED: moved 50000 rows in 0.549s (150700.2 rows/s); checks_passed=True"}}
EXIT=0
```
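An orchestrator could split a log like this one into per-phase chunks before attaching evidence to each step. The sketch below is an assumption about how that might look, not Crabbox's parser: `split_phases` is a hypothetical helper, and it assumes each `::CRABBOX_PHASE::` marker sits on its own line.

```python
PHASE_PREFIX = "::CRABBOX_PHASE::"

def split_phases(log_text: str) -> dict[str, list[str]]:
    """Group log lines under the most recent phase marker, in order."""
    phases: dict[str, list[str]] = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith(PHASE_PREFIX):
            # A marker opens a new phase, even if it ends up with no lines.
            current = line[len(PHASE_PREFIX):]
            phases[current] = []
        elif current is not None and line:
            phases[current].append(line)
    return phases
```

Because Python dicts keep insertion order, `list(split_phases(log))` returns the phases in the order they ran, and an empty list marks a phase that logged nothing.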
This run uses the Airbyte source→destination contract on a custom-connector (Airbyte CDK) path, not a full packaged connector deployment — that's what lets it run self-contained in a sandbox in under a second. What it does prove is the part that matters for agentic data movement: a goal-driven worker can be dispatched into an isolated box, move real typed data end-to-end, and return evidence strong enough — a byte-exact checksum — for a harness to trust the result and decide what to do next. The full proof appendix walks every phase.
The architecture is three contracts: a spec the agent writes, a handoff a runner can execute, and a repair rule that keeps the proof shape intact.
```shell
# Goal: sync CRM accounts into the warehouse safely.
crabbox pool ensure example-org/data-movement/main/provider/linux/etl \
  --min-ready 3 --create -- --cache-volume airbyte-etl

cat > .crabbox/generated/accounts-sync.json <<'JSON'
{
  "movement": "source_to_target",
  "source_ref": "source.crm.accounts",
  "target_ref": "warehouse.analytics.accounts",
  "credential_profile": "etl-warehouse",
  "allow_env": ["AIRBYTE_*", "SOURCE_*", "TARGET_*"],
  "idempotency_key": "accounts_sync:daily",
  "retry": { "max_attempts": 2, "when": ["rate_limit", "transient_network"] },
  "validation": ["row_count", "schema_drift", "freshness"],
  "artifacts": ["reports/**", "metrics.json", "redacted-config.json"],
  "redact": ["password", "token", "secret"]
}
JSON

crabbox run --pool example-org/data-movement/main/provider/linux/etl \
  --shell 'python -m workers.airbyte_sync --config .crabbox/generated/accounts-sync.json' \
  --allow-env 'AIRBYTE_*,SOURCE_*,TARGET_*' \
  --env-from-profile etl-warehouse \
  --artifact-glob 'reports/**,metrics.json,redacted-config.json' \
  --junit reports/

crabbox results <run-id> --json
crabbox artifacts download <run-id> --out evidence/<run-id>
```
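Before dispatch, a harness might lint the generated spec and preview which environment variables the allowlist would admit. The following is a sketch under assumptions: `lint_spec` and `allowed_env` are hypothetical helpers, the set of required fields is taken from the example spec rather than from any published Crabbox schema, and glob matching via `fnmatch` is a guess at how `allow_env` patterns behave.

```python
from fnmatch import fnmatchcase

# Assumed required fields, copied from the example spec (not an official schema).
REQUIRED = {
    "source_ref", "target_ref", "credential_profile",
    "allow_env", "validation", "artifacts", "redact",
}

def lint_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec looks safe."""
    problems = [f"missing field: {k}" for k in sorted(REQUIRED - set(spec))]
    # The spec should name what to redact, never carry such values itself.
    for key in spec:
        if any(word in key.lower() for word in spec.get("redact", [])):
            problems.append(f"secret-looking key in spec: {key}")
    return problems

def allowed_env(env: dict[str, str], patterns: list[str]) -> dict[str, str]:
    """Keep only variables whose names match an allow_env glob."""
    return {
        name: value
        for name, value in env.items()
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    }
```

With the example spec, `allowed_env` would pass `SOURCE_HOST` and drop `HOME`, which is the scoping the `--allow-env` flag describes.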
The agent writes references and rules: source, target, profile name, allowlists, validation, retry, redaction, artifact globs. It never writes secret values or row payloads.
Never in the spec: raw secrets · copied rows · prompt transcripts
Crabbox receives a pool id, command, profile name, and artifact contract. It returns a run id and a bounded evidence bundle; nothing else crosses.
Never in the handoff: unbounded shell · missing run id · missing artifact capture
The next run changes one bounded input tied to the failing owner. Validation, redaction, and artifact capture stay on, so every attempt stays comparable.
Never in a repair: changing many inputs at once · retrying partial writes without idempotency
Failures are not mysteries; they are boundary breaks. Each class tells you where to look first and what you are allowed to change.
| class | when | owner | signal to read | smallest repair |
|---|---|---|---|---|
| F1 · Plan | before lease | the agent | spec diff | fix refs, profile, validation, or retry policy, then rerun |
| F2 · Capacity | before command | Crabbox | pool + lease status | fix pool capacity, image, cache, or repo hydration |
| F3 · Credentials | before sync | profile | redacted env report | fix the profile mapping or allow_env |
| F4 · Connector | during sync | Airbyte / source | connector log | fix auth scope, schema, API limit, or cursor |
| F5 · Partial write | after write | worker / target | target counts + sync state | verify idempotency before any retry |
| F6 · Validation | after proof | the next plan | JUnit + metrics | compile a repair job from the failing checks, or alert |
triage rule — owner + signal = the one bounded input the next run is allowed to change.
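The triage rule and the one-bounded-input repair rule can be sketched together. This is an illustration only: the `TRIAGE` entries are copied from the table above, while `changed_keys`, `is_bounded_repair`, and the choice of protected fields are assumptions about how a harness might enforce the rule.

```python
from typing import NamedTuple

class Triage(NamedTuple):
    owner: str
    signal: str

# Owner and signal per failure class, copied from the table above.
TRIAGE = {
    "F1": Triage("the agent", "spec diff"),
    "F2": Triage("Crabbox", "pool + lease status"),
    "F3": Triage("profile", "redacted env report"),
    "F4": Triage("Airbyte / source", "connector log"),
    "F5": Triage("worker / target", "target counts + sync state"),
    "F6": Triage("the next plan", "JUnit + metrics"),
}

# Assumption: these spec fields must stay unchanged so attempts remain comparable.
PROTECTED = ("validation", "redact", "artifacts")

def changed_keys(prev: dict, nxt: dict) -> list[str]:
    """Top-level spec keys whose values differ between two attempts."""
    return sorted(k for k in prev.keys() | nxt.keys() if prev.get(k) != nxt.get(k))

def is_bounded_repair(prev: dict, nxt: dict) -> bool:
    """True when exactly one unprotected input changed."""
    changed = changed_keys(prev, nxt)
    return len(changed) == 1 and changed[0] not in PROTECTED
```

For an F3 failure, for example, a repair that changes only `credential_profile` passes, while one that also trims `validation` is rejected.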
The seed is deterministic, so the SHA-256 on the receipt is reproducible. Borrow a box, hydrate it from the repo, run the proof, tear it down.
Lease a sandbox, hydrate from the repo, run the proof.
```shell
islo use airbyte-etl \
  --config poc/islo.yaml \
  --source github://zozo123/agentic-airbyte \
  -- bash poc/run_e2e.sh
```
Dispatch it as a governed run with evidence capture.
```shell
crabbox run --pool org/data-movement/main/... \
  --shell 'bash poc/run_e2e.sh' \
  --artifact-glob 'poc/reports/**' \
  --junit poc/reports/
```
source — run_e2e.sh · worker/etl.py · worker/seed.py · islo.yaml
Agent plans. Crabbox runs. Airbyte moves. Evidence returns. Repeat only when the evidence says what changed.