v0.22.16 feat: gbrain claw-test — end-to-end fresh-install friction harness by garrytan · Pull Request #522 · garrytan/gbrain

garrytan · 2026-04-30T00:49:08Z

Summary

End-to-end claw-test friction harness so every release gets a fresh-install dry-run before users do.

New surface (5 atomic commits + merge + version bump):

gbrain claw-test — two modes. Scripted (~30s, no API keys, CI gate) walks the canonical first-day flow against a fresh tempdir and asserts every expected --progress-json phase fired + doctor's status === 'ok'. Live (--live --agent openclaw, ~5–10 min, ~$1–2 in tokens) spawns a real openclaw subprocess, hands it BRIEF.md, captures stdin/stdout/stderr to transcript.jsonl, and lets the agent log friction whenever something is confusing or wrong.
gbrain friction {log,render,list,summary} — append-only JSONL writer at $GBRAIN_HOME/friction/<run-id>.jsonl. Schema is a flat extension of StructuredAgentError. Run-id resolves from --run-id > $GBRAIN_FRICTION_RUN_ID > standalone. render --redact (default for md) strips $HOME / $CWD to placeholders so reports paste safely in PRs.
Hermeticity migration — configDir() always supported $GBRAIN_HOME as a parent-dir override, but ~12 consumers built paths from os.homedir() directly and bypassed it. Critically, loadConfig / saveConfig themselves used a private helper that ignored the env. Migrated every write site to a new gbrainPath() helper. test/gbrain-home-isolation.test.ts is the regression gate.
AgentRunner interface + OpenClaw runner — narrow contract (detect, invoke, optional postInstallHook). Invocation pattern: openclaw agent --local --agent <name> --message <brief> matching test/e2e/skills.test.ts. Hermes deferred to v1.1 (TODO).
transcript-capture — async-drains via fs.createWriteStream + 'drain' events so 256KB+ bursts don't stall the child. Friction entries' transcript_offset field references byte offsets into transcript.jsonl so render --transcripts resolves back to readable agent reasoning.
seed-pglite — ~50 LOC SQL replay primitive for the upgrade-from-v0.18 scenario. Existing migration helpers (test/e2e/helpers.ts) are Postgres-only.
Two scenarios in test/fixtures/claw-test-scenarios/: fresh-install (canonical 5-min flow) and upgrade-from-v0.18 (scaffolded; real v0.18 SQL dump documented as a v1.1 follow-up).
skills/_friction-protocol.md — cross-cutting convention skill telling agents when to call gbrain friction log.
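
As an illustration of the `render --redact` behavior described in the friction bullet above, here is a minimal sketch of path redaction. The function name `redactPaths` and the placeholder strings `<HOME>` and `<CWD>` are assumptions made for this example; the PR does not show the actual redactor in src/core/friction.ts.

```typescript
// Hypothetical redactor sketch: replaces absolute home and working-directory
// prefixes with placeholders so a report can be pasted into a PR or issue.
function redactPaths(text: string, home: string, cwd: string): string {
  // Replace the longer path first so a cwd nested under home is not
  // partially rewritten as "<HOME>/...".
  const pairs: Array<[string, string]> = [
    [home, "<HOME>"],
    [cwd, "<CWD>"],
  ];
  pairs.sort((a, b) => b[0].length - a[0].length);

  let out = text;
  for (const [needle, placeholder] of pairs) {
    if (needle.length === 0) continue; // never replace the empty string
    out = out.split(needle).join(placeholder);
  }
  return out;
}

const sample =
  "ENOENT: /home/alice/work/repo/brain/page.md (config at /home/alice/.gbrain)";
console.log(redactPaths(sample, "/home/alice", "/home/alice/work/repo"));
// prints "ENOENT: <CWD>/brain/page.md (config at <HOME>/.gbrain)"
```

Sorting by length is one possible design choice: it keeps the more specific placeholder when the working directory sits inside the home directory.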

Test Coverage

113 new unit tests + 3 E2E tests. Direct verification on every modified or new file:

Suite	Pass / Total
Hermeticity isolation	7 / 7
Friction core (writer, reader, renderer, redactor)	20 / 20
Friction CLI dispatch	16 / 16
AgentRunner registry	8 / 8
Transcript capture (incl. multi-byte UTF-8 + 256KB burst drain)	12 / 12
Progress-tail event parsing	8 / 8
Scenario loader	13 / 13
Seed-pglite (SQL splitter + replay)	11 / 11
Claw-test CLI dispatch	9 / 9
Migration v0_14_0 (regression-fixed for `$GBRAIN_HOME`)	8 / 8
Build-llms generator drift	7 / 7

E2E (full run with DATABASE_URL set):

bun run test:e2e → 241 / 241 pass across 28 files
test/e2e/skills.test.ts (real openclaw + API keys) → 3 / 3 pass (3m36s)
test/e2e/claw-test.test.ts (this PR's E2E) → 3 / 3 pass

Plan Completion

5-commit plan from ~/.claude/plans/system-instruction-you-are-working-noble-biscuit.md:

Commit	Status
Hermeticity migration	DONE
Friction CLI core	DONE
claw-test harness + AgentRunner + transcript-capture	DONE
Scenario fixtures + seed-pglite + skills convention	DONE
CLI wiring + CLAUDE.md + llms sync + E2E test	DONE

23 plan-stage decisions (D1–D23 across CEO + Eng review) all addressed. CEO + Eng reviews CLEAR.

TODOS

8 v1.1 follow-up TODOs added — hermes runner, friction analytics suite (diff/trend/migration-stub), 2 more scenarios, real v0.18 SQL dump, public scoreboard, PTY-mode capture, $GBRAIN_HOST_HOME for read-side isolation, routing-callout sweep.

Test plan

Tier 1 E2E (bun run test:e2e) passes — 241/241
Tier 2 skills E2E passes — 3/3 with real openclaw + API keys
claw-test E2E passes — 3/3
Hermeticity isolation passes — 7/7 against $GBRAIN_HOME=<tmp>
Manual gbrain friction log/list/render round-trip works
Branch merged with origin/master cleanly (1 conflict in src/cli.ts, resolved as union of auth + friction + claw-test)

🤖 Generated with Claude Code

…e env override

configDir() in src/core/config.ts already implemented $GBRAIN_HOME as a parent-dir override (returns <override>/.gbrain), but ~12 consumers built paths from os.homedir() directly and bypassed it. Critically, loadConfig/saveConfig themselves used a private getConfigDir() that ignored the env. Fixed.

Migrated every write site to gbrainPath(): fail-improve, validator-lint, cycle lock, shell-audit, backpressure-audit, sync-failures, integrity logs, integrations heartbeat, init pglite path, migrate-engine manifest, import checkpoint, v0_13_1 rollback, v0_14_0 host-work. Read-side host-detection in init.ts (~/.claude / ~/.openclaw probes) is intentionally NOT migrated; that's a v1.1 follow-up under a separate $GBRAIN_HOST_HOME override.

Adds gbrainPath(...segments) sugar plus path validation: $GBRAIN_HOME must be absolute and contain no '..' segments (throws GbrainHomeInvalidError).

test/gbrain-home-isolation.test.ts proves write-isolation across all migrated sites. test/migrations-v0_14_0.test.ts updated to use $GBRAIN_HOME instead of the old HOME-swap pattern. Closes part of the claw-test E2E harness preconditions (D13 + D21).
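
The override and validation rules in this commit message can be sketched roughly as follows. This is not the code from src/core/config.ts: the `~/.gbrain` default, the error message wording, and the example path segments are assumptions, while the names `configDir`, `gbrainPath`, and `GbrainHomeInvalidError` come from the message itself.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// The PR names this error class but does not show its shape; this one is hypothetical.
class GbrainHomeInvalidError extends Error {
  constructor(value: string, reason: string) {
    super(`GBRAIN_HOME "${value}" is invalid: ${reason}`);
    this.name = "GbrainHomeInvalidError";
  }
}

// Returns <override>/.gbrain when $GBRAIN_HOME is set.
// Assumption: the fallback without an override is ~/.gbrain.
function configDir(): string {
  const override = process.env.GBRAIN_HOME;
  if (!override) return path.join(os.homedir(), ".gbrain");
  if (!path.isAbsolute(override)) {
    throw new GbrainHomeInvalidError(override, "must be an absolute path");
  }
  // Inspect the raw segments before joining, because path.join would
  // normalize ".." away and hide it.
  if (override.split(/[\\/]+/).includes("..")) {
    throw new GbrainHomeInvalidError(override, "must not contain '..' segments");
  }
  return path.join(override, ".gbrain");
}

// Sugar for building any path under the config directory.
function gbrainPath(...segments: string[]): string {
  return path.join(configDir(), ...segments);
}

process.env.GBRAIN_HOME = "/tmp/claw-test-home";
console.log(gbrainPath("logs", "example.jsonl")); // hypothetical segments
```

Routing every write through one helper is what lets a single regression test assert that nothing lands outside the temporary directory.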

…rter

Append-only JSONL writer at $GBRAIN_HOME/friction/<run-id>.jsonl. Schema is a flat extension of StructuredAgentError (D20), one envelope shape across both agent-emitted entries and harness-wrapped command failures. Run-id resolves from --run-id > $GBRAIN_FRICTION_RUN_ID > 'standalone'.

Subcommands stay ≤30 LOC each; core lives in src/core/friction.ts (writer + reader + renderer + redactor). render --redact (default for md output) strips $HOME / $CWD to placeholders so reports paste safely in PRs/issues.

Severity: confused | error | blocker | nit. Kind: friction | delight (D7) | phase-marker | interrupted. Readers tolerate malformed lines (skip + warn).

40 unit tests; this is the channel the claw-test harness writes to and that agents emit through during live-mode runs.
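
A minimal sketch of the run-id precedence and the tolerant reader described above. The severity and kind values are taken from the commit message; the remaining `FrictionEntry` fields and both function names are hypothetical, since the PR only says the schema is a flat extension of StructuredAgentError.

```typescript
type Severity = "confused" | "error" | "blocker" | "nit";
type Kind = "friction" | "delight" | "phase-marker" | "interrupted";

// Hypothetical entry shape for illustration only.
interface FrictionEntry {
  ts: string;
  run_id: string;
  severity: Severity;
  kind: Kind;
  message: string;
}

// Precedence from the PR: --run-id flag > $GBRAIN_FRICTION_RUN_ID > 'standalone'.
function resolveRunId(
  flag: string | undefined,
  env: Record<string, string | undefined>,
): string {
  return flag || env.GBRAIN_FRICTION_RUN_ID || "standalone";
}

// Tolerant JSONL reader: malformed lines are skipped and reported as
// warnings instead of aborting the whole read.
function parseFrictionLog(text: string): { entries: FrictionEntry[]; warnings: string[] } {
  const entries: FrictionEntry[] = [];
  const warnings: string[] = [];
  text.split("\n").forEach((line, index) => {
    if (line.trim() === "") return;
    try {
      const parsed: unknown = JSON.parse(line);
      if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
        throw new Error("not a JSON object");
      }
      entries.push(parsed as FrictionEntry);
    } catch (err) {
      warnings.push(`line ${index + 1}: skipped malformed entry`);
    }
  });
  return { entries, warnings };
}

const log = [
  JSON.stringify({ ts: "2026-04-30T00:00:00Z", run_id: "r1", severity: "nit", kind: "friction", message: "unclear flag" }),
  "{ truncated",
].join("\n");
const result = parseFrictionLog(log);
console.log(result.entries.length, result.warnings);
```

Skipping bad lines suits an append-only file that several processes write to, where a crash mid-write can leave a partial last line.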

Two modes: scripted (CI gate, no agent) and --live (real agent subprocess). Phases: setup → install_brain (gbrain init --pglite) → import (--no-embed) → query → extract all --source fs → verify (gbrain doctor --json, asserts status==='ok' and progress.jsonl phase coverage).

AgentRunner interface + registry: the interface stays narrow (detect, invoke, optional postInstallHook). v1 ships only OpenClawRunner; the registry pattern lets v1.1 land hermes/codex as ~50-line additions without refactoring callers. OpenClaw invocation: 'openclaw agent --local --agent <name> --message <brief>' matching test/e2e/skills.test.ts (NOT --prompt-file, which doesn't exist).

transcript-capture: spawns child with piped stdio, async-drains via fs.createWriteStream + 'drain' events so 256KB+ bursts don't stall the child (D17 backpressure). Writes <run>/transcript.jsonl with schema_version + ts + channel + byte_offset + bytes_b64. Friction entries' transcript_offset field references byte offsets here so render --transcripts can resolve back.

progress-tail: parses gbrain's --progress-json events out of child stderr. Phase verification asserts each scenario.expected_phases entry (dotted names like import.files, extract.links_fs, doctor.db_checks) saw at least one event from the actual command, which proves the COMMAND ran, not that the agent obeyed prompts.

seed-pglite: ~50 LOC SQL replay primitive for the upgrade-from-v0.18 scenario. Existing migration helpers (test/e2e/helpers.ts) are Postgres-only; PGLite has no equivalent. seedPglite opens a fresh PGLite, executes each statement individually (errors name the failing one), then disconnects so gbrain init can take over and walk forward.

53 unit tests covering registry selection, runner detection, multi-byte UTF-8 chunk-boundary safety, PIPE buffer drain, scenario load+validate, progress event parsing, and SQL splitter.
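
The phase-coverage check in the progress-tail description can be sketched as below. The event shape (one JSON object per stderr line carrying a string `phase` field) is an assumption, as are the function names; the PR does not document the --progress-json format.

```typescript
// Assumed event shape: a JSON object per line with a dotted phase name.
interface ProgressEvent {
  phase: string;
  [key: string]: unknown;
}

// Pull progress events out of mixed stderr output, ignoring ordinary log lines.
function parseProgressEvents(stderr: string): ProgressEvent[] {
  const events: ProgressEvent[] = [];
  for (const line of stderr.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue; // not a JSON event
    try {
      const obj = JSON.parse(trimmed);
      if (obj && typeof obj.phase === "string") events.push(obj as ProgressEvent);
    } catch {
      // A partial or non-JSON line is not a progress event; skip it.
    }
  }
  return events;
}

// Returns every expected phase that produced no event at all.
function missingPhases(expected: string[], events: ProgressEvent[]): string[] {
  const seen = new Set(events.map((e) => e.phase));
  return expected.filter((phase) => !seen.has(phase));
}

const stderr = [
  "warming up",
  '{"phase":"import.files","done":3}',
  '{"phase":"doctor.db_checks","done":1}',
].join("\n");
const expected = ["import.files", "extract.links_fs", "doctor.db_checks"];
console.log(missingPhases(expected, parseProgressEvents(stderr)));
// prints [ 'extract.links_fs' ]
```

Because the events come from the child command's own stderr, an empty result here reflects what actually executed rather than what an agent reported doing.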

Two scenarios ship in v1: fresh-install and upgrade-from-v0.18. Each is a self-contained directory: brain/ (markdown pages), BRIEF.md (live-mode prompt), expected.json (scripted-mode assertions), scenario.json (kind, expected_phases, optional from_version + seed paths). Schema is owned by src/core/claw-test/scenarios.ts.

upgrade-from-v0.18 ships scaffolded: seed/dump.sql is the v1.1 follow-up (needs a real v0.18-shape PGLite dump; seed/README.md documents the gen procedure). The harness gracefully no-ops the seed phase when dump.sql is absent.

skills/_friction-protocol.md is a cross-cutting convention skill (like _brain-filing-rules.md). Tells agents when to call gbrain friction log and how to choose severity. Skills the claw-test exercises will gain a > Convention: callout pointing here in a v1.1 sweep.

13 unit tests for the scenario loader + 'shipped scenarios load cleanly' for both.
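
A rough sketch of what validating scenario.json could look like, given the fields listed above. The field names `kind`, `expected_phases`, and `from_version` come from the commit message; the `seed` field name, the example `kind` value, and the `validateScenario` function are assumptions, not the schema in src/core/claw-test/scenarios.ts.

```typescript
interface Scenario {
  kind: string;
  expected_phases: string[];
  from_version?: string;
  seed?: string[]; // hypothetical name for the "seed paths" field
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Validates parsed JSON and returns a typed scenario, or throws with a
// message naming the offending field.
function validateScenario(raw: unknown): Scenario {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("scenario.json must contain a JSON object");
  }
  const obj = raw as Record<string, unknown>;
  if (typeof obj.kind !== "string" || obj.kind === "") {
    throw new Error("scenario.kind must be a non-empty string");
  }
  if (!isStringArray(obj.expected_phases)) {
    throw new Error("scenario.expected_phases must be an array of strings");
  }
  const scenario: Scenario = { kind: obj.kind, expected_phases: obj.expected_phases };
  if (obj.from_version !== undefined) {
    if (typeof obj.from_version !== "string") {
      throw new Error("scenario.from_version must be a string when present");
    }
    scenario.from_version = obj.from_version;
  }
  if (obj.seed !== undefined) {
    if (!isStringArray(obj.seed)) {
      throw new Error("scenario.seed must be an array of strings when present");
    }
    scenario.seed = obj.seed;
  }
  return scenario;
}

const parsed = validateScenario(
  JSON.parse('{"kind":"fresh-install","expected_phases":["import.files","doctor.db_checks"]}'),
);
console.log(parsed.expected_phases.length); // prints 2
```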

Wires both commands into src/cli.ts CLI_ONLY allow-list and adds dispatch in handleCliOnly so neither command requires a brain engine connection.

CLAUDE.md gains entries for src/commands/{friction,claw-test}.ts + src/core/claw-test/ + skills/_friction-protocol.md, and a Commands section listing all 8 new gbrain claw-test ... and gbrain friction ... invocations with the v0.23 marker. Documents the GBRAIN_HOME write-isolation contract and the v1 caveat (read-side host-fingerprint detection deferred to v1.1). llms.txt + llms-full.txt regenerated via 'bun run build:llms' so the committed generator-output gate passes.

test/e2e/claw-test.test.ts is the scripted-mode E2E. Builds a tiny shim that delegates to 'bun run src/cli.ts' (NOT bun --compile, which doesn't bundle PGLite's runtime assets), points the harness at it via GBRAIN_BIN_OVERRIDE, runs --scenario fresh-install end-to-end. Asserts exit 0, zero error/blocker friction. Includes a deliberate-break test that proves the friction signal fires when a phase command rejects.

test/claw-test-cli.test.ts covers shipped-scenario load + agent registry + OpenClawRunner detection (relative-path / .. / missing-bin guards) + the GBRAIN_FRICTION_RUN_ID env handoff between harness and friction CLI.

Closes the v0.23 claw-test E2E feature.

Three CI fixes after PR #522 landed:

1. test/agent-runner.test.ts:89: UnavailableRunner.invoke() returns Promise<void> by default but the AgentRunner contract requires Promise<InvokeResult>. Annotate the throw-only invoke explicitly so tsc sees the contract is satisfied (the throw makes the body unreachable as far as the return type is concerned).
2. test/seed-pglite.test.ts: bun:test signature is test(name, fn, timeoutMs: number), not test(name, opts: {timeout}, fn). The {timeout: 30_000} object form was a guess that tsc on bun 1.3.13 rejects. Move the 30s cap to the trailing positional number arg on each PGLite-using test.
3. test/transcript-capture.test.ts: `spawnWithCapture > timeout fires SIGTERM/SIGKILL` blew the 10s outer cap on the GitHub runner. Two fixes:
   (a) use `exec sleep` so the child we spawn IS sleep; SIGTERM goes directly to it, with no `/bin/sh` fork-vs-exec process-group ambiguity that could orphan the sleep and force the SIGKILL grace path.
   (b) bump outer cap to 30s for headroom even when the runner is slow and SIGKILL after the 5s grace is what actually ends the child.
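
The first fix can be illustrated with a small sketch of a narrow runner contract and a throw-only runner. The three member names (detect, invoke, postInstallHook) and the types `AgentRunner`, `InvokeResult`, and `UnavailableRunner` are named in this PR; every signature, the `InvokeResult` fields, and the registry helpers below are assumptions for illustration.

```typescript
// Hypothetical result shape; the PR does not show InvokeResult's fields.
interface InvokeResult {
  exitCode: number;
  transcriptPath: string;
}

// Narrow contract: detect, invoke, optional postInstallHook (signatures assumed).
interface AgentRunner {
  readonly name: string;
  detect(): boolean;
  invoke(brief: string): Promise<InvokeResult>;
  postInstallHook?(): Promise<void>;
}

class UnavailableRunner implements AgentRunner {
  readonly name = "unavailable";
  detect(): boolean {
    return false;
  }
  // The explicit return type states the contract; the body only throws,
  // so no InvokeResult value ever has to be constructed.
  async invoke(_brief: string): Promise<InvokeResult> {
    throw new Error("agent binary not found");
  }
}

// Minimal registry sketch: new runners register by name without touching callers.
const registry = new Map<string, AgentRunner>();

function registerRunner(runner: AgentRunner): void {
  registry.set(runner.name, runner);
}

function selectRunner(name: string): AgentRunner {
  const runner = registry.get(name);
  if (!runner) throw new Error(`unknown agent: ${name}`);
  if (!runner.detect()) throw new Error(`agent not available: ${name}`);
  return runner;
}

registerRunner(new UnavailableRunner());
try {
  selectRunner("unavailable");
} catch (err) {
  console.log((err as Error).message); // prints "agent not available: unavailable"
}
```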

PR #506 claims v0.22.15, PR #521 claims v0.22.10, intermediate slots (.11/.12/.13/.14) are claimed by other open PRs. v0.22.16 is the next clean PATCH slot. v0.23.0 is claimed by PR #462 so MINOR isn't free. This release fits the 0.22.x train; v0.23.0 lands when #462 ships. Updates VERSION, package.json, CHANGELOG.md header, TODOS.md follow-up labels. Code is unchanged.

garrytan and others added 9 commits April 29, 2026 16:02

Merge remote-tracking branch 'origin/master' into garrytan/claw-setup…

953c8e7

…-e2e # Conflicts: # .gitignore # src/cli.ts

chore: bump version and changelog (v0.24.0)

8200c27

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

garrytan changed the title ~~v0.24.0 feat: gbrain claw-test — end-to-end fresh-install friction harness~~ v0.22.16 feat: gbrain claw-test — end-to-end fresh-install friction harness Apr 30, 2026

garrytan added 2 commits April 29, 2026 22:18

Merge remote-tracking branch 'origin/master' into garrytan/claw-setup…

7aae6ec

…-e2e # Conflicts: # CHANGELOG.md # VERSION # package.json

Merge remote-tracking branch 'origin/master' into garrytan/claw-setup…

e8b6894

…-e2e # Conflicts: # CHANGELOG.md # CLAUDE.md # TODOS.md # VERSION # llms-full.txt # package.json # src/cli.ts

garrytan merged commit 83e55ff into master Apr 30, 2026
7 checks passed

garrytan merged 11 commits into master from garrytan/claw-setup-e2e
