
v0.22.16 feat: gbrain claw-test — end-to-end fresh-install friction harness #522

Merged: garrytan merged 11 commits into master from garrytan/claw-setup-e2e on Apr 30, 2026

@garrytan garrytan commented Apr 30, 2026

Summary

End-to-end claw-test friction harness so every release gets a fresh-install dry-run before users do.

New surface (5 atomic commits + merge + version bump):

  • gbrain claw-test — two modes. Scripted (~30s, no API keys, CI gate) walks the canonical first-day flow against a fresh tempdir and asserts every expected --progress-json phase fired + doctor's status === 'ok'. Live (--live --agent openclaw, ~5–10 min, ~$1–2 in tokens) spawns a real openclaw subprocess, hands it BRIEF.md, captures stdin/stdout/stderr to transcript.jsonl, and lets the agent log friction whenever something is confusing or wrong.

  • gbrain friction {log,render,list,summary} — append-only JSONL writer at $GBRAIN_HOME/friction/<run-id>.jsonl. Schema is a flat extension of StructuredAgentError. Run-id resolves from --run-id > $GBRAIN_FRICTION_RUN_ID > standalone. render --redact (default for md) strips $HOME / $CWD to placeholders so reports paste safely in PRs.

  • Hermeticity migration: configDir() always supported $GBRAIN_HOME as a parent-dir override, but ~12 consumers built paths from os.homedir() directly and bypassed it. Critically, loadConfig / saveConfig themselves used a private helper that ignored the env. Migrated every write site to a new gbrainPath() helper. test/gbrain-home-isolation.test.ts is the regression gate.

  • AgentRunner interface + OpenClaw runner — narrow contract (detect, invoke, optional postInstallHook). Invocation pattern: openclaw agent --local --agent <name> --message <brief> matching test/e2e/skills.test.ts. Hermes deferred to v1.1 (TODO).

  • transcript-capture — async-drains via fs.createWriteStream + 'drain' events so 256KB+ bursts don't stall the child. Friction entries' transcript_offset field references byte offsets into transcript.jsonl so render --transcripts resolves back to readable agent reasoning.

  • seed-pglite — ~50 LOC SQL replay primitive for the upgrade-from-v0.18 scenario. Existing migration helpers (test/e2e/helpers.ts) are Postgres-only.

  • Two scenarios in test/fixtures/claw-test-scenarios/: fresh-install (canonical 5-min flow) and upgrade-from-v0.18 (scaffolded; real v0.18 SQL dump documented as a v1.1 follow-up).

  • skills/_friction-protocol.md — cross-cutting convention skill telling agents when to call gbrain friction log.

Test Coverage

113 new unit tests + 3 E2E tests. Direct verification on every modified or new file:

| Suite | Pass / Total |
| --- | --- |
| Hermeticity isolation | 7 / 7 |
| Friction core (writer, reader, renderer, redactor) | 20 / 20 |
| Friction CLI dispatch | 16 / 16 |
| AgentRunner registry | 8 / 8 |
| Transcript capture (incl. multi-byte UTF-8 + 256KB burst drain) | 12 / 12 |
| Progress-tail event parsing | 8 / 8 |
| Scenario loader | 13 / 13 |
| Seed-pglite (SQL splitter + replay) | 11 / 11 |
| Claw-test CLI dispatch | 9 / 9 |
| Migration v0_14_0 (regression-fixed for $GBRAIN_HOME) | 8 / 8 |
| Build-llms generator drift | 7 / 7 |

E2E (full run with DATABASE_URL set):

  • bun run test:e2e → 241 / 241 pass across 28 files
  • test/e2e/skills.test.ts (real openclaw + API keys) → 3 / 3 pass (3m36s)
  • test/e2e/claw-test.test.ts (this PR's E2E) → 3 / 3 pass

Plan Completion

5-commit plan from ~/.claude/plans/system-instruction-you-are-working-noble-biscuit.md:

| Commit | Status |
| --- | --- |
| Hermeticity migration | DONE |
| Friction CLI core | DONE |
| claw-test harness + AgentRunner + transcript-capture | DONE |
| Scenario fixtures + seed-pglite + skills convention | DONE |
| CLI wiring + CLAUDE.md + llms sync + E2E test | DONE |

23 plan-stage decisions (D1–D23 across CEO + Eng review) all addressed. CEO + Eng reviews CLEAR.

TODOS

8 v1.1 follow-up TODOs added — hermes runner, friction analytics suite (diff/trend/migration-stub), 2 more scenarios, real v0.18 SQL dump, public scoreboard, PTY-mode capture, $GBRAIN_HOST_HOME for read-side isolation, routing-callout sweep.

Test plan

  • Tier 1 E2E (bun run test:e2e) passes — 241/241
  • Tier 2 skills E2E passes — 3/3 with real openclaw + API keys
  • claw-test E2E passes — 3/3
  • Hermeticity isolation passes — 7/7 against $GBRAIN_HOME=<tmp>
  • Manual gbrain friction log/list/render round-trip works
  • Branch merged with origin/master cleanly (1 conflict in src/cli.ts, resolved as union of auth + friction + claw-test)

🤖 Generated with Claude Code



garrytan and others added 9 commits April 29, 2026 16:02
…e env override

configDir() in src/core/config.ts already implemented $GBRAIN_HOME as a
parent-dir override (returns <override>/.gbrain), but ~12 consumers built paths
from os.homedir() directly and bypassed it. Critically, loadConfig/saveConfig
themselves used a private getConfigDir() that ignored the env. Fixed.

Migrated every write site to gbrainPath() — fail-improve, validator-lint, cycle
lock, shell-audit, backpressure-audit, sync-failures, integrity logs,
integrations heartbeat, init pglite path, migrate-engine manifest, import
checkpoint, v0_13_1 rollback, v0_14_0 host-work. Read-side host-detection in
init.ts (~/.claude / ~/.openclaw probes) intentionally NOT migrated; that's a
v1.1 follow-up under a separate $GBRAIN_HOST_HOME override.

Adds gbrainPath(...segments) sugar plus path validation: $GBRAIN_HOME must be
absolute and contain no '..' segments (throws GbrainHomeInvalidError).
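
A minimal sketch of the override-plus-validation behavior described above, not the code in src/core/config.ts. The names configDir, gbrainPath, and GbrainHomeInvalidError come from the commit message; the explicit `env` parameter, the error-message wording, and treating an empty $GBRAIN_HOME as unset are assumptions made so the sketch can be exercised without touching the real environment.

```typescript
import * as os from "node:os";
import * as path from "node:path";

type Env = Record<string, string | undefined>;

// Error name from the commit message; constructor shape is an assumption.
class GbrainHomeInvalidError extends Error {
  constructor(value: string, reason: string) {
    super(`Invalid GBRAIN_HOME "${value}": ${reason}`);
    this.name = "GbrainHomeInvalidError";
  }
}

// Returns <override>/.gbrain when GBRAIN_HOME is set, else <home>/.gbrain.
function configDir(env: Env = process.env): string {
  const override = env.GBRAIN_HOME;
  if (override === undefined || override === "") {
    return path.join(os.homedir(), ".gbrain");
  }
  if (!path.isAbsolute(override)) {
    throw new GbrainHomeInvalidError(override, "must be an absolute path");
  }
  // Inspect the raw segments before path.join() can normalize '..' away.
  if (override.split(/[\\/]/).includes("..")) {
    throw new GbrainHomeInvalidError(override, "must not contain '..' segments");
  }
  return path.join(override, ".gbrain");
}

// Sugar for building paths under the config dir.
// (The real helper is gbrainPath(...segments); `env` is added for testability.)
function gbrainPath(env: Env, ...segments: string[]): string {
  return path.join(configDir(env), ...segments);
}
```

Routing every write through one helper like this is what lets a single isolation test cover all migrated sites: the test only has to point $GBRAIN_HOME at a tempdir and check that nothing lands elsewhere.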

test/gbrain-home-isolation.test.ts proves write-isolation across all migrated
sites. test/migrations-v0_14_0.test.ts updated to use $GBRAIN_HOME instead of
the old HOME-swap pattern.

Closes part of the claw-test E2E harness preconditions (D13 + D21).
…rter

Append-only JSONL writer at $GBRAIN_HOME/friction/<run-id>.jsonl. Schema is a
flat extension of StructuredAgentError (D20), one envelope shape across both
agent-emitted entries and harness-wrapped command failures. Run-id resolves
from --run-id > $GBRAIN_FRICTION_RUN_ID > 'standalone'.
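
The precedence chain above can be sketched as a small pure function. This is an illustration only: the function name and the choice to treat empty strings as unset are assumptions, while the flag, the environment variable, and the 'standalone' fallback come from the commit message.

```typescript
type Env = Record<string, string | undefined>;

// Resolve the run id: --run-id flag, then $GBRAIN_FRICTION_RUN_ID, then 'standalone'.
function resolveRunId(flagValue: string | undefined, env: Env): string {
  if (flagValue !== undefined && flagValue !== "") return flagValue;
  const fromEnv = env.GBRAIN_FRICTION_RUN_ID;
  if (fromEnv !== undefined && fromEnv !== "") return fromEnv;
  return "standalone";
}
```

The environment fallback is what allows a harness to export one run id and have every `gbrain friction log` call made by a child agent land in the same file without passing the flag each time.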

Subcommands stay ≤30 LOC each; core lives in src/core/friction.ts (writer +
reader + renderer + redactor). render --redact (default for md output) strips
$HOME / $CWD to placeholders so reports paste safely in PRs/issues.
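
One way such a redaction pass could work is plain string substitution. The sketch below is an assumption about the approach, and the placeholder spellings `<HOME>` and `<CWD>` are hypothetical; the commit message only says that the two paths are replaced with placeholders.

```typescript
// Replace absolute home and working-directory paths with placeholders.
function redactPaths(text: string, home: string, cwd: string): string {
  const pairs: Array<[string, string]> = [
    [home, "<HOME>"],
    [cwd, "<CWD>"],
  ];
  // Longest path first, so a cwd nested under home is not half-rewritten
  // into "<HOME>/project" before the cwd rule gets a chance to match.
  pairs.sort((a, b) => b[0].length - a[0].length);
  let out = text;
  for (const [needle, placeholder] of pairs) {
    if (needle === "") continue; // nothing to redact
    out = out.split(needle).join(placeholder);
  }
  return out;
}
```

Sorting by length matters in the common case where the working directory lives inside the home directory.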

Severity: confused | error | blocker | nit. Kind: friction | delight (D7) |
phase-marker | interrupted. Readers tolerate malformed lines (skip + warn).
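
A tolerant reader in the spirit of "skip + warn" might look like the following sketch. The severity and kind values are taken from the commit message; the `message` field, the decision to require `severity` on every entry, and the function names are assumptions, since the full schema is not shown here.

```typescript
const SEVERITIES = ["confused", "error", "blocker", "nit"] as const;
const KINDS = ["friction", "delight", "phase-marker", "interrupted"] as const;

interface FrictionEntry {
  kind: (typeof KINDS)[number];
  severity: (typeof SEVERITIES)[number];
  message: string; // hypothetical field, for illustration only
}

function isFrictionEntry(value: unknown): value is FrictionEntry {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  return (
    typeof r.message === "string" &&
    KINDS.some((k) => k === r.kind) &&
    SEVERITIES.some((s) => s === r.severity)
  );
}

// Parse JSONL text; malformed lines produce a warning instead of aborting.
function readFriction(jsonl: string): { entries: FrictionEntry[]; warnings: string[] } {
  const entries: FrictionEntry[] = [];
  const warnings: string[] = [];
  jsonl.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // blank lines are not an error
    try {
      const raw: unknown = JSON.parse(line);
      if (isFrictionEntry(raw)) entries.push(raw);
      else warnings.push(`line ${index + 1}: not a friction entry`);
    } catch {
      warnings.push(`line ${index + 1}: invalid JSON`);
    }
  });
  return { entries, warnings };
}
```

Skipping bad lines suits an append-only log: a run that is interrupted mid-write can leave a truncated last line, and the rest of the file should still render.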

40 unit tests; this is the channel the claw-test harness writes to and that
agents emit through during live-mode runs.
Two modes: scripted (CI gate, no agent) and --live (real agent subprocess).
Phases: setup → install_brain (gbrain init --pglite) → import (--no-embed) →
query → extract all --source fs → verify (gbrain doctor --json, asserts
status==='ok' and progress.jsonl phase coverage).

AgentRunner interface + registry — interface stays narrow (detect, invoke,
optional postInstallHook). v1 ships only OpenClawRunner; the registry pattern
lets v1.1 land hermes/codex as ~50-line additions without refactoring callers.
OpenClaw invocation: 'openclaw agent --local --agent <name> --message <brief>'
matching test/e2e/skills.test.ts (NOT --prompt-file, which doesn't exist).
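
The narrow contract and registry described above could be sketched as follows. Only the member names (detect, invoke, postInstallHook) and the openclaw argument order come from the commit message; the method signatures, the `name` property, the InvokeResult fields, and the RunnerRegistry class are hypothetical.

```typescript
interface InvokeResult {
  exitCode: number; // hypothetical field
}

interface AgentRunner {
  readonly name: string; // hypothetical: key used by the registry
  detect(): Promise<boolean>; // is the agent binary usable on this machine?
  invoke(brief: string): Promise<InvokeResult>;
  postInstallHook?(): Promise<void>; // optional, per the commit message
}

// argv for: openclaw agent --local --agent <name> --message <brief>
function openclawArgs(agentName: string, brief: string): string[] {
  return ["agent", "--local", "--agent", agentName, "--message", brief];
}

class RunnerRegistry {
  private readonly runners = new Map<string, AgentRunner>();

  register(runner: AgentRunner): void {
    this.runners.set(runner.name, runner);
  }

  // Callers look runners up by name, so adding a new agent is one register() call.
  get(name: string): AgentRunner {
    const runner = this.runners.get(name);
    if (runner === undefined) {
      const known = Array.from(this.runners.keys()).join(", ") || "none";
      throw new Error(`unknown agent "${name}" (registered: ${known})`);
    }
    return runner;
  }
}
```

Passing the brief as a separate argv element, rather than interpolating it into a shell string, avoids quoting problems when BRIEF.md contains quotes or newlines.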

transcript-capture: spawns child with piped stdio, async-drains via
fs.createWriteStream + 'drain' events so 256KB+ bursts don't stall the child
(D17 backpressure). Writes <run>/transcript.jsonl with schema_version + ts +
channel + byte_offset + bytes_b64. Friction entries' transcript_offset field
references byte offsets here so render --transcripts can resolve back.
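
The two ideas in this paragraph, base64-encoding raw chunks and waiting for 'drain', can be sketched like this. The record field names come from the commit message; the function names, the `schema_version` value of 1, and the reading of `byte_offset` as a running count of raw bytes captured before the chunk are assumptions.

```typescript
import { once } from "node:events";
import type { Writable } from "node:stream";

type Channel = "stdin" | "stdout" | "stderr";

interface TranscriptRecord {
  schema_version: number;
  ts: string;
  channel: Channel;
  byte_offset: number; // assumed: raw bytes captured before this chunk
  bytes_b64: string;
}

// Encode the raw bytes, never a decoded string, so a multi-byte UTF-8
// character split across two chunks is preserved exactly.
function makeRecord(channel: Channel, chunk: Uint8Array, byteOffset: number, now: Date = new Date()): TranscriptRecord {
  return {
    schema_version: 1,
    ts: now.toISOString(),
    channel,
    byte_offset: byteOffset,
    bytes_b64: Buffer.from(chunk).toString("base64"),
  };
}

// write() returns false once the stream's buffer passes its high-water mark;
// awaiting 'drain' applies backpressure instead of buffering without bound.
async function writeRecord(out: Writable, record: TranscriptRecord): Promise<void> {
  if (!out.write(JSON.stringify(record) + "\n")) {
    await once(out, "drain");
  }
}

// Reassemble the original text by concatenating bytes first, decoding last.
function decodeRecords(records: TranscriptRecord[]): string {
  const buffers = records.map((r) => Buffer.from(r.bytes_b64, "base64"));
  return Buffer.concat(buffers).toString("utf8");
}
```

Decoding only after concatenation is the design point: decoding each chunk separately would turn a split character into replacement characters.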

progress-tail: parses gbrain's --progress-json events out of child stderr.
Phase verification asserts each scenario.expected_phases entry (dotted names
like import.files, extract.links_fs, doctor.db_checks) saw at least one event
from the actual command — proves the COMMAND ran, not that the agent obeyed
prompts.
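
Phase-coverage checking of this kind can be sketched in two small functions. The event shape is an assumption: the sketch treats each event as one JSON object per stderr line with a string `phase` field, which is a hypothetical name; the dotted phase names in the example are the ones quoted above.

```typescript
interface ProgressEvent {
  phase: string; // hypothetical field name
}

// Pull JSON progress events out of mixed stderr output, ignoring other lines.
function parseProgressEvents(stderr: string): ProgressEvent[] {
  const events: ProgressEvent[] = [];
  for (const line of stderr.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue; // ordinary log output
    try {
      const raw: unknown = JSON.parse(trimmed);
      if (typeof raw === "object" && raw !== null) {
        const phase = (raw as { phase?: unknown }).phase;
        if (typeof phase === "string") events.push({ phase });
      }
    } catch {
      // A line that merely looks like JSON; skip it.
    }
  }
  return events;
}

// Expected phases that never produced an event. An empty result means full coverage.
function missingPhases(expected: string[], events: ProgressEvent[]): string[] {
  const seen = new Set(events.map((e) => e.phase));
  return expected.filter((phase) => !seen.has(phase));
}
```

Because the events are emitted by the command itself, an empty `missingPhases` result is evidence the command executed, independent of anything the agent wrote in its transcript.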

seed-pglite: ~50 LOC SQL replay primitive for the upgrade-from-v0.18 scenario.
Existing migration helpers (test/e2e/helpers.ts) are Postgres-only; PGLite has
no equivalent. seedPglite opens a fresh PGLite, executes each statement
individually (errors name the failing one), then disconnects so gbrain init
can take over and walk forward.
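
A simplified sketch of the split-and-replay idea follows; it is not the seedPglite implementation. The splitter only understands single-quoted strings and `--` line comments (no dollar-quoting or block comments), and the database call is injected as a hypothetical `exec` callback so the sketch does not depend on any particular PGLite API.

```typescript
// Split a SQL script on semicolons that are outside string literals and comments.
function splitSql(sql: string): string[] {
  const statements: string[] = [];
  let current = "";
  let inString = false;
  for (let i = 0; i < sql.length; i++) {
    const ch = sql.charAt(i);
    if (inString) {
      current += ch;
      if (ch === "'") {
        if (sql.charAt(i + 1) === "'") {
          current += "'"; // '' is an escaped quote, still inside the string
          i++;
        } else {
          inString = false;
        }
      }
    } else if (ch === "'") {
      inString = true;
      current += ch;
    } else if (ch === "-" && sql.charAt(i + 1) === "-") {
      // Drop the comment text; the loop increment then steps past the newline.
      while (i < sql.length && sql.charAt(i) !== "\n") i++;
      current += "\n";
    } else if (ch === ";") {
      if (current.trim() !== "") statements.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim() !== "") statements.push(current.trim());
  return statements;
}

// Run statements one at a time so a failure can name the exact statement.
async function replaySql(sql: string, exec: (statement: string) => Promise<void>): Promise<void> {
  let index = 0;
  for (const statement of splitSql(sql)) {
    index++;
    try {
      await exec(statement);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      throw new Error(`seed statement ${index} failed: ${statement}\n${reason}`);
    }
  }
}
```

Executing per statement costs a little speed compared with sending the whole dump at once, but a seed file that fails on statement 37 is much easier to debug when the error says so.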

53 unit tests covering registry selection, runner detection, multi-byte UTF-8
chunk-boundary safety, PIPE buffer drain, scenario load+validate, progress
event parsing, and SQL splitter.
Two scenarios ship in v1 — fresh-install and upgrade-from-v0.18. Each is a
self-contained directory: brain/ (markdown pages), BRIEF.md (live-mode prompt),
expected.json (scripted-mode assertions), scenario.json (kind, expected_phases,
optional from_version + seed paths). Schema is owned by src/core/claw-test/
scenarios.ts.
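
As a rough illustration of what validating scenario.json could involve, here is a sketch covering only the fields named above. It is not the schema in src/core/claw-test/scenarios.ts: the error messages, the rule that expected_phases must be non-empty, and the example `kind` value are assumptions, and the seed-path fields are omitted because their names are not given here.

```typescript
interface Scenario {
  kind: string;
  expected_phases: string[];
  from_version?: string;
}

// Validate parsed JSON and return a typed scenario, or throw with a specific reason.
function validateScenario(raw: unknown): Scenario {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("scenario.json must contain a JSON object");
  }
  const record = raw as Record<string, unknown>;

  const kind = record["kind"];
  if (typeof kind !== "string" || kind === "") {
    throw new Error("scenario.json: 'kind' must be a non-empty string");
  }

  const phases = record["expected_phases"];
  if (!Array.isArray(phases) || phases.length === 0) {
    throw new Error("scenario.json: 'expected_phases' must be a non-empty array");
  }
  const phaseNames: string[] = [];
  for (const phase of phases) {
    if (typeof phase !== "string") {
      throw new Error("scenario.json: every expected phase must be a string");
    }
    phaseNames.push(phase);
  }

  const fromVersion = record["from_version"];
  if (fromVersion !== undefined && typeof fromVersion !== "string") {
    throw new Error("scenario.json: 'from_version' must be a string when present");
  }

  const scenario: Scenario = { kind, expected_phases: phaseNames };
  if (fromVersion !== undefined) scenario.from_version = fromVersion;
  return scenario;
}
```

Failing at load time with a field-specific message keeps a malformed fixture from surfacing later as a confusing phase-coverage failure.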

upgrade-from-v0.18 ships scaffolded — seed/dump.sql is the v1.1 follow-up
(needs a real v0.18-shape PGLite dump; seed/README.md documents the gen
procedure). The harness gracefully no-ops the seed phase when dump.sql is
absent.

skills/_friction-protocol.md is a cross-cutting convention skill (like
_brain-filing-rules.md). Tells agents when to call gbrain friction log and how
to choose severity. Skills the claw-test exercises will gain a > Convention:
callout pointing here in a v1.1 sweep.

13 unit tests for the scenario loader + 'shipped scenarios load cleanly' for
both.
Wires both commands into src/cli.ts CLI_ONLY allow-list and adds dispatch
in handleCliOnly so neither command requires a brain engine connection.

CLAUDE.md gains entries for src/commands/{friction,claw-test}.ts +
src/core/claw-test/ + skills/_friction-protocol.md, and a Commands section
listing all 8 new gbrain claw-test ... and gbrain friction ... invocations
with the v0.23 marker. Documents the GBRAIN_HOME write-isolation contract
and the v1 caveat (read-side host-fingerprint detection deferred to v1.1).
llms.txt + llms-full.txt regenerated via 'bun run build:llms' so the
committed generator-output gate passes.

test/e2e/claw-test.test.ts is the scripted-mode E2E. Builds a tiny shim that
delegates to 'bun run src/cli.ts' (NOT bun --compile, which doesn't bundle
PGLite's runtime assets), points the harness at it via GBRAIN_BIN_OVERRIDE,
runs --scenario fresh-install end-to-end. Asserts exit 0, zero error/blocker
friction. Includes a deliberate-break test that proves the friction signal
fires when a phase command rejects.

test/claw-test-cli.test.ts covers shipped-scenario load + agent registry +
OpenClawRunner detection (relative-path / .. / missing-bin guards) + the
GBRAIN_FRICTION_RUN_ID env handoff between harness and friction CLI.

Closes the v0.23 claw-test E2E feature.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three CI fixes after PR #522 landed:

1. test/agent-runner.test.ts:89 — UnavailableRunner.invoke() returns
   Promise<void> by default but the AgentRunner contract requires
   Promise<InvokeResult>. Annotate the throw-only invoke explicitly so tsc
   sees the contract is satisfied (the throw makes the body unreachable as
   far as the return type is concerned).

2. test/seed-pglite.test.ts — bun:test signature is test(name, fn, timeoutMs:
   number), not test(name, opts: {timeout}, fn). The {timeout: 30_000} object
   form was a guess that tsc on bun 1.3.13 rejects. Move the 30s cap to the
   trailing positional number arg on each PGLite-using test.

3. test/transcript-capture.test.ts — `spawnWithCapture > timeout fires
   SIGTERM/SIGKILL` blew the 10s outer cap on the GitHub runner. Two fixes:
   (a) use `exec sleep` so the child we spawn IS sleep — SIGTERM goes
   directly to it, no `/bin/sh` fork-vs-exec process-group ambiguity that
   could orphan the sleep and force the SIGKILL grace path. (b) bump outer
   cap to 30s for headroom even when the runner is slow and SIGKILL after
   the 5s grace is what actually ends the child.
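
The first fix can be illustrated in isolation. TypeScript infers `Promise<void>` for an async class method whose body only throws, so the explicit annotation is what makes the class satisfy the interface. The interface shape and the InvokeResult field below are hypothetical stand-ins for the real contract.

```typescript
interface InvokeResult {
  exitCode: number; // hypothetical field
}

interface AgentRunner {
  detect(): Promise<boolean>;
  invoke(brief: string): Promise<InvokeResult>;
}

class UnavailableRunner implements AgentRunner {
  async detect(): Promise<boolean> {
    return false;
  }

  // Without ": Promise<InvokeResult>" tsc infers Promise<void> here and
  // reports that the class incorrectly implements AgentRunner. With it, the
  // body type-checks because the end of the function is unreachable.
  async invoke(_brief: string): Promise<InvokeResult> {
    throw new Error("runner unavailable");
  }
}
```

At runtime nothing changes: `invoke` still returns a rejected promise, and only the declared type differs.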
PR #506 claims v0.22.15, PR #521 claims v0.22.10, intermediate slots
(.11/.12/.13/.14) are claimed by other open PRs. v0.22.16 is the next
clean PATCH slot. v0.23.0 is claimed by PR #462 so MINOR isn't free.
This release fits the 0.22.x train; v0.23.0 lands when #462 ships.

Updates VERSION, package.json, CHANGELOG.md header, TODOS.md follow-up
labels. Code is unchanged.
@garrytan changed the title from "v0.24.0 feat: gbrain claw-test — end-to-end fresh-install friction harness" to "v0.22.16 feat: gbrain claw-test — end-to-end fresh-install friction harness" on Apr 30, 2026
…-e2e

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
…-e2e

# Conflicts:
#	CHANGELOG.md
#	CLAUDE.md
#	TODOS.md
#	VERSION
#	llms-full.txt
#	package.json
#	src/cli.ts
@garrytan garrytan merged commit 83e55ff into master Apr 30, 2026
7 checks passed