v0.22.16 feat: gbrain claw-test — end-to-end fresh-install friction harness#522
Merged
…e env override configDir() in src/core/config.ts already implemented $GBRAIN_HOME as a parent-dir override (returns <override>/.gbrain), but ~12 consumers built paths from os.homedir() directly and bypassed it. Critically, loadConfig/saveConfig themselves used a private getConfigDir() that ignored the env. Fixed. Migrated every write site to gbrainPath() — fail-improve, validator-lint, cycle lock, shell-audit, backpressure-audit, sync-failures, integrity logs, integrations heartbeat, init pglite path, migrate-engine manifest, import checkpoint, v0_13_1 rollback, v0_14_0 host-work. Read-side host-detection in init.ts (~/.claude / ~/.openclaw probes) intentionally NOT migrated; that's a v1.1 follow-up under a separate $GBRAIN_HOST_HOME override. Adds gbrainPath(...segments) sugar plus path validation: $GBRAIN_HOME must be absolute and contain no '..' segments (throws GbrainHomeInvalidError). test/gbrain-home-isolation.test.ts proves write-isolation across all migrated sites. test/migrations-v0_14_0.test.ts updated to use $GBRAIN_HOME instead of the old HOME-swap pattern. Closes part of the claw-test E2E harness preconditions (D13 + D21).
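The override rules described in this commit can be sketched as follows. This is a minimal illustration, not the code in src/core/config.ts: the function signatures (passing `env` and `homeDir` explicitly rather than reading `process.env` and `os.homedir()`), the `~/.gbrain` default, and the error message wording are assumptions made so the sketch is easy to test.

```typescript
import * as path from "node:path";

// Hypothetical stand-in for the error class named in the commit message.
class GbrainHomeInvalidError extends Error {
  constructor(value: string, reason: string) {
    super(`Invalid GBRAIN_HOME "${value}": ${reason}`);
    this.name = "GbrainHomeInvalidError";
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

type Env = Record<string, string | undefined>;

// Resolve the config directory. $GBRAIN_HOME is a parent-dir override,
// so the result is <override>/.gbrain. The fallback to <homeDir>/.gbrain
// is an assumption about the default.
function configDir(env: Env, homeDir: string): string {
  const override = env.GBRAIN_HOME;
  if (override === undefined || override === "") {
    return path.join(homeDir, ".gbrain");
  }
  if (!path.isAbsolute(override)) {
    throw new GbrainHomeInvalidError(override, "must be an absolute path");
  }
  // Check the raw segments before any normalization collapses them.
  if (override.split(/[\\/]+/).includes("..")) {
    throw new GbrainHomeInvalidError(override, "must not contain '..' segments");
  }
  return path.join(override, ".gbrain");
}

// Sugar: every write site builds its path from here, never from the
// home directory directly, so the override cannot be bypassed.
function gbrainPath(env: Env, homeDir: string, ...segments: string[]): string {
  return path.join(configDir(env, homeDir), ...segments);
}
```

Routing every write through one helper is what makes a single isolation test meaningful: if a consumer builds a path any other way, it lands outside the temporary directory and the test can detect it.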
…rter Append-only JSONL writer at $GBRAIN_HOME/friction/<run-id>.jsonl. Schema is a flat extension of StructuredAgentError (D20), one envelope shape across both agent-emitted entries and harness-wrapped command failures. Run-id resolves from --run-id > $GBRAIN_FRICTION_RUN_ID > 'standalone'. Subcommands stay ≤30 LOC each; core lives in src/core/friction.ts (writer + reader + renderer + redactor). render --redact (default for md output) strips $HOME / $CWD to placeholders so reports paste safely in PRs/issues. Severity: confused | error | blocker | nit. Kind: friction | delight (D7) | phase-marker | interrupted. Readers tolerate malformed lines (skip + warn). 40 unit tests; this is the channel the claw-test harness writes to and that agents emit through during live-mode runs.
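Three behaviors from this commit (run-id precedence, a reader that skips malformed lines, and path redaction) can be sketched like this. The function names, the `<HOME>` / `<CWD>` placeholder strings, and the return shapes are assumptions for illustration; the real code lives in src/core/friction.ts and may differ.

```typescript
type Env = Record<string, string | undefined>;

// Precedence from the commit message: --run-id > $GBRAIN_FRICTION_RUN_ID > 'standalone'.
function resolveRunId(flag: string | undefined, env: Env): string {
  return flag || env.GBRAIN_FRICTION_RUN_ID || "standalone";
}

interface ReadResult {
  entries: Array<Record<string, unknown>>;
  skipped: number; // malformed lines that were tolerated
}

// Tolerant JSONL reader: a bad line is counted and skipped, never fatal.
function readFrictionLines(text: string): ReadResult {
  const entries: Array<Record<string, unknown>> = [];
  let skipped = 0;
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        entries.push(parsed as Record<string, unknown>);
      } else {
        skipped++;
      }
    } catch {
      skipped++;
    }
  }
  return { entries, skipped };
}

// Replace absolute paths with placeholders (placeholder names are assumed).
// The longer path is replaced first so a cwd nested inside home is not
// half-rewritten by the home replacement.
function redact(text: string, home: string, cwd: string): string {
  const pairs: Array<[string, string]> = [[cwd, "<CWD>"], [home, "<HOME>"]];
  pairs.sort((a, b) => b[0].length - a[0].length);
  let out = text;
  for (const [needle, placeholder] of pairs) {
    if (needle.length > 0) out = out.split(needle).join(placeholder);
  }
  return out;
}
```

Counting skipped lines rather than throwing fits an append-only log that several processes write to: one truncated line from an interrupted run should not hide every other entry.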
Two modes: scripted (CI gate, no agent) and --live (real agent subprocess). Phases: setup → install_brain (gbrain init --pglite) → import (--no-embed) → query → extract all --source fs → verify (gbrain doctor --json, asserts status==='ok' and progress.jsonl phase coverage).

AgentRunner interface + registry: the interface stays narrow (detect, invoke, optional postInstallHook). v1 ships only OpenClawRunner; the registry pattern lets v1.1 land hermes/codex as ~50-line additions without refactoring callers. OpenClaw invocation: 'openclaw agent --local --agent <name> --message <brief>' matching test/e2e/skills.test.ts (NOT --prompt-file, which doesn't exist).

transcript-capture: spawns child with piped stdio, async-drains via fs.createWriteStream + 'drain' events so 256KB+ bursts don't stall the child (D17 backpressure). Writes <run>/transcript.jsonl with schema_version + ts + channel + byte_offset + bytes_b64. Friction entries' transcript_offset field references byte offsets here so render --transcripts can resolve back.

progress-tail: parses gbrain's --progress-json events out of child stderr. Phase verification asserts each scenario.expected_phases entry (dotted names like import.files, extract.links_fs, doctor.db_checks) saw at least one event from the actual command, which proves the COMMAND ran, not that the agent obeyed prompts.

seed-pglite: ~50 LOC SQL replay primitive for the upgrade-from-v0.18 scenario. Existing migration helpers (test/e2e/helpers.ts) are Postgres-only; PGLite has no equivalent. seedPglite opens a fresh PGLite, executes each statement individually (errors name the failing one), then disconnects so gbrain init can take over and walk forward.

53 unit tests covering registry selection, runner detection, multi-byte UTF-8 chunk-boundary safety, PIPE buffer drain, scenario load+validate, progress event parsing, and SQL splitter.
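The phase-coverage check described above can be sketched as a pure function over captured stderr. The event shape is an assumption: this sketch treats each progress event as a one-line JSON object with a string `phase` field, which the commit message does not spell out, and the function names are hypothetical.

```typescript
// Assumed event shape: one JSON object per stderr line with a `phase` string.
interface ProgressEvent {
  phase: string;
}

// Pull progress events out of mixed stderr output. Lines that are not
// JSON objects (plain log text, truncated writes) are ignored.
function parseProgressEvents(stderr: string): ProgressEvent[] {
  const events: ProgressEvent[] = [];
  for (const line of stderr.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue;
    try {
      const obj = JSON.parse(trimmed) as { phase?: unknown };
      if (typeof obj.phase === "string") events.push({ phase: obj.phase });
    } catch {
      // Not valid JSON: treat as ordinary log output and move on.
    }
  }
  return events;
}

// Return every expected phase that produced no event at all.
// An empty result means the scenario's phase coverage is satisfied.
function missingPhases(expected: string[], events: ProgressEvent[]): string[] {
  const seen = new Set(events.map((e) => e.phase));
  return expected.filter((phase) => !seen.has(phase));
}
```

Because the events come from the command's own stderr rather than from anything the agent reports, a non-empty result points at a command that never ran, which is the distinction the commit message draws.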
Two scenarios ship in v1: fresh-install and upgrade-from-v0.18. Each is a self-contained directory: brain/ (markdown pages), BRIEF.md (live-mode prompt), expected.json (scripted-mode assertions), scenario.json (kind, expected_phases, optional from_version + seed paths). Schema is owned by src/core/claw-test/scenarios.ts.

upgrade-from-v0.18 ships scaffolded; seed/dump.sql is the v1.1 follow-up (needs a real v0.18-shape PGLite dump; seed/README.md documents the gen procedure). The harness gracefully no-ops the seed phase when dump.sql is absent.

skills/_friction-protocol.md is a cross-cutting convention skill (like _brain-filing-rules.md). It tells agents when to call gbrain friction log and how to choose severity. Skills the claw-test exercises will gain a > Convention: callout pointing here in a v1.1 sweep.

13 unit tests for the scenario loader + 'shipped scenarios load cleanly' for both.
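A loader for scenario.json might validate the fields listed above along these lines. This is a sketch under assumptions: only `kind`, `expected_phases`, and `from_version` are taken from the commit message, the seed-path fields are left out because their names are not given, and the error messages and `validateScenario` name are hypothetical.

```typescript
// Fields named in the commit message; seed-path fields are omitted here
// because the source does not name them.
interface Scenario {
  kind: string;
  expected_phases: string[];
  from_version?: string;
}

// Validate an already-parsed scenario.json value, failing loudly with a
// message that names the offending field.
function validateScenario(raw: unknown): Scenario {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("scenario.json must contain a JSON object");
  }
  const obj = raw as Record<string, unknown>;

  const kind = obj["kind"];
  if (typeof kind !== "string" || kind === "") {
    throw new Error("scenario.json: 'kind' must be a non-empty string");
  }

  const phases = obj["expected_phases"];
  if (!Array.isArray(phases) || !phases.every((p) => typeof p === "string")) {
    throw new Error("scenario.json: 'expected_phases' must be an array of strings");
  }

  const fromVersion = obj["from_version"];
  if (fromVersion !== undefined && typeof fromVersion !== "string") {
    throw new Error("scenario.json: 'from_version' must be a string when present");
  }

  const scenario: Scenario = { kind, expected_phases: phases as string[] };
  if (typeof fromVersion === "string") scenario.from_version = fromVersion;
  return scenario;
}
```

Validating at load time means a malformed fixture fails before any phase runs, so a broken scenario is not mistaken for friction in the product.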
Wires both commands into src/cli.ts CLI_ONLY allow-list and adds dispatch
in handleCliOnly so neither command requires a brain engine connection.
CLAUDE.md gains entries for src/commands/{friction,claw-test}.ts +
src/core/claw-test/ + skills/_friction-protocol.md, and a Commands section
listing all 8 new gbrain claw-test ... and gbrain friction ... invocations
with the v0.23 marker. Documents the GBRAIN_HOME write-isolation contract
and the v1 caveat (read-side host-fingerprint detection deferred to v1.1).
llms.txt + llms-full.txt regenerated via 'bun run build:llms' so the
committed generator-output gate passes.
test/e2e/claw-test.test.ts is the scripted-mode E2E. Builds a tiny shim that
delegates to 'bun run src/cli.ts' (NOT bun --compile, which doesn't bundle
PGLite's runtime assets), points the harness at it via GBRAIN_BIN_OVERRIDE,
runs --scenario fresh-install end-to-end. Asserts exit 0, zero error/blocker
friction. Includes a deliberate-break test that proves the friction signal
fires when a phase command rejects.
test/claw-test-cli.test.ts covers shipped-scenario load + agent registry +
OpenClawRunner detection (relative-path / .. / missing-bin guards) + the
GBRAIN_FRICTION_RUN_ID env handoff between harness and friction CLI.
Closes the v0.23 claw-test E2E feature.
…-e2e # Conflicts: # .gitignore # src/cli.ts
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three CI fixes after PR #522 landed:
1. test/agent-runner.test.ts:89 - UnavailableRunner.invoke() returns Promise<void> by default but the AgentRunner contract requires Promise<InvokeResult>. Annotate the throw-only invoke explicitly so tsc sees the contract is satisfied (the throw makes the body unreachable as far as the return type is concerned).
2. test/seed-pglite.test.ts - bun:test signature is test(name, fn, timeoutMs: number), not test(name, opts: {timeout}, fn). The {timeout: 30_000} object form was a guess that tsc on bun 1.3.13 rejects. Move the 30s cap to the trailing positional number arg on each PGLite-using test.
3. test/transcript-capture.test.ts - `spawnWithCapture > timeout fires SIGTERM/SIGKILL` blew the 10s outer cap on the GitHub runner. Two fixes:
   (a) use `exec sleep` so the child we spawn IS sleep; SIGTERM goes directly to it, with no `/bin/sh` fork-vs-exec process-group ambiguity that could orphan the sleep and force the SIGKILL grace path.
   (b) bump outer cap to 30s for headroom even when the runner is slow and SIGKILL after the 5s grace is what actually ends the child.
PR #506 claims v0.22.15, PR #521 claims v0.22.10, intermediate slots (.11/.12/.13/.14) are claimed by other open PRs. v0.22.16 is the next clean PATCH slot. v0.23.0 is claimed by PR #462 so MINOR isn't free. This release fits the 0.22.x train; v0.23.0 lands when #462 ships. Updates VERSION, package.json, CHANGELOG.md header, TODOS.md follow-up labels. Code is unchanged.
…-e2e # Conflicts: # CHANGELOG.md # VERSION # package.json
…-e2e # Conflicts: # CHANGELOG.md # CLAUDE.md # TODOS.md # VERSION # llms-full.txt # package.json # src/cli.ts
Summary
End-to-end claw-test friction harness so every release gets a fresh-install dry-run before users do.
New surface (5 atomic commits + merge + version bump):
- `gbrain claw-test`: two modes. Scripted (~30s, no API keys, CI gate) walks the canonical first-day flow against a fresh tempdir and asserts every expected `--progress-json` phase fired + doctor's `status === 'ok'`. Live (`--live --agent openclaw`, ~5–10 min, ~$1–2 in tokens) spawns a real openclaw subprocess, hands it `BRIEF.md`, captures stdin/stdout/stderr to `transcript.jsonl`, and lets the agent log friction whenever something is confusing or wrong.
- `gbrain friction {log,render,list,summary}`: append-only JSONL writer at `$GBRAIN_HOME/friction/<run-id>.jsonl`. Schema is a flat extension of `StructuredAgentError`. Run-id resolves from `--run-id` > `$GBRAIN_FRICTION_RUN_ID` > `standalone`. `render --redact` (default for md) strips `$HOME` / `$CWD` to placeholders so reports paste safely in PRs.
- Hermeticity migration: `configDir()` always supported `$GBRAIN_HOME` as a parent-dir override, but ~12 consumers built paths from `os.homedir()` directly and bypassed it. Critically, `loadConfig` / `saveConfig` themselves used a private helper that ignored the env. Migrated every write site to a new `gbrainPath()` helper. `test/gbrain-home-isolation.test.ts` is the regression gate.
- AgentRunner interface + OpenClaw runner: narrow contract (`detect`, `invoke`, optional `postInstallHook`). Invocation pattern: `openclaw agent --local --agent <name> --message <brief>` matching `test/e2e/skills.test.ts`. Hermes deferred to v1.1 (TODO).
- transcript-capture: async-drains via `fs.createWriteStream` + `'drain'` events so 256KB+ bursts don't stall the child. Friction entries' `transcript_offset` field references byte offsets into `transcript.jsonl` so `render --transcripts` resolves back to readable agent reasoning.
- seed-pglite: ~50 LOC SQL replay primitive for the upgrade-from-v0.18 scenario. Existing migration helpers (`test/e2e/helpers.ts`) are Postgres-only.
- Two scenarios in `test/fixtures/claw-test-scenarios/`: `fresh-install` (canonical 5-min flow) and `upgrade-from-v0.18` (scaffolded; real v0.18 SQL dump documented as a v1.1 follow-up).
- `skills/_friction-protocol.md`: cross-cutting convention skill telling agents when to call `gbrain friction log`.

Test Coverage
113 new unit tests + 3 E2E tests. Direct verification on every modified or new file:
$GBRAIN_HOME)

E2E (full run with DATABASE_URL set):
- `bun run test:e2e` → 241 / 241 pass across 28 files
- `test/e2e/skills.test.ts` (real openclaw + API keys) → 3 / 3 pass (3m36s)
- `test/e2e/claw-test.test.ts` (this PR's E2E) → 3 / 3 pass

Plan Completion
5-commit plan from `~/.claude/plans/system-instruction-you-are-working-noble-biscuit.md`:
23 plan-stage decisions (D1–D23 across CEO + Eng review) all addressed. CEO + Eng reviews CLEAR.
TODOS
8 v1.1 follow-up TODOs added: hermes runner, friction analytics suite (diff/trend/migration-stub), 2 more scenarios, real v0.18 SQL dump, public scoreboard, PTY-mode capture, `$GBRAIN_HOST_HOME` for read-side isolation, routing-callout sweep.

Test plan
- bun run test:e2e) passes: 241/241
- `$GBRAIN_HOME=<tmp>` `gbrain friction log/list/render` round-trip works
- src/cli.ts, resolved as union of auth + friction + claw-test)

🤖 Generated with Claude Code