
v0.22.8.1 fix(test): isolate HOME in run-e2e.sh to stop config corruption #513

Closed
orendi84 wants to merge 45 commits into garrytan:master from orendi84:garrytan/e2e-home-isolation

Conversation


@orendi84 orendi84 commented Apr 29, 2026

Contributor

Summary

Closes a recurring incident where bun run test:e2e overwrote the user's real ~/.gbrain/config.json to point at the docker test container, then wedged the live autopilot when the container tore down. Three operators hit this in 11 days; the most recent occurrence was this morning.

The fix is in scripts/run-e2e.sh: export both HOME and GBRAIN_HOME to a mktemp -d tmpdir before bun starts, so config writes during the suite land in the tmpdir instead of ~/.gbrain/config.json. After the run, the wrapper md5-compares the real user config to its pre-run snapshot, covering three breach modes (modified, deleted, or created from nothing), and exits 2 with a loud "HOME isolation breach detected" banner if anything escaped the override. Exit 2 is distinct from exit 1 (test failure), so CI logs make the root cause obvious.
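
The three breach modes and the exit-code split can be sketched as follows. This is only an illustration in TypeScript: the real wrapper is a shell script, and every name here (Snapshot, snapshotConfig, classifyBreach, exitCodeFor) is hypothetical. Passing the test status through when there is no breach is also an assumption.

```typescript
import { createHash } from "node:crypto";
import { existsSync, readFileSync } from "node:fs";

// Hypothetical names throughout; the real wrapper is a shell script.
type Snapshot = { existed: boolean; md5: string | null };
type Breach = "none" | "modified" | "deleted" | "created";

// Record whether the config exists and, if so, its md5 digest.
function snapshotConfig(path: string): Snapshot {
  if (!existsSync(path)) return { existed: false, md5: null };
  const md5 = createHash("md5").update(readFileSync(path)).digest("hex");
  return { existed: true, md5 };
}

// Compare pre-run and post-run snapshots and name the breach mode, if any.
function classifyBreach(before: Snapshot, after: Snapshot): Breach {
  if (before.existed && !after.existed) return "deleted";
  if (!before.existed && after.existed) return "created";
  if (before.existed && before.md5 !== after.md5) return "modified";
  return "none";
}

// Assumption: exit 2 on any breach, otherwise pass the test status through.
function exitCodeFor(testStatus: number, breach: Breach): number {
  return breach === "none" ? testStatus : 2;
}
```

Tracking `existed` separately from the digest is what lets the "created from nothing" case be detected; comparing digests alone would have nothing to compare against.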

Bundled into the PR: a bun run build:llms regen of llms-full.txt to fix test/build-llms.test.ts drift left by the v0.22.8.0 docs commit.

3 commits, bisectable:

  • fix(docs): regenerate llms-full.txt to match generator — clears the pre-existing test failure
  • fix(test): isolate HOME in run-e2e.sh to stop config corruption — the actual fix
  • chore: bump version and changelog (v0.22.8.1)

Test Coverage

The wrapper itself acts as the regression test: every E2E run now md5-compares the user config and fails loud on isolation breach. No unit test added — the wrapper IS the test.

Verified end-to-end:

  • bash -n scripts/run-e2e.sh: clean
  • bun run test:e2e against pgvector/pgvector:pg16 on port 5434: 27 files, 245 tests, 0 failures
  • User config md5 byte-identical before AND after run (bd38fb4bd78f86b0b5092bbf0876d023 both times)
  • Re-verified with codex P1 fixes folded in: same result on second run

Pre-Landing Review

Codex adversarial review caught and surfaced 3 P1 issues in the first wrapper draft, all addressed inline before this PR:

  1. mktemp -d -t gbrain-e2e fails on GNU mktemp (Linux CI) when template lacks XXXXXX — now mktemp -d "${TMPDIR:-/tmp}/gbrain-e2e.XXXXXX"
  2. Verification skipped if no pre-run config existed, allowing a test to create one outside isolation undetected — now tracked via USER_CONFIG_EXISTED flag and breach-on-create covered as a third mode
  3. set -e could abort on md5_of failure before the breach detector ran — md5_of body now wrapped in { ... } || true

P2 RISKY findings noted but not blocking:

  • HOME unset under set -u (REAL_HOME="${HOME:-/tmp}" added)
  • breach detector scoped to config.json only, by design
  • race with parallel running gbrain processes (acceptable: autopilot does not normally write config)
  • no INT/TERM forwarding around bun test (cleanup trap still fires)

Plan Completion

Closes Step 8 of the embed-worker fix closeout plan (~/Desktop/gbrain-embed-worker-fix-closeout-plan-2026-04-29.md). The other open steps from that plan (7b metric verification, 10 memory entries, 11 archive predecessors) are independent of this PR.

Documentation

/document-release will run as a follow-up commit on this branch after PR creation. The most likely surface for a doc update is CLAUDE.md's E2E test DB lifecycle section to mention the HOME isolation contract.

Test plan

  • bash -n scripts/run-e2e.sh clean
  • bun test (full unit suite): 2949 pass, 0 fail (after bun run build:llms regen fixed pre-existing build-llms drift)
  • bun run test:e2e end-to-end against docker pgvector: 27 files, 245 tests, 0 failures
  • User config md5 byte-identical before and after E2E run (twice)
  • Codex adversarial review folded in (3 P1 fixes)
  • CI runs both suites cleanly on Linux

🤖 Generated with Claude Code



orendi84 and others added 30 commits April 11, 2026 13:09
Drop yc/civic page types, add mandate (replaces deal) and knowledge
(for books/transcripts/articles). Rename test fixtures accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends listPages and the list_pages operation to accept two new
array params:
- tags_all: every listed tag must be present on the page (AND)
- tags_any: at least one listed tag must be present (OR)

Both compose with each other and with the existing type filter. The
legacy single-tag `tag` param is normalized into tags_all internally,
so all tag filtering now flows through one GROUP BY + HAVING path
using COUNT(DISTINCT tag) FILTER for exact match semantics.

Fast path unchanged: queries with no tag filter still hit pages
directly with no join.

CLI accepts comma-separated or repeated flags for array params.

Motivation: Step 4 of the gbrain personal CRM plan uses namespaced
flat tags (`firm_type:saas`, `geo_country:SG`, `theme:leadership`).
Useful segmentation requires AND-of-tags queries like "saas directors
in product function in Singapore" - single-tag filter can't express
that.

7 new E2E tests cover: tags_all AND with matching/nonexistent tags,
tags_any OR, combined tags_all + tags_any, type + tags_all composition,
legacy tag param compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
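
The AND/OR semantics described in the commit above can be modeled in memory as below. This is a sketch, not the engine code: the real filter runs in SQL via GROUP BY + HAVING, the Page and TagFilter shapes are hypothetical, the `geo_country:HU` tag is made up for the example, and treating an empty tags_any as "no constraint" is an assumption.

```typescript
// Hypothetical in-memory model; the real engines do this in SQL.
type Page = { slug: string; type: string; tags: string[] };
type TagFilter = { type?: string; tag?: string; tags_all?: string[]; tags_any?: string[] };

function matches(page: Page, filter: TagFilter): boolean {
  // The legacy single `tag` is folded into tags_all, as the commit describes.
  const all = [...(filter.tags_all ?? []), ...(filter.tag ? [filter.tag] : [])];
  const any = filter.tags_any ?? [];
  const tags = new Set(page.tags);
  if (filter.type !== undefined && page.type !== filter.type) return false;
  if (!all.every((t) => tags.has(t))) return false; // AND: every tag present
  if (any.length > 0 && !any.some((t) => tags.has(t))) return false; // OR: at least one
  return true;
}

function listPages(pages: Page[], filter: TagFilter): Page[] {
  return pages.filter((p) => matches(p, filter));
}

const pages: Page[] = [
  { slug: "a", type: "person", tags: ["firm_type:saas", "geo_country:SG"] },
  { slug: "b", type: "person", tags: ["firm_type:saas", "geo_country:HU"] },
  { slug: "c", type: "company", tags: ["theme:leadership"] },
];

// AND-of-tags: only page "a" carries both tags.
const saasInSg = listPages(pages, { tags_all: ["firm_type:saas", "geo_country:SG"] });
```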
Add tags_all/tags_any filtering to PGLite engine for parity with
Postgres engine. Uses the same GROUP BY + HAVING approach.

Include Step 4c/4f classification scripts used for taxonomy tagging
of all 12,913 pages (rule-based + LLM via Haiku 4.5).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Commits two LinkedIn outreach helper scripts used for personal CRM work
and ignores the .context/ local dev scratch dir before merging upstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upstream brings the knowledge graph layer (v0.10.3: auto-link, typed
relationships, graph-query) and Minions v7 / v0.11.1 canonical migration
and skillify skill.

Two real conflicts, both in listPages. Gary's branch had added
tags_all/tags_any GROUP BY filtering. Upstream independently added
updated_after. Resolved by keeping the tags_all/tags_any logic in both
engines and threading updated_after through the fast path and the GROUP
BY path.

.gitignore conflict resolved by keeping both .context/ and eval/reports/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase C (backfill_links) and D (backfill_timeline) both shelled out with a
10-minute timeout. On a 13k-page brain, each extraction takes ~11 minutes,
so both phases returned 'failed' via execSync timeout and the orchestrator
marked the migration 'partial'. Schema + verify phases were unaffected.

Bump both timeouts to 30 minutes (1_800_000 ms). At the observed rate of
~50 sec/1k pages this covers brains up to ~35k pages; larger brains will
need a proportional extension.

Verified by re-running on a 13002-page brain: both phases complete cleanly
in ~11 min each, migration marks 'complete'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
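
The timeout sizing in the commit above is simple arithmetic, worked through here as a sketch. The function names are hypothetical; the only inputs are the figures quoted in the message (about 50 seconds per 1,000 pages, 10-minute and 30-minute timeouts).

```typescript
// Observed rate from the commit message: about 50 seconds per 1,000 pages.
const SECONDS_PER_1K_PAGES = 50;

// Largest brain (in pages) that fits in a timeout at the observed rate.
function maxPagesFor(timeoutMs: number): number {
  return Math.floor((timeoutMs / 1000 / SECONDS_PER_1K_PAGES) * 1000);
}

// Estimated wall-clock minutes for one extraction phase.
function estimatedMinutes(pages: number): number {
  return ((pages / 1000) * SECONDS_PER_1K_PAGES) / 60;
}

const oldLimit = maxPagesFor(600_000); // 10-minute timeout: 12,000 pages
const newLimit = maxPagesFor(1_800_000); // 30-minute timeout: 36,000 pages
const thirteenK = estimatedMinutes(13_000); // about 10.8 minutes
```

At this rate the old 10-minute limit tops out at 12,000 pages, which is why a 13k-page brain timed out, and the new limit works out to 36,000 pages, in line with the "~35k" quoted above.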
Kept customized PageType union in src/core/types.ts
(mandate, knowledge; dropped upstream's deal/yc/civic/writing/analysis/guide/hardware/architecture).
Auto-merged cleanly - upstream v0.13 did not modify the PageType union.
v0.13 brings: frontmatter graph indexing, Knowledge Runtime, shell Minions.

One-shot migration scripts used to clean up after the Phase 4 company
enrichment run:
- tier-a-tagging-backfill: normalize Step 4A legacy rows (add
  enrichment_source + enrichment_verified tags)
- industry-normalize: collapse 407 distinct industry values into 21
  canonical buckets; preserves original in industry_original
- stub-dedup-report: report-only near-duplicate finder (exact-norm after
  legal-suffix strip, domain-match, fuzzy edit-distance)
- stub-merge-tier1: exact-name dedup merges (5 pairs)
- stub-merge-tier2: domain-match clerical-delta merges (13 pairs) with
  explicit keep-separate documentation for parent/subsidiary cases
- stub-merge-erste: Erste Hungary pair merge (manual confirmation)

Each merge script is transactional per group and writes a rollback JSONL
to ~/.gbrain/migrations/ with full dupe-page snapshot + edge inventory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Handoff log covering the Phase 4 LinkedIn+company enrichment run and the
2026-04-20 afternoon cleanup session (Tier A tagging, industry normalization,
stub dedup). References the committed cleanup scripts at scripts/cleanup-*.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ier B hub re-verify

The 2026-04-20 cleanup scripts (stub-merge-tier1/tier2/erste) used
`${JSON.stringify(x)}::jsonb` in postgres.js template literals, which silently
stores the value as a JSON string rather than a JSON object. All `->>` accessors
on those rows returned null afterwards, breaking downstream enrichment queries.

Fixes:
- All 5 cleanup scripts migrated to `sql.json(x)` (the canonical postgres.js
  helper — also what the existing CI guard error message recommends).
- CI guard scripts/check-jsonb-pattern.sh extended to scan scripts/ in addition
  to src/ (previously only src/ was checked, which is how this slipped through).
- Data repair script scripts/cleanup-fix-jsonb-string-corruption.ts unwraps the
  54 affected rows via `(frontmatter #>> '{}')::jsonb` with rollback JSONL.

New work:
- scripts/cleanup-tier-b-hub-reverify.ts: Haiku 4.5 + web search re-verification
  of Tier B non-noise hubs (inbound edges >= 3, 35 rows). Budget and wall-clock
  caps, per-row persistence, sets enrichment_source=haiku_search +
  enrichment_verified=true. Today's run completed 35/35 in 31s at $0.99.

Docs:
- docs/gbrain-work-log-2026-04-20.md updated with the hub re-verification
  addendum and the jsonb bug post-mortem.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
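
The corruption described in the commit above, a JSON string stored where a JSON object was expected, can be reproduced without a database. The sketch below models only the described effect, not postgres.js internals; getField and unwrap are hypothetical stand-ins for the `->>` accessor and the `#>> '{}'` repair, and the sample value is illustrative.

```typescript
// Illustrative value; the field name comes from the commit message.
const frontmatter = { enrichment_source: "haiku_search" };

// Correct shape: the column holds a JSON object.
const storedAsObject: unknown = JSON.parse(JSON.stringify(frontmatter));

// Buggy shape: the already-serialized text is encoded a second time, so the
// column holds a JSON string whose content merely looks like an object.
const storedAsString: unknown = JSON.parse(JSON.stringify(JSON.stringify(frontmatter)));

// Rough analogue of `->>`: key lookup only works on objects.
function getField(value: unknown, key: string): string | null {
  if (typeof value !== "object" || value === null) return null;
  const field = (value as Record<string, unknown>)[key];
  return typeof field === "string" ? field : null;
}

// Rough analogue of the repair: take the string's text and parse it once more.
function unwrap(value: unknown): unknown {
  return typeof value === "string" ? JSON.parse(value) : value;
}
```

Because the double-encoded value is still valid JSON, nothing fails at write time; the problem only shows up when a reader asks for a key and gets null.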
…ation

12 rows swept via Haiku 4.5 + web search at concurrency 5. Completed in 13.3s
at $0.3213. Zero remaining unverified non-noise Tier B rows with inbound edges
>= 2.

Four of the 12 rows were exposed only after the JSONB corruption repair: their
enrichment_source field was buried inside the corrupted jsonb string and the
filter couldn't see them. Fixing the corruption surfaced them correctly for
this sweep.

Docs: gbrain-work-log-2026-04-20.md addendum covers the sweep + the corruption
side-effect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Checks process.execPath before process.argv[1] so compiled Bun SFX
binaries don't try to spawn the virtual /$bunfs/root/gbrain path
that only exists inside the running process. Upstream garrytan/gbrain
PR garrytan#301 (commit fcf40a1, v0.15.4) landed a canonical fix for the same
bug - revert this commit after merging upstream/master.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…am-stage

# Conflicts:
#	scripts/check-jsonb-pattern.sh
#	src/commands/autopilot.ts
#	src/commands/migrations/v0_12_0.ts
#	src/core/types.ts
The jsonb-pattern guard was grep'ing src/ and scripts/ (Gary's broadening
in a8bc7b9) and hitting false positives on the literal token sequence in:
  - scripts/check-jsonb-pattern.sh lines 2 and 20 (self-documentation)
  - scripts/cleanup-fix-jsonb-string-corruption.ts line 3 (describes the bug)

Rewrote those comments as natural-language paraphrases so the forbidden
token sequence only appears in the PATTERN variable itself. Guard stays
strict (catches the pattern in any real code, including commented-out
statements) while not blocking on prose that describes what it catches.

Pre-existing bug, unmasked by this session's clean tree after the upstream
merge. Previously hidden because the daily cron kept failing earlier (dirty
tree), so tests never ran.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Base: Gary's 8-type union (person, company, mandate, project, concept,
source, media, knowledge) was insufficient because upstream tests +
enrichment/completeness.ts reference 'deal', 'meeting', 'note', and the
local markdown.ts classifier returns 'writing', 'analysis', 'guide',
'hardware', 'architecture'. Keeping the 8-type union would fail tsc.

Resolution per runbook D1-A: expand to 18 types = upstream's 16 + the
fintech-specific 'mandate' and 'knowledge'. Preserves branch intent
(keep mandate/knowledge) while satisfying the TS gate.

No DB migration needed: page_type column has no CHECK constraint, and
the expansion is additive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bun and Node cache os.homedir() on first call and ignore later
process.env.HOME mutations. Tests that set process.env.HOME = tmpHome
in beforeEach had no effect: loadConfig() kept reading the real
~/.gbrain/config.json, and createEngine().connect() hit production
Postgres during unit test runs.

Symptom: test/migrations-v0_14_0.test.ts "Phase A skipped (no config)"
and "Phase B host-work dedup" timed out at 5000ms (15000ms when
invoked with bun test --timeout 15000 in the daily merge script)
because orchestrator() awaited a Postgres connection that the test
never expected to happen.

Fix: mirror the resolveHome() pattern already used in
src/commands/migrations/v0_14_0.ts and src/core/preferences.ts.
process.env.HOME wins when set, homedir() is the fallback. Matches
existing convention, zero behavior change outside test contexts that
mutate HOME.
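
A minimal sketch of the rule stated above ("process.env.HOME wins when set, homedir() is the fallback"). The actual bodies of resolveHome() in the referenced files are not shown in this PR, so this is an assumed shape, and configPathFor is a hypothetical helper added only to show the effect.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// process.env.HOME wins when set; os.homedir() is the fallback.
function resolveHome(): string {
  return process.env.HOME || homedir();
}

// Hypothetical helper: the config path follows whatever resolveHome() returns.
function configPathFor(home: string = resolveHome()): string {
  return join(home, ".gbrain", "config.json");
}
```

Because the environment variable is read on every call, a test that sets process.env.HOME in beforeEach gets the tmp directory on the next lookup.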

Verification:
  bun test test/migrations-v0_14_0.test.ts  # 8 pass, 0 fail, 24ms
  bun test                                   # failure set unchanged
                                             # vs. HOME=/tmp/fake wrapper

Two narrow fixes for gbrain doctor's resolver_health check:

1. Remove duplicate "citation audit" trigger from maintain/SKILL.md.
   Both maintain and citation-fixer declared it, producing a
   mece_overlap warning. Citation-fixer owns citation-only work;
   maintain's surface stays on broad health (brain health, maintenance,
   orphan pages, stale pages, check backlinks).

2. Reword enrich/SKILL.md:105 to avoid re-stating the notability gate
   rule. The rule is properly delegated at line 174 via a
   "(see skills/_brain-filing-rules.md)" reference; line 105 duplicated
   it far outside the 40-line proximity window that check-resolvable.ts
   uses to suppress near-callout mentions. Rewritten to "(see Step 6
   for creation criteria)" so the single delegated reference at line
   174 is the only source of truth.

Verified via bin/gbrain doctor --fast --json: resolver_health now OK,
health_score 90 -> 95.

Note: gbrain doctor --fix cannot auto-resolve fix 2 because
"notability gate" appears at two locations, hitting dry-fix.ts's
ambiguous_multiple_matches guard. Manual edit required.

Autopilot-cycle was hitting the 10-min per-job timeout intermittently on
the embed phase. Root cause: embedAll(staleOnly=true) iterated every page
via listPages + per-page getChunks to discover stale chunks, costing ~15K
Supabase round-trips per cycle even with 100% embed coverage.

Add listSlugsPendingEmbedding() to BrainEngine: one UNION query returns
slugs that either have stale chunks (embedded_at IS NULL) OR have no chunk
rows at all (pages created via direct putPage from migrate-engine,
enrichment-service, etc.). embedAll(staleOnly=true) now calls it first
and early-exits on [], iterating only the pending set otherwise.

Extract buildInitialChunkInputs() from embedPage and reuse it in
embedOnePage so zero-chunk pages get chunked + embedded through the
bulk path, not only through per-slug CLI invocation. Keeps the worker
pool throttle (GBRAIN_EMBED_CONCURRENCY) around getPage + upsert calls.

Also raise autopilot-cycle per-job timeout default to 15 min and expose
GBRAIN_AUTOPILOT_JOB_TIMEOUT_MS env override. Cycle report carries
pending_pages count for observability.

Tests: 5 new pglite-engine cases (including zero-chunk UNION coverage),
3 embed regression tests (fast-path empty, no hydration fan-out, zero-chunk
chunking on the fly), plus dry-run coverage, and 3 new E2E cases against
Postgres.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
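
The two-branch "pending" set from the commit above can be modeled in memory as follows. This is a sketch with hypothetical shapes: the real listSlugsPendingEmbedding() is a single SQL UNION on the engine, and the stale predicate here follows this commit's description (embedded_at IS NULL).

```typescript
// Hypothetical in-memory shape; the real method is a single SQL UNION.
type Chunk = { slug: string; embeddedAt: Date | null };

function listSlugsPendingEmbedding(pageSlugs: string[], chunks: Chunk[]): string[] {
  const slugsWithChunks = new Set(chunks.map((c) => c.slug));
  // Branch 1: pages with at least one chunk that was never embedded.
  const stale = chunks.filter((c) => c.embeddedAt === null).map((c) => c.slug);
  // Branch 2: pages with no chunk rows at all.
  const zeroChunk = pageSlugs.filter((s) => !slugsWithChunks.has(s));
  // UNION semantics: de-duplicate across both branches.
  return Array.from(new Set(stale.concat(zeroChunk))).sort();
}

// Caller side: an empty result means the cycle can exit early.
function shouldEmbed(pending: string[]): boolean {
  return pending.length > 0;
}
```

The second branch is the one a chunk-only scan would miss: a page with no chunk rows never appears in a query over chunks.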
orendi84 and others added 15 commits April 26, 2026 14:14
Resolution in src/core/types.ts: keep Gary's local PageType additions
('mandate', 'knowledge') alongside upstream's new 'code' type, keep
PageFilters.tags_all / tags_any (still wired in postgres-engine /
pglite-engine / operations), accept all upstream additions verbatim
(PageKind, page_kind, code-chunk metadata on Chunk/ChunkInput,
Cathedral II SearchOpts, CodeEdge types).

Same shape as c7376d9 — additive PageType union, no DB migration
needed (page_type column has no CHECK constraint).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six conflict regions across src/commands/embed.ts and test/embed.test.ts.
Resolution per codex's pre-merge recipe:

- staleOnly delegates to embedAllStale (upstream pattern); embedAllStale
  now also handles zero-chunk pages via listSlugsPendingEmbedding so
  pages created via direct putPage (migrate-engine, enrichment-service,
  output/writer) get chunked + embedded instead of silently skipped.
- Each pending slug routes via bySlug.has(slug): stale-rows go through
  upstream's merge-preserve; zero-chunk slugs go through buildInitialChunkInputs
  -> upsertChunks -> re-read -> embed (all inside the throttled worker pool).
- Dry-run path counts both stale rows and what zero-chunk pages would
  produce so the preview is honest about bootstrap work.
- listSlugsPendingEmbedding predicate aligned to embedding IS NULL on
  both postgres-engine and pglite-engine (matches countStaleChunks /
  listStaleChunks; embedded_at can lag during bulk import).
- New regression test: countStaleChunks=0 + listSlugsPendingEmbedding=
  ['new-page'] -> runEmbedCore({stale:true}) chunks and embeds it.

autopilot.ts auto-merge naturally landed both upstream's maxWaiting:1
backpressure and Gary's GBRAIN_AUTOPILOT_JOB_TIMEOUT_MS env override.

bun test: 2886 pass / 268 skip / 0 fail.
bun build --compile: clean.
./bin/gbrain stats: 15591 pages / 133442 chunks / fully embedded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ULL predicate

Codex consult flagged that the BrainEngine interface comments for
countStaleChunks, listStaleChunks, and listSlugsPendingEmbedding still
documented `embedded_at IS NULL` after the v0.22.1-v0.22.4 merge moved
the implementation to `embedding IS NULL`. Future readers would have
treated the docstring as the contract. Updated all three to match
implementation and added a one-liner explaining why embedded_at is
not the truth source (bulk-import path leaves it populated while
embedding stays NULL).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cycle.ts:585 was calling engine.listSlugsPendingEmbedding() purely for
the cycle-report summary string, then runEmbedCore -> embedAllStale
would call the same query again at embed.ts:367 inside its Promise.all.
On a 14.9s mean query firing every 5-minute autopilot cycle this doubled
the steady-state DB cost on a query that returns 0 rows in steady state.

Thread the count through EmbedResult.pending_pages instead. embedAllStale
already has pendingSlugs.length from its existing Promise.all; surface it
on the result object so cycle.ts can read it without a second query.

- EmbedResult: new pending_pages: number field, initialized to 0 in the
  runEmbedCore result literal so all 3 return paths (slugs, all/stale,
  single slug) satisfy the non-optional type
- embedAllStale: assign result.pending_pages = pendingSlugs.length right
  after the Promise.all destructure
- cycle.ts runPhaseEmbed: drop the standalone listSlugsPendingEmbedding
  call, read result.pending_pages from runEmbedCore's return instead
- delete the misleading "Cheap: paid once per cycle" comment that was
  wrong on both halves (15s mean, called twice)

Tests:
- test/core/cycle.test.ts mock returns pending_pages: 3, new assertion
  verifies cycle phase summary reads "3 pending page(s)" from the
  threaded count
- test/embed.test.ts: new test covers the fully-embedded fast path
  (pending_pages == 0), assertions added to existing zero-chunk and
  multi-page tests
- test/e2e/mechanical.test.ts: stale predicate test was setting
  embedded_at instead of embedding (the actual filter), failing on
  master since the v0.22.1 garrytan#409 alignment. Updated to set both columns
  to mirror upsertChunks() and renamed the test accordingly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hot path

Adds a stored boolean to pages, maintained by putPage and revertToVersion
in both engines via computeHasChunkableText helper. Worker query branch 2
filters on the partial covering index idx_pages_chunkable instead of
running regexp_replace on every row. Pre-fix this branch was 74% of all
production DB time; post-fix it should be sub-100ms.

Migrations v33 (column + backfill) and v34 (CONCURRENTLY index with
TS-probe retry-safety pattern). Fresh installs pick up the column +
index from src/schema.sql and src/core/pglite-schema.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tstrap

Caught during apply-migrations on the live brain. CREATE TABLE pages
IF NOT EXISTS is a no-op on stale brains, so a brain whose pages table
predates v0.22.8.0 never gets has_chunkable_text added during initSchema,
and the subsequent CREATE INDEX idx_pages_chunkable fails with
"column has_chunkable_text does not exist".

This is the 4th recurrence of the schema-bootstrap-on-future-columns
footgun (see v0.13, v0.16, v0.27 history). Canonical fix is the
pre-ALTER pattern: ADD COLUMN IF NOT EXISTS before CREATE INDEX in both
schema.sql and pglite-schema.ts. Fresh installs are unaffected (CREATE
TABLE adds the column, ALTER becomes a no-op). Stale brains get the
column from the ALTER, so the partial index can land before migration
v33 runs.

Verified by re-running apply-migrations on the live brain: 29 -> 34
in 132s, all 5 migrations applied, idx_pages_chunkable valid + 1.7 MB,
worker query plan switched from Parallel Seq Scan to Index Only Scan
(283 ms vs 24,756 ms pre-fix, ~87x).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes Supabase advisor lint 0011 (function_search_path_mutable) for the
3 plpgsql trigger functions in source so the live-DB hardening survives
schema bootstrap on fresh brains and `gbrain init --migrate-only`.

- src/schema.sql: update_chunk_search_vector + notify_minion_job_change
  get SET search_path = pg_catalog, public (sufficient: they only call
  pg_catalog built-ins; pg_temp implicit-first does not apply to
  functions/operators). update_page_search_vector gets stricter
  SET search_path = '' plus schema-qualified FROM public.timeline_entries
  because it reads an unqualified relation, and pg_temp implicit-first
  DOES apply to relations.
- src/core/pglite-schema.ts: parity edit on update_page_search_vector.
- src/core/migrate.ts: new v35 migration with sqlFor split. PGLite branch
  hardens only the two functions that exist in pglite-schema.ts
  (notify_minion_job_change is intentionally Postgres-only).
- src/core/schema-embedded.ts: regenerated via bun run build:schema.
- test/migrate.test.ts: LATEST_VERSION guard 34->35; 3 new structural
  tests asserting v35 sqlFor shape per engine.

Live Supabase brain already hardened manually on 2026-04-29; v35 is a
behavioral no-op there. Full unit + E2E suites green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The v0.22.8.0 docs annotation commit modified CLAUDE.md but didn't
run `bun run build:llms`, leaving `test/build-llms.test.ts`
failing on master. Regenerating brings the committed bundle back
into sync with the generator output and unblocks the test suite.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`bun run test:e2e` calls paths that resolve to `gbrain init` /
`saveConfig` (e.g. setupDB writing config for the test container)
which would otherwise overwrite the user's real
`~/.gbrain/config.json`. Three operators hit this in 11 days; the
docker container tearing down after the run wedged the live autopilot
because the worker held the original AWS Postgres sockets in memory
but config now pointed at `localhost:5434`.

The wrapper now exports both HOME and GBRAIN_HOME to a
`mktemp -d` tmpdir before bun starts (loadConfig/saveConfig resolve
via HOME, configPath/getDbUrlSource honor GBRAIN_HOME - both required
to avoid asymmetric escape paths). HOME is set before bun starts because
Bun's `os.homedir()` caches at first call and in-process mutation
cannot beat the cache.
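
The "set HOME before the process starts" idea can be sketched in TypeScript as below. This is not the wrapper, which is a shell script; runIsolated is a hypothetical helper, and the probe assumes a POSIX-style HOME variable and a runtime whose executable accepts `-e`.

```typescript
import { spawnSync } from "node:child_process";
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type IsolatedRun = { status: number | null; stdout: string; home: string };

// Run a command with HOME and GBRAIN_HOME pointed at a throwaway directory.
// The override is part of the child's environment before it starts.
function runIsolated(command: string, args: string[]): IsolatedRun {
  const home = mkdtempSync(join(tmpdir(), "gbrain-e2e."));
  try {
    const result = spawnSync(command, args, {
      env: { ...process.env, HOME: home, GBRAIN_HOME: home },
      encoding: "utf8",
    });
    return { status: result.status, stdout: result.stdout, home };
  } finally {
    // Always remove the tmpdir, much like a cleanup trap would.
    rmSync(home, { recursive: true, force: true });
  }
}

// The child reports the HOME it was started with.
const probe = runIsolated(process.execPath, ["-e", "console.log(process.env.HOME)"]);
```

Setting the variables in the child's environment, rather than mutating them after startup, means no lookup inside the child can have seen the real home directory first.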

Post-run breach detector covers three modes: config existed and md5
changed, config existed and was deleted, config did not exist before
but was created during run. Exit 2 with a loud banner distinguishes
isolation breaches from regular test failures (exit 1).

Portable mktemp (`mktemp -d "${TMPDIR:-/tmp}/gbrain-e2e.XXXXXX"`)
because GNU mktemp on Linux CI errors on `-t prefix` without explicit
Xs in the template. `md5_of` body is wrapped in `{ ... } || true`
so a missing config file or missing md5 binary never trips `set -e`
before the breach detector can run.

Verified: pgvector/pgvector:pg16 lifecycle on port 5434, 27 files /
245 tests / 0 failures across two runs, user config md5 byte-identical
before and after each run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CLAUDE.md "E2E test DB lifecycle" now documents the HOME / GBRAIN_HOME
tmpdir isolation contract, the wrapper-only exit-2 breach detector, and
why per-file `bun test test/e2e/...` calls must set both env vars
themselves. CONTRIBUTING.md gets a contributor-facing pointer to the
same wrapper behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@orendi84

Contributor Author

Heads up: merged downstream as ec43bba on the orendi84/gbrain master while testing. The fix is operational in my local install, but this PR remains an open contribution offer to garrytan/gbrain - the bug pattern (E2E config-overwrite wedging local autopilot) is generic enough that anyone running the gbrain test suite locally will benefit. No urgency on the upstream merge from my side; just leaving it here for the public project's discretion.

@orendi84

Contributor Author

Closing in favor of #517, a cleanly-scoped replacement.

This branch had inadvertently picked up 41 commits of downstream Phase 2 work that hadn't been merged upstream yet, turning a single-file E2E fix into a 27-file, +2,208-line integration PR with substantive src/cli.ts conflicts. Rebasing or merging that against current upstream master would have buried the actual config-corruption fix under a delayed integration. #517 is upstream/master + 2 commits (the same df60281 E2E fix and ae148f0 version bump cherry-picked clean), exactly what this PR claimed to be.

Same fix, smaller diff. The Phase 2 integration is a separate workstream that deserves its own PR(s) when it lands.

@orendi84 orendi84 closed this Apr 29, 2026