v0.40.8.0 test: e2e + unit gap coverage + master flake root-cause fixes by garrytan · Pull Request #1313 · garrytan/gbrain

garrytan · 2026-05-23T04:19:28Z

Summary

Test-coverage wave closing 4 of 10 audit gaps with new behavioral coverage, verifying 3 audit gaps already had coverage the audit missed, and root-causing 2 pre-existing master test-infra flakes that surfaced during /ship.

Doctor refactor — extracted buildChecks(engine, args, dbSource): Promise<Check[]> from runDoctor in src/commands/doctor.ts. Behavior-preserving; all 10 process.exit sites stay in the wrapper. Snapshot-pinned against accidental check drop-outs.

Operations trust-boundary contract — table-driven test over all 74 ops + handler-invocation regressions for the 2 historically-broken HTTP-callable classes (submit_job shell + search_by_image image_path). Shell guard at scripts/check-operations-filter-bypass.sh wired into bun run verify catches future HTTP-exposing modules that bypass the canonical localOnly filter — 3 import shapes detected (destructured, aliased, namespace).

Cycle phase wrappers — exported runPhaseLint + runPhaseBacklinks from src/core/cycle.ts. Combined test/cycle-legacy-phases.test.ts pins result-mapping + error envelope for both phases. Future phase wrappers land as additional describes here.

Master flake fixes (separate concern, same wave):

test/ai/gateway.test.ts: added afterAll(resetGateway) to stop module-state leak that was poisoning sibling tests with OPENAI_API_KEY=openai-fake
test/ai/header-transport.test.ts → .serial.test.ts: quarantined to stop cross-shard race against RECIPES + configureGateway

Test Coverage

[+] src/commands/doctor.ts — extract buildChecks
  ├── buildChecks orchestrator → test/doctor-behavioral.test.ts [13 cases, ★★★]
  ├── runDoctor wrapper render/exit → test/doctor-cli-smoke.serial.test.ts [1 case, ★★★]
  └── early-return paths → 2 cases in doctor-behavioral [★★]

[+] src/core/cycle.ts — export runPhaseLint/Backlinks
  └── result-mapping + error envelope → cycle-legacy-phases [11 cases, ★★★]

[+] test/operations-trust-boundary.test.ts (new, 14 cases)
  └── 74-op scope contract + 7-name localOnly snapshot + 2 sensitive-op handler invocations [★★★]

[+] scripts/check-operations-filter-bypass.sh (new shell guard)
  └── 3 import-shape detection (destructured/aliased/namespace) [★★★]

[+] test/ai/gateway.test.ts +afterAll fix [★★]
[+] test/ai/header-transport → .serial.test.ts (quarantine) [★★]

COVERAGE: 6/6 new code paths tested (100%)
QUALITY: ★★★:5 ★★:2
GAPS: 0 silent (4 deliberate TODOs filed)

Tests: 8790 → 9056 (+266 across the wave + master merge)

Pre-Landing Review

No issues found. Small additive diff (~120 lines plus tests). No SQL changes, no LLM trust boundary changes, no JSON.stringify/jsonb interpolation. Behavior-preserving refactor pinned by snapshot test.

Adversarial Review

Codex caught 9 findings. 2 were real bugs and fixed:

feat: GBrain v0.1.0 — Postgres-native personal knowledge brain #1: Shell guard regex bypassable via import { operations as ops } or import * as. Fixed: expanded regex covers all 3 shapes.
docs: expand brain schema with database architecture and OSS smoothing #4: KNOWN_LOCAL_ONLY snapshot was missing 3 of 7 actual localOnly ops (purge_deleted_pages, get_recent_transcripts, code_traversal_cache_clear). Fixed: expanded snapshot.

2 findings were verified as false positives (#6 progress.finish placement, #8 runLintCore missing-dir behavior — both checked against source and correct as-shipped).

5 findings absorbed as follow-up TODOs (NEW-1 through NEW-4 + gateway-mutating audit).

Plan Completion

10/10 plan tasks DONE. 4 of the 10 audit gaps turned out to be non-gaps after surveying actual repo state — test/ingestion/{dedup,daemon,skillpack-load}.test.ts already had 88 tests, test/phantom-redirect.test.ts already had 38 tests, test/ingestion/test-harness.test.ts already had 36 tests. The audit was based on stale information.

Verification Results

bun run test (parallel fast loop): 9056 pass / 0 fail (~8 minutes wallclock)
bun run verify (17 shell checks + typecheck): green
Manual gbrain doctor --json against fresh PGLite tempdir: exit 0, 47 checks, schema_version=2

TODOS

5 new TODOs filed for v0.41+ follow-ups (4 from codex CMT findings + 1 from gateway-mutating audit). No items marked complete (this wave's work didn't overlap with prior TODOs).

Documentation

CLAUDE.md — added three Key-files entries for v0.40.4.1: src/commands/doctor.ts extension noting the new exported buildChecks seam + behavioral test coverage + subprocess smoke; src/core/cycle.ts extension noting the new runPhaseLint + runPhaseBacklinks exports + test/cycle-legacy-phases.test.ts; operations trust-boundary contract entry covering the new test + shell guard wired into bun run verify.

llms-full.txt — regenerated via bun run build:llms to match the CLAUDE.md edits.

CHANGELOG.md, TODOS.md — already complete from the ship.
README.md — unchanged; pure test-coverage wave has no user-facing surface to advertise.

Test plan

All unit tests pass (8 shards, 9056 tests, 0 failures)
All shell checks pass (bun run verify clean)
Typecheck clean (bun run typecheck)
Pre-existing master flakes root-caused (not bandaged)
Manual doctor smoke against fresh PGLite tempdir: exit 0, schema_version=2 envelope intact

🤖 Generated with Claude Code

gateway.test.ts: add afterAll(resetGateway) hook. The file's tests use beforeEach(resetGateway) for per-test isolation, but the FINAL test was leaving configureGateway({env: {OPENAI_API_KEY: 'openai-fake'}}) in the gateway module state. Sibling files in the same bun shard (e.g. test/ingestion/ingest-capture.test.ts) then triggered embed() against the real OpenAI endpoint with the leaked fake key, wedging the shard with 'Incorrect API key provided: openai-fake'. header-transport.test.ts → .serial.test.ts: the file mutates RECIPES (gateway's module-scoped recipe map) plus configureGateway, then asserts fakeChatFetch was invoked. Under bun's intra-shard parallelism, sibling files like test/ai/rerank.test.ts race the same state — chat would see result.text === '[]' instead of 'ok' because another test called resetGateway between this test's configureGateway and chat. Quarantining as .serial.test.ts moves the file into the post-parallel serial pass at --max-concurrency=1 per repo convention.

src/commands/doctor.ts: extract buildChecks(engine, args, dbSource): Promise<Check[]> from runDoctor. The check-building logic moves into the new exported function; existing exported computeDoctorReport(checks) at line 78 stays untouched. runDoctor becomes a thin wrapper: buildChecks → computeDoctorReport → render + process.exit. All 10 process.exit sites stay in place. The two early-return paths drop their inline outputResults+process.exit calls and return the partial check list; the wrapper still produces identical observable output. test/doctor-behavioral.test.ts (13 cases): pure pure-aggregation cases pin computeDoctorReport math (3 fails → -60 points, score clamped at 0, mixed outcome → unhealthy with fail dominating). Orchestrator cases assert --fast flag honors the skip set, --json doesn't alter the list, no-engine path returns partial without process.exit, and the snapshot of load-bearing check names catches accidental drop-outs during future refactors. test/doctor-cli-smoke.serial.test.ts (1 case): subprocess smoke spawning 'bun run src/cli.ts doctor --json' against a fresh PGLite tempdir brain. Catches render-path bugs that buildChecks-only tests miss — the class the v0.38.2.0 partial-scan wave exposed. Quarantined as .serial because PGLite write-locks don't play well with parallel runners; skippable via GBRAIN_SKIP_SUBPROCESS_TESTS=1.

…guard test/operations-trust-boundary.test.ts (14 cases): hybrid design per plan D7. Pure assertions over all 74 ops cover the drift-detection win (every op has a scope; every mutating op has a non-read scope; hasScope(['read'], op.scope) correctly rejects 'admin' or 'write'). Plus the canonical filter contract: every localOnly: true op is excluded from operations.filter(op => !op.localOnly). Plus targeted handler-invocation cases for the two historically-broken HTTP-callable classes: submit_job(name='shell', ctx.remote=true) MUST reject (F7b HTTP MCP shell-job RCE class), and search_by_image(image_path, ctx.remote=true) MUST reject (D18 P0 image-leak class). file_upload and sync_brain are deliberately omitted from handler-invocation tests because they're localOnly — calling their handlers directly tests an impossible production path (codex CMT-3). All 7 localOnly ops are snapshot-pinned by name to catch future flag-flips. scripts/check-operations-filter-bypass.sh: greps src/ for any module that imports the 'operations' value from core/operations.ts outside the canonical filter site. Three import shapes detected: destructured, aliased ('as ops'), namespace ('import * as'). Explicit allow-list of 10 known-safe importers with one-line rationale per entry. Plus a filter-presence check on serve-http.ts that fails if the canonical filter expression is refactored out. Codex /ship adversarial review caught the original narrow regex missed aliased + namespace bypasses; the expanded regex closes that class. Type-only imports of sibling exports (sourceScopeOpts, OperationContext) are not flagged. package.json: wires check:operations-filter-bypass into the verify chain alongside check:jsonb and check:progress.

…ests src/core/cycle.ts: adds 'export' keyword to two existing phase functions so behavioral tests can drive them without going through runCycle's full setup cost. No body changes; no behavior change. Documented as internal helpers exposed for test-only consumption — downstream code should NOT take a dependency on them; existing plan-eng-review D9 explicitly accepted the API-widening tax for testability. test/cycle-legacy-phases.test.ts (11 cases): combined file with two describe blocks per plan D5 (DRY — shared setup, future phase wrappers land as additional describes). Narrowed to result-mapping + error envelope per codex CMT-1 (legacy phases don't extend BaseCyclePhase and don't take a progress reporter or AbortSignal directly, so the contract surface is counter → status enum + try/catch envelope). Cases: clean run → status='ok', partial fix → status='warn' with dryRun in details, dry-run path doesn't write, throw-from-lib → status='fail' with envelope populated (no exception escape). Verified runLintCore and runBacklinksCore both throw on missing dir, so the throw-from-lib cases are deterministic.

E2E + unit test gap coverage wave. Closes 4 audit gaps with new behavioral coverage (doctor orchestrator + subprocess smoke, operations trust-boundary contract + filter-bypass guard, cycle phase wrapper result-mapping). Verifies 3 audit gaps were already covered (ingestion dedup/daemon/skillpack-load, phantom-redirect, ingestion test-harness). Root-causes 2 pre-existing master flakes (gateway state leak, header- transport cross-shard race). Files 5 follow-up TODOs from codex adversarial-review findings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…DE.md Adds three Key-files entries pinning the v0.40.4.1 test-wave additions: - doctor.ts extension: buildChecks seam + behavioral tests (13+1 cases) - cycle.ts extension: runPhaseLint + runPhaseBacklinks exports (11 cases) - operations-trust-boundary contract + check-operations-filter-bypass.sh Regenerates llms-full.txt to match (CLAUDE.md edits require build:llms per project rule, otherwise test/build-llms.test.ts fails in CI shard 1). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

# Conflicts: # CHANGELOG.md # TODOS.md # VERSION # package.json

…ed in CI CI failure mode: 9 tests in test/ingestion/put-page-write-through.test.ts failed with `AIConfigError: [embed(zeroentropyai:zembed-1)] Unauthorized` because put_page's handler at src/core/operations.ts:622 computes `noEmbed = !isAvailable('embedding')`. When the gateway state has been configured by a sibling test (or by the cli.ts module-load path reading .env.testing) with a fake/stale ZEROENTROPY_API_KEY, isAvailable returns true → put_page tries to embed → the real ZeroEntropy API returns 401. Local dev passes because real ZE keys are present; CI doesn't have them. Fix: call resetGateway() in beforeEach so isAvailable('embedding') returns false → put_page's noEmbed path activates → no network call. Also reset in afterAll to avoid leaking the cleared state to sibling files in the same bun shard (the v0.40.4.1 gateway state-leak class that motivated the earlier gateway.test.ts fix). The test exercises write-through behavior, not embedding. No need for a real or fake embed transport — bypass entirely.

# Conflicts: # CHANGELOG.md # VERSION # package.json

# Conflicts: # CHANGELOG.md # TODOS.md # VERSION # package.json

…ming variance CI failure: scanBrainSources partial-scan state > "hanging COUNT does not exceed deadline — Promise.race timeout fires" failed once on a GitHub Actions runner. The test asserted a 100ms deadline budget with a 500ms bound; observed test duration was 187ms on CI (passes locally 20/20 runs at the original budget). Root cause: Node.js timer drift under shard parallelism. The deadline check at src/core/brain-writer.ts:503 uses strict `Date.now() > deadline`, so when the setTimeout in Promise.race fires exactly at the boundary (e.g. start+100ms when deadline is start+100ms), the post-await check sees equality and skips the markRemainingSkipped branch. The test also asserts elapsed < 500ms; CI overhead can push elapsed past that bound when setTimeout drifts. Fix: widen the deadline budget on both deadline-race tests proportionally (keeps the same 2x ratio that proves "query exceeds deadline"). No src/ changes — this is purely a test robustness widening. - "hanging COUNT" test: 100ms → 500ms deadline, 500ms → 2500ms bound - "slow COUNT" test: 50ms → 250ms deadline, 100ms → 500ms query delay Verified locally: 20/20 stress runs at the widened budgets, no fails.

# Conflicts: # CHANGELOG.md # VERSION # package.json

…ndary) CI failure recurred: same "hanging COUNT does not exceed deadline" test failed again at 588ms (past my previous 500ms deadline + 2500ms bound widening). The root cause isn't test timing — it's an off-by-one in the source. src/core/brain-writer.ts had two deadline checks using strict `>`: - line 445 (between-source abort) - line 503 (post-COUNT-await re-check) The Promise.race setTimeout resolves null at exactly `remainingMs` from now, so post-await Date.now() OFTEN equals the deadline within integer-ms precision. With `>`, the check skipped → scanOneSource ran on the source whose budget had just been eaten → that source got status='scanned' instead of 'skipped'. The test's `expect(firstSource .status).toBe('skipped')` failed. Fix: both checks now use `>=`. When Date.now() equals deadline exactly, the budget IS exhausted — proceeding would let the next source eat its own budget on top of what's already spent. Matches the boundary the Promise.race's remainingMs <= 0 immediate-null path uses (line 481). This is the real fix for the v0.40.x CI flakes; my earlier test-budget widening papered over the symptom without closing the boundary. Kept the wider 500ms deadline for headroom but added a comment pointing at the operator fix as the load-bearing change. Verified: 20/20 stress runs green locally after the operator fix.

* upstream/master: (22 commits) v0.41.4.0 wave: local providers + cross-platform stdin + gateway-routed dream judge (6 community PRs) (garrytan#1377) v0.41.3.0 fix(security/mcp): OAuth CORS lockdown + pre-register without DCR + validator surface (garrytan#1403) v0.41.2.0 feat: lens packs + epistemology unification — atoms + concepts as first-class units, calibration profile widening, gstack-learnings bridge (garrytan#1364) v0.41.1.0 feat: eval-loop wave — gbrain bench publish + gbrain eval gate close the LOOP (garrytan#1352) v0.41.0.0 feat(minions): fleet you supervise (4 field bugs + cathedral) (garrytan#1367) v0.40.10.0 feat: content sanity defense — junk-pattern throw + oversize-skip-embed (garrytan#1351) v0.40.9.0 feat(chunker): .sql indexing via tree-sitter + code-def on SQL DDL (garrytan#1173) (garrytan#1350) v0.40.8.1 docs: README rewrite + personal-brain + company-brain tutorials (garrytan#1345) v0.40.8.0 test: e2e + unit gap coverage + master flake root-cause fixes (garrytan#1313) v0.40.6.1 docs(todos): file v0.41 wave commitments + 7 verified-missing items (garrytan#1333) v0.40.7.0 Schema Cathedral v3 — agent-on-ramp + production rebuild of PR garrytan#1321 (garrytan#1327) v0.40.6.0 feat(sync): parallel sync --all + per-source lock invariant + sources status dashboard (productionized from PR garrytan#1314) (garrytan#1324) v0.40.5.0 Federated Sync v2 — parallel source sync + push triggers + per-source health (garrytan#1322) v0.40.4.0 feat(search): selective graph signals + per-stage attribution + audit-writer unification (garrytan#1300) v0.40.3.0 feat: contextual retrieval + cache invalidation gate + 4 deferred-item closures (garrytan#1323) v0.40.2.0 feat: trajectory routing for temporal + knowledge_update (gbrain think + LongMemEval) (garrytan#1296) v0.40.1.0 Track D — eval infrastructure (catch retrieval regressions, prove answer-quality wins) (garrytan#1298) v0.40.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm (garrytan#1128) v0.39.3.0: productionize the v0.38 ingestion cathedral (smoke-test fix wave from PR garrytan#1299) (garrytan#1308) v0.39.2.0 feat(autopilot): per-source fan-out + cycle lock primitive + phase taxonomy (garrytan#1295) ...

garrytan and others added 11 commits May 22, 2026 21:16

Merge remote-tracking branch 'origin/master' into garrytan/e2e-test-wave

633b535

# Conflicts: # CHANGELOG.md # TODOS.md # VERSION # package.json

Merge remote-tracking branch 'origin/master' into garrytan/e2e-test-wave

8057721

# Conflicts: # CHANGELOG.md # VERSION # package.json

Merge remote-tracking branch 'origin/master' into garrytan/e2e-test-wave

ac45bd8

# Conflicts: # CHANGELOG.md # VERSION # package.json

Merge remote-tracking branch 'origin/master' into garrytan/e2e-test-wave

4228b23

# Conflicts: # CHANGELOG.md # TODOS.md # VERSION # package.json

garrytan changed the title ~~v0.40.4.1 test: e2e + unit gap coverage + master flake root-cause fixes~~ v0.40.8.0 test: e2e + unit gap coverage + master flake root-cause fixes May 23, 2026

garrytan added 4 commits May 23, 2026 16:40

Merge remote-tracking branch 'origin/master' into garrytan/e2e-test-wave

d1107f1

# Conflicts: # CHANGELOG.md # VERSION # package.json

Merge remote-tracking branch 'origin/master' into garrytan/e2e-test-wave

250ff74

# Conflicts: # CHANGELOG.md # VERSION # package.json

garrytan merged commit 41ab138 into master May 24, 2026
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.40.8.0 test: e2e + unit gap coverage + master flake root-cause fixes#1313

v0.40.8.0 test: e2e + unit gap coverage + master flake root-cause fixes#1313
garrytan merged 15 commits into
masterfrom
garrytan/e2e-test-wave

garrytan commented May 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

garrytan commented May 23, 2026

Summary

Test Coverage

Pre-Landing Review

Adversarial Review

Plan Completion

Verification Results

TODOS

Documentation

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant