fix: stop PGLite CLI commands from hanging#1337
Closed
matt-dean-git wants to merge 1 commit into
Closed
Conversation
6 tasks
garrytan
added a commit
that referenced
this pull request
May 25, 2026
…1342 breadcrumbs (#1405) * fix(pglite): drain fire-and-forget last_retrieved_at writes before disconnect Closes the structural bug class behind #1247, #1269, #1290: PGLite CLI search/query/get_page commands printed results then hung at ~95-98% CPU until SIGKILL. Root cause: bumpLastRetrievedAt's IIFE races engine.disconnect() — PGLite's WASM runtime keeps Bun's event loop alive while the dangling UPDATE settles. Mirrors the existing awaitPendingSearchCacheWrites precedent landed in v0.36.1.x for #1090. Tracks every IIFE promise in a module-scoped Set, exposes awaitPendingLastRetrievedWrites(timeoutMs) that resolves once all settle. Bounded with a 5s default timeout via Promise.race so a future fire-and-forget that hangs forever can't recreate the bug class at this layer — instead, the drain stderr-warns with a pending count and returns timeout outcome so the caller can decide its fallback. Test coverage: 6 unit cases covering empty drain, single + multi-pending settle, throw-in-IIFE still settles, permanently-pending hits timeout within bound, empty pageIds does not track. This commit ships the helper + tracking + tests with NO consumer. The cli.ts wiring lands in a follow-up commit (atomic bisect units). Co-Authored-By: Park Je Hoon <jehoon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(pglite): snapshot+early-null disconnect + try/finally lock-leak guard Refactor PGLiteEngine.disconnect() with two structural fixes: (1) Snapshot + early-null pattern: capture db/lock refs and null the instance fields BEFORE any await. A concurrent connect() can no longer observe `_db` pointing at a handle that's mid-close. This is PR #1337's load-bearing contribution that we DID take. (2) Wrap close + release in try/finally. Without this guard, a thrown db.close() would leak the file lock and wedge every next gbrain invocation on the stale lock. Codex outside-voice review (eng review finding #7) caught this gap when reviewing the snapshot refactor. KEEP the original close-then-release order. PR #1337's diff swapped this to release-then-close, which we explicitly REJECTED — releasing the lock before close lets a sibling process try to connect to a still-closing brain. The new lifecycle test file pins this ordering so a future maintainer reading PR #1337's diff cannot accidentally flip it. Test coverage in test/pglite-engine-disconnect.serial.test.ts: 5 cases — close-before-release ordering, early-null observable inside close, lock-still-releases on close-throw, double-disconnect idempotency, reconnect-after-disconnect clean state. `.serial` because each test creates a fresh PGLite engine (WASM cold-start cost) — running in parallel shards would starve other tests. Existing test/pglite-engine.test.ts: 100/100 still green. Co-Authored-By: Matt Dean <matt-dean-git@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(pglite): classify WASM init errors so #1340 gets the right hint (#1340) Closes the user-facing half of #1340: on macOS 12.7.6 + Bun 1.3.14, the PGLite connect() catch block hardcoded the macOS 26.3 hint (#223). The actual root cause for #1340 is Bun's vfs: `/$$bunfs/root` is read-only on older macOS, so PGLite cannot extract its pglite.data WASM payload. Adds two exported helpers in pglite-engine.ts: classifyPgliteInitError(message): 'bunfs' | 'macos-26-3' | 'unknown' buildPgliteInitErrorMessage(verdict, original): string Connect catch block now routes the hint by verdict. The bunfs hint names `bun upgrade` + Node fallback. The macOS 26.3 hint keeps the existing #223 link. Unknown falls through to a generic doctor + #223 fallback. Per Codex eng-review finding #9, the bunfs regex is tightened to match either the literal `$$bunfs` marker OR ENOENT+pglite.data co-occurrence — NOT generic `pglite.data` substring (would fire on unrelated errors). Negative test pinned. Root fix is upstream Bun; this PR just stops misclassifying the failure class so support traffic doesn't conflate two unrelated bugs. Test coverage: 12 pure-function unit cases including the #1340 reporter's exact error string round-trip, the negative case Codex caught, and all three verdicts × all three message contents. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cli): await last-retrieved drain + narrow timeout-only force-exit (#1247, #1269, #1290) Wires the v0.40.10.0 drain helper into cli.ts and adds the IRON-RULE behavioral regression test for the search-hang class. The drain is called unconditionally for every op (not per-op-name gated — that was the original PR #1259 mistake that left search and get_page exposed). The narrow force-exit synthesis (decision D7 from the eng review, informed by Codex outside-voice findings #1+#2+#8): when the drain returns outcome:'timeout', AFTER engine.disconnect() resolves AND the command is NOT a daemon, fire process.exit(0). The drain helper already stderr-warned with the pending count, so the diagnostic signal is preserved. Without this guard, a hung underlying promise could still keep Bun's event loop alive past disconnect. CRITICALLY narrower than PR #1337's blanket force-exit: the timeout path is the only trigger. In the common case (drain settles cleanly under 5s), no force-exit fires and the behavioral subprocess test still catches future regressions. The shouldForceExitAfterMain guard excludes 'serve' so the stdio + HTTP daemons stay alive past main(). e2e/pglite-cli-exit.serial.test.ts (NEW, IRON RULE): - gbrain search "foxtrot" → exits 0 within 15s - gbrain get alpha → exits 0 within 15s with foxtrot in stdout - gbrain query "foxtrot" --no-expand → exits within 15s (no-API-key graceful) - gbrain serve --http → stays alive 3+ seconds (daemon-survival regression guard) fix-wave-structural.test.ts: - import assertion for awaitPendingLastRetrievedWrites - last-retrieved.ts exports + Set tracking + Promise.race + timeout - BEHAVIORAL positioning assertion: drain `await` appears textually BEFORE engine.disconnect `await` in the op-dispatch local-engine path. Survives variable-rename refactors; catches any new disconnect path that bypasses the drain. - shouldForceExitAfterMain excludes 'serve' AND the gate is conditioned on drainResult.outcome==='timeout' Per D8 (Codex finding #5), explicitly do NOT add a drift-guard counting bumpLastRetrievedAt callers — would block harmless refactors and miss aliases. Co-Authored-By: Park Je Hoon <jehoon@users.noreply.github.com> Co-Authored-By: Matt Dean <matt-dean-git@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(sync): add phase breadcrumbs to performSyncInner for #1342 triage The #1342 reporter saw ZERO stderr output before their PGLite sync hang, which made the bug impossible to triage from a community report alone. Mirrors the pre-existing `[gbrain phase] sync.git_pull start/done` pattern at the major pre-pull phase boundaries so the next #1342-shaped report names WHICH phase spun. Four new breadcrumbs at: - sync.resolve_repo (top of performSyncInner) - sync.load_active_pack (before the v0.39 T1.5 pack load) - sync.validate_repo_state (only when opts.sourceId is set — the re-clone branch) - sync.detect_head (before the isDetachedHead probe) No behavior change — pure stderr instrumentation. Doesn't fix #1342 (which still needs investigation per the TODOS entry filed in this wave), but converts "hung with no output" into actionable diagnostic data the next time the bug shape is reported. Per D9 in the eng review + Codex outside-voice finding #14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: annotate v0.40.10.0 PGLite hang wave in CLAUDE.md + regen llms Key Files entries updated: - src/core/pglite-engine.ts: documents the v0.40.10.0 disconnect refactor (snapshot+early-null + try/finally lock-leak guard, KEEPS close-then-release order), and the new classifyPgliteInitError / buildPgliteInitErrorMessage helpers for #1340 hint routing. Pins PR #1337's accepted-but-narrowed contribution and the rejected release-then-close ordering swap. - src/core/last-retrieved.ts (within the brainstorm entry): documents the new awaitPendingLastRetrievedWrites drain, the Set tracking pattern, the 5s bounded timeout, the cli.ts narrow timeout-only force-exit synthesis with the serve-daemon guard, and the three community-validated reports (#1247/#1269/#1290) the fix closes. Credits PR #1259 (drain pattern) and PR #1337 (snapshot pattern + force-exit guard idea). Regenerated llms.txt + llms-full.txt — build-llms.test.ts gates the drift, all 7 cases green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(todos): file v0.40.10.0 PGLite hang follow-ups Three deferred items from the v0.40.10.0 fix wave: 1. #1342 sync-hang investigation. Single-reporter, JS-tight-loop shape, needs reproducer before any fix. Documents the ruled-out hypotheses (lock-refresh heartbeat, v91 trigger, while-true loops) and three concrete diagnostic next steps. The v0.40.10.0 sync phase breadcrumbs make the next report actionable. 2. awaitPendingSearchCacheWrites timeout-symmetry retrofit. The #1090 drain shipped without a timeout; the v0.40.10.0 #1247 drain ships with one. Apply the same Promise.race + stderr warn pattern for symmetry. 3. Drain-helper extraction. Per D4 in the eng review: two surfaces is the threshold for noticing, three for extracting. Pair with the symmetry retrofit above as one focused refactor when a third fire-and-forget surface appears. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.10.0 fix(pglite): search/query/get exit cleanly + #1340 hint + #1342 breadcrumbs Closes #1247, #1269, #1290 (PGLite CLI search/query/get hang at ~95-98% CPU after printing results — three community-validated reports). Also fixes #1340 (WASM init misroutes to macOS 26.3 hint when real cause is Bun vfs read-only mount) and adds diagnostic phase breadcrumbs for the single-reporter #1342 sync-hang investigation. Core fix: track every fire-and-forget bumpLastRetrievedAt IIFE in a module-scoped Set; cli.ts awaits the drain before engine.disconnect() in the op-dispatch finally block; narrow process.exit(0) fires ONLY when the drain times out AND the command isn't a daemon. Snapshot+ early-null disconnect pattern + try/finally lock-leak guard close the partial-state race PR #1337 originally surfaced. Co-Authored-By: Park Je Hoon <jehoon@users.noreply.github.com> Co-Authored-By: Matt Dean <matt-dean-git@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: extract shouldForceExitAfterMain to its own module + add unit cases Gap-audit follow-up: cli.ts is a script entrypoint (top-level main() side effect), so importing it from a test fires the help output as a side effect. Move shouldForceExitAfterMain into src/core/cli-force-exit.ts so it can be unit-tested in isolation without the cli.ts script tail running. Adds test/cli-should-force-exit.test.ts (9 cases): bare serve, serve with flags after, global flags BEFORE the command (the load-bearing case for `gbrain --quiet serve`), op commands return true, non-daemon CLI commands return true, empty argv defaults to true, flag-only argv, default-arg fallback to process.argv.slice(2), substring-match avoidance (`serves` is NOT `serve` — strict equality via Set, not startsWith/includes). The daemon command set is now an explicit ReadonlySet — future daemons (a hypothetical `gbrain watch` or `gbrain daemon`) just add their name to DAEMON_COMMANDS rather than chaining ||. Updates fix-wave-structural.test.ts to look for the import + the new DAEMON_COMMANDS shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(version): rebase v0.40.10.0 → v0.41.6.0 (slot collision after v0.41.0.0+ landed) origin/master moved from v0.40.8.1 → v0.41.0.0 while this wave was in flight (PR #1367 minions cathedral). v0.41.1-v0.41.5 are claimed by other in-flight branches, so v0.41.6.0 is the next available slot. Bulk-renamed v0.40.10.0 → v0.41.6.0 across: - VERSION + package.json (trio audit clean: 0.41.6.0 / 0.41.6.0 / 0.41.6.0) - CHANGELOG.md (header + 3 prose references) - CLAUDE.md (Key Files annotations) - TODOS.md (follow-up entry header) - src/cli.ts + src/core/cli-force-exit.ts + src/core/last-retrieved.ts + src/core/pglite-engine.ts + src/commands/sync.ts (inline comments) - test/* (describe blocks + test file headers) - llms-full.txt (regenerated via `bun run build:llms`) bun.lock unchanged (version-only bump, no dep churn) per Codex #12. Verify: 52/52 wave tests pass after rename, typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: quarantine seed-pglite to .serial.test.ts (parallel WASM cold-start flake) The full-suite run during the v0.41.6.0 fix wave ship hit a 30s timeout in test/seed-pglite.test.ts under heavy 4-shard parallel contention (4972/4973 passed before SIGKILL). The test passes 11/11 in isolation. Root cause: each test instantiates a fresh PGLiteEngine (5 instances across the file, one per test) because each case writes to a different mkdtemp-ed dbPath. Under parallel shard load, multiple shards each cold-starting PGLite WASM simultaneously stretches the per-instance init from ~5s to 30s+. The shared-engine pattern (canonical PGLite block in CLAUDE.md R3+R4) doesn't apply here — different dbPaths require different engines. Fix per CLAUDE.md test-isolation quarantine rules: rename to `.serial.test.ts` so the file runs in the post-parallel serial pass with full WASM init capacity. Same pattern as test/pglite-engine-disconnect.serial.test.ts (added in this wave) and test/brain-registry.serial.test.ts (pre-existing). Removes test/seed-pglite.test.ts from check-test-isolation.allowlist since the .serial.test.ts rename auto-exempts it from the R3+R4 lint (scan skips *.serial.test.ts). 641 non-serial unit files scanned, lint clean. Verify: - bun test test/seed-pglite.serial.test.ts → 11/11 pass in 4.19s - scripts/check-test-isolation.sh → OK - bun run verify → all gates pass Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: pre-landing review fixes (C13 disconnect-hang, C1 set leak, C9 catch drain, M1 type drift) Adversarial review + maintainability specialist surfaced four real issues in the v0.41.8.0 wave. All four fixed in this commit; one deferred to TODOS.md as a v0.41+ follow-up (unusual caller pattern). **C13 [load-bearing, defense-in-depth for the wave's stated goal]:** `await engine.disconnect()` inside the op-dispatch finally can ITSELF hang on PGLite (db.close() racing OS-level FS state). When that happens, the entire wave's force-exit guard never runs — we recreate the original hang at a new layer. Fix: install an unref'd setTimeout hard-exit fallback BEFORE entering the try/catch/finally. The timer fires after DISCONNECT_HARD_DEADLINE_MS=10s with a stderr warn and process.exit(0). unref ensures it doesn't keep the loop alive on a healthy exit. Daemons (`serve`) are excluded by reusing the shouldForceExitAfterMain guard. **C9 [data freshness gap, narrow but real]:** The drain ran ONLY in the success branch of try. If `bumpLastRetrievedAt` fired (handler succeeded) but `JSON.parse(JSON.stringify(...))` or `formatResult` then threw, process.exit(1) killed the process and the in-flight UPDATE was discarded. Fix: drain in the catch path too before process.exit(1) (best-effort, bounded by the drain's own 5s timeout). **C1 [daemon leak]:** A timed-out IIFE used to stay in the pending-writes Set forever because its `.finally` never fires. Long-lived `gbrain serve` would accumulate references without bound across repeated timeouts. Fix: explicitly `delete` the snapshot's tracked promises from the Set after a timeout outcome. The IIFEs keep running (orphaned), but the Set no longer leaks references. Pinned by a new unit test that asserts the second drain after a timeout returns immediately with empty pending count. **M1 [silent type drift]:** `cli.ts` duplicated the `{outcome, pending}` literal shape instead of importing the `DrainOutcome` type that `last-retrieved.ts` exports exactly for this purpose. Two-line fix: add `type DrainOutcome` to the import and use it for `let drainResult`. Future changes to the return shape now propagate through TypeScript. **Deferred to TODOS.md (C6 — unusual caller pattern):** Concurrent connect/disconnect on the same `PGLiteEngine` instance can strand: disconnect snapshots+nulls the lock while connect is still in-flight, leaving the resolved engine with no file lock held. Fix requires an instance-level mutex; not worth the complexity for a caller pattern that doesn't appear in production (single instance per process, sequential lifecycle). Also broadened `test/fix-wave-structural.test.ts` regex to accept additional type-imports from `last-retrieved.ts` (e.g. the new `type DrainOutcome` import that M1 added). Test coverage: 53/53 wave tests pass (added C1-followup case to last-retrieved.test.ts). The C1 fix is also pinned by tightening the existing permanent-pending test's post-timeout assertion to expect empty pending count rather than the prior (stale) "stays in set" note. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: post-ship documentation sync for v0.41.8.0 Consolidate the duplicate 'take advantage of v0.41.8.0' sections in the CHANGELOG entry into a single canonical block per the CLAUDE.md template. The wave originally landed with both '### How to take advantage' (line 13) and '### To take advantage' (line 57) as h3 headings. CLAUDE.md mandates one '## To take advantage of v[version]' h2 block per release entry, with verify steps + an issue-filing fallback for users hitting upgrade failures. Promoted the second block to h2, added the issue-filing step, and removed the redundant first block (the upgrade command is already covered in the verify steps). Itemized changes section was unchanged. llms.txt + llms-full.txt regenerated; structurally identical so no content changes shipped. * fix(test): find-experts-op queries schema dim instead of hardcoding 1536 (CI shard 1) CI shard 1 failed on this branch with: \"expected 1280 dimensions, not 1536\" from pgvector's CheckExpectedDim. Root cause: master's v0.36.0 changed DEFAULT_EMBEDDING_DIMENSIONS from OpenAI's 1536d to ZeroEntropy's 1280d (src/core/ai/defaults.ts:21). The test's basisEmbedding helper hardcoded dim=1536, so beforeAll's upsertChunks failed when the schema column was created at 1280d. Latent on master: the weight-aware LPT bin-packing in scripts/sharding.ts assigns files to shards deterministically based on the COMPLETE file set. My branch adds 5 new test files, which shifted find-experts-op.test.ts into shard 1. Master's shard 1 doesn't run this file (it lands in a different shard there), so the bug never surfaced in master's CI. Fix: query the actual column dim via SELECT atttypmod FROM pg_attribute after initSchema, then seed the embedding at that width. This handles both paths (no-env CI → 1280; env-configured local → 1536) without hardcoding either default. Verify: - bun test test/find-experts-op.test.ts → 11/11 pass with provider env - env -i bun test test/find-experts-op.test.ts → 11/11 pass without - bun run verify → all 21 parallel checks clean - bun run typecheck → clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(lint): robust pure-bash allowlist match in check-test-isolation (CI verify) CI verify failed on PR #1405 with check:test-isolation flagging test/scripts/check-test-isolation.test.ts even though that file is on line 22 of the allowlist (and has been since v0.26.7 as a permanent exemption — its body contains process.env mutation fixtures that the lint legitimately matches). Could not reproduce locally on macOS bash 3.2 + BSD grep across any locale (C, C.UTF-8, POSIX). Suspect a subtle interaction between the prior `echo "$ALLOWLIST" | grep -qxF "$f"` form and one of: Ubuntu 24.04's bash 5 set-e/pipefail semantics, GNU grep edge case on the first-line entry, or `bun run` + GNU timeout subshell interaction. Diagnostic value of chasing further is low — the fix is to drop the grep+pipe form entirely. Switch is_allowlisted() to pure-bash `case $'\n'"$ALLOWLIST"$'\n' in *$'\n'"$f"$'\n'*) return 0 ;; esac` whole-line matching: - Locale-free (no character-class interaction) - Pipe-free (no pipefail / SIGPIPE / buffering) - Subshell-free (no env or exit-code propagation gotchas) - set-e-quirk-free (no left-side compound failure) - ~100x faster (no fork+exec per call across 689 files) Verified locally: lint OK (689 files), case-match returns true for the allowlisted file and false for a non-allowlisted file. bun run verify clean (21/21 parallel checks pass). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Park Je Hoon <jehoon@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Matt Dean <matt-dean-git@users.noreply.github.com>
Owner
|
Thanks for this, @matt-dean-git — closing as superseded. Your work shipped in v0.41.8.0. From the CHANGELOG:\n\n> Two community PRs proposed competing fixes. PR #1259 (jehoon, validated by @eloe, @bcallender, and @61tH0b) added the structural "wait for the background write to finish" pattern — the right approach, mirroring an existing fix we sh…\n\nReal appreciation for the contribution. |
rayers
added a commit
to rayers/gbrain
that referenced
this pull request
May 26, 2026
…ackend via pool.reserve() `PostgresEngine.initSchema()` acquired and released advisory lock 42 against bare tag-template syntax on the pool. postgres.js pulls a fresh physical connection from the pool on each call, so the acquire ran on backend A and the unlock fired on backend B. Postgres returned `false` silently from the unlock (lock not held by current session) and the original session sat idle in the pool indefinitely, still holding lock 42. Every subsequent `initSchema()` then blocked on `SELECT pg_advisory_lock(42)` until `statement_timeout` killed the query. Symptom: CLI cold-start hangs ~5 min on every CLI command that connects to the engine. Diagnosed against a real 440K-page Postgres brain where one backend had been holding lock 42 idle for 14h since the autopilot worker's first `initSchema()` call. Killing the stale backend with `pg_terminate_backend` dropped cold-start from 5+ min to 1.28s. The pre-fix comment claimed "Lane A does the routing and keeps pg_advisory_lock(42) on the SAME connection so the lock is correct" — that claim was wrong. `connectionManager.ddl()` returns the pool, not a reserved single connection. Fix: wrap the lock acquire + release pair in `pool.reserve()` so both queries pin to the same physical connection. Same pattern `executeRaw` uses elsewhere in this file (line 776). The reserved connection is also threaded through `applyForwardReferenceBootstrap` and the `SCHEMA_SQL` replay so the bootstrap probes continue to run inside the lock scope (preserves the Codex P1 fix from the v0.36 dreamy-thompson wave). `runMigrations`, `verifySchema`, and `dropZombieIndexes` still take `this` and may acquire their own pool connections — those are independent operations and their internal connection management is unchanged. Relation to PR garrytan#1337: matt-dean-git filed the same bug class for PGLite (file-based lock not releasing cleanly). This PR is the Postgres half. Test: regression case at test/e2e/postgres-initschema-advisory-lock.test.ts exercises two assertions that would fail pre-fix: 1. After a single `initSchema()`, lock 42 is not held by any backend. Pre-fix the acquire-on-A / unlock-on-B split left lock 42 granted on session A and the assertion returned a non-empty pg_locks row. 2. Two consecutive `initSchema()` calls each complete in <30s. Pre-fix the second call blocked on `SELECT pg_advisory_lock(42)` until the 5-min statement_timeout fired. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChenyqThu
pushed a commit
to ChenyqThu/jarvis-knowledge-os-v2
that referenced
this pull request
May 26, 2026
Both upstream-PR-ready patches got new status check entries reflecting
the v0.41.14.0 sync state. No content/decision changes — just keeping
the docs honest about how the fork patches have evolved through the
intervening 20 minor releases.
- v018-pglite-wal-durability-fix.md
- Adds 2026-05-26 status check entry.
- Notes that upstream v0.41.8.0 refactored disconnect() into snapshot-
and-early-null + try/finally (PR garrytan#1337 partial-state race fix), and
the fork's pg_switch_wal() patch is now FOLDED INTO upstream's new
structure (lives inside the new `try { if (db) { ... } }` block).
Refreshed line-number pointers (pglite-engine.ts:283,
brain-db.ts:164).
- Upstream-PR opportunity is BETTER now — patch surface is smaller
after the fold; upstream just needs the WAL flush before db.close().
- v034-google-recipe-max-batch-tokens.md
- Adds "Last status check: 2026-05-26" line.
- Un-stales the kos-compat-api /ingest example wording (that surface
retired in §6.28); replaced with the active surfaces (cron gbrain
invocations, gbrain serve --http MCP put_page response output,
interactive query). Motivation unchanged — Google embedding still
missing the field 20 minor versions later.
Both docs are now ready for a fork-operator manual PR file (`gh pr
create` from a personal upstream fork, cherry-picking the relevant
1-2 commit each, body = the markdown doc).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rayers
added a commit
to rayers/gbrain
that referenced
this pull request
May 29, 2026
…ackend via pool.reserve() `PostgresEngine.initSchema()` acquired and released advisory lock 42 against bare tag-template syntax on the pool. postgres.js pulls a fresh physical connection from the pool on each call, so the acquire ran on backend A and the unlock fired on backend B. Postgres returned `false` silently from the unlock (lock not held by current session) and the original session sat idle in the pool indefinitely, still holding lock 42. Every subsequent `initSchema()` then blocked on `SELECT pg_advisory_lock(42)` until `statement_timeout` killed the query. Symptom: CLI cold-start hangs ~5 min on every CLI command that connects to the engine. Diagnosed against a real 440K-page Postgres brain where one backend had been holding lock 42 idle for 14h since the autopilot worker's first `initSchema()` call. Killing the stale backend with `pg_terminate_backend` dropped cold-start from 5+ min to 1.28s. The pre-fix comment claimed "Lane A does the routing and keeps pg_advisory_lock(42) on the SAME connection so the lock is correct" — that claim was wrong. `connectionManager.ddl()` returns the pool, not a reserved single connection. Fix: wrap the lock acquire + release pair in `pool.reserve()` so both queries pin to the same physical connection. Same pattern `executeRaw` uses elsewhere in this file (line 776). The reserved connection is also threaded through `applyForwardReferenceBootstrap` and the `SCHEMA_SQL` replay so the bootstrap probes continue to run inside the lock scope (preserves the Codex P1 fix from the v0.36 dreamy-thompson wave). `runMigrations`, `verifySchema`, and `dropZombieIndexes` still take `this` and may acquire their own pool connections — those are independent operations and their internal connection management is unchanged. Relation to PR garrytan#1337: matt-dean-git filed the same bug class for PGLite (file-based lock not releasing cleanly). This PR is the Postgres half. Test: regression case at test/e2e/postgres-initschema-advisory-lock.test.ts exercises two assertions that would fail pre-fix: 1. After a single `initSchema()`, lock 42 is not held by any backend. Pre-fix the acquire-on-A / unlock-on-B split left lock 42 granted on session A and the assertion returned a non-empty pg_locks row. 2. Two consecutive `initSchema()` calls each complete in <30s. Pre-fix the second call blocked on `SELECT pg_advisory_lock(42)` until the 5-min statement_timeout fired. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan-agents
pushed a commit
to garrytan-agents/gbrain
that referenced
this pull request
Jun 13, 2026
…hint + garrytan#1342 breadcrumbs (garrytan#1405) * fix(pglite): drain fire-and-forget last_retrieved_at writes before disconnect Closes the structural bug class behind garrytan#1247, garrytan#1269, garrytan#1290: PGLite CLI search/query/get_page commands printed results then hung at ~95-98% CPU until SIGKILL. Root cause: bumpLastRetrievedAt's IIFE races engine.disconnect() — PGLite's WASM runtime keeps Bun's event loop alive while the dangling UPDATE settles. Mirrors the existing awaitPendingSearchCacheWrites precedent landed in v0.36.1.x for garrytan#1090. Tracks every IIFE promise in a module-scoped Set, exposes awaitPendingLastRetrievedWrites(timeoutMs) that resolves once all settle. Bounded with a 5s default timeout via Promise.race so a future fire-and-forget that hangs forever can't recreate the bug class at this layer — instead, the drain stderr-warns with a pending count and returns timeout outcome so the caller can decide its fallback. Test coverage: 6 unit cases covering empty drain, single + multi-pending settle, throw-in-IIFE still settles, permanently-pending hits timeout within bound, empty pageIds does not track. This commit ships the helper + tracking + tests with NO consumer. The cli.ts wiring lands in a follow-up commit (atomic bisect units). Co-Authored-By: Park Je Hoon <jehoon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(pglite): snapshot+early-null disconnect + try/finally lock-leak guard Refactor PGLiteEngine.disconnect() with two structural fixes: (1) Snapshot + early-null pattern: capture db/lock refs and null the instance fields BEFORE any await. A concurrent connect() can no longer observe `_db` pointing at a handle that's mid-close. This is PR garrytan#1337's load-bearing contribution that we DID take. (2) Wrap close + release in try/finally. Without this guard, a thrown db.close() would leak the file lock and wedge every next gbrain invocation on the stale lock. Codex outside-voice review (eng review finding garrytan#7) caught this gap when reviewing the snapshot refactor. KEEP the original close-then-release order. PR garrytan#1337's diff swapped this to release-then-close, which we explicitly REJECTED — releasing the lock before close lets a sibling process try to connect to a still-closing brain. The new lifecycle test file pins this ordering so a future maintainer reading PR garrytan#1337's diff cannot accidentally flip it. Test coverage in test/pglite-engine-disconnect.serial.test.ts: 5 cases — close-before-release ordering, early-null observable inside close, lock-still-releases on close-throw, double-disconnect idempotency, reconnect-after-disconnect clean state. `.serial` because each test creates a fresh PGLite engine (WASM cold-start cost) — running in parallel shards would starve other tests. Existing test/pglite-engine.test.ts: 100/100 still green. Co-Authored-By: Matt Dean <matt-dean-git@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(pglite): classify WASM init errors so garrytan#1340 gets the right hint (garrytan#1340) Closes the user-facing half of garrytan#1340: on macOS 12.7.6 + Bun 1.3.14, the PGLite connect() catch block hardcoded the macOS 26.3 hint (garrytan#223). The actual root cause for garrytan#1340 is Bun's vfs: `/$$bunfs/root` is read-only on older macOS, so PGLite cannot extract its pglite.data WASM payload. Adds two exported helpers in pglite-engine.ts: classifyPgliteInitError(message): 'bunfs' | 'macos-26-3' | 'unknown' buildPgliteInitErrorMessage(verdict, original): string Connect catch block now routes the hint by verdict. The bunfs hint names `bun upgrade` + Node fallback. The macOS 26.3 hint keeps the existing garrytan#223 link. Unknown falls through to a generic doctor + garrytan#223 fallback. Per Codex eng-review finding garrytan#9, the bunfs regex is tightened to match either the literal `$$bunfs` marker OR ENOENT+pglite.data co-occurrence — NOT generic `pglite.data` substring (would fire on unrelated errors). Negative test pinned. Root fix is upstream Bun; this PR just stops misclassifying the failure class so support traffic doesn't conflate two unrelated bugs. Test coverage: 12 pure-function unit cases including the garrytan#1340 reporter's exact error string round-trip, the negative case Codex caught, and all three verdicts × all three message contents. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cli): await last-retrieved drain + narrow timeout-only force-exit (garrytan#1247, garrytan#1269, garrytan#1290) Wires the v0.40.10.0 drain helper into cli.ts and adds the IRON-RULE behavioral regression test for the search-hang class. The drain is called unconditionally for every op (not per-op-name gated — that was the original PR garrytan#1259 mistake that left search and get_page exposed). The narrow force-exit synthesis (decision D7 from the eng review, informed by Codex outside-voice findings garrytan#1+garrytan#2+garrytan#8): when the drain returns outcome:'timeout', AFTER engine.disconnect() resolves AND the command is NOT a daemon, fire process.exit(0). The drain helper already stderr-warned with the pending count, so the diagnostic signal is preserved. Without this guard, a hung underlying promise could still keep Bun's event loop alive past disconnect. CRITICALLY narrower than PR garrytan#1337's blanket force-exit: the timeout path is the only trigger. In the common case (drain settles cleanly under 5s), no force-exit fires and the behavioral subprocess test still catches future regressions. The shouldForceExitAfterMain guard excludes 'serve' so the stdio + HTTP daemons stay alive past main(). e2e/pglite-cli-exit.serial.test.ts (NEW, IRON RULE): - gbrain search "foxtrot" → exits 0 within 15s - gbrain get alpha → exits 0 within 15s with foxtrot in stdout - gbrain query "foxtrot" --no-expand → exits within 15s (no-API-key graceful) - gbrain serve --http → stays alive 3+ seconds (daemon-survival regression guard) fix-wave-structural.test.ts: - import assertion for awaitPendingLastRetrievedWrites - last-retrieved.ts exports + Set tracking + Promise.race + timeout - BEHAVIORAL positioning assertion: drain `await` appears textually BEFORE engine.disconnect `await` in the op-dispatch local-engine path. Survives variable-rename refactors; catches any new disconnect path that bypasses the drain. - shouldForceExitAfterMain excludes 'serve' AND the gate is conditioned on drainResult.outcome==='timeout' Per D8 (Codex finding garrytan#5), explicitly do NOT add a drift-guard counting bumpLastRetrievedAt callers — would block harmless refactors and miss aliases. Co-Authored-By: Park Je Hoon <jehoon@users.noreply.github.com> Co-Authored-By: Matt Dean <matt-dean-git@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(sync): add phase breadcrumbs to performSyncInner for garrytan#1342 triage The garrytan#1342 reporter saw ZERO stderr output before their PGLite sync hang, which made the bug impossible to triage from a community report alone. Mirrors the pre-existing `[gbrain phase] sync.git_pull start/done` pattern at the major pre-pull phase boundaries so the next garrytan#1342-shaped report names WHICH phase spun. Four new breadcrumbs at: - sync.resolve_repo (top of performSyncInner) - sync.load_active_pack (before the v0.39 T1.5 pack load) - sync.validate_repo_state (only when opts.sourceId is set — the re-clone branch) - sync.detect_head (before the isDetachedHead probe) No behavior change — pure stderr instrumentation. Doesn't fix garrytan#1342 (which still needs investigation per the TODOS entry filed in this wave), but converts "hung with no output" into actionable diagnostic data the next time the bug shape is reported. Per D9 in the eng review + Codex outside-voice finding garrytan#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: annotate v0.40.10.0 PGLite hang wave in CLAUDE.md + regen llms Key Files entries updated: - src/core/pglite-engine.ts: documents the v0.40.10.0 disconnect refactor (snapshot+early-null + try/finally lock-leak guard, KEEPS close-then-release order), and the new classifyPgliteInitError / buildPgliteInitErrorMessage helpers for garrytan#1340 hint routing. Pins PR garrytan#1337's accepted-but-narrowed contribution and the rejected release-then-close ordering swap. - src/core/last-retrieved.ts (within the brainstorm entry): documents the new awaitPendingLastRetrievedWrites drain, the Set tracking pattern, the 5s bounded timeout, the cli.ts narrow timeout-only force-exit synthesis with the serve-daemon guard, and the three community-validated reports (garrytan#1247/garrytan#1269/garrytan#1290) the fix closes. Credits PR garrytan#1259 (drain pattern) and PR garrytan#1337 (snapshot pattern + force-exit guard idea). Regenerated llms.txt + llms-full.txt — build-llms.test.ts gates the drift, all 7 cases green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(todos): file v0.40.10.0 PGLite hang follow-ups Three deferred items from the v0.40.10.0 fix wave: 1. garrytan#1342 sync-hang investigation. Single-reporter, JS-tight-loop shape, needs reproducer before any fix. Documents the ruled-out hypotheses (lock-refresh heartbeat, v91 trigger, while-true loops) and three concrete diagnostic next steps. The v0.40.10.0 sync phase breadcrumbs make the next report actionable. 2. awaitPendingSearchCacheWrites timeout-symmetry retrofit. The garrytan#1090 drain shipped without a timeout; the v0.40.10.0 garrytan#1247 drain ships with one. Apply the same Promise.race + stderr warn pattern for symmetry. 3. Drain-helper extraction. Per D4 in the eng review: two surfaces is the threshold for noticing, three for extracting. Pair with the symmetry retrofit above as one focused refactor when a third fire-and-forget surface appears. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.10.0 fix(pglite): search/query/get exit cleanly + garrytan#1340 hint + garrytan#1342 breadcrumbs Closes garrytan#1247, garrytan#1269, garrytan#1290 (PGLite CLI search/query/get hang at ~95-98% CPU after printing results — three community-validated reports). Also fixes garrytan#1340 (WASM init misroutes to macOS 26.3 hint when real cause is Bun vfs read-only mount) and adds diagnostic phase breadcrumbs for the single-reporter garrytan#1342 sync-hang investigation. Core fix: track every fire-and-forget bumpLastRetrievedAt IIFE in a module-scoped Set; cli.ts awaits the drain before engine.disconnect() in the op-dispatch finally block; narrow process.exit(0) fires ONLY when the drain times out AND the command isn't a daemon. Snapshot+ early-null disconnect pattern + try/finally lock-leak guard close the partial-state race PR garrytan#1337 originally surfaced. Co-Authored-By: Park Je Hoon <jehoon@users.noreply.github.com> Co-Authored-By: Matt Dean <matt-dean-git@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: extract shouldForceExitAfterMain to its own module + add unit cases Gap-audit follow-up: cli.ts is a script entrypoint (top-level main() side effect), so importing it from a test fires the help output as a side effect. Move shouldForceExitAfterMain into src/core/cli-force-exit.ts so it can be unit-tested in isolation without the cli.ts script tail running. Adds test/cli-should-force-exit.test.ts (9 cases): bare serve, serve with flags after, global flags BEFORE the command (the load-bearing case for `gbrain --quiet serve`), op commands return true, non-daemon CLI commands return true, empty argv defaults to true, flag-only argv, default-arg fallback to process.argv.slice(2), substring-match avoidance (`serves` is NOT `serve` — strict equality via Set, not startsWith/includes). The daemon command set is now an explicit ReadonlySet — future daemons (a hypothetical `gbrain watch` or `gbrain daemon`) just add their name to DAEMON_COMMANDS rather than chaining ||. Updates fix-wave-structural.test.ts to look for the import + the new DAEMON_COMMANDS shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(version): rebase v0.40.10.0 → v0.41.6.0 (slot collision after v0.41.0.0+ landed) origin/master moved from v0.40.8.1 → v0.41.0.0 while this wave was in flight (PR garrytan#1367 minions cathedral). v0.41.1-v0.41.5 are claimed by other in-flight branches, so v0.41.6.0 is the next available slot. Bulk-renamed v0.40.10.0 → v0.41.6.0 across: - VERSION + package.json (trio audit clean: 0.41.6.0 / 0.41.6.0 / 0.41.6.0) - CHANGELOG.md (header + 3 prose references) - CLAUDE.md (Key Files annotations) - TODOS.md (follow-up entry header) - src/cli.ts + src/core/cli-force-exit.ts + src/core/last-retrieved.ts + src/core/pglite-engine.ts + src/commands/sync.ts (inline comments) - test/* (describe blocks + test file headers) - llms-full.txt (regenerated via `bun run build:llms`) bun.lock unchanged (version-only bump, no dep churn) per Codex garrytan#12. Verify: 52/52 wave tests pass after rename, typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: quarantine seed-pglite to .serial.test.ts (parallel WASM cold-start flake) The full-suite run during the v0.41.6.0 fix wave ship hit a 30s timeout in test/seed-pglite.test.ts under heavy 4-shard parallel contention (4972/4973 passed before SIGKILL). The test passes 11/11 in isolation. Root cause: each test instantiates a fresh PGLiteEngine (5 instances across the file, one per test) because each case writes to a different mkdtemp-ed dbPath. Under parallel shard load, multiple shards each cold-starting PGLite WASM simultaneously stretches the per-instance init from ~5s to 30s+. The shared-engine pattern (canonical PGLite block in CLAUDE.md R3+R4) doesn't apply here — different dbPaths require different engines. Fix per CLAUDE.md test-isolation quarantine rules: rename to `.serial.test.ts` so the file runs in the post-parallel serial pass with full WASM init capacity. Same pattern as test/pglite-engine-disconnect.serial.test.ts (added in this wave) and test/brain-registry.serial.test.ts (pre-existing). Removes test/seed-pglite.test.ts from check-test-isolation.allowlist since the .serial.test.ts rename auto-exempts it from the R3+R4 lint (scan skips *.serial.test.ts). 641 non-serial unit files scanned, lint clean. Verify: - bun test test/seed-pglite.serial.test.ts → 11/11 pass in 4.19s - scripts/check-test-isolation.sh → OK - bun run verify → all gates pass Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: pre-landing review fixes (C13 disconnect-hang, C1 set leak, C9 catch drain, M1 type drift) Adversarial review + maintainability specialist surfaced four real issues in the v0.41.8.0 wave. All four fixed in this commit; one deferred to TODOS.md as a v0.41+ follow-up (unusual caller pattern). **C13 [load-bearing, defense-in-depth for the wave's stated goal]:** `await engine.disconnect()` inside the op-dispatch finally can ITSELF hang on PGLite (db.close() racing OS-level FS state). When that happens, the entire wave's force-exit guard never runs — we recreate the original hang at a new layer. Fix: install an unref'd setTimeout hard-exit fallback BEFORE entering the try/catch/finally. The timer fires after DISCONNECT_HARD_DEADLINE_MS=10s with a stderr warn and process.exit(0). unref ensures it doesn't keep the loop alive on a healthy exit. Daemons (`serve`) are excluded by reusing the shouldForceExitAfterMain guard. **C9 [data freshness gap, narrow but real]:** The drain ran ONLY in the success branch of try. If `bumpLastRetrievedAt` fired (handler succeeded) but `JSON.parse(JSON.stringify(...))` or `formatResult` then threw, process.exit(1) killed the process and the in-flight UPDATE was discarded. Fix: drain in the catch path too before process.exit(1) (best-effort, bounded by the drain's own 5s timeout). **C1 [daemon leak]:** A timed-out IIFE used to stay in the pending-writes Set forever because its `.finally` never fires. Long-lived `gbrain serve` would accumulate references without bound across repeated timeouts. Fix: explicitly `delete` the snapshot's tracked promises from the Set after a timeout outcome. The IIFEs keep running (orphaned), but the Set no longer leaks references. Pinned by a new unit test that asserts the second drain after a timeout returns immediately with empty pending count. **M1 [silent type drift]:** `cli.ts` duplicated the `{outcome, pending}` literal shape instead of importing the `DrainOutcome` type that `last-retrieved.ts` exports exactly for this purpose. Two-line fix: add `type DrainOutcome` to the import and use it for `let drainResult`. Future changes to the return shape now propagate through TypeScript. **Deferred to TODOS.md (C6 — unusual caller pattern):** Concurrent connect/disconnect on the same `PGLiteEngine` instance can strand: disconnect snapshots+nulls the lock while connect is still in-flight, leaving the resolved engine with no file lock held. Fix requires an instance-level mutex; not worth the complexity for a caller pattern that doesn't appear in production (single instance per process, sequential lifecycle). Also broadened `test/fix-wave-structural.test.ts` regex to accept additional type-imports from `last-retrieved.ts` (e.g. the new `type DrainOutcome` import that M1 added). Test coverage: 53/53 wave tests pass (added C1-followup case to last-retrieved.test.ts). The C1 fix is also pinned by tightening the existing permanent-pending test's post-timeout assertion to expect empty pending count rather than the prior (stale) "stays in set" note. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: post-ship documentation sync for v0.41.8.0 Consolidate the duplicate 'take advantage of v0.41.8.0' sections in the CHANGELOG entry into a single canonical block per the CLAUDE.md template. The wave originally landed with both '### How to take advantage' (line 13) and '### To take advantage' (line 57) as h3 headings. CLAUDE.md mandates one '## To take advantage of v[version]' h2 block per release entry, with verify steps + an issue-filing fallback for users hitting upgrade failures. Promoted the second block to h2, added the issue-filing step, and removed the redundant first block (the upgrade command is already covered in the verify steps). Itemized changes section was unchanged. llms.txt + llms-full.txt regenerated; structurally identical so no content changes shipped. * fix(test): find-experts-op queries schema dim instead of hardcoding 1536 (CI shard 1) CI shard 1 failed on this branch with: \"expected 1280 dimensions, not 1536\" from pgvector's CheckExpectedDim. Root cause: master's v0.36.0 changed DEFAULT_EMBEDDING_DIMENSIONS from OpenAI's 1536d to ZeroEntropy's 1280d (src/core/ai/defaults.ts:21). The test's basisEmbedding helper hardcoded dim=1536, so beforeAll's upsertChunks failed when the schema column was created at 1280d. Latent on master: the weight-aware LPT bin-packing in scripts/sharding.ts assigns files to shards deterministically based on the COMPLETE file set. My branch adds 5 new test files, which shifted find-experts-op.test.ts into shard 1. Master's shard 1 doesn't run this file (it lands in a different shard there), so the bug never surfaced in master's CI. Fix: query the actual column dim via SELECT atttypmod FROM pg_attribute after initSchema, then seed the embedding at that width. This handles both paths (no-env CI → 1280; env-configured local → 1536) without hardcoding either default. Verify: - bun test test/find-experts-op.test.ts → 11/11 pass with provider env - env -i bun test test/find-experts-op.test.ts → 11/11 pass without - bun run verify → all 21 parallel checks clean - bun run typecheck → clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(lint): robust pure-bash allowlist match in check-test-isolation (CI verify) CI verify failed on PR garrytan#1405 with check:test-isolation flagging test/scripts/check-test-isolation.test.ts even though that file is on line 22 of the allowlist (and has been since v0.26.7 as a permanent exemption — its body contains process.env mutation fixtures that the lint legitimately matches). Could not reproduce locally on macOS bash 3.2 + BSD grep across any locale (C, C.UTF-8, POSIX). Suspect a subtle interaction between the prior `echo "$ALLOWLIST" | grep -qxF "$f"` form and one of: Ubuntu 24.04's bash 5 set-e/pipefail semantics, GNU grep edge case on the first-line entry, or `bun run` + GNU timeout subshell interaction. Diagnostic value of chasing further is low — the fix is to drop the grep+pipe form entirely. Switch is_allowlisted() to pure-bash `case $'\n'"$ALLOWLIST"$'\n' in *$'\n'"$f"$'\n'*) return 0 ;; esac` whole-line matching: - Locale-free (no character-class interaction) - Pipe-free (no pipefail / SIGPIPE / buffering) - Subshell-free (no env or exit-code propagation gotchas) - set-e-quirk-free (no left-side compound failure) - ~100x faster (no fork+exec per call across 689 files) Verified locally: lint OK (689 files), case-match returns true for the allowlisted file and false for a non-allowlisted file. bun run verify clean (21/21 parallel checks pass). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Park Je Hoon <jehoon@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Matt Dean <matt-dean-git@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
serve --httpVerification
bun run typecheckgbrain listandgbrain config showboth exit in <1s instead of hanginggbrain serve --http --port <free>remains running and serves OAuth metadatabun test test/cycle-pglite-lock-ordering.serial.test.ts test/admin-embed-spawn.serial.test.tsNotes
bun run testwas attempted before narrowing; it still has unrelated environment-sensitive serial failures around embedding/provider setup. The regression surfaces touched here are covered by the focused checks above.