Conversation
The PGLite write lock leaked indefinitely when the parent of `gbrain serve`
disconnected. Three root causes: serve.ts never called engine.disconnect()
after startMcpServer() resolved; cli.ts short-circuited with a "serve doesn't
disconnect" comment; and the MCP SDK's StdioServerTransport only listens for
'data'/'error' on stdin, never 'end'/'close', so even a clean stdin EOF never
reached the SDK.
Net effect: the next `gbrain serve` waited for the in-process 5-minute stale-
lock check or hung indefinitely.
stdio path now installs a unified lifecycle:
- SIGTERM/SIGINT/SIGHUP all funnel into one idempotent shutdown path
(SIGHUP coverage matters for Claude Desktop on macOS / MCP gateway
restarts; SIGINT for Ctrl-C; SIGTERM for daemon shutdown).
- stdin 'end' (clean EOF) and 'close' (parent SIGKILL with pipe still
open) both trigger the same graceful path. TTY stdin skips the watchers
so interactive `gbrain serve` is unaffected.
- Parent-process watchdog polls the live kernel parent PID via spawnSync
('ps','-o','ppid=','-p',PID) every 5s. process.ppid is cached at process
creation by Bun (and Node) and never refreshes on re-parent — empirical
evidence on macOS shows ps reports the new parent within one tick while
process.ppid stays at the original PID indefinitely (oven-sh/bun#30305).
- Watchdog fires on `getParentPid() !== initialParentPid` (any reparent),
not just `=== 1`. Catches launchd / systemd / tmux / parent-shell-with-
PR_SET_CHILD_SUBREAPER cases where the kernel re-anchors us to a non-1
subreaper PID. Codex review caught the original `=== 1` was incomplete.
- One-shot startup probe verifies `spawnSync('ps')` actually works on this
host. If the probe fails (stripped containers / busybox without procps),
we skip installing the watchdog interval entirely AND emit a loud stderr
line — the operator sees "watchdog disabled" instead of an installed-
but-never-fires phantom that silently falls back to cached process.ppid.
- 5-second cleanup deadline: if engine.disconnect() wedges (PGLite WASM
stall, etc.), the process still calls process.exit(0). The abandoned
lock dir is reclaimed on the next start by the existing stale-lock
check in pglite-lock.ts.
- Optional `--stdio-idle-timeout <sec>`: default OFF safety net for
parents that leak the pipe but never close it. Strict parsing rejects
`abc` / `30junk` / `-1` / `1.5` / blank values explicitly so a typo
doesn't silently disable the safety net (closes #446).
Test seam: ServeOptions { stdin, signals, exit, log, startMcpServer,
getParentPid, setInterval, clearInterval, probeWatchdog } lets the
lifecycle be unit-tested deterministically without spawning a real Bun
child or booting the MCP SDK.
22 test cases covering signals, stdin EOF, TTY skip, watchdog reparent
(both PID-1 and subreaper-PID-N cases), ps-unavailable degraded mode,
idle timeout, idempotent shutdown, and cleanup-deadline behavior.
Closes #413, #446. Supersedes #591.
Co-Authored-By: Aragorn2046 <noreply@github.com>
Co-Authored-By: seungsu-kr <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`gbrain auth` and `gbrain serve --http` previously routed every SQL
through the postgres.js singleton in src/core/db.ts, which silently fell
back to a file-backed PGLite when DATABASE_URL was set but the config
file disagreed. The HTTP transport's verbatim use of the singleton also
made `gbrain serve --http` Postgres-only, even though the
`access_tokens` and `mcp_request_log` tables exist in both engine
schemas.
Auth, OAuth, admin, file uploads, and HTTP-transport SQL now run through
`engine.executeRaw` via a deliberately narrow tagged-template adapter
(`src/core/sql-query.ts`). The contract is scalar-binds-only — adding
JSONB or fragment composition would invite the adapter to drift into a
partial postgres.js clone. JSONB writes use a separate
`executeRawJsonb(engine, sql, scalarParams, jsonbParams)` helper that
composes positional `$N::jsonb` casts and passes objects through
`engine.executeRaw`. The CI guard at `scripts/check-jsonb-pattern.sh`
doesn't fire because the helper is a method call, not the banned
`${JSON.stringify(x)}::jsonb` template-literal interpolation, and the
v0.12.0 double-encode bug class doesn't apply to positional binding via
`postgres.js`'s `unsafe()` (verified by
`test/e2e/auth-permissions.test.ts:67` on Postgres and the new
`test/sql-query.test.ts` on PGLite).
Migrated call sites:
- src/commands/auth.ts: takes-holders writes (lines 52, 86) →
executeRawJsonb. List, revoke, register-client, revoke-client →
SqlQuery via withConfiguredSql() helper that opens an engine, runs
the callback, disconnects.
- src/commands/serve-http.ts: ~25 call sites including the four
mcp_request_log.params INSERTs (now write real JSONB objects, not
JSON-encoded strings — the read side `params->>'op'` returns the
operation name, closing CLAUDE.md's outstanding "JSON-string-into-
JSONB" note as a side effect). The /admin/api/requests dynamic
filter pattern (postgres.js fragment composition) is rewritten as
parametrized SQL string + params array.
- src/mcp/http-transport.ts: legacy bearer-auth path. The
Postgres-only fail-fast at startup is removed because both schemas
now carry access_tokens + mcp_request_log.
- src/core/oauth-provider.ts: SqlQuery / SqlValue types relocated
from here to sql-query.ts as the canonical home (Codex finding #8).
- src/commands/files.ts: all 5 db.getConnection() sites (lines 104,
139, 252, 326, 355). The line-256 INSERT into files.metadata uses
executeRawJsonb; the other four are scalar-only SqlQuery (Codex
finding #6 — scope was bigger than the plan's "lone INSERT" framing).
- src/core/config.ts: env-var DATABASE_URL inference. When dbUrl is
set, infer Postgres engine and clear the stale database_path.
Engine-internal sql.json() sites in src/core/postgres-engine.ts (5
sites: lines 520, 1689, 1728, 1790, 2313) STAY UNCHANGED. They live
inside PostgresEngine itself, where the postgres.js template-tag
sql.json() pattern is correct — those methods are only loaded when
Postgres is the active engine, so there's no PGLite-routing concern.
Migration v45 (mcp_request_log_params_jsonb_normalize): one-shot UPDATE
that lifts pre-v0.31 string-shaped JSONB rows to objects so the
/admin/api/requests endpoint at serve-http.ts:605 returns one
consistent shape to the admin SPA. Idempotent (subsequent runs find no
rows where jsonb_typeof = 'string'). Closes the mixed-shape window
that would otherwise have made post-deploy admin reads break.
Tests:
- test/sql-query.test.ts: 7 cases covering scalar binds, the
.json() rejection (defense in depth — SqlQuery is scalar-only),
JSONB round-trip with `jsonb_typeof = 'object'` and `->>`
semantics, the v0.12.0 double-encode regression guard, null
JSONB handling, and the scalars-then-jsonb call shape.
- test/config-env.test.ts: migrated from PR's manual `restoreEnv()`
in afterEach to the canonical `withEnv()` helper at
test/helpers/with-env.ts (CLAUDE.md R1 / codex finding D3).
Five cases covering DATABASE_URL precedence, GBRAIN_DATABASE_URL
operator override, file-only config, env-only config, and the
no-config null path.
- test/e2e/auth-takes-holders-pglite.test.ts: 6 cases against
in-memory PGLite (no DATABASE_URL gate). Covers create / update /
read of access_tokens.permissions, mcp_request_log.params object
+ null writes, and the migration v45 normalizer (seed
string-shaped row, run UPDATE, assert object shape; second-run
no-op for idempotency).
- test/http-transport.test.ts: mock updated to intercept
engine.executeRaw (the new code path) instead of the postgres.js
template tag. 24 cases pass.
Plan reference: ~/.claude/plans/system-instruction-you-are-working-peppy-moore.md.
Codex outside-voice review applied: D-codex-1, D-codex-2, D-codex-5,
D-codex-8, D-codex-9, D-codex-10 (and D1, D5 reversed by codex).
Closes the architectural intent of #681. Supersedes its branch.
Co-Authored-By: codex-bot <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings master's v0.31.0 facts hot memory + v0.31.1 thin-client + v0.31.1.1 fix-wave + v0.31.2 sync-strategy-code wave into the branch. Two real merge conflicts resolved: - src/commands/serve-http.ts: master refactored the inlined op.handler call into dispatchToolCall() with metaHook for hot-memory `_meta` injection. Branch's executeRawJsonb migration for the four mcp_request_log.params INSERT sites now applies on top of master's new dispatch flow. The catch-throw, toolResult.isError, and success paths all route through executeRawJsonb so JSONB writes preserve the object shape (D1 wave). - src/core/migrate.ts: master's v45 (facts_hot_memory_v0_31) keeps slot 45; branch's mcp_request_log_params_jsonb_normalize moves to v46 to avoid the version collision. The normalizer SQL is unchanged — runs after the facts table lands so /admin/api/requests sees one consistent shape post-deploy. VERSION 0.31.2 -> 0.31.3 (next available patch slot; v0.31.4 and v0.31.6 already claimed by open PRs #795 #796). package.json synced. CHANGELOG entry added describing both bundled fixes (#676 stdio cleanup, #681 engine-aware auth SQL). Targeted tests: 64 cases across 5 files all pass post-merge: bun test test/serve-stdio-lifecycle.test.ts test/sql-query.test.ts \ test/config-env.test.ts test/http-transport.test.ts \ test/e2e/auth-takes-holders-pglite.test.ts Verify gate: typecheck clean, all pre-checks pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Annotate the v0.31.3 changes in the canonical Key Files section: new src/core/sql-query.ts adapter (#681), src/commands/serve.ts stdio cleanup (#676), v0.31.3 amendments to auth.ts / serve-http.ts / oauth-provider.ts surfaces, and migration v46 normalizer in migrate.ts. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CI's build-llms test asserts the committed llms.txt + llms-full.txt match what scripts/build-llms.ts produces from current source state. CLAUDE.md was amended by /document-release post-merge (new entries for src/core/sql-query.ts and src/commands/serve.ts; amended notes on auth.ts / serve-http.ts / migrate.ts), so the inlined-bundle fell out of sync. Regenerated via `bun run build:llms`. llms.txt unchanged (curated index — no new web URLs added). llms-full.txt updated to inline the new CLAUDE.md content. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 task
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two community PRs landed as one focused PR with two atomic bisect-friendly commits:
stdio MCP graceful cleanup (#676 successor): stdio EOF, SIGTERM, SIGINT, SIGHUP, parent-process death (every reparent case — PID 1, launchd subreaper, systemd, tmux, parent shell with PR_SET_CHILD_SUBREAPER), and an optional
--stdio-idle-timeoutall funnel into one idempotent shutdown that releases the engine and the PGLite write lock within 5 seconds. ps-unavailable stripped containers log loudly and skip the watchdog rather than silently fall back toprocess.ppid(Bun caches it). Closes #413, #446. Supersedes #591. Credit @Aragorn2046, @seungsu-kr.Engine-aware auth/admin SQL (#681 successor):
gbrain auth, OAuth 2.1 (register-client / token mint / revoke), the admin SPA, file uploads, and the legacy bearer-auth HTTP transport all run through the active engine via a deliberately-narrowSqlQueryadapter (src/core/sql-query.ts) callingengine.executeRaw. JSONB writes route through a separateexecuteRawJsonb(engine, sql, scalarParams, jsonbParams)helper that emits explicit$N::jsonbcasts. Migration v46 normalizes pre-v0.31.3 string-shapedmcp_request_log.paramsrows up to real JSONB objects. Credit codex-bot.Decision log: ~/.claude/plans/system-instruction-you-are-working-peppy-moore.md (5 eng-review decisions + 8 codex outside-voice decisions; 3 of the eng-review decisions reversed by codex including the
SqlQuery.json()extension which was rejected in favor of the separateexecuteRawJsonbhelper).Test Coverage
Tests: 4786 pass / 1 pre-existing fail (
brain-registry.serial, reproduces on master pre-branch — not introduced by this PR).Pre-Landing Review
Already ran via
/plan-eng-review(5 issues found, all addressed) +/codexoutside-voice consult-mode plan review (10 findings, 9 closed by re-design + 1 wording fix). Both reviews logged CLEAR before commits landed. Implementation matches the reviewed plan; no new findings during the merge.Plan Completion
11/11 plan tasks DONE — see commit messages and
~/.claude/plans/system-instruction-you-are-working-peppy-moore.md. Decisions D1, D2, D5 were reversed by Codex's outside-voice review (recorded in the plan's decision log) before implementation started, so the shipped code matches the post-review consensus, not the original plan.TODOS
No TODOS.md items completed by this PR. The PR closes GitHub issues #413 and #446 (which aren't tracked in TODOS.md).
Documentation
CLAUDE.mdupdated with v0.31.3 annotations in the Key Files section: new entries forsrc/core/sql-query.tsandsrc/commands/serve.ts, amended notes onauth.ts,serve-http.ts, andmigrate.ts(v46). README, CHANGELOG, TODOS, and VERSION required no changes (CHANGELOG already shipped via the merge commit).Test plan
bun run typecheck— cleanbash scripts/check-jsonb-pattern.sh— cleanbash scripts/check-test-isolation.sh— clean (299 non-serial unit files)bun test test/serve-stdio-lifecycle.test.ts— 22/22 passbun test test/sql-query.test.ts— 7/7 passbun test test/config-env.test.ts— 5/5 passbun test test/http-transport.test.ts— 24/24 pass (mock updated to intercept engine.executeRaw)bun test test/e2e/auth-takes-holders-pglite.test.ts— 6/6 passbun run test(full unit fast loop) — 4786 pass / 1 pre-existing failbun run verify— typecheck + 4 pre-checks all cleangit bisect start HEAD~3; git bisect bad HEAD; git bisect good HEAD~3— verified before merge that commit 1 (stdio) typechecks + tests pass independentlybun run test:e2e) — not run locally; CI will run on PR🤖 Generated with Claude Code
Need help on this PR? Tag
@codesmithwith what you need.