v0.42.24.0 fix(minions): route lock claim/renewLock through direct session pool#1822
Merged
Conversation
The Minion lock heartbeat (claim + renewLock) ran every UPDATE through engine.executeRaw(), which is hardcoded to the read pool. On Supabase that is the transaction-mode pooler (6543), which recycles connections per transaction. A lock is held for minutes, so the pooler periodically reaps the socket mid-heartbeat -> CONNECTION_ENDED -> the lock looks expired -> the worker force-evicts its own job and the claim loop wedges silently. Add BrainEngine.executeRawDirect(): same contract as executeRaw, but routes to the direct session-mode pool (5432, GBRAIN_DIRECT_DATABASE_URL) when dual-pool is active. No-op delegation on PGLite / non-Supabase / kill-switch. claim/renewLock now use it. Single-statement UPDATEs only, so the double-claim guard and the renewLock no-inline-retry contract are preserved. Statements inside an open transaction keep their tx connection (in-transaction guard keys on peekReadPool() !== _sql); the lock hot-path never runs inside transaction(). The Postgres impl shares its cancellation plumbing with executeRaw via a private runUnsafe helper. New test/postgres-execute-raw-direct.test.ts covers the routing decision (dual-pool on/off x in-tx/not + abort short-circuit) without a live Postgres; queue-lock-retry.test.ts gains a guard that claim can never fall back to executeRaw. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document executeRawDirect on the BrainEngine contract and the claim/renewLock direct-session-pool routing in KEY_FILES.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The DRY refactor in this PR extracted executeRaw/executeRawDirect's shared cancellation plumbing into a private runUnsafe(conn, ...) helper, so the single conn.unsafe() call moved out of executeRaw's body. The D3 guard read executeRaw's source and asserted conn.unsafe( appeared exactly once there, which now fails (it's zero — executeRaw delegates). The D3 invariant (no per-call retry wrapper) is unchanged; it just spans the delegate now. Update the guard to check both public methods delegate to runUnsafe without reconnect/retry, and assert the exactly-once conn.unsafe + cancel-only catch in runUnsafe. Also extends coverage to executeRawDirect so the lock hot-path can't reintroduce a retry wrapper either. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolve VERSION/package.json (keep 0.42.24.0, higher than master's 0.42.21.0), CHANGELOG (both entries kept, 0.42.24.0 on top), and the postgres-engine.ts / connection-resilience.test.ts auto-merges. Trio audit green; typecheck clean; lock-path + D3-guard + routing tests pass on the merged state.
Resolve VERSION/package.json (keep 0.42.24.0 > master's 0.42.22.0), CHANGELOG (both entries, 0.42.24.0 on top), and KEY_FILES.md (keep my queue.ts entry with executeRawDirect; take master's newer worker/supervisor/child-worker-supervisor entries from the #1801 progress-watchdog work). queue.ts auto-merge kept both executeRawDirect call sites. Trio green; typecheck clean; lock-path + D3-guard + routing tests pass on the merged state.
Resolve VERSION/package.json (keep 0.42.24.0 > master's 0.42.23.0) and CHANGELOG (both entries, 0.42.24.0 on top). KEY_FILES auto-merged cleanly (my queue.ts executeRawDirect entry + master's wedge work both intact). Trio green; typecheck clean; lock-path + D3-guard + routing tests pass on the merged state.
mgunnin
added a commit
to mgunnin/gbrain
that referenced
this pull request
Jun 5, 2026
* upstream/master: v0.42.26.0 docs(supabase): update connection-string setup to new UI + Transaction pooler (garrytan#1848) (garrytan#1875) v0.42.25.0 fix(pricing): unify chat-model pricing into one canonical source; add Opus 4.8 (garrytan#1819) (garrytan#1827) v0.42.24.0 fix(minions): route lock claim/renewLock through direct session pool (garrytan#1822)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix (
fix(minions): route lock claim/renewLock through direct session pool): The Minion lock heartbeat (claim+renewLockinsrc/core/minions/queue.ts) ran every UPDATE throughengine.executeRaw(), hardcoded to the read pool. On Supabase that is the transaction-mode pooler (6543), which recycles connections per transaction. A lock is held for minutes, so the pooler periodically reaps the socket mid-heartbeat →CONNECTION_ENDED→ the lock looks expired → the worker force-evicts its own job and the claim loop wedges silently (process alive, no errors, claiming nothing). Showed up most underenrichfan-out.The fix adds
BrainEngine.executeRawDirect()— same contract asexecuteRaw, but routes to the direct session-mode pool (5432,GBRAIN_DIRECT_DATABASE_URL) when dual-pool is active, holding the connection open for the worker's life so heartbeats survive. No-op delegation on PGLite / non-Supabase / kill-switch (GBRAIN_DISABLE_DIRECT_POOL).claim/renewLocknow use it. Single-statement UPDATEs only, so the double-claim guard and the renewLock no-inline-retry contract are preserved. Statements inside an open transaction keep their tx connection (in-transaction guard keys onpeekReadPool() !== _sql); the lock hot-path never runs insidetransaction(). The Postgres impl shares its cancellation plumbing withexecuteRawvia a privaterunUnsafehelper.No new infrastructure —
GBRAIN_DIRECT_DATABASE_URLwas already configured; this just makes the lock path use the pool built for exactly this workload. Supersedes PR #1816 (closed).Test Coverage
New
test/postgres-execute-raw-direct.test.tscovers the Postgres routing decision (dual-pool on/off × in-tx/not, plus abort short-circuit) with stubbed connections — runs in CI without a live Postgres, the branch the fix exists to exercise.test/queue-lock-retry.test.tsgains a guard thatclaimcan never fall back toexecuteRaw.bun run typecheck— clean repo-wide on the merged stateminions,queue-lock-retry,postgres-execute-raw-direct,worker-lock-renewal,postgres-engine-getter-selfheal,postgres-reconnect-singleton)check-worker-pool-atomicity,check-worker-lock-renewal-shape,check-no-double-retry,check-no-legacy-getconnection,check-jsonb-patterninTransactionguard survives v0.42.20.0's reconnect-race rework: the instance reconnect path re-runssetReadPool(this._sql), sopeekReadPool() === _sqlstill holds for a live worker.Pre-Landing Review
/plan-eng-reviewcleared the change this session. Three findings applied before ship: DRY refactor (sharedrunUnsafehelper), the routing unit test above, and a P3 follow-up TODO for direct-pool sizing under enrich fan-out.Design Review
No frontend files changed — design review skipped.
Eval Results
No prompt-related files changed — evals skipped.
Plan Completion
No plan file detected (diff-scoped change reviewed via
/plan-eng-review).TODOS
TODOS.md, Minion-lock direct-pool follow-up section).Documentation
docs/architecture/KEY_FILES.md— documented the newBrainEngine.executeRawDirect()contract method (lock-hot-path sibling ofexecuteRaw; routes to the direct session-mode pool when dual-pool is active, PGLite delegates straight toexecuteRaw) on thesrc/core/engine.tsentry; noted on thesrc/core/minions/queue.tsentry thatclaim/renewLocknow issue their UPDATE viaexecuteRawDirect.Documentation Debt (pre-existing, not introduced here)
docs/ENGINES.mdlists only the transaction pooler (6543) and never documents the dual-pool / direct session-mode (5432) design. Candidate for a follow-up/document-generatepass.Test plan
🤖 Generated with Claude Code