v0.42.21.0 fix(postgres): module-singleton ownership — canonical landing for the dream-cycle "connect() has not been called" class (#1404/#1471/#1619)#1805
Merged
Conversation
…nger nulls the cycle's connection (#1404/#1471/#1619) gbrain dream on Postgres failed every DB phase with "No database connection: connect() has not been called": a short-lived borrower probe engine (lint/doctor config-lift, no poolSize) called db.disconnect() in its own disconnect(), nulling the shared module singleton the long-lived cycle owner was still using. The module `sql` is only ever nulled by db.disconnect() (postgres.js auto-reconnects its own pool), so the failure was always a borrower-disconnect, never an idle-pooler drop. Fix: db.connect() returns whether THIS call created the singleton (atomic — no await between the null-check and the sql=postgres() assignment), PostgresEngine stores it as _ownsModuleSingleton, and disconnect() only calls db.disconnect() when it owns the connection. Borrowers no-op. Hardening: db.disconnect() snapshots +nulls sql before awaiting end(); reconnect() shares one in-flight _reconnectPromise. Tests: new postgres-engine-singleton-ownership.test.ts; expanded DB-gated e2e matrix (owner/borrower, creation-not-role, symmetric CLI-exit, owner-reconnect- with-live-borrower); module-style getter asymmetry; #1570 shared-recovery regression updated to assert the fixed contract. Co-Authored-By: nullhex-io <noreply@github.com> Co-Authored-By: joelwp <noreply@github.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…2.21.0) Resolves conflicts from master's v0.42.11.0–0.42.16.0 wave: - VERSION/package.json/CHANGELOG → 0.42.21.0 (claimed slot past sibling versions) - src/core/postgres-engine.ts reconnect() + src/core/retry.ts: merged this branch's shared `_reconnectPromise` concurrency fix with master's #1685 pool-recovery audit (ctx?.error classification + logPoolRecovery), keeping both. - CLAUDE.md: took master's reference-map refactor; documented the module-singleton ownership behavior as current-state prose in docs/architecture/KEY_FILES.md instead. - TODOS.md/CHANGELOG.md: kept both this branch's and master's new sections. - llms bundles regenerated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…2.21.0) Resolves conflicts from master's v0.42.17.0–0.42.18.0 sync wave: - VERSION/package.json/CHANGELOG → 0.42.21.0 (kept my entry on top + master's new entries) - src/cli.ts finally block: kept BOTH master's `syncWatchdog?.dispose()` (#1633) and this branch's #1471 owner-disconnect tripwire comment. - llms bundles regenerated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…2.21.0) Resolves conflicts from master's v0.42.19.0–0.42.20.0 reliability wave: - src/core/postgres-engine.ts reconnect(): took master's style-aware reconnect (#1745/#1810 — module-singleton path recovers idempotently via db.connect() WITHOUT tearing down the shared pool; instance path rebuilds + pool-recovery audit). Kept this branch's _ownsModuleSingleton ownership token in connect()/disconnect(). Dropped this branch's _reconnectPromise in favor of master's _reconnecting guard (master's module-never-teardown obviates it); updated the singleton-ownership test + retry.ts/batchRetry comments to match. - src/cli.ts: took master's drain-before-disconnect block (#1762) on the CLI_ONLY owner-disconnect (this resolves the F5 facts-queue-drain TODO for the fall-through path); kept the #1471 owner-disconnect-last invariant note. - VERSION/package.json/CHANGELOG/TODOS → 0.42.21.0, both sides' entries kept. - llms bundles regenerated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mgunnin
added a commit
to mgunnin/gbrain
that referenced
this pull request
Jun 3, 2026
* upstream/master: v0.42.23.0 feat(jobs): --nice scheduling-priority flag for jobs work/supervisor (garrytan#1815) (garrytan#1820) v0.42.22.0 fix(minions): supervisor progress watchdog + worker DB self-defense — alive-but-wedged worker self-heals (garrytan#1801) (garrytan#1824) v0.42.21.0 fix(postgres): module-singleton ownership — canonical landing for the dream-cycle "connect() has not been called" class (garrytan#1404/garrytan#1471/garrytan#1619) (garrytan#1805) v0.42.20.0 fix: reliability wave — PGLite capture lock-pin + Postgres reconnect race + search embed-hang (garrytan#1762 garrytan#1745 garrytan#1775) (garrytan#1810) v0.42.19.0 fix(skillopt): close the last gap in the AI SDK v6 tool-loop fix (write-capture mapper + regression test) (garrytan#1809) v0.42.18.0 fix: sync orphan-pileup watchdog (garrytan#1633) + links-lag µs stamp (garrytan#1768) (garrytan#1807) v0.42.17.0 fix(sync): resumable incremental sync — killed mid-import no longer loses progress (garrytan#1794) (garrytan#1808) v0.42.16.0 feat(doctor): brain health as a solved problem — cause-ranked doctor + OOM-loop line + auto-drain + pool-reap (garrytan#1685) (garrytan#1802) v0.42.15.0 fix: decouple CLI primary output from process.stdout.isTTY (garrytan#1784) (garrytan#1806) v0.42.14.0 fix(zero-config): code-* readiness signal + init embedding-key validation + lock self-heal (garrytan#1780) (garrytan#1804) v0.42.13.0 fix(search): archive/ content findable by default, demoted not hard-excluded (garrytan#1777) (garrytan#1797) v0.42.12.0 feat: self-upgrading gbrain — invocation-riding update check + opt-in auto-upgrade (garrytan#1798) v0.42.11.0 feat(skillopt): held-out eval gate, honest receipts, ENFORCE + ablation opts (garrytan#1759) v0.42.10.0 feat(extract): opt-in global-basename wikilink resolution (closes garrytan#972) (garrytan#1388)
This was referenced Jun 8, 2026
fix(postgres-engine): pair pg_advisory_lock acquire+release on same backend via pool.reserve()
#1408
Closed
Closed
Closed
Closed
This was referenced Jun 8, 2026
fix(code-graph): TS/TSX call edges now resolve (const-arrow naming + source_id on edge writes)
#1723
Closed
Closed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this fixes
gbrain dreamon Postgres (local or Supabase) failed every DB-dependent phase withNo database connection: connect() has not been called.lint/backlinks(filesystem-only) worked;sync/synthesize/embed/orphansblew up;extractreported "created 0" while silently dropping every row. Each phase worked in isolation (separate process) — only the full cycle broke, and it left a stuckgbrain_cycle_locksrow.This is the canonical landing for a long-standing class with ~15 competing community PRs and zero merged. Closes #1404, #1471, #1619 (symptom tracked in #1570/#1535).
Root cause (confirmed)
The dream cycle opens one long-lived "owner" connection (the module-level
sqlsingleton insrc/core/db.ts). Short-lived "borrower" probe engines (lint config-lift, doctor) open withoutpoolSize, so they share that singleton — but theirdisconnect()cascaded intodb.disconnect()and nulled the singleton the owner was still using. The modulesqlis only ever nulled bydb.disconnect()(postgres.js auto-reconnects its own internal pool and never touches our reference), so the failure was always a borrower-disconnect — never the "idle pooler drop" hypothesized in #1570/#1619.PostgresEngine.disconnect()calleddb.disconnect()for any_connectionStyle === 'module'engine with no check for whether it actually created the singleton.Prior partial fixes addressed symptoms: v0.41.27.0's retry-with-reconnect rescued batch phases (
extract) but notsync/synthesize(they don't use the retry path); v0.42.5.0 stopped the lint phase from being a borrower but left the structural hole open for every other helper.The fix
db.connect()now returnsboolean—trueiff THIS call created the singleton. The decision is atomic: there's noawaitbetween theif (sql)null-check and the synchronoussql = postgres(...)assignment, so two concurrent connects can't both claim creation.PostgresEnginestores it as_ownsModuleSingletonand only callsdb.disconnect()when it owns the connection; borrowers clear their own marker and leave the shared pool alone.db.disconnect()snapshots + nullssqlbefore awaitingend()(a concurrent connect can't join a closing pool);PostgresEngine.reconnect()uses a shared in-flight_reconnectPromise(replaces the_reconnectingboolean that returned the 2nd caller immediately against a half-rebuilt pool).cli.tsinvariant tripwire on the dream owner-disconnect documenting that the owner must outlive its borrowers.Tests
test/postgres-engine-singleton-ownership.test.ts— source-level guardrails for the ownership contract (atomic token, guarded teardown, no TOCTOU pre-sample).test/e2e/postgres-engine-disconnect-idempotency.test.ts— owner/borrower, creation-not-role, symmetric CLI-exit (no hang regression), owner-reconnect-with-live-borrower.test/postgres-engine-getter-selfheal.test.ts.Verification:
bun run verify29/29; blast-radius + e2e + migrate suites 254/254 against real Postgres. Full unit matrix runs in CI (the local box couldn't complete it under load).Review
Ran
/plan-eng-review+/reviewwith both Codex and Claude adversarial passes. Both verified the core fix correct. They converged on one concern — per-engine ownership state about a shared resource can desync under concurrent module connect/reconnect — which is not reachable in current gbrain (sequential cycle phases, nested borrowers, instance-engine worker pool, background writes reuse the owner). Captured as aTODOS.mdfollow-up (the refcount/generation hardening to do before any concurrent-module-connect path is introduced), alongside two other pre-existing items (facts-queue drain on dream/fall-through paths; stale ConnectionManager after reconnect).Credit
The ownership-flag approach that shipped: @nullhex-io (#1651) and @joelwp (#1667). Root-cause walkthroughs: @BrendanGahan, @xaviroblessarries, and the other #1471 reporters.
🤖 Generated with Claude Code