Follow-up to #1570 — "connect() has not been called" still reproduces on 0.41.28 (concurrency path the withRetry fix doesn't cover)
Thanks for the fix on #1570. We hit the same "No database connection: connect() has not been called" error on gbrain 0.41.28.0 with Supabase Postgres, but via a path the withRetry-reconnect fix does not protect: a concurrent operation reading the module singleton while another caller is tearing it down.
Environment
- gbrain 0.41.28.0, engine: Postgres
- Supabase, direct connection db.<ref>.supabase.co:5432 (sslmode=require, IPv6), not the pooler
- Symptom isolated to the dream cycle's synthesize phase + the minion worker; core retrieval (search/query/embed/sync/extract) unaffected.
Symptom
Every dream/autopilot cycle, synthesize fails and synth_pages=0, with these repeating errors:
Promotion error: No database connection: connect() has not been called. ...
[extract.links_fs] connection blip, retrying (attempt 1/3): No database connection: connect() has not been called. ...
Dream cycle (partial): [InternalError/SYNTH_PHASE_FAIL] No database connection: connect() has not been called. ...
totals: ... synth_transcripts=0 synth_pages=0
Root cause (the remaining path)
#1570's fix made the retrying caller reconnect. But the singleton is still nulled out from under other in-flight operations:
- A transient blip triggers PostgresEngine.reconnect() (src/core/postgres-engine.ts:~4515), which does try { await this.disconnect(); } catch {} then reconnects.
- In module-singleton mode, this.disconnect() routes to db.disconnect() (src/core/postgres-engine.ts:~223).
- db.disconnect() (src/core/db.ts:227) executes if (sql) { await sql.end(); sql = null; connectedUrl = null; }, which nulls the shared module singleton.
- The minion worker loop runs concurrently: MinionWorker → this.queue.promoteDelayed() (src/core/minions/worker.ts:463) → engine.executeRaw → getConnection() (src/core/db.ts:~150). It reads sql during the window where it is null and throws "connect() has not been called".
- The synthesize phase submits its writer jobs through the same minion queue (src/core/cycle/synthesize.ts engine.executeRaw, surfaced as SYNTH_PHASE_FAIL), so the whole phase aborts.
So withRetry reconnects the one caller it wraps, but db.disconnect() setting sql = null breaks every other operation sharing the singleton during the disconnect→reconnect window. On Postgres this is newly exposed because minions are Postgres-only (jobs work is "Postgres only"); the path never ran on PGLite. We see the in-code instrumentation comment at db.ts:228 referencing #1570 ("identify the caller that's nulling the module singleton mid-cycle") — that caller is reconnect()'s disconnect(), and the victims are the concurrent minion-queue ops.
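To make the window concrete, here is a minimal standalone sketch of the race. It is not gbrain's code: FakeSql, tick, poll and simulateRace are hypothetical names, and timers stand in for network round trips. It only mirrors the shape described above, where disconnect() nulls the singleton and a concurrent caller reads it before connect() finishes.

```typescript
// Hypothetical stand-in for the postgres.js client; only end() is modelled.
type FakeSql = { end: () => Promise<void> };

// Module singleton, mirroring the shape described for db.ts.
let sql: FakeSql | null = null;

// One macrotask turn, standing in for a network round trip.
const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

async function connect(): Promise<void> {
  await tick();
  sql = { end: tick };
}

async function disconnect(): Promise<void> {
  if (sql) {
    await sql.end();
    sql = null; // the shared singleton is gone until connect() completes
  }
}

function getConnection(): FakeSql {
  if (!sql) {
    throw new Error("No database connection: connect() has not been called.");
  }
  return sql;
}

async function reconnect(): Promise<void> {
  try {
    await disconnect();
  } catch {
    // swallowed, as in the reconnect() described above
  }
  await connect();
}

// A concurrent caller (think promoteDelayed) that keeps reading the singleton.
async function poll(times: number): Promise<number> {
  let failures = 0;
  for (let i = 0; i < times; i++) {
    try {
      getConnection();
    } catch {
      failures++;
    }
    await tick();
  }
  return failures;
}

// Runs one reconnect alongside the poller and reports how often the poller threw.
async function simulateRace(): Promise<number> {
  await connect();
  const [, failures] = await Promise.all([reconnect(), poll(5)]);
  return failures;
}

simulateRace().then((failures) => {
  console.log(`poller saw ${failures} failed read(s) during reconnect`);
});
```

The poller is never wrapped by the reconnecting caller's retry logic, so nothing shields it from the null read; that is the same position the minion-queue ops are in.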
Isolation facts
- Not env/config: same process with GBRAIN_DATABASE_URL set runs embed, orphans, purge, extract.timeline_fs fine; only the minion-queue-backed path throws.
- Not connection instability per se: non-minion phases reuse the same connection without blips.
- Not dual-worker contention: reproduces with a single gbrain jobs work.
Workaround we applied (stopgap, local source patch)
Gate the singleton release in db.disconnect() so a normal reconnect() no longer nulls the shared connection — postgres.js auto-reconnects dead sockets inside the pool, so keeping the singleton alive for the process lifetime is safe:
    // db.ts disconnect(): only release on an explicit full shutdown
    if (sql && process.env.GBRAIN_FORCE_DISCONNECT === '1') {
      await sql.end();
      sql = null;
      connectedUrl = null;
    }
After this, the error disappears (0 occurrences across many cycles) and synthesize runs to completion instead of SYNTH_PHASE_FAIL.
Suggested durable fix (your call)
Either (a) don't tear down the shared module singleton inside reconnect() — let postgres.js's pool self-heal; or (b) make getConnection() lazily reconnect (await an in-flight connect()) instead of throwing when sql is transiently null; or (c) guard concurrent queue ops against the disconnect window. Happy to test a patch against our Supabase setup.
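For option (b), a rough sketch of what we have in mind, with the same caveats: this is a standalone model, not a patch against db.ts. FakeSql, tick, connecting, opened and simulate are hypothetical names, and it assumes getConnection() can become async, which may not match the real call sites.

```typescript
// Hypothetical stand-in for the postgres.js client; only end() is modelled.
type FakeSql = { end: () => Promise<void> };

let sql: FakeSql | null = null;
let connecting: Promise<FakeSql> | null = null; // in-flight connect, if any
let opened = 0; // how many connections were actually opened

const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

// Opens a connection, or joins the connect that is already in flight.
function connect(): Promise<FakeSql> {
  if (sql) return Promise.resolve(sql);
  if (!connecting) {
    connecting = (async () => {
      try {
        await tick(); // simulated network round trip
        const client: FakeSql = { end: tick };
        opened++;
        sql = client;
        return client;
      } finally {
        connecting = null; // allow a fresh attempt after success or failure
      }
    })();
  }
  return connecting;
}

// Lazy variant: never throws on a transiently null singleton.
async function getConnection(): Promise<FakeSql> {
  return sql ?? connect();
}

async function disconnect(): Promise<void> {
  if (sql) {
    await sql.end();
    sql = null;
  }
}

async function reconnect(): Promise<void> {
  try {
    await disconnect();
  } catch {
    // swallowed, as in the reconnect() described in this issue
  }
  await connect();
}

// Concurrent caller that reads the singleton while reconnect() runs.
async function poll(times: number): Promise<number> {
  let failures = 0;
  for (let i = 0; i < times; i++) {
    try {
      await getConnection();
    } catch {
      failures++;
    }
    await tick();
  }
  return failures;
}

async function simulate(): Promise<{ failures: number; opened: number }> {
  await connect();
  const [, failures] = await Promise.all([reconnect(), poll(5)]);
  return { failures, opened };
}

simulate().then((result) => console.log(result));
```

Sharing the in-flight promise keeps concurrent callers from each opening their own connection. The sketch does not address a caller that receives the old client while its end() is still pending; that would need separate handling.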
Reproduction
- gbrain init --supabase --url <conn> --embedding-model zeroentropyai:zembed-1 --embedding-dimensions 1280
- add a source, sync content, set a synthesize corpus dir
- run gbrain autopilot (or gbrain jobs work + gbrain dream --synthesize)
- observe synth_pages=0 + the "connect() has not been called" errors during a connection blip.