"connect() has not been called" still reproduces on 0.41.28 — concurrent minion-worker path not covered by #1570 fix

## Follow-up to #1570 — "connect() has not been called" still reproduces on 0.41.28 (concurrency path the withRetry fix doesn't cover)

Thanks for the fix on #1570. We hit the same `No database connection: connect() has not been called` error on **gbrain 0.41.28.0** with Supabase Postgres, but via a path the `withRetry`-reconnect fix does not protect: a **concurrent** operation reading the module singleton while another caller is tearing it down.

### Environment
- gbrain 0.41.28.0, engine: Postgres
- Supabase, **direct** connection `db.<ref>.supabase.co:5432` (sslmode=require, IPv6) — not the pooler
- Symptom isolated to the `dream` cycle's `synthesize` phase + the minion worker; core retrieval (search/query/embed/sync/extract) unaffected.

### Symptom
On every dream/autopilot cycle, `synthesize` fails and reports `synth_pages=0`, and these errors repeat:
```
Promotion error: No database connection: connect() has not been called. ...
[extract.links_fs] connection blip, retrying (attempt 1/3): No database connection: connect() has not been called. ...
Dream cycle (partial): [InternalError/SYNTH_PHASE_FAIL] No database connection: connect() has not been called. ...
totals: ... synth_transcripts=0 synth_pages=0
```

### Root cause (the remaining path)
#1570's fix made the *retrying caller* reconnect. But the singleton is still **nulled out from under other in-flight operations**:

1. A transient blip triggers `PostgresEngine.reconnect()` (`src/core/postgres-engine.ts:~4515`), which does `try { await this.disconnect(); } catch {}` then reconnects.
2. In module-singleton mode, `this.disconnect()` routes to `db.disconnect()` (`src/core/postgres-engine.ts:~223`).
3. `db.disconnect()` (`src/core/db.ts:227`) executes `if (sql) { await sql.end(); sql = null; connectedUrl = null; }` — it **nulls the shared module singleton**.
4. The minion worker loop runs concurrently: `MinionWorker` → `this.queue.promoteDelayed()` (`src/core/minions/worker.ts:463`) → `engine.executeRaw` → `getConnection()` (`src/core/db.ts:~150`). It reads `sql` during the window where it is `null` and throws "connect() has not been called".
5. The `synthesize` phase submits its writer jobs through the same minion queue (`engine.executeRaw` in `src/core/cycle/synthesize.ts`, surfaced as `SYNTH_PHASE_FAIL`), so the whole phase aborts.

So `withRetry` reconnects the *one* caller it wraps, but `db.disconnect()` setting `sql = null` breaks every *other* operation sharing the singleton during the disconnect→reconnect window. The path is only exposed on Postgres because minions are Postgres-only (`jobs work` is "Postgres only"); it never ran on PGLite. The in-code instrumentation comment at `db.ts:228` references #1570 ("identify the caller that's nulling the module singleton mid-cycle"). That caller is `reconnect()`'s `disconnect()`, and the victims are the concurrent minion-queue ops.
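
To make the window concrete, here is a minimal, self-contained sketch of the interleaving described above. It is not gbrain code: `FakeSql`, the timer delays, and `demoRace` are hypothetical stand-ins, and only the shape of `connect` / `disconnect` / `getConnection` / `reconnect` follows the steps listed in this issue.

```typescript
// Hypothetical stand-in for the module singleton in db.ts (not the real code).
type FakeSql = { end(): Promise<void> };

let sql: FakeSql | null = null;

async function connect(): Promise<void> {
  // Simulate the network round trip of opening a connection.
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  sql = { end: async () => {} };
}

async function disconnect(): Promise<void> {
  if (sql) {
    await sql.end();
    sql = null; // from here on, every caller sharing the singleton sees null
  }
}

function getConnection(): FakeSql {
  if (!sql) {
    throw new Error("No database connection: connect() has not been called.");
  }
  return sql;
}

async function reconnect(): Promise<void> {
  try {
    await disconnect();
  } catch {
    // ignored, mirroring the `try { ... } catch {}` described in step 1
  }
  await connect();
}

// Plays the role of the minion worker reading the singleton mid-reconnect.
async function demoRace(): Promise<string[]> {
  const log: string[] = [];
  await connect();

  const reconnecting = reconnect(); // the "retrying caller"
  // Wake up after disconnect() has nulled `sql` but before connect() finishes.
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  try {
    getConnection();
    log.push("worker: ok");
  } catch (e) {
    log.push(`worker: ${(e as Error).message}`);
  }

  await reconnecting;
  getConnection(); // succeeds again once the reconnect has completed
  log.push("after reconnect: ok");
  return log;
}
```

In this sketch, `demoRace()` resolves to a log whose first entry is the worker's "connect() has not been called" error and whose second entry is `after reconnect: ok`, which matches what we see: the retrying caller recovers while the concurrent reader fails.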

### Isolation facts
- Not env/config: same process with `GBRAIN_DATABASE_URL` set runs `embed`, `orphans`, `purge`, `extract.timeline_fs` fine; only the minion-queue-backed path throws.
- Not connection instability per se: non-minion phases reuse the same connection without blips.
- Not dual-worker contention: reproduces with a single `gbrain jobs work`.

### Workaround we applied (stopgap, local source patch)
Gate the singleton release in `db.disconnect()` so a normal `reconnect()` no longer nulls the shared connection — postgres.js auto-reconnects dead sockets inside the pool, so keeping the singleton alive for the process lifetime is safe:
```ts
// db.ts disconnect(): only release on an explicit full shutdown
if (sql && process.env.GBRAIN_FORCE_DISCONNECT === '1') {
  await sql.end();
  sql = null;
  connectedUrl = null;
}
```
After this, the error disappears (0 occurrences across many cycles) and `synthesize` runs to completion instead of `SYNTH_PHASE_FAIL`.

### Suggested durable fix (your call)
Either (a) don't tear down the shared module singleton inside `reconnect()` — let postgres.js's pool self-heal; or (b) make `getConnection()` lazily reconnect (await an in-flight `connect()`) instead of throwing when `sql` is transiently null; or (c) guard concurrent queue ops against the disconnect window. Happy to test a patch against our Supabase setup.
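
For discussion, a rough sketch of what option (b) could look like. Everything here is an assumption for illustration: `FakeSql`, `openConnection`, the `connecting` promise, and `demoLazyGet` are hypothetical names, and the real `getConnection()` appears to be synchronous, so an awaitable variant like this would change its call sites.

```typescript
// Hypothetical stand-ins; none of these names are taken from gbrain's db.ts.
type FakeSql = { id: number; end(): Promise<void> };

let sql: FakeSql | null = null;
let connecting: Promise<FakeSql> | null = null; // in-flight connect, if any
let nextId = 1;

// Simulates opening a connection with a short network delay.
async function openConnection(): Promise<FakeSql> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return { id: nextId++, end: async () => {} };
}

async function reconnect(): Promise<void> {
  // Coalesce concurrent reconnects onto the one already in flight.
  if (connecting) {
    await connecting;
    return;
  }
  const previous = sql;
  const pending = openConnection();
  connecting = pending; // publish the pending connect before dropping the old handle
  sql = null;
  try {
    sql = await pending;
  } finally {
    connecting = null;
  }
  if (previous) await previous.end(); // close the old handle only after the swap
}

// Awaitable variant: waits for an in-flight connect instead of throwing.
async function getConnection(): Promise<FakeSql> {
  if (sql) return sql;
  if (connecting) return connecting; // transient window during reconnect()
  throw new Error("No database connection: connect() has not been called.");
}

async function demoLazyGet(): Promise<number[]> {
  await reconnect(); // initial connect
  const first = (await getConnection()).id;

  const reconnecting = reconnect(); // `sql` is null until the new connection is ready
  const during = (await getConnection()).id; // waits instead of throwing
  await reconnecting;
  return [first, during];
}
```

In this sketch `demoLazyGet()` resolves to `[1, 2]`: the reader that arrives mid-reconnect receives the new connection rather than an error, while a call made before any connect still throws the original message.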

### Reproduction
1. `gbrain init --supabase --url <conn> --embedding-model zeroentropyai:zembed-1 --embedding-dimensions 1280`
2. Add a source, sync content, and set a synthesize corpus dir.
3. Run `gbrain autopilot` (or `gbrain jobs work` + `gbrain dream --synthesize`).
4. Observe `synth_pages=0` and the "connect() has not been called" errors when a connection blip occurs.

