autopilot: silent link-row loss + getHealth() breakage from module-singleton null mid-cycle (separate class from #1162 / #465)

## Summary

On the Postgres engine, the `autopilot --inline` daemon silently loses link rows during the `extract.links_fs` phase, and `engine.getHealth()` then throws `No database connection: connect() has not been called` every cycle. Affects v0.41.0.0 (current at time of filing). This is not the same class as #1162 / PR #465, which is about the reconnect loop crashing the daemon; this bug fires even when the daemon stays alive.

## Repro

Stock v0.41.0.0 on a Postgres-engine brain. From an interactive shell with your `GBRAIN_DATABASE_URL` / `ZEROENTROPY_API_KEY` etc. loaded:

```bash
gbrain dream --dir /path/to/your/brain --json 2>&1 | tee /tmp/dream.log
```

Output around the `extract` phase will contain:

```
[cycle.extract] start
[extract.links_fs] 3/3 (100%)
  batch error (2 link rows lost): No database connection: connect() has not been called. Fix: Run gbrain init --supabase or gbrain init --url <connection_string>
[extract.links_fs] 3/3 (100%) done
```

The cycle continues, but every subsequent phase that calls `this.sql` and falls back to the module-level `db.getConnection()` fails the same way. In autopilot's main loop this surfaces as `[health] ERROR ...` after the cycle completes, but the daemon does not crash because `logError` catches it.

## Root cause

`PostgresEngine.connect(config)` without a `poolSize` arg goes to the module-level singleton branch (`postgres-engine.ts:175-189`) — `db.connect(config)` runs, but the engine instance's own `_sql` stays `null`. From then on, `this.sql` returns `db.getConnection()` (the module singleton).
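The fallback described above can be modelled in a few lines. This is a minimal sketch, not the gbrain source: `ModelEngine`, `dbConnect`, `dbDisconnect`, `dbGetConnection`, and the `Sql` type are hypothetical stand-ins, and only the shape of the getter (own `_sql` first, module singleton otherwise) is taken from the description.

```typescript
// Hypothetical minimal model of the getter fallback; not the real gbrain code.
type Sql = { query: (text: string) => string };

// Stands in for the module-level singleton in db.ts.
let moduleSql: Sql | null = null;

function dbConnect(): void {
  moduleSql = { query: (text) => `ok (singleton): ${text}` };
}

function dbDisconnect(): void {
  moduleSql = null;
}

function dbGetConnection(): Sql {
  if (moduleSql === null) {
    throw new Error("No database connection: connect() has not been called");
  }
  return moduleSql;
}

class ModelEngine {
  private _sql: Sql | null = null;

  // With a poolSize the engine owns its connection; without one it only
  // initialises the module singleton and leaves _sql null.
  connect(poolSize?: number): void {
    if (poolSize !== undefined) {
      this._sql = { query: (text) => `ok (own pool): ${text}` };
    } else {
      dbConnect();
    }
  }

  // Own connection if present, otherwise the module singleton.
  get sql(): Sql {
    return this._sql ?? dbGetConnection();
  }
}

function tryQuery(engine: ModelEngine): string {
  try {
    return engine.sql.query("select 1");
  } catch (e) {
    return `error: ${(e as Error).message}`;
  }
}

const shared = new ModelEngine();
shared.connect(); // singleton branch
const owned = new ModelEngine();
owned.connect(5); // instance-owned branch

dbDisconnect(); // stands in for whatever nulls the singleton mid-cycle

console.log(tryQuery(shared));
console.log(tryQuery(owned));
```

In this model only the engine that connected without a `poolSize` is exposed once the singleton is nulled; the engine that owns its connection keeps working.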

Somewhere in the cycle's call graph, the module-level `sql` in `db.ts` gets nulled. I am still narrowing where: it was observed firing during the `extract.links_fs` phase, after `synthesize` completes, even on a brain with no transcripts (so synthesize is a near-noop). Once that happens, every later engine call routed through the getter throws `connect() has not been called`. The lost-rows behavior in `extract.ts:614-628` is structural: the `flush()` function catches and logs the throw, but `batch.length = 0` in the `finally` clears the un-inserted rows regardless.
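To make the structural row loss concrete, here is a hedged sketch of the catch-then-clear pattern. It is not the code at `extract.ts:614-628`: `LinkRow`, `lossyFlush`, and `retainingFlush` are hypothetical names, and the model is synchronous to keep it short. `retainingFlush` is shown only for contrast and is not part of the fix proposed in this issue.

```typescript
// Hypothetical model of the flush pattern; not the real extract.ts code.
type LinkRow = { from: string; to: string };
type Insert = (rows: LinkRow[]) => void;

// Mirrors the described behavior: the throw is caught and logged, but the
// finally block clears the batch whether or not the insert succeeded.
function lossyFlush(batch: LinkRow[], insert: Insert): number {
  let lost = 0;
  try {
    insert(batch);
  } catch (e) {
    lost = batch.length;
    console.error(`batch error (${lost} link rows lost): ${(e as Error).message}`);
  } finally {
    batch.length = 0; // un-inserted rows are dropped here
  }
  return lost;
}

// Contrast only: clear the batch after a successful insert, so a failed
// flush leaves the rows in place for a later retry.
function retainingFlush(batch: LinkRow[], insert: Insert): boolean {
  try {
    insert(batch);
    batch.length = 0;
    return true;
  } catch (e) {
    console.error(`batch error (rows retained): ${(e as Error).message}`);
    return false;
  }
}

const failingInsert: Insert = () => {
  throw new Error("No database connection: connect() has not been called");
};

const batchA: LinkRow[] = [
  { from: "a", to: "b" },
  { from: "a", to: "c" },
];
const lostA = lossyFlush(batchA, failingInsert);

const batchB: LinkRow[] = [
  { from: "a", to: "b" },
  { from: "a", to: "c" },
];
const okB = retainingFlush(batchB, failingInsert);

console.log(lostA, batchA.length);
console.log(okB, batchB.length);
```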

This is the same bug class the source comment at `src/commands/auth.ts:52` acknowledges for auth commands:

> v0.32: createEngine returns a disconnected instance. PostgresEngine's `sql` getter falls back to `db.getConnection()` (the module-level singleton) when `_sql` is unset, which throws "connect() has not been called" when `db.connect()` was never invoked either.

The auth-command instance was fixed by having `withConfiguredSql` call `engine.connect()`. Long-running daemons that hold the engine across many cycle phases have a different exposure: SOMETHING during the cycle nulls the singleton out from under them.

I did not isolate the exact line that nulls `sql` in `db.ts` (the only sites that set `sql = null` are `connect()`'s catch path on line 216 and `disconnect()` on line 230 — neither of which any cycle phase appears to invoke directly). It could be a transient pool error that triggers the catch-path null, or a code path I missed. Filing this as a bug first so a maintainer who knows the cycle code better can localize the source.

## What this affects

- **`gbrain autopilot --inline` on Postgres**: silent row loss in every `extract` phase that has links to insert. Status reads as `partial`, but the bare counts shown in the human log line (`extracted=N`) look correct because they reflect *intent*, not actual rows persisted. Operators trusting the log line will believe the daemon is healthy.
- **`gbrain autopilot` Minions-dispatch path on Postgres**: same class, fires when the dispatched job's engine routes through the singleton (per-job worker engines using `poolSize` are immune; the autopilot's *parent* engine handling intra-tick orchestration is not).
- **Adaptive `engine.getHealth()` health check** post-cycle: throws `connect() has not been called`. `logError('health', e)` writes the line and the daemon survives, but no adaptive backoff occurs because `interval` stays at `baseInterval`.
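The last bullet can be illustrated with a small model of one post-cycle health tick. This is a sketch under assumptions: `nextInterval`, the `Health` shape, and the "double the interval when healthy" rule are invented for illustration and are not gbrain's actual backoff logic. The only point is that a throwing `getHealth()` skips the adjustment, so the interval silently stays at its base value.

```typescript
// Hypothetical model of one health tick; the backoff rule is illustrative only.
type Health = { score: number };

function nextInterval(
  baseInterval: number,
  getHealth: () => Health,
  logError: (tag: string, e: unknown) => void,
): number {
  let interval = baseInterval;
  try {
    const health = getHealth();
    // Invented rule: check less often when the brain looks healthy.
    if (health.score >= 90) interval = baseInterval * 2;
  } catch (e) {
    // The daemon survives, but the adjustment above never ran.
    logError("health", e);
  }
  return interval;
}

const logged: string[] = [];
const logError = (tag: string, e: unknown): void => {
  logged.push(`[${tag}] ERROR ${(e as Error).message}`);
};

const healthy = nextInterval(300, () => ({ score: 95 }), logError);
const broken = nextInterval(
  300,
  () => {
    throw new Error("No database connection: connect() has not been called");
  },
  logError,
);

console.log(healthy, broken, logged);
```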

## Relationship to PR #465

PR #465 (open, ~1 month no review) fixes the `(engine as any).connect?.()` no-config-arg bug in the autopilot reconnect path. That's a real fix and lands when this issue lands. But applying #465 alone leaves THIS bug fully present — I confirmed it by patching #465 locally and re-running. The daemon stays alive (good — that's what #465 ships), but `extract.links_fs` still loses rows on every cycle and `getHealth` still throws.

So this is a separate fix wave.

## Suggested fix

Two reasonable shapes; I'm filing a PR for the second one because it's the narrowest in scope and survives architectural disagreement about the first.

**Option A — engine-level**: Have `PostgresEngine.connect(config)` always create its own `_sql` regardless of `poolSize`. Treat `poolSize` as a sizing hint, not a "should we own a pool" toggle. This kills the singleton-fallback class entirely. Risk: every existing CLI caller now opens a pool instead of sharing one — small extra connection overhead, but it's the architecturally cleaner answer.

**Option B (autopilot-specific, what my PR does)**: At autopilot startup, after `connectEngine()` returns, force `engine.connect({...savedConfig, poolSize: 5})` so the autopilot's engine moves to an instance-owned pool. The long-running daemon is the affected blast radius; one-shot CLI commands are unaffected by the same race because they exit before the singleton has time to get nulled.

I'm not opening Option A as a PR because it's a behavior change for every caller and reviewing it deserves a separate thread.
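For reference, Option B's shape can be sketched as follows. This is a minimal illustration, not the companion PR: `EngineConfig`, `ConnectableEngine`, `FakeEngine`, and `promoteToOwnedPool` are hypothetical names, the `databaseUrl` field is an assumed config shape, and the real `connect()` is presumably asynchronous. Only the `engine.connect({...savedConfig, poolSize: 5})` call comes from the description above.

```typescript
// Hypothetical types; the real gbrain engine and config shapes may differ.
interface EngineConfig {
  databaseUrl: string;
  poolSize?: number;
}

interface ConnectableEngine {
  connect(config: EngineConfig): void;
}

// Re-connect with an explicit poolSize so the engine takes the
// instance-owned branch instead of the module-singleton branch.
function promoteToOwnedPool(
  engine: ConnectableEngine,
  savedConfig: EngineConfig,
  poolSize = 5,
): void {
  engine.connect({ ...savedConfig, poolSize });
}

// Test double that records which branch a connect() call would take.
class FakeEngine implements ConnectableEngine {
  ownsPool = false;
  lastConfig: EngineConfig | null = null;

  connect(config: EngineConfig): void {
    this.lastConfig = config;
    this.ownsPool = config.poolSize !== undefined;
  }
}

const savedConfig: EngineConfig = { databaseUrl: "postgres://localhost/brain" };
const engine = new FakeEngine();

engine.connect(savedConfig); // startup path: no poolSize, singleton branch
const ownedBefore = engine.ownsPool;

promoteToOwnedPool(engine, savedConfig); // Option B: force an owned pool
const ownedAfter = engine.ownsPool;

console.log(ownedBefore, ownedAfter, engine.lastConfig);
```

The saved config is spread into a new object rather than mutated, so other callers that reuse `savedConfig` keep their existing singleton behavior.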

## Environment

- gbrain v0.41.0.0 (commit at install: `bun install -g github:garrytan/gbrain` ~2026-05-24)
- Engine: postgres (local Postgres 17.10 via Homebrew)
- macOS 26.4.1, Bun 1.x via global bun install
- ZeroEntropy embeddings (`zeroentropyai:zembed-1`, 1280-dim)
- Brain has 15 pages, ~50 chunks

## Companion PR

(linked in a moment)

