resolveLintContentSanity disconnects shared module-level db singleton, killing the cycle's main engine connection #1471

@BrendanGahan

Summary

When gbrain dream runs on Postgres, resolveLintContentSanity (src/commands/lint.ts:298-339) creates a fresh PostgresEngine to probe the DB-plane lint config, then disconnects it in a finally block. That disconnect tears down the module-level db.ts connection singleton, which the cycle's main engine is still using. Every subsequent DB-touching phase then throws "No database connection: connect() has not been called", and lock.release() throws CONNECTION_ENDED from postgres.js, stranding the row in gbrain_cycle_locks.

Mechanism (root cause confirmed via stack-trace instrumentation)

  1. The CLI's connectEngine() creates PostgresEngine instance #1 (the cycle engine).
  2. #1.connect() falls through to the else branch in src/core/postgres-engine.ts:175 (no poolSize passed) and calls db.connect(config), which creates the module-level sql singleton. It sets #1._connectionStyle = 'module'.
  3. The cycle runs lint as its first phase; runLintCore calls resolveLintContentSanity().
  4. resolveLintContentSanity calls createEngine(...), which returns PostgresEngine instance #2, to probe the lifted config.
  5. #2.connect({}) also falls through to the else branch. db.connect() sees that sql is already set and returns early. It sets #2._connectionStyle = 'module'.
  6. lint.ts:319 calls engine.disconnect() on #2 in a finally block.
  7. #2.disconnect() (postgres-engine.ts:192-211) sees _connectionStyle === 'module' and calls db.disconnect(), which nulls the shared sql singleton.
  8. Engine #1 (still in use by the cycle) is now broken. Every later phase throws "connect() has not been called", and runCycle's finally block then throws CONNECTION_ENDED on lock.release().
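The eight steps reduce to a small, self-contained model. This is a hypothetical sketch, not gbrain code: fakeDb stands in for the module-level singleton in src/core/db.ts, BuggyEngine mimics only the 'module' branch of PostgresEngine, and everything is synchronous for brevity (the real connect()/disconnect() are async). It reproduces the failure: the probe's disconnect() nulls the shared connection, so the cycle engine's next query fails.

```typescript
// Hypothetical model of the bug; names mirror the issue, not gbrain's code.
type Sql = { ended: boolean };

// Stand-in for the module-level `sql` singleton in src/core/db.ts.
let sql: Sql | null = null;

const fakeDb = {
  connect(): void {
    // Like db.connect(): create the singleton, or return early if it exists.
    if (sql === null) sql = { ended: false };
  },
  disconnect(): void {
    // Like db.disconnect(): end the shared connection and null the singleton.
    if (sql !== null) {
      sql.ended = true;
      sql = null;
    }
  },
  getConnection(): Sql {
    if (sql === null) {
      throw new Error("No database connection: connect() has not been called");
    }
    return sql;
  },
};

class BuggyEngine {
  private connectionStyle: "module" | null = null;

  connect(): void {
    fakeDb.connect();
    this.connectionStyle = "module";
  }

  disconnect(): void {
    // The bug: any 'module' engine tears down the singleton, even a borrower.
    if (this.connectionStyle === "module") fakeDb.disconnect();
    this.connectionStyle = null;
  }

  query(): string {
    fakeDb.getConnection(); // throws once the singleton is gone
    return "ok";
  }
}

function reproduce(): string {
  const cycleEngine = new BuggyEngine(); // instance #1: creates the singleton
  cycleEngine.connect();

  const probe = new BuggyEngine(); // instance #2: lint's config probe
  probe.connect(); // singleton already set, so fakeDb.connect() returns early
  try {
    // ... read the DB-plane lint config ...
  } finally {
    probe.disconnect(); // nulls the singleton that #1 still depends on
  }

  try {
    return cycleEngine.query();
  } catch (err) {
    return (err as Error).message;
  }
}

console.log(reproduce());
// prints: No database connection: connect() has not been called
```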

The comment at postgres-engine.ts:94-100 explicitly anticipates this category of bug, but the existing _connectionStyle: 'instance' | 'module' | null flag cannot distinguish the instance that created the singleton (the owner) from one that merely reused it (a borrower); both call db.disconnect().

Repro

psql gbrain -c "DELETE FROM gbrain_cycle_locks;"
gbrain dream --dir ~/path/to/brain
psql gbrain -c "SELECT id, holder_pid FROM gbrain_cycle_locks;"

Expect: the run finishes in about 1.4 s, every DB phase reports "No database connection: connect() has not been called", and a row is left stranded in gbrain_cycle_locks.

(In a vanilla 0.41.2.0 install this currently surfaces differently because of a separate Bun WebCrypto issue that masks the post-extract_facts phases — see linked Bun issue draft. Switching src/core/schema-pack/manifest-v1.ts:274 and closure.ts:175 to sync node:crypto exposes the singleton-clobber bug cleanly.)

Suggested fix

Track singleton ownership in PostgresEngine. Only the instance whose connect() call actually created the singleton may disconnect it. Borrowers just clear their own marker:

// postgres-engine.ts — add field
private _ownsModuleSingleton: boolean = false;

// in connect(), else branch (the no-poolSize / module-singleton path):
} else {
  // Determine ownership BEFORE delegating to db.connect.
  let alreadyConnected = false;
  try { db.getConnection(); alreadyConnected = true; } catch { /* sql null = we'll be the owner */ }
  await db.connect(config);
  this._connectionStyle = 'module';
  this._ownsModuleSingleton = !alreadyConnected;
  // ... existing connectionManager wiring
}

// in disconnect():
if (this._connectionStyle === 'module') {
  if (this._ownsModuleSingleton) {
    await db.disconnect();
    this._ownsModuleSingleton = false;
  }
  this._connectionStyle = null;
}

This fix is minimal, backward-compatible, and addresses the root cause rather than symptoms. The CLI's connectEngine() engine becomes the owner; any helper that creates a probe engine (lint, doctor, future probes) becomes a borrower automatically.
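To sanity-check the owner/borrower behavior in isolation, here is a synchronous model of the suggested fix. The db object and SketchEngine class are hypothetical stand-ins for src/core/db.ts and PostgresEngine (async, connection config, and connectionManager wiring are omitted); only the ownership logic mirrors the patch.

```typescript
// Hypothetical model of the suggested ownership fix (sync, no config).
type Sql = { ended: boolean };

// Stand-in for the module-level `sql` singleton in src/core/db.ts.
let sql: Sql | null = null;

const db = {
  connect(): void {
    if (sql === null) sql = { ended: false }; // early return if already set
  },
  disconnect(): void {
    if (sql !== null) {
      sql.ended = true;
      sql = null;
    }
  },
  getConnection(): Sql {
    if (sql === null) {
      throw new Error("No database connection: connect() has not been called");
    }
    return sql;
  },
};

class SketchEngine {
  private connectionStyle: "module" | null = null;
  private ownsModuleSingleton = false;

  connect(): void {
    // Determine ownership BEFORE delegating to db.connect().
    let alreadyConnected = false;
    try {
      db.getConnection();
      alreadyConnected = true;
    } catch {
      // No singleton yet: this instance is about to create (and own) it.
    }
    db.connect();
    this.connectionStyle = "module";
    this.ownsModuleSingleton = !alreadyConnected;
  }

  disconnect(): void {
    if (this.connectionStyle === "module") {
      if (this.ownsModuleSingleton) {
        db.disconnect(); // only the creator tears the singleton down
        this.ownsModuleSingleton = false;
      }
      this.connectionStyle = null; // a borrower only clears its own marker
    }
  }

  query(): string {
    db.getConnection();
    return "ok";
  }
}

const cycle = new SketchEngine(); // owner: creates the singleton
cycle.connect();
const probe = new SketchEngine(); // borrower: reuses it
probe.connect();
probe.disconnect(); // leaves the shared connection alone
console.log(cycle.query()); // prints: ok
cycle.disconnect(); // owner releases the singleton at the end of the cycle
```

Ownership here is decided by a check-then-connect sequence. In the real async code, two engines connecting concurrently could both observe an empty singleton and both claim ownership, so the flag assumes connects happen one after another, as they do in the cycle described above.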

An alternative — refactor resolveLintContentSanity to accept an engine from the caller and skip its own probe — is also defensible but spreads the API contract through more call sites. The ownership flag contains the bug to the place it was introduced.
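For comparison, the alternative could look roughly like this sketch. The signature is hypothetical (the real resolveLintContentSanity may take different arguments), as are the LintEngine interface, its getConfig method, and the "lint.content_sanity" key. The point is the ownership rule: the function disconnects only an engine it created itself and leaves a caller-supplied engine untouched.

```typescript
// Hypothetical engine shape; gbrain's real engine interface is larger.
interface LintEngine {
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  getConfig(key: string): Promise<string | null>;
}

// Hypothetical signature: callers may pass the engine they already hold.
async function resolveLintContentSanity(
  makeEngine: () => LintEngine,
  engine?: LintEngine,
): Promise<string | null> {
  const ownsEngine = engine === undefined;
  const e = engine ?? makeEngine();
  if (ownsEngine) await e.connect();
  try {
    return await e.getConfig("lint.content_sanity"); // hypothetical key
  } finally {
    // Tear down only what this function created; a borrowed engine stays up.
    if (ownsEngine) await e.disconnect();
  }
}

// Test double that records how often it was disconnected.
class FakeEngine implements LintEngine {
  disconnects = 0;
  async connect(): Promise<void> {}
  async disconnect(): Promise<void> {
    this.disconnects += 1;
  }
  async getConfig(_key: string): Promise<string | null> {
    return "warn";
  }
}

const cycleEngine = new FakeEngine();
resolveLintContentSanity(() => new FakeEngine(), cycleEngine).then((value) => {
  console.log(value, cycleEngine.disconnects); // prints: warn 0
});
```

This keeps the change local to lint, but every caller that wants to share its engine has to thread it through, which is the API-spread cost noted above.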

Verification

After the fix:

  • Dream completes all 19 phases doing real DB work (sync adds pages, extract_facts reconciles, embed runs, etc.)
  • Lock row released cleanly each run (zero rows in gbrain_cycle_locks)
  • Per-phase report shows a success status on every DB phase, instead of ✗ ... connect() has not been called

Tested locally on a 12,500-page brain over multiple runs. Full diagnostic write-up + stack traces available if useful.

Environment

  • gbrain 0.41.2.0
  • Bun 1.3.14
  • Postgres on macOS (Apple Silicon, localhost)

Related

Companion issue: silent lock.release() swallow (made this bug invisible for an unknown time).
