autopilot: minion worker crashes with "No database connection: connect() has not been called" #1491

## Description

The minion worker subprocess (`gbrain jobs work`) spawned by `gbrain autopilot` crashes repeatedly because `engine.connect()` has not been called before the worker's main loop attempts to access the database.

This prevents ALL autopilot cycles from completing, and causes data loss during the extract phase.

## Environment

- **gbrain version**: 0.41.14.0
- **Engine**: Postgres
- **Config**: `agent.use_gateway_loop=true`, `chat_model: deepseek:deepseek-v4-flash`
- **OS**: Linux (WSL2 Ubuntu)

## Symptoms

### 1. Worker crash loop (51 crashes recorded)

```
Promotion error: No database connection: connect() has not been called. Fix: Run gbrain init --supabase or gbrain init --url <connection_string>
[autopilot] worker exited code=1 signal=null after 1409ms, crashCount=4, cause=runtime_error
[autopilot] crash backoff 8111ms (crashCount=4)
```

Error counts:
- **51** worker crashes
- **265** "Promotion error" in `promoteDelayed()`
- **446** total "No database connection" errors
- **52** batch link row losses during extraction

### 2. Data loss during extract.links_fs

```
[extract.links_fs] 60/60 (100%)
[extract.links_fs] connection blip, retrying 26 rows in 500ms (No database connection: connect() has not been called. Fix: Run gbrain init --supabase or gbrain init --url <connection_string>)
  batch error (26 link rows lost): No database connection: connect() has not been called. Fix: Run gbrain init --supabase or gbrain init --url <connection_string>
[extract.links_fs] 60/60 (100%) done
Links: created 0 from 60 pages
```

### 3. Unhandled rejection during conversation_facts_backfill

```
[cycle.conversation_facts_backfill] start
[unhandledRejection] GBrainError: No database connection: connect() has not been called. Fix: Run gbrain init --supabase or gbrain init --url <connection_string>
    at getConnection (/.../gbrain/src/core/db.ts:153:15)
    at transaction (/.../gbrain/src/core/postgres-engine.ts:764:34)
    at failJob (/.../gbrain/src/core/minions/queue.ts:855:24)
    at executeJob (/.../gbrain/src/core/minions/worker.ts:831:39)
    at processTicksAndRejections (native:7:39)
```

### 4. All sources never complete a full cycle

```
[FAIL] cycle_freshness: Source 'aevum' has never completed a full cycle;
Source 'agent-arch' has never completed a full cycle; ... (all 9 sources)
```

The brain score is stuck at 51/100 because link extraction (3/25), timeline extraction (2/15), and orphan resolution (1/15) cannot complete.

## Root Cause Analysis

The main `gbrain` CLI process correctly calls `createEngine()` → `connectWithRetry()` at startup (`src/cli.ts:1687-1691`). However, the supervisor (`src/core/minions/supervisor.ts`) spawns the minion worker as a **separate** `bun gbrain jobs work` subprocess. This child process goes through the same CLI initialization and should therefore also call `connectWithRetry()`, but the logs show that no connection exists by the time `promoteDelayed()` or `executeJob()` first accesses the database. What in the worker's startup path causes this has not been identified yet.

The worker's `queue.promoteDelayed()` runs in the main loop before any job is claimed (`src/core/minions/worker.ts:434`). If the engine's connection pool isn't ready at that point, the error is caught and logged as a "Promotion error". The worker then crashes with an unhandled rejection when `executeJob()` later reaches `getConnection()`, which in the stack trace above happens through `failJob()` → `transaction()`.
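To make that sequence concrete, here is a minimal sketch of one loop iteration against an engine that was never connected. `FakeEngine` and `tick` are hypothetical stand-ins, and the function bodies are assumptions that only mirror the names in the stack trace; this is not the actual gbrain code.

```typescript
// Hypothetical stand-in for the engine: getConnection() throws until connect() runs.
class FakeEngine {
  private connected = false;

  connect(): void {
    this.connected = true;
  }

  getConnection(): string {
    if (!this.connected) {
      throw new Error("No database connection: connect() has not been called");
    }
    return "conn";
  }
}

// Runs before any job is claimed and needs the database.
async function promoteDelayed(engine: FakeEngine): Promise<void> {
  engine.getConnection();
}

// Marking a job as failed is itself a database write.
async function failJob(engine: FakeEngine, jobId: number): Promise<string> {
  const conn = engine.getConnection();
  return `job ${jobId} marked failed via ${conn}`;
}

async function executeJob(engine: FakeEngine, jobId: number): Promise<void> {
  try {
    engine.getConnection(); // the job body needs the database
  } catch {
    // The error handler touches the database too, so it throws a second time
    // and the rejection leaves executeJob().
    await failJob(engine, jobId);
  }
}

// One loop iteration. The outer catch around executeJob() exists only so the
// sketch can record the escaped error; without it the rejection is unhandled.
async function tick(engine: FakeEngine): Promise<string[]> {
  const log: string[] = [];
  try {
    await promoteDelayed(engine);
  } catch (err) {
    log.push(`Promotion error: ${(err as Error).message}`);
  }
  try {
    await executeJob(engine, 1);
  } catch (err) {
    log.push(`escaped executeJob: ${(err as Error).message}`);
  }
  return log;
}
```

With an unconnected `FakeEngine`, `tick()` records both a caught promotion error and an error escaping `executeJob()`. After `connect()` it records nothing.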

## Contrast: Main process works fine

The parent autopilot process and direct `gbrain dream` invocations work correctly — DB access succeeds. Only the minion worker child process experiences this.

## Suggested Fix Direction

The worker's main loop should verify that `engine.connect()` has been called before entering the `promoteDelayed()` / claim / execute cycle. Alternatively, the connection should be lazily established on first use rather than requiring an explicit `connect()` call.
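Both options can be covered by one small guard. The sketch below is only an illustration under assumptions: `Connectable`, `lazyConnector`, and `workerTick` are hypothetical names, and the real engine's `connect()` signature may differ. It connects on first use, shares a single in-flight attempt between concurrent callers, and allows a retry after a failed attempt.

```typescript
// Hypothetical minimal shape of the engine; only connect() matters here.
interface Connectable {
  connect(): Promise<void>;
}

// Returns a function that connects on the first call and reuses that attempt afterwards.
function lazyConnector(engine: Connectable): () => Promise<void> {
  let pending: Promise<void> | null = null;
  return () => {
    if (pending !== null) {
      return pending;
    }
    const attempt = engine.connect().catch((err: unknown) => {
      pending = null; // forget the failed attempt so a later call retries
      throw err;
    });
    pending = attempt;
    return attempt;
  };
}

// Hypothetical loop body: the guard runs before any database access.
async function workerTick(
  ensureConnected: () => Promise<void>,
  promoteDelayed: () => Promise<void>,
): Promise<void> {
  await ensureConnected();
  await promoteDelayed();
}
```

Calling the guard at the top of every iteration is cheap once the connection exists, because it only returns the already-resolved promise.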

A workaround might be to pass `GBRAIN_DATABASE_URL` as an environment variable so the worker can self-initialize without depending on parent state.

## Related

- PR #1488 (gateway tool schema fix for non-Anthropic providers) — submitted separately
- The subagent functionality is also affected because subagent phases (consolidate, synthesize) run inside the worker process
