PGLite engine: `gbrain dream --phase synthesize` hangs indefinitely (no worker daemon to process queued subagent jobs) #1306

## Summary

`gbrain dream` (full cycle or `--phase synthesize` alone) deterministically **hangs at the start of `[cycle.synthesize]`** on PGLite with zero apparent cause: idle main thread, no TCP sockets, no Anthropic API calls in flight, no child processes. Reproduced on v0.37.3.0 and v0.39.0.0 against both an established and a freshly-rebuilt brain.

This appears to be a regression in the v0.38.1 "provider-agnostic subagent loop" rebuild on PGLite engines specifically. The PGLite engine does not run a worker daemon (per `gbrain jobs work` error: *"Worker daemon requires Postgres"*), but the v0.38+ synthesize phase submits `subagent` jobs to the queue and `waitForCompletion`-polls them indefinitely. Without a worker, the jobs stay in `waiting` state and the cycle never advances.

## Environment

- **OS:** macOS 26.5 (Darwin 25.5.0), arm64
- **gbrain:** 0.39.0.0
- **Bun:** 1.3.14
- **Engine:** PGLite (zero-config default from `gbrain init --pglite`)
- **Brain size:** 7,178 pages / 20,089 chunks
- **Corpus:** `dream.synthesize.session_corpus_dir = ~/brain/raw-transcripts/` (5 .txt files for the minimal repro; same hang at 397 files)
- **Embedding:** `openai:text-embedding-3-large` / 1536 dims
- **Models:** `models.dream.synthesize = anthropic:claude-sonnet-4-6`, `models.dream.patterns = anthropic:claude-haiku-4-5`

## Reproduction

```bash
# Fresh brain init (NOT a corrupted-WAL case — verified by full rebuild from #223 workaround)
gbrain init --pglite --path "$HOME/.gbrain/brain.pglite" --json
gbrain import ~/brain --no-embed       # 7178 pages, 20089 chunks, 0 errors
gbrain embed --stale                    # all chunks embedded

# Configure synthesize
gbrain config set dream.synthesize.session_corpus_dir ~/brain/raw-transcripts
gbrain config set models.dream.synthesize anthropic:claude-sonnet-4-6

# Trigger
gbrain dream --phase synthesize --dry-run --json
# Hangs indefinitely. Killed after 3 min via SIGKILL.
```

## Observed process state during hang

```
$ ps -p 58579 -o pid,etime,%cpu,rss
  PID ELAPSED  %CPU    RSS
58579   03:06   0.5 460064

$ lsof -p 58579 -i        # zero entries — no network
(empty)

$ pgrep -P 58579           # zero entries — no children
(empty)

$ sample 58579 2 -mayDie
Call graph:
    1723 Thread_5344885   DispatchQueue_1: com.apple.main-thread  (serial)
    + 1723 start  (in dyld) + 6992  [0x18270be00]
    +   1723 ???  (in bun)  load address 0x100be0000 + 0x9bdd0c
# Main thread parked in kevent64 — classic Bun event-loop wait with nothing scheduled
```

Last log line before the hang, every time:

```
[cycle.synthesize] start
[dream] model "anthropic:claude-sonnet-4-6" is not in MODEL_CONTEXT_TOKENS; using 180000-token fallback budget. Set dream.synthesize.max_prompt_tokens to override.
```

No further output. Process consumes ~2GB RSS over time but does no work.

## Root cause analysis

After tracing through `src/core/cycle/synthesize.ts` and `src/core/minions/queue.ts`:

1. Synthesize fans out one `subagent` job per worth-processing transcript via `MinionQueue.add()` with `allowProtectedSubmit: true`.
2. After submission, it calls `waitForCompletion(queue, jobId, { timeoutMs: 35 * 60 * 1000, pollMs: 5_000 })` for each child.
3. `waitForCompletion` polls `gbrain_jobs.status` for that id, expecting a worker to pick it up and transition it through `running` → `completed` / `failed`.
4. **On PGLite there is no worker.** `gbrain jobs work` refuses to start with: *"Error: Worker daemon requires Postgres. PGLite uses an exclusive file lock that blocks other processes."*
5. The submitted jobs sit at status `waiting` because nothing ever claims them. The orchestrator polls each one every 5s for up to 35 min; only then does the minion's TimeoutError fire and the status become `timeout`. With 5 transcripts waited on one after another, that is 5 × 35 = 175 min, nearly 3 hours of wall time.
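
The stall described in steps 3 to 5 can be reproduced in isolation. This is a minimal sketch, not gbrain's code: `FakeQueue`, `FakeClock`, and `waitForCompletionSketch` are hypothetical stand-ins, and it assumes the orchestrator waits on children one after another (which the 3-hour figure implies). Only the 35-minute timeout, the 5-second poll interval, and the `waiting` status come from the trace above.

```typescript
// Hypothetical stand-ins for illustration; not gbrain's actual types.
type JobStatus = "waiting" | "running" | "completed" | "failed" | "timeout";

class FakeQueue {
  private jobs = new Map<number, JobStatus>();
  private nextId = 1;

  add(): number {
    const id = this.nextId++;
    this.jobs.set(id, "waiting"); // new jobs start in `waiting`
    return id;
  }
  status(id: number): JobStatus {
    return this.jobs.get(id) ?? "failed";
  }
  set(id: number, s: JobStatus): void {
    this.jobs.set(id, s);
  }
}

// Virtual clock so the sketch finishes instantly instead of really sleeping.
class FakeClock {
  nowMs = 0;
  sleep(ms: number): void {
    this.nowMs += ms;
  }
}

// Polls until the job reaches a terminal state or the timeout elapses.
function waitForCompletionSketch(
  queue: FakeQueue,
  jobId: number,
  clock: FakeClock,
  opts: { timeoutMs: number; pollMs: number },
): JobStatus {
  const deadline = clock.nowMs + opts.timeoutMs;
  while (clock.nowMs < deadline) {
    const s = queue.status(jobId);
    if (s === "completed" || s === "failed") return s;
    clock.sleep(opts.pollMs); // no worker exists, so the status never changes
  }
  queue.set(jobId, "timeout");
  return "timeout";
}

const queue = new FakeQueue();
const clock = new FakeClock();
const opts = { timeoutMs: 35 * 60 * 1000, pollMs: 5_000 };

// Five transcripts, waited on sequentially, with no worker running.
const jobIds = Array.from({ length: 5 }, () => queue.add());
const results = jobIds.map((id) => waitForCompletionSketch(queue, id, clock, opts));

const totalMinutes = clock.nowMs / 60_000;
console.log(results, `${totalMinutes} min of simulated wall time`);
```

Every job ends in `timeout` and the simulated clock advances by the full timeout for each one, which is why a killed run after 3 minutes shows no progress at all.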

Confirmed by inspecting queue state during/after a hung run:

```
$ gbrain jobs list
  ID     Name           Status               Queue      Time     Created
  324    subagent       waiting              default    —        2026-05-22T15:57:06
  323    subagent       waiting              default    —        2026-05-22T15:57:06
  ... (324 stuck jobs from previous attempts) ...

$ gbrain jobs stats
  Queue health: 324 waiting, 0 active, 0 stalled
```

(I cancelled all 324 via `gbrain jobs cancel`; the queue stays clean until the next synthesize run repopulates it with new waiters.)

## Why this wasn't caught earlier

- Postgres users have a worker daemon (`gbrain jobs work`) running alongside the cycle, so the same code path works for them.
- The PGLite engine's documentation rightly says it uses an exclusive file lock that prevents a separate worker — but the synthesize phase wasn't gated on engine type when v0.38+ moved the work into the minion queue.
- `gbrain doctor --fast` doesn't flag this (passes with 90/100 on the broken setup).
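
A doctor check for this condition could be quite small. The sketch below is hypothetical: `EngineKind`, `QueueStats`, `DoctorFinding`, and `checkOrphanedJobs` are names invented for illustration, since the real `gbrain doctor` internals are not shown in this report. It only encodes the combination observed here: PGLite engine, jobs in `waiting`, nothing active.

```typescript
// Hypothetical shapes; not taken from gbrain's source.
type EngineKind = "pglite" | "postgres";

interface QueueStats {
  waiting: number;
  active: number;
  stalled: number;
}

interface DoctorFinding {
  level: "ok" | "warn";
  message: string;
}

// Flags the combination that produces the hang: PGLite (no worker can run)
// plus jobs sitting in `waiting` while nothing is active.
function checkOrphanedJobs(engine: EngineKind, stats: QueueStats): DoctorFinding {
  if (engine === "pglite" && stats.waiting > 0 && stats.active === 0) {
    return {
      level: "warn",
      message:
        `${stats.waiting} job(s) waiting, but PGLite cannot run a worker daemon; ` +
        "they will not be processed",
    };
  }
  return { level: "ok", message: "queue has no orphaned jobs" };
}

// Numbers taken from the `gbrain jobs stats` output above.
const finding = checkOrphanedJobs("pglite", { waiting: 324, active: 0, stalled: 0 });
console.log(finding.level, finding.message);
```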

## Expected behavior

Either:
1. **Run subagent jobs inline on PGLite.** Synthesize should detect `engine.kind === 'pglite'` and execute children synchronously in the orchestrator process instead of submitting to the queue. (This is what v0.37 did, and what `gbrain jobs submit <name> --follow` does today.)
2. **Skip the phase with a clear error.** If the phase architecturally requires a worker, `loadSynthConfig` or the phase entrypoint should return `failed('synthesize requires worker daemon (Postgres engine); current engine: pglite')` so users see what's wrong instead of an indefinite hang.
3. **Provide a PGLite-compatible worker.** Bracket the worker around the cycle: orchestrator releases the writer lock, worker takes it, processes one job, releases, orchestrator resumes. (Heavier surgery.)
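
Options 1 and 2 both come down to deciding how to dispatch before anything is enqueued. A minimal sketch of that decision follows, assuming an engine discriminator like the `engine.kind` mentioned above; `chooseDispatch`, `Dispatch`, and the `inlineSupported` flag are hypothetical names, not gbrain's API.

```typescript
// Hypothetical names; a sketch of options 1 and 2, not gbrain's actual API.
type EngineKind = "pglite" | "postgres";

type Dispatch =
  | { mode: "queue" }
  | { mode: "inline" }
  | { mode: "fail"; reason: string };

// Decides how synthesize children should run before any job is submitted.
function chooseDispatch(engineKind: EngineKind, inlineSupported: boolean): Dispatch {
  if (engineKind === "postgres") {
    // A separate worker daemon can drain the queue.
    return { mode: "queue" };
  }
  if (inlineSupported) {
    // Option 1: no worker can exist, so run children in the orchestrator process.
    return { mode: "inline" };
  }
  // Option 2: fail fast with an actionable message instead of polling forever.
  return {
    mode: "fail",
    reason: `synthesize requires worker daemon (Postgres engine); current engine: ${engineKind}`,
  };
}

const onPglite = chooseDispatch("pglite", true);
const onPgliteNoInline = chooseDispatch("pglite", false);
const onPostgres = chooseDispatch("postgres", false);
console.log(onPglite.mode, onPgliteNoInline.mode, onPostgres.mode);
```

Keeping the decision in one pure function means the phase entrypoint can be tested for every engine without starting a queue or a worker.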

## Workarounds for users hitting this today

```bash
# Disable synthesize entirely (other cycle phases run cleanly)
gbrain config set dream.synthesize.enabled false
```

This bypasses the broken code path. Sync, embed, lint, backlinks, doctor all still run on the nightly cycle.

To force-release a stale `gbrain-cycle` lock left by a killed synthesize hang:

```sql
-- Connect to ~/.gbrain/brain.pglite directly, then run:
DELETE FROM gbrain_cycle_locks WHERE id = 'gbrain-cycle';
-- Otherwise the 30-min TTL has to expire.
```

A `gbrain doctor --release-stale-locks` (or similar admin command) would help here.
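
The selection logic behind such a command might look like the sketch below. It is hypothetical: the `CycleLock` fields `acquiredAtMs` and `holderPid` are assumed columns (only the table name, the `gbrain-cycle` id, and the 30-minute TTL appear in this report), and the liveness probe is injected so the logic stays testable. In Node or Bun, `process.kill(pid, 0)` could serve as that probe.

```typescript
// Hypothetical row shape; `acquiredAtMs` and `holderPid` are assumed columns.
interface CycleLock {
  id: string;
  acquiredAtMs: number;
  holderPid: number;
}

const LOCK_TTL_MS = 30 * 60 * 1000; // the 30-min TTL mentioned above

// Returns the ids of locks that are safe to delete: either the TTL has
// expired, or the process that took the lock is no longer running.
function findReleasableLocks(
  locks: CycleLock[],
  nowMs: number,
  isAlive: (pid: number) => boolean,
): string[] {
  return locks
    .filter((l) => nowMs - l.acquiredAtMs >= LOCK_TTL_MS || !isAlive(l.holderPid))
    .map((l) => l.id);
}

const nowMs = 60 * 60 * 1000;
const locks: CycleLock[] = [
  // Taken 3 minutes ago by a process that was later SIGKILLed.
  { id: "gbrain-cycle", acquiredAtMs: nowMs - 3 * 60 * 1000, holderPid: 58579 },
  // Taken 5 minutes ago by a process that is still running (illustrative id).
  { id: "example-live", acquiredAtMs: nowMs - 5 * 60 * 1000, holderPid: 100 },
  // Holder still running, but the TTL has expired (illustrative id).
  { id: "example-expired", acquiredAtMs: nowMs - 31 * 60 * 1000, holderPid: 100 },
];

// Stub probe for the example: only pid 100 is considered alive.
const releasable = findReleasableLocks(locks, nowMs, (pid) => pid === 100);
console.log(releasable);
```

Checking holder liveness as well as the TTL lets a lock left by a killed run be released immediately rather than after the full 30 minutes.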

## Related

- #223: macOS WASM crash. **Not the same bug**, but easy to confuse: rebuilding via the #223 workaround does NOT resolve the synthesize hang. I rebuilt with `gbrain init --pglite` and re-imported all 7178 pages cleanly, and the hang reproduces immediately on the fresh brain.
- v0.38.1.0 commit message ("provider-agnostic subagent loop + remote MCP dispatch + budget meter") — likely where the regression entered.

Happy to provide additional diagnostics (full sample trace, jobs table dump, etc.) or test a patch.

