gbrain dream (full cycle or --phase synthesize alone) deterministically hangs at the start of [cycle.synthesize] on PGLite with zero apparent cause: idle main thread, no TCP sockets, no Anthropic API calls in flight, no child processes. Reproduced on v0.37.3.0 and v0.39.0.0 against both an established and a freshly-rebuilt brain.
This appears to be a regression in the v0.38.1 "provider-agnostic subagent loop" rebuild on PGLite engines specifically. The PGLite engine does not run a worker daemon (per gbrain jobs work error: "Worker daemon requires Postgres"), but the v0.38+ synthesize phase submits subagent jobs to the queue and waitForCompletion-polls them indefinitely. Without a worker, the jobs stay in waiting state and the cycle never advances.
Environment
OS: macOS 26.5 (Darwin 25.5.0), arm64
gbrain: 0.39.0.0
Bun: 1.3.14
Engine: PGLite (zero-config default from gbrain init --pglite)
Brain size: 7,178 pages / 20,089 chunks
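Corpus: dream.synthesize.session_corpus_dir = ~/brain/raw-transcripts/ (5 .txt files for the minimal repro; same hang at 397 files)
Embedding: openai:text-embedding-3-large / 1536 dims
Models: models.dream.synthesize = anthropic:claude-sonnet-4-6, models.dream.patterns = anthropic:claude-haiku-4-5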
Reproduction
# Fresh brain init (not a corrupted-WAL case; verified by a full rebuild using the #223 workaround)
gbrain init --pglite --path "$HOME/.gbrain/brain.pglite" --json
gbrain import ~/brain --no-embed # 7178 pages, 20089 chunks, 0 errors
gbrain embed --stale # all chunks embedded
# Configure synthesize
gbrain config set dream.synthesize.session_corpus_dir ~/brain/raw-transcripts
gbrain config set models.dream.synthesize anthropic:claude-sonnet-4-6
# Trigger
gbrain dream --phase synthesize --dry-run --json
# Hangs indefinitely. Killed after 3 min via SIGKILL.
Observed process state during hang
$ ps -p 58579 -o pid,etime,%cpu,rss
PID ELAPSED %CPU RSS
58579 03:06 0.5 460064
$ lsof -p 58579 -i # zero entries — no network
(empty)
$ pgrep -P 58579 # zero entries — no children
(empty)
$ sample 58579 2 -mayDie
Call graph:
1723 Thread_5344885 DispatchQueue_1: com.apple.main-thread (serial)
+ 1723 start (in dyld) + 6992 [0x18270be00]
+ 1723 ??? (in bun) load address 0x100be0000 + 0x9bdd0c
# Main thread parked in kevent64 — classic Bun event-loop wait with nothing scheduled
Last log line before the hang, every time:
[cycle.synthesize] start
[dream] model "anthropic:claude-sonnet-4-6" is not in MODEL_CONTEXT_TOKENS; using 180000-token fallback budget. Set dream.synthesize.max_prompt_tokens to override.
No further output. Process consumes ~2GB RSS over time but does no work.
Root cause analysis
After tracing through src/core/cycle/synthesize.ts and src/core/minions/queue.ts:
Synthesize fans out one subagent job per worth-processing transcript via MinionQueue.add() with allowProtectedSubmit: true.
After submission, it calls waitForCompletion(queue, jobId, { timeoutMs: 35 * 60 * 1000, pollMs: 5_000 }) for each child.
waitForCompletion polls gbrain_jobs.status for that id, expecting a worker to pick it up and transition it through running → completed / failed.
On PGLite there is no worker. gbrain jobs work refuses to start with: "Error: Worker daemon requires Postgres. PGLite uses an exclusive file lock that blocks other processes."
The submitted jobs sit at status waiting indefinitely. The orchestrator polls each one every 5s until the 35-minute timeout expires, at which point the minion's TimeoutError fires and the status becomes timeout. With 5 transcripts that adds up to 5 × 35 = 175 minutes, nearly 3 hours of wall time.
Confirmed by inspecting queue state during/after a hung run:
$ gbrain jobs list
ID Name Status Queue Time Created
324 subagent waiting default — 2026-05-22T15:57:06
323 subagent waiting default — 2026-05-22T15:57:06
... (324 stuck jobs from previous attempts) ...
$ gbrain jobs stats
Queue health: 324 waiting, 0 active, 0 stalled
(I cancelled all 324 via gbrain jobs cancel; the queue stays clean until the next synthesize run repopulates it with new waiters.)
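The polling behaviour from the root cause analysis can be modelled with a small virtual-clock simulation. This is only a sketch, not gbrain's code: simulateWait, JobStatus, WaitResult and the statusAt callback are hypothetical names, and the 20-second worker completion time is an arbitrary assumption. Only the 35-minute timeout and the 5-second poll interval come from the waitForCompletion call quoted above.

```typescript
// Hypothetical model of the wait loop; not gbrain's actual implementation.
type JobStatus = "waiting" | "running" | "completed" | "failed";

interface WaitResult {
  outcome: "completed" | "failed" | "timeout";
  elapsedMs: number;
  polls: number;
}

// Steps a virtual clock instead of sleeping, so the result is deterministic.
// statusAt(t) returns the job status a poll would observe at virtual time t.
function simulateWait(
  statusAt: (elapsedMs: number) => JobStatus,
  timeoutMs: number,
  pollMs: number,
): WaitResult {
  let polls = 0;
  for (let elapsedMs = 0; elapsedMs < timeoutMs; elapsedMs += pollMs) {
    polls += 1;
    const status = statusAt(elapsedMs);
    if (status === "completed" || status === "failed") {
      return { outcome: status, elapsedMs, polls };
    }
  }
  // Nothing moved the job to a terminal state before the deadline.
  return { outcome: "timeout", elapsedMs: timeoutMs, polls };
}

const TIMEOUT_MS = 35 * 60 * 1000;
const POLL_MS = 5_000;

// PGLite case: no worker exists, so the status never leaves "waiting".
const noWorker = simulateWait(() => "waiting", TIMEOUT_MS, POLL_MS);

// Postgres case: a worker is assumed to finish the job after 20 seconds.
const withWorker = simulateWait(
  (t) => (t >= 20_000 ? "completed" : "running"),
  TIMEOUT_MS,
  POLL_MS,
);

// Five transcripts, each waited on for the full timeout.
const transcripts = 5;
const totalMinutes = (transcripts * noWorker.elapsedMs) / 60_000;

console.log(noWorker.outcome, noWorker.polls, totalMinutes);
console.log(withWorker.outcome, withWorker.elapsedMs, withWorker.polls);
```

In this model the no-worker job is polled 420 times before timing out, and five such jobs account for 175 minutes, which matches the "nearly 3 hours" estimate.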
Why this wasn't caught earlier
Postgres users have a worker daemon (gbrain jobs work) running alongside the cycle, so the same code path works for them.
The PGLite engine's documentation rightly says it uses an exclusive file lock that prevents a separate worker — but the synthesize phase wasn't gated on engine type when v0.38+ moved the work into the minion queue.
gbrain doctor --fast doesn't flag this (passes with 90/100 on the broken setup).
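A doctor-style check for this condition could be quite small. The following is a sketch under assumptions: checkSynthesizeHasWorker, DoctorInput and DoctorFinding are hypothetical names and shapes, not part of gbrain. The only inputs it relies on are the engine type and the dream.synthesize.enabled setting mentioned in this report.

```typescript
// Hypothetical doctor check: names and shapes are illustrative, not gbrain's API.
type EngineKind = "pglite" | "postgres";

interface DoctorInput {
  engineKind: EngineKind;
  synthesizeEnabled: boolean; // mirrors the dream.synthesize.enabled setting
}

interface DoctorFinding {
  level: "ok" | "error";
  message: string;
}

function checkSynthesizeHasWorker(input: DoctorInput): DoctorFinding {
  // PGLite cannot host a worker daemon, so queued subagent jobs would never run.
  if (input.engineKind === "pglite" && input.synthesizeEnabled) {
    return {
      level: "error",
      message:
        "dream.synthesize is enabled but the pglite engine has no worker daemon; " +
        "queued subagent jobs will stay in waiting",
    };
  }
  return { level: "ok", message: "no engine-level blocker for synthesize" };
}

const broken = checkSynthesizeHasWorker({ engineKind: "pglite", synthesizeEnabled: true });
const disabled = checkSynthesizeHasWorker({ engineKind: "pglite", synthesizeEnabled: false });
const postgres = checkSynthesizeHasWorker({ engineKind: "postgres", synthesizeEnabled: true });

console.log(broken.level, disabled.level, postgres.level);
```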
Expected behavior
One of:
Run subagent jobs inline on PGLite. Synthesize should detect engine.kind === 'pglite' and execute children synchronously in the orchestrator process instead of submitting to the queue. (This is what v0.37 did, and what gbrain jobs submit <name> --follow does today.)
Skip the phase with a clear error. If the phase architecturally requires a worker, loadSynthConfig or the phase entrypoint should return failed('synthesize requires worker daemon (Postgres engine); current engine: pglite') so users see what's wrong instead of an indefinite hang.
Provide a PGLite-compatible worker. Bracket the worker around the cycle: orchestrator releases the writer lock, worker takes it, processes one job, releases, orchestrator resumes. (Heavier surgery.)
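The first two options can be sketched as a single dispatch decision at the phase entrypoint. This is an illustration only: runSynthesize, SynthDeps, PhaseResult, runInline and submitAndWait are hypothetical names, and the handlers are synchronous here purely to keep the example short, whereas real handlers would be async. The failure message is the one proposed above.

```typescript
// Hypothetical sketch of options 1 and 2; none of these names are gbrain's real API.
type EngineKind = "pglite" | "postgres";

type PhaseResult =
  | { status: "ok"; summaries: string[] }
  | { status: "failed"; reason: string };

interface SynthDeps {
  engineKind: EngineKind;
  // Option 1: runs one subagent job inside the orchestrator process.
  runInline?: (transcript: string) => string;
  // Queue path: submits a job and waits for a worker to finish it.
  submitAndWait: (transcript: string) => string;
}

function runSynthesize(transcripts: string[], deps: SynthDeps): PhaseResult {
  if (deps.engineKind === "pglite") {
    const runInline = deps.runInline;
    if (!runInline) {
      // Option 2: fail fast with a clear reason instead of queueing forever.
      return {
        status: "failed",
        reason:
          "synthesize requires worker daemon (Postgres engine); current engine: pglite",
      };
    }
    // Option 1: no queue and no worker involved.
    return { status: "ok", summaries: transcripts.map((t) => runInline(t)) };
  }
  // Postgres: a worker daemon is expected to drain the queue.
  return { status: "ok", summaries: transcripts.map((t) => deps.submitAndWait(t)) };
}

const neverQueue = (): string => {
  throw new Error("queue must not be used on pglite");
};

const inlineResult = runSynthesize(["a.txt", "b.txt"], {
  engineKind: "pglite",
  runInline: (t) => `inline:${t}`,
  submitAndWait: neverQueue,
});

const failFastResult = runSynthesize(["a.txt"], {
  engineKind: "pglite",
  submitAndWait: neverQueue,
});

const queuedResult = runSynthesize(["a.txt"], {
  engineKind: "postgres",
  submitAndWait: (t) => `queued:${t}`,
});

console.log(inlineResult, failFastResult, queuedResult);
```

Either branch returns promptly on PGLite, so the cycle can advance or report a failure instead of polling jobs that no worker will pick up.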
Workarounds for users hitting this today
# Disable synthesize entirely (other cycle phases run cleanly)
gbrain config set dream.synthesize.enabled false
This bypasses the broken code path. Sync, embed, lint, backlinks, doctor all still run on the nightly cycle.
To force-release a stale gbrain-cycle lock left by a killed synthesize hang:
// Connect to ~/.gbrain/brain.pglite directly, run:
// DELETE FROM gbrain_cycle_locks WHERE id = 'gbrain-cycle';
// Otherwise the 30-min TTL has to expire.
A gbrain doctor --release-stale-locks (or similar admin command) would help here.
Related
#223 (PGLite WASM crash on macOS 26.3 with Bun 1.3.11). Not the same bug, but easy to confuse: rebuilding with the #223 workaround does NOT resolve the synthesize hang. I rebuilt with gbrain init --pglite and re-imported all 7178 pages cleanly, and the hang reproduces immediately on the fresh brain.
v0.38.1.0 commit message ("provider-agnostic subagent loop + remote MCP dispatch + budget meter") — likely where the regression entered.
Happy to provide additional diagnostics (full sample trace, jobs table dump, etc.) or test a patch.