Description
The minion worker subprocess (gbrain jobs work) spawned by gbrain autopilot crashes repeatedly because engine.connect() has not been called before the worker's main loop attempts to access the database.
This prevents ALL autopilot cycles from completing, and causes data loss during the extract phase.
Environment
- gbrain version: 0.41.14.0
- Engine: Postgres
- Config: agent.use_gateway_loop=true, chat_model: deepseek:deepseek-v4-flash
- OS: Linux (WSL2 Ubuntu)
Symptoms
1. Worker crash loop (51 crashes recorded)
Promotion error: No database connection: connect() has not been called. Fix: Run gbrain init --supabase or gbrain init --url <connection_string>
[autopilot] worker exited code=1 signal=null after 1409ms, crashCount=4, cause=runtime_error
[autopilot] crash backoff 8111ms (crashCount=4)
Error counts:
- 51 worker crashes
- 265 "Promotion error" in promoteDelayed()
- 446 total "No database connection" errors
- 52 batch link row losses during extraction
2. Data loss during extract.links_fs
[extract.links_fs] 60/60 (100%)
[extract.links_fs] connection blip, retrying 26 rows in 500ms (No database connection: connect() has not been called. Fix: Run gbrain init --supabase or gbrain init --url <connection_string>)
batch error (26 link rows lost): No database connection: connect() has not been called. Fix: Run gbrain init --supabase or gbrain init --url <connection_string>
[extract.links_fs] 60/60 (100%) done
Links: created 0 from 60 pages
3. Unhandled rejection during conversation_facts_backfill
[cycle.conversation_facts_backfill] start
[unhandledRejection] GBrainError: No database connection: connect() has not been called. Fix: Run gbrain init --supabase or gbrain init --url <connection_string>
at getConnection (/.../gbrain/src/core/db.ts:153:15)
at transaction (/.../gbrain/src/core/postgres-engine.ts:764:34)
at failJob (/.../gbrain/src/core/minions/queue.ts:855:24)
at executeJob (/.../gbrain/src/core/minions/worker.ts:831:39)
at processTicksAndRejections (native:7:39)
4. All sources never complete a full cycle
[FAIL] cycle_freshness: Source 'aevum' has never completed a full cycle;
Source 'agent-arch' has never completed a full cycle; ... (all 9 sources)
Brain score permanently stuck at 51/100 because link extraction (3/25), timeline extraction (2/15), and orphan resolution (1/15) cannot complete.
Root Cause Analysis
The main gbrain CLI process correctly calls createEngine() → connectWithRetry() at startup (src/cli.ts:1687-1691). However, the supervisor (src/core/minions/supervisor.ts) spawns the minion worker as a separate bun gbrain jobs work subprocess. This child process also goes through the full CLI initialization and should call connectWithRetry(), but something in the worker's startup path means the connection is not reliably established before promoteDelayed() or executeJob() attempts DB access.
The worker's queue.promoteDelayed() runs in the main loop before any job is claimed (src/core/minions/worker.ts:434). If the engine's connection pool isn't ready at that point, the error is caught and surfaces as a "Promotion error", but the worker later crashes with an unhandled rejection when executeJob() calls failJob(), which in turn reaches getConnection() (see the stack trace in symptom 3).
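To illustrate the crash path described above, here is a minimal sketch of how a job runner could keep a failing failure-handler from escaping as an unhandled rejection. It is not the actual worker.ts code: executeJobSafely, the Job shape, the Outcome values, and the run/failJob/log parameters are all hypothetical names chosen for this example.

```typescript
// Hypothetical job shape; the real minion job record likely has more fields.
type Job = { id: string };

type Outcome = "ok" | "failed" | "fail_unrecorded";

// Runs a job and records a failure if it throws. If recording the failure
// also throws (for example because there is no database connection), the
// error is logged and reported via the return value instead of propagating.
async function executeJobSafely(
  job: Job,
  run: (job: Job) => Promise<void>,
  failJob: (job: Job, err: unknown) => Promise<void>,
  log: (msg: string) => void,
): Promise<Outcome> {
  try {
    await run(job);
    return "ok";
  } catch (err) {
    try {
      await failJob(job, err);
      return "failed";
    } catch (recordErr) {
      // Without this inner catch, the rejection from failJob would escape.
      log(`could not record failure for job ${job.id}: ${String(recordErr)}`);
      return "fail_unrecorded";
    }
  }
}
```

This only contains the crash; the job's failure state would still be lost until the connection problem itself is fixed.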
Contrast: Main process works fine
The parent autopilot process and direct gbrain dream invocations work correctly — DB access succeeds. Only the minion worker child process experiences this.
Suggested Fix Direction
The worker's main loop should verify that engine.connect() has been called before entering the promoteDelayed() / claim / execute cycle. Alternatively, the connection should be lazily established on first use rather than requiring an explicit connect() call.
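Both options above can be sketched with one small wrapper. This is an illustration under stated assumptions, not gbrain's actual engine API: the Engine interface below only assumes an async connect() method, and LazyConnection, ensureConnected, and runWorkerTick are hypothetical names.

```typescript
// Hypothetical minimal engine shape; the real engine interface may differ.
interface Engine {
  connect(): Promise<void>;
}

// Ensures connect() runs at most once per successful connection, on first use.
class LazyConnection {
  private engine: Engine;
  private pending: Promise<void> | null = null;

  constructor(engine: Engine) {
    this.engine = engine;
  }

  // Resolves once the engine is connected. Concurrent callers share the
  // same in-flight attempt instead of each calling connect().
  ensureConnected(): Promise<void> {
    if (this.pending === null) {
      this.pending = this.engine.connect().catch((err) => {
        // Reset so a later call can retry after a failed attempt.
        this.pending = null;
        throw err;
      });
    }
    return this.pending;
  }
}

// Hypothetical loop guard: no tick (promote / claim / execute) runs
// until the connection has been established.
async function runWorkerTick(
  conn: LazyConnection,
  tick: () => Promise<void>,
): Promise<void> {
  await conn.ensureConnected();
  await tick();
}
```

Memoizing the promise rather than a boolean flag means two callers that race at startup wait on the same connection attempt.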
A workaround might be to pass GBRAIN_DATABASE_URL as an environment variable so the worker can self-initialize without depending on parent state.
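A rough sketch of that workaround, split into a supervisor-side and a worker-side helper. Whether gbrain actually reads GBRAIN_DATABASE_URL is an assumption carried over from the suggestion above, and buildWorkerEnv and resolveDatabaseUrl are hypothetical helper names, not existing gbrain functions.

```typescript
type Env = Record<string, string | undefined>;

// Supervisor side: build the child's environment with the connection
// string set explicitly, so the worker does not depend on parent state.
function buildWorkerEnv(parentEnv: Env, databaseUrl: string): Env {
  if (databaseUrl.trim() === "") {
    throw new Error("databaseUrl must be non-empty");
  }
  return { ...parentEnv, GBRAIN_DATABASE_URL: databaseUrl };
}

// Worker side: fail fast with a clear message at startup instead of
// crashing later inside the main loop.
function resolveDatabaseUrl(env: Env): string {
  const url = env["GBRAIN_DATABASE_URL"];
  if (url === undefined || url.trim() === "") {
    throw new Error("GBRAIN_DATABASE_URL is not set for the worker process");
  }
  return url;
}
```

The object returned by buildWorkerEnv could be passed as the env option when the supervisor spawns the bun gbrain jobs work subprocess.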
Related