Summary
When the autopilot hits its 5-consecutive-cycle-failure threshold and logs `5 consecutive cycle failures. Stopping autopilot.`, the process does not actually exit. Instead it enters a reconnect-retry loop that logs three lines every 5 minutes forever:
[2026-05-18T15:16:53] [reconnect] ERROR: undefined is not an object (evaluating 'config.database_url')
[2026-05-18T15:16:53] [dispatch] ERROR: No database connection: connect() has not been called. Fix: Run gbrain init --supabase or gbrain init --url <connection_string>
[2026-05-18T15:16:53] [health] ERROR: No database connection: connect() has not been called. Fix: Run gbrain init --supabase or gbrain init --url <connection_string>
`~/.gbrain/config.json` on disk remains valid throughout (e.g. `gbrain query` from the same machine works fine). The autopilot's in-process config reference has become undefined, and the reconnect path doesn't re-read the file from disk before retrying.
Repro
- Create connection pressure that causes autopilot cycles to fail. In my case the trigger was Supabase session-pool exhaustion (`(EMAXCONNSESSION) max clients reached in session mode - max clients are limited to pool_size: 15`) from `gbrain serve` processes accumulated over several days; see companion issue [link to serve-accumulation issue]. Any sustained DB-unreachability of similar shape should reproduce.
- Wait for 5 consecutive `autopilot-cycle` job failures.
- Observe `5 consecutive cycle failures. Stopping autopilot.` in `~/.gbrain/autopilot.err`.
- `ps aux | grep gbrain.*autopilot`: the process is still running.
- `tail -f ~/.gbrain/autopilot.err`: the three-line reconnect/dispatch/health error block repeats every 5 minutes (the autopilot interval), forever.
Diagnosis
Stack trace from the most recent occurrence:
GBrainError: No database connection: connect() has not been called.
at getConnection (src/core/db.ts:153)
at transaction (src/core/postgres-engine.ts:524)
at failJob (src/core/minions/queue.ts:846)
at executeJob (src/core/minions/worker.ts:715)
The `[reconnect]` log line specifically reads `undefined is not an object (evaluating 'config.database_url')`, meaning the in-process `config` reference itself is undefined at the moment of retry, not just the connection. This points to a state-handoff bug in the reconnect path rather than a transient network issue:
- The cycle-failure threshold path probably nulls or scopes-out the config object before triggering "Stopping autopilot."
- The reconnect retry then runs against that nulled config, fails on the dereference, and the process stays alive in a non-functional zombie state.
- The "Stopping autopilot." message is misleading — it logs intent but doesn't actually terminate the process or restart with a fresh config-load.
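To make the hypothesis above concrete, here is a minimal sketch of the suspected failure mode. It is not taken from the gbrain source: the module-level `config` variable and the functions `stopAutopilot`, `reconnect`, and `tryReconnect` are hypothetical names I made up for illustration. It only shows how clearing shared state without exiting leaves every later retry failing on the same dereference.

```typescript
// Hypothetical reduction of the suspected bug; none of these names come from gbrain.
interface Config {
  database_url: string;
}

// Assumed: the autopilot keeps its config in process-wide state.
let config: Config | undefined = { database_url: "postgres://example" };

// Suspected behaviour: the "stop" path clears state but the process keeps running.
function stopAutopilot(): void {
  config = undefined;
}

// The retry path assumes config is always set (hence the non-null assertion).
function reconnect(): string {
  return config!.database_url; // throws TypeError once config is undefined
}

// Each retry catches the error, logs it, and waits for the next interval.
function tryReconnect(): string {
  try {
    return `connected to ${reconnect()}`;
  } catch (err) {
    return `[reconnect] ERROR: ${(err as Error).message}`;
  }
}
```

After `stopAutopilot()` runs, every `tryReconnect()` call returns an error line, and nothing in this shape ever restores `config`. The exact wording of the TypeError depends on the JavaScript engine, so the message text here is not asserted.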
Workaround (verified 2026-05-18)
# Force-kill the zombie autopilot
kill -KILL <pid>
rm -f ~/.gbrain/autopilot.lock
# Whatever watchdog you have (launchd / cron / systemd) respawns autopilot
# with a clean config-load and the loop is gone. Verified next cycle reports:
# [cycle] score=70 elapsed=1s next=300s
Suggested fix directions
- After "Stopping autopilot." actually exit with non-zero code so the watchdog can respawn cleanly. Don't drop into a reconnect loop.
- OR: have the reconnect path re-read `~/.gbrain/config.json` from disk on each retry instead of relying on the in-process config reference.
- OR: when `config` is detected undefined, treat as fatal and exit instead of continuing to retry against an undefined reference.
Any of the three closes the loop. (1) is probably cleanest — preserves the watchdog contract.
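A rough sketch of what the three directions could look like, under stated assumptions: `handleCycleFailure`, `planReconnect`, `RetryDecision`, and the injected `exit` and `loadConfig` callbacks are hypothetical names, not gbrain APIs. In the real process, `exit` would presumably be `process.exit` and `loadConfig` would re-parse `~/.gbrain/config.json`; they are injected here so the logic can be exercised without touching the process or the disk.

```typescript
// Hypothetical sketch of the fix directions; names are illustrative, not gbrain's.
interface Config {
  database_url?: string;
}

type RetryDecision =
  | { kind: "connect"; url: string }
  | { kind: "fatal"; reason: string };

// Direction 1: at the failure threshold, exit non-zero so the watchdog respawns.
function handleCycleFailure(
  consecutiveFailures: number,
  exit: (code: number) => void,
  threshold: number = 5,
): void {
  if (consecutiveFailures >= threshold) {
    exit(1);
  }
}

// Directions 2 and 3: load config fresh on every retry, and treat a missing
// config (or missing database_url) as fatal instead of retrying against it.
function planReconnect(loadConfig: () => Config | undefined): RetryDecision {
  let config: Config | undefined;
  try {
    config = loadConfig();
  } catch (err) {
    return { kind: "fatal", reason: `config load failed: ${String(err)}` };
  }
  if (config === undefined || !config.database_url) {
    return { kind: "fatal", reason: "config or database_url is missing" };
  }
  return { kind: "connect", url: config.database_url };
}
```

The caller would exit non-zero on a `fatal` decision and attempt the connection on `connect`. Returning a decision value instead of exiting inside `planReconnect` keeps the exit policy in one place.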
Environment
- gbrain: 0.33.1.1
- Bun: 1.3.11
- Platform: macOS Darwin 25.5.0 (arm64)
- Engine: postgres (Supabase, session-mode pooler port 5432)
- Discovered while diagnosing a Supabase session-pool exhaustion incident from accumulated `serve` processes; the upstream pool issue was unrelated but exposed this autopilot reconnect bug.
Happy to PR option 1 if the maintainer team agrees on the direction.