Summary
gbrain sync --break-lock --max-age N exists and self-heals wedged sync locks.
The cycle lock (gbrain-cycle in gbrain_cycle_locks) has no equivalent CLI
escape hatch. When a dream/autopilot cycle gets stuck (cancelled job, embed phase
ignoring abort signal, timeout), the cycle lock stays held indefinitely and every
subsequent gbrain dream skips with cycle_already_running.
This is documented in TODOS.md as a known issue. Garry hits it daily on his
production brain.
Reproduction
- Run gbrain dream, which starts an autopilot-cycle job.
- Cancel the job mid-run (or let it time out during the embed phase).
- The gbrain_cycle_locks row stays held (ttl_expires_at is in the future, or the autopilot daemon refreshes it before the TTL expires).
- Every subsequent gbrain dream → cycle_already_running → skipped.
- gbrain sync --break-lock only targets gbrain-sync, not gbrain-cycle.
- No CLI command exists to break the cycle lock.
Current workaround
Manually DELETE FROM public.gbrain_cycle_locks in the database, then
immediately run gbrain dream --phase X for individual phases. This races the
autopilot daemon, which re-acquires the lock within seconds.
Proposed fix
One or more of:
Option A: gbrain dream --break-lock [--max-age N]
Mirror the gbrain sync --break-lock --max-age N pattern for the cycle lock.
Clears any gbrain-cycle lock in gbrain_cycle_locks where last_refreshed_at
is older than N seconds, then proceeds with the dream cycle.
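A minimal sketch of the staleness check behind Option A, not the actual gbrain implementation. Only the table name and the last_refreshed_at column come from this issue. The CycleLockRow shape, the id key column, and both function names are hypothetical.

```typescript
// Hypothetical row shape. Only last_refreshed_at is named in the issue;
// the "id" key column is an assumption made for illustration.
interface CycleLockRow {
  id: string;
  lastRefreshedAt: Date;
}

// True when the lock has gone more than maxAgeSeconds without a refresh.
function isOlderThan(
  lock: CycleLockRow,
  maxAgeSeconds: number,
  now: Date = new Date(),
): boolean {
  const ageMs = now.getTime() - lock.lastRefreshedAt.getTime();
  return ageMs > maxAgeSeconds * 1000;
}

// Builds a parameterized DELETE that removes the lock only if it is stale,
// so a healthy, recently refreshed cycle is never interrupted.
// The "id" column name is assumed, not taken from the real schema.
function buildBreakLockQuery(
  lockId: string,
  maxAgeSeconds: number,
): { text: string; values: (string | number)[] } {
  return {
    text:
      "DELETE FROM public.gbrain_cycle_locks " +
      "WHERE id = $1 " +
      "AND last_refreshed_at < now() - make_interval(secs => $2)",
    values: [lockId, maxAgeSeconds],
  };
}
```

Putting the age condition in the DELETE itself, rather than reading the row first and deleting afterwards, avoids a window in which a live cycle refreshes the lock between the check and the delete.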
Option B: Autopilot self-heals stale locks on startup
When the autopilot daemon starts (or on each tick), check if the held lock's
last_refreshed_at is older than 2× the TTL. If so, break it and proceed.
This makes the autopilot self-healing without operator intervention.
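The per-tick decision in Option B could look roughly like the sketch below. The HeldLock shape, the LockAction values, and decideOnTick are hypothetical names for illustration. The only rule taken from the proposal is the threshold of 2× the TTL.

```typescript
// Hypothetical view of the currently held lock, or null when none is held.
interface HeldLock {
  lastRefreshedAt: Date;
  ttlSeconds: number;
}

type LockAction = "acquire" | "break-and-acquire" | "skip";

// Decides what the autopilot should do on startup or on a tick.
// A lock counts as stale once it has gone unrefreshed for more than 2x its TTL.
function decideOnTick(held: HeldLock | null, now: Date = new Date()): LockAction {
  if (held === null) {
    return "acquire";
  }
  const staleAfterMs = 2 * held.ttlSeconds * 1000;
  const ageMs = now.getTime() - held.lastRefreshedAt.getTime();
  return ageMs > staleAfterMs ? "break-and-acquire" : "skip";
}
```

The 2× margin leaves a full TTL of slack for a slow but live holder. Note that this only helps when the stuck holder has stopped refreshing. If the daemon keeps refreshing a wedged lock, as described in the reproduction, last_refreshed_at never ages and the check would need a different signal.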
Option C: Embed phase respects abort signals (TODOS.md fix)
The root cause from TODOS.md: "Embed phase ignores signal.aborted between
batches today. Job wall-clock timeout fires → handler keeps running → cycle's
finally block unreachable → gbrain_cycle_locks row stays held indefinitely."
Adding signal.aborted checks between embed batches would prevent the lock
from getting stuck in the first place.
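As a rough illustration of Option C, the sketch below checks signal.aborted before each batch and releases the lock in a finally block. The embedAll, runCycle, embedBatch, and releaseLock names are hypothetical stand-ins, not gbrain's real functions. AbortSignal and AbortController are the standard web and Node.js APIs.

```typescript
// Hypothetical stand-in for whatever embeds one batch of items.
type EmbedBatch = (batch: string[]) => Promise<void>;

// Processes batches in order, checking the abort signal between batches so
// that a job timeout or cancellation stops the phase promptly.
async function embedAll(
  batches: string[][],
  embedBatch: EmbedBatch,
  signal: AbortSignal,
): Promise<number> {
  let completed = 0;
  for (const batch of batches) {
    if (signal.aborted) {
      throw new Error("embed phase aborted");
    }
    await embedBatch(batch);
    completed += 1;
  }
  return completed;
}

// Runs the embed phase and always releases the cycle lock, whether the phase
// finishes, fails, or is aborted. releaseLock is an assumed callback.
async function runCycle(
  batches: string[][],
  embedBatch: EmbedBatch,
  signal: AbortSignal,
  releaseLock: () => Promise<void>,
): Promise<void> {
  try {
    await embedAll(batches, embedBatch, signal);
  } finally {
    await releaseLock();
  }
}
```

Because the loop throws as soon as it sees the abort, control reaches the finally block and the lock is released. A batch that is already in flight still runs to completion, so the delay before release is bounded by one batch rather than by the whole phase.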
Environment
- gbrain: v0.41.20.0
- Engine: postgres (Supabase EU/Ireland, Session pooler)
- Schema: v103
- Deployment: Railway (gbrain serve --http)
- Autopilot: via Minions queue (autopilot-cycle handler)
Impact
The cycle lock bug means the brain never self-maintains. No backlinks, no
orphan resolution, no enrichment, no pattern extraction. The brain score
degrades over time. The whole value proposition of "the brain gets smarter
while you sleep" breaks.