
feat(dream): add --break-lock for gbrain-cycle (parity with sync --break-lock) #1591

@Software2EU

Summary

gbrain sync --break-lock --max-age N exists and self-heals wedged sync locks.
The cycle lock (gbrain-cycle in gbrain_cycle_locks) has no equivalent CLI
escape hatch. When a dream/autopilot cycle gets stuck (cancelled job, embed phase
ignoring abort signal, timeout), the cycle lock stays held indefinitely and every
subsequent gbrain dream skips with cycle_already_running.

This is documented in TODOS.md as a known issue. Garry hits it daily on his
production brain.

Reproduction

  1. Run gbrain dream — starts an autopilot-cycle job
  2. Cancel the job mid-run (or let it timeout during embed phase)
  3. The gbrain_cycle_locks row stays held (ttl_expires_at in the future,
    or the autopilot daemon refreshes it before TTL expires)
  4. Every subsequent gbrain dream → cycle_already_running → skipped
  5. gbrain sync --break-lock only targets gbrain-sync, not gbrain-cycle
  6. No CLI command exists to break the cycle lock
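A minimal in-memory model of steps 3 and 4. It assumes, without confirmation from gbrain's source, that acquisition succeeds only when no row exists or the existing row's ttl_expires_at has already passed. CycleLockRow, tryAcquire, refresh and the holder field are hypothetical names used only for illustration.

```typescript
// Hypothetical shape of a gbrain_cycle_locks row (illustrative, not gbrain's schema).
interface CycleLockRow {
  id: string;            // e.g. "gbrain-cycle"
  holder: string;        // hypothetical: which job or process holds the lock
  ttlExpiresAt: Date;    // mirrors ttl_expires_at
  lastRefreshedAt: Date; // mirrors last_refreshed_at
}

type AcquireResult =
  | { status: "acquired"; row: CycleLockRow }
  | { status: "cycle_already_running"; heldBy: string };

// Assumed semantics: the lock can be taken only if absent or past its TTL.
function tryAcquire(
  table: Map<string, CycleLockRow>,
  id: string,
  holder: string,
  ttlMs: number,
  now: Date,
): AcquireResult {
  const existing = table.get(id);
  if (existing && existing.ttlExpiresAt.getTime() > now.getTime()) {
    return { status: "cycle_already_running", heldBy: existing.holder };
  }
  const row: CycleLockRow = {
    id,
    holder,
    ttlExpiresAt: new Date(now.getTime() + ttlMs),
    lastRefreshedAt: now,
  };
  table.set(id, row);
  return { status: "acquired", row };
}

// Assumed refresh behaviour: each refresh pushes the expiry out by another TTL.
function refresh(row: CycleLockRow, ttlMs: number, now: Date): void {
  row.lastRefreshedAt = now;
  row.ttlExpiresAt = new Date(now.getTime() + ttlMs);
}
```

If the cancelled job never deletes its row and something keeps calling refresh, ttl_expires_at never falls into the past, so every later attempt returns cycle_already_running, matching the behaviour in the steps above.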

Current workaround

Manually DELETE FROM public.gbrain_cycle_locks in the database, then
immediately run gbrain dream --phase X for individual phases. This races the
autopilot daemon, which re-acquires the lock within seconds.

Proposed fix

One or more of:

Option A: gbrain dream --break-lock [--max-age N]

Mirror the gbrain sync --break-lock --max-age N pattern for the cycle lock.
Clears any gbrain-cycle lock in gbrain_cycle_locks where last_refreshed_at
is older than N seconds, then proceeds with the dream cycle.
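A sketch of the Option A check, assuming that "breaking" the lock means deleting its row. breakStaleCycleLock and LockRow are hypothetical names, and the `id` key column in the SQL string is a guess: the real table may key the lock name differently.

```typescript
// Hypothetical row shape; only the column the check needs.
interface LockRow {
  id: string;
  lastRefreshedAt: Date; // mirrors last_refreshed_at
}

// Sketch of `gbrain dream --break-lock --max-age N`: delete the gbrain-cycle
// row only if its last refresh is older than maxAgeSeconds. Returns true when
// a lock was broken, so the caller can log it before starting the cycle.
function breakStaleCycleLock(
  table: Map<string, LockRow>,
  maxAgeSeconds: number,
  now: Date,
  id = "gbrain-cycle",
): boolean {
  const row = table.get(id);
  if (!row) return false; // nothing to break
  const ageMs = now.getTime() - row.lastRefreshedAt.getTime();
  if (ageMs <= maxAgeSeconds * 1000) return false; // recent enough to be live
  table.delete(id);
  return true;
}

// Single-statement Postgres form of the same check (the `id` column is assumed).
// $1 = lock name, $2 = max age in seconds.
const BREAK_STALE_CYCLE_LOCK_SQL = `
  DELETE FROM gbrain_cycle_locks
  WHERE id = $1
    AND last_refreshed_at < now() - make_interval(secs => $2::double precision)
`;
```

Putting the age condition in the DELETE's WHERE clause means a lock that gets refreshed between the check and the delete is left alone, which a separate SELECT followed by DELETE would not guarantee.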

Option B: Autopilot self-heals stale locks on startup

When the autopilot daemon starts (or on each tick), check if the held lock's
last_refreshed_at is older than 2× the TTL. If so, break it and proceed.
This makes the autopilot self-healing without operator intervention.
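A sketch of the Option B decision on each autopilot tick. The lock's TTL is taken as a parameter (ttlMs), since its actual value is not stated here; HeldLock, isAbandoned and decideOnTick are hypothetical names.

```typescript
// Hypothetical view of the currently held cycle lock.
interface HeldLock {
  lastRefreshedAt: Date; // mirrors last_refreshed_at
}

// Abandoned = nobody has refreshed the lock for more than factor × TTL.
function isAbandoned(lock: HeldLock, ttlMs: number, now: Date, factor = 2): boolean {
  return now.getTime() - lock.lastRefreshedAt.getTime() > factor * ttlMs;
}

type TickDecision = "run" | "break_then_run" | "skip";

// What the daemon would do on startup or on each tick under Option B.
function decideOnTick(lock: HeldLock | null, ttlMs: number, now: Date): TickDecision {
  if (lock === null) return "run"; // no lock held: start the cycle normally
  return isAbandoned(lock, ttlMs, now) ? "break_then_run" : "skip";
}
```

The 2× factor leaves room for one late refresh before a live holder is treated as dead. The check can only fire if nothing keeps refreshing the row: per step 3 of the reproduction, a refresher that outlives the wedged job would keep last_refreshed_at recent.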

Option C: Embed phase respects abort signals (TODOS.md fix)

The root cause from TODOS.md: "Embed phase ignores signal.aborted between
batches today. Job wall-clock timeout fires → handler keeps running → cycle's
finally block unreachable → gbrain_cycle_locks row stays held indefinitely."
Adding signal.aborted checks between embed batches would prevent the lock
from getting stuck in the first place.
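A sketch of the Option C pattern: the embed loop checks signal.aborted before each batch and throws, so the cycle's finally block is reached and releases the lock. embedPhase, runCycle, embedBatch and releaseLock are hypothetical stand-ins, not gbrain's actual handler code; signal is a standard AbortSignal.

```typescript
// Embed phase that stops between batches once the job's signal is aborted.
async function embedPhase<T>(
  batches: T[][],
  embedBatch: (batch: T[]) => Promise<void>,
  signal: AbortSignal,
): Promise<number> {
  let embedded = 0;
  for (const batch of batches) {
    // Check between batches: an in-flight batch finishes, but no new one starts.
    if (signal.aborted) throw new Error("embed phase aborted");
    await embedBatch(batch);
    embedded += batch.length;
  }
  return embedded;
}

// Cycle runner that holds the lock for the duration of all phases.
async function runCycle(
  phases: Array<(signal: AbortSignal) => Promise<unknown>>,
  signal: AbortSignal,
  releaseLock: () => Promise<void>,
): Promise<void> {
  try {
    for (const phase of phases) {
      await phase(signal);
    }
  } finally {
    // Reachable after an abort, because embedPhase throws instead of looping on.
    await releaseLock();
  }
}
```

A batch already in flight still runs to completion; if the embedding client accepts an AbortSignal, passing it through would shorten that window further.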

Environment

  • gbrain: v0.41.20.0
  • Engine: postgres (Supabase EU/Ireland, Session pooler)
  • Schema: v103
  • Deployment: Railway (gbrain serve --http)
  • Autopilot: via Minions queue (autopilot-cycle handler)

Impact

The cycle lock bug means the brain never self-maintains. No backlinks, no
orphan resolution, no enrichment, no pattern extraction. The brain score
degrades over time. The whole value proposition of "the brain gets smarter
while you sleep" breaks.
