gbrain dream: intermittent 4-second exit-1 crash with no error output, leaves stale gbrain-cycle lock #1535

@Mr-B-1

Summary

gbrain dream (invoked via systemd timer cron) intermittently exits with code 1 after ~4 seconds of wall-time and ~7MB memory peak, having written no diagnostic output to its configured StandardOutput/StandardError log. The crash leaves the gbrain-cycle row in gbrain_cycle_locks un-released, compounding with #1534.

Symptoms

From journalctl -u gbrain-dream.service --no-pager:

May 25 09:00:00 CloudTron systemd[1]: Starting gbrain-dream.service - GBrain nightly dream cycle...
May 25 09:34:07 CloudTron systemd[1]: Finished gbrain-dream.service - GBrain nightly dream cycle.
May 25 09:34:07 CloudTron systemd[1]: gbrain-dream.service: Consumed 27.721s CPU time.
May 26 09:00:01 CloudTron systemd[1]: Starting gbrain-dream.service - GBrain nightly dream cycle...
May 26 09:00:05 CloudTron systemd[1]: gbrain-dream.service: Main process exited, code=exited, status=1/FAILURE
May 26 09:00:05 CloudTron systemd[1]: gbrain-dream.service: Failed with result "exit-code".
May 26 09:00:05 CloudTron systemd[1]: gbrain-dream.service: Consumed 2.618s CPU time, 7.0M memory peak, 0B memory swap peak.

A successful run takes 30-35 min wall-time, ~28s CPU, ~440 MB memory peak. The failure mode is a 4-second exit with 7 MB memory peak — the process barely started.

A second instance: cycle attempt at 21:40 PST same day also died (PID 3917507 left a stale lock).
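A lock left behind like this can be told apart from a live one by checking whether its recorded holder still exists. The following is a minimal sketch, not gbrain's code: the `CycleLock` fields, `classifyLock`, and the age threshold are all hypothetical, since the real columns of gbrain_cycle_locks are not shown in this issue.

```typescript
// Hypothetical shape: the actual gbrain_cycle_locks schema is an assumption here.
interface CycleLock {
  id: string;        // e.g. "gbrain-cycle"
  holderPid: number; // PID recorded by the process that took the lock
  acquiredAt: Date;  // when the lock was taken
}

type LockVerdict = "held" | "stale-dead-pid" | "stale-too-old";

// The liveness probe is injected so the logic can be tested without real PIDs.
function classifyLock(
  lock: CycleLock,
  isPidAlive: (pid: number) => boolean,
  now: Date,
  maxAgeMs: number,
): LockVerdict {
  // A holder that no longer exists cannot release the lock itself.
  if (!isPidAlive(lock.holderPid)) return "stale-dead-pid";
  // A live holder far beyond the normal run time is also suspicious.
  if (now.getTime() - lock.acquiredAt.getTime() > maxAgeMs) return "stale-too-old";
  return "held";
}
```

Since a successful run takes 30-35 min, a `maxAgeMs` of a couple of hours would be a conservative choice. A PID check is only meaningful when the checker runs on the same host as the lock holder.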

What I tried

  1. Reproduce interactively with the exact systemd ExecStart args:

    /usr/local/bin/gbrain-job.sh dream --dir /data/brain --source default
    

    Result: Skipped: another cycle is already running. (locked) — exit 0. (The previous crash left the lock held, so this path is benign and exits cleanly.)

  2. gbrain dream --dry-run --json — exit 0, status: skipped, reason: cycle_already_running

  3. env -i HOME=/root PATH=... to strip env contamination — same "Skipped" result

  4. Check /var/log/gbrain/gbrain-dream.log: the unit uses StandardOutput=append, so all runs commingle with no per-run boundary. The 4s crash either wrote nothing or wrote ≤3 lines that are indistinguishable from prior runs ending mid-[cycle.conversation_facts_backfill].

  5. Postgres health — 9 connections active of 100 max, no errors in postgres log, SELECT 1 returns fine.

  6. systemd drop-ins — none for this service.

  7. Recent config changes: /root/.config/api-keys.env last modified 24h before the crash, /root/.gbrain/pg.env modified 7d before, /etc/systemd/system/gbrain-dream.service modified 12h before.

Suggested upstream improvements

  1. Per-run log demarcation: write ---- run START <ISO> pid=<N> args=... and ---- run END <ISO> exit=<N> duration=<ms> markers to make StandardOutput=append logs separable.
  2. Startup heartbeat: write a startup ok line within the first 500ms of gbrain dream so failures BEFORE that line are distinguishable from failures AFTER.
  3. Crash trap: wrap the top-level dream entrypoint in try/catch that flushes stderr with the exception class + stack before process.exit(1). Right now the 4s exit-1 produces zero diagnostic output.
  4. Compound effect with #1534 (doctor stale_locks: --break-lock hint hard-codes gbrain sync even when lock is gbrain-cycle): when the crash happens, the lock stays held, and the CLI has no native breaker for gbrain-cycle locks. Fixing #1534 + adding (3) above would close the loop.
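
Suggestions (1) through (3) could be combined in a thin wrapper around the dream entrypoint. The following is a minimal TypeScript sketch, not gbrain's actual code: `RunIo`, `describeError`, and `runWithCrashTrap` are hypothetical names, and output and exit are injected so the behavior can be exercised without terminating the process.

```typescript
// Hypothetical sketch: none of these names exist in gbrain.
interface RunIo {
  write(line: string): void; // ideally a synchronous write to stderr
  exit(code: number): void;  // e.g. process.exit in production
  now(): Date;
  pid: number;
}

function describeError(err: unknown): string {
  if (err instanceof Error) {
    // Exception class plus stack, as proposed in suggestion (3).
    return `${err.name}: ${err.message}\n${err.stack ?? "(no stack)"}`;
  }
  return `non-Error thrown: ${String(err)}`;
}

async function runWithCrashTrap(
  args: string[],
  main: (args: string[]) => Promise<number>,
  io: RunIo,
): Promise<void> {
  const started = io.now();
  // (1) START marker, written before any real work begins.
  io.write(`---- run START ${started.toISOString()} pid=${io.pid} args=${JSON.stringify(args)}`);
  // (2) Heartbeat: failures before this line differ from failures after it.
  io.write("startup ok");
  let code = 1;
  try {
    code = await main(args);
  } catch (err) {
    // (3) Never exit non-zero without saying why.
    io.write(`fatal: ${describeError(err)}`);
  } finally {
    const ended = io.now();
    const ms = ended.getTime() - started.getTime();
    // (1) END marker with exit code and duration in milliseconds.
    io.write(`---- run END ${ended.toISOString()} exit=${code} duration=${ms}`);
  }
  io.exit(code);
}
```

A wrapper like this only catches errors thrown or rejected inside `main`. A crash before the wrapper is reached (for example in the gbrain-job.sh launcher or during module loading) would still be silent, which is itself a useful signal: no START marker means the failure happened before the entrypoint ran.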

Environment

  • gbrain v0.41.14.0 (local patches: bench-publish dispatcher, ollama tuning — neither touches dream or cycle code)
  • Ubuntu 24.04 on Hetzner CCX33
  • PostgreSQL 16, pgvector
  • Embedding provider: ollama:nomic-embed-text (local)
  • Service definition: /etc/systemd/system/gbrain-dream.service, which runs /usr/local/bin/gbrain-job.sh dream --dir /data/brain --source default

Related
