gbrain dream: intermittent 4-second exit-1 crash with no error output, leaves stale gbrain-cycle lock

## Summary

`gbrain dream`, invoked from a systemd timer, intermittently exits with code 1 after ~4 seconds of wall time and a ~7 MB memory peak, without writing any diagnostic output to the log configured via `StandardOutput`/`StandardError`. The crash leaves the `gbrain-cycle` row in `gbrain_cycle_locks` unreleased, which compounds #1534.

## Symptoms

From `journalctl -u gbrain-dream.service --no-pager`:

```
May 25 09:00:00 CloudTron systemd[1]: Starting gbrain-dream.service - GBrain nightly dream cycle...
May 25 09:34:07 CloudTron systemd[1]: Finished gbrain-dream.service - GBrain nightly dream cycle.
May 25 09:34:07 CloudTron systemd[1]: gbrain-dream.service: Consumed 27.721s CPU time.
May 26 09:00:01 CloudTron systemd[1]: Starting gbrain-dream.service - GBrain nightly dream cycle...
May 26 09:00:05 CloudTron systemd[1]: gbrain-dream.service: Main process exited, code=exited, status=1/FAILURE
May 26 09:00:05 CloudTron systemd[1]: gbrain-dream.service: Failed with result "exit-code".
May 26 09:00:05 CloudTron systemd[1]: gbrain-dream.service: Consumed 2.618s CPU time, 7.0M memory peak, 0B memory swap peak.
```

A successful run takes 30-35 min of wall time, ~28 s of CPU, and peaks at ~440 MB of memory. The failing run exits after 4 seconds with a 7 MB memory peak, which suggests the process barely started.

A second instance: the cycle attempt at 21:40 PST the same day also died, and PID 3917507 left a stale lock.
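
To spot these fast failures without reading the journal by eye, the wall time of each run can be derived from the `Starting` and `Finished` / `Main process exited` lines. The following is a rough sketch only: `summarizeRuns` and `parseStamp` are hypothetical helpers (not part of gbrain), and the year is passed in as an assumption because the default journal timestamp format omits it.

```typescript
// Hypothetical helpers, not part of gbrain. They pair each "Starting" line
// with the next "Finished" or "Main process exited" line for the same unit.
interface RunSummary {
  started: string;     // timestamp text as it appears in the journal
  wallSeconds: number; // seconds between start and end lines
  failed: boolean;     // true when the run ended with "Main process exited"
}

const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Parses a timestamp such as "May 26 09:00:05" into epoch milliseconds.
// Timestamps are treated as UTC purely to compute differences, so a DST
// change or a year rollover in the middle of a run would skew the result.
function parseStamp(line: string, year: number): number | null {
  const m = /^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}):(\d{2}):(\d{2})/.exec(line);
  if (!m) return null;
  const month = MONTHS.indexOf(m[1]);
  if (month < 0) return null;
  return Date.UTC(year, month, Number(m[2]),
                  Number(m[3]), Number(m[4]), Number(m[5]));
}

function summarizeRuns(journal: string, year: number): RunSummary[] {
  const runs: RunSummary[] = [];
  let startMs: number | null = null;
  let startText = "";
  for (const line of journal.split("\n")) {
    const t = parseStamp(line, year);
    if (t === null) continue;
    if (line.includes("Starting gbrain-dream.service")) {
      startMs = t;
      startText = line.slice(0, 15); // the timestamp prefix is 15 characters
    } else if (startMs !== null) {
      const finished = line.includes("Finished gbrain-dream.service");
      const failed = line.includes("Main process exited");
      if (finished || failed) {
        runs.push({ started: startText, wallSeconds: (t - startMs) / 1000, failed });
        startMs = null; // ignore trailing "Failed with result" / "Consumed" lines
      }
    }
  }
  return runs;
}
```

Applied to the excerpt above, a run that fails within a few seconds stands out clearly from the half-hour successful run, and a threshold on `wallSeconds` could drive an alert.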

## What I tried

1. **Reproduce interactively** with the exact systemd ExecStart args:
   ```
   /usr/local/bin/gbrain-job.sh dream --dir /data/brain --source default
   ```
   Result: `Skipped: another cycle is already running. (locked)` — exit 0. (The previous crash left the lock held, so this path is benign and exits cleanly.)

2. **`gbrain dream --dry-run --json`** — exit 0, status: skipped, reason: cycle_already_running

3. **`env -i HOME=/root PATH=...` to strip env contamination** — same "Skipped" result

4. **Check `/var/log/gbrain/gbrain-dream.log`** — file uses `StandardOutput=append` so all runs commingle without per-run boundary. The 4s crash either wrote nothing or wrote ≤3 lines indistinguishable from prior runs ending mid-`[cycle.conversation_facts_backfill]`.

5. **Postgres health** — 9 connections active of 100 max, no errors in postgres log, `SELECT 1` returns fine.

6. **systemd drop-ins** — none for this service.

7. **Recent config changes** — `/root/.config/api-keys.env` last modified 24h before the crash, `/root/.gbrain/pg.env` modified 7d before, `/etc/systemd/system/gbrain-dream.service` modified 12h before.

## Suggested upstream improvements

1. **Per-run log demarcation**: write `---- run START <ISO> pid=<N> args=...` and `---- run END <ISO> exit=<N> duration=<ms>` markers to make StandardOutput=append logs separable.
2. **Startup heartbeat**: write a `startup ok` line within the first 500ms of `gbrain dream` so failures BEFORE that line are distinguishable from failures AFTER.
3. **Crash trap**: wrap the top-level dream entrypoint in try/catch that flushes stderr with the exception class + stack before `process.exit(1)`. Right now the 4s exit-1 produces zero diagnostic output.
4. **Compound effect with #1534**: when the crash happens, the lock stays held, and the CLI has no native breaker for `gbrain-cycle` locks. Fixing #1534 + adding (3) above would close the loop.
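
Suggestions (1) and (3) could be combined in a single wrapper around the entrypoint. The following is a minimal sketch under stated assumptions, not gbrain's actual code: `runWithMarkers`, `RunOptions`, and `Sink` are hypothetical names, and the output sinks and clock are injected so the wrapper can be exercised without touching real stdout/stderr.

```typescript
// Hypothetical wrapper, not gbrain's actual entrypoint. All names are illustrative.
type Sink = (line: string) => void;

interface RunOptions {
  pid: number;
  args: string[];
  out: Sink;        // where START/END markers go
  err: Sink;        // where the crash report goes
  now?: () => Date; // injectable clock, mainly for testing
}

// Runs `main` between START/END markers and resolves to the exit code.
// A thrown value is reported to `err` with its class name and stack
// before the END marker is written, so a failure never exits silently.
async function runWithMarkers(
  main: () => Promise<void>,
  opts: RunOptions,
): Promise<number> {
  const now = opts.now ?? (() => new Date());
  const started = now();
  opts.out(
    `---- run START ${started.toISOString()} pid=${opts.pid} args=${opts.args.join(" ")}`,
  );

  let exitCode = 0;
  try {
    await main();
  } catch (e) {
    exitCode = 1;
    const name = e instanceof Error ? e.constructor.name : typeof e;
    const detail = e instanceof Error ? (e.stack ?? e.message) : String(e);
    opts.err(`fatal ${name}\n${detail}`);
  }

  const ended = now();
  const durationMs = ended.getTime() - started.getTime();
  opts.out(`---- run END ${ended.toISOString()} exit=${exitCode} duration=${durationMs}ms`);
  return exitCode;
}
```

In a Node-style runtime, the caller might pass `(s) => process.stdout.write(s + "\n")` and `(s) => process.stderr.write(s + "\n")` as the sinks and assign the result to `process.exitCode`, so that pending writes can flush before the process ends. A wrapper like this would not capture failures that happen before the runtime starts, for example inside the `gbrain-job.sh` wrapper, which is one more reason the START marker is useful: its absence would narrow down where the crash occurred.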

## Environment

- gbrain v0.41.14.0 (local patches: bench-publish dispatcher, ollama tuning — neither touches dream or cycle code)
- Ubuntu 24.04 on Hetzner CCX33
- PostgreSQL 16, pgvector
- Embedding provider: ollama:nomic-embed-text (local)
- Service definition: `/etc/systemd/system/gbrain-dream.service` → `/usr/local/bin/gbrain-job.sh dream --dir /data/brain --source default`

## Related

- #1452 (dream silently drops --source flag)
- #1454 (doctor advertises config option that does not exist)
- #1534 (doctor stale_locks: --break-lock hint hard-codes wrong command, no native breaker for gbrain-cycle)
