Feature: --nice flag on jobs supervisor/work to yield CPU to interactive co-tenants (priority propagated to spawned workers)

**Version:** gbrain 0.42.10.0 (Postgres engine; minion `jobs supervisor` + child `jobs work`)
**Type:** Feature request
**Severity:** Medium — production interactive co-tenant (a chat gateway) was starved of CPU for hours by full-throttle brain processing.

## TL;DR

`jobs supervisor` / `jobs work` run at default process priority, and there is no built-in way to make the brain *yield* CPU to an interactive co-tenant. On a shared box where gbrain runs alongside something latency-sensitive (a chat gateway, a UI server, an editor), a heavy autopilot/embed backlog at full concurrency pins the cores and the foreground process visibly lags. Wrapping the binary in `nice` from outside works in practice, because the worker the supervisor spawns inherits its niceness, but that propagation is implicit, untested, and invisible to operators, who reasonably expect the priority to cover the whole tree. Request: a first-class `--nice <n>` flag on `jobs supervisor` (and `jobs work`) that sets the process's own priority via `setpriority` and ensures every spawned child (tini + worker) runs at the same niceness.

## Motivation (real incident)

Shared 126GB box: gbrain minion supervisor (concurrency 3) co-located with a chat gateway that handles interactive user turns. After a backlog of ~40 retried jobs + autopilot cycles started draining at full concurrency, load average hit ~7 and the gateway — at default priority, competing head-to-head with a 90%+ CPU worker — fell behind on user-facing responses by minutes.

The fix was simple and correct: run the gbrain tree at nice 15 (`nice -n 15`). The brain still gets full concurrency and drains the queue, but it only consumes CPU the interactive process isn't using. Load dropped from ~7 to ~3, throughput was unaffected whenever cores were otherwise idle, and user latency returned to normal. **Reducing concurrency (dropping to 1) is the wrong lever; niceness is the right one.** Full parallelism for throughput, low scheduling priority so foreground work always wins contention.

## Why "just wrap it in nice" isn't enough

1. **Propagation expectation.** `nice -n 15 gbrain jobs supervisor …` does set the supervisor's priority, and on Linux a child inherits its parent's niceness across `fork`/`exec`, so in practice the worker comes up niced too. But this is implicit and easy to get wrong: any code path that resets priority, or that starts a worker outside the niced process tree, breaks it silently, and nothing in `jobs stats`/`doctor` signals whether the tree is niced. A flag makes the intent explicit, testable, and observable.
2. **Discoverability.** Operators hit this exact CPU-contention wall and reinvent the `nice` wrapper from scratch (we did). A documented `--nice` flag turns tribal ops knowledge into a supported feature everyone benefits from.
3. **Self-contained supervision.** For deployments that use gbrain's own `jobs supervisor` as the top-level process manager (rather than an external wrapper), there's currently no in-band way to set priority at all.
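The inheritance described in point 1 can be checked directly. The following is a minimal sketch, not gbrain code: it assumes a Linux-like host with a `sleep` binary on `PATH`, and `compareNice` is a hypothetical helper name. It spawns a child and reads both priorities with `os.getPriority`.

```typescript
import { spawn } from "node:child_process";
import * as os from "node:os";

// Hypothetical helper: spawn a short-lived child and compare its niceness
// with the parent's. Assumes a `sleep` binary is available on PATH.
function compareNice(): { parent: number; child: number } {
  const child = spawn("sleep", ["5"], { stdio: "ignore" });
  child.on("error", () => {}); // avoid an unhandled 'error' event if spawn fails
  if (child.pid === undefined) {
    throw new Error("could not spawn child process");
  }
  try {
    return {
      parent: os.getPriority(), // pid defaults to 0 = current process
      child: os.getPriority(child.pid),
    };
  } finally {
    child.kill(); // the child is only needed long enough to be inspected
  }
}

const { parent, child } = compareNice();
console.log(`parent nice=${parent} child nice=${child}`);
```

Running this under `nice -n 15` should show both values moving together, which is the implicit behavior the wrapper relies on today.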

## Proposed behavior

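- Add `--nice <n>` (range -20..19, default unset = no change) to `jobs supervisor` and `jobs work`. Negative values raise priority and normally require root or `CAP_SYS_NICE` on Linux; the co-tenancy use case only needs positive values.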
- On startup, the supervisor sets its own priority with `os.setPriority(0, n)` (the `node:os` API, available since Node 10.10; pid `0` means the current process).
- When spawning the child worker, either (a) rely on inheritance (and document it), or better (b) pass the same `--nice` through and have the worker set its own priority at startup, so it is robust to any priority reset in the spawn path.
- Surface the effective niceness in `jobs stats` / `doctor` (e.g. `worker: pid=… nice=15`) so operators can confirm the tree is yielding as intended.
- No behavior change when the flag is omitted.
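The flag handling above could look roughly like the following. This is a sketch under stated assumptions, not gbrain's actual CLI code: `parseNice` and `applyNice` are hypothetical names, and the only real API used is `os.setPriority` from `node:os`.

```typescript
import * as os from "node:os";

// Hypothetical parser for a `--nice <n>` value.
// Returns undefined when the flag is absent, so callers can skip the change.
function parseNice(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const text = raw.trim();
  if (!/^[+-]?\d+$/.test(text)) {
    throw new RangeError(`--nice expects an integer, got "${raw}"`);
  }
  const n = Number(text);
  if (n < -20 || n > 19) {
    throw new RangeError(`--nice must be in -20..19, got ${n}`);
  }
  return n;
}

// Hypothetical applier: sets the current process's niceness.
// Returns true if the priority was changed; warns and returns false on failure
// (for example, a negative value without privileges).
function applyNice(
  n: number | undefined,
  warn: (msg: string) => void = console.warn,
): boolean {
  if (n === undefined) return false; // flag omitted: leave priority untouched
  try {
    os.setPriority(0, n); // pid 0 = current process
    return true;
  } catch (err) {
    warn(`--nice ${n} could not be applied: ${(err as Error).message}`);
    return false;
  }
}
```

With option (b), the worker would call the same pair at its own startup with the value passed through from the supervisor, so a reset anywhere in the spawn path is corrected rather than silently kept.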

## Acceptance

- `gbrain jobs supervisor --nice 15` → supervisor and spawned worker both run at OS nice 15 (verifiable via `ps -o ni`).
- `doctor`/`jobs stats` reports effective worker niceness.
- Omitting `--nice` leaves priority untouched (back-compat).
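For the reporting criterion, the effective niceness of any pid can be read back with `os.getPriority`. The sketch below assumes the output format suggested earlier (`worker: pid=… nice=15`); `describeNice` is a hypothetical helper, not an existing gbrain function.

```typescript
import * as os from "node:os";

// Hypothetical formatter for a `jobs stats` / `doctor` line.
// Falls back to "unknown" if the pid has exited or cannot be queried.
function describeNice(label: string, pid: number): string {
  try {
    return `${label}: pid=${pid} nice=${os.getPriority(pid)}`;
  } catch {
    return `${label}: pid=${pid} nice=unknown`;
  }
}

console.log(describeNice("supervisor", process.pid));
```

Reading the value from the OS rather than echoing the flag means the report reflects what the scheduler actually sees, which is what `ps -o ni` would show.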

## Notes

- Distinct from #1801 (alive-but-wedged worker not restarted) — that's a recovery bug; this is a scheduling/co-tenancy feature. They compose: a niced worker that wedges still needs #1801's progress watchdog to recover it.
- Linux-first; on platforms without `setpriority` semantics, no-op with a one-line warning rather than failing.

---
_Filed from a production CPU-contention incident, 2026-06-03. Local remediation: spawned the supervisor tree under `nice -n 15` and kept concurrency at 3, giving full throughput while the interactive gateway always wins CPU._


Feature: --nice flag on jobs supervisor/work to yield CPU to interactive co-tenants (priority propagated to spawned workers) #1815
