Version: gbrain 0.42.10.0 (Postgres engine; minion jobs supervisor + child jobs work)
Type: Feature request
Severity: Medium — production interactive co-tenant (a chat gateway) was starved of CPU for hours by full-throttle brain processing.
TL;DR
jobs supervisor / jobs work run at default process priority and there's no built-in way to make the brain yield CPU to an interactive co-tenant. On a shared box where gbrain runs alongside something latency-sensitive (a chat gateway, a UI server, an editor), a heavy autopilot/embed backlog at full concurrency pins the cores and the foreground process visibly lags. Wrapping the binary in nice from outside almost works — but the supervisor spawns its worker as a child, and operators reasonably expect the priority to propagate to the whole tree. Request: a first-class --nice <n> flag on jobs supervisor (and jobs work) that calls setpriority on itself and is inherited by every spawned child (tini + worker).
Motivation (real incident)
Shared 126GB box: gbrain minion supervisor (concurrency 3) co-located with a chat gateway that handles interactive user turns. After a backlog of ~40 retried jobs + autopilot cycles started draining at full concurrency, load average hit ~7 and the gateway — at default priority, competing head-to-head with a 90%+ CPU worker — fell behind on user-facing responses by minutes.
The fix was simple and correct: run the gbrain tree at nice +15. The brain still gets full concurrency and drains the queue, but it only consumes CPU the interactive process isn't using. Load dropped from ~7 to ~3 with no throughput loss on a box whose cores were otherwise idle, and user latency returned to normal. Cutting concurrency (dropping to 1) was the wrong lever; niceness is the right one: full parallelism for throughput, low scheduling priority so foreground work always wins contention.
Why "just wrap it in nice" isn't enough
- Propagation expectation. nice -n 15 gbrain jobs supervisor … does set the supervisor's priority, and child processes do inherit niceness on Linux, so in practice the worker comes up niced too. But this is implicit and easy to get wrong: any code path that re-execs, detaches, or resets priority breaks it silently, and there's no signal in jobs stats / doctor that the tree is (or isn't) niced. A flag makes the intent explicit, testable, and observable.
- Discoverability. Operators hit this exact CPU-contention wall and reinvent the nice wrapper from scratch (we did). A documented --nice flag turns tribal ops knowledge into a supported feature everyone benefits from.
- Self-contained supervision. For deployments that use gbrain's own jobs supervisor as the top-level process manager (rather than an external wrapper), there's currently no in-band way to set priority at all.
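The inheritance described in the first bullet can be observed directly. The sketch below is a standalone illustration, not gbrain code: it spawns a throwaway `node -e` child and compares the niceness the child reports for itself with the parent's. The helper name `childNiceness` is hypothetical.

```typescript
import * as os from "node:os";
import { spawnSync } from "node:child_process";

// Hypothetical helper: spawn a short-lived child and return the niceness
// that the child observes for itself via os.getPriority().
export function childNiceness(): number {
  const result = spawnSync(
    process.execPath,
    ["-e", "process.stdout.write(String(require('node:os').getPriority()))"],
    { encoding: "utf8" },
  );
  if (result.status !== 0) {
    throw new Error(`child exited with status ${result.status}`);
  }
  return Number(result.stdout.trim());
}

// With no pid argument, os.getPriority() reports on the current process.
const parentNice = os.getPriority();
const inheritedNice = childNiceness();
console.log(`parent nice=${parentNice} child nice=${inheritedNice}`);
```

Run under `nice -n 15`, both values should match, which is the implicit behavior the wrapper approach relies on. Nothing in this sketch guards against a spawn path that resets priority, which is the gap the flag is meant to close.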
Proposed behavior
- Add --nice <n> (range -20..19, default unset = no change) to jobs supervisor and jobs work.
- On startup, the supervisor calls os.setPriority(0, n) on itself (os.setPriority is available in Node ≥ 10.10; pid 0 means the current process).
- When spawning the child worker, either (a) rely on inheritance (and document it), or better (b) pass the same --nice through and have the worker set its own priority at startup, so it's robust to any priority reset in the spawn path.
- Surface the effective niceness in jobs stats / doctor (e.g. worker: pid=… nice=15) so operators can confirm the tree is yielding as intended.
- No behavior change when the flag is omitted.
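A minimal sketch of how the proposed behavior could fit together, assuming a plain argv array and Node's `os` module. The helpers `parseNice`, `applyNice`, and `workerArgs` are hypothetical names for illustration and do not reflect gbrain's actual CLI parsing or spawn code.

```typescript
import * as os from "node:os";

// Hypothetical: read the value of the proposed --nice flag from argv.
// Returns undefined when the flag is absent so priority is left untouched.
export function parseNice(argv: string[]): number | undefined {
  const i = argv.indexOf("--nice");
  if (i === -1) return undefined;
  const raw = argv[i + 1];
  const n = Number(raw);
  if (raw === undefined || raw.trim() === "" || !Number.isInteger(n) || n < -20 || n > 19) {
    throw new Error(`--nice expects an integer in -20..19, got ${raw}`);
  }
  return n;
}

// Apply the niceness to the current process (pid 0) and return the
// effective value. A failure (for example, a negative value without the
// required privileges) is reported as a warning instead of aborting.
export function applyNice(n: number | undefined): number {
  if (n !== undefined) {
    try {
      os.setPriority(0, n);
    } catch (err) {
      console.warn(`--nice ${n} not applied: ${(err as Error).message}`);
    }
  }
  return os.getPriority(0);
}

// Option (b): forward the same flag to the child worker's argv so the
// worker can set its own priority instead of relying on inheritance alone.
export function workerArgs(base: string[], n: number | undefined): string[] {
  return n === undefined ? base : [...base, "--nice", String(n)];
}

const requested = parseNice(process.argv.slice(2));
console.log(`effective nice=${applyNice(requested)}`);
```

Raising niceness (yielding CPU) needs no privileges on Linux, whereas lowering it below the current value generally does, so the warn-and-continue path matters mostly for negative values.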
Acceptance
- gbrain jobs supervisor --nice 15 → supervisor and spawned worker both run at OS nice 15 (verifiable via ps -o ni).
- doctor / jobs stats reports effective worker niceness.
- Omitting --nice leaves priority untouched (back-compat).
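The acceptance checks above could also be made from inside Node, since `os.getPriority(pid)` reads the niceness of any visible process. The following is a sketch under that assumption; `niceLine` and `treeIsNiced` are hypothetical helpers, and the output shape mirrors the `worker: pid=… nice=15` example from the proposal, not an existing gbrain format.

```typescript
import * as os from "node:os";

// Hypothetical: format one status line for a process in the tree.
export function niceLine(role: string, pid: number): string {
  return `${role}: pid=${pid} nice=${os.getPriority(pid)}`;
}

// Hypothetical: true when every listed pid runs at the expected niceness.
export function treeIsNiced(pids: number[], expected: number): boolean {
  return pids.every((pid) => os.getPriority(pid) === expected);
}

// Demonstrated on the current process only; a supervisor would pass its
// own pid together with the pids of the workers it spawned.
console.log(niceLine("supervisor", process.pid));
```

A test for the first acceptance item could then assert `treeIsNiced([supervisorPid, workerPid], 15)`, where both pids are placeholders for values the supervisor already tracks.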
Notes
Filed from a production CPU-contention incident, 2026-06-03. Local remediation: spawn the supervisor tree under nice -n 15, kept concurrency at 3 — full throughput, interactive gateway always wins CPU.
- […] setpriority semantics, no-op with a one-line warning rather than failing.