Summary
Integrate instantaneous power readings over time into persistent energy counters (Joules / kWh) and, when a $/kWh price is configured, display running cost in the TUI. Expose the counter as a Prometheus metric (all_smi_energy_consumed_joules_total) that works with rate() and increase(). Add an energy-focused summary section surfacing top consumers.
Motivation
The project already collects real-time chassis/GPU power every few seconds. Integrating those samples into an energy counter unlocks two operator-grade use cases:
- Session reporting — "this cluster consumed 27.4 kWh over the last 6 hours, equivalent to $3.29 at $0.12/kWh". Currently there is no in-TUI way to see this; users must integrate externally in PromQL.
- Carbon / sustainability reporting — total Joules is the raw input most sustainability dashboards want.
The computation is a trivial running integration (sum += P_avg * dt), so the cost/benefit is strongly positive.
Current state
- Chassis / GPU / CPU power readings are collected every collection tick (AppConfig::MIN_RENDER_INTERVAL_MS drives rendering cadence; readers poll at the mode-configured interval).
- AppConfig::HISTORY_MAX_ENTRIES = 100 provides short-term ring history but no integral.
- No Prometheus counter for energy exists; only gauges.
- No config path for $/kWh.
Proposed design
Integrator
Trapezoidal integration per device and per chassis, maintained as f64 Joules. For a sample stream (t_0, p_0), (t_1, p_1), ... the increment is ((p_{i-1} + p_i) / 2) * (t_i - t_{i-1}).
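Written out, the running total after n samples and the conversions needed for kWh and cost are shown below. This form covers the short-gap case only; the gap rules further down change the per-step term for long gaps. As a worked check: two samples 2 s apart at 100 W and 200 W contribute (100 + 200) / 2 × 2 = 300 J. Likewise, 3.21 kWh at $0.12/kWh costs $0.3852, which displays as $0.39.

$$
E_n = \sum_{i=1}^{n} \frac{p_{i-1} + p_i}{2}\,(t_i - t_{i-1}) \ [\mathrm{J}], \qquad
E_{\mathrm{kWh}} = \frac{E_n}{3.6 \times 10^{6}}, \qquad
\mathrm{cost} = E_{\mathrm{kWh}} \times \mathrm{price\_per\_kwh}
$$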
- Per-GPU counter: Joules accumulator keyed by (host, gpu_uuid).
- Per-chassis counter: keyed by host.
- Per-CPU counter where a CPU power reading exists (Apple Silicon, some Intel/AMD chipsets).
Missing samples:
- Gap ≤ 10s: linear interpolate the power across the gap.
- Gap > 10s: hold last reading (explicit rationale — a dropped sample is likelier than an instant doubling).
- NaN / negative: treat as zero for the integration window.
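The following is a minimal Rust sketch of the integrator and its gap rules. It is not the project's implementation. PowerIntegrator and record_sample(device_key, t, watts) are the names proposed in this issue. Several other details are assumptions made for illustration:

- timestamps are f64 seconds;
- device keys are plain strings standing in for (host, gpu_uuid) or host;
- the DeviceState helper and the joules() accessor are hypothetical;
- infinite readings are treated as 0 W.

```rust
use std::collections::HashMap;

/// Running state for one device (hypothetical helper type).
#[derive(Debug, Default)]
struct DeviceState {
    /// Previous (timestamp in seconds, sanitized watts), if any.
    last: Option<(f64, f64)>,
    /// Accumulated energy in Joules.
    joules: f64,
}

/// Sketch of a per-device trapezoidal power integrator.
pub struct PowerIntegrator {
    gap_interpolate_seconds: f64,
    devices: HashMap<String, DeviceState>,
}

impl PowerIntegrator {
    pub fn new(gap_interpolate_seconds: f64) -> Self {
        Self { gap_interpolate_seconds, devices: HashMap::new() }
    }

    /// Records one power sample; `t` is in seconds, `watts` is instantaneous power.
    pub fn record_sample(&mut self, device_key: &str, t: f64, watts: f64) {
        // NaN and negative readings (and, as an extra assumption, infinities) count as 0 W.
        let p = if watts.is_finite() && watts > 0.0 { watts } else { 0.0 };
        let gap = self.gap_interpolate_seconds;
        let state = self.devices.entry(device_key.to_string()).or_default();
        if let Some((t_prev, p_prev)) = state.last {
            let dt = t - t_prev;
            if dt <= 0.0 {
                return; // ignore duplicate or out-of-order timestamps
            }
            state.joules += if dt <= gap {
                // Short gap: trapezoid, i.e. linear interpolation between the two samples.
                (p_prev + p) / 2.0 * dt
            } else {
                // Long gap: hold the last reading across the whole gap.
                p_prev * dt
            };
        }
        state.last = Some((t, p));
    }

    /// Energy for a device, or None if it never reported power (not emitted, not zero).
    pub fn joules(&self, device_key: &str) -> Option<f64> {
        self.devices.get(device_key).map(|s| s.joules)
    }
}
```

With a 10 s threshold, a 100 W → 200 W step over 2 s adds 300 J. A 30 s gap after a 200 W reading instead adds 200 × 30 = 6000 J under the hold-last rule.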
Persistence
- In-memory during a live session. Reset on the R hotkey (with confirmation toast).
- Optional disk-backed WAL at ~/.cache/all-smi/energy-wal.bin so Prometheus counters survive restart:
  - Append a 24-byte record (host_hash: u64, device_hash: u64, joules_delta: f64) every minute.
  - On startup, replay.
  - Crash-safe because entries are independent; a torn final record is just discarded.
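The fixed-size record and torn-tail handling can be sketched as follows. The record consists of three 8-byte fields, 24 bytes in total. The sketch rests on a few assumptions:

- the encoding is little-endian;
- host/device hashes are computed elsewhere;
- encode_record, replay, and RECORD_LEN are hypothetical names;
- file I/O and the 60 s fsync cadence are left out, so replay works on an in-memory byte slice.

```rust
use std::collections::HashMap;

/// Size of one WAL record: host_hash (u64) + device_hash (u64) + joules_delta (f64).
pub const RECORD_LEN: usize = 24;

/// Encodes one record as little-endian bytes (byte order is an assumption).
pub fn encode_record(host_hash: u64, device_hash: u64, joules_delta: f64) -> [u8; RECORD_LEN] {
    let mut buf = [0u8; RECORD_LEN];
    buf[0..8].copy_from_slice(&host_hash.to_le_bytes());
    buf[8..16].copy_from_slice(&device_hash.to_le_bytes());
    buf[16..24].copy_from_slice(&joules_delta.to_le_bytes());
    buf
}

/// Copies an 8-byte slice into a fixed-size array.
fn le8(bytes: &[u8]) -> [u8; 8] {
    let mut b = [0u8; 8];
    b.copy_from_slice(bytes);
    b
}

/// Replays a WAL image into per-(host_hash, device_hash) Joule totals.
/// `chunks_exact` skips a torn trailing record (fewer than RECORD_LEN bytes).
pub fn replay(wal: &[u8]) -> HashMap<(u64, u64), f64> {
    let mut totals = HashMap::new();
    for rec in wal.chunks_exact(RECORD_LEN) {
        let host = u64::from_le_bytes(le8(&rec[0..8]));
        let device = u64::from_le_bytes(le8(&rec[8..16]));
        let delta = f64::from_le_bytes(le8(&rec[16..24]));
        // Assumption: drop non-finite or negative deltas so a corrupt record
        // cannot make the replayed counter decrease.
        if delta.is_finite() && delta >= 0.0 {
            *totals.entry((host, device)).or_insert(0.0) += delta;
        }
    }
    totals
}
```

Because every record is a self-contained delta, replay never has to repair a partially written record. It only ignores the trailing bytes.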
Prometheus metric
New counter metric:
```text
# HELP all_smi_energy_consumed_joules_total Cumulative energy consumption in Joules.
# TYPE all_smi_energy_consumed_joules_total counter
all_smi_energy_consumed_joules_total{host="dgx-01", gpu_index="0", gpu_uuid="..."} 8.43e6
all_smi_energy_consumed_joules_total{host="dgx-01", scope="chassis"} 6.13e7
```
This is a monotonic counter suitable for rate() and increase() queries.
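Below is a hypothetical Rust helper that renders this counter family in the Prometheus text exposition format. The project's actual export path under src/api/metrics/ may build these lines differently; Series and render_energy_counter are illustrative names only. The label-value escaping (backslash, double quote, newline) follows the exposition format.

```rust
/// One exposed series: label pairs plus the cumulative Joule value.
pub struct Series<'a> {
    pub labels: Vec<(&'a str, &'a str)>,
    pub joules: f64,
}

/// Escapes a label value per the Prometheus text exposition format.
fn escape_label_value(v: &str) -> String {
    v.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}

/// Renders HELP/TYPE headers followed by one line per series.
pub fn render_energy_counter(series: &[Series]) -> String {
    let name = "all_smi_energy_consumed_joules_total";
    let mut out = String::new();
    out.push_str(&format!("# HELP {} Cumulative energy consumption in Joules.\n", name));
    out.push_str(&format!("# TYPE {} counter\n", name));
    for s in series {
        let labels: Vec<String> = s
            .labels
            .iter()
            .map(|(k, v)| format!("{}=\"{}\"", k, escape_label_value(v)))
            .collect();
        out.push_str(&format!("{}{{{}}} {}\n", name, labels.join(","), s.joules));
    }
    out
}
```

Once the counter is exposed, a query such as increase(all_smi_energy_consumed_joules_total{scope="chassis"}[6h]) / 3.6e6 gives per-chassis kWh over six hours.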
TUI display
- Chassis renderer gains a row: Energy session: 3.21 kWh | $0.39 (at $0.12/kWh).
- New local/remote section E (or an expandable row inside the existing Chassis panel; pick whichever fits the redesigned local TUI layout):
  - Top-3 consumers by device.
  - Cumulative chassis energy, elapsed time, average power.
- Per-tab "session-kWh" chip next to the existing utilization chip.
- R hotkey resets in-memory session counters. The WAL is not rewound: it continues to accumulate across sessions, because Prometheus users rely on monotonicity.
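The reset semantics above imply two sets of totals: a lifetime total that feeds the monotonic Prometheus counter, and a session total that the R hotkey may clear. The Rust sketch below shows that split, assuming Joule increments come from the integrator. EnergyAccountant is the name proposed in this issue. The method names, the session_row helper, and the use of a single flat map keyed by device are hypothetical; the chassis would live in a separate per-chassis view.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

const JOULES_PER_KWH: f64 = 3.6e6;

/// Lifetime totals back the Prometheus counter; session totals back the TUI.
#[derive(Default)]
pub struct EnergyAccountant {
    lifetime_j: HashMap<String, f64>,
    session_j: HashMap<String, f64>,
}

impl EnergyAccountant {
    /// Adds a Joule increment to both the lifetime and the session view.
    pub fn add(&mut self, device_key: &str, joules: f64) {
        *self.lifetime_j.entry(device_key.to_string()).or_insert(0.0) += joules;
        *self.session_j.entry(device_key.to_string()).or_insert(0.0) += joules;
    }

    /// R hotkey: clears session totals only; lifetime totals stay monotonic.
    pub fn reset_session(&mut self) {
        self.session_j.clear();
    }

    pub fn lifetime_joules(&self, device_key: &str) -> Option<f64> {
        self.lifetime_j.get(device_key).copied()
    }

    pub fn session_kwh(&self, device_key: &str) -> f64 {
        self.session_j.get(device_key).copied().unwrap_or(0.0) / JOULES_PER_KWH
    }

    /// Top-N devices by session energy, largest first.
    pub fn top_consumers(&self, n: usize) -> Vec<(&str, f64)> {
        let mut v: Vec<(&str, f64)> =
            self.session_j.iter().map(|(k, j)| (k.as_str(), *j)).collect();
        v.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
        v.truncate(n);
        v
    }
}

/// Formats the chassis row; cost is appended only when enabled and price > 0.
pub fn session_row(kwh: f64, price_per_kwh: f64, show_cost: bool, symbol: &str) -> String {
    if show_cost && price_per_kwh > 0.0 {
        format!(
            "Energy session: {:.2} kWh | {}{:.2} (at {}{:.2}/kWh)",
            kwh, symbol, kwh * price_per_kwh, symbol, price_per_kwh
        )
    } else {
        format!("Energy session: {:.2} kWh", kwh)
    }
}
```

For example, session_row(3.21, 0.12, true, "$") yields the row shown in the TUI list. Mapping a currency code such as "USD" to a symbol is left to the caller in this sketch.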
Config
Under the [energy] section of the config file (see the companion config issue):

```toml
[energy]
price_per_kwh = 0.12
currency = "USD"
show_cost = true
wal_path = "~/.cache/all-smi/energy-wal.bin"
gap_interpolate_seconds = 10
```
Env var overrides: ALL_SMI_ENERGY_PRICE, ALL_SMI_ENERGY_CURRENCY, ALL_SMI_ENERGY_NO_COST (unsets show_cost).
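The Rust sketch below shows how the env-var overrides could be layered on top of the [energy] settings. It is not the project's src/common/config.rs. EnergyConfig, apply_env, and cost_visible are hypothetical names, and the defaults are assumptions: no price (0.0), "USD", cost enabled, and a 10 s gap. Two further assumptions are that any value of ALL_SMI_ENERGY_NO_COST disables cost display and that an unparsable price is ignored. The lookup closure stands in for std::env::var so the logic can be tested.

```rust
/// Sketch of the [energy] settings; defaults here are assumptions, not project values.
#[derive(Debug, Clone, PartialEq)]
pub struct EnergyConfig {
    pub price_per_kwh: f64,
    pub currency: String,
    pub show_cost: bool,
    pub gap_interpolate_seconds: f64,
}

impl Default for EnergyConfig {
    fn default() -> Self {
        Self {
            price_per_kwh: 0.0, // assumption: no price configured, so no cost shown
            currency: "USD".to_string(),
            show_cost: true,
            gap_interpolate_seconds: 10.0,
        }
    }
}

impl EnergyConfig {
    /// Applies env-var overrides; `lookup` abstracts std::env::var for testability.
    pub fn apply_env<F: Fn(&str) -> Option<String>>(&mut self, lookup: F) {
        // Ignore prices that fail to parse or are negative/non-finite.
        if let Some(price) = lookup("ALL_SMI_ENERGY_PRICE").and_then(|v| v.trim().parse::<f64>().ok()) {
            if price.is_finite() && price >= 0.0 {
                self.price_per_kwh = price;
            }
        }
        if let Some(currency) = lookup("ALL_SMI_ENERGY_CURRENCY") {
            self.currency = currency;
        }
        // Assumption: the mere presence of ALL_SMI_ENERGY_NO_COST unsets show_cost.
        if lookup("ALL_SMI_ENERGY_NO_COST").is_some() {
            self.show_cost = false;
        }
    }

    /// Cost is displayed only when enabled and a positive price is set.
    pub fn cost_visible(&self) -> bool {
        self.show_cost && self.price_per_kwh > 0.0
    }
}
```

In the binary, the call would be cfg.apply_env(|k| std::env::var(k).ok()) after the config file is parsed, so env vars take precedence.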
Implementation plan
Files to add / modify:
- New src/metrics/energy.rs:
  - PowerIntegrator with record_sample(device_key, t, watts).
  - EnergyAccountant with per-device and per-chassis views.
  - Trapezoidal integration, gap handling, reset, Joules → kWh and cost helpers.
- src/metrics/aggregator.rs: wire incoming power samples into the integrator each collection cycle.
- New src/metrics/energy_wal.rs: append-only WAL with fsync cadence (60s).
- src/api/metrics/: export the new all_smi_energy_consumed_joules_total counter.
- src/ui/renderers/chassis_renderer.rs: render the new row.
- New src/ui/renderers/energy_renderer.rs: top-consumer summary section.
- src/view/event_handler.rs: R hotkey.
- src/app_state.rs: house the EnergyAccountant and reset timestamp.
- src/common/config.rs: read [energy]; reasonable defaults.
- src/mock/generator.rs: mock emits power consistently so mock mode can exercise energy counters meaningfully.
Edge cases & non-goals
- Power readings are available only on some platforms (Apple Silicon: ANE + package; NVIDIA: per-GPU; AMD: per-GPU on most SKUs). If no power reading is available for a device, no energy series is emitted for that device (rather than a zero).
- Cost is explicitly an approximation; document that idle/base-load power and PSU efficiency are ignored.
- Currency is display-only; no FX conversion.
- Non-goal: historical data store. That is Prometheus's job; we only keep a WAL big enough for counter continuity.
- Non-goal: carbon intensity mapping (gCO2/kWh). Out of scope for v1; the Joule counter is enough for external tooling.
Soft dependency
- Config file issue: without it, ship with env-var support only (ALL_SMI_ENERGY_PRICE=0.12) and add config-file support when that lands.
Acceptance criteria
- … show_cost = false or price is 0.
- … R resets the session; the Prometheus counter is not affected (continues monotonic).
- … api mode with WAL replays the counter (monotonicity preserved).
- … view, the cluster-level energy panel sums across hosts.