fix: GPU List "Freq" field intermittently disappears on macOS (Apple Silicon) #216

@inureyes

Summary

On macOS (Apple Silicon), the Freq: field in the GPU List row of the view TUI intermittently appears and disappears between refreshes. The value is correct when shown, but the whole Freq: label+value vanishes for a refresh cycle and comes back later, which also shifts the layout of the fields after it (Pwr: etc.) on that row.

NVIDIA and other accelerators have not been confirmed yet, but the analysis below shows the rendering-layer half of the bug is cross-platform; the data-layer half is Apple-Silicon-specific.

Reproduction

  1. sudo ./target/release/all-smi view (or cargo run --bin all-smi -- view) on an Apple Silicon Mac.
  2. Watch the GPU List row over 20-30s while the machine is mostly idle.
  3. The Freq: field disappears and reappears across refresh cycles. It is most stable under sustained GPU load and most flickery when the GPU is near-idle.

Root cause

There are two distinct bugs that compound.

Bug A — data layer: idle Apple GPU computes frequency == 0

Render value chain:

  • AppleSiliconNativeGpuReader::get_gpu_info() → frequency: metrics.frequency.unwrap_or(0) (src/device/readers/apple_silicon_native.rs:236)
  • metrics.frequency = Some(data.gpu_frequency) (src/device/readers/apple_silicon_native.rs:153)
  • data.gpu_frequency = ioreport.gpu_freq (src/device/macos_native/metrics.rs:105)
  • ioreport.gpu_freq defaults to 0, overwritten only via calculate_cluster_average(&gpu_freqs) (src/device/macos_native/ioreport.rs:765-768)

gpu_freq ends up 0 through either of these paths:

  1. Idle GPU, active_residency == 0. process_gpu_channel → calc_gpu_freq_with_table (src/device/macos_native/ioreport.rs:851-887) skips every state whose name contains IDLE/OFF/DOWN (ioreport.rs:861-863). When the Apple GPU power-gates (the normal idle state on a machine just running a TUI), all GPUPH residency for the 100ms window sits in those idle states, so active_residency stays 0 and the function returns (0, …). That (0, 0.0) is pushed into gpu_freqs, so calculate_cluster_average returns Some((0, …)) (note: Some, not None) and gpu_freq is set to 0.

  2. gpu_freqs empty. If the GPUPH channel item is absent from a given IOReport sample, or item.get_residencies() returns empty, process_gpu_channel early-returns (src/device/macos_native/ioreport.rs:830-832) and nothing is pushed. calculate_cluster_average([]) returns None, so gpu_freq keeps its 0 default.
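
The two paths above can be reproduced in a minimal, self-contained sketch. This is not the code in ioreport.rs: the function names, the (state name, ticks) input shape, the state names P1/P2, and the rule that the n-th non-idle state maps to the n-th frequency table entry are all assumptions made for illustration.

```rust
/// Hypothetical helper: true for state names the real code is described as skipping.
fn is_idle_state(name: &str) -> bool {
    ["IDLE", "OFF", "DOWN"].iter().any(|k| name.contains(*k))
}

/// Sketch of a residency-weighted GPU frequency over one sample window.
/// `residencies` holds (state name, residency ticks); `freq_table` is in MHz.
/// Assumption: the n-th non-idle state maps to the n-th table entry.
fn calc_gpu_freq_sketch(residencies: &[(&str, u64)], freq_table: &[u32]) -> (u32, f32) {
    let mut total = 0u64;
    let mut active = 0u64;
    let mut weighted = 0f64;
    let mut next_pstate = 0usize;

    for &(name, ticks) in residencies {
        total += ticks;
        if is_idle_state(name) {
            continue; // idle-like states never contribute to the frequency
        }
        if let Some(&mhz) = freq_table.get(next_pstate) {
            active += ticks;
            weighted += ticks as f64 * mhz as f64;
        }
        next_pstate += 1;
    }

    if active == 0 {
        // Path 1: the whole window sat in idle states, so the result is (0, 0.0).
        return (0, 0.0);
    }
    let freq = (weighted / active as f64).round() as u32;
    (freq, active as f32 / total as f32)
}

/// Sketch of the cluster average: None only when nothing was pushed.
fn cluster_average_sketch(samples: &[(u32, f32)]) -> Option<(u32, f32)> {
    if samples.is_empty() {
        return None; // Path 2: the caller keeps its 0 default
    }
    let n = samples.len();
    let freq = samples.iter().map(|&(f, _)| u64::from(f)).sum::<u64>() / n as u64;
    let usage = samples.iter().map(|&(_, u)| u).sum::<f32>() / n as f32;
    Some((freq as u32, usage))
}
```

In this sketch an all-idle window yields Some((0, 0.0)) and a missing channel yields None, yet both end up as gpu_freq == 0 downstream, which is the ambiguity described above.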

The 4-sample integer averaging in NativeMetricsManager::average_samples (src/device/macos_native/manager.rs:242,257) then averages those per-sample values; an all-idle collection window averages to 0.
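
As a rough illustration of the averaging step, assuming average_samples takes a plain integer mean over the per-sample frequencies (the function name and the sample values here are hypothetical, and the real code may weight or filter samples differently):

```rust
/// Sketch of a plain integer mean over per-sample GPU frequencies (MHz).
/// Assumption: zeros are averaged in like any other value.
fn average_freq_sketch(samples: &[u32]) -> u32 {
    if samples.is_empty() {
        return 0;
    }
    let sum: u64 = samples.iter().map(|&f| u64::from(f)).sum();
    (sum / samples.len() as u64) as u32
}
```

Under that assumption, four idle samples average to 0, and a window with a single active sample of 1200 averages to 300, so zeros from idle samples would also drag down the value that is displayed when the field is visible.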

The core semantic problem: GpuInfo.frequency: u32 (src/device/types.rs:50) cannot distinguish "GPU idle, no active P-state this window" from "no frequency data available". An idle Apple GPU is not running at 0 Hz — it is parked at its lowest P-state — but the code reports 0 for it.

Bug B — render layer: frequency > 0 used as "is data available"

src/ui/renderers/gpu_renderer.rs:315:

if info.frequency > 0 {
    print_colored_text(stdout, " Freq:", Color::Magenta, None, None);
    ...
}

The entire Freq: label+value is gated on frequency > 0. This conflates "zero / no data" with "don't render at all", so any refresh where Bug A produces 0 makes the field vanish — and because it is rendered inline, every field after it on the row shifts left. This block has been unchanged since the renderer split in #44.

This half is cross-platform: NVIDIA's reader also does device.clock(...).unwrap_or(0) (src/device/readers/nvidia.rs:341) and the nvidia-smi fallback does parts[7].parse().unwrap_or(0) (src/device/readers/nvidia.rs:818); Jetson reads cur_freq from sysfs (src/device/readers/nvidia_jetson.rs:120). Any of those producing 0 (query error, sysfs read failure) would flicker identically. NVIDIA just rarely hits it because NVML reports a stable idle graphics clock.
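
A small sketch of why unwrap_or(0) feeds this bug on any platform: it collapses "could not read the clock" into the same 0 that the renderer treats as "hide the field". The helper names and the "[N/A]" input are illustrative assumptions, not taken from nvidia.rs.

```rust
/// Mirrors the lossy pattern: a field that fails to parse becomes 0.
fn parse_clock_lossy(field: &str) -> u32 {
    field.trim().parse::<u32>().unwrap_or(0)
}

/// Alternative that keeps "no data" distinct from a real reading of 0.
fn parse_clock(field: &str) -> Option<u32> {
    field.trim().parse::<u32>().ok()
}
```

With the lossy version, an unparsable field and a genuine "0" are indistinguishable to the caller; with the Option version, only the former is None.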

Why it is intermittent

gpu_freq is tied to GPU activity during the ~400ms collection window. Busy GPU → active P-state residency → non-zero freq → Freq: shown. Idle GPU → only idle-state residency → 0 → Freq: hidden. A normal desktop Apple GPU floats between these states constantly (even drawing the TUI causes short bursts), so the field flickers every few seconds.

Proposed fix

Two parts; both should land together.

Fix B (renderer, cross-platform) — stop gating display on the value

Don't use frequency > 0 as a data-availability proxy. Either:

  • always render Freq: and show N/A when no data is available — consistent with how VRAM:/Temp: already handle metrics_available == "false" and temperature == 0 in the same function (src/ui/renderers/gpu_renderer.rs:273,293-298), or
  • gate on an explicit "frequency known" signal rather than the integer value.

This alone stops the flicker and keeps the row layout stable.
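
A minimal sketch of the first option, assuming the field can be built as a fixed-width string. The real renderer writes through print_colored_text to stdout; the function name, the MHz suffix, and the column widths here are assumptions chosen only to show that both branches occupy the same width.

```rust
/// Sketch: always emit the " Freq:" label, with a fixed-width value column.
/// 0 is still treated as "no data" here because the field is a plain u32.
fn freq_field_sketch(frequency_mhz: u32) -> String {
    if frequency_mhz > 0 {
        // 5 columns for the number plus 3 for the unit = 8 columns
        format!(" Freq:{:>5}MHz", frequency_mhz)
    } else {
        // same 8 columns, so fields after it on the row do not shift
        format!(" Freq:{:>8}", "N/A")
    }
}
```

Because both branches produce a string of identical length, the fields after Freq: (Pwr: etc.) keep their position whether or not a value is available.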

Fix A (Apple Silicon data layer) — idle GPU should report a real frequency

Make calc_gpu_freq_with_table (and calc_freq_from_residencies) return the GPU's idle/base clock instead of 0 when the GPU is present but idle (total_residency > 0 && active_residency == 0): return freq_table[0] (the lowest P-state in the IOKit pmgr table) rather than 0. An idle Apple GPU genuinely sits at its lowest P-state.
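
The idle branch could look roughly like the following sketch. The function name and its flattened parameters are hypothetical (the real functions take residency data, not precomputed sums), and it assumes freq_table is ordered so that index 0 is the lowest P-state, as described above.

```rust
/// Sketch of the proposed idle handling for one sample window.
/// `weighted_sum` is the sum of (active ticks * MHz) over active states.
fn freq_with_idle_floor_sketch(
    total_residency: u64,
    active_residency: u64,
    weighted_sum: f64,
    freq_table: &[u32],
) -> u32 {
    if total_residency == 0 {
        return 0; // genuinely no data for this window
    }
    if active_residency == 0 {
        // GPU present but parked: report the lowest P-state instead of 0.
        // Falls back to 0 only if the table itself is empty.
        return freq_table.first().copied().unwrap_or(0);
    }
    (weighted_sum / active_residency as f64).round() as u32
}
```

The loaded case is unchanged in this sketch; only the total_residency > 0 && active_residency == 0 case gains a non-zero result.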

Optionally also make availability explicit end-to-end — e.g. GpuInfo.frequency: Option<u32> (or a companion frequency_available: bool), with None reserved for the genuine "no data" case (channel missing / residencies empty) and the renderer showing N/A for None. This cleanly separates the two states Bug A conflates today.
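
One way the explicit-availability variant might be plumbed, as a sketch only. GpuInfoSketch stands in for the real GpuInfo (which has many more fields), and the helper names and MHz suffix are assumptions.

```rust
/// Stand-in for GpuInfo with only the field under discussion.
#[derive(Debug, Clone, PartialEq)]
struct GpuInfoSketch {
    /// None means "no frequency data", never "0 MHz".
    frequency: Option<u32>,
}

/// Carry the cluster average through without collapsing None into 0.
fn from_cluster_average(avg: Option<(u32, f32)>) -> GpuInfoSketch {
    GpuInfoSketch {
        frequency: avg.map(|(mhz, _usage)| mhz),
    }
}

/// Renderer side: the value text is chosen from the Option, not from `> 0`.
fn freq_label(info: &GpuInfoSketch) -> String {
    match info.frequency {
        Some(mhz) => format!("{}MHz", mhz),
        None => "N/A".to_string(),
    }
}
```

With this shape the reader no longer needs unwrap_or(0), and the renderer's decision depends on whether data exists rather than on the magnitude of the value.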

Worth checking while in here

  • Whether GPUPH is the correct/only GPU P-state channel across M1-M5 (Pro/Max/Ultra), or whether some chips expose it under a different channel name — a permanently-missing channel would make Freq never show, not flicker.
  • Whether get_residencies() ever transiently returns empty for GPUPH (would indicate a sampling-robustness issue, not just an idle-state issue).

Affected files

  • src/ui/renderers/gpu_renderer.rs:314-334 (Fix B)
  • src/device/macos_native/ioreport.rs:828-887 (process_gpu_channel, calc_gpu_freq_with_table, calc_freq_from_residencies; Fix A)
  • src/device/macos_native/ioreport.rs:765-768 (calculate_cluster_average call site for GPU)
  • src/device/readers/apple_silicon_native.rs:153,236 (frequency plumbing)
  • src/device/types.rs:50 (GpuInfo.frequency field, if making availability explicit)
  • src/device/macos_native/manager.rs:242,257 (sample averaging; verify behavior once idle returns base clock)

Acceptance criteria

  • On an idle Apple Silicon Mac, the Freq: field in the GPU List stays visible across consecutive refresh cycles (no appear/disappear flicker) and the row layout does not shift.
  • An idle Apple GPU reports its idle/base clock, not 0; a loaded GPU still reports the correct weighted active frequency.
  • When frequency data is genuinely unavailable, the field renders a stable N/A instead of vanishing.
  • Renderer change verified to not regress NVIDIA / Jetson rows (no spurious N/A, no flicker).
  • Changes integrated into the actual view TUI code path (not just helper functions) and confirmed in a running all-smi view session.
  • cargo fmt --check, cargo clippy, and cargo test pass; existing ioreport.rs freq tests updated for the new idle-state behavior.
