Comparing changes

…ler (#149)

Platform detection functions (has_nvidia, has_gaudi, etc.) were re-evaluated on every view refresh from update_notifications, executing system_profiler SPPCIDataType once per frame on macOS. The probe takes hundreds of ms and spawns processes repeatedly even though hardware presence never changes at runtime.

Wrap each detection function in a process-global OnceLock so the underlying probe runs at most once per process.

Also collapse nested `if` statements flagged by clippy in the macOS device readers.
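The caching pattern described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `probe_nvidia` and the `PROBE_RUNS` counter are hypothetical stand-ins for the real probe, and only `has_nvidia` is a name taken from the commit message.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how often the stand-in probe actually runs (illustration only).
static PROBE_RUNS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for an expensive hardware probe, such as
// spawning `system_profiler SPPCIDataType` on macOS.
fn probe_nvidia() -> bool {
    PROBE_RUNS.fetch_add(1, Ordering::SeqCst);
    false
}

// Cached wrapper: the closure passed to `get_or_init` runs at most once
// per process, and every later call returns the stored result.
pub fn has_nvidia() -> bool {
    static CACHE: OnceLock<bool> = OnceLock::new();
    *CACHE.get_or_init(probe_nvidia)
}
```

Calling `has_nvidia()` once per frame then costs a single atomic load after the first call, which matches the assumption that hardware presence does not change while the process is running.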

Apple Silicon's SMC stores `flt ` (IEEE 754 single-precision) sensor values in little-endian byte order, unlike the legacy SP78/FP* fixed point types which remain big-endian. The convert_value() helper was using f32::from_be_bytes() for FLT, so every Tg*/Tp*/Te* temperature read returned a randomly varying garbage float (e.g. 1e-32, 3e36) and the (10..=120) sanity filter rejected almost everything.

Symptoms in local view mode: GPU temperature showed 0 °C or sporadic 6/7 °C, CPU temperature showed 0 °C, dashboard "Avg. Temp" showed thermal pressure text only because the numeric path produced nonsense.

Switching the FLT branch to from_le_bytes() restores real die temperatures (~50–60 °C idle on M1 Ultra). Verified by dumping the raw SMC response buffer: bytes always landed at offset 48 with consistent LE-encoded floats matching expected die temps.

While here, several related correctness issues were also addressed:

* Dashboard and aggregator looked up the GPU detail key as "Architecture" but apple_silicon_native (and the prometheus exporter) write it lowercase as "architecture". The mismatch silently disabled the entire is_apple_silicon special-case path. Standardise on the lowercase form everywhere, including the NVIDIA and Jetson readers.
* cpu_macos::get_cpu_temperature was a stub that always returned None, so the live "CPU Temp." gauge was permanently 0 °C even though SMC was already collecting the value. Wire it through the cached NativeMetricsData fetched in get_apple_silicon_cpu_info.
* apple_silicon_native now falls back to the SMC CPU die temperature when the GPU sensor is unavailable. CPU and GPU share the same SoC die so the readings track each other closely; this is far more meaningful than reporting 0 °C.
* Dashboard "Avg. Temp" cell now shows numeric °C on every platform. On Apple Silicon the second-row "Temp. Stdev" cell becomes a "Thermal" cell carrying the qualitative thermal pressure level (single-die std dev is meaningless), so both pieces of information remain visible.
* The per-GPU list view used to display thermal pressure text on Apple Silicon for the same reason; it now shows the real numeric die temp, consistent with every other platform.

Added a regression unit test (test_flt_little_endian_decoding) so that the endianness can't silently flip back.
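The endianness split described above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual helper: the `SmcType` enum, the signature of `convert_value`, and the `plausible_temp` filter are hypothetical, and SP78 is decoded here as a signed big-endian 16-bit value divided by 256.

```rust
// Hypothetical data-type tags; the real helper's type handling is not
// shown in the commit message.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SmcType {
    Flt,
    Sp78,
}

// Decode a raw SMC value. FLT is little-endian on Apple Silicon, while
// the legacy SP78 fixed-point type stays big-endian.
pub fn convert_value(kind: SmcType, bytes: &[u8]) -> Option<f32> {
    match kind {
        SmcType::Flt => {
            let b = bytes.get(..4)?;
            Some(f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        }
        SmcType::Sp78 => {
            let b = bytes.get(..2)?;
            // Signed 7.8 fixed point: 8 fractional bits, so divide by 256.
            Some(i16::from_be_bytes([b[0], b[1]]) as f32 / 256.0)
        }
    }
}

// Sanity filter in the spirit of the (10..=120) check mentioned above.
pub fn plausible_temp(celsius: f32) -> Option<f32> {
    (10.0..=120.0).contains(&celsius).then_some(celsius)
}
```

Under these assumptions, a 55.5 °C reading encoded as little-endian bytes decodes correctly and passes the filter, whereas reading the same four bytes as big-endian yields a tiny subnormal float that the filter rejects, which is consistent with the symptoms listed in the commit message.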

The TIME+ column was fixed at 8 chars, but format_cpu_time can produce values up to 10 chars ("8760:00:00" at the 365-day cap). Values like "213:16:04" (9 chars) overflow the column and push the Command column right, so rows with 9-char times no longer align with the header or with rows that have shorter times. Widen fixed_widths[11] from 8 to 10 so all possible TIME+ values right-align cleanly in the column and Command stays at a consistent position. Document the width invariant in format_cpu_time and add a regression test that enforces the maximum output width.
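The width invariant can be illustrated with a small sketch. The formatter below is an assumption, not the project's format_cpu_time: it always emits `H:MM:SS` and clamps at 365 days, which reproduces the two example values quoted above but may differ from the real function for short durations. The names `format_cpu_time_sketch`, `time_cell`, `TIME_COL_WIDTH`, and `MAX_CPU_SECS` are hypothetical.

```rust
// Column width after the fix described above (previously 8).
const TIME_COL_WIDTH: usize = 10;

// Assumed cap of 365 days, expressed in seconds.
const MAX_CPU_SECS: u64 = 365 * 24 * 3600;

// Hypothetical formatter: hours:minutes:seconds, clamped at the cap so
// the output never exceeds TIME_COL_WIDTH characters.
pub fn format_cpu_time_sketch(total_secs: u64) -> String {
    let s = total_secs.min(MAX_CPU_SECS);
    format!("{}:{:02}:{:02}", s / 3600, (s % 3600) / 60, s % 60)
}

// Right-align the formatted time inside the fixed-width TIME+ column.
pub fn time_cell(total_secs: u64) -> String {
    format!(
        "{:>width$}",
        format_cpu_time_sketch(total_secs),
        width = TIME_COL_WIDTH
    )
}
```

With a column of 10 characters, both the 9-character "213:16:04" and the capped "8760:00:00" occupy exactly one cell, so the Command column that follows starts at the same offset on every row.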

Commits on Apr 8, 2026
