
feat: enrich chassis info with DMI data, thermal zones, and GPU power#148

Merged: inureyes merged 3 commits into main from feature/issue-147-chassis-dmi-thermal on Apr 8, 2026
inureyes commented Apr 8, 2026

Summary

  • Read DMI fields (product name, vendor, board, version, BIOS) from /sys/class/dmi/id/ on Linux (no sudo required)
  • Read ACPI thermal zones for inlet/outlet board temperatures from /sys/class/thermal/
  • Wire up chassis reader in API server data collection loop (was missing)
  • Aggregate GPU power into chassis total_power_watts for both TUI and API paths
  • Add all_smi_chassis_info Prometheus metric exposing DMI details as labels

Verified on DGX Spark (GB10)

Before: Pwr: N/A (no other chassis data)

After:

all_smi_chassis_info{hostname="spark-0c5a", product_name="NVIDIA_DGX_Spark", vendor="NVIDIA", board="P4242", version="A.7", bios_version="5.36_0ACUM018", platform="Linux"} 1
all_smi_chassis_power_watts{hostname="spark-0c5a"} 4.05
all_smi_chassis_inlet_temperature_celsius{hostname="spark-0c5a"} 39.7
all_smi_chassis_outlet_temperature_celsius{hostname="spark-0c5a"} 41.8
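
An info-style line like `all_smi_chassis_info` above could be rendered by a helper that includes only the labels whose values are present. The sketch below is an assumption about the general approach, not the project's `MetricBuilder`: `render_info_metric` is a hypothetical name, label values are assumed to be already escaped, and the comma-plus-space separator simply mirrors the sample output.

```rust
/// Render an info-style gauge line with value 1.
/// Labels whose value is None are skipped; if no label has a value,
/// nothing is emitted at all (returns None).
fn render_info_metric(name: &str, labels: &[(&str, Option<&str>)]) -> Option<String> {
    let rendered: Vec<String> = labels
        .iter()
        .filter_map(|&(key, value)| value.map(|v| format!("{key}=\"{v}\"")))
        .collect();
    if rendered.is_empty() {
        return None;
    }
    // "{{" and "}}" are literal braces around the joined label list.
    Some(format!("{name}{{{}}} 1", rendered.join(", ")))
}
```

With `("board", None)` in the input, the `board` label is simply left out of the rendered line instead of being emitted as an empty string.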

Closes #147

Test plan

  • Build passes on GB10 (aarch64)
  • All 601 tests pass
  • Chassis metrics include DMI data, thermal zones, and GPU power
  • No sudo required for new data collection
  • Graceful fallback when DMI/thermal data unavailable (tested with nonexistent paths)
  • No regression on macOS (Apple Silicon chassis reader untouched)

- Read DMI fields (product name, vendor, board, version, BIOS) from
  /sys/class/dmi/id/ on Linux (no sudo required)
- Read ACPI thermal zones for inlet/outlet board temperatures
- Wire up chassis reader in API server data collection loop
- Aggregate GPU power into chassis total_power_watts
- Add all_smi_chassis_info Prometheus metric with DMI labels
- Add inject_gpu_power helper for both TUI and API paths
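
A helper along the lines of `inject_gpu_power` might look like the sketch below. The struct definitions are simplified stand-ins (only `total_power_watts` is named in this PR; `GpuInfo` and its `power_watts` field are hypothetical), and adding the GPU sum on top of any existing chassis reading is an assumption about the intended semantics.

```rust
/// Simplified stand-in for the project's chassis struct.
#[derive(Debug, Default)]
struct ChassisInfo {
    total_power_watts: Option<f64>,
}

/// Hypothetical per-GPU reading; power may be unavailable.
struct GpuInfo {
    power_watts: Option<f64>,
}

/// Add the sum of all available GPU power readings to the chassis total.
/// If no GPU reports power, the chassis value is left untouched.
fn inject_gpu_power(chassis: &mut ChassisInfo, gpus: &[GpuInfo]) {
    let readings: Vec<f64> = gpus.iter().filter_map(|gpu| gpu.power_watts).collect();
    if readings.is_empty() {
        return;
    }
    let gpu_total: f64 = readings.iter().sum();
    let base = chassis.total_power_watts.unwrap_or(0.0);
    chassis.total_power_watts = Some(base + gpu_total);
}
```

Leaving the field as `None` when no GPU reports power keeps the "N/A" display path intact rather than showing a misleading 0 W.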

Closes #147
inureyes added the type:enhancement and priority:medium labels Apr 8, 2026

inureyes commented Apr 8, 2026

Implementation Review Summary

Intent

Enrich chassis information with DMI data, thermal zones, and GPU power aggregation for Linux platforms (verified on DGX Spark GB10).

Findings Addressed

No findings requiring code changes were identified. The implementation is clean, correct, and well-integrated.

Remaining Items

None.

Verification

  • All stated requirements implemented
    • DMI fields (product name, vendor, board, version, BIOS version) read from /sys/class/dmi/id/
    • Thermal zone temperatures (inlet/outlet) from /sys/class/thermal/
    • GPU power aggregated into chassis total_power_watts
    • all_smi_chassis_info Prometheus metric with DMI labels
    • Chassis reader wired into API server data collection loop
  • No placeholder/mock code remaining
  • Integrated into project code flow
    • API server (server.rs): chassis reader created, chassis info collected in the loop, GPU power injected, state updated
    • TUI path (local_collector.rs): inject_gpu_power() helper called in both first-iteration parallel and subsequent sequential collection paths
    • Prometheus metrics (chassis.rs): new all_smi_chassis_info gauge with dynamic DMI labels exported correctly
  • Project conventions followed
    • License headers present, module-level doc comments updated
    • #[cfg(target_os = "linux")] guards for platform-specific code
    • #[cfg(not(target_os = "linux"))] fallback for non-Linux
    • Error handling via .ok() / graceful None returns (no unwrap() in library paths)
    • Inline format args used consistently (Rust 1.58+)
    • Import organization follows project style
  • Existing modules reused where applicable
    • Reuses ChassisInfo struct fields (inlet_temperature, outlet_temperature, detail) that already existed
    • Reuses MetricBuilder and MetricPresenceFlags patterns from existing chassis metric exporter
    • Reuses get_hostname() utility
  • No unintended structural changes
    • Only the four files listed in the PR are modified; no renames or moves
  • Tests pass
    • All 601 tests pass (251 unit + 302 integration + doc tests)
    • 7 new tests added for DMI reading, thermal zone parsing (mock zones, empty dir, nonexistent path, single zone, Linux-specific DMI presence)
    • Clippy clean with -D warnings

Notes

  • The GPU power injection logic is intentionally duplicated between server.rs (API path, inline) and local_collector.rs (TUI path, extracted helper). This is reasonable since the API server loop is structurally different from the local collector, and extracting a shared function would require coupling these independent modules. The local_collector.rs version is properly factored into a named inject_gpu_power() function reused in both parallel and sequential collection paths within that file.
  • The all_smi_chassis_info metric uses dynamic label sets (only labels with available values are included), which is correct Prometheus practice for optional metadata.
  • The read_thermal_zones_from() function properly filters to only acpitz zones and includes sanity bounds (-40 to 150 Celsius), which is a good defensive measure.
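
The filtering described for `read_thermal_zones_from()` can be illustrated with the sketch below. It is written from the description in these notes, not copied from the PR: the return type, the helper `read_zone_celsius`, and the decision to return readings in directory-name order are assumptions, and how readings are assigned to inlet versus outlet is deliberately left out. It relies on the Linux sysfs convention that each `thermal_zone*` directory has a `type` file and a `temp` file in millidegrees Celsius.

```rust
use std::fs;
use std::path::Path;

/// Return temperatures (in Celsius) of all `acpitz` zones under `base`,
/// ordered by zone directory name. Missing or unreadable data yields an
/// empty list rather than an error.
fn read_thermal_zones_from(base: &Path) -> Vec<f64> {
    let Ok(entries) = fs::read_dir(base) else {
        return Vec::new();
    };
    let mut zones: Vec<_> = entries
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| {
            path.file_name()
                .and_then(|name| name.to_str())
                .map_or(false, |name| name.starts_with("thermal_zone"))
        })
        .collect();
    zones.sort();
    zones.iter().filter_map(|zone| read_zone_celsius(zone)).collect()
}

/// Read one zone; None unless it is an `acpitz` zone with a sane value.
fn read_zone_celsius(zone: &Path) -> Option<f64> {
    let kind = fs::read_to_string(zone.join("type")).ok()?;
    if kind.trim() != "acpitz" {
        return None;
    }
    // sysfs reports millidegrees Celsius as a plain integer.
    let millis: i64 = fs::read_to_string(zone.join("temp")).ok()?.trim().parse().ok()?;
    let celsius = millis as f64 / 1000.0;
    // Sanity bounds from the review notes: -40 to 150 Celsius.
    (-40.0..=150.0).contains(&celsius).then_some(celsius)
}
```

Taking the base path as a parameter is what makes the "mock zones", "empty dir", and "nonexistent path" style of tests mentioned above straightforward to write.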

inureyes added 2 commits April 8, 2026 13:25

- Escape backslash and newline in Prometheus label values per spec
  (previously only double-quote was escaped)
- Cache static DMI fields at GenericChassisReader construction time
  instead of re-reading /sys/class/dmi/id/ on every collection cycle

Add unit tests for MetricBuilder label escaping (backslash, double-quote,
newline) and ChassisMetricExporter covering the all_smi_chassis_info DMI
labels metric, inlet/outlet temperature, no-DMI guard, and
MetricPresenceFlags::all_present.
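
For reference, the label escaping rule these commits cover (backslash, double-quote, and newline in the Prometheus text exposition format) can be sketched as below. `escape_label_value` is a hypothetical standalone function for illustration; the project's actual escaping lives inside `MetricBuilder` and may be structured differently.

```rust
/// Escape a label value for the Prometheus text exposition format:
/// backslash becomes `\\`, double-quote becomes `\"`, and a line feed
/// becomes the two characters `\n`. Everything else is passed through.
fn escape_label_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}
```

Handling the backslash in the same single pass as the other characters avoids double-escaping, which can happen if escapes are applied as successive string replacements in the wrong order.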
inureyes commented Apr 8, 2026

PR Finalization Complete

Summary

  • Tests: Added 12 new unit tests across 2 files covering previously untested paths
  • Documentation: No doc changes needed (inline doc comments already present)
  • Lint/Format: All clean - cargo clippy and cargo fmt pass with zero issues

New tests added

src/api/metrics/mod.rs (MetricBuilder label escaping - 5 tests):

  • test_metric_builder_label_escaping_backslash - backslash escaping in label values
  • test_metric_builder_label_escaping_double_quote - double-quote escaping
  • test_metric_builder_label_escaping_newline - newline escaping
  • test_metric_builder_no_labels - metric with no labels renders correctly
  • test_metric_builder_help_and_type - HELP/TYPE lines render correctly

src/api/metrics/chassis.rs (ChassisMetricExporter - 5 tests):

  • test_chassis_info_dmi_labels_metric - all_smi_chassis_info metric emits DMI labels (product_name, vendor, board, bios_version, platform)
  • test_chassis_inlet_outlet_temperature_metrics - inlet/outlet temperature metrics export
  • test_chassis_info_no_dmi_details_skips_info_metric - no all_smi_chassis_info emitted when no DMI keys present
  • test_metric_presence_flags_all_present - MetricPresenceFlags correctly detects all metrics present

All checks passing. Ready for merge.

inureyes added the status:done label Apr 8, 2026
inureyes merged commit 7e75ec7 into main Apr 8, 2026
2 checks passed
inureyes deleted the feature/issue-147-chassis-dmi-thermal branch April 8, 2026 04:29
inureyes self-assigned this Apr 17, 2026
Development

Successfully merging this pull request may close these issues.

feat: Enrich chassis information with DMI and thermal zone data on Linux