
feat: Enrich chassis information with DMI and thermal zone data on Linux #147

Description

@inureyes

Problem / Background

The GenericChassisReader (used on all non-macOS platforms) currently only reports aggregated GPU power. On DGX Spark (GB10) and other Linux systems, much more chassis-level information is available but not collected:

  • DMI data (/sys/class/dmi/id/): product name, vendor, board name, BIOS version — all readable without sudo
  • Thermal zones (/sys/class/thermal/thermal_zone*/): multiple ACPI temperature zones (inlet/outlet/board temps) — readable without sudo
  • System identification: Chassis type, product version

Currently the TUI only shows "Pwr: N/A" for the chassis line, which is unhelpful.

Verified on Real Hardware (DGX Spark GB10)

Available data sources (no sudo required):

| Source | Data | Path / Notes |
|---|---|---|
| DMI product_name | NVIDIA_DGX_Spark | /sys/class/dmi/id/product_name |
| DMI sys_vendor | NVIDIA | /sys/class/dmi/id/sys_vendor |
| DMI board_name | P4242 | /sys/class/dmi/id/board_name |
| DMI product_version | A.7 | /sys/class/dmi/id/product_version |
| DMI bios_version | 5.36_0ACUM018 | /sys/class/dmi/id/bios_version |
| DMI chassis_type | 17 (Server) | /sys/class/dmi/id/chassis_type |
| Thermal zones (7) | 38.9°C to 41.7°C (acpitz) | /sys/class/thermal/thermal_zone*/temp |
| GPU Power | ~4.5 W | nvidia-smi |
| IPMI | Not available | No /dev/ipmi0 |
| Fan Speed | N/A | Not exposed on GB10 |
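
The chassis_type file holds a numeric SMBIOS enclosure code rather than a name, so showing "Server" requires a lookup. A minimal sketch of such a lookup follows; the function name chassis_type_label is hypothetical, and only a handful of codes from the SMBIOS enclosure type table are included.

```rust
/// Maps the decimal code read from /sys/class/dmi/id/chassis_type to a
/// human-readable label. Hypothetical helper; covers only a few SMBIOS
/// enclosure types and returns None for anything else.
fn chassis_type_label(raw: &str) -> Option<&'static str> {
    // sysfs values end with a newline, so trim before parsing.
    match raw.trim().parse::<u8>().ok()? {
        3 => Some("Desktop"),
        9 => Some("Laptop"),
        10 => Some("Notebook"),
        17 => Some("Main Server Chassis"),
        23 => Some("Rack Mount Chassis"),
        _ => None,
    }
}
```

Returning None for unknown or unparsable codes lets the caller fall back to showing the raw number or omitting the field.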

Proposed Solution

1. Collect DMI Information

In GenericChassisReader, read DMI data from /sys/class/dmi/id/ and populate ChassisInfo.detail:

  • product_name → detail["Product Name"]
  • sys_vendor → detail["Vendor"]
  • board_name → detail["Board"]
  • product_version → detail["Version"]
  • bios_version → detail["BIOS Version"]
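
The mapping above could be sketched roughly as follows. This assumes ChassisInfo.detail behaves like a HashMap<String, String> (the actual type is not shown in this issue), and read_dmi_detail and DMI_FIELDS are hypothetical names. The directory is passed in as a parameter so the function can be pointed at a fixture directory in tests instead of /sys/class/dmi/id.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;

/// (sysfs file name, detail key) pairs taken from the mapping in this issue.
const DMI_FIELDS: [(&str, &str); 5] = [
    ("product_name", "Product Name"),
    ("sys_vendor", "Vendor"),
    ("board_name", "Board"),
    ("product_version", "Version"),
    ("bios_version", "BIOS Version"),
];

/// Reads the DMI files under `dmi_dir` into a detail map.
/// Missing, unreadable, or empty files are skipped, so a system
/// without DMI simply yields an empty map.
fn read_dmi_detail(dmi_dir: &Path) -> HashMap<String, String> {
    let mut detail = HashMap::new();
    for &(file, key) in DMI_FIELDS.iter() {
        if let Ok(raw) = fs::read_to_string(dmi_dir.join(file)) {
            // sysfs values end with a newline.
            let value = raw.trim();
            if !value.is_empty() {
                detail.insert(key.to_string(), value.to_string());
            }
        }
    }
    detail
}
```

In the reader this would be called with Path::new("/sys/class/dmi/id") and the result merged into the existing detail map.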

2. Collect Thermal Zone Data

Read /sys/class/thermal/thermal_zone*/temp (reported in millidegrees Celsius) and /sys/class/thermal/thermal_zone*/type:

  • Use the lowest-index acpitz zone as inlet_temperature
  • Use the highest-temp acpitz zone or last zone as outlet_temperature
  • These are board-level temperatures, distinct from per-GPU temperatures
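
One possible shape for the zone scan and the inlet/outlet selection is sketched below. ThermalZone, read_thermal_zones, and select_inlet_outlet are hypothetical names. The sketch reads "highest-temp acpitz zone or last zone" as "fall back to the highest-index zone when no acpitz zone exists", which is only one interpretation of the rule above.

```rust
use std::fs;
use std::path::Path;

/// One thermal zone: numeric index, `type` string, temperature in degrees Celsius.
#[derive(Debug, Clone, PartialEq)]
struct ThermalZone {
    index: u32,
    kind: String,
    celsius: f64,
}

/// Scans `thermal_dir` for `thermal_zone<N>` entries, sorted by index.
/// Zones whose `type` or `temp` file is missing or unparsable are skipped.
fn read_thermal_zones(thermal_dir: &Path) -> Vec<ThermalZone> {
    let mut zones = Vec::new();
    let entries = match fs::read_dir(thermal_dir) {
        Ok(entries) => entries,
        Err(_) => return zones, // no thermal sysfs: return an empty list
    };
    for entry in entries.flatten() {
        let name = entry.file_name();
        let index = match name
            .to_str()
            .and_then(|n| n.strip_prefix("thermal_zone"))
            .and_then(|n| n.parse::<u32>().ok())
        {
            Some(index) => index,
            None => continue, // e.g. cooling_device* entries
        };
        let path = entry.path();
        let kind = fs::read_to_string(path.join("type")).map(|s| s.trim().to_string());
        // The kernel reports `temp` in millidegrees Celsius.
        let millideg = fs::read_to_string(path.join("temp"))
            .ok()
            .and_then(|s| s.trim().parse::<i64>().ok());
        if let (Ok(kind), Some(millideg)) = (kind, millideg) {
            zones.push(ThermalZone { index, kind, celsius: millideg as f64 / 1000.0 });
        }
    }
    zones.sort_by_key(|z| z.index);
    zones
}

/// Returns (inlet, outlet) in degrees Celsius.
/// Inlet: lowest-index acpitz zone. Outlet: hottest acpitz zone, or the
/// highest-index zone of any type when no acpitz zone is present.
fn select_inlet_outlet(zones: &[ThermalZone]) -> (Option<f64>, Option<f64>) {
    let acpitz: Vec<&ThermalZone> = zones.iter().filter(|z| z.kind == "acpitz").collect();
    let inlet = acpitz.iter().min_by_key(|z| z.index).map(|z| z.celsius);
    let outlet = acpitz
        .iter()
        .map(|z| z.celsius)
        .fold(None, |max: Option<f64>, t| Some(max.map_or(t, |m| m.max(t))))
        .or_else(|| zones.iter().max_by_key(|z| z.index).map(|z| z.celsius));
    (inlet, outlet)
}
```

Both values are None when the directory is absent, which matches the graceful-fallback criterion below. On a system with a single acpitz zone, inlet and outlet would report the same reading under this rule.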

3. Ensure GPU Power Flows to Chassis

The GenericChassisReader has a gpu_power_cache mechanism, but the total power does not appear to be populated in every case. Verify and fix the GPU power aggregation so that total_power_watts is reported correctly.
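
The root cause is not identified in this issue, so the following is only a sketch of the aggregation behavior that seems desirable, not a description of how gpu_power_cache works. The function total_gpu_power_watts and its input shape (one Option<f64> per GPU) are assumptions. The point it illustrates is distinguishing "no GPU reported power" from "GPUs reported 0 W".

```rust
/// Sums per-GPU power readings in watts. Returns None when no GPU
/// reported a usable value, so the caller can show "N/A" instead of a
/// misleading 0 W. Hypothetical helper, not the existing cache logic.
fn total_gpu_power_watts(per_gpu_watts: &[Option<f64>]) -> Option<f64> {
    let mut total = 0.0;
    let mut seen = false;
    for watts in per_gpu_watts.iter().flatten() {
        // Ignore NaN, infinite, and negative readings.
        if watts.is_finite() && *watts >= 0.0 {
            total += *watts;
            seen = true;
        }
    }
    if seen {
        Some(total)
    } else {
        None
    }
}
```

With the single ~4.5 W reading from the table above, this would yield Some(4.5) for total_power_watts.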

Acceptance Criteria

  • DMI product name, vendor, and board info shown in chassis detail
  • Thermal zone temperatures collected and shown (inlet/outlet)
  • GPU power correctly reported in chassis total_power_watts
  • No sudo required for any of the new data collection
  • Graceful fallback when DMI or thermal data is unavailable
  • Works on DGX Spark and generic Linux systems
  • No regression on macOS chassis reader

Technical Reference

Key files:

  • src/device/readers/chassis/generic.rs — Main file to modify
  • src/device/readers/chassis/mod.rs — Factory function
  • src/device/types.rs — ChassisInfo struct (already has inlet_temperature, outlet_temperature, detail fields)
  • src/ui/renderers/chassis_renderer.rs — TUI renderer (already handles thermal display)
  • src/api/metrics/chassis.rs — API exporter (already exports inlet/outlet temps)
