Summary
The public library API (AllSmi, src/client.rs) returns owned, point-in-time snapshots (Vec<GpuInfo>, Vec<CpuInfo>, Vec<MemoryInfo>). Refreshing today means re-calling the getter, which re-enumerates every device. This issue requests two ergonomic improvements for library consumers:
- Stable correlation identifiers so a re-fetched batch can be matched, entry-by-entry, to a previously held batch.
- Targeted refresh of a single device of interest without re-enumerating everything.
Background
AllSmi was introduced for embedding in external Rust projects (#106) and has since grown (e.g. get_storage_info, #115). Its getters — get_gpu_info (src/client.rs:259), get_cpu_info (src/client.rs:383), get_memory_info (src/client.rs:412) — take &self and rebuild the result on each call by iterating every reader (src/client.rs:260-264). So the underlying source is not frozen: re-calling returns fresh values (bounded by the platform sample interval on Apple Silicon). What goes stale is the owned copy the caller is holding, and after a refresh the caller is left to (a) re-enumerate and (b) work out which new entry maps to which old one.
State of per-entry identifiers today:
- GpuInfo.uuid exists (src/device/types.rs:36) → GPUs/NPUs are already correlatable. The GPU gap is purely efficiency (re-enumerating 8+ devices to refresh one).
- StorageInfo.index exists (src/storage/info.rs:24) → storage is correlatable.
- CpuInfo (src/device/types.rs:290) has no per-entry unique id, only host-level host_id / hostname / instance, which are identical across every CPU entry from one host. This is exactly the gap the reporter hit. (Per-socket detail is nested in per_socket_info[].socket_id, but there is no key for the CpuInfo itself.)
- MemoryInfo (src/device/types.rs:365) likewise has no id, though it is effectively a host singleton.
Cost that motivates targeted refresh: the NVIDIA reader (src/device/readers/nvidia.rs:370) queries NVML for every device on each get_gpu_info() and, on NVML failure, shells out to nvidia-smi (src/device/readers/nvidia.rs:383). Refreshing a single device of interest should not pay for all of them.
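For context, this is roughly what a consumer has to do today to match a re-fetched GPU batch against a held one. It is a minimal sketch: the GpuInfo below is a stand-in with illustrative fields, not the library's actual struct, and correlate_by_uuid is a hypothetical helper that does not exist in the crate.

```rust
use std::collections::HashMap;

/// Stand-in for the library's `GpuInfo` (assumption: only the fields
/// this sketch needs; the real struct carries many more).
#[derive(Debug, Clone, PartialEq)]
pub struct GpuInfo {
    pub uuid: String,
    pub utilization: f64,
}

/// Hypothetical helper: pair every entry of `old` with the entry of
/// `new` that has the same UUID. `None` means the device is absent
/// from the re-fetched batch.
pub fn correlate_by_uuid<'a>(
    old: &'a [GpuInfo],
    new: &'a [GpuInfo],
) -> Vec<(&'a GpuInfo, Option<&'a GpuInfo>)> {
    // Index the new batch once so each lookup is O(1).
    let by_uuid: HashMap<&str, &GpuInfo> =
        new.iter().map(|g| (g.uuid.as_str(), g)).collect();

    old.iter()
        .map(|o| (o, by_uuid.get(o.uuid.as_str()).copied()))
        .collect()
}
```

The same approach cannot be written for CpuInfo today, because there is no per-entry field to use as the map key.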
Proposed Solution
1. Targeted refresh on AllSmi (not on the info structs)
impl AllSmi {
    /// Fetch fresh info for one GPU/NPU by UUID. `None` if it is no longer present.
    pub fn get_gpu_by_uuid(&self, uuid: &str) -> Option<GpuInfo>;

    /// Re-fetch `info` in place by its UUID. Returns `true` if the device was
    /// found and the struct overwritten, `false` if it has disappeared.
    pub fn refresh_gpu(&self, info: &mut GpuInfo) -> bool;
}
Rationale for putting this on AllSmi rather than the reporter's GpuInfo::update(&mut self) option: GpuInfo / CpuInfo / MemoryInfo are plain Serialize + Deserialize + Clone DTOs with no handle to hardware. The same types are also produced by the remote-monitoring network parser and by snapshot deserialization. A self-updating struct would have to embed a #[serde(skip)] reader handle that is None for any deserialized value, giving update() a confusing partial contract (and complicating Clone / Send / Sync / serialization). The readers already live inside AllSmi (src/client.rs:134-138), so that is the correct owner of refresh logic.
refresh_gpu returns bool to match the current infallible reader behavior (get_gpu_info swallows read errors and returns zeros). A Result<bool> variant — matching the reporter's fallible suggestion — becomes worthwhile if/when readers start propagating read errors.
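One way the two methods could fit together is sketched below. The AllSmi, GpuReader and GpuInfo types here are simplified stand-ins (the gpu_readers field name and the FixedReader test reader are assumptions made for illustration), so this shows the intended contract rather than the crate's actual internals.

```rust
/// Stand-in for the library's `GpuInfo` (illustrative fields only).
#[derive(Debug, Clone, PartialEq)]
pub struct GpuInfo {
    pub uuid: String,
    pub utilization: f64,
}

/// Stand-in for the reader trait; the real trait has more methods.
pub trait GpuReader: Send + Sync {
    fn get_gpu_info(&self) -> Vec<GpuInfo>;
}

/// Stand-in for `AllSmi`. It owns the readers, which is why the
/// refresh logic lives here and not on the DTOs.
pub struct AllSmi {
    gpu_readers: Vec<Box<dyn GpuReader>>, // field name is an assumption
}

impl AllSmi {
    pub fn new(gpu_readers: Vec<Box<dyn GpuReader>>) -> Self {
        Self { gpu_readers }
    }

    /// Fresh info for one device, or `None` if no reader reports it.
    pub fn get_gpu_by_uuid(&self, uuid: &str) -> Option<GpuInfo> {
        self.gpu_readers
            .iter()
            .flat_map(|r| r.get_gpu_info())
            .find(|g| g.uuid == uuid)
    }

    /// Overwrite `info` with fresh data. On `false` the caller's
    /// struct is left untouched.
    pub fn refresh_gpu(&self, info: &mut GpuInfo) -> bool {
        match self.get_gpu_by_uuid(&info.uuid) {
            Some(fresh) => {
                *info = fresh;
                true
            }
            None => false,
        }
    }
}

/// Hypothetical reader returning a fixed device list, for demonstration.
pub struct FixedReader(pub Vec<GpuInfo>);

impl GpuReader for FixedReader {
    fn get_gpu_info(&self) -> Vec<GpuInfo> {
        self.0.clone()
    }
}
```

In this sketch a failed refresh leaves the held value as it was, so a caller can keep displaying the last known reading for a device that has disappeared.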
2. Optional, backward-compatible reader hook for efficiency
pub trait GpuReader: Send + Sync {
    fn get_gpu_info(&self) -> Vec<GpuInfo>;
    // …existing methods…

    /// Fetch a single device by UUID. Default filters the full enumeration;
    /// readers that can address one device directly should override.
    fn get_gpu_info_by_uuid(&self, uuid: &str) -> Option<GpuInfo> {
        self.get_gpu_info().into_iter().find(|g| g.uuid == uuid)
    }
}
The default impl keeps every existing reader compiling unchanged. The NVIDIA reader can override it: NVML supports opening a handle by UUID (nvmlDeviceGetHandleByUUID, surfaced as Nvml::device_by_uuid in nvml-wrapper 0.12.1 — confirmed present, the reader currently only uses device_by_index), letting it skip the full-device loop in get_gpu_info_nvml.
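The difference between the default filter and a direct override can be illustrated with mock readers that count per-device queries. Everything below except the trait shape is hypothetical: Backend, DefaultReader and DirectReader are invented for this sketch, and no NVML calls are made.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for the library's `GpuInfo` (illustrative field only).
#[derive(Debug, Clone, PartialEq)]
pub struct GpuInfo {
    pub uuid: String,
}

pub trait GpuReader: Send + Sync {
    fn get_gpu_info(&self) -> Vec<GpuInfo>;

    /// Default: enumerate everything, then filter.
    fn get_gpu_info_by_uuid(&self, uuid: &str) -> Option<GpuInfo> {
        self.get_gpu_info().into_iter().find(|g| g.uuid == uuid)
    }
}

/// Hypothetical device backend that counts how many devices it was
/// asked to query, standing in for per-device driver calls.
pub struct Backend {
    uuids: Vec<String>,
    query_count: AtomicUsize,
}

impl Backend {
    pub fn new(uuids: &[&str]) -> Self {
        Self {
            uuids: uuids.iter().map(|u| (*u).to_string()).collect(),
            query_count: AtomicUsize::new(0),
        }
    }

    fn query_device(&self, uuid: &str) -> GpuInfo {
        self.query_count.fetch_add(1, Ordering::Relaxed);
        GpuInfo { uuid: uuid.to_string() }
    }

    fn query_all(&self) -> Vec<GpuInfo> {
        self.uuids.iter().map(|u| self.query_device(u)).collect()
    }

    pub fn queries(&self) -> usize {
        self.query_count.load(Ordering::Relaxed)
    }
}

/// Relies on the trait's default `get_gpu_info_by_uuid`.
pub struct DefaultReader(pub Backend);

impl GpuReader for DefaultReader {
    fn get_gpu_info(&self) -> Vec<GpuInfo> {
        self.0.query_all()
    }
}

/// Overrides the hook to touch only the requested device.
pub struct DirectReader(pub Backend);

impl GpuReader for DirectReader {
    fn get_gpu_info(&self) -> Vec<GpuInfo> {
        self.0.query_all()
    }

    fn get_gpu_info_by_uuid(&self, uuid: &str) -> Option<GpuInfo> {
        self.0
            .uuids
            .iter()
            .find(|u| u.as_str() == uuid)
            .map(|u| self.0.query_device(u))
    }
}
```

With N devices behind the backend, a single lookup through DefaultReader costs N device queries, while DirectReader costs one (or none when the UUID is unknown).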
3. Stable correlation identifier for CpuInfo / MemoryInfo
Add a 0-based index: u32 field (mirroring StorageInfo.index), assigned by AllSmi::get_cpu_info / get_memory_info while concatenating reader outputs (enumerate() over the flattened result), marked #[serde(default)] for wire/snapshot back-compat (same convention already used for temperature_threshold_* and bandwidth_mb_s in types.rs). CPU/memory topology is static, so the index is a stable key across refreshes — unlike GPUs, which can hot-plug/MIG-reconfigure and therefore key on uuid.
Optional companion for symmetry: get_cpu_by_index(u32) -> Option<CpuInfo>. The efficiency win is marginal for CPU (typically a single aggregate entry per host), so the identifier itself is the real deliverable here.
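The index assignment described above could look roughly like this. The CpuInfo here is a stand-in with illustrative fields, and assign_indices and the free-function get_cpu_by_index are hypothetical helpers; in the proposal the assignment would happen inside AllSmi::get_cpu_info itself.

```rust
/// Stand-in for the library's `CpuInfo` (illustrative fields only;
/// `index` is the proposed new field).
#[derive(Debug, Clone, PartialEq, Default)]
pub struct CpuInfo {
    pub index: u32,
    pub cpu_model: String,
    pub utilization: f64,
}

/// Hypothetical helper: flatten the per-reader outputs and stamp each
/// entry with its 0-based position in the flattened result.
pub fn assign_indices(per_reader: Vec<Vec<CpuInfo>>) -> Vec<CpuInfo> {
    per_reader
        .into_iter()
        .flatten()
        .enumerate()
        .map(|(i, mut cpu)| {
            cpu.index = i as u32;
            cpu
        })
        .collect()
}

/// Hypothetical lookup of one entry in a batch by its index.
pub fn get_cpu_by_index(batch: &[CpuInfo], index: u32) -> Option<&CpuInfo> {
    batch.iter().find(|c| c.index == index)
}
```

The index is only a stable key under the assumption that readers are iterated in the same order on every call and that each reader returns its entries in a deterministic order.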
Implementation Notes
- Files: src/client.rs (new AllSmi methods + index assignment), src/device/traits.rs (default trait method), src/device/readers/nvidia.rs (override), src/device/types.rs (index fields).
- NVML static cache: get_gpu_info_nvml caches static per-device data keyed by index (src/device/readers/nvidia.rs:269, :305). A UUID-keyed override needs a uuid→index resolution or a uuid-keyed static cache; the simplest first cut is to ship the default filter for NVIDIA and add the device_by_uuid fast path as a follow-up.
- Exporter / parser scope: the index field is only required on the library/JSON path. The Prometheus exporter and remote network parser do not have to carry a CPU/memory index label; #[serde(default)] keeps older snapshots deserializing cleanly, so this change can stay contained to the library layer.
- Thread-safety: all new methods stay &self; AllSmi: Send + Sync is preserved (src/client.rs:549-550).
- Back-compat / breaking surface: the new methods are additive. Adding #[serde(default)] fields follows existing repo precedent and is treated as non-breaking; the only caveat is downstream code that constructs CpuInfo / MemoryInfo via exhaustive struct literals (these structs are normally returned by the library, not built by consumers).
- Docs: extend examples/library_usage.rs with a correlate-and-refresh loop, and add a short "Refreshing data" section to the AllSmi rustdoc in src/client.rs / src/lib.rs.
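For the NVML static cache note, one possible shape of the uuid→index resolution is sketched below. This is not the reader's actual cache: StaticDeviceInfo, StaticCache and their fields are hypothetical names, and the layout of the existing index-keyed cache is not shown in this issue.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical static per-device data (name only, for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct StaticDeviceInfo {
    pub name: String,
}

#[derive(Default)]
struct Inner {
    /// Index-keyed entries, as the existing cache is described.
    by_index: HashMap<u32, StaticDeviceInfo>,
    /// Extra map so a UUID-keyed lookup can reuse the same entries.
    uuid_to_index: HashMap<String, u32>,
}

/// Both maps sit behind one lock so they cannot drift apart.
#[derive(Default)]
pub struct StaticCache {
    inner: Mutex<Inner>,
}

impl StaticCache {
    /// Record static data for a device under both its index and UUID.
    pub fn insert(&self, index: u32, uuid: &str, info: StaticDeviceInfo) {
        // `unwrap` panics on a poisoned lock; acceptable for a sketch.
        let mut inner = self.inner.lock().unwrap();
        inner.by_index.insert(index, info);
        inner.uuid_to_index.insert(uuid.to_string(), index);
    }

    pub fn get_by_index(&self, index: u32) -> Option<StaticDeviceInfo> {
        let inner = self.inner.lock().unwrap();
        inner.by_index.get(&index).cloned()
    }

    /// Resolve the UUID to an index first, then reuse the index-keyed entry.
    pub fn get_by_uuid(&self, uuid: &str) -> Option<StaticDeviceInfo> {
        let inner = self.inner.lock().unwrap();
        let index = *inner.uuid_to_index.get(uuid)?;
        inner.by_index.get(&index).cloned()
    }
}
```

Keeping the index-keyed map as the single store means the existing full-enumeration path would not need to change; the UUID map is only an extra lookup layer for the fast path.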
Acceptance Criteria
- AllSmi::get_gpu_by_uuid(&str) -> Option<GpuInfo> returns fresh data for a present UUID and None for an absent one.
- AllSmi::refresh_gpu(&mut GpuInfo) -> bool overwrites the struct in place and reports found/absent.
- GpuReader::get_gpu_info_by_uuid exists with a default implementation; all existing readers compile without change.
- CpuInfo and MemoryInfo carry a stable, serializable index populated by AllSmi::get_cpu_info / get_memory_info, with #[serde(default)] for back-compat.
- CpuInfo index is stable across two consecutive get_cpu_info() calls.
- examples/library_usage.rs demonstrates correlating and refreshing a previously fetched device; cargo run --example library_usage succeeds.
- cargo fmt --check, cargo clippy, and cargo test pass.
Original Suggestion
Title: Allow targeted updates
Right now the returned infos are static and so grow stale the longer it's been since e.g. AllSmi::get_gpu_info was called.
This can be addressed by repeated calls to AllSmi::get_gpu_info, AllSmi::get_cpu_info, etc.
However, at least in the case of CpuInfo, there is no unique identifier by which to clearly determine which in the new batch of CpuInfos corresponds to which in the prior batch.
It would be sufficient if such a unique identifier were provided, allowing targeted updates to be done by re-generating all infos, and selecting the one desired.
It would be more efficient, though, in cases where only a small number of devices are of interest, to allow targeted updates.
One approach would be to enable the info structs to update themselves. This might look like:
impl GpuInfo {
    pub fn update(&mut self) { ... }
}
Or if it needs to be fallible (such as if the device disappears, permissions change, etc.):
impl GpuInfo {
    pub fn update(self) -> Result<Self, ...> { ... }
}
Or it could live in AllSmi:
impl AllSmi {
    pub fn update(gpu_info: &mut GpuInfo) { ... }
}
etc.
This would allow the staleness of returned information to be addressed as desired by the user, without having to re-enumerate all devices such as when calling AllSmi::get_gpu_info. Thanks for considering.