Problem / Background
The all-smi crate is currently structured primarily as a CLI application, but it already has a library target defined in Cargo.toml (name = "all_smi"). The core building blocks exist (traits, types, and factory functions), but using all-smi as a library in an external Rust project requires understanding the internal module structure and manually managing platform-specific components.
Current State:
- Cargo.toml defines [lib] with name = "all_smi"
- src/lib.rs re-exports modules primarily for testing purposes
- Core types exist: GpuInfo, CpuInfo, MemoryInfo, ProcessInfo, ChassisInfo
- Core traits exist: GpuReader, CpuReader, MemoryReader, ChassisReader
- Factory functions exist: get_gpu_readers(), get_cpu_readers(), get_memory_readers()
Pain Points for Library Users:
- No high-level, ergonomic API - users must understand internal trait-based architecture
- Platform-specific managers (e.g., NativeMetricsManager for macOS, HlsmiManager for Gaudi) require manual lifecycle management
- No unified error type hierarchy for library usage
- Missing prelude module for convenient imports
- No documentation examples for library usage
Proposed Solution
Create a user-friendly library API that abstracts away internal complexity while maintaining full access to lower-level components when needed.
Phase 1: High-Level API (src/api.rs or src/client.rs)
Create an AllSmi struct with simple, ergonomic methods:
use all_smi::AllSmi;

fn main() -> Result<(), all_smi::Error> {
    // Initialize with auto-detection
    let smi = AllSmi::new()?;

    // Get all GPU/NPU information
    let gpus = smi.get_gpu_info();
    for gpu in &gpus {
        println!("{}: {}% utilization, {:.1}W",
            gpu.name, gpu.utilization, gpu.power_consumption);
    }

    // Get CPU information
    let cpus = smi.get_cpu_info();
    for cpu in &cpus {
        println!("{}: {}% utilization", cpu.cpu_model, cpu.utilization);
    }

    // Get memory information
    let memory = smi.get_memory_info();
    for mem in &memory {
        println!("Memory: {:.1}% used", mem.utilization);
    }

    // Get process information (GPU processes)
    let processes = smi.get_process_info();
    for proc in &processes {
        println!("PID {}: {} using {} MB GPU memory",
            proc.pid, proc.process_name, proc.used_memory / 1024 / 1024);
    }

    // Get chassis/node-level information
    if let Some(chassis) = smi.get_chassis_info() {
        if let Some(power) = chassis.total_power_watts {
            println!("Total system power: {:.1}W", power);
        }
    }

    Ok(())
}
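One way the facade could sit on top of the existing reader traits is sketched below. This is only an illustration under stated assumptions: `GpuInfo` and `GpuReader` here are trimmed stand-ins rather than the crate's real definitions, and `with_readers` and `FixedReader` are hypothetical names introduced so the sketch can run without hardware.

```rust
// Hypothetical stand-in: the real GpuInfo carries many more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct GpuInfo {
    pub name: String,
    pub utilization: f64,
    pub power_consumption: f64,
}

/// Stand-in for the crate's reader trait; the real signature may differ.
pub trait GpuReader: Send + Sync {
    fn get_gpu_info(&self) -> Vec<GpuInfo>;
}

/// Facade that owns every detected reader and hides them from callers.
pub struct AllSmi {
    gpu_readers: Vec<Box<dyn GpuReader>>,
}

impl AllSmi {
    /// Assumed constructor for this sketch. The proposed `new()` would
    /// auto-detect readers instead of receiving them as an argument.
    pub fn with_readers(gpu_readers: Vec<Box<dyn GpuReader>>) -> Self {
        Self { gpu_readers }
    }

    /// Collects the results of all readers into one flat list.
    /// With no readers (no supported hardware) the list is simply empty.
    pub fn get_gpu_info(&self) -> Vec<GpuInfo> {
        self.gpu_readers
            .iter()
            .flat_map(|reader| reader.get_gpu_info())
            .collect()
    }
}

/// Fake reader that returns a fixed list, used only to exercise the facade.
pub struct FixedReader(pub Vec<GpuInfo>);

impl GpuReader for FixedReader {
    fn get_gpu_info(&self) -> Vec<GpuInfo> {
        self.0.clone()
    }
}
```

Keeping the readers behind `Box<dyn GpuReader>` means callers never name a vendor-specific type, while advanced users could still reach the traits directly.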
Phase 2: Prelude Module (src/prelude.rs)
Create convenient re-exports for use all_smi::prelude::*:
// src/prelude.rs
pub use crate::AllSmi;
pub use crate::Error;

// Core types
pub use crate::device::{
    GpuInfo, CpuInfo, MemoryInfo, ProcessInfo, ChassisInfo,
    CpuPlatformType, CoreType, CoreUtilization,
    AppleSiliconCpuInfo, CpuSocketInfo,
    FanInfo, PsuInfo, PsuStatus,
};

// Traits (for advanced usage)
pub use crate::device::{GpuReader, CpuReader, MemoryReader, ChassisReader};

// Factory functions (for advanced usage)
pub use crate::device::{get_gpu_readers, get_cpu_readers, get_memory_readers, create_chassis_reader};
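The re-export mechanism itself can be tried in a single file. The sketch below uses inline modules and two trimmed, hypothetical structs in place of the real `device` types; `describe` is an illustrative helper, not part of the proposal.

```rust
// Stand-in for the crate's `device` module (hypothetical, trimmed fields).
pub mod device {
    #[derive(Debug, Default, Clone, PartialEq)]
    pub struct GpuInfo {
        pub name: String,
    }

    #[derive(Debug, Default, Clone, PartialEq)]
    pub struct CpuInfo {
        pub cpu_model: String,
    }
}

// The prelude only re-exports; it defines nothing of its own.
pub mod prelude {
    pub use super::device::{CpuInfo, GpuInfo};
}

// A downstream crate would write `use all_smi::prelude::*;` here.
use self::prelude::*;

/// Uses both types without naming the `device` module.
pub fn describe(gpu: &GpuInfo, cpu: &CpuInfo) -> String {
    format!("{} + {}", gpu.name, cpu.cpu_model)
}
```

Because the prelude re-exports rather than redefines, `prelude::GpuInfo` and `device::GpuInfo` are the same type, so code written against either path interoperates.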
Phase 3: Unified Error Type (src/error.rs)
Create a proper error hierarchy:
use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
    #[error("Platform initialization failed: {0}")]
    PlatformInit(String),

    #[error("No supported devices found")]
    NoDevicesFound,

    #[error("Device access error: {0}")]
    DeviceAccess(String),

    #[error("Permission denied: {0}")]
    PermissionDenied(String),

    #[error("Feature not supported on this platform: {0}")]
    NotSupported(String),

    #[error(transparent)]
    Io(#[from] std::io::Error),
}

pub type Result<T> = std::result::Result<T, Error>;
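For readers unfamiliar with thiserror, the following standard-library-only sketch approximates what the derive provides for three of the variants: a `Display` impl built from the message strings, a forwarding impl for the transparent variant, and a `From<std::io::Error>` conversion that lets `?` work. It is hand-written for illustration and is not the macro's actual expansion; `read_metric` is a hypothetical helper.

```rust
use std::error::Error as StdError;
use std::fmt;
use std::io;

#[derive(Debug)]
pub enum Error {
    PlatformInit(String),
    NoDevicesFound,
    Io(io::Error),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::PlatformInit(msg) => write!(f, "Platform initialization failed: {}", msg),
            Error::NoDevicesFound => write!(f, "No supported devices found"),
            // Mirrors `#[error(transparent)]`: defer to the inner error.
            Error::Io(err) => fmt::Display::fmt(err, f),
        }
    }
}

impl StdError for Error {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            // A transparent variant forwards `source()` as well.
            Error::Io(err) => err.source(),
            _ => None,
        }
    }
}

// Mirrors `#[from]`: enables `?` on functions returning io::Error.
impl From<io::Error> for Error {
    fn from(err: io::Error) -> Self {
        Error::Io(err)
    }
}

/// Hypothetical helper: reads a metric file, converting I/O failures via `?`.
pub fn read_metric(path: &str) -> Result<String, Error> {
    let text = std::fs::read_to_string(path)?;
    Ok(text.trim().to_string())
}
```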
Phase 4: Lifecycle Management
The AllSmi struct should:
- Auto-initialize platform-specific managers (NativeMetricsManager, HlsmiManager)
- Implement Drop for proper cleanup
- Provide a refresh() method for updating cached data
impl AllSmi {
    /// Create a client, initializing platform-specific managers as needed
    pub fn new() -> Result<Self> { todo!() }

    /// Refresh all device information
    pub fn refresh(&mut self) -> Result<()> { todo!() }

    /// Get supported device types on this platform
    pub fn supported_devices(&self) -> Vec<DeviceType> { todo!() }
}

impl Drop for AllSmi {
    fn drop(&mut self) {
        // Clean up platform-specific resources
    }
}
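The ownership side of this can be illustrated without any real hardware. In the sketch below, `PlatformManager` is a hypothetical stand-in for something like a helper process or OS handle, and the shared `running` flag exists only so a caller can observe that cleanup happened.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a platform manager. `running` is shared
/// with the caller so that shutdown is observable from outside.
pub struct PlatformManager {
    running: Arc<AtomicBool>,
}

impl PlatformManager {
    pub fn start(running: Arc<AtomicBool>) -> Self {
        running.store(true, Ordering::SeqCst);
        Self { running }
    }

    pub fn shutdown(&mut self) {
        self.running.store(false, Ordering::SeqCst);
    }
}

pub struct AllSmi {
    manager: Option<PlatformManager>,
    refresh_count: u64,
}

impl AllSmi {
    /// Starts the manager as part of construction.
    pub fn new(running: Arc<AtomicBool>) -> Self {
        Self {
            manager: Some(PlatformManager::start(running)),
            refresh_count: 0,
        }
    }

    /// Placeholder refresh: only counts calls in this sketch.
    pub fn refresh(&mut self) -> u64 {
        self.refresh_count += 1;
        self.refresh_count
    }
}

impl Drop for AllSmi {
    fn drop(&mut self) {
        // `take()` leaves `None` behind, so cleanup runs at most once.
        if let Some(mut manager) = self.manager.take() {
            manager.shutdown();
        }
    }
}
```

Holding the manager in an `Option` is one possible design; it also leaves room for platforms where no manager is needed at all.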
Phase 5: Builder Pattern (Optional)
For advanced configuration:
use std::time::Duration;

let smi = AllSmi::builder()
    .with_gpu_monitoring(true)
    .with_cpu_monitoring(true)
    .with_memory_monitoring(true)
    .with_process_monitoring(false) // Disable process monitoring
    .with_refresh_interval(Duration::from_secs(3))
    .build()?;
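A consuming builder with these method names might look roughly like the sketch below. Several details are assumptions made for illustration: the `Config` struct, the defaults (everything enabled, one-second interval), the zero-interval check, and the fact that `build()` returns the configuration rather than an `AllSmi`.

```rust
use std::time::Duration;

/// Hypothetical configuration produced by the builder.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub gpu: bool,
    pub cpu: bool,
    pub memory: bool,
    pub process: bool,
    pub refresh_interval: Duration,
}

#[derive(Debug, Clone)]
pub struct AllSmiBuilder {
    config: Config,
}

impl AllSmiBuilder {
    /// Assumed defaults: monitor everything, refresh once per second.
    pub fn new() -> Self {
        Self {
            config: Config {
                gpu: true,
                cpu: true,
                memory: true,
                process: true,
                refresh_interval: Duration::from_secs(1),
            },
        }
    }

    pub fn with_gpu_monitoring(mut self, enabled: bool) -> Self {
        self.config.gpu = enabled;
        self
    }

    pub fn with_process_monitoring(mut self, enabled: bool) -> Self {
        self.config.process = enabled;
        self
    }

    pub fn with_refresh_interval(mut self, interval: Duration) -> Self {
        self.config.refresh_interval = interval;
        self
    }

    /// Validates the settings; a zero interval is rejected in this sketch.
    pub fn build(self) -> Result<Config, String> {
        if self.config.refresh_interval == Duration::from_secs(0) {
            return Err("refresh interval must be non-zero".to_string());
        }
        Ok(self.config)
    }
}
```

Taking `self` by value in each setter keeps the chained call style from the example above and makes `build()` the only way to obtain a finished value.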
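Acceptance Criteria
- AllSmi struct with high-level API in src/client.rs or similar
- prelude module with common re-exports
- Error type using thiserror
- Drop implementation for proper cleanup
- examples/library_usage.rs demonstrating library usage
- src/lib.rs updated to export new public API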
Technical Considerations
Platform-Specific Initialization
The library needs to handle platform-specific managers:
- macOS (Apple Silicon): Initialize NativeMetricsManager for IOReport/SMC access
- Linux (Gaudi): Initialize HlsmiManager for hl-smi process management
- Cross-platform: Handle cases where no GPU/NPU is detected gracefully
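The selection logic in the list above can be expressed as a small pure function, which keeps it testable on any host. In this sketch `ManagerKind`, `managers_for`, and the `has_gaudi` flag are hypothetical; how Gaudi hardware would actually be detected is not covered here.

```rust
/// Hypothetical labels for the platform managers named in this proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ManagerKind {
    /// macOS on Apple Silicon (IOReport/SMC access).
    NativeMetrics,
    /// Linux hosts with Gaudi accelerators (hl-smi process management).
    Hlsmi,
}

/// Decides which managers to start for a given host description.
/// An empty result is valid: it means no manager is required.
pub fn managers_for(os: &str, arch: &str, has_gaudi: bool) -> Vec<ManagerKind> {
    let mut managers = Vec::new();
    if os == "macos" && arch == "aarch64" {
        managers.push(ManagerKind::NativeMetrics);
    }
    if os == "linux" && has_gaudi {
        managers.push(ManagerKind::Hlsmi);
    }
    managers
}

/// Applies the decision to the machine the program is running on,
/// using the OS and architecture strings provided by the standard library.
pub fn managers_for_this_host(has_gaudi: bool) -> Vec<ManagerKind> {
    managers_for(std::env::consts::OS, std::env::consts::ARCH, has_gaudi)
}
```

Separating the decision from the detection makes the "no GPU/NPU detected" case an ordinary empty list rather than an error path.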
Thread Safety
- AllSmi should be Send + Sync for use in async contexts
- Consider using Arc<Mutex<...>> for shared state if needed
- Existing readers already implement Send + Sync
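Both points can be checked cheaply: a generic function with a `Send + Sync` bound turns the requirement into a compile-time check, and `Arc<Mutex<...>>` lets several threads call a `&mut self` method such as `refresh()`. The `AllSmi` below is a minimal stand-in with a counter instead of real device state, and `refresh_from_threads` is a hypothetical helper.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Minimal stand-in: counts refreshes instead of reading hardware.
#[derive(Default)]
pub struct AllSmi {
    refreshes: u64,
}

impl AllSmi {
    pub fn refresh(&mut self) {
        self.refreshes += 1;
    }

    pub fn refreshes(&self) -> u64 {
        self.refreshes
    }
}

/// Compiles only when `T` is `Send + Sync`; does nothing at run time.
pub fn assert_send_sync<T: Send + Sync>() {}

/// Spawns `workers` threads that each refresh the shared client once.
pub fn refresh_from_threads(workers: usize) -> u64 {
    assert_send_sync::<AllSmi>();

    let shared = Arc::new(Mutex::new(AllSmi::default()));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // The lock serializes access to the `&mut self` method.
                shared.lock().expect("mutex poisoned").refresh();
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker panicked");
    }

    let total = shared.lock().expect("mutex poisoned").refreshes();
    total
}
```

If a field that is not thread-safe were added to the struct, the `assert_send_sync` call would stop compiling, which surfaces the regression early.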
Error Handling
- Return Result<T> from all fallible operations
- Provide informative error messages with context
- Don't panic on missing hardware - return empty collections or errors
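A sketch of how these three rules might combine when reading a single sensor file: a missing file is treated as absent hardware and yields `Ok(None)`, while other I/O failures are mapped to a variant that carries the path as context. The two-variant `Error`, `with_context`, and `read_sensor` are hypothetical and trimmed for illustration.

```rust
use std::fmt;
use std::io;

/// Trimmed, hypothetical error type with two of the proposed variants.
#[derive(Debug, PartialEq)]
pub enum Error {
    PermissionDenied(String),
    DeviceAccess(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::PermissionDenied(msg) => write!(f, "Permission denied: {}", msg),
            Error::DeviceAccess(msg) => write!(f, "Device access error: {}", msg),
        }
    }
}

/// Attaches the failing path or operation to a low-level I/O error.
pub fn with_context(err: io::Error, context: &str) -> Error {
    let message = format!("{}: {}", context, err);
    match err.kind() {
        io::ErrorKind::PermissionDenied => Error::PermissionDenied(message),
        _ => Error::DeviceAccess(message),
    }
}

/// Reads a sensor file. A missing file means the hardware is absent,
/// which is reported as `Ok(None)` rather than as an error or a panic.
pub fn read_sensor(path: &str) -> Result<Option<String>, Error> {
    match std::fs::read_to_string(path) {
        Ok(text) => Ok(Some(text.trim().to_string())),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(with_context(err, path)),
    }
}
```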
Documentation
- All public types and functions should have doc comments
- Include # Examples sections in documentation
- Provide both simple and advanced usage examples
Additional Context
This enhancement will make all-smi more useful for:
- Integration into larger monitoring systems
- Building custom CLI tools on top of all-smi
- Embedding hardware monitoring in Rust applications
- Backend services that need programmatic access to GPU/NPU metrics
Related projects that could benefit:
- Backend.AI agent implementations
- Custom monitoring dashboards
- Resource schedulers