Summary
Following the successful implementation of static value caching in PR #69, we've identified significant inconsistencies in how different AI accelerator reader implementations handle caching and static device information. This refactoring will standardize caching patterns across all accelerator readers (NVIDIA GPU, AMD GPU, Apple Silicon, NVIDIA Jetson, Tenstorrent NPU, Rebellions NPU, Furiosa NPU) to improve maintainability, consistency, and code reuse.
Current State Analysis
Caching Pattern Inconsistencies
Each accelerator implementation uses different caching approaches:
-
NVIDIA GPU (nvidia.rs):
OnceLock<HashMap<u32, DeviceStaticInfo>> for device info
- Separate
OnceLock<Option<String>> for driver_version and cuda_version
- Bulk initialization via
get_device_static_info()
- Key:
u32 (device index)
-
AMD GPU (amd.rs):
- Per-device
OnceLock<DeviceStaticInfo> (no HashMap)
- Separate
OnceLock<Option<String>> for rocm_version
- Per-device initialization
- No global cache structure
-
Apple Silicon (apple_silicon.rs):
- Uses
OnceCell + global Lazy<Mutex<Option<GpuInfoCache>>>
- Stores fields directly in reader struct
ensure_initialized() with global cache
- Different synchronization primitives
-
NVIDIA Jetson (nvidia_jetson.rs):
- Similar to NVIDIA GPU but with Tegra-specific handling
OnceLock<HashMap<u32, DeviceStaticInfo>>
- Key:
u32 (device index)
-
Tenstorrent NPU (tenstorrent.rs):
Lazy<Mutex<Option<Vec<CachedChipInfo>>>>
- Uses Vec instead of HashMap
ensure_chips_initialized() with global Lazy
- Different synchronization approach
-
Rebellions NPU (rebellions.rs):
OnceLock<String> for kmd_version
OnceLock<HashMap<String, DeviceStaticInfo>> for device info
ensure_static_cache_initialized(response) pattern
- Key:
String (UUID)
-
Furiosa NPU (furiosa.rs):
OnceLock<HashMap<String, DeviceStaticInfo>> for both CLI and RS methods
- Separate caches for different backend methods
- Key:
String (device index for CLI, UUID for RS)
DeviceStaticInfo Structure Variations
- NVIDIA/Furiosa/AMD: Uses
detail: HashMap<String, String> for extensible static fields
- Rebellions/Tenstorrent: Uses individual struct fields (less flexible)
- Apple Silicon: Stores fields directly in reader struct (no separate StaticInfo)
Version/Library Info Caching
Different approaches to caching driver/library versions:
- NVIDIA:
driver_version and cuda_version as separate OnceLock
- AMD:
rocm_version as OnceLock<Option<String>>
- Rebellions:
kmd_version as OnceLock<String>
- Apple:
driver_version as OnceCell<Option<String>>
- Tenstorrent: Embedded in chip info
- Furiosa: Part of DeviceStaticInfo
Proposed Refactoring
1. Unified Caching Pattern
Goal: Standardize on a single caching approach across all readers
Proposed Standard:
// Common pattern for all accelerators
static DEVICE_STATIC_CACHE: OnceLock<HashMap<KEY_TYPE, DeviceStaticInfo>> = OnceLock::new();
static LIBRARY_VERSION: OnceLock<Option<String>> = OnceLock::new();
Changes:
- ✅ Keep: NVIDIA, Rebellions, Furiosa patterns (already using OnceLock + HashMap)
- ❌ Replace: Apple Silicon's
OnceCell → OnceLock
- ❌ Replace: Apple Silicon's
Lazy<Mutex<>> → OnceLock<HashMap<>>
- ❌ Replace: Tenstorrent's
Lazy<Mutex<Option<Vec<>>>> → OnceLock<HashMap<>>
- ❌ Refactor: AMD's per-device OnceLock → centralized HashMap
2. Common DeviceStaticInfo Trait/Structure
Goal: Define shared interface for static device information
Proposed Structure:
// In src/device/common.rs or src/device/cache.rs
pub struct DeviceStaticInfo {
pub name: String,
pub uuid: Option<String>, // Not all devices have UUID
pub detail: HashMap<String, String>, // Extensible key-value storage
}
// Helper trait for platforms with specific needs
pub trait StaticDeviceInfoExt {
fn from_platform_specific(data: &PlatformData) -> Self;
fn get_library_name() -> &'static str;
fn get_library_version(&self) -> Option<String>;
}
Benefits:
- Consistent structure across all readers
- Extensible via
detail HashMap
- Platform-specific data via trait extensions
- Easy serialization for API mode
3. Standardized Initialization Pattern
Goal: Consistent naming and error handling for cache initialization
Proposed Pattern:
fn ensure_static_cache_initialized() -> Result<(), Box<dyn std::error::Error>> {
if DEVICE_STATIC_CACHE.get().is_some() {
return Ok(());
}
// Platform-specific discovery and initialization
let devices = discover_devices()?;
// Validate device count
if devices.len() > MAX_DEVICES {
return Err(format!("Too many devices: {}", devices.len()).into());
}
let cache = devices.into_iter()
.map(|(key, info)| (key, DeviceStaticInfo::from_platform_specific(&info)))
.collect::<HashMap<_, _>>();
let _ = DEVICE_STATIC_CACHE.set(cache);
Ok(())
}
Changes:
- Standardize function name across all readers
- Consistent error handling patterns
- Common validation (MAX_DEVICES = 256)
- Clear separation of discovery vs. caching logic
4. Helper Functions for Common Operations
Goal: Extract common patterns into reusable utilities
Proposed Helpers (in src/device/cache_utils.rs or similar):
// Version extraction helper
pub fn get_or_init_library_version<F>(
cache: &OnceLock<Option<String>>,
extractor: F
) -> Option<String>
where
F: FnOnce() -> Option<String>,
{
cache.get_or_init(|| extractor()).clone()
}
// Detail HashMap builder
pub struct DetailBuilder {
detail: HashMap<String, String>,
}
impl DetailBuilder {
pub fn new() -> Self { /* ... */ }
pub fn insert_lib_info(&mut self, name: &str, version: Option<&str>) -> &mut Self { /* ... */ }
pub fn insert_pci_info(&mut self, pci_data: &PciInfo) -> &mut Self { /* ... */ }
pub fn insert_optional(&mut self, key: &str, value: Option<String>) -> &mut Self { /* ... */ }
pub fn build(self) -> HashMap<String, String> { /* ... */ }
}
// PCI information parser
pub fn parse_pci_info(pci_string: &str) -> Result<PciInfo, ParseError> { /* ... */ }
Benefits:
- Reduce code duplication
- Consistent behavior across platforms
- Easier to test and maintain
- Clear separation of concerns
5. Macro Improvements and Consolidation
Goal: Enhance existing macros and move to common module
Current Macros:
extract_struct_fields! (Rebellions)
insert_optional_fields! (Rebellions)
Proposed Enhancements:
// In src/device/macros.rs
#[macro_export]
macro_rules! build_detail_map {
($($key:expr => $value:expr),* $(,)?) => {{
let mut map = HashMap::new();
$(
if let Some(val) = $value {
map.insert($key.to_string(), val.to_string());
}
)*
map
}};
}
#[macro_export]
macro_rules! cache_device_static_info {
($cache:expr, $key_type:ty, $discovery_fn:expr) => {{
if $cache.get().is_some() {
return Ok(());
}
let devices = $discovery_fn?;
let cache_map = devices.into_iter()
.collect::<HashMap<$key_type, DeviceStaticInfo>>();
let _ = $cache.set(cache_map);
Ok(())
}};
}
Benefits:
- Reduce boilerplate in reader implementations
- Consistent error handling
- Easier to add new accelerator support
6. HashMap Key Type Standardization
Current Key Types:
- NVIDIA/Jetson:
u32 (device index) ✅ Good for stable indexed devices
- AMD: Per-device (no HashMap) ❌ Needs refactoring
- Rebellions:
String (UUID) ✅ Good for UUID-based identification
- Furiosa:
String (mixed: index/UUID) ⚠️ Needs consistency
- Tenstorrent: Vec (no HashMap) ❌ Needs refactoring
- Apple: Global cache ❌ Needs refactoring
Proposed Guidelines:
- Use
u32 for index-based devices (NVIDIA, Jetson, AMD)
- Use
String for UUID-based devices (Rebellions, Furiosa RS)
- Use
String for mixed identification (Furiosa CLI as fallback)
- Document key type rationale in each reader
Implementation Checklist
Phase 1: Foundation (Common Abstractions)
Phase 2: Reader Refactoring (Per-Platform)
Phase 3: Testing and Validation
Phase 4: Documentation
Expected Benefits
Maintainability
- Single source of truth: Common patterns reduce cognitive load
- Easier debugging: Consistent error handling and logging
- Simpler reviews: Reviewers familiar with one reader understand all
- Future-proof: New accelerator support follows established patterns
Code Quality
- Reduced duplication: ~200-300 lines of common code extracted
- Better testability: Isolated utilities easier to test
- Type safety: Consistent structures reduce runtime errors
- Clear contracts: Traits define expected behavior
Performance
Developer Experience
- Clear patterns: New contributors can add accelerator support easily
- Better tooling: Common structures enable better IDE support
- Less friction: Reduced platform-specific quirks
Non-Goals / Out of Scope
This refactoring does NOT include:
- Changing the public API or CLI interface
- Modifying metric collection logic (only caching patterns)
- Performance optimizations beyond consistency improvements
- Adding new accelerator platform support
- Changing the overall reader architecture (traits remain)
Success Criteria
Implementation Notes
Migration Strategy
- Implement foundation (Phase 1) without breaking existing code
- Migrate one reader at a time (Phase 2), with tests after each
- Keep old code until new code is proven (feature flags if needed)
- Clean up after all readers migrated
Backward Compatibility
- API mode metrics format remains unchanged
- Mock server responses remain compatible
- No breaking changes to public interfaces
Review Process
- Each phase should be a separate PR
- Individual reader migrations can be separate PRs
- Foundation PR requires thorough review (affects all platforms)
Related Issues and PRs
Priority Rationale
Medium Priority because:
- ✅ Not urgent: Current implementations work correctly
- ✅ Important: Significant maintainability and code quality benefits
- ✅ Foundation work: Will make future accelerator additions much easier
- ✅ Technical debt: Addresses inconsistencies from incremental development
- ❌ Not critical: No bugs or performance issues in current implementation
Recommended timeline: Implement over 2-3 development cycles to ensure quality and thorough testing.
Summary
Following the successful implementation of static value caching in PR #69, we've identified significant inconsistencies in how different AI accelerator reader implementations handle caching and static device information. This refactoring will standardize caching patterns across all accelerator readers (NVIDIA GPU, AMD GPU, Apple Silicon, NVIDIA Jetson, Tenstorrent NPU, Rebellions NPU, Furiosa NPU) to improve maintainability, consistency, and code reuse.
Current State Analysis
Caching Pattern Inconsistencies
Each accelerator implementation uses different caching approaches:
NVIDIA GPU (
nvidia.rs):OnceLock<HashMap<u32, DeviceStaticInfo>>for device infoOnceLock<Option<String>>for driver_version and cuda_versionget_device_static_info()u32(device index)AMD GPU (
amd.rs):OnceLock<DeviceStaticInfo>(no HashMap)OnceLock<Option<String>>for rocm_versionApple Silicon (
apple_silicon.rs):OnceCell+ globalLazy<Mutex<Option<GpuInfoCache>>>ensure_initialized()with global cacheNVIDIA Jetson (
nvidia_jetson.rs):OnceLock<HashMap<u32, DeviceStaticInfo>>u32(device index)Tenstorrent NPU (
tenstorrent.rs):Lazy<Mutex<Option<Vec<CachedChipInfo>>>>ensure_chips_initialized()with global LazyRebellions NPU (
rebellions.rs):OnceLock<String>for kmd_versionOnceLock<HashMap<String, DeviceStaticInfo>>for device infoensure_static_cache_initialized(response)patternString(UUID)Furiosa NPU (
furiosa.rs):OnceLock<HashMap<String, DeviceStaticInfo>>for both CLI and RS methodsString(device index for CLI, UUID for RS)DeviceStaticInfo Structure Variations
detail: HashMap<String, String>for extensible static fieldsVersion/Library Info Caching
Different approaches to caching driver/library versions:
driver_versionandcuda_versionas separateOnceLockrocm_versionasOnceLock<Option<String>>kmd_versionasOnceLock<String>driver_versionasOnceCell<Option<String>>Proposed Refactoring
1. Unified Caching Pattern
Goal: Standardize on a single caching approach across all readers
Proposed Standard:
Changes:
OnceCell→OnceLockLazy<Mutex<>>→OnceLock<HashMap<>>Lazy<Mutex<Option<Vec<>>>>→OnceLock<HashMap<>>2. Common DeviceStaticInfo Trait/Structure
Goal: Define shared interface for static device information
Proposed Structure:
Benefits:
detailHashMap3. Standardized Initialization Pattern
Goal: Consistent naming and error handling for cache initialization
Proposed Pattern:
Changes:
4. Helper Functions for Common Operations
Goal: Extract common patterns into reusable utilities
Proposed Helpers (in
src/device/cache_utils.rsor similar):Benefits:
5. Macro Improvements and Consolidation
Goal: Enhance existing macros and move to common module
Current Macros:
extract_struct_fields!(Rebellions)insert_optional_fields!(Rebellions)Proposed Enhancements:
Benefits:
6. HashMap Key Type Standardization
Current Key Types:
u32(device index) ✅ Good for stable indexed devicesString(UUID) ✅ Good for UUID-based identificationString(mixed: index/UUID)Proposed Guidelines:
u32for index-based devices (NVIDIA, Jetson, AMD)Stringfor UUID-based devices (Rebellions, Furiosa RS)Stringfor mixed identification (Furiosa CLI as fallback)Implementation Checklist
Phase 1: Foundation (Common Abstractions)
src/device/cache_utils.rswith common cache utilitiesDeviceStaticInfostandard structureStaticDeviceInfoExttraitsrc/device/macros.rsPhase 2: Reader Refactoring (Per-Platform)
NVIDIA GPU (
nvidia.rs):AMD GPU (
amd.rs):u32as key type (device index)Apple Silicon (
apple_silicon.rs):u32for GPU index)NVIDIA Jetson (
nvidia_jetson.rs):Tenstorrent NPU (
tenstorrent.rs):u32for chip index orStringfor chip ID)Rebellions NPU (
rebellions.rs):detailHashMapFuriosa NPU (
furiosa.rs):Phase 3: Testing and Validation
Phase 4: Documentation
Expected Benefits
Maintainability
Code Quality
Performance
Developer Experience
Non-Goals / Out of Scope
This refactoring does NOT include:
Success Criteria
Implementation Notes
Migration Strategy
Backward Compatibility
Review Process
Related Issues and PRs
Priority Rationale
Medium Priority because:
Recommended timeline: Implement over 2-3 development cycles to ensure quality and thorough testing.