Skip to content

Refactor AI accelerator readers to use unified caching patterns and common abstractions #70

Description

@inureyes

Summary

Following the successful implementation of static value caching in PR #69, we've identified significant inconsistencies in how different AI accelerator reader implementations handle caching and static device information. This refactoring will standardize caching patterns across all accelerator readers (NVIDIA GPU, AMD GPU, Apple Silicon, NVIDIA Jetson, Tenstorrent NPU, Rebellions NPU, Furiosa NPU) to improve maintainability, consistency, and code reuse.

Current State Analysis

Caching Pattern Inconsistencies

Each accelerator implementation uses different caching approaches:

  1. NVIDIA GPU (nvidia.rs):

    • OnceLock<HashMap<u32, DeviceStaticInfo>> for device info
    • Separate OnceLock<Option<String>> for driver_version and cuda_version
    • Bulk initialization via get_device_static_info()
    • Key: u32 (device index)
  2. AMD GPU (amd.rs):

    • Per-device OnceLock<DeviceStaticInfo> (no HashMap)
    • Separate OnceLock<Option<String>> for rocm_version
    • Per-device initialization
    • No global cache structure
  3. Apple Silicon (apple_silicon.rs):

    • Uses OnceCell + global Lazy<Mutex<Option<GpuInfoCache>>>
    • Stores fields directly in reader struct
    • ensure_initialized() with global cache
    • Different synchronization primitives
  4. NVIDIA Jetson (nvidia_jetson.rs):

    • Similar to NVIDIA GPU but with Tegra-specific handling
    • OnceLock<HashMap<u32, DeviceStaticInfo>>
    • Key: u32 (device index)
  5. Tenstorrent NPU (tenstorrent.rs):

    • Lazy<Mutex<Option<Vec<CachedChipInfo>>>>
    • Uses Vec instead of HashMap
    • ensure_chips_initialized() with global Lazy
    • Different synchronization approach
  6. Rebellions NPU (rebellions.rs):

    • OnceLock<String> for kmd_version
    • OnceLock<HashMap<String, DeviceStaticInfo>> for device info
    • ensure_static_cache_initialized(response) pattern
    • Key: String (UUID)
  7. Furiosa NPU (furiosa.rs):

    • OnceLock<HashMap<String, DeviceStaticInfo>> for both CLI and RS methods
    • Separate caches for different backend methods
    • Key: String (device index for CLI, UUID for RS)

DeviceStaticInfo Structure Variations

  • NVIDIA/Furiosa/AMD: Uses detail: HashMap<String, String> for extensible static fields
  • Rebellions/Tenstorrent: Uses individual struct fields (less flexible)
  • Apple Silicon: Stores fields directly in reader struct (no separate StaticInfo)

Version/Library Info Caching

Different approaches to caching driver/library versions:

  • NVIDIA: driver_version and cuda_version as separate OnceLock
  • AMD: rocm_version as OnceLock<Option<String>>
  • Rebellions: kmd_version as OnceLock<String>
  • Apple: driver_version as OnceCell<Option<String>>
  • Tenstorrent: Embedded in chip info
  • Furiosa: Part of DeviceStaticInfo

Proposed Refactoring

1. Unified Caching Pattern

Goal: Standardize on a single caching approach across all readers

Proposed Standard:

// Common pattern for all accelerators
static DEVICE_STATIC_CACHE: OnceLock<HashMap<KEY_TYPE, DeviceStaticInfo>> = OnceLock::new();
static LIBRARY_VERSION: OnceLock<Option<String>> = OnceLock::new();

Changes:

  • ✅ Keep: NVIDIA, Rebellions, Furiosa patterns (already using OnceLock + HashMap)
  • ❌ Replace: Apple Silicon's OnceCellOnceLock
  • ❌ Replace: Apple Silicon's Lazy<Mutex<>>OnceLock<HashMap<>>
  • ❌ Replace: Tenstorrent's Lazy<Mutex<Option<Vec<>>>>OnceLock<HashMap<>>
  • ❌ Refactor: AMD's per-device OnceLock → centralized HashMap

2. Common DeviceStaticInfo Trait/Structure

Goal: Define shared interface for static device information

Proposed Structure:

// In src/device/common.rs or src/device/cache.rs
pub struct DeviceStaticInfo {
    pub name: String,
    pub uuid: Option<String>,  // Not all devices have UUID
    pub detail: HashMap<String, String>,  // Extensible key-value storage
}

// Helper trait for platforms with specific needs
pub trait StaticDeviceInfoExt {
    fn from_platform_specific(data: &PlatformData) -> Self;
    fn get_library_name() -> &'static str;
    fn get_library_version(&self) -> Option<String>;
}

Benefits:

  • Consistent structure across all readers
  • Extensible via detail HashMap
  • Platform-specific data via trait extensions
  • Easy serialization for API mode

3. Standardized Initialization Pattern

Goal: Consistent naming and error handling for cache initialization

Proposed Pattern:

fn ensure_static_cache_initialized() -> Result<(), Box<dyn std::error::Error>> {
    if DEVICE_STATIC_CACHE.get().is_some() {
        return Ok(());
    }
    
    // Platform-specific discovery and initialization
    let devices = discover_devices()?;
    
    // Validate device count
    if devices.len() > MAX_DEVICES {
        return Err(format!("Too many devices: {}", devices.len()).into());
    }
    
    let cache = devices.into_iter()
        .map(|(key, info)| (key, DeviceStaticInfo::from_platform_specific(&info)))
        .collect::<HashMap<_, _>>();
    
    let _ = DEVICE_STATIC_CACHE.set(cache);
    Ok(())
}

Changes:

  • Standardize function name across all readers
  • Consistent error handling patterns
  • Common validation (MAX_DEVICES = 256)
  • Clear separation of discovery vs. caching logic

4. Helper Functions for Common Operations

Goal: Extract common patterns into reusable utilities

Proposed Helpers (in src/device/cache_utils.rs or similar):

// Version extraction helper
pub fn get_or_init_library_version<F>(
    cache: &OnceLock<Option<String>>,
    extractor: F
) -> Option<String>
where
    F: FnOnce() -> Option<String>,
{
    cache.get_or_init(|| extractor()).clone()
}

// Detail HashMap builder
pub struct DetailBuilder {
    detail: HashMap<String, String>,
}

impl DetailBuilder {
    pub fn new() -> Self { /* ... */ }
    pub fn insert_lib_info(&mut self, name: &str, version: Option<&str>) -> &mut Self { /* ... */ }
    pub fn insert_pci_info(&mut self, pci_data: &PciInfo) -> &mut Self { /* ... */ }
    pub fn insert_optional(&mut self, key: &str, value: Option<String>) -> &mut Self { /* ... */ }
    pub fn build(self) -> HashMap<String, String> { /* ... */ }
}

// PCI information parser
pub fn parse_pci_info(pci_string: &str) -> Result<PciInfo, ParseError> { /* ... */ }

Benefits:

  • Reduce code duplication
  • Consistent behavior across platforms
  • Easier to test and maintain
  • Clear separation of concerns

5. Macro Improvements and Consolidation

Goal: Enhance existing macros and move to common module

Current Macros:

  • extract_struct_fields! (Rebellions)
  • insert_optional_fields! (Rebellions)

Proposed Enhancements:

// In src/device/macros.rs
#[macro_export]
macro_rules! build_detail_map {
    ($($key:expr => $value:expr),* $(,)?) => {{
        let mut map = HashMap::new();
        $(
            if let Some(val) = $value {
                map.insert($key.to_string(), val.to_string());
            }
        )*
        map
    }};
}

#[macro_export]
macro_rules! cache_device_static_info {
    ($cache:expr, $key_type:ty, $discovery_fn:expr) => {{
        if $cache.get().is_some() {
            return Ok(());
        }
        
        let devices = $discovery_fn?;
        let cache_map = devices.into_iter()
            .collect::<HashMap<$key_type, DeviceStaticInfo>>();
        
        let _ = $cache.set(cache_map);
        Ok(())
    }};
}

Benefits:

  • Reduce boilerplate in reader implementations
  • Consistent error handling
  • Easier to add new accelerator support

6. HashMap Key Type Standardization

Current Key Types:

  • NVIDIA/Jetson: u32 (device index) ✅ Good for stable indexed devices
  • AMD: Per-device (no HashMap) ❌ Needs refactoring
  • Rebellions: String (UUID) ✅ Good for UUID-based identification
  • Furiosa: String (mixed: index/UUID) ⚠️ Needs consistency
  • Tenstorrent: Vec (no HashMap) ❌ Needs refactoring
  • Apple: Global cache ❌ Needs refactoring

Proposed Guidelines:

  • Use u32 for index-based devices (NVIDIA, Jetson, AMD)
  • Use String for UUID-based devices (Rebellions, Furiosa RS)
  • Use String for mixed identification (Furiosa CLI as fallback)
  • Document key type rationale in each reader

Implementation Checklist

Phase 1: Foundation (Common Abstractions)

  • Create src/device/cache_utils.rs with common cache utilities
  • Define DeviceStaticInfo standard structure
  • Implement StaticDeviceInfoExt trait
  • Create helper functions (version extraction, detail builder, PCI parser)
  • Move and enhance macros to src/device/macros.rs
  • Add comprehensive tests for common utilities

Phase 2: Reader Refactoring (Per-Platform)

  • NVIDIA GPU (nvidia.rs):

    • Adopt common DeviceStaticInfo structure
    • Use helper functions for detail building
    • Standardize initialization function name
  • AMD GPU (amd.rs):

    • Migrate from per-device OnceLock to HashMap pattern
    • Use u32 as key type (device index)
    • Adopt common DeviceStaticInfo structure
    • Use helper functions for ROCm version caching
  • Apple Silicon (apple_silicon.rs):

    • Replace OnceCell with OnceLock
    • Replace global Lazy<Mutex<>> with OnceLock<HashMap<>>
    • Adopt standard DeviceStaticInfo structure
    • Choose appropriate key type (likely u32 for GPU index)
  • NVIDIA Jetson (nvidia_jetson.rs):

    • Align with NVIDIA GPU pattern
    • Adopt common DeviceStaticInfo structure
    • Use helper functions for Tegra-specific info
  • Tenstorrent NPU (tenstorrent.rs):

    • Replace Lazy<Mutex> with OnceLock<HashMap<>>
    • Choose key type (likely u32 for chip index or String for chip ID)
    • Adopt common DeviceStaticInfo structure
    • Use helper functions for multi-temperature/clock handling
  • Rebellions NPU (rebellions.rs):

    • Already using OnceLock + HashMap (good!)
    • Migrate DeviceStaticInfo to use detail HashMap
    • Adopt common helper functions
    • Use enhanced macros
  • Furiosa NPU (furiosa.rs):

    • Already using OnceLock + HashMap (good!)
    • Unify CLI and RS cache patterns
    • Adopt common helper functions
    • Document key type strategy (index vs UUID)

Phase 3: Testing and Validation

  • Unit tests for each refactored reader
  • Integration tests across all platforms
  • Mock server compatibility verification
  • Performance benchmarks (should maintain ~95% reduction from PR perf: optimize GPU/NPU readers by caching static values #69)
  • API mode metrics validation
  • Remote monitoring compatibility check

Phase 4: Documentation

  • Update CLAUDE.md with new patterns
  • Add inline documentation for common utilities
  • Create migration guide for future accelerator additions
  • Document key type selection rationale per platform

Expected Benefits

Maintainability

  • Single source of truth: Common patterns reduce cognitive load
  • Easier debugging: Consistent error handling and logging
  • Simpler reviews: Reviewers familiar with one reader understand all
  • Future-proof: New accelerator support follows established patterns

Code Quality

  • Reduced duplication: ~200-300 lines of common code extracted
  • Better testability: Isolated utilities easier to test
  • Type safety: Consistent structures reduce runtime errors
  • Clear contracts: Traits define expected behavior

Performance

Developer Experience

  • Clear patterns: New contributors can add accelerator support easily
  • Better tooling: Common structures enable better IDE support
  • Less friction: Reduced platform-specific quirks

Non-Goals / Out of Scope

This refactoring does NOT include:

  • Changing the public API or CLI interface
  • Modifying metric collection logic (only caching patterns)
  • Performance optimizations beyond consistency improvements
  • Adding new accelerator platform support
  • Changing the overall reader architecture (traits remain)

Success Criteria

  • All 7 accelerator readers use OnceLock-based caching
  • All readers use common DeviceStaticInfo structure
  • All readers use standardized initialization pattern
  • Common utilities tested with >90% coverage
  • No performance regression (benchmarks pass)
  • All existing tests pass
  • Documentation updated and complete

Implementation Notes

Migration Strategy

  1. Implement foundation (Phase 1) without breaking existing code
  2. Migrate one reader at a time (Phase 2), with tests after each
  3. Keep old code until new code is proven (feature flags if needed)
  4. Clean up after all readers migrated

Backward Compatibility

  • API mode metrics format remains unchanged
  • Mock server responses remain compatible
  • No breaking changes to public interfaces

Review Process

  • Each phase should be a separate PR
  • Individual reader migrations can be separate PRs
  • Foundation PR requires thorough review (affects all platforms)

Related Issues and PRs

Priority Rationale

Medium Priority because:

  • ✅ Not urgent: Current implementations work correctly
  • ✅ Important: Significant maintainability and code quality benefits
  • ✅ Foundation work: Will make future accelerator additions much easier
  • ✅ Technical debt: Addresses inconsistencies from incremental development
  • ❌ Not critical: No bugs or performance issues in current implementation

Recommended timeline: Implement over 2-3 development cycles to ensure quality and thorough testing.

Metadata

Metadata

Assignees

Type

No type

Fields

No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions