Refactor AI accelerator readers to use unified caching patterns and common abstractions

## Summary

Following the successful implementation of static value caching in PR #69, we've identified significant inconsistencies in how different AI accelerator reader implementations handle caching and static device information. This refactoring will standardize caching patterns across all accelerator readers (NVIDIA GPU, AMD GPU, Apple Silicon, NVIDIA Jetson, Tenstorrent NPU, Rebellions NPU, Furiosa NPU) to improve maintainability, consistency, and code reuse.

## Current State Analysis

### Caching Pattern Inconsistencies

Each accelerator implementation uses different caching approaches:

1. **NVIDIA GPU** (`nvidia.rs`):
   - `OnceLock<HashMap<u32, DeviceStaticInfo>>` for device info
   - Separate `OnceLock<Option<String>>` for driver_version and cuda_version
   - Bulk initialization via `get_device_static_info()`
   - Key: `u32` (device index)

2. **AMD GPU** (`amd.rs`):
   - Per-device `OnceLock<DeviceStaticInfo>` (no HashMap)
   - Separate `OnceLock<Option<String>>` for rocm_version
   - Per-device initialization
   - No global cache structure

3. **Apple Silicon** (`apple_silicon.rs`):
   - Uses `OnceCell` + global `Lazy<Mutex<Option<GpuInfoCache>>>`
   - Stores fields directly in reader struct
   - `ensure_initialized()` with global cache
   - Different synchronization primitives

4. **NVIDIA Jetson** (`nvidia_jetson.rs`):
   - Similar to NVIDIA GPU but with Tegra-specific handling
   - `OnceLock<HashMap<u32, DeviceStaticInfo>>`
   - Key: `u32` (device index)

5. **Tenstorrent NPU** (`tenstorrent.rs`):
   - `Lazy<Mutex<Option<Vec<CachedChipInfo>>>>`
   - Uses Vec instead of HashMap
   - `ensure_chips_initialized()` with global Lazy
   - Different synchronization approach

6. **Rebellions NPU** (`rebellions.rs`):
   - `OnceLock<String>` for kmd_version
   - `OnceLock<HashMap<String, DeviceStaticInfo>>` for device info
   - `ensure_static_cache_initialized(response)` pattern
   - Key: `String` (UUID)

7. **Furiosa NPU** (`furiosa.rs`):
   - `OnceLock<HashMap<String, DeviceStaticInfo>>` for both CLI and RS methods
   - Separate caches for different backend methods
   - Key: `String` (device index for CLI, UUID for RS)

### DeviceStaticInfo Structure Variations

- **NVIDIA/Furiosa/AMD**: Uses `detail: HashMap<String, String>` for extensible static fields
- **Rebellions/Tenstorrent**: Uses individual struct fields (less flexible)
- **Apple Silicon**: Stores fields directly in reader struct (no separate StaticInfo)

### Version/Library Info Caching

Different approaches to caching driver/library versions:
- NVIDIA: `driver_version` and `cuda_version` as separate `OnceLock`
- AMD: `rocm_version` as `OnceLock<Option<String>>`
- Rebellions: `kmd_version` as `OnceLock<String>`
- Apple: `driver_version` as `OnceCell<Option<String>>`
- Tenstorrent: Embedded in chip info
- Furiosa: Part of DeviceStaticInfo

## Proposed Refactoring

### 1. Unified Caching Pattern

**Goal**: Standardize on a single caching approach across all readers

**Proposed Standard**:
```rust
// Common pattern for all accelerators
static DEVICE_STATIC_CACHE: OnceLock<HashMap<KEY_TYPE, DeviceStaticInfo>> = OnceLock::new();
static LIBRARY_VERSION: OnceLock<Option<String>> = OnceLock::new();
```

**Changes**:
- ✅ Keep: NVIDIA, Rebellions, Furiosa patterns (already using OnceLock + HashMap)
- ❌ Replace: Apple Silicon's `OnceCell` → `OnceLock`
- ❌ Replace: Apple Silicon's `Lazy<Mutex<>>` → `OnceLock<HashMap<>>`
- ❌ Replace: Tenstorrent's `Lazy<Mutex<Option<Vec<>>>>` → `OnceLock<HashMap<>>`
- ❌ Refactor: AMD's per-device OnceLock → centralized HashMap

### 2. Common DeviceStaticInfo Trait/Structure

**Goal**: Define shared interface for static device information

**Proposed Structure**:
```rust
// In src/device/common.rs or src/device/cache.rs
pub struct DeviceStaticInfo {
    pub name: String,
    pub uuid: Option<String>,  // Not all devices have UUID
    pub detail: HashMap<String, String>,  // Extensible key-value storage
}

// Helper trait for platforms with specific needs
pub trait StaticDeviceInfoExt {
    fn from_platform_specific(data: &PlatformData) -> Self;
    fn get_library_name() -> &'static str;
    fn get_library_version(&self) -> Option<String>;
}
```

**Benefits**:
- Consistent structure across all readers
- Extensible via `detail` HashMap
- Platform-specific data via trait extensions
- Easy serialization for API mode

### 3. Standardized Initialization Pattern

**Goal**: Consistent naming and error handling for cache initialization

**Proposed Pattern**:
```rust
fn ensure_static_cache_initialized() -> Result<(), Box<dyn std::error::Error>> {
    if DEVICE_STATIC_CACHE.get().is_some() {
        return Ok(());
    }
    
    // Platform-specific discovery and initialization
    let devices = discover_devices()?;
    
    // Validate device count
    if devices.len() > MAX_DEVICES {
        return Err(format!("Too many devices: {}", devices.len()).into());
    }
    
    let cache = devices.into_iter()
        .map(|(key, info)| (key, DeviceStaticInfo::from_platform_specific(&info)))
        .collect::<HashMap<_, _>>();
    
    let _ = DEVICE_STATIC_CACHE.set(cache);
    Ok(())
}
```

**Changes**:
- Standardize function name across all readers
- Consistent error handling patterns
- Common validation (MAX_DEVICES = 256)
- Clear separation of discovery vs. caching logic

### 4. Helper Functions for Common Operations

**Goal**: Extract common patterns into reusable utilities

**Proposed Helpers** (in `src/device/cache_utils.rs` or similar):

```rust
// Version extraction helper
pub fn get_or_init_library_version<F>(
    cache: &OnceLock<Option<String>>,
    extractor: F
) -> Option<String>
where
    F: FnOnce() -> Option<String>,
{
    cache.get_or_init(|| extractor()).clone()
}

// Detail HashMap builder
pub struct DetailBuilder {
    detail: HashMap<String, String>,
}

impl DetailBuilder {
    pub fn new() -> Self { /* ... */ }
    pub fn insert_lib_info(&mut self, name: &str, version: Option<&str>) -> &mut Self { /* ... */ }
    pub fn insert_pci_info(&mut self, pci_data: &PciInfo) -> &mut Self { /* ... */ }
    pub fn insert_optional(&mut self, key: &str, value: Option<String>) -> &mut Self { /* ... */ }
    pub fn build(self) -> HashMap<String, String> { /* ... */ }
}

// PCI information parser
pub fn parse_pci_info(pci_string: &str) -> Result<PciInfo, ParseError> { /* ... */ }
```

**Benefits**:
- Reduce code duplication
- Consistent behavior across platforms
- Easier to test and maintain
- Clear separation of concerns

### 5. Macro Improvements and Consolidation

**Goal**: Enhance existing macros and move to common module

**Current Macros**:
- `extract_struct_fields!` (Rebellions)
- `insert_optional_fields!` (Rebellions)

**Proposed Enhancements**:
```rust
// In src/device/macros.rs
#[macro_export]
macro_rules! build_detail_map {
    ($($key:expr => $value:expr),* $(,)?) => {{
        let mut map = HashMap::new();
        $(
            if let Some(val) = $value {
                map.insert($key.to_string(), val.to_string());
            }
        )*
        map
    }};
}

#[macro_export]
macro_rules! cache_device_static_info {
    ($cache:expr, $key_type:ty, $discovery_fn:expr) => {{
        if $cache.get().is_some() {
            return Ok(());
        }
        
        let devices = $discovery_fn?;
        let cache_map = devices.into_iter()
            .collect::<HashMap<$key_type, DeviceStaticInfo>>();
        
        let _ = $cache.set(cache_map);
        Ok(())
    }};
}
```

**Benefits**:
- Reduce boilerplate in reader implementations
- Consistent error handling
- Easier to add new accelerator support

### 6. HashMap Key Type Standardization

**Current Key Types**:
- NVIDIA/Jetson: `u32` (device index) ✅ Good for stable indexed devices
- AMD: Per-device (no HashMap) ❌ Needs refactoring
- Rebellions: `String` (UUID) ✅ Good for UUID-based identification
- Furiosa: `String` (mixed: index/UUID) ⚠️ Needs consistency
- Tenstorrent: Vec (no HashMap) ❌ Needs refactoring
- Apple: Global cache ❌ Needs refactoring

**Proposed Guidelines**:
- Use `u32` for index-based devices (NVIDIA, Jetson, AMD)
- Use `String` for UUID-based devices (Rebellions, Furiosa RS)
- Use `String` for mixed identification (Furiosa CLI as fallback)
- Document key type rationale in each reader

## Implementation Checklist

### Phase 1: Foundation (Common Abstractions)
- [ ] Create `src/device/cache_utils.rs` with common cache utilities
- [ ] Define `DeviceStaticInfo` standard structure
- [ ] Implement `StaticDeviceInfoExt` trait
- [ ] Create helper functions (version extraction, detail builder, PCI parser)
- [ ] Move and enhance macros to `src/device/macros.rs`
- [ ] Add comprehensive tests for common utilities

### Phase 2: Reader Refactoring (Per-Platform)
- [ ] **NVIDIA GPU** (`nvidia.rs`):
  - [ ] Adopt common DeviceStaticInfo structure
  - [ ] Use helper functions for detail building
  - [ ] Standardize initialization function name
  
- [ ] **AMD GPU** (`amd.rs`):
  - [ ] Migrate from per-device OnceLock to HashMap pattern
  - [ ] Use `u32` as key type (device index)
  - [ ] Adopt common DeviceStaticInfo structure
  - [ ] Use helper functions for ROCm version caching
  
- [ ] **Apple Silicon** (`apple_silicon.rs`):
  - [ ] Replace OnceCell with OnceLock
  - [ ] Replace global Lazy<Mutex<>> with OnceLock<HashMap<>>
  - [ ] Adopt standard DeviceStaticInfo structure
  - [ ] Choose appropriate key type (likely `u32` for GPU index)
  
- [ ] **NVIDIA Jetson** (`nvidia_jetson.rs`):
  - [ ] Align with NVIDIA GPU pattern
  - [ ] Adopt common DeviceStaticInfo structure
  - [ ] Use helper functions for Tegra-specific info
  
- [ ] **Tenstorrent NPU** (`tenstorrent.rs`):
  - [ ] Replace Lazy<Mutex<Vec>> with OnceLock<HashMap<>>
  - [ ] Choose key type (likely `u32` for chip index or `String` for chip ID)
  - [ ] Adopt common DeviceStaticInfo structure
  - [ ] Use helper functions for multi-temperature/clock handling
  
- [ ] **Rebellions NPU** (`rebellions.rs`):
  - [ ] Already using OnceLock + HashMap (good!)
  - [ ] Migrate DeviceStaticInfo to use `detail` HashMap
  - [ ] Adopt common helper functions
  - [ ] Use enhanced macros
  
- [ ] **Furiosa NPU** (`furiosa.rs`):
  - [ ] Already using OnceLock + HashMap (good!)
  - [ ] Unify CLI and RS cache patterns
  - [ ] Adopt common helper functions
  - [ ] Document key type strategy (index vs UUID)

### Phase 3: Testing and Validation
- [ ] Unit tests for each refactored reader
- [ ] Integration tests across all platforms
- [ ] Mock server compatibility verification
- [ ] Performance benchmarks (should maintain ~95% reduction from PR #69)
- [ ] API mode metrics validation
- [ ] Remote monitoring compatibility check

### Phase 4: Documentation
- [ ] Update CLAUDE.md with new patterns
- [ ] Add inline documentation for common utilities
- [ ] Create migration guide for future accelerator additions
- [ ] Document key type selection rationale per platform

## Expected Benefits

### Maintainability
- **Single source of truth**: Common patterns reduce cognitive load
- **Easier debugging**: Consistent error handling and logging
- **Simpler reviews**: Reviewers familiar with one reader understand all
- **Future-proof**: New accelerator support follows established patterns

### Code Quality
- **Reduced duplication**: ~200-300 lines of common code extracted
- **Better testability**: Isolated utilities easier to test
- **Type safety**: Consistent structures reduce runtime errors
- **Clear contracts**: Traits define expected behavior

### Performance
- **Maintain gains from PR #69**: ~95% reduction in redundant calls preserved
- **Potential improvements**: Optimized common utilities benefit all platforms
- **Memory efficiency**: Consistent HashMap sizing and allocation

### Developer Experience
- **Clear patterns**: New contributors can add accelerator support easily
- **Better tooling**: Common structures enable better IDE support
- **Less friction**: Reduced platform-specific quirks

## Non-Goals / Out of Scope

This refactoring does NOT include:
- Changing the public API or CLI interface
- Modifying metric collection logic (only caching patterns)
- Performance optimizations beyond consistency improvements
- Adding new accelerator platform support
- Changing the overall reader architecture (traits remain)

## Success Criteria

- [ ] All 7 accelerator readers use OnceLock-based caching
- [ ] All readers use common DeviceStaticInfo structure
- [ ] All readers use standardized initialization pattern
- [ ] Common utilities tested with >90% coverage
- [ ] No performance regression (benchmarks pass)
- [ ] All existing tests pass
- [ ] Documentation updated and complete

## Implementation Notes

### Migration Strategy
1. Implement foundation (Phase 1) without breaking existing code
2. Migrate one reader at a time (Phase 2), with tests after each
3. Keep old code until new code is proven (feature flags if needed)
4. Clean up after all readers migrated

### Backward Compatibility
- API mode metrics format remains unchanged
- Mock server responses remain compatible
- No breaking changes to public interfaces

### Review Process
- Each phase should be a separate PR
- Individual reader migrations can be separate PRs
- Foundation PR requires thorough review (affects all platforms)

## Related Issues and PRs

- PR #69: Initial static value caching implementation (baseline for this refactoring)
- Builds on: Performance optimization work across all readers

## Priority Rationale

**Medium Priority** because:
- ✅ Not urgent: Current implementations work correctly
- ✅ Important: Significant maintainability and code quality benefits
- ✅ Foundation work: Will make future accelerator additions much easier
- ✅ Technical debt: Addresses inconsistencies from incremental development
- ❌ Not critical: No bugs or performance issues in current implementation

Recommended timeline: Implement over 2-3 development cycles to ensure quality and thorough testing.

Uh oh!

Refactor AI accelerator readers to use unified caching patterns and common abstractions #70

Description

Summary

Current State Analysis

Caching Pattern Inconsistencies

DeviceStaticInfo Structure Variations

Version/Library Info Caching

Proposed Refactoring

1. Unified Caching Pattern

2. Common DeviceStaticInfo Trait/Structure

3. Standardized Initialization Pattern

4. Helper Functions for Common Operations

5. Macro Improvements and Consolidation

6. HashMap Key Type Standardization

Implementation Checklist

Phase 1: Foundation (Common Abstractions)

Phase 2: Reader Refactoring (Per-Platform)

Phase 3: Testing and Validation

Phase 4: Documentation

Expected Benefits

Maintainability

Code Quality

Performance

Developer Experience

Non-Goals / Out of Scope

Success Criteria

Implementation Notes

Migration Strategy

Backward Compatibility

Review Process

Related Issues and PRs

Priority Rationale

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions