feat(snapshot): add --runtime-class flag as alternative to --require-gpu for CDI environments #433

## Problem

When CDI (Container Device Interface) is enabled, the snapshot agent pod needs GPU access to run `nvidia-smi`. The existing `--require-gpu` flag provides this by requesting `nvidia.com/gpu: 1`, but the request **cannot be satisfied when all GPUs are already allocated** to workloads, so the agent pod cannot be scheduled.

## Solution

Add a `--runtime-class` flag to `aicr snapshot` that:

1. Sets `runtimeClassName` on the agent Job's pod spec (e.g., `nvidia`)
2. Injects the `NVIDIA_VISIBLE_DEVICES=all` environment variable into the agent container

This gives the agent access to `nvidia-smi` via the NVIDIA container runtime **without consuming a GPU from the Device Plugin**. The snapshot GPU collector only needs to run `nvidia-smi -q -x`; it does not need a dedicated GPU allocation.
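
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `PodSpec`, `Container`, and `EnvVar` are simplified stand-ins for the Kubernetes `core/v1` types, and `applyRuntimeClass` is a hypothetical helper name. It also assumes the variable is injected into every container in the pod, which may be broader than what the agent Job needs.

```go
package main

import "fmt"

// EnvVar, Container, and PodSpec are simplified stand-ins for the
// Kubernetes core/v1 types; only the fields relevant here are modeled.
type EnvVar struct {
	Name  string
	Value string
}

type Container struct {
	Name string
	Env  []EnvVar
}

type PodSpec struct {
	RuntimeClassName *string
	Containers       []Container
}

// applyRuntimeClass is a hypothetical helper. When runtimeClass is non-empty
// it sets runtimeClassName on the pod spec and appends
// NVIDIA_VISIBLE_DEVICES=all to each container's environment.
// When runtimeClass is empty the spec is left untouched.
func applyRuntimeClass(spec *PodSpec, runtimeClass string) {
	if runtimeClass == "" {
		return
	}
	rc := runtimeClass
	spec.RuntimeClassName = &rc
	for i := range spec.Containers {
		spec.Containers[i].Env = append(spec.Containers[i].Env,
			EnvVar{Name: "NVIDIA_VISIBLE_DEVICES", Value: "all"})
	}
}

func main() {
	spec := PodSpec{Containers: []Container{{Name: "agent"}}}
	applyRuntimeClass(&spec, "nvidia")
	env := spec.Containers[0].Env[0]
	fmt.Println(*spec.RuntimeClassName, env.Name, env.Value)
}
```

Note that no `nvidia.com/gpu` resource request appears anywhere in this sketch; GPU visibility comes from the runtime class and the environment variable alone.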

## Flags behavior

- `--runtime-class` and `--require-gpu` are **mutually exclusive**
- `--runtime-class` is the preferred approach; the error message when both are set recommends it
- `--runtime-class` can also be set via the `AICR_RUNTIME_CLASS` environment variable
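
A rough sketch of how these rules might be checked is shown below. The function names `resolveRuntimeClass` and `validateGPUFlags` and the exact error wording are hypothetical, and the precedence (an explicit flag value wins over `AICR_RUNTIME_CLASS`) is an assumption that the issue does not specify.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// resolveRuntimeClass returns the flag value if one was given, otherwise
// the value of AICR_RUNTIME_CLASS. The flag-over-environment precedence
// is an assumption. getenv is injected so the lookup can be tested.
func resolveRuntimeClass(flagValue string, getenv func(string) string) string {
	if flagValue != "" {
		return flagValue
	}
	return getenv("AICR_RUNTIME_CLASS")
}

// validateGPUFlags rejects the combination of --require-gpu and
// --runtime-class, and recommends --runtime-class in the error message.
func validateGPUFlags(requireGPU bool, runtimeClass string) error {
	if requireGPU && runtimeClass != "" {
		return errors.New("--require-gpu and --runtime-class are mutually exclusive; " +
			"prefer --runtime-class, which does not consume a GPU allocation")
	}
	return nil
}

func main() {
	rc := resolveRuntimeClass("", os.Getenv)
	fmt.Printf("resolved runtime class: %q\n", rc)

	if err := validateGPUFlags(true, "nvidia"); err != nil {
		fmt.Println("error:", err)
	}
}
```

Validating against the resolved value rather than the raw flag means that setting `AICR_RUNTIME_CLASS` together with `--require-gpu` is rejected as well, which seems consistent with the mutual-exclusion rule but is a design choice, not something the issue states.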

## Acceptance criteria

- [ ] `aicr snapshot --runtime-class nvidia` sets `runtimeClassName` on the agent pod
- [ ] `NVIDIA_VISIBLE_DEVICES=all` is injected when `--runtime-class` is set
- [ ] `--require-gpu` and `--runtime-class` together produce a clear error recommending `--runtime-class`
- [ ] Table-driven tests cover the new flag in `job_test.go`
- [ ] `make qualify` passes
