feat: add 'snapshot' subcommand for one-shot JSON/CSV/Prometheus output

## Summary

Add a new `all-smi snapshot` subcommand that produces a one-shot machine-readable dump of the current hardware state (GPU/CPU/memory/chassis/process/storage) to stdout. This unlocks scripting, CI probes, Slurm prolog/epilog hooks, and quick `jq`/`yq` piping without having to start the long-running `api` server.

## Motivation

Today the only machine-readable output path is the `api` subcommand's Prometheus `/metrics` endpoint, which requires running a long-lived HTTP server. The library API returns typed values but is only accessible from Rust code. Operators and CI users regularly want the `nvidia-smi --query-gpu=... --format=csv` ergonomics — one process invocation, machine-readable stdout, non-zero exit on failure. `all-smi` has all the data already (every device type already implements `Serialize`), but no CLI path to emit it for a single collection cycle.

## Current state

- The library API (`AllSmi::new()`) returns `Vec<GpuInfo>`, `Vec<CpuInfo>`, etc.; every one of these types is `Serialize + Deserialize`.
- `api` mode uses these to build Prometheus text — long-lived server only.
- `local` and `view` subcommands enter a TUI loop — not scriptable.
- No CLI path for "collect once, print JSON/CSV, exit".

## Proposed design

New subcommand `all-smi snapshot`:

```
all-smi snapshot [--format json|csv|prometheus] [--pretty]
                 [--include gpu,cpu,memory,chassis,process,storage]
                 [--query index,uuid,name,utilization,memory.used,memory.total,temperature,power]
                 [--interval <secs>] [--samples <n>]
                 [--timeout-ms <n>] [--output <path>]
```

Semantics:

- Default: `--format json --include gpu,cpu,memory,chassis`, with `--pretty` on when stdout is a TTY (see Edge cases). `process` and `storage` are excluded by default to keep the run fast.
- `--format json` prints a single top-level object `{ "schema": 1, "timestamp": "2026-04-18T...Z", "gpus": [...], "cpus": [...], ... }` matching the library types' serde schema.
- `--format csv` flattens to one row per device. Default columns per `--include` type; `--query` overrides with a comma-separated field list using dot paths (e.g. `memory.used`, `detail.cuda_version`).
- `--format prometheus` emits the exact Prometheus exposition the `/metrics` endpoint would emit for this single collection — i.e. reuses the exporter.
- `--samples N --interval T` collects N samples T seconds apart and emits a JSON array of N snapshot objects; in CSV mode the rows are repeated per sample, each carrying a `timestamp` column that identifies its sample.
- `--output path` writes to a file instead of stdout; `-` means stdout (default).
- `--timeout-ms` applies per-reader (TPU/Gaudi can be slow); reader failures are reported under a top-level `errors` array in JSON mode and as an `errors` column or stderr line in CSV/prometheus modes.
- Process and storage sections are opt-in because they are expensive.
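
As a sketch of how `--include` could be validated and defaulted, assuming only the six section names from the synopsis: `Section`, `DEFAULT_SECTIONS`, and `parse_include` are hypothetical names, not existing all-smi code. An unknown name comes back as an error so the dispatcher can treat it as a flag parse error.

```rust
use std::collections::BTreeSet;

/// Sections accepted by `--include` (hypothetical enum; names from the synopsis).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Section {
    Gpu,
    Cpu,
    Memory,
    Chassis,
    Process,
    Storage,
}

impl Section {
    /// Maps a CLI name to a section; `None` for anything unrecognized.
    pub fn from_name(name: &str) -> Option<Section> {
        match name {
            "gpu" => Some(Section::Gpu),
            "cpu" => Some(Section::Cpu),
            "memory" => Some(Section::Memory),
            "chassis" => Some(Section::Chassis),
            "process" => Some(Section::Process),
            "storage" => Some(Section::Storage),
            _ => None,
        }
    }
}

/// Used when `--include` is absent: `process` and `storage` stay opt-in.
pub const DEFAULT_SECTIONS: [Section; 4] =
    [Section::Gpu, Section::Cpu, Section::Memory, Section::Chassis];

/// Parses the `--include` value. `None` (flag not given) yields the defaults;
/// an unknown or empty name is an error the CLI would map to exit code 2.
pub fn parse_include(arg: Option<&str>) -> Result<BTreeSet<Section>, String> {
    let list = match arg {
        None => return Ok(DEFAULT_SECTIONS.iter().copied().collect()),
        Some(list) => list,
    };
    let mut sections = BTreeSet::new();
    for raw in list.split(',') {
        let name = raw.trim();
        let section = Section::from_name(name)
            .ok_or_else(|| format!("unknown --include section: {:?}", name))?;
        sections.insert(section);
    }
    Ok(sections)
}
```

Returning a `BTreeSet` deduplicates repeated names (`--include cpu,cpu`) and gives a stable section order for the serializers.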

Exit codes:
- `0` — success (possibly with partial `errors` array).
- `1` — hard failure (no devices collected at all).
- `2` — CLI / flag parse error.

## Implementation plan

Files to add / modify:

- `src/cli.rs` — add `Snapshot(SnapshotArgs)` variant with the flags above.
- `src/main.rs` — dispatch.
- New module `src/snapshot/mod.rs` containing:
  - `SnapshotOptions` struct parsed from `SnapshotArgs`.
  - `fn run(opts: SnapshotOptions) -> anyhow::Result<()>` that drives a single collection using the existing `AllSmi` library client and reuses `src/api/metrics/*` exporters for the `prometheus` format path.
  - `serializers/json.rs`, `serializers/csv.rs`, `serializers/prometheus.rs`, one per format. The CSV writer is hand-rolled rather than pulling in the `csv` crate, to keep dependencies light.
  - `query.rs` — dot-path field selector evaluated against `serde_json::Value` so every device type is supported uniformly.
- `src/lib.rs` — re-export `SnapshotOptions` and `snapshot::run` for programmatic use.
- `src/traits/exporter.rs` — extend `ExporterError` variants if needed; reuse existing `SerializationError`, `FormatError`, `UnsupportedFormat`.
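
A dependency-free sketch of the `query.rs` selector combined with a hand-rolled CSV writer. It assumes device structs have already been converted to a JSON-like tree; the `Value` enum below is a minimal stand-in for `serde_json::Value`, and `select`, `cell`, `escape`, and `write_csv` are hypothetical names. Missing keys yield an empty cell instead of a panic, and fields are quoted when they contain commas, quotes, or line breaks.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for `serde_json::Value` so the sketch needs no crates.
pub enum Value {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Object(BTreeMap<String, Value>),
}

/// Resolves a dot path such as `memory.used` or `detail.cuda_version`.
/// A missing key, or descending into a non-object, yields `None`.
pub fn select<'a>(root: &'a Value, path: &str) -> Option<&'a Value> {
    path.split('.').try_fold(root, |node, key| match node {
        Value::Object(map) => map.get(key),
        _ => None,
    })
}

/// Renders one cell: missing values and `null` become an empty cell.
fn cell(value: Option<&Value>) -> String {
    match value {
        None | Some(Value::Null) => String::new(),
        Some(Value::Bool(b)) => b.to_string(),
        Some(Value::Number(n)) => n.to_string(),
        Some(Value::String(s)) => s.clone(),
        // Nested objects are not flattened in this sketch.
        Some(Value::Object(_)) => String::new(),
    }
}

/// Quotes a field containing a comma, quote, CR or LF, doubling inner quotes.
fn escape(field: &str) -> String {
    if field.contains(|c: char| matches!(c, ',' | '"' | '\r' | '\n')) {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Writes a header row equal to the query, then one row per device.
pub fn write_csv(query: &[&str], devices: &[Value]) -> String {
    let mut out = String::new();
    let header: Vec<String> = query.iter().map(|q| escape(q)).collect();
    out.push_str(&header.join(","));
    out.push('\n');
    for device in devices {
        let row: Vec<String> = query
            .iter()
            .map(|path| escape(&cell(select(device, path))))
            .collect();
        out.push_str(&row.join(","));
        out.push('\n');
    }
    out
}
```

This sketch emits `\n` line endings; RFC 4180 specifies CRLF, so the real serializer should pick one deliberately.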

Reuse rules:
- The JSON schema MUST match the SSE endpoint's schema once that lands (see companion issue). The `"schema": 1` version field pins it.
- The Prometheus serializer MUST be the same codepath as `src/api/metrics/*`; do not re-implement.

## Acceptance criteria

- [x] `all-smi snapshot` prints valid JSON with at least `schema`, `timestamp`, and one device array (`gpus` on supported systems; `cpus` + `memory` always).
- [x] `all-smi snapshot --format csv --query index,name,utilization,temperature` prints a CSV with a header row matching the query and one row per GPU.
- [x] `all-smi snapshot --format prometheus` byte-for-byte matches a single scrape of `api` mode's `/metrics` for the same data.
- [x] `all-smi snapshot --include cpu,memory` omits `gpus` entirely (not an empty array — absent key).
- [x] `all-smi snapshot --samples 3 --interval 1` emits a JSON array of 3 objects taken ~1s apart.
- [x] `all-smi snapshot --output /tmp/x.json` writes to that file and prints nothing to stdout.
- [x] Hard failure (e.g., all readers error) exits 1; flag parse error exits 2.
- [ ] Works on all supported platforms — macOS, Linux (NVIDIA/AMD/Jetson/Gaudi/TPU/Tenstorrent/Rebellions/Furiosa where applicable), Windows.
- [x] Integration test under `tests/snapshot_test.rs` covering JSON and CSV against mock readers.
- [x] README gains a "Scripting / CI" section with examples (`jq`, Slurm epilog).
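
For the `--output` criterion, one way to route output is to choose the writer once up front, so the serializers never know whether they are writing to stdout or to a file. `open_output` is a hypothetical helper that uses only `std`.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

/// Opens the `--output` destination: absent or `-` means stdout; any other
/// value is created (or truncated) as a file, so nothing reaches stdout.
pub fn open_output(path: Option<&str>) -> io::Result<Box<dyn Write>> {
    let writer: Box<dyn Write> = match path {
        None | Some("-") => Box::new(io::stdout()),
        Some(p) => Box::new(BufWriter::new(File::create(p)?)),
    };
    Ok(writer)
}
```

Callers should `flush()` explicitly before exiting: `BufWriter` flushes on drop but silently discards any error from that final write.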

## Edge cases & non-goals

- Piping to `jq` must work (no ANSI colors, no TTY probes on `--format json`). `--pretty` defaults to on when stdout is a TTY and off otherwise.
- `--query` dot paths into nested `detail` HashMap entries must not panic on missing keys (emit empty cell / `null`).
- Slow readers (TPU, Gaudi) must respect `--timeout-ms` and surface the failure in `errors` rather than hanging.
- Running on macOS without sudo must still emit what it can (CPU/memory/chassis) and note missing GPU access in `errors`.
- Non-goal: long-term metric scraping — that is what `api` mode is for.
- Non-goal: streaming — see the SSE companion issue.
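
A sketch of how a per-reader `--timeout-ms` could be enforced with only the standard library: each reader runs on its own thread and the collector waits on a channel with `recv_timeout`. The function name and string-based errors are assumptions for illustration; the real code would push these messages into the `errors` array.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs one reader on a worker thread and waits at most `timeout` for it.
/// On timeout the worker is detached, not killed: a truly hung driver call
/// keeps running in the background until the process exits.
pub fn collect_with_timeout<T, F>(name: &str, timeout: Duration, reader: F) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> Result<T, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(reader());
    });
    match rx.recv_timeout(timeout) {
        Ok(result) => result.map_err(|e| format!("{}: {}", name, e)),
        Err(mpsc::RecvTimeoutError::Timeout) => {
            Err(format!("{}: timed out after {} ms", name, timeout.as_millis()))
        }
        // The sender was dropped without sending, i.e. the reader panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            Err(format!("{}: reader thread panicked", name))
        }
    }
}
```

Because a timed-out thread is only abandoned, a one-shot process like `snapshot` can afford this approach; a long-lived server would need readers that honor cancellation.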

## Soft dependencies

- Issue "SSE streaming endpoint" reuses this JSON schema.
- Issue "Agentless SSH mode" remotely executes `all-smi snapshot --format json` as its primary path (with an `nvidia-smi` CSV shim as fallback).

