Skip to content

test(fit-params): smoke for --fit-print-plan JSON output#75

Merged
marksverdhei merged 1 commit into
feat/fit-params-print-planfrom
test/fit-params-print-plan
Jun 12, 2026
Merged

test(fit-params): smoke for --fit-print-plan JSON output#75
marksverdhei merged 1 commit into
feat/fit-params-print-planfrom
test/fit-params-print-plan

Conversation

@marksverdhei

Copy link
Copy Markdown

Epoch #73 task 2. PR #72 coverage gap: the --fit-print-plan flag landed without tests.

Adds tools/fit-params/tests.sh (pattern from tools/gguf-split/tests.sh). 6 invariants:

  1. success emits single-line JSON
  2. schema: per_device_bytes (list), n_devices (int), total_bytes (int)
  3. len(per_device_bytes) == n_devices
  4. total_bytes == sum(per_device_bytes)
  5. CPU-only build → empty plan + total=0
  6. fit-failure emits {"error":"fit_failed"} marker (not garbage)

Local run on CPU-only build: ALL PASSED.

Stacks on #72 (depends on the new flag); merge after #72.

…rage)

PR #72 added the --fit-print-plan flag to llama-fit-params without test
coverage. This adds a tools/fit-params/tests.sh (pattern lifted from
tools/gguf-split/tests.sh) that downloads a small Qwen3-0.6B GGUF and
verifies six invariants:

1. success-path emits single-line JSON
2. schema has per_device_bytes / n_devices / total_bytes with correct types
3. len(per_device_bytes) == n_devices
4. total_bytes == sum(per_device_bytes)
5. on CPU-only builds (n_devices==0): plan is empty, total is 0
6. fit-failure (nonexistent model) emits the documented "error":"fit_failed"
   JSON marker on stdout (not garbage) so subprocess callers can
   distinguish fit-failure from parse-failure

Run with: tools/fit-params/tests.sh path/to/build/bin

Verified locally on CPU-only build: ALL fit-params --fit-print-plan
smoke tests PASSED.
@marksverdhei marksverdhei merged commit 62e2afe into feat/fit-params-print-plan Jun 12, 2026
@marksverdhei marksverdhei deleted the test/fit-params-print-plan branch June 12, 2026 18:37
marksverdhei added a commit that referenced this pull request Jun 12, 2026
…rage) (#75)

PR #72 added the --fit-print-plan flag to llama-fit-params without test
coverage. This adds a tools/fit-params/tests.sh (pattern lifted from
tools/gguf-split/tests.sh) that downloads a small Qwen3-0.6B GGUF and
verifies six invariants:

1. success-path emits single-line JSON
2. schema has per_device_bytes / n_devices / total_bytes with correct types
3. len(per_device_bytes) == n_devices
4. total_bytes == sum(per_device_bytes)
5. on CPU-only builds (n_devices==0): plan is empty, total is 0
6. fit-failure (nonexistent model) emits the documented "error":"fit_failed"
   JSON marker on stdout (not garbage) so subprocess callers can
   distinguish fit-failure from parse-failure

Run with: tools/fit-params/tests.sh path/to/build/bin

Verified locally on CPU-only build: ALL fit-params --fit-print-plan
smoke tests PASSED.
marksverdhei added a commit that referenced this pull request Jun 12, 2026
…#66 step 2 prep) (#72)

* feat(fit-params): --fit-print-plan emits per-device byte plan as JSON (#66 step 2 prep)

The router's per-GPU admit decision (#66) needs the per-device byte
demand for a candidate model BEFORE spawning the child subprocess.
PR #69 added the underlying `out_bytes_per_device` output to
`common_fit_params`; this PR exposes it via the existing
`tools/fit-params` CLI as a subprocess-friendly JSON output.

* New CLI flag `--fit-print-plan` (env `LLAMA_ARG_FIT_PRINT_PLAN`).
* On success, prints a single-line JSON object on stdout:
    {"per_device_bytes":[N0,N1,...],"n_devices":K,"total_bytes":T}
  plan[i] = i-th GPU/accel device, same order as tensor_split; CPU
  host memory NOT included. Empty plan for CPU-only builds.
* On fit failure, emits an explicit JSON failure marker and exits 1:
    {"error":"fit_failed","status":N}
* common/fit.cpp: populate `out_bytes_per_device` at the three early-
  return paths (the impl had three 'no changes needed' fast-paths that
  bypassed the main return point where PR #69 wrote the plan). Doc
  string in common/fit.h corrected — plan covers GPU devices only.

Designed to be subprocessed from `server_models::compute_admit_plan(name)`
(#66 step 2 — out-of-process approach per the architectural call on
issue #66 / task ggml-org#123). The router parses this JSON, tracks
`reserved[d]` for in-flight LOADING models, admits candidates against
`live_cudaMemGetInfo(d) - reserved[d]`. Mutually exclusive with the
existing `--fit-print` mode; if both set, `--fit-print-plan` wins.

Local CPU build verified: `--help` renders the new flag, empty plan
returned for CPU-only build as expected. GPU verification deferred to
snoop-kube's canary-cycle.

* test(fit-params): smoke for --fit-print-plan JSON output (PR #72 coverage) (#75)

PR #72 added the --fit-print-plan flag to llama-fit-params without test
coverage. This adds a tools/fit-params/tests.sh (pattern lifted from
tools/gguf-split/tests.sh) that downloads a small Qwen3-0.6B GGUF and
verifies six invariants:

1. success-path emits single-line JSON
2. schema has per_device_bytes / n_devices / total_bytes with correct types
3. len(per_device_bytes) == n_devices
4. total_bytes == sum(per_device_bytes)
5. on CPU-only builds (n_devices==0): plan is empty, total is 0
6. fit-failure (nonexistent model) emits the documented "error":"fit_failed"
   JSON marker on stdout (not garbage) so subprocess callers can
   distinguish fit-failure from parse-failure

Run with: tools/fit-params/tests.sh path/to/build/bin

Verified locally on CPU-only build: ALL fit-params --fit-print-plan
smoke tests PASSED.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant