test(fit-params): smoke for --fit-print-plan JSON output by marksverdhei · Pull Request #75 · heiervang-technologies/ht-llama.cpp

marksverdhei · 2026-06-05T09:49:56Z

Epoch #73 task 2. PR #72 coverage gap: the --fit-print-plan flag landed without tests.

Adds tools/fit-params/tests.sh (pattern from tools/gguf-split/tests.sh). 6 invariants:

success emits single-line JSON
schema: per_device_bytes (list), n_devices (int), total_bytes (int)
len(per_device_bytes) == n_devices
total_bytes == sum(per_device_bytes)
CPU-only build → empty plan + total=0
fit-failure emits {"error":"fit_failed"} marker (not garbage)

Local run on CPU-only build: ALL PASSED.

Stacks on #72 (depends on the new flag); merge after #72.

…rage) PR #72 added the --fit-print-plan flag to llama-fit-params without test coverage. This adds a tools/fit-params/tests.sh (pattern lifted from tools/gguf-split/tests.sh) that downloads a small Qwen3-0.6B GGUF and verifies six invariants: 1. success-path emits single-line JSON 2. schema has per_device_bytes / n_devices / total_bytes with correct types 3. len(per_device_bytes) == n_devices 4. total_bytes == sum(per_device_bytes) 5. on CPU-only builds (n_devices==0): plan is empty, total is 0 6. fit-failure (nonexistent model) emits the documented "error":"fit_failed" JSON marker on stdout (not garbage) so subprocess callers can distinguish fit-failure from parse-failure Run with: tools/fit-params/tests.sh path/to/build/bin Verified locally on CPU-only build: ALL fit-params --fit-print-plan smoke tests PASSED.

…rage) (#75) PR #72 added the --fit-print-plan flag to llama-fit-params without test coverage. This adds a tools/fit-params/tests.sh (pattern lifted from tools/gguf-split/tests.sh) that downloads a small Qwen3-0.6B GGUF and verifies six invariants: 1. success-path emits single-line JSON 2. schema has per_device_bytes / n_devices / total_bytes with correct types 3. len(per_device_bytes) == n_devices 4. total_bytes == sum(per_device_bytes) 5. on CPU-only builds (n_devices==0): plan is empty, total is 0 6. fit-failure (nonexistent model) emits the documented "error":"fit_failed" JSON marker on stdout (not garbage) so subprocess callers can distinguish fit-failure from parse-failure Run with: tools/fit-params/tests.sh path/to/build/bin Verified locally on CPU-only build: ALL fit-params --fit-print-plan smoke tests PASSED.

…#66 step 2 prep) (#72) * feat(fit-params): --fit-print-plan emits per-device byte plan as JSON (#66 step 2 prep) The router's per-GPU admit decision (#66) needs the per-device byte demand for a candidate model BEFORE spawning the child subprocess. PR #69 added the underlying `out_bytes_per_device` output to `common_fit_params`; this PR exposes it via the existing `tools/fit-params` CLI as a subprocess-friendly JSON output. * New CLI flag `--fit-print-plan` (env `LLAMA_ARG_FIT_PRINT_PLAN`). * On success, prints a single-line JSON object on stdout: {"per_device_bytes":[N0,N1,...],"n_devices":K,"total_bytes":T} plan[i] = i-th GPU/accel device, same order as tensor_split; CPU host memory NOT included. Empty plan for CPU-only builds. * On fit failure, emits an explicit JSON failure marker and exits 1: {"error":"fit_failed","status":N} * common/fit.cpp: populate `out_bytes_per_device` at the three early- return paths (the impl had three 'no changes needed' fast-paths that bypassed the main return point where PR #69 wrote the plan). Doc string in common/fit.h corrected — plan covers GPU devices only. Designed to be subprocessed from `server_models::compute_admit_plan(name)` (#66 step 2 — out-of-process approach per the architectural call on issue #66 / task ggml-org#123). The router parses this JSON, tracks `reserved[d]` for in-flight LOADING models, admits candidates against `live_cudaMemGetInfo(d) - reserved[d]`. Mutually exclusive with the existing `--fit-print` mode; if both set, `--fit-print-plan` wins. Local CPU build verified: `--help` renders the new flag, empty plan returned for CPU-only build as expected. GPU verification deferred to snoop-kube's canary-cycle. * test(fit-params): smoke for --fit-print-plan JSON output (PR #72 coverage) (#75) PR #72 added the --fit-print-plan flag to llama-fit-params without test coverage. This adds a tools/fit-params/tests.sh (pattern lifted from tools/gguf-split/tests.sh) that downloads a small Qwen3-0.6B GGUF and verifies six invariants: 1. success-path emits single-line JSON 2. schema has per_device_bytes / n_devices / total_bytes with correct types 3. len(per_device_bytes) == n_devices 4. total_bytes == sum(per_device_bytes) 5. on CPU-only builds (n_devices==0): plan is empty, total is 0 6. fit-failure (nonexistent model) emits the documented "error":"fit_failed" JSON marker on stdout (not garbage) so subprocess callers can distinguish fit-failure from parse-failure Run with: tools/fit-params/tests.sh path/to/build/bin Verified locally on CPU-only build: ALL fit-params --fit-print-plan smoke tests PASSED.

This was referenced Jun 5, 2026

Hivemind Maintenance Tasks Epoch 1 #73

Closed

Hivemind Maintenance Tasks Epoch 2 #79

Closed

Hivemind Maintenance Tasks Epoch 3 #81

Closed

Hivemind Maintenance Tasks Epoch 4 #86

Closed

marksverdhei merged commit 62e2afe into feat/fit-params-print-plan Jun 12, 2026

marksverdhei deleted the test/fit-params-print-plan branch June 12, 2026 18:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(fit-params): smoke for --fit-print-plan JSON output#75

test(fit-params): smoke for --fit-print-plan JSON output#75
marksverdhei merged 1 commit into
feat/fit-params-print-planfrom
test/fit-params-print-plan

marksverdhei commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

marksverdhei commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant