test(fit-params): smoke for --fit-print-plan JSON output #75
Merged
marksverdhei merged 1 commit on Jun 12, 2026
…rage)

PR #72 added the --fit-print-plan flag to llama-fit-params without test coverage. This adds a tools/fit-params/tests.sh (pattern lifted from tools/gguf-split/tests.sh) that downloads a small Qwen3-0.6B GGUF and verifies six invariants:

  1. success-path emits single-line JSON
  2. schema has per_device_bytes / n_devices / total_bytes with correct types
  3. len(per_device_bytes) == n_devices
  4. total_bytes == sum(per_device_bytes)
  5. on CPU-only builds (n_devices==0): plan is empty, total is 0
  6. fit-failure (nonexistent model) emits the documented "error":"fit_failed" JSON marker on stdout (not garbage) so subprocess callers can distinguish fit-failure from parse-failure

Run with:

  tools/fit-params/tests.sh path/to/build/bin

Verified locally on CPU-only build: ALL fit-params --fit-print-plan smoke tests PASSED.
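Invariants 1 to 5 can be sketched as a small Python check. This is an illustration only, not the contents of tools/fit-params/tests.sh: the function name `check_plan_line` is hypothetical, and it assumes "single-line JSON" means stdout carries exactly one non-empty line.

```python
import json


def check_plan_line(stdout_text: str) -> dict:
    """Validate success-path output against invariants 1-5 (hypothetical helper)."""
    lines = [ln for ln in stdout_text.splitlines() if ln.strip()]
    # 1. exactly one non-empty line that parses as a JSON object
    if len(lines) != 1:
        raise ValueError(f"expected one JSON line, got {len(lines)}")
    plan = json.loads(lines[0])  # JSONDecodeError is a ValueError subclass
    if not isinstance(plan, dict):
        raise ValueError("top-level JSON value is not an object")

    # 2. required keys with the expected types (bool is rejected as an int)
    def is_int(v):
        return isinstance(v, int) and not isinstance(v, bool)

    for key in ("per_device_bytes", "n_devices", "total_bytes"):
        if key not in plan:
            raise ValueError(f"missing key: {key}")
    per_dev = plan["per_device_bytes"]
    if not isinstance(per_dev, list) or not all(is_int(b) for b in per_dev):
        raise ValueError("per_device_bytes must be a list of integers")
    if not is_int(plan["n_devices"]) or not is_int(plan["total_bytes"]):
        raise ValueError("n_devices and total_bytes must be integers")

    # 3. one entry per device
    if len(per_dev) != plan["n_devices"]:
        raise ValueError("len(per_device_bytes) != n_devices")
    # 4. total is the sum of the per-device entries
    if plan["total_bytes"] != sum(per_dev):
        raise ValueError("total_bytes != sum(per_device_bytes)")
    # 5. CPU-only build: empty plan and zero total
    if plan["n_devices"] == 0 and (per_dev or plan["total_bytes"] != 0):
        raise ValueError("CPU-only plan must be empty with total 0")
    return plan
```

On a CPU-only build the expected input is `{"per_device_bytes":[],"n_devices":0,"total_bytes":0}`, which passes all five checks.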
This was referenced Jun 5, 2026
marksverdhei added a commit that referenced this pull request Jun 12, 2026
…rage) (#75)
marksverdhei added a commit that referenced this pull request Jun 12, 2026
…#66 step 2 prep) (#72)

* feat(fit-params): --fit-print-plan emits per-device byte plan as JSON (#66 step 2 prep)

The router's per-GPU admit decision (#66) needs the per-device byte demand for a candidate model BEFORE spawning the child subprocess. PR #69 added the underlying `out_bytes_per_device` output to `common_fit_params`; this PR exposes it via the existing `tools/fit-params` CLI as a subprocess-friendly JSON output.

* New CLI flag `--fit-print-plan` (env `LLAMA_ARG_FIT_PRINT_PLAN`).
* On success, prints a single-line JSON object on stdout:
    {"per_device_bytes":[N0,N1,...],"n_devices":K,"total_bytes":T}
  plan[i] = i-th GPU/accel device, same order as tensor_split; CPU host memory NOT included. Empty plan for CPU-only builds.
* On fit failure, emits an explicit JSON failure marker and exits 1:
    {"error":"fit_failed","status":N}
* common/fit.cpp: populate `out_bytes_per_device` at the three early-return paths (the impl had three 'no changes needed' fast-paths that bypassed the main return point where PR #69 wrote the plan). Doc string in common/fit.h corrected: plan covers GPU devices only.

Designed to be subprocessed from `server_models::compute_admit_plan(name)` (#66 step 2, out-of-process approach per the architectural call on issue #66 / task ggml-org#123). The router parses this JSON, tracks `reserved[d]` for in-flight LOADING models, admits candidates against `live_cudaMemGetInfo(d) - reserved[d]`.

Mutually exclusive with the existing `--fit-print` mode; if both set, `--fit-print-plan` wins.

Local CPU build verified: `--help` renders the new flag, empty plan returned for CPU-only build as expected. GPU verification deferred to snoop-kube's canary-cycle.

* test(fit-params): smoke for --fit-print-plan JSON output (PR #72 coverage) (#75)

PR #72 added the --fit-print-plan flag to llama-fit-params without test coverage.
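The router-side admission rule from the #72 message (admit against `live_cudaMemGetInfo(d) - reserved[d]`) might look roughly like the sketch below. It is not the router's code: `admit` is a hypothetical name, `free_bytes` stands in for whatever live free-memory query the router performs per device, and reserving the demand on success is an assumption about how `reserved[d]` would be maintained.

```python
import json
from typing import List, Sequence


def admit(plan_json: str, free_bytes: Sequence[int], reserved: List[int]) -> bool:
    """Decide whether a candidate model fits; reserve its demand if it does.

    plan_json  -- the single-line JSON printed by --fit-print-plan
    free_bytes -- live free memory per device (stand-in for cudaMemGetInfo)
    reserved   -- bytes already promised to in-flight LOADING models (mutated)
    """
    plan = json.loads(plan_json)
    if "error" in plan:
        # failure marker such as {"error":"fit_failed","status":N}
        return False
    demand = plan["per_device_bytes"]
    if len(demand) > len(free_bytes) or len(demand) > len(reserved):
        # the plan names more devices than the router knows about
        return False
    # Check every device before reserving anything, so a rejection
    # leaves `reserved` untouched.
    for d, need in enumerate(demand):
        if need > free_bytes[d] - reserved[d]:
            return False
    for d, need in enumerate(demand):
        reserved[d] += need
    return True
```

An empty plan (CPU-only build) has no per-device demand, so this sketch admits it trivially; whether the real router should do the same is a design decision the source does not state.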
Epoch #73 task 2. PR #72 coverage gap: the --fit-print-plan flag landed without tests.
Adds tools/fit-params/tests.sh (pattern from tools/gguf-split/tests.sh), which checks the six invariants listed in the commit message above.
Local run on CPU-only build: ALL PASSED.
Stacks on #72 (depends on the new flag); merge after #72.
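Invariant 6 exists so that a subprocess caller can tell a fit failure from a parse failure. A caller-side classifier could be sketched as follows; `classify` and its return tags are hypothetical, and only the two JSON shapes and the exit code 1 on fit failure come from the #72 description.

```python
import json
from typing import Any, Optional, Tuple


def classify(stdout_text: str, exit_code: int) -> Tuple[str, Optional[Any]]:
    """Map captured stdout and exit code to one of three outcomes.

    Returns ("ok", plan_dict), ("fit_failed", status) or ("parse_failed", None).
    """
    try:
        obj = json.loads(stdout_text.strip())
    except ValueError:
        # empty or non-JSON stdout: the tool crashed or printed garbage
        return ("parse_failed", None)
    if not isinstance(obj, dict):
        return ("parse_failed", None)
    if obj.get("error") == "fit_failed":
        # documented marker: {"error":"fit_failed","status":N}
        return ("fit_failed", obj.get("status"))
    if exit_code == 0 and "per_device_bytes" in obj:
        return ("ok", obj)
    # valid JSON, but neither documented shape
    return ("parse_failed", None)
```

Keeping the failure marker on stdout means the caller needs only one parse path; the exit code is used here as a secondary check on the success shape.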