Skip to content

feat(fit-params): --fit-print-plan emits per-device byte plan as JSON (#66 step 2 prep)#72

Merged
marksverdhei merged 2 commits into
htfrom
feat/fit-params-print-plan
Jun 12, 2026
Merged

feat(fit-params): --fit-print-plan emits per-device byte plan as JSON (#66 step 2 prep)#72
marksverdhei merged 2 commits into
htfrom
feat/fit-params-print-plan

Conversation

@marksverdhei

Copy link
Copy Markdown

Stacks on PR #69. Step 2 prep work for issue #66.

Why

Router-side admit decision (#66) needs the per-device byte demand for a candidate model BEFORE spawning the child subprocess. Architectural decision on issue #66 (captured in task ggml-org#123): use the out-of-process subprocess approachtools/fit-params already runs the same fit logic in-process; just need a subprocess-friendly output format.

What

  • New CLI flag --fit-print-plan (env LLAMA_ARG_FIT_PRINT_PLAN).
  • Emits single-line JSON on success:
    {"per_device_bytes":[N0,N1,...],"n_devices":K,"total_bytes":T}
    
  • Emits JSON failure marker on fit failure (subprocess callers can distinguish from parse failures):
    {"error":"fit_failed","status":N}
    
  • common/fit.cpp: also populate out_bytes_per_device at the three "no changes needed" early-return paths. PR feat(fit): expose per-device byte plan from common_fit_params (#66 step 1) #69 only populated the main return points; for models that fit without adjustment (the common case), the plan was empty.
  • common/fit.h: doc string corrected — plan covers GPU/accel devices only, CPU host memory NOT included, plan empty for CPU-only builds.
  • Mutually exclusive with the existing --fit-print mode; if both set, --fit-print-plan wins.

Verified

  • ✅ Build clean (cmake --build build --target llama-fit-params)
  • --help renders the new flag
  • ✅ CPU-only smoke returns {"per_device_bytes":[],"n_devices":0,"total_bytes":0} (correct — no GPU demand for CPU build)
  • ⏳ GPU verification deferred to snoop-kube's canary cycle (local GPU busy with centurion container)

Usage from the future router

When picking up step 2 in server_models::compute_admit_plan(name):

subprocess: tools/fit-params --fit-print-plan <model args from preset>
parse the single JSON line
plan[i] = bytes for the i-th GPU/accel device (router uses tensor_split order)

Sequence

  1. PR feat(fit): expose per-device byte plan from common_fit_params (#66 step 1) #69out_bytes_per_device foundation (open)
  2. This PR — fit-params CLI surface (depends on feat(fit): expose per-device byte plan from common_fit_params (#66 step 1) #69)
  3. server_models::compute_admit_plan helper + reserved[] state (depends on this)
  4. Shadow-mode admit logging (depends on 3)
  5. Flip to enforcement + LRU evict (depends on 4, needs canary)

Each is small and individually shippable.

Base automatically changed from feat/fit-per-device-plan to ht June 12, 2026 18:36
marksverdhei added a commit that referenced this pull request Jun 12, 2026
…rage) (#75)

PR #72 added the --fit-print-plan flag to llama-fit-params without test
coverage. This adds a tools/fit-params/tests.sh (pattern lifted from
tools/gguf-split/tests.sh) that downloads a small Qwen3-0.6B GGUF and
verifies six invariants:

1. success-path emits single-line JSON
2. schema has per_device_bytes / n_devices / total_bytes with correct types
3. len(per_device_bytes) == n_devices
4. total_bytes == sum(per_device_bytes)
5. on CPU-only builds (n_devices==0): plan is empty, total is 0
6. fit-failure (nonexistent model) emits the documented "error":"fit_failed"
   JSON marker on stdout (not garbage) so subprocess callers can
   distinguish fit-failure from parse-failure

Run with: tools/fit-params/tests.sh path/to/build/bin

Verified locally on CPU-only build: ALL fit-params --fit-print-plan
smoke tests PASSED.
marksverdhei and others added 2 commits June 12, 2026 20:40
…#66 step 2 prep)

The router's per-GPU admit decision (#66) needs the per-device byte
demand for a candidate model BEFORE spawning the child subprocess.
PR #69 added the underlying `out_bytes_per_device` output to
`common_fit_params`; this PR exposes it via the existing
`tools/fit-params` CLI as a subprocess-friendly JSON output.

* New CLI flag `--fit-print-plan` (env `LLAMA_ARG_FIT_PRINT_PLAN`).
* On success, prints a single-line JSON object on stdout:
    {"per_device_bytes":[N0,N1,...],"n_devices":K,"total_bytes":T}
  plan[i] = i-th GPU/accel device, same order as tensor_split; CPU
  host memory NOT included. Empty plan for CPU-only builds.
* On fit failure, emits an explicit JSON failure marker and exits 1:
    {"error":"fit_failed","status":N}
* common/fit.cpp: populate `out_bytes_per_device` at the three early-
  return paths (the impl had three 'no changes needed' fast-paths that
  bypassed the main return point where PR #69 wrote the plan). Doc
  string in common/fit.h corrected — plan covers GPU devices only.

Designed to be subprocessed from `server_models::compute_admit_plan(name)`
(#66 step 2 — out-of-process approach per the architectural call on
issue #66 / task ggml-org#123). The router parses this JSON, tracks
`reserved[d]` for in-flight LOADING models, admits candidates against
`live_cudaMemGetInfo(d) - reserved[d]`. Mutually exclusive with the
existing `--fit-print` mode; if both set, `--fit-print-plan` wins.

Local CPU build verified: `--help` renders the new flag, empty plan
returned for CPU-only build as expected. GPU verification deferred to
snoop-kube's canary-cycle.
…rage) (#75)

PR #72 added the --fit-print-plan flag to llama-fit-params without test
coverage. This adds a tools/fit-params/tests.sh (pattern lifted from
tools/gguf-split/tests.sh) that downloads a small Qwen3-0.6B GGUF and
verifies six invariants:

1. success-path emits single-line JSON
2. schema has per_device_bytes / n_devices / total_bytes with correct types
3. len(per_device_bytes) == n_devices
4. total_bytes == sum(per_device_bytes)
5. on CPU-only builds (n_devices==0): plan is empty, total is 0
6. fit-failure (nonexistent model) emits the documented "error":"fit_failed"
   JSON marker on stdout (not garbage) so subprocess callers can
   distinguish fit-failure from parse-failure

Run with: tools/fit-params/tests.sh path/to/build/bin

Verified locally on CPU-only build: ALL fit-params --fit-print-plan
smoke tests PASSED.
@marksverdhei marksverdhei force-pushed the feat/fit-params-print-plan branch from 62e2afe to 005d9b0 Compare June 12, 2026 18:40
@marksverdhei marksverdhei merged commit 68e31e9 into ht Jun 12, 2026
3 of 7 checks passed
@marksverdhei marksverdhei deleted the feat/fit-params-print-plan branch June 12, 2026 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant