feat(self-host): worker advertises non-typed metadata siblings (gh-8749)#9707
Conversation
WalkthroughThis PR extends model deployment card handling to harvest and advertise additional metadata files alongside typed slots during self-hosting. It adds an ChangesExtra metadata files harvesting for self-hosted models
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Three dynamo-ops review threads on #9707: 1. Name-only LocalModel placeholders have an empty `full_path` and would fail on `read_dir("")` once self-host is enabled. Skip the harvest when `model_dir` isn't a directory. 2. `DirEntry::file_type()` and `DirEntry::metadata()` use lstat on Unix — they do not follow symlinks. HF snapshot dirs are full of symlinks pointing into `blobs/`, so we'd skip the very files we want to harvest. Switch to `Path::is_file()` which calls `fs::metadata` and follows symlinks. 3. `mdcsum()` previously omitted `extra_files`, so two workers with different siblings published the same checksum and the frontend cache served stale metadata. Hash each extra file's checksum in sorted order — sorting eliminates dependence on the harvest's filesystem-defined `read_dir` order. Extends the harvest test fixture with a symlink-to-real-file and adds a `returns_empty_for_missing_dir` test for the name-only path. Signed-off-by: nnshah1 <neelays@nvidia.com>
When `self_host_metadata` is enabled, the worker scans its model
directory and pushes non-weight, non-typed, non-ignored siblings
(`preprocessor_config.json`, `special_tokens_map.json`, ...) into a new
`ModelDeploymentCard.extra_files` field. Frontend pipeline is already
generic over slot count, so these ride the same resolve-and-cache path
as typed slots and land in `slug_dir`. External preprocessors that
load via `from_pretrained(slug_dir)` now see a complete model dir.
Weight-file and ignore checks delegate to modelexpress-common's
`HuggingFaceProvider::{is_weight_file, is_ignored}` so policy lives in
one place across mx download and dynamo's self-host harvest.
Signed-off-by: nnshah1 <neelays@nvidia.com>
- Sort harvested extras by filename so `mdcsum()` is reproducible
across filesystems whose `read_dir` order differs. Without this,
multi-replica deployments with byte-identical model dirs could
publish different MDCs and thrash the frontend cache.
- Switch `with_context` error formatters from `{:?}` to `Path::display()`
for consistency with surrounding code.
- Replace magic-5 capacity hint with `Self::TYPED_SLOT_COUNT` const
so future slot additions touch one place.
Signed-off-by: nnshah1 <neelays@nvidia.com>
Worker harvest was using `HuggingFaceProvider::is_weight_file` from mx, whose deny-list is narrower than the one used by the frontend's gh-8749 hf-sibling harvest (#9610): mx misses `.gguf`, `.onnx`, `.tflite`, `.pt`, `.pth`. That asymmetry meant a stray PyTorch / ONNX / gguf weight in the model dir would slip past the worker filter, get advertised as an `extra_files` entry, and ride the metadata HTTP path to the frontend — turning a multi-GB weight blob into a config-cache resident. Promote `model_card::is_weight_file` to `pub(crate)` and call it from the worker harvest. Single source of truth for "is this a weight?" across both gh-8749 harvests; no new constants. Bumps the worker harvest test to also assert `.pt` is filtered. Signed-off-by: nnshah1 <neelays@nvidia.com>
Three dynamo-ops review threads on #9707: 1. Name-only LocalModel placeholders have an empty `full_path` and would fail on `read_dir("")` once self-host is enabled. Skip the harvest when `model_dir` isn't a directory. 2. `DirEntry::file_type()` and `DirEntry::metadata()` use lstat on Unix — they do not follow symlinks. HF snapshot dirs are full of symlinks pointing into `blobs/`, so we'd skip the very files we want to harvest. Switch to `Path::is_file()` which calls `fs::metadata` and follows symlinks. 3. `mdcsum()` previously omitted `extra_files`, so two workers with different siblings published the same checksum and the frontend cache served stale metadata. Hash each extra file's checksum in sorted order — sorting eliminates dependence on the harvest's filesystem-defined `read_dir` order. Extends the harvest test fixture with a symlink-to-real-file and adds a `returns_empty_for_missing_dir` test for the name-only path. Signed-off-by: nnshah1 <neelays@nvidia.com>
…8749) Extend the mocker self-host test to also `GET /v1/metadata/{slug}/_base/vocab.json` — `vocab.json` is a non-typed sibling in the Qwen3 HF snapshot, so its presence in the route proves: - `harvest_extra_files` picked it up - the runtime registry registered it under `(slug, suffix, filename)` - the system_status_server route serves it Also exercises the symlink-follow fix from `048ba5af0c`: every entry in HF snapshot dirs is a symlink into `blobs/`, so a non-following stat would skip these files. Signed-off-by: nnshah1 <neelays@nvidia.com>
- `move_to_self_host`: drop stale "(mx's deny-list)" attribution; the weight-file check now goes through `model_card::is_weight_file`. - `mdcsum()`: compress the 6-line extras-hashing rationale to 3. No behavior change. Signed-off-by: nnshah1 <neelays@nvidia.com>
05c037f to
e602df6
Compare
Hashing only the content checksum means two workers advertising the same bytes under different filenames produce identical mdcsums, so the frontend cache aliases them and the slug_dir resolved for worker A is reused for worker B (with the wrong sibling names). Fold sorted (basename, checksum) pairs into the mdcsum input. NUL separator between fields prevents a basename-vs-hash boundary collision. Tests: two new unit tests cover (a) basename distinguishes equal-byte extras, (b) read-order independence is preserved after the sort. Signed-off-by: nnshah1 <neelays@nvidia.com>
|
@nnshah1 What decides if something is Could we move everything into |
Per graham-code-review: - mdcsum() already sorts extras by (basename, checksum), so harvest_extra_files's filename sort is dead weight — drop the sort and the doc claim it justified. - is_weight_file doc: trim to a one-liner listing callers; design rationale belongs in the PR description. - extra_files field doc: rewrite tighter, drop AI-flavored phrasing. Signed-off-by: nnshah1 <neelays@nvidia.com>
…namogh-8749) (ai-dynamo#9707) Signed-off-by: nnshah1 <neelays@nvidia.com>
) The frontend's `download_config` pipeline already resolves every typed MDC file and harvested sibling into a content-addressed `slug_dir` with blake3-verified copies (#9610 + #9707). The vllm and sglang native preprocessor factories were nonetheless calling `fetch_model` against the worker URI on top of that — a redundant HF download whenever the worker-side path wasn't reachable from the frontend pod. Expose the resolve target on the MDC: * Rust: `ModelDeploymentCard::local_dir() -> PathBuf` — pure path computation matching `mdc_slug_dir(slug, mdcsum)`. * Python binding + .pyi stub: `mdc.local_dir() -> str`. Both engine factories now use `mdc.local_dir()` directly: * `components/src/dynamo/frontend/vllm_processor.py` * `components/src/dynamo/frontend/sglang_processor.py` Drops the unused `fetch_model` import in both. Adds one Rust unit test asserting `local_dir()` produces the same path the resolve pipeline writes into. Signed-off-by: nnshah1 <neelays@nvidia.com>
) The frontend's `download_config` pipeline already resolves every typed MDC file and harvested sibling into a content-addressed `slug_dir` with blake3-verified copies (#9610 + #9707). The vllm and sglang native preprocessor factories were nonetheless calling `fetch_model` against the worker URI on top of that — a redundant HF download whenever the worker-side path wasn't reachable from the frontend pod. Expose the resolve target on the MDC: * Rust: `ModelDeploymentCard::local_dir() -> PathBuf` — pure path computation matching `mdc_slug_dir(slug, mdcsum)`. * Python binding + .pyi stub: `mdc.local_dir() -> str`. Both engine factories now use `mdc.local_dir()` directly: * `components/src/dynamo/frontend/vllm_processor.py` * `components/src/dynamo/frontend/sglang_processor.py` Drops the unused `fetch_model` import in both. Adds one Rust unit test asserting `local_dir()` produces the same path the resolve pipeline writes into. Signed-off-by: nnshah1 <neelays@nvidia.com>
Overview
Worker side of the multimodal-completeness gap left after #9610 (frontend HF-sibling harvest). When
self_host_metadatais enabled, the worker now advertises non-weight, non-typed, non-ignored sibling files (preprocessor_config.json,special_tokens_map.json,added_tokens.json, …) via a newModelDeploymentCard.extra_filesfield. Frontend's existingresolve_metadata_filespipeline is generic over slot count, so these resolve, verify-and-cache, and symlink intoslug_direxactly like the typed slots.Why now
#9610 covers HF-backed models by harvesting siblings from
hf_snapshotsat the frontend. For custom/private worker dirs without an HF source, the only way to surface those files is for the worker to advertise them. This closes the last hole before flippingDYN_SELF_HOST_METADATAdefault ON.Details
ModelDeploymentCard.extra_files: Vec<CheckedFile>—#[serde(default, skip_serializing_if = "Vec::is_empty")], so wire-format roundtrip with older peers is invisible.iter_metadata_files{,_mut}now yieldextra_filesafter typed slots; downstream resolve / verify / cache loops pick them up with zero changes.local_model.rsis non-recursive (no surprise walks into nested checkpoints). "Is this junk?" delegates tomodelexpress_common::providers::HuggingFaceProvider::is_ignoredso the dotfile/README deny-list lives in one place. "Is this a weight?" usesmodel_card::is_weight_file— wider than mx's list (covers.pt/.gguf/.onnx/.tflite/.h5/.pth/.msgpackin addition to.safetensors/.bin) so a stray weight can't ride the metadata HTTP path..safetensors.index.jsonis correctly kept (extension is.json).move_to_self_hostbefore the existing http-rewrite loop, so harvested files get the same/v1/metadata/{slug}/{suffix}/{filename}rewrite + registry registration as typed slots.Test
One pure unit test on the harvest helper: tmpdir with one typed file, one weight, one dotfile, one README, two genuine extras, one subdir — asserts only the two extras land. Wire-format back-compat is statically guaranteed by
#[serde(default)](no separate roundtrip test needed).Out of scope
DYN_SELF_HOST_METADATAdefault flip — still gated on auto-cleanup-on-detach landing (PR4 capstone).CheckedFile::from_disktime but no upper bound enforced. Frontend's existing resolve-side caps still apply.Related
Summary by CodeRabbit