Skip to content

feat(self-host): worker advertises non-typed metadata siblings (gh-8749)#9707

Merged
nnshah1 merged 8 commits into
mainfrom
nnshah1/gh-8749-mdc-extra-files
May 27, 2026
Merged

feat(self-host): worker advertises non-typed metadata siblings (gh-8749)#9707
nnshah1 merged 8 commits into
mainfrom
nnshah1/gh-8749-mdc-extra-files

Conversation

@nnshah1

@nnshah1 nnshah1 commented May 18, 2026

Copy link
Copy Markdown
Contributor

Overview

Worker side of the multimodal-completeness gap left after #9610 (frontend HF-sibling harvest). When self_host_metadata is enabled, the worker now advertises non-weight, non-typed, non-ignored sibling files (preprocessor_config.json, special_tokens_map.json, added_tokens.json, …) via a new ModelDeploymentCard.extra_files field. Frontend's existing resolve_metadata_files pipeline is generic over slot count, so these resolve, verify-and-cache, and symlink into slug_dir exactly like the typed slots.

Why now

#9610 covers HF-backed models by harvesting siblings from hf_snapshots at the frontend. For custom/private worker dirs without an HF source, the only way to surface those files is for the worker to advertise them. This closes the last hole before flipping DYN_SELF_HOST_METADATA default ON.

Details

  • ModelDeploymentCard.extra_files: Vec<CheckedFile>#[serde(default, skip_serializing_if = "Vec::is_empty")], so wire-format roundtrip with older peers is invisible.
  • iter_metadata_files{,_mut} now yield extra_files after typed slots; downstream resolve / verify / cache loops pick them up with zero changes.
  • Harvest helper in local_model.rs is non-recursive (no surprise walks into nested checkpoints). "Is this junk?" delegates to modelexpress_common::providers::HuggingFaceProvider::is_ignored so the dotfile/README deny-list lives in one place. "Is this a weight?" uses model_card::is_weight_file — wider than mx's list (covers .pt / .gguf / .onnx / .tflite / .h5 / .pth / .msgpack in addition to .safetensors / .bin) so a stray weight can't ride the metadata HTTP path. .safetensors.index.json is correctly kept (extension is .json).
  • Called from move_to_self_host before the existing http-rewrite loop, so harvested files get the same /v1/metadata/{slug}/{suffix}/{filename} rewrite + registry registration as typed slots.

Test

One pure unit test on the harvest helper: tmpdir with one typed file, one weight, one dotfile, one README, two genuine extras, one subdir — asserts only the two extras land. Wire-format back-compat is statically guaranteed by #[serde(default)] (no separate roundtrip test needed).

Out of scope

  • DYN_SELF_HOST_METADATA default flip — still gated on auto-cleanup-on-detach landing (PR4 capstone).
  • Recursive harvest — current scan is the model dir only.
  • Per-file size caps for extras — not added; size is captured at CheckedFile::from_disk time but no upper bound enforced. Frontend's existing resolve-side caps still apply.

Related

Summary by CodeRabbit

  • New Features
    • Self-hosting now automatically discovers and manages additional metadata files (such as preprocessor configs and special token mappings) alongside core model components, ensuring complete model configurations are properly cached and deployed.

Review Change Stack


Open in Devin Review

@nnshah1 nnshah1 requested a review from a team as a code owner May 18, 2026 19:16
@github-actions github-actions Bot added the feat label May 18, 2026
@coderabbitai

coderabbitai Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

This PR extends model deployment card handling to harvest and advertise additional metadata files alongside typed slots during self-hosting. It adds an extra_files field to ModelDeploymentCard, integrates it into the metadata iteration APIs, and implements directory scanning in move_to_self_host to discover and populate eligible files while filtering weights, typed names, and ignored entries.

Changes

Extra metadata files harvesting for self-hosted models

Layer / File(s) Summary
Extra files field and metadata iteration extension
lib/llm/src/model_card.rs
Added extra_files: Vec<CheckedFile> to ModelDeploymentCard struct. Extended iter_metadata_files() documentation and implementation to yield extra_files after typed slots, and updated iter_metadata_files_mut() capacity calculation and append logic to mirror the ordering.
Harvesting helper and self-host integration
lib/llm/src/local_model.rs
Added HashSet and imports for HuggingFaceProvider and CheckedFile. Implemented harvest_extra_files() helper to scan model directory non-recursively, filtering out weights, typed filenames, dotfiles/README, and subdirectories. Integrated harvesting into move_to_self_host to compute typed filenames from existing metadata and append discovered extras to card.extra_files before registration. Added unit tests validating correct filtering of typed entries, weights, ignored files, and subdirectories.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding worker-side advertisement of non-typed metadata sibling files for self-hosted models.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description is comprehensive and well-structured, covering all template sections with clear details about changes, motivation, implementation specifics, testing, and scope boundaries.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread lib/llm/src/local_model.rs
Comment thread lib/llm/src/local_model.rs Outdated
Comment thread lib/llm/src/local_model.rs Outdated
Comment thread lib/llm/src/model_card.rs
nnshah1 added a commit that referenced this pull request May 18, 2026
Three dynamo-ops review threads on #9707:

1. Name-only LocalModel placeholders have an empty `full_path` and
   would fail on `read_dir("")` once self-host is enabled. Skip the
   harvest when `model_dir` isn't a directory.

2. `DirEntry::file_type()` and `DirEntry::metadata()` use lstat on
   Unix — they do not follow symlinks. HF snapshot dirs are full of
   symlinks pointing into `blobs/`, so we'd skip the very files we
   want to harvest. Switch to `Path::is_file()` which calls
   `fs::metadata` and follows symlinks.

3. `mdcsum()` previously omitted `extra_files`, so two workers with
   different siblings published the same checksum and the frontend
   cache served stale metadata. Hash each extra file's checksum in
   sorted order — sorting eliminates dependence on the harvest's
   filesystem-defined `read_dir` order.

Extends the harvest test fixture with a symlink-to-real-file and adds
a `returns_empty_for_missing_dir` test for the name-only path.

Signed-off-by: nnshah1 <neelays@nvidia.com>
@nnshah1 nnshah1 requested review from a team as code owners May 19, 2026 01:38
nnshah1 added 6 commits May 19, 2026 22:09
When `self_host_metadata` is enabled, the worker scans its model
directory and pushes non-weight, non-typed, non-ignored siblings
(`preprocessor_config.json`, `special_tokens_map.json`, ...) into a new
`ModelDeploymentCard.extra_files` field. Frontend pipeline is already
generic over slot count, so these ride the same resolve-and-cache path
as typed slots and land in `slug_dir`. External preprocessors that
load via `from_pretrained(slug_dir)` now see a complete model dir.

Weight-file and ignore checks delegate to modelexpress-common's
`HuggingFaceProvider::{is_weight_file, is_ignored}` so policy lives in
one place across mx download and dynamo's self-host harvest.

Signed-off-by: nnshah1 <neelays@nvidia.com>
- Sort harvested extras by filename so `mdcsum()` is reproducible
  across filesystems whose `read_dir` order differs. Without this,
  multi-replica deployments with byte-identical model dirs could
  publish different MDCs and thrash the frontend cache.
- Switch `with_context` error formatters from `{:?}` to `Path::display()`
  for consistency with surrounding code.
- Replace magic-5 capacity hint with `Self::TYPED_SLOT_COUNT` const
  so future slot additions touch one place.

Signed-off-by: nnshah1 <neelays@nvidia.com>
Worker harvest was using `HuggingFaceProvider::is_weight_file` from
mx, whose deny-list is narrower than the one used by the frontend's
gh-8749 hf-sibling harvest (#9610): mx misses `.gguf`, `.onnx`,
`.tflite`, `.pt`, `.pth`. That asymmetry meant a stray PyTorch /
ONNX / gguf weight in the model dir would slip past the worker
filter, get advertised as an `extra_files` entry, and ride the
metadata HTTP path to the frontend — turning a multi-GB weight
blob into a config-cache resident.

Promote `model_card::is_weight_file` to `pub(crate)` and call it
from the worker harvest. Single source of truth for "is this a
weight?" across both gh-8749 harvests; no new constants.

Bumps the worker harvest test to also assert `.pt` is filtered.

Signed-off-by: nnshah1 <neelays@nvidia.com>
Three dynamo-ops review threads on #9707:

1. Name-only LocalModel placeholders have an empty `full_path` and
   would fail on `read_dir("")` once self-host is enabled. Skip the
   harvest when `model_dir` isn't a directory.

2. `DirEntry::file_type()` and `DirEntry::metadata()` use lstat on
   Unix — they do not follow symlinks. HF snapshot dirs are full of
   symlinks pointing into `blobs/`, so we'd skip the very files we
   want to harvest. Switch to `Path::is_file()` which calls
   `fs::metadata` and follows symlinks.

3. `mdcsum()` previously omitted `extra_files`, so two workers with
   different siblings published the same checksum and the frontend
   cache served stale metadata. Hash each extra file's checksum in
   sorted order — sorting eliminates dependence on the harvest's
   filesystem-defined `read_dir` order.

Extends the harvest test fixture with a symlink-to-real-file and adds
a `returns_empty_for_missing_dir` test for the name-only path.

Signed-off-by: nnshah1 <neelays@nvidia.com>
…8749)

Extend the mocker self-host test to also `GET /v1/metadata/{slug}/_base/vocab.json` — `vocab.json` is a non-typed sibling
in the Qwen3 HF snapshot, so its presence in the route proves:

- `harvest_extra_files` picked it up
- the runtime registry registered it under `(slug, suffix, filename)`
- the system_status_server route serves it

Also exercises the symlink-follow fix from `048ba5af0c`: every entry
in HF snapshot dirs is a symlink into `blobs/`, so a non-following
stat would skip these files.

Signed-off-by: nnshah1 <neelays@nvidia.com>
- `move_to_self_host`: drop stale "(mx's deny-list)" attribution; the
  weight-file check now goes through `model_card::is_weight_file`.
- `mdcsum()`: compress the 6-line extras-hashing rationale to 3.

No behavior change.

Signed-off-by: nnshah1 <neelays@nvidia.com>
@nnshah1 nnshah1 force-pushed the nnshah1/gh-8749-mdc-extra-files branch from 05c037f to e602df6 Compare May 20, 2026 13:51
@nnshah1 nnshah1 enabled auto-merge (squash) May 20, 2026 14:55
Comment thread lib/llm/src/model_card.rs Outdated
Hashing only the content checksum means two workers advertising the
same bytes under different filenames produce identical mdcsums, so the
frontend cache aliases them and the slug_dir resolved for worker A is
reused for worker B (with the wrong sibling names). Fold sorted
(basename, checksum) pairs into the mdcsum input. NUL separator
between fields prevents a basename-vs-hash boundary collision.

Tests: two new unit tests cover (a) basename distinguishes equal-byte
extras, (b) read-order independence is preserved after the sort.

Signed-off-by: nnshah1 <neelays@nvidia.com>
@grahamking

Copy link
Copy Markdown
Contributor

@nnshah1 What decides if something is extra_files or it has it's own field?

Could we move everything into files? Or should the extra_files have their own fields?

Comment thread lib/llm/src/local_model.rs Outdated
Comment thread lib/llm/src/local_model.rs Outdated
Comment thread lib/llm/src/model_card.rs Outdated
Comment thread lib/llm/src/model_card.rs Outdated
Per graham-code-review:
- mdcsum() already sorts extras by (basename, checksum), so harvest_extra_files's filename sort is dead weight — drop the sort and the doc claim it justified.
- is_weight_file doc: trim to a one-liner listing callers; design rationale belongs in the PR description.
- extra_files field doc: rewrite tighter, drop AI-flavored phrasing.

Signed-off-by: nnshah1 <neelays@nvidia.com>
@nnshah1 nnshah1 merged commit 53aa9b7 into main May 27, 2026
101 checks passed
@nnshah1 nnshah1 deleted the nnshah1/gh-8749-mdc-extra-files branch May 27, 2026 05:12
MartinRepo pushed a commit to MartinRepo/dynamo that referenced this pull request May 28, 2026
nnshah1 added a commit that referenced this pull request Jun 5, 2026
)

The frontend's `download_config` pipeline already resolves every typed
MDC file and harvested sibling into a content-addressed `slug_dir` with
blake3-verified copies (#9610 + #9707). The vllm and sglang native
preprocessor factories were nonetheless calling `fetch_model` against
the worker URI on top of that — a redundant HF download whenever the
worker-side path wasn't reachable from the frontend pod.

Expose the resolve target on the MDC:
  * Rust: `ModelDeploymentCard::local_dir() -> PathBuf` — pure path
    computation matching `mdc_slug_dir(slug, mdcsum)`.
  * Python binding + .pyi stub: `mdc.local_dir() -> str`.

Both engine factories now use `mdc.local_dir()` directly:
  * `components/src/dynamo/frontend/vllm_processor.py`
  * `components/src/dynamo/frontend/sglang_processor.py`

Drops the unused `fetch_model` import in both. Adds one Rust unit test
asserting `local_dir()` produces the same path the resolve pipeline
writes into.

Signed-off-by: nnshah1 <neelays@nvidia.com>
nnshah1 added a commit that referenced this pull request Jun 10, 2026
)

The frontend's `download_config` pipeline already resolves every typed
MDC file and harvested sibling into a content-addressed `slug_dir` with
blake3-verified copies (#9610 + #9707). The vllm and sglang native
preprocessor factories were nonetheless calling `fetch_model` against
the worker URI on top of that — a redundant HF download whenever the
worker-side path wasn't reachable from the frontend pod.

Expose the resolve target on the MDC:
  * Rust: `ModelDeploymentCard::local_dir() -> PathBuf` — pure path
    computation matching `mdc_slug_dir(slug, mdcsum)`.
  * Python binding + .pyi stub: `mdc.local_dir() -> str`.

Both engine factories now use `mdc.local_dir()` directly:
  * `components/src/dynamo/frontend/vllm_processor.py`
  * `components/src/dynamo/frontend/sglang_processor.py`

Drops the unused `fetch_model` import in both. Adds one Rust unit test
asserting `local_dir()` produces the same path the resolve pipeline
writes into.

Signed-off-by: nnshah1 <neelays@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants