Skip to content

feat(self-host): native preprocessor consumes resolved slug_dir (gh-8749)#10037

Merged
nnshah1 merged 6 commits into
mainfrom
nnshah1/gh-8749-native-preprocessor
Jun 10, 2026
Merged

feat(self-host): native preprocessor consumes resolved slug_dir (gh-8749)#10037
nnshah1 merged 6 commits into
mainfrom
nnshah1/gh-8749-native-preprocessor

Conversation

@nnshah1

@nnshah1 nnshah1 commented May 27, 2026

Copy link
Copy Markdown
Contributor

Overview

vllm and sglang engine factories stop re-running fetch_model against the worker URI when initializing native preprocessor mode, and instead consume the already-resolved slug_dir that download_config populates on the frontend. This eliminates a redundant HF download on every model registration and is the natural follow-on to #9707 — the frontend now has every typed file plus harvested sibling (preprocessor_config.json, special_tokens_map.json, …) blake3-verified under ~/.cache/dynamo/mdc/by-slug/<slug>/<mdcsum>/, so the native preprocessor just needs to be pointed at that directory.

Why now

#9610 + #9707 closed the multimodal-completeness gap by harvesting non-typed siblings on both the HF and self-host paths. The TODOs in vllm_processor.py and sglang_processor.py flagged the redundant fetch_model as the next thing to remove once that landed. This is the smallest remaining piece in the gh-8749 stack before the DYN_SELF_HOST_METADATA default-ON flip (#8754 capstone); it's also a prerequisite for that flip to be a no-op in latency terms.

Details

  • ModelDeploymentCard::local_dir() -> PathBuf — pure path computation, returns the same path that mdc_slug_dir(slug, mdcsum) produces. No fs side effect (the resolve pipeline owns directory creation).
  • Python binding ModelDeploymentCard.local_dir() -> str plus .pyi stub.
  • vllm and sglang factories replace the source_path = mdc.source_path(); if not exists: fetch_model(source_path, ignore_weights=True) block with source_path = mdc.local_dir(). The unused fetch_model import is dropped in both.
  • TensorBased models skip download_config and would have an empty slug_dir, but both factories already gate on model_type.supports_chat() at the top, which excludes them — no extra guard needed.

Test

Rust unit test (model_card::tests::local_dir_matches_resolve_pipeline_path) asserts the path produced by local_dir() matches what resolve_metadata_files writes into. The existing tests/frontend/test_self_host_metadata.py covers the worker HTTP route end; the factory consumer path is exercised by the real vllm/sglang serve tests in CI (tests/serve/test_vllm.py, tests/serve/test_sglang.py).

Out of scope

  • Default-ON flip of DYN_SELF_HOST_METADATA — still gated on auto-detach-cleanup landing.
  • Auto-detach-cleanup itself — separate follow-up; the MetadataArtifactRegistry needs a secondary (endpoint, lora_slug) -> Vec<(slug, suffix)> index so the static detach_from_endpoint helper can clean up without holding a LocalModel handle.
  • Multimodal positive e2e covering a Qwen-VL through the full self-host path — separate PR.

Related


Open in Devin Review

Summary by CodeRabbit

Release Notes

  • New Features
    • Added local_dir() method to access the local directory containing cached model files, enabling direct access to resolved metadata without requiring runtime fetch operations.

Review Change Stack

@nnshah1 nnshah1 requested review from a team as code owners May 27, 2026 13:28
@github-actions github-actions Bot added feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels May 27, 2026
@coderabbitai

coderabbitai Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7633679e-8b1d-4cff-aa82-ccd56c0df1c7

📥 Commits

Reviewing files that changed from the base of the PR and between a28c600 and 166ac37.

📒 Files selected for processing (5)
  • components/src/dynamo/frontend/sglang_processor.py
  • components/src/dynamo/frontend/vllm_processor.py
  • lib/bindings/python/rust/llm/model_card.rs
  • lib/bindings/python/src/dynamo/_core.pyi
  • lib/llm/src/model_card.rs

Walkthrough

This PR introduces a local_dir() method to ModelDeploymentCard that computes the on-disk cache directory for resolved metadata. It then refactors SGLang and VLLM engine factories to use this accessor instead of calling fetch_model on demand.

Changes

ModelDeploymentCard.local_dir() and Engine Factory Refactoring

Layer / File(s) Summary
Core ModelDeploymentCard.local_dir() method in Rust
lib/llm/src/model_card.rs
ModelDeploymentCard::local_dir() deterministically computes the resolved-metadata cache directory from the card's slug and mdcsum; unit test local_dir_matches_resolve_pipeline_path asserts the result matches the expected by-slug/<slug>/<mdcsum> structure.
Python bindings and type stub for local_dir()
lib/bindings/python/rust/llm/model_card.rs, lib/bindings/python/src/dynamo/_core.pyi
Rust binding exposes local_dir() to Python as an owned String; Python type stub documents the method signature and its purpose as the post-download_config resolved metadata directory.
SGLang and VLLM engine factory updates
components/src/dynamo/frontend/sglang_processor.py, components/src/dynamo/frontend/vllm_processor.py
Both processors remove fetch_model imports and update chat_engine_factory() to use mdc.local_dir() as the tokenizer/model source path, relying on prior pipeline steps to materialize the resolved directory instead of checking existence and fetching on miss.

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and specifically describes the main change: native preprocessor now consumes the resolved slug_dir instead of re-running fetch_model.
Description check ✅ Passed The description follows the template with Overview, Why now, Details, Test, Out of scope, and Related sections; all required information is present and comprehensive.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

nnshah1 added 4 commits June 10, 2026 08:49
)

The frontend's `download_config` pipeline already resolves every typed
MDC file and harvested sibling into a content-addressed `slug_dir` with
blake3-verified copies (#9610 + #9707). The vllm and sglang native
preprocessor factories were nonetheless calling `fetch_model` against
the worker URI on top of that — a redundant HF download whenever the
worker-side path wasn't reachable from the frontend pod.

Expose the resolve target on the MDC:
  * Rust: `ModelDeploymentCard::local_dir() -> PathBuf` — pure path
    computation matching `mdc_slug_dir(slug, mdcsum)`.
  * Python binding + .pyi stub: `mdc.local_dir() -> str`.

Both engine factories now use `mdc.local_dir()` directly:
  * `components/src/dynamo/frontend/vllm_processor.py`
  * `components/src/dynamo/frontend/sglang_processor.py`

Drops the unused `fetch_model` import in both. Adds one Rust unit test
asserting `local_dir()` produces the same path the resolve pipeline
writes into.

Signed-off-by: nnshah1 <neelays@nvidia.com>
- Extract `mdc_local_path` pure helper; rename `mdc_slug_dir` →
  `mdc_local_dir` so the create_dir_all wrapper and the public
  `local_dir()` share one path expression.
- Rename internal `slug_dir` → `local_dir` (variable, helper,
  parameter, docstrings, log fields) so one term flows from Python
  down through the resolve pipeline. Disk segment stays `by-slug/`
  to keep existing caches valid and describe the indexing scheme.
- Python binding `local_dir()` now returns `PyResult<String>` and
  raises `ValueError` on non-UTF-8 instead of silently dropping
  bytes via `to_string_lossy()`.
- Trim docstring on `ModelDeploymentCard::local_dir`; drop the
  redundant Python-side narrative explaining the same thing.
- Rename test to `local_dir_computes_expected_path` with a one-line
  comment about its scope (cache-layout drift sentinel, not a
  resolve-pipeline integration test).
- Move `ModelDeploymentCard` into the test module's `use` block
  alongside `HFConfig`.
- Compress Python call-site comments to one line each; rename
  factory-local `source_path` → `local_dir` so the variable name
  matches what it actually holds.

Signed-off-by: nnshah1 <neelays@nvidia.com>
CI's rustfmt collapsed the closure body to one line; local rustfmt
expanded it. Write the format! call inline so both versions agree.
Plain reflow, no semantic change.

Signed-off-by: nnshah1 <neelays@nvidia.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
@nnshah1 nnshah1 force-pushed the nnshah1/gh-8749-native-preprocessor branch from adaeffe to 4763067 Compare June 10, 2026 15:51
@nnshah1 nnshah1 enabled auto-merge (squash) June 10, 2026 15:51
Comment thread components/src/dynamo/frontend/sglang_processor.py Outdated
Comment thread components/src/dynamo/frontend/vllm_processor.py Outdated
@tanmayv25

Copy link
Copy Markdown
Contributor

Small nit comments. Otherwise looks good to me.

Per tanmayv25 review: the engine factories depend on
download_config having populated mdc.local_dir() before they run.
The invariant is currently enforced by watcher.rs (which calls
download_config unconditionally before invoking the factory), but
the dependency wasn't expressed at the factory boundary.

Add an explicit os.path.isdir(local_dir) check that raises a
RuntimeError naming the model and the missing directory if a
future refactor reorders the call sites. Same check in both vllm
and sglang factories.

Signed-off-by: nnshah1 <neelays@nvidia.com>
@nnshah1 nnshah1 force-pushed the nnshah1/gh-8749-native-preprocessor branch from 2b01ad5 to 1888922 Compare June 10, 2026 19:23
Comment thread components/src/dynamo/frontend/sglang_processor.py
The Rust pyo3 impl exposes name() but the .pyi stub was missing it,
so mypy rejected the defensive check added in the previous commit.

Signed-off-by: nnshah1 <neelays@nvidia.com>
@nnshah1 nnshah1 merged commit 7ff7cff into main Jun 10, 2026
96 checks passed
@nnshah1 nnshah1 deleted the nnshah1/gh-8749-native-preprocessor branch June 10, 2026 22:18
Broduker pushed a commit to Broduker/dynamo that referenced this pull request Jun 12, 2026
…ynamogh-8749) (ai-dynamo#10037)

Signed-off-by: nnshah1 <neelays@nvidia.com>
Signed-off-by: shenls <shenlinshan@kanzhun.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants