feat(self-host): native preprocessor consumes resolved slug_dir (gh-8749)#10037
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
WalkthroughThis PR introduces a ChangesModelDeploymentCard.local_dir() and Engine Factory Refactoring
🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
88f43b8 to
adaeffe
Compare
) The frontend's `download_config` pipeline already resolves every typed MDC file and harvested sibling into a content-addressed `slug_dir` with blake3-verified copies (#9610 + #9707). The vllm and sglang native preprocessor factories were nonetheless calling `fetch_model` against the worker URI on top of that — a redundant HF download whenever the worker-side path wasn't reachable from the frontend pod. Expose the resolve target on the MDC: * Rust: `ModelDeploymentCard::local_dir() -> PathBuf` — pure path computation matching `mdc_slug_dir(slug, mdcsum)`. * Python binding + .pyi stub: `mdc.local_dir() -> str`. Both engine factories now use `mdc.local_dir()` directly: * `components/src/dynamo/frontend/vllm_processor.py` * `components/src/dynamo/frontend/sglang_processor.py` Drops the unused `fetch_model` import in both. Adds one Rust unit test asserting `local_dir()` produces the same path the resolve pipeline writes into. Signed-off-by: nnshah1 <neelays@nvidia.com>
- Extract `mdc_local_path` pure helper; rename `mdc_slug_dir` → `mdc_local_dir` so the create_dir_all wrapper and the public `local_dir()` share one path expression. - Rename internal `slug_dir` → `local_dir` (variable, helper, parameter, docstrings, log fields) so one term flows from Python down through the resolve pipeline. Disk segment stays `by-slug/` to keep existing caches valid and describe the indexing scheme. - Python binding `local_dir()` now returns `PyResult<String>` and raises `ValueError` on non-UTF-8 instead of silently dropping bytes via `to_string_lossy()`. - Trim docstring on `ModelDeploymentCard::local_dir`; drop the redundant Python-side narrative explaining the same thing. - Rename test to `local_dir_computes_expected_path` with a one-line comment about its scope (cache-layout drift sentinel, not a resolve-pipeline integration test). - Move `ModelDeploymentCard` into the test module's `use` block alongside `HFConfig`. - Compress Python call-site comments to one line each; rename factory-local `source_path` → `local_dir` so the variable name matches what it actually holds. Signed-off-by: nnshah1 <neelays@nvidia.com>
CI's rustfmt collapsed the closure body to one line; local rustfmt expanded it. Write the format! call inline so both versions agree. Plain reflow, no semantic change. Signed-off-by: nnshah1 <neelays@nvidia.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
adaeffe to
4763067
Compare
|
Small nit comments. Otherwise looks good to me. |
Per tanmayv25 review: the engine factories depend on download_config having populated mdc.local_dir() before they run. The invariant is currently enforced by watcher.rs (which calls download_config unconditionally before invoking the factory), but the dependency wasn't expressed at the factory boundary. Add an explicit os.path.isdir(local_dir) check that raises a RuntimeError naming the model and the missing directory if a future refactor reorders the call sites. Same check in both vllm and sglang factories. Signed-off-by: nnshah1 <neelays@nvidia.com>
2b01ad5 to
1888922
Compare
The Rust pyo3 impl exposes name() but the .pyi stub was missing it, so mypy rejected the defensive check added in the previous commit. Signed-off-by: nnshah1 <neelays@nvidia.com>
…ynamogh-8749) (ai-dynamo#10037) Signed-off-by: nnshah1 <neelays@nvidia.com> Signed-off-by: shenls <shenlinshan@kanzhun.com>
Overview
vllm and sglang engine factories stop re-running
fetch_modelagainst the worker URI when initializing native preprocessor mode, and instead consume the already-resolvedslug_dirthatdownload_configpopulates on the frontend. This eliminates a redundant HF download on every model registration and is the natural follow-on to #9707 — the frontend now has every typed file plus harvested sibling (preprocessor_config.json, special_tokens_map.json, …) blake3-verified under~/.cache/dynamo/mdc/by-slug/<slug>/<mdcsum>/, so the native preprocessor just needs to be pointed at that directory.Why now
#9610 + #9707 closed the multimodal-completeness gap by harvesting non-typed siblings on both the HF and self-host paths. The TODOs in
vllm_processor.pyandsglang_processor.pyflagged the redundantfetch_modelas the next thing to remove once that landed. This is the smallest remaining piece in the gh-8749 stack before theDYN_SELF_HOST_METADATAdefault-ON flip (#8754 capstone); it's also a prerequisite for that flip to be a no-op in latency terms.Details
ModelDeploymentCard::local_dir() -> PathBuf— pure path computation, returns the same path thatmdc_slug_dir(slug, mdcsum)produces. No fs side effect (the resolve pipeline owns directory creation).ModelDeploymentCard.local_dir() -> strplus.pyistub.source_path = mdc.source_path(); if not exists: fetch_model(source_path, ignore_weights=True)block withsource_path = mdc.local_dir(). The unusedfetch_modelimport is dropped in both.download_configand would have an emptyslug_dir, but both factories already gate onmodel_type.supports_chat()at the top, which excludes them — no extra guard needed.Test
Rust unit test (
model_card::tests::local_dir_matches_resolve_pipeline_path) asserts the path produced bylocal_dir()matches whatresolve_metadata_fileswrites into. The existingtests/frontend/test_self_host_metadata.pycovers the worker HTTP route end; the factory consumer path is exercised by the real vllm/sglang serve tests in CI (tests/serve/test_vllm.py,tests/serve/test_sglang.py).Out of scope
DYN_SELF_HOST_METADATA— still gated on auto-detach-cleanup landing.MetadataArtifactRegistryneeds a secondary(endpoint, lora_slug) -> Vec<(slug, suffix)>index so the staticdetach_from_endpointhelper can clean up without holding aLocalModelhandle.Related
Summary by CodeRabbit
Release Notes
local_dir()method to access the local directory containing cached model files, enabling direct access to resolved metadata without requiring runtime fetch operations.