feat: Worker self-hosted metadata files#8754
Conversation
Closed five-role enum covering the metadata files the frontend consumes today (config, tokenizer, tokenizer_config, chat_template, generation_config) plus a struct describing one entry in the upcoming files vector on ModelDeploymentCard: filename, opaque uri, blake3 checksum, byte size, and role. Types only; no call sites yet. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
gh-8749 §2.2) Adds `files: Vec<ArtifactRef>` to ModelDeploymentCard. The field is declared #[serde(default, skip_serializing_if = "Vec::is_empty")] so old MDC bytes deserialize cleanly with an empty vector and new workers don't bloat the wire format when populating it alongside the legacy typed fields. `from_repo_checkout` now populates `files` by walking the same five typed artifacts (model_info, tokenizer, prompt_formatter, chat_template_file, gen_config) and synthesizing one ArtifactRef per reachable CheckedFile. Filenames come from the file's path, URIs are `file://<absolute>` for local paths or the URL string for URL-backed files, checksums are reused from CheckedFile's existing blake3, and sizes are read from filesystem metadata when available. The legacy typed fields are unchanged: new frontends prefer `files`, old frontends ignore it. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
…h-8749 §2.3) Field defaults to false. Setter mirrors the existing http_* / tls_* setter pattern on the builder. No behavior wired yet — the field is read by registration in a follow-up commit. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
gh-8749 §2.4) self_host_base_url(drt) builds http://<host>:<port> from the actual bound port (system_status_server_info()) and the existing DYN_SYSTEM_HOST config. When the configured host is the unspecified bind sentinel (0.0.0.0/::), substitutes a routable IP detected via the standard UDP-connect kernel-routing trick — bind a UDP socket, connect() to a public address, read local_addr(). connect() on UDP does not transmit, so this works air-gapped. Falls back to 127.0.0.1 when no usable interface is found, yielding a loud connection-refused on the consumer rather than silently broken content. Errors immediately when the system_status_server isn't running so registration doesn't publish an unreachable URL. self_host_base_url is gated #[allow(dead_code)] for one commit; it becomes reachable in the §2.6 commit that wires the opt-in path. detect_routable_ip is exercised by a unit test asserting the result parses as an IP address. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
…h-8749 §2.5) Adds a new lib/runtime/src/metadata_registry.rs module with a process-local MetadataArtifactRegistry mapping (model_slug, filename) to on-disk paths, modeled on the existing EngineRouteRegistry. Stored as a field on DistributedRuntime with a metadata_artifacts() accessor. Mounts a new GET /v1/metadata/{model_slug}/{*filename} route on the system_status_server alongside the /v1/loras precedent. The route is mounted unconditionally so an empty registry is harmless; handler returns 404 when (model_slug, filename) is not registered, 500 on a local read error, and the raw file bytes on success. Consumers blake3-verify against the MDC entry, so the transport is untrusted by construction. Six unit tests cover registry roundtrip, miss paths, multi-model isolation, model unregister, and clone semantics. The opt-in path that populates the registry lands in §2.6. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
…h-8749 §2.6) Threads self_host_metadata from LocalModelBuilder into LocalModel and consumes it in attach(). When the flag is true, the new LocalModel::wire_self_hosted_artifacts helper: 1. Builds the worker's base URL via self_host_base_url (errors loudly if DYN_SYSTEM_PORT is unset). 2. Walks the MDC's `files` vector. For each ArtifactRef whose URI is a file:// URL pointing at this worker's local disk: - inserts (model_slug, filename) → on-disk path into DistributedRuntime::metadata_artifacts(), so the /v1/metadata/{slug}/{filename} handler can serve the bytes; - rewrites the ArtifactRef.uri in place to that route URL. Entries with non-file URIs (hf://, mx://, ...) are left as-is. The legacy hf:// rewrite of the typed enum fields runs as before, so old frontends continue to fall through to that path. Removes the one-commit #[allow(dead_code)] on self_host_base_url. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
…-8749 §2.7) Threads the optional self_host_metadata: Option<bool> through the PyO3 register_model signature into LocalModelBuilder. Default remains false: existing worker recipes (vllm/sglang/trtllm) continue to behave identically; recipe authors flip the kwarg when they want self-hosting. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
…gh-8749 §2.8) Adds try_self_host_download() on ModelDeploymentCard. download_config now tries this path before the legacy hub::from_hf fetch: - Walks the MDC's `files` vector for entries with http(s):// URIs. - GETs each, verifies the body's blake3 against ArtifactRef.checksum (loud failure on mismatch — no silent fallback to HF). - Atomically writes verified bytes to ~/.cache/dynamo/self-host-metadata/{slug}/ (tmp-then-rename, so a partial download never leaves a corrupt file in place). - Calls update_dir(cache_dir) so downstream consumers (`tokenizer()`, `prompt_formatter()`, ...) see local paths pointing at the verified files. When the artifact list is empty or carries no http(s) URIs, returns Ok(false) and download_config falls through to the existing hub::from_hf path. Old MDC bytes (no `files` field) decode with an empty vector via #[serde(default)], so this preserves cross-version compatibility for unmodified workers. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
…§2.9)
Four new unit tests:
- artifact_ref_serde_roundtrip — ArtifactRef survives serde JSON
with all five fields preserved.
- artifact_role_serde_snake_case — locks the wire format for the
closed five-role enum (config / tokenizer / tokenizer_config /
chat_template / generation_config).
- mdc_without_files_deserializes_to_empty_vec — cross-version
compatibility guarantee: MDC bytes without a `files` field
deserialize cleanly with `files` defaulted to an empty vector,
so old workers' MDCs interoperate with new frontends.
- fetch_and_verify_rejects_checksum_mismatch — uses mockito to
serve a payload whose blake3 doesn't match the expected checksum;
asserts the helper errors with a clear "checksum mismatch"
message and leaves no file at the destination (the
tmp-then-rename atomicity guarantee).
Integration tests against a live deployment (worker opt-in +
mountless frontend, end-to-end /v1/models verification) are deferred
to a follow-up — they need the k8s test harness, not just the unit
suite. Refs #8749.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
Removes three methods with no production callers and the two tests
that depended on them:
- unregister_model: no detach path calls it; speculative addition.
Worker process exit clears the whole map already. When we want
per-model cleanup, ~10 LOC re-adds it.
- len, is_empty: only used by tests. Tests now assert correctness
via get() — the actual contract — instead of the count.
Also drops the `clone_shares_state` test, which only exercised the
Arc-share semantic that #[derive(Clone)] documents at the type level.
Module shrinks from ~125 to ~85 LOC; production API is now
new / register / get. All four remaining tests pass. Refs #8749.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
…pty (gh-8749) Effectively reverts 0d4ab6d. Restoring the methods on second look: - register / unregister: parallel-construction symmetry. Even though no production caller invokes unregister yet, the pair reads as a natural domain API. Detach paths will need it; having it ready keeps that change a one-liner. - len / is_empty: standard collection accessors. Lightweight (one read-lock each) and useful for observability/diagnostics beyond the test suite. The clone_shares_state test also comes back — small but explicit coverage of the Arc-share semantic, which is load-bearing for the DistributedRuntime → SystemStatusServer handoff. Module returns to its pre-simplification shape (~125 LOC, 6 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
…es (gh-8749) Two renames in service of clearer naming and parallel construction: Worker side (local_model.rs): wire_self_hosted_artifacts -> move_to_self_host Mirrors the existing ModelDeploymentCard::move_to_url pair (same shape: walks every typed-enum CheckedFile, rewrites URIs in place; opposite of update_dir). "wire" was an outlier — no other lib/ function uses it. Frontend side (model_card.rs): try_self_host_download -> try_download_files self_host_cache_dir -> files_cache_dir .cache/dynamo/self-host-metadata/ -> .cache/dynamo/model-metadata/ The frontend consumer is scheme-agnostic: it walks `files`, picks http(s):// entries, fetches them, and verifies blake3. It doesn't know or care whether the URIs point at the originating worker, a peer holder, modelexpress, or some other front-cache. Naming the function "self_host" baked in a worker-side concern that has no business in the consumer. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
…h-8749) Match the legacy `download_config` posture: it doesn't own a disk cache, it delegates to `hub::from_hf` which manages HF Hub's cache. The new branch was creating its own persistent directory under ~/.cache/dynamo/model-metadata/, which is asymmetric and accumulates state in $HOME with no cleanup. Switch to a per-process scratch directory under `std::env::temp_dir()` keyed by PID + slug: /tmp/dynamo-mdc-<pid>/<slug>/<filename> Process restart re-fetches (~10 MiB total per model, sub-second on a DC network — same cost the legacy path pays when HF Hub's cache misses). The OS reaps tempdir on reboot. No long-running disk artifact in $HOME, no cleanup logic. Renames `files_cache_dir` -> `files_download_dir` to reflect that this is transient scratch, not a cache. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
) Replaces two ad-hoc helpers with a scheme-uniform resolver and an HF-Hub-style content-addressed cache. resolve_uri(client, uri, expected_checksum, dest) dispatches by URI scheme, so the consumer side is genuinely transport-agnostic: - http/https — reqwest GET, body verified - file — tokio::fs::read of the local path, verified - hf://repo[@rev]/filename — hub::from_hf, copy from snapshot, verified All schemes write atomically (tmp + rename); checksum mismatch is a hard error in every branch. Cache layout mirrors HF Hub: ~/.cache/dynamo/mdc/blobs/<blake3-hex> — bytes ~/.cache/dynamo/mdc/by-slug/<slug>/<filename> — symlink → blob Multi-worker dedup is automatic: same blake3 → same blob path → the existence check skips the fetch. Frontend restart hits the warm cache. Cleanup is user-managed, same posture as HF Hub. try_download_files becomes download_files, walks every entry in the files vector through resolve_uri, and update_dirs the typed CheckedFiles at the per-slug symlink directory. Returns Ok(false) only when files is empty so the caller falls through to the legacy hub::from_hf path; non-empty files but a single resolution failure is loud. move_to_url now also walks the files vector, rewriting `file://` entries to the same `hf://repo/filename` URLs it produces for the typed enums. Self-host (`http://`) entries are preserved — the file://-prefix gate keeps move_to_url from clobbering them when it runs after move_to_self_host. Shared-volume deployments (source_path.exists() on the worker) skip move_to_url entirely, so their `file://` URIs are preserved across both surfaces. Updates the in-source mockito test to call resolve_uri instead of the removed fetch_and_verify, plus a parse_hf_uri unit test. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
…h-8749) Five new tests in lib/llm/tests/model_card.rs alongside the existing TinyLlama-fixture tests: - test_files_vector_populated_from_local_repo — load TinyLlama, assert mdc.files has entries with file:// URIs, blake3 checksums, sizes, and roles drawn from the closed five-role enum. - test_download_files_resolves_local_file_scheme — round-trip file:// resolution into the content-addressed cache, verify blob + by-slug symlink layout, assert the tokenizer loads via the resolved path. - test_download_files_dedupes_by_blake3 — call download_config twice, assert blob mtimes are unchanged on the second call (the blake3-keyed exists() check skips the fetch). - test_download_files_rejects_blake3_mismatch — tamper with one ArtifactRef.checksum, assert download_config errors with a clear "checksum mismatch" message and leaves no blob behind. - test_download_files_resolves_hf_scheme_with_token — HF_TOKEN runtime-skip pattern (matches preprocessor.rs); same Llama-3.1 fixture the existing preprocessor tests use; exercises the resolve_uri hf:// branch end-to-end. Tests are serialized via serial_test::serial because they share the $HOME-rooted MDC cache. Each test installs its own tempdir-backed $HOME so it doesn't pollute the user's real ~/.cache/dynamo/. Bug fix: artifact_ref_from_checked_file canonicalizes the local path before constructing the file:// URI. url::Url::from_file_path requires an absolute path; relative paths returned None and the files vector came out empty for any caller that loaded the MDC from a relative path (production goes through LocalModelBuilder::build which already canonicalizes; the bug only manifested in tests and direct load_from_disk callers). Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
…lt (gh-8749) Lets deployments turn self-hosting on without modifying every worker recipe. When DYN_SELF_HOST_METADATA is set to a truthy value (1, true, yes, on; case-insensitive), `LocalModelBuilder::default()` initializes `self_host_metadata` to true. Explicit setter calls (`.self_host_metadata(...)`) and explicit PyO3 kwargs always win over the env-var default. Also tightens the PyO3 binding: the builder method is now only called when the kwarg is `Some(_)`. An unset kwarg falls through to the builder's default, so the env var is honored end-to-end. Two unit tests cover the truthy/falsy parsing, gated by serial_test::serial because they share a process-global env var. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
gh-8749) Tightens doc comments and inline comments across the new self-host code to match the surrounding files' style: brief one-line docs on public APIs, no doc on simple private helpers, longer prose only where the WHY is non-obvious. Removes design rationale, motivation, and "mirror existing pattern" prose — the type signatures and existing tests carry that load. Also adds tests/frontend/test_vllm_self_host_metadata.py — a local e2e clone of test_vllm.py with the only diff being DYN_SELF_HOST_METADATA=true in the worker process env. Verifies the full chain by asserting the model lands on /v1/models. Uses QWEN (Qwen3-0.6B). Marked vllm/gpu_1/e2e/post_merge so it slots into the existing vLLM e2e CI gate. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
…h-8749) Strengthens the Python e2e to actually validate that the frontend fetched via the self-host http path: the frontend process now runs under an isolated HOME=<tmp_path> so any cache writes are scoped to the test, and the test asserts at least one blake3-keyed blob appears under <HOME>/.cache/dynamo/mdc/blobs/. That directory is only ever written by ModelDeploymentCard::download_files, so its contents prove the http resolver ran end-to-end (the legacy hub::from_hf fallback writes to the HF Hub cache, not to this directory). Adds a DRT-level e2e in lib/llm/tests/model_card.rs under #[cfg(feature = "integration")] mod integration_tests, modeled on lib/llm/tests/http_metrics.rs. Real DistributedRuntime with DYN_SYSTEM_PORT=0, real worker LocalModel::attach -> ModelWatcher -> download_config -> cache populated. Asserts blob count matches the artifact list and the tokenizer loads. Gated #[ignore] because it requires etcd; run with cargo test -p dynamo-llm --test model_card --features integration -- --ignored. Parallelism: - Within a binary: #[serial_test::serial] because HOME and DYN_SYSTEM_PORT are process-global. - Across cargo test processes: each gets its own tempdir-rooted HOME, its own random port (DYN_SYSTEM_PORT=0), and a unique etcd namespace (self-host-it-<pid>-<uuid>); the watcher uses NamespaceFilter::Exact so two parallel runs don't see each other's MDCs. Replaces the bare tempfile::TempDir isolated_home() helper with a true RAII guard (IsolatedHome) that restores the previous HOME on drop, so subsequent tests aren't left pointing at a deleted tempdir path. Refs #8749. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
|
The size field was added for two reasons: (1) per-file proactive cap on transfers, (2) |
Apply the same lessons from grahamking's review across the rest of
the PR's new code:
- ENV_SELF_HOST_METADATA constant doc trimmed from 5 lines of
parsing rules to a one-liner, matching the existing convention
in lib/runtime/src/config/environment_names.rs
(e.g. DYN_SYSTEM_HOST/PORT/etc).
- LocalModelBuilder::self_host_metadata setter doc rewritten to
match the runtime config field convention from
lib/runtime/src/config.rs ("Set this at runtime with environment
variable DYN_FOO").
- model_card.rs: demote the "resolved model metadata files" log
from info! to debug! — once-per-registration is too quiet for
info, same pattern we just demoted in move_to_self_host.
- model_card.rs::stream_to_tmp: cap.saturating_add(1) -> cap + 1
(cap bounded by ABSOLUTE_MAX_METADATA_BYTES = 1 GiB, no overflow).
- model_card.rs::stream_to_tmp: surface flush errors via
with_context()? instead of silent .ok(); a flush failure right
before atomic-rename would mean the staged bytes weren't fully
on disk.
- local_model.rs::move_to_self_host: use fs::canonicalize (the
module is already imported via `use std::fs;` at top of file).
Signed-off-by: nnshah1 <neelays@nvidia.com>
…ymlink (gh-8749) `move_to_self_host` was calling `fs::canonicalize()` on the local path and then taking `file_name()` from the canonical result. For workers backed by the HuggingFace cache, snapshot/<rev>/<filename> is itself a symlink into blobs/<git-sha1>; canonicalize follows it and returns the SHA1 as the file_name, so: * the http URL becomes /v1/metadata/<slug>/<sha1> * the per-(slug, mdcsum) symlink in the frontend's cache is placed at <slug>/<mdcsum>/<sha1> * downstream code that opens files by literal name (e.g. `HFConfig::from_json_file` doing `parent.join("generation_config.json")` to fall back to generation_config.json for `eos_token_id`) fails with "Failed to open file" — the file *is* in the cache, just under a different name. This bit the vllm Multi-GPU `mm_e_pd_llava-1.5-7b` test in CI because llava-1.5-7b's `eos_token_id` lives only in generation_config.json (Qwen3-0.6B has it directly in config.json, which is why our mocker tests didn't catch it). Fix: derive `filename` from the original (pre-canonicalize) path so the URL, the per-slug symlink, and the registry key all use the human-readable name. Keep canonicalize for the *registry value* — the http handler still needs an absolute path to the actual blob to serve bytes. Local repro added (and passes after fix): pytest -xvs tests/frontend/test_llava_repro.py ========================= 1 passed in 13.63s ===================== Existing mocker tests still pass: pytest -xvs tests/frontend/test_mocker_self_host_multi_replica.py ========================= 4 passed in 69.12s ===================== Signed-off-by: nnshah1 <neelays@nvidia.com>
) `ModelDeploymentCard::update_dir` deliberately skips the chat_template_file slot when `is_custom == true` — the legacy `hub::from_hf` fall-through doesn't fetch user-supplied custom templates (they aren't on HF), so leaving the path untouched kept the original local one for the legacy code path. In the new-format path we *do* fetch every populated slot (including custom templates served via the worker's self-host route), so the URL-skip exception no longer applies. The previous exception left `chat_template_file.path()` as a URL post-resolve, and `PromptFormatter::from_mdc` would bail with: HfChatTemplateJinja for <model> is a URL, cannot load Caught by the trtllm Multi-GPU `raw_embeddings_epd-2` test (which uses `--custom-jinja-template` via `agg_raw_embeddings_llava.sh`). Fix: in `resolve_metadata_files`, after fetching every slot via `iter_metadata_files`, walk `iter_metadata_files_mut()` and call `CheckedFile::update_dir(&slug_dir)` on each — same enumeration as the fetch loop, so we cover exactly the files we just fetched. This bypasses the legacy `is_custom` exclusion that's still load- bearing for the legacy `hub::from_hf` fall-through. Verified locally: existing mocker suite (3 schemes + multi-frontend) + llava repro all pass: 5 passed in 82.63s. Signed-off-by: nnshah1 <neelays@nvidia.com>
…h-8749) Comment-only follow-up to 262f314 + 1257fe0. Trim two WHY blocks (7-line and 6-line) down to 3-4 lines each while keeping the load-bearing reason for each non-obvious choice (canonicalize-vs-original-filename, and skip-self.update_dir because of legacy is_custom exclusion). Signed-off-by: nnshah1 <neelays@nvidia.com>
| } | ||
|
|
||
| impl BlobLock { | ||
| async fn acquire(blob_path: &Path) -> anyhow::Result<Self> { |
There was a problem hiding this comment.
Can you explain this function please?
If the idea is to lock the file because several concurrent processes are writing it, that should not be the case. Each machine only has one frontend, and the cache should be local to the machine.
There was a problem hiding this comment.
In most production settings that's likely the case — but not guaranteed, and it also breaks our current parallel testing strategy. We use flock because it prevents both intra-process and inter-process races (workers across different worker sets can register at the same time and land on the same blob if they share file content).
Two cases the operator's one-frontend-per-pod model doesn't cover:
- Local dev / CI —
pytestruns two frontend processes against one mocker (seetests/frontend/test_mocker_self_host_multi_replica.py::test_two_frontends_share_metadata_cache_via_flock); both share$HOMEand race on the same~/.cache/dynamo/mdc/blobs/<blake3>. - Cross-worker-set in one frontend — two MDCs (different
model_nameornamespace) carrying a shared file via blake3 (e.g., two fine-tunes of one base model with identicaltokenizer.json). Their registrations run concurrently in the discovery watcher and would write the same blob without serialization.
This is the same pattern huggingface_hub uses internally — fcntl-based locks under ~/.cache/huggingface/hub/.locks/ around _download_to_tmp_and_move to coordinate concurrent downloads of the same blob. flock(LOCK_EX) covers both intra- and inter-process races via per-open-file-description semantics, so it's one mechanism.
| fn blake3_hex_of(checksum: &str) -> anyhow::Result<&str> { | ||
| checksum | ||
| .strip_prefix("blake3:") | ||
| .with_context(|| format!("expected blake3 checksum, got: {checksum}")) |
There was a problem hiding this comment.
We have a Checksum type in llm/src/common/checked_file.rs, might be able to use that.
There was a problem hiding this comment.
Good call. 63728476cc adds pub fn hash(&self) -> &str (and algorithm()) on Checksum, drops the blake3_hex_of helper, and reads the hex directly.
|
|
||
| let mut fetched = 0usize; | ||
| for (uri, expected, filename) in &entries { | ||
| let blake3_hex = blake3_hex_of(&expected.checksum().to_string())?.to_string(); |
There was a problem hiding this comment.
Here checksum() could return a Checksum which knows it's hash.
There was a problem hiding this comment.
Same patch (63728476cc). expected.checksum().hash() replaces the to_string() + strip_prefix round-trip.
|
|
||
| use std::collections::HashMap; | ||
| use std::path::PathBuf; | ||
| use std::sync::{Arc, RwLock}; |
There was a problem hiding this comment.
Use parking_lot::RwLock instead. Most of the code base migrated to that a while back. It avoids the unwrap(), and has several other benefits.
There was a problem hiding this comment.
Done in 63728476cc — swapped std::sync::RwLock for parking_lot::RwLock; the .unwrap() calls go away.
|
I ran it through
|
…h-8749) - `Checksum::hash()` / `algorithm()` accessors so callers can read the blake3 hex without round-tripping through `Display` + `strip_prefix`. - Drop `blake3_hex_of` helper in `model_card.rs`; callers use `cf.checksum().hash()` directly. - Swap `std::sync::RwLock` for `parking_lot::RwLock` in `metadata_registry`; drops `.unwrap()` calls and matches the rest of the codebase. Per Graham's review on PR #8754. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
…h-8749) Drop comments that just restate function names / short signatures: - `symlink_force` doc reduced to the concurrency note. - `filename_from_checked_file`, `pf_checked_file`, `copy_to_tmp` — function names + signatures already say what they do. - `move_to_self_host` 6-line preamble collapsed to 3 lines that capture the URL-vs-local dispatch (the non-obvious part). - `clear_self_hosted_artifacts` 8-line block collapsed to 2. - `self_host_base_url` host-selection paragraph removed; the IPv4/v6 resolver behavior lives in `ip_resolver`, not here. Cache/lock/concurrency rationale (BlobLock, stage_and_rename, unique_tmp_path, resolve_uri scheme list) kept verbatim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
|
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
Implements #8749.
What this enables
A worker can opt in to self-hosting the metadata files (
config.json,tokenizer.json/tokenizer.model,tokenizer_config.json,chat_template.{jinja,json},generation_config.json) the frontend needs for preprocessing. The MDC's existing typed-enumCheckedFileslots become URI carriers — every slot already had a blake3 checksum and apath: Either<PathBuf, Url>capable of holding any URI scheme. The only new field is an additiveCheckedFile.size: Option<u64>that doubles as the per-file fetch cap and the new/legacy discriminant.Self-host is on by default. When the worker has a
system_status_serverrunning (i.e.DYN_SYSTEM_PORTis set), each metadata file'sCheckedFile.pathis rewritten to the worker's/v1/metadata/...route. Workers without a status server (noDYN_SYSTEM_PORT) gracefully skip the rewrite and keep whatever URI is naturally there (hf://after origin'smove_to_url, or a localfile://) — frontends still resolve those through the same verify-and-cache pipeline.Workers opt out via:
register_model(..., self_host_metadata=False)(Python kwarg), orDYN_SELF_HOST_METADATA=false(or0/no/off) in the worker's environment.State machine — worker side
DistributedRuntime::newspawns thesystem_status_serverwhenDYN_SYSTEM_PORTis set. The server unconditionally mountsGET /v1/metadata/{model_slug}/{*filename}— the handler reads from a process-localMetadataArtifactRegistryon the runtime; an empty registry returns 404.register_model(...).LocalModelBuilder::build()produces an MDC. Each populated typed-enum slot'sCheckedFileis built viaCheckedFile::from_disk(...), which mmap-hashes the bytes and populates(checksum, size)in one pass. Every populated slot now carriessize = Some(N), sois_new_format()returnstrue.LocalModel::attach()fires.self_host_metadata=True(default) ANDsystem_status_serveris running:move_to_self_host(base, drt)walks every populated typed-enum slot viaiter_metadata_files_mut(), derives the filename from the existingCheckedFile.path, and callscf.move_to_url(<base>/v1/metadata/<slug>/<filename>)— the same helper origin already uses forhf://rewriting. It also registers(slug, filename) → on-disk pathindrt.metadata_artifacts(). The worker does not stage or copy bytes; files stay where they are.self_host_metadata=True(default) butDYN_SYSTEM_PORTis unset (nosystem_status_server): the rewrite is silently skipped with an info log; CheckedFiles continue with whatever URI is naturally there (hf://after the fall-through below, or localfile://).source_pathdoesn't exist locally, origin's existingmove_to_url("hf://...")rewrites the typed CheckedFiles tohf://<repo>/<filename>. Unchanged from origin.mdcsum()which hashes only the typed-enum checksums, never URIs — multiple replicas of the same model share a workerset regardless of which URI scheme each worker advertises.State machine — frontend side
ModelWatcherobserves a new MDC.is_checksum_compatible(ws_key, mdcsum)rejects mismatched workers (loud error in logs, no registration). Same blake3s for the typed artifacts → samemdcsum()→ same workerset.download_configruns. Order: (a) ifis_new_format()(every populated slot hassize = Some), route throughresolve_metadata_files— every scheme rechecks blake3 against the staged bytes. (b) Otherwise legacy: short-circuit onhas_local_files()for shared mounts. (c) Otherwise fall back tohub::from_hf(self.source_path()). The new format always goes through verify-and-cache, regardless of whether the URI ishttp://,hf://, or a synthesizedfile://.resolve_metadata_filesresolves each populated slot. For eachCheckedFile:cf.url()if the worker rewrote it viamove_to_url/move_to_self_host; otherwise synthesizefile://<canonicalized>from the localPathBuf(shared-mount case).blob = ~/.cache/dynamo/mdc/blobs/<blake3-hex>.flock(LOCK_EX)on<blob>.lock. Linux's per-open-file-description lock semantics serialize both intra-process tasks and across processes, so a single layer covers both. Multiple frontends sharing$HOMEcollapse concurrent fetches to a single download per blake3.blob.exists()after acquiring the lock (a peer may have populated it while we waited); short-circuit on cache hit.expected.size()(with a 1 GiB absolute ceiling on the declared value):http(s)streams viatokio_util::io::StreamReader+tokio::io::copy;file://andhf://go through a sizedtokio::fs::copy.CheckedFile::from_disk(tmp)and comparing to the expected checksum — same hashing path the worker used at registration, so the comparison is bit-identical.stage_and_renameprimitive.symlink_force(blob, by-slug/<slug>/<mdcsum>/<filename>)— same atomic-rename primitive, so concurrent frontends don't collide on the symlink either.update_dir(by-slug/<slug>/<mdcsum>)points the typed CheckedFiles at the per-(slug, mdcsum) symlink directory. The<mdcsum>segment mirrors HF Hub'ssnapshots/<rev>/and isolates worker sets that share a model name but publish different file content. Downstream consumers (tokenizer(),prompt_formatter(), ...) load from local paths./v1/models. Tokenizer is in memory; bytes don't need disk again.Key design points
CheckedFile.size: Option<u64>with#[serde(default, skip_serializing_if = "Option::is_none")]. NoVec<ArtifactRef>, noArtifactRoleenum, no parallel checksum store. The typed-enum slots are the list;CheckedFile.checksumis the identity. Drift between sources of truth is structurally impossible.sizedoubles as discriminantis_new_format()returnstrueiff every populated slot carriessize = Some. Pre-PR MDCs deserialize cleanly withsize = Noneand fall through to legacyhub::from_hf. Mixed states (some slots with size, some without) count as legacy — surfacing a likely worker bug rather than papering over it.~/.cache/dynamo/mdc/blobs/<blake3-hex>+~/.cache/dynamo/mdc/by-slug/<slug>/<mdcsum>/<filename>symlinks. Per-(slug, mdcsum) isolation handles worker-set churn. Cleanup is user-managed (same posture as HF Hub).flock(LOCK_EX)on<blob>.lockcovers both intra- and inter-process. Atomic publish viastage_and_rename(per-call tmp +rename(2)) for both blob writes and symlink creation.CheckedFile::from_disk(tmp)and compared toexpected.checksum(). Hard-fail on mismatch (no silent fallback to HF once the new path is engaged).expected.size(), plus a 1 GiB absolute ceiling atresolve_urientry to clamp the trust boundary. Bodies are streamed (noVec<u8>buffering); pre-checked viaContent-Lengthwhen present.system_hostfrom existingDYN_SYSTEM_HOST; wildcard binds (0.0.0.0/::) resolved viadynamo_runtime::utils::ip_resolver::get_local_ip_for_advertise()— the shared resolver used by the TCP request plane (IPv4 / IPv6 / URL bracketing handled uniformly). No bespoke implementation. No new env var.DYN_SYSTEM_PORTon frontend pods, so the route never spawns there.DYN_SELF_HOST_METADATAdefaults to truthy; falsy values0/false/no/offopt out). Workers without asystem_status_server(noDYN_SYSTEM_PORT) silently skip the http rewrite and keep their natural URI scheme — preserves out-of-the-box behavior for the 90+ sample DGDs that don't setDYN_SYSTEM_PORT.Tests
lib/llm/src/common/checked_file.rs#tests):CheckedFileround-trip — legacy bytes deserialize withsize = None; new bytes carrysize = Some(N).lib/llm/src/model_card.rs#tests):is_new_format/iter_metadata_filescorrectness;resolve_uriblake3 mismatch (mockito);parse_hf_uricases; HTTP body oversize rejection (Content-Length pre-check + post-write overage).lib/llm/src/local_model.rs#tests):DYN_SELF_HOST_METADATAparsing — default-on when unset, truthy values keep the default ON, falsy values (0/false/no/off/ empty) opt out.lib/runtime/src/metadata_registry.rs#tests): registry contract — register/get/unregister/multi-model/clone-shares-state.lib/llm/tests/model_card.rs):download_configresolves throughcf.url()forfile://,hf://,http://; blake3-mismatch rejection; legacy fallback whenis_new_format() == false; per-testIsolatedHomeRAII guard so the cache leaves no residue. Plus a#[cfg(feature = "integration")]worker+frontend round-trip via realDistributedRuntime+ModelWatcher+ workerattach(), using a unique etcd namespace (<pid>-<uuid>) + random port so parallelcargo testinvocations don't collide.tests/frontend/test_mocker_self_host_multi_replica.py): CPU-only mocker workers parametrized over the URI scheme —self_host(http://),hf(hf://), andfile(file://, via a local snapshot dir). Two replicas exercise concurrent registrations against an isolatedHOME=<tmp_path>. A separate test pairs twoDynamoFrontendProcessinstances sharing one$HOMEto exercise cross-process flock on the metadata cache. Markedgpu_0/e2e/post_merge.Files changed
lib/llm/src/common/checked_file.rs— newsize: Option<u64>field + accessor;from_diskpopulates it; serde additive.lib/llm/src/model_card.rs—is_new_format,iter_metadata_files/iter_metadata_files_mut,download_configreorder,resolve_metadata_files,resolve_uri,BlobLock,stage_and_rename/stage_and_rename_sync,symlink_force, scheme-specific stage helpers (stream_to_tmp,copy_to_tmp).lib/llm/src/local_model.rs—self_host_metadatafield + setter,move_to_self_host(gracefully skips whensystem_status_serverisn't running),self_host_base_urlreturningOption<String>(using sharedget_local_ip_for_advertise),DYN_SELF_HOST_METADATAenv-var-driven default (defaults to ON; falsy values opt out).lib/llm/Cargo.toml— addslibcdependency forflock.lib/runtime/src/metadata_registry.rs(new) — process-local(slug, filename) → PathBufregistry.lib/runtime/src/distributed.rs— registry field + accessor onDistributedRuntime.lib/runtime/src/system_status_server.rs— mounts/v1/metadata/{slug}/{*filename}route + handler.lib/bindings/python/rust/lib.rs—self_host_metadatakwarg onregister_model.lib/llm/tests/model_card.rs— integration tests + DRT-level integration test.tests/frontend/test_mocker_self_host_multi_replica.py(new) — CPU-only multi-replica + multi-frontend e2e.Out of scope
DYN_SYSTEM_HOST=status.podIPvia downward API. Not blocking —get_local_ip_for_advertise()covers k8s pods out of the box (operator only setsDYN_SYSTEM_PORT, leaving HOST at the0.0.0.0default → wildcard branch → interface enum → pod IP). Same posture as request plane / event plane; bind/advertise separation is a runtime-wide follow-up, not a self-host-specific blocker.CheckedFileis forward-compatible;mx://already parses today./v1/mdcobservability endpoint.