Updated 2026-04-28 to reflect the final design after review. The original Vec<ArtifactRef> proposal has been replaced by a single additive field (CheckedFile.size: Option<u64>) on the existing typed-enum slots. Implementation: PR #8754.
Summary
Enable workers to self-host the model metadata files the frontend needs for preprocessing (config.json, tokenizer.json / tokenizer.model, tokenizer_config.json, chat_template.{jinja,json}, generation_config.json).
Workers serve those files at GET /v1/metadata/{model_slug}/{filename} on the existing system_status_server, and rewrite the per-file CheckedFile URI on the MDC to that route. The MDC's typed-enum CheckedFile slots are the single source of identity — every slot already carries a blake3 checksum and a path: Either<PathBuf, Url> capable of holding any URI scheme. The only new field is an additive CheckedFile.size: Option<u64> that doubles as the per-file fetch cap and the new/legacy discriminant.
Motivation
The frontend today resolves metadata files via one of two paths: a local read (when frontend and worker share a filesystem, or both pods happen to have the same path on disk) or hub::from_hf(source_path) (which transparently uses ModelExpress as a regional cache when MODEL_EXPRESS_URL is set, otherwise hits HuggingFace directly). Both paths fail for private or custom models that aren't published anywhere reachable — ModelExpress can't serve a file that doesn't exist upstream, and HF can't either. They also fail when source_path isn't a valid HF slug (e.g. --served-model-name=cosmos-reason1-7b): the HF download errors and the model never registers with the frontend. curl /v1/models returns empty data; the failure is visible in frontend pod logs, but /v1/models itself gives operators no signal that registration was attempted.
External contributor @Kaonael's DEP draft (Kaonael/dynamo#3, "DEP: P2P model config file delivery from workers to frontends") calls out this gap explicitly for private/custom models that don't exist on HuggingFace, the same case where the HF fallback is not just slow but impossible. PR #5102 (by @yurekami) was a narrow fix to a public-model instance of the same broader problem: the frontend cannot get metadata in any deployment unless one of those two paths works. This DEP agrees with @Kaonael's diagnosis but proposes a simpler design.
Proposal
1. One additive field on CheckedFile
The path field already supports file://, http(s)://, hf://, mx:// — every typed-enum CheckedFile slot on the MDC (model_info, tokenizer, prompt_formatter, chat_template_file, gen_config) becomes a URI carrier without any schema churn. The existing move_to_url(...) helper (already used in origin for hf:// rewriting) does the work.
size is the only new field. It's Option<u64>, #[serde(default)], and:
Pre-PR MDCs deserialize cleanly with size = None.
New MDCs from PR-aware workers carry size = Some(N).
is_new_format() returns true iff every populated slot carries size = Some — size doubles as the new/legacy discriminant.
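The discriminant rule above can be sketched in a few lines. This is a minimal illustration, not the code in lib/llm/src/model_card.rs: the `Card` struct, its three slots, and the `populated` helper are hypothetical stand-ins, and `CheckedFile` is trimmed to the two fields that matter here.

```rust
/// Hypothetical, trimmed-down stand-in for the real `CheckedFile`.
#[derive(Debug, Clone)]
pub struct CheckedFile {
    pub checksum: String,
    /// `None` on MDCs published by pre-PR workers.
    pub size: Option<u64>,
}

/// Hypothetical card holding a few of the optional typed-enum slots.
#[derive(Debug, Default)]
pub struct Card {
    pub model_info: Option<CheckedFile>,
    pub tokenizer: Option<CheckedFile>,
    pub gen_config: Option<CheckedFile>,
}

impl Card {
    /// Returns only the slots that actually hold a file.
    pub fn populated(&self) -> Vec<&CheckedFile> {
        vec![
            self.model_info.as_ref(),
            self.tokenizer.as_ref(),
            self.gen_config.as_ref(),
        ]
        .into_iter()
        .flatten()
        .collect()
    }

    /// True iff every populated slot carries `size = Some(_)`.
    /// Note: with no populated slots this is vacuously true in this sketch;
    /// the real implementation may treat that case differently.
    pub fn is_new_format(&self) -> bool {
        self.populated().iter().all(|cf| cf.size.is_some())
    }
}
```

A card where even one populated slot has `size = None` is treated as legacy, so a partially upgraded MDC never takes the new resolution path.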
No files: Vec<ArtifactRef> field, no ArtifactRole enum, no parallel checksum store. The typed-enum slots already are the list, and the existing CheckedFile.checksum is already the identity. Drift between sources of truth is structurally impossible.
2. Self-host opt-in at register_model
register_model gains a new boolean kwarg, self_host_metadata, default False.
Backed by a setter on LocalModelBuilder (lib/llm/src/local_model.rs) and threaded through the PyO3 binding (lib/bindings/python/rust/lib.rs).
When self_host_metadata=True, the runtime constructs the worker base URL from existing system_status_server info — no new env var:
    let info = drt
        .system_status_server_info()
        .context("self_host_metadata=True requires DYN_SYSTEM_PORT to be set")?;
    let host = match drt.config().system_host.as_str() {
        "0.0.0.0" | "::" | "[::]" => {
            dynamo_runtime::utils::ip_resolver::get_local_ip_for_advertise()
        }
        other => other.to_string(),
    };
    format!("http://{host}:{}", info.port())
get_local_ip_for_advertise() is the shared resolver used by the TCP request plane — IPv4 / IPv6 / URL bracketing handled uniformly. No bespoke implementation in this path.
move_to_self_host(base, drt) walks every populated typed-enum slot, derives the filename from the existing CheckedFile.path, calls cf.move_to_url(<base>/v1/metadata/<slug>/<filename>), populates cf.size from on-disk metadata, and registers the (slug, filename) → on-disk path triple in a process-local registry that the route handler reads from. The worker does not stage or copy bytes — files stay where they are.
If system_status_server is not running (DYN_SYSTEM_PORT unset), registration fails immediately with a clear error.
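As a rough sketch of the rewrite step described above, the following shows how one file's route URL could be derived and how its on-disk location could be recorded. `MetadataRegistry` and `self_host_url` are hypothetical names for illustration; the real `move_to_self_host` also updates `cf.path` and `cf.size`, which is omitted here.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical process-local registry: (slug, filename) -> on-disk path.
#[derive(Default)]
pub struct MetadataRegistry {
    files: HashMap<(String, String), PathBuf>,
}

impl MetadataRegistry {
    pub fn register(&mut self, slug: &str, filename: &str, path: &Path) {
        self.files
            .insert((slug.to_string(), filename.to_string()), path.to_path_buf());
    }

    pub fn lookup(&self, slug: &str, filename: &str) -> Option<&PathBuf> {
        self.files.get(&(slug.to_string(), filename.to_string()))
    }
}

/// Builds `<base>/v1/metadata/<slug>/<filename>` for one file and records
/// where its bytes live. No bytes are copied; only the path is stored.
/// Returns `None` if the path has no UTF-8 file name.
pub fn self_host_url(
    base: &str,
    slug: &str,
    on_disk: &Path,
    registry: &mut MetadataRegistry,
) -> Option<String> {
    let filename = on_disk.file_name()?.to_str()?;
    registry.register(slug, filename, on_disk);
    Some(format!(
        "{}/v1/metadata/{}/{}",
        base.trim_end_matches('/'),
        slug,
        filename
    ))
}
```

For a base of `http://10.0.0.5:9090`, slug `my-model`, and a file at `/models/my-model/config.json`, this yields `http://10.0.0.5:9090/v1/metadata/my-model/config.json` (address and paths are made up for the example).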
3. Route on system_status_server
A single filename-keyed route, mounted unconditionally at server spawn time alongside /v1/loras* (lib/runtime/src/system_status_server.rs):
GET /v1/metadata/{model_slug}/{filename}
Handler reads from the process-local registry. If the (model_slug, filename) pair isn't registered, returns 404. Response body is the raw file bytes; the consumer verifies blake3 against the MDC entry, so the transport is untrusted by construction. Mirrors the existing /v1/loras/{*lora_name} precedent in the same server.
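The handler's lookup logic might look roughly like the sketch below, stripped of any HTTP framework. The `Registry` alias and `serve_metadata` are hypothetical, and the 500 on a read failure is an assumption; the text above only specifies the 404 case.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::PathBuf;

/// Hypothetical registry shape: (model_slug, filename) -> on-disk path.
pub type Registry = HashMap<(String, String), PathBuf>;

/// Returns (HTTP status, response body) for one metadata request.
pub fn serve_metadata(registry: &Registry, slug: &str, filename: &str) -> (u16, Vec<u8>) {
    match registry.get(&(slug.to_owned(), filename.to_owned())) {
        // Unregistered pair: 404, as described in the text.
        None => (404, Vec::new()),
        Some(path) => match fs::read(path) {
            // Raw file bytes; the consumer verifies the checksum itself.
            Ok(bytes) => (200, bytes),
            // Assumption: the source does not say how read failures surface.
            Err(_) => (500, Vec::new()),
        },
    }
}
```

Because the request's slug and filename are only ever used as a map key, a request cannot name a path outside what the worker registered.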
4. Frontend resolve / verify / cache
download_config short-circuits on is_new_format() and routes through resolve_metadata_files:
    if self.is_new_format() {
        return self.resolve_metadata_files().await; // every scheme rechecks blake3
    }
    if self.has_local_files() {
        return Ok(());
    } // legacy short-circuit
    let p = crate::hub::from_hf(self.source_path(), true).await?;
    self.update_dir(&p);
    Ok(())
For each CheckedFile, the frontend derives a URI:
cf.url() if the worker called move_to_url/move_to_self_host (http://, hf://, …).
otherwise synthesize file://<canonicalized> from the local PathBuf (shared-mount workers that never rewrote).
So all three transport classes — self-hosted http, HF-resolved, and shared-mount file:// — converge on a single resolve_uri(client, uri, expected_checked_file, dest) pipeline.
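The URI derivation can be sketched as follows. `Location` is a hypothetical stand-in for `Either<PathBuf, Url>` with the URL held as a plain string, and the `file://` formatting here does no percent-encoding, which a real implementation would likely need.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical stand-in for `Either<PathBuf, Url>`.
pub enum Location {
    Local(PathBuf),
    Remote(String),
}

/// Produces the URI that a single resolve pipeline would consume.
pub fn derive_uri(location: &Location) -> io::Result<String> {
    match location {
        // The worker already rewrote the slot (http://, hf://, ...): use it as-is.
        Location::Remote(url) => Ok(url.clone()),
        // Shared-mount case: synthesize file://<canonicalized path>.
        // canonicalize fails if the path does not exist on this machine.
        Location::Local(path) => {
            let absolute = fs::canonicalize(path)?;
            Ok(format!("file://{}", absolute.display()))
        }
    }
}
```

A local path that is not visible from the frontend fails at this step rather than later in the fetch.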
resolve_uri cap-checks expected.size() (with a 1 GiB absolute ceiling on the declared value), acquires flock(LOCK_EX) on <blob>.lock, double-checks the blob existence, then inside an atomic-publish primitive (stage_and_rename) it stages bytes scheme-specifically into a per-call tmp, rebuilds CheckedFile::from_disk(tmp) (same mmap-backed blake3 path the worker used at registration), and checksum-compares to expected. On match, atomic-renames tmp into the cache.
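A simplified sketch of the cap check and the stage, verify, rename sequence is shown below, using only the standard library. Several things are assumptions made for illustration: the function names, the use of `DefaultHasher` as a stand-in digest (the real pipeline uses blake3, and `DefaultHasher` is not cryptographic), and the omission of the `flock` step.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::Path;

/// 1 GiB absolute ceiling on the declared size, as stated in the text.
pub const MAX_DECLARED_SIZE: u64 = 1 << 30;

/// Rejects missing or oversized declared sizes before any bytes are fetched.
pub fn fetch_cap(declared: Option<u64>) -> Result<u64, String> {
    match declared {
        None => Err("no declared size (legacy entry)".to_string()),
        Some(n) if n > MAX_DECLARED_SIZE => Err(format!("declared size {n} exceeds ceiling")),
        Some(n) => Ok(n),
    }
}

/// Stand-in digest for the sketch only; NOT blake3 and NOT cryptographic.
pub fn digest(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

/// Stages `bytes` in a tmp file next to `dest`, re-reads and verifies them,
/// then renames the tmp file into place. Locking is omitted in this sketch.
pub fn stage_verify_publish(bytes: &[u8], cap: u64, expected: u64, dest: &Path) -> io::Result<()> {
    if dest.exists() {
        return Ok(()); // another caller already published this blob
    }
    if bytes.len() as u64 > cap {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "body exceeds declared size"));
    }
    let tmp = dest.with_extension(format!("tmp.{}", std::process::id()));
    fs::write(&tmp, bytes)?;
    // Verify what actually landed on disk, not the in-memory buffer.
    let staged = fs::read(&tmp)?;
    if digest(&staged) != expected {
        let _ = fs::remove_file(&tmp);
        return Err(io::Error::new(io::ErrorKind::InvalidData, "checksum mismatch"));
    }
    // Rename within one directory, so readers never see a partial blob.
    fs::rename(&tmp, dest)
}
```

Verifying the staged file before the rename means a corrupted or tampered transfer never becomes visible under the blob's final name.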
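Cache layout (per $HOME): content-addressed blobs at blobs/{blake3}, with per-file symlinks at by-slug/{slug}/{mdcsum}/{filename}.
Per-(slug, mdcsum) symlinks isolate worker sets that share a model name but publish different content. The flock layer is per-OFD (open file description), so it serializes tasks within one process as well as separate processes: multiple frontends sharing $HOME collapse concurrent fetches to a single download per blake3.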
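The path scheme can be expressed as three small helpers. The `blobs/{blake3}` and `by-slug/{slug}/{mdcsum}/{filename}` shapes and the `<blob>.lock` suffix come from this document; the function names and the `root` parameter are hypothetical, since the cache root directory is not specified here.

```rust
use std::path::{Path, PathBuf};

/// Content-addressed blob: <root>/blobs/<blake3 hex>.
pub fn blob_path(root: &Path, blake3_hex: &str) -> PathBuf {
    root.join("blobs").join(blake3_hex)
}

/// Lock file that guards one blob: <blob>.lock.
pub fn blob_lock_path(root: &Path, blake3_hex: &str) -> PathBuf {
    let mut name = blob_path(root, blake3_hex).into_os_string();
    name.push(".lock");
    PathBuf::from(name)
}

/// Per-(slug, mdcsum) entry: <root>/by-slug/<slug>/<mdcsum>/<filename>.
pub fn slug_link_path(root: &Path, slug: &str, mdcsum: &str, filename: &str) -> PathBuf {
    root.join("by-slug").join(slug).join(mdcsum).join(filename)
}
```

Two worker sets that publish the same slug with different content get different `mdcsum` directories, while identical files still share one blob.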
5. Startup and registration sequence
sequenceDiagram
participant W as Worker
participant SS as system_status_server
participant ETCD as MDC store
participant F as Frontend
participant CACHE as MDC cache
Note over W: register_model(self_host_metadata=True)
W->>W: build typed-enum slots, populate checksum + size
W->>SS: register (slug, filename) to on-disk path per slot
W->>W: move_to_self_host(base) rewrites cf.path to http URL
W->>ETCD: publish MDC (size populated → is_new_format())
Note over F: discovery watcher picks up MDC
F->>F: download_config — is_new_format()? yes
loop each populated CheckedFile slot
F->>F: derive uri (cf.url() or synth file://)
F->>CACHE: flock blob lock, check if cached
alt cache miss
alt http(s)
F->>SS: GET /v1/metadata/{slug}/{filename}
SS-->>F: bytes
else hf
F->>F: hub::from_hf to local snapshot
else file
F->>F: read local path
end
F->>F: stage to tmp, rebuild CheckedFile::from_disk, compare checksum
F->>CACHE: atomic rename tmp to blobs/{blake3}
end
F->>CACHE: symlink_force by-slug/{slug}/{mdcsum}/{filename}
end
F->>F: update_dir to per-mdcsum dir, load tokenizer + config
F-->>F: model registered, /v1/models reflects it
6. Cross-version compatibility
The change is additive. CheckedFile.size is Option<u64> with #[serde(default)], so old MDC bytes deserialize cleanly with size = None. The legacy fields and the download_config() → hub::from_hf fallback path remain in place.
| Worker | Frontend | Behavior |
| --- | --- | --- |
| Old | Old | Unchanged. |
| New, self_host=False | Old | Old frontend ignores size; reads the same typed-enum slots populated as today. |
| Old | New | New frontend sees is_new_format() == false; falls through to legacy download_config(). |
| New, self_host=False | New | New frontend resolves each CheckedFile via its URI (typically hf:// or file://), verifies blake3, caches. |
| New, self_host=True | New | New frontend fetches from http://<worker>/v1/metadata/..., verifies blake3, caches. |
| New, self_host=True | Old | Old frontend uses legacy source_path; resolution depends on whether the path is reachable from the frontend pod (same as today's behavior with that worker config). |
7. Relationship to prior issues
This DEP addresses the same problem class @Kaonael flagged in Kaonael/dynamo#3 (private/custom models that don't exist on HuggingFace) — the worker self-host path makes the frontend independent of HF entirely when the worker opts in. PR #5102's narrow fix stays untouched and is preserved on the MDC for the public-model --served-model-name case during the cross-version window.
Authors
@tanmayv25, @grahamking, @nicolasnoble, @biswapanda, @Kaonael
Area
frontend
8. Code references
- lib/llm/src/common/checked_file.rs: CheckedFile (path, checksum, size); from_disk mmap-backed blake3.
- lib/llm/src/model_card.rs: is_new_format, iter_metadata_files, download_config, resolve_metadata_files, resolve_uri, BlobLock, stage_and_rename, symlink_force.
- lib/llm/src/local_model.rs: move_to_self_host, LocalModelBuilder.self_host_metadata.
- lib/llm/src/discovery/watcher.rs: frontend-side download_config and tokenizer load.
- lib/runtime/src/system_status_server.rs: /v1/metadata route; /v1/loras* precedent.
- lib/runtime/src/distributed.rs: system_status_server spawn site, system_status_server_info() accessor; metadata_artifacts registry.
- lib/llm/src/hub.rs: from_hf (HuggingFace + ModelExpress transport).
- lib/runtime/src/utils/ip_resolver.rs: get_local_ip_for_advertise() shared with the request plane.
- lib/bindings/python/rust/lib.rs: PyO3 register_model signature.
- components/src/dynamo/{vllm,sglang,trtllm}/...: worker registration call sites.
- deploy/operator/internal/dynamo/component_worker.go: worker env block (existing DYN_SYSTEM_PORT plumbing is sufficient).