DEP (light): Worker self-hosted metadata files #8749

@nnshah1

Updated 2026-04-28 to reflect the final design after review. The original Vec<ArtifactRef> proposal has been replaced by a single additive field (CheckedFile.size: Option<u64>) on the existing typed-enum slots. Implementation: PR #8754.

Authors

@tanmayv25, @grahamking, @nicolasnoble, @biswapanda, @Kaonael

Area

frontend

Summary

Enable workers to self-host the model metadata files the frontend needs for preprocessing (config.json, tokenizer.json / tokenizer.model, tokenizer_config.json, chat_template.{jinja,json}, generation_config.json).

Workers serve those files at GET /v1/metadata/{model_slug}/{filename} on the existing system_status_server, and rewrite the per-file CheckedFile URI on the MDC to that route. The MDC's typed-enum CheckedFile slots are the single source of identity — every slot already carries a blake3 checksum and a path: Either<PathBuf, Url> capable of holding any URI scheme. The only new field is an additive CheckedFile.size: Option<u64> that doubles as the per-file fetch cap and the new/legacy discriminant.

Motivation

The frontend today resolves metadata files via one of two paths: a local read (when frontend and worker share a filesystem, or both pods happen to have the same path on disk) or hub::from_hf(source_path) (which transparently uses ModelExpress as a regional cache when MODEL_EXPRESS_URL is set, otherwise hits HuggingFace directly). Both paths fail for private or custom models that aren't published anywhere reachable — ModelExpress can't serve a file that doesn't exist upstream, and HF can't either. They also fail when source_path isn't a valid HF slug (e.g. --served-model-name=cosmos-reason1-7b): the HF download errors and the model never registers with the frontend. curl /v1/models returns empty data; the failure is visible in frontend pod logs, but /v1/models itself gives operators no signal that registration was attempted.

External contributor @Kaonael's DEP draft (Kaonael/dynamo#3, "DEP: P2P model config file delivery from workers to frontends") calls out this gap explicitly for private/custom models that don't exist on HuggingFace — the same case where the HF fallback is not just slow but impossible. PR #5102 (by @yurekami) was a narrow fix to a public-model instance of the same broader problem: frontend cannot get metadata in any deployment without one of those two paths working. This DEP aligns with @Kaonael's diagnosis with a simplified design.

Proposal

1. One additive field on CheckedFile

pub struct CheckedFile {
    path: Either<PathBuf, Url>,           // unchanged
    checksum: Checksum,                   // unchanged

    #[serde(default, skip_serializing_if = "Option::is_none")]
    size: Option<u64>,                    // NEW
}

The path field already supports file://, http(s)://, hf://, mx:// — every typed-enum CheckedFile slot on the MDC (model_info, tokenizer, prompt_formatter, chat_template_file, gen_config) becomes a URI carrier without any schema churn. The existing move_to_url(...) helper (already used in origin for hf:// rewriting) does the work.

size is the only new field. It's Option<u64>, #[serde(default)], and:

  • Pre-PR MDCs deserialize cleanly with size = None.
  • New MDCs from PR-aware workers carry size = Some(N).
  • is_new_format() returns true iff every populated slot carries size = Some(N); size thus doubles as the new/legacy discriminant.

No files: Vec<ArtifactRef> field, no ArtifactRole enum, no parallel checksum store. The typed-enum slots already are the list, and the existing CheckedFile.checksum is already the identity. Drift between sources of truth is structurally impossible.
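The discriminant rule above can be illustrated with a minimal, std-only sketch. The `Slots` container and the simplified `CheckedFile` (a plain `PathBuf` and a `String` checksum instead of `Either<PathBuf, Url>` and a blake3 `Checksum`) are assumptions made for illustration, not the actual types in lib/llm.

```rust
use std::path::PathBuf;

/// Simplified stand-in for CheckedFile (assumption: the real struct stores
/// `path: Either<PathBuf, Url>` and a blake3 `Checksum`).
#[derive(Debug, Clone)]
pub struct CheckedFile {
    pub path: PathBuf,
    pub checksum: String,
    /// None on MDCs written before the change, Some(n) afterwards.
    pub size: Option<u64>,
}

/// Hypothetical container for the typed-enum slots named in the proposal.
#[derive(Debug, Default)]
pub struct Slots {
    pub model_info: Option<CheckedFile>,
    pub tokenizer: Option<CheckedFile>,
    pub prompt_formatter: Option<CheckedFile>,
    pub chat_template_file: Option<CheckedFile>,
    pub gen_config: Option<CheckedFile>,
}

impl Slots {
    /// Collect references to the slots that are actually populated.
    pub fn populated(&self) -> Vec<&CheckedFile> {
        vec![
            &self.model_info,
            &self.tokenizer,
            &self.prompt_formatter,
            &self.chat_template_file,
            &self.gen_config,
        ]
        .into_iter()
        .flatten()
        .collect()
    }

    /// True iff every populated slot carries `size = Some(_)`.
    /// Note: with zero populated slots this is vacuously true; the real
    /// implementation may treat that case differently.
    pub fn is_new_format(&self) -> bool {
        self.populated().iter().all(|cf| cf.size.is_some())
    }
}
```

A single legacy slot (size = None) is enough to send the whole MDC down the legacy path, which keeps mixed cards from being partially resolved through the new pipeline.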

2. Self-host opt-in at register_model

A new boolean kwarg, default False:

await register_model(
    model_input, model_type, endpoint, model_path, model_name,
    self_host_metadata=True,  # opt in; defaults to False
    # ... remaining keyword arguments unchanged
)

Backed by a setter on LocalModelBuilder (lib/llm/src/local_model.rs) and threaded through the PyO3 binding (lib/bindings/python/rust/lib.rs).

When self_host_metadata=True, the runtime constructs the worker base URL from existing system_status_server info — no new env var:

let info = drt.system_status_server_info()
    .context("self_host_metadata=True requires DYN_SYSTEM_PORT to be set")?;
let host = match drt.config().system_host.as_str() {
    "0.0.0.0" | "::" | "[::]" => {
        dynamo_runtime::utils::ip_resolver::get_local_ip_for_advertise()
    }
    other => other.to_string(),
};
format!("http://{host}:{}", info.port())

get_local_ip_for_advertise() is the shared resolver used by the TCP request plane — IPv4 / IPv6 / URL bracketing handled uniformly. No bespoke implementation in this path.

move_to_self_host(base, drt) walks every populated typed-enum slot, derives the filename from the existing CheckedFile.path, calls cf.move_to_url(<base>/v1/metadata/<slug>/<filename>), populates cf.size from on-disk metadata, and registers the (slug, filename) → on-disk path triple in a process-local registry that the route handler reads from. The worker does not stage or copy bytes — files stay where they are.

If system_status_server is not running (DYN_SYSTEM_PORT unset), registration fails immediately with a clear error.
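The per-slot rewrite described above might look roughly like the following sketch. The function name `self_host_slot`, the `Registry` alias, and the `Rewritten` return type are hypothetical; the real move_to_self_host operates on CheckedFile in place and takes the runtime handle.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical shape of the process-local registry:
/// (slug, filename) -> on-disk path.
pub type Registry = HashMap<(String, String), PathBuf>;

/// What a slot carries after the rewrite (simplified for illustration).
#[derive(Debug, PartialEq)]
pub struct Rewritten {
    pub url: String,
    pub size: u64,
}

/// Derive the filename from the existing path, record the on-disk location
/// in the registry, and return the self-host URL plus the on-disk size.
/// No bytes are copied; the file stays where it is.
pub fn self_host_slot(
    base: &str,
    slug: &str,
    on_disk: &Path,
    registry: &mut Registry,
) -> io::Result<Rewritten> {
    let filename = on_disk
        .file_name()
        .and_then(|n| n.to_str())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no UTF-8 filename"))?
        .to_string();
    // Size comes from filesystem metadata, not from reading the file.
    let size = fs::metadata(on_disk)?.len();
    registry.insert((slug.to_string(), filename.clone()), on_disk.to_path_buf());
    let url = format!(
        "{}/v1/metadata/{}/{}",
        base.trim_end_matches('/'),
        slug,
        filename
    );
    Ok(Rewritten { url, size })
}
```

Keying the registry on (slug, filename) rather than on a path means the URL never exposes where the file lives on the worker's disk.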

3. Route on system_status_server

A single filename-keyed route, mounted unconditionally at server spawn time alongside /v1/loras* (lib/runtime/src/system_status_server.rs):

GET /v1/metadata/{model_slug}/{filename}

The handler reads from the process-local registry. If the (model_slug, filename) pair isn't registered, it returns 404. The response body is the raw file bytes; the consumer verifies blake3 against the MDC entry, so the transport is untrusted by construction. This mirrors the existing /v1/loras/{*lora_name} precedent in the same server.
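As a rough illustration of the lookup behind that route, the sketch below models the handler as a plain function returning a status code and body. The name `serve_metadata`, the `Registry` alias, and the 500 response on a read failure are assumptions; the actual handler is an HTTP route on system_status_server.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::PathBuf;

/// Hypothetical registry shape: (slug, filename) -> on-disk path.
pub type Registry = HashMap<(String, String), PathBuf>;

/// Resolve a request for /v1/metadata/{slug}/{filename}.
/// Returns (HTTP status, body bytes).
pub fn serve_metadata(registry: &Registry, slug: &str, filename: &str) -> (u16, Vec<u8>) {
    let key = (slug.to_string(), filename.to_string());
    match registry.get(&key) {
        // Unregistered pair: 404, as the proposal specifies.
        None => (404, Vec::new()),
        Some(path) => match fs::read(path) {
            Ok(bytes) => (200, bytes),
            // Assumption: a registered file that can no longer be read is
            // reported as a server error.
            Err(_) => (500, Vec::new()),
        },
    }
}
```

Because the request values are only used as a map key and never joined onto a filesystem path, a filename such as `../secret` simply misses the registry and yields 404.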

4. Frontend resolve / verify / cache

download_config short-circuits on is_new_format() and routes through resolve_metadata_files:

if self.is_new_format() {
    return self.resolve_metadata_files().await;   // every scheme rechecks blake3
}
if self.has_local_files() { return Ok(()); }      // legacy short-circuit
let p = crate::hub::from_hf(self.source_path(), true).await?;
self.update_dir(&p);
Ok(())

For each CheckedFile, the frontend derives a URI:

  • cf.url() if the worker called move_to_url/move_to_self_host (http://, hf://, …).
  • otherwise synthesize file://<canonicalized> from the local PathBuf (shared-mount workers that never rewrote).

So all three transport classes — self-hosted http, HF-resolved, and shared-mount file:// — converge on a single resolve_uri(client, uri, expected_checked_file, dest) pipeline.
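The URI derivation can be sketched as below. The `Location` enum is a simplified stand-in for `Either<PathBuf, Url>`, and `derive_uri` is a hypothetical name. The file:// construction assumes a Unix-style absolute path and does no percent-encoding, which a production implementation would likely need.

```rust
use std::io;
use std::path::PathBuf;

/// Simplified stand-in for `Either<PathBuf, Url>` (assumption).
pub enum Location {
    Local(PathBuf),
    Remote(String),
}

/// Return the URI the resolver should fetch from.
pub fn derive_uri(loc: &Location) -> io::Result<String> {
    match loc {
        // The worker already rewrote the path (http://, hf://, ...).
        Location::Remote(url) => Ok(url.clone()),
        // Shared-mount case: synthesize file://<canonicalized path>.
        // canonicalize() fails if the file does not exist locally.
        Location::Local(path) => {
            let canon = path.canonicalize()?;
            Ok(format!("file://{}", canon.display()))
        }
    }
}
```

Canonicalizing before building the file:// URI means two workers that reference the same file through different relative paths or symlinks produce the same URI.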

resolve_uri first cap-checks expected.size() (with a 1 GiB absolute ceiling on the declared value), acquires flock(LOCK_EX) on <blob>.lock, and re-checks whether the blob already exists. Then, inside an atomic-publish primitive (stage_and_rename), it stages the bytes into a per-call tmp file using scheme-specific logic, rebuilds CheckedFile::from_disk(tmp) (the same mmap-backed blake3 path the worker used at registration), and compares the resulting checksum to the expected one. On a match, it atomically renames tmp into the cache.
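The cap check and the stage, verify, rename sequence might be sketched as follows. This is std-only, so it substitutes a 64-bit FNV-1a hash for blake3 (a stand-in that is not cryptographically secure) and omits the flock step entirely. The signatures of `fetch_cap` and `stage_and_rename` here are illustrative assumptions, not the real ones in model_card.rs.

```rust
use std::fs;
use std::path::Path;

/// Absolute ceiling on the declared size (1 GiB).
pub const MAX_DECLARED_SIZE: u64 = 1 << 30;

/// Stand-in checksum (FNV-1a, 64-bit). The real pipeline uses blake3.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Validate the declared size and return it as the per-file fetch cap.
pub fn fetch_cap(declared: Option<u64>) -> Result<u64, String> {
    match declared {
        None => Err("missing size: not a new-format entry".to_string()),
        Some(n) if n > MAX_DECLARED_SIZE => {
            Err(format!("declared size {} exceeds the 1 GiB ceiling", n))
        }
        Some(n) => Ok(n),
    }
}

/// Stage bytes into `tmp`, verify them, and publish to `blob` by rename.
/// On any failure nothing is published and `tmp` is removed if it was written.
pub fn stage_and_rename(
    bytes: &[u8],
    expected_sum: u64,
    expected_size: Option<u64>,
    tmp: &Path,
    blob: &Path,
) -> Result<(), String> {
    let cap = fetch_cap(expected_size)?;
    if bytes.len() as u64 > cap {
        return Err(format!("body of {} bytes exceeds cap {}", bytes.len(), cap));
    }
    fs::write(tmp, bytes).map_err(|e| e.to_string())?;
    // Re-read what actually landed on disk before trusting it.
    let staged = fs::read(tmp).map_err(|e| e.to_string())?;
    if fnv1a64(&staged) != expected_sum {
        let _ = fs::remove_file(tmp);
        return Err("checksum mismatch".to_string());
    }
    // rename() within one filesystem is atomic, so readers see either
    // no blob or the complete, verified blob.
    fs::rename(tmp, blob).map_err(|e| e.to_string())
}
```

Verifying the staged file rather than the in-memory buffer checks the same bytes that will be published, which matches the proposal's choice to rebuild the checksum from disk.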

Cache layout (per $HOME):

~/.cache/dynamo/mdc/
  blobs/<blake3-hex>                          # content-addressed
  by-slug/<slug>/<mdcsum>/<filename>          # symlinks → blobs

Per-(slug, mdcsum) symlinks isolate worker sets that share a model name but publish different content. The flock is held per open file description (OFD), so it serializes both concurrent tasks within one process and separate processes: multiple frontends sharing $HOME collapse concurrent fetches into a single download per blake3.
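A small sketch of the layout helpers, assuming a Unix target. The function names `blob_path` and `slug_link_path` are hypothetical, and this `symlink_force` (create a temporary link, then rename it over the destination) is one plausible way to implement the helper of the same name, not necessarily how the real one works.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::{Path, PathBuf};

/// <root>/blobs/<blake3-hex>: content-addressed storage.
pub fn blob_path(root: &Path, blake3_hex: &str) -> PathBuf {
    root.join("blobs").join(blake3_hex)
}

/// <root>/by-slug/<slug>/<mdcsum>/<filename>: symlink into blobs/.
pub fn slug_link_path(root: &Path, slug: &str, mdcsum: &str, filename: &str) -> PathBuf {
    root.join("by-slug").join(slug).join(mdcsum).join(filename)
}

/// Create or replace `link` so that it points at `target`.
/// The link is first created under a temporary name and then renamed,
/// so an existing link is swapped rather than briefly missing.
pub fn symlink_force(target: &Path, link: &Path) -> io::Result<()> {
    if let Some(parent) = link.parent() {
        fs::create_dir_all(parent)?;
    }
    let mut tmp_name = link.as_os_str().to_os_string();
    tmp_name.push(format!(".tmp-{}", std::process::id()));
    let tmp = PathBuf::from(tmp_name);
    // Clear any leftover from an earlier interrupted attempt.
    let _ = fs::remove_file(&tmp);
    symlink(target, &tmp)?;
    fs::rename(&tmp, link)
}
```

With this layout, two MDCs that reference an identical tokenizer.json share one blob, while each (slug, mdcsum) directory presents the flat set of filenames that the tokenizer and config loaders expect.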

5. Startup and registration sequence

sequenceDiagram
    participant W as Worker
    participant SS as system_status_server
    participant ETCD as MDC store
    participant F as Frontend
    participant CACHE as MDC cache

    Note over W: register_model(self_host_metadata=True)
    W->>W: build typed-enum slots, populate checksum + size
    W->>SS: register (slug, filename) to on-disk path per slot
    W->>W: move_to_self_host(base) rewrites cf.path to http URL
    W->>ETCD: publish MDC (size populated → is_new_format())

    Note over F: discovery watcher picks up MDC
    F->>F: download_config — is_new_format()? yes
    loop each populated CheckedFile slot
        F->>F: derive uri (cf.url() or synth file://)
        F->>CACHE: flock blob lock, check if cached
        alt cache miss
            alt http(s)
                F->>SS: GET /v1/metadata/{slug}/{filename}
                SS-->>F: bytes
            else hf
                F->>F: hub::from_hf to local snapshot
            else file
                F->>F: read local path
            end
            F->>F: stage to tmp, rebuild CheckedFile::from_disk, compare checksum
            F->>CACHE: atomic rename tmp to blobs/{blake3}
        end
        F->>CACHE: symlink_force by-slug/{slug}/{mdcsum}/{filename}
    end
    F->>F: update_dir to per-mdcsum dir, load tokenizer + config
    F-->>F: model registered, /v1/models reflects it

6. Cross-version compatibility

The change is additive. CheckedFile.size is Option<u64> with #[serde(default)], so old MDC bytes deserialize cleanly with size = None. The legacy fields and the download_config() → hub::from_hf fallback path remain in place.

| Worker | Frontend | Behavior |
|---|---|---|
| Old | Old | Unchanged. |
| New, self_host=False | Old | Old frontend ignores size; reads the same typed-enum slots populated as today. |
| Old | New | New frontend sees is_new_format() == false; falls through to legacy download_config(). |
| New, self_host=False | New | New frontend resolves each CheckedFile via its URI (typically hf:// or file://), verifies blake3, caches. |
| New, self_host=True | New | New frontend fetches from http://<worker>/v1/metadata/..., verifies blake3, caches. |
| New, self_host=True | Old | Old frontend uses legacy source_path; resolution depends on whether the path is reachable from the frontend pod (same as today's behavior with that worker config). |

7. Relationship to prior issues

This DEP addresses the same problem class @Kaonael flagged in Kaonael/dynamo#3 (private/custom models that don't exist on HuggingFace) — the worker self-host path makes the frontend independent of HF entirely when the worker opts in. PR #5102's narrow fix stays untouched and is preserved on the MDC for the public-model --served-model-name case during the cross-version window.

8. Code references

  • lib/llm/src/common/checked_file.rs: CheckedFile (path, checksum, size); from_disk mmap-backed blake3.
  • lib/llm/src/model_card.rs: is_new_format, iter_metadata_files, download_config, resolve_metadata_files, resolve_uri, BlobLock, stage_and_rename, symlink_force.
  • lib/llm/src/local_model.rs: move_to_self_host, LocalModelBuilder.self_host_metadata.
  • lib/llm/src/discovery/watcher.rs: frontend-side download_config and tokenizer load.
  • lib/runtime/src/system_status_server.rs: /v1/metadata route; /v1/loras* precedent.
  • lib/runtime/src/distributed.rs: system_status_server spawn site, system_status_server_info() accessor; metadata_artifacts registry.
  • lib/llm/src/hub.rs: from_hf (HuggingFace + ModelExpress transport).
  • lib/runtime/src/utils/ip_resolver.rs: get_local_ip_for_advertise() shared with the request plane.
  • lib/bindings/python/rust/lib.rs: PyO3 register_model signature.
  • components/src/dynamo/{vllm,sglang,trtllm}/...: worker registration call sites.
  • deploy/operator/internal/dynamo/component_worker.go: worker env block (existing DYN_SYSTEM_PORT plumbing is sufficient).
