Skip to content

feat(self-host): worker hosts metadata files via system_status_server (gh-8749)#8855

Merged
nnshah1 merged 7 commits into
mainfrom
nnshah1/gh-8749-worker-self-host-route
Apr 30, 2026
Merged

feat(self-host): worker hosts metadata files via system_status_server (gh-8749)#8855
nnshah1 merged 7 commits into
mainfrom
nnshah1/gh-8749-worker-self-host-route

Conversation

@nnshah1

@nnshah1 nnshah1 commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

First slice of #8749. Companion to #8754, which stays open as the integration capstone — once this lands, #8754 rebases against main and the next slice (frontend verify-and-cache) extracts cleanly.

Scope of this PR

Worker side only. Opt-in via DYN_SELF_HOST_METADATA=true (default OFF) or register_model(..., self_host_metadata=True). When opted in, LocalModel::attach() walks the MDC's typed CheckedFile slots, registers each local-disk path in a process-local registry on DistributedRuntime, and rewrites cf.path to http://<worker>/v1/metadata/<slug>/<filename>. URL slots (hf://, etc.) are left alone so existing transports keep working.

Three commits:

  1. feat(runtime)MetadataArtifactRegistry + DistributedRuntime::metadata_artifacts() accessor + always-mounted GET /v1/metadata/{slug}/{*filename} route on system_status_server (empty registry → 404).
  2. feat(checked-file) — wire-format addition: CheckedFile.size: Option<u64>, populated by from_disk, additive serde so legacy MDC bytes deserialize cleanly with size = None.
  3. feat(self-host) — worker producer: LocalModelBuilder::self_host_metadata setter, move_to_self_host, self_host_base_url, DYN_SELF_HOST_METADATA env (default OFF), register_model(..., self_host_metadata=...) Python kwarg, plus iter_metadata_files{,_mut} slot iterators on the MDC.

Why default OFF

The frontend verify-and-cache pipeline ships in a follow-up. Without it, a frontend seeing an http:// URL in a custom-template slot trips the update_dir is_custom exclusion and downstream from_mdc fails on path() == None. Default-OFF means existing deployments are unaffected; opt-in users with custom templates get a clean error, not silent breakage. Default flips to ON in the capstone PR once the consumer is in place.

What's NOT here

Deferred to follow-up PRs (tracked under #8749 / #8754):

  • Frontend resolve_metadata_files + BlobLock + stage_and_rename + verify-and-cache.
  • update_dir is_custom exclusion fix.
  • Mocker e2e + multi-replica integration tests.
  • Default-ON flip.

Tests

4 unit tests:

  • metadata_registry::tests — register/get roundtrip, unregister scope, Arc<RwLock> aliasing across clones.
  • local_model::env_self_host_metadata_tests::env_default_parsing — exhaustive truthy/falsy/empty/garbage env-var parsing (serial_test gated).

No new tests for the wire-format size field — pre-existing CheckedFile round-trip tests cover it implicitly via from_disk. Worker→frontend e2e is intentionally deferred to the capstone PR.

🤖 Generated with Claude Code


Open in Devin Review

Summary by CodeRabbit

  • New Features
    • Users can now optionally enable self-hosting of model metadata artifacts during model registration
    • New metadata serving endpoint provides access to registered model metadata files with appropriate HTTP responses
    • File size information is now tracked for model metadata artifacts

nnshah1 and others added 3 commits April 29, 2026 14:03
Backs worker self-hosting of MDC files. Process-local
`(model_slug, filename) -> PathBuf` registry on `DistributedRuntime`.
`system_status_server` always mounts `GET /v1/metadata/{slug}/{*filename}`;
empty registry returns 404. Consumers blake3-verify the body.

No producer wiring yet — that ships in a follow-up commit on this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
Wire-format addition. `CheckedFile.size: Option<u64>` populated by
`from_disk` and exposed via `size()`. Serde additive (`#[serde(default)]`)
so legacy MDC bytes (no size field) deserialize cleanly with
`size = None`. Used downstream as a new/legacy discriminant and as the
per-file fetch cap by the frontend resolve pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
`LocalModelBuilder::self_host_metadata(bool)` setter (default OFF). When
opted in, `LocalModel::attach()` walks the MDC's typed `CheckedFile`
slots, registers each local-disk path in the runtime's metadata-artifact
registry, and rewrites `cf.path` to
`http://<worker>/v1/metadata/<slug>/<filename>`. URL slots (`hf://`, etc.)
are left alone so existing transports keep working.

Default-OFF until the frontend verify-and-cache pipeline lands —
opting in early on a deployment with custom chat templates would trip
`update_dir`'s `is_custom` exclusion and surface as a downstream
`from_mdc` error. Override via `DYN_SELF_HOST_METADATA=true` (truthy
values opt in) or `register_model(..., self_host_metadata=True)`.

Adds `iter_metadata_files{,_mut}` slot iterators on
`ModelDeploymentCard` (and matching `pf_checked_file{,_mut}` dispatch
helpers) — needed by `move_to_self_host` and reused by the frontend
resolver in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
@nnshah1 nnshah1 requested a review from a team as a code owner April 29, 2026 21:04
@github-actions github-actions Bot added the feat label Apr 29, 2026
@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

The changes introduce self-hosted metadata artifacts functionality, allowing models to serve metadata files through a registered HTTP endpoint. The system now tracks file sizes, manages artifact registration via a thread-safe registry, provides builder methods to control the feature, and exposes an HTTP endpoint for retrieving self-hosted files.

Changes

Cohort / File(s) Summary
Python Binding
lib/bindings/python/rust/lib.rs
Added optional self_host_metadata parameter to register_model function, passed to LocalModelBuilder when provided.
File Metadata Tracking
lib/llm/src/common/checked_file.rs
Added optional size field to CheckedFile struct; populated from filesystem metadata in from_disk, exposed via size() accessor, and conditionally serialized/deserialized.
Local Model Self-Hosting
lib/llm/src/local_model.rs
Implemented self-hosting mechanism via LocalModelBuilder.self_host_metadata() method, environment variable fallback, and conditional rewriting of metadata file slots to HTTP URLs during model attachment; includes artifact cleanup and helper for base URL resolution.
Model Card Iteration APIs
lib/llm/src/model_card.rs
Added immutable and mutable iterator methods to ModelDeploymentCard for accessing all metadata CheckedFile references across fixed field slots.
Runtime Registry Infrastructure
lib/runtime/src/lib.rs, lib/runtime/src/distributed.rs, lib/runtime/src/metadata_registry.rs
Added new MetadataArtifactRegistry struct with thread-safe HashMap-backed storage; DistributedRuntime now owns and exposes registry; module exported from crate root.
System Status Server Endpoint
lib/runtime/src/system_status_server.rs
Added /v1/metadata/{model_slug}/{filename} HTTP endpoint that retrieves registered artifacts from the registry, reads files asynchronously, and returns appropriate status codes (200, 404, 500).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main feature: worker-side self-hosting of metadata files via the system_status_server endpoint.
Description check ✅ Passed The description provides comprehensive coverage of scope, design rationale, implementation strategy across three commits, deferred work, and test coverage—exceeding the template requirements.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/local_model.rs`:
- Around line 596-601: Detach currently doesn't clean up registry entries,
relying on the optional clear_self_hosted_artifacts(&self, drt) call, which is
easy to miss; modify detach_from_endpoint(...) to ensure
metadata_artifacts().unregister_model(...) is invoked as part of the detach path
(or persist the model slug/identity before teardown and use that persisted slug
during detach) so the /v1/metadata entries are removed automatically;
specifically, call the existing clear_self_hosted_artifacts implementation (or
inline its unregister_model(self.card.slug().as_ref()) logic) from
detach_from_endpoint and ensure slug is captured/owned before any state is
dropped so unregister can always run.

In `@lib/runtime/src/metadata_registry.rs`:
- Around line 27-43: The current entries map key (model_slug, filename) is not
unique across registrations so register/get/unregister_model must be changed to
include a per-registration discriminator or ref-counting: update the storage to
map (model_slug, filename) -> HashMap<registration_id, PathBuf> (or to
(model_slug, filename, registration_id) -> PathBuf plus a derived view), modify
register(&self, model_slug, filename, path, registration_id) to insert under the
registration_id, modify get(&self, model_slug, filename, registration_id_opt) to
resolve the correct registration (or a canonical/most-recent entry when
registration_id is absent), and change unregister_model(&self, model_slug,
registration_id_opt) to only remove entries for that specific registration_id
(or decrement/ref-count and only remove when count reaches zero); adjust all
call sites to supply the discriminator or rely on the new resolution semantics
and keep tracing::debug calls reflecting the registration_id.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fe0956cc-9482-4e92-aa4f-ead7f87cecef

📥 Commits

Reviewing files that changed from the base of the PR and between cc1af6a and 50da344.

📒 Files selected for processing (8)
  • lib/bindings/python/rust/lib.rs
  • lib/llm/src/common/checked_file.rs
  • lib/llm/src/local_model.rs
  • lib/llm/src/model_card.rs
  • lib/runtime/src/distributed.rs
  • lib/runtime/src/lib.rs
  • lib/runtime/src/metadata_registry.rs
  • lib/runtime/src/system_status_server.rs

Comment thread lib/llm/src/local_model.rs
Comment thread lib/runtime/src/metadata_registry.rs Outdated
Per CodeRabbit. `(slug, filename)` was not unique across registrations
sharing a slug — base + LoRA(s) on top of the same model collide on
`tokenizer.json` etc., and `unregister_model(slug)` would wipe entries
the other registration still serves.

Changes:
- Registry keyed by `(slug, suffix, filename)`. `suffix` is the LoRA
  slug, or `BASE_SUFFIX = "_base"` for non-LoRA. `unregister(slug, suffix)`
  removes only that registration's entries.
- Route changes from `/v1/metadata/{slug}/{*filename}` to
  `/v1/metadata/{slug}/{suffix}/{*filename}`. URLs are an internal
  contract worker→frontend (worker writes them into MDC's CheckedFiles,
  frontend dereferences), so the change is invisible to users.
- `LocalModel::move_to_self_host` accepts `model_suffix: Option<&str>`
  (already computed in `attach()` for LoRA) and stores it on the
  `LocalModel` so `clear_self_hosted_artifacts` can scope cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
Comment thread lib/llm/src/common/checked_file.rs Outdated
Comment thread lib/llm/src/model_card.rs Outdated
Comment thread lib/runtime/src/distributed.rs Outdated
Comment thread lib/runtime/src/metadata_registry.rs Outdated
- Drop the `path` field doc on `CheckedFile` — the type already says it.
- `iter_metadata_files` returns `Vec<&CheckedFile>` instead of an
  `impl Iterator` chain, matching the `_mut` variant. Avoids the
  generic and is easier to read.
- Update the `metadata_artifacts` field comment to reflect the
  three-segment URL `/v1/metadata/{slug}/{suffix}/{filename}`.
- Rewrite the metadata_registry module doc as prose: what it is
  (worker-side index of metadata files served via http) before how
  it's stored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
Comment thread lib/runtime/src/metadata_registry.rs
Comment thread lib/runtime/src/metadata_registry.rs Outdated
Per Graham's review on PR #8855. We derive `Clone` on
`MetadataArtifactRegistry`, so testing that clone shares state is
testing `std::sync::Arc`, not our code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
Comment thread lib/runtime/src/system_status_server.rs
Comment thread lib/runtime/src/system_status_server.rs
Single test that opts the mocker into self-host
(`DYN_SELF_HOST_METADATA=true`) and `requests.get`s
`/v1/metadata/{slug}/_base/config.json` directly on the worker's
`system_status_server`. Asserts 200 + valid HF config JSON, plus a 404
on an unknown filename. No frontend resolve-and-cache involved — just
the worker producer + HTTP route shipped in this PR.

File scopes to the self-host feature; later PRs will add multi-replica,
LoRA, and frontend verify-and-cache cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
@nnshah1 nnshah1 requested review from a team as code owners April 30, 2026 15:09

@biswapanda biswapanda left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@nnshah1 nnshah1 merged commit 6953ebc into main Apr 30, 2026
92 of 93 checks passed
@nnshah1 nnshah1 deleted the nnshah1/gh-8749-worker-self-host-route branch April 30, 2026 18:55
furionw pushed a commit that referenced this pull request May 2, 2026
…gh-8749) (#8855)

Signed-off-by: nnshah1 <neelays@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nnshah1 added a commit that referenced this pull request May 4, 2026
…pat tests (gh-8749)

PR #8855 added `CheckedFile.size: Option<u64>` as the new/legacy
discriminant for self-hosted metadata. This adds the consumer-side
plumbing the frontend resolve pipeline needs:

  - `Checksum::hash()` — borrows the hex digest. Used as the blob
    filename in the MDC cache (`blobs/<blake3-hex>`).
  - `test_legacy_wire_format_omits_size` — locks in `serde(default)`
    behavior so old MDC bytes deserialize cleanly with `size = None`.
  - `test_new_wire_format_carries_size` — round-trips a populated
    size through serde.
  - Extends `test_checked_file_from_disk` to assert `size` is
    populated (`Some(560)`) and `Checksum::hash()` returns the
    expected hex.

No behavior change for callers; resolve_metadata_files (next commit)
is the first consumer.

Refs #8749 #8855

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants