feat(self-host): worker hosts metadata files via system_status_server (gh-8749)#8855
Conversation
Backs worker self-hosting of MDC files. Process-local
`(model_slug, filename) -> PathBuf` registry on `DistributedRuntime`.
`system_status_server` always mounts `GET /v1/metadata/{slug}/{*filename}`;
empty registry returns 404. Consumers blake3-verify the body.
No producer wiring yet — that ships in a follow-up commit on this branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
Wire-format addition. `CheckedFile.size: Option<u64>` populated by `from_disk` and exposed via `size()`. Serde additive (`#[serde(default)]`) so legacy MDC bytes (no size field) deserialize cleanly with `size = None`. Used downstream as a new/legacy discriminant and as the per-file fetch cap by the frontend resolve pipeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
`LocalModelBuilder::self_host_metadata(bool)` setter (default OFF). When
opted in, `LocalModel::attach()` walks the MDC's typed `CheckedFile`
slots, registers each local-disk path in the runtime's metadata-artifact
registry, and rewrites `cf.path` to
`http://<worker>/v1/metadata/<slug>/<filename>`. URL slots (`hf://`, etc.)
are left alone so existing transports keep working.
Default-OFF until the frontend verify-and-cache pipeline lands —
opting in early on a deployment with custom chat templates would trip
`update_dir`'s `is_custom` exclusion and surface as a downstream
`from_mdc` error. Override via `DYN_SELF_HOST_METADATA=true` (truthy
values opt in) or `register_model(..., self_host_metadata=True)`.
Adds `iter_metadata_files{,_mut}` slot iterators on
`ModelDeploymentCard` (and matching `pf_checked_file{,_mut}` dispatch
helpers) — needed by `move_to_self_host` and reused by the frontend
resolver in a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
WalkthroughThe changes introduce self-hosted metadata artifacts functionality, allowing models to serve metadata files through a registered HTTP endpoint. The system now tracks file sizes, manages artifact registration via a thread-safe registry, provides builder methods to control the feature, and exposes an HTTP endpoint for retrieving self-hosted files. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/llm/src/local_model.rs`:
- Around line 596-601: Detach currently doesn't clean up registry entries,
relying on the optional clear_self_hosted_artifacts(&self, drt) call, which is
easy to miss; modify detach_from_endpoint(...) to ensure
metadata_artifacts().unregister_model(...) is invoked as part of the detach path
(or persist the model slug/identity before teardown and use that persisted slug
during detach) so the /v1/metadata entries are removed automatically;
specifically, call the existing clear_self_hosted_artifacts implementation (or
inline its unregister_model(self.card.slug().as_ref()) logic) from
detach_from_endpoint and ensure slug is captured/owned before any state is
dropped so unregister can always run.
In `@lib/runtime/src/metadata_registry.rs`:
- Around line 27-43: The current entries map key (model_slug, filename) is not
unique across registrations so register/get/unregister_model must be changed to
include a per-registration discriminator or ref-counting: update the storage to
map (model_slug, filename) -> HashMap<registration_id, PathBuf> (or to
(model_slug, filename, registration_id) -> PathBuf plus a derived view), modify
register(&self, model_slug, filename, path, registration_id) to insert under the
registration_id, modify get(&self, model_slug, filename, registration_id_opt) to
resolve the correct registration (or a canonical/most-recent entry when
registration_id is absent), and change unregister_model(&self, model_slug,
registration_id_opt) to only remove entries for that specific registration_id
(or decrement/ref-count and only remove when count reaches zero); adjust all
call sites to supply the discriminator or rely on the new resolution semantics
and keep tracing::debug calls reflecting the registration_id.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fe0956cc-9482-4e92-aa4f-ead7f87cecef
📒 Files selected for processing (8)
lib/bindings/python/rust/lib.rslib/llm/src/common/checked_file.rslib/llm/src/local_model.rslib/llm/src/model_card.rslib/runtime/src/distributed.rslib/runtime/src/lib.rslib/runtime/src/metadata_registry.rslib/runtime/src/system_status_server.rs
Per CodeRabbit. `(slug, filename)` was not unique across registrations
sharing a slug — base + LoRA(s) on top of the same model collide on
`tokenizer.json` etc., and `unregister_model(slug)` would wipe entries
the other registration still serves.
Changes:
- Registry keyed by `(slug, suffix, filename)`. `suffix` is the LoRA
slug, or `BASE_SUFFIX = "_base"` for non-LoRA. `unregister(slug, suffix)`
removes only that registration's entries.
- Route changes from `/v1/metadata/{slug}/{*filename}` to
`/v1/metadata/{slug}/{suffix}/{*filename}`. URLs are an internal
contract worker→frontend (worker writes them into MDC's CheckedFiles,
frontend dereferences), so the change is invisible to users.
- `LocalModel::move_to_self_host` accepts `model_suffix: Option<&str>`
(already computed in `attach()` for LoRA) and stores it on the
`LocalModel` so `clear_self_hosted_artifacts` can scope cleanup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
- Drop the `path` field doc on `CheckedFile` — the type already says it.
- `iter_metadata_files` returns `Vec<&CheckedFile>` instead of an
`impl Iterator` chain, matching the `_mut` variant. Avoids the
generic and is easier to read.
- Update the `metadata_artifacts` field comment to reflect the
three-segment URL `/v1/metadata/{slug}/{suffix}/{filename}`.
- Rewrite the metadata_registry module doc as prose: what it is
(worker-side index of metadata files served via http) before how
it's stored.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
Per Graham's review on PR #8855. We derive `Clone` on `MetadataArtifactRegistry`, so testing that clone shares state is testing `std::sync::Arc`, not our code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
Single test that opts the mocker into self-host
(`DYN_SELF_HOST_METADATA=true`) and `requests.get`s
`/v1/metadata/{slug}/_base/config.json` directly on the worker's
`system_status_server`. Asserts 200 + valid HF config JSON, plus a 404
on an unknown filename. No frontend resolve-and-cache involved — just
the worker producer + HTTP route shipped in this PR.
File scopes to the self-host feature; later PRs will add multi-replica,
LoRA, and frontend verify-and-cache cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
…pat tests (gh-8749) PR #8855 added `CheckedFile.size: Option<u64>` as the new/legacy discriminant for self-hosted metadata. This adds the consumer-side plumbing the frontend resolve pipeline needs: - `Checksum::hash()` — borrows the hex digest. Used as the blob filename in the MDC cache (`blobs/<blake3-hex>`). - `test_legacy_wire_format_omits_size` — locks in `serde(default)` behavior so old MDC bytes deserialize cleanly with `size = None`. - `test_new_wire_format_carries_size` — round-trips a populated size through serde. - Extends `test_checked_file_from_disk` to assert `size` is populated (`Some(560)`) and `Checksum::hash()` returns the expected hex. No behavior change for callers; resolve_metadata_files (next commit) is the first consumer. Refs #8749 #8855 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
First slice of #8749. Companion to #8754, which stays open as the integration capstone — once this lands, #8754 rebases against main and the next slice (frontend verify-and-cache) extracts cleanly.
Scope of this PR
Worker side only. Opt-in via
DYN_SELF_HOST_METADATA=true(default OFF) orregister_model(..., self_host_metadata=True). When opted in,LocalModel::attach()walks the MDC's typedCheckedFileslots, registers each local-disk path in a process-local registry onDistributedRuntime, and rewritescf.pathtohttp://<worker>/v1/metadata/<slug>/<filename>. URL slots (hf://, etc.) are left alone so existing transports keep working.Three commits:
feat(runtime)—MetadataArtifactRegistry+DistributedRuntime::metadata_artifacts()accessor + always-mountedGET /v1/metadata/{slug}/{*filename}route onsystem_status_server(empty registry → 404).feat(checked-file)— wire-format addition:CheckedFile.size: Option<u64>, populated byfrom_disk, additive serde so legacy MDC bytes deserialize cleanly withsize = None.feat(self-host)— worker producer:LocalModelBuilder::self_host_metadatasetter,move_to_self_host,self_host_base_url,DYN_SELF_HOST_METADATAenv (default OFF),register_model(..., self_host_metadata=...)Python kwarg, plusiter_metadata_files{,_mut}slot iterators on the MDC.Why default OFF
The frontend verify-and-cache pipeline ships in a follow-up. Without it, a frontend seeing an
http://URL in a custom-template slot trips theupdate_diris_customexclusion and downstreamfrom_mdcfails onpath() == None. Default-OFF means existing deployments are unaffected; opt-in users with custom templates get a clean error, not silent breakage. Default flips to ON in the capstone PR once the consumer is in place.What's NOT here
Deferred to follow-up PRs (tracked under #8749 / #8754):
resolve_metadata_files+BlobLock+stage_and_rename+ verify-and-cache.update_diris_customexclusion fix.Tests
4 unit tests:
metadata_registry::tests— register/get roundtrip, unregister scope,Arc<RwLock>aliasing across clones.local_model::env_self_host_metadata_tests::env_default_parsing— exhaustive truthy/falsy/empty/garbage env-var parsing (serial_test gated).No new tests for the wire-format
sizefield — pre-existingCheckedFileround-trip tests cover it implicitly viafrom_disk. Worker→frontend e2e is intentionally deferred to the capstone PR.🤖 Generated with Claude Code
Summary by CodeRabbit