feat(llm): add P2P model config file delivery from workers to frontends #3

Closed
Kaonael wants to merge 1 commit into main from feat-p2p-config-delivery

Kaonael (Owner) commented Mar 29, 2026

Overview:

Add P2P model config file delivery from workers to frontends. Workers automatically serve config files (config.json, tokenizer.json, etc.) over the request plane. Frontends discover and download them from any available worker before falling back to HuggingFace. This enables private models without shared filesystems.

Details:

Problem: Frontends need model config files for preprocessing. The only sources were HuggingFace and ModelExpress, which breaks for private or custom models that are not on HF.

Solution: Workers register a model-config AsyncEngine endpoint alongside inference endpoints. Frontends discover it through standard discovery and download files directly
over the current request plane (HTTP, TCP, or NATS).

Download priority chain (new):

  1. Local HF cache — instant, no network
  2. ModelExpress server — server-only, no hidden HF fallback
  3. P2P from any available worker — parallel downloads, atomic writes
  4. Direct HuggingFace download — last resort
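
The chain above amounts to trying each source in order and stopping at the first one that succeeds. A minimal sketch of that control flow, assuming each step can be modeled as a function that returns an optional directory. `Source`, `Step`, and `resolve_config` are illustrative names invented here, not the PR's actual API (the PR orchestrates the chain in download_config()):

```rust
use std::path::PathBuf;

/// Hypothetical labels for the four sources, in priority order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Source {
    LocalCache,
    ModelExpress,
    P2p,
    HuggingFace,
}

/// Hypothetical step signature: `Some(dir)` on success, `None` to fall through.
type Step = fn(&str) -> Option<PathBuf>;

/// Try each step in order and return the first one that yields a directory,
/// together with the source that produced it.
fn resolve_config(model: &str, chain: &[(Source, Step)]) -> Option<(Source, PathBuf)> {
    chain
        .iter()
        .find_map(|(source, step)| step(model).map(|dir| (*source, dir)))
}
```

Keeping the steps as separate functions mirrors the split of from_hf() described under Implementation: a step that fails simply yields nothing, so no step can silently fall back to another source on its own.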

Implementation:

  • config_endpoint.rs (new) — ModelConfigEngine, request/response types, P2pConfigDownloader trait
  • hub.rs — split from_hf() into get_cached_model_path(), try_model_express_server(), mx_download_direct() for fine-grained fallback control
  • model_card.rs — download_config() orchestrates the 4-step chain; new config_filenames(), verify_local_checksums(), checked_file() accessors
  • watcher.rs — WatcherP2pDownloader implements P2pConfigDownloader with parallel downloads and atomic writes (temp file + rename)
  • local_model.rs — registers model-config endpoint in attach()
  • checked_file.rs — filename() method, update_dir() refactored to use it
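
The "temp file + rename" pattern mentioned for watcher.rs can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's code: `write_atomic` and `TMP_COUNTER` are hypothetical names, and the real downloader may be async and name its temp files differently.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;
use std::sync::atomic::{AtomicU64, Ordering};

/// Distinguishes temp files created by concurrent writers in one process.
static TMP_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Write `contents` to `dest` so that readers never observe a partial file.
fn write_atomic(dest: &Path, contents: &str) -> io::Result<()> {
    let name = dest.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name")
    })?;
    let dir = dest.parent().unwrap_or_else(|| Path::new(""));
    let seq = TMP_COUNTER.fetch_add(1, Ordering::Relaxed);
    // Same directory as `dest`, so the rename stays on one filesystem.
    let tmp = dir.join(format!(
        ".{}.{}.{}.tmp",
        name.to_string_lossy(),
        std::process::id(),
        seq
    ));

    let result = (|| -> io::Result<()> {
        let mut file = File::create(&tmp)?;
        file.write_all(contents.as_bytes())?;
        file.sync_all()?; // flush data to disk before publishing the file
        fs::rename(&tmp, dest) // replaces any existing `dest` in one step
    })();

    if result.is_err() {
        let _ = fs::remove_file(&tmp); // best-effort cleanup of the temp file
    }
    result
}
```

If two discovery events race to write the same config file, each writer fills its own temp file and the last rename wins, so the destination always holds one complete copy.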

Safety:

  • Blake3 checksum verification after P2P download
  • Path traversal sanitization in engine (../etc/passwd → passwd)
  • Atomic file writes prevent corruption from concurrent discovery events
  • String-based response avoids Vec<u8> JSON serialization blowup (4-7x)
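
The last point follows from how serde serializes a Vec<u8> by default: as a JSON array of integers, so every byte costs up to three digits plus a comma. The sketch below only estimates encoded sizes by hand to show the mechanism. It does not call serde, the helper names are made up for illustration, and it does not attempt to reproduce the 4-7x figure quoted above, which depends on the payload and on how the output is formatted.

```rust
/// Length of the compact JSON array form of `bytes`, e.g. `[123,34,97]`.
fn json_byte_array_len(bytes: &[u8]) -> usize {
    let digits: usize = bytes.iter().map(|b| b.to_string().len()).sum();
    let commas = bytes.len().saturating_sub(1);
    digits + commas + 2 // +2 for the surrounding brackets
}

/// Approximate length of `text` as a JSON string, counting escape sequences.
fn json_string_len(text: &str) -> usize {
    let body: usize = text
        .chars()
        .map(|c| match c {
            // Characters with a two-character escape such as \" or \n.
            '"' | '\\' | '\n' | '\r' | '\t' | '\u{08}' | '\u{0C}' => 2,
            // Other control characters need the six-character \u00XX form.
            c if (c as u32) < 0x20 => 6,
            // Everything else is emitted as its UTF-8 bytes.
            c => c.len_utf8(),
        })
        .sum();
    body + 2 // +2 for the surrounding quotes
}
```

For the 7-byte payload `{"a":1}`, the array form is `[123,34,97,34,58,49,125]` (24 characters), while the string form `"{\"a\":1}"` is 11 characters.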

Docs updated:

  • frontend/README.md — download fallback chain
  • frontend/configuration.md — new "Model Config File Delivery" section
  • discovery-plane.md — new "Model Config Endpoint" section

Tests:

  • checked_file.rs — filename() from paths/URLs, update_dir() conversion
  • config_endpoint.rs — engine file serving, path traversal, serde round-trip, cache dir uniqueness
  • model_card.rs — config_filenames(), verify_local_checksums() on valid and tampered files
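
As an illustration of what the path traversal case guards against, here is a minimal sketch that assumes the engine reduces a requested name to its final path component before joining it to the served directory. `sanitize_filename` and `resolve_in_dir` are hypothetical helpers, not functions from config_endpoint.rs:

```rust
use std::path::{Path, PathBuf};

/// Keep only the final path component; reject inputs that have none
/// (for example "", ".." or "/").
fn sanitize_filename(requested: &str) -> Option<&str> {
    Path::new(requested).file_name()?.to_str()
}

/// Resolve a requested name strictly inside `base`.
fn resolve_in_dir(base: &Path, requested: &str) -> Option<PathBuf> {
    sanitize_filename(requested).map(|name| base.join(name))
}
```

With this approach a request for ../etc/passwd resolves to a file named passwd inside the served directory instead of escaping it. Note that on Unix a backslash is not a path separator, so a production check might also reject names containing `\`.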

Where should the reviewer start?

  1. lib/llm/src/config_endpoint.rs — new file, core types and engine
  2. lib/llm/src/model_card.rs:537-598 — download_config() 4-step fallback chain
  3. lib/llm/src/hub.rs:149-175 — try_model_express_server() using request_model_with_provider (not _and_fallback)
  4. lib/llm/src/discovery/watcher.rs:832-931 — WatcherP2pDownloader with parallel downloads and atomic writes

Workers register a "model-config" AsyncEngine endpoint alongside inference
endpoints. Frontends discover it through standard discovery and download
config files directly from any available worker over the request plane
(HTTP, TCP, or NATS).

Download priority chain:
1. Local HF cache (instant, no network)
2. ModelExpress server (no hidden HF fallback)
3. P2P from any available worker (parallel downloads, atomic writes)
4. Direct HuggingFace download (last resort)

This enables private models that aren't on HuggingFace without requiring
shared filesystems or manual file copying.

Key implementation details:
- Split hub::from_hf() into individual steps for fine-grained fallback
- P2pConfigDownloader trait decouples model_card from discovery internals
- Blake3 checksum verification after P2P download
- Atomic writes (temp file + rename) for concurrent safety
- String-based response to avoid Vec<u8> JSON serialization blowup

Signed-off-by: Nikita Sukharev <kaonael@gmail.com>

github-actions Bot commented

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions Bot added the Stale label Apr 29, 2026

github-actions Bot commented May 4, 2026

This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.

github-actions Bot closed this May 4, 2026
github-actions Bot deleted the feat-p2p-config-delivery branch May 4, 2026 11:20
Labels

documentation, feat, Stale