feat(worker): add LOCALAI_PREFETCH_MODELS for boot-time gallery prefetch#10108

Merged
mudler merged 1 commit into master from feat/worker-prefetch-models
May 31, 2026
@localai-bot

Collaborator

Summary

In LocalAI distributed mode the master streams a model GGUF to a worker on first inference. On bandwidth-constrained cluster networks (libp2p circuit-v2 relays under NAT, double-NAT residential, slow overlays) that transfer can be slow or unreliable, even though each worker's outbound internet connection is usually fine.

LOCALAI_PREFETCH_MODELS lets the operator name gallery model IDs to download at worker boot, before the worker subscribes to backend.install events. It reuses gallery.InstallModelFromGallery so the on-disk /models layout matches what the master would have pushed. Prefetch is non-fatal on every error path, so if the gallery is unreachable at boot the master can still push files on demand.

Design notes

  • LOCALAI_PREFETCH_MODELS (alias PREFETCH_MODELS) is a comma-separated list of gallery IDs, mirroring LOCALAI_MODELS (alias MODELS) on the master.
  • Added a companion LOCALAI_GALLERIES (alias GALLERIES) field to the worker config. The gallery URL list is needed to resolve IDs, and the master already reads it.
  • Prefetch runs before registration / NATS subscription / the "Worker ready" log, so a still-warming worker isn't announced as ready.
  • Reuses the master's gallery install entrypoint (gallery.InstallModelFromGallery) so URL resolution, SHA verification, and the idempotent file-exists/SHA-match skip path are shared.
  • The installer is wrapped in a private function-typed indirection so tests can swap in a fake; production code never reassigns the binding.
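
The function-typed indirection in the last note could look roughly like the sketch below. This is an assumption about the general pattern rather than the PR's code: `galleryInstall`, `installModel`, `prefetchOne`, and `swapInstaller` are hypothetical names, and the installer signature is simplified to a single model ID.

```go
package main

import "errors"

// galleryInstall is a hypothetical placeholder for the real call into
// gallery.InstallModelFromGallery; it always fails in this sketch.
func galleryInstall(modelID string) error {
	return errors.New("real installer not wired up in this sketch: " + modelID)
}

// installModel is the private, function-typed binding. Production code only
// calls through it; only tests reassign it.
var installModel = galleryInstall

// prefetchOne installs a single model through the binding.
func prefetchOne(modelID string) error {
	return installModel(modelID)
}

// swapInstaller replaces the binding with a fake and returns a function that
// restores the previous value, suitable for test setup and teardown.
func swapInstaller(fake func(string) error) (restore func()) {
	prev := installModel
	installModel = fake
	return func() { installModel = prev }
}
```

A test would call `swapInstaller` before exercising the prefetch path and invoke the returned `restore` afterwards, so no spec ever touches a real gallery.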

Tests

11 new Ginkgo specs in `core/services/worker/prefetch_test.go`, covering:

  • happy path
  • idempotent restart (file already on disk)
  • non-fatal per-model failure
  • empty config
  • whitespace trimming
  • malformed `LOCALAI_GALLERIES`
  • empty galleries list

Build + vet clean; `go test ./core/services/worker/... ./core/cli/... ./core/gallery/... ./core/startup/...` all pass.
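
The malformed and empty `LOCALAI_GALLERIES` cases listed above can be pictured with the sketch below. It assumes the variable holds a JSON array of name/url objects; that format, the `gallery` type, and `parseGalleries` are illustrative assumptions, not taken from this PR.

```go
package main

import (
	"encoding/json"
	"log"
)

// gallery is an assumed shape for one entry of the galleries list.
type gallery struct {
	Name string `json:"name"`
	URL  string `json:"url"`
}

// parseGalleries reports usable=false when the value is missing, malformed,
// or empty, so the caller can skip prefetch instead of failing worker boot.
func parseGalleries(raw string) (galleries []gallery, usable bool) {
	if raw == "" {
		return nil, false
	}
	if err := json.Unmarshal([]byte(raw), &galleries); err != nil {
		log.Printf("prefetch: malformed galleries value, skipping: %v", err)
		return nil, false
	}
	if len(galleries) == 0 {
		log.Printf("prefetch: no galleries configured, skipping")
		return nil, false
	}
	return galleries, true
}
```

Returning a boolean rather than an error keeps the sketch in line with the PR's stated goal that prefetch stays non-fatal on every error path.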

In LocalAI distributed mode the master streams a model GGUF to a
worker on first inference. On bandwidth-constrained cluster networks
(libp2p circuit-v2 relays under NAT, double-NAT residential, slow
overlays) that transfer can be slow or unreliable — meanwhile each
worker's outbound internet is usually fine.

LOCALAI_PREFETCH_MODELS lets the operator name gallery model IDs to
download at worker boot, BEFORE the worker subscribes to backend.install
events. Reuses gallery.InstallModelFromGallery so the on-disk /models
layout matches what the master would have pushed, and the master can
still push files on demand if the gallery is unreachable at boot
(prefetch is non-fatal on every error path).

The installer is wrapped in a function-value indirection so tests can
swap a fake without touching the real gallery; production never
reassigns the binding.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler force-pushed the feat/worker-prefetch-models branch from 54251c7 to 92be4d6 on May 31, 2026 09:32
@mudler merged commit 0d57957 into master on May 31, 2026
58 checks passed
@mudler deleted the feat/worker-prefetch-models branch on May 31, 2026 10:22
@localai-bot added the enhancement label on Jun 10, 2026
Labels

enhancement New feature or request
