feat(worker): add LOCALAI_PREFETCH_MODELS for boot-time gallery prefetch #10108
Merged
In LocalAI distributed mode, the master streams a model GGUF to a worker on first inference. On bandwidth-constrained cluster networks (libp2p circuit-v2 relays under NAT, double-NAT residential, slow overlays) that transfer can be slow or unreliable, while each worker's outbound internet is usually fine.

LOCALAI_PREFETCH_MODELS lets the operator name gallery model IDs to download at worker boot, BEFORE the worker subscribes to backend.install events. Reuses gallery.InstallModelFromGallery so the on-disk /models layout matches what the master would have pushed, and the master can still push files on demand if the gallery is unreachable at boot (prefetch is non-fatal on every error path).

The installer is wrapped in a function-value indirection so tests can swap in a fake without touching the real gallery; production never reassigns the binding.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Force-pushed from 54251c7 to 92be4d6.
Summary
In LocalAI distributed mode the master streams a model GGUF to a worker on first inference. On bandwidth-constrained cluster networks (libp2p circuit-v2 relays under NAT, double-NAT residential, slow overlays) that transfer can be slow or unreliable — meanwhile each worker's outbound internet is usually fine.
`LOCALAI_PREFETCH_MODELS` lets the operator name gallery model IDs to download at worker boot, before the worker subscribes to `backend.install` events. Reuses `gallery.InstallModelFromGallery` so the on-disk `/models` layout matches what the master would have pushed, and the master can still push files on demand if the gallery is unreachable at boot (prefetch is non-fatal on every error path).

Design notes
- `LOCALAI_PREFETCH_MODELS` (alias `PREFETCH_MODELS`) is a comma-separated list of gallery IDs, mirroring `LOCALAI_MODELS` / `MODELS` on the master.
- Adds a `LOCALAI_GALLERIES` / `GALLERIES` field to the worker config (the gallery URL list is needed to resolve IDs; the master already reads it).
- Reuses the existing installer (`gallery.InstallModelFromGallery`) so URL resolution, SHA verification, and the idempotent file-exists/SHA-match skip path are shared.

Tests
11 new Ginkgo specs in `core/services/worker/prefetch_test.go`:
Build + vet clean; `go test ./core/services/worker/... ./core/cli/... ./core/gallery/... ./core/startup/...` all pass.