# feat: local model management for Ollama and LM Studio (browse, pull, configure) #1030
## Summary
Local providers (Ollama, LM Studio) have management APIs that SynthOrg doesn't use. Currently, configured provider model counts come from LiteLLM's static model_cost database instead of the actual running instance, and there's no way to download new models or configure launch parameters from the dashboard.
## Bug: model count mismatch in setup wizard
The setup wizard's "Auto-detected Providers" section shows one model count (live probe via /api/tags) while the "Configured Providers" card shows a different count (from LiteLLM's hardcoded database). For example, Ollama auto-detect shows 17 models (the real ones) but the configured provider shows 29 (LiteLLM's static list, which includes models like deepseek-v3.1:671b-cloud that don't exist locally).
Root cause: `create_from_preset` (`service.py:394`) calls `models_from_litellm("ollama")`, which returns 29 entries from LiteLLM's static DB. Since this is truthy, `_maybe_discover_preset_models` (line 453: `if models: return models`) skips live discovery entirely.

Fix: for local/no-auth presets (Ollama, LM Studio, vLLM), always prefer live discovery over `models_from_litellm`. The static DB is irrelevant for local providers -- only what's actually running matters.
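The proposed ordering can be sketched as a small helper. This is illustrative only: the function name, the `"none"` auth-type value, and the callable signatures are assumptions standing in for the real `create_from_preset` / `_maybe_discover_preset_models` code paths.

```python
# Hypothetical sketch of the fix: for no-auth (local) presets, prefer
# live discovery; only remote providers fall back to LiteLLM's static DB.

def resolve_preset_models(preset_auth_type, litellm_models, discover_live):
    """Return the model list to store for a newly configured preset.

    preset_auth_type: e.g. "none" for Ollama / LM Studio / vLLM (assumed value)
    litellm_models:   entries from LiteLLM's static model_cost DB
    discover_live:    zero-arg callable probing the running instance
    """
    if preset_auth_type == "none":
        # Local provider: the static DB is irrelevant; only what's
        # actually running matters.
        live = discover_live()
        return live if live else []  # unreachable/empty: don't fake a list
    # Remote provider: the static DB is fine when populated.
    return litellm_models or discover_live()

# Example mirroring the bug report: 29 stale static entries vs 17 real models.
static = [f"static-{i}" for i in range(29)]
real = [f"local-{i}" for i in range(17)]
print(len(resolve_preset_models("none", static, lambda: real)))  # 17
```

With this ordering, the "Configured Providers" count and the auto-detect probe both come from the same live source.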
## Feature: local model management
### Provider capabilities matrix
| Provider | List models | Pull/download | Delete | Launch params | Notes |
|---|---|---|---|---|---|
| Ollama | `GET /api/tags` | `POST /api/pull` | `DELETE /api/delete` | `num_ctx`, `num_gpu`, `num_thread` via `options` | Full management API |
| LM Studio | `GET /v1/models` | `GET /api/v0/models/download` | Yes, via API | Context length, GPU layers via load params | Newer API versions |
| vLLM | `GET /v1/models` | No | No | No (launch-time only) | Read-only -- launched with `--model X` |
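The matrix maps directly onto the proposed `ProviderPreset` capability flags. A sketch of that mapping (the flag names come from this issue; the dataclass and the provider keys are illustrative, not existing SynthOrg code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocalModelCapabilities:
    supports_model_pull: bool
    supports_model_delete: bool
    supports_model_config: bool

# One row per provider in the matrix above.
CAPABILITIES = {
    "ollama":    LocalModelCapabilities(True,  True,  True),
    "lm_studio": LocalModelCapabilities(True,  True,  True),
    "vllm":      LocalModelCapabilities(False, False, False),  # read-only
}

# The dashboard gates pull/delete/config controls on these flags.
print(CAPABILITIES["vllm"].supports_model_pull)  # False
```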
## Backend requirements
- **`ProviderPreset` capability flags:** Add `supports_model_pull`, `supports_model_delete`, and `supports_model_config` booleans to `ProviderPreset` so the UI knows which controls to show per provider.
- **Model pull/download endpoint:** `POST /api/v1/providers/{name}/models/pull` with `{ "model": "llama3.1:8b" }`. Proxies to Ollama's `/api/pull` or LM Studio's download API. Returns a streaming response for download progress.
- **Model delete endpoint:** `DELETE /api/v1/providers/{name}/models/{model_id}`. Proxies to the provider's delete API.
- **Model config endpoint:** `PUT /api/v1/providers/{name}/models/{model_id}/config` with launch parameters (`num_ctx`, `num_gpu`, etc.). Stored in `ProviderModelConfig` and passed to LiteLLM at inference time.
- **Refresh models endpoint:** `POST /api/v1/providers/{name}/models/refresh`. Re-runs live discovery and updates the stored model list to match reality.
- **Fix `create_from_preset`:** For presets with `auth_type=NONE`, skip `models_from_litellm` and go directly to live discovery.
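For the pull endpoint, Ollama's `/api/pull` streams newline-delimited JSON with a `status` field and, during download phases, `total`/`completed` byte counts. A sketch of turning that stream into progress events the endpoint could relay (treat the exact event shape as an assumption to verify against the Ollama version in use):

```python
import json

def pull_progress(ndjson_lines):
    """Yield (status, percent_or_None) tuples from raw NDJSON lines."""
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        total = event.get("total")
        completed = event.get("completed")
        # Percentage only makes sense while bytes are being downloaded.
        pct = round(100 * completed / total, 1) if total and completed is not None else None
        yield event.get("status", ""), pct

# Simulated slice of an Ollama pull stream:
sample = [
    '{"status": "pulling manifest"}',
    '{"status": "downloading", "total": 1000, "completed": 250}',
    '{"status": "success"}',
]
print(list(pull_progress(sample)))
# [('pulling manifest', None), ('downloading', 25.0), ('success', None)]
```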
## Dashboard requirements
- **Model browser:** In the provider detail page, show the actual models from the running instance (not LiteLLM's static list). Include model size, quantization info, and last-modified date where available.
- **Pull new model:** Search/input field to pull a model by name (e.g., `llama3.1:8b`). Show download progress with a progress bar. Only shown for providers with `supports_model_pull`.
- **Delete model:** Confirmation dialog to remove a model from the local instance. Only shown for providers with `supports_model_delete`.
- **Model parameters:** Per-model config editor for launch parameters (context window size, GPU layer count, thread count). Only shown for providers with `supports_model_config`. Values stored in `ProviderModelConfig` and applied at inference time.
- **Refresh button:** Re-discover models from the running instance and sync the stored list.
### Per-model parameters (Ollama)
| Parameter | Description | Default |
|---|---|---|
| `num_ctx` | Context window size (tokens) | Model default |
| `num_gpu` | Number of GPU layers to offload | Auto |
| `num_thread` | CPU thread count | Auto |
| `num_batch` | Batch size for prompt processing | 512 |
| `repeat_penalty` | Repetition penalty | 1.1 |
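At inference time, the stored values map onto the `options` object Ollama accepts on generate/chat requests. A minimal sketch, assuming `ProviderModelConfig` serializes to a flat dict (the filtering helper is hypothetical):

```python
def ollama_options(config: dict) -> dict:
    """Build Ollama's request "options" from a stored per-model config.

    Drops unset/None values so Ollama falls back to its own defaults
    (the "Auto" / "Model default" rows in the table above).
    """
    allowed = {"num_ctx", "num_gpu", "num_thread", "num_batch", "repeat_penalty"}
    return {k: v for k, v in config.items() if k in allowed and v is not None}

# "num_gpu" unset and non-parameter fields are filtered out:
stored = {"num_ctx": 8192, "num_gpu": None, "repeat_penalty": 1.1, "label": "x"}
print(ollama_options(stored))  # {'num_ctx': 8192, 'repeat_penalty': 1.1}
```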
### Per-model parameters (LM Studio)
| Parameter | Description | Default |
|---|---|---|
| `n_ctx` | Context window size | Model default |
| `n_gpu_layers` | GPU layers to offload | Auto |
## Design Reference

- Providers design page: `docs/design/operations.md` (provider configuration section)
- Provider presets: `src/synthorg/providers/presets.py`
- Provider management service: `src/synthorg/providers/management/service.py`
- Discovery module: `src/synthorg/providers/discovery.py`
- Probing module: `src/synthorg/providers/probing.py`
## Dependencies
None -- all provider infrastructure already exists.
## Implementation notes

- The bug fix (skip `models_from_litellm` for local presets) should land first, as it's the smallest change
- Streaming pull progress will need a WebSocket or SSE endpoint
- vLLM is read-only by design -- only show list/refresh; no pull/delete/config controls