
feat: local model management for Ollama and LM Studio (browse, pull, configure) #1030

@Aureliolo

Description

Summary

Local providers (Ollama, LM Studio) have management APIs that SynthOrg doesn't use. Currently, configured provider model counts come from LiteLLM's static model_cost database instead of the actual running instance, and there's no way to download new models or configure launch parameters from the dashboard.

Bug: model count mismatch in setup wizard

The setup wizard's "Auto-detected Providers" section shows one model count (live probe via /api/tags) while the "Configured Providers" card shows a different count (from LiteLLM's hardcoded database). For example, Ollama auto-detect shows 17 models (the real ones) but the configured provider shows 29 (LiteLLM's static list, which includes models like deepseek-v3.1:671b-cloud that don't exist locally).

Root cause: create_from_preset (service.py:394) calls models_from_litellm("ollama"), which returns 29 entries from LiteLLM's static DB. Because that list is truthy, _maybe_discover_preset_models (line 453: if models: return models) skips live discovery entirely.

Fix: For local/no-auth presets (Ollama, LM Studio, vLLM), always prefer live discovery over models_from_litellm. The static DB is irrelevant for local providers -- only what's actually running matters.
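A minimal sketch of the proposed fix, using simplified stand-in signatures (the real `create_from_preset` and `_maybe_discover_preset_models` in service.py take more arguments, and `AuthType`/`discover_live_models` here are illustrative names):

```python
from dataclasses import dataclass
from enum import Enum

class AuthType(Enum):
    NONE = "none"
    API_KEY = "api_key"

@dataclass
class ProviderPreset:
    name: str
    auth_type: AuthType

def models_from_litellm(provider: str) -> list[str]:
    # Stand-in for LiteLLM's static model_cost lookup; for Ollama it
    # returns entries that may not exist on the local instance.
    return ["deepseek-v3.1:671b-cloud"] if provider == "ollama" else []

def discover_live_models(preset: ProviderPreset) -> list[str]:
    # Stand-in for a live probe of /api/tags (Ollama) or /v1/models.
    return ["llama3.1:8b"]

def maybe_discover_preset_models(preset: ProviderPreset,
                                 models: list[str]) -> list[str]:
    # Old behavior: `if models: return models` short-circuited on the
    # static list. New behavior: no-auth (local) presets always prefer
    # live discovery over the static DB.
    if models and preset.auth_type is not AuthType.NONE:
        return models
    return discover_live_models(preset)
```

Remote, authenticated presets keep the current behavior; only local presets change.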

Feature: local model management

Provider capabilities matrix

| Provider | List models | Pull/download | Delete | Launch params | Notes |
|---|---|---|---|---|---|
| Ollama | `GET /api/tags` | `POST /api/pull` | `DELETE /api/delete` | `num_ctx`, `num_gpu`, `num_thread` via `options` | Full management API |
| LM Studio | `GET /v1/models` | `GET /api/v0/models/download` | Yes, via API | Context length, GPU layers via load params | Newer API versions |
| vLLM | `GET /v1/models` | No | No | No (launch-time only) | Read-only -- launched with `--model X` |
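For reference, Ollama's `GET /api/tags` returns a JSON body with a `models` array whose entries carry `name`, `size`, and `modified_at` (per Ollama's API docs); a small parser for the fields the model browser needs might look like:

```python
import json

def parse_ollama_tags(body: str) -> list[dict]:
    """Extract name, size, and last-modified from an /api/tags response."""
    data = json.loads(body)
    return [
        {
            "name": m["name"],
            "size_bytes": m.get("size", 0),
            "modified_at": m.get("modified_at"),
        }
        for m in data.get("models", [])
    ]
```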

Backend requirements

  1. ProviderPreset capability flags: Add supports_model_pull, supports_model_delete, supports_model_config booleans to ProviderPreset so the UI knows what controls to show per provider.

  2. Model pull/download endpoint: POST /api/v1/providers/{name}/models/pull with { "model": "llama3.1:8b" }. Proxies to Ollama's /api/pull or LM Studio's download API. Returns a streaming response for download progress.

  3. Model delete endpoint: DELETE /api/v1/providers/{name}/models/{model_id}. Proxies to the provider's delete API.

  4. Model config endpoint: PUT /api/v1/providers/{name}/models/{model_id}/config with launch parameters (num_ctx, num_gpu, etc.). Stored in ProviderModelConfig and passed to LiteLLM at inference time.

  5. Refresh models endpoint: POST /api/v1/providers/{name}/models/refresh. Re-runs live discovery and updates the stored model list to match reality.

  6. Fix create_from_preset: For presets with auth_type=NONE, skip models_from_litellm and go directly to live discovery.
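For the pull endpoint (item 2), Ollama's `POST /api/pull` streams newline-delimited JSON status events; download events carry `total` and `completed` byte counts (per Ollama's API docs). A hypothetical helper the endpoint could use to turn each event into a progress percentage for the dashboard:

```python
import json
from typing import Optional

def pull_progress_percent(ndjson_line: str) -> Optional[float]:
    # Status-only events (e.g. "verifying sha256 digest") have no byte
    # counts and yield None; download events yield a 0-100 percentage.
    event = json.loads(ndjson_line)
    total = event.get("total")
    if not total:
        return None
    return round(100.0 * event.get("completed", 0) / total, 1)
```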

Dashboard requirements

  1. Model browser: In the provider detail page, show the actual models from the running instance (not LiteLLM's static list). Include model size, quantization info, and last-modified date where available.

  2. Pull new model: Search/input field to pull a model by name (e.g., llama3.1:8b). Show download progress with a progress bar. Only shown for providers with supports_model_pull.

  3. Delete model: Confirmation dialog to remove a model from the local instance. Only shown for providers with supports_model_delete.

  4. Model parameters: Per-model config editor for launch parameters (context window size, GPU layer count, thread count). Only shown for providers with supports_model_config. Values stored in ProviderModelConfig and applied at inference time.

  5. Refresh button: Re-discover models from the running instance and sync the stored list.

Per-model parameters (Ollama)

| Parameter | Description | Default |
|---|---|---|
| `num_ctx` | Context window size (tokens) | Model default |
| `num_gpu` | Number of GPU layers to offload | Auto |
| `num_thread` | CPU thread count | Auto |
| `num_batch` | Batch size for prompt processing | 512 |
| `repeat_penalty` | Repetition penalty | 1.1 |
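At inference time, stored ProviderModelConfig values would be forwarded in the `options` object of Ollama's chat/generate request. A hypothetical sketch of building the raw request body (the helper name and config shape are assumptions, not existing code):

```python
# Keys Ollama accepts in the "options" object (subset relevant here).
OLLAMA_OPTION_KEYS = {"num_ctx", "num_gpu", "num_thread",
                      "num_batch", "repeat_penalty"}

def build_ollama_chat_payload(model: str, messages: list,
                              config: dict) -> dict:
    # Drop unknown keys so stale or provider-specific config entries
    # never reach the Ollama API.
    options = {k: v for k, v in config.items() if k in OLLAMA_OPTION_KEYS}
    payload = {"model": model, "messages": messages, "stream": False}
    if options:
        payload["options"] = options
    return payload
```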

Per-model parameters (LM Studio)

| Parameter | Description | Default |
|---|---|---|
| `n_ctx` | Context window size | Model default |
| `n_gpu_layers` | GPU layers to offload | Auto |

Design Reference

  • Providers design page: docs/design/operations.md (provider configuration section)
  • Provider presets: src/synthorg/providers/presets.py
  • Provider management service: src/synthorg/providers/management/service.py
  • Discovery module: src/synthorg/providers/discovery.py
  • Probing module: src/synthorg/providers/probing.py

Dependencies

None -- all provider infrastructure already exists.

Implementation notes

  • The bug fix (skip models_from_litellm for local presets) should land first as it's the smallest change
  • Streaming pull progress will need a WebSocket or SSE endpoint
  • vLLM is read-only by design -- only show list/refresh, no pull/delete/config controls
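If SSE is chosen over WebSockets for pull progress, each event becomes one `data:` frame. A minimal serializer (framing per the WHATWG Server-Sent Events spec; the function name is hypothetical):

```python
import json
from typing import Optional

def sse_frame(data: dict, event: Optional[str] = None) -> str:
    # One SSE frame: optional "event:" line, then a "data:" line with the
    # JSON payload, terminated by a blank line.
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"
```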

Metadata


Labels

  • prio:medium (Should do, but not blocking)
  • scope:large (3+ days of work)
  • spec:human-interaction (DESIGN_SPEC Section 13 - Human Interaction Layer)
  • spec:providers (DESIGN_SPEC Section 9 - Model Provider Layer)
  • type:feature (New feature implementation)
  • v0.6 (Minor version v0.6)
  • v0.6.0 (Patch release v0.6.0)
