Thank you for this great software. I tried to get it running with Docker Compose and its new Model Runner. With the configuration below I can see the (healthy) endpoint and the available models, but I cannot access the service endpoints from the other side (Ollama, OpenAI, ...).
I tried to set the endpoint up as openai-api, since Docker Model Runner supports a compatible API (see https://docs.docker.com/ai/model-runner/api-reference/#available-openai-endpoints ).
- Endpoint is running and healthy
- Models show up via the internal API routes (/internal/status/models or /olla/models)
- The external API routes return an empty model list (i.e. /olla/openai/v1/models or /olla/llamacpp/v1/models)
- The external API endpoints do not work (404 page not found)
It feels like I'm nearly there, but it seems like I missed something with my config.
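To show what I expect: once routing works, the upstream model list and Olla's provider-scoped list should return the same model IDs. A quick sketch of the two URLs I am comparing (Python stdlib only; hosts and ports are the ones from my configs below):

```python
from urllib.parse import urljoin

# Upstream Docker Model Runner base (as configured in olla.yml below)
UPSTREAM_BASE = "http://172.17.0.1:12434/engines/llama.cpp/"
# Olla front end (port published in docker-compose.yml)
OLLA_BASE = "http://localhost:40114/"

# Both of these should list the same models once routing works:
upstream_models = urljoin(UPSTREAM_BASE, "v1/models")
proxied_models = urljoin(OLLA_BASE, "olla/openai/v1/models")

print(upstream_models)  # http://172.17.0.1:12434/engines/llama.cpp/v1/models
print(proxied_models)   # http://localhost:40114/olla/openai/v1/models
```

Curling the first URL returns both models; the second returns the empty list shown further down.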
## docker-compose.yml ##
services:
  olla:
    image: ghcr.io/thushan/olla:latest
    container_name: olla
    restart: unless-stopped
    ports:
      - "40114:40114"
    volumes:
      - ./olla.yaml:/app/config.yaml:ro
      # - ./logs:/app/logs
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:40114/internal/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    environment:
      - LOG_LEVEL=info
    models:
      gptoss:
        endpoint_var: GPTOSS_URL
        model_var: GPTOSS_MODEL

models:
  qwen3coder:
    model: ai/qwen3-coder:30B-A3B-UD-Q4_K_XL
    context_size: 30000
    # runtime_flags:
    #   - "--n-gpu-layers 1"

For Olla I used the default reference config:
## olla.yml ##
server:
  host: "0.0.0.0"
  port: 40114
  read_timeout: 20s
  write_timeout: 0s
  idle_timeout: 120s
  shutdown_timeout: 10s
  request_logging: false
  request_limits:
    max_body_size: 52428800 # 50MB
    max_header_size: 524288 # 512KB
  rate_limits:
    global_requests_per_minute: 0
    per_ip_requests_per_minute: 0
    health_requests_per_minute: 0
    burst_size: 50
    cleanup_interval: 1m
  trust_proxy_headers: false
  trusted_proxy_cidrs: []

proxy:
  engine: "olla"
  profile: "auto"
  load_balancer: "priority"
  connection_timeout: 30s
  response_timeout: 0s
  read_timeout: 0s
  stream_buffer_size: 4096
  # profile_filter:
  #   include:
  #     - "ollama"  # Include Ollama
  #     - "openai*" # Include all OpenAI variants

translators:
  anthropic:
    enabled: true # Enable Anthropic translator
    max_message_size: 10485760 # Max request size (10MB)

discovery:
  type: "static"
  refresh_interval: 5m
  model_discovery:
    enabled: true
    interval: 5m
    timeout: 30s
    concurrent_workers: 5
    retry_attempts: 3
    retry_backoff: 5s
  static:
    endpoints:
      - url: "http://172.17.0.1:12434/engines/llama.cpp/"
        name: "docker-model-loader"
        type: "openai"
        priority: 100
        model_url: "/v1/models"
        health_check_url: "http://172.17.0.1:12434/engines/llama.cpp/v1/models"
        check_interval: 15s
        check_timeout: 10s

model_registry:
  type: "memory"
  enable_unifier: true
  routing_strategy:
    type: "strict" # strict, optimistic, or discovery
    options:
      fallback_behavior: "all" # compatible_only, all, or none
      discovery_timeout: 2s
      discovery_refresh_on_miss: false
  unification:
    enabled: true
    stale_threshold: 24h # Model retention time
    cleanup_interval: 10m # Cleanup frequency
    cache_ttl: 5m
    custom_rules: []

logging:
  level: "debug"
  format: "text"
  output: "stdout"

So the internal endpoints are working fine:
/internal/status/endpoints

{
  "timestamp": "2025-11-05T06:52:26.31293467Z",
  "endpoints": [
    {
      "name": "docker-model-loader",
      "type": "openai",
      "status": "healthy",
      "last_model_sync": "2m ago",
      "health_check": "25s ago",
      "response_time": "2ms",
      "success_rate": "100%",
      "priority": 100,
      "model_count": 2,
      "request_count": 3
    }
  ],
  "total_count": 1,
  "healthy_count": 1,
  "routable_count": 1
}

/olla/models
{
  "object": "list",
  "data": [
    {
      "olla": {
        "family": "",
        "variant": "",
        "parameter_size": "",
        "quantization": "",
        "aliases": [
          "ai/gpt-oss:20B-UD-Q6_K_XL"
        ],
        "availability": [
          {
            "endpoint": "docker-model-loader",
            "state": "unknown"
          }
        ],
        "capabilities": [
          "text-generation"
        ]
      },
      "id": "ai/gpt-oss:20B-UD-Q6_K_XL",
      "object": "model",
      "owned_by": "olla",
      "created": 1762324251
    },
    {
      "olla": {
        "family": "",
        "variant": "",
        "parameter_size": "",
        "quantization": "",
        "aliases": [
          "hf.co/unsloth/qwen3-coder-30b-a3b-instruct-gguf:q4_k_xl"
        ],
        "availability": [
          {
            "endpoint": "docker-model-loader",
            "state": "unknown"
          }
        ],
        "capabilities": [
          "text-generation",
          "code-generation",
          "programming",
          "code-completion",
          "instruction-following",
          "chat"
        ]
      },
      "id": "hf.co/unsloth/qwen3-coder-30b-a3b-instruct-gguf:q4_k_xl",
      "object": "model",
      "owned_by": "olla",
      "created": 1762324251
    }
  ]
}

But the external routes don't:
/olla/openai/v1/models

{
  "object": "list",
  "data": []
}

/olla/openai/v1

404 page not found
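For completeness, this is the kind of request I want to send through the proxied OpenAI route once it resolves (Python stdlib; the model ID is one of the two listed by /olla/models above, and the actual call is commented out since the route currently 404s):

```python
import json
import urllib.request

# Olla's OpenAI-compatible provider route (host/port from docker-compose.yml)
url = "http://localhost:40114/olla/openai/v1/chat/completions"
payload = {
    "model": "ai/gpt-oss:20B-UD-Q6_K_XL",  # one of the models from /olla/models
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# This currently fails with the 404 shown above:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.get_method(), req.full_url)
```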
How can I get the OpenAI or Ollama endpoints up and running?
Best regards
Torsten