
Olla with docker model runner #80

@Turboscherbe

Description


Thank you for this great software. I tried to get it running with Docker Compose and its new Model Runner. With the configuration below I can see the (healthy) endpoint and the available models, but I cannot reach the service through the provider endpoints (ollama, openai, ...).

I set the endpoint up with type openai, since Docker Model Runner exposes an OpenAI-compatible API (see https://docs.docker.com/ai/model-runner/api-reference/#available-openai-endpoints).

  • The endpoint is running and healthy
  • Models show up on the internal API routes /internal/status/models and /olla/models
  • The external API model routes return an empty list (e.g. /olla/openai/v1/models or /olla/llamacpp/v1/models)
  • Other external API endpoints return 404 page not found

It feels like I'm nearly there, but I seem to have missed something in my config.
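For reference, the checks above can be reproduced with a small script (my own sketch, not part of Olla; it assumes the 40114 port mapping from the compose file below):

```python
import urllib.error
import urllib.request

# Routes observed above: the internal ones work, the external ones don't.
BASE = "http://localhost:40114"
ROUTES = [
    "/internal/status/models",   # internal: lists both models
    "/olla/models",              # unified list: lists both models
    "/olla/openai/v1/models",    # external: returns an empty list
    "/olla/llamacpp/v1/models",  # external: returns an empty list
    "/olla/openai/v1",           # external: 404 page not found
]

def probe(base: str, route: str) -> str:
    # Return "<route>: <status>" or a short note if the server is down.
    try:
        with urllib.request.urlopen(base + route, timeout=5) as resp:
            return f"{route}: {resp.status}"
    except urllib.error.HTTPError as exc:
        return f"{route}: {exc.code}"
    except OSError as exc:
        return f"{route}: unreachable ({exc})"

if __name__ == "__main__":
    for route in ROUTES:
        print(probe(BASE, route))
```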

## docker-compose.yml ##
services:
  olla:
    image: ghcr.io/thushan/olla:latest
    container_name: olla
    restart: unless-stopped
    ports:
      - "40114:40114"
    volumes:
      - ./olla.yaml:/app/config.yaml:ro
      # - ./logs:/app/logs
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:40114/internal/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    environment:
      - LOG_LEVEL=info
    models:
      gptoss:
        endpoint_var: GPTOSS_URL
        model_var: GPTOSS_MODEL

models:
  qwen3coder:
    model: ai/qwen3-coder:30B-A3B-UD-Q4_K_XL
    context_size: 30000
#    runtime_flags:
#      - "--n-gpu-layers 1"

For Olla I used the default reference config:

## olla.yaml ##
server:
  host: "0.0.0.0"
  port: 40114
  read_timeout: 20s
  write_timeout: 0s
  idle_timeout: 120s
  shutdown_timeout: 10s
  request_logging: false
  request_limits:
    max_body_size: 52428800    # 50MB
    max_header_size: 524288     # 512KB
  rate_limits:
    global_requests_per_minute: 0
    per_ip_requests_per_minute: 0
    health_requests_per_minute: 0
    burst_size: 50
    cleanup_interval: 1m
    trust_proxy_headers: false
    trusted_proxy_cidrs: []


proxy:
  engine: "olla"
  profile: "auto"
  load_balancer: "priority"
  connection_timeout: 30s
  response_timeout: 0s
  read_timeout: 0s
  stream_buffer_size: 4096
#  profile_filter:
#    include:
#      - "ollama"        # Include Ollama
#      - "openai*"       # Include all OpenAI variants

translators:
  anthropic:
    enabled: true                   # Enable Anthropic translator
    max_message_size: 10485760     # Max request size (10MB)


discovery:
  type: "static"
  refresh_interval: 5m
  model_discovery:
    enabled: true
    interval: 5m
    timeout: 30s
    concurrent_workers: 5
    retry_attempts: 3
    retry_backoff: 5s
  static:
    endpoints:
      - url: "http://172.17.0.1:12434/engines/llama.cpp/"
        name: "docker-model-loader"
        type: "openai"
        priority: 100
        model_url: "/v1/models"
        health_check_url: "http://172.17.0.1:12434/engines/llama.cpp/v1/models"
        check_interval: 15s
        check_timeout: 10s

model_registry:
  type: "memory"
  enable_unifier: true
  routing_strategy:
    type: "strict"       # strict, optimistic, or discovery
    options:
      fallback_behavior: "all"     # compatible_only, all, or none
      discovery_timeout: 2s
      discovery_refresh_on_miss: false
  unification:
    enabled: true
    stale_threshold: 24h   # Model retention time
    cleanup_interval: 10m  # Cleanup frequency
    cache_ttl: 5m
    custom_rules: []

logging:
  level: "debug"
  format: "text"
  output: "stdout"

So internal endpoints are working fine:

/internal/status/endpoints

{
  "timestamp": "2025-11-05T06:52:26.31293467Z",
  "endpoints": [
    {
      "name": "docker-model-loader",
      "type": "openai",
      "status": "healthy",
      "last_model_sync": "2m ago",
      "health_check": "25s ago",
      "response_time": "2ms",
      "success_rate": "100%",
      "priority": 100,
      "model_count": 2,
      "request_count": 3
    }
  ],
  "total_count": 1,
  "healthy_count": 1,
  "routable_count": 1
}

/olla/models

{
  "object": "list",
  "data": [
    {
      "olla": {
        "family": "",
        "variant": "",
        "parameter_size": "",
        "quantization": "",
        "aliases": [
          "ai/gpt-oss:20B-UD-Q6_K_XL"
        ],
        "availability": [
          {
            "endpoint": "docker-model-loader",
            "state": "unknown"
          }
        ],
        "capabilities": [
          "text-generation"
        ]
      },
      "id": "ai/gpt-oss:20B-UD-Q6_K_XL",
      "object": "model",
      "owned_by": "olla",
      "created": 1762324251
    },
    {
      "olla": {
        "family": "",
        "variant": "",
        "parameter_size": "",
        "quantization": "",
        "aliases": [
          "hf.co/unsloth/qwen3-coder-30b-a3b-instruct-gguf:q4_k_xl"
        ],
        "availability": [
          {
            "endpoint": "docker-model-loader",
            "state": "unknown"
          }
        ],
        "capabilities": [
          "text-generation",
          "code-generation",
          "programming",
          "code-completion",
          "instruction-following",
          "chat"
        ]
      },
      "id": "hf.co/unsloth/qwen3-coder-30b-a3b-instruct-gguf:q4_k_xl",
      "object": "model",
      "owned_by": "olla",
      "created": 1762324251
    }
  ]
}

But the external routes don't:

/olla/openai/v1/models

{
  "object": "list",
  "data": []
}

/olla/openai/v1

404 page not found
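One suspicion on my side (purely a guess, not verified): the trailing slash in the endpoint url, combined with a model_url that starts with a slash, might not join into the path I expect. A variant of the discovery block worth trying:

```yaml
discovery:
  static:
    endpoints:
      - url: "http://172.17.0.1:12434/engines/llama.cpp"   # no trailing slash
        name: "docker-model-loader"
        type: "openai"
        priority: 100
        model_url: "/v1/models"
        health_check_url: "http://172.17.0.1:12434/engines/llama.cpp/v1/models"
        check_interval: 15s
        check_timeout: 10s
```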

How can I get the openai or ollama endpoints up and running?

Best regards
Torsten

Metadata

Labels

  • bug: Something isn't working
  • documentation: Improvements or additions to documentation
  • investigating: We're actively investigating the issue.
  • llm-backend: Issue is about an LLM Backend, provider or type. (Eg. Ollama, vllm)
  • routing: This issue is with routing
