
Very basic ollama instructions #607

@codefromthecrypt

Description


I would like to configure a basic AI gateway that routes all incoming OpenAI requests to Ollama, selecting the model from the OpenAI request itself, with no additional headers needed to select Ollama.

e.g.

  • I want a default route, so no header is needed to reach Ollama.
  • I don't want anything in the config file except Ollama (e.g. no OpenAI platform secrets or Bedrock).
  • I want to install aigw at a specific version, or at worst @latest, not the @main tag.
  • Bonus points for making OTel work, but it's OK if this doesn't have it.
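(For the OTel bonus: I believe the standard SDK environment variables are all the client side needs; the endpoint value below is an assumption for a local collector, not something I've verified against aigw.)

```shell
# Standard OpenTelemetry SDK env vars; values are assumptions for a local setup
export OTEL_SERVICE_NAME=chat-client
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318  # local OTLP/HTTP collector
```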

I tried to pare down a config I found to the one below, but I'm not sure how to get a default route to work. As written it returns a 404, probably because I haven't guessed the rules section correctly:

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: aigw-run
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: aigw-run
spec:
  gatewayClassName: aigw-run
  listeners:
    - name: http
      protocol: HTTP
      port: 1975
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: envoy-ai-gateway
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: aigw-run
spec:
  schema:
    name: OpenAI
  targetRefs:
    - name: aigw-run
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - backendRefs:  # This part is likely wrong. I just want ollama to be the default and no header needed
        - name: ollama
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: ollama
spec:
  timeouts:
    request: 3m
  schema:
    name: OpenAI
  backendRef:
    name: ollama
    kind: Backend
    group: gateway.envoyproxy.io
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: ollama
spec:
  endpoints:
    - ip:
        address: 0.0.0.0
        port: 11434
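For reference, here's a hedged sketch of the rules section I think I'm missing. Envoy AI Gateway routes on the x-ai-eg-model header it derives from the request's model field; the RegularExpression match below (to catch every model name) is an assumption on my part, not something I've verified:

```yaml
  rules:
    # Assumption: match any value of x-ai-eg-model, the header aigw
    # populates from the OpenAI request's "model" field, so every
    # request falls through to ollama with no extra client headers.
    - matches:
        - headers:
            - type: RegularExpression
              name: x-ai-eg-model
              value: .*
      backendRefs:
        - name: ollama
```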

Before starting, I had to install like this:

go install github.com/envoyproxy/ai-gateway/cmd/aigw@main

because installing @latest was a no-go. (The error below is because the module root contains no main package; the aigw binary lives under cmd/aigw, so a pinned install would need that path, e.g. go install github.com/envoyproxy/ai-gateway/cmd/aigw@latest — assuming the tagged release includes it, which I haven't confirmed.)

$ go install github.com/envoyproxy/ai-gateway@latest
go: github.com/envoyproxy/ai-gateway@v0.1.5 requires go >= 1.24.0; switching to go1.24.3
go: github.com/envoyproxy/ai-gateway@latest: module github.com/envoyproxy/ai-gateway@latest found (v0.1.5), but does not contain package github.com/envoyproxy/ai-gateway

Here's my OpenAI client code, which I run like this:

uv run -q --env-file .env chat.py

chat.py

# /// script
# dependencies = [
#     "openai",
#     "elastic-opentelemetry",
#     "elastic-opentelemetry-instrumentation-openai",
#     "opentelemetry-instrumentation-httpx"
# ]
# ///
import argparse
import os

import openai
from opentelemetry.instrumentation import auto_instrumentation

auto_instrumentation.initialize()

model = os.getenv("CHAT_MODEL", "gpt-4o-mini")


def main():
    parser = argparse.ArgumentParser(description="OpenTelemetry-Enabled OpenAI Test Client")
    parser.add_argument(
        "--use-responses-api", action="store_true", help="Use the responses API instead of chat completions."
    )
    args = parser.parse_args()

    client = openai.Client()

    messages = [
        {
            "role": "user",
            "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?",
        }
    ]

    # vLLM-specific switch to disable thinking mode.
    # See https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes
    if "qwen3" in model.lower():
        extra_body = {"chat_template_kwargs": {"enable_thinking": False}}
    else:
        extra_body = {}
    if args.use_responses_api:
        response = client.responses.create(
            model=model, input=messages[0]["content"], temperature=0, extra_body=extra_body
        )
        print(response.output[0].content[0].text)
    else:
        chat_completion = client.chat.completions.create(
            model=model, messages=messages, temperature=0, extra_body=extra_body
        )
        print(chat_completion.choices[0].message.content)


if __name__ == "__main__":
    main()

.env

OPENAI_BASE_URL=http://localhost:1975/v1
OPENAI_API_KEY=unused
CHAT_MODEL=qwen3:0.6B
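For debugging the gateway without the SDK, the same request chat.py sends can be reproduced with only the standard library. The values below mirror the .env above, and the payload follows the OpenAI chat-completions wire format:

```python
import json
import urllib.request

base_url = "http://localhost:1975/v1"  # OPENAI_BASE_URL from .env
payload = {
    "model": "qwen3:0.6B",  # CHAT_MODEL; aigw routes on this field
    "messages": [
        {"role": "user", "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?"}
    ],
    "temperature": 0,
}
req = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer unused",  # OPENAI_API_KEY; ollama ignores it
    },
)
# With the gateway running:
# print(json.loads(urllib.request.urlopen(req).read())["choices"][0]["message"]["content"])
```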

Labels: enhancement (New feature or request)