
Very basic ollama instructions #607

@codefromthecrypt

Description


I would like to configure a basic AI gateway that routes all incoming OpenAI requests to Ollama, selecting the model from the OpenAI request itself, with no additional headers needed to select Ollama.

e.g.

  • I want a default route, so no header is needed to reach Ollama.
  • I don't want anything in the config file except Ollama (e.g. no OpenAI platform secrets or Bedrock).
  • I want to install aigw at a specific version, or at worst @latest, not the @main tag.
  • Bonus points for making OTel work, but it's OK if this doesn't have it.
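(For the OTel bonus: I believe the standard SDK environment variables are all the client side needs; the endpoint value below is an assumption for a local collector, not something I've verified against aigw.)

```shell
# Standard OpenTelemetry SDK env vars; values are assumptions for a local setup
export OTEL_SERVICE_NAME=chat-client
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318  # local OTLP/HTTP collector
```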

I tried to pare down a config I found to the one below, but I'm not sure how to get a default route to work. As written it returns a 404, probably because I haven't guessed the rules section correctly:

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: aigw-run
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: aigw-run
spec:
  gatewayClassName: aigw-run
  listeners:
    - name: http
      protocol: HTTP
      port: 1975
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: envoy-ai-gateway
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: aigw-run
spec:
  schema:
    name: OpenAI
  targetRefs:
    - name: aigw-run
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - backendRefs:  # This part is likely wrong. I just want ollama to be the default and no header needed
        - name: ollama
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: ollama
spec:
  timeouts:
    request: 3m
  schema:
    name: OpenAI
  backendRef:
    name: ollama
    kind: Backend
    group: gateway.envoyproxy.io
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: ollama
spec:
  endpoints:
    - ip:
        address: 0.0.0.0
        port: 11434
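For reference, here's a hedged sketch of the rules section I think I'm missing. Envoy AI Gateway routes on the x-ai-eg-model header it derives from the request's model field; the RegularExpression match below (to catch every model name) is an assumption on my part, not something I've verified:

```yaml
  rules:
    # Assumption: match any value of x-ai-eg-model, the header aigw
    # populates from the OpenAI request's "model" field, so every
    # request falls through to ollama with no extra client headers.
    - matches:
        - headers:
            - type: RegularExpression
              name: x-ai-eg-model
              value: .*
      backendRefs:
        - name: ollama
```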

Before starting, I had to install like this:

go install github.com/envoyproxy/ai-gateway/cmd/aigw@main

because installing @latest was a no-go. (The error below is because the module root contains no main package; the aigw binary lives under cmd/aigw, so a pinned install would need that path, e.g. go install github.com/envoyproxy/ai-gateway/cmd/aigw@latest — assuming the tagged release includes it, which I haven't confirmed.)

$ go install github.com/envoyproxy/ai-gateway@latest
go: github.com/envoyproxy/ai-gateway@v0.1.5 requires go >= 1.24.0; switching to go1.24.3
go: github.com/envoyproxy/ai-gateway@latest: module github.com/envoyproxy/ai-gateway@latest found (v0.1.5), but does not contain package github.com/envoyproxy/ai-gateway

Here's my OpenAI client code, which I run like this:

uv run -q --env-file .env chat.py

chat.py

# /// script
# dependencies = [
#     "openai",
#     "elastic-opentelemetry",
#     "elastic-opentelemetry-instrumentation-openai",
#     "opentelemetry-instrumentation-httpx"
# ]
# ///
import argparse
import os

import openai
from opentelemetry.instrumentation import auto_instrumentation

auto_instrumentation.initialize()

model = os.getenv("CHAT_MODEL", "gpt-4o-mini")


def main():
    parser = argparse.ArgumentParser(description="OpenTelemetry-Enabled OpenAI Test Client")
    parser.add_argument(
        "--use-responses-api", action="store_true", help="Use the responses API instead of chat completions."
    )
    args = parser.parse_args()

    client = openai.Client()

    messages = [
        {
            "role": "user",
            "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?",
        }
    ]

    # vLLM-specific switch to disable thinking mode.
    # See https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes
    if "qwen3" in model.lower():
        extra_body = {"chat_template_kwargs": {"enable_thinking": False}}
    else:
        extra_body = {}
    if args.use_responses_api:
        response = client.responses.create(
            model=model, input=messages[0]["content"], temperature=0, extra_body=extra_body
        )
        print(response.output[0].content[0].text)
    else:
        chat_completion = client.chat.completions.create(
            model=model, messages=messages, temperature=0, extra_body=extra_body
        )
        print(chat_completion.choices[0].message.content)


if __name__ == "__main__":
    main()

.env

OPENAI_BASE_URL=http://localhost:1975/v1
OPENAI_API_KEY=unused
CHAT_MODEL=qwen3:0.6B
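For debugging the gateway without the SDK, the same request chat.py sends can be reproduced with only the standard library. The values below mirror the .env above, and the payload follows the OpenAI chat-completions wire format:

```python
import json
import urllib.request

base_url = "http://localhost:1975/v1"  # OPENAI_BASE_URL from .env
payload = {
    "model": "qwen3:0.6B",  # CHAT_MODEL; aigw routes on this field
    "messages": [
        {"role": "user", "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?"}
    ],
    "temperature": 0,
}
req = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer unused",  # OPENAI_API_KEY; ollama ignores it
    },
)
# With the gateway running:
# print(json.loads(urllib.request.urlopen(req).read())["choices"][0]["message"]["content"])
```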

Labels: enhancement (New feature or request)