Closed
Labels
enhancement (New feature or request)
Description
I would like to configure a basic AI gateway that routes all incoming OpenAI requests to ollama, selecting the model from the OpenAI request body, with no additional headers needed to say "use ollama".
e.g.
- I want a default route, so no header needed to just go to ollama.
- I don't want anything in the config file except ollama (e.g. no openai platform secrets or bedrock)
- I want to install aigw at a specific version, or worst case @latest, not the @main tag
- Bonus points for making OTel work, but it's OK if this doesn't have it
I tried to pare down config I found to this, but I'm not sure how to get a default route to work. The gateway starts, but requests return a 404, probably because I haven't guessed the rules part correctly:
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: aigw-run
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: aigw-run
spec:
  gatewayClassName: aigw-run
  listeners:
    - name: http
      protocol: HTTP
      port: 1975
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: envoy-ai-gateway
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: aigw-run
spec:
  schema:
    name: OpenAI
  targetRefs:
    - name: aigw-run
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - backendRefs: # This part is likely wrong. I just want ollama to be the default and no header needed
        - name: ollama
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: ollama
spec:
  timeouts:
    request: 3m
  schema:
    name: OpenAI
  backendRef:
    name: ollama
    kind: Backend
    group: gateway.envoyproxy.io
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: ollama
spec:
  endpoints:
    - ip:
        address: 0.0.0.0
        port: 11434

Before I start, I had to install like this:
go install github.com/envoyproxy/ai-gateway/cmd/aigw@main

because installing latest was a no-go:
$ go install github.com/envoyproxy/ai-gateway@latest
go: github.com/envoyproxy/ai-gateway@v0.1.5 requires go >= 1.24.0; switching to go1.24.3
go: github.com/envoyproxy/ai-gateway@latest: module github.com/envoyproxy/ai-gateway@latest found (v0.1.5), but does not contain package github.com/envoyproxy/ai-gateway

Here's my openai client code, which I run like this:
uv run -q --env-file .env chat.py

main.py
# /// script
# dependencies = [
#   "openai",
#   "elastic-opentelemetry",
#   "elastic-opentelemetry-instrumentation-openai",
#   "opentelemetry-instrumentation-httpx",
# ]
# ///
import argparse
import os

import openai
from opentelemetry.instrumentation import auto_instrumentation

auto_instrumentation.initialize()

model = os.getenv("CHAT_MODEL", "gpt-4o-mini")


def main():
    parser = argparse.ArgumentParser(description="OpenTelemetry-Enabled OpenAI Test Client")
    parser.add_argument(
        "--use-responses-api", action="store_true", help="Use the responses API instead of chat completions."
    )
    args = parser.parse_args()

    client = openai.Client()
    messages = [
        {
            "role": "user",
            "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?",
        }
    ]

    # vllm-specific switch to disable thinking
    # See https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes
    if "qwen3" in model.lower():
        extra_body = {"chat_template_kwargs": {"enable_thinking": False}}
    else:
        extra_body = {}

    if args.use_responses_api:
        response = client.responses.create(
            model=model, input=messages[0]["content"], temperature=0, extra_body=extra_body
        )
        print(response.output[0].content[0].text)
    else:
        chat_completion = client.chat.completions.create(
            model=model, messages=messages, temperature=0, extra_body=extra_body
        )
        print(chat_completion.choices[0].message.content)


if __name__ == "__main__":
    main()

.env
OPENAI_BASE_URL=http://localhost:1975/v1
OPENAI_API_KEY=unused
CHAT_MODEL=qwen3:0.6B
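One thing I haven't tried yet: giving the rule an explicit match on the x-ai-eg-model header, which (as I understand it) the gateway populates from the model field of the request body. Assuming that header name is right, a regex catch-all might act as a default route:

```yaml
rules:
  - matches:
      - headers:
          - type: RegularExpression
            name: x-ai-eg-model # header the gateway derives from the request's "model" field
            value: .*           # match any model, i.e. a catch-all/default rule
    backendRefs:
      - name: ollama
```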