Closed
Labels: bug
Description:
FYI, I tried this and hit a 500 error attempting to route through to Ollama. I'm running `ENVOY_VERSION=1.34.1 aigw run ai-gateway.yaml`.
Here's my client failure:

```
$ uv run --exact -q --env-file env.local ../chat.py
Traceback (most recent call last):
  File "/Users/adriancole/oss/observability-examples/inference-platforms/aigw/../chat.py", line 56, in <module>
    main()
  File "/Users/adriancole/oss/observability-examples/inference-platforms/aigw/../chat.py", line 48, in main
    chat_completion = client.chat.completions.create(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adriancole/.cache/uv/environments-v2/chat-939cac794e42803a/lib/python3.12/site-packages/openai/_utils/_utils.py", line 287, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adriancole/.cache/uv/environments-v2/chat-939cac794e42803a/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py", line 925, in create
    return self._post(
           ^^^^^^^^^^^
  File "/Users/adriancole/.cache/uv/environments-v2/chat-939cac794e42803a/lib/python3.12/site-packages/openai/_base_client.py", line 1242, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adriancole/.cache/uv/environments-v2/chat-939cac794e42803a/lib/python3.12/site-packages/openai/_base_client.py", line 1037, in request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500
```

Repro steps:
env.local

```
OPENAI_BASE_URL=http://localhost:1975/v1
OPENAI_API_KEY=unused
CHAT_MODEL=qwen3:0.6B
```
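To rule the SDK in or out, the failing request can also be reproduced without it. This is my own sketch of the JSON body and headers chat.py (below) ends up sending — the openai-python SDK merges `extra_body` keys into the top level of the payload, and the values here mirror env.local:

```python
import json

# POST body for {OPENAI_BASE_URL}/chat/completions, mirroring chat.py.
body = {
    "model": "qwen3:0.6B",
    "messages": [
        {
            "role": "user",
            "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?",
        }
    ],
    "temperature": 0,
    # extra_body entries are merged into the top-level JSON by the SDK
    "chat_template_kwargs": {"enable_thinking": False},
}

# Headers: the routing header from chat.py plus the (unused) key from env.local.
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer unused",
    "x-ai-eg-model": "qwen3:0.6B",
}

print(json.dumps(body, indent=2))
```

POSTing this body with these headers (e.g. via curl) to http://localhost:1975/v1/chat/completions should reproduce the 500 independent of the Python client stack.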
chat.py

```python
# /// script
# dependencies = [
#     "openai",
#     "elastic-opentelemetry",
#     "openinference-instrumentation-openai",
#     "opentelemetry-instrumentation-httpx"
# ]
# ///
import argparse
import os

import openai

# from opentelemetry.instrumentation import auto_instrumentation
#
# auto_instrumentation.initialize()

model = os.getenv("CHAT_MODEL", "gpt-4o-mini")


def main():
    parser = argparse.ArgumentParser(description="OpenTelemetry-Enabled OpenAI Test Client")
    parser.add_argument(
        "--use-responses-api", action="store_true", help="Use the responses API instead of chat completions."
    )
    args = parser.parse_args()

    client = openai.Client()
    messages = [
        {
            "role": "user",
            "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?",
        }
    ]

    # vllm-specific switch to disable thinking, ignored by other inference platforms.
    # See https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes
    if "qwen3" in model.lower():
        extra_body = {"chat_template_kwargs": {"enable_thinking": False}}
    else:
        extra_body = {}

    if args.use_responses_api:
        response = client.responses.create(
            model=model, input=messages[0]["content"], temperature=0, extra_body=extra_body
        )
        print(response.output[0].content[0].text)
    else:
        chat_completion = client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=0,
            extra_body=extra_body,
            extra_headers={"x-ai-eg-model": "qwen3:0.6B"},
        )
        print(chat_completion.choices[0].message.content)


if __name__ == "__main__":
    main()
```

ai-gateway.yaml
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: aigw-run
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: aigw-run
  namespace: default
spec:
  gatewayClassName: aigw-run
  listeners:
    - name: http
      protocol: HTTP
      port: 1975
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: envoy-ai-gateway
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: envoy-ai-gateway
  namespace: default
spec:
  logging:
    level:
      default: debug
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: aigw-run
  namespace: default
spec:
  schema:
    name: OpenAI
  targetRefs:
    - name: aigw-run
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen3:0.6B
      backendRefs:
        - name: ollama
          namespace: default
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: ollama
  namespace: default
spec:
  timeouts:
    request: 3m
  schema:
    name: OpenAI
  backendRef:
    name: ollama
    kind: Backend
    group: gateway.envoyproxy.io
    namespace: default
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: ollama
  namespace: default
spec:
  endpoints:
    - ip:
        address: 127.0.0.1
        port: 11434
```
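Note on the routing: the AIGatewayRoute above only matches requests carrying the `x-ai-eg-model` header with the exact value `qwen3:0.6B`; requests missing it (or with a differently-cased value) fall through and may be rejected rather than forwarded to Ollama. A rough illustration of the Exact match semantics (my own sketch, not gateway code):

```python
def route_matches(headers: dict[str, str]) -> bool:
    """Mimic the AIGatewayRoute rule: Exact match on x-ai-eg-model.

    HTTP header *names* are case-insensitive, so normalize them;
    the *value* comparison for an Exact match is case-sensitive.
    """
    normalized = {k.lower(): v for k, v in headers.items()}
    return normalized.get("x-ai-eg-model") == "qwen3:0.6B"

print(route_matches({"X-AI-EG-Model": "qwen3:0.6B"}))  # True: name case is irrelevant
print(route_matches({"x-ai-eg-model": "qwen3:0.6b"}))  # False: value case differs
print(route_matches({}))                               # False: header absent
```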
aigw linux

```sh
docker run -it --rm -p 1975:1975 golang:1.24
go install github.com/envoyproxy/ai-gateway/cmd/aigw@main
ENVOY_VERSION=1.34.1 aigw run ai-gateway.yaml
```

aigw darwin

```sh
go install github.com/envoyproxy/ai-gateway/cmd/aigw@main
ENVOY_VERSION=1.34.1 aigw run ai-gateway.yaml
```

Environment:
github.com/envoyproxy/ai-gateway/cmd/aigw@main
ENVOY_VERSION=1.34.1
darwin or linux
Logs: