
Error code: 500 on simple pass-through ollama request #724

@codefromthecrypt

Description

FYI, I tried this and got a 500 error when attempting to route a simple request through to ollama. I'm running ENVOY_VERSION=1.34.1 aigw run ai-gateway.yaml

Here's my client-side failure:

$ uv run --exact -q --env-file env.local ../chat.py
Traceback (most recent call last):
  File "/Users/adriancole/oss/observability-examples/inference-platforms/aigw/../chat.py", line 56, in <module>
    main()
  File "/Users/adriancole/oss/observability-examples/inference-platforms/aigw/../chat.py", line 48, in main
    chat_completion = client.chat.completions.create(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adriancole/.cache/uv/environments-v2/chat-939cac794e42803a/lib/python3.12/site-packages/openai/_utils/_utils.py", line 287, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adriancole/.cache/uv/environments-v2/chat-939cac794e42803a/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py", line 925, in create
    return self._post(
           ^^^^^^^^^^^
  File "/Users/adriancole/.cache/uv/environments-v2/chat-939cac794e42803a/lib/python3.12/site-packages/openai/_base_client.py", line 1242, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adriancole/.cache/uv/environments-v2/chat-939cac794e42803a/lib/python3.12/site-packages/openai/_base_client.py", line 1037, in request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500

Repro steps:

env.local

OPENAI_BASE_URL=http://localhost:1975/v1
OPENAI_API_KEY=unused
CHAT_MODEL=qwen3:0.6B

chat.py

# /// script
# dependencies = [
#     "openai",
#     "elastic-opentelemetry",
#     "openinference-instrumentation-openai",
#     "opentelemetry-instrumentation-httpx"
# ]
# ///
import argparse
import os

import openai
# from opentelemetry.instrumentation import auto_instrumentation
#
# auto_instrumentation.initialize()

model = os.getenv("CHAT_MODEL", "gpt-4o-mini")


def main():
    parser = argparse.ArgumentParser(description="OpenTelemetry-Enabled OpenAI Test Client")
    parser.add_argument(
        "--use-responses-api", action="store_true", help="Use the responses API instead of chat completions."
    )
    args = parser.parse_args()

    client = openai.Client()

    messages = [
        {
            "role": "user",
            "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?",
        }
    ]

    # vllm-specific switch to disable thinking, ignored by other inference platforms.
    # See https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes
    if "qwen3" in model.lower():
        extra_body = {"chat_template_kwargs": {"enable_thinking": False}}
    else:
        extra_body = {}
    if args.use_responses_api:
        response = client.responses.create(
            model=model, input=messages[0]["content"], temperature=0, extra_body=extra_body
        )
        print(response.output[0].content[0].text)
    else:
        chat_completion = client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=0,
            extra_body=extra_body,
            # must match the Exact header rule in ai-gateway.yaml
            extra_headers={"x-ai-eg-model": "qwen3:0.6B"},
        )
        print(chat_completion.choices[0].message.content)


if __name__ == "__main__":
    main()
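For reference, the failing call above boils down to this raw HTTP request; a stdlib-only sketch (URL, key, and headers taken from env.local and chat.py, nothing in it is sent):

```python
import json
import urllib.request

# Sketch of the raw request behind the failing chat.completions.create call.
payload = {
    "model": "qwen3:0.6B",
    "messages": [
        {"role": "user", "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?"}
    ],
    "temperature": 0,
}
req = urllib.request.Request(
    "http://localhost:1975/v1/chat/completions",  # OPENAI_BASE_URL
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer unused",  # OPENAI_API_KEY=unused
        "x-ai-eg-model": "qwen3:0.6B",  # Exact-match header in the AIGatewayRoute rule
    },
    method="POST",
)
print(req.full_url)
```

Calling urllib.request.urlopen(req) against the running gateway reproduces the 500.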

ai-gateway.yaml

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: aigw-run
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: aigw-run
  namespace: default
spec:
  gatewayClassName: aigw-run
  listeners:
    - name: http
      protocol: HTTP
      port: 1975
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: envoy-ai-gateway
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: envoy-ai-gateway
  namespace: default
spec:
  logging:
    level:
      default: debug
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: aigw-run
  namespace: default
spec:
  schema:
    name: OpenAI
  targetRefs:
    - name: aigw-run
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen3:0.6B
      backendRefs:
        - name: ollama
          namespace: default
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: ollama
  namespace: default
spec:
  timeouts:
    request: 3m
  schema:
    name: OpenAI
  backendRef:
    name: ollama
    kind: Backend
    group: gateway.envoyproxy.io
    namespace: default
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: ollama
  namespace: default
spec:
  endpoints:
    - ip:
        address: 127.0.0.1
        port: 11434
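To isolate whether the 500 originates in the gateway or in the backend, one sanity check (my addition, not part of the original repro) is to send the same OpenAI-style request straight to the ollama endpoint declared in the Backend above, bypassing the gateway. A normal completion here would point the 500 at the gateway path:

```python
import json
import urllib.error
import urllib.request

# Hit ollama directly at the Backend address (127.0.0.1:11434), no gateway.
payload = {
    "model": "qwen3:0.6B",
    "messages": [
        {"role": "user", "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?"}
    ],
    "temperature": 0,
}
req = urllib.request.Request(
    "http://127.0.0.1:11434/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
except (urllib.error.URLError, OSError) as e:
    print(f"ollama unreachable: {e}")
```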

aigw (linux):

docker run -it --rm -p 1975:1975 golang:1.24
go install github.com/envoyproxy/ai-gateway/cmd/aigw@main
ENVOY_VERSION=1.34.1 aigw run ai-gateway.yaml

aigw (darwin):

go install github.com/envoyproxy/ai-gateway/cmd/aigw@main
ENVOY_VERSION=1.34.1 aigw run ai-gateway.yaml

Environment:

github.com/envoyproxy/ai-gateway/cmd/aigw@main
ENVOY_VERSION=1.34.1

OS: darwin or linux

Logs:

darwin-logs.txt
linux-logs.txt

Labels

bug (Something isn't working)