feat: cross Backend failover/fallback and retry support by mathetake · Pull Request #599 · envoyproxy/ai-gateway

mathetake · 2025-05-02T00:16:27Z

Commit Message

This commit is a relatively large refactoring of the internals to make Envoy AI Gateway's API more aligned with Envoy Gateway's BackendTrafficPolicy as well as HTTPRoute. Specifically, the main objective is to allow failover and retries to work well across multiple AIServiceBackends.

One of the most notable changes in this commit is that we split the extproc's logic into two phases: one is executed at the normal router level and selects a route (as opposed to selecting a backend, as before), and the other runs as an upstream filter that performs auth and transformation. In other words, Envoy AI Gateway now configures two external processing filters.

As a result, users are now able to configure failover as well as retry/fallback using Envoy Gateway's BackendTrafficPolicy attached to the HTTPRoute generated by Envoy AI Gateway. For example, this allows us to support the case where the primary cluster is Azure OpenAI and, when it is failing, the AI Gateway falls back to AWS Bedrock using the standard Envoy Gateway configuration.

Background
At the Envoy configuration level, Envoy Gateway translates multiple backends in a single HTTPRoute rule into a single Envoy cluster whose endpoints consist of multiple endpoint sets (called LocalityLbEndpoints in the Envoy API [1]), where each set corresponds to a Backend with a priority configured. For example, very roughly speaking, the following pseudo HTTPRoute

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: provider-fallback
spec:
  rules:
  - backendRefs:
    - group: gateway.envoyproxy.io
      kind: Backend
      name: primary-backend
    - group: gateway.envoyproxy.io
      kind: Backend
      name: secondary-backend
    matches:
    - path:
        type: PathPrefix
        value: /

will be translated as follows when secondary-backend is marked as fallback: true in its Backend definition ([2]):

- cluster:
  '@type': type.googleapis.com/envoy.config.cluster.v3.Cluster
  loadAssignment:
    clusterName: httproute/default/provider-fallback/rule/0
    endpoints:
    - lbEndpoints:
      - endpoint:
          address:
            socketAddress:
              address: primary.com
              portValue: 443
      priority: 0
    - lbEndpoints:
      - endpoint:
          address:
            socketAddress:
              address: secondary.com
              portValue: 443
      priority: 1

where the priority is set to 0 for the primary backend and 1 for the secondary backend. When retry or passive health check is configured, Envoy will retry against, or fall back to, the secondary backend.
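To make the priority behaviour concrete, here is a minimal Go sketch of the idea: pick the healthy endpoint set with the lowest priority number. This is a deliberately simplified model and not Envoy's implementation (Envoy shifts load between priority levels gradually based on health). The names `endpointSet` and `pickSet` are hypothetical and introduced only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// endpointSet is a hypothetical, simplified stand-in for one
// LocalityLbEndpoints entry: a backend name plus its priority.
type endpointSet struct {
	Backend  string
	Priority uint32
	Healthy  bool
}

// pickSet returns the healthy set with the lowest priority number,
// or false when no set is healthy. It models only the all-or-nothing
// case, which is an assumption made for brevity.
func pickSet(sets []endpointSet) (endpointSet, bool) {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]endpointSet(nil), sets...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority < sorted[j].Priority
	})
	for _, s := range sorted {
		if s.Healthy {
			return s, true
		}
	}
	return endpointSet{}, false
}

func main() {
	sets := []endpointSet{
		{Backend: "primary-backend", Priority: 0, Healthy: false},
		{Backend: "secondary-backend", Priority: 1, Healthy: true},
	}
	if s, ok := pickSet(sets); ok {
		fmt.Println("selected:", s.Backend)
	}
}
```

With the primary set marked unhealthy, the sketch selects the secondary set, which mirrors the fallback case described above.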

In our API, transformation as well as upstream authentication must be performed per Backend, so this logic must run after the endpoint set (LocalityLbEndpoints, to be precise) is chosen by Envoy. For example, primary.com and secondary.com might have different API schemas, authentication, etc. Envoy has a specific HTTP filter chain that is executed at this stage, called "upstream filters". If we insert an extproc that performs this logic there, we can properly do authn/z and transformation natively on each retry attempt made by Envoy.

From the perspective of the upstream-filter-level external processor, it needs to know exactly which backend was chosen by Envoy's cluster load balancing logic. We add some additional metadata to the endpoint with EG's extension server so that the extproc can retrieve this information. We also use the extension server to insert the upstream extproc filter since EG does not currently support it. This logic in our extension server can be eliminated when the corresponding functionality becomes available in EG ([3], [4]).
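As a rough illustration of the per-attempt flow described above, the following Go sketch looks up the chosen backend from endpoint metadata and applies that backend's credentials. Everything here is an assumption for illustration: the metadata key `backend_name`, the `backendConfig` type, the schema labels and the header values are hypothetical, and headers are modelled as a plain map rather than the real extproc messages.

```go
package main

import (
	"errors"
	"fmt"
)

// backendConfig is a hypothetical per-backend setting: which API schema
// the backend speaks and which auth header it expects.
type backendConfig struct {
	Schema     string
	AuthHeader string
	AuthValue  string
}

// backendNameKey is an assumed metadata key; the PR description does not
// name the real one.
const backendNameKey = "backend_name"

// prepareAttempt resolves the backend selected for this attempt from the
// endpoint metadata and sets that backend's auth header on the request.
func prepareAttempt(meta map[string]string, backends map[string]backendConfig, headers map[string]string) (backendConfig, error) {
	name, ok := meta[backendNameKey]
	if !ok {
		return backendConfig{}, errors.New("endpoint metadata carries no backend name")
	}
	cfg, ok := backends[name]
	if !ok {
		return backendConfig{}, fmt.Errorf("unknown backend %q", name)
	}
	headers[cfg.AuthHeader] = cfg.AuthValue
	return cfg, nil
}

func main() {
	backends := map[string]backendConfig{
		"primary-backend":   {Schema: "schema-a", AuthHeader: "api-key", AuthValue: "primary-secret"},
		"secondary-backend": {Schema: "schema-b", AuthHeader: "authorization", AuthValue: "Bearer secondary-secret"},
	}
	// Simulate a first attempt and a retry that lands on the other backend.
	for _, chosen := range []string{"primary-backend", "secondary-backend"} {
		headers := map[string]string{}
		cfg, err := prepareAttempt(map[string]string{backendNameKey: chosen}, backends, headers)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(chosen, "uses", cfg.Schema, "with headers", headers)
	}
}
```

Because the lookup happens per attempt, a retry that lands on a different backend gets that backend's schema and credentials rather than those of the first attempt.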

Caveats

Due to a limitation of EG's extension server API, an AIServiceBackend that references a k8s Service cannot be supported, so we have to drop support for it. Since there is a workaround, this should be fine; in addition, EG can be fixed easily, so the version after the next release should be able to revive the support.
aigw run is temporarily disabled until [5] is resolved.
Inference Extension support is temporarily disabled but will be revived before the next release.

[1] https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/v3/endpoint_components.proto
[2] https://gateway.envoyproxy.io/latest/api/extension_types/#backendspec
[3] envoyproxy/gateway#5523
[4] envoyproxy/gateway#5351
[5] envoyproxy/gateway#5918

Related Issues/PRs (if applicable)

Partially resolves the provider level fallbacks for #34

mathetake · 2025-05-02T19:28:43Z

note to self: TODOs after this PR:

Change weight to the optional int32; non-breaking change
Add timeout at the rule level to match HTTPRoute; deprecate the AIServiceBackend level timeout
Fix aigw run by running the extension server in the standalone mode
Fix and Redo Inference Extension support
Document that BackendRef must NOT be Service; not breaking change (there's a workaround)

examples/basic/basic.yaml

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

mathetake · 2025-05-03T00:01:46Z

finally passed all tests.....

filterapi/filterconfig.go

yuzisun · 2025-05-04T09:26:49Z

filterapi/filterconfig.go

 	// Inside the routing rules, the header ModelNameHeaderKey may be used to make the routing decision.
 	Rules []RouteRule `json:"rules"`
+	// Backends is the list of backends to which the request should be routed to when the headers match.
+	Backends []*Backend `json:"backends"`


Not sure if I understand correctly: is the idea here to match the route name, which is the HTTPRoute resource, with each AIGatewayRoute rule creating a corresponding HTTPRoute?

Well, not really. AIGatewayRoute:HTTPRoute is one-to-one, and each rule in AIGatewayRoute also corresponds one-to-one to a rule in the HTTPRoute. The route-level extproc's only responsibility is to choose the matching rule (as opposed to the backend, as before). Envoy then routes the request to the chosen rule (== a cluster binding multiple backends).
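A minimal Go sketch of what "choose the matching rule" can look like with exact header matching follows. The types `routeRule` and `headerMatch` are simplified, hypothetical stand-ins rather than the definitions in filterapi, and the rule names are made up. A later change referenced further down this page moves this matching into Envoy's native router.

```go
package main

import "fmt"

// headerMatch is a hypothetical exact-match condition on one header.
type headerMatch struct {
	Name  string
	Value string
}

// routeRule is a hypothetical rule: a name plus the headers that must match.
type routeRule struct {
	Name    string
	Headers []headerMatch
}

// selectRule returns the name of the first rule whose header conditions
// all match exactly. A rule without conditions matches every request.
// Header names are assumed to be lower-cased already.
func selectRule(rules []routeRule, headers map[string]string) (string, bool) {
	for _, r := range rules {
		matched := true
		for _, m := range r.Headers {
			if v, ok := headers[m.Name]; !ok || v != m.Value {
				matched = false
				break
			}
		}
		if matched {
			return r.Name, true
		}
	}
	return "", false
}

func main() {
	rules := []routeRule{
		{Name: "rule-0", Headers: []headerMatch{{Name: "x-ai-eg-model", Value: "model-a"}}},
		{Name: "rule-1", Headers: []headerMatch{{Name: "x-ai-eg-model", Value: "model-b"}}},
	}
	name, ok := selectRule(rules, map[string]string{"x-ai-eg-model": "model-b"})
	fmt.Println(name, ok)
}
```

The function returns only a rule name; which backend inside that rule serves the request is left to Envoy's cluster load balancing, as the comment above explains.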

This means that we are dropping the capability of making the routing decision in extproc from a list of backend refs, if I understand correctly. For example, if we want to implement latency-aware routing, we will no longer be able to do that in extproc and will have to rely on the Envoy endpoint picker?

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: latency-aware-routing
  namespace: default
spec:
  schema:
    name: OpenAI
  targetRefs:
  - name: latency-aware-routing
    kind: Gateway
    group: gateway.networking.k8s.io
  rules:
  - matches:
    - headers:
      - type: Exact
        name: x-ai-eg-model
        value: us.meta.llama3-2-1b-instruct-v1:0
    loadBalancer: latency
    backendRefs:
    - name: provider-aws-llama
    - name: provider-gcp-llama

we are dropping the capability of making the routing decision in extproc from a list of backend refs if I understand correctly.

Not really. If we want to implement something like loadBalancer: latency after this PR, we can still translate it by adding a dedicated cluster and then having the extproc set a header that selects a specific backend in that cluster. It should be doable.

yuzisun · 2025-05-04T09:49:52Z

internal/extensionserver/extensionserver.go

+	}
+	// Get the HTTPRoute object from the cluster name.
+	var aigwRoute aigv1a1.AIGatewayRoute
+	err = s.k8sClient.Get(context.Background(), client.ObjectKey{Namespace: httpRouteNamespace, Name: httpRouteName}, &aigwRoute)


Is the assumption here that AIGatewayRoute is now split into individual ones, like we planned for HTTPRoute?

It won't be split into multiple HTTPRoutes, as I commented above. The size limit issue can be fixed with support for multiple AIGatewayRoutes per Gateway, as we discussed in Slack with Yao and others. Keeping AIGatewayRoute:HTTPRoute = 1:1 is also much simpler from a UX perspective since you can create the retry/fallback policy for the single generated HTTPRoute, as in the example 3549245

internal/extproc/router/router.go

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

kerthcet · 2025-05-07T08:41:48Z

filterapi/filterconfig.go

 	// Headers is the list of headers to match for the routing decision.
 	// Currently, only exact match is supported.
 	Headers []HeaderMatch `json:"headers"`
-	// Backends is the list of backends to which the request should be routed to when the headers match.


I think this is still possible with #588 (comment) here?

Yeah, it should still be possible even after this PR; the backends just moved to the top level here instead of being under Rules as before.

internal/controller/ai_gateway_route.go

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

internal/extensionserver/extensionserver.go

yuzisun · 2025-05-09T02:35:43Z

internal/extproc/translator/openai_openai.go

 		o.stream = true
 	}
-	return nil, nil, nil
+	// On retry, the path might have changed to a different provider. So, this will ensure that the path is always set to OpenAI.


On retry, the model name could also change: the "same model" can have a different name in a different provider. I can create a separate PR to support this after this PR is in.
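One possible shape for such a per-backend model name mapping is sketched below in Go. This is only an illustration of the idea raised in the comment, not something this PR implements; `modelOverrides`, `resolveModel`, the logical name `llama-small` and the second provider's model name are all hypothetical.

```go
package main

import "fmt"

// modelOverrides maps a backend name to a table that translates the
// requested model name into that provider's own name. Hypothetical type.
type modelOverrides map[string]map[string]string

// resolveModel returns the provider-specific model name for the backend
// chosen on this attempt, falling back to the requested name when no
// override is configured.
func resolveModel(o modelOverrides, backend, requested string) string {
	if perBackend, ok := o[backend]; ok {
		if name, ok := perBackend[requested]; ok {
			return name
		}
	}
	return requested
}

func main() {
	overrides := modelOverrides{
		"provider-aws-llama": {"llama-small": "us.meta.llama3-2-1b-instruct-v1:0"},
		"provider-gcp-llama": {"llama-small": "example-llama-1b"}, // made-up name
	}
	for _, backend := range []string{"provider-aws-llama", "provider-gcp-llama", "other"} {
		fmt.Println(backend, "->", resolveModel(overrides, backend, "llama-small"))
	}
}
```

Resolving the name per attempt, rather than once per request, is what would let a retry against a different provider send the name that provider expects.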

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

yuzisun

Awesome work !!!

**Commit Message** This deprecates the AIServiceBackend.Timeouts configuration, which has not worked well with the refactored use of HTTPRoute since #599. Instead, this adds `timeouts` to AIGatewayRouteRule to match that of HTTPRoute in GWAPI. --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

kerthcet · 2025-05-13T09:49:57Z

filterapi/filterconfig.go

 	// Inside the routing rules, the header ModelNameHeaderKey may be used to make the routing decision.
 	Rules []RouteRule `json:"rules"`
+	// Backends is the list of backends to which the request should be routed to when the headers match.
+	Backends []*Backend `json:"backends"`


Sorry, how do we map the Backends to the headers now? They seem to be separated, with no obvious relationship between them.

Oh, you are right. That's necessary for #588, right? We need to store the route->backends info somewhere in here. Would you mind sending a patch?

Yeah, I can take a look.

**Commit Message** This fixes the `aigw run` command, which has been disabled since the refactoring in #599. This requires a couple of bug fixes on the Envoy Gateway side, so this commit includes an upgrade of EG as a dependency. **Related Issues/PRs (if applicable)** * Closes #607 * Includes envoyproxy/gateway/pull/5984 * Includes envoyproxy/gateway/pull/6020 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

**Commit Message** The backends and headers in the filter config are M:N. With #599, we moved the backends out of config.rules, which lost the mapping between them. With this PR, we move the backends back into config.rules, which is more straightforward. **Related Issues/PRs (if applicable)** Related PR: #620, #599 **Special notes for reviewers (if applicable)** None --------- Signed-off-by: kerthcet <kerthcet@gmail.com>
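To illustrate why nesting backends under each rule preserves the mapping, here is a small Go sketch. The struct and field names are hypothetical simplifications, not the actual filterapi types; only the JSON keys `rules`, `headers` and `backends` are taken from the diff snippets quoted on this page, and the model names are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, simplified config types: each rule carries both its
// header matches and the backends it routes to.
type headerMatch struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

type backend struct {
	Name string `json:"name"`
}

type routeRule struct {
	Headers  []headerMatch `json:"headers"`
	Backends []backend     `json:"backends"`
}

type config struct {
	Rules []routeRule `json:"rules"`
}

// parseConfig decodes the JSON form of the sketch config.
func parseConfig(data []byte) (config, error) {
	var c config
	err := json.Unmarshal(data, &c)
	return c, err
}

// backendsFor lists the backends of every rule that has an exact header
// match for name=value.
func backendsFor(c config, name, value string) []string {
	var out []string
	for _, r := range c.Rules {
		for _, h := range r.Headers {
			if h.Name == name && h.Value == value {
				for _, b := range r.Backends {
					out = append(out, b.Name)
				}
				break
			}
		}
	}
	return out
}

func main() {
	raw := []byte(`{"rules":[
	  {"headers":[{"name":"x-ai-eg-model","value":"model-a"}],
	   "backends":[{"name":"primary-backend"},{"name":"secondary-backend"}]},
	  {"headers":[{"name":"x-ai-eg-model","value":"model-b"}],
	   "backends":[{"name":"secondary-backend"}]}]}`)
	c, err := parseConfig(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(backendsFor(c, "x-ai-eg-model", "model-a"))
}
```

Here secondary-backend appears under two rules, which is the M:N relationship the commit message refers to; with a single top-level backend list, that association would have to be stored somewhere else.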

**Commit Message** This commit refactors how the ext proc is deployed. Specifically, this switches to inserting the ext proc container as a sidecar container of the Envoy pods created by Envoy Gateway. This is another large refactoring that turned out to be necessary for #599. This utilizes the mutating webhook to insert the extproc container into Envoy pods. Making the extproc a sidecar means that we now have a one-to-one mapping between Gateway and the extproc; hence, this naturally resolves the previously known limitation #509, and users can now attach multiple AIGatewayRoute(s) to one Gateway. Implementation note: since volume mounts only work in a namespace-scoped way, user-created secrets (like API keys) cannot be mounted by the extproc as it runs in the "envoy-gateway-system" namespace. To resolve this, the controller now reads the secret and embeds the credentials into the "extproc secret" (previously known as the "extproc configmap") together with routing, matching and backend information. That secret is written in the "envoy-gateway-system" namespace, hence it can be mounted by the extproc container. **Related Issues/PRs (if applicable)** Resolves #509 Resolves #621 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

**Description** This commit removes the handwritten header matching code from the extproc and instead starts utilizing the hardened Envoy native router. Historically, we had only one giant extproc filter where we did all the logic, including model name extraction, routing, and then body transformation and upstream authorization. Since #599, we have split it into two external processor filters: one sits at the normal HTTP router and the other is configured as the per-cluster upstream HTTP filter. In theory, the one at the HTTP router has only one job on the request path: extracting the model name from the request body. However, for historical reasons, the handwritten router logic component remained, and that comes with not only a maintenance cost (forcing complex extproc and control plane orchestration) but also a potential security vulnerability. In fact, handwritten header matching logic can be an easy attack surface, so where possible we should avoid writing our own header matching (routing logic) and rely on the battle-tested, hardened Envoy native router. With this commit, regex matching is now available, and there is no difference between HTTPRoute's matching and AIGatewayRoute's matching implementation. This also opens up the possibility of supporting path matching in our rules. **Related Issues/PRs (if applicable)** Ref #612 Ref #73 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

@mathetake

**Description** This commit removes the handwritten header matching code from the extproc, and instead starts utilizing the hardened envoy native router. Historically, we had only one giant extproc filter where we did all logics including model name extraction, routing and then body transformation & upstream authorization. Since envoyproxy#599, we split into two external processor filters; one sits at the normal HTTP router and the other is configured at the per-cluster upstream HTTP filter. In theory, the one at HTTP router has only one job on request path: extracting model name from the request body. However, due to the historical reason, the handwritten router logic component remained, and that comes with not only a maintenance cost (forcing a complex extproc & control plane orchestration) but also a potential security vulnerability. In fact, writing header matching logic can be an easy attack surface, so if it's possible, we should avoid writing our own header matching (routing logic) but should rely on the battle-tested hardened envoy native router. With this commit, now a regex matching is available as well as there's no difference between HTTPRoute's matching and AIGatewayRoute's matching implementation. This also opens up a possibility to support path matching in our rule. **Related Issues/PRs (if applicable)** Ref envoyproxy#612 Ref envoyproxy#73 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: fix aigw parentRefs in fallback (envoyproxy#824) **Description** This PR fixed the AIGatewayRoute parentRefs in fallback guides. Signed-off-by: bitliu <bitliu@tencent.com> chore: make test-e2e logs visible (envoyproxy#825) **Description** This PR is to make test-e2e logs visible in local. Signed-off-by: bitliu <bitliu@tencent.com> extproc: account for parallel tool calls (envoyproxy#813) **Description** Resolves envoyproxy#736 Assistant that calls multiple tools are expected to group tool result in the same message. Adding logic for that! 
--------- Signed-off-by: Aaron Choo <achoo30@bloomberg.net> Signed-off-by: Dan Sun <dsun20@bloomberg.net> Co-authored-by: Dan Sun <dsun20@bloomberg.net> build(deps): bump google.golang.org/genai from 1.13.0 to 1.14.0 (envoyproxy#833) extproc: return 404 instead of 500 for unknown path (envoyproxy#835) **Description** Previously, unknown path was responded as an internal error as opposed to the fact that it's an 404 with the user input root cause. This fixes the extproc code that way, now that users will be able to know what's wrong with the operation instead of getting the cryptic 500 error. **Related Issues/PRs (if applicable)** Contributes to envoyproxy#810 Closes envoyproxy#724 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: add endpoint support (envoyproxy#787) **Description** This PR adds the endpoint support pages for EAGW. **Related Issues/PRs (if applicable)** Fixes: envoyproxy#705 **Special notes for reviewers (if applicable)** @mathetake --------- Signed-off-by: bitliu <bitliu@tencent.com> Co-authored-by: Erica Hughberg <erica.sundberg.90@gmail.com> controller: return 404 instead of 500 for no matching (envoyproxy#837) **Description** Before envoyproxy#793, the case where no matching route found was handled in the extproc and the 404 immediate response was returned from there, but after that, it naturally results in the "unreachable" default route and swallowed the indication of no matching and it made it impossible to reason about the 500 error on that case. In other words, this fixes the regression in envoyproxy#793 to return the proper 404 response. 
--------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> update Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> precommit passing Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> remove header hotfix Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> precommit working Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more test coverage Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more test coverage Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> test: adds real provider embeddings test & update doc (envoyproxy#841) **Description** This adds embeddings endpoint tests with the providers that support the endpoint. This only added the providers for which we have credentials. According to the testing situation we have right now, this also clarifies in the "Supported Endpoints" page that which provider is tested and which is not for each endpoint. --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> cli: adds default route test (envoyproxy#842) **Description** This adds an additional test to aigw run command so that we can verify that setting the default route is possible. **Related Issues/PRs (if applicable)** Closes envoyproxy#612 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> build(deps): bump helm.sh/helm/v3 from 3.17.3 to 3.18.4 (envoyproxy#845) test: fixes TestStartConfigWatcher flake (envoyproxy#843) controller: ensure eg rollout when deployed as daemonset (envoyproxy#831) **Description** This PR handles the rollout for envoy gateway during ai gateway extproc upgrade when deployed as daemonset. Related Issues/PRs (if applicable) Related PR: envoyproxy#699 --------- Signed-off-by: Dan Sun <dsun20@bloomberg.net> make test var Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>


@mathetake

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> update paralleltoolcalls Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add back system helper Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> lint no err Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add translation Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> update so tests work Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more tests Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> remove print Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> refactor: deprecate targetRefs in favor or parentRefs (envoyproxy#821) docs: add epp integration proposal (envoyproxy#771) **Description** This PR adds the proposal for supporting Integration with Endpoint Picker(GIE) Related to envoyproxy#423 --------- Signed-off-by: bitliu <bitliu@tencent.com> Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: update epp outdated logics (envoyproxy#822)

@mathetake

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

update paralleltoolcalls
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add back system helper
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

lint no err
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add translation
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

update so tests work
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add more tests
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

remove print
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

refactor: deprecate targetRefs in favor or parentRefs (envoyproxy#821)

docs: add epp integration proposal (envoyproxy#771)

**Description**
This PR adds the proposal for supporting integration with the Endpoint Picker (GIE).

Related to envoyproxy#423

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: update epp outdated logics (envoyproxy#822)

refactor: use Envoy native router (envoyproxy#793)

**Description**
This commit removes the handwritten header matching code from the extproc and instead uses the hardened Envoy native router.

Historically, we had a single giant extproc filter that handled all logic, including model name extraction, routing, body transformation, and upstream authorization. Since envoyproxy#599, the logic is split into two external processor filters: one sits at the normal HTTP router level and the other is configured as a per-cluster upstream HTTP filter. In theory, the one at the HTTP router has only one job on the request path: extracting the model name from the request body. However, for historical reasons, the handwritten router logic remained, which brings not only a maintenance cost (forcing complex extproc and control plane orchestration) but also a potential security vulnerability.

Handwritten header matching logic is an easy attack surface, so where possible we should avoid writing our own header matching (routing logic) and rely on the battle-tested, hardened Envoy native router. With this commit, regex matching becomes available and there is no longer any difference between HTTPRoute's and AIGatewayRoute's matching implementations. This also opens up the possibility of supporting path matching in our rules.

**Related Issues/PRs (if applicable)**
Ref envoyproxy#612
Ref envoyproxy#73

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: fix aigw parentRefs in fallback (envoyproxy#824)

**Description**
This PR fixes the AIGatewayRoute parentRefs in the fallback guides.

Signed-off-by: bitliu <bitliu@tencent.com>

chore: make test-e2e logs visible (envoyproxy#825)

**Description**
This PR makes the test-e2e logs visible when running locally.

Signed-off-by: bitliu <bitliu@tencent.com>

extproc: account for parallel tool calls (envoyproxy#813)

**Description**
Resolves envoyproxy#736

An assistant that calls multiple tools is expected to group the tool results in the same message. This adds the logic for that.

---------

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>

build(deps): bump google.golang.org/genai from 1.13.0 to 1.14.0 (envoyproxy#833)

extproc: return 404 instead of 500 for unknown path (envoyproxy#835)

**Description**
Previously, a request to an unknown path was answered with an internal error, even though it is really a 404 whose root cause is the user input. This fixes the extproc code accordingly, so users can now tell what went wrong instead of getting a cryptic 500 error.

**Description**
This commit removes the handwritten header matching code from the extproc and instead uses the hardened Envoy native router.

Historically, we had a single giant extproc filter that handled all logic, including model name extraction, routing, body transformation, and upstream authorization. Since envoyproxy#599, the logic is split into two external processor filters: one sits at the normal HTTP router level and the other is configured as a per-cluster upstream HTTP filter. In theory, the one at the HTTP router has only one job on the request path: extracting the model name from the request body. However, for historical reasons, the handwritten router logic remained, which brings not only a maintenance cost (forcing complex extproc and control plane orchestration) but also a potential security vulnerability.

Handwritten header matching logic is an easy attack surface, so where possible we should avoid writing our own header matching (routing logic) and rely on the battle-tested, hardened Envoy native router. With this commit, regex matching becomes available and there is no longer any difference between HTTPRoute's and AIGatewayRoute's matching implementations. This also opens up the possibility of supporting path matching in our rules.

**Related Issues/PRs (if applicable)**
Ref envoyproxy#612
Ref envoyproxy#73

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
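The "one job" of the router-level filter described above, extracting the model name from the request body so that the native router can match on it, can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the header name `x-ai-eg-model`, the function names, and the assumption that the body is an OpenAI-style JSON object with a top-level `model` field are all hypothetical choices made for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// modelHeader is a hypothetical header name used only in this sketch;
// the header the real filter sets may differ.
const modelHeader = "x-ai-eg-model"

// extractModel reads the top-level "model" field from a JSON request body.
// It assumes an OpenAI-style body and returns an error if the body is not
// valid JSON or the field is missing or empty.
func extractModel(body []byte) (string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", fmt.Errorf("invalid request body: %w", err)
	}
	if req.Model == "" {
		return "", errors.New("request body has no model field")
	}
	return req.Model, nil
}

// routingHeaders returns the headers that a router-level filter could attach
// so that ordinary header matching in the router can select the route.
func routingHeaders(body []byte) (map[string]string, error) {
	model, err := extractModel(body)
	if err != nil {
		return nil, err
	}
	return map[string]string{modelHeader: model}, nil
}

func main() {
	headers, err := routingHeaders([]byte(`{"model":"gpt-4o-mini","messages":[]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(headers[modelHeader])
}
```

With the model name exposed as a header, the matching itself (exact, regex, and potentially path) stays in the router's own implementation, which is the design motivation the description gives for dropping the handwritten matcher.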

mathetake force-pushed the twophases branch 2 times, most recently from 97a66d4 to d97af8b Compare May 2, 2025 01:01

mathetake mentioned this pull request May 2, 2025

Support k8s gateway API inference extensions #423

Closed

3 tasks

mathetake changed the title ~~wip~~ feat: Backend level failover/fallback and retry support May 2, 2025

mathetake changed the title ~~feat: Backend level failover/fallback and retry support~~ feat: cross Backend failover/fallback and retry support May 2, 2025

mathetake commented May 2, 2025

examples/basic/basic.yaml (resolved)

mathetake mentioned this pull request May 2, 2025

Suport for multiple AIGatewayRoutes per Gateway #509

Closed

feat: cross Backend failover/fallback and retry support

7f3ff68

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

mathetake force-pushed the twophases branch from 55058d6 to 7f3ff68 Compare May 2, 2025 23:50

revert the config

a9996b8

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

yuzisun reviewed May 4, 2025

mathetake mentioned this pull request May 4, 2025

Support for Extension Server in standalone mode envoyproxy/gateway#5918

Closed

mathetake added 11 commits May 5, 2025 12:04

Add fallback example

3549245

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

add TODO to TestRun

57ae1a0

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

typo etc

047d16a

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

more

efb0065

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

more

e1c56c5

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

removes dynlib

e1766a2

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

increase test coverage

5b352f4

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

more

8ef31a9

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

more

31980b0

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

more

c603100

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

more

c5098ef

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

mathetake marked this pull request as ready for review May 5, 2025 22:32

mathetake requested a review from a team as a code owner May 5, 2025 22:32

mathetake requested review from arkodg, wengyao04 and yuzisun May 5, 2025 22:32

kerthcet reviewed May 7, 2025

mathetake mentioned this pull request May 7, 2025

[Refactor] Reorganize the location of infext controllers #597

Closed

more comments

372b671

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

mathetake assigned yuzisun May 7, 2025

yuzisun reviewed May 8, 2025

internal/extensionserver/extensionserver.go (outdated, resolved)

yuzisun reviewed May 9, 2025

typo lol

c59d36c

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

mathetake mentioned this pull request May 9, 2025

Support DynamicLoadBalancing beyond AIE(API inference extension) #604

Closed

mathetake requested a review from yuzisun May 9, 2025 18:59

yuzisun approved these changes May 9, 2025

mathetake merged commit 68653e7 into main May 10, 2025
17 checks passed

mathetake deleted the twophases branch May 10, 2025 00:08

This was referenced May 10, 2025

api: deprecates AIServiceBackend.Timeouts #606

Merged

Very basic ollama instructions #607

Closed

kerthcet reviewed May 13, 2025

mathetake mentioned this pull request May 13, 2025

cli: fixes aigw run #611

Merged

kerthcet mentioned this pull request May 14, 2025

Change the envoy ai gateway to a stable version InftyAI/llmaz#414

Closed

mathetake mentioned this pull request May 22, 2025

feat: switch to sidecar+UDS extproc #629

Merged

kerthcet mentioned this pull request May 22, 2025

feat: move Backends back to RouteRule #633

Merged

mathetake mentioned this pull request Jul 3, 2025

refactor: use Envoy native router #793

Merged
