feat: cross Backend failover/fallback and retry support #599

Merged
mathetake merged 16 commits into main from twophases on May 10, 2025

Member

@mathetake mathetake commented May 2, 2025

Commit Message

This commit is a relatively large refactoring of internals to make Envoy AI Gateway's API more aligned with Envoy Gateway's BackendTrafficPolicy as well as HTTPRoute. Specifically, the main objective here is to allow failover and retries to work well across multiple AIServiceBackends.

One of the most notable changes in this commit is that we split the extproc's logic into two phases: one is executed at the normal router level and selects a route (as opposed to selecting a backend, as it did previously), and the other runs as an upstream filter that performs auth and transformation. In other words, Envoy AI Gateway now configures two external processing filters.

As a result, users are now able to configure failover as well as retry/fallback using Envoy Gateway's BackendTrafficPolicy attached to the HTTPRoute generated by Envoy AI Gateway. For example, this allows us to support the case where the primary cluster is Azure OpenAI and, when it is failing, the AI Gateway falls back to AWS Bedrock with the standard Envoy Gateway configuration.

Background
At the Envoy configuration level, Envoy Gateway translates multiple backends in a single HTTPRoute rule into a single Envoy cluster whose endpoints consist of multiple endpoint sets (called LocalityLbEndpoints in the Envoy API [1]), where each set corresponds to a Backend with a priority configured. For example, very roughly speaking, the following pseudo HTTPRoute

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: provider-fallback
spec:
  rules:
  - backendRefs:
    - group: gateway.envoyproxy.io
      kind: Backend
      name: primary-backend
    - group: gateway.envoyproxy.io
      kind: Backend
      name: secondary-backend
    matches:
    - path:
        type: PathPrefix
        value: /

will be translated as follows when secondary-backend is marked as fallback: true in its Backend definition ([2]):

- cluster:
    '@type': type.googleapis.com/envoy.config.cluster.v3.Cluster
    loadAssignment:
      clusterName: httproute/default/provider-fallback/rule/0
      endpoints:
      - lbEndpoints:
        - endpoint:
            address:
              socketAddress:
                address: primary.com
                portValue: 443
        priority: 0
      - lbEndpoints:
        - endpoint:
            address:
              socketAddress:
                address: secondary.com
                portValue: 443
        priority: 1

where the priority is set to 0 for the primary backend and 1 for the secondary backend. When retry or passive health checking is configured, Envoy will retry against, or fall back to, the secondary backend's endpoint set.
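To make the priority idea concrete, here is a minimal sketch in Go. It is not Envoy's actual algorithm: real Envoy shifts load between priority levels based on health percentages, whereas this sketch treats each endpoint set as simply healthy or not. The `endpointSet` type and `pickEndpointSet` function are hypothetical names used only for illustration.

```go
package main

import "fmt"

// endpointSet is a simplified stand-in for Envoy's LocalityLbEndpoints:
// a group of endpoints that share one priority level.
type endpointSet struct {
	backend  string // hypothetical label, e.g. "primary-backend"
	priority uint32 // a lower value is preferred
	healthy  bool   // simplification: the whole set is healthy or not
}

// pickEndpointSet returns the healthy set with the lowest priority value.
// The second return value is false when no set is healthy.
func pickEndpointSet(sets []endpointSet) (endpointSet, bool) {
	var best endpointSet
	found := false
	for _, s := range sets {
		if !s.healthy {
			continue
		}
		if !found || s.priority < best.priority {
			best, found = s, true
		}
	}
	return best, found
}

func main() {
	sets := []endpointSet{
		{backend: "primary-backend", priority: 0, healthy: false},
		{backend: "secondary-backend", priority: 1, healthy: true},
	}
	if s, ok := pickEndpointSet(sets); ok {
		fmt.Println("selected:", s.backend)
	}
}
```

With the primary set marked unhealthy, the sketch selects the priority 1 set, which mirrors the fallback behavior described above.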

In our API, transformation as well as upstream authentication must be performed per Backend, so this logic must be inserted after the endpoint set (or LocalityLbEndpoints, to be precise) is chosen by Envoy. For example, primary.com and secondary.com might have different API schemas, authentication, etc. Envoy has a specific HTTP filter chain that is executed at this stage, called "upstream filters". If we insert the extproc that performs this logic there, we can properly do authn/z and transformation for each retry attempt that Envoy makes natively.

From the perspective of the upstream filter level external processor, it needs to know exactly which backend was chosen by Envoy's cluster load balancing logic. We add some additional metadata to the endpoints with EG's extension server so that the extproc can retrieve this information. We also use the extension server to insert the upstream extproc filter since that is currently not supported by EG. This logic in our extension server can be eliminated when the corresponding functionality becomes available in EG ([3],[4]).
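The per-attempt behavior can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `backendAuth`, `headersForAttempt`, the header names, and the placeholder values are all hypothetical, and the `selected` argument stands in for the backend name that the upstream extproc would read from the endpoint metadata.

```go
package main

import "fmt"

// backendAuth is a hypothetical per-backend credential description.
type backendAuth struct {
	headerName  string
	headerValue string
}

// headersForAttempt builds the headers for one upstream attempt.
// It copies the original headers first, so a credential added for an
// earlier attempt never leaks into a retry against another backend.
func headersForAttempt(original map[string]string, auth map[string]backendAuth, selected string) (map[string]string, error) {
	a, ok := auth[selected]
	if !ok {
		return nil, fmt.Errorf("no auth configured for backend %q", selected)
	}
	out := make(map[string]string, len(original)+1)
	for k, v := range original {
		out[k] = v
	}
	out[a.headerName] = a.headerValue
	return out, nil
}

func main() {
	original := map[string]string{"content-type": "application/json"}
	auth := map[string]backendAuth{
		"primary-backend":   {headerName: "api-key", headerValue: "PRIMARY_PLACEHOLDER"},
		"secondary-backend": {headerName: "authorization", headerValue: "SECONDARY_PLACEHOLDER"},
	}
	// First attempt goes to the primary, the retry goes to the secondary.
	for _, selected := range []string{"primary-backend", "secondary-backend"} {
		h, err := headersForAttempt(original, auth, selected)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(selected, "->", len(h), "headers")
	}
}
```

Starting each attempt from the unmodified request is the design point here: because the upstream filter runs once per attempt, auth and transformation can be redone from scratch for whichever backend Envoy picked.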

Caveats

  • Due to a limitation of EG's extension server API, an AIServiceBackend that references a k8s Service cannot be supported, so we have to drop support for it. Since there is a workaround, this should be fine. In addition, EG can be fixed easily, so the version after the next release should be able to revive the support.
  • aigw run is temporarily disabled until [5] is resolved.
  • Inference Extension support is temporarily disabled but will be revived before the next release.

[1] https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/v3/endpoint_components.proto
[2] https://gateway.envoyproxy.io/latest/api/extension_types/#backendspec
[3] envoyproxy/gateway#5523
[4] envoyproxy/gateway#5351
[5] envoyproxy/gateway#5918

Related Issues/PRs (if applicable)

Partially resolves the provider level fallbacks for #34

@mathetake mathetake force-pushed the twophases branch 2 times, most recently from 97a66d4 to d97af8b Compare May 2, 2025 01:01
@mathetake mathetake changed the title from "wip" to "feat: Backend level failover/fallback and retry support" May 2, 2025
@mathetake mathetake changed the title from "feat: Backend level failover/fallback and retry support" to "feat: cross Backend failover/fallback and retry support" May 2, 2025
Member Author

mathetake commented May 2, 2025

note to self: TODOs after this PR:

  • Change weight to the optional int32; non-breaking change
  • Add timeout at the rule level to match HTTPRoute; deprecate the AIServiceBackend level timeout
  • Fix aigw run by running the extension server in the standalone mode
  • Fix and Redo Inference Extension support
  • Document that BackendRef must NOT be Service; not breaking change (there's a workaround)

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Member Author

finally passed all tests.....

// Inside the routing rules, the header ModelNameHeaderKey may be used to make the routing decision.
Rules []RouteRule `json:"rules"`
// Backends is the list of backends to which the request should be routed to when the headers match.
Backends []*Backend `json:"backends"`
Contributor

@yuzisun yuzisun May 4, 2025

Not sure if I understand correctly: is the idea here to match the route name, which is the HTTPRoute resource, so that each AIGatewayRoute rule creates a corresponding HTTPRoute?

Member Author

well, not really. AIGatewayRoute: HTTPRoute is one-to-one, and each rule in AIGatewayRoute also corresponds one-to-one to a rule in the HTTPRoute. The route level extproc's only responsibility is to choose the matching rule (as opposed to the backend before). And then Envoy will route the requests to the chosen rule (== cluster binding multiple backends).

Contributor

This means that we are dropping the capability of making the routing decision in extproc from a list of backend refs if I understand correctly. For example if we want to implement latency aware routing, we will no longer be able to do that in extproc and have to rely on envoy endpoint picker ?

 apiVersion: aigateway.envoyproxy.io/v1alpha1
 kind: AIGatewayRoute
 metadata:
   name: latency-aware-routing
   namespace: default
 spec:
   schema:
     name: OpenAI
   targetRefs:
     - name: latency-aware-routing
       kind: Gateway
       group: gateway.networking.k8s.io
   rules:
     - matches:
         - headers:
             - type: Exact
               name: x-ai-eg-model
               value: us.meta.llama3-2-1b-instruct-v1:0
       loadBalancer: latency
       backendRefs:
         - name: provider-aws-llama
         - name: provider-gcp-llama

Member Author

@mathetake mathetake May 9, 2025

> we are dropping the capability of making the routing decision in extproc from a list of backend refs if I understand correctly.

Not really. If we want to implement something like loadBalancer: latency after this PR, we can still translate it by adding a specific cluster and then having the ext proc set a header that selects a specific backend in that cluster. It should be doable.

}
// Get the HTTPRoute object from the cluster name.
var aigwRoute aigv1a1.AIGatewayRoute
err = s.k8sClient.Get(context.Background(), client.ObjectKey{Namespace: httpRouteNamespace, Name: httpRouteName}, &aigwRoute)
Contributor

is the assumption here that AIGatewayRoute is now split to individual ones like we planned for HTTPRoute ?

Member Author

It won't split into multiple HTTPRoutes, as I commented above. The size limit thingy can be fixed with the support for multiple AIGatewayRoutes per Gateway, as we discussed in Slack with Yao and others. Keeping AIGatewayRoute: HTTPRoute = 1:1 is also much simpler from a UX perspective, since you can create the retry/fallback policy for the single generated HTTPRoute, as in the example 3549245
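The snippet under review looks the route up from an Envoy cluster name. Assuming the name follows the `httproute/<namespace>/<name>/rule/<index>` shape shown in the Background example of the PR description, the parsing step might look like the sketch below. `routeRef` and `parseClusterName` are hypothetical names, and the real extension server code may parse the name differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// routeRef identifies one rule of a generated HTTPRoute.
type routeRef struct {
	Namespace string
	Name      string
	RuleIndex int
}

// parseClusterName splits a name such as
// "httproute/default/provider-fallback/rule/0" into its parts.
func parseClusterName(cluster string) (routeRef, error) {
	parts := strings.Split(cluster, "/")
	if len(parts) != 5 || parts[0] != "httproute" || parts[3] != "rule" {
		return routeRef{}, fmt.Errorf("unexpected cluster name %q", cluster)
	}
	if parts[1] == "" || parts[2] == "" {
		return routeRef{}, fmt.Errorf("empty namespace or name in %q", cluster)
	}
	idx, err := strconv.Atoi(parts[4])
	if err != nil || idx < 0 {
		return routeRef{}, fmt.Errorf("invalid rule index in %q", cluster)
	}
	return routeRef{Namespace: parts[1], Name: parts[2], RuleIndex: idx}, nil
}

func main() {
	ref, err := parseClusterName("httproute/default/provider-fallback/rule/0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("namespace=%s name=%s rule=%d\n", ref.Namespace, ref.Name, ref.RuleIndex)
}
```

Because AIGatewayRoute and HTTPRoute are one-to-one, the namespace and name recovered this way can be used directly as the object key for the AIGatewayRoute lookup.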

mathetake added 11 commits May 5, 2025 12:04
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@mathetake mathetake marked this pull request as ready for review May 5, 2025 22:32
@mathetake mathetake requested a review from a team as a code owner May 5, 2025 22:32
@mathetake mathetake requested review from arkodg, wengyao04 and yuzisun May 5, 2025 22:32
// Headers is the list of headers to match for the routing decision.
// Currently, only exact match is supported.
Headers []HeaderMatch `json:"headers"`
// Backends is the list of backends to which the request should be routed to when the headers match.
Contributor

I think this is still possible with #588 (comment) here?

Member Author

@mathetake mathetake May 7, 2025

yeah it should still be possible even after this PR; it's just that the backends moved to the top level here, versus under Rules before.

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
o.stream = true
}
return nil, nil, nil
// On retry, the path might have changed to a different provider. So, this will ensure that the path is always set to OpenAI.
Contributor

On retry the model name could also change: it is the "same model", but it can have a different name in a different provider. I can create a separate follow-up to support this after this PR is in.

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Contributor

@yuzisun yuzisun left a comment

Awesome work !!!

@mathetake mathetake merged commit 68653e7 into main May 10, 2025
17 checks passed
@mathetake mathetake deleted the twophases branch May 10, 2025 00:08
mathetake added a commit that referenced this pull request May 12, 2025
**Commit Message**

This deprecates the AIServiceBackend.Timeouts configuration, which no
longer works well with the refactored use of HTTPRoute since
#599. Instead, this adds `timeouts` to AIGatewayRouteRule to match
the one in HTTPRoute in GWAPI.

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
// Inside the routing rules, the header ModelNameHeaderKey may be used to make the routing decision.
Rules []RouteRule `json:"rules"`
// Backends is the list of backends to which the request should be routed to when the headers match.
Backends []*Backend `json:"backends"`
Contributor

Sorry, how do we map the Backends to the headers now? They seem to be separated, with no obvious relationship between them.

Member Author

oh you are right. that's necessary for #588 right? we need to store the route->backends info somewhere in here. would you mind sending a patch?

Contributor

Yeah, I can take a look.

@mathetake mathetake mentioned this pull request May 13, 2025
mathetake added a commit that referenced this pull request May 13, 2025
**Commit Message**

This fixes the `aigw run` command, which has been disabled since the
refactoring in #599. This requires a couple of bug fixes on the Envoy
Gateway side, so this commit includes the upgrade of EG as a dependency.

**Related Issues/PRs (if applicable)**

* Closes #607
* Includes envoyproxy/gateway/pull/5984
* Includes envoyproxy/gateway/pull/6020

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
mathetake pushed a commit that referenced this pull request May 22, 2025
**Commit Message**

The backends and headers in the filter config are M:N. With
#599, we moved the backends
out of config.rules, which lost the mapping relationship
between them. With this PR, we move the backends back into
config.rules, which is more straightforward.
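The relationship being restored can be sketched as below: each rule carries both its header matches and its backends, so a matched rule directly yields its backend list. The types are simplified, hypothetical versions of the ones quoted in the review snippets above (`RouteRule`, `HeaderMatch`, `Backend`), and the exact-match loop reflects the "only exact match is supported" comment rather than the actual implementation.

```go
package main

import "fmt"

// Simplified, hypothetical versions of the filter config types.
type HeaderMatch struct{ Name, Value string }

type Backend struct{ Name string }

type RouteRule struct {
	Headers  []HeaderMatch
	Backends []Backend
}

// matchRule returns the index of the first rule whose headers all match
// exactly. A rule without headers matches every request.
func matchRule(rules []RouteRule, headers map[string]string) (int, bool) {
	for i, r := range rules {
		matched := true
		for _, h := range r.Headers {
			if v, ok := headers[h.Name]; !ok || v != h.Value {
				matched = false
				break
			}
		}
		if matched {
			return i, true
		}
	}
	return -1, false
}

func main() {
	rules := []RouteRule{
		{
			Headers:  []HeaderMatch{{Name: "x-ai-eg-model", Value: "us.meta.llama3-2-1b-instruct-v1:0"}},
			Backends: []Backend{{Name: "provider-aws-llama"}, {Name: "provider-gcp-llama"}},
		},
	}
	req := map[string]string{"x-ai-eg-model": "us.meta.llama3-2-1b-instruct-v1:0"}
	if i, ok := matchRule(rules, req); ok {
		fmt.Println("rule", i, "backends:", rules[i].Backends)
	}
}
```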

**Related Issues/PRs (if applicable)**


Related PR: #620,
#599

**Special notes for reviewers (if applicable)**

None

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
yuzisun pushed a commit that referenced this pull request May 28, 2025
**Commit Message**

This commit refactors the internals of how the ext proc is deployed.
Specifically, this switches to inserting the ext proc container as a
sidecar container of the Envoy pods created by Envoy Gateway. This is
another large refactoring that turned out to be necessary for #599. This
utilizes the mutating webhook to insert the extproc container into Envoy
pods.

Making the extproc a sidecar means that we now have a one-to-one
mapping between Gateway and the extproc, hence this naturally resolves
the previously known limitation #509, and now users can attach multiple
AIGatewayRoute(s) to one Gateway.

Implementation note: since volume mounts only work in a
namespace-scoped way, user-created secrets (like API keys) cannot be
mounted by the extproc as it runs in the "envoy-gateway-system" namespace.
To resolve this, the controller now reads the secret and embeds the read
credentials into the "extproc secret" (which was previously known as the
"extproc configmap") together with routing, matching and backend
information. That secret is written in the "envoy-gateway-system"
namespace, hence it can be mounted by the extproc container.

**Related Issues/PRs (if applicable)**

Resolves #509 
Resolves #621

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
mathetake added a commit that referenced this pull request Jul 4, 2025
**Description**

This commit removes the handwritten header matching code from the
extproc, and instead starts utilizing the hardened envoy native router.

Historically, we had only one giant extproc filter where we did all the
logic, including model name extraction, routing, and then body
transformation & upstream authorization. Since #599, we have split it into
two external processor filters: one sits at the normal HTTP router and the
other is configured as the per-cluster upstream HTTP filter. In theory,
the one at the HTTP router has only one job on the request path: extracting
the model name from the request body. However, for historical reasons,
the handwritten router logic component remained, and that comes with not
only a maintenance cost (forcing a complex extproc & control plane
orchestration) but also a potential security vulnerability. In fact,
handwritten header matching logic can be an easy attack surface, so
where possible, we should avoid writing our own header matching (routing
logic) and instead rely on the battle-tested, hardened Envoy native
router.

With this commit, regex matching is now available, and there is
no longer any difference between HTTPRoute's matching and AIGatewayRoute's
matching implementation. This also opens up the possibility of supporting
path matching in our rules.
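The remaining job of the router-level extproc, extracting the model name so that Envoy's native router can match on it, can be sketched as follows. This assumes an OpenAI-style body with a top-level `model` field and reuses the `x-ai-eg-model` header name that appears in the AIGatewayRoute example earlier in this conversation; `extractModel` is a hypothetical name, not the project's function.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// modelHeader is the header name used in the AIGatewayRoute example above.
const modelHeader = "x-ai-eg-model"

// extractModel reads the top-level "model" field from a JSON request body.
func extractModel(body []byte) (string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", fmt.Errorf("decode request body: %w", err)
	}
	if req.Model == "" {
		return "", errors.New("request body has no model field")
	}
	return req.Model, nil
}

func main() {
	body := []byte(`{"model":"us.meta.llama3-2-1b-instruct-v1:0","messages":[]}`)
	model, err := extractModel(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The header is then left for the native router to match on.
	headers := map[string]string{modelHeader: model}
	fmt.Println(headers[modelHeader])
}
```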

**Related Issues/PRs (if applicable)**

Ref #612 
Ref #73

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
alexagriffith pushed a commit to sukumargaonkar/ai-gateway that referenced this pull request Jul 9, 2025

docs: fix aigw parentRefs in fallback (envoyproxy#824)

**Description**

This PR fixed the AIGatewayRoute parentRefs in fallback guides.

Signed-off-by: bitliu <bitliu@tencent.com>

chore: make test-e2e logs visible (envoyproxy#825)

**Description**

This PR makes test-e2e logs visible when running locally.

Signed-off-by: bitliu <bitliu@tencent.com>

extproc: account for parallel tool calls (envoyproxy#813)

**Description**
Resolves envoyproxy#736

An assistant that calls multiple tools is expected to group the tool
results in the same message. This adds logic for that!

---------

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>

build(deps): bump google.golang.org/genai from 1.13.0 to 1.14.0 (envoyproxy#833)

extproc: return 404 instead of 500 for unknown path (envoyproxy#835)

**Description**

Previously, an unknown path was answered with an internal error, even
though it is really a 404 whose root cause is the user input. This fixes
the extproc code accordingly, so that users can now see what is
wrong with the request instead of getting a cryptic 500 error.

**Related Issues/PRs (if applicable)**

Contributes to envoyproxy#810
Closes envoyproxy#724

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: add endpoint support (envoyproxy#787)

**Description**

This PR adds the endpoint support pages for EAGW.

**Related Issues/PRs (if applicable)**

Fixes: envoyproxy#705

**Special notes for reviewers (if applicable)**

@mathetake

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Erica Hughberg <erica.sundberg.90@gmail.com>

controller: return 404 instead of 500 for no matching (envoyproxy#837)

**Description**

Before envoyproxy#793, the case where no matching route was found was
handled in the extproc, which returned a 404 immediate response. After
that change, such requests naturally end up at the "unreachable" default
route, which swallowed the indication of no match and made it impossible
to reason about the resulting 500 error. In other words, this fixes the
regression in envoyproxy#793 so that the proper 404 response is returned.

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

update

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

precommit passing

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

remove header hotfix

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

precommit working
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add more test coverage

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

 add more test coverage

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

test: adds real provider embeddings test & update doc (envoyproxy#841)

**Description**

This adds embeddings endpoint tests for the providers that support the
endpoint. Only the providers for which we have credentials are covered.
To reflect the testing situation we have right now, this also
clarifies on the "Supported Endpoints" page which providers are
tested and which are not for each endpoint.

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

cli: adds default route test (envoyproxy#842)

**Description**

This adds an additional test to aigw run command so that we can verify
that setting the default route is possible.

**Related Issues/PRs (if applicable)**

Closes envoyproxy#612

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

build(deps): bump helm.sh/helm/v3 from 3.17.3 to 3.18.4 (envoyproxy#845)

test: fixes TestStartConfigWatcher flake (envoyproxy#843)

controller: ensure eg rollout when deployed as daemonset (envoyproxy#831)

**Description**
This PR handles the rollout for envoy gateway during ai gateway extproc
upgrade when deployed as daemonset.

Related Issues/PRs (if applicable)
Related PR: envoyproxy#699

---------

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

make test var

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
alexagriffith added a commit to sukumargaonkar/ai-gateway that referenced this pull request Jul 9, 2025
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

lint no err

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add translation

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

update so tests work

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add more tests

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

remove print

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

refactor: deprecate targetRefs in favor of parentRefs (envoyproxy#821)

docs: add epp integration proposal (envoyproxy#771)

**Description**

This PR adds the proposal for supporting Integration with Endpoint
Picker(GIE)

Related to envoyproxy#423

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: update epp outdated logics (envoyproxy#822)

refactor: use Envoy native router (envoyproxy#793)

alexagriffith added a commit to sukumargaonkar/ai-gateway that referenced this pull request Jul 9, 2025
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

update paralleltoolcalls

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add back system helper

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

lint no err

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add translation

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

update so tests work

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add more tests

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

remove print

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

refactor: deprecate targetRefs in favor or parentRefs  (envoyproxy#821)

docs: add epp integration proposal (envoyproxy#771)

**Description**

This PR adds the proposal for supporting Integration with Endpoint
Picker(GIE)

Related to envoyproxy#423

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: update epp outdated logics (envoyproxy#822)

refactor: use Envoy native router (envoyproxy#793)

**Description**

This commit removes the handwritten header matching code from the
extproc and instead relies on the hardened Envoy native router.

Historically, we had a single giant extproc filter that handled
everything: model name extraction, routing, and then body transformation
and upstream authorization. Since envoyproxy#599, this has been split into two
external processor filters: one sits at the normal HTTP router level and
the other is configured as a per-cluster upstream HTTP filter. In theory,
the one at the HTTP router has only one job on the request path:
extracting the model name from the request body. However, for historical
reasons, the handwritten router logic remained, and it carries not only a
maintenance cost (forcing a complex orchestration between the extproc and
the control plane) but also a potential security vulnerability.
Handwritten header matching logic is an easy attack surface, so where
possible we should avoid writing our own header matching (routing logic)
and rely on the battle-tested, hardened Envoy native router instead.

With this commit, regex matching becomes available, and there is no
longer any difference between the matching implementations of HTTPRoute
and AIGatewayRoute. This also opens up the possibility of supporting path
matching in our rules.

**Related Issues/PRs (if applicable)**

Ref envoyproxy#612
Ref envoyproxy#73

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: fix aigw parentRefs in fallback (envoyproxy#824)

**Description**

This PR fixes the AIGatewayRoute parentRefs in the fallback guides.

Signed-off-by: bitliu <bitliu@tencent.com>

chore: make test-e2e logs visible (envoyproxy#825)

**Description**

This PR makes the test-e2e logs visible when running locally.

Signed-off-by: bitliu <bitliu@tencent.com>

extproc: account for parallel tool calls (envoyproxy#813)

**Description**
Resolves envoyproxy#736

An assistant that calls multiple tools is expected to have its tool
results grouped in the same message. This adds the logic for that.
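
The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Message`, `ToolResult`, and `GroupedMessage` types and the `groupToolResults` function are hypothetical names, and giving the grouped message the `user` role is an assumption made only for the example.

```go
package main

import "fmt"

// Message is a hypothetical, simplified chat message; it is not the
// gateway's actual type.
type Message struct {
	Role       string // "user", "assistant", or "tool"
	Content    string
	ToolCallID string // set only when Role == "tool"
}

// ToolResult is one tool's output, keyed by the call it answers.
type ToolResult struct {
	ToolCallID string
	Content    string
}

// GroupedMessage is a hypothetical output shape: a message that may carry
// several tool results at once.
type GroupedMessage struct {
	Role        string
	Content     string
	ToolResults []ToolResult
}

// groupToolResults merges each run of consecutive "tool" messages into a
// single message holding all of their results, and copies every other
// message through unchanged.
func groupToolResults(in []Message) []GroupedMessage {
	var out []GroupedMessage
	for _, m := range in {
		if m.Role != "tool" {
			out = append(out, GroupedMessage{Role: m.Role, Content: m.Content})
			continue
		}
		res := ToolResult{ToolCallID: m.ToolCallID, Content: m.Content}
		// Extend the previous message if it is already collecting tool results.
		if n := len(out); n > 0 && len(out[n-1].ToolResults) > 0 {
			out[n-1].ToolResults = append(out[n-1].ToolResults, res)
			continue
		}
		// Assumption: the grouped results are carried in a "user" message.
		out = append(out, GroupedMessage{Role: "user", ToolResults: []ToolResult{res}})
	}
	return out
}

func main() {
	msgs := []Message{
		{Role: "user", Content: "Weather in Paris and Tokyo?"},
		{Role: "assistant"},
		{Role: "tool", ToolCallID: "call_1", Content: "Paris: 18C"},
		{Role: "tool", ToolCallID: "call_2", Content: "Tokyo: 24C"},
	}
	for _, g := range groupToolResults(msgs) {
		fmt.Printf("%s: %d tool result(s)\n", g.Role, len(g.ToolResults))
	}
}
```

In this sketch only adjacent tool messages are merged, so results that belong to different assistant turns stay in separate messages.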

---------

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>

build(deps): bump google.golang.org/genai from 1.13.0 to 1.14.0 (envoyproxy#833)

extproc: return 404 instead of 500 for unknown path (envoyproxy#835)

**Description**

Previously, a request to an unknown path was answered with an internal
error, even though it is really a 404 whose root cause is the user's
input. This fixes the extproc code accordingly, so that users can tell
what is wrong with the request instead of getting a cryptic 500 error.
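
As a rough illustration of the idea (not the actual extproc code), an unknown path can be signalled with a sentinel error that is then mapped to 404, leaving 500 for genuinely unexpected failures. The names `errUnknownPath`, `knownPaths`, `lookup`, and `statusFor`, as well as the listed paths, are assumptions made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errUnknownPath is a hypothetical sentinel error for paths with no handler.
var errUnknownPath = errors.New("unknown path")

// knownPaths is an illustrative set of supported request paths.
var knownPaths = map[string]struct{}{
	"/v1/chat/completions": {},
	"/v1/embeddings":       {},
}

// lookup returns an error wrapping errUnknownPath when the path is not supported.
func lookup(path string) error {
	if _, ok := knownPaths[path]; !ok {
		return fmt.Errorf("%w: %s", errUnknownPath, path)
	}
	return nil
}

// statusFor maps an error to the HTTP status reported to the client:
// caller mistakes become 404, anything else unexpected stays 500.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errUnknownPath):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	for _, p := range []string{"/v1/embeddings", "/v1/unknown"} {
		fmt.Println(p, statusFor(lookup(p)))
	}
}
```

Wrapping the sentinel with `%w` keeps the offending path in the error message while still letting `errors.Is` recognize the error class.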

**Related Issues/PRs (if applicable)**

Contributes to envoyproxy#810
Closes envoyproxy#724

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: add endpoint support (envoyproxy#787)

**Description**

This PR adds the endpoint support pages for EAGW.

**Related Issues/PRs (if applicable)**

Fixes: envoyproxy#705

**Special notes for reviewers (if applicable)**

@mathetake

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Erica Hughberg <erica.sundberg.90@gmail.com>

controller: return 404 instead of 500 for no matching (envoyproxy#837)

**Description**

Before envoyproxy#793, the case where no matching route was found was
handled in the extproc, which returned a 404 immediate response. After
envoyproxy#793, such requests naturally fall through to the "unreachable"
default route, which swallows the no-match indication and makes the
resulting 500 error impossible to reason about. In other words, this
fixes the regression introduced in envoyproxy#793 so that the proper 404
response is returned.

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

update

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

precommit passing

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

remove header hotfix

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

precommit working
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add more test coverage

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

 add more test coverage

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

test: adds real provider embeddings test & update doc (envoyproxy#841)

**Description**

This adds embeddings endpoint tests against the providers that support
the endpoint, limited to the providers for which we have credentials.
To reflect the current testing situation, this also clarifies on the
"Supported Endpoints" page which providers are tested and which are not
for each endpoint.

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

cli: adds default route test (envoyproxy#842)

**Description**

This adds an additional test to the aigw run command so that we can
verify that setting the default route is possible.

**Related Issues/PRs (if applicable)**

Closes envoyproxy#612

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

build(deps): bump helm.sh/helm/v3 from 3.17.3 to 3.18.4 (envoyproxy#845)

test: fixes TestStartConfigWatcher flake (envoyproxy#843)

controller: ensure eg rollout when deployed as daemonset (envoyproxy#831)

**Description**
This PR handles the rollout of Envoy Gateway during an AI Gateway
extproc upgrade when deployed as a DaemonSet.

**Related Issues/PRs (if applicable)**
Related PR: envoyproxy#699

---------

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

make test var

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
alexagriffith added a commit to sukumargaonkar/ai-gateway that referenced this pull request Jul 9, 2025
alexagriffith pushed a commit to sukumargaonkar/ai-gateway that referenced this pull request Jul 11, 2025
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>