97a66d4 to d97af8b
note to self: TODOs after this PR:
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Finally passed all tests.
    // Inside the routing rules, the header ModelNameHeaderKey may be used to make the routing decision.
    Rules []RouteRule `json:"rules"`
    // Backends is the list of backends to which the request should be routed when the headers match.
    Backends []*Backend `json:"backends"`
Not sure if I understand correctly: is the idea here to match the route name, i.e. the HTTPRoute resource, with each AIGatewayRoute rule creating a corresponding HTTPRoute?
Well, not really. AIGatewayRoute to HTTPRoute is one-to-one, and each rule in an AIGatewayRoute also corresponds one-to-one to a rule in the HTTPRoute. The route-level extproc's only responsibility is to choose the matching rule (as opposed to choosing the backend, as before). Envoy then routes the request to the chosen rule (== a cluster binding multiple backends).
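The route-level responsibility described above (pick a rule, not a backend) can be sketched as follows. This is a minimal illustration, not the project's code. It assumes exact header matches only, first-match-wins ordering, and lowercased header names. `HeaderMatch`, `RouteRule`, and `selectRule` are hypothetical stand-ins.

```go
package main

// HeaderMatch is a hypothetical exact-match condition on a single header.
type HeaderMatch struct {
	Name  string
	Value string
}

// RouteRule is a hypothetical stand-in for one AIGatewayRoute rule. The
// generated HTTPRoute is assumed to have a rule at the same index.
type RouteRule struct {
	Headers []HeaderMatch
}

// matches reports whether every header condition in the rule is satisfied.
// Header names in the map are assumed to be lowercased already.
func (r RouteRule) matches(headers map[string]string) bool {
	for _, h := range r.Headers {
		if headers[h.Name] != h.Value {
			return false
		}
	}
	return true
}

// selectRule returns the index of the first matching rule, or -1 if none
// match. Only the rule is chosen here. Which backend inside the rule's
// cluster receives the request is left to Envoy.
func selectRule(rules []RouteRule, headers map[string]string) int {
	for i, r := range rules {
		if r.matches(headers) {
			return i
		}
	}
	return -1
}
```

The design choice being illustrated is that the extproc's output shrinks to a rule index. Load balancing across that rule's backends stays inside Envoy's cluster.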
If I understand correctly, this means we are dropping the capability of making the routing decision in the extproc from a list of backend refs. For example, if we want to implement latency-aware routing, would we no longer be able to do that in the extproc and have to rely on the Envoy endpoint picker?
    apiVersion: aigateway.envoyproxy.io/v1alpha1
    kind: AIGatewayRoute
    metadata:
      name: latency-aware-routing
      namespace: default
    spec:
      schema:
        name: OpenAI
      targetRefs:
        - name: latency-aware-routing
          kind: Gateway
          group: gateway.networking.k8s.io
      rules:
        - matches:
            - headers:
                - type: Exact
                  name: x-ai-eg-model
                  value: us.meta.llama3-2-1b-instruct-v1:0
          loadBalancer: latency
          backendRefs:
            - name: provider-aws-llama
            - name: provider-gcp-llama
> we are dropping the capability of making the routing decision in extproc from a list of backend refs if I understand correctly.
Not really. If we want to implement something like `loadBalancer: latency` after this PR, we can still translate it by adding a dedicated cluster and having the extproc set the chosen backend in a header for that cluster. It should be doable.
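The "dedicated cluster plus header" idea above might look roughly like this on the extproc side. This is a sketch under stated assumptions. The header name `x-ai-eg-selected-backend`, the `BackendStats` type, and the averaging of latency are all hypothetical. The real header would be whatever the dedicated cluster is configured to key on.

```go
package main

import "time"

// selectedBackendHeader is a hypothetical header name that the dedicated
// cluster's routing is assumed to key on.
const selectedBackendHeader = "x-ai-eg-selected-backend"

// BackendStats is a hypothetical per-backend latency record kept by the extproc.
type BackendStats struct {
	Name       string
	AvgLatency time.Duration
}

// pickLowestLatency returns the backend with the smallest average latency.
// On ties the earlier entry wins. ok is false when stats is empty.
func pickLowestLatency(stats []BackendStats) (name string, ok bool) {
	if len(stats) == 0 {
		return "", false
	}
	best := stats[0]
	for _, s := range stats[1:] {
		if s.AvgLatency < best.AvgLatency {
			best = s
		}
	}
	return best.Name, true
}

// headerMutation returns the header the extproc would set so that Envoy can
// route to the chosen backend within the dedicated cluster. It returns nil
// when there is nothing to choose from.
func headerMutation(stats []BackendStats) map[string]string {
	name, ok := pickLowestLatency(stats)
	if !ok {
		return nil
	}
	return map[string]string{selectedBackendHeader: name}
}
```

The latency bookkeeping itself (how `AvgLatency` is measured and updated) is out of scope here.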
    }
    // Get the HTTPRoute object from the cluster name.
    var aigwRoute aigv1a1.AIGatewayRoute
    err = s.k8sClient.Get(context.Background(), client.ObjectKey{Namespace: httpRouteNamespace, Name: httpRouteName}, &aigwRoute)
Is the assumption here that AIGatewayRoute is now split into individual ones, as we planned for HTTPRoute?
It won't be split into multiple HTTPRoutes, as I commented above. The size limit issue can be fixed by supporting multiple AIGatewayRoutes per Gateway, as we discussed on Slack with Yao and others. Keeping AIGatewayRoute:HTTPRoute = 1:1 is also much simpler from a UX perspective, since you can create the retry/fallback policy for the single generated HTTPRoute, as in the example in 3549245.
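With the 1:1 mapping, the namespace and name recovered from a cluster name identify both the HTTPRoute and the AIGatewayRoute, which is what the `k8sClient.Get` call above relies on. The sketch below assumes Envoy Gateway names rule clusters `httproute/<namespace>/<name>/rule/<index>`. That format is an assumption to verify against the EG version in use, and `parseClusterName` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseClusterName splits a cluster name of the assumed form
// "httproute/<namespace>/<name>/rule/<index>". Because AIGatewayRoute and
// HTTPRoute are 1:1 and share a name, namespace/name also identify the
// AIGatewayRoute, and ruleIndex identifies the rule within it.
func parseClusterName(cluster string) (namespace, name string, ruleIndex int, err error) {
	parts := strings.Split(cluster, "/")
	if len(parts) != 5 || parts[0] != "httproute" || parts[3] != "rule" {
		return "", "", 0, fmt.Errorf("unexpected cluster name format: %q", cluster)
	}
	ruleIndex, err = strconv.Atoi(parts[4])
	if err != nil {
		return "", "", 0, fmt.Errorf("invalid rule index in %q: %w", cluster, err)
	}
	return parts[1], parts[2], ruleIndex, nil
}
```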
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
    // Headers is the list of headers to match for the routing decision.
    // Currently, only exact match is supported.
    Headers []HeaderMatch `json:"headers"`
    // Backends is the list of backends to which the request should be routed when the headers match.
I think this is still possible with #588 (comment) here?
Yeah, it should still be possible even after this PR; the backends just moved to the top level here instead of living under Rules as before.
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
        o.stream = true
    }
    return nil, nil, nil
    // On retry, the path might have changed to a different provider. So, this will ensure that the path is always set to OpenAI.
On retry, the model name could also change: the "same model" can have a different name at a different provider. I can create a separate PR to support this after this PR is in.
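One possible shape for the follow-up mentioned above is a per-backend override table consulted on every attempt, including retries. This is a hypothetical sketch. `modelNameOverrides` and `resolveModel` are not existing APIs, and the model names are illustrative.

```go
package main

// modelNameOverrides is a hypothetical per-backend mapping from the model
// name the client sent to the name a specific provider expects.
type modelNameOverrides map[string]map[string]string

// resolveModel returns the model name to send to backend. Lookups always
// start from the client's original name, not a previously rewritten one, so
// a retry against a different provider resolves independently. Without an
// override, the original name is passed through unchanged.
func (o modelNameOverrides) resolveModel(backend, requested string) string {
	if byModel, ok := o[backend]; ok {
		if m, ok := byModel[requested]; ok {
			return m
		}
	}
	return requested
}
```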
**Commit Message** This deprecates the AIServiceBackend.Timeouts configuration, which has not worked well with the refactored use of HTTPRoute since #599. Instead, this adds `timeouts` to AIGatewayRouteRule to match the `timeouts` field of HTTPRoute rules in Gateway API (GWAPI). --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
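Since the new field is meant to mirror GWAPI's HTTPRoute rule timeouts (`request` and `backendRequest`), a validation step could follow the same constraint Gateway API documents: `backendRequest` must not exceed `request`, with a `request` of `0s` meaning no timeout. This is a sketch assuming those semantics carry over. `RouteRuleTimeouts` is a local stand-in, not the actual AIGatewayRouteRule type. GWAPI duration strings such as `10s` or `500ms` are assumed to parse with `time.ParseDuration`.

```go
package main

import (
	"fmt"
	"time"
)

// RouteRuleTimeouts is a local stand-in mirroring the two fields of Gateway
// API's HTTPRouteTimeouts as duration strings.
type RouteRuleTimeouts struct {
	Request        string
	BackendRequest string
}

// validate checks that BackendRequest does not exceed Request. A Request of
// "0s" is treated as "no timeout". Only the relationship between the two is
// checked, and only when both are set.
func (t RouteRuleTimeouts) validate() error {
	if t.Request == "" || t.BackendRequest == "" {
		return nil
	}
	req, err := time.ParseDuration(t.Request)
	if err != nil {
		return fmt.Errorf("invalid request timeout %q: %w", t.Request, err)
	}
	backend, err := time.ParseDuration(t.BackendRequest)
	if err != nil {
		return fmt.Errorf("invalid backendRequest timeout %q: %w", t.BackendRequest, err)
	}
	if req != 0 && backend > req {
		return fmt.Errorf("backendRequest (%s) must not exceed request (%s)", backend, req)
	}
	return nil
}
```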
    // Inside the routing rules, the header ModelNameHeaderKey may be used to make the routing decision.
    Rules []RouteRule `json:"rules"`
    // Backends is the list of backends to which the request should be routed when the headers match.
    Backends []*Backend `json:"backends"`
Sorry, how do we map the Backends to the headers now? They seem to be separated, with no obvious relationship between them.
Oh, you are right. That's necessary for #588, right? We need to store the route->backends info somewhere in here. Would you mind sending a patch?
**Commit Message** This fixes the `aigw run` command, which has been disabled since the refactoring in #599. This requires a couple of bug fixes on the Envoy Gateway side, so this commit also upgrades the EG dependency. **Related Issues/PRs (if applicable)** * Closes #607 * Includes envoyproxy/gateway/pull/5984 * Includes envoyproxy/gateway/pull/6020 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
**Commit Message** The backends and headers in the filter config are M:N. With #599, we moved the backends out of config.rules, which lost the mapping relationship between them. With this PR, we move the backends back into config.rules, which is more straightforward. **Related Issues/PRs (if applicable)** Related PR: #620, #599 **Special notes for reviewers (if applicable)** None --------- Signed-off-by: kerthcet <kerthcet@gmail.com>
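Moving the backends back under each rule restores the header-to-backend relationship while still allowing M:N: the same backend can appear in several rules. Below is a minimal sketch of that shape. `Backend`, `HeaderMatch`, `RouteRule`, and `backendsFor` are hypothetical stand-ins for the filter-config types, assuming exact header matches and first-match-wins.

```go
package main

// Backend is a hypothetical stand-in for a filter-config backend entry.
type Backend struct {
	Name string
}

// HeaderMatch is a hypothetical exact-match condition on one header.
type HeaderMatch struct {
	Name  string
	Value string
}

// RouteRule carries its own backends, so the header-to-backend mapping lives
// in one place instead of two parallel top-level lists.
type RouteRule struct {
	Headers  []HeaderMatch
	Backends []*Backend
}

// backendsFor returns the backends of the first rule whose header matches
// all succeed, or nil if no rule matches.
func backendsFor(rules []RouteRule, headers map[string]string) []*Backend {
	for _, r := range rules {
		matched := true
		for _, h := range r.Headers {
			if headers[h.Name] != h.Value {
				matched = false
				break
			}
		}
		if matched {
			return r.Backends
		}
	}
	return nil
}
```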
**Commit Message** This commit refactors how the extproc is deployed internally. Specifically, it switches to inserting the extproc container as a sidecar container of the Envoy pods created by Envoy Gateway. This is another large refactoring that turned out to be necessary for #599. It uses a mutating webhook to insert the extproc container into Envoy pods. Running the extproc as a sidecar means we now have a one-to-one mapping between a Gateway and its extproc. This naturally resolves the previously known limitation #509, so users can now attach multiple AIGatewayRoute(s) to one Gateway. Implementation note: since volume mounts only work within a single namespace, user-created secrets (like API keys) cannot be mounted by the extproc, because it runs in the "envoy-gateway-system" namespace. To resolve this, the controller now reads the secret and embeds the credentials into the "extproc secret" (previously known as the "extproc configmap") together with the routing, matching, and backend information. That secret is written to the "envoy-gateway-system" namespace, so it can be mounted by the extproc container. **Related Issues/PRs (if applicable)** Resolves #509 Resolves #621 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
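A mutating admission webhook answers the API server with a JSON Patch (RFC 6902). For sidecar injection, that patch is essentially an `add` to `/spec/containers/-`. The sketch below shows that shape. `container`, `patchOp`, and `sidecarPatch` are hypothetical local types, not the project's webhook code or the Kubernetes client types. The container name and image in the usage are placeholders.

```go
package main

import "encoding/json"

// container is a minimal local stand-in for the Kubernetes container fields
// this sketch needs.
type container struct {
	Name  string `json:"name"`
	Image string `json:"image"`
}

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string    `json:"op"`
	Path  string    `json:"path"`
	Value container `json:"value"`
}

// sidecarPatch builds a JSON Patch that appends the extproc container to the
// pod's containers. It returns nil if a container with that name already
// exists, so a repeated admission call does not inject the sidecar twice.
func sidecarPatch(existing []container, sidecar container) ([]byte, error) {
	for _, c := range existing {
		if c.Name == sidecar.Name {
			return nil, nil
		}
	}
	ops := []patchOp{{Op: "add", Path: "/spec/containers/-", Value: sidecar}}
	return json.Marshal(ops)
}
```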
**Description** This commit removes the handwritten header matching code from the extproc and instead starts using the hardened Envoy native router. Historically, we had one giant extproc filter that did all the logic, including model name extraction, routing, body transformation, and upstream authorization. Since #599, it has been split into two external processor filters: one sits at the normal HTTP router and the other is configured as a per-cluster upstream HTTP filter. In theory, the one at the HTTP router has only one job on the request path: extracting the model name from the request body. However, for historical reasons, the handwritten router logic remained. It carries not only a maintenance cost (forcing complex extproc and control plane orchestration) but also a potential security vulnerability. Handwritten header matching logic can easily become an attack surface, so where possible we should avoid writing our own header matching (routing logic) and rely on the battle-tested, hardened Envoy native router. With this commit, regex matching is now available, and there is no longer any difference between HTTPRoute's and AIGatewayRoute's matching implementations. This also opens up the possibility of supporting path matching in our rules. **Related Issues/PRs (if applicable)** Ref #612 Ref #73 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
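After this change, the router-level extproc's remaining request-path job is to lift the model name out of the body into a header that Envoy's native router can match on. The header `x-ai-eg-model` comes from the AIGatewayRoute example in this thread. The sketch below is illustrative only: `extractModelHeader` is hypothetical, it returns a plain map rather than an actual ext_proc header mutation, and it assumes an OpenAI-style JSON body with a top-level `model` field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelHeader is the header name used in the AIGatewayRoute example above.
const modelHeader = "x-ai-eg-model"

// extractModelHeader parses an OpenAI-style request body and returns the
// header the router-level extproc would add. All matching (exact, regex, ...)
// is then left to Envoy's native router instead of handwritten Go code.
func extractModelHeader(body []byte) (map[string]string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("invalid request body: %w", err)
	}
	if req.Model == "" {
		return nil, fmt.Errorf("request body has no model field")
	}
	return map[string]string{modelHeader: req.Model}, nil
}
```

Keeping the extproc to this extraction step is what lets regex (and potentially path) matching come from the Envoy native router rather than from code in the extproc.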
**Description** This commit removes the handwritten header matching code from the extproc, and instead starts utilizing the hardened envoy native router. Historically, we had only one giant extproc filter where we did all logics including model name extraction, routing and then body transformation & upstream authorization. Since envoyproxy#599, we split into two external processor filters; one sits at the normal HTTP router and the other is configured at the per-cluster upstream HTTP filter. In theory, the one at HTTP router has only one job on request path: extracting model name from the request body. However, due to the historical reason, the handwritten router logic component remained, and that comes with not only a maintenance cost (forcing a complex extproc & control plane orchestration) but also a potential security vulnerability. In fact, writing header matching logic can be an easy attack surface, so if it's possible, we should avoid writing our own header matching (routing logic) but should rely on the battle-tested hardened envoy native router. With this commit, now a regex matching is available as well as there's no difference between HTTPRoute's matching and AIGatewayRoute's matching implementation. This also opens up a possibility to support path matching in our rule. **Related Issues/PRs (if applicable)** Ref envoyproxy#612 Ref envoyproxy#73 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: fix aigw parentRefs in fallback (envoyproxy#824) **Description** This PR fixed the AIGatewayRoute parentRefs in fallback guides. Signed-off-by: bitliu <bitliu@tencent.com> chore: make test-e2e logs visible (envoyproxy#825) **Description** This PR is to make test-e2e logs visible in local. Signed-off-by: bitliu <bitliu@tencent.com> extproc: account for parallel tool calls (envoyproxy#813) **Description** Resolves envoyproxy#736 Assistant that calls multiple tools are expected to group tool result in the same message. Adding logic for that! 
--------- Signed-off-by: Aaron Choo <achoo30@bloomberg.net> Signed-off-by: Dan Sun <dsun20@bloomberg.net> Co-authored-by: Dan Sun <dsun20@bloomberg.net> build(deps): bump google.golang.org/genai from 1.13.0 to 1.14.0 (envoyproxy#833) extproc: return 404 instead of 500 for unknown path (envoyproxy#835) **Description** Previously, unknown path was responded as an internal error as opposed to the fact that it's an 404 with the user input root cause. This fixes the extproc code that way, now that users will be able to know what's wrong with the operation instead of getting the cryptic 500 error. **Related Issues/PRs (if applicable)** Contributes to envoyproxy#810 Closes envoyproxy#724 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: add endpoint support (envoyproxy#787) **Description** This PR adds the endpoint support pages for EAGW. **Related Issues/PRs (if applicable)** Fixes: envoyproxy#705 **Special notes for reviewers (if applicable)** @mathetake --------- Signed-off-by: bitliu <bitliu@tencent.com> Co-authored-by: Erica Hughberg <erica.sundberg.90@gmail.com> controller: return 404 instead of 500 for no matching (envoyproxy#837) **Description** Before envoyproxy#793, the case where no matching route found was handled in the extproc and the 404 immediate response was returned from there, but after that, it naturally results in the "unreachable" default route and swallowed the indication of no matching and it made it impossible to reason about the 500 error on that case. In other words, this fixes the regression in envoyproxy#793 to return the proper 404 response. 
--------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> update Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> precommit passing Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> remove header hotfix Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> precommit working Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more test coverage Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more test coverage Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> test: adds real provider embeddings test & update doc (envoyproxy#841) **Description** This adds embeddings endpoint tests with the providers that support the endpoint. This only added the providers for which we have credentials. According to the testing situation we have right now, this also clarifies in the "Supported Endpoints" page that which provider is tested and which is not for each endpoint. --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> cli: adds default route test (envoyproxy#842) **Description** This adds an additional test to aigw run command so that we can verify that setting the default route is possible. **Related Issues/PRs (if applicable)** Closes envoyproxy#612 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> build(deps): bump helm.sh/helm/v3 from 3.17.3 to 3.18.4 (envoyproxy#845) test: fixes TestStartConfigWatcher flake (envoyproxy#843) controller: ensure eg rollout when deployed as daemonset (envoyproxy#831) **Description** This PR handles the rollout for envoy gateway during ai gateway extproc upgrade when deployed as daemonset. Related Issues/PRs (if applicable) Related PR: envoyproxy#699 --------- Signed-off-by: Dan Sun <dsun20@bloomberg.net> make test var Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> lint no err Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add translation Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> update so tests work Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more tests Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> remove print Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> refactor: deprecate targetRefs in favor or parentRefs (envoyproxy#821) docs: add epp integration proposal (envoyproxy#771) **Description** This PR adds the proposal for supporting Integration with Endpoint Picker(GIE) Related to envoyproxy#423 --------- Signed-off-by: bitliu <bitliu@tencent.com> Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: update epp outdated logics (envoyproxy#822) refactor: use Envoy native router (envoyproxy#793) **Description** This commit removes the handwritten header matching code from the extproc, and instead starts utilizing the hardened envoy native router. Historically, we had only one giant extproc filter where we did all logics including model name extraction, routing and then body transformation & upstream authorization. Since envoyproxy#599, we split into two external processor filters; one sits at the normal HTTP router and the other is configured at the per-cluster upstream HTTP filter. In theory, the one at HTTP router has only one job on request path: extracting model name from the request body. However, due to the historical reason, the handwritten router logic component remained, and that comes with not only a maintenance cost (forcing a complex extproc & control plane orchestration) but also a potential security vulnerability. In fact, writing header matching logic can be an easy attack surface, so if it's possible, we should avoid writing our own header matching (routing logic) but should rely on the battle-tested hardened envoy native router. 
With this commit, now a regex matching is available as well as there's no difference between HTTPRoute's matching and AIGatewayRoute's matching implementation. This also opens up a possibility to support path matching in our rule. **Related Issues/PRs (if applicable)** Ref envoyproxy#612 Ref envoyproxy#73 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: fix aigw parentRefs in fallback (envoyproxy#824) **Description** This PR fixed the AIGatewayRoute parentRefs in fallback guides. Signed-off-by: bitliu <bitliu@tencent.com> chore: make test-e2e logs visible (envoyproxy#825) **Description** This PR is to make test-e2e logs visible in local. Signed-off-by: bitliu <bitliu@tencent.com> extproc: account for parallel tool calls (envoyproxy#813) **Description** Resolves envoyproxy#736 Assistant that calls multiple tools are expected to group tool result in the same message. Adding logic for that! --------- Signed-off-by: Aaron Choo <achoo30@bloomberg.net> Signed-off-by: Dan Sun <dsun20@bloomberg.net> Co-authored-by: Dan Sun <dsun20@bloomberg.net> build(deps): bump google.golang.org/genai from 1.13.0 to 1.14.0 (envoyproxy#833) extproc: return 404 instead of 500 for unknown path (envoyproxy#835) **Description** Previously, unknown path was responded as an internal error as opposed to the fact that it's an 404 with the user input root cause. This fixes the extproc code that way, now that users will be able to know what's wrong with the operation instead of getting the cryptic 500 error. **Related Issues/PRs (if applicable)** Contributes to envoyproxy#810 Closes envoyproxy#724 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: add endpoint support (envoyproxy#787) **Description** This PR adds the endpoint support pages for EAGW. 
**Related Issues/PRs (if applicable)** Fixes: envoyproxy#705 **Special notes for reviewers (if applicable)** @mathetake --------- Signed-off-by: bitliu <bitliu@tencent.com> Co-authored-by: Erica Hughberg <erica.sundberg.90@gmail.com> controller: return 404 instead of 500 for no matching (envoyproxy#837) **Description** Before envoyproxy#793, the case where no matching route found was handled in the extproc and the 404 immediate response was returned from there, but after that, it naturally results in the "unreachable" default route and swallowed the indication of no matching and it made it impossible to reason about the 500 error on that case. In other words, this fixes the regression in envoyproxy#793 to return the proper 404 response. --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> update Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> precommit passing Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> remove header hotfix Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> precommit working Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more test coverage Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more test coverage Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> test: adds real provider embeddings test & update doc (envoyproxy#841) **Description** This adds embeddings endpoint tests with the providers that support the endpoint. This only added the providers for which we have credentials. According to the testing situation we have right now, this also clarifies in the "Supported Endpoints" page that which provider is tested and which is not for each endpoint. --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> cli: adds default route test (envoyproxy#842) **Description** This adds an additional test to aigw run command so that we can verify that setting the default route is possible. 
**Related Issues/PRs (if applicable)** Closes envoyproxy#612 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> build(deps): bump helm.sh/helm/v3 from 3.17.3 to 3.18.4 (envoyproxy#845) test: fixes TestStartConfigWatcher flake (envoyproxy#843) controller: ensure eg rollout when deployed as daemonset (envoyproxy#831) **Description** This PR handles the rollout for envoy gateway during ai gateway extproc upgrade when deployed as daemonset. Related Issues/PRs (if applicable) Related PR: envoyproxy#699 --------- Signed-off-by: Dan Sun <dsun20@bloomberg.net> make test var Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> update paralleltoolcalls Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add back system helper Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> lint no err Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add translation Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> update so tests work Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more tests Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> remove print Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> refactor: deprecate targetRefs in favor or parentRefs (envoyproxy#821) docs: add epp integration proposal (envoyproxy#771) **Description** This PR adds the proposal for supporting Integration with Endpoint Picker(GIE) Related to envoyproxy#423 --------- Signed-off-by: bitliu <bitliu@tencent.com> Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: update epp outdated logics (envoyproxy#822) refactor: use Envoy native router (envoyproxy#793) **Description** This commit removes the handwritten header matching code from the extproc, and instead starts utilizing the hardened envoy native router. Historically, we had only one giant extproc filter where we did all logics including model name extraction, routing and then body transformation & upstream authorization. Since envoyproxy#599, we split into two external processor filters; one sits at the normal HTTP router and the other is configured at the per-cluster upstream HTTP filter. In theory, the one at HTTP router has only one job on request path: extracting model name from the request body. However, due to the historical reason, the handwritten router logic component remained, and that comes with not only a maintenance cost (forcing a complex extproc & control plane orchestration) but also a potential security vulnerability. 
In fact, writing header matching logic can be an easy attack surface, so if it's possible, we should avoid writing our own header matching (routing logic) but should rely on the battle-tested hardened envoy native router. With this commit, now a regex matching is available as well as there's no difference between HTTPRoute's matching and AIGatewayRoute's matching implementation. This also opens up a possibility to support path matching in our rule. **Related Issues/PRs (if applicable)** Ref envoyproxy#612 Ref envoyproxy#73 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: fix aigw parentRefs in fallback (envoyproxy#824) **Description** This PR fixed the AIGatewayRoute parentRefs in fallback guides. Signed-off-by: bitliu <bitliu@tencent.com> chore: make test-e2e logs visible (envoyproxy#825) **Description** This PR is to make test-e2e logs visible in local. Signed-off-by: bitliu <bitliu@tencent.com> extproc: account for parallel tool calls (envoyproxy#813) **Description** Resolves envoyproxy#736 Assistant that calls multiple tools are expected to group tool result in the same message. Adding logic for that! --------- Signed-off-by: Aaron Choo <achoo30@bloomberg.net> Signed-off-by: Dan Sun <dsun20@bloomberg.net> Co-authored-by: Dan Sun <dsun20@bloomberg.net> build(deps): bump google.golang.org/genai from 1.13.0 to 1.14.0 (envoyproxy#833) extproc: return 404 instead of 500 for unknown path (envoyproxy#835) **Description** Previously, unknown path was responded as an internal error as opposed to the fact that it's an 404 with the user input root cause. This fixes the extproc code that way, now that users will be able to know what's wrong with the operation instead of getting the cryptic 500 error. 
**Related Issues/PRs (if applicable)** Contributes to envoyproxy#810 Closes envoyproxy#724 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: add endpoint support (envoyproxy#787) **Description** This PR adds the endpoint support pages for EAGW. **Related Issues/PRs (if applicable)** Fixes: envoyproxy#705 **Special notes for reviewers (if applicable)** @mathetake --------- Signed-off-by: bitliu <bitliu@tencent.com> Co-authored-by: Erica Hughberg <erica.sundberg.90@gmail.com> controller: return 404 instead of 500 for no matching (envoyproxy#837) **Description** Before envoyproxy#793, the case where no matching route found was handled in the extproc and the 404 immediate response was returned from there, but after that, it naturally results in the "unreachable" default route and swallowed the indication of no matching and it made it impossible to reason about the 500 error on that case. In other words, this fixes the regression in envoyproxy#793 to return the proper 404 response. --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> update Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> precommit passing Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> remove header hotfix Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> precommit working Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more test coverage Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more test coverage Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> test: adds real provider embeddings test & update doc (envoyproxy#841) **Description** This adds embeddings endpoint tests with the providers that support the endpoint. This only added the providers for which we have credentials. According to the testing situation we have right now, this also clarifies in the "Supported Endpoints" page that which provider is tested and which is not for each endpoint. 
--------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> cli: adds default route test (envoyproxy#842) **Description** This adds an additional test to aigw run command so that we can verify that setting the default route is possible. **Related Issues/PRs (if applicable)** Closes envoyproxy#612 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> build(deps): bump helm.sh/helm/v3 from 3.17.3 to 3.18.4 (envoyproxy#845) test: fixes TestStartConfigWatcher flake (envoyproxy#843) controller: ensure eg rollout when deployed as daemonset (envoyproxy#831) **Description** This PR handles the rollout for envoy gateway during ai gateway extproc upgrade when deployed as daemonset. Related Issues/PRs (if applicable) Related PR: envoyproxy#699 --------- Signed-off-by: Dan Sun <dsun20@bloomberg.net> make test var Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> update paralleltoolcalls Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add back system helper Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> lint no err Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add translation Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> update so tests work Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more tests Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> remove print Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> refactor: deprecate targetRefs in favor or parentRefs (envoyproxy#821) docs: add epp integration proposal (envoyproxy#771) **Description** This PR adds the proposal for supporting Integration with Endpoint Picker(GIE) Related to envoyproxy#423 --------- Signed-off-by: bitliu <bitliu@tencent.com> Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: update epp outdated logics (envoyproxy#822) refactor: use Envoy native router (envoyproxy#793) **Description** This commit removes the handwritten header matching code from the extproc, and instead starts utilizing the hardened envoy native router. Historically, we had only one giant extproc filter where we did all logics including model name extraction, routing and then body transformation & upstream authorization. Since envoyproxy#599, we split into two external processor filters; one sits at the normal HTTP router and the other is configured at the per-cluster upstream HTTP filter. In theory, the one at HTTP router has only one job on request path: extracting model name from the request body. However, due to the historical reason, the handwritten router logic component remained, and that comes with not only a maintenance cost (forcing a complex extproc & control plane orchestration) but also a potential security vulnerability. 
In fact, writing header matching logic can be an easy attack surface, so if it's possible, we should avoid writing our own header matching (routing logic) but should rely on the battle-tested hardened envoy native router. With this commit, now a regex matching is available as well as there's no difference between HTTPRoute's matching and AIGatewayRoute's matching implementation. This also opens up a possibility to support path matching in our rule. **Related Issues/PRs (if applicable)** Ref envoyproxy#612 Ref envoyproxy#73 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: fix aigw parentRefs in fallback (envoyproxy#824) **Description** This PR fixed the AIGatewayRoute parentRefs in fallback guides. Signed-off-by: bitliu <bitliu@tencent.com> chore: make test-e2e logs visible (envoyproxy#825) **Description** This PR is to make test-e2e logs visible in local. Signed-off-by: bitliu <bitliu@tencent.com> extproc: account for parallel tool calls (envoyproxy#813) **Description** Resolves envoyproxy#736 Assistant that calls multiple tools are expected to group tool result in the same message. Adding logic for that! --------- Signed-off-by: Aaron Choo <achoo30@bloomberg.net> Signed-off-by: Dan Sun <dsun20@bloomberg.net> Co-authored-by: Dan Sun <dsun20@bloomberg.net> build(deps): bump google.golang.org/genai from 1.13.0 to 1.14.0 (envoyproxy#833) extproc: return 404 instead of 500 for unknown path (envoyproxy#835) **Description** Previously, unknown path was responded as an internal error as opposed to the fact that it's an 404 with the user input root cause. This fixes the extproc code that way, now that users will be able to know what's wrong with the operation instead of getting the cryptic 500 error. 
**Related Issues/PRs (if applicable)**

Contributes to envoyproxy#810
Closes envoyproxy#724

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: add endpoint support (envoyproxy#787)

**Description**

This PR adds the endpoint support pages for EAGW.

**Related Issues/PRs (if applicable)**

Fixes: envoyproxy#705

**Special notes for reviewers (if applicable)**

@mathetake

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Erica Hughberg <erica.sundberg.90@gmail.com>

controller: return 404 instead of 500 for no matching (envoyproxy#837)

**Description**

Before envoyproxy#793, the extproc handled the case where no matching route was found and returned a 404 immediate response. Since that change, such requests fall through to the "unreachable" default route. This swallowed the no-match signal and made the resulting 500 error impossible to reason about. In other words, this fixes the regression in envoyproxy#793 so that the proper 404 response is returned again.

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

update

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

precommit passing

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

remove header hotfix

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

precommit working

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add more test coverage

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add more test coverage

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

test: adds real provider embeddings test & update doc (envoyproxy#841)

**Description**

This adds embeddings endpoint tests for the providers that support the endpoint. It only covers the providers for which we have credentials. To reflect the current state of testing, this also updates the "Supported Endpoints" page to show, for each endpoint, which providers are tested and which are not.

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

cli: adds default route test (envoyproxy#842)

**Description**

This adds a test for the `aigw run` command that verifies a default route can be set.

**Related Issues/PRs (if applicable)**

Closes envoyproxy#612

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

build(deps): bump helm.sh/helm/v3 from 3.17.3 to 3.18.4 (envoyproxy#845)

test: fixes TestStartConfigWatcher flake (envoyproxy#843)

controller: ensure eg rollout when deployed as daemonset (envoyproxy#831)

**Description**

This PR handles the rollout of Envoy Gateway during an AI Gateway extproc upgrade when Envoy Gateway is deployed as a DaemonSet.

**Related Issues/PRs (if applicable)**

Related PR: envoyproxy#699

---------

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

make test var

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
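The parallel tool call fix (envoyproxy#813) can be pictured with a minimal sketch. It uses a simplified, hypothetical `message` type (a role plus content parts), not the extproc's real request types. It merges consecutive `tool` messages so that all results answering one assistant turn travel together in a single message.

```go
package main

import "fmt"

// message is a hypothetical, simplified chat message; the real extproc
// types carry far more structure (tool call IDs, content blocks, etc.).
type message struct {
	role  string
	parts []string
}

// groupToolResults merges runs of consecutive "tool" messages into one
// message, so the results of parallel tool calls from a single assistant
// turn are grouped together. The input slice is not modified.
func groupToolResults(in []message) []message {
	var out []message
	for _, m := range in {
		if m.role == "tool" && len(out) > 0 && out[len(out)-1].role == "tool" {
			last := &out[len(out)-1]
			last.parts = append(last.parts, m.parts...)
			continue
		}
		// Copy parts so later appends never alias the caller's slice.
		out = append(out, message{role: m.role, parts: append([]string(nil), m.parts...)})
	}
	return out
}

func main() {
	conversation := []message{
		{role: "user", parts: []string{"weather in Paris and Tokyo?"}},
		{role: "assistant", parts: []string{"call_paris", "call_tokyo"}},
		{role: "tool", parts: []string{"paris: sunny"}},
		{role: "tool", parts: []string{"tokyo: rain"}},
	}
	for _, m := range groupToolResults(conversation) {
		fmt.Println(m.role, m.parts)
	}
}
```

Copying the first tool message's parts keeps the function free of side effects on the caller's conversation history.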
**Description**

This commit removes the handwritten header matching code from the extproc and uses the hardened Envoy native router instead.

Historically, a single large extproc filter handled all of the logic: model name extraction, routing, body transformation and upstream authorization. Since envoyproxy#599, that work has been split across two external processor filters. One sits at the normal HTTP router level, and the other is configured as a per-cluster upstream HTTP filter. In principle, the router-level filter has only one job on the request path: extracting the model name from the request body. However, for historical reasons, the handwritten router logic remained there. It carried a maintenance cost, because it forced complex orchestration between the extproc and the control plane, and it was also a potential security vulnerability.

In fact, header matching logic is an easy attack surface, so whenever possible we should avoid writing our own header matching (routing logic) and rely instead on the battle-tested, hardened Envoy native router. With this commit, regex matching is now available, and HTTPRoute's matching and AIGatewayRoute's matching now share the same implementation. This also opens up the possibility of supporting path matching in our rules.

**Related Issues/PRs (if applicable)**

Ref envoyproxy#612
Ref envoyproxy#73

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
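The router-level filter's one remaining job can be sketched as follows. This is a hypothetical helper, not the extproc's actual code. It assumes an OpenAI-style JSON body with a top-level `model` field and reuses the `x-ai-eg-model` header name from the AIGatewayRoute example in the review thread.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// modelHeader is the header name used in the AIGatewayRoute example from the
// review thread; the constant itself is illustrative.
const modelHeader = "x-ai-eg-model"

// modelHeaderFromBody is a hypothetical helper showing the router-level
// extproc's single job: read the model name from an OpenAI-style JSON body
// and expose it as a header that Envoy's native router can match on.
func modelHeaderFromBody(body []byte) (map[string]string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("parsing request body: %w", err)
	}
	if req.Model == "" {
		return nil, errors.New("request body has no model field")
	}
	return map[string]string{modelHeader: req.Model}, nil
}

func main() {
	headers, err := modelHeaderFromBody([]byte(`{"model":"us.meta.llama3-2-1b-instruct-v1:0","messages":[]}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(headers[modelHeader])
}
```

Under this split, the extproc only produces the header. The matching itself, including regex matches, happens in Envoy's router.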
Commit Message
This commit is a relatively large refactoring of internals that aligns Envoy AI Gateway's API more closely with Envoy Gateway's BackendTrafficPolicy and HTTPRoute. Specifically, the main objective is to make failover and retries work well across multiple AIServiceBackends.
One of the most notable changes in this commit is that the extproc's logic is split into two phases. The first runs at the normal router level and selects a route (previously it selected a backend). The second runs as an upstream filter and performs auth and transformation. In other words, Envoy AI Gateway now configures two external processing filters.
As a result, users can now configure failover, retries and fallback using Envoy Gateway's BackendTrafficPolicy attached to the HTTPRoute generated by Envoy AI Gateway. For example, the primary backend can be Azure OpenAI, and when it fails, the AI Gateway falls back to AWS Bedrock using standard Envoy Gateway configuration.
Background
At the Envoy configuration level, Envoy Gateway translates the multiple backends in a single HTTPRoute rule into a single Envoy cluster. That cluster's endpoints consist of multiple endpoint sets (called `LocalityLbEndpoints` in the Envoy API [1]), and each set corresponds to one Backend with its own priority. Very roughly speaking, suppose a pseudo HTTPRoute rule references a primary backend and `secondary-backend`, and `secondary-backend` is marked as `fallback: true` in its Backend definition ([2]). The rule is then translated into one cluster with two endpoint sets: priority 0 for the primary backend and priority 1 for the secondary backend. When retries or passive health checks are configured, Envoy retries against, or falls back to, the secondary backend's endpoint set.
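Priority-based fallback can be illustrated with a deliberately simplified model. The `endpointSet` type and `pickBackend` function are hypothetical stand-ins, not Envoy or Envoy Gateway code. The model covers only the all-healthy and all-unhealthy cases: Envoy actually shifts traffic between priority levels gradually, according to each level's healthy fraction and an overprovisioning factor.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// endpointSet is a simplified, hypothetical stand-in for one
// LocalityLbEndpoints entry: the endpoints of a single Backend plus the
// priority Envoy Gateway assigned to it (0 = primary, 1 = fallback).
type endpointSet struct {
	backend  string
	priority uint32
	healthy  bool
}

// pickBackend returns the backend of the healthy set with the lowest priority
// number. Real Envoy shifts load gradually between priorities based on the
// fraction of healthy hosts (with an overprovisioning factor); this sketch
// only models the fully-healthy / fully-unhealthy cases.
func pickBackend(sets []endpointSet) (string, error) {
	ordered := append([]endpointSet(nil), sets...)
	sort.SliceStable(ordered, func(i, j int) bool { return ordered[i].priority < ordered[j].priority })
	for _, s := range ordered {
		if s.healthy {
			return s.backend, nil
		}
	}
	return "", errors.New("no healthy endpoint set")
}

func main() {
	sets := []endpointSet{
		{backend: "primary", priority: 0, healthy: false}, // e.g. ejected by passive health checks
		{backend: "secondary-backend", priority: 1, healthy: true},
	}
	backend, err := pickBackend(sets)
	fmt.Println(backend, err)
}
```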
In our API, transformation and upstream authentication must be performed per Backend, so this logic must run after Envoy has chosen the endpoint set (the `LocalityLbEndpoints`, to be precise). For example, primary.com and secondary.com might have different API schemas, authentication, and so on. Envoy has a dedicated HTTP filter chain, called "upstream filters", that executes at exactly this stage. Inserting the extproc that performs this logic there lets us do authn/z and transformation correctly for every retry attempt that Envoy makes natively.
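The per-attempt behaviour can be sketched as follows. This is a hypothetical illustration, not the extproc's actual code. `backendConfig`, `prepareAttempt` and the two schemas shown are assumptions, chosen to show that each attempt is transformed and authenticated for whichever Backend Envoy picked for it.

```go
package main

import "fmt"

// backendConfig is a hypothetical per-Backend setting: which API schema the
// upstream speaks and which credential header to attach.
type backendConfig struct {
	schema     string // "OpenAI" or "AzureOpenAI" in this sketch
	authHeader string
	authValue  string
}

// attempt describes what the upstream-filter extproc would produce for one
// upstream attempt.
type attempt struct {
	path    string
	headers map[string]string
}

// prepareAttempt runs once per attempt, after Envoy has picked the endpoint
// set, so a retry that lands on a different Backend gets that Backend's own
// path rewrite and credentials. Query parameters (e.g. Azure's api-version)
// are omitted, and the Azure deployment is assumed to be named after the model.
func prepareAttempt(cfg backendConfig, model string) attempt {
	path := "/v1/chat/completions"
	if cfg.schema == "AzureOpenAI" {
		path = "/openai/deployments/" + model + "/chat/completions"
	}
	return attempt{path: path, headers: map[string]string{cfg.authHeader: cfg.authValue}}
}

func main() {
	backends := map[string]backendConfig{
		"primary":           {schema: "AzureOpenAI", authHeader: "api-key", authValue: "<azure-key>"},
		"secondary-backend": {schema: "OpenAI", authHeader: "Authorization", authValue: "Bearer <openai-key>"},
	}
	// The first attempt goes to the primary; Envoy then retries on the fallback.
	for _, chosen := range []string{"primary", "secondary-backend"} {
		a := prepareAttempt(backends[chosen], "gpt-4o")
		fmt.Println(chosen, a.path, a.headers)
	}
}
```

A router-level filter runs once per request, before any retries. That is why it cannot do this work: a retry landing on a different backend would still carry the first backend's transformation and credentials.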
The upstream-filter external processor needs to know exactly which backend Envoy's cluster load balancing logic chose. We use EG's extension server to add metadata to each endpoint so that the extproc can retrieve this information. We also use the extension server to insert the upstream extproc filter, since EG does not currently support that. This logic in our extension server can be removed once the corresponding functionality becomes available in EG ([3], [4]).
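A minimal sketch of the lookup on the extproc side is shown below. The namespace and key names are hypothetical, not the ones the extension server actually writes. Envoy's real filter metadata is a map of namespaces to `google.protobuf.Struct` values, flattened to strings here for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical metadata namespace and key; the names actually written by the
// AI Gateway extension server may differ.
const (
	metadataNamespace = "example.aigateway"
	backendNameKey    = "backend_name"
)

// endpointMetadata mimics Envoy's per-endpoint filter metadata
// (namespace -> key -> value), with values flattened to strings.
type endpointMetadata map[string]map[string]string

// chosenBackend reads the backend name that the extension server attached to
// the endpoint Envoy selected, so the upstream extproc knows which Backend's
// transformation and credentials to apply.
func chosenBackend(md endpointMetadata) (string, error) {
	ns, ok := md[metadataNamespace]
	if !ok {
		return "", errors.New("metadata namespace not present on endpoint")
	}
	name := ns[backendNameKey]
	if name == "" {
		return "", errors.New("backend name missing from endpoint metadata")
	}
	return name, nil
}

func main() {
	md := endpointMetadata{metadataNamespace: {backendNameKey: "secondary-backend"}}
	name, err := chosenBackend(md)
	fmt.Println(name, err)
}
```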
Caveats
`aigw run` temporarily disabled until [5] is resolved

[1] https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/v3/endpoint_components.proto
[2] https://gateway.envoyproxy.io/latest/api/extension_types/#backendspec
[3] envoyproxy/gateway#5523
[4] envoyproxy/gateway#5351
[5] envoyproxy/gateway#5918
Related Issues/PRs (if applicable)
Partially resolves the provider-level fallbacks for #34