docs: add epp integration proposal by Xunzhuo · Pull Request #771 · envoyproxy/ai-gateway

Xunzhuo · 2025-06-25T10:15:50Z

Description

This PR adds the proposal for supporting integration with the Endpoint Picker (GIE).

Related to #423

Signed-off-by: bitliu <bitliu@tencent.com>

missBerg · 2025-06-26T00:15:18Z

@yanavlasov you may appreciate this document 😊

mathetake · 2025-06-26T23:14:53Z

will take a look tomorrow (in the us timezone)

yuzisun · 2025-06-27T21:06:21Z

docs/proposals/003-epp-integration-proposal/proposal.md

+
+This is a core functionality in EAGW`s vision, make the routing more intelligent.
+
+![](http://liuxunzhuo.oss-cn-chengdu.aliyuncs.com/2025-06-25-090714.png)


Could we add this image to the envoy ai gateway repo?

Second this, I think it's inaccessible now.

yuzisun · 2025-06-27T21:11:44Z

docs/proposals/003-epp-integration-proposal/proposal.md

+
+Above section tells, the destination is chosen by EPP and the information is located in header and metadata, so the way envoy determines is to read the header or the metadata to pick the target endpoint.
+
+There are two approaches envoy can work in this scenario:


Are there any pros and cons between the two approaches?

overrideHostSources supports a list which is more flexible. The request is forwarded to the first address and subsequent addresses are used for request retries or hedging.
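
To make the "first address, then retries" behaviour concrete, here is a minimal Go sketch. It only illustrates the ordering described above; it is not Envoy's implementation. The comma-separated header format and the `splitOverrideHosts` helper name are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOverrideHosts is a hypothetical helper. It assumes the override-host
// source carries a comma-separated list of addresses and returns the first
// one as the primary target, with the rest kept in order as retry candidates.
func splitOverrideHosts(headerValue string) (primary string, fallbacks []string) {
	for _, part := range strings.Split(headerValue, ",") {
		addr := strings.TrimSpace(part)
		if addr == "" {
			continue // skip empty entries such as "a,,b"
		}
		if primary == "" {
			primary = addr
			continue
		}
		fallbacks = append(fallbacks, addr)
	}
	return primary, fallbacks
}

func main() {
	primary, fallbacks := splitOverrideHosts("10.0.0.1:8000, 10.0.0.2:8000,10.0.0.3:8000")
	fmt.Println("first attempt:", primary)
	fmt.Println("retry candidates:", fallbacks)
}
```

With a single-address source such as original_dst there is no retry list, which is the flexibility difference pointed out above.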

yuzisun · 2025-06-27T21:24:43Z

docs/proposals/003-epp-integration-proposal/proposal.md

+          kind: InferencePool
+```
+
+#### Option 2: Add InferencePool as an backendRef on AIServiceBackend Level


I like this option as InferencePool should be at the AIServiceBackend level.

mathetake

Generally looks good (I like your breakdown of how it works), with some comments:

The reason why I prefer the AIGatewayRouteRuleBackendRef level is that InferencePool will never need BackendSecurityPolicy or the schema configuration. We should be able to assume they are using the OpenAI format, as GAIE's implementation itself is based on that assumption.
We have to document that cross-provider fallback won't work with InferencePool due to the way the Envoy cluster is configured, both at the EG level and in this extension server level insertion. I believe this cannot be resolved unless EG allows us to use an aggregate cluster. (Or maybe it's possible to use it at our extension server level?)
We also have to document that users are not allowed to define multiple InferencePools, or mix an InferencePool with normal AIServiceBackends, in a single route rule. This can be enforced at the k8s CEL validation layer. The reason is similar to why the fallback above cannot work. The multiple-backends-in-a-route-rule feature assumes that an Envoy cluster contains multiple backends as LocalityLbEndpoints, one per Backend, whose metadata can be used to distinguish which AIServiceBackend it belongs to; with that, the extproc can determine the translator etc. Also, EG-level fallback/priority works on that level.
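
The rule-level restriction in the last point can be sketched in Go as follows. This is only an illustration of the constraint, not the CEL rule itself; the `backendRef` type, the `validateRuleBackends` function, and the resource names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// backendRef is a hypothetical, simplified stand-in for a route rule backend
// reference; only the fields needed for the check are modelled.
type backendRef struct {
	Kind string // e.g. "AIServiceBackend" or "InferencePool"
	Name string
}

var errInferencePoolMustBeAlone = errors.New(
	"a route rule that references an InferencePool must not contain any other backendRef")

// validateRuleBackends rejects rules that contain more than one InferencePool
// or that mix an InferencePool with other backends.
func validateRuleBackends(refs []backendRef) error {
	pools := 0
	for _, ref := range refs {
		if ref.Kind == "InferencePool" {
			pools++
		}
	}
	if pools > 0 && len(refs) > 1 {
		return errInferencePoolMustBeAlone
	}
	return nil
}

func main() {
	mixed := []backendRef{
		{Kind: "InferencePool", Name: "example-pool"},       // illustrative name
		{Kind: "AIServiceBackend", Name: "example-backend"}, // illustrative name
	}
	fmt.Println(validateRuleBackends(mixed))
}
```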

One last question, though: what do we do about #648? It's just for the conformance test, but at the end of the day almost the entire logic lives within the extproc, so i don't think that matters in reality. Having said that, I would like to see our impl pass the conformance test as-is as well. wdyt?

mathetake · 2025-06-27T20:43:17Z

docs/proposals/003-epp-integration-proposal/proposal.md

+
+This requires to expand the `AIGatewayRouteRuleBackendRef` with `BackendObjectReference`
+
+##### Current


nit: can you use a `diff` block to merge "Current" and "Target" so the change is visible?

mathetake · 2025-06-27T20:45:43Z

docs/proposals/003-epp-integration-proposal/proposal.md

+
+To integrate with the GIE, there are two options:
+
+#### Option 1: Add InferencePool as an backendRef on AIGatewayRoute Level


I am +1 to this since this is the same as the initial proposal.

yuzisun · 2025-06-27T21:46:07Z

The reason why I prefer the AIGatewayRouteRuleBackendRef level is that InferencePool will never need BackendSecurityPolicy or the schema configuration. We should be able to assume they are using the OpenAI format, as GAIE's implementation itself is based on that assumption.

This might not be true. InferencePool is designed for self-hosted model endpoints, which can themselves be protected with authentication and authorization, so you would still need a BackendSecurityPolicy in that case.

yuzisun · 2025-06-27T21:55:42Z

We have to document that cross-provider fallback won't work with InferencePool due to the way the Envoy cluster is configured, both at the EG level and in this extension server level insertion. I believe this cannot be resolved unless EG allows us to use an aggregate cluster. (Or maybe it's possible to use it at our extension server level?)

This is a good point. The use case does exist, though, as we want to fall back to an InferencePool if the cloud model endpoint is unhealthy. For the time being we can add the validation.

yuzisun · 2025-06-27T22:01:34Z

docs/proposals/003-epp-integration-proposal/proposal.md

+![](http://liuxunzhuo.oss-cn-chengdu.aliyuncs.com/2025-06-25-090714.png)
+
+## Goals
+ Integrate with EPP to expand the Envoy AI Gateway abilities


What are the EPP algorithms we plan to support initially? I would like to see more details on the EPP implementation.

i believe the EPP is selected via InferencePool.spec.extensionRef (https://gateway-api-inference-extension.sigs.k8s.io/reference/spec/#extensionreference) and that's agnostic of the implementation, so users can freely specify their own EPP deployment.

for example, they can deploy the GAIE reference impl or their own custom extproc cc @kerthcet

Yes, and the way llm-d works is GW + GIE as the core, with the scheduler as a higher layer on top of that. So basically, if we support this proposal, we can be an option for the llm-d scheduler's GW implementation.

I understand users can plug in their own EPP implementation. I guess my question is what envoy ai gateway offers as out-of-the-box AI-aware routing, like kv-aware, prefix-aware and disaggregated prefill routing. LLM-d is an implementation detail of envoy ai gateway.

@yuzisun the first goal for envoy ai gw is to take GIE as the default EPP implementation; it provides the routing algorithms you raised above.

What we want here is just a flexible way to embed our own logic but let the ai gateway help us handle the traffic forwarding.

I think even in the long term, maybe ai gw should not touch this part either. llm-d, aibrix, and our own platform llmaz all follow the same approach, because the implementation is complex and domain-specific; for example, it needs to collect metrics for decision making. It would be surprising if aigw wanted to do this.

I'm not saying this pattern is always right, because everything is moving so fast. And if there are any very general algorithms, I'm also happy to see them here.

@kerthcet as I commented below, the Envoy AI GW only cares about the InferencePool API, so actually, after this landing, we can support any endpoint picker.

mathetake · 2025-06-27T22:56:40Z

yeah, so if we somehow will be able to resolve the fallback issue (== cluster definition and priority issue) with aggregated cluster, maybe it's better to have it on AIServiceBackend level. My concern is that, as I commented above, that might end up making it difficult for the direct HTTPRoute support, though i am not sure whether we want to do that in the first place. The direct support portion can be deferred later i guess as i don't think that really matters in reality.

Xunzhuo · 2025-06-27T23:22:17Z

docs/proposals/003-epp-integration-proposal/proposal.md

+ patch the ext-proc filter into the route configuration based on which route the InferencePool is linked with
+ add the cluster with loadbalancing policy or ORIGINAL_DST to understand the header and route  `x-gateway-destination-endpoint`
+
+#### Resource Generation


We still need to decide on two points: resource generation and how this works with EG.

i don't think we have currently any bandwidth to maintain this dynamic one, so let's start small with the static configuration by end user.

Sorry, I didn't get the point here. I don't think we can create all of the GIE resources in ai gw, so how can we say we manage the lifecycle?

+1 to start with the base line.

Yes, currently we need users to install the GIE deployment themselves. This is because we want to focus only on the InferencePool API: we just manage the ext-proc config and the route-level config, helping the EPP connect to EnvoyProxy in the right place. This allows end users to implement a custom EPP and integrate it with Envoy AI Gateway.

If we manage the GIE resources, we may be bound to that EPP implementation.

btw KServe is going with the llm-d EPP implementation and KServe's LLMInferenceService API handles the EPP deployment, so I do not think this needs to be done in Envoy AI Gateway. see https://github.com/kserve/kserve/blob/master/pkg/apis/serving/v1alpha1/llm_inference_service_types.go#L171

kerthcet · 2025-06-28T06:05:36Z

/cc

Xunzhuo · 2025-06-30T16:34:26Z

docs/proposals/003-epp-integration-proposal/proposal.md

+
+#### Work with Envoy Gateway
+
+There are two work-in-process PRs in upstream:


Thoughts on the two approaches? I prefer the first one, we created the Backend + EEP for LLMInferencePool Backend Type.

cc @mathetake

I would prefer the extension server approach. We can later revisit how to utilize the EG resources but after i saw the discussion here envoyproxy/gateway#6234, i am not sure if it's ready for us to tie our implementation with EG level API. Not only that, personally i feel it's easier to understand what's going on if we consolidate all the logic in the extension server.

Yep, if we take the extension server approach, maybe we can reuse the EEP for generating the ext proc server filter and cluster and we modify the route and add original dst cluster for it?

plus i think with this approach, we can pass the conformance test without modifying it (#648)?

@mathetake yes, if we create an HTTPRoute from the AIGatewayRoute and sync the InferencePool configuration into it, then from the conformance point of view it matches the requirements.

agree with the extension server approach for now.

Xunzhuo · 2025-07-02T13:39:28Z

Hey folks, one decision left to make is where we add the InferencePool.
I am +1 for adding it at the AIGWRoute level, which is pretty straightforward, and we can also start the implementation once envoyproxy/gateway#6342 lands.

Envoy Gateway will support custom backendRef via extension server.

The workflow will be like:

Enable the Envoy Gateway extension server with InferencePool backendResources
Install the GIE-related resources first (CRDs, deployment)
Create an InferencePool that refers to the ext-proc service
Use the InferencePool as the AIGatewayRoute backend (the limitation is one InferencePool per rule)
Envoy AI GW syncs it to the managed HTTPRoute with a BackendRef of InferencePool
Envoy AI GW also creates an EnvoyExtensionPolicy with the ext-proc info of the InferencePool, with the HTTPRoute as its targetRef
Envoy GW passes the InferencePool resource along with the Cluster by calling PostClusterModify
Envoy AI GW modifies the Cluster into an Original Dst Cluster and enables the header match on x-gateway-destination-endpoint
The client calls EnvoyProxy, EnvoyProxy talks to the GIE, the GIE adds the header, and EnvoyProxy forwards the request based on the header.
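
The last step of the workflow (forwarding based on the header) can be pictured with a small Go sketch. This is not Envoy's original destination implementation, only an illustration of what "forward with the header" implies. The `resolveDestination` helper and the requirement that the value is an `IP:port` pair are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// destinationHeader is the header named in the workflow above.
const destinationHeader = "x-gateway-destination-endpoint"

// resolveDestination is a hypothetical helper that mimics what the data plane
// has to do after the endpoint picker has run: read the header and turn it
// into an upstream address. It assumes the value must be "IP:port".
func resolveDestination(headers map[string]string) (string, error) {
	value, ok := headers[destinationHeader]
	if !ok || value == "" {
		// Without the header there is nothing to forward to.
		return "", errors.New("missing " + destinationHeader + " header")
	}
	host, port, err := net.SplitHostPort(value)
	if err != nil {
		return "", fmt.Errorf("invalid destination %q: %w", value, err)
	}
	if net.ParseIP(host) == nil {
		return "", fmt.Errorf("destination host %q is not an IP address", host)
	}
	return net.JoinHostPort(host, port), nil
}

func main() {
	addr, err := resolveDestination(map[string]string{destinationHeader: "10.0.0.5:8000"})
	fmt.Println(addr, err)
}
```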

mathetake · 2025-07-02T15:51:04Z

yep, i am +1 for Route Rule level too which matches the underlying HTTPRoute extension ref style.

As for the backend security attachment pointed out by @yuzisun, i think with this #549 refactoring, we can allow BSP to be attached to the pool. That way we can also allow InferencePool to use the BSP. i guess this seems to be the way.

mathetake · 2025-07-02T17:11:06Z

@Xunzhuo could you refactor the doc based on the discussion/agreement we had, and then i can stamp

kerthcet

Thanks @Xunzhuo for all these efforts. I left some comments; generally I would like to see more integrations with non-GIE implementations. I think that's also one of the goals of this proposal? Maybe we could elaborate more once we have made our technical decisions.

kerthcet · 2025-07-02T21:23:39Z

docs/proposals/003-epp-integration-proposal/proposal.md

+
+This is a core functionality in EAGW`s vision, make the routing more intelligent.
+
+![](http://liuxunzhuo.oss-cn-chengdu.aliyuncs.com/2025-06-25-090714.png)


Second this, I think it's inaccessible now.

kerthcet · 2025-07-02T21:26:35Z

docs/proposals/003-epp-integration-proposal/proposal.md

+
+This propopal aims to land integration with other endpoint picker in Envoy AI Gateway, expand EAGW abilities with other EPP implementations, like Gateway API Inference Extension, AIBrix Plugin, semantic router etc.
+
+This is a core functionality in EAGW`s vision, make the routing more intelligent.


I guess the main goal here is more flexibility.

kerthcet · 2025-07-02T21:42:06Z

docs/proposals/003-epp-integration-proposal/proposal.md

+![](http://liuxunzhuo.oss-cn-chengdu.aliyuncs.com/2025-06-25-090714.png)
+
+## Goals
+ Integrate with EPP to expand the Envoy AI Gateway abilities


What we want here is just a flexible way to embed our own logic but let the ai gateway help us handle the traffic forwarding.

I think even in the long term, maybe ai gw should not touch this part either. llm-d, aibrix, and our own platform llmaz all follow the same approach, because the implementation is complex and domain-specific; for example, it needs to collect metrics for decision making. It would be surprising if aigw wanted to do this.

I'm not saying this pattern is always right, because everything is moving so fast. And if there are any very general algorithms, I'm also happy to see them here.

kerthcet · 2025-07-02T21:46:53Z

docs/proposals/003-epp-integration-proposal/proposal.md

+
+## Goals
+ Integrate with EPP to expand the Envoy AI Gateway abilities
+ Integrate with the existing CRD and features well


Maybe we should add Non-Goals as we discussed above:

No EPP implementations would be supported as the out-of-the-box routing algorithms, would consider this in the future if needed.

I do not want to exclude this possibility, there are use cases for picking a cloud model endpoint like QoS which is not the focus of llm-d, aibrix.

Yeah, it's good to have general ones. But can you elaborate more on QoS endpoints? What does that mean?

kerthcet · 2025-07-02T22:04:15Z

docs/proposals/003-epp-integration-proposal/proposal.md

+    failureMode: FailClose
+```
+
+The control plane will generate the corresponding ext proc config (filter + cluster) to envoy, Take the inferencePool above as an example, the destination would be `vllm-llama3-8b-instruct-epp:9002` in the same namespace with the InferencePool.


What about non-GAIE implementations? There is nowhere to store this information.

i think we can simply expand this implementation to support a generic k8s service as a backendRef target, and that service can be a custom EPP beyond inference pool.

kerthcet · 2025-07-02T22:05:45Z

docs/proposals/003-epp-integration-proposal/proposal.md

+
+Above section tells, the destination is chosen by EPP and the information is located in header and metadata, so the way envoy determines is to read the header or the metadata to pick the target endpoint.
+
+There are two approaches envoy can work in this scenario:


kerthcet · 2025-07-02T22:33:17Z

docs/proposals/003-epp-integration-proposal/proposal.md

+// AIGatewayRouteRuleBackendRef is a reference to a backend with a weight.
+type AIGatewayRouteRuleBackendRef struct {
+
+	gwapiv1.BackendObjectReference


One confusing thing is that the default value of backendObjectReference.kind is Service, but ours is AIServiceBackend, and I don't think they're the same. The fields also seem to be nested too deep ...

kerthcet · 2025-07-02T22:39:41Z

docs/proposals/003-epp-integration-proposal/proposal.md

+          kind: InferencePool
+```
+
+#### Option 2: Add InferencePool as an backendRef on AIServiceBackend Level


Why I feel this one is more reasonable?

kerthcet · 2025-07-02T22:45:08Z

docs/proposals/003-epp-integration-proposal/proposal.md

+
+based on the background, we need to generate such configurations:
+
+##### ext-proc config


Maybe more on the integration with non-GIE projects, I think it's reasonable since we want to work out a general solution.

@kerthcet Actually this is not bound to the GIE; it just focuses on the InferencePool API, and with this we can support any endpoint picker.

kerthcet · 2025-07-02T23:01:35Z

docs/proposals/003-epp-integration-proposal/proposal.md

+ patch the ext-proc filter into the route configuration based on which route the InferencePool is linked with
+ add the cluster with loadbalancing policy or ORIGINAL_DST to understand the header and route  `x-gateway-destination-endpoint`
+
+#### Resource Generation


Sorry, I didn't get the point here. I don't think we can create all of the GIE resources in ai gw, so how can we say we manage the lifecycle?

+1 to start with the base line.

Xunzhuo · 2025-07-03T07:04:03Z

For a non-GIE EPP implementation, there are some requirements:

it should be an ext-proc server
it should pick the endpoint by adding the x-gateway-destination-endpoint header to the request headers
it should be aware of the InferencePool API (this tells Envoy AI Gateway how to connect to the ext-proc server and which routes the route-level ext-proc filter should be attached to, and lets it modify the cluster into an original dst cluster keyed on the x-gateway-destination-endpoint header)

Once these requirements are met, a non-GIE EPP implementation just needs to be deployed, with an InferencePool created to link it to the AIGatewayRoute. Then, when a request comes to Envoy, Envoy talks to the non-GIE EPP ext-proc server, which returns x-gateway-destination-endpoint in the header, and Envoy forwards the request to the picked endpoint.
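
As a rough illustration of the second requirement, the core of a custom endpoint picker could look like the Go sketch below. It leaves out the ext-proc gRPC server entirely and shows only the decision plus the header it would ask Envoy to set. The `endpoint` type, the `pickEndpoint` function, and the "lowest queue depth wins" policy are assumptions made for this example, not part of GIE or Envoy AI Gateway.

```go
package main

import "fmt"

// destinationHeader is the header the endpoint picker is required to set.
const destinationHeader = "x-gateway-destination-endpoint"

// endpoint is a hypothetical view of one model server in the pool.
type endpoint struct {
	Address    string // "IP:port" of the model server
	QueueDepth int    // illustrative load signal; a real picker may use others
}

// pickEndpoint chooses the endpoint with the smallest queue depth and returns
// the request header mutation that a picker would hand back to Envoy.
// It reports false when the pool is empty.
func pickEndpoint(endpoints []endpoint) (map[string]string, bool) {
	if len(endpoints) == 0 {
		return nil, false
	}
	best := endpoints[0]
	for _, candidate := range endpoints[1:] {
		if candidate.QueueDepth < best.QueueDepth {
			best = candidate
		}
	}
	return map[string]string{destinationHeader: best.Address}, true
}

func main() {
	headers, ok := pickEndpoint([]endpoint{
		{Address: "10.0.0.5:8000", QueueDepth: 7},
		{Address: "10.0.0.6:8000", QueueDepth: 2},
	})
	fmt.Println(headers, ok)
}
```

Any selection policy works here, as long as the result ends up in the header that the modified cluster reads.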

cc @kerthcet

Xunzhuo · 2025-07-03T10:34:31Z

Some points left to discuss:

LB policy: use host override policy or use original dst?
Fallback support: since we support a cluster-level hook in EG, can we support fallback between the InferencePool and AIServiceBackend? (one InferencePool and n*AIServiceBackend in one rule; we just need to patch the cluster that carries the InferencePool resource)
Token ratelimit: can we support token based ratelimit to inferencePool?

mathetake · 2025-07-03T21:33:25Z

LB policy: use host override policy or use original dst?

you can start with original_dst as it's currently used by the reference implementation repo.

Fallback support:

As I said above, i don't think it will work without a huge refactoring of EG, not here. EG's fallback works WITHIN one cluster, where multiple backends in a single route rule exist as separate localityLBEndpoints. In this case, on the other hand, InferencePool cannot coexist with other normal backends in one single cluster, as the cluster will be configured to use original_dst (or override_host). Maybe there's a solution but atm i have no concrete idea. We can ignore this and document this limitation in the initial impl.

Token ratelimit: can we support token based ratelimit to inferencePool?

As long as the upstream extproc filter is configured properly, it should work. In other words, don't forget to insert the upstream extproc filter just like we do in the current extension server code ;)

yuzisun · 2025-07-03T23:58:13Z

LB policy: use host override policy or use original dst?

you can start with original_dst as it's currently used by the reference implementation

My only concern is that original_dst does not seem to support fallback. What if the EPP service is not available? Will the request fail?

mathetake · 2025-07-04T00:05:35Z

yeah we can use the override_host then

Xunzhuo · 2025-07-04T00:22:35Z

For the host override LB policy, we need to create a service with the same selector as the inference pool, and the selected endpoint has to be among that service's endpoints. Istio creates a service for the inference pool to meet the host override policy requirements.

mathetake · 2025-07-04T00:27:27Z

up to you but I would start with the simpler impl

Xunzhuo · 2025-07-04T00:31:02Z

Sure, I think the host override shouldn't be too hard to implement (generate a service and pin it as one more backendRef alongside the InferencePool), but we can do it later. I will document it in the design doc.

Xunzhuo · 2025-07-04T00:32:20Z

I will start summarizing this into the design doc today, since the extension server is about to be merged.

mathetake · 2025-07-04T00:33:44Z

yep, this is exciting!

Signed-off-by: bitliu <bitliu@tencent.com>

mathetake

Well done!

kerthcet · 2025-07-04T09:55:48Z

I know GIE is definitely the in-tree support for our gateway, but I don't quite understand why we need an InferencePool for non-GIE implementations as well. I hope we don't; I would also like to hear advice from @yuzisun, wearing the KServe hat.

i think we can simply expand this implementation to support a generic k8s service as a backendRef target, and that service can be a custom EPP beyond inference pool.

I think @mathetake's advice is what we hope to have: relying on Kubernetes native objects only.

kerthcet · 2025-07-04T09:57:10Z

Also, am I the only one who can't see the arch diagram?

Xunzhuo · 2025-07-04T11:18:14Z

The reason we support InferencePool is just to use it to determine which routes should use the Original DST or Host Override policy, and to tell Envoy AI GW how to connect to the EPP ext-proc.

So I don't think supporting it binds our integrations to GIE.

Xunzhuo · 2025-07-04T11:19:15Z

Also, am I the only one who can't see the arch diagram?

Removed that for now; it is a diagram we used at KubeCon.

**Description**

This PR adds the proposal for supporting Integration with Endpoint Picker(GIE)

Related to envoyproxy#423

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

@mathetake

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> lint no err Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add translation Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> update so tests work Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more tests Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> remove print Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> refactor: deprecate targetRefs in favor or parentRefs (envoyproxy#821) docs: add epp integration proposal (envoyproxy#771) **Description** This PR adds the proposal for supporting Integration with Endpoint Picker(GIE) Related to envoyproxy#423 --------- Signed-off-by: bitliu <bitliu@tencent.com> Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: update epp outdated logics (envoyproxy#822) refactor: use Envoy native router (envoyproxy#793) **Description** This commit removes the handwritten header matching code from the extproc, and instead starts utilizing the hardened envoy native router. Historically, we had only one giant extproc filter where we did all logics including model name extraction, routing and then body transformation & upstream authorization. Since envoyproxy#599, we split into two external processor filters; one sits at the normal HTTP router and the other is configured at the per-cluster upstream HTTP filter. In theory, the one at HTTP router has only one job on request path: extracting model name from the request body. However, due to the historical reason, the handwritten router logic component remained, and that comes with not only a maintenance cost (forcing a complex extproc & control plane orchestration) but also a potential security vulnerability. In fact, writing header matching logic can be an easy attack surface, so if it's possible, we should avoid writing our own header matching (routing logic) but should rely on the battle-tested hardened envoy native router. 
With this commit, now a regex matching is available as well as there's no difference between HTTPRoute's matching and AIGatewayRoute's matching implementation. This also opens up a possibility to support path matching in our rule. **Related Issues/PRs (if applicable)** Ref envoyproxy#612 Ref envoyproxy#73 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: fix aigw parentRefs in fallback (envoyproxy#824) **Description** This PR fixed the AIGatewayRoute parentRefs in fallback guides. Signed-off-by: bitliu <bitliu@tencent.com> chore: make test-e2e logs visible (envoyproxy#825) **Description** This PR is to make test-e2e logs visible in local. Signed-off-by: bitliu <bitliu@tencent.com> extproc: account for parallel tool calls (envoyproxy#813) **Description** Resolves envoyproxy#736 Assistant that calls multiple tools are expected to group tool result in the same message. Adding logic for that! --------- Signed-off-by: Aaron Choo <achoo30@bloomberg.net> Signed-off-by: Dan Sun <dsun20@bloomberg.net> Co-authored-by: Dan Sun <dsun20@bloomberg.net> build(deps): bump google.golang.org/genai from 1.13.0 to 1.14.0 (envoyproxy#833) extproc: return 404 instead of 500 for unknown path (envoyproxy#835) **Description** Previously, unknown path was responded as an internal error as opposed to the fact that it's an 404 with the user input root cause. This fixes the extproc code that way, now that users will be able to know what's wrong with the operation instead of getting the cryptic 500 error. **Related Issues/PRs (if applicable)** Contributes to envoyproxy#810 Closes envoyproxy#724 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: add endpoint support (envoyproxy#787) **Description** This PR adds the endpoint support pages for EAGW. 
**Related Issues/PRs (if applicable)** Fixes: envoyproxy#705 **Special notes for reviewers (if applicable)** @mathetake --------- Signed-off-by: bitliu <bitliu@tencent.com> Co-authored-by: Erica Hughberg <erica.sundberg.90@gmail.com> controller: return 404 instead of 500 for no matching (envoyproxy#837) **Description** Before envoyproxy#793, the case where no matching route found was handled in the extproc and the 404 immediate response was returned from there, but after that, it naturally results in the "unreachable" default route and swallowed the indication of no matching and it made it impossible to reason about the 500 error on that case. In other words, this fixes the regression in envoyproxy#793 to return the proper 404 response. --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> update Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> precommit passing Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> remove header hotfix Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> precommit working Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more test coverage Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more test coverage Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> test: adds real provider embeddings test & update doc (envoyproxy#841) **Description** This adds embeddings endpoint tests with the providers that support the endpoint. This only added the providers for which we have credentials. According to the testing situation we have right now, this also clarifies in the "Supported Endpoints" page that which provider is tested and which is not for each endpoint. --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> cli: adds default route test (envoyproxy#842) **Description** This adds an additional test to aigw run command so that we can verify that setting the default route is possible. 
**Related Issues/PRs (if applicable)** Closes envoyproxy#612 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> build(deps): bump helm.sh/helm/v3 from 3.17.3 to 3.18.4 (envoyproxy#845) test: fixes TestStartConfigWatcher flake (envoyproxy#843) controller: ensure eg rollout when deployed as daemonset (envoyproxy#831) **Description** This PR handles the rollout for envoy gateway during ai gateway extproc upgrade when deployed as daemonset. Related Issues/PRs (if applicable) Related PR: envoyproxy#699 --------- Signed-off-by: Dan Sun <dsun20@bloomberg.net> make test var Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

@mathetake

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> update paralleltoolcalls Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add back system helper Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> lint no err Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add translation Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> update so tests work Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> add more tests Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> remove print Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> refactor: deprecate targetRefs in favor or parentRefs (envoyproxy#821) docs: add epp integration proposal (envoyproxy#771) **Description** This PR adds the proposal for supporting Integration with Endpoint Picker(GIE) Related to envoyproxy#423 --------- Signed-off-by: bitliu <bitliu@tencent.com> Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com> docs: update epp outdated logics (envoyproxy#822) refactor: use Envoy native router (envoyproxy#793) **Description** This commit removes the handwritten header matching code from the extproc, and instead starts utilizing the hardened envoy native router. Historically, we had only one giant extproc filter where we did all logics including model name extraction, routing and then body transformation & upstream authorization. Since envoyproxy#599, we split into two external processor filters; one sits at the normal HTTP router and the other is configured at the per-cluster upstream HTTP filter. In theory, the one at HTTP router has only one job on request path: extracting model name from the request body. However, due to the historical reason, the handwritten router logic component remained, and that comes with not only a maintenance cost (forcing a complex extproc & control plane orchestration) but also a potential security vulnerability. 
In fact, writing header matching logic can be an easy attack surface, so if it's possible, we should avoid writing our own header matching (routing logic) but should rely on the battle-tested hardened envoy native router. With this commit, now a regex matching is available as well as there's no difference between HTTPRoute's matching and AIGatewayRoute's matching implementation. This also opens up a possibility to support path matching in our rule.

**Related Issues/PRs (if applicable)**

Ref envoyproxy#612
Ref envoyproxy#73

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: fix aigw parentRefs in fallback (envoyproxy#824)

**Description**

This PR fixed the AIGatewayRoute parentRefs in fallback guides.

Signed-off-by: bitliu <bitliu@tencent.com>

chore: make test-e2e logs visible (envoyproxy#825)

**Description**

This PR is to make test-e2e logs visible in local.

Signed-off-by: bitliu <bitliu@tencent.com>

extproc: account for parallel tool calls (envoyproxy#813)

**Description**

Resolves envoyproxy#736

Assistant that calls multiple tools are expected to group tool result in the same message. Adding logic for that!

---------

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>

build(deps): bump google.golang.org/genai from 1.13.0 to 1.14.0 (envoyproxy#833)

extproc: return 404 instead of 500 for unknown path (envoyproxy#835)

**Description**

Previously, unknown path was responded as an internal error as opposed to the fact that it's an 404 with the user input root cause. This fixes the extproc code that way, now that users will be able to know what's wrong with the operation instead of getting the cryptic 500 error.

**Related Issues/PRs (if applicable)**

Contributes to envoyproxy#810
Closes envoyproxy#724

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: add endpoint support (envoyproxy#787)

**Description**

This PR adds the endpoint support pages for EAGW.

**Related Issues/PRs (if applicable)**

Fixes: envoyproxy#705

**Special notes for reviewers (if applicable)**

@mathetake

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Erica Hughberg <erica.sundberg.90@gmail.com>

controller: return 404 instead of 500 for no matching (envoyproxy#837)

**Description**

Before envoyproxy#793, the case where no matching route found was handled in the extproc and the 404 immediate response was returned from there, but after that, it naturally results in the "unreachable" default route and swallowed the indication of no matching and it made it impossible to reason about the 500 error on that case. In other words, this fixes the regression in envoyproxy#793 to return the proper 404 response.

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

update

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

precommit passing

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

remove header hotfix

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

precommit working

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add more test coverage

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add more test coverage

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

test: adds real provider embeddings test & update doc (envoyproxy#841)

**Description**

This adds embeddings endpoint tests with the providers that support the endpoint. This only added the providers for which we have credentials. According to the testing situation we have right now, this also clarifies in the "Supported Endpoints" page that which provider is tested and which is not for each endpoint.

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

cli: adds default route test (envoyproxy#842)

**Description**

This adds an additional test to aigw run command so that we can verify that setting the default route is possible.

**Related Issues/PRs (if applicable)**

Closes envoyproxy#612

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

build(deps): bump helm.sh/helm/v3 from 3.17.3 to 3.18.4 (envoyproxy#845)

test: fixes TestStartConfigWatcher flake (envoyproxy#843)

controller: ensure eg rollout when deployed as daemonset (envoyproxy#831)

**Description**

This PR handles the rollout for envoy gateway during ai gateway extproc upgrade when deployed as daemonset.

Related Issues/PRs (if applicable)

Related PR: envoyproxy#699

---------

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

make test var

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

@mathetake

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

update paralleltoolcalls

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add back system helper

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

lint no err

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add translation

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

update so tests work

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add more tests

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

remove print

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

refactor: deprecate targetRefs in favor or parentRefs (envoyproxy#821)

docs: add epp integration proposal (envoyproxy#771)

**Description**

This PR adds the proposal for supporting Integration with Endpoint Picker(GIE)

Related to envoyproxy#423

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: update epp outdated logics (envoyproxy#822)

refactor: use Envoy native router (envoyproxy#793)

**Description**

This commit removes the handwritten header matching code from the extproc, and instead starts utilizing the hardened envoy native router.

Historically, we had only one giant extproc filter where we did all logics including model name extraction, routing and then body transformation & upstream authorization. Since envoyproxy#599, we split into two external processor filters; one sits at the normal HTTP router and the other is configured at the per-cluster upstream HTTP filter. In theory, the one at HTTP router has only one job on request path: extracting model name from the request body. However, due to the historical reason, the handwritten router logic component remained, and that comes with not only a maintenance cost (forcing a complex extproc & control plane orchestration) but also a potential security vulnerability.

**Description**

This PR adds the proposal for supporting Integration with Endpoint Picker(GIE)

Related to envoyproxy#423

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

Xunzhuo requested a review from a team as a code owner June 25, 2025 10:15

Xunzhuo force-pushed the docs-proposal branch 2 times, most recently from c7f57c8 to e885156 Compare June 25, 2025 10:24

docs: add epp integration proposal

5c06a1f

Signed-off-by: bitliu <bitliu@tencent.com>

Xunzhuo force-pushed the docs-proposal branch from e885156 to 5c06a1f Compare June 25, 2025 10:29

yuzisun reviewed Jun 27, 2025

mathetake reviewed Jun 27, 2025

yuzisun reviewed Jun 27, 2025

Xunzhuo commented Jun 27, 2025

Xunzhuo commented Jun 30, 2025

mathetake added this to the v0.3.0 milestone Jul 2, 2025

mathetake self-assigned this Jul 2, 2025

kerthcet reviewed Jul 2, 2025

resolve feedbacks

8a97760

Signed-off-by: bitliu <bitliu@tencent.com>

Xunzhuo force-pushed the docs-proposal branch from cb296a8 to 8a97760 Compare July 4, 2025 02:47

mathetake approved these changes Jul 4, 2025

Merge branch 'main' into docs-proposal

5275898

mathetake merged commit c3fda88 into envoyproxy:main Jul 4, 2025
9 checks passed


		This is core functionality in EAGW's vision: making routing more intelligent.

		![](http://liuxunzhuo.oss-cn-chengdu.aliyuncs.com/2025-06-25-090714.png)


		As the section above describes, the destination is chosen by the EPP and conveyed in a header and in metadata, so Envoy picks the target endpoint by reading that header or metadata.

		There are two approaches Envoy can take in this scenario:


		This requires expanding `AIGatewayRouteRuleBackendRef` with `BackendObjectReference`.

		##### Current


		To integrate with the GIE, there are two options:

		#### Option 1: Add InferencePool as a backendRef on AIGatewayRoute Level


		#### Work with Envoy Gateway

		There are two work-in-progress PRs upstream:


		This proposal aims to land integration with other endpoint pickers in Envoy AI Gateway, expanding EAGW's abilities with other EPP implementations such as Gateway API Inference Extension, AIBrix Plugin, and semantic router.

		This is core functionality in EAGW's vision: making routing more intelligent.


		Based on the background, we need to generate the following configurations:

		##### ext-proc config

Conversation

Xunzhuo commented Jun 25, 2025

missBerg commented Jun 26, 2025

mathetake commented Jun 26, 2025

mathetake left a comment

yuzisun commented Jun 27, 2025

yuzisun commented Jun 27, 2025

yuzisun Jun 28, 2025

mathetake commented Jun 27, 2025

Xunzhuo Jul 3, 2025

yuzisun Jul 3, 2025

kerthcet commented Jun 28, 2025

Xunzhuo Jul 1, 2025
