docs: add epp integration proposal #771

Merged
mathetake merged 3 commits into envoyproxy:main from Xunzhuo:docs-proposal
Jul 4, 2025

@Xunzhuo
Member

@Xunzhuo Xunzhuo commented Jun 25, 2025

Description

This PR adds the proposal for supporting integration with the Endpoint Picker (GIE).

Related to #423

@Xunzhuo Xunzhuo requested a review from a team as a code owner June 25, 2025 10:15
@Xunzhuo Xunzhuo force-pushed the docs-proposal branch 2 times, most recently from c7f57c8 to e885156 Compare June 25, 2025 10:24
Signed-off-by: bitliu <bitliu@tencent.com>
@missBerg
Contributor

@yanavlasov you may appreciate this document 😊

@mathetake
Member

will take a look tomorrow (in the us timezone)


This is a core functionality in EAGW's vision: making routing more intelligent.

![](http://liuxunzhuo.oss-cn-chengdu.aliyuncs.com/2025-06-25-090714.png)
Contributor

Could we add this image to the envoy ai gateway repo?

Contributor

Seconded. I think it's inaccessible now.


As the section above explains, the destination is chosen by the EPP and conveyed in a header and metadata, so Envoy reads the header or the metadata to pick the target endpoint.

There are two approaches Envoy can take in this scenario:
Contributor

Are there any pros and cons between the two approaches?

Contributor

+1

Contributor

overrideHostSources supports a list which is more flexible. The request is forwarded to the first address and subsequent addresses are used for request retries or hedging.
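
To make the ordering described above concrete, here is a minimal Go sketch. It is not Envoy's implementation: it assumes the picker writes a comma-separated list of `ip:port` values into a single header value, and the function name `splitEndpoints` is hypothetical. It only shows how a list could be split into a primary address and fallback addresses for retries or hedging.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// splitEndpoints parses a comma-separated list of host:port addresses.
// The first valid entry is treated as the primary destination; the rest
// are kept, in order, as fallbacks for retries or hedging.
// Assumption: the list format is illustrative, not taken from the PR.
func splitEndpoints(headerValue string) (primary string, fallbacks []string, err error) {
	for _, part := range strings.Split(headerValue, ",") {
		addr := strings.TrimSpace(part)
		if addr == "" {
			continue
		}
		// Reject anything that is not a host:port pair.
		if _, _, splitErr := net.SplitHostPort(addr); splitErr != nil {
			return "", nil, fmt.Errorf("invalid endpoint %q: %w", addr, splitErr)
		}
		if primary == "" {
			primary = addr
		} else {
			fallbacks = append(fallbacks, addr)
		}
	}
	if primary == "" {
		return "", nil, fmt.Errorf("no endpoint in header value %q", headerValue)
	}
	return primary, fallbacks, nil
}

func main() {
	primary, fallbacks, err := splitEndpoints("10.0.0.1:8000, 10.0.0.2:8000")
	if err != nil {
		panic(err)
	}
	fmt.Println(primary, fallbacks)
}
```

With a single-address source such as original dst, there is no fallback list to draw on, which is the flexibility difference the comment above points at.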

```
kind: InferencePool
```

#### Option 2: Add InferencePool as a backendRef on AIServiceBackend Level
Contributor

I like this option as InferencePool should be at the AIServiceBackend level.

Member

@mathetake mathetake left a comment

Generally looks good (I like your breakdown of how it works) with some comments:

  • The reason why I prefer the AIGatewayRouteRuleBackendRef level is that InferencePool will never need BackendSecurityPolicy or the schema configuration. We should be able to assume they are using the OpenAI format, as GAIE's implementation itself is based on that assumption.
  • We have to document that cross-provider fallback won't work with InferencePool due to the way the Envoy cluster is configured, both at the EG level and at this extension server level insertion. I believe this cannot be resolved unless EG allows us to use an aggregate cluster. (Or maybe it's possible to use it at our extension server level?)
  • We also have to document that users are not allowed to define multiple InferencePools, or an InferencePool together with normal AIServiceBackends, in a single route rule. This can be enforced at the k8s CEL validation layer. The reason is similar to why the fallback cannot work above. The multiple-backends-in-a-route-rule mechanism assumes that an Envoy cluster contains multiple backends as LocalityLbEndpoints, one for each Backend, whose metadata can be used to distinguish which AIServiceBackend it belongs to; with that, the extproc can determine the translator etc. Also, EG-level fallback/priority works at that level.
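
The rule in the last bullet can be sketched in Go as follows. This only illustrates the constraint itself; the comment proposes enforcing it with CEL validation, and the `backendRef` type and `validateRuleBackends` function here are hypothetical simplifications, not the project's API.

```go
package main

import (
	"errors"
	"fmt"
)

// backendRef is a simplified, hypothetical stand-in for one entry in a
// route rule's backendRefs.
type backendRef struct {
	Kind string // e.g. "AIServiceBackend" or "InferencePool"
	Name string
}

// errInferencePoolNotAlone is returned when a rule combines an
// InferencePool with any other backend.
var errInferencePoolNotAlone = errors.New("a rule referencing an InferencePool must contain exactly one backendRef")

// validateRuleBackends rejects rules with multiple InferencePools, or with
// an InferencePool next to other backends. Rules without any InferencePool
// may still list several backends.
func validateRuleBackends(refs []backendRef) error {
	for _, ref := range refs {
		if ref.Kind == "InferencePool" && len(refs) != 1 {
			return errInferencePoolNotAlone
		}
	}
	return nil
}

func main() {
	mixed := []backendRef{
		{Kind: "InferencePool", Name: "pool-a"},
		{Kind: "AIServiceBackend", Name: "openai"},
	}
	fmt.Println(validateRuleBackends(mixed))     // rejected
	fmt.Println(validateRuleBackends(mixed[:1])) // accepted
}
```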

One last question, though: what do we do about #648? It's just for the conformance test, but at the end of the day almost the entire logic lives within the extproc, so I don't think that matters in reality. Having said that, I would like to see our impl pass the conformance test as-is as well. wdyt?


This requires expanding `AIGatewayRouteRuleBackendRef` with `BackendObjectReference`.

##### Current
Member

nit: can you use a ```diff block to merge "Current" and "Target" so the change is visible?


To integrate with the GIE, there are two options:

#### Option 1: Add InferencePool as a backendRef on AIGatewayRoute Level
Member

I am +1 to this since this is the same as the initial proposal.

@yuzisun
Contributor

yuzisun commented Jun 27, 2025

  • The reason why I prefer the AIGatewayRouteRuleBackendRef level is that InferencePool will never need BackendSecurityPolicy or the schema configuration. We should be able to assume they are using the OpenAI format, as GAIE's implementation itself is based on that assumption.

This might not be true. InferencePool is designed for self-hosted model endpoints, which can themselves be protected with authentication and authorization, so you still need a BackendSecurityPolicy in that case.

@yuzisun
Contributor

yuzisun commented Jun 27, 2025

  • We have to document that cross-provider fallback won't work with InferencePool due to the way the Envoy cluster is configured, both at the EG level and at this extension server level insertion. I believe this cannot be resolved unless EG allows us to use an aggregate cluster. (Or maybe it's possible to use it at our extension server level?)

This is a good point. The use case does exist though, as we want to fall back to an InferencePool if the cloud model endpoint is unhealthy. For the time being we can add the validation.

![](http://liuxunzhuo.oss-cn-chengdu.aliyuncs.com/2025-06-25-090714.png)

## Goals
+ Integrate with EPP to expand the Envoy AI Gateway abilities
Contributor

Which EPP algorithms do we plan to support initially? I would like to see more details on the EPP implementation.

Member

i believe the EPP is selected via InferencePool.spec.extensionRef (https://gateway-api-inference-extension.sigs.k8s.io/reference/spec/#extensionreference), and that's agnostic of the implementation: users can freely specify their own EPP deployment.

Member

for example, they can deploy the GAIE reference impl or their own custom extproc cc @kerthcet

Member Author

Yes. llm-d uses GW + GIE as its core, with the scheduler as a higher layer on top of that. So if we support this proposal, we can be an option for the GW implementation under the llm-d scheduler.

Contributor

@yuzisun yuzisun Jun 28, 2025

I understand users can plug in their own EPP implementation. My question is what Envoy AI Gateway offers as out-of-the-box AI-aware routing, such as kv-aware, prefix-aware, and disaggregated prefilling routing. llm-d is an implementation detail of Envoy AI Gateway.

Member Author

@yuzisun the first goal for Envoy AI GW is to take GIE as the default EPP implementation; it provides the routing algorithms you raised above.

Contributor

What we want here is just a flexible way to embed our own logic while letting the AI gateway handle the traffic forwarding for us.

I think even in the long term, maybe AI GW should not touch this part either. llm-d, aibrix, and our own platform llmaz all follow the same approach because the implementation is complex and domain-specific; for example, it needs to collect metrics for decision making. I would be surprised if aigw wanted to do this.

I'm not saying this pattern is always right, because everything is moving so fast. And if there are any very general algorithms, I'm also happy to see them here.

Member Author

@kerthcet as I commented below, the Envoy AI GW only cares about the InferencePool API, so actually, after this landing, we can support any endpoint picker.

@mathetake
Member

yeah, so if we somehow will be able to resolve the fallback issue (== cluster definition and priority issue) with aggregated cluster, maybe it's better to have it on AIServiceBackend level. My concern is that, as I commented above, that might end up making it difficult for the direct HTTPRoute support, though i am not sure whether we want to do that in the first place. The direct support portion can be deferred later i guess as i don't think that really matters in reality.

+ patch the ext-proc filter into the route configuration based on which route the InferencePool is linked with
+ add the cluster with a load-balancing policy or ORIGINAL_DST so that it understands the `x-gateway-destination-endpoint` header and routes on it

#### Resource Generation
Member Author

We still need to decide two points: resource generation and how this works with EG.

Member

i don't think we have currently any bandwidth to maintain this dynamic one, so let's start small with the static configuration by end user.

Contributor

Sorry, I didn't get the point here. I think we cannot create the whole set of GIE resources in AI GW, so how can we say we manage the lifecycle?

+1 to starting with the baseline.

Member Author

@Xunzhuo Xunzhuo Jul 3, 2025

Yes, currently we need users to install the GIE deployment themselves. This is because we want to focus on just the InferencePool API: we only manage the ext-proc config and the route-level config, and help the EPP connect to EnvoyProxy in the right place. This allows end users to implement a custom EPP and integrate it with Envoy AI Gateway.

If we managed the GIE resources, we might be bound to a specific EPP implementation.

Contributor

@yuzisun yuzisun Jul 3, 2025

btw KServe is going with the llm-d EPP implementation and KServe's LLMInferenceService API handles the EPP deployment, so I do not think this needs to be done in Envoy AI Gateway. see https://github.com/kserve/kserve/blob/master/pkg/apis/serving/v1alpha1/llm_inference_service_types.go#L171

@kerthcet
Contributor

/cc


#### Work with Envoy Gateway

There are two work-in-progress PRs upstream:
Member Author

Thoughts on the two approaches? I prefer the first one, where we create the Backend + EEP for the LLMInferencePool backend type.

Member

I would prefer the extension server approach. We can later revisit how to utilize the EG resources but after i saw the discussion here envoyproxy/gateway#6234, i am not sure if it's ready for us to tie our implementation with EG level API. Not only that, personally i feel it's easier to understand what's going on if we consolidate all the logic in the extension server.

Member Author

Yep, if we take the extension server approach, maybe we can reuse the EEP for generating the ext proc server filter and cluster and we modify the route and add original dst cluster for it?

Member

plus i think with this approach, we can pass the conformance test without modifying it (#648)?

Member Author

@Xunzhuo Xunzhuo Jul 1, 2025

@mathetake yes, if we create an HTTPRoute from the AIGatewayRoute and sync the InferencePool configuration to it, then from the conformance point of view it matches the requirements.

Contributor

agree with the extension server approach for now.

@Xunzhuo
Member Author

Xunzhuo commented Jul 2, 2025

Hey folks, one decision left to make is where we add the InferencePool.
I am +1 for adding it at the AIGWRoute level, which is pretty straightforward, and we can start the implementation once envoyproxy/gateway#6342 lands.

Envoy Gateway will support custom backendRef via extension server.

The workflow will be like:

  1. Enable the Envoy Gateway extension server with InferencePool backendResources
  2. Install the GIE-related resources first (CRDs, deployment)
  3. Create an InferencePool that refers to the ext-proc service
  4. Use the InferencePool as the AIGatewayRoute backend (the limitation is one InferencePool per rule)
  5. Envoy AI GW syncs it to the managed HTTPRoute with a BackendRef of InferencePool
  6. Envoy AI GW also creates an EnvoyExtensionPolicy with the ext-proc info of the InferencePool, whose targetRef is the HTTPRoute
  7. Envoy GW passes the InferencePool resource along with the Cluster when calling PostClusterModify
  8. Envoy AI GW changes the Cluster into an Original Dst Cluster and enables header matching on x-gateway-destination-endpoint
  9. The client calls EnvoyProxy, EnvoyProxy talks to the GIE, the GIE adds the header, and EnvoyProxy forwards the request using the header.

@mathetake
Member

yep, i am +1 for Route Rule level too which matches the underlying HTTPRoute extension ref style.

As for the backend security attachment pointed out by @yuzisun, i think with this #549 refactoring, we can allow BSP to be attached to the pool. That way we can also allow InferencePool to use the BSP. i guess this seems to be the way.

@mathetake
Member

@Xunzhuo could you refactor the doc based on the discussion/agreement we had, and then i can stamp

@mathetake mathetake added this to the v0.3.0 milestone Jul 2, 2025
@mathetake mathetake self-assigned this Jul 2, 2025
Contributor

@kerthcet kerthcet left a comment

Thanks @Xunzhuo for all these efforts. I left some comments; generally, I would like to see more integrations with non-GIE implementations. I think that is also one of the goals of this proposal? Maybe we could elaborate more once we have made our technical decisions.



This proposal aims to land integration with other endpoint pickers in Envoy AI Gateway, expanding EAGW's abilities with other EPP implementations such as Gateway API Inference Extension, the AIBrix plugin, semantic router, etc.

This is a core functionality in EAGW's vision: making routing more intelligent.
Contributor

I guess the main goal here is more flexibility.

## Goals
+ Integrate with EPP to expand the Envoy AI Gateway abilities
+ Integrate with the existing CRD and features well
Contributor

Maybe we should add Non-Goals as we discussed above:

No EPP implementations would be supported as out-of-the-box routing algorithms; we would consider this in the future if needed.

Contributor

I do not want to exclude this possibility, there are use cases for picking a cloud model endpoint like QoS which is not the focus of llm-d, aibrix.

Contributor

Yeah, it's good to have general ones. But can you elaborate more on QoS endpoints? What does that mean?

```
failureMode: FailClose
```

The control plane will generate the corresponding ext-proc config (filter + cluster) for Envoy. Taking the InferencePool above as an example, the destination would be `vllm-llama3-8b-instruct-epp:9002` in the same namespace as the InferencePool.
Contributor

What about non-GAIE implementations? There is nowhere to store the information.

Member

I think we can simply expand this implementation to support a generic k8s Service as a backendRef target, and that Service can be a custom EPP beyond InferencePool.


// AIGatewayRouteRuleBackendRef is a reference to a backend with a weight.
type AIGatewayRouteRuleBackendRef struct {

gwapiv1.BackendObjectReference
Contributor

One confusing thing is that the default value of backendObjectReference.kind is Service, but ours is AIServiceBackend; I think they're not the same. Also, the fields seem to be nested too deep ...

```
kind: InferencePool
```

#### Option 2: Add InferencePool as a backendRef on AIServiceBackend Level
Contributor

Why do I feel this one is more reasonable?


Based on the background, we need to generate the following configurations:

##### ext-proc config
Contributor

Maybe add more on the integration with non-GIE projects. I think it's reasonable since we want to work out a general solution.

Member Author

@Xunzhuo Xunzhuo Jul 3, 2025

@kerthcet Actually this is not bound to GIE; it just focuses on the InferencePool API. With this, we can support any endpoint picker.

@Xunzhuo
Member Author

Xunzhuo commented Jul 3, 2025

A non-GIE EPP implementation has to meet some requirements:

  1. it should be an ext-proc server
  2. it should pick the endpoint by adding the x-gateway-destination-endpoint header to the request headers
  3. it should be aware of the InferencePool API (which tells Envoy AI Gateway how to connect to the ext-proc server and which routes the route-level ext-proc filter should be attached to, and which causes the cluster to be modified into an original dst cluster keyed on the x-gateway-destination-endpoint header)

Once these requirements are met, a non-GIE EPP implementation just needs to be deployed, with an InferencePool created to link it to the AIGatewayRoute. Then, when a request reaches Envoy, Envoy talks to the non-GIE EPP ext-proc server, asks for the x-gateway-destination-endpoint header, and forwards the request to the picked endpoint.
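
As a rough illustration of requirement 2, the core of a custom picker could look like the Go sketch below. Everything in it is an assumption made for illustration: a real EPP would be a gRPC ext-proc server returning a header mutation, whereas this sketch models the mutation as a plain map, and the `endpoint` type, its `QueueDepth` metric, and the least-loaded policy are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// destinationHeader is the header the picker must add to the request.
const destinationHeader = "x-gateway-destination-endpoint"

// endpoint is a hypothetical view of one model server known to the picker.
type endpoint struct {
	Address    string // ip:port of the model server
	QueueDepth int    // hypothetical load metric collected by the picker
}

// pickEndpoint returns the request-header mutation the picker would send
// back: the destination header set to the least-loaded endpoint's address.
func pickEndpoint(candidates []endpoint) (map[string]string, error) {
	if len(candidates) == 0 {
		return nil, errors.New("no endpoints available")
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if c.QueueDepth < best.QueueDepth {
			best = c
		}
	}
	return map[string]string{destinationHeader: best.Address}, nil
}

func main() {
	headers, err := pickEndpoint([]endpoint{
		{Address: "10.0.0.1:8000", QueueDepth: 4},
		{Address: "10.0.0.2:8000", QueueDepth: 1},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(headers[destinationHeader])
}
```

The selection policy is the part each project would replace with its own logic; the gateway only depends on the header being present.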

cc @kerthcet

@Xunzhuo
Member Author

Xunzhuo commented Jul 3, 2025

Some points left to discuss:

  1. LB policy: use host override policy or use original dst?
  2. Fallback support: since we support a cluster-level hook in EG, can we support fallback between the InferencePool and AIServiceBackend? (one InferencePool and n*AIServiceBackend in one rule; we just need to patch the cluster carried with the InferencePool resource)
  3. Token ratelimit: can we support token-based ratelimit for InferencePool?

@mathetake
Member

mathetake commented Jul 3, 2025

  1. LB policy: use host override policy or use original dst?

you can start with original_dst as it's currently used by the reference implementation repo.

  2. Fallback support:

As I said above, I don't think it will work without a huge refactoring of EG, not here. EG's fallback works WITHIN one cluster, where the multiple backends in a single route rule exist as separate localityLBEndpoints. In this case, on the other hand, InferencePool cannot coexist with other normal backends in one single cluster, as the cluster will be configured to use original_dst (or override_host). Maybe there's a solution, but atm I have no concrete idea. We can ignore this and document the limitation in the initial impl.

  3. Token ratelimit: can we support token-based ratelimit for InferencePool?

As long as the upstream filter extproc is configured properly, it should work. In other words, don't forget to insert the upstream extproc filter just like we do in the current extension server code ;)

@yuzisun
Contributor

yuzisun commented Jul 3, 2025

  1. LB policy: use host override policy or use original dst?

you can start with original_dst as it's currently used by the reference implementation

My only concern is that original_dst does not seem to support fallback. What if the EPP service is not available and the request fails?

@mathetake
Member

yeah we can use the override_host then

@Xunzhuo
Member Author

Xunzhuo commented Jul 4, 2025

For the host override LB policy, we need to create a Service with the same selector as the InferencePool, and the selected endpoint must be among the Service's endpoints. Istio creates a Service for the InferencePool to meet the host override policy requirements.

@mathetake
Member

up to you but I would start with the simpler impl

@Xunzhuo
Member Author

Xunzhuo commented Jul 4, 2025

Sure. I don't think the host override will be too hard to implement (generate a Service and pin it as one more backendRef alongside the InferencePool), but we can do it later. I will document it in the design doc.

@Xunzhuo
Member Author

Xunzhuo commented Jul 4, 2025

I will start summarizing this into the design doc today, since the extension server change is about to merge.

@mathetake
Member

yep, this is exciting!

Signed-off-by: bitliu <bitliu@tencent.com>
Member

@mathetake mathetake left a comment

Well done!

@mathetake mathetake merged commit c3fda88 into envoyproxy:main Jul 4, 2025
9 checks passed
@kerthcet
Contributor

kerthcet commented Jul 4, 2025

I know GIE is definitely the in-tree support for our gateway, but I don't quite understand why we need InferencePool for non-GIE implementations as well. I hope we don't have to. I would also like to hear advice from @yuzisun, wearing the KServe hat.

I think we can simply expand this implementation to support a generic k8s Service as a backendRef target, and that Service can be a custom EPP beyond InferencePool.

I think @mathetake's advice is what we hope to have: relying on Kubernetes-native objects only.

@kerthcet
Contributor

kerthcet commented Jul 4, 2025

Also, am I the only one who cannot see the arch diagram?

@Xunzhuo
Member Author

Xunzhuo commented Jul 4, 2025

The reason we support InferencePool is just to use it to determine which routes should use the Original DST or Host Override policy, and to tell Envoy AI GW how to connect to the EPP ext-proc.

So I don't think supporting it binds our integrations to GIE.

@Xunzhuo
Member Author

Xunzhuo commented Jul 4, 2025

Also, am I the only one who cannot see the arch diagram?

Removed that for now; it is a diagram we used at KubeCon.

alexagriffith pushed a commit to sukumargaonkar/ai-gateway that referenced this pull request Jul 9, 2025
**Description**

This PR adds the proposal for supporting Integration with Endpoint
Picker(GIE)

Related to envoyproxy#423

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
alexagriffith added a commit to sukumargaonkar/ai-gateway that referenced this pull request Jul 9, 2025
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

lint no err

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add translation

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

update so tests work

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add more tests

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

remove print

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

refactor: deprecate targetRefs in favor or parentRefs  (envoyproxy#821)

docs: add epp integration proposal (envoyproxy#771)

**Description**

This PR adds the proposal for supporting Integration with Endpoint
Picker(GIE)

Related to envoyproxy#423

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: update epp outdated logics (envoyproxy#822)

refactor: use Envoy native router (envoyproxy#793)

**Description**

This commit removes the handwritten header matching code from the
extproc, and instead starts utilizing the hardened envoy native router.

Historically, we had only one giant extproc filter where we did all
logics including model name extraction, routing and then body
transformation & upstream authorization. Since envoyproxy#599, we split into two
external processor filters; one sits at the normal HTTP router and the
other is configured at the per-cluster upstream HTTP filter. In theory,
the one at HTTP router has only one job on request path: extracting
model name from the request body. However, due to the historical reason,
the handwritten router logic component remained, and that comes with not
only a maintenance cost (forcing a complex extproc & control plane
orchestration) but also a potential security vulnerability. In fact,
writing header matching logic can be an easy attack surface, so if it's
possible, we should avoid writing our own header matching (routing
logic) but should rely on the battle-tested hardened envoy native
router.

With this commit, now a regex matching is available as well as there's
no difference between HTTPRoute's matching and AIGatewayRoute's matching
implementation. This also opens up a possibility to support path
matching in our rule.

**Related Issues/PRs (if applicable)**

Ref envoyproxy#612
Ref envoyproxy#73

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: fix aigw parentRefs in fallback (envoyproxy#824)

**Description**

This PR fixed the AIGatewayRoute parentRefs in fallback guides.

Signed-off-by: bitliu <bitliu@tencent.com>

chore: make test-e2e logs visible (envoyproxy#825)

**Description**

This PR is to make  test-e2e logs visible in local.

Signed-off-by: bitliu <bitliu@tencent.com>

extproc: account for parallel tool calls (envoyproxy#813)

**Description**
Resolves envoyproxy#736

Assistant that calls multiple tools are expected to group tool result in
the same message. Adding logic for that!

---------

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>

build(deps): bump google.golang.org/genai from 1.13.0 to 1.14.0 (envoyproxy#833)

extproc: return 404 instead of 500 for unknown path (envoyproxy#835)

**Description**

Previously, an unknown path was answered with an internal error, even
though it is really a 404 whose root cause is the user input. This fixes
the extproc code accordingly, so users can now see what is wrong with
the request instead of getting a cryptic 500 error.
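
As a rough sketch of the behavior described above, the Go snippet below maps an unknown request path to a 404 with an explanatory message rather than a generic 500. The `supportedPaths` set and the `statusForPath` helper are hypothetical and chosen for illustration; the actual extproc builds its response differently.

```go
package main

import (
	"fmt"
	"net/http"
)

// supportedPaths is an assumed set of endpoints, for illustration only.
var supportedPaths = map[string]bool{
	"/v1/chat/completions": true,
	"/v1/embeddings":       true,
}

// statusForPath returns 404 with a descriptive message for unknown paths,
// since the root cause is the caller's input and not an internal failure.
func statusForPath(path string) (int, string) {
	if !supportedPaths[path] {
		return http.StatusNotFound, fmt.Sprintf("unsupported path: %s", path)
	}
	return http.StatusOK, ""
}

func main() {
	code, msg := statusForPath("/v1/unknown")
	fmt.Println(code, msg)
}
```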

**Related Issues/PRs (if applicable)**

Contributes to envoyproxy#810
Closes envoyproxy#724

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: add endpoint support (envoyproxy#787)

**Description**

This PR adds the endpoint support pages for EAGW.

**Related Issues/PRs (if applicable)**

Fixes: envoyproxy#705

**Special notes for reviewers (if applicable)**

@mathetake

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Erica Hughberg <erica.sundberg.90@gmail.com>

controller: return 404 instead of 500 for no matching (envoyproxy#837)

**Description**

Before envoyproxy#793, the case where no matching route was found was
handled in the extproc, which returned an immediate 404 response. After
that change, such requests naturally fall through to the "unreachable"
default route, which swallowed the no-match indication and made the
resulting 500 error impossible to reason about. In other words, this
fixes the regression introduced in envoyproxy#793 and returns the proper
404 response.

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

update

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

precommit passing

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

remove header hotfix

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

precommit working
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add more test coverage

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

 add more test coverage

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

test: adds real provider embeddings test & update doc (envoyproxy#841)

**Description**

This adds embeddings endpoint tests for the providers that support the
endpoint, limited to the providers for which we have credentials. To
reflect the current testing situation, this also clarifies on the
"Supported Endpoints" page which providers are tested and which are not
for each endpoint.

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

cli: adds default route test (envoyproxy#842)

**Description**

This adds an additional test for the aigw run command so that we can
verify that setting the default route is possible.

**Related Issues/PRs (if applicable)**

Closes envoyproxy#612

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

build(deps): bump helm.sh/helm/v3 from 3.17.3 to 3.18.4 (envoyproxy#845)

test: fixes TestStartConfigWatcher flake (envoyproxy#843)

controller: ensure eg rollout when deployed as daemonset (envoyproxy#831)

**Description**
This PR handles the Envoy Gateway rollout during an AI Gateway extproc
upgrade when it is deployed as a DaemonSet.

**Related Issues/PRs (if applicable)**
Related PR: envoyproxy#699

---------

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

make test var

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
alexagriffith added a commit to sukumargaonkar/ai-gateway that referenced this pull request Jul 9, 2025
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

update paralleltoolcalls

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add back system helper

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

lint no err

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add translation

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

update so tests work

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

add more tests

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

remove print

Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

refactor: deprecate targetRefs in favor of parentRefs (envoyproxy#821)

docs: add epp integration proposal (envoyproxy#771)

**Description**

This PR adds the proposal for supporting integration with the Endpoint
Picker (GIE).

Related to envoyproxy#423

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

docs: update epp outdated logics (envoyproxy#822)

alexagriffith pushed a commit to sukumargaonkar/ai-gateway that referenced this pull request Jul 11, 2025
**Description**

This PR adds the proposal for supporting integration with the Endpoint
Picker (GIE).

Related to envoyproxy#423

---------

Signed-off-by: bitliu <bitliu@tencent.com>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>