Support k8s gateway API inference extensions #423

@yanavlasov

Description

  • Translate k8s gateway API inference extensions CRDs into Envoy configuration
  • Add e2e tests
  • TBD

Design notes:

  • The inference extension maps an HTTPRoute to multiple InferencePools. An inference pool uses labels to select pods for the pool and specifies the port number to use. The spec notes that a Service selector can be used, but it is unclear how that would work (a well-known label?).
  • The InferenceModel determines which pool is going to be used for serving a request. Presently it does this by using the model value from the request, assuming requests use the OpenAI schema.
  • The inference extension spec allows offloading of model and endpoint selection to a remote service that implements the ext_proc protocol. This lets operators implement their proprietary business logic for selecting both models and the individual endpoints for serving the request.
  • In the most generic case, request processing requires two ext_proc callouts.
    1. The first callout selects the model, and therefore the inference pool for that model, based on the contents of the body and other requirements (e.g. cost, priority, etc.).
    2. The second callout selects an endpoint from the inference pool; this is done to optimize resource utilization for serving requests. Note that in the generic case the endpoint-picking extension can be specific to the pool (hence the requirement for a dedicated callout).
  • The two steps above can be done by a single callout if the HTTPRoute is mapped to only one pool, if external endpoint selection is not needed, or if the callout service can select endpoints for any inference pool.
  • The external model/pool picker uses a header (TBD: get the public specification for the header name) to select a route/cluster corresponding to the inference pool.
  • The external endpoint picker uses either metadata or a well-defined header to specify the primary endpoint. See the proposal. Note that fallback endpoints (for retries) will be added to that proposal shortly.
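As a sketch of the two CRDs involved (field names follow the upstream gateway-api-inference-extension API at the time of writing and may change; all names and values below are illustrative, not authoritative):

```yaml
# Hypothetical sketch of the inference extension CRDs.
# An InferencePool selects backend pods by label and names the port to use;
# extensionRef points at the endpoint-picker service for this pool.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llama-pool
spec:
  selector:
    app: llama-server          # pod labels, not a Service selector
  targetPortNumber: 8000
  extensionRef:
    name: llama-endpoint-picker
---
# An InferenceModel maps a model name (as sent in an OpenAI-style request
# body) onto a pool, so the model value in the request selects the pool.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: llama
spec:
  modelName: llama-3-8b
  criticality: Critical
  poolRef:
    name: llama-pool
```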

Reference implementation for external endpoint picker: https://github.com/kubernetes-sigs/gateway-api-inference-extension/
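In Envoy terms, the generic two-callout flow could be expressed as two ext_proc filters in the HTTP filter chain, one per callout. A hedged sketch (cluster names and processing modes are made up for illustration):

```yaml
# Sketch only: two ext_proc filters, one for model/pool selection and one
# for endpoint selection within the chosen pool.
http_filters:
- name: envoy.filters.http.ext_proc        # callout 1: pick model and pool
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
    grpc_service:
      envoy_grpc:
        cluster_name: model_picker_ext_proc
    processing_mode:
      request_header_mode: SEND
      request_body_mode: BUFFERED          # the model name lives in the body
- name: envoy.filters.http.ext_proc        # callout 2: pick endpoint in pool
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
    grpc_service:
      envoy_grpc:
        cluster_name: endpoint_picker_ext_proc
    processing_mode:
      request_header_mode: SEND
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

In the single-callout case described above, the second filter would simply be dropped and the first service would return both the routing header and the endpoint choice.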

Implementation details and questions:

  1. To support the proposed way of specifying endpoints, the cluster either needs to use subsets (a subset for each endpoint) or a specific LB policy.
    • @yanavlasov is to open-source an LB policy for Envoy that will support endpoint picking based on the proposed endpoint-picker protocol.
    • It is unclear whether Envoy Gateway supports generating endpoint assignments with subsets for each endpoint.
  2. Can Envoy Gateway generate endpoint assignments for the cluster based on Pod labels, or does it need a Service reference?
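For the subset option, one subset per endpoint could be expressed by keying subsets on a metadata value that uniquely identifies each endpoint. A sketch, where the `endpoint` metadata key and all addresses are hypothetical:

```yaml
# Sketch: a cluster whose load assignment carries one endpoint per subset,
# keyed by a made-up "endpoint" metadata value (pod IP:port here).
name: llama-pool
type: STATIC
lb_policy: ROUND_ROBIN
lb_subset_config:
  subset_selectors:
  - keys: ["endpoint"]
load_assignment:
  cluster_name: llama-pool
  endpoints:
  - lb_endpoints:
    - endpoint:
        address:
          socket_address: { address: 10.0.0.5, port_value: 8000 }
      metadata:
        filter_metadata:
          envoy.lb:
            endpoint: "10.0.0.5:8000"
    - endpoint:
        address:
          socket_address: { address: 10.0.0.6, port_value: 8000 }
      metadata:
        filter_metadata:
          envoy.lb:
            endpoint: "10.0.0.6:8000"
```

Routing to a specific subset would still require translating the picker's choice into route metadata for subset matching (e.g. via the header-to-metadata filter), which is part of what makes the dedicated LB policy attractive.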

Possible iteration steps:

  1. Consume the inference extension CRDs and generate the same configuration as today. This gives us external model/pool selection while keeping Envoy-internal endpoint selection.
    • Need to resolve the issue of InferencePool using labels (rather than a Service) to specify pods.
  2. Open-source the LB policy to support external endpoint selection in Envoy (in parallel with step 1).
  3. Add configuration (TBD) to use the remote endpoint picker; this will use two callouts per request.
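For step 1, consuming the CRDs means accepting an HTTPRoute whose backendRef points at an InferencePool instead of a Service. A sketch (the group and kind follow the upstream extension; all other names are placeholders):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: inference-route
spec:
  parentRefs:
  - name: my-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io   # inference extension API group
      kind: InferencePool
      name: llama-pool
```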

Metadata

Labels: api (Control Plane API), enhancement (New feature or request)