Support DynamicLoadBalancing beyond AIE (API Inference Extension) #604

@kerthcet

Description:

DynamicLoadBalancing extends the gateway routing system by picking the endpoints (Pod IPs, etc.) directly, which is great. After checking the code, however, I found two limitations:

  • From the API perspective, and AIGatewayRoute in particular, only AIE is supported; see the code excerpt below. Users like us who don't employ AIE would like to have this support as well. I know we can update the ConfigMap to enable this, but it's a bit complex.
    if isInferencePoolRef(backendRef) {
        var pool *gwaiev1a2.InferencePool
        var referencedAIServiceBackends []aigv1a1.AIServiceBackend
        pool, referencedAIServiceBackends, err = c.getPoolAndReferencedAIServiceBackends(ctx, aiGatewayRoute.Namespace, backendRef.Name)
        if err != nil {
            return fmt.Errorf("failed to get pool and referenced AIServiceBackends: %w", err)
        }
        ecBackendConfig.DynamicLoadBalancing, err = c.createDynamicLoadBalancing(ctx, i, j, pool, referencedAIServiceBackends)
  • The Pod IPs are attached to the Backend configuration, which means that whenever the Pods change, the configuration has to be updated too. I'm not sure whether this is efficient. It may still be useful to users with static Pod IPs, but we would like to manage the Pods ourselves, so we may implement a new DynamicLoadBalancing to replace the default one.
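
To illustrate the second point, here is a minimal sketch of what "managing the Pods ourselves" could look like: Pod endpoints live in a store that is updated at runtime, rather than being written into the backend configuration. `EndpointStore`, `Set`, and `Snapshot` are hypothetical names made up for this example; none of them exist in Envoy AI Gateway.

```go
package main

import (
	"fmt"
	"sync"
)

// EndpointStore is a hypothetical, user-managed view of Pod endpoints.
// It is NOT an Envoy AI Gateway type; it only illustrates keeping Pod IPs
// outside the generated backend configuration.
type EndpointStore struct {
	mu        sync.RWMutex
	endpoints map[string][]string // backend name -> list of "ip:port"
}

// NewEndpointStore returns an empty store.
func NewEndpointStore() *EndpointStore {
	return &EndpointStore{endpoints: make(map[string][]string)}
}

// Set replaces the endpoints of a backend, e.g. after a Pod watch event.
func (s *EndpointStore) Set(backend string, addrs []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	// Copy the slice so later changes by the caller do not leak in.
	s.endpoints[backend] = append([]string(nil), addrs...)
}

// Snapshot returns a copy of the current endpoints of a backend.
func (s *EndpointStore) Snapshot(backend string) []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return append([]string(nil), s.endpoints[backend]...)
}

func main() {
	store := NewEndpointStore()
	store.Set("llama", []string{"10.0.0.1:8000", "10.0.0.2:8000"})
	// A Pod is replaced: only the store changes, no configuration update.
	store.Set("llama", []string{"10.0.0.2:8000", "10.0.0.3:8000"})
	fmt.Println(store.Snapshot("llama"))
}
```

With something along these lines, a Pod change would only touch the in-memory store, and the load balancer would read the latest snapshot per request.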

This is a blocking issue for us. We are currently building a metrics-based router: we already have all the Pod information from collecting the necessary metrics, and we want to bake it into the Envoy AI Gateway as a plugin. Then, within the dynamicLoadBalancer, we can make informed routing decisions by picking the Pod endpoint directly.
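
As a rough sketch of the kind of decision such a metrics-based router would make, the snippet below picks the Pod with the shortest request queue and breaks ties by KV cache usage. The `PodMetrics` type, its fields, and `PickEndpoint` are assumptions for illustration only; the actual metrics and the plugin interface are not defined in this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// PodMetrics is a hypothetical per-Pod snapshot collected by the router.
// The chosen metrics are assumptions, not something the gateway defines.
type PodMetrics struct {
	Address      string  // "ip:port" of the Pod
	QueuedReqs   int     // requests waiting in the model server queue
	KVCacheUsage float64 // fraction of the KV cache in use, 0.0 to 1.0
}

// PickEndpoint returns the address of the Pod with the fewest queued
// requests, breaking ties by lower KV cache usage.
func PickEndpoint(pods []PodMetrics) (string, error) {
	if len(pods) == 0 {
		return "", errors.New("no endpoints available")
	}
	best := pods[0]
	for _, p := range pods[1:] {
		shorterQueue := p.QueuedReqs < best.QueuedReqs
		tieLowerCache := p.QueuedReqs == best.QueuedReqs && p.KVCacheUsage < best.KVCacheUsage
		if shorterQueue || tieLowerCache {
			best = p
		}
	}
	return best.Address, nil
}

func main() {
	pods := []PodMetrics{
		{Address: "10.0.0.1:8000", QueuedReqs: 4, KVCacheUsage: 0.30},
		{Address: "10.0.0.2:8000", QueuedReqs: 1, KVCacheUsage: 0.80},
		{Address: "10.0.0.3:8000", QueuedReqs: 1, KVCacheUsage: 0.95},
	}
	addr, err := PickEndpoint(pods)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("route to", addr)
}
```

The point is only that the picker needs per-Pod addresses and metrics at request time, which is why we would like DynamicLoadBalancing to be usable without AIE.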

Correct me if I misunderstood the system design here. Thanks!

Metadata

Labels: enhancement (New feature or request)
