Enabling HPA for Envoy Proxy introduces unexpected behavior #2807

@ardikabs

Description:
Relates to #2257

Once envoyHpa is enabled on the EnvoyProxy API, some unexpected behavior is observed.

During reconciliation, Envoy Gateway creates the HorizontalPodAutoscaler resource successfully. However, on any operation involving a Gateway API resource, Envoy Gateway unexpectedly resets the replicas of the Envoy Proxy deployment to the value specified in EnvoyProxy.spec.provider.kubernetes.envoyDeployment.replicas; if this field is undefined, it resets them to 1 replica. After some time, when the HPA controller kicks in again, it scales the deployment back up to its minReplicas. Still, this is concerning: under high traffic, any Gateway API operation forces the Envoy Proxy deployment to be scaled down.

Repro steps:

Apply the EnvoyProxy manifest below, then, assuming some HTTPRoute resources exist, perform any operation on one of them, such as a modification. You will observe a scale-down of the Envoy Proxy deployment from 3 replicas (specified in the HPA's minReplicas field) to 1 replica (the default EnvoyProxy replica count).

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: default
  namespace: envoy-gateway-system
spec:
  logging:
    level:
      default: warn
  provider:
    kubernetes:
      envoyDeployment:
        pod:
          labels:
            service_name: envoy-proxy
            service_group: envoy
        container:
          resources:
            limits:
              cpu: 2
              memory: 2Gi
            requests:
              cpu: 1
              memory: 1Gi
      envoyHpa:
        maxReplicas: 12
        minReplicas: 3
        metrics:
          - resource:
              name: cpu
              target:
                averageUtilization: 60
                type: Utilization
            type: Resource
    type: Kubernetes

Proposal:

Even though the code is already set up to prevent reverting replicas to their original value when the HPA controller adjusts the Envoy Proxy deployment replicas, Envoy Gateway still seems unaware of the HPA in the createOrUpdateDeployment function during reconciliation.

func (i *Infra) createOrUpdateDeployment(ctx context.Context, r ResourceRender) error {
	deployment, err := r.Deployment()
	if err != nil {
		return err
	}

	current := &appsv1.Deployment{}
	key := types.NamespacedName{
		Namespace: deployment.Namespace,
		Name:      deployment.Name,
	}

	hpa, err := r.HorizontalPodAutoscaler()
	if err != nil {
		return err
	}

	var opts cmp.Options
	if hpa != nil {
		opts = append(opts, cmpopts.IgnoreFields(appsv1.DeploymentSpec{}, "Replicas"))
	}

The createOrUpdateDeployment function still seems to reset the replicas from deploymentConfig at L239:

Spec: appsv1.DeploymentSpec{
	Replicas: deploymentConfig.Replicas,
	Strategy: *deploymentConfig.Strategy,
	...
}
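To make the gap concrete, here is a minimal stdlib-only sketch (hypothetical types standing in for appsv1.DeploymentSpec; not the real Envoy Gateway code) of the current behavior: Replicas is ignored only when deciding whether an update is needed, but once any other field differs, the full desired spec, Replicas included, is written back:

```go
package main

import (
	"fmt"
	"reflect"
)

// spec is a hypothetical stand-in for appsv1.DeploymentSpec.
type spec struct {
	Replicas *int32
	Image    string
}

// equalIgnoringReplicas mimics cmp.Equal with
// cmpopts.IgnoreFields(appsv1.DeploymentSpec{}, "Replicas").
func equalIgnoringReplicas(a, b spec) bool {
	a.Replicas, b.Replicas = nil, nil
	return reflect.DeepEqual(a, b)
}

// apply mimics the Update call: when a difference is detected,
// the full desired spec replaces the live one.
func apply(live *spec, desired spec, hpaManaged bool) {
	same := reflect.DeepEqual(*live, desired)
	if hpaManaged {
		same = equalIgnoringReplicas(*live, desired)
	}
	if same {
		return // no update issued
	}
	*live = desired // replicas come along for the ride
}

func int32p(v int32) *int32 { return &v }

func main() {
	// The HPA has scaled the live deployment to 3; the config says 1.
	live := spec{Replicas: int32p(3), Image: "envoy:v1"}

	// A replicas-only difference: the ignore option correctly skips the update.
	apply(&live, spec{Replicas: int32p(1), Image: "envoy:v1"}, true)
	fmt.Println(*live.Replicas) // 3

	// But an unrelated change (new image) triggers an update that
	// also resets replicas to 1 — the behavior reported above.
	apply(&live, spec{Replicas: int32p(1), Image: "envoy:v2"}, true)
	fmt.Println(*live.Replicas) // 1
}
```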
It seems to me that we need to set the replicas to nil if envoyHpa is utilized, something like below:

if hpa != nil {
   deployment.Spec.Replicas = nil
   opts = append(opts, cmpopts.IgnoreFields(appsv1.DeploymentSpec{}, "Replicas")) 
} 

This way the Envoy Gateway would refrain from further replica adjustments when envoyHpa is utilized.
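Continuing with the same hypothetical minimal types (not the real Envoy Gateway code), here is a sketch of the proposed guard: when an HPA exists, the configured replica count is discarded before the update is built, so the live value set by the HPA controller survives. One caveat worth hedging: against a real API server a nil spec.replicas can be re-defaulted to 1 on write, so an actual fix may need to carry over the live value rather than leave the field nil; the sketch copies the live value to illustrate the intent.

```go
package main

import "fmt"

// spec is a hypothetical stand-in for appsv1.DeploymentSpec.
type spec struct {
	Replicas *int32
	Image    string
}

// desiredSpec builds the spec that would be sent in the update. With an
// HPA managing the deployment, the configured replica count is discarded.
// (The issue proposes deployment.Spec.Replicas = nil; since a nil value
// can be re-defaulted server-side, this sketch preserves the live value
// instead, which has the same in-cluster effect.)
func desiredSpec(configured, live spec, hpaManaged bool) spec {
	if hpaManaged {
		configured.Replicas = live.Replicas
	}
	return configured
}

func int32p(v int32) *int32 { return &v }

func main() {
	live := spec{Replicas: int32p(3), Image: "envoy:v1"}       // HPA scaled to 3
	configured := spec{Replicas: int32p(1), Image: "envoy:v2"} // configured default

	got := desiredSpec(configured, live, true)
	fmt.Println(*got.Replicas, got.Image) // 3 envoy:v2

	got = desiredSpec(configured, live, false)
	fmt.Println(*got.Replicas, got.Image) // 1 envoy:v2
}
```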

Environment:
latest


Metadata

Labels

area/infra-mgr (Issues related to the provisioner used for provisioning the managed Envoy Proxy fleet)
kind/bug (Something isn't working)
road-to-ga
