Loadbalancers created via aws-load-balancer-controller for EnvoyProxy instances are leaked once Gateway is removed #1820

@wondersd

Description

When a Gateway is created in an EKS cluster that uses aws-load-balancer-controller to provision load balancers, deleting the Gateway resource afterwards can leave the underlying AWS resources (load balancer, target group, security group rules, security groups, etc.) leaked and never deleted.

The underlying reason appears to be that EG is stripping the .metadata.finalizers section of the managed Service object. aws-load-balancer-controller (and probably every load balancer provider) creates non-Kubernetes resources that must be cleaned up, so it injects a finalizer onto the Service. The finalizer guarantees the controller time to reconcile the deletion of those resources before the Service object disappears. Without it, the controller loses the Service and with it its tracking of those resources, which are then leaked.
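To make the finalizer mechanism concrete, here is a minimal Go sketch of finalizer-gated deletion. It is a toy model, not client-go or the real API server: the `object`, `store`, `lbControllerReconcile` and `simulate` names and the `envoy-svc` Service name are hypothetical, and only the `service.k8s.aws/resources` finalizer is modeled. With the finalizer present, deleting the Service only marks it, so the load balancer controller still sees it and can clean up; with the finalizer stripped, the Service is gone before the controller can act.

```go
package main

import (
	"fmt"
	"time"
)

// object is a hypothetical, heavily simplified model of Kubernetes object
// metadata: only the fields that matter for deletion are kept.
type object struct {
	finalizers        []string
	deletionTimestamp *time.Time
}

// store is a toy stand-in for the API server's storage, keyed by name.
type store map[string]*object

// deleteObject mimics the API server: with no finalizers the object is
// removed at once; otherwise only deletionTimestamp is set and it lingers.
func (s store) deleteObject(name string) {
	obj, ok := s[name]
	if !ok {
		return
	}
	if len(obj.finalizers) == 0 {
		delete(s, name)
		return
	}
	now := time.Now()
	obj.deletionTimestamp = &now
}

// removeFinalizer drops one finalizer; once an object marked for deletion
// has no finalizers left, it is actually removed.
func (s store) removeFinalizer(name, finalizer string) {
	obj, ok := s[name]
	if !ok {
		return
	}
	var kept []string
	for _, f := range obj.finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	obj.finalizers = kept
	if obj.deletionTimestamp != nil && len(obj.finalizers) == 0 {
		delete(s, name)
	}
}

// lbControllerReconcile models a load balancer controller owning the given
// finalizer. It can only clean up while it can still see the Service being
// deleted; it returns true if cleanup ran.
func lbControllerReconcile(s store, name, finalizer string) bool {
	obj, ok := s[name]
	if !ok || obj.deletionTimestamp == nil {
		// Not being deleted, or already gone (cloud resources orphaned).
		return false
	}
	// A real controller would delete the NLB, target groups and security
	// group rules here, and only then release its finalizer.
	s.removeFinalizer(name, finalizer)
	return true
}

// simulate creates a Service with the given finalizers, deletes it, runs the
// controller once and reports whether cloud cleanup happened.
func simulate(finalizers []string) bool {
	s := store{"envoy-svc": {finalizers: finalizers}}
	s.deleteObject("envoy-svc")
	return lbControllerReconcile(s, "envoy-svc", "service.k8s.aws/resources")
}

func main() {
	fmt.Println("finalizer present, cleaned up:", simulate([]string{"service.k8s.aws/resources"}))
	fmt.Println("finalizer stripped, cleaned up:", simulate(nil))
}
```

Running it prints `true` for the first case and `false` for the second, which is the leak described above in miniature.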

Service object created by EG for the EnvoyProxy:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
    service.beta.kubernetes.io/aws-load-balancer-type: external
  creationTimestamp: "2023-08-23T14:18:56Z"
  labels:
    app.kubernetes.io/component: proxy
    app.kubernetes.io/managed-by: envoy-gateway
    app.kubernetes.io/name: envoy
    gateway.envoyproxy.io/owning-gateway-name: ...
    gateway.envoyproxy.io/owning-gateway-namespace: ...
  name: envoy-...-72616262
  namespace: envoy-gateway
  resourceVersion: ...
  uid: ...
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.20.132.225
  clusterIPs:
  - 172.20.132.225
  externalTrafficPolicy: Local
  healthCheckNodePort: 30042
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  ...
  selector:
    app.kubernetes.io/component: proxy
    app.kubernetes.io/managed-by: envoy-gateway
    app.kubernetes.io/name: envoy
    gateway.envoyproxy.io/owning-gateway-name: ...
    gateway.envoyproxy.io/owning-gateway-namespace: ...
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - hostname: ....elb.us-east-1.amazonaws.com

A Service object created manually from the following YAML:

apiVersion: v1
kind: Service
metadata:
  name: loadbalancer
  namespace: envoy-gateway
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
    service.beta.kubernetes.io/aws-load-balancer-type: external
spec:
  type: LoadBalancer
  ports:
    - name: test
      port: 8080
      protocol: TCP

yields:

apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: ...
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
    service.beta.kubernetes.io/aws-load-balancer-type: external
  creationTimestamp: "2023-08-23T15:04:39Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  - service.k8s.aws/resources
  name: loadbalancer
  namespace: envoy-gateway
  resourceVersion: ...
  uid: ...
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.20.204.23
  clusterIPs:
  - 172.20.204.23
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: test
    nodePort: 30456
    port: 8080
    protocol: TCP
    targetPort: 8080
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - hostname: ....elb.us-east-1.amazonaws.com

Unlike the EG-managed Service, the manually created Service has its finalizers section intact, even though finalizers were not part of the original YAML. So it seems that EG's reconciliation process is stripping them out. Without the finalizers, our load balancer provider cannot guarantee cleanup of the cloud resources.
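One way a reconciler can strip finalizers is by writing a freshly rendered Service over the live one. The minimal Go sketch below illustrates that failure mode and a fix under that assumption, which this report does not confirm for EG. `serviceMeta`, `naiveUpdate` and `preservingUpdate` are hypothetical names, not EG or apimachinery types. The naive path returns the rendered metadata as-is, so finalizers added by other controllers vanish; the preserving path carries the live object's finalizers forward.

```go
package main

import "fmt"

// serviceMeta is a hypothetical slice of Service metadata; it is not the real
// k8s.io/apimachinery ObjectMeta type.
type serviceMeta struct {
	Labels      map[string]string
	Annotations map[string]string
	Finalizers  []string
}

// naiveUpdate overwrites the live metadata with whatever the reconciler
// rendered, so finalizers added by other controllers are lost.
func naiveUpdate(live, desired serviceMeta) serviceMeta {
	return desired
}

// preservingUpdate applies the rendered labels and annotations but keeps every
// finalizer already on the live object, plus any the reconciler wants itself,
// deduplicated in first-seen order.
func preservingUpdate(live, desired serviceMeta) serviceMeta {
	out := desired
	out.Finalizers = nil
	seen := map[string]bool{}
	for _, list := range [][]string{live.Finalizers, desired.Finalizers} {
		for _, f := range list {
			if !seen[f] {
				seen[f] = true
				out.Finalizers = append(out.Finalizers, f)
			}
		}
	}
	return out
}

func main() {
	// Live Service as seen after the cloud controllers have reconciled it.
	live := serviceMeta{Finalizers: []string{
		"service.kubernetes.io/load-balancer-cleanup",
		"service.k8s.aws/resources",
	}}
	// Freshly rendered Service: the reconciler sets no finalizers of its own.
	desired := serviceMeta{Labels: map[string]string{"app.kubernetes.io/managed-by": "envoy-gateway"}}

	fmt.Println(naiveUpdate(live, desired).Finalizers)      // []
	fmt.Println(preservingUpdate(live, desired).Finalizers) // [service.kubernetes.io/load-balancer-cleanup service.k8s.aws/resources]
}
```

In a real controller the equivalent fix would be to copy the live object's finalizers onto the object sent in the update, since an update carrying an empty finalizer list removes them.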

Repro steps:

  1. Set up the GatewayClass, EnvoyProxy and Gateway resources as usual, with service type LoadBalancer (the default) and the annotations required to activate aws-load-balancer-controller (this should also be reproducible with any other load balancer provider that relies on finalizers):
    apiVersion: "config.gateway.envoyproxy.io/v1alpha1"
    kind: "EnvoyProxy"
    metadata:
      name: "aws-loadbalancer-controller-ep"
    spec:
      provider:
        type: "Kubernetes"
        kubernetes:
          envoyService:
            annotations:
              service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
              service.beta.kubernetes.io/aws-load-balancer-scheme: internal
              service.beta.kubernetes.io/aws-load-balancer-type: external
    
  2. Delete the Gateway object. Observe that the AWS load balancer is not removed.

Environment:

  • AWS-based Kubernetes cluster with aws-load-balancer-controller installed
  • EG 0.5.0

Logs:
I have not been able to locate specific logs indicating that the finalizer is being removed.

Relates to:
