Race conditions between controller and gateway service #6814

@adrielp

Description:

On a first-time deployment, the Envoy Gateway controller appears to hit a race condition: it tries to reference the Envoy Service before the Service is fully created. The error does not clear on its own once the Service exists; the controller pod has to be restarted for it to resolve.

Repro steps:

Apply v1.5.0 of install.yaml with minor annotation changes, together with the manifests below.

envoy-proxy.yaml

---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: public-proxy-config
  namespace: envoy-gateway-system
spec:
  # Bootstrap configuration to fix admin interface IPv6 binding
  bootstrap:
    type: JSONPatch
    jsonPatches:
    - {"op": "replace", "path": "/admin/address/socket_address/port_value", "value": 19003}
    - {"op": "replace", "path": "/admin/address/socket_address/address", "value": "::"}

  ipFamily: IPv6
  logging:
    level:
      default: debug
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        annotations:
          # EKS Auto Mode compatible annotations
          service.beta.kubernetes.io/aws-load-balancer-type: external
          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
          service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
          service.beta.kubernetes.io/aws-load-balancer-ip-address-type: dualstack
          service.beta.kubernetes.io/aws-load-balancer-subnets: "example-1, example-2"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: TCP
        type: LoadBalancer
        loadBalancerClass: eks.amazonaws.com/nlb
        externalTrafficPolicy: Cluster

gateway-class.yaml

---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: some-public
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: public-proxy-config
    namespace: envoy-gateway-system

gateway.yaml

---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: some-public
  namespace: envoy-gateway-system
  labels:
    gateway: public
spec:
  gatewayClassName: some-public
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    allowedRoutes:
      namespaces:
        from: All
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: <some secret>
  # HTTP listener for redirects
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All

Environment:

Envoy Gateway installed through the Helm chart at v1.5.0.
ArgoCD > 3.0 applies the manifests.
AWS EKS with Kubernetes 1.33 and EKS Auto Mode (Karpenter enabled).

Logs:

The failure occurs while Envoy Gateway is being deployed. It does not resolve automatically once the Service is online.

2025-08-16T02:27:27.124Z ERROR provider kubernetes/controller.go:618 failed to get Service {"runner": "provider", "namespace": "envoy-gateway-system", "name": "envoy-envoy-gateway-system-public-gateway-b3b5c242", "error": "Service \"envoy-envoy-gateway-system-public-gateway-b3b5c242\" not found"}
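
For illustration only, the behaviour I would have expected instead of a permanent failure is a retry with backoff when the Service lookup returns "not found". The sketch below is not Envoy Gateway code and makes no claim about how the controller is implemented; `NotFound`, `get_with_retry`, and the `lookup` callable are hypothetical names that stand in for the real Service lookup.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NotFound(Exception):
    """Hypothetical stand-in for a 'Service not found' error."""


def get_with_retry(
    lookup: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 0.5,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call lookup() until it stops raising NotFound or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return lookup()
        except NotFound:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)      # wait before the next lookup
            delay *= backoff  # exponential backoff between lookups
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Simulate a Service that only becomes visible on the third lookup.
    state = {"calls": 0}

    def fake_lookup() -> str:
        state["calls"] += 1
        if state["calls"] < 3:
            raise NotFound("Service not found")
        return "service-object"

    found = get_with_retry(fake_lookup, sleep=lambda _: None)
    print(found, "after", state["calls"], "lookups")
```

The `sleep` parameter is injectable only so the sketch can be exercised without real delays. A real controller would more likely requeue the reconcile than block in a loop.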

Labels: kind/bug (Something isn't working)
