Websocket is disconnected when creating/removing unrelated gateways with mergeGateways mode #6716
Description:
When a gateway is added or removed, the listener metadata is updated, which, according to the Envoy documentation, causes the entire listener to be drained:
“Not all the listener config updates can be executed by filter chain update. For example, if the listener metadata is updated within the new listener config, the new metadata must be picked up by the new filter chains. In this case, the entire listener is drained and updated.”
Although I didn’t see any indication of listener draining in the Envoy metrics, I experienced websocket disconnections with response_code_details set to downstream_local_disconnect(purging_socket_that_have_not_progressed_to_connections) in the Envoy logs. When testing with a custom EG image that doesn’t update listener metadata, the issue did not reproduce.
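One way to look for drain activity is to read the listener manager counters from the Envoy admin interface. The following is a minimal sketch, not something taken from this report: it assumes the admin endpoint has been made reachable on localhost (for example with `kubectl port-forward` to the proxy pod; port 19000 is an assumption), and `stat_value` is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of one stat from Envoy's plain-text
# /stats output ("name: value" per line), read on stdin.
stat_value() {
  awk -v name="$1" -F': ' '$1 == name { print $2 }'
}

# Assumed usage, with the admin interface forwarded to 127.0.0.1:19000:
#   curl -s 'http://127.0.0.1:19000/stats?filter=listener_manager' > stats.txt
#   stat_value listener_manager.total_listeners_draining < stats.txt
#   stat_value listener_manager.listener_in_place_updated < stats.txt
```

Comparing these counters before and after deleting an unrelated gateway may show whether Envoy handled the change as an in-place filter chain update or as a full listener update.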
The listener metadata contains an entry for each gateway, so it is updated with any gateway addition/removal/update.
According to EG docs, in mergeGateways mode listener metadata should be taken from GatewayClass, not Gateway.
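To check whether the listener config (including its metadata) changes when an unrelated gateway is touched, the dynamic listeners can be snapshotted before and after the change and compared. This is a sketch under assumptions: the admin URL is whatever address the Envoy admin interface has been forwarded to, and `dump_listeners` and `listeners_changed` are hypothetical helper names.

```shell
# Save the dynamic listener config from an Envoy admin endpoint to a file.
# $1: admin base URL (assumption: reachable, e.g. via kubectl port-forward)
# $2: output file
dump_listeners() {
  curl -s "$1/config_dump?resource=dynamic_listeners" > "$2"
}

# Exit 0 when the two snapshot files differ, non-zero when they are identical.
listeners_changed() {
  ! cmp -s "$1" "$2"
}

# Assumed usage:
#   dump_listeners http://127.0.0.1:19000 before.json
#   ...add or remove an unrelated gateway...
#   dump_listeners http://127.0.0.1:19000 after.json
#   listeners_changed before.json after.json && diff before.json after.json
```

A difference only shows that the listener was updated; the diff output still has to be read to confirm that the metadata field is what changed.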
Repro steps:
- Create KinD cluster:
kind create cluster --config=- << EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 31500
    hostPort: 31500
    protocol: TCP
EOF
- Install envoy-gateway v1.5.0-rc.2 with XDSNameSchemeV2 runtime flag enabled:
helm install eg oci://docker.io/envoyproxy/gateway-helm \
--version v1.5.0-rc.2 \
-n envoy-gateway-system \
--create-namespace \
--wait \
--set "config.envoyGateway.runtimeFlags.enabled[0]=XDSNameSchemeV2"
- Apply gatewayClass and envoyProxy with mergeGateways enabled:
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: config
    namespace: envoy-gateway-system
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: config
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        type: NodePort
        patch:
          type: StrategicMerge
          value:
            spec:
              ports:
              - nodePort: 31500
                port: 80
  mergeGateways: true
EOF
- Create backend, gateway, and httproute in consumer1 namespace:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: consumer1
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backend
  namespace: consumer1
---
apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: consumer1
  labels:
    app: backend
    service: backend
spec:
  ports:
  - name: http
    port: 3000
    targetPort: 3000
  selector:
    app: backend
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: consumer1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
      version: v1
  template:
    metadata:
      labels:
        app: backend
        version: v1
    spec:
      serviceAccountName: backend
      containers:
      - image: gcr.io/k8s-staging-gateway-api/echo-basic:v20231214-v1.0.0-140-gf544a46e
        imagePullPolicy: IfNotPresent
        name: backend
        ports:
        - containerPort: 3000
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: consumer1
spec:
  gatewayClassName: eg
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    hostname: example.com
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: backend
  namespace: consumer1
spec:
  parentRefs:
  - name: eg
  hostnames:
  - "example.com"
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: backend
      port: 3000
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /
EOF
- Create websocket backend, gateway, and httproute in consumer2 namespace:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: consumer2
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backend
  namespace: consumer2
---
apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: consumer2
  labels:
    app: backend
    service: backend
spec:
  ports:
  - name: http
    port: 8080
    targetPort: 8080
  selector:
    app: backend
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: consumer2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
      version: v1
  template:
    metadata:
      labels:
        app: backend
        version: v1
    spec:
      serviceAccountName: backend
      containers:
      - image: jmalloc/echo-server:v0.3.7
        imagePullPolicy: IfNotPresent
        name: backend
        ports:
        - containerPort: 8080
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: consumer2
spec:
  gatewayClassName: eg
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    hostname: example2.com
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: backend
  namespace: consumer2
spec:
  parentRefs:
  - name: eg
  hostnames:
  - "example2.com"
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: backend
      port: 8080
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /
EOF
- Initiate websocket connection with websocat:
while true; do
  echo "ping"
  sleep 1
done | websocat -t - ws-c:tcp:127.0.0.1:31500 --ws-c-uri=ws://example2.com/
A ping message should be sent to the server every second and echoed back:
Request served by backend-84f786dd68-p5jgq
ping
ping
ping
ping
...
- Delete consumer1 namespace from a new terminal:
kubectl delete namespace consumer1
- After a short time websocket will get disconnected:
...
ping
ping
websocat: WebSocketError: I/O failure
websocat: error running
You'll also see the following access log in envoy:
{":authority":"example2.com:31500","bytes_received":1554,"bytes_sent":1178,"connection_termination_details":null,"downstream_local_address":"10.244.0.12:10080","downstream_remote_address":"10.244.0.1:44529","duration":1122809,"method":"GET","protocol":"HTTP/1.1","requested_server_name":null,"response_code":101,"response_code_details":"downstream_local_disconnect(purging_socket_that_have_not_progressed_to_connections)","response_flags":"DC","route_name":"httproute/consumer2/backend/rule/0/match/0/example2_com","start_time":"2025-08-06T19:05:17.318Z","upstream_cluster":"httproute/consumer2/backend/rule/0","upstream_host":"10.244.0.14:8080","upstream_local_address":"10.244.0.12:45008","upstream_transport_failure_reason":null,"user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36","x-envoy-origin-path":"/","x-envoy-upstream-service-time":null,"x-forwarded-for":"10.244.0.1","x-request-id":"57c99ca3-1df4-4545-9309-58f34aef0993"}
Note the "response_code_details":"downstream_local_disconnect(purging_socket_that_have_not_progressed_to_connections)" field.
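To spot these disconnections without reading each access log entry, the log stream can be filtered for the purge reason. This is a small sketch: `count_purged_disconnects` is a hypothetical helper name, and the pod and container names in the usage comment are placeholders or assumptions.

```shell
# Hypothetical helper: count access log lines (read on stdin) that carry the
# purge reason seen in this issue. Prints 0 when nothing matches.
count_purged_disconnects() {
  grep -c -F 'purging_socket_that_have_not_progressed_to_connections' || true
}

# Assumed usage (container name "envoy" is an assumption):
#   kubectl logs -n envoy-gateway-system <envoy-pod> -c envoy | count_purged_disconnects
```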
Related: #6534
Environment:
EG 1.5.0-rc.2