
feat: Gracefully drain listeners before envoy shutdown on pod termination#2633

Merged
arkodg merged 34 commits into envoyproxy:main from davidalger:algerdev/graceful-pod-termination
Feb 27, 2024

Conversation

@davidalger davidalger commented Feb 16, 2024

What this PR does:

This PR is a step in the direction of achieving a hitless upgrade. Some challenges to achieving this have been outlined by @guydc in #1712 (comment) and #2610

This proposed implementation of a PreStop lifecycle hook will allow Envoy proxies to gracefully shut down when pods are terminated by Kubernetes. This is done by:

a) Initiating a graceful drain sequence via the drain_listeners?graceful admin endpoint.

b) Sleeping for a minimum drain duration of 15 seconds, allowing ample time for endpoint deprogramming to occur.

c) Monitoring the total.server_connections statistic and exiting once the total connection count reaches 0.

d) Timing out after 600 seconds, after which Envoy will be sent SIGTERM by Kubernetes regardless of how many open connections remain. This timeout lines up with the default value for --drain-time-s.

e) Configuring terminationGracePeriodSeconds to 900 seconds. This provides ample time for Envoy to cleanly exit after connections have been drained.

Some prior art w.r.t graceful shutdown for Envoy:

Summary of why we need this:

  • Kubernetes uses SIGTERM to signal processes to gracefully shut down; however, Envoy does not support gracefully draining connections during shutdowns triggered by SIGTERM.
  • Endpoint deprogramming is eventually consistent and the time this takes varies somewhat based on the k8s implementation and size of the cluster.

Caveats:

When TCP listeners are used (rather than HTTP listeners), connections will not drain, per the note below from the draining architecture documentation. This means that in the event of long-lived TCP connections, Envoy proxies will simply wait in the PreStop hook for the 600 second timeout to occur, after which these connections will be closed when Envoy receives the SIGTERM from Kubernetes.

Note that although draining is a per-listener concept, it must be supported at the network filter level. Currently the only filters that support graceful draining are Redis, Mongo, Thrift (if the envoy.reloadable_features.thrift_connection_draining runtime feature is enabled), and HTTP connection manager.

Reference: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/operations/draining

@davidalger davidalger requested a review from a team as a code owner February 16, 2024 21:40
@davidalger (Contributor Author):
Testing / Load generator with unaltered EG quickstart.yaml:

run from another pod within the cluster

hey -c 100 -q 10 -z 300s -host www.example.com http://172.18.255.200/
PreStop logs from rolling restart of single replica deployment
$ kubectl logs -n envoy-gateway-system deploy/envoy-default-eg-e41e7b31 -f | grep -v start_time
Found 2 pods, using pod/envoy-default-eg-e41e7b31-85c848c76b-w8p5t
[2024-02-16 21:23:58.508][1][warning][main] [source/server/server.cc:923] There is no configured limit to the number of allowed active downstream connections. Configure a limit in `envoy.resource_monitors.downstream_connections` resource monitor.
[2024-02-16 21:24:21.965][37][info][PreStop] initiating graceful drain with 15 second minimum drain period and 600 second timeout
[2024-02-16 21:24:21.969][37][info][PreStop] total connections: 100
[2024-02-16 21:24:22.974][37][info][PreStop] total connections: 100
[2024-02-16 21:24:23.978][37][info][PreStop] total connections: 97
[2024-02-16 21:24:24.983][37][info][PreStop] total connections: 97
[2024-02-16 21:24:25.988][37][info][PreStop] total connections: 97
[2024-02-16 21:24:26.992][37][info][PreStop] total connections: 97
[2024-02-16 21:24:27.997][37][info][PreStop] total connections: 97
[2024-02-16 21:24:29.001][37][info][PreStop] total connections: 64
[2024-02-16 21:24:30.005][37][info][PreStop] total connections: 64
[2024-02-16 21:24:31.010][37][info][PreStop] total connections: 64
[2024-02-16 21:24:32.014][37][info][PreStop] total connections: 64
[2024-02-16 21:24:33.018][37][info][PreStop] total connections: 64
[2024-02-16 21:24:34.023][37][info][PreStop] total connections: 29
[2024-02-16 21:24:35.027][37][info][PreStop] total connections: 29
[2024-02-16 21:24:36.032][37][info][PreStop] total connections: 29
[2024-02-16 21:24:37.036][37][info][PreStop] total connections: 29
[2024-02-16 21:24:37.037][37][info][PreStop] minimum drain period reached; allowed to exit when total connections reaches zero
[2024-02-16 21:24:38.042][37][info][PreStop] total connections: 29
[2024-02-16 21:24:39.046][37][info][PreStop] total connections: 7
[2024-02-16 21:24:40.050][37][info][PreStop] total connections: 7
[2024-02-16 21:24:41.055][37][info][PreStop] total connections: 7
[2024-02-16 21:24:42.059][37][info][PreStop] total connections: 7
[2024-02-16 21:24:43.064][37][info][PreStop] total connections: 7
[2024-02-16 21:24:44.068][37][info][PreStop] total connections: 1
[2024-02-16 21:24:45.073][37][info][PreStop] total connections: 1
[2024-02-16 21:24:46.077][37][info][PreStop] total connections: 1
[2024-02-16 21:24:47.082][37][info][PreStop] total connections: 1
[2024-02-16 21:24:48.086][37][info][PreStop] total connections: 1
[2024-02-16 21:24:49.091][37][info][PreStop] total connections: 0
[2024-02-16 21:24:49.096][1][warning][main] [source/server/server.cc:895] caught ENVOY_SIGTERM
[2024-02-16 21:24:49.100][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:155] DeltaAggregatedResources gRPC config stream to xds_cluster closed: 13, 

After restarting the single replica deployment, Envoy was scaled up to 5 replicas and restarted twice with the hey load generator continuing to run.

load generator output demonstrating 100% success responses for the test
$ hey -c 100 -q 10 -z 300s -host www.example.com http://172.18.255.200/

Summary:
  Total:	300.0754 secs
  Slowest:	0.4689 secs
  Fastest:	0.0012 secs
  Average:	0.1215 secs
  Requests/sec:	812.2893
  
  Total data:	121874000 bytes
  Size/request:	500 bytes

Response time histogram:
  0.001 [1]	|
  0.048 [3263]	|■
  0.095 [4661]	|■
  0.142 [222708]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.188 [4640]	|■
  0.235 [6894]	|■
  0.282 [990]	|
  0.329 [370]	|
  0.375 [141]	|
  0.422 [66]	|
  0.469 [14]	|


Latency distribution:
  10% in 0.1130 secs
  25% in 0.1168 secs
  50% in 0.1196 secs
  75% in 0.1224 secs
  90% in 0.1264 secs
  95% in 0.1520 secs
  99% in 0.2142 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0000 secs, 0.0012 secs, 0.4689 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:	0.0006 secs, 0.0000 secs, 0.1484 secs
  resp wait:	0.0015 secs, 0.0002 secs, 0.2476 secs
  resp read:	0.0594 secs, 0.0000 secs, 0.3291 secs

Status code distribution:
  [200]	243748 responses


@davidalger (Contributor Author):
Testing / Load generator with EG quickstart.yaml plus 1s delay:

run from another pod within the cluster

hey -c 20 -q 10 -z 300s -host www.example.com http://172.18.255.200/
fault-injection-delay simulating 1s response times
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: fault-injection-delay
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: backend
  faultInjection:
    delay:
      fixedDelay: 1s
PreStop logs from rolling restart of single replica deployment
$ kubectl logs -n envoy-gateway-system deploy/envoy-default-eg-e41e7b31 -f | grep -v start_time
Found 2 pods, using pod/envoy-default-eg-e41e7b31-7854bd848-qpd82
[2024-02-16 21:31:03.157][1][warning][main] [source/server/server.cc:923] There is no configured limit to the number of allowed active downstream connections. Configure a limit in `envoy.resource_monitors.downstream_connections` resource monitor.
[2024-02-16 21:33:14.274][37][info][PreStop] initiating graceful drain with 15 second minimum drain period and 600 second timeout
[2024-02-16 21:33:14.278][37][info][PreStop] total connections: 20
[2024-02-16 21:33:15.286][37][info][PreStop] total connections: 20
[2024-02-16 21:33:16.308][37][info][PreStop] total connections: 20
[2024-02-16 21:33:17.327][37][info][PreStop] total connections: 20
[2024-02-16 21:33:18.347][37][info][PreStop] total connections: 19
[2024-02-16 21:33:19.353][37][info][PreStop] total connections: 19
[2024-02-16 21:33:20.374][37][info][PreStop] total connections: 19
[2024-02-16 21:33:21.395][37][info][PreStop] total connections: 19
[2024-02-16 21:33:22.405][37][info][PreStop] total connections: 19
[2024-02-16 21:33:23.426][37][info][PreStop] total connections: 19
[2024-02-16 21:33:24.443][37][info][PreStop] total connections: 19
[2024-02-16 21:33:25.484][37][info][PreStop] total connections: 19
[2024-02-16 21:33:26.498][37][info][PreStop] total connections: 19
[2024-02-16 21:33:27.517][37][info][PreStop] total connections: 19
[2024-02-16 21:33:28.539][37][info][PreStop] total connections: 16
[2024-02-16 21:33:29.568][37][info][PreStop] total connections: 16
[2024-02-16 21:33:30.594][37][info][PreStop] total connections: 16
[2024-02-16 21:33:30.598][37][info][PreStop] minimum drain period reached; allowed to exit when total connections reaches zero
[2024-02-16 21:33:31.608][37][info][PreStop] total connections: 16
[2024-02-16 21:33:32.629][37][info][PreStop] total connections: 16
[2024-02-16 21:33:33.656][37][info][PreStop] total connections: 16
[2024-02-16 21:33:34.681][37][info][PreStop] total connections: 16
[2024-02-16 21:33:35.707][37][info][PreStop] total connections: 16
[2024-02-16 21:33:36.729][37][info][PreStop] total connections: 16
[2024-02-16 21:33:37.848][37][info][PreStop] total connections: 16
[2024-02-16 21:33:38.862][37][info][PreStop] total connections: 14
[2024-02-16 21:33:39.889][37][info][PreStop] total connections: 14
[2024-02-16 21:33:40.900][37][info][PreStop] total connections: 14
[2024-02-16 21:33:41.914][37][info][PreStop] total connections: 14
[2024-02-16 21:33:42.926][37][info][PreStop] total connections: 14
[2024-02-16 21:33:43.943][37][info][PreStop] total connections: 11
[2024-02-16 21:33:44.948][37][info][PreStop] total connections: 11
[2024-02-16 21:33:45.958][37][info][PreStop] total connections: 11
[2024-02-16 21:33:46.969][37][info][PreStop] total connections: 11
[2024-02-16 21:33:47.992][37][info][PreStop] total connections: 11
[2024-02-16 21:33:49.007][37][info][PreStop] total connections: 10
[2024-02-16 21:33:50.026][37][info][PreStop] total connections: 10
[2024-02-16 21:33:51.048][37][info][PreStop] total connections: 10
[2024-02-16 21:33:52.055][37][info][PreStop] total connections: 10
[2024-02-16 21:33:53.068][37][info][PreStop] total connections: 10
[2024-02-16 21:33:54.094][37][info][PreStop] total connections: 8
[2024-02-16 21:33:55.101][37][info][PreStop] total connections: 8
[2024-02-16 21:33:56.109][37][info][PreStop] total connections: 8
[2024-02-16 21:33:57.121][37][info][PreStop] total connections: 8
[2024-02-16 21:33:58.134][37][info][PreStop] total connections: 8
[2024-02-16 21:33:59.147][37][info][PreStop] total connections: 7
[2024-02-16 21:34:00.166][37][info][PreStop] total connections: 7
[2024-02-16 21:34:01.184][37][info][PreStop] total connections: 7
[2024-02-16 21:34:02.206][37][info][PreStop] total connections: 7
[2024-02-16 21:34:03.328][37][info][PreStop] total connections: 7
[2024-02-16 21:34:04.341][37][info][PreStop] total connections: 4
[2024-02-16 21:34:05.462][37][info][PreStop] total connections: 4
[2024-02-16 21:34:06.505][37][info][PreStop] total connections: 4
[2024-02-16 21:34:07.625][37][info][PreStop] total connections: 4
[2024-02-16 21:34:08.652][37][info][PreStop] total connections: 1
[2024-02-16 21:34:09.678][37][info][PreStop] total connections: 1
[2024-02-16 21:34:10.687][37][info][PreStop] total connections: 1
[2024-02-16 21:34:11.710][37][info][PreStop] total connections: 1
[2024-02-16 21:34:12.731][37][info][PreStop] total connections: 1
[2024-02-16 21:34:13.860][37][info][PreStop] total connections: 0
[2024-02-16 21:34:13.880][1][warning][main] [source/server/server.cc:895] caught ENVOY_SIGTERM
[2024-02-16 21:34:13.886][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:155] DeltaAggregatedResources gRPC config stream to xds_cluster closed: 13, 

After restarting the single replica deployment, Envoy was scaled up to 5 replicas and restarted twice with the hey load generator continuing to run.

load generator output demonstrating 100% success responses for the test
hey -c 20 -q 10 -z 300s -host www.example.com http://172.18.255.200/

Summary:
  Total:	300.7588 secs
  Slowest:	1.1698 secs
  Fastest:	0.9969 secs
  Average:	1.0085 secs
  Requests/sec:	19.8165
  
  Total data:	2980000 bytes
  Size/request:	500 bytes

Response time histogram:
  0.997 [1]	|
  1.014 [5073]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  1.031 [798]	|■■■■■■
  1.049 [65]	|■
  1.066 [3]	|
  1.083 [0]	|
  1.101 [0]	|
  1.118 [0]	|
  1.135 [1]	|
  1.152 [9]	|
  1.170 [10]	|


Latency distribution:
  10% in 1.0017 secs
  25% in 1.0033 secs
  50% in 1.0061 secs
  75% in 1.0110 secs
  90% in 1.0168 secs
  95% in 1.0216 secs
  99% in 1.0363 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0003 secs, 0.9969 secs, 1.1698 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:	0.0004 secs, 0.0000 secs, 0.0308 secs
  resp wait:	1.0035 secs, 0.9960 secs, 1.0380 secs
  resp read:	0.0020 secs, 0.0000 secs, 0.0292 secs

Status code distribution:
  [200]	5960 responses

@davidalger (Contributor Author):
@arkodg @guydc I would like thoughts from both of you on the proposed implementation and approach demonstrated here, and also on whether these defaults are sufficient or need knobs to adjust:

  • minimum drain duration (currently 15s)
  • graceful drain timeout (currently 600s)
  • terminationGracePeriodSeconds (currently 900s)
  • total.server_connections exit threshold (currently 0)

…tion

Signed-off-by: David Alger <davidmalger@gmail.com>
@davidalger davidalger force-pushed the algerdev/graceful-pod-termination branch from e2c7dc7 to b4baa8f Compare February 16, 2024 22:02
@guydc (Contributor) commented Feb 16, 2024

This lines up with the default value for --parent-shutdown-time-s

Is there any significance to using the same value here, if we're not using the envoy hot reload feature?

codecov bot commented Feb 16, 2024

Codecov Report

Attention: Patch coverage is 93.56725%, with 11 lines in your changes missing coverage. Please review.

Project coverage is 63.23%. Comparing base (72fadb7) to head (afcbf23).

Files Patch % Lines
internal/cmd/envoy.go 86.36% 4 Missing and 2 partials ⚠️
...ternal/infrastructure/kubernetes/proxy/resource.go 96.59% 2 Missing and 1 partial ⚠️
internal/logging/log.go 88.23% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2633      +/-   ##
==========================================
+ Coverage   63.00%   63.23%   +0.22%     
==========================================
  Files         122      123       +1     
  Lines       19770    19922     +152     
==========================================
+ Hits        12457    12598     +141     
- Misses       6506     6513       +7     
- Partials      807      811       +4     


davidalger commented Feb 16, 2024

This lines up with the default value for --parent-shutdown-time-s

Is there any significance to using the same value here, if we're not using the envoy hot reload feature?

@guydc No particular significance to the value being the same in the context of k8s, but it is why I chose 5 minutes greater than the drain timeout, since the docs state the drain time should be less than the parent shutdown time set via the --parent-shutdown-time-s option.

The important bit is that terminationGracePeriodSeconds must be higher than the graceful drain timeout so that k8s will send a SIGTERM instead of a SIGKILL if the drain runs up the clock against the full 600 second drain timeout.
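The relationship between these durations can be stated as a simple invariant. The values below are the defaults discussed in this thread; the constant names are illustrative, not the PR's actual identifiers:

```go
package main

import "fmt"

// Defaults from this PR (names are illustrative, values are the PR's):
const (
	minDrainSeconds            = 15  // minimum drain period
	drainTimeoutSeconds        = 600 // matches Envoy's default --drain-time-s
	terminationGracePeriodSecs = 900 // must exceed drainTimeoutSeconds
)

func main() {
	// If the grace period were <= the drain timeout, kubelet would SIGKILL
	// the pod while the PreStop hook was still waiting on open connections;
	// keeping it higher guarantees a SIGTERM and a clean exit instead.
	fmt.Println("headroom between drain timeout and SIGKILL:",
		terminationGracePeriodSecs-drainTimeoutSeconds, "seconds")
}
```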

return &corev1.Lifecycle{
	PreStop: &corev1.LifecycleHandler{
		Exec: &corev1.ExecAction{
			Command: []string{"/bin/bash", "-c", strings.ReplaceAll(strings.TrimSpace(dedent.Dedent(`
Contributor:
is this portable to distroless images?

@davidalger (Contributor Author) commented Feb 16, 2024
That's actually a really good callout, as I'm pretty sure it won't work in any pre-built distroless image. It probably won't break anything, but it won't do anything either, since the exec will fail and k8s will proceed with an immediate SIGTERM to Envoy. So now I'm having a sudden realization that v0.6.0 started using the distroless image, but the dev builds (which I tested with) don't. 🤦🏻

The options are more limited with distroless images, since they typically don't include extra binaries such as bash, cat, or awk, very likely leaving the only real viable long-term option: building our own Envoy proxy images to include something like a wrapper process or a static executable to manage the termination process.

There are other approaches that could work without building our own proxy image, but some would tend to be fragile or would require exposing the admin port. It's possible a sidecar container along with an HTTP preStop hook could work, but it will definitely take a bit more effort to set up.

@arkodg (Contributor) commented Feb 17, 2024

can we implement a) and b) using the HTTP and Sleep preStop hooks?
https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#hook-handler-implementations
we may need to convert the preStop.httpGet into a POST using envoy itself :)

@davidalger (Contributor Author) commented Feb 17, 2024

There is unfortunately no way to accomplish this leveraging only the k8s sleep and http lifecycle hooks:

  • HTTP requests cannot be made to a port bound to 127.0.0.1 (I've tried this), so requests to the Envoy admin port would require it to be exposed, which would pose a security risk.
  • Envoy doesn't exit after a graceful drain, since the process was designed for hot reloading. It would always run out the clock on the terminationGracePeriodSeconds even if there were no open connections.
  • Sleep preStop actions are too new and cannot be used even to just introduce a basic X second sleep for deprogramming, as they won't reach beta until k8s v1.30.

Here is what I'm going to propose as a path forward:

  • Implement a sub-command in Envoy Gateway called envoy shutdown which will execute the graceful drain and block until both the min-drain-period has elapsed and open connections have closed (similar to what the bash script does currently)
  • Implement a sub-command in Envoy Gateway called envoy shutdown-manager which exposes an HTTP endpoint on port 19002 and responds to /shutdown/ready, blocking until the graceful drain initiated by the envoy shutdown sub-command has completed
  • Add a shutdown-manager container to the Envoy proxy pod running the envoy shutdown-manager http server
  • An HTTP preStop hook on the envoy container requesting /shutdown/ready on port 19002
  • An exec preStop hook on the shutdown-manager container which executes the envoy shutdown sub-command

When the pod gets a request to shut down, the preStop hook will be triggered on both the envoy and shutdown-manager containers simultaneously. The preStop on the shutdown-manager container will execute the graceful drain, and the preStop on the envoy container will block until the drain is complete. Each of the containers will receive the SIGTERM from k8s as normal when the preStop hook on the container completes.

I've mocked this up as a proof-of-concept to test the setup. Unless there are objections, I will plan on re-implementing the drain process per the above outline, which is very similar to what Contour did previously and appears it will work based on my testing.
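The shutdown-manager side of this proposal can be sketched as follows. Port 19002 and the /shutdown/ready path come from the outline above; the function names and channel-based signaling are an illustrative mock-up, not the PR's actual code:

```go
package main

import (
	"fmt"
	"net/http"
)

// waitReady blocks until the drain initiated by the `envoy shutdown`
// sub-command signals completion, then reports readiness.
func waitReady(drained <-chan struct{}) string {
	<-drained
	return "shutdown ready"
}

// newShutdownManager wires /shutdown/ready so the envoy container's HTTP
// preStop hook blocks until the shutdown-manager's exec preStop finishes
// draining connections.
func newShutdownManager(drained <-chan struct{}) *http.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/shutdown/ready", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, waitReady(drained))
	})
	return &http.Server{Addr: ":19002", Handler: mux}
}

func main() {
	drained := make(chan struct{})
	close(drained) // in the real flow, closed when the drain sequence completes
	srv := newShutdownManager(drained)
	fmt.Println("shutdown-manager would listen on", srv.Addr)
	// srv.ListenAndServe() // omitted so the sketch runs standalone
}
```

Blocking inside the handler is what keeps the envoy container alive: kubelet sends SIGTERM only after the HTTP preStop request returns.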

Contributor:

hey @davidalger, exposing /shutdown on envoy has the same security risk; if it's exposed on 0.0.0.0, it can be called by anyone.

@arkodg (Contributor) commented Feb 20, 2024
@davidalger (Contributor Author):

does preStop.httpGet.host = localhost work ?

Tried this. It fails with a PreStopHook failed warning added to the pod status and the container exits without allowing the drain to occur. I'm fairly certain this is because the request is coming from kubelet, which isn't local to the pod and thus can't actually make a request to the loopback.

exposing /shutdown on envoy has the same security risk; if it's exposed on 0.0.0.0, it can be called by anyone.

This is mitigated by having the /shutdown/ready endpoint largely be a dumb wait that does nothing more than block envoy container shutdown until the drain logic in the exec preStop hook on the shutdown-manager container finishes.

There really are two reasons why the actual drain logic is in an exec to a sub-command on the eg binary:

a) The security aspect, since HTTP preStop hooks must be exposed to the rest of the cluster.

b) The sub-command will be useful for a local (i.e. any non-k8s) EG deployment, if such is implemented in the future.

I've pushed up my work in progress for those that may like to take a look. There are still some things to do to finish it up, but it's functional. Here are a couple of example log sets from restarts under varying load, with no failed requests:

example 1 logs from shutdown-manager container
2024-02-21T14:47:39.037Z    INFO    shutdown-manager    envoy/shutdown_manager.go:72    starting shutdown manager
2024-02-21T14:47:55.365Z    INFO    shutdown-manager    envoy/shutdown_manager.go:86    received shutdown ready request
2024-02-21T14:47:55.398Z    INFO    shutdown-manager    envoy/shutdown_manager.go:115    initiating graceful drain with 15 second minimum drain period and 600 second timeout
2024-02-21T14:47:55.399Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 10
2024-02-21T14:47:56.401Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 10
2024-02-21T14:47:57.402Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 10
2024-02-21T14:47:58.404Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 10
2024-02-21T14:47:59.408Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 9
2024-02-21T14:48:00.412Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 9
2024-02-21T14:48:01.414Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 9
2024-02-21T14:48:02.421Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 9
2024-02-21T14:48:03.427Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 9
2024-02-21T14:48:04.428Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 5
2024-02-21T14:48:05.438Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 5
2024-02-21T14:48:06.441Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 5
2024-02-21T14:48:07.443Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 5
2024-02-21T14:48:08.454Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 5
2024-02-21T14:48:09.461Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 0
2024-02-21T14:48:10.464Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 0
2024-02-21T14:48:10.464Z    INFO    shutdown-manager    envoy/shutdown_manager.go:140    minimum drain period reached; will exit when total connections is 0
2024-02-21T14:48:10.464Z    INFO    shutdown-manager    envoy/shutdown_manager.go:148    graceful drain sequence completed
2024-02-21T14:48:10.473Z    INFO    shutdown-manager    envoy/shutdown_manager.go:62    received SIGTERM
2024-02-21T14:48:11.386Z    INFO    shutdown-manager    envoy/shutdown_manager.go:97    shutdown readiness detected
Stream closed EOF for envoy-gateway-system/envoy-default-eg-e41e7b31-65bb79b45-mkgdt (shutdown-manager)
example 2 logs from shutdown-manager container
2024-02-21T14:47:55.887Z    INFO    shutdown-manager    envoy/shutdown_manager.go:72    starting shutdown manager
2024-02-21T14:48:56.058Z    INFO    shutdown-manager    envoy/shutdown_manager.go:86    received shutdown ready request
2024-02-21T14:48:56.087Z    INFO    shutdown-manager    envoy/shutdown_manager.go:115    initiating graceful drain with 15 second minimum drain period and 600 second timeout
2024-02-21T14:48:56.089Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 100
2024-02-21T14:48:57.089Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 100
2024-02-21T14:48:58.090Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 100
2024-02-21T14:48:59.091Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 100
2024-02-21T14:49:00.091Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 100
2024-02-21T14:49:01.093Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 74
2024-02-21T14:49:02.094Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 74
2024-02-21T14:49:03.094Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 74
2024-02-21T14:49:04.096Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 74
2024-02-21T14:49:05.096Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 74
2024-02-21T14:49:06.097Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 33
2024-02-21T14:49:07.099Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 33
2024-02-21T14:49:08.100Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 33
2024-02-21T14:49:09.101Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 33
2024-02-21T14:49:10.102Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 33
2024-02-21T14:49:11.103Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 11
2024-02-21T14:49:11.103Z    INFO    shutdown-manager    envoy/shutdown_manager.go:140    minimum drain period reached; will exit when total connections is 0
2024-02-21T14:49:12.105Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 11
2024-02-21T14:49:13.106Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 11
2024-02-21T14:49:14.106Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 11
2024-02-21T14:49:15.107Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 11
2024-02-21T14:49:16.108Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:17.109Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:18.111Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:19.111Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:20.112Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:21.113Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:22.114Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:23.115Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:24.116Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:25.117Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:26.119Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:27.120Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:28.121Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:29.122Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:30.124Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 1
2024-02-21T14:49:31.133Z    INFO    shutdown-manager    envoy/shutdown_manager.go:196    total connections: 0
2024-02-21T14:49:31.133Z    INFO    shutdown-manager    envoy/shutdown_manager.go:148    graceful drain sequence completed
2024-02-21T14:49:31.148Z    INFO    shutdown-manager    envoy/shutdown_manager.go:62    received SIGTERM
2024-02-21T14:49:32.080Z    INFO    shutdown-manager    envoy/shutdown_manager.go:97    shutdown readiness detected
Stream closed EOF for envoy-gateway-system/envoy-default-eg-e41e7b31-65bb79b45-crxbp (shutdown-manager)

@davidalger davidalger marked this pull request as draft February 21, 2024 15:12
@davidalger (Contributor Author):

/retest

},
},
{
Name: "shutdown-manager",
Contributor:

should this be a sidecar container?

  • this will ensure that preStop for shutdown-manager runs before envoy
  • building off this:
    • preStop exec shutdown performs a touch on the tmp file and exits
    • preStop exec /shutdown/ready performs the envoy shutdown procedure and blocks until complete
    • /shutdown/ready can return a non-200 error code immediately if the tmp file doesn't exist

@davidalger (Contributor Author) commented Feb 21, 2024

Not sure of the shutdown sequencing, as I don't see much about it in the announcement blog post or on the docs page other than indications that pod termination is not blocked by sidecar containers.

Update: Found it under pod lifecycle docs:

Beginning with Kubernetes 1.29, if your Pod includes one or more sidecar containers (init containers with an Always restart policy), the kubelet will delay sending the TERM signal to these sidecar containers until the last main container has fully terminated. The sidecar containers will be terminated in the reverse order they are defined in the Pod spec. This ensures that sidecar containers continue serving the other containers in the Pod until they are no longer needed.

I could look into this if we are OK with dropping support for 1.28, where sidecar containers are feature-gated, only being enabled by default in 1.29. Not sure I'd recommend this, though, with 1.29 not yet available as stable/GA on AKS or GKE.

Contributor

ah good point, we can't rely on order yet, since sidecars just GA'd

then I propose /shutdown/ready sleeps for 1s (ensuring that, in case of termination, the other preStop has had the chance to create the tmp file) before checking whether the tmp file exists.
If it doesn't, return 404; else run the shutdown steps and return 200.
This eliminates the need for the `for` loop within the HTTP handler.

Contributor

at a high level, big +1 with this approach

Contributor Author

then I propose /shutdown/ready sleeps for 1s (ensuring that, in case of termination, the other preStop has had the chance to create the tmp file) before checking whether the tmp file exists.
If it doesn't, return 404; else run the shutdown steps and return 200.

Not sure I follow. As soon as the /shutdown/ready request returns (no matter the response code), kubelet will immediately send SIGTERM to PID 1 (envoy).

Was the concern with the `for` loop that it could never return? I've modified this in 4a3b90f to ensure requests don't endlessly poll after the ready-timeout, since http.TimeoutHandler of course wouldn't interrupt the processing, only break the connection.
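That last point about http.TimeoutHandler can be demonstrated in isolation. This sketch (my own illustration, not code from the PR) shows the client receiving a 503 while the wrapped handler's goroutine continues to completion:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// demoTimeout shows that http.TimeoutHandler only cuts off the response:
// the wrapped handler keeps running after the client has been sent 503.
func demoTimeout() (status int, handlerFinished bool) {
	done := make(chan struct{})
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(200 * time.Millisecond) // stand-in for a long drain poll
		close(done)
	})
	srv := httptest.NewServer(http.TimeoutHandler(slow, 50*time.Millisecond, "timed out"))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	io.Copy(io.Discard, resp.Body)
	resp.Body.Close()

	<-done // the slow handler still ran to completion
	return resp.StatusCode, true
}

func main() {
	status, finished := demoTimeout()
	fmt.Println(status, finished) // 503 true
}
```

This is why the handler itself has to observe the ready-timeout: cutting the connection alone would not stop the polling work.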

Contributor

the concerns I was trying to mitigate were

  • using more CPU with the `for` loop
  • allowing non-kubelet clients to hoard the shutdown handler's CPU

Since we cannot establish order between the envoy and shutdown-manager containers, I was proposing we use a delay (1s / 2s) to ensure that, for the preStop case, we can be confident the other preStop has signaled to us (using a file) that it's time to shut down.

For the non-preStop case, it would just be a delay + return 404

Contributor Author

Updated in b3dcffd

Contributor

@guydc guydc left a comment

Great work!


// Reconfigure logger to write to stdout of main process if running in Kubernetes
if _, k8s := os.LookupEnv("KUBERNETES_SERVICE_HOST"); k8s && os.Getpid() != 1 {
logger = logging.FileLogger("/proc/1/fd/1", "shutdown-manager", v1alpha1.LogLevelInfo)
Contributor

There's a discussion here about this option possibly not working when running the process as non-root.
Not sure if this concern is applicable in this case...

Contributor Author

writing directly to /proc/1/fd/1 will be problematic if the process drops root privileges but PID 1 is still running as root

I don't think it will apply in this case, as neither PID 1 nor the exec'd process runs as root, since the container sets the uid:gid in the build (see here)

// Setup HTTP handler
handler := http.NewServeMux()
handler.HandleFunc("/shutdown/ready", func(w http.ResponseWriter, r *http.Request) {
shutdownReadyHandler(w, r, ShutdownReadyFile)
Contributor

Can this time out before the handler actually detects shutdown readiness, due to the server timeouts and/or kubelet timeouts? If so, what would be the behavior?

Contributor Author

It has a configurable timeout (defaulting to 10s longer than the drain timeout) as a safeguard, but it won't time out prior to this per documented and observed behavior.

If this did return before the exec preStop completed the drain, Envoy would be sent a SIGTERM and exit, after which you may observe errors such as this one logged until the drain timeout elapses and the shutdown command exits:

shutdown-manager 2024-02-22T20:48:54.259Z    ERROR    shutdown-manager    envoy/shutdown_manager.go:145    error getting total connections    {"error": "Get \"http://127.0.0.1:19000//stats?filter=^server\\\\.total_connections$&format=json\": dial tcp 127.0.0.1:19000: connect: connection refused"}

This should never occur with properly configured (staggered) timeouts, so I would hesitate to code around it for an initial implementation, preferring to let the errors be logged to get eyes on the potential problem.
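For reference, the stats query in that error log returns JSON in Envoy's standard `/stats?format=json` shape, which the connection-count poll has to parse. A minimal sketch of that parsing (the function name is my own; the JSON shape is Envoy's documented stats format):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statsResponse mirrors the shape of Envoy's /stats?format=json output.
type statsResponse struct {
	Stats []struct {
		Name  string `json:"name"`
		Value int    `json:"value"`
	} `json:"stats"`
}

// parseTotalConnections extracts server.total_connections from the body
// of /stats?filter=^server\.total_connections$&format=json.
func parseTotalConnections(body []byte) (int, error) {
	var sr statsResponse
	if err := json.Unmarshal(body, &sr); err != nil {
		return -1, err
	}
	for _, s := range sr.Stats {
		if s.Name == "server.total_connections" {
			return s.Value, nil
		}
	}
	return -1, fmt.Errorf("server.total_connections not found in response")
}

func main() {
	// Sample body as the Envoy admin API would return it.
	body := []byte(`{"stats":[{"name":"server.total_connections","value":3}]}`)
	n, err := parseTotalConnections(body)
	if err != nil {
		panic(err)
	}
	fmt.Println("total connections:", n) // total connections: 3
}
```

The drain loop exits once this value reaches 0, matching the `total connections: 0` log line shown earlier.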

@davidalger
Contributor Author

/retest

return deployment, nil
}

func expectedTerminationGracePeriodSeconds(cfg *egv1a1.ShutdownConfig) *int64 {
Contributor

shouldn't this be equal to DrainTimeout?

Contributor Author

terminationGracePeriodSeconds needs to be higher than the drain timeout so the pod has time to handle the SIGTERM before being sent a SIGKILL. It doesn't need to be much higher, but it does need to be marginally higher; assuming the other timeouts are functioning, it should never be reached under normal circumstances.

logger.Error(err, "error getting total connections")
}

if elapsedTime > minDrainDuration && !allowedToExit {
Contributor

nit: would it make sense to move minDrainDuration outside the loop and sleep for that duration before performing checks and calls?

Contributor Author

Probably a matter of personal preference, but I opted for this method so that the drain logs make it obvious what is happening: one gets visibility on connection counts changing immediately after starting the graceful drain, rather than waiting X seconds silently before the drain process starts logging.

minDrainDuration.Seconds(), drainTimeout.Seconds()))

// Send request to Envoy admin API to initiate the graceful drain sequence
if resp, err := http.Post(fmt.Sprintf("http://%s:%d/drain_listeners?graceful&skip_exit",
Contributor

will this cause the health check defined in

"@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck

to fail?

also curious why skip_exit is needed

Contributor Author

The healthcheck endpoint will not fail. It is unimportant, though, as the ready status of the endpoint will immediately become false when the pod enters the terminating state, triggering anything that uses the endpoint to begin deprogramming.

skip_exit is useful to ensure new connections are not unexpectedly blocked if a drain takes longer than drain-time-s on the Envoy proxy, so it becomes important where long drain timeouts are configured. It's a knob recently added to the drain API to support Istio's graceful drain process as they transition to native sidecars (envoyproxy/envoy#30003), and I thought it made sense to include it so we don't also have to configure drain-time-s on Envoy where drain periods longer than 10m are needed.

Contributor

@arkodg arkodg Feb 26, 2024

I think it's important to fail the healthcheck endpoint, so downstream LBs using an active health check mechanism can deprogram the endpoint faster

Contributor

referring to calling this endpoint explicitly as the first step of the shutdown process
https://www.envoyproxy.io/docs/envoy/latest/operations/admin#post--healthcheck-fail
cc @guydc

Contributor Author

You're right, fast failure is also a good thing to maintain.

Updated in f249d73 to first fail the healthcheck and then proceed with a graceful drain. In order to do this, I've configured the MODIFY_ONLY drain type on the HTTP listener so the call to /healthcheck/fail will not change the drain strategy used.

Contributor

I think the LB concern is related to a specific deployment type (1 proxy per node with ExternalTrafficPolicy: Local). This is already handled as of K8s 1.26: https://kubernetes.io/blog/2022/12/30/advancements-in-kubernetes-traffic-engineering/#traffic-loss-from-load-balancers-during-rolling-updates

arkodg
arkodg previously approved these changes Feb 26, 2024
Contributor

@arkodg arkodg left a comment

thanks for building this out @davidalger, all users will benefit from this work and won't have to deal with custom images or scripts!
have a few non-blocking questions, overall LGTM!

@davidalger davidalger force-pushed the algerdev/graceful-pod-termination branch from 532d9e0 to eafa8a5 Compare February 26, 2024 20:15
@davidalger davidalger requested a review from arkodg February 26, 2024 22:25
Contributor

@guydc guydc left a comment

LGTM
