Graceful HTTP Connection Draining During Shutdown? #7841
Description:
At my organization, we are preparing a large-scale Envoy deployment for serving all network traffic at the edge on ingress to our network. We are using static configuration files on disk and using the Python reloader script to execute hot reloads when our configuration on disk changes.
Sometimes, we need to stop and start the Envoy process completely.
I was hoping to see clean shutdown behavior similar to NGINX's. After a shutdown signal, NGINX stops accepting new TCP connections and tries to complete all in-flight HTTP requests, sending `Connection: close` so that each connection is shut down gracefully before the process stops. Connections therefore end cleanly, without a TCP reset, unless a timeout is exceeded.
In a nutshell, NGINX does this:
- Receive terminate signal.
- Stop accepting new TCP connections.
- Wait for up to `$TIMEOUT` seconds for HTTP connections to terminate before terminating their TCP sockets.
- For each existing request, serve `Connection: close` back in the response to cleanly close each connection.
- Exit the process(es).
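The sequence above can be sketched with Python's standard library. This is a toy model, not NGINX's or Envoy's code: `DrainingServer`, `graceful_stop`, and the `timeout_s` parameter (standing in for `$TIMEOUT`) are hypothetical names. It closes the listener, adds `Connection: close` to every response sent while draining, waits up to `timeout_s` for open connections to finish, and then shuts down whatever sockets remain. One simplification: an idle keep-alive connection that never sends another request is only closed when the timeout expires.

```python
import http.server
import socket
import threading


class DrainingHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep connections alive by default

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        if self.server.draining.is_set():
            # send_header() also sets self.close_connection = True for this
            # value, so the keep-alive loop ends after this response.
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in the example


class DrainingServer(http.server.ThreadingHTTPServer):
    def __init__(self, address):
        super().__init__(address, DrainingHandler)
        self.draining = threading.Event()
        self.cond = threading.Condition()
        self.active = set()  # accepted client sockets that are not closed yet

    def process_request(self, request, client_address):
        with self.cond:
            self.active.add(request)
        super().process_request(request, client_address)

    def shutdown_request(self, request):
        # Called from the handler thread once the connection is finished.
        super().shutdown_request(request)
        with self.cond:
            self.active.discard(request)
            self.cond.notify_all()


def graceful_stop(server, serve_thread, timeout_s):
    """NGINX-style stop; returns how many connections had to be cut."""
    server.draining.set()      # later responses carry Connection: close
    server.shutdown()          # stop the accept loop
    serve_thread.join()
    server.socket.close()      # new TCP connections are now refused
    with server.cond:
        server.cond.wait_for(lambda: not server.active, timeout=timeout_s)
        leftover = list(server.active)
    for sock in leftover:      # timeout exceeded: terminate remaining sockets
        try:
            sock.shutdown(socket.SHUT_RDWR)
        except OSError:
            pass
    return len(leftover)
```

To drive it from a signal, run `serve_forever()` in a worker thread and call `graceful_stop()` from the main thread after SIGTERM arrives; calling `shutdown()` from the thread that runs `serve_forever()` deadlocks.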
When I restart Envoy, stopping it either with SIGTERM or through the admin interface (`/quitquitquit`), clients see TCP connection resets rather than `Connection: close`. Hot reloads close connections properly, but a full shutdown does not appear to.
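To tell these outcomes apart from the client side, a probe can reuse one keep-alive connection and classify how it ends. `probe` is a hypothetical helper built only on the standard-library `http.client`; it is not the load-testing tool mentioned in this issue.

```python
import http.client


def probe(host, port, path="/"):
    """Send up to two requests on one keep-alive connection and classify the ending.

    Returns:
      "graceful" - the server answered with Connection: close
      "eof"      - the server closed the socket (FIN) without answering
      "reset"    - the server reset the TCP connection (RST)
      "open"     - both requests were answered and the connection stayed open
    """
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        for _ in range(2):
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()
            if (resp.getheader("Connection") or "").lower() == "close":
                return "graceful"
        return "open"
    except http.client.RemoteDisconnected:
        # Must be caught before ConnectionResetError, which it subclasses.
        return "eof"
    except (ConnectionResetError, BrokenPipeError):
        return "reset"
    finally:
        conn.close()
```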
Repro steps:
We are using Envoy 1.11.0 on Ubuntu 16.04 in AWS.
I have open-sourced our load-testing tool using the exact Python version, requests version, etc. to reliably reproduce the issue.
Our systemd unit for running Envoy:
envoy.service

```ini
[Unit]
Description=Envoy Proxy
Requires=network-online.target
After=network-online.target

[Service]
Type=simple
Environment="ENVOY_CONFIG_FILE=/etc/envoy/envoy.yaml"
Environment="ENVOY_START_OPTS=--use-libevent-buffers 0 --parent-shutdown-time-s 60"
ExecStart=/usr/local/bin/envoy-restarter.py /usr/local/bin/start-envoy.sh
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -TERM $MAINPID
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```
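With this unit, `ExecStop=` sends SIGTERM straight to the restarter, which forwards it to Envoy, so nothing triggers a drain before the process exits. One possible workaround is an `ExecStop=` helper that first calls Envoy's admin `POST /healthcheck/fail` endpoint, waits, and only then sends SIGTERM. This assumes, based on Envoy's draining documentation and not verified against 1.11.0, that a manual health-check failure makes the HTTP connection manager add `Connection: close` to HTTP/1.1 responses. The function names, the `127.0.0.1:9901` admin address (port taken from the config in this issue), and the drain duration are assumptions. Idle keep-alive connections that send no further request would still be closed by the final SIGTERM.

```python
import os
import signal
import time
import urllib.request

# Assumed admin address: port 9901 matches the admin listener in the config.
ADMIN_URL = "http://127.0.0.1:9901"


def fail_healthcheck(admin_url=ADMIN_URL):
    """POST /healthcheck/fail so Envoy starts draining (assumed behavior)."""
    req = urllib.request.Request(admin_url + "/healthcheck/fail", data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


def drain_then_stop(pid, drain_s, admin_url=ADMIN_URL):
    """Hypothetical ExecStop helper: drain first, then send the usual SIGTERM."""
    fail_healthcheck(admin_url)
    time.sleep(drain_s)            # let in-flight requests finish with Connection: close
    os.kill(pid, signal.SIGTERM)   # same signal the unit's ExecStop sends today
```

A wrapper script around `drain_then_stop($MAINPID, 30)` could then replace the unit's current `ExecStop=` line; the 30-second window is only an example value.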
We are using the exact Python restarter tool that is currently in master.
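The restarter's contract with the start script is small: before each launch it exports `RESTART_EPOCH`, and on SIGTERM it forwards TERM to its children (the logs further down show "sending TERM to PID=4408"), so Envoy receives SIGTERM directly with no drain step in between. A toy model of that contract follows; `MiniRestarter` is hypothetical and much simpler than the real `hot-restarter.py`, which forks and execs directly and also handles SIGHUP and child exits.

```python
import os
import signal
import subprocess


class MiniRestarter:
    """Toy model of the restarter contract (NOT the real hot-restarter.py)."""

    def __init__(self, target):
        self.target = target   # e.g. /usr/local/bin/start-envoy.sh
        self.epoch = 0         # next RESTART_EPOCH to hand out
        self.children = []

    def spawn(self):
        """Start the target with RESTART_EPOCH set, as a hot reload would."""
        env = dict(os.environ, RESTART_EPOCH=str(self.epoch))
        child = subprocess.Popen([self.target], env=env)
        self.epoch += 1
        self.children.append(child)
        return child

    def terminate_all(self):
        """On SIGTERM: forward TERM to every live child, then wait for all of them."""
        for child in self.children:
            if child.poll() is None:
                child.send_signal(signal.SIGTERM)
        return [child.wait() for child in self.children]
```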
The script that the Python restarter runs (`/usr/local/bin/start-envoy.sh`, per `ExecStart`) is:
```bash
#!/bin/bash
exec /usr/local/bin/envoy -c $ENVOY_CONFIG_FILE $ENVOY_START_OPTS --restart-epoch $RESTART_EPOCH
```

Config:
```yaml
---
static_resources:
  listeners:
  - name: listener_http
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 80
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: ingress_http
          route_config:
            name: route_config
            virtual_hosts:
            - name: abc.mycompany.com
              domains: ["abc.mycompany.com", "*.abc.mycompany.com"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: abc
          http_filters:
          - name: envoy.router
  clusters:
  - name: abc
    connect_timeout: 1.0s
    type: LOGICAL_DNS
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address:
        address: abc.internal.mycompany.com
        port_value: 80
    circuit_breakers:
      thresholds:
        priority: DEFAULT
        max_connections: 100000000
        max_pending_requests: 1000000000
        max_requests: 100000000
        max_retries: 1000000000
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }
```

Logs:
```
Aug 06 21:16:27 hostname systemd[1]: Stopping Envoy Proxy...
Aug 06 21:16:27 hostname envoy-restarter.py[4406]: [2019-08-06 21:16:27.955][4408][warning][main] [source/server/server.cc:463] caught SIGTERM
Aug 06 21:16:27 hostname envoy-restarter.py[4406]: [2019-08-06 21:16:27.955][4408][info][main] [source/server/server.cc:567] shutting down server instance
Aug 06 21:16:27 hostname envoy-restarter.py[4406]: [2019-08-06 21:16:27.955][4408][info][main] [source/server/server.cc:521] main dispatch loop exited
Aug 06 21:16:27 hostname envoy-restarter.py[4406]: [2019-08-06 21:16:27.958][4408][info][main] [source/server/server.cc:560] exiting
Aug 06 21:16:28 hostname envoy-restarter.py[4406]: starting hot-restarter with target: /usr/local/bin/start-envoy.sh
Aug 06 21:16:28 hostname envoy-restarter.py[4406]: forking and execing new child process at epoch 0
Aug 06 21:16:28 hostname envoy-restarter.py[4406]: forked new child process with PID=4408
Aug 06 21:16:28 hostname envoy-restarter.py[4406]: got SIGTERM
Aug 06 21:16:28 hostname envoy-restarter.py[4406]: sending TERM to PID=4408
Aug 06 21:16:28 hostname envoy-restarter.py[4406]: got SIGTERM
Aug 06 21:16:28 hostname envoy-restarter.py[4406]: all children exited cleanly
Aug 06 21:16:28 hostname systemd[1]: Stopped Envoy Proxy.
```
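The Envoy lines in these logs put a number on the problem: `caught SIGTERM` at 21:16:27.955 and `exiting` at 21:16:27.958, about 3 ms apart, so no drain window was applied. A small helper that extracts that interval is sketched here; `envoy_events` and `shutdown_ms` are hypothetical names, and the regex assumes the Envoy line format shown above.

```python
import re
from datetime import datetime, timedelta

# Envoy lines copied from the journal output above.
SAMPLE = """\
Aug 06 21:16:27 hostname envoy-restarter.py[4406]: [2019-08-06 21:16:27.955][4408][warning][main] [source/server/server.cc:463] caught SIGTERM
Aug 06 21:16:27 hostname envoy-restarter.py[4406]: [2019-08-06 21:16:27.955][4408][info][main] [source/server/server.cc:567] shutting down server instance
Aug 06 21:16:27 hostname envoy-restarter.py[4406]: [2019-08-06 21:16:27.955][4408][info][main] [source/server/server.cc:521] main dispatch loop exited
Aug 06 21:16:27 hostname envoy-restarter.py[4406]: [2019-08-06 21:16:27.958][4408][info][main] [source/server/server.cc:560] exiting
"""

# [timestamp][thread id][level][logger] [file:line] message
ENVOY_LINE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]"
    r"\[\d+\]\[\w+\]\[\w+\] \[[^\]]+\] (.*)$"
)


def envoy_events(text):
    """Return (timestamp, message) pairs for every Envoy-formatted line."""
    events = []
    for line in text.splitlines():
        match = ENVOY_LINE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((stamp, match.group(2)))
    return events


def shutdown_ms(events):
    """Milliseconds from 'caught SIGTERM' to 'exiting'."""
    start = next(ts for ts, msg in events if msg == "caught SIGTERM")
    end = next(ts for ts, msg in events if msg == "exiting")
    return (end - start) / timedelta(milliseconds=1)


print(shutdown_ms(envoy_events(SAMPLE)))  # 3.0
```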