-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Description
Title: HTTP health checker does not gracefully handle HTTP2 server shutdown (GOAWAY with NO_ERROR)
Description:
The HTTP health checker does not gracefully handle when an HTTP2 server performs the graceful drain sequence. GOAWAY is ignored (no callbacks registered) and then health checks on the old connection fail with REFUSED_STREAM until the connection is forcibly closed by the upstream, at which point the health checker reconnects.
Repro steps:
I've reproduced with this Envoy bootstrap config:
admin:
address:
socket_address:
address: 127.0.0.1
port_value: 8082
access_log_path: '/tmp/envoytesting/admin.log'
dynamic_resources:
lds_config:
path: /tmp/envoytesting/lds.yaml
static_resources:
clusters:
- name: self_health_checked
connect_timeout: 5s
type: STATIC
lb_policy: RANDOM
http2_protocol_options: {}
health_checks:
- interval: "1s"
no_traffic_interval: 1s
timeout: 5s
unhealthy_threshold: 1
healthy_threshold: 1
http_health_check:
path: /_status
codec_client_type: HTTP2
always_log_health_check_failures: true
event_log_path: /tmp/envoytesting/healthcheck.log
load_assignment:
cluster_name: self_health_checked
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: 127.0.0.1, port_value: 8081 }
http2_protocol_options: {}
node:
id: foo
cluster: foo
And this in /tmp/envoytesting/lds.yaml
version_info: "http_stats2"
type_url: type.googleapis.com/envoy.config.listener.v3.Listener
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
name: http_listener
address:
socket_address:
address: 127.0.0.1
port_value: 8081
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: http_stats2
http_filters:
- name: health
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
pass_through_mode: false
headers:
name: :path
exact_match: /_status
- name: router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
route_config:
virtual_hosts:
- name: "only_host"
domains: ["*"]
routes:
- match:
prefix: "/"
direct_response:
status: 200
body:
inline_string: "direct response"
Triggering an lds update by updating something like the stat prefix and mv over lds.yaml causes health checks to fail.
Logs from running Envoy with --component-log-level connection:debug,http2:debug,client:debug,hc:debug:
[2020-10-13 15:24:20.877][69][debug][hc] [external/envoy/source/common/upstream/health_checker_impl.cc:295] [C1] hc response=200 health_flags=healthy
[2020-10-13 15:24:20.877][69][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:960] [C1] stream closed: 0
[2020-10-13 15:24:21.229][69][info][upstream] [external/envoy/source/server/lds_api.cc:79] lds: add/update listener 'http_listener'
[2020-10-13 15:24:21.877][77][debug][http2] [external/envoy/source/common/http/http2/codec_impl_legacy.cc:813] [C2] sent goaway code=0
[2020-10-13 15:24:21.877][77][debug][http2] [external/envoy/source/common/http/http2/codec_impl_legacy.cc:933] [C2] stream closed: 0
[2020-10-13 15:24:21.877][69][debug][client] [external/envoy/source/common/http/codec_client.cc:109] [C1] response complete
[2020-10-13 15:24:21.877][69][debug][hc] [external/envoy/source/common/upstream/health_checker_impl.cc:295] [C1] hc response=200 health_flags=healthy
[2020-10-13 15:24:21.877][69][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:960] [C1] stream closed: 0
[2020-10-13 15:24:22.878][69][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:960] [C1] stream closed: 7
[2020-10-13 15:24:22.878][69][debug][client] [external/envoy/source/common/http/codec_client.cc:119] [C1] request reset
[2020-10-13 15:24:22.878][69][debug][hc] [external/envoy/source/common/upstream/health_checker_impl.cc:287] [C1] connection/stream error health_flags=healthy
[2020-10-13 15:24:23.879][69][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:960] [C1] stream closed: 7
[2020-10-13 15:24:23.880][69][debug][client] [external/envoy/source/common/http/codec_client.cc:119] [C1] request reset
[2020-10-13 15:24:23.880][69][debug][hc] [external/envoy/source/common/upstream/health_checker_impl.cc:287] [C1] connection/stream error health_flags=/failed_active_hc
[2020-10-13 15:24:24.880][69][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:960] [C1] stream closed: 7
[2020-10-13 15:24:24.881][69][debug][client] [external/envoy/source/common/http/codec_client.cc:119] [C1] request reset
[2020-10-13 15:24:24.881][69][debug][hc] [external/envoy/source/common/upstream/health_checker_impl.cc:287] [C1] connection/stream error health_flags=/failed_active_hc
[2020-10-13 15:24:25.883][69][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:960] [C1] stream closed: 7
[2020-10-13 15:24:25.883][69][debug][client] [external/envoy/source/common/http/codec_client.cc:119] [C1] request reset
[2020-10-13 15:24:25.883][69][debug][hc] [external/envoy/source/common/upstream/health_checker_impl.cc:287] [C1] connection/stream error health_flags=/failed_active_hc
[2020-10-13 15:24:26.879][77][debug][http2] [external/envoy/source/common/http/http2/codec_impl_legacy.cc:813] [C2] sent goaway code=0
[2020-10-13 15:24:26.880][77][debug][connection] [external/envoy/source/common/network/connection_impl.cc:106] [C2] closing data_to_write=17 type=2
[2020-10-13 15:24:26.880][77][debug][connection] [external/envoy/source/common/network/connection_impl_base.cc:40] [C2] setting delayed close timer with timeout 1000 ms
[2020-10-13 15:24:26.880][77][debug][connection] [external/envoy/source/common/network/connection_impl.cc:641] [C2] write flush complete
[2020-10-13 15:24:26.882][69][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:960] [C1] stream closed: 7
[2020-10-13 15:24:26.882][69][debug][client] [external/envoy/source/common/http/codec_client.cc:119] [C1] request reset
[2020-10-13 15:24:26.882][69][debug][hc] [external/envoy/source/common/upstream/health_checker_impl.cc:287] [C1] connection/stream error health_flags=/failed_active_hc
[2020-10-13 15:24:27.884][77][debug][connection] [external/envoy/source/common/network/connection_impl_base.cc:54] [C2] triggered delayed close
[2020-10-13 15:24:27.884][77][debug][connection] [external/envoy/source/common/network/connection_impl.cc:202] [C2] closing socket: 1
[2020-10-13 15:24:27.884][69][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:960] [C1] stream closed: 7
[2020-10-13 15:24:27.884][69][debug][client] [external/envoy/source/common/http/codec_client.cc:119] [C1] request reset
[2020-10-13 15:24:27.885][69][debug][hc] [external/envoy/source/common/upstream/health_checker_impl.cc:287] [C1] connection/stream error health_flags=/failed_active_hc
[2020-10-13 15:24:27.885][69][debug][connection] [external/envoy/source/common/network/connection_impl.cc:579] [C1] remote close
[2020-10-13 15:24:27.885][69][debug][connection] [external/envoy/source/common/network/connection_impl.cc:202] [C1] closing socket: 0
[2020-10-13 15:24:27.885][69][debug][client] [external/envoy/source/common/http/codec_client.cc:95] [C1] disconnect. resetting 0 pending requests
[2020-10-13 15:24:28.887][69][debug][client] [external/envoy/source/common/http/codec_client.cc:39] [C3] connecting
[2020-10-13 15:24:28.887][69][debug][connection] [external/envoy/source/common/network/connection_impl.cc:755] [C3] connecting to 127.0.0.1:8081
[2020-10-13 15:24:28.887][69][debug][connection] [external/envoy/source/common/network/connection_impl.cc:771] [C3] connection in progress
[2020-10-13 15:24:28.887][69][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:1168] [C3] updating connection-level initial window size to 268435456
[2020-10-13 15:24:28.888][69][debug][connection] [external/envoy/source/common/network/connection_impl.cc:611] [C3] connected
[2020-10-13 15:24:28.888][69][debug][client] [external/envoy/source/common/http/codec_client.cc:77] [C3] connected
[2020-10-13 15:24:28.888][77][debug][http2] [external/envoy/source/common/http/http2/codec_impl_legacy.cc:1139] [C4] updating connection-level initial window size to 268435456
[2020-10-13 15:24:28.888][77][debug][http2] [external/envoy/source/common/http/http2/codec_impl_legacy.cc:933] [C4] stream closed: 0
[2020-10-13 15:24:28.889][69][debug][client] [external/envoy/source/common/http/codec_client.cc:109] [C3] response complete
[2020-10-13 15:24:28.889][69][debug][hc] [external/envoy/source/common/upstream/health_checker_impl.cc:295] [C3] hc response=200 health_flags=/failed_active_hc
You can see this sequence:
- health checks are succeeding
- the server says
sent goaway code=0 - the in-flight health check succeeds one last time
- a sequence of "stream closed: 7" (7 is REFUSED_STREAM) causing health checks to fail
- server says
closing socket: 1, client saysremote close - client reconnects and health checks start succeeding again.
I'm working on a PR to make the http health checker handle GOAWAY similarly to how the gRPC one does, and will test with my same local repro.