HTTP Health Check doesn't work as expected #3033
Closed
Description:
Envoy failed to check endpoint's health status via HTTP.
Repro steps:
- Follow Quickstart guide to install EG v1.0.0 and apply quickstart.yaml.
- Apply BackendTrafficPolicy to check endpoint's health status.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: backend-http-health-check
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: backend
    namespace: default
  healthCheck:
    active:
      timeout: 3s
      interval: 5s
      unhealthyThreshold: 3
      healthyThreshold: 1
      type: HTTP
      http:
        path: "/healthz"
        expectedStatuses:
        - 200
- Execute kubectl port-forward deploy/{envoy_deployment_name} -n envoy-gateway-system 19000:19000 to expose the Envoy admin service outside the K8s cluster.
Environment:
- K8s v1.26.14
- EG v1.0.0
- Envoy v1.29.2
Logs:
EG logs show xDS IR
http:
- address: 0.0.0.0
  hostnames:
  - '*'
  isHTTP2: false
  name: default/eg/http
  path:
    escapedSlashesAction: UnescapeAndRedirect
    mergeSlashes: true
  port: 10080
  routes:
  - backendWeights:
      invalid: 0
      valid: 0
    destination:
      name: httproute/default/backend/rule/0
      settings:
      - addressType: IP
        endpoints:
        - host: 10.250.92.44
          port: 3000
        protocol: HTTP
        weight: 1
    healthCheck:
      active:
        healthyThreshold: 1
        http:
          expectedStatuses:
          - 200
          path: /healthz
        interval: 5s
        timeout: 3s
        unhealthyThreshold: 3
    hostname: www.example.com
    isHTTP2: false
    name: httproute/default/backend/rule/0/match/0/www_example_com
    pathMatch:
      distinct: false
      name: ""
      prefix: /
Envoy admin service REST API :19000/config_dump shows xDS cluster
{
  "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
  "name": "httproute/default/backend/rule/0",
  "type": "EDS",
  "eds_cluster_config": {
    "eds_config": {
      "ads": {},
      "resource_api_version": "V3"
    },
    "service_name": "httproute/default/backend/rule/0"
  },
  "connect_timeout": "10s",
  "per_connection_buffer_limit_bytes": 32768,
  "lb_policy": "LEAST_REQUEST",
  "health_checks": [
    {
      "timeout": "3s",
      "interval": "5s",
      "unhealthy_threshold": 3,
      "healthy_threshold": 1,
      "http_health_check": {
        "path": "/healthz",
        "expected_statuses": [
          {
            "start": "200",
            "end": "201"
          }
        ]
      }
    }
  ],
  "circuit_breakers": {
    "thresholds": [
      {
        "max_retries": 1024
      }
    ]
  },
  "dns_lookup_family": "V4_ONLY",
  "outlier_detection": {},
  "common_lb_config": {
    "locality_weighted_lb_config": {}
  }
}
Envoy admin service REST API :19000/clusters shows xDS cluster status
httproute/default/backend/rule/0::observability_name::httproute/default/backend/rule/0
httproute/default/backend/rule/0::outlier::success_rate_average::-1
httproute/default/backend/rule/0::outlier::success_rate_ejection_threshold::-1
httproute/default/backend/rule/0::outlier::local_origin_success_rate_average::-1
httproute/default/backend/rule/0::outlier::local_origin_success_rate_ejection_threshold::-1
httproute/default/backend/rule/0::default_priority::max_connections::1024
httproute/default/backend/rule/0::default_priority::max_pending_requests::1024
httproute/default/backend/rule/0::default_priority::max_requests::1024
httproute/default/backend/rule/0::default_priority::max_retries::1024
httproute/default/backend/rule/0::high_priority::max_connections::1024
httproute/default/backend/rule/0::high_priority::max_pending_requests::1024
httproute/default/backend/rule/0::high_priority::max_requests::1024
httproute/default/backend/rule/0::high_priority::max_retries::3
httproute/default/backend/rule/0::added_via_api::true
httproute/default/backend/rule/0::eds_service_name::httproute/default/backend/rule/0
httproute/default/backend/rule/0::10.250.92.44:3000::cx_active::0
httproute/default/backend/rule/0::10.250.92.44:3000::cx_connect_fail::0
httproute/default/backend/rule/0::10.250.92.44:3000::cx_total::0
httproute/default/backend/rule/0::10.250.92.44:3000::rq_active::0
httproute/default/backend/rule/0::10.250.92.44:3000::rq_error::0
httproute/default/backend/rule/0::10.250.92.44:3000::rq_success::0
httproute/default/backend/rule/0::10.250.92.44:3000::rq_timeout::0
httproute/default/backend/rule/0::10.250.92.44:3000::rq_total::0
httproute/default/backend/rule/0::10.250.92.44:3000::hostname::
httproute/default/backend/rule/0::10.250.92.44:3000::health_flags::/failed_active_hc
httproute/default/backend/rule/0::10.250.92.44:3000::weight::1
httproute/default/backend/rule/0::10.250.92.44:3000::region::httproute/default/backend/rule/0/backend/0
httproute/default/backend/rule/0::10.250.92.44:3000::zone::
httproute/default/backend/rule/0::10.250.92.44:3000::sub_zone::
httproute/default/backend/rule/0::10.250.92.44:3000::canary::false
httproute/default/backend/rule/0::10.250.92.44:3000::priority::0
httproute/default/backend/rule/0::10.250.92.44:3000::success_rate::-1
httproute/default/backend/rule/0::10.250.92.44:3000::local_origin_success_rate::-1
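The health_flags line is the relevant part of this output: /failed_active_hc means the endpoint has failed active health checking. As a rough sketch (not part of EG or Envoy), the text format above could be scanned for unhealthy hosts as follows. The helper name `unhealthy_hosts` is hypothetical, and the parsing assumes IPv4 host:port addresses, because an IPv6 address may itself contain `::`.

```python
def unhealthy_hosts(clusters_text: str) -> dict[str, list[str]]:
    """Map "cluster::host" to its non-healthy flags from Envoy /clusters text output."""
    result = {}
    for line in clusters_text.splitlines():
        parts = line.strip().split("::")
        # Host-level lines look like: cluster::host:port::stat_name::value
        # Cluster-level lines have only three fields and are skipped.
        if len(parts) != 4 or parts[2] != "health_flags":
            continue
        cluster, host, _, value = parts
        # Flags are "/"-separated; a host with no flags is reported as "healthy".
        flags = [f for f in value.split("/") if f and f != "healthy"]
        if flags:
            result[f"{cluster}::{host}"] = flags
    return result


# Three lines copied from the output above.
sample = """\
httproute/default/backend/rule/0::added_via_api::true
httproute/default/backend/rule/0::10.250.92.44:3000::health_flags::/failed_active_hc
httproute/default/backend/rule/0::10.250.92.44:3000::weight::1
"""
print(unhealthy_hosts(sample))
```

Running it against the full output of :19000/clusters would list every endpoint that Envoy currently considers unhealthy, together with the reason flags.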
Analysis:
- EG doesn't set config.core.v3.HealthCheck.HttpHealthCheck.host in cluster.go#L207-L221.
- EG doesn't set config.endpoint.v3.Endpoint.HealthCheckConfig.hostname in cluster.go#L377-L394.
- Envoy uses the cluster name as the Host header value when it performs an HTTP health check against an endpoint.
- However, the cluster name generated by EG (like httproute/default/backend/rule/0) is not a valid Host according to RFC 9110 section 7.2 (Host and :authority), so the backend service responds with 400 Bad Request to the health check request issued by Envoy.
kubectl port-forward deploy/backend 3000:3000
curl localhost:3000 -H 'Host: httproute/default/backend/rule/0' -v
* Trying [::1]:3000...
* Connected to localhost (::1) port 3000
> GET / HTTP/1.1
> Host: httproute/default/backend/rule/0
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request: malformed Host header
< Content-Type: text/plain; charset=utf-8
< Connection: close
<
* Closing connection
400 Bad Request: malformed Host header
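The rejection can also be illustrated without a cluster. The following is a minimal sketch, assuming a simplified reading of the host [":" port] grammar that RFC 9110 refers to (dot-separated names and IPv4 addresses only, no IPv6 literals or percent-encoding). `is_plausible_host` is a hypothetical helper, not something EG or Envoy provides.

```python
import re

# Simplified Host grammar: dot-separated labels of letters, digits and hyphens,
# optionally followed by ":port". This is stricter than the RFC; IPv6 literals
# and percent-encoding are deliberately not handled.
_HOST_RE = re.compile(r"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(:\d{1,5})?")


def is_plausible_host(value: str) -> bool:
    """Return True if value looks like a syntactically valid Host header value."""
    return _HOST_RE.fullmatch(value) is not None


for candidate in (
    "www.example.com",                   # route hostname from the xDS IR
    "10.250.92.44:3000",                 # endpoint address
    "httproute/default/backend/rule/0",  # cluster name generated by EG
):
    print(candidate, is_plausible_host(candidate))
```

Only the cluster name fails this check, because "/" is not allowed in a host. That is consistent with the 400 response shown above.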