HTTP Health Check doesn't work as expected #3033

@shuji-2019

Description:

Envoy fails to check an endpoint's health status via HTTP: the endpoint is flagged failed_active_hc.

Repro steps:

  1. Follow the Quickstart guide to install EG v1.0.0 and apply quickstart.yaml.
  2. Apply a BackendTrafficPolicy to check the endpoint's health status:
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: backend-http-health-check
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: backend
    namespace: default
  healthCheck:
    active:
      timeout: 3s
      interval: 5s
      unhealthyThreshold: 3
      healthyThreshold: 1
      type: HTTP
      http:
        path: "/healthz"
        expectedStatuses:
          - 200
  3. Execute kubectl port-forward deploy/{envoy_deployment_name} -n envoy-gateway-system 19000:19000 to expose the Envoy admin service outside the K8s cluster.

Environment:

  • K8s v1.26.14
  • EG v1.0.0
  • Envoy v1.29.2

Logs:

EG logs show the xDS IR:

http:
- address: 0.0.0.0
  hostnames:
  - '*'
  isHTTP2: false
  name: default/eg/http
  path:
    escapedSlashesAction: UnescapeAndRedirect
    mergeSlashes: true
  port: 10080
  routes:
  - backendWeights:
      invalid: 0
      valid: 0
    destination:
      name: httproute/default/backend/rule/0
      settings:
      - addressType: IP
        endpoints:
        - host: 10.250.92.44
          port: 3000
        protocol: HTTP
        weight: 1
    healthCheck:
      active:
        healthyThreshold: 1
        http:
          expectedStatuses:
          - 200
          path: /healthz
        interval: 5s
        timeout: 3s
        unhealthyThreshold: 3
    hostname: www.example.com
    isHTTP2: false
    name: httproute/default/backend/rule/0/match/0/www_example_com
    pathMatch:
      distinct: false
      name: ""
      prefix: /

The Envoy admin service REST API :19000/config_dump shows the xDS cluster:

{
  "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
  "name": "httproute/default/backend/rule/0",
  "type": "EDS",
  "eds_cluster_config": {
    "eds_config": {
      "ads": {},
      "resource_api_version": "V3"
    },
    "service_name": "httproute/default/backend/rule/0"
  },
  "connect_timeout": "10s",
  "per_connection_buffer_limit_bytes": 32768,
  "lb_policy": "LEAST_REQUEST",
  "health_checks": [
    {
      "timeout": "3s",
      "interval": "5s",
      "unhealthy_threshold": 3,
      "healthy_threshold": 1,
      "http_health_check": {
        "path": "/healthz",
        "expected_statuses": [
          {
            "start": "200",
            "end": "201"
          }
        ]
      }
    }
  ],
  "circuit_breakers": {
    "thresholds": [
      {
        "max_retries": 1024
      }
    ]
  },
  "dns_lookup_family": "V4_ONLY",
  "outlier_detection": {},
  "common_lb_config": {
    "locality_weighted_lb_config": {}
  }
}
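To make the missing field visible, here is a minimal Go sketch that decodes a cluster entry like the one above and reports which Host header each HTTP health check would carry. It assumes Envoy's documented behaviour that an empty `http_health_check.host` falls back to the cluster name, and it ignores the per-endpoint hostname override. The `cluster` type and `effectiveHCHosts` function are hypothetical helpers for illustration, not part of EG or Envoy.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cluster mirrors only the fields of the config_dump entry that matter here.
// Hypothetical helper type; unknown JSON fields are ignored by encoding/json.
type cluster struct {
	Name         string `json:"name"`
	HealthChecks []struct {
		HTTP *struct {
			Host string `json:"host"`
			Path string `json:"path"`
		} `json:"http_health_check"`
	} `json:"health_checks"`
}

// effectiveHCHosts returns, for each HTTP health check in the cluster, the
// Host header value Envoy is assumed to send: the configured host, or the
// cluster name when host is empty.
func effectiveHCHosts(clusterJSON []byte) ([]string, error) {
	var c cluster
	if err := json.Unmarshal(clusterJSON, &c); err != nil {
		return nil, err
	}
	var hosts []string
	for _, hc := range c.HealthChecks {
		if hc.HTTP == nil {
			continue // not an HTTP health check
		}
		host := hc.HTTP.Host
		if host == "" {
			host = c.Name // assumed fallback when host is unset
		}
		hosts = append(hosts, host)
	}
	return hosts, nil
}

func main() {
	// Trimmed copy of the cluster shown in the config dump.
	dump := []byte(`{
	  "name": "httproute/default/backend/rule/0",
	  "health_checks": [
	    {"timeout": "3s", "interval": "5s",
	     "http_health_check": {"path": "/healthz"}}
	  ]
	}`)
	hosts, err := effectiveHCHosts(dump)
	if err != nil {
		panic(err)
	}
	fmt.Println(hosts)
}
```

For the dump above, the only health check has no `host`, so the sketch reports the cluster name as the Host value.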

The Envoy admin service REST API :19000/clusters shows the xDS cluster status:

httproute/default/backend/rule/0::observability_name::httproute/default/backend/rule/0
httproute/default/backend/rule/0::outlier::success_rate_average::-1
httproute/default/backend/rule/0::outlier::success_rate_ejection_threshold::-1
httproute/default/backend/rule/0::outlier::local_origin_success_rate_average::-1
httproute/default/backend/rule/0::outlier::local_origin_success_rate_ejection_threshold::-1
httproute/default/backend/rule/0::default_priority::max_connections::1024
httproute/default/backend/rule/0::default_priority::max_pending_requests::1024
httproute/default/backend/rule/0::default_priority::max_requests::1024
httproute/default/backend/rule/0::default_priority::max_retries::1024
httproute/default/backend/rule/0::high_priority::max_connections::1024
httproute/default/backend/rule/0::high_priority::max_pending_requests::1024
httproute/default/backend/rule/0::high_priority::max_requests::1024
httproute/default/backend/rule/0::high_priority::max_retries::3
httproute/default/backend/rule/0::added_via_api::true
httproute/default/backend/rule/0::eds_service_name::httproute/default/backend/rule/0
httproute/default/backend/rule/0::10.250.92.44:3000::cx_active::0
httproute/default/backend/rule/0::10.250.92.44:3000::cx_connect_fail::0
httproute/default/backend/rule/0::10.250.92.44:3000::cx_total::0
httproute/default/backend/rule/0::10.250.92.44:3000::rq_active::0
httproute/default/backend/rule/0::10.250.92.44:3000::rq_error::0
httproute/default/backend/rule/0::10.250.92.44:3000::rq_success::0
httproute/default/backend/rule/0::10.250.92.44:3000::rq_timeout::0
httproute/default/backend/rule/0::10.250.92.44:3000::rq_total::0
httproute/default/backend/rule/0::10.250.92.44:3000::hostname::
httproute/default/backend/rule/0::10.250.92.44:3000::health_flags::/failed_active_hc
httproute/default/backend/rule/0::10.250.92.44:3000::weight::1
httproute/default/backend/rule/0::10.250.92.44:3000::region::httproute/default/backend/rule/0/backend/0
httproute/default/backend/rule/0::10.250.92.44:3000::zone::
httproute/default/backend/rule/0::10.250.92.44:3000::sub_zone::
httproute/default/backend/rule/0::10.250.92.44:3000::canary::false
httproute/default/backend/rule/0::10.250.92.44:3000::priority::0
httproute/default/backend/rule/0::10.250.92.44:3000::success_rate::-1
httproute/default/backend/rule/0::10.250.92.44:3000::local_origin_success_rate::-1
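The relevant line in this output is `health_flags::/failed_active_hc`. As a rough aid for spotting it in longer dumps, the following Go sketch scans `/clusters` text and lists endpoints carrying that flag. `failedActiveHC` is a hypothetical helper, and it assumes the `cluster::host::stat::value` line layout shown above; IPv6 endpoint addresses containing `::` would need a more careful parser.

```go
package main

import (
	"fmt"
	"strings"
)

// failedActiveHC returns the host:port of every endpoint whose health_flags
// line in the /clusters output contains failed_active_hc.
// Hypothetical helper; assumes lines of the form
// <cluster>::<host:port>::<stat>::<value>.
func failedActiveHC(clustersOutput string) []string {
	var failed []string
	for _, line := range strings.Split(clustersOutput, "\n") {
		parts := strings.Split(strings.TrimSpace(line), "::")
		// Cluster-level stats have fewer fields; skip anything that is not
		// a per-endpoint health_flags line.
		if len(parts) != 4 || parts[2] != "health_flags" {
			continue
		}
		if strings.Contains(parts[3], "failed_active_hc") {
			failed = append(failed, parts[1])
		}
	}
	return failed
}

func main() {
	// A few lines copied from the /clusters output above.
	sample := `httproute/default/backend/rule/0::added_via_api::true
httproute/default/backend/rule/0::10.250.92.44:3000::rq_total::0
httproute/default/backend/rule/0::10.250.92.44:3000::health_flags::/failed_active_hc
httproute/default/backend/rule/0::10.250.92.44:3000::weight::1`
	fmt.Println(failedActiveHC(sample))
}
```

With the port-forward from the repro steps in place, the same function could be fed the body of a GET to localhost:19000/clusters.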

Analysis:

  1. EG doesn't set config.core.v3.HealthCheck.HttpHealthCheck.host in cluster.go#L207-L221.
  2. EG doesn't set config.endpoint.v3.Endpoint.HealthCheckConfig.hostname in cluster.go#L377-L394.
  3. Envoy therefore uses the cluster name as the Host header value when performing HTTP health checks against the endpoint.
  4. But the cluster name generated by EG (like httproute/default/backend/rule/0) is not a valid Host value according to RFC 9110, Section 7.2 (Host and :authority), so the backend service responds with 400 Bad Request to the health check request issued by Envoy, as the following curl request reproduces:
kubectl port-forward deploy/backend  3000:3000

curl localhost:3000 -H 'Host: httproute/default/backend/rule/0' -v
*   Trying [::1]:3000...
* Connected to localhost (::1) port 3000
> GET / HTTP/1.1
> Host: httproute/default/backend/rule/0
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request: malformed Host header
< Content-Type: text/plain; charset=utf-8
< Connection: close
<
* Closing connection
400 Bad Request: malformed Host header
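The rejection above comes down to the `/` characters in the cluster name. The Go sketch below is a simplified, character-level approximation of the Host grammar (letters, digits, the unreserved and sub-delims punctuation, plus `%`, `:`, `[` and `]`). It is an assumption-laden illustration, not the backend's actual validator and not a full RFC 9110 parser; `plausibleHost` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedPunct lists the non-alphanumeric bytes this sketch accepts in a
// Host value: unreserved and sub-delims characters, plus the bytes needed
// for percent-encoding, ports and IP literals. Note that '/' is absent.
const allowedPunct = "-._~!$&'()*+,;=%:[]"

// plausibleHost reports whether every byte of h is in the accepted set.
// It checks characters only and does not validate the overall structure
// (for example, it does not verify that a port is numeric).
func plausibleHost(h string) bool {
	for i := 0; i < len(h); i++ {
		c := h[i]
		switch {
		case 'a' <= c && c <= 'z', 'A' <= c && c <= 'Z', '0' <= c && c <= '9':
			// letters and digits are always fine
		case strings.IndexByte(allowedPunct, c) >= 0:
			// accepted punctuation
		default:
			return false
		}
	}
	return true
}

func main() {
	for _, h := range []string{
		"httproute/default/backend/rule/0", // cluster name generated by EG
		"www.example.com",                  // hostname from the route in the xDS IR
	} {
		fmt.Println(h, plausibleHost(h))
	}
}
```

Under this check the cluster name is rejected and www.example.com is accepted, which matches the behaviour seen in the curl output.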
