Envoy crashes when gRPC healthcheck service returns SERVICE_UNKNOWN #10825

@kindermoumoute

Bug Template

Title: Envoy crashes when gRPC healthcheck service returns SERVICE_UNKNOWN

Description:
Any service that uses the gRPC health check with Envoy can crash Envoy by returning SERVICE_UNKNOWN from the Check method.

According to the spec, SERVICE_UNKNOWN should only be used in the Watch method. However, it is too dangerous for Envoy to crash because a single endpoint implemented the Check method incorrectly.
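
To illustrate the behavior being asked for, here is a minimal Python sketch of a health checker that treats every non-SERVING status as a failed check instead of aborting. It is not Envoy's code (Envoy is C++), and `is_healthy` is a hypothetical helper. The enum values mirror `ServingStatus` in the `grpc.health.v1` health proto.

```python
from enum import IntEnum


class ServingStatus(IntEnum):
    """Values of grpc.health.v1.HealthCheckResponse.ServingStatus."""
    UNKNOWN = 0
    SERVING = 1
    NOT_SERVING = 2
    SERVICE_UNKNOWN = 3  # meant for Watch, but a server may still send it from Check


def is_healthy(raw_status: int) -> bool:
    """Hypothetical helper: only SERVING counts as healthy.

    Every other value, including SERVICE_UNKNOWN and numbers this enum
    does not define, is reported as a failed check rather than raising.
    """
    try:
        status = ServingStatus(raw_status)
    except ValueError:
        # Unrecognized enum value from the peer: fail the check, do not crash.
        return False
    return status is ServingStatus.SERVING
```

With this shape, a misbehaving endpoint that answers Check with SERVICE_UNKNOWN is simply marked unhealthy, which is the outcome this report argues for.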

Repro steps:

Admin and Stats Output:

service::default_priority::max_connections::1024
service::default_priority::max_pending_requests::1024
service::default_priority::max_requests::1024
service::default_priority::max_retries::3
service::high_priority::max_connections::1024
service::high_priority::max_pending_requests::1024
service::high_priority::max_requests::1024
service::high_priority::max_retries::3
service::added_via_api::false
service::172.27.0.3:3000::cx_active::0
service::172.27.0.3:3000::cx_connect_fail::0
service::172.27.0.3:3000::cx_total::0
service::172.27.0.3:3000::rq_active::0
service::172.27.0.3:3000::rq_error::0
service::172.27.0.3:3000::rq_success::0
service::172.27.0.3:3000::rq_timeout::0
service::172.27.0.3:3000::rq_total::0
service::172.27.0.3:3000::hostname::service1
service::172.27.0.3:3000::health_flags::/failed_active_hc
service::172.27.0.3:3000::weight::1
service::172.27.0.3:3000::region::
service::172.27.0.3:3000::zone::
service::172.27.0.3:3000::sub_zone::
service::172.27.0.3:3000::canary::false
service::172.27.0.3:3000::priority::0
service::172.27.0.3:3000::success_rate::-1.0
service::172.27.0.3:3000::local_origin_success_rate::-1.0

Config:

admin:
  access_log_path: /dev/null
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 8000
node:
  cluster: playground
  id: localhost

static_resources:
  clusters:
  - name: service
    http2_protocol_options: {}
    type: STRICT_DNS
    connect_timeout: 30s
    health_checks:
    - healthy_threshold: 1
      grpc_health_check:
        service_name: service
      interval: 1s
      no_traffic_interval: 10s
      timeout: 10s
      unhealthy_threshold: 1
    dns_refresh_rate: "7200s"
    common_lb_config:
      healthy_panic_threshold:
        value: 33
    load_assignment:
      cluster_name: service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: service1
                port_value: 3000

  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 80
    filter_chains:
    - filters:
      - config:
          access_log:
            config:
              path: /dev/stdout
            name: envoy.file_access_log
          codec_type: AUTO
          http_filters:
          - config: {}
            name: envoy.router
          common_http_protocol_options:
            idle_timeout: 30s
          route_config:
            name: playground
            virtual_hosts:
            - domains:
              - '*'
              name: playground
              routes:
              - match:
                  prefix: /
                route:
                  cluster: service
          server_name: playground
          stat_prefix: playground
        name: envoy.http_connection_manager
    name: playground

Logs and Call Stack:

envoy_1     | [2020-04-17 07:35:57.809][1][debug][main] [source/server/server.cc:177] flushing stats
envoy_1     | [2020-04-17 07:35:57.810][1][debug][client] [source/common/http/codec_client.cc:34] [C1] connecting
envoy_1     | [2020-04-17 07:35:57.810][1][debug][connection] [source/common/network/connection_impl.cc:727] [C1] connecting to 172.27.0.2:3000
envoy_1     | [2020-04-17 07:35:57.810][1][debug][connection] [source/common/network/connection_impl.cc:736] [C1] connection in progress
envoy_1     | [2020-04-17 07:35:57.811][1][debug][http2] [source/common/http/http2/codec_impl.cc:970] [C1] updating connection-level initial window size to 268435456
envoy_1     | [2020-04-17 07:35:57.812][1][debug][connection] [source/common/network/connection_impl.cc:592] [C1] connected
envoy_1     | [2020-04-17 07:35:57.812][1][debug][client] [source/common/http/codec_client.cc:72] [C1] connected
envoy_1     | [2020-04-17 07:35:57.817][1][debug][client] [source/common/http/codec_client.cc:104] [C1] response complete
envoy_1     | [2020-04-17 07:35:57.817][1][critical][assert] [source/common/upstream/health_checker_impl.cc:798] panic: not reached
envoy_1     | [2020-04-17 07:35:57.817][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:104] Caught Aborted, suspect faulting address 0x1
envoy_1     | [2020-04-17 07:35:57.817][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
envoy_1     | [2020-04-17 07:35:57.817][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:92] Envoy version: 3504d40f752eb5c20bc2883053547717bcb92fd8/1.14.1/Clean/RELEASE/BoringSSL
envoy_1     | [2020-04-17 07:35:57.818][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:98] #0: [0x7fc0b991a3d0]
envoy_1     | [2020-04-17 07:35:57.830][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #1: Envoy::Upstream::GrpcHealthCheckerImpl::GrpcActiveHealthCheckSession::onRpcComplete() [0x5585b28683d8]
envoy_1     | [2020-04-17 07:35:57.841][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #2: Envoy::Upstream::GrpcHealthCheckerImpl::GrpcActiveHealthCheckSession::decodeTrailers() [0x5585b2868b85]
envoy_1     | [2020-04-17 07:35:57.850][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #3: Envoy::Http::ResponseDecoderWrapper::decodeTrailers() [0x5585b2801500]
envoy_1     | [2020-04-17 07:35:57.859][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #4: Envoy::Http::Http2::ConnectionImpl::onFrameReceived() [0x5585b29158a8]
envoy_1     | [2020-04-17 07:35:57.867][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #5: nghttp2_session_mem_recv [0x5585b2581de0]
envoy_1     | [2020-04-17 07:35:57.876][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #6: Envoy::Http::Http2::ConnectionImpl::dispatch() [0x5585b2914eca]
envoy_1     | [2020-04-17 07:35:57.884][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #7: Envoy::Http::CodecClient::onData() [0x5585b2871fd8]
envoy_1     | [2020-04-17 07:35:57.893][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #8: Envoy::Http::CodecClient::CodecReadFilter::onData() [0x5585b2872d2d]
envoy_1     | [2020-04-17 07:35:57.902][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #9: Envoy::Network::FilterManagerImpl::onContinueReading() [0x5585b269b603]
envoy_1     | [2020-04-17 07:35:57.910][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #10: Envoy::Network::ConnectionImpl::onReadReady() [0x5585b2697745]
envoy_1     | [2020-04-17 07:35:57.919][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #11: Envoy::Network::ConnectionImpl::onFileEvent() [0x5585b2696a9d]
envoy_1     | [2020-04-17 07:35:57.927][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #12: Envoy::Event::FileEventImpl::assignEvents()::$_0::__invoke() [0x5585b2691aa6]
envoy_1     | [2020-04-17 07:35:57.936][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #13: event_process_active_single_queue [0x5585b2ad0ecb]
envoy_1     | [2020-04-17 07:35:57.945][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #14: event_base_loop [0x5585b2acf75e]
envoy_1     | [2020-04-17 07:35:57.953][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #15: Envoy::Server::InstanceImpl::run() [0x5585b2621d89]
envoy_1     | [2020-04-17 07:35:57.962][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #16: Envoy::MainCommonBase::run() [0x5585b1b474f8]
envoy_1     | [2020-04-17 07:35:57.970][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #17: main [0x5585b1b46132]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #18: __libc_start_main [0x7fc0b9767c8d]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:104] Caught Segmentation fault, suspect faulting address 0x0
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:92] Envoy version: 3504d40f752eb5c20bc2883053547717bcb92fd8/1.14.1/Clean/RELEASE/BoringSSL
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:98] #0: [0x7fc0b991a3d0]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #1: Envoy::Upstream::GrpcHealthCheckerImpl::GrpcActiveHealthCheckSession::onRpcComplete() [0x5585b28683d8]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #2: Envoy::Upstream::GrpcHealthCheckerImpl::GrpcActiveHealthCheckSession::decodeTrailers() [0x5585b2868b85]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #3: Envoy::Http::ResponseDecoderWrapper::decodeTrailers() [0x5585b2801500]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #4: Envoy::Http::Http2::ConnectionImpl::onFrameReceived() [0x5585b29158a8]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #5: nghttp2_session_mem_recv [0x5585b2581de0]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #6: Envoy::Http::Http2::ConnectionImpl::dispatch() [0x5585b2914eca]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #7: Envoy::Http::CodecClient::onData() [0x5585b2871fd8]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #8: Envoy::Http::CodecClient::CodecReadFilter::onData() [0x5585b2872d2d]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #9: Envoy::Network::FilterManagerImpl::onContinueReading() [0x5585b269b603]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #10: Envoy::Network::ConnectionImpl::onReadReady() [0x5585b2697745]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #11: Envoy::Network::ConnectionImpl::onFileEvent() [0x5585b2696a9d]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #12: Envoy::Event::FileEventImpl::assignEvents()::$_0::__invoke() [0x5585b2691aa6]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #13: event_process_active_single_queue [0x5585b2ad0ecb]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #14: event_base_loop [0x5585b2acf75e]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #15: Envoy::Server::InstanceImpl::run() [0x5585b2621d89]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #16: Envoy::MainCommonBase::run() [0x5585b1b474f8]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #17: main [0x5585b1b46132]
envoy_1     | [2020-04-17 07:35:57.971][1][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #18: __libc_start_main [0x7fc0b9767c8d]
