Segmentation fault on EDS endpoint priority updates with active health checking #6395
Description
Title: Envoy occasionally crashes after updating endpoint priorities in EDS for clusters configured with active health checking
Description:
Our setup models zones as localities with different priorities. Hosts are assigned to localities based on the selected traffic configuration. For example, when zonal affinity is selected, the control plane places hosts in the same zone at the highest priority, and endpoints in other localities get a lower priority so that traffic can spill over as hosts in the same zone become unhealthy. If no zonal affinity is configured, all hosts are in the same locality with equal priority. We model localities as static groups with pre-assigned priorities and move hosts between them based on the desired configuration: same-zone: priority 0, other zones: priority 1, all zones: priority 2.
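A minimal sketch of the control-plane assignment described above, assuming a simplified data model. The types Endpoint, LocalityGroup and Affinity and the function buildAssignment are hypothetical stand-ins, not the real ClusterLoadAssignment protobuf API. The three groups always exist with fixed priorities, and only their host lists change, so switching modes is what "moves" a host from one priority to another.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for EDS concepts (not Envoy's protobufs).
struct Endpoint {
  std::string address;
  std::string zone;
};

struct LocalityGroup {
  std::string name;
  uint32_t priority;
  std::vector<std::string> hosts;
};

enum class Affinity { kZonal, kNone };

// Static groups with pre-assigned priorities; only the host lists change.
std::vector<LocalityGroup> buildAssignment(const std::vector<Endpoint>& endpoints,
                                           const std::string& local_zone,
                                           Affinity mode) {
  LocalityGroup same_zone{"same-zone", 0, {}};
  LocalityGroup other_zones{"other-zones", 1, {}};
  LocalityGroup all_zones{"all-zones", 2, {}};
  for (const Endpoint& ep : endpoints) {
    if (mode == Affinity::kNone) {
      all_zones.hosts.push_back(ep.address);    // No affinity: one shared group.
    } else if (ep.zone == local_zone) {
      same_zone.hosts.push_back(ep.address);    // Zonal affinity: local zone first.
    } else {
      other_zones.hosts.push_back(ep.address);  // Spill-over target.
    }
  }
  return {same_zone, other_zones, all_zones};
}
```

Switching from Affinity::kZonal to Affinity::kNone empties priorities 0 and 1 and fills priority 2, which is the kind of cross-priority move that the repro steps below exercise.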
Moving hosts across localities with different priorities works fine without active health checking. With active health checking, moving hosts to lower-priority localities does not seem to take effect: traffic still goes to the hosts that were previously in the higher-priority locality. I expect this is because they are still considered healthy after EDS removes them from the higher-priority locality, in line with Envoy's eventual-consistency assumptions for service discovery.
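To illustrate why healthy hosts lingering at priority 0 keep the traffic there, here is a sketch of the priority-load formula from Envoy's "Priority levels" documentation, assuming the default overprovisioning factor of 1.4 (written as 140 in integer math). The names PriorityHealth and priorityLoad are hypothetical, and degraded hosts and panic-threshold handling are not modeled.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hosts at one priority level and how many are currently considered healthy.
struct PriorityHealth {
  uint32_t total;
  uint32_t healthy;
};

// Returns the percentage of traffic sent to each priority level.
std::vector<uint32_t> priorityLoad(const std::vector<PriorityHealth>& levels) {
  constexpr uint32_t kOverprovisioningFactor = 140;  // Default 1.4, as a percentage.
  std::vector<uint32_t> health;
  uint32_t total_health = 0;
  for (const PriorityHealth& p : levels) {
    // health(P) = min(100, 1.4 * 100 * healthy / total); empty levels count as 0.
    uint32_t h = p.total == 0
                     ? 0
                     : std::min<uint32_t>(100, kOverprovisioningFactor * p.healthy / p.total);
    health.push_back(h);
    total_health += h;
  }
  const uint32_t normalized = std::min<uint32_t>(100, total_health);
  std::vector<uint32_t> load(levels.size(), 0);
  if (normalized == 0) {
    return load;  // No healthy hosts anywhere; Envoy's panic handling is not modeled.
  }
  uint32_t assigned = 0;
  for (std::size_t i = 0; i < health.size(); ++i) {
    // Each level takes its normalized health, capped by what is still unassigned.
    load[i] = std::min<uint32_t>(100 - assigned, health[i] * 100 / normalized);
    assigned += load[i];
  }
  return load;
}
```

With five healthy hosts at priority 0 and five at priority 1, priorityLoad returns {100, 0}: as long as the removed hosts are still counted as healthy members of priority 0, no traffic spills over, which is consistent with the behavior described above. With only two of five priority-0 hosts healthy, it returns {56, 44}.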
Adding drain_connections_on_host_removal to the cluster configuration should mitigate this, but once it is added I frequently get segmentation faults with the stack trace below. After Envoy restarts with the updated configuration, traffic goes to the lower-priority hosts as expected.
Even without drain_connections_on_host_removal I still occasionally see the issue, but it is more reproducible with connection draining configured.
Relevant Links:
I believe this is related to issue #6282; the stack trace looks similar.
Envoy Version: 1.9 release @ 37bfd8a
Repro steps:
- Configure EDS with two LocalityLbEndpoints representing different localities: locality-1 with some endpoints and priority 0, and locality-2 with no endpoints and priority 1.
- Ensure the cluster has active health checking configured and drain_connections_on_host_removal set to true.
- Move the endpoints from locality-1 to locality-2.
I usually see the problem right away with drain_connections_on_host_removal, but sometimes it triggers on the second or third iteration of moving hosts between localities.
Call Stack:
[2019-03-26 06:41:46.121][000062][debug][upstream] [external/envoy/source/common/config/grpc_mux_impl.cc:193] Received gRPC message for type.googleapis.com/envoy.api.v2.ClusterLoadAssignment at version e08a0993aed2ba7517bb47fe41d080ad
[2019-03-26 06:41:46.123][000062][debug][upstream] [external/envoy/source/common/upstream/eds.cc:158] EDS hosts or locality weights changed for cluster: cluster.clb-target.c1.gcp-us-central1 current hosts 5 priority 0
[2019-03-26 06:41:46.123][000062][critical][assert] [external/envoy/source/common/upstream/health_checker_base_impl.cc:126] assert failure: active_sessions_.end() != session_iter.
[2019-03-26 06:41:46.123][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:125] Caught Aborted, suspect faulting address 0x3e
[2019-03-26 06:41:46.124][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:94] Backtrace thr<0> obj</lib/x86_64-linux-gnu/libc.so.6> (If unsymbolized, use tools/stack_decode.py):
[2019-03-26 06:41:46.124][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:117] thr<0> #0 0x7f7b8cf4ae97 (unknown)
[2019-03-26 06:41:46.124][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:117] thr<0> #1 0x7f7b8cf4c800 (unknown)
[2019-03-26 06:41:46.124][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:104] thr<0> obj</opt/envoy/bin/envoy>
[2019-03-26 06:41:46.141][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #2 0xdea6d7 Envoy::Upstream::HealthCheckerImplBase::onClusterMemberUpdate()
[2019-03-26 06:41:46.156][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #3 0xde8c74 Envoy::Upstream::HealthCheckerImplBase::HealthCheckerImplBase()::{lambda()#1}::operator()()
[2019-03-26 06:41:46.172][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #4 0xdec2c3 std::_Function_handler<>::_M_invoke()
[2019-03-26 06:41:46.187][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #5 0xbc36e6 std::function<>::operator()()
[2019-03-26 06:41:46.203][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #6 0xbbcb38 _ZN5Envoy6Common15CallbackManagerIIjRKSt6vectorISt10shared_ptrINS_8Upstream4HostEESaIS6_EESA_EE12runCallbacksEjSA_SA_
[2019-03-26 06:41:46.218][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #7 0xbb9d54 Envoy::Upstream::PrioritySetImpl::runUpdateCallbacks()
[2019-03-26 06:41:46.233][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #8 0xd76685 Envoy::Upstream::PrioritySetImpl::getOrCreateHostSet()::{lambda()#1}::operator()()
[2019-03-26 06:41:46.249][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #9 0xd82cbb std::_Function_handler<>::_M_invoke()
[2019-03-26 06:41:46.249][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #10 0xbc36e6 std::function<>::operator()()
[2019-03-26 06:41:46.249][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #11 0xbbcb38 _ZN5Envoy6Common15CallbackManagerIIjRKSt6vectorISt10shared_ptrINS_8Upstream4HostEESaIS6_EESA_EE12runCallbacksEjSA_SA_
[2019-03-26 06:41:46.265][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #12 0xc04641 Envoy::Upstream::HostSetImpl::runUpdateCallbacks()
[2019-03-26 06:41:46.280][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #13 0xd757b1 Envoy::Upstream::HostSetImpl::updateHosts()
[2019-03-26 06:41:46.296][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #14 0xd7f467 Envoy::Upstream::PriorityStateManager::updateClusterPrioritySet()
[2019-03-26 06:41:46.311][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #15 0xdaa0c3 Envoy::Upstream::EdsClusterImpl::updateHostsPerLocality()
[2019-03-26 06:41:46.327][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #16 0xda9950 Envoy::Upstream::EdsClusterImpl::onConfigUpdate()
[2019-03-26 06:41:46.342][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #17 0xdafdbe Envoy::Config::GrpcMuxSubscriptionImpl<>::onConfigUpdate()
[2019-03-26 06:41:46.357][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #18 0xdb74a7 Envoy::Config::GrpcMuxImpl::onReceiveMessage()
[2019-03-26 06:41:46.373][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #19 0xdbe049 Envoy::Grpc::TypedAsyncStreamCallbacks<>::onReceiveMessageUntyped()
[2019-03-26 06:41:46.388][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #20 0xe0494e Envoy::Grpc::AsyncStreamImpl::onData()
[2019-03-26 06:41:46.404][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #21 0xe09992 Envoy::Http::AsyncStreamImpl::encodeData()
[2019-03-26 06:41:46.420][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #22 0xe175b1 Envoy::Router::Filter::onUpstreamData()
[2019-03-26 06:41:46.435][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #23 0xe19118 Envoy::Router::Filter::UpstreamRequest::decodeData()
[2019-03-26 06:41:46.451][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #24 0xd600d2 Envoy::Http::StreamDecoderWrapper::decodeData()
[2019-03-26 06:41:46.466][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #25 0xea0980 Envoy::Http::Http2::ConnectionImpl::onFrameReceived()
[2019-03-26 06:41:46.482][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #26 0xea29e6 Envoy::Http::Http2::ConnectionImpl::Http2Callbacks::Http2Callbacks()::{lambda()#6}::operator()()
[2019-03-26 06:41:46.497][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #27 0xea2a16 Envoy::Http::Http2::ConnectionImpl::Http2Callbacks::Http2Callbacks()::{lambda()#6}::_FUN()
[2019-03-26 06:41:46.512][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #28 0xeb5af6 nghttp2_session_on_data_received
[2019-03-26 06:41:46.528][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #29 0xeb9d81 nghttp2_session_mem_recv
[2019-03-26 06:41:46.543][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #30 0xe9fb5b Envoy::Http::Http2::ConnectionImpl::dispatch()
[2019-03-26 06:41:46.559][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #31 0xdf0a54 Envoy::Http::CodecClient::onData()
[2019-03-26 06:41:46.574][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #32 0xdf1477 Envoy::Http::CodecClient::CodecReadFilter::onData()
[2019-03-26 06:41:46.589][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #33 0xb29850 Envoy::Network::FilterManagerImpl::onContinueReading()
[2019-03-26 06:41:46.605][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #34 0xb29964 Envoy::Network::FilterManagerImpl::onRead()
[2019-03-26 06:41:46.620][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #35 0xb20814 Envoy::Network::ConnectionImpl::onRead()
[2019-03-26 06:41:46.636][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #36 0xb21ea1 Envoy::Network::ConnectionImpl::onReadReady()
[2019-03-26 06:41:46.651][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #37 0xb21c42 Envoy::Network::ConnectionImpl::onFileEvent()
[2019-03-26 06:41:46.667][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #38 0xb1e528 Envoy::Network::ConnectionImpl::ConnectionImpl()::{lambda()#3}::operator()()
[2019-03-26 06:41:46.682][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #39 0xb23be5 std::_Function_handler<>::_M_invoke()
[2019-03-26 06:41:46.697][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #40 0xb171d7 std::function<>::operator()()
[2019-03-26 06:41:46.713][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #41 0xb16edc Envoy::Event::FileEventImpl::assignEvents()::{lambda()#1}::operator()()
[2019-03-26 06:41:46.728][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #42 0xb16f5b Envoy::Event::FileEventImpl::assignEvents()::{lambda()#1}::_FUN()
[2019-03-26 06:41:46.744][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #43 0x11759b4 event_process_active_single_queue.isra.29
[2019-03-26 06:41:46.759][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #44 0x11760fe event_process_active
[2019-03-26 06:41:46.774][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #45 0x1178e87 event_base_loop
[2019-03-26 06:41:46.790][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #46 0xb1262b Envoy::Event::DispatcherImpl::run()
[2019-03-26 06:41:46.805][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #47 0xa968c6 Envoy::Server::InstanceImpl::run()
[2019-03-26 06:41:46.821][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #48 0x4d96d1 Envoy::MainCommonBase::run()
[2019-03-26 06:41:46.836][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #49 0x4d7b21 Envoy::MainCommon::run()
[2019-03-26 06:41:46.851][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #50 0x4d75ee main
[2019-03-26 06:41:46.851][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:104] thr<0> obj</lib/x86_64-linux-gnu/libc.so.6>
[2019-03-26 06:41:46.851][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:117] thr<0> #51 0x7f7b8cf2db96 (unknown)
[2019-03-26 06:41:46.851][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:104] thr<0> obj</opt/envoy/bin/envoy>
[2019-03-26 06:41:46.866][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<0> #52 0x41c0e8 _start
[2019-03-26 06:41:46.866][000062][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:121] end backtrace thread 0
/opt/envoy/bin/entrypoint.sh: line 68: 62 Aborted (core dumped) $ROOT/bin/envoy.sh
+ wait
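One way the assertion in the trace (active_sessions_.end() != session_iter in HealthCheckerImplBase::onClusterMemberUpdate) could be reached is sketched below as a toy model. This is not Envoy's code and not a confirmed root cause; it assumes (1) the same Host object is reused when a host changes priority, (2) the health checker keeps one session per host keyed by that object, and (3) membership callbacks run once per priority in ascending priority order. ToyHealthChecker and its methods are hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Host {
  std::string address;
};
using HostSharedPtr = std::shared_ptr<Host>;

// Toy health checker: one session per host, notified once per changed priority.
class ToyHealthChecker {
 public:
  // Returns false where an assertion like the one in the trace would fire.
  bool onMemberUpdate(const std::vector<HostSharedPtr>& added,
                      const std::vector<HostSharedPtr>& removed) {
    for (const HostSharedPtr& host : added) {
      sessions_[host] = Session{};  // Replaces any session the host already had.
    }
    for (const HostSharedPtr& host : removed) {
      auto it = sessions_.find(host);
      if (it == sessions_.end()) {
        return false;  // Removing a host that has no session.
      }
      sessions_.erase(it);
    }
    return true;
  }

  bool hasSession(const HostSharedPtr& host) const { return sessions_.count(host) > 0; }

 private:
  struct Session {};
  std::map<HostSharedPtr, Session> sessions_;
};
```

Under these assumptions, moving a host from priority 1 back to priority 0 delivers the add (priority 0) before the remove (priority 1), so the host ends up in the cluster with no session; the next move out of priority 0 then removes an untracked host during the priority 0 update. That is consistent with the crash showing up on a later iteration and during a priority 0 update, but it is only an illustration.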