-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Description
Title: Data race in callback managers' usage
Description:
When investigating an Envoy segfault issue, I think, the root cause is a data race for managing dynamic clusters from different threads without protection.
Basic Reproducing Steps
Reproducing requires to have a setup, where an envoy instance(ingressgateway in original case) has large number of dynamic clusters, and those clusters are using TLS socket. Those TLS socket should have the same SdsConfig as key source.
Then we constantly send the traffic to this envoy instance. Meanwhile, we constantly add & remove the dynamic clusters using some script automation.
Keep running setup as above for couple of hours, we can reliably get segfault with stacktrace as below.
Analysis
In one of the ASAN's run, when segfault, we see error report as below:
ERROR: AddressSanitizer: heap-use-after-free on address 0x6060033b07e0 at pc 0x5615ffe5b711 bp 0x7fbbb09c6480 sp 0x7fbbb09c6478
READ of size 8 at 0x6060033b07e0 thread T10
#0 0x5615ffe5b710 (/usr/local/bin/envoy+0x54993710)
#1 0x5615ffe5a8ae (/usr/local/bin/envoy+0x549928ae)
0x6060033b07e0 is located 0 bytes inside of 64-byte region [0x6060033b07e0,0x6060033b0820)
freed by thread T9 here:
#0 0x5615f4bbfccd (/usr/local/bin/envoy+0x496f7ccd)
#1 0x5615ffe55b44 (/usr/local/bin/envoy+0x5498db44)
previously allocated by thread T0 here:
#0 0x5615f4bbff4d (/usr/local/bin/envoy+0x496f7f4d)
#1 0x7fbbba7a5297 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x93297)
#2 0x5615ffe58c65 (/usr/local/bin/envoy+0x54990c65)
Thread T10 created by T0 here:
#0 0x5615f4baa6da (/usr/local/bin/envoy+0x496e26da)
#1 0x5616031b6462 (/usr/local/bin/envoy+0x57cee462)
The thread ID is different. From the stacktrace [1], we see the segfault in CallbackManager::remove, link. And at the bottom of the trace, it's a ClusterEntry destructor, which would argubably owned by a ThreadLocalClusterManagerImpl.
At this point, I think it's obivous we have a race across multiple thread local cluster manager
- One cluster manager's call chains is trying to remove the call backs, doing iteration using std::list::remove_if, and erasing.
- Another cluster manager is also doing same call chains, but removing another element in the linked list.
- When one thread is removing element X, and another one is happen to read that one's iterator, and trying to do a dereference, in a remove_if's predicate. Segfault happens.
This also explains why when ASAN build is used, the triggering rate is much lower. taking 2 - 3X times to occur. This argubaly is because ASAN is slower therefore the the likelihood of triggering race is lower.
Code Path
I spent a bit time understanding the code path.
ClusterEntry owns a TransportSocketMatchesImpl, which owns some ClientSslSocketFactory, which owns owns some ClientContextConfigImpl, subclassed of ContextConfigImpl.
The ContextConfigImpl when created, is adding some callbacks to the corresponding SdsApi object. And the Callback_manager is owned by SdsApi, which is key/indexed on some xDS config registries. Therefore, when different clusters are using the same Sds config snippet, they would add and remove callbacks against the same SdsApi's callback manager. That's why we have access across different threads, I think.
Misc
Notably, some recent optimization solve/mitigates the issue, by removing the iterator precisely owned by callbackHolder, https://github.com/envoyproxy/envoy/pull/11751/files.
But I know seems this can still leave some potential error space if callback manager is not a thread safe lib, and used concurrently. @lambdai think even current changes is not bullet proof, because the erase from different threads is still not safe.
Is such pattern expected or not in Envoy in general?
Just a discussion thread, feel free to close.
stacktrace[1]
2020-09-03T21:51:33.093882Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:104] Caught Segmentation fault, suspect faulting address 0x9b
2020-09-03T21:51:33.093924Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
2020-09-03T21:51:33.093936Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:92] Envoy version: 80ad06b26b3f97606143871e16268eb036ca7dcd/1.14.3-dev/Clean/DEBUG/BoringSSL
2020-09-03T21:51:33.165802Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #0: Envoy::SignalAction::sigHandler() [0x56508bf1199c]
2020-09-03T21:51:33.166219Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #1: __restore_rt [0x7fac99fca890]
2020-09-03T21:51:33.240003Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #2: std::__1::list<>::remove_if<>() [0x56508b378c0f]
2020-09-03T21:51:33.320640Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #3: Envoy::Common::CallbackManager<>::remove() [0x56508b378893]
2020-09-03T21:51:33.390044Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #4: Envoy::Common::CallbackManager<>::CallbackHolder::remove() [0x56508b3787ac]
2020-09-03T21:51:33.467216Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #5: Envoy::Extensions::TransportSockets::Tls::ContextConfigImpl::~ContextConfigImpl() [0x56508892a2da]
2020-09-03T21:51:33.552001Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #6: Envoy::Extensions::TransportSockets::Tls::ClientContextConfigImpl::~ClientContextConfigImpl() [0x5650889405ce]
2020-09-03T21:51:33.625260Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #7: Envoy::Extensions::TransportSockets::Tls::ClientContextConfigImpl::~ClientContextConfigImpl() [0x565088937283]
2020-09-03T21:51:33.709967Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #8: Envoy::Extensions::TransportSockets::Tls::ClientContextConfigImpl::~ClientContextConfigImpl() [0x5650889372bc]
2020-09-03T21:51:33.773332Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #9: std::__1::default_delete<>::operator()() [0x5650889024ef]
2020-09-03T21:51:33.838373Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #10: std::__1::unique_ptr<>::reset() [0x56508890246f]
2020-09-03T21:51:33.902990Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #11: std::__1::unique_ptr<>::~unique_ptr() [0x5650889021d9]
2020-09-03T21:51:33.973127Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #12: Envoy::Extensions::TransportSockets::Tls::ClientSslSocketFactory::~ClientSslSocketFactory() [0x56508891d427]
2020-09-03T21:51:34.035886Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #13: Envoy::Extensions::TransportSockets::Tls::ClientSslSocketFactory::~ClientSslSocketFactory() [0x56508891d46c]
2020-09-03T21:51:34.098260Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #14: std::__1::default_delete<>::operator()() [0x56508afb5a6f]
2020-09-03T21:51:34.161593Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #15: std::__1::unique_ptr<>::reset() [0x56508afb59ef]
2020-09-03T21:51:34.233574Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #16: std::__1::unique_ptr<>::~unique_ptr() [0x56508afb2689]
2020-09-03T21:51:34.302754Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #17: Envoy::Upstream::TransportSocketMatcherImpl::FactoryMatch::~FactoryMatch() [0x56508b7c1788]
2020-09-03T21:51:34.367747Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #18: std::__1::allocator<>::destroy() [0x56508b7c2029]
2020-09-03T21:51:34.431605Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #19: std::__1::allocator_traits<>::__destroy<>() [0x56508b7c1ffd]
2020-09-03T21:51:34.495537Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #20: std::__1::allocator_traits<>::destroy<>() [0x56508b7c1fcd]
2020-09-03T21:51:34.570006Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #21: std::__1::__vector_base<>::__destruct_at_end() [0x56508b7c1f7b]
2020-09-03T21:51:34.633817Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #22: std::__1::__vector_base<>::clear() [0x56508b7c1eab]
2020-09-03T21:51:34.695359Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #23: std::__1::__vector_base<>::~__vector_base() [0x56508b7c1d27]
2020-09-03T21:51:34.758774Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #24: std::__1::vector<>::~vector() [0x56508b7c17ed]
2020-09-03T21:51:34.823329Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #25: Envoy::Upstream::TransportSocketMatcherImpl::~TransportSocketMatcherImpl() [0x56508b7c1965]
2020-09-03T21:51:34.885652Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #26: Envoy::Upstream::TransportSocketMatcherImpl::~TransportSocketMatcherImpl() [0x56508b7c19ac]
2020-09-03T21:51:34.947873Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #27: std::__1::default_delete<>::operator()() [0x56508b72d55f]
2020-09-03T21:51:35.013069Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #28: std::__1::unique_ptr<>::reset() [0x56508b72d51f]
2020-09-03T21:51:35.080359Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #29: std::__1::unique_ptr<>::~unique_ptr() [0x56508b70fcf9]
2020-09-03T21:51:35.142802Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #30: Envoy::Upstream::ClusterInfoImpl::~ClusterInfoImpl() [0x56508b71270f]
2020-09-03T21:51:35.206567Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #31: Envoy::Upstream::ClusterInfoImpl::~ClusterInfoImpl() [0x56508b71275c]
2020-09-03T21:51:35.272197Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #32: std::__1::default_delete<>::operator()() [0x56508b73055f]
2020-09-03T21:51:35.337501Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #33: std::__1::__shared_ptr_pointer<>::__on_zero_shared() [0x56508b730999]
2020-09-03T21:51:35.399892Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #34: std::__1::__shared_weak_count::__release_shared() [0x56508ce228de]
2020-09-03T21:51:35.465251Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #35: Envoy::Upstream::HostDescriptionImpl::~HostDescriptionImpl() [0x565088ab163a]
2020-09-03T21:51:35.531791Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #36: Envoy::Upstream::HostImpl::~HostImpl() [0x565088ab06ec]
2020-09-03T21:51:35.593671Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #37: Envoy::Upstream::HostImpl::~HostImpl() [0x565088acc6b3]
2020-09-03T21:51:35.655156Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #38: Envoy::Upstream::HostImpl::~HostImpl() [0x565088acc6ec]
2020-09-03T21:51:35.733583Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #39: std::__1::default_delete<>::operator()() [0x565089a0325f]
2020-09-03T21:51:35.822359Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #40: std::__1::__shared_ptr_pointer<>::__on_zero_shared() [0x565089a03029]
2020-09-03T21:51:35.822412Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #41: std::__1::__shared_weak_count::__release_shared() [0x56508ce228de]
2020-09-03T21:51:35.910752Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #42: std::__1::allocator<>::destroy() [0x565088a921c9]
2020-09-03T21:51:36.000379Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #43: std::__1::allocator_traits<>::__destroy<>() [0x565088a9219d]
2020-09-03T21:51:36.090423Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #44: std::__1::allocator_traits<>::destroy<>() [0x565088a9216d]
2020-09-03T21:51:36.170932Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #45: std::__1::__vector_base<>::__destruct_at_end() [0x565088a9211b]
2020-09-03T21:51:36.235720Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #46: std::__1::__vector_base<>::clear() [0x565088a9204b]
2020-09-03T21:51:36.302353Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #47: std::__1::__vector_base<>::~__vector_base() [0x565088a91ec7]
2020-09-03T21:51:36.368921Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #48: std::__1::vector<>::~vector() [0x565088a8afad]
2020-09-03T21:51:36.435696Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #49: std::__1::default_delete<>::operator()() [0x565088aaafab]
2020-09-03T21:51:36.500174Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #50: std::__1::__shared_ptr_pointer<>::__on_zero_shared() [0x565088ab40b9]
2020-09-03T21:51:36.500223Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #51: std::__1::__shared_weak_count::__release_shared() [0x56508ce228de]
2020-09-03T21:51:36.562835Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #52: Envoy::Upstream::HostSetImpl::~HostSetImpl() [0x565089a0f8df]
2020-09-03T21:51:36.625790Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #53: Envoy::Upstream::HostSetImpl::~HostSetImpl() [0x56508b7124ac]
2020-09-03T21:51:36.689788Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #54: std::__1::default_delete<>::operator()() [0x565088a8f07f]
2020-09-03T21:51:36.752537Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #55: std::__1::unique_ptr<>::reset() [0x565088a8efff]
2020-09-03T21:51:36.815776Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #56: std::__1::unique_ptr<>::~unique_ptr() [0x565088a8ef99]
2020-09-03T21:51:36.878107Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #57: std::__1::allocator<>::destroy() [0x565088a8ef79]
2020-09-03T21:51:36.940770Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #58: std::__1::allocator_traits<>::__destroy<>() [0x565088a8ef4d]
2020-09-03T21:51:37.004230Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #59: std::__1::allocator_traits<>::destroy<>() [0x565088a8ef1d]
2020-09-03T21:51:37.067947Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #60: std::__1::__vector_base<>::__destruct_at_end() [0x565088a8eecb]
2020-09-03T21:51:37.132594Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #61: std::__1::__vector_base<>::clear() [0x565088a8ee2b]
2020-09-03T21:51:37.196798Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #62: std::__1::__vector_base<>::~__vector_base() [0x565088a8eca7]
2020-09-03T21:51:37.259910Z critical envoy backtrace [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #63: std::__1::vector<>::~vector() [0x565088a8eb5d]
2020-09-03T21:51:37.349748Z info sds resource:ROOTCA connection is terminated: rpc error: code = Canceled desc = context canceled
2020-09-03T21:51:37.349792Z error sds Remote side closed connection