Skip to content

Data race in callback manager / ClusterInfoImpl destruction #13209

@incfly

Description

@incfly

Title: Data race in callback managers' usage

Description:

When investigating an Envoy segfault issue, I think, the root cause is a data race for managing dynamic clusters from different threads without protection.

Basic Reproducing Steps

Reproducing requires to have a setup, where an envoy instance(ingressgateway in original case) has large number of dynamic clusters, and those clusters are using TLS socket. Those TLS socket should have the same SdsConfig as key source.

Then we constantly send the traffic to this envoy instance. Meanwhile, we constantly add & remove the dynamic clusters using some script automation.

Keep running setup as above for couple of hours, we can reliably get segfault with stacktrace as below.

Analysis

In one of the ASAN's run, when segfault, we see error report as below:

ERROR: AddressSanitizer: heap-use-after-free on address 0x6060033b07e0 at pc 0x5615ffe5b711 bp 0x7fbbb09c6480 sp 0x7fbbb09c6478
READ of size 8 at 0x6060033b07e0 thread T10
    #0 0x5615ffe5b710  (/usr/local/bin/envoy+0x54993710)
    #1 0x5615ffe5a8ae  (/usr/local/bin/envoy+0x549928ae)

0x6060033b07e0 is located 0 bytes inside of 64-byte region [0x6060033b07e0,0x6060033b0820)
freed by thread T9 here:
    #0 0x5615f4bbfccd  (/usr/local/bin/envoy+0x496f7ccd)
    #1 0x5615ffe55b44  (/usr/local/bin/envoy+0x5498db44)

previously allocated by thread T0 here:
    #0 0x5615f4bbff4d  (/usr/local/bin/envoy+0x496f7f4d)
    #1 0x7fbbba7a5297  (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x93297)
    #2 0x5615ffe58c65  (/usr/local/bin/envoy+0x54990c65)

Thread T10 created by T0 here:
    #0 0x5615f4baa6da  (/usr/local/bin/envoy+0x496e26da)
    #1 0x5616031b6462  (/usr/local/bin/envoy+0x57cee462)

The thread ID is different. From the stacktrace [1], we see the segfault in CallbackManager::remove, link. And at the bottom of the trace, it's a ClusterEntry destructor, which would argubably owned by a ThreadLocalClusterManagerImpl.

At this point, I think it's obivous we have a race across multiple thread local cluster manager

  1. One cluster manager's call chains is trying to remove the call backs, doing iteration using std::list::remove_if, and erasing.
  2. Another cluster manager is also doing same call chains, but removing another element in the linked list.
  3. When one thread is removing element X, and another one is happen to read that one's iterator, and trying to do a dereference, in a remove_if's predicate. Segfault happens.

This also explains why when ASAN build is used, the triggering rate is much lower. taking 2 - 3X times to occur. This argubaly is because ASAN is slower therefore the the likelihood of triggering race is lower.

Code Path

I spent a bit time understanding the code path.

ClusterEntry owns a TransportSocketMatchesImpl, which owns some ClientSslSocketFactory, which owns owns some ClientContextConfigImpl, subclassed of ContextConfigImpl.

The ContextConfigImpl when created, is adding some callbacks to the corresponding SdsApi object. And the Callback_manager is owned by SdsApi, which is key/indexed on some xDS config registries. Therefore, when different clusters are using the same Sds config snippet, they would add and remove callbacks against the same SdsApi's callback manager. That's why we have access across different threads, I think.

Misc

Notably, some recent optimization solve/mitigates the issue, by removing the iterator precisely owned by callbackHolder, https://github.com/envoyproxy/envoy/pull/11751/files.

But I know seems this can still leave some potential error space if callback manager is not a thread safe lib, and used concurrently. @lambdai think even current changes is not bullet proof, because the erase from different threads is still not safe.

Is such pattern expected or not in Envoy in general?

Just a discussion thread, feel free to close.

stacktrace[1]

2020-09-03T21:51:33.093882Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:104] Caught Segmentation fault, suspect faulting address 0x9b
2020-09-03T21:51:33.093924Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
2020-09-03T21:51:33.093936Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:92] Envoy version: 80ad06b26b3f97606143871e16268eb036ca7dcd/1.14.3-dev/Clean/DEBUG/BoringSSL
2020-09-03T21:51:33.165802Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #0: Envoy::SignalAction::sigHandler() [0x56508bf1199c]
2020-09-03T21:51:33.166219Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #1: __restore_rt [0x7fac99fca890]
2020-09-03T21:51:33.240003Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #2: std::__1::list<>::remove_if<>() [0x56508b378c0f]
2020-09-03T21:51:33.320640Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #3: Envoy::Common::CallbackManager<>::remove() [0x56508b378893]
2020-09-03T21:51:33.390044Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #4: Envoy::Common::CallbackManager<>::CallbackHolder::remove() [0x56508b3787ac]
2020-09-03T21:51:33.467216Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #5: Envoy::Extensions::TransportSockets::Tls::ContextConfigImpl::~ContextConfigImpl() [0x56508892a2da]
2020-09-03T21:51:33.552001Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #6: Envoy::Extensions::TransportSockets::Tls::ClientContextConfigImpl::~ClientContextConfigImpl() [0x5650889405ce]
2020-09-03T21:51:33.625260Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #7: Envoy::Extensions::TransportSockets::Tls::ClientContextConfigImpl::~ClientContextConfigImpl() [0x565088937283]
2020-09-03T21:51:33.709967Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #8: Envoy::Extensions::TransportSockets::Tls::ClientContextConfigImpl::~ClientContextConfigImpl() [0x5650889372bc]
2020-09-03T21:51:33.773332Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #9: std::__1::default_delete<>::operator()() [0x5650889024ef]
2020-09-03T21:51:33.838373Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #10: std::__1::unique_ptr<>::reset() [0x56508890246f]
2020-09-03T21:51:33.902990Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #11: std::__1::unique_ptr<>::~unique_ptr() [0x5650889021d9]
2020-09-03T21:51:33.973127Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #12: Envoy::Extensions::TransportSockets::Tls::ClientSslSocketFactory::~ClientSslSocketFactory() [0x56508891d427]
2020-09-03T21:51:34.035886Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #13: Envoy::Extensions::TransportSockets::Tls::ClientSslSocketFactory::~ClientSslSocketFactory() [0x56508891d46c]
2020-09-03T21:51:34.098260Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #14: std::__1::default_delete<>::operator()() [0x56508afb5a6f]
2020-09-03T21:51:34.161593Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #15: std::__1::unique_ptr<>::reset() [0x56508afb59ef]
2020-09-03T21:51:34.233574Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #16: std::__1::unique_ptr<>::~unique_ptr() [0x56508afb2689]
2020-09-03T21:51:34.302754Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #17: Envoy::Upstream::TransportSocketMatcherImpl::FactoryMatch::~FactoryMatch() [0x56508b7c1788]
2020-09-03T21:51:34.367747Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #18: std::__1::allocator<>::destroy() [0x56508b7c2029]
2020-09-03T21:51:34.431605Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #19: std::__1::allocator_traits<>::__destroy<>() [0x56508b7c1ffd]
2020-09-03T21:51:34.495537Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #20: std::__1::allocator_traits<>::destroy<>() [0x56508b7c1fcd]
2020-09-03T21:51:34.570006Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #21: std::__1::__vector_base<>::__destruct_at_end() [0x56508b7c1f7b]
2020-09-03T21:51:34.633817Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #22: std::__1::__vector_base<>::clear() [0x56508b7c1eab]
2020-09-03T21:51:34.695359Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #23: std::__1::__vector_base<>::~__vector_base() [0x56508b7c1d27]
2020-09-03T21:51:34.758774Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #24: std::__1::vector<>::~vector() [0x56508b7c17ed]
2020-09-03T21:51:34.823329Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #25: Envoy::Upstream::TransportSocketMatcherImpl::~TransportSocketMatcherImpl() [0x56508b7c1965]
2020-09-03T21:51:34.885652Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #26: Envoy::Upstream::TransportSocketMatcherImpl::~TransportSocketMatcherImpl() [0x56508b7c19ac]
2020-09-03T21:51:34.947873Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #27: std::__1::default_delete<>::operator()() [0x56508b72d55f]
2020-09-03T21:51:35.013069Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #28: std::__1::unique_ptr<>::reset() [0x56508b72d51f]
2020-09-03T21:51:35.080359Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #29: std::__1::unique_ptr<>::~unique_ptr() [0x56508b70fcf9]
2020-09-03T21:51:35.142802Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #30: Envoy::Upstream::ClusterInfoImpl::~ClusterInfoImpl() [0x56508b71270f]
2020-09-03T21:51:35.206567Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #31: Envoy::Upstream::ClusterInfoImpl::~ClusterInfoImpl() [0x56508b71275c]
2020-09-03T21:51:35.272197Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #32: std::__1::default_delete<>::operator()() [0x56508b73055f]
2020-09-03T21:51:35.337501Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #33: std::__1::__shared_ptr_pointer<>::__on_zero_shared() [0x56508b730999]
2020-09-03T21:51:35.399892Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #34: std::__1::__shared_weak_count::__release_shared() [0x56508ce228de]
2020-09-03T21:51:35.465251Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #35: Envoy::Upstream::HostDescriptionImpl::~HostDescriptionImpl() [0x565088ab163a]
2020-09-03T21:51:35.531791Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #36: Envoy::Upstream::HostImpl::~HostImpl() [0x565088ab06ec]
2020-09-03T21:51:35.593671Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #37: Envoy::Upstream::HostImpl::~HostImpl() [0x565088acc6b3]
2020-09-03T21:51:35.655156Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #38: Envoy::Upstream::HostImpl::~HostImpl() [0x565088acc6ec]
2020-09-03T21:51:35.733583Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #39: std::__1::default_delete<>::operator()() [0x565089a0325f]
2020-09-03T21:51:35.822359Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #40: std::__1::__shared_ptr_pointer<>::__on_zero_shared() [0x565089a03029]
2020-09-03T21:51:35.822412Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #41: std::__1::__shared_weak_count::__release_shared() [0x56508ce228de]
2020-09-03T21:51:35.910752Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #42: std::__1::allocator<>::destroy() [0x565088a921c9]
2020-09-03T21:51:36.000379Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #43: std::__1::allocator_traits<>::__destroy<>() [0x565088a9219d]
2020-09-03T21:51:36.090423Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #44: std::__1::allocator_traits<>::destroy<>() [0x565088a9216d]
2020-09-03T21:51:36.170932Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #45: std::__1::__vector_base<>::__destruct_at_end() [0x565088a9211b]
2020-09-03T21:51:36.235720Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #46: std::__1::__vector_base<>::clear() [0x565088a9204b]
2020-09-03T21:51:36.302353Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #47: std::__1::__vector_base<>::~__vector_base() [0x565088a91ec7]
2020-09-03T21:51:36.368921Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #48: std::__1::vector<>::~vector() [0x565088a8afad]
2020-09-03T21:51:36.435696Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #49: std::__1::default_delete<>::operator()() [0x565088aaafab]
2020-09-03T21:51:36.500174Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #50: std::__1::__shared_ptr_pointer<>::__on_zero_shared() [0x565088ab40b9]
2020-09-03T21:51:36.500223Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #51: std::__1::__shared_weak_count::__release_shared() [0x56508ce228de]
2020-09-03T21:51:36.562835Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #52: Envoy::Upstream::HostSetImpl::~HostSetImpl() [0x565089a0f8df]
2020-09-03T21:51:36.625790Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #53: Envoy::Upstream::HostSetImpl::~HostSetImpl() [0x56508b7124ac]
2020-09-03T21:51:36.689788Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #54: std::__1::default_delete<>::operator()() [0x565088a8f07f]
2020-09-03T21:51:36.752537Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #55: std::__1::unique_ptr<>::reset() [0x565088a8efff]
2020-09-03T21:51:36.815776Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #56: std::__1::unique_ptr<>::~unique_ptr() [0x565088a8ef99]
2020-09-03T21:51:36.878107Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #57: std::__1::allocator<>::destroy() [0x565088a8ef79]
2020-09-03T21:51:36.940770Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #58: std::__1::allocator_traits<>::__destroy<>() [0x565088a8ef4d]
2020-09-03T21:51:37.004230Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #59: std::__1::allocator_traits<>::destroy<>() [0x565088a8ef1d]
2020-09-03T21:51:37.067947Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #60: std::__1::__vector_base<>::__destruct_at_end() [0x565088a8eecb]
2020-09-03T21:51:37.132594Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #61: std::__1::__vector_base<>::clear() [0x565088a8ee2b]
2020-09-03T21:51:37.196798Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #62: std::__1::__vector_base<>::~__vector_base() [0x565088a8eca7]
2020-09-03T21:51:37.259910Z	critical	envoy backtrace	[bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #63: std::__1::vector<>::~vector() [0x565088a8eb5d]
2020-09-03T21:51:37.349748Z	info	sds	resource:ROOTCA connection is terminated: rpc error: code = Canceled desc = context canceled
2020-09-03T21:51:37.349792Z	error	sds	Remote side closed connection

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions