Draining a listener before RunHelper is invoked (and crashing) #7431

@asraa

Description
I'm developing an xDS fuzzer that tests how Envoy handles sequences of dynamic xDS updates. It found a crash when I added a listener and then removed it before the listener manager had started workers on the added listeners. Removing and draining a listener hits a fatal assert that the worker thread exists, so removing the listener at this point crashes Envoy.

Reproducing
The sequence of xDS updates was:

  1. Respond to a CDS and EDS request, build cluster_0.
  2. Send RouteConfiguration DiscoveryResponse, build route_config_0.
  3. Send Listener DiscoveryResponse, add listener_0 referencing route_config_0 above.
  4. Send Listener DiscoveryResponse, remove listener_0.
    ---> crash at ASSERT(thread_)

Questions

  1. Is Envoy required to receive the RouteConfiguration DiscoveryResponse that a listener references strictly after the Listener DiscoveryResponse? If so, maybe instead of ASSERT(thread_) in drainListener we should assert something like listener_success && thread_? Or if not....

Details
Swapping steps 2 and 3 would have produced a successful run, because after the request to add listener_0 we would have responded with an update to the route config it references. At the server init manager level, a successful run looks like this:

  1. Target LDS is added to the uninitialized server (+1 target, total = 1).
  2. The RunHelper watcher is added to the init manager.
  3. LDS starts to add listener_0.
  4. The RdsRouteConfigSubscription target is added (+1 target, total = 2).
  5. LDS successfully adds the listener.
  6. Target LDS is ready (-1 target, total = 1).
  7. RDS is ready (-1 target, total = 0).

The server is now initialized, so RunHelper is invoked and the worker threads start.
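
To make the target counting in the successful run concrete, here is a minimal toy model. It is a sketch only: ToyInitManager, RunResult and replaySuccessfulRun are hypothetical names invented for illustration and are not Envoy's init manager API. The watcher lambda stands in for the RunHelper callback that starts the workers.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for the server init manager (hypothetical, not Envoy's class).
// It counts pending targets and calls the watcher when the count reaches zero.
class ToyInitManager {
public:
  explicit ToyInitManager(std::function<void()> watcher) : watcher_(std::move(watcher)) {}

  void addTarget() { ++pending_; }

  void targetReady() {
    assert(pending_ > 0);
    if (--pending_ == 0) {
      watcher_(); // stands in for RunHelper invoking startWorkers()
    }
  }

  int pending() const { return pending_; }

private:
  std::function<void()> watcher_;
  int pending_ = 0;
};

struct RunResult {
  std::vector<int> counts;      // pending target count after each step
  bool workers_started = false; // set by the watcher
};

// Replays the successful ordering described above.
RunResult replaySuccessfulRun() {
  RunResult result;
  ToyInitManager init([&result] { result.workers_started = true; });

  init.addTarget();   // target LDS
  result.counts.push_back(init.pending());
  init.addTarget();   // target RdsRouteConfigSubscription
  result.counts.push_back(init.pending());
  init.targetReady(); // LDS ready
  result.counts.push_back(init.pending());
  init.targetReady(); // RDS ready, count hits zero, watcher fires
  result.counts.push_back(init.pending());
  return result;
}
```

In this model the pending count goes 1, 2, 1, 0 and the watcher fires only on the last step, which matches the description of the successful run.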

In the failing case, by contrast, the RunHelper callback (which contains the startWorkers() call) is never invoked. Target LDS never becomes ready. The server init manager then contains both target LDS and target RdsRouteConfigSubscription, but neither is ready, so the server never initializes. Stopping the listener at this point hits the fatal assert and crashes.
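
The stuck state can be sketched the same way. This is a toy model under stated assumptions, not Envoy code: ToyServer and both replay functions are hypothetical, and canDrainListener() only mirrors the precondition that the fatal assert checks (a worker thread must exist). It is not a proposed fix.

```cpp
#include <cassert>

// Toy model of server initialization (hypothetical, not Envoy's classes).
struct ToyServer {
  int pending_targets = 0;      // init manager target count
  bool workers_started = false; // true only after the count returns to zero

  void addTarget() { ++pending_targets; }

  void targetReady() {
    assert(pending_targets > 0);
    if (--pending_targets == 0) {
      workers_started = true; // RunHelper would start the workers here
    }
  }

  // Mirrors the condition checked by the assert: draining needs a worker thread.
  bool canDrainListener() const { return workers_started; }
};

// Failing order: listener_0 is removed while both targets are still pending.
bool failingOrderCanDrain() {
  ToyServer server;
  server.addTarget(); // target LDS, count = 1
  server.addTarget(); // target RdsRouteConfigSubscription, count = 2
  // Neither target becomes ready before the remove arrives.
  return server.canDrainListener();
}

// Successful order: both targets become ready before the remove arrives.
bool successfulOrderCanDrain() {
  ToyServer server;
  server.addTarget();
  server.addTarget();
  server.targetReady(); // LDS ready, count = 1
  server.targetReady(); // RDS ready, count = 0, workers start
  return server.canDrainListener();
}
```

In the failing order the drain precondition is false when the remove arrives, which corresponds to the point where the real assert aborts.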

Reproducing
I can reproduce this with a test case in ads_integration_test:
https://github.com/asraa/envoy/blob/xdsfuzzdrain/test/integration/ads_integration_test.cc

Stack trace

[2019-06-28 20:02:01.965][397][critical][assert] [source/server/worker_impl.cc:90] assert failure: thread_.
[2019-06-28 20:02:01.965][397][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:81] Caught Aborted, suspect faulting address 0x6fad50000000e
[2019-06-28 20:02:01.965][397][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:69] Backtrace (use tools/stack_decode.py to get line numbers):
[2019-06-28 20:02:01.986][397][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #0: Envoy::SignalAction::sigHandler() [0x55e6bf23e26e]
[2019-06-28 20:02:01.986][397][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #1: __restore_rt [0x7f27e53f63a0]
[2019-06-28 20:02:02.006][397][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #2: Envoy::Server::ListenerManagerImpl::drainListener() [0x55e6bdd8ccd3]
...
