Draining a listener before RunHelper is invoked (and crashing) #7431

**Description**
I'm developing an xDS fuzzer that tests how Envoy handles sequences of dynamic xDS updates. A crash came up when I added a listener and then tried to remove it before the listener manager had started workers on the added listeners. Removing and draining the listener hits a fatal assert that the worker thread exists, so this sequence crashes.

**Reproducing**
The sequence of xDS updates was:
1. Respond to a CDS and EDS request, build cluster_0.
2. Send RouteConfiguration DiscoveryResponse, build route_config_0.
3. Send Listener DiscoveryResponse, add listener_0 referencing route_config_0 above.
4. Send Listener DiscoveryResponse, remove listener_0.
    ---> crash at Assert

**Questions**
1) Is it *required* that Envoy receives the RouteConfiguration DiscoveryResponse that the listener references strictly *after* the Listener DiscoveryResponse? If so, maybe instead of ASSERT(thread_) in drainListener, we should assert something like listener_success && thread_? Or if not....

**Details**
Swapping steps 2 and 3 would have produced a *successful run*, because after the request to add listener_0 we would have responded with an update to the route config it references. At the server init manager level, a successful run proceeds as follows:
1. Target LDS is added to an uninitialized server (+1 server init manager target count, total count = 1).
2. The RunHelper watcher is added to the init manager.
3. LDS starts to add listener_0.
4. The RdsRouteConfigSubscription target is added (+1 target count, total count = 2).
5. LDS successfully adds the listener, so target LDS is ready (-1 on count, total count = 1).
6. RDS is ready (-1 on count, total count = 0).

The server is now initialized, so RunHelper is invoked and the worker threads start.
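
To make the target counting above concrete, here is a minimal sketch of it. `ToyInitManager` and `replaySuccessfulRun` are hypothetical names invented for illustration. This is not Envoy's init manager API, only a model of the counting described in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Toy stand-in for the server init manager (illustrative only, NOT Envoy's API).
class ToyInitManager {
public:
  // Register a target that must become ready before initialization completes.
  void addTarget(const std::string& name) { pending_.insert(name); }

  // Register the callback to run once every target is ready.
  // This plays the role of the RunHelper watcher.
  void setWatcher(std::function<void()> cb) { watcher_ = std::move(cb); }

  // Mark a target ready; fire the watcher once when no targets remain pending.
  void ready(const std::string& name) {
    pending_.erase(name);
    if (pending_.empty() && watcher_ && !fired_) {
      fired_ = true;
      watcher_();
    }
  }

  std::size_t pendingCount() const { return pending_.size(); }

private:
  std::set<std::string> pending_;
  std::function<void()> watcher_;
  bool fired_ = false;
};

// Replays the successful ordering and reports whether the workers
// would have been started.
inline bool replaySuccessfulRun() {
  bool workers_started = false;
  ToyInitManager init;
  init.addTarget("LDS");                             // total count = 1
  init.setWatcher([&] { workers_started = true; });  // RunHelper watcher
  init.addTarget("RdsRouteConfigSubscription");      // total count = 2
  init.ready("LDS");                                 // total count = 1
  assert(!workers_started);                          // still waiting on RDS
  init.ready("RdsRouteConfigSubscription");          // total count = 0, watcher fires
  return workers_started;
}
```

Calling `replaySuccessfulRun()` returns `true` in this model: the watcher only fires after the second target reports ready, which matches the count reaching zero in the description.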

In the crashing case, by contrast, target LDS never becomes ready, so the RunHelper callback (which contains the startWorkers() call) is never invoked. The server init manager holds both target LDS and target RdsRouteConfigSubscription, but neither is ready, so the server never initializes. Stopping the listener at this point hits the fatal assert and crashes.
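
The stalled state can be sketched the same way. `ToyServer`, `canDrainListener`, and `crashingRunCanDrain` are hypothetical names, and the model assumes that draining is only safe once the workers have started, which is how I read the `thread_` assert. It is not Envoy code.

```cpp
#include <cassert>

// Hypothetical model of the state described in this issue (illustrative only).
struct ToyServer {
  int pending_targets = 0;       // server init manager target count
  bool workers_started = false;  // true once the RunHelper callback has run

  void addTarget() { ++pending_targets; }

  // A target reports ready; when the count reaches zero the RunHelper
  // callback runs and the workers start.
  void targetReady() {
    if (pending_targets > 0 && --pending_targets == 0) {
      workers_started = true;
    }
  }

  // Models the precondition behind the thread_ assert: draining a listener
  // needs a running worker thread.
  bool canDrainListener() const { return workers_started; }
};

// Replays the crashing ordering: both targets are registered, neither
// becomes ready, and then the listener is removed.
inline bool crashingRunCanDrain() {
  ToyServer server;
  server.addTarget();  // target LDS
  server.addTarget();  // target RdsRouteConfigSubscription
  // No targetReady() calls: neither target is ever reported ready.
  return server.canDrainListener();
}
```

In this model `crashingRunCanDrain()` returns `false`, which corresponds to the point where the real server trips the assert when listener_0 is removed.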

**Reproducing**
I can reproduce this with a test case in ads_integration_test:
https://github.com/asraa/envoy/blob/xdsfuzzdrain/test/integration/ads_integration_test.cc

**Stack trace**
> [2019-06-28 20:02:01.965][397][critical][assert] [source/server/worker_impl.cc:90] **assert failure: thread_.**
> [2019-06-28 20:02:01.965][397][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:81] Caught Aborted, suspect faulting address 0x6fad50000000e
> [2019-06-28 20:02:01.965][397][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:69] Backtrace (use tools/stack_decode.py to get line numbers):
> [2019-06-28 20:02:01.986][397][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #0: Envoy::SignalAction::sigHandler() [0x55e6bf23e26e]
> [2019-06-28 20:02:01.986][397][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #1: __restore_rt [0x7f27e53f63a0]
> [2019-06-28 20:02:02.006][397][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #2: **Envoy::Server::ListenerManagerImpl::drainListener() [0x55e6bdd8ccd3]**
> ...
> 


