Skip to content

admin: add drain listeners endpoint #8415

Merged
mattklein123 merged 40 commits intoenvoyproxy:masterfrom
ramaraochavali:fix/listener_drain
Oct 22, 2019
Merged

admin: add drain listeners endpoint #8415
mattklein123 merged 40 commits intoenvoyproxy:masterfrom
ramaraochavali:fix/listener_drain

Conversation

@ramaraochavali
Copy link
Copy Markdown
Contributor

Description: Adds an admin endpoint that allows listeners to be drained.
Risk Level: Low, only admin initiated.
Testing: Added tests
Docs Changes: Updated
Release Notes: Added
Fixes #4460

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@ramaraochavali
Copy link
Copy Markdown
Contributor Author

@mattklein123 PTAL

@mattklein123 mattklein123 self-assigned this Sep 27, 2019
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks in general this looks good. I mainly have some comments about the tests. Thank you!

/wait

} // namespace

TEST_P(IntegrationAdminTest, Admin) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove extra newline here and below.

EXPECT_TRUE(response->complete());
EXPECT_EQ("200", response->headers().Status()->value().getStringView());
EXPECT_EQ("text/plain; charset=UTF-8", ContentType(response));
test_server_->waitForCounterGe("listener_manager.listener_removed", 1);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long do these tests take to run with real time? I'm guessing many seconds because of draining? I think you will likely need to modify the drain time for the listener or Envoy to make this shorter?

More generally, can you potentially do a test with actual requests to make sure that the listener is actually draining and then gone when we do this admin command?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It runs for 1s. For integration test server it is set here https://github.com/envoyproxy/envoy/blob/master/test/integration/server.cc#L42.

My intention was not not test the actual draining behaviour - but ensure that it is triggered as draining api would have already been tested. Added a new test now. Please see if that is what you are expecting?

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the added test, very nice. One question.

/wait-any

const bool inbound_only = params.find("inboundonly") != params.end();
for (const Network::ListenerConfig& listener : server_.listenerManager().listeners()) {
if (!inbound_only || listener.direction() == envoy::api::v2::core::TrafficDirection::INBOUND) {
server_.listenerManager().removeListener(listener.name());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In thinking about this more, I'm not convinced this is correct. I think if you remove the listener here, it's possible that LDS will add it back? Don't we want to stop the listener instead? WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think assuming that it is operationally handled that this is invoked on SIGTERM and wait for drain time is done and then Envoy goes away any ways, it may not be a big issue because till the listener is draining it will not allow to add back?

Are you suggesting, we call the drain completion with empty call back here https://github.com/envoyproxy/envoy/blob/master/source/server/listener_manager_impl.cc#L668 for this case so that it won't remove?

Let me know if you want to handle that case - we will have to introduce a new API to listener manager called stopListener and call drain with null calback? or any other better way?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went ahead and added api for stopListener. Please see if this better.

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating on this. A high level comment before diving into detailed review.

/wait-any

return num_connections;
}

bool ListenerManagerImpl::stopListener(const std::string& name) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we can simplify this change by just making this API take a ListenerConfig, and then just call the worker stop listener API directly with that config? I think this will work correctly.

I think no matter we have a potential race condition here in which if we remove anything, LDS will come and add it back later I think. Given this, it seems better to just stop the listeners directly? This would not cover the warming case, though that might be good enough? I think to completely remove any race conditions you would need to kill warming listeners and keep them from coming back by name? I'm really not sure if this type of complexity is worth it, but might be worth a bunch of comments.

WDYT?

@ramaraochavali
Copy link
Copy Markdown
Contributor Author

ramaraochavali commented Oct 2, 2019 via email

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@ramaraochavali
Copy link
Copy Markdown
Contributor Author

@mattklein123 addressed the feedback. PTAL

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM with a few small comments.

/wait

listeners_type_ = listeners_type;
for (Network::ListenerConfig& listener : listeners()) {
for (const auto& worker : workers_) {
if (listeners_type_.has_value() &&
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must have a value, right? You set it above.

Copy link
Copy Markdown
Contributor Author

@ramaraochavali ramaraochavali Oct 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah :-). Morning coffee needed..

std::list<DrainingListener> draining_listeners_;
std::list<WorkerPtr> workers_;
bool workers_started_{};
absl::optional<StopListenersType> listeners_type_;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: stop_listeners_type_


upstream_request_->encodeHeaders(default_response_headers_, false);

// Invoke drain listeners endpoint and validate that the listener is actually draining.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How have you verified it is draining?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it validates that after the listener drain is invoked, we can send some data. The drain part is validated in the listener_manager_impl_test by expecting on stopListener call on worker

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the change LGTM but I would really like to see if we can come up with a stronger integration test. Can you take a look and see if you have any ideas? Thank you.

/wait

void ListenerManagerImpl::stopListeners() {
for (const auto& worker : workers_) {
worker->stopListeners();
void ListenerManagerImpl::stopListeners(StopListenersType listeners_type) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/listeners_type/stop_listeners_type (this will fall clang-tidy at some point)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry. missed this. Will change

} // namespace

TEST_P(IntegrationAdminTest, Admin) {
config_helper_.addConfigModifier([&](envoy::config::bootstrap::v2::Bootstrap& bootstrap) -> void {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this add anything to this test? We aren't verifying any behavior below AFAICT? This is why I have been asking if there is any way that we can actually verify that we are draining a listener. Even if there is no stat could we potentially look for an info log either in this test or in the proper integration test?

Copy link
Copy Markdown
Contributor Author

@ramaraochavali ramaraochavali Oct 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does. It modifies the existing listener and sets the traffic direction as OUTBOUND. But drain_listeners should drain listeners irrespective of direction, this is not needed in this test. So removed. I have added the stat now - Now it should be clear. PTAL

Copy link
Copy Markdown
Contributor Author

@ramaraochavali ramaraochavali Oct 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also brought back the second test, because with stat we can validate that behaviour as well, the listener configured with INBOUND will be stopped with ?inboundonly call.

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM with some small remaining comments.

/wait

total_listeners_warming, Gauge, Number of currently warming listeners
total_listeners_active, Gauge, Number of currently active listeners
total_listeners_draining, Gauge, Number of currently draining listeners
total_listeners_stopped, Gauge, Number of listeners that were stopped
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like this should just be a counter? If not don't you need to keep track of stopped listeners that have been removed, etc.?

checkStats(2, 1, 2, 0, 0, 0);
}

// Validates that StopListener functionality works correctly when only inbound listeners are
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test the stopped stat (counter?) here also.

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@ramaraochavali
Copy link
Copy Markdown
Contributor Author

@mattklein123 moved to counter and added verification in listener_manager_impl_test as well. PTAL

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Rama, LGTM!

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@mattklein123 mattklein123 merged commit 552847c into envoyproxy:master Oct 22, 2019
@ramaraochavali ramaraochavali deleted the fix/listener_drain branch October 23, 2019 02:12
spenceral added a commit to spenceral/envoy that referenced this pull request Oct 23, 2019
* master: (54 commits)
  Update dependencies - Go, Bazel toos, xxHash, nanopb, rules_go, curl, protobuf, Jinja, yaml-cpp, re2 (envoyproxy#8728)
  test: increase coverage of listener_manager_impl.cc (envoyproxy#8737)
  test: modify some macros to reduce number of uncovered lines reported (envoyproxy#8725)
  build: add a script to setup clang (envoyproxy#8682)
  http: fix ssl_redirect on external (envoyproxy#8664)
  docs: update fedora build requirements (envoyproxy#8641)
  fix draining listener removal logs (envoyproxy#8733)
  dubbo: fix router doc (envoyproxy#8734)
  server: provide server health status when stats disabled (envoyproxy#8482)
  router: adding a knob to configure a cap on buffering for shadowing/retries (envoyproxy#8574)
  tcp proxy: add default 1 hour idle timeout (envoyproxy#8705)
  thrift: fix filter names in docs (envoyproxy#8726)
  Quiche changes to avoid gcc flags on Windows (envoyproxy#8514)
  test: increase test coverage in Router::HeaderParser (envoyproxy#8721)
  admin: add drain listeners endpoint  (envoyproxy#8415)
  buffer filter: add content-length when ending stream with trailers (envoyproxy#8609)
  clarify draining option docs (envoyproxy#8712)
  build: ignore go-control-plane mirror git commit error code (envoyproxy#8703)
  api: remove API type database from checked in artifacts. (envoyproxy#8716)
  admin: correct help strings (envoyproxy#8710)
  ...

Signed-off-by: Spencer Lewis <slewis@squareup.com>
derekargueta pushed a commit to derekargueta/envoy that referenced this pull request Oct 24, 2019
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add ability to drain the listeners using the admin interface

4 participants