admin: add drain listeners endpoint #8415
Conversation
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@mattklein123 PTAL
mattklein123
left a comment
Thanks in general this looks good. I mainly have some comments about the tests. Thank you!
/wait
} // namespace

TEST_P(IntegrationAdminTest, Admin) {
nit: remove extra newline here and below.
EXPECT_TRUE(response->complete());
EXPECT_EQ("200", response->headers().Status()->value().getStringView());
EXPECT_EQ("text/plain; charset=UTF-8", ContentType(response));
test_server_->waitForCounterGe("listener_manager.listener_removed", 1);
How long do these tests take to run with real time? I'm guessing many seconds because of draining? I think you will likely need to modify the drain time for the listener or Envoy to make this shorter?
More generally, can you potentially do a test with actual requests to make sure that the listener is actually draining and then gone when we do this admin command?
It runs for 1s. For the integration test server it is set here: https://github.com/envoyproxy/envoy/blob/master/test/integration/server.cc#L42.
My intention was not to test the actual draining behaviour, but to ensure that it is triggered, since the draining API would have already been tested. Added a new test now. Please see if that is what you are expecting?
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
mattklein123
left a comment
Thanks for the added test, very nice. One question.
/wait-any
source/server/http/admin.cc
Outdated
const bool inbound_only = params.find("inboundonly") != params.end();
for (const Network::ListenerConfig& listener : server_.listenerManager().listeners()) {
  if (!inbound_only || listener.direction() == envoy::api::v2::core::TrafficDirection::INBOUND) {
    server_.listenerManager().removeListener(listener.name());
In thinking about this more, I'm not convinced this is correct. I think if you remove the listener here, it's possible that LDS will add it back? Don't we want to stop the listener instead? WDYT?
Assuming that operationally this is invoked on SIGTERM, and that after the drain time Envoy goes away anyway, it may not be a big issue, because while the listener is draining it will not allow adding it back?
Are you suggesting we call the drain completion with an empty callback here, https://github.com/envoyproxy/envoy/blob/master/source/server/listener_manager_impl.cc#L668, for this case so that it won't remove the listener?
Let me know if you want to handle that case - we would have to introduce a new listener manager API called stopListener and call drain with a null callback? Or is there a better way?
Went ahead and added an API for stopListener. Please see if this is better.
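The stop-versus-remove distinction discussed above can be sketched with a self-contained toy model (the `Listener`, `TrafficDirection`, and `stopListeners` names here are illustrative stand-ins, not Envoy's actual listener manager API):

```cpp
#include <string>
#include <vector>

// Illustrative stand-ins for the real listener manager types.
enum class TrafficDirection { Inbound, Outbound };

struct Listener {
  std::string name;
  TrafficDirection direction;
  bool stopped{false};
};

// Stop every listener, or only inbound ones when inbound_only is set.
// Stopping leaves the listener's config in place, so (unlike removal)
// a later LDS update has nothing to "add back".
void stopListeners(std::vector<Listener>& listeners, bool inbound_only) {
  for (Listener& l : listeners) {
    if (!inbound_only || l.direction == TrafficDirection::Inbound) {
      l.stopped = true;
    }
  }
}
```

The key design point from the thread: the function mutates listener state rather than erasing entries, so there is no window in which a racing config update could resurrect a drained listener.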
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
mattklein123
left a comment
Thanks for iterating on this. A high level comment before diving into detailed review.
/wait-any
return num_connections;
}

bool ListenerManagerImpl::stopListener(const std::string& name) {
I'm wondering if we can simplify this change by just making this API take a ListenerConfig, and then just call the worker stop listener API directly with that config? I think this will work correctly.
I think no matter what, we have a potential race condition here: if we remove anything, LDS may come and add it back later. Given this, it seems better to just stop the listeners directly. This would not cover the warming case, though that might be good enough? I think to completely remove any race conditions you would need to kill warming listeners and keep them from coming back by name. I'm really not sure if this type of complexity is worth it, but it might be worth a bunch of comments.
WDYT?
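The simplification suggested above, taking the config directly and fanning out to the workers, could look roughly like this (a minimal sketch with stand-in `Worker` and `ListenerConfig` types, not the real Envoy classes):

```cpp
#include <string>
#include <vector>

struct ListenerConfig {
  std::string name;
};

// Stand-in for an Envoy worker thread: records which listeners it was told to stop.
struct Worker {
  std::vector<std::string> stopped;
  void stopListener(const ListenerConfig& config) { stopped.push_back(config.name); }
};

struct ListenerManager {
  std::vector<Worker> workers;
  // Stop the given listener on every worker directly. No by-name lookup and
  // no removal, so there is nothing for a racing LDS update to re-add.
  void stopListener(const ListenerConfig& config) {
    for (Worker& w : workers) {
      w.stopListener(config);
    }
  }
};
```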
I was thinking that as another option. I definitely agree that the complexity is not worth it in this case. I will switch to calling stopListeners directly.
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@mattklein123 addressed the feedback. PTAL
mattklein123
left a comment
Thanks LGTM with a few small comments.
/wait
listeners_type_ = listeners_type;
for (Network::ListenerConfig& listener : listeners()) {
  for (const auto& worker : workers_) {
    if (listeners_type_.has_value() &&
This must have a value, right? You set it above.
Yeah :-). Morning coffee needed..
std::list<DrainingListener> draining_listeners_;
std::list<WorkerPtr> workers_;
bool workers_started_{};
absl::optional<StopListenersType> listeners_type_;
test/integration/integration_test.cc
Outdated
upstream_request_->encodeHeaders(default_response_headers_, false);

// Invoke drain listeners endpoint and validate that the listener is actually draining.
How have you verified it is draining?
It validates that after the listener drain is invoked, we can still send some data. The drain part itself is validated in listener_manager_impl_test by expecting a stopListener call on the worker.
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
mattklein123
left a comment
Thanks, the change LGTM but I would really like to see if we can come up with a stronger integration test. Can you take a look and see if you have any ideas? Thank you.
/wait
void ListenerManagerImpl::stopListeners() {
  for (const auto& worker : workers_) {
    worker->stopListeners();
void ListenerManagerImpl::stopListeners(StopListenersType listeners_type) {
nit: s/listeners_type/stop_listeners_type (this will fail clang-tidy at some point)
Sorry, missed this. Will change.
} // namespace

TEST_P(IntegrationAdminTest, Admin) {
config_helper_.addConfigModifier([&](envoy::config::bootstrap::v2::Bootstrap& bootstrap) -> void {
Does this add anything to this test? We aren't verifying any behavior below AFAICT? This is why I have been asking if there is any way that we can actually verify that we are draining a listener. Even if there is no stat could we potentially look for an info log either in this test or in the proper integration test?
Yes, it does. It modifies the existing listener and sets the traffic direction to OUTBOUND. But since drain_listeners should drain listeners irrespective of direction, it is not needed in this test, so I removed it. I have added the stat now, so it should be clear. PTAL.
I also brought back the second test, because with the stat we can validate that behaviour as well: the listener configured as INBOUND will be stopped by the ?inboundonly call.
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
mattklein123
left a comment
Thanks, LGTM with some small remaining comments.
/wait
total_listeners_warming, Gauge, Number of currently warming listeners
total_listeners_active, Gauge, Number of currently active listeners
total_listeners_draining, Gauge, Number of currently draining listeners
total_listeners_stopped, Gauge, Number of listeners that were stopped
This seems like this should just be a counter? If not don't you need to keep track of stopped listeners that have been removed, etc.?
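The counter-versus-gauge point can be sketched with hypothetical stand-ins for the stats types (not Envoy's real Stats interfaces): a stop is a one-way event, so the metric is naturally monotonic, whereas a gauge would need matching decrements whenever a stopped listener is later removed.

```cpp
#include <cstdint>

// A counter only goes up, which matches "number of listeners that were
// stopped" as a running total of events.
struct Counter {
  uint64_t value{0};
  void inc() { ++value; } // no dec(): monotonic by construction
};

// A gauge tracks a current level; using it here would force the code to
// dec() when a stopped listener is eventually removed, i.e. extra bookkeeping.
struct Gauge {
  uint64_t value{0};
  void inc() { ++value; }
  void dec() { --value; }
};

struct ListenerStats {
  Counter total_listeners_stopped;
};

// Each stop event bumps the running total; nothing ever has to undo it.
void onListenerStopped(ListenerStats& stats) { stats.total_listeners_stopped.inc(); }
```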
checkStats(2, 1, 2, 0, 0, 0);
}

// Validates that StopListener functionality works correctly when only inbound listeners are
Please test the stopped stat (counter?) here also.
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@mattklein123 moved to counter and added verification in listener_manager_impl_test as well. PTAL
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
* master: (54 commits)
  Update dependencies - Go, Bazel tools, xxHash, nanopb, rules_go, curl, protobuf, Jinja, yaml-cpp, re2 (envoyproxy#8728)
  test: increase coverage of listener_manager_impl.cc (envoyproxy#8737)
  test: modify some macros to reduce number of uncovered lines reported (envoyproxy#8725)
  build: add a script to setup clang (envoyproxy#8682)
  http: fix ssl_redirect on external (envoyproxy#8664)
  docs: update fedora build requirements (envoyproxy#8641)
  fix draining listener removal logs (envoyproxy#8733)
  dubbo: fix router doc (envoyproxy#8734)
  server: provide server health status when stats disabled (envoyproxy#8482)
  router: adding a knob to configure a cap on buffering for shadowing/retries (envoyproxy#8574)
  tcp proxy: add default 1 hour idle timeout (envoyproxy#8705)
  thrift: fix filter names in docs (envoyproxy#8726)
  Quiche changes to avoid gcc flags on Windows (envoyproxy#8514)
  test: increase test coverage in Router::HeaderParser (envoyproxy#8721)
  admin: add drain listeners endpoint (envoyproxy#8415)
  buffer filter: add content-length when ending stream with trailers (envoyproxy#8609)
  clarify draining option docs (envoyproxy#8712)
  build: ignore go-control-plane mirror git commit error code (envoyproxy#8703)
  api: remove API type database from checked in artifacts. (envoyproxy#8716)
  admin: correct help strings (envoyproxy#8710)
  ...

Signed-off-by: Spencer Lewis <slewis@squareup.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Description: Adds an admin endpoint that allows listeners to be drained.
Risk Level: Low, only admin initiated.
Testing: Added tests
Docs Changes: Updated
Release Notes: Added
Fixes #4460
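For reference, the endpoint added by this PR can be exercised roughly like this (a usage sketch, assuming the admin interface is bound to localhost:9901; the port and any stat prefix depend on your configuration):

```shell
# Drain (stop) all listeners via the admin interface.
curl -X POST "http://localhost:9901/drain_listeners"

# Or drain only listeners whose traffic direction is INBOUND,
# using the inboundonly query parameter from this PR.
curl -X POST "http://localhost:9901/drain_listeners?inboundonly"

# Check the stopped-listeners counter discussed in the review.
curl -s "http://localhost:9901/stats" | grep total_listeners_stopped
```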