listener: in place filter chain update #10662
mattklein123 merged 33 commits into envoyproxy:master
Conversation
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@mattklein123 Ping. Could you leave some comments for the first round?
mattklein123
left a comment
Thanks for working on this. Very cool. I left some high level comments to get started. Thank you!
/wait
}

void ListenerManagerImpl::drainFilterChains(ListenerImplPtr&& listener) {
void ListenerManagerImpl::onIntelligentListenerWarmed(ListenerImpl& listener) {
Lots of copying in this function. Having gone through the change in more detail, my general impression is that it would be simpler to perform an in-place update of the existing listener. That would preserve the tag, and I think the draining of filter chains could be done within the listener itself?
I think the draining of filter chains could be done within the listener itself?
That is a consequence of reusing the tag, correct?
Yeah, I think if we have everything happening within a ListenerImpl some things get a lot easier (sharing the tag, some state management) but some things get more difficult. I would definitely be curious to see what it looks like to update the ListenerImpl in place, as my intuition is that the overall code will wind up simpler to understand and reason about.
lambdai
left a comment
There are two further directions:
-
Reuse the same listener tag but not the ListenerImpl. This is relatively easy. The only violated assumption is that we may now have two ListenerImpl objects mapping to the same tag. The impacted code paths are restricted to the master thread, mostly listener inspection and add/stop. AFAIK some code paths already tolerate this condition.
-
Reuse the same ListenerImpl and the listener tag value.
This approach makes the in-place update code path straightforward, although some effort is needed. More specifically:
a. Figure out which members need to maintain two versions, from warm-up to drain completion.
b. Figure out which members need to maintain 2+ versions: for example, each group of draining filter chains represents one listener config update.
c. Handle a listener update arriving while an in-place update is ongoing. Note the new listener update could be another in-place update, or a traditional update.
I had an experimental branch following the philosophy of reusing the ListenerImpl. The state management is way more complicated than the current approach. Let me see if I can recover it and bring it to the discussion.
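The second direction above (one listener object, two config versions, draining groups) can be sketched roughly as follows. This is a simplified, hypothetical model, not Envoy's actual code: the names InPlaceListener, Config, startInPlaceUpdate, and promoteWarmingConfig are all illustrative, and filter chains are identified here by plain strings standing in for their config messages.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of "reuse the same ListenerImpl and tag".
struct Config {
  std::vector<std::string> filter_chains; // chains identified by config text
};

class InPlaceListener {
public:
  explicit InPlaceListener(Config c) : active_(std::move(c)) {}

  uint64_t tag() const { return tag_; } // the tag survives every update

  // (a) from warm-up until drain completes, two config versions coexist.
  void startInPlaceUpdate(Config next) {
    warming_ = std::make_unique<Config>(std::move(next));
  }
  // (c) a caller can detect an ongoing update and decide how to handle
  // the next one (another in-place update, or a traditional update).
  bool updateOngoing() const { return warming_ != nullptr; }

  // (b) chains absent from the warming config form one draining group,
  // representing one listener config update.
  std::vector<std::string> promoteWarmingConfig() {
    std::vector<std::string> draining;
    for (const auto& chain : active_.filter_chains) {
      bool kept = false;
      for (const auto& next : warming_->filter_chains) {
        if (chain == next) {
          kept = true;
          break;
        }
      }
      if (!kept) {
        draining.push_back(chain);
      }
    }
    active_ = *warming_;
    warming_.reset();
    return draining;
  }

private:
  const uint64_t tag_{42};
  Config active_;
  std::unique_ptr<Config> warming_;
};
```

Under this model the tag never changes across updates, which is exactly why the draining bookkeeping has to live inside the listener.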
}

void ListenerManagerImpl::drainFilterChains(ListenerImplPtr&& listener) {
void ListenerManagerImpl::onIntelligentListenerWarmed(ListenerImpl& listener) {
I think the draining of filter chains could be done within the listener itself?
That is a consequence of reusing the tag, correct?

Yeah, maybe spend a bit of time and time-box it, and then we can look at it together? I'm OK to ship and iterate if we need to, but per my other comment I do think that re-using an existing ListenerImpl will be a cleaner long-term solution.
lambdai
left a comment
TODO
- reuse the tag value
- delete the two "ForTests" methods
- runtime kill switch
- rebase the commits using the same ListenerImpl obj. Will be in another PR; just scaffolding here.
source/server/listener_impl.cc
Outdated
if (Network::Utility::protobufAddressSocketType(config.address()) !=
        Network::Address::SocketType::Stream ||
    Network::Utility::protobufAddressSocketType(config.address()) !=
        Network::Address::SocketType::Stream) {
errr... I cannot recall why I was thinking...
source/server/listener_impl.cc
Outdated
bool ListenerImpl::supportUpdateFilterChain(const envoy::config::listener::v3::Listener& config,
                                            bool worker_started) {
  // It's adding very little value to support filter chain only update at start phase.
Add the comment: the current warm-up flow needs an active listener at the connection handler.
Worker started, along with a located ListenerImpl, is the sufficient condition.
It's very close to a necessary condition: the missing gap is the case where the current listener itself is ready but is waiting for other listeners to be initialized.
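The eligibility condition discussed here can be sketched as below. This is a hedged, standalone approximation: the ListenerConfig struct and its fields are illustrative stand-ins for the real envoy.config.listener.v3.Listener proto, and the "everything but filter chains must match" comparison is reduced to two fields for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in for the listener proto.
struct ListenerConfig {
  std::string name;
  std::string address;
  std::vector<std::string> filter_chains;
};

// In-place filter chain update is supported only after workers have started
// (the warm-up flow needs an active listener at the connection handler) and
// when the incoming config differs from the existing one only in its
// filter chains.
bool supportUpdateFilterChain(const ListenerConfig& existing, const ListenerConfig& incoming,
                              bool worker_started) {
  // Filter-chain-only update adds very little value before workers start.
  if (!worker_started) {
    return false;
  }
  // Every field except the filter chains must be unchanged.
  return existing.name == incoming.name && existing.address == incoming.address;
}
```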
source/server/listener_impl.cc
Outdated
void ListenerImpl::diffFilterChain(const ListenerImpl& listener,
                                   std::function<void(Network::DrainableFilterChain&)> callback) {
  for (const auto& filter_chain : filter_chain_manager_.allFilterChains()) {
Sorry, I renamed allFilterChains() to filterChainsByMessage(). That is a stronger signal of the O(N) behavior.
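The diff that filterChainsByMessage() enables might look like the following. This is an illustrative sketch, not Envoy's implementation: the map key is a plain string standing in for the serialized FilterChain config message, and the int value stands in for the DrainableFilterChain.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Chains keyed by their config message; the key stands in for the
// serialized FilterChain proto, the value for the chain object.
using FilterChainsByMessage = std::map<std::string, int>;

// O(N) walk over the old chains: any chain whose config message no longer
// appears in the new config is handed to the callback for draining.
void diffFilterChain(const FilterChainsByMessage& old_chains,
                     const FilterChainsByMessage& new_chains,
                     const std::function<void(int)>& drain_callback) {
  for (const auto& [message, chain] : old_chains) {
    if (new_chains.find(message) == new_chains.end()) {
      drain_callback(chain); // removed or modified; drain its connections
    }
  }
}
```

Keying by message means an unchanged filter chain compares equal and is left alone, which is the whole point of the in-place update: only the changed chains drain.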
WorkerFactory& worker_factory, bool enable_dispatcher_stats);

void onListenerWarmed(ListenerImpl& listener);
void onIntelligentListenerWarmed(ListenerImpl& listener);
Renamed to inPlaceFilterChainUpdate() for now.
If I pass a bool flag, there will be three if expressions... Not sure which is better. Will give it a try.
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
|
mattklein123
left a comment
Thanks for the iteration. Looks good with a few remaining comments. Thanks.
/wait
source/server/listener_impl.cc
Outdated
  return false;
}

// Guard the filter_chains[0] access.
If the new listener has a non-zero number of filter chains and the old listener has zero, will this crash?
context_ = Ssl::createClientSslTransportSocketFactory({}, *context_manager_, *api_);
}

std::unique_ptr<SslClient> connect(const std::string& alpn) {
For the HTTP test you should be able to use the standard HTTP test integration client which supports TLS. Please switch to that. I still prefer you add TLS support to the raw test driver vs. have it handled here but if you insist on not doing that please add a TODO to clean up that part in a follow up. Thanks.
For the HTTP test you should be able to use the standard HTTP test integration client which supports TLS. Please switch to that.
Looking.
I still prefer you add TLS support to the raw test driver vs. have it handled here but if you insist on not doing that please add a TODO to clean up that part in a follow up.
I am happy to move that to the next PR. I see overlapping code in RawConnectionDriver, the Ssl integration test, and this test, but I don't have a clear picture of what the final shape will be. I guess you would also prefer to review a smaller PR focused on the dedicated goal of that refactor.
lambdai
left a comment
Adding a test for the false case of in-place update, plus the HTTP integration test.
source/server/listener_impl.cc
Outdated
  return false;
}

// Guard the filter_chains[0] access.
The old (tcp) listener chose to reject the config... Sorry, I don't exactly understand the rationale. In-place update won't crash on zero filter chains; at the least we can fix the crash.
The question is: we decided to reject listeners with zero filter chains in the past, so should we change that behavior for in-place update?
if (config.filter_chains().empty() && (socket_type == Network::Address::SocketType::Stream ||
                                       !udp_listener_factory_->isTransportConnectionless())) {
  // If we got here, this is a tcp listener or connection-oriented udp listener, so ensure there
  // is a filter chain specified
  throw EnvoyException(fmt::format("error adding listener '{}': no filter chains specified",
                                   address_->asString()));
}
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
I have some ideas for uniting RawConnectionDriver and SslClient.
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@mattklein123 Does RawConnectionDriver in #10998 match your expectation?
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
mattklein123
left a comment
Thanks for iterating here. Really nice work on a very difficult change. A few more comments and let's ship! :)
Please also run the integration tests with runs_per_test=1000 to check for flakes, and ping back.
/wait
source/server/listener_impl.cc
Outdated
  return false;
}

// Guard the filter_chains[0] access.
Sorry, are you saying that it's impossible to get here with the previous listener having an empty filter chain? If so, can you add a comment to that effect?
lambdai
left a comment
Running the repeated integration tests; will get back to the code updates after 4pm.
source/server/listener_impl.cc
Outdated
  return false;
}

// Guard the filter_chains[0] access.
are you saying that it's impossible to get here and the previous listener has an empty filter chain?
Yes, there is a validation against zero filter chains, in production code and in a test case, so the previous tcp listener cannot have zero filter chains.
I also added a test case updating from an N-filter-chain listener to a zero-filter-chain listener. The expected path goes through a full listener update, and the exception is thrown there.
The intention is to leave that decision to the full listener update. In the future, if we choose to allow zero filter chains, only the full listener update flow needs to change.
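The fallback behavior described here can be sketched as below. This is an illustrative sketch under stated assumptions, not Envoy's code: chooseUpdatePath and fullListenerUpdate are hypothetical names, and filter chains are modeled as strings. The point is that a config with zero filter chains never takes the in-place path, so the existing rejection logic in the full update path remains the single place that decides.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

enum class UpdatePath { InPlace, Full };

// An empty filter chain list (or any other ineligibility) falls through to
// the full listener update path.
UpdatePath chooseUpdatePath(const std::vector<std::string>& new_filter_chains,
                            bool otherwise_eligible) {
  if (new_filter_chains.empty() || !otherwise_eligible) {
    return UpdatePath::Full;
  }
  return UpdatePath::InPlace;
}

// Mirrors the existing validation: a tcp listener must specify filter chains,
// and only the full update path enforces (or could later relax) this.
void fullListenerUpdate(const std::vector<std::string>& new_filter_chains) {
  if (new_filter_chains.empty()) {
    throw std::runtime_error("no filter chains specified");
  }
}
```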
Yeah! I will clean up some TODOs at a slower pace :)
Commit Message:
Provide a new listener update path which can update the filter chains without draining all the connections of the old listener.
The in-place filter chain update flow is an optimization of listener update. If the supportUpdateFilterChain() check passes and the runtime flag
"envoy.reloadable_features.listener_in_place_filterchain_update" is not explicitly disabled,
existing connections may not be drained if their owning filter chains are not updated in the new listener config.
Additional Description:
patch 3/N of #9773
Follow up of #10528
This PR provides a new listener update path guarded by supportUpdateFilterChain().
The in-place filter chain update flow is an optimization of listener update. If the above check passes and the runtime flag
"envoy.reloadable_features.listener_in_place_filterchain_update" is not explicitly disabled,
existing connections may not be drained if their owning filter chains are not updated in the new listener config.
When the feature is disabled by the runtime, subsequent listener updates won't use the optimized path.
The new update flow starts with the construction of a new ListenerImpl. The new ListenerImpl shares the PerListenerFactoryContext, and the warm-up callback utilizes the worker interface below.
Those two methods landed in "worker: provide removeFilterChain interface" (#10528).
Risk Level: Medium
Connection lifetime behavior changes by default. Can be disabled by setting "envoy.reloadable_features.listener_in_place_filterchain_update" to false.
Testing: integration tests, unit tests, runtime flag test.
Docs Changes:
Release Notes: see current.rst
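The gating described in the commit message (structural check plus runtime flag) can be summarized in a small sketch. The flag name comes from the text above; the helper names and the runtime-lookup callback are hypothetical, standing in for Envoy's actual runtime feature machinery.

```cpp
#include <cassert>
#include <functional>
#include <string>

// The optimized in-place path is taken only when both the structural
// eligibility check passes and the runtime flag is enabled. A disabled
// flag forces the traditional full listener update (drain everything).
bool useInPlaceFilterChainUpdate(
    bool support_update_filter_chain,
    const std::function<bool(const std::string&)>& runtime_feature_enabled) {
  return support_update_filter_chain &&
         runtime_feature_enabled(
             "envoy.reloadable_features.listener_in_place_filterchain_update");
}
```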