
listener: in place filter chain update #10662

Merged: mattklein123 merged 33 commits into envoyproxy:master from lambdai:listenerutil on Apr 30, 2020

Conversation

@lambdai lambdai commented Apr 6, 2020

Commit Message:

Provide a new listener update path that can update the filter chains without draining all the connections of the old listener.

The in place filter chain update flow is an optimization of listener update. If supportUpdateFilterChain() passes and the runtime flag
"envoy.reloadable_features.listener_in_place_filterchain_update" is not explicitly disabled,
existing connections are not drained as long as their owning filter chains are unchanged in the new listener config.

Additional Description:

patch 3/N of #9773
Follow-up of #10528

This PR provides the new listener update path, guarded by supportUpdateFilterChain() and the runtime flag
"envoy.reloadable_features.listener_in_place_filterchain_update".

When the feature is disabled by the runtime, subsequent listener updates won't use the optimized path.

The new update flow starts with the construction of a ListenerImpl.
The new ListenerImpl shares the PerListenerFactoryContext, and the warm-up callback uses the following worker interfaces:

  1. addListener() with the overridden listener
  2. removeFilterChains()

Both methods landed in #10528 (worker: provide removeFilterChain interface).
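As a rough illustration of the flow above (all names below are hypothetical stand-ins, not Envoy's actual API), the warm-up of an in-place update adds the overridden listener to a worker and removes only the stale filter chains, leaving connections on unchanged chains untouched:

```cpp
#include <string>
#include <vector>

// Hypothetical model of the worker interface described above.
struct Worker {
  std::vector<std::string> active_listeners;
  std::vector<std::string> removed_filter_chains;

  // Corresponds to addListener() with the overridden listener.
  void addListener(const std::string& overridden_listener) {
    active_listeners.push_back(overridden_listener);
  }
  // Corresponds to removeFilterChains(): drains only the listed chains.
  void removeFilterChains(const std::vector<std::string>& stale_chains) {
    removed_filter_chains.insert(removed_filter_chains.end(),
                                 stale_chains.begin(), stale_chains.end());
  }
};

// On warm-up of an in-place update, both worker calls are made; connections
// owned by filter chains that still exist in the new config are not drained.
void onInPlaceListenerWarmed(Worker& worker,
                             const std::string& overridden_listener,
                             const std::vector<std::string>& stale_chains) {
  worker.addListener(overridden_listener);
  worker.removeFilterChains(stale_chains);
}
```

This is only a sketch of the control flow; the real implementation dispatches these calls to every worker thread.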

Risk Level: Medium
Connection lifetime changes by default. Can be disabled by setting "envoy.reloadable_features.listener_in_place_filterchain_update" to false.

Testing: integration tests, unit tests, runtime flag test.
Docs Changes:
Release Notes: see current.rst


stale bot commented Apr 14, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Apr 14, 2020
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Apr 17, 2020
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@lambdai lambdai changed the title from "WiP: Intelligent listener update, utilities" to "listener: Intelligent listener update" Apr 17, 2020
@lambdai lambdai marked this pull request as ready for review April 17, 2020 19:07

lambdai commented Apr 20, 2020

@mattklein123 Ping. Could you leave some comments for the first round?

@mattklein123 (Member) left a comment:

Thanks for working on this. Very cool. I left some high level comments to get started. Thank you!

/wait

}

void ListenerManagerImpl::drainFilterChains(ListenerImplPtr&& listener) {
void ListenerManagerImpl::onIntelligentListenerWarmed(ListenerImpl& listener) {
@mattklein123 (Member) commented:

Lots of copying in this function. Having been through the change in more detail, my general impression is that it will be simpler to perform an in place update of an existing listener. This would preserve the tag and I think the draining of filter chains could be done within the listener itself?

@lambdai (Contributor, Author) replied:

> I think the draining of filter chains could be done within the listener itself?

That is the consequence of reusing tag, correct?

@mattklein123 (Member) replied:

Yeah, I think if we have everything happening within a ListenerImpl some things get a lot easier (sharing the tag, some state management) but some things get more difficult. I would definitely be curious to see what it looks like to update the ListenerImpl in place, as my intuition is that the overall code will wind up simpler to understand and reason about.

@lambdai (Contributor, Author) left a comment:

There are two further directions:

  1. Reuse the same listener tag but not the ListenerImpl. This is relatively easy. The only violated assumption is that we may now have two ListenerImpl objects mapping to the same tag. The impacted code paths are restricted to the master thread, mostly listener inspection and add/stop. AFAIK some code paths already tolerate this condition.

  2. Reuse the same ListenerImpl and the listener tag value.
    This approach makes the in place update code path straightforward, although some effort is needed. More specifically:
    a. Figure out which members need to maintain 2 versions from warm-up to drain completion.
    b. Figure out which members need to maintain 2+ versions: e.g. each set of draining filter chains represents one listener config update.
    c. Handle a listener update while an in place update is ongoing. Note that the new update could be another in place update or a traditional update.

I had an experimental branch following the philosophy of reusing the ListenerImpl. The state management is way more complicated than the current one. Let me see if I can recover it and bring it to the discussion.


@mattklein123 (Member) commented:

> I had an experimental branch following the philosophy of reusing the ListenerImpl. The state management is way more complicated than the current one. Let me see if I can recover it and bring it to the discussion.

Yeah, maybe spend a bit of time, time-box it, and then we can look at it together? I'm OK to ship and iterate if we need to, but per my other comment I do think that re-using an existing ListenerImpl will be a cleaner long term solution.

@lambdai (Contributor, Author) left a comment:

TODO

  1. Reuse the tag value.
  2. Delete the two "ForTests" methods.
  3. Add the runtime kill switch.
  4. Rebase the commits onto the same ListenerImpl object. Will be in another PR; just scaffolding for now.

Comment on lines +647 to +650
if (Network::Utility::protobufAddressSocketType(config.address()) !=
Network::Address::SocketType::Stream ||
Network::Utility::protobufAddressSocketType(config.address()) !=
Network::Address::SocketType::Stream) {
@lambdai (Contributor, Author) commented:

errr... I cannot recall what I was thinking...


bool ListenerImpl::supportUpdateFilterChain(const envoy::config::listener::v3::Listener& config,
bool worker_started) {
// It's adding very little value to support filter chain only update at start phase.
@lambdai (Contributor, Author) commented:

Added a comment: the current warm-up flow needs an active listener at the connection handler.
The worker being started, along with the located ListenerImpl, is the sufficient condition.
It is very close to a necessary condition: the remaining gap is the case where the current listener itself is ready but is waiting for other listeners to be initialized.
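For illustration, here is a minimal sketch of the predicate being discussed, assuming a simplified listener config model. The struct, field names, and string stand-ins are invented for this sketch, not Envoy's real types:

```cpp
#include <string>

// Hypothetical, flattened model of a Listener config message.
struct ListenerConfig {
  std::string address;          // listening address
  bool is_tcp;                  // stands in for SocketType::Stream
  std::string filter_chains;    // opaque stand-in for the filter chain configs
  std::string everything_else;  // all other fields of the Listener message
};

// In-place filter chain update is supported only once the workers have
// started (the warm-up flow needs an active listener at the connection
// handler), and only when nothing but the filter chains changed.
bool supportUpdateFilterChain(const ListenerConfig& old_cfg,
                              const ListenerConfig& new_cfg,
                              bool worker_started) {
  if (!worker_started) {
    // Little value in a filter-chain-only update at the start phase.
    return false;
  }
  return old_cfg.is_tcp && new_cfg.is_tcp &&
         old_cfg.address == new_cfg.address &&
         old_cfg.everything_else == new_cfg.everything_else;
}
```

The real check compares the proto messages field by field; this sketch only captures the shape of the decision (any non-filter-chain difference falls back to a full listener update).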


void ListenerImpl::diffFilterChain(const ListenerImpl& listener,
std::function<void(Network::DrainableFilterChain&)> callback) {
for (const auto& filter_chain : filter_chain_manager_.allFilterChains()) {
@lambdai (Contributor, Author) commented:

Sorry, I renamed allFilterChains() to filterChainsByMessage(). That is a stronger signal that it is O(N).
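A minimal sketch of what the by-message diff looks like, assuming filter chains are keyed by their serialized config message (the names here are illustrative only, not the actual implementation):

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

// Stand-in for a filter chain's serialized config message.
using FilterChainMessage = std::string;

// Invoke `callback` for every chain of the old listener whose message is
// absent from the new listener; those are the chains to drain. One set
// lookup per old chain makes the whole diff O(N).
void diffFilterChain(
    const std::vector<FilterChainMessage>& old_chains,
    const std::set<FilterChainMessage>& new_chains_by_message,
    const std::function<void(const FilterChainMessage&)>& callback) {
  for (const auto& chain : old_chains) {
    if (new_chains_by_message.count(chain) == 0) {
      callback(chain);
    }
  }
}
```

In the real code the callback receives a DrainableFilterChain; draining by message identity is what lets unchanged chains keep their connections.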

WorkerFactory& worker_factory, bool enable_dispatcher_stats);

void onListenerWarmed(ListenerImpl& listener);
void onIntelligentListenerWarmed(ListenerImpl& listener);
@lambdai (Contributor, Author) commented:

Renamed to inPlaceFilterChainUpdate() for now.

If I pass a bool flag, there will be 3 if expressions... Not sure which is better. Will give it a try.

Signed-off-by: Yuchen Dai <silentdai@gmail.com>

lambdai commented Apr 28, 2020

@mattklein123

  1. An is_stopped flag is added. I give up on in place update in this situation; people will see a full listener update. Not sure if it is the best solution, but it is the least surprising.
  2. filterChainOnlyUpdate: my intention is to add a note here that if further accounting fields are added to the Listener message (e.g. a time, a version number), we should update this function to ignore those fields besides the filter chains. I know our general principle is not to worry about unlikely changes, but this one needs some attention.
  3. The newly added integration test has a lot of duplicated code, both among my new cases and among neighboring test cases. I plan to refactor after this PR. If you feel it is too ugly and some refactoring must be done in this PR, please let me know, thanks!

@mattklein123 (Member) left a comment:

Thanks for the iteration. Looks good with a few remaining comments. Thanks.

/wait

return false;
}

// Guard the filter_chains[0] access.
@mattklein123 (Member) commented:

If the new listener has a non-zero number of filter chains and the old listener has zero, this will crash?

context_ = Ssl::createClientSslTransportSocketFactory({}, *context_manager_, *api_);
}

std::unique_ptr<SslClient> connect(const std::string& alpn) {
@mattklein123 (Member) commented:

For the HTTP test you should be able to use the standard HTTP test integration client which supports TLS. Please switch to that. I still prefer you add TLS support to the raw test driver vs. have it handled here but if you insist on not doing that please add a TODO to clean up that part in a follow up. Thanks.

@lambdai (Contributor, Author) replied:

> For the HTTP test you should be able to use the standard HTTP test integration client which supports TLS. Please switch to that.

Looking.

> I still prefer you add TLS support to the raw test driver vs. have it handled here but if you insist on not doing that please add a TODO to clean up that part in a follow up.

I am happy to move that to the next PR. I see overlapping code in RawConnectionDriver, the Ssl integration test, and this test, but I don't have a clear picture of what the final shape will be. I guess you would also rather review a smaller PR focused on the dedicated goal of refactoring.

@lambdai (Contributor, Author) left a comment:

Adding a test for the false case of in place update, and also the HTTP integration test.

return false;
}

// Guard the filter_chains[0] access.
@lambdai (Contributor, Author) commented:

The old (tcp) listener chose to reject the config... Sorry, I don't exactly understand the rationale. In place update won't crash on a zero filter chain; at the least, we can fix the crash.
The question is: since we decided to reject listeners with zero filter chains in the past, should we change that behavior for in place update?

  if (config.filter_chains().empty() && (socket_type == Network::Address::SocketType::Stream ||
                                         !udp_listener_factory_->isTransportConnectionless())) {
    // If we got here, this is a tcp listener or connection-oriented udp listener, so ensure there
    // is a filter chain specified
    throw EnvoyException(fmt::format("error adding listener '{}': no filter chains specified",
                                     address_->asString()));


lambdai added 3 commits April 29, 2020 03:32
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
lambdai added 5 commits April 29, 2020 08:10
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>

lambdai commented Apr 29, 2020

I may have some sense of how to unite RawConnectionDriver and SslClient.
Leaving a TODO; will do it in the next round.

Signed-off-by: Yuchen Dai <silentdai@gmail.com>

lambdai commented Apr 29, 2020

@mattklein123 Does RawConnectionDriver in #10998 match your expectations?

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@mattklein123 (Member) left a comment:

Thanks for iterating here. Really nice work on a very difficult change. Few more comments and let's ship! :)

Please run the integration tests with runs_per_test=1000 to check for flakes also and ping back.

/wait

return false;
}

// Guard the filter_chains[0] access.
@mattklein123 (Member) commented:

Sorry, are you saying that it's impossible to get here with the previous listener having an empty filter chain? If so, can you add a comment to that effect?

@lambdai (Contributor, Author) left a comment:

Running the repeated integration tests; will get back to the code updates after 4pm.

return false;
}

// Guard the filter_chains[0] access.
@lambdai (Contributor, Author) replied:

> are you saying that it's impossible to get here and the previous listener has an empty filter chain?

Yes, there is a validation against zero filter chains, in product code and in a test case, so the previous tcp listener cannot have 0 filter chains.
I also added a test case going from an N filter chain listener to a zero filter chain listener. The expected path is a full listener update, and the exception is thrown there.

The intention is to leave the decision to the full listener update path. In the future, if we choose to allow 0 filter chains, only the full listener update flow needs to change.
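A tiny sketch of the decision described here, under the assumption that the zero filter chain case simply declines the in-place path so the full update path (which owns the zero-chain rejection) handles it. The names are hypothetical:

```cpp
#include <cassert>

enum class Path { InPlace, FullUpdate };

// The in-place path never sees a zero filter chain config; it declines, and
// the traditional full listener update path rejects the config (or accepts
// it, if the policy ever changes) in exactly one place.
Path pickPath(int new_filter_chain_count, bool otherwise_supported) {
  if (new_filter_chain_count == 0 || !otherwise_supported) {
    return Path::FullUpdate;
  }
  return Path::InPlace;
}
```

Keeping the rejection in one flow is the design point: a future policy change to allow 0 filter chains would touch only the full update path.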

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@mattklein123 (Member) left a comment:

Thanks, nice work!

@mattklein123 mattklein123 merged commit c86c679 into envoyproxy:master Apr 30, 2020

lambdai commented Apr 30, 2020

Yeah!
@mattklein123 Thank you for the guidance these days!

I will clean up some TODOs at a slower pace :)
