http: support resource removal for ecds#15368
Conversation
This adds support for resource removal for ECDS, allowing it to work with xDS TTLs. Upon expiry, the configuration is reset back to the null state, which should result in Envoy reverting to the state it would find itself in should the original xDS fetch time out. Signed-off-by: Snow Pettersen <snowp@lyft.com>
| void DynamicFilterConfigProviderImpl::onConfigRemoved(Config::ConfigAppliedCb cb) { | ||
| tls_.runOnAllThreads( | ||
| [cb](OptRef<ThreadLocalConfig> tls) { | ||
| tls->config_ = absl::nullopt; | ||
| if (cb) { | ||
| cb(); | ||
| } | ||
| }, | ||
| [this]() { | ||
| // This happens after all workers have discarded the previous config so it can be safely | ||
| // deleted on the main thread by an update with the new config. | ||
| this->current_config_ = absl::nullopt; | ||
| }); | ||
| } | ||
|
|
There was a problem hiding this comment.
I am not fully sure but I know some config has default config.
Maybe we should start with defining the exact remove semantic for such config?
CC @kyessenov
There was a problem hiding this comment.
It should definitely revert back to the default config in that case. I had assumed that we fall back to when the provider returned empty, but it seems like it uses the standard update mechanism to fill in the default config, so seems like this PR needs some work
|
Going to wait for https://github.com/envoyproxy/envoy/pull/15371/files to land since there will be conflicts here |
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
|
/retest I think this can be reviewed now, I think the main change https://github.com/envoyproxy/envoy/pull/15371/files will require is clearing out all the other fields tracked from the last received response. |
|
Retrying Azure Pipelines: |
|
Would be great to have a first pass from @kyessenov if he has time. |
| } | ||
|
|
||
| last_config_hash_ = 0; | ||
| } else if (!added_resources.empty()) { |
There was a problem hiding this comment.
qq: Can resources be added and removed at the same time?
There was a problem hiding this comment.
I think we'll never get both a removal and addition/update of the same resource and we only subscribe to one resource here so we shouldn't ever have to handle both at the same time. It would be nice to have this be better encoded in the API, but I'm not sure if there's a great way to handle this besides ASSERTs? cc @htuch
There was a problem hiding this comment.
Yeah, I think the GrpcMux should ensure this property, but I can't say for certainty if it's true. We should make it part of the API IMHO, i.e. if the server does do a simultaneous add/remove, we don't expose any of the subscriptions to this. CC @dmitri-d @adisuissa
There was a problem hiding this comment.
In the unified mux implementation there's a check for this: https://github.com/envoyproxy/envoy/pull/15473/files#diff-4228bc8f4213ff904d2a3584f69a2cf049180957f668d8bc0125d9bfa3379fa2R119. If failed, the DeltaDiscoveryResponse is discarded.
| void onConfigUpdate(Envoy::Http::FilterFactoryCb config, const std::string&, | ||
| Config::ConfigAppliedCb cb) override; | ||
| void onConfigRemoved(Config::ConfigAppliedCb cb) override; | ||
| void setDefaultConfiguration(Envoy::Http::FilterFactoryCb config) override { |
There was a problem hiding this comment.
After the other PR, we could probably move this into the constructor?
| { | ||
| // Reinstate the previous configuration. | ||
| sendXdsResponse("foo", "1", denyPrivateConfig(), true); | ||
| // Wait until the TTL removes the resource. |
There was a problem hiding this comment.
This comment sounds wrong.
kyessenov
left a comment
There was a problem hiding this comment.
Looks good to me, with some minor comments.
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
|
@kyessenov I merged in your change and I believe I've addressed the other comments, ptal |
|
|
||
| namespace { | ||
| void validateTypeUrlHelper(const std::string& type_url, | ||
| absl::flat_hash_set<std::string> require_type_urls) { |
There was a problem hiding this comment.
nit: const flat_hash_set<string>&
htuch
left a comment
There was a problem hiding this comment.
Looks good! Just a few comments..
/wait
| // This init target is used to activate the subscription but not wait | ||
| // for a response. It is used whenever a default config is provided to be | ||
| // used while waiting for a response. | ||
| // This init target is used to activate |
There was a problem hiding this comment.
Can you fix this weird word wrap? LGTM and ready to ship once that is done, thanks!
There was a problem hiding this comment.
Kept missing this for some reason! Will fix
Signed-off-by: Snow Pettersen <snowp@lyft.com>
| std::make_shared<std::atomic<uint64_t>>(filter_config_providers.size()); | ||
| for (auto* provider : filter_config_providers) { | ||
| update_cb(provider, [remaining_providers, done_cb] { | ||
| if (--(*remaining_providers) == 0) { |
There was a problem hiding this comment.
Even thought this int is atomic, can't this still race, e.g. imagine we have two threads A and B. They both see 2 remaining. A decrements, then B decrements, then A reads then B reads and they both see 0?
I think you need atomic compare/exchange, but even then, I rather we didn't review this kind of code more than once, so suggest factoring out to something in source/common/common, i.e. some sort of "apply to a container's elements in parallel and callback when done"..
There was a problem hiding this comment.
Moved this to common/common and made it not thread safe since I changed it to be invoked as a completion callback for the runOnAllThreads call, which means it should get invoked on the main thread. I can see an argument for wanting to keep it thread safe just in case, but lmk what you think
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
source/common/common/containers.h
Outdated
|
|
||
| /** | ||
| * Invokes a function for all elements in a container that accepts a completion callback which will | ||
| * be invoked asynchronously, invoking a final completion callback once all callbacks associated |
There was a problem hiding this comment.
.. and potentially cross thread? Or all on the calling thread..
source/common/common/containers.h
Outdated
|
|
||
| auto remaining_elements = std::make_shared<uint64_t>(container.size()); | ||
| for (auto element : container) { | ||
| update_cb(element, [remaining_elements, done_cb] { |
There was a problem hiding this comment.
So, before we were doing N threads * M providers, but now we just do M providers. Presumably there was something hiding underneath the provider onConfigUpdate implying N concurrency. How does this now go away?
In general, this continuation passing style is really hard to read in C++ (I know it's common in some languages). @kyessenov is there a way to simplify the interfaces here to make them less hard to reason about in terms of concurrency?
There was a problem hiding this comment.
Previously it would be decremented on the per thread callback, but I changed it to be done on the main thread completion callback instead. This eliminated the need to check the concurrency and also made the code no longer run on different threads, removing the need for atomics.
There was a problem hiding this comment.
It might be possible to do something clever around RAII semantics and a shared pointer to avoid some of this complexity, let me try some stuff out.
Signed-off-by: Snow Pettersen <snowp@lyft.com>
might not actually simplify things as it adds some complexity around keeping a shared pointer around Signed-off-by: Snow Pettersen <snowp@lyft.com>
htuch
left a comment
There was a problem hiding this comment.
LGTM modulo comments, thanks for iterating!
/wait
source/common/common/containers.h
Outdated
| * of a container that may require asynchronous processing. | ||
| */ | ||
| template <class ContainerT, class UpdateCbT> | ||
| void applyToAllWithCleanup(const ContainerT& container, UpdateCbT update_cb, |
There was a problem hiding this comment.
Can you document UpdateCbT expectation a bit more, or maybe add more structure around its type that isn't generic (since for example the cleanup arg is known)?
source/common/common/containers.h
Outdated
| * update callback to delay cleanup until some arbitrary time: the completion callback will be | ||
| * invoked once no more references to the provided shared pointer exists. | ||
| * | ||
| * This provides a thread safe way of tracking the completion of callbacks based on a the elements |
There was a problem hiding this comment.
| * This provides a thread safe way of tracking the completion of callbacks based on a the elements | |
| * This provides a thread safe way of tracking the completion of callbacks based on the elements |
source/common/common/containers.h
Outdated
| /** | ||
| * Invokes a function for all elements in a container, providing a cleanup function that will be | ||
| * executed after all the elements have been processed. The Cleanup object is provided to allow each | ||
| * update callback to delay cleanup until some arbitrary time: the completion callback will be |
There was a problem hiding this comment.
The docs on Cleanup probably belong inside the method. Probably can just talk about what done_cb does and the thread safety. It did require a couple of mins at staring at this to understand what is going on, but I think this is a pretty cool pattern.
There was a problem hiding this comment.
I updated it a bit, lmk what you think. I think some mention of Cleanup is necessary since it's part of the function signature, but I moved most of it into the function body
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
|
Finally got CI green here, ptal |
This adds support for resource removal for ECDS, allowing it to work
with xDS TTLs. Upon expiry, the configuration is reset back to the null
state, which should result in Envoy reverting to the state it would find
itself in should the original xDS fetch time out.
Signed-off-by: Snow Pettersen snowp@lyft.com
Risk Level: Medium
Testing: Updated UTs
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a
Fixes #15336