Skip to content

http: support resource removal for ecds#15368

Merged
htuch merged 29 commits intoenvoyproxy:mainfrom
snowp:ecds-ttl
Apr 2, 2021
Merged

http: support resource removal for ecds#15368
htuch merged 29 commits intoenvoyproxy:mainfrom
snowp:ecds-ttl

Conversation

@snowp
Copy link
Copy Markdown
Contributor

@snowp snowp commented Mar 8, 2021

This adds support for resource removal for ECDS, allowing it to work
with xDS TTLs. Upon expiry, the configuration is reset back to the null
state, which should result in Envoy reverting to the state it would find
itself in should the original xDS fetch time out.

Signed-off-by: Snow Pettersen snowp@lyft.com

Risk Level: Medium
Testing: Updated UTs
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a
Fixes #15336

This adds support for resource removal for ECDS, allowing it to work
with xDS TTLs. Upon expiry, the configuration is reset back to the null
state, which should result in Envoy reverting to the state it would find
itself in should the original xDS fetch time out.

Signed-off-by: Snow Pettersen <snowp@lyft.com>
Comment on lines +70 to +84
void DynamicFilterConfigProviderImpl::onConfigRemoved(Config::ConfigAppliedCb cb) {
tls_.runOnAllThreads(
[cb](OptRef<ThreadLocalConfig> tls) {
tls->config_ = absl::nullopt;
if (cb) {
cb();
}
},
[this]() {
// This happens after all workers have discarded the previous config so it can be safely
// deleted on the main thread by an update with the new config.
this->current_config_ = absl::nullopt;
});
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not fully sure but I know some config has default config.
Maybe we should start with defining the exact remove semantic for such config?

CC @kyessenov

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should definitely revert back to the default config in that case. I had assumed that we fall back to when the provider returned empty, but it seems like it uses the standard update mechanism to fill in the default config, so seems like this PR needs some work

@snowp
Copy link
Copy Markdown
Contributor Author

snowp commented Mar 9, 2021

Going to wait for https://github.com/envoyproxy/envoy/pull/15371/files to land since there will be conflicts here

Signed-off-by: Snow Pettersen <snowp@lyft.com>
Snow Pettersen added 5 commits March 9, 2021 19:04
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
@snowp
Copy link
Copy Markdown
Contributor Author

snowp commented Mar 11, 2021

/retest

I think this can be reviewed now, I think the main change https://github.com/envoyproxy/envoy/pull/15371/files will require is clearing out all the other fields tracked from the last received response.

@kyessenov @htuch

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #15368 (comment) was created by @snowp.

see: more, trace.

@htuch
Copy link
Copy Markdown
Member

htuch commented Mar 12, 2021

Would be great to have a first pass from @kyessenov if he has time.

}

last_config_hash_ = 0;
} else if (!added_resources.empty()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: Can resources be added and removed at the same time?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll never get both a removal and addition/update of the same resource and we only subscribe to one resource here so we shouldn't ever have to handle both at the same time. It would be nice to have this be better encoded in the API, but I'm not sure if there's a great way to handle this besides ASSERTs? cc @htuch

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the GrpcMux should ensure this property, but I can't say for certainty if it's true. We should make it part of the API IMHO, i.e. if the server does do a simultaneous add/remove, we don't expose any of the subscriptions to this. CC @dmitri-d @adisuissa

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the unified mux implementation there's a check for this: https://github.com/envoyproxy/envoy/pull/15473/files#diff-4228bc8f4213ff904d2a3584f69a2cf049180957f668d8bc0125d9bfa3379fa2R119. If failed, the DeltaDiscoveryResponse is discarded.

void onConfigUpdate(Envoy::Http::FilterFactoryCb config, const std::string&,
Config::ConfigAppliedCb cb) override;
void onConfigRemoved(Config::ConfigAppliedCb cb) override;
void setDefaultConfiguration(Envoy::Http::FilterFactoryCb config) override {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the other PR, we could probably move this into the constructor?

{
// Reinstate the previous configuration.
sendXdsResponse("foo", "1", denyPrivateConfig(), true);
// Wait until the TTL removes the resource.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment sounds wrong.

Copy link
Copy Markdown
Contributor

@kyessenov kyessenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, with some minor comments.

Snow Pettersen added 4 commits March 17, 2021 17:30
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
@snowp
Copy link
Copy Markdown
Contributor Author

snowp commented Mar 18, 2021

@kyessenov I merged in your change and I believe I've addressed the other comments, ptal


namespace {
void validateTypeUrlHelper(const std::string& type_url,
absl::flat_hash_set<std::string> require_type_urls) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const flat_hash_set<string>&

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Signed-off-by: Snow Pettersen <snowp@lyft.com>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few comments..
/wait

// This init target is used to activate the subscription but not wait
// for a response. It is used whenever a default config is provided to be
// used while waiting for a response.
// This init target is used to activate
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Strange text reflow.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you fix this weird word wrap? LGTM and ready to ship once that is done, thanks!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept missing this for some reason! Will fix

Signed-off-by: Snow Pettersen <snowp@lyft.com>
std::make_shared<std::atomic<uint64_t>>(filter_config_providers.size());
for (auto* provider : filter_config_providers) {
update_cb(provider, [remaining_providers, done_cb] {
if (--(*remaining_providers) == 0) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even thought this int is atomic, can't this still race, e.g. imagine we have two threads A and B. They both see 2 remaining. A decrements, then B decrements, then A reads then B reads and they both see 0?

I think you need atomic compare/exchange, but even then, I rather we didn't review this kind of code more than once, so suggest factoring out to something in source/common/common, i.e. some sort of "apply to a container's elements in parallel and callback when done"..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to common/common and made it not thread safe since I changed it to be invoked as a completion callback for the runOnAllThreads call, which means it should get invoked on the main thread. I can see an argument for wanting to keep it thread safe just in case, but lmk what you think

Snow Pettersen added 4 commits March 23, 2021 19:13
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>

/**
* Invokes a function for all elements in a container that accepts a completion callback which will
* be invoked asynchronously, invoking a final completion callback once all callbacks associated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.. and potentially cross thread? Or all on the calling thread..


auto remaining_elements = std::make_shared<uint64_t>(container.size());
for (auto element : container) {
update_cb(element, [remaining_elements, done_cb] {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, before we were doing N threads * M providers, but now we just do M providers. Presumably there was something hiding underneath the provider onConfigUpdate implying N concurrency. How does this now go away?

In general, this continuation passing style is really hard to read in C++ (I know it's common in some languages). @kyessenov is there a way to simplify the interfaces here to make them less hard to reason about in terms of concurrency?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously it would be decremented on the per thread callback, but I changed it to be done on the main thread completion callback instead. This eliminated the need to check the concurrency and also made the code no longer run on different threads, removing the need for atomics.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be possible to do something clever around RAII semantics and a shared pointer to avoid some of this complexity, let me try some stuff out.

Snow Pettersen added 3 commits March 25, 2021 20:28
Signed-off-by: Snow Pettersen <snowp@lyft.com>
might not actually simplify things as it adds some complexity around
keeping a shared pointer around

Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo comments, thanks for iterating!
/wait

* of a container that may require asynchronous processing.
*/
template <class ContainerT, class UpdateCbT>
void applyToAllWithCleanup(const ContainerT& container, UpdateCbT update_cb,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document UpdateCbT expectation a bit more, or maybe add more structure around its type that isn't generic (since for example the cleanup arg is known)?

* update callback to delay cleanup until some arbitrary time: the completion callback will be
* invoked once no more references to the provided shared pointer exists.
*
* This provides a thread safe way of tracking the completion of callbacks based on a the elements
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* This provides a thread safe way of tracking the completion of callbacks based on a the elements
* This provides a thread safe way of tracking the completion of callbacks based on the elements

/**
* Invokes a function for all elements in a container, providing a cleanup function that will be
* executed after all the elements have been processed. The Cleanup object is provided to allow each
* update callback to delay cleanup until some arbitrary time: the completion callback will be
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs on Cleanup probably belong inside the method. Probably can just talk about what done_cb does and the thread safety. It did require a couple of mins at staring at this to understand what is going on, but I think this is a pretty cool pattern.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it a bit, lmk what you think. I think some mention of Cleanup is necessary since it's part of the function signature, but I moved most of it into the function body

Snow Pettersen added 2 commits March 30, 2021 15:10
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Snow Pettersen added 2 commits March 31, 2021 15:04
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
htuch
htuch previously approved these changes Mar 31, 2021
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Snow Pettersen added 2 commits March 31, 2021 22:15
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Snow Pettersen added 3 commits April 1, 2021 17:01
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
Signed-off-by: Snow Pettersen <snowp@lyft.com>
@snowp
Copy link
Copy Markdown
Contributor Author

snowp commented Apr 2, 2021

Finally got CI green here, ptal

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@htuch htuch merged commit aea7de2 into envoyproxy:main Apr 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support TTL removals for ECDS

5 participants