Skip to content

stats: Shared scopes phase 1, changes scope APIs to use ScopeSharedPtr, leaving ScopePtr alias behind#19791

Merged
jmarantz merged 11 commits intoenvoyproxy:mainfrom
jmarantz:shared-scopes-1
Feb 16, 2022
Merged

stats: Shared scopes phase 1, changes scope APIs to use ScopeSharedPtr, leaving ScopePtr alias behind#19791
jmarantz merged 11 commits intoenvoyproxy:mainfrom
jmarantz:shared-scopes-1

Conversation

@jmarantz
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz commented Feb 3, 2022

Commit Message: To support an algorithmic solution to a long burst of CPU on admin /stats with a large # clusters, we need to hold scopes in shared pointers. This PR makes that change and updates the usage within stats to reference scopes as ScopeSharedPtr rather than ScopePtr, and adds enable_shared_from_this as a Scope super-class so existing APIs that pass Scope& don't need to change.

This leaves behind a ScopePtr alias to ScopeSharedPtr temporarily to (a) make this PR be reviewable and (b) avoid breaking other repos that may reference ScopePtr. It will be easier to submit this PR first, then a external repos can rename their references, and we can do a follow-up PR that removes ScopePtr to avoid longer term confusion. That PR would like like #19790 which has 90 changed files in addition to the semantic change. Once this is merged, that PR will just be a strict rename and will be a lot easier to review.

This PR blocks #19693 which implements the admin change requiring holding onto a shared scope ptr.

Additional Description:
Risk Level: low
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

…fined type.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…Ptr to make a smaller 1st review.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@repokitteh-read-only
Copy link
Copy Markdown

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #19791 was opened by jmarantz.

see: more, trace.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Copy Markdown
Contributor Author

jmarantz commented Feb 3, 2022

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #19791 (comment) was created by @jmarantz.

see: more, trace.

@jmarantz jmarantz changed the title stats: Shared scopes phase 1, leaves an alias behind and doesn't change most references. stats: Shared scopes phase 1, leaving an alias behind and doesn't change most references. Feb 3, 2022
@jmarantz jmarantz marked this pull request as ready for review February 3, 2022 16:21
@jmarantz jmarantz changed the title stats: Shared scopes phase 1, leaving an alias behind and doesn't change most references. stats: Shared scopes phase 1, changes scope APIs to use ScopeSharedPtr, leaving ScopePtr alias behind Feb 3, 2022
Copy link
Copy Markdown
Contributor

@pradeepcrao pradeepcrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Josh, this change looks good on its own. I will spend some time tomorrow looking at how this fits in with the rest of the changes.

null_counter_(new NullCounterImpl(symbol_table)),
null_gauge_(new NullGaugeImpl(symbol_table)) {}
null_gauge_(new NullGaugeImpl(symbol_table)),
default_scope_(std::make_shared<ScopePrefixer>("", *this)) {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we run into a situation wherein we have a shared_ptr to ScopePrefixer for the admin panel, but not the underlying scope?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lifetime concerns are not changed by this PR; the Store needs to outlive all the Scopes declared from it.

In general if you instantiate any Scope-derived object on the stack, and you will get some time of failure (e.g. a weak_ptr exception).

That is why I changed default_scope_ to be allocated with make_shared, enabling #19693 to hold onto a reference to it, and mirroring the implementation in thread_local_store.cc.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…g over the default scope an extra time.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@pradeepcrao
Copy link
Copy Markdown
Contributor

The main doubt I have is that can the scopes be held by the admin page indefinitely, or is there a timeout mechanism? If the scopes live through several xDS updates, there is the possibility of Envoy OOMing.

@jmarantz
Copy link
Copy Markdown
Contributor Author

jmarantz commented Feb 7, 2022

This is a very insightful point! Let's dive in.

I don't think in practice this is a problem assuming requests have server-enforced timeouts. At the end of any request the shared_ptrs will be released. But it would be good to see if that's actually true.

Note we trust admin clients (a malicious one could just send /quitquitquit) but a broken admin client could just read the first chunk from a /stats request and then leave the connection open and start a new one. Note also that won't leak scopes quickly because it will just bump ref-counts on scopes that are held anyway in the configuration. The concern would be if xDS updates the config and drops the scope, held alive only by an in-progress admin request.

In the end I think it's request timeouts that prevent this from being a problem.

@jmarantz
Copy link
Copy Markdown
Contributor Author

jmarantz commented Feb 7, 2022

Also that's not an issue for this PR which is not changing the Admin behavior at all.

@pradeepcrao
Copy link
Copy Markdown
Contributor

This is a very insightful point! Let's dive in.

I don't think in practice this is a problem assuming requests have server-enforced timeouts. At the end of any request the shared_ptrs will be released. But it would be good to see if that's actually true.

Note we trust admin clients (a malicious one could just send /quitquitquit) but a broken admin client could just read the first chunk from a /stats request and then leave the connection open and start a new one. Note also that won't leak scopes quickly because it will just bump ref-counts on scopes that are held anyway in the configuration. The concern would be if xDS updates the config and drops the scope, held alive only by an in-progress admin request.

In the end I think it's request timeouts that prevent this from being a problem.

Thanks, that was my only concern.

LGTM

@jmarantz
Copy link
Copy Markdown
Contributor Author

jmarantz commented Feb 7, 2022

Thanks Pradeep

@ggreenway can you take a look?

@jmarantz
Copy link
Copy Markdown
Contributor Author

@ggreenway will you be able to take a look sometime? Or shall I re-assign?

Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks fine. I'd like to get @mattklein123 's opinion on this also.

virtual ~Scope() = default;

/** @return a shared_ptr for this */
ScopeSharedPtr makeShared() { return shared_from_this(); }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be named getShared() or similar; makeShared() is too similar to std::make_shared() but this one doesn't construct a new object.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point: done.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…e_shared which creates a new object.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with small nit. I have some vague concerns about circular references and memory leaks but we will know soon enough. :)

/wait

// files with trivial changes. Furthermore, code that depends on the Envoy stats
// infrastructure that's not in this repo will stop compiling when we remove
// ScopePtr. We should fully remove this alias in a future PR and change all the
// references, once known parthers that might break from this change of a chance
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// references, once known parthers that might break from this change of a chance
// references, once known consumers that might break from this change have a chance

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good; done by hand to avoid what I fear would be a DCO violation :) Committing shortly.

void forEachScope(SizeFn f_size, StatFn<const Scope> f_stat) const override {
if (f_size != nullptr) {
f_size(1);
f_size(scopes_.size() + 1);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated to this change but because of this change we can now do the API correctly? Is that right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, which API are you referring to?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forEachScope

@jmarantz
Copy link
Copy Markdown
Contributor Author

At the moment scopes don't reference other scopes so I'm not seeing where the circular reference would come from.

But @pradeepcrao also pointed out that we might hold onto shared_ptr<scope> keeping it alive longer if there is a long-running admin operation. But I think that scopes appearing/disappearing a lot slower than even a long admin request.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Check envoy-presubmit didn't fail.

🐱

Caused by: a #19791 (comment) was created by @jmarantz.

see: more, trace.

@jmarantz
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Check envoy-presubmit didn't fail.

🐱

Caused by: a #19791 (comment) was created by @jmarantz.

see: more, trace.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz dismissed ggreenway’s stale review February 16, 2022 15:20

addressed the requested change.

@jmarantz jmarantz enabled auto-merge (squash) February 16, 2022 15:21
@jmarantz jmarantz merged commit 5671b08 into envoyproxy:main Feb 16, 2022
@jmarantz jmarantz deleted the shared-scopes-1 branch February 16, 2022 15:56
jmarantz added a commit that referenced this pull request Apr 20, 2022
…eSharedPtr (#20888)

Commit Message: Follow-up to #19791, #19790 and #20871, finishing off all remaining ScopePtr references. One more PR to go which will be to remove the ScopePtr definition, but first need to make sure this doesn't break any references outside the repo.

Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

Signed-off-by: Joshua Marantz <jmarantz@google.com>
ravenblackx pushed a commit to ravenblackx/envoy that referenced this pull request Jun 8, 2022
…eSharedPtr (envoyproxy#20888)

Commit Message: Follow-up to envoyproxy#19791, envoyproxy#19790 and envoyproxy#20871, finishing off all remaining ScopePtr references. One more PR to go which will be to remove the ScopePtr definition, but first need to make sure this doesn't break any references outside the repo.

Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

Signed-off-by: Joshua Marantz <jmarantz@google.com>
oschaaf pushed a commit to maistra/envoy that referenced this pull request Oct 26, 2022
…eSharedPtr (#20888)

Commit Message: Follow-up to envoyproxy/envoy#19791, #19790 and #20871, finishing off all remaining ScopePtr references. One more PR to go which will be to remove the ScopePtr definition, but first need to make sure this doesn't break any references outside the repo.

Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants