Skip to content

AsyncClientFactory: use getThreadLocalCluster for thread safety#15227

Merged
mattklein123 merged 12 commits intoenvoyproxy:mainfrom
wozz:patch-1
Mar 4, 2021
Merged

AsyncClientFactory: use getThreadLocalCluster for thread safety#15227
mattklein123 merged 12 commits intoenvoyproxy:mainfrom
wozz:patch-1

Conversation

@wozz
Copy link
Copy Markdown
Contributor

@wozz wozz commented Feb 28, 2021

Commit Message: AsyncClientFactory: use getThreadLocalCluster for thread safety
Additional Description: due to #14204, AsyncClientFactory should use ClusterManager::getThreadLocalCluster to ensure that checking the cluster status can be performed on main or worker threads.
Risk Level: low
Testing: existing tests
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]

due to envoyproxy#14204, doing static cluster checks for AsyncClientFactory can only be done on the main thread. adds a ENVOY_BUG check to ensure that cluster checks are skipped when not running on the main thread.

Signed-off-by: Michael Wozniak <wozz@koh.ms>

Signed-off-by: wozz <wozz@users.noreply.github.com>
Signed-off-by: Michael Wozniak <wozz@koh.ms>
@wozz wozz requested a review from zuercher as a code owner February 28, 2021 22:53
Signed-off-by: Michael Wozniak <wozz@koh.ms>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this check. One question.

/wait

Signed-off-by: Michael Wozniak <wozz@koh.ms>
@wozz wozz changed the title AsyncClientFactory: add bug check for main thread AsyncClientFactory: add assert for main thread Mar 1, 2021
Comment on lines +36 to +37
ASSERT(Thread::MainThread::isMainThread(),
"async client factory cluster checks should only be performed on the main thread");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really sorry, I should have looked at this in more detail, but instead of using the clusters() api should we just be using the get() API which is thread local? This code seems broken. If we did that could we remove the skip clusters check param entirely?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can use the get api, but the skip_clusters_check would become require_static_cluster? I'm not sure what the implications are of a cluster being statically defined vs provided by API or if there are cases that actually require statically defined clusters or if that requirement is due to some legacy behavior that's no longer necessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ^ exactly, thanks.

/wait

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the utility function has the same issue, relying on ClusterManager::clusters(), so I've just updated this one instance to use getThreadLocalCluster and maintain the same semantics as previously for the skip_cluster_checks param.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The async client factory should only ever be used on the main thread. I think it might be worth asserting that, since you probably will get other kinds of broken behavior if you try use it on a worker thread.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example where it's currently being used on a worker thread: https://github.com/envoyproxy/envoy/blob/main/source/extensions/filters/http/ext_authz/config.cc#L80..L89

Does that need to change?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point actually, I think what you have makes sense then. CC @chaoqin-li1123 for visibility into this PR.

Signed-off-by: Michael Wozniak <wozz@koh.ms>
@wozz wozz changed the title AsyncClientFactory: add assert for main thread AsyncClientFactory: use getThreadLocalCluster for thread safety Mar 1, 2021
Signed-off-by: Michael Wozniak <wozz@koh.ms>
wozz added 4 commits March 1, 2021 23:20
Signed-off-by: Michael Wozniak <wozz@koh.ms>
Signed-off-by: Michael Wozniak <wozz@koh.ms>
Signed-off-by: Michael Wozniak <wozz@koh.ms>
Signed-off-by: Michael Wozniak <wozz@koh.ms>

using AsyncClientFactoryPtr = std::unique_ptr<AsyncClientFactory>;

enum class AsyncClientFactoryClusterChecks { Skip, ValidateStatic, ValidateStaticDuringBootstrap };
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is ValidateStaticDuringBootstrap used in prod code? I don't see it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's effectively used here: https://github.com/envoyproxy/envoy/blob/main/source/server/server.cc#L620..L621 in factoryForGrpcApiConfigSource

This is the place where getThreadLocalCluster doesn't work.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK got it. It really seems like we should be able to do something better here but I'm struggling to think of what that better thing is so I agree we can go with this right now. I will take a look once passing CI.

Signed-off-by: Michael Wozniak <wozz@koh.ms>
Signed-off-by: Michael Wozniak <wozz@koh.ms>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123 mattklein123 merged commit 296d106 into envoyproxy:main Mar 4, 2021
wozz added a commit to wozz/envoy that referenced this pull request Mar 11, 2021
…ty (envoyproxy#15227)"

This reverts commit 296d106.

Signed-off-by: Michael Wozniak <wozz@koh.ms>
mattklein123 pushed a commit that referenced this pull request Mar 12, 2021
…ty (#15227)" (#15449)

This reverts commit 296d106.

Signed-off-by: Michael Wozniak <wozz@koh.ms>
aunu53 pushed a commit to aunu53/envoy that referenced this pull request Mar 19, 2021
…ty (envoyproxy#15227)" (envoyproxy#15449)

This reverts commit 296d106.

Signed-off-by: Michael Wozniak <wozz@koh.ms>
Signed-off-by: Auni Ahsan <auni@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants