client channel: don't hold mutexes while calling the ConfigSelector or the LB picker #31973

Merged
markdroth merged 21 commits into grpc:master from markdroth:client_channel_picker_mutex on Feb 7, 2023
Conversation

@markdroth (Member) commented Dec 23, 2022

Specific changes:

  • We no longer hold the relevant mutexes while calling the ConfigSelector and the LB picker. Instead, each call holds the mutex for just long enough to grab a ref to the object. This has a bunch of side effects:
    • Several LB policies have been updated to use synchronization in their pickers, since picks can now be done concurrently.
    • Now that we will be holding refs to the ConfigSelector and LB picker outside of the WorkSerializer, we need to explicitly hop back into the WorkSerializer to release the refs, since ConfigSelectors and picker implementations should be able to expect that they are destroyed in the WorkSerializer. (A couple of LB policies actually had some code to take care of this themselves, but that shouldn't be necessary, so I've removed it.)
    • Instead of trying to keep a call queued until we know that it can proceed, we now remove calls from the queue as soon as we get a resolver update or a new picker and re-process them; if re-processing indicates that a call should still be queued, we re-add it to the queue. This approach is slightly less efficient for queued calls (each re-add involves some memory allocation and potential lock contention), but since that's the slow path, this seems like a good trade-off for the improved concurrency in the fast path.
    • Not holding the lock while calling the picker introduces a race condition where the picker can be updated after a call grabs a ref to the old picker. If the old picker queues the call, we need to immediately retry with the new picker. To address this, when the picker queues a call, we re-acquire the lock and check to see if the picker has changed: if so, we retry with the new picker, and if not, we queue the call.
  • Previously, when we re-processed queued calls, we took advantage of the fact that we were already holding the lock to synchronously process all queued calls under the lock, before then doing an async callback for each call to proceed after the lock is released. Now, we instead immediately do an async callback for each call that goes through the same code-path as non-queued calls (which is a significant code simplification), so each callback needs to independently re-acquire the lock. I'm not actually sure whether this will hurt or improve performance -- it could increase lock contention, but depending on how the async callbacks are scheduled, it could actually increase parallelism -- but even if the performance here turns out to be worse, this is the slow path, so the trade-off seems worth the code simplification.
  • We now use EventEngine::Run() instead of ExecCtx::Run() for re-processing queued calls. This avoids the need to store a grpc_closure for each of name resolution and load balancing, which reduces per-call memory.
  • We now use absl::flat_hash_set<> to store queued calls instead of a bespoke linked list approach. This allocates memory on the slow path, but it reduces per-call memory usage. (It also makes it more efficient to cancel an individual call when there are a large number of queued calls.)
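The ref-grab and retry-on-queue logic described above can be sketched as follows. This is a minimal illustration with hypothetical names (`Picker`, `ClientChannelSketch`), not the actual client channel code: the mutex is held only long enough to take a ref to the picker, the pick itself runs unlocked, and a queued result triggers a re-check for a newer picker.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

enum class PickResult { kComplete, kQueue };

// Stand-in for LoadBalancingPolicy::SubchannelPicker; the real
// interface is much richer.
struct Picker {
  PickResult result;
  PickResult Pick() const { return result; }
};

class ClientChannelSketch {
 public:
  void SetPicker(std::shared_ptr<Picker> p) {
    std::lock_guard<std::mutex> lock(mu_);
    picker_ = std::move(p);
  }

  // Returns true if the pick completed, false if the call was queued.
  // Assumes a picker has been set.
  bool PickCall() {
    std::shared_ptr<Picker> picker = GrabPickerRef();
    while (true) {
      // Call the picker with no lock held, so picks can run
      // concurrently with picker updates.
      PickResult result = picker->Pick();
      if (result == PickResult::kComplete) return true;
      // Pick was queued: re-acquire the lock and check whether a
      // newer picker was installed while we were unlocked.
      std::lock_guard<std::mutex> lock(mu_);
      if (picker_ == picker) {
        // Picker unchanged: leave the call queued (the real code
        // re-adds it to the queue here).
        return false;
      }
      picker = picker_;  // newer picker installed; retry with it
    }
  }

 private:
  // Hold mu_ only long enough to grab a ref to the current picker.
  std::shared_ptr<Picker> GrabPickerRef() {
    std::lock_guard<std::mutex> lock(mu_);
    return picker_;
  }

  std::mutex mu_;
  std::shared_ptr<Picker> picker_;
};
```

The loop terminates because each retry uses a strictly newer picker; in the real code the retry races with further updates, but each iteration makes progress against the picker it observed.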

Also a couple of tangential changes while I was here:

  • Switch from rand(3) to the absl random library in a few places.
  • Update comments and remove some dead code from SubchannelWrapper.
  • Also use absl::flat_hash_set<> instead of std::set<> for the set of subchannel wrappers.

@markdroth markdroth changed the title client channel: don't hold a mutex while calling the picker client channel: don't hold mutexes while calling the ConfigSelector or the LB picker Jan 25, 2023
@markdroth (Member, Author) left a comment

I also changed another existing place where we were using std::set<> to use absl::flat_hash_set<> instead.

@markdroth markdroth merged commit e699e01 into grpc:master Feb 7, 2023
@copybara-service bot added the `imported` label (Specifies if the PR has been imported to the internal repository) Feb 8, 2023
markdroth added a commit that referenced this pull request Feb 8, 2023
ctiller pushed a commit that referenced this pull request Feb 8, 2023
markdroth added a commit to markdroth/grpc that referenced this pull request Feb 8, 2023
markdroth added a commit that referenced this pull request Feb 16, 2023
…ConfigSelector or the LB picker (#32326)

Original attempt was #31973, reverted in #32324 due to test flakiness.

There were two problems causing test flakiness here.

The first problem was that, upon resolver error, we were dispatching an
async callback to re-process each of the queued picks *before* we
updated the channel's connectivity state, which meant that the queued
picks might be re-processed in another thread before the new
connectivity state was set, so tests that expected the state to be
TRANSIENT_FAILURE once RPCs failed might not see the expected state.
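A minimal sketch of that ordering fix, with hypothetical names (`run_queue` stands in for callbacks dispatched via EventEngine::Run()): the connectivity state is written before the re-processing callback is scheduled, so the callback can only observe the new state.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum class ChannelState { kIdle, kTransientFailure };

struct ChannelSketch {
  ChannelState state = ChannelState::kIdle;
  // Stand-in for callbacks handed to EventEngine::Run().
  std::vector<std::function<void()>> run_queue;
  ChannelState observed = ChannelState::kIdle;

  void OnResolverError() {
    // 1. Update the channel's connectivity state first...
    state = ChannelState::kTransientFailure;
    // 2. ...then dispatch the async callback that re-processes a
    // queued pick; whenever it runs, it sees the new state.
    run_queue.push_back([this] { observed = state; });
  }

  void DrainRunQueue() {
    for (auto& cb : run_queue) cb();
    run_queue.clear();
  }
};
```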

The second problem affected the xDS ring hash tests, and it's a bit more
involved to explain.

We have an e2e test that simulates an aggregate cluster failover from a
primary cluster using ring_hash at startup. The primary cluster has two
addresses, both of which are unreachable when the client starts up, so
the client should immediately fail over to the secondary cluster, which
does have reachable endpoints. The test requires that no RPCs are failed
while this failover occurs. The original PR made this test flaky.

The problem here was caused by a combination of two factors:

1. Prior to the original PR, when the picker was updated (which happens
inside the WorkSerializer), we re-processed previously queued picks
synchronously, so it was not possible for another subchannel
connectivity state update (which also happens in the WorkSerializer) to
be processed between the time that we updated the picker and the time
that we re-processed the previously queued picks. The original PR
changed this such that the queued picks are re-processed asynchronously
(outside of the WorkSerializer), so it is now possible for a subchannel
connectivity state update to be processed between when the picker is
updated and when we re-process the previously queued picks.

2. Unlike most LB policies, where the picker does not see updated
subchannel connectivity states until a new picker is created, the
ring_hash picker gets the subchannel connectivity states from the LB
policy via a lock, so it can wind up seeing the new states before it
gets updated. This means that when a subchannel connectivity state
update is processed by the ring_hash policy in the WorkSerializer, it
will immediately be seen by the existing picker, even without a picker
update.
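Factor 2 can be illustrated with a small sketch (hypothetical names): because the picker reads the subchannel state out of storage shared with the LB policy under a lock, an *existing* picker immediately observes a state written after the picker was created.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

enum class ConnState { kIdle, kConnecting, kTransientFailure };

// State shared between the LB policy and its pickers, guarded by a lock.
struct SharedStates {
  std::mutex mu;
  ConnState subchannel = ConnState::kIdle;
};

// Sketch of the old ring_hash picker behavior: it reads the live state
// under the lock, so it can see states newer than the ones it was
// built from, without any picker update.
struct LockSharingPicker {
  std::shared_ptr<SharedStates> states;  // shared with the LB policy
  ConnState Observe() {
    std::lock_guard<std::mutex> lock(states->mu);
    return states->subchannel;
  }
};
```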

With those two points in mind, the sequence of events in the failing
test was as follows:

1. The pick is attempted in the ring_hash picker for the primary
cluster. This causes the first subchannel to attempt to connect.
2. The subchannel transitions from IDLE to CONNECTING. A new picker is
returned due to the subchannel connectivity state change, and the
channel retries the queued pick. The retried pick is done
asynchronously, but in this case it does not matter: the call will be
re-queued.
3. The connection attempt fails, and the subchannel reports
TRANSIENT_FAILURE. A new picker is again returned, and the channel
retries the queued pick. The retried pick is done asynchronously, but in
this case it does not matter: this causes the picker to trigger a
connection attempt for the second subchannel.
4. The second subchannel transitions from IDLE to CONNECTING. A new
picker is again returned, and the channel retries the queued pick. The
retried pick is done asynchronously, and in this case it *does* matter.
5. The second subchannel now transitions to TRANSIENT_FAILURE. The
ring_hash policy will now report TRANSIENT_FAILURE, but before it can
finish that...
6. ...In another thread, the channel now tries to re-process the queued
pick using the CONNECTING picker from step 4. However, because the
ring_hash policy has already seen the TRANSIENT_FAILURE report from the
second subchannel, that picker will now fail the pick instead of queuing
it.

After discussion with @ejona86 and @dfawley (since this bug actually
exists in Java and Go as well), we agreed that the right solution is to
change the ring_hash picker to contain its own copy of the subchannel
connectivity state information, rather than sharing that information
with the LB policy using synchronization.
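The agreed fix can be sketched like this (hypothetical names): the picker takes its own copy of the subchannel state at construction, so later updates in the policy are invisible to it until a new picker is created.

```cpp
#include <cassert>

enum class ConnState { kIdle, kConnecting, kTransientFailure };

// Sketch of the fix: the picker snapshots the state it was built from,
// rather than sharing live state with the LB policy via a lock.
class SnapshotPicker {
 public:
  explicit SnapshotPicker(ConnState s) : snapshot_(s) {}
  ConnState Observe() const { return snapshot_; }

 private:
  const ConnState snapshot_;  // copied at picker creation; never changes
};
```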
XuanWang-Amos pushed a commit to XuanWang-Amos/grpc that referenced this pull request May 1, 2023
…r the LB picker (grpc#31973)

* WIP

* use the same lock for queue and picker -- still need to figure out how to drain queue safely

* working!

* always unref pickers in WorkSerializer

* simplify tracking of queued calls

* fix sanity

* simplify resolver queueing code

* fix tsan failure

* simplify queued LB pick reprocessing

* minor cleanups

* remove unnecessary wrap-around checks

* clang-format

* generate_projects

* fix pollset bug

* use absl::flat_hash_set<> instead of std::set<>

* fix use-after-free in retry code

* add missing BUILD dep
XuanWang-Amos pushed a commit to XuanWang-Amos/grpc that referenced this pull request May 1, 2023
XuanWang-Amos pushed a commit to XuanWang-Amos/grpc that referenced this pull request May 1, 2023
wanlin31 pushed a commit that referenced this pull request May 18, 2023
wanlin31 pushed a commit that referenced this pull request May 18, 2023
wanlin31 pushed a commit that referenced this pull request May 18, 2023
markdroth added a commit that referenced this pull request Aug 25, 2023
- Add a couple of new TODOs that we need to address before making the LB
policy API public.
- Remove a comment about picker thread-safety that should have been
removed in #31973.
- Remove a TODO that was addressed in #33451.

Labels

bloat/medium, imported (Specifies if the PR has been imported to the internal repository), lang/core, per-call-memory/neutral, per-channel-memory/neutral, release notes: no (Indicates if PR should not be in release notes)

Projects

None yet

3 participants