Skip to content

A61 update: fix ring_hash proactive connection attempt logic#475

Merged
markdroth merged 3 commits into
grpc:masterfrom
markdroth:A61_ring_hash_update
Feb 14, 2025
Merged

A61 update: fix ring_hash proactive connection attempt logic#475
markdroth merged 3 commits into
grpc:masterfrom
markdroth:A61_ring_hash_update

Conversation

@markdroth

Copy link
Copy Markdown
Member

Fix logic for ring_hash proactive connection attempts to trigger even if the state update was not triggered by an endpoint entering TRANSIENT_FAILURE state. This addresses two edge cases spotted by @arjan-bal:

  1. There are four endpoints in the following states: TF, TF, READY, and IDLE. If the READY endpoint fails, it transitions to IDLE, resulting in the new states: TF, TF, IDLE, IDLE.
  2. There are four endpoints in the following states: TF, TF, CONNECTING, and IDLE. If the CONNECTING endpoint is removed, the new states become: TF, TF, IDLE.

Comment thread A61-IPv4-IPv6-dualstack-backends.md Outdated

@ejona86 ejona86 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think talking so much about priority makes things overly complicated. CONNECTING has always implied "the LB policy is connecting." And since we adopted sticky-TF everywhere, TF also implies "the LB policy is connecting (with backoff)". But I understand why priority is being discussed; it just is unfortunate it is being discussed so much.

@markdroth markdroth merged commit 188d3d7 into grpc:master Feb 14, 2025
@markdroth markdroth deleted the A61_ring_hash_update branch February 14, 2025 22:49
copybara-service Bot pushed a commit to grpc/grpc that referenced this pull request Feb 18, 2025
See grpc/proposal#475 for details.

While I was at it, I greatly simplified the logic, avoiding an unnecessary duplicate loop over all endpoints that we didn't need to be doing.

I removed an unnecessary end2end test that was broken by this change.  That test actually should have broken when I first changed ring_hash to delegate to pick_first back in #34244, and I'm not sure why it didn't.  But it definitely broke here when I changed the logic to not choose endpoints in any specific order when triggering connection attempts, and since the test was really intended to test a behavior that we no longer guarantee, I removed it.

In its place, I added a unit test that shows the new behavior in a much clearer way.  (Originally, we didn't have a good way to write unit tests for LB policies, so all of our tests were written as e2e tests.  Now that we do have a good framework for unit tests, we've been slowly finding e2e tests that really should be unit tests instead, and moving them over when we find them.)

Closes #38741

COPYBARA_INTEGRATE_REVIEW=#38741 from markdroth:ring_hash_tf_and_connecting_fix 6c4c773
PiperOrigin-RevId: 728255272
paulosjca pushed a commit to paulosjca/grpc that referenced this pull request Feb 20, 2025
See grpc/proposal#475 for details.

While I was at it, I greatly simplified the logic, avoiding an unnecessary duplicate loop over all endpoints that we didn't need to be doing.

I removed an unnecessary end2end test that was broken by this change.  That test actually should have broken when I first changed ring_hash to delegate to pick_first back in grpc#34244, and I'm not sure why it didn't.  But it definitely broke here when I changed the logic to not choose endpoints in any specific order when triggering connection attempts, and since the test was really intended to test a behavior that we no longer guarantee, I removed it.

In its place, I added a unit test that shows the new behavior in a much clearer way.  (Originally, we didn't have a good way to write unit tests for LB policies, so all of our tests were written as e2e tests.  Now that we do have a good framework for unit tests, we've been slowly finding e2e tests that really should be unit tests instead, and moving them over when we find them.)

Closes grpc#38741

COPYBARA_INTEGRATE_REVIEW=grpc#38741 from markdroth:ring_hash_tf_and_connecting_fix 6c4c773
PiperOrigin-RevId: 728255272
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants