Skip to content

Handle Aggregate cluster loops and dups#10274

Merged
larry-safran merged 9 commits intogrpc:masterfrom
larry-safran:ag_clust_loop_and_dups
Jun 15, 2023
Merged

Handle Aggregate cluster loops and dups#10274
larry-safran merged 9 commits intogrpc:masterfrom
larry-safran:ag_clust_loop_and_dups

Conversation

@larry-safran
Copy link
Contributor

@larry-safran larry-safran commented Jun 9, 2023

For duplicate children, just ignore them
For loops, Put the channels into TRANSIENT_FAILURE like when no edge policy is provided

Note, to handle loop detection being done after state objects are created, I updated FakeXdsClient watcher map value to be a collection (as is true for XdsClientImpl).

Fixes go/grpc-lb-xds-tracker row 22 (not tracked as an issue)

…tors of the current CDS aggregate. Also put in a code path to do NACK instead of handling loops as that is what the other languages do.
@larry-safran larry-safran requested a review from temawi June 12, 2023 20:13
Comment on lines +215 to +217
if (childLb != null) {
childLb.shutdown();
childLb = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the previous loop guarantee that childLB is always shut down if loopStatus is set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true. Removed it.


// CLUSTER (aggr.) -> [cluster2 (aggr.)]
String cluster2 = "cluster-02.googleapis.com";
update =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This update just swaps CLUSTER from having cluster1 to cluster2, right? Why does this need to be done in this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Significantly rewrote this test as it wasn't testing what I intended it to.


// cluster2 (aggr.) -> [cluster3 (EDS)]
String cluster3 = "cluster-03.googleapis.com";
CdsUpdate update2 =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not reuse the "update" variable like before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it is actually cleaner for each update to be in a different variable so you can see in the debugger exactly what has been done and it is easier to spot when you are accidentally forgetting to update the variable before delivering the update.

Comment on lines +540 to +541
Status unavailable = Status.UNAVAILABLE.withDescription(
"CDS error: found 0 leaf (logical DNS or EDS) clusters for root cluster " + CLUSTER);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test for missing leaf clusters as it seems, or for duplicate children as the name applies? Where are duplicates added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

…ed aggregations (same cluster appears under 2 distinct parents that doesn't create a loop).
@larry-safran larry-safran requested a review from temawi June 14, 2023 01:06
Copy link
Contributor

@temawi temawi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it would be worth it to also do an integration test with a control plane and an XdsClient. The unit testing here is fundamental, but sometimes the interactions of load balancers in the LB tree can be unexpected.

This is actually pretty easy to do, take a look at FakeControlPlaneXdsIntegrationTest. You could ignore the DataPlaneRule part.

Optional of course if you feel confident that the unit tests are covering all the bases.

private void handleClusterDiscovered() {
List<DiscoveryMechanism> instances = new ArrayList<>();
Map<ClusterState, List<ClusterState>> parentClusters = new HashMap<>();
Status loopStatus = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might make sense to add a comment explaining that we have this loop detection built into the following logic and why it is important to have it.

Comment on lines +322 to +324
if (childClusterStates == null) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add this short circuit? It's better to have a single return point from a method and this change doesn't seem to affect the logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was trying to reduce nesting if { for { if { } } }. Rewrote it to use a stream instead of a for with an if.

@larry-safran
Copy link
Contributor Author

Because we are using xdsClient.deliverCdsUpdate, I believe that the FakeControlPlane would be running through the exact same logic paths.

@larry-safran larry-safran requested a review from temawi June 14, 2023 22:46
@larry-safran larry-safran merged commit 9939200 into grpc:master Jun 15, 2023
@larry-safran larry-safran deleted the ag_clust_loop_and_dups branch June 15, 2023 17:24
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants