Skip to content

Conversation

@larry-safran
Copy link
Contributor

@larry-safran larry-safran commented Feb 1, 2024

Fix retry tests that became flaky. Fixes #10873

@larry-safran larry-safran marked this pull request as ready for review February 2, 2024 21:38
@larry-safran larry-safran changed the title improve debugging Fix flaky retry tests Feb 2, 2024
private final FakeClock fakeClock = new FakeClock();
@Mock
private ClientCall.Listener<Integer> mockCallListener;
private TestListener realMockCallListener = new TestListener();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't use the word "mock" since it's not a mock object. Just testCallListener would do I think, or people ofthen use the term "fake" with these kinds of classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to testCallListener

Comment on lines 177 to 179
@Before
public void setUp() throws Exception {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

if (getTransportTracer() != null) {
getTransportTracer().reportStreamClosed(status.isOk());
}
listener().closed(status, rpcProgress, trailers);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You switched the order of the the listener and transportTracer.
The original tests is about verifying census stats(in statsTraceCtx), how does that related to transportTracer?

@larry-safran
Copy link
Contributor Author

larry-safran commented Feb 2, 2024 via email

@YifeiZhuang
Copy link
Member

when the closed method is called before the tracer report is generated there is a window where the report isn't available, which is what caused the failures.

hmm, I had a theory about why the call closed method is called before those reports I put on the issue.

@larry-safran
Copy link
Contributor Author

larry-safran commented Feb 2, 2024 via email

Copy link
Contributor

@temawi temawi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have enough understanding of the original problem to know if @YifeiZhuang's concern here is warranted, but if she is ok with the change it LGTM.

@YifeiZhuang
Copy link
Member

I agree that the test setup caused the flakiness, not a real problem. One thing to keep in mind: when canceling the retryable stream returns and close and master listener, all the substreams should have been closed already. Because notifying master listener is the last thing we should do in retryable stream. Otherwise there will be resource leak, e.g. RejectedExecutionException.
But I don't see the fix address that problem. In the current fix, the master listener may still happen before the substream listener is closed. It might mitigate the flakiness, but I'm not yet convinced this is the appropriate fix.
cc. @ejona86

@larry-safran larry-safran merged commit 374dbe9 into grpc:master Feb 5, 2024
@larry-safran larry-safran deleted the is_10873 branch February 5, 2024 18:55
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit to larry-safran/grpc-java that referenced this pull request Feb 5, 2024
* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit that referenced this pull request Feb 5, 2024
* Fix retries that timeout hanging forever. (#10855)

Fixes #10336

* Fix flaky retry tests (#10887)

* Reorder tracing and actually closing listener to eliminate test flakiness
* Use real value rather than mock for flaky test
larry-safran added a commit that referenced this pull request Feb 6, 2024
* Fix retries that timeout hanging forever. (#10855)

Fixes #10336

* Fix flaky retry tests (#10887)
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

interop-test, core: retryTest is flaky

3 participants