chore: Fix flaky testCancelOuterFutureAfterStart test#2006
Conversation
gax-java/gax/src/test/java/com/google/api/gax/retrying/ScheduledRetryingExecutorTest.java
Show resolved
Hide resolved
| Thread.sleep(30L); | ||
| // The test sleeps a duration long enough to ensure that the future has been submitted for | ||
| // execution | ||
| Thread.sleep(50L); |
There was a problem hiding this comment.
So the sleep time should be long enough to ensure the future has been submitted, but also short enough so the future is not completed(<275ms in the example)? If that's the case, can we make the retry delay longer and make this sleep time longer? So that we have higher chance of the future being submitted?
There was a problem hiding this comment.
It is possible to make the retry delay longer, but that doesn't eliminate the possibility that RRD is calculated as multiple consecutive low values as it is calculated between 0...RETRY_DELAY. Increasing the retry delay only makes it less likely.
I believe removing Jitter (only for testing) would make the RRD values consistent and guarantee no flakes.
There was a problem hiding this comment.
Yes the removal of jitter should make it much more stable, but I think there is still a chance that the future is not submitted after 50 ms if the CPU is overwhelmed.
If I'm understanding correctly, the first retry is at 25ms and the last retry is at 275ms, so the sleep time should between 25 and 275. However. there is no guarantee that the first retry is at 25ms since it's a different thread, so there is a chance that after 50ms, future is not started yet. Let me know if it makes sense.
There was a problem hiding this comment.
I see. I can a pick a number closer to 175ms (right before the 4th retry occurs) ... say 150ms? Obviously, still no guarantee that the future runs before, but would be extremely unlikely the CPU is stuck that long.
I can add in another check the ensure that the future runs at least once.
There was a problem hiding this comment.
I can a pick a number closer to 175ms (right before the 4th retry occurs) ... say 150ms?
If it's too close to the end, then we may have a problem on the other side, that the sleep may take too long so the future is already completed. That's why I was suggesting maybe we can have the retry delay to 1000ms, so that the sleep time 150ms is long enough for 25 ms, and short enough for 1000ms.
I can add in another check the ensure that the future runs at least once.
Actually I get this from your comment, I'm not sure if it's worth it to add one. I wouldn't add it if it would make the test more flaky and does not add much value.
There was a problem hiding this comment.
Ok yeah agreed that 150ms might cut it too close to the other end. I can change the retry multiplier so that the max retry delay will hit 1000ms and buy us enough time.
The reason for the check to ensure it runs once is to check that the future from executor.submit() was run. The first attempt is always scheduled to "run immediately" with 0ms delay (should be be able to be scheduled within say a sleep time of 150ms) so I don't think there would be any flakes caused from that.
|
[gapic-generator-java-root] Kudos, SonarCloud Quality Gate passed!
|
|
[java_showcase_integration_tests] Kudos, SonarCloud Quality Gate passed!
|
|
[java_showcase_unit_tests] Kudos, SonarCloud Quality Gate passed!
|








Thank you for opening a Pull Request! For general contributing guidelines, please refer to contributing guide
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #1498
Reproduced:
Had some difficulty reproducing this with 20k invocations. Lowered the InitialRetryDelay value to allow less variance for the Randomized Retry Delay and was able to reproduce this fairly easily.
Setting the InitialRetryDelay to 100ms and got:
Ran this with 20k invocations with 0 flakes with jitter off.
I believe the reason this test hasn't flaked as much is due to the extremely low (but non-zero) odds of from four RRD values.