Skip to content

chore: Fix testCancelOuterFutureBeforeStart flaky test#1583

Merged
lqiu96 merged 11 commits intomainfrom
main-fix_testCancelOuterFutureBeforeStart
Apr 14, 2023
Merged

chore: Fix testCancelOuterFutureBeforeStart flaky test#1583
lqiu96 merged 11 commits intomainfrom
main-fix_testCancelOuterFutureBeforeStart

Conversation

@lqiu96
Copy link
Member

@lqiu96 lqiu96 commented Mar 29, 2023

Thank you for opening a Pull Request! For general contributing guidelines, please refer to contributing guide

Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #1342☕️

@product-auto-label product-auto-label bot added the size: xs Pull request size is extra small. label Mar 29, 2023
@Test
public void testCancelOuterFutureBeforeStart() throws Exception {
FailingCallable callable = new FailingCallable(4, "request", "SUCCESS", tracer);
FailingCallable callable = new FailingCallable(0, "request", "SUCCESS", tracer);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter is expectedFailuresCount and it should be 0. The callable's call() function should not be invoked.

@lqiu96
Copy link
Member Author

lqiu96 commented Mar 29, 2023

I believe the issue is that there was a slight timing issue with when the tracer would run. The ScheduledExectur would schedule the callable to run immediately. The subsequent retries would not run since the future has been cancelled, but the first one always runs (as the first run doesn't check if the future is cancelled.

@product-auto-label product-auto-label bot added size: s Pull request size is small. and removed size: xs Pull request size is extra small. labels Mar 29, 2023
@lqiu96 lqiu96 requested a review from blakeli0 March 29, 2023 21:11
@lqiu96 lqiu96 marked this pull request as ready for review March 29, 2023 21:11
@lqiu96 lqiu96 requested a review from a team March 29, 2023 21:11
@lqiu96 lqiu96 added the owlbot:run Add this label to trigger the Owlbot post processor. label Mar 29, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Mar 29, 2023
@burkedavison
Copy link
Member

LGTM

@lqiu96 lqiu96 added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Mar 30, 2023
@lqiu96
Copy link
Member Author

lqiu96 commented Mar 30, 2023

Thanks @burkedavison. Talked with @blakeli0 about this and there are some concerns about this being a behavior breaking change. There some underlying details that I'm not fully certain about, so I'm going to hold off on merging this in.


Interestingly, the only non-testing related call of executor.submit() is here: https://github.com/googleapis/gapic-generator-java/blob/3f6c4c04fde254cfc3c1441b44beeb2bbe0c7e03/gax-java/gax/src/main/java/com/google/api/gax/retrying/CallbackChainRetryingFuture.java#L136-L138 which checks if the future is not done.

I wonder if we can do a similar check inside the test, something like:

    if (!future.isDone()) {
      future.setAttemptFuture(executor.submit(future));
    }
    assertEquals(0, future.getAttemptSettings().getAttemptCount());

    // Assert that the callable's call() is not invoked
    verifyNoInteractions(tracer);

The gax implementation doesn't seem to run if it's been cancelled. This test would remove the flakiness, but would also probably render this test useless and we might as well remove it. I'll need to do some more digging into this...

@lqiu96
Copy link
Member Author

lqiu96 commented Mar 30, 2023

Actually, on a second look... maybe I can fix this via the tests itself?

FailingCallable could mimic what we have in AttemptCallable:
https://github.com/googleapis/gapic-generator-java/blob/3f6c4c04fde254cfc3c1441b44beeb2bbe0c7e03/gax-java/gax/src/main/java/com/google/api/gax/rpc/RetryingCallable.java#L61-L63

We can set the externalFuture in the FailingCallable and check if the externalFuture.isDone() is true before calling the tracer logic.

@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: s Pull request size is small. labels Mar 31, 2023

verifyNoMoreInteractions(tracer);
// Assert that no tracer attempt has run
verifyNoInteractions(tracer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how useful is this verification now as we are basically verifying the behavior of our own MockCallable. As you mentioned in other comments, we are always checking that the future is not done before submitting it to an retrying executor, so the current flakiness is actually expected since we are not checking if the future is done again in the submit method itself.
That being said, I don't think we want to mimic the behavior in MockCallable as that's outside the scope of the unit tests of AbstractRetryingExecutorTest. My suggestion is that we keep this test case but remove this line of verification if we don't find any other benefit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MockCallable's tracer is to mimic the functionality inside AttemptCallable (which is what I believe FailingCallable is based on): https://github.com/googleapis/gapic-generator-java/blob/4bb0a5e3ce78878841e87a47cd457995ea9ed87e/gax-java/gax/src/main/java/com/google/api/gax/rpc/AttemptCallable.java#L78-L85

The tracer's functionality for the MockCallable can also be covered with the assertion below: assertEquals(0, future.getAttemptSettings().getAttemptCount());

so I'm fine with removing.

# Testing maven artifacts
maven.junit_junit=junit:junit:4.13.2
maven.org_mockito_mockito_core=org.mockito:mockito-core:2.28.2
maven.org_mockito_mockito_core=org.mockito:mockito-core:4.11.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a required change? If not, I would leave it out of this PR, and create another PR for updating the mockito version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this was failing the CI when running the integration tests. I can upgrade this in a separate PR.

@lqiu96 lqiu96 added owlbot:run Add this label to trigger the Owlbot post processor. and removed do not merge Indicates a pull request not ready for merge, due to either quality or timing. labels Apr 10, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Apr 10, 2023
@lqiu96 lqiu96 closed this Apr 10, 2023
@lqiu96 lqiu96 reopened this Apr 10, 2023
// Use MockCallable instead of FailingCallable as we do not need the request to be re-tried.
// For this test, the callable should attempt to run and then see that the external future
// is cancelled and that the callable's execution has ended.
MockCallable callable = new MockCallable("SUCCESS");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't verify the tracer anymore, do we still need this new MockCallable? Can we keep using FailingCallable? I think the flakiness was only from verifyNoMoreInteractions()?

Copy link
Member Author

@lqiu96 lqiu96 Apr 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yep! Actually, on a second look I was thinking that we could add in a reference for the externalFuture for FailingCallable and check that the externalFuture's isDone() is true. It would keep it similar to how it's done for AttemptCallable: https://github.com/googleapis/gapic-generator-java/blob/99ade2be85d33e166ea31ea803dffcef2d402058/gax-java/gax/src/main/java/com/google/api/gax/rpc/AttemptCallable.java#L79-L81. However, this would add changes for all the other tests that use FailingCallable. We would be able to keep the verifyNoMoreInteractions() and ensure the tracer logic isn't run.

If this is too much, I can revert the previous commit and just remove in the verify() call. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, I see that you already updated with the new approach.

@product-auto-label product-auto-label bot added size: s Pull request size is small. and removed size: m Pull request size is medium. labels Apr 10, 2023
callable.setExternalFuture(future);
boolean res = future.cancel(false);

verifyNoMoreInteractions(tracer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we still verify it as the last step like previously? Or it has to be verified earlier now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to verify it here (before the executor submits the future) to ensure that the callable's logic isn't invoked (not invoking == exiting once it realizes the external future is done) after it's been submitted for execution.

Otherwise we can use verifyNoInteractions(tracer) and put that at the end which requires an updating mockito version in dependencies.properties. Functionality-wise, they should be the same so I just used verifyNoMoreInteractions which already exists

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are they the same functional-wise? The callable would never run until executor.submit(future), and I think the purpose of this test is to verify that the inner callable would never run if the future is already cancelled, despite that the future is submitted. So verifying after executor.submit(future) makes more sense to me.

Copy link
Member Author

@lqiu96 lqiu96 Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner callable does run, but does not execute any logic -- call() is invoked but then immediately exited as the external future is always done. Since this is the first attempt, the random retry delay is set to 0 and is scheduled to execute immediately. I was thinking that this would create a possible race between when the callable exits and when we run verifyNoMoreInteractions.

If call() runs before verifyNoMoreInteractions then there wouldn't be a point of running verifyNoMoreInteractions as the callable has already exited
If call() runs after verifyNoMoreInteractions, then this line would confirm the behavior.

(From my understanding) We can't guarantee when the scheduleExecutor executes the callable (either before or after verifyNoMoreInteractions in the test) so asserting verifyNoMoreInteractions before the execution would confirm that the call() ran and that the tracer never ran.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which line would invoke call() immediately? I still think the callable is not invoked until executor.submit(future) based on the Javadoc

Copy link
Member Author

@lqiu96 lqiu96 Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er* It should be invoked immediately once executor.submit() runs:
https://github.com/googleapis/gapic-generator-java/blob/59efbbbd485282e37abbc5eed87b72a15a33240e/gax-java/gax/src/main/java/com/google/api/gax/retrying/ScheduledRetryingExecutor.java#L115-L119 as retryingFuture.getAttemptSettings().getRandomizedRetryDelay().toMillis() -> 0ms for the first attempt

I looked up the verifyNoMoreInteractions javadoc and I misunderstood the use case. I assumed it was a point-in-time verification to ensure there was no more interactions once it was invoked (the race logic I mentioned above), but it's meant to ensure that there were no more interactions after we've verified that things have already run/ executed/ invoked a certain number of times.

I think having either verifyNoInteractions or verifyNoMoreInteractions at the end of the file should work. May make more sense to have this be verifyNoInteractions instead of verifyNoMoreInteractions for clarity (tracer should have 0 interactions and no more implies more than 0) but that would require Mockito 4.x in the dependencies.properties file. Either one is fine with me and I'll just add a small comment in the test file.

@sonarqubecloud
Copy link

[gapic-generator-java-root] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@sonarqubecloud
Copy link

[java_showcase_integration_tests] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@sonarqubecloud
Copy link

[java_showcase_unit_tests] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@lqiu96 lqiu96 merged commit a2ee512 into main Apr 14, 2023
@lqiu96 lqiu96 deleted the main-fix_testCancelOuterFutureBeforeStart branch April 14, 2023 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size: s Pull request size is small.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

gax-java(mac): Flakey ScheduledRetryingExecutorTest.testCancelOuterFutureBeforeStart

3 participants