Skip to content

[BUG] org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness flaky #3603

@reta

Description

@reta

Describe the bug
New flaky test after #3563 got merged:

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness" -Dtests.seed=CD3B9289D31206B8 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=nl -Dtests.timezone=Asia/Katmandu -Druntime.java=17

org.opensearch.cluster.allocation.AwarenessAllocationIT > testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness FAILED
    java.lang.AssertionError: unexpected
        at org.opensearch.test.InternalTestCluster.removeExclusions(InternalTestCluster.java:1912)
        at org.opensearch.test.InternalTestCluster.stopNodesAndClients(InternalTestCluster.java:1777)
        at org.opensearch.test.InternalTestCluster.stopNodesAndClient(InternalTestCluster.java:1764)
        at org.opensearch.test.InternalTestCluster.stopRandomNode(InternalTestCluster.java:1672)
        at org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness(AwarenessAllocationIT.java:425)

        Caused by:
        java.util.concurrent.ExecutionException: MasterNotDiscoveredException[null]
            at org.opensearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:286)
            at org.opensearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:273)
            at org.opensearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:104)
            at org.opensearch.test.InternalTestCluster.removeExclusions(InternalTestCluster.java:1910)
            ... 4 more

            Caused by:
            MasterNotDiscoveredException[null]
                at app//org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction$2.onTimeout(TransportClusterManagerNodeAction.java:282)
                at app//org.opensearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:394)
                at app//org.opensearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:294)
                at app//org.opensearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:697)
                at app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:739)
                at java.base@17.0.3/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
                at java.base@17.0.3/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
                at java.base@17.0.3/java.lang.Thread.run(Thread.java:833)

    MasterNotDiscoveredException[null]
        at app//org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction$2.onTimeout(TransportClusterManagerNodeAction.java:282)
        at app//org.opensearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:394)
        at app//org.opensearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:294)
        at app//org.opensearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:697)
        at app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:739)
        at java.base@17.0.3/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base@17.0.3/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base@17.0.3/java.lang.Thread.run(Thread.java:833)

To Reproduce
Steps to reproduce the behavior:

/gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness" -Dtests.seed=CD3B9289D31206B8 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=nl -Dtests.timezone=Asia/Katmandu -Druntime.java=17

Expected behavior
On CI, the test is flaky, see please https://ci.opensearch.org/logs/ci/workflow/OpenSearch_CI/PR_Checks/Gradle_Check/gradle_check_6061.log for example.,

Plugins
Standard.

Screenshots
If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

  • CI

Additional context
Add any other context about the problem here.

Metadata

Metadata

Assignees

Labels

>test-failureTest failure from CI, local build, etc.bugSomething isn't workingflaky-testRandom test failure that succeeds on second run

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions