
[BUG] SimpleClusterStateIT.testMetadataVersion is flaky #13207

@peternied

Description


Describe the bug

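SimpleClusterStateIT.testMetadataVersion failed because of an uncaught AssertionError on a refresh thread. While a scheduled refresh was releasing its reference to the shard store, NodeEnvironment.deleteShardDirectoryUnderLock found that shard [index-2][0] was not locked.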

apr 16, 2024 7:19:08 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
WARNING: Uncaught exception in thread: Thread[#1004,opensearch[node_s1][refresh][T#1],5,TGRP-SimpleClusterStateIT]
java.lang.AssertionError: shard [index-2][0] is not locked
	at __randomizedtesting.SeedInfo.seed([1BBE01ABC39963A3]:0)
	at org.opensearch.env.NodeEnvironment.deleteShardDirectoryUnderLock(NodeEnvironment.java:587)
	at org.opensearch.indices.IndicesService.deleteShardStore(IndicesService.java:1256)
	at org.opensearch.index.IndexService.onShardClose(IndexService.java:706)
	at org.opensearch.index.IndexService$StoreCloseListener.accept(IndexService.java:829)
	at org.opensearch.index.IndexService$StoreCloseListener.accept(IndexService.java:816)
	at org.opensearch.index.store.Store.closeInternal(Store.java:573)
	at org.opensearch.index.store.Store$1.closeInternal(Store.java:194)
	at org.opensearch.common.util.concurrent.AbstractRefCounted.decRef(AbstractRefCounted.java:78)
	at org.opensearch.index.store.Store.decRef(Store.java:546)
	at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1777)
	at org.opensearch.index.engine.InternalEngine.maybeRefresh(InternalEngine.java:1753)
	at org.opensearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:4626)
	at org.opensearch.index.IndexService.maybeRefreshEngine(IndexService.java:1054)
	at org.opensearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1198)
	at org.opensearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:159)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:854)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
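The trace shows the shard directory being deleted on the refresh thread. The scheduled refresh held the last reference to the Store, so when `AbstractRefCounted.decRef` reached zero inside `InternalEngine.refresh`, `Store.closeInternal` and the whole close chain down to `deleteShardDirectoryUnderLock` ran synchronously on that thread. The sketch below is a hypothetical, simplified model of that shape and is not OpenSearch code. `RefCountedStore`, `ShardLocks`, and the interleaving in `run()` (the shard lock released before the refresh drops its reference) are assumptions chosen to reproduce a "not locked" observation. This issue does not confirm that this interleaving is the actual root cause.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the call chain in the trace above.
// None of these classes are OpenSearch APIs.
public class LastRefCloseDemo {

    // What the close hook saw when it ran.
    record Observation(String threadName, boolean shardLocked) {}

    // Hypothetical stand-in for the node's per-shard locks.
    static final class ShardLocks {
        private final Set<String> held = ConcurrentHashMap.newKeySet();
        void lock(String shardId) { held.add(shardId); }
        void release(String shardId) { held.remove(shardId); }
        boolean isLocked(String shardId) { return held.contains(shardId); }
    }

    // Hypothetical ref-counted store: the thread that drops the last reference runs the hook.
    static final class RefCountedStore {
        private final AtomicInteger refs = new AtomicInteger(1); // the owner's reference
        private final Runnable onLastRef;

        RefCountedStore(Runnable onLastRef) { this.onLastRef = onLastRef; }

        boolean tryIncRef() {
            int current;
            do {
                current = refs.get();
                if (current <= 0) return false; // already closed
            } while (!refs.compareAndSet(current, current + 1));
            return true;
        }

        void decRef() {
            int remaining = refs.decrementAndGet();
            if (remaining == 0) {
                onLastRef.run(); // runs synchronously on the calling thread
            } else if (remaining < 0) {
                throw new IllegalStateException("decRef called too many times");
            }
        }
    }

    static Observation run() throws InterruptedException {
        String shardId = "[index-2][0]";
        ShardLocks locks = new ShardLocks();
        locks.lock(shardId);

        // The close hook records which thread ran it and whether the lock was still held.
        AtomicReference<Observation> seen = new AtomicReference<>();
        RefCountedStore store = new RefCountedStore(() -> seen.set(new Observation(
                Thread.currentThread().getName(), locks.isLocked(shardId))));

        // A scheduled refresh pins the store before doing its work.
        if (!store.tryIncRef()) throw new IllegalStateException("store already closed");

        // Assumed interleaving: the shard is closed and its lock released
        // while the refresh still holds its reference.
        store.decRef();          // 2 -> 1: the hook does not run yet
        locks.release(shardId);

        // The refresh finishes later and drops the last reference on its own thread.
        Thread refresh = new Thread(store::decRef, "refresh-thread");
        refresh.start();
        refresh.join();          // join() makes the hook's write visible here
        return seen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Observation o = run();
        System.out.println("close hook ran on " + o.threadName() + ", shard locked=" + o.shardLocked());
    }
}
```

Running `main` prints `close hook ran on refresh-thread, shard locked=false`. The point of the model is that the close work runs on whichever thread releases the last reference, so any check that depends on the shard lock is evaluated from that thread, at that moment.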

Related component

Cluster Manager

To Reproduce

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.SimpleClusterStateIT.testMetadataVersion" -Dtests.seed=1BBE01ABC39963A3 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=nn-NO -Dtests.timezone=Australia/Hobart -Druntime.java=21

NOTE: test params are: codec=Asserting(Lucene99): {fuu.keyword=PostingsFormat(name=Asserting), index_uuid=PostingsFormat(name=Asserting), foo=PostingsFormat(name=Asserting), fuu=PostingsFormat(name=Asserting), baz=PostingsFormat(name=Asserting), _id=PostingsFormat(name=Asserting), type=PostingsFormat(name=Asserting), baz.keyword=PostingsFormat(name=Asserting), foo.keyword=PostingsFormat(name=Asserting)}, docValues:{fuu.keyword=DocValuesFormat(name=Asserting), _seq_no=DocValuesFormat(name=Asserting), _primary_term=DocValuesFormat(name=Asserting), _version=DocValuesFormat(name=Asserting), baz.keyword=DocValuesFormat(name=Asserting), foo.keyword=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=701, maxMBSortInHeap=6.588006230407451, sim=Asserting(RandomSimilarity(queryNorm=true): {}), locale=nn-NO, timezone=Australia/Hobart

Note: I had not tried to reproduce this bug at the time of filing.

Expected behavior

The test should pass consistently.

Additional Details

Additional context

Metadata


Assignees

No one assigned

    Labels

    Cluster Manager, bug (Something isn't working), flaky-test (Random test failure that succeeds on second run)

    Type

    No type

    Projects

    Status

    🆕 New

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
