Test failure: http://build-us-00.elastic.co/job/es_core_master_window-2008/2553/testReport/junit/org.elasticsearch.indices.state/RareClusterStateIT/testDeleteCreateInOneBulk/
The test fails due to a race in acquiring ShardLock locks.
When an index is deleted, an asynchronous process is started to process pending deletes on the shards of that index. This process first acquires all ShardLock locks for the given index, in numeric shard order. Meanwhile, the new index may already have been created, and some shard locks may already be held by shard creation in IndicesClusterStateService.applyInitializingShard. For example, shard 0 is locked by processPendingDeletes while shard 1 is locked by applyInitializingShard. In that case, processPendingDeletes cannot lock shard 1 and blocks, holding the lock on shard 0 for up to 30 minutes. As a result, shard 0 cannot be initialised for 30 minutes.
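The interleaving can be sketched as follows. This is a hypothetical standalone sketch, not Elasticsearch code: each shard lock is modeled as a binary Semaphore (mirroring how NodeEnvironment$InternalShardLock is backed by one), and the 30-minute timeout is shortened to 200 ms so the sketch terminates quickly. The class name ShardLockRace and the timeout values are illustrative assumptions.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ShardLockRace {
    public static void main(String[] args) throws InterruptedException {
        // One binary semaphore per shard lock.
        Semaphore shard0 = new Semaphore(1);
        Semaphore shard1 = new Semaphore(1);

        // applyInitializingShard has already taken the lock for shard 1
        // as part of creating the new index's shard.
        shard1.acquire();

        // processPendingDeletes locks all shards of the deleted index
        // in numeric order: shard 0 succeeds immediately...
        boolean got0 = shard0.tryAcquire(200, TimeUnit.MILLISECONDS);
        System.out.println("processPendingDeletes locked shard 0: " + got0);

        // ...but shard 1 is held above, so it blocks for the full timeout
        // while still holding shard 0, which therefore cannot be initialised.
        boolean got1 = shard1.tryAcquire(200, TimeUnit.MILLISECONDS);
        System.out.println("processPendingDeletes locked shard 1: " + got1);
        System.out.println("shard 0 still held meanwhile: "
                + (shard0.availablePermits() == 0));
    }
}
```

With the real 30-minute timeout, the second tryAcquire is what the first thread dump below is parked in, and the held shard 0 lock is what starves shard initialisation.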
Interesting bits of stack trace:
"elasticsearch[node_t1][generic][T#2]" ID=602 TIMED_WAITING on java.util.concurrent.Semaphore$NonfairSync@2fc45c3b
at sun.misc.Unsafe.park(Native Method)
- timed waiting on java.util.concurrent.Semaphore$NonfairSync@2fc45c3b
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:409)
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:555)
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:485)
at org.elasticsearch.env.NodeEnvironment.lockAllForIndex(NodeEnvironment.java:429)
at org.elasticsearch.indices.IndicesService.processPendingDeletes(IndicesService.java:649)
at org.elasticsearch.cluster.action.index.NodeIndexDeletedAction.lockIndexAndAck(NodeIndexDeletedAction.java:101)
at org.elasticsearch.cluster.action.index.NodeIndexDeletedAction.access$300(NodeIndexDeletedAction.java:46)
at org.elasticsearch.cluster.action.index.NodeIndexDeletedAction$1.doRun(NodeIndexDeletedAction.java:90)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked synchronizers:
- java.util.concurrent.ThreadPoolExecutor$Worker@b17810e
"elasticsearch[node_t1][clusterService#updateTask][T#1]" ID=591 TIMED_WAITING on java.util.concurrent.Semaphore$NonfairSync@7fdcd730
at sun.misc.Unsafe.park(Native Method)
- timed waiting on java.util.concurrent.Semaphore$NonfairSync@7fdcd730
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:409)
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:555)
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:485)
at org.elasticsearch.index.IndexService.createShard(IndexService.java:234)
- locked org.elasticsearch.index.IndexService@707e1798
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:628)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:528)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:185)
- locked java.lang.Object@773b911a
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:517)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked synchronizers:
- java.util.concurrent.ThreadPoolExecutor$Worker@26f887da