Skip to content

CreateIndexIT.testCreateAndDeleteIndexConcurrently fails #15312

@ywelsch

Description

@ywelsch

http://build-us-00.elastic.co/job/es_core_master_centos/8910/testReport/junit/org.elasticsearch.action.admin.indices.create/CreateIndexIT/testCreateAndDeleteIndexConcurrently/

Stacktrace:

UnavailableShardsException[[test][1] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: index {[test][test][AVGADS6IhWB4gLTwZw1z], source[{"index_version":1}]}]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.retryBecauseUnavailable(TransportReplicationAction.java:665)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:381)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$3.onTimeout(TransportReplicationAction.java:522)
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:236)
    at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:628)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Interesting log lines:

[2015-12-08 05:25:16,274][WARN ][org.elasticsearch.indices.cluster] [node_t0] [[test][1]] marking and sending shard failed due to [failed to create shard]
org.apache.lucene.store.LockObtainFailedException: Can't lock shard [test][1], timed out after 5000ms
    at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:556)
    at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:485)
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:236)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:591)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:491)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:185)
    at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:526)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:596)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

...

[2015-12-08 05:25:26,295][DEBUG][org.elasticsearch.cluster.routing.allocation] [node_t0] [test][0] ignoring shard failure, unknown allocation id in [test][0], node[w7ZOpTaqRi6UI-q9EjfG2Q], [P], v[1], s[INITIALIZING], a[id=WmeldM5ORfuNr8rWhbHvYQ], unassigned_info[[reason=INDEX_CREATED], at[2015-12-08T05:25:11.268Z]] ([reason=ALLOCATION_FAILED], at[2015-12-08T05:25:26.295Z], details[master {node_t0}{w7ZOpTaqRi6UI-q9EjfG2Q}{local}{local[182]}[mode=>local] marked shard as initializing, but shard is marked as failed, resend shard failure])

...

[2015-12-08 05:25:26,303][DEBUG][org.elasticsearch.gateway] [node_t0] [test][0] found 0 allocations of [test][0], node[null], [P], v[2], s[UNASSIGNED], unassigned_info[[reason=ALLOCATION_FAILED], at[2015-12-08T05:25:26.294Z], details[failed to create shard, failure LockObtainFailedException[Can't lock shard [test][0], timed out after 5000ms]]], highest version: [-1]

Analysis:

What's happening is that index creation fails as locks cannot be obtained for allocating shards (I'm not sure why that is). Shards are marked as failed: This changes UnassignedInfo of the ShardRouting objects by setting reason from INDEX_CREATED to ALLOCATION_FAILED. What this entails is that the PrimaryShardAllocator does not treat these unassigned shards as "new index creation" anymore, but requires that shard data is available before assigning them. Relates to #15241.

Metadata

Metadata

Assignees

No one assigned

    Labels

    >testIssues or PRs that are addressing/adding tests>test-failureTriaged test failures from CI

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions