[bug report] LockObtainFailedException thrown under pressure #20876

@makeyang

Elasticsearch version:
2.1
Plugins installed: []
delete-by-query
elasticsearch-analysis-ik
repository-hdfs
JVM version:
8u60
OS version:
CentOS release 6.6 (Final)
Description of the problem including expected versus actual behavior:
One of the data nodes keeps throwing the exception below:
[2016-10-12 11:34:04,769][WARN ][cluster.action.shard ] [XXXX] [indexName][2] received shard failed for [indexName][2], node[rckOYj-DT42QNoH9CCEBJQ], relocating [v2zayugFQnuMiGu-hS1vXg], [R], v[7091], s[INITIALIZING], a[id=bkpcEq2qTXaPEKHl9tOunQ, rId=xeJJijQCRyaJPcSgQa7eGg], expected_shard_size[22462872851], indexUUID [sOKz0tW9Sw-u137Swoevsw], message [failed to create shard], failure [ElasticsearchException[failed to create shard]; nested: LockObtainFailedException[Can't lock shard [indexName][2], timed out after 5000ms]; ]
[indexName][[indexName][2]] ElasticsearchException[failed to create shard]; nested: LockObtainFailedException[Can't lock shard [indexName][2], timed out after 5000ms];
at org.elasticsearch.index.IndexService.createShard(IndexService.java:389)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:650)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:550)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:179)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:494)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.store.LockObtainFailedException: Can't lock shard [indexName][2], timed out after 5000ms
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:565)
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:493)
at org.elasticsearch.index.IndexService.createShard(IndexService.java:307)
... 9 more
Steps to reproduce (not very precise; I haven't reproduced it yet):

  1. Put the cluster under heavy pressure until one node drops out of the cluster.
  2. Remove the pressure. After a while the node comes back and tries to recover some shards, and it keeps throwing the exception above.
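
The trace shows a timed lock attempt on the shard giving up after 5000ms. One possible reading, which I have not confirmed, is that something on the rejoined node still holds the lock for [indexName][2] when the shard is created again. Below is a minimal Java sketch of that general pattern, not Elasticsearch code: `ShardLockSketch`, `tryLock`, `unlock`, and `demo` are hypothetical names, and the one-permit `Semaphore` is only an assumption about how such a lock could be modeled.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a per-shard lock; not the Elasticsearch implementation.
final class ShardLockSketch {
    // One permit: at most one holder at a time, and the lock is not reentrant.
    private final Semaphore mutex = new Semaphore(1);

    // Tries to take the lock, giving up after timeoutMillis.
    boolean tryLock(long timeoutMillis) {
        try {
            return mutex.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    // Releases the lock so that the next tryLock can succeed.
    void unlock() {
        mutex.release();
    }

    // Returns the outcome of three attempts: free, still held, released.
    static boolean[] demo(long timeoutMillis) {
        ShardLockSketch lock = new ShardLockSketch();
        boolean first = lock.tryLock(timeoutMillis);  // lock is free
        boolean second = lock.tryLock(timeoutMillis); // previous holder never released
        lock.unlock();                                // holder finally releases
        boolean third = lock.tryLock(timeoutMillis);  // lock is free again
        return new boolean[] {first, second, third};
    }

    public static void main(String[] args) {
        boolean[] r = demo(100); // 100 ms here instead of the 5000 ms in the log
        System.out.println("first=" + r[0] + " second=" + r[1] + " third=" + r[2]);
    }
}
```

In this sketch the second attempt fails only because the first holder has not released the lock yet. If the same thing happens on the node, the retries would keep failing for as long as the old holder stays around.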

Labels

  * :Distributed/Recovery (Anything around constructing a new shard, either from a local or a remote source.)
  * :Distributed/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs)
  * >bug
  * v2.4.1
  * v5.0.0
