Shard initialization fails with DocValues exception #8009

@egueidan

Hi,

We have been running into strange errors lately. We are seeing a lot of exceptions of this type:

[2014-10-07 06:41:26,235][WARN ][cluster.action.shard     ] [Madcap] [my_index][1] sending failed shard for [my_index][1], node[-QGTVk8RRcuKuBwdqD8l1A], [P], s[INITIALIZING], indexUUID [u68VqfHsRii16gYXtPj1cQ], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[my_index][1] failed recovery]; nested: IllegalArgumentException[cannot change DocValues type from BINARY to SORTED_SET for field "custom.my_field"]; ]]
[2014-10-07 06:41:26,587][WARN ][indices.cluster          ] [Madcap] [my_index][1] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [my_index][1] failed recovery
  at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: cannot change DocValues type from BINARY to SORTED_SET for field "custom.my_field"
  at org.apache.lucene.index.FieldInfos$FieldNumbers.addOrGet(FieldInfos.java:198)
  at org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:868)
  at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:819)
  at org.elasticsearch.index.engine.internal.InternalEngine.createWriter(InternalEngine.java:1420)
  at org.elasticsearch.index.engine.internal.InternalEngine.start(InternalEngine.java:271)
  at org.elasticsearch.index.shard.service.InternalIndexShard.postRecovery(InternalIndexShard.java:692)
  at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:217)
  at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
  ... 3 more

We create one index per day, so the affected index is quite new. It contains 10 to 20 million documents spread over 10 shards and 5 nodes. At some point, after the index has been green for hours, one of the shards goes back to INITIALIZING and can never start because of the exception above. The index then stays red and we cannot get it back on track. The only solution we have found is to scroll over the whole index and reindex the data into a new index, although we have most likely lost the data from the failing shard.

The field that causes the issue has the following definition: {"type":"long","doc_values":true,"include_in_all":false}. This mapping is inferred from a dynamic template: {"mapping":{"index":"not_analyzed","include_in_all":false,"doc_values":true,"type":"{dynamic_type}"},"match":"*"}.

One important note: this part of the data is free-form (i.e. user input), so some documents may have conflicting types for the same field (one document with the field as a string, another as a long). That is why the index has the setting index.mapping.ignore_malformed set to true.

It might not be relevant, but this has only happened on days when at least one node was restarted. We have noticed this issue since we started running Elasticsearch 1.3.4, but cannot be 100% sure that it did not happen before.
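
For reference, the scroll-and-reindex workaround can be sketched roughly as below. This is a minimal sketch under stated assumptions, not a complete script: `to_bulk_actions` and the target index name `my_index_v2` are hypothetical, the hits are assumed to have the shape returned by a scroll search (`_type`, `_id`, `_source`), and the output uses the action dictionary layout accepted by the bulk helper of the Python client. Connecting to the cluster and running the scroll are deliberately left out.

```python
from typing import Any, Dict, Iterable, Iterator


def to_bulk_actions(
    hits: Iterable[Dict[str, Any]], target_index: str
) -> Iterator[Dict[str, Any]]:
    """Turn scroll hits into bulk "index" actions aimed at target_index.

    Hypothetical helper: it only reshapes dictionaries and never talks
    to a cluster, so it can be tried without Elasticsearch running.
    """
    for hit in hits:
        yield {
            "_op_type": "index",
            "_index": target_index,      # write into the new index
            "_type": hit["_type"],       # keep the original document type
            "_id": hit["_id"],           # keep the original document id
            "_source": hit["_source"],   # copy the document body unchanged
        }


if __name__ == "__main__":
    # Fake hits standing in for the results of a scroll over my_index.
    sample_hits = [
        {"_index": "my_index", "_type": "doc", "_id": "1",
         "_source": {"custom": {"my_field": 42}}},
        {"_index": "my_index", "_type": "doc", "_id": "2",
         "_source": {"custom": {"my_field": "forty-two"}}},
    ]
    for action in to_bulk_actions(sample_hits, "my_index_v2"):
        print(action)
```

Only documents that the scroll can still reach are copied this way, so anything that lived solely on the failed shard is not recovered.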

We cannot isolate and reproduce the issue but have faced it several times over the past few days. Feel free to suggest actions we can undertake should it happen again, to get more details to help fix it. Also, if you have suggestions to help bypass the issue when it happens (so that we can avoid reindexing the data), that'd be great.
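
One thing that could be checked the next time this happens is which free-form fields actually arrive with more than one type. The sketch below is only an illustration of that idea, assuming the documents are available as parsed JSON (for example from application logs or a scroll). The names `json_type`, `collect_types`, and `conflicting_fields` are hypothetical, and the type labels are loose approximations rather than what dynamic mapping would actually infer (a string that looks like a date, for instance, is reported here as a plain string).

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def json_type(value: Any) -> str:
    """Return a rough type label for a JSON leaf value."""
    # bool must be tested before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def collect_types(value: Any, path: str, seen: Dict[str, Set[str]]) -> None:
    """Walk a parsed JSON value and record the leaf types per dotted path."""
    if isinstance(value, dict):
        for key, child in value.items():
            collect_types(child, f"{path}.{key}" if path else key, seen)
    elif isinstance(value, list):
        # Array elements share the path of the field that holds the array.
        for item in value:
            collect_types(item, path, seen)
    elif value is not None:
        seen[path].add(json_type(value))


def conflicting_fields(docs: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Return the dotted field paths that were seen with more than one type."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        collect_types(doc, "", seen)
    return {path: types for path, types in seen.items() if len(types) > 1}


if __name__ == "__main__":
    docs = [
        {"custom": {"my_field": 42}},
        {"custom": {"my_field": "forty-two"}},
        {"custom": {"other": 1}},
    ]
    for path, types in conflicting_fields(docs).items():
        print(path, sorted(types))
```

Running it over the documents indexed around a node restart would show whether custom.my_field was received both as a long and as a string on that day.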

Thanks,
Emmanuel
