I decided to reindex my data to take advantage of doc_values, but one of 30 indices (~120m docs in each) got less documents after reindexing. I reindexed again and docs disappeared again.
Then I bisected the problem to specific docs and found that some docs in source index has duplicate ids.
curl -s "http://web245:9200/statistics-20141110/_search?pretty&q=_id:1jC2LxTjTMS1KHCn0Prf1w"
{
"took" : 1156,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "statistics-20141110",
"_type" : "events",
"_id" : "1jC2LxTjTMS1KHCn0Prf1w",
"_score" : 1.0,
"_source":{"@timestamp":"2014-11-10T14:30:00+0300","@key":"client_belarussia_msg_sended_from_mutual__22_1","@value":"149"}
}, {
"_index" : "statistics-20141110",
"_type" : "events",
"_id" : "1jC2LxTjTMS1KHCn0Prf1w",
"_score" : 1.0,
"_source":{"@timestamp":"2014-11-10T14:30:00+0300","@key":"client_belarussia_msg_sended_from_mutual__22_1","@value":"149"}
} ]
}
}
Here are two indices, source and destination:
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open statistics-20141110 5 0 116217042 0 12.3gb 12.3gb
green open statistics-20141110-dv 5 1 116216507 0 32.3gb 16.1gb
Segments of problematic index:
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
statistics-20141110 0 p 192.168.0.190 _gga 21322 14939669 0 1.6gb 4943008 true true 4.9.0 false
statistics-20141110 0 p 192.168.0.190 _isc 24348 10913518 0 1.1gb 4101712 true true 4.9.0 false
statistics-20141110 1 p 192.168.0.245 _7i7 9727 7023269 0 766mb 2264472 true true 4.9.0 false
statistics-20141110 1 p 192.168.0.245 _i01 23329 14689581 0 1.5gb 4788872 true true 4.9.0 false
statistics-20141110 2 p 192.168.1.212 _9wx 12849 8995444 0 987.7mb 3326288 true true 4.9.0 false
statistics-20141110 2 p 192.168.1.212 _il1 24085 13205585 0 1.4gb 4343736 true true 4.9.0 false
statistics-20141110 3 p 192.168.1.212 _8pc 11280 10046395 0 1gb 4003824 true true 4.9.0 false
statistics-20141110 3 p 192.168.1.212 _hwt 23213 13226096 0 1.3gb 4287544 true true 4.9.0 false
statistics-20141110 4 p 192.168.2.88 _91i 11718 8328558 0 909.2mb 2822712 true true 4.9.0 false
statistics-20141110 4 p 192.168.2.88 _hms 22852 14848927 0 1.5gb 4777472 true true 4.9.0 false
The only thing that happened with index besides indexing is optimizing to 2 segments per shard.
I decided to reindex my data to take advantage of
doc_values, but one of 30 indices (~120m docs in each) got less documents after reindexing. I reindexed again and docs disappeared again.Then I bisected the problem to specific docs and found that some docs in source index has duplicate ids.
{ "took" : 1156, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 2, "max_score" : 1.0, "hits" : [ { "_index" : "statistics-20141110", "_type" : "events", "_id" : "1jC2LxTjTMS1KHCn0Prf1w", "_score" : 1.0, "_source":{"@timestamp":"2014-11-10T14:30:00+0300","@key":"client_belarussia_msg_sended_from_mutual__22_1","@value":"149"} }, { "_index" : "statistics-20141110", "_type" : "events", "_id" : "1jC2LxTjTMS1KHCn0Prf1w", "_score" : 1.0, "_source":{"@timestamp":"2014-11-10T14:30:00+0300","@key":"client_belarussia_msg_sended_from_mutual__22_1","@value":"149"} } ] } }Here are two indices, source and destination:
Segments of problematic index:
The only thing that happened with index besides indexing is optimizing to 2 segments per shard.