Skip to content

Duplicate id in index #8788

@bobrik

Description

@bobrik

I decided to reindex my data to take advantage of doc_values, but one of 30 indices (~120m docs in each) got less documents after reindexing. I reindexed again and docs disappeared again.

Then I bisected the problem to specific docs and found that some docs in source index has duplicate ids.

curl -s "http://web245:9200/statistics-20141110/_search?pretty&q=_id:1jC2LxTjTMS1KHCn0Prf1w"
{
  "took" : 1156,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "statistics-20141110",
      "_type" : "events",
      "_id" : "1jC2LxTjTMS1KHCn0Prf1w",
      "_score" : 1.0,
      "_source":{"@timestamp":"2014-11-10T14:30:00+0300","@key":"client_belarussia_msg_sended_from_mutual__22_1","@value":"149"}
    }, {
      "_index" : "statistics-20141110",
      "_type" : "events",
      "_id" : "1jC2LxTjTMS1KHCn0Prf1w",
      "_score" : 1.0,
      "_source":{"@timestamp":"2014-11-10T14:30:00+0300","@key":"client_belarussia_msg_sended_from_mutual__22_1","@value":"149"}
    } ]
  }
}

Here are two indices, source and destination:

health status index                  pri rep docs.count docs.deleted store.size pri.store.size
green  open   statistics-20141110      5   0  116217042            0     12.3gb         12.3gb
green  open   statistics-20141110-dv   5   1  116216507            0     32.3gb         16.1gb

Segments of problematic index:

index               shard prirep ip            segment generation docs.count docs.deleted    size size.memory committed searchable version compound
statistics-20141110 0     p      192.168.0.190 _gga         21322   14939669            0   1.6gb     4943008 true      true       4.9.0   false
statistics-20141110 0     p      192.168.0.190 _isc         24348   10913518            0   1.1gb     4101712 true      true       4.9.0   false
statistics-20141110 1     p      192.168.0.245 _7i7          9727    7023269            0   766mb     2264472 true      true       4.9.0   false
statistics-20141110 1     p      192.168.0.245 _i01         23329   14689581            0   1.5gb     4788872 true      true       4.9.0   false
statistics-20141110 2     p      192.168.1.212 _9wx         12849    8995444            0 987.7mb     3326288 true      true       4.9.0   false
statistics-20141110 2     p      192.168.1.212 _il1         24085   13205585            0   1.4gb     4343736 true      true       4.9.0   false
statistics-20141110 3     p      192.168.1.212 _8pc         11280   10046395            0     1gb     4003824 true      true       4.9.0   false
statistics-20141110 3     p      192.168.1.212 _hwt         23213   13226096            0   1.3gb     4287544 true      true       4.9.0   false
statistics-20141110 4     p      192.168.2.88  _91i         11718    8328558            0 909.2mb     2822712 true      true       4.9.0   false
statistics-20141110 4     p      192.168.2.88  _hms         22852   14848927            0   1.5gb     4777472 true      true       4.9.0   false

The only thing that happened with index besides indexing is optimizing to 2 segments per shard.

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions