
Mapping updates should be synchronous #8688

@clintongormley

Description


Today, a new field can be added to the local mapping of two shards simultaneously. If the detected field types differ, each shard can end up with a different type for the same field. Both shards then send their new mapping to the master, but only one of the mappings will win.
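A minimal Python sketch of this race follows. It is not Elasticsearch code. It assumes a simplified type detector and a master that keeps whichever update for a field arrives first. `Master`, `Shard`, `detect_type` and `index_without_waiting` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


def detect_type(value):
    """Simplified dynamic mapping: guess a field type from the first value seen."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


@dataclass
class Master:
    mapping: dict = field(default_factory=dict)

    def put_mapping(self, name, ftype):
        # Assumption: the first update to arrive wins; later conflicting ones are dropped.
        self.mapping.setdefault(name, ftype)
        return self.mapping[name]


@dataclass
class Shard:
    mapping: dict = field(default_factory=dict)

    def index_without_waiting(self, doc, master):
        for name, value in doc.items():
            if name not in self.mapping:
                # The shard adds the field locally and keeps using its own guess;
                # the master's answer is ignored, so the two are never reconciled.
                self.mapping[name] = detect_type(value)
                master.put_mapping(name, self.mapping[name])


master = Master()
shard_a, shard_b = Shard(), Shard()
shard_b.index_without_waiting({"price": 10}, master)    # this update reaches the master first
shard_a.index_without_waiting({"price": "10"}, master)  # this conflicting update loses

print(shard_a.mapping["price"], shard_b.mapping["price"], master.mapping["price"])
# string number number
```

Shard A ends up treating `price` as a string while shard B and the master treat it as a number, which is exactly the divergence described above.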

This can produce incorrect results and even data loss. For example, one shard thinks that the field is a string, while the other shard (and the master) thinks that it is a number. In this case, sorting and aggregations will be wrong (#8485). Then, when the string shard is allocated to a new node, that node receives the "number" mapping from the master. Replaying the transaction log can then cause shard failures (e.g. #8684). Alternatively, new numeric doc values are written, but when we try to merge segments, the merge fails with a doc values exception (#8009).
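To see why sorting goes wrong, here is a small worked example in plain Python (an illustration, not Elasticsearch's sort implementation). The same values sort differently depending on whether a shard treats them as strings or as numbers, so merging per-shard results from the two shards cannot yield a consistent order. `shard_sort` is a hypothetical helper.

```python
def shard_sort(values, field_type):
    """Sort field values the way a shard with the given mapping type would."""
    if field_type == "number":
        return sorted(values, key=float)
    return sorted(values, key=str)  # lexicographic order for string fields


values = ["9", "10", "100"]
string_order = shard_sort(values, "string")
number_order = shard_sort(values, "number")

print(string_order)  # ['10', '100', '9']
print(number_order)  # ['9', '10', '100']
```

An ascending sort puts 9 last on the string shard and first on the number shard; a minimum computed the same way would likewise report 10 on one shard and 9 on the other.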

The only way to ensure that dynamic fields have the same type on all shards is to wait for the master to acknowledge mapping changes before indexing the document:

  • parse the document
  • update local mapping
  • if changed, send to master and wait for ack
  • index document
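The four steps above can be sketched in Python as a minimal, single-process illustration. This is not the actual Elasticsearch implementation and rests on several assumptions: `Master.put_mapping` stands in for the request to the master and returns only once the update has been applied (the "ack"), the master keeps the first type it sees for a field, and a document whose value disagrees with the agreed type is rejected. `Master`, `Shard`, `MappingConflict` and `detect_type` are hypothetical names.

```python
import threading


def detect_type(value):
    """Simplified dynamic mapping: guess a field type from a value."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


class MappingConflict(Exception):
    pass


class Master:
    def __init__(self):
        self._lock = threading.Lock()
        self.mapping = {}

    def put_mapping(self, updates):
        """Apply field updates atomically and return the authoritative types."""
        with self._lock:
            for name, ftype in updates.items():
                self.mapping.setdefault(name, ftype)  # first writer wins
            return {name: self.mapping[name] for name in updates}


class Shard:
    def __init__(self, master):
        self.master = master
        self.mapping = {}
        self.docs = []

    def index(self, doc):
        # 1. Parse the document: infer a type for every field.
        parsed = {name: detect_type(value) for name, value in doc.items()}
        # 2. Compute the local mapping changes (fields this shard has not seen yet).
        updates = {n: t for n, t in parsed.items() if n not in self.mapping}
        # 3. If anything changed, send it to the master and block on the reply,
        #    then adopt the master's types instead of the local guesses.
        if updates:
            self.mapping.update(self.master.put_mapping(updates))
        # Reject documents that disagree with the agreed mapping
        # (a choice made for this sketch; coercion would be another option).
        for name, ftype in parsed.items():
            if self.mapping[name] != ftype:
                raise MappingConflict(f"{name}: {ftype} != {self.mapping[name]}")
        # 4. Index the document only once the mapping is agreed.
        self.docs.append(doc)


master = Master()
shard_a, shard_b = Shard(master), Shard(master)
shard_b.index({"price": 10})
try:
    shard_a.index({"price": "10"})
except MappingConflict as err:
    print("rejected:", err)  # rejected: price: string != number

print(shard_a.mapping["price"], shard_b.mapping["price"])  # number number
```

Documents that only use already-mapped fields skip step 3, so the extra round trip to the master is paid only when a document introduces a new dynamic field.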

This may slow down indexing when many dynamic fields are being added, but it is the only way to automatically protect against data loss caused by mapping conflicts.

If users want to disable waiting for the master (and are certain that their dynamic mapping rules are good enough to prevent these problems), we should let them opt out by setting "dynamic": "unsafe".
