
Validate dynamic mappings updates on the master node. #10634

Merged
jpountz merged 2 commits into elastic:master from jpountz:fix/validate_mappings_on_master
Apr 21, 2015

Conversation

@jpountz
Contributor

@jpountz jpountz commented Apr 16, 2015

This commit changes dynamic mappings updates so that they are synchronous on the
entire cluster and their validity is checked by the master node. There are some
important consequences of this commit:

  • a failing index request on a non-existing type does not implicitly create
    the type anymore
  • dynamic mappings updates cannot create inconsistent mappings on different
    shards
  • indexing requests that introduce new fields might induce latency spikes
    because of the overhead to update the mappings on the master node

Closes #8688
Closes #8650
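The behavior described above can be sketched roughly as follows. This is an illustrative model only, not Elasticsearch's actual classes (`MasterMappingValidator`, `validateAndApply`, and the string-based field types are all hypothetical): a dynamic mapping update is validated by the master against the current mappings before it is applied, so a rejected indexing request never leaves a half-created type behind and shards cannot diverge.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Elasticsearch code: models the master node
// validating a dynamic mapping update before making it visible.
public class MasterMappingValidator {
    // type -> (field name -> field type), the master's authoritative mappings
    private final Map<String, Map<String, String>> mappings = new HashMap<>();

    /**
     * Validate and apply a dynamic mapping update synchronously. Returns
     * false (rejecting the indexing request) if any new field conflicts
     * with an existing field type. Validation runs over the whole update
     * before anything is applied, so a rejected request changes nothing.
     */
    public synchronized boolean validateAndApply(String type, Map<String, String> update) {
        Map<String, String> existing = mappings.getOrDefault(type, new HashMap<>());
        for (Map.Entry<String, String> e : update.entrySet()) {
            String current = existing.get(e.getKey());
            if (current != null && !current.equals(e.getValue())) {
                return false; // conflicting field type: reject, apply nothing
            }
        }
        existing.putAll(update);
        mappings.put(type, existing); // only now does the type/field exist
        return true;
    }

    public boolean typeExists(String type) {
        return mappings.containsKey(type);
    }

    public static void main(String[] args) {
        MasterMappingValidator master = new MasterMappingValidator();
        System.out.println(master.validateAndApply("doc", Map.of("title", "string")));
        System.out.println(master.validateAndApply("doc", Map.of("title", "long")));
    }
}
```

Because the master validates before applying, a failing request on a non-existing type leaves `typeExists` false, which is the first consequence listed above. The trade-off is the extra round trip to the master, which explains the possible latency spikes when new fields are introduced.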

@jpountz
Contributor Author

jpountz commented Apr 16, 2015

This is still a work-in-progress as I have test failures in RiverTests. I suspect this is because the test triggers lots of concurrent dynamic mapping updates and there is a deadlock somewhere, due to my ignorance of how our transport works (e.g. is it OK to update the mappings the way I do in MapperUtils.validateDynamicMappingsUpdateOnMaster?). Any help would be greatly appreciated.

Member


This seems like a bad case to get into? At least, should it be warn or error level? This means we reapplied some updates from the translog, but the master rejected those updates... but we don't seem to do anything with the validation, so does that mean the mappings have already been updated in place? Also, could we limit the catch to just whatever exception would mean failed validation? Otherwise a bug as simple as an NPE in validate would get caught and just logged?

@rjernst
Member

rjernst commented Apr 17, 2015

The mappings side looks great! My main question is about what to do on translog replay if sending mapping updates to the master fails.

@jpountz
Contributor Author

jpountz commented Apr 17, 2015

Good question. This should hopefully not happen once mapping updates are synchronous, but I believe it could happen when upgrading from 1.x. In that case I don't think we can do better than logging a scary warning message?

@rjernst
Member

rjernst commented Apr 19, 2015

@jpountz Ok, let's put it at warn level then?

@bleskes
Contributor

bleskes commented Apr 19, 2015

I think we should fail the shard in this case. It means that some data was indexed in a way that is inconsistent with the “master” mapping, in which case you probably have rogue Lucene indices. Picking up another copy is a better alternative. If we end up having no copies, we are at least clearly communicating that things are bad (the master has a mapping that is inconsistent with every shard). If people still want to recover, they can intervene and delete the translog (which may lose some data, but I think that’s an acceptable solution for this bad place to be in). Also note that we flush on close during node shutdown, so the chance this will happen is small...


@jpountz jpountz removed the WIP label Apr 20, 2015
@jpountz
Contributor Author

jpountz commented Apr 20, 2015

I added (and beasted) more tests for concurrent indexing requests, and the rivers issue actually seems to be specific to rivers, so I left mapping updates for rivers handled as they are today (i.e. applied locally first and then propagated to the master).

Also failing to apply a mapping update on recovery now fails the shard.
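That recovery behavior can be sketched as follows. Hypothetical names only (`TranslogReplayer`, `ShardFailedException`, and the `Predicate` standing in for the master's validation are not the real code path): when replaying a translog operation produces a mapping update that the master rejects, the shard is failed rather than left holding documents that disagree with the authoritative mappings.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not the actual recovery code: replay translog
// operations and fail the shard if the master rejects a mapping update.
public class TranslogReplayer {
    public static class ShardFailedException extends RuntimeException {
        public ShardFailedException(String msg) { super(msg); }
    }

    /**
     * Replay operations; 'masterAccepts' stands in for sending the dynamic
     * mapping update to the master for validation. A rejection means this
     * shard's data is inconsistent with the authoritative mappings, so we
     * fail the shard instead of merely logging a warning, letting the
     * cluster promote another (consistent) copy if one exists.
     */
    public static int replay(List<String> operations, Predicate<String> masterAccepts) {
        int replayed = 0;
        for (String op : operations) {
            if (!masterAccepts.test(op)) {
                throw new ShardFailedException("mapping update rejected for: " + op);
            }
            replayed++;
        }
        return replayed;
    }

    public static void main(String[] args) {
        System.out.println(replay(List.of("index doc1", "index doc2"), op -> true));
    }
}
```

This matches the reasoning in the thread: failing the shard surfaces the inconsistency loudly and lets a consistent replica take over, rather than silently keeping a rogue copy.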

Member


can this be an AtomicReference?
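For context on the suggestion, here is a minimal illustration of the AtomicReference pattern (the `MappingHolder` class and its field layout are hypothetical, not taken from the PR diff): publishing an immutable mappings snapshot through a `java.util.concurrent.atomic.AtomicReference` so concurrent readers always observe a complete, consistent value without locking.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: publish an immutable mappings snapshot via an
// AtomicReference so readers never see a partially applied update.
public class MappingHolder {
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(Map.of());

    // Atomically merge a dynamic update into a fresh immutable snapshot.
    public void apply(Map<String, String> update) {
        current.updateAndGet(existing -> {
            Map<String, String> merged = new HashMap<>(existing);
            merged.putAll(update);
            return Map.copyOf(merged);
        });
    }

    public Map<String, String> snapshot() {
        return current.get();
    }

    public static void main(String[] args) {
        MappingHolder holder = new MappingHolder();
        holder.apply(Map.of("title", "string"));
        holder.apply(Map.of("count", "long"));
        System.out.println(holder.snapshot().size());
    }
}
```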

@jpountz
Contributor Author

jpountz commented Apr 20, 2015

@bleskes @kimchy Pushed a new commit to address your comments

jpountz added 2 commits April 21, 2015 11:08
This commit changes dynamic mappings updates so that they are synchronous on the
entire cluster and their validity is checked by the master node. There are some
important consequences of this commit:
 - a failing index request on a non-existing type does not implicitly create
   the type anymore
 - dynamic mappings updates cannot create inconsistent mappings on different
   shards
 - indexing requests that introduce new fields might induce latency spikes
   because of the overhead to update the mappings on the master node

Close elastic#8688
@jpountz jpountz force-pushed the fix/validate_mappings_on_master branch from e5da1df to 1adf232 on April 21, 2015 09:19
jpountz added a commit that referenced this pull request Apr 21, 2015
Mappings: Validate dynamic mappings updates on the master node.
@jpountz jpountz merged commit ac74247 into elastic:master Apr 21, 2015


Development

Successfully merging this pull request may close these issues.

Mapping updates should be synchronous
Mapping potentially lost with "dynamic" : "strict", _default_ mapping and failed document index

7 participants