Skip to content

Write shard state metadata as soon as shard is created / initializing#16625

Merged
ywelsch merged 1 commit intoelastic:masterfrom
ywelsch:fix/fail-on-persist-shard-state-metadata
Feb 29, 2016
Merged

Write shard state metadata as soon as shard is created / initializing#16625
ywelsch merged 1 commit intoelastic:masterfrom
ywelsch:fix/fail-on-persist-shard-state-metadata

Conversation

@ywelsch
Copy link
Copy Markdown
Contributor

@ywelsch ywelsch commented Feb 12, 2016

As we now rely on active allocation ids persisted in the cluster state to select
the primary shard copy, we can write shard state metadata on the allocated node
as soon as the node knows about receiving this shard. This also ensures that
in case of primary relocation, when the relocation target is marked as started
by the master node, the shard state metadata with the correct allocation id has
already been written on the relocation target. Before this change, shard state
metadata was only written once the node knows it is marked as started. In case
of failures between master marking the node as started and the node
receiving and processing this event, the relation between the shard copy on disk
and the cluster state could get lost. This means that manual allocation of
the shard using the reroute command allocate_stale_primary was necessary.

Relates to #14739

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bleskes I'm not sure what effect removing this has. The issue that made me remove this is that the shard state metadata was written when shard is created, then it was removed again if shard was recovery target, and not updated anymore since the shard state metadata did not change from point of view of IndexShard.persistMetadata(). With writing shard state metadata directly, we now know that the shard state metadata is up-to-date before we do recovery (hence no need to delete shard state?)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added as a protection against a failure during the rename, leaving the shard in a corrupted state (#10053) . We later added better checks in another place ( #11269 ) . I think it's OK to remove.

@ywelsch ywelsch force-pushed the fix/fail-on-persist-shard-state-metadata branch from 9b0988e to 8b252b7 Compare February 12, 2016 08:40
@ywelsch
Copy link
Copy Markdown
Contributor Author

ywelsch commented Feb 22, 2016

@bleskes ping

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand this correctly, this part is relevant where we assigned a primary after a cluster upgrade and the shard initialized (and wrote a new state file) but we never got around to activating it before crushing again. if that's correct, can you add this to the comment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct

@bleskes
Copy link
Copy Markdown
Contributor

bleskes commented Feb 29, 2016

change looks good to me. Left some suggestions and questions re testing..

@ywelsch ywelsch force-pushed the fix/fail-on-persist-shard-state-metadata branch from fe713f0 to ef3f69e Compare February 29, 2016 11:33
@ywelsch
Copy link
Copy Markdown
Contributor Author

ywelsch commented Feb 29, 2016

Pushed another commit addressing review comments. Also found a copy-paste bug in a test.

@bleskes
Copy link
Copy Markdown
Contributor

bleskes commented Feb 29, 2016

LGTM. Thanks @ywelsch

As we rely on active allocation ids persisted in the cluster state to select
the primary shard copy, we can write shard state metadata on the allocated node
as soon as the node knows about receiving this shard. This also ensures that
in case of primary relocation, when the relocation target is marked as started
by the master node, the shard state metadata with the correct allocation id has
already been written on the relocation target. Before this change, shard state
metadata was only written once the node knows it is marked as started. In case
of failures between master marking the node as started and the node
receiving and processing this event, the relation between the shard copy on disk
and the cluster state could get lost. This means that manual allocation of
the shard using the reroute command allocate_stale_primary was necessary.

Closes elastic#16625
@ywelsch ywelsch force-pushed the fix/fail-on-persist-shard-state-metadata branch from ef3f69e to d76161d Compare February 29, 2016 12:49
ywelsch pushed a commit that referenced this pull request Feb 29, 2016
…e-metadata

Write shard state metadata as soon as shard is created / initializing
@ywelsch ywelsch merged commit 7fc9f03 into elastic:master Feb 29, 2016
@lcawl lcawl added :Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Allocation labels Feb 13, 2018
@clintongormley clintongormley added :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) and removed :Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement v5.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants