Skip to content

storage: make subsumption atomic with merge commit application#28684

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
benesch:merge-atomic
Aug 16, 2018
Merged

storage: make subsumption atomic with merge commit application#28684
craig[bot] merged 1 commit intocockroachdb:masterfrom
benesch:merge-atomic

Conversation

@benesch
Copy link
Copy Markdown
Contributor

@benesch benesch commented Aug 16, 2018

Please review the first commit separately in #28661.

During a merge, the subsumed range logically ceases to exist at the
moment that the merge transaction commit applies. It's important that we
remove the subsumed range from disk in the same batch as the merge
commit to protect against an ill-timed crash. Tease apart
Replica.destroyRaftMuLocked to make this possible.

Release note: None

@benesch benesch requested review from a team, bdarnell and tbg August 16, 2018 05:11
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r1, 8 of 8 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/replica.go, line 809 at r2 (raw file):

	// Suggest the cleared range to the compactor queue.
	r.store.compactor.Suggest(ctx, storagebase.SuggestedCompaction{

This should happen later, after you've actually committed it, in postDestroyRaftMuLocked (or any later point in time).


pkg/storage/replica.go, line 872 at r2 (raw file):

	}

	ms := r.GetMVCCStats()

Would feel better if you grabbed this before actually destroying the replica. It's unclear what answer you're getting right now.

Copy link
Copy Markdown
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/subsumation/subsumption/

Reviewed 2 of 10 files at r1, 8 of 8 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/store.go, line 2529 at r2 (raw file):

// RemoveOptions bundles boolean parameters for Store.RemoveReplica.
type RemoveOptions struct {
	DestroyReplica bool

I prefer the old name. The Replica object is effectively destroyed either way; the main effect of destroyRaftMuLocked is to delete the data.

Copy link
Copy Markdown
Contributor Author

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh, thanks. I knew "subsumation" sounded funny.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/replica.go, line 809 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This should happen later, after you've actually committed it, in postDestroyRaftMuLocked (or any later point in time).

Done.


pkg/storage/replica.go, line 872 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Would feel better if you grabbed this before actually destroying the replica. It's unclear what answer you're getting right now.

Done.


pkg/storage/store.go, line 2529 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I prefer the old name. The Replica object is effectively destroyed either way; the main effect of destroyRaftMuLocked is to delete the data.

Done.

@benesch benesch changed the title storage: make subsumation atomic with merge commit application storage: make subsumption atomic with merge commit application Aug 16, 2018
During a merge, the subsumed range logically ceases to exist at the
moment that the merge transaction commit applies. It's important that we
remove the subsumed range from disk in the same batch as the merge
commit to protect against an ill-timed crash. Tease apart
Replica.destroyRaftMuLocked to make this possible.

Release note: None
@benesch
Copy link
Copy Markdown
Contributor Author

benesch commented Aug 16, 2018

bors r=bdarnell,tschottdorf

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 16, 2018

Build failed

@benesch
Copy link
Copy Markdown
Contributor Author

benesch commented Aug 16, 2018

Examples-ORMs flake.

bors r=bdarnell,tschottdorf

craig bot pushed a commit that referenced this pull request Aug 16, 2018
28684: storage: make subsumption atomic with merge commit application r=bdarnell,tschottdorf a=benesch

~Please review the first commit separately in #28661.~

During a merge, the subsumed range logically ceases to exist at the
moment that the merge transaction commit applies. It's important that we
remove the subsumed range from disk in the same batch as the merge
commit to protect against an ill-timed crash. Tease apart
Replica.destroyRaftMuLocked to make this possible.

Release note: None

28707: server: don't drain when decommissioning r=nvanbenschoten a=tschottdorf

Prior to this commit, a server which enters decommissioning would
automatically drain. This wasn't a great idea since it meant that
pointing the decommissioning command at a set of nodes required
for the cluster to work would brick that cluster, and would be
difficult to recover from (since the draining state is picked
up by the nodes when they start).

Instead, decouple draining from decommissioning. Decommissioning simply
tells the allocator to move data off that node; when the node gets shut
down cleanly (the operator's responsiblity), it will drain.

The resulting behavior when trying to decommission a too-large set of
nodes is now that the decommission command will simply not finish.
For example, starting a three node cluster and decommissioning all
nodes will simply hang (though the nodes will be marked as
decommissioning). Add three more nodes to the cluster and replicas
will move over to the newly added nodes, and the decommissioning
command will finish.

As a bonus, recommissioning a node now doesn't require the target
node to restart.

As a second bonus, the decommissioning acceptance test now takes
around 60% of the previous time (~45s down from 70+).

One remaining caveat is that users may forget that they attempted
to decommission nodes. We need to check that we prominently alert
in the UI when nodes are decommissioning. This isn't a new problem.

Fixes #27444.
Fixes #27025.

Release note (bug fix): decommissioning multiple nodes is now possible
without posing a risk to cluster health. Recommissioning a node does
no longer require a restart of the target nodes to take effect.

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 16, 2018

Build succeeded

@craig craig bot merged commit 4fcc6b5 into cockroachdb:master Aug 16, 2018
@benesch benesch deleted the merge-atomic branch August 17, 2018 01:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants