storage: be more resilient to learner snap conflicts #40435
craig[bot] merged 1 commit into cockroachdb:master
Conversation
tbg left a comment
How much do we care about the race when the replica addition is being run from a different node than the raft snapshot queue? If we do: I've spent so long trying to do this somewhat gracefully, hitting issue after issue, that my opinion at this point in the release is we should simply sniff the error in a retry loop.
What you have here is good. Solving this race in all generality is not worth it at this point.
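For context, a minimal sketch of what the error-sniffing retry loop floated above could look like. The `sendLearnerSnapshot` helper and the conflict error text are hypothetical stand-ins, not the PR's actual code:

```go
package learner

import (
	"context"
	"strings"
	"time"
)

// addLearnerWithRetry sketches the "sniff the error in a retry loop"
// alternative: if the raft snapshot queue won the race, detect that
// specific error and retry rather than failing the replica addition.
func addLearnerWithRetry(
	ctx context.Context, sendLearnerSnapshot func(context.Context) error,
) error {
	var err error
	for attempt := 0; attempt < 5; attempt++ {
		if err = sendLearnerSnapshot(ctx); err == nil {
			return nil
		}
		// Sniff for the specific conflict caused by losing the race;
		// any other error is returned as-is. The error text here is an
		// assumption for illustration.
		if !strings.Contains(err.Error(), "snapshot already in flight") {
			return err
		}
		// Back off before retrying, respecting cancellation.
		select {
		case <-time.After(100 * time.Millisecond << uint(attempt)):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}
```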
Reviewed 12 of 12 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)
pkg/storage/replica_command.go, line 960 at r1 (raw file):
_ = r.atomicReplicationChange
releaseSnapshotLockFn := r.lockLearnerSnapshot(ctx, adds)
defer releaseSnapshotLockFn()
Can you release this once `addLearnerReplicas` returns? I don't want to end up in a situation in which we send the snap, but then the learner for whatever reason needs another snap after it has converted to a voter (in `atomicReplicationChange`) below, while we're blocking the queue and deadlocking things.
Oh, I'm seeing that the lock has a timeout. Anyway, it'd be cleaner to release this sooner. You can just move the unlock before the `if err != nil`.
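For illustration, a sketch of that reordering under assumed signatures (`lockLearnerSnapshot` and `addLearnerReplicas` mirror the names in the diff, but their shapes here are guesses):

```go
package learner

import "context"

// addLearners sketches the suggested reordering: release the
// learner-snapshot lock as soon as the learner addition returns,
// rather than deferring it past the later voter promotion.
func addLearners(
	ctx context.Context,
	lockLearnerSnapshot func(context.Context) (unlock func()),
	addLearnerReplicas func(context.Context) error,
) error {
	unlock := lockLearnerSnapshot(ctx)
	err := addLearnerReplicas(ctx)
	// Unlock before the error check: neither a failed addition nor the
	// later voter promotion should hold the lock while the raft
	// snapshot queue might legitimately need to send another snapshot.
	unlock()
	return err
}
```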
pkg/storage/replica_raftstorage.go, line 391 at r1 (raw file):
r.mu.Lock()
appliedIndex := r.mu.state.RaftAppliedIndex // cleared when OutgoingSnapshot closes
nit: // Cleared when OutgoingSnapshot closes.
tbg left a comment
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @danhhz)
danhhz left a comment
TFTR!
The test failure was a result of a `String` method changing in the raft bump, so I updated the expected values in the test.
bors r=tbg
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)
pkg/storage/replica_command.go, line 960 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Can you release this once `addLearnerReplicas` returns? I don't want to end up in a situation in which we send the snap, but then the learner for whatever reason needs another snap after it has converted to a voter (in `atomicReplicationChange`) below, while we're blocking the queue and deadlocking things. Oh, I'm seeing that the lock has a timeout. Anyway, it'd be cleaner to release this sooner. You can just move the unlock before the `if err != nil`.
As discussed offline, I added more commentary instead of moving the unlock.
pkg/storage/replica_raftstorage.go, line 391 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
nit: // Cleared when OutgoingSnapshot closes.
Done
40435: storage: be more resilient to learner snap conflicts r=tbg a=danhhz

The replica addition code first adds it as a raft learner, then hands it a snapshot, then promotes it to a voter. For various unfortunate reasons described in the code, we have to allow the raft snapshot queue to _also_ send snapshots to learners. A recent etcd change exposed that this code has always been brittle to the raft snapshot queue winning the race and starting the snapshot first by making the race dramatically more likely.

After this commit, learner replica addition grabs a (best effort) lock before the conf change txn to add the learner is started. This prevents the race when the raft leader (and thus the raft snapshot queue for that range) is on the same node.

Closes #40207

Release note: None

40454: kv: improve unit testing around txnSpanRefresher r=nvanbenschoten a=nvanbenschoten

This was decently tested through its inclusion in the TxnCoordSender, but since I'm planning on changing some behavior to address #36431, I figured we should first build up a test suite in the same style that we have for other transaction interceptors.

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Build succeeded
The replica addition code first adds it as a raft learner, then hands it
a snapshot, then promotes it to a voter. For various unfortunate reasons
described in the code, we have to allow the raft snapshot queue to
also send snapshots to learners. A recent etcd change exposed that
this code has always been brittle to the raft snapshot queue winning the
race and starting the snapshot first by making the race dramatically
more likely.
After this commit, learner replica addition grabs a (best effort) lock
before the conf change txn to add the learner is started. This prevents
the race when the raft leader (and thus the raft snapshot queue for that
range) is on the same node.
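To make the mechanism concrete, here is a hedged sketch of what such a best-effort lock could look like. Every name and type here (`learnerSnapGuard`, integer replica IDs, `inFlight`) is a stand-in for illustration, not the PR's actual code:

```go
package learner

import "sync"

// learnerSnapGuard sketches the best-effort lock described above:
// replica addition registers the learners it is about to snapshot, and
// the raft snapshot queue skips replicas that are already registered.
type learnerSnapGuard struct {
	mu      sync.Mutex
	pending map[int]struct{} // replica IDs with an in-flight learner snapshot
}

func newLearnerSnapGuard() *learnerSnapGuard {
	return &learnerSnapGuard{pending: make(map[int]struct{})}
}

// lock registers the replica IDs and returns a function that
// unregisters them. It never blocks: it is "best effort" in that it
// only advertises intent so the raft snapshot queue can back off.
func (g *learnerSnapGuard) lock(replicaIDs []int) (unlock func()) {
	g.mu.Lock()
	defer g.mu.Unlock()
	for _, id := range replicaIDs {
		g.pending[id] = struct{}{}
	}
	return func() {
		g.mu.Lock()
		defer g.mu.Unlock()
		for _, id := range replicaIDs {
			delete(g.pending, id)
		}
	}
}

// inFlight is what the raft snapshot queue would consult before sending
// its own snapshot to a learner.
func (g *learnerSnapGuard) inFlight(replicaID int) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	_, ok := g.pending[replicaID]
	return ok
}
```

Because the real lock also carries a timeout (as noted in review), correctness never depends on it: a stuck or expired registration degrades to the old racy behavior rather than wedging the snapshot queue.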
Closes #40207
Release note: None