eagerly write Raft state for new Ranges by tbg · Pull Request #7310 · cockroachdb/cockroach

tbg · 2016-06-17T17:23:26Z

Change the initialization of Raft groups so that whenever we create a Replica
for a "new" Raft group (as opposed to creating a Replica that will receive
a snapshot), we write an initial ReplicaState and associated Raft state.

The WIP part is in the last commit which disables some tests (all of which, I think,
create replicas out-of-band and thus don't properly initialize the state). Some of
them may actually pass, but they shouldn't (I flagged them while I still had an
assertion in there which I'm planning to put in again).

We're still updating in-mem and on-disk state kind of ad-hoc, but that's for a future
PR.

This change is

bdarnell · 2016-06-19T12:13:49Z

Generally looks good when the tests are sorted out.

Review status: 0 of 8 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.

storage/replica_state.go, line 326 [r6] (raw file):

  }

  if err := setHardState(eng, rangeID, raftpb.HardState{

We can't just clobber the HardState: we could have already cast a vote that must be preserved. This can happen in the TestSplitSnapshotRace_SnapshotWins scenario: this replica is behind and the other nodes have already started an election (or several, so the term could be higher too). We won't know anything else about the range because we can't apply a snapshot in this state, but the HardState is still valid. We need to load the existing HardState and only set Term and Commit if the previous value was zero.

Comments from Reviewable

tamird · 2016-06-20T02:52:28Z

Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 6 of 6 files at r5, 1 of 3 files at r6.
Review status: 6 of 8 files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.

storage/replica.go, line 825 [r6] (raw file):

  if diskState, err := loadState(r.store.Engine(), r.mu.state.Desc); err != nil {
      log.Warning(err)
  } else if !reflect.DeepEqual(diskState, ri.ReplicaState) {

does proto.Equal work?

storage/replica_command.go, line 2188 [r5] (raw file):

  // EndTransaction, but the plumbing has not been done yet.
  sp := r.store.Tracer().StartSpan("split")
  log.Warningf("trigger")

hello

storage/replica_state.go, line 66 [r3] (raw file):

  // pointless), but it is and the migration is not worth it.
  if truncState, err := loadTruncatedState(reader, desc.RangeID); err != nil {
      return storagebase.ReplicaState{}, err

nit: doesn't pass golint

storage/replica_state.go, line 80 [r5] (raw file):

  eng engine.ReadWriter, state storagebase.ReplicaState,
) (enginepb.MVCCStats, error) {
  ms, rangeID := &state.Stats, state.Desc.RangeID

this looks like it's updating the stats in place, which is odd given that it is eventually returned. the comment on the function is not exactly clear about whether this is intended, but it seems sketchy to do both (update in place and return).

storage/replica_state.go, line 203 [r5] (raw file):

}

func loadGCThreshold(reader engine.Reader, rangeID roachpb.RangeID) (hlc.Timestamp, error) {

fair amount of random code movement in this commit, hard to tell what's what

Comments from Reviewable

tbg · 2016-06-20T22:19:24Z

All green (at least should be), PTAL.

Review status: 0 of 13 files reviewed at latest revision, 6 unresolved discussions, some commit checks pending.

storage/replica.go, line 825 [r6] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

does proto.Equal work?

No, `panic: interface conversion: interface is roachpb.RKey, not []uint8`.

storage/replica_command.go, line 2188 [r5] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

hello

Done.

storage/replica_state.go, line 66 [r3] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: doesn't pass golint

not getting any failure (at least locally), what are you looking at?

storage/replica_state.go, line 80 [r5] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this looks like it's updating the stats in place, which is odd given that it is eventually returned. the comment on the function is not exactly clear about whether this is intended, but it seems sketchy to do both (update in place and return).

`state` is our copy, so I don't think anything is updated in-place.

storage/replica_state.go, line 203 [r5] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

fair amount of random code movement in this commit, hard to tell what's what

Yeah, sorry about that. I spent a bunch of time trying to keep things in separate commits, but since I didn't know how far I would get (I was expecting more issues than I eventually had) I eventually gave up.

storage/replica_state.go, line 326 [r6] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We can't just clobber the HardState: we could have already cast a vote that must be preserved. This can happen in the TestSplitSnapshotRace_SnapshotWins scenario: this replica is behind and the other nodes have already started an election (or several, so the term could be higher too). We won't know anything else about the range because we can't apply a snapshot in this state, but the HardState is still valid. We need to load the existing HardState and only set Term and Commit if the previous value was zero.

Done, PTAL if what I did makes sense.

Comments from Reviewable

tamird · 2016-06-20T23:46:36Z

Reviewed 5 of 11 files at r11, 1 of 3 files at r15, 7 of 7 files at r16.
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.

storage/replica.go, line 1946 [r16] (raw file):

  // TODO(petermattis): We did not close the writer in an earlier version of
  // the code, which went undetected even though we used the batch after
  // (though only to commit it). We should be an assertion to prevent that in

be an assertion

storage/replica_raftstorage.go, line 59 [r16] (raw file):

  hs, err := loadHardState(r.store.Engine(), r.RangeID)
  // For uninitialized ranges, membership is unknown at this point.
  if raft.IsEmptyHardState(hs) || err != nil {

this conflates empty on-disk HS with missing HS :(

storage/replica_state.go, line 66 [r3] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

not getting any failure (at least locally), what are you looking at?

looks like this was touched up in another commit.

storage/replica_state.go, line 321 [r16] (raw file):

// which does not start from zero but presupposes a few entries already having
// applied.
// The supplied MVCCStats are used for the the Stats field after adjusting for

the the

storage/replica_test.go, line 4040 [r16] (raw file):

  baseStats := initialStats
  // Add in the contribution for the leader lease request.

if this starts to fail, how do i know what to change the next line to?

storage/stats_test.go, line 33 [r16] (raw file):

// bootstrapRangeOnly. These stats are not empty because we call
// writeInitialState().
var initialStats = enginepb.MVCCStats{

past experience suggests making this a factory function

Comments from Reviewable

tbg · 2016-06-21T12:01:15Z

Review status: 0 of 13 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.

storage/replica.go, line 1946 [r16] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

be an assertion

Done.

storage/replica_raftstorage.go, line 59 [r16] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this conflates empty on-disk HS with missing HS :(

Why does the distinction matter?

storage/replica_state.go, line 66 [r3] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

looks like this was touched up in another commit.

Moved the edit.

storage/replica_state.go, line 321 [r16] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

the the

Done.

storage/replica_test.go, line 4040 [r16] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

if this starts to fail, how do i know what to change the next line to?

You don't, though you could in principle compute the correct value (and in practice you'll just change it to whatever the expected value is after asserting that it makes sense that it would have changed). All of this MVCC testing is ripe for a rewrite, but this is enough rewrite for this PR.

storage/stats_test.go, line 33 [r16] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

past experience suggests making this a factory function

Done.

Comments from Reviewable

bdarnell · 2016-06-21T13:43:57Z

Reviewed 5 of 11 files at r21, 1 of 3 files at r22, 7 of 7 files at r23.
Review status: all files reviewed at latest revision, 14 unresolved discussions, all commit checks successful.

keys/keys.go, line 164 [r23] (raw file):

}

// RangeAppliedIndexKey returns a system-local key for a raft applied index.

Why did you rename these? The applied indexes are per-Replica, not per-range (although they are oddly counted as "replicated" keys).

RangeTruncatedStateKey makes a little more sense because it is range-level state, but I don't think it makes sense to rename the key without renaming the proto.

keys/printer_test.go, line 55 [r23] (raw file):

      {AbortCacheKey(roachpb.RangeID(1000001), txnID), fmt.Sprintf(`/Local/RangeID/1000001/r/AbortCache/%q`, txnID)},
      {RaftTombstoneKey(roachpb.RangeID(1000001)), "/Local/RangeID/1000001/r/RaftTombstone"},
      {RangeAppliedIndexKey(roachpb.RangeID(1000001)), "/Local/RangeID/1000001/r/RaftAppliedIndex"},

The pretty-printed names of these keys should match the function names.

storage/replica.go, line 1818 [r23] (raw file):

  if forcedError != nil {
      batch = r.store.Engine().NewBatch()
      batch.Defer(func() {

Add a TODO to remove this, or do you expect to leave it in indefinitely?

storage/replica_raftstorage.go, line 59 [r16] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Why does the distinction matter?

I don't think we need this check. If `hs` is empty, `r.mu.state.Desc.Replicas`will be empty too, so we will return an empty HardState and ConfState.

storage/replica_state.go, line 33 [r23] (raw file):

)

// TODO(tschottdorf): unified method to update both in-mem and on-disk

Remove this TODO and replace it with one on saveState to use this method everywhere? Or am I misunderstanding this TODO?

storage/replica_state.go, line 274 [r23] (raw file):

  } else {
      // The log is empty, which means we are either starting from scratch
      // or the entire log has been truncated away. raftTruncatedState

Remove the second sentence of this comment; raftTruncatedState is no longer where the smarts are.

storage/replica_state.go, line 333 [r23] (raw file):

  }
  s.RaftAppliedIndex = s.TruncatedState.Index
  s.Desc = &roachpb.RangeDescriptor{

We don't want to set the term and index to raftInitialLog{Term,Index} until we have a real RangeDescriptor. We do have a real descriptor when this function is called, so we should use that instead of writing this partial one (which could clobber the real one if things happen out of order, and at best will simply get overwritten).

storage/replica_state.go, line 359 [r23] (raw file):

  if !raft.IsEmptyHardState(oldHS) {
      if oldHS.Commit > newHS.Commit {
          log.Fatalf("new group made progress before being initalized: %+v",

This shouldn't be fatal. Raft communicates the commit index via various side-channels so we may learn that other members of the group have made progress before we reach this point (we won't, actually, because raft tries not to send a Commit index to a replica that hasn't acknowledged that index, but I don't like relying on that).

storage/replica_test.go, line 206 [r23] (raw file):

          // necessary to not immediately crash on a Raft command, but
          // nothing more.
          // If you read this and you're writing a new test, try not to

Put this comment on the definition of bootstrapRangeOnly.

Comments from Reviewable

tamird · 2016-06-21T14:05:28Z

Reviewed 5 of 11 files at r21, 1 of 3 files at r22, 7 of 7 files at r23.
Review status: all files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.

storage/replica_raftstorage.go, line 59 [r16] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I don't think we need this check. If hs is empty, r.mu.state.Desc.Replicaswill be empty too, so we will return an empty HardState and ConfState.

missing HS would indicate corruption, wouldn't it?

storage/replica_test.go, line 4071 [r23] (raw file):

  pArgs = putArgs([]byte("b"), []byte("value2"))

  // Consistent UUID needed for a deterministic SysBytes value. This is because()

because()

Comments from Reviewable

bdarnell · 2016-06-21T14:08:05Z

Review status: all files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.

storage/replica_raftstorage.go, line 59 [r16] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

missing HS would indicate corruption, wouldn't it?

No, missing HardState is normal for uninitialized replicas.

Comments from Reviewable

tamird · 2016-06-21T14:09:38Z

Review status: all files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.

storage/replica_raftstorage.go, line 59 [r16] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

No, missing HardState is normal for uninitialized replicas.

Then when is an empty HardState expected?

Comments from Reviewable

bdarnell · 2016-06-21T14:18:32Z

Review status: all files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.

storage/replica_raftstorage.go, line 59 [r16] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Then when is an empty HardState expected?

Never. It's always non-empty when we write the initial state, and we always check for emptiness in `handleRaftReady`.

Comments from Reviewable

tbg · 2016-06-21T15:25:07Z

TFTRs!
PTAL @bdarnell for the HardState update.

Review status: all files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.

keys/keys.go, line 164 [r23] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why did you rename these? The applied indexes are per-Replica, not per-range (although they are oddly counted as "replicated" keys).

RangeTruncatedStateKey makes a little more sense because it is range-level state, but I don't think it makes sense to rename the key without renaming the proto.

You're right, the renames are a little too drive-by. I've removed them since they don't need to be in this PR. In principle though, I think the new names are better - both keys are counted as replicated keys because they are replicated - they are present in the snapshot, and they are mutated only through Raft. As such, they are part of the replica state (as in, the replica state machine state), and there's no difference between the applied index, and, say, the leader lease or frozen status. Our naming for these keys is `RangeSomethingSomethingKey`, so I wanted to homogenize in that way. Maybe I don't understand the naming here right?

BTW, what's up with RaftTombstoneKey above? Does that need to/should it be replicated?

keys/printer_test.go, line 55 [r23] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The pretty-printed names of these keys should match the function names.

Undid the rename.

storage/replica.go, line 1818 [r23] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Add a TODO to remove this, or do you expect to leave it in indefinitely?

We'll see. I added a comment on `updateState`.

storage/replica_raftstorage.go, line 59 [r16] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Never. It's always non-empty when we write the initial state, and we always check for emptiness in handleRaftReady.

That sounds like there is room for assertions. I'm going to leave as is now but send a follow-up which returns a `bool` from all of the `loadXYZ` methods.

storage/replica_state.go, line 33 [r23] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Remove this TODO and replace it with one on saveState to use this method everywhere? Or am I misunderstanding this TODO?

No, you're reading it right. The comment predates `saveState`. Done.

storage/replica_state.go, line 274 [r23] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Remove the second sentence of this comment; raftTruncatedState is no longer where the smarts are.

Done.

storage/replica_state.go, line 333 [r23] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We don't want to set the term and index to raftInitialLog{Term,Index} until we have a real RangeDescriptor. We do have a real descriptor when this function is called, so we should use that instead of writing this partial one (which could clobber the real one if things happen out of order, and at best will simply get overwritten).

This descriptor isn't actually written, but only used for its `rangeID`. That interface clearly isn't optimal, but the descriptor (being a real mvcc key and transactionally updated) shouldn't actually be touched by `saveState`. Added comments and will see if an opportunity for improvement presents itself in follow-up work.

storage/replica_state.go, line 359 [r23] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This shouldn't be fatal. Raft communicates the commit index via various side-channels so we may learn that other members of the group have made progress before we reach this point (we won't, actually, because raft tries not to send a Commit index to a replica that hasn't acknowledged that index, but I don't like relying on that).

Good point. The correct thing is to take the largest commit index and term we see? Made that change.

storage/replica_test.go, line 206 [r23] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Put this comment on the definition of bootstrapRangeOnly.

Done.

storage/replica_test.go, line 4071 [r23] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

because()

Done.

Comments from Reviewable

bdarnell · 2016-06-21T15:42:01Z

Review status: 0 of 10 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.

storage/replica_state.go, line 333 [r23] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This descriptor isn't actually written, but only used for its rangeID. That interface clearly isn't optimal, but the descriptor (being a real mvcc key and transactionally updated) shouldn't actually be touched by saveState. Added comments and will see if an opportunity for improvement presents itself in follow-up work.

The descriptor isn't written to disk here, so `s.Desc` shouldn't be set here either. The caller should be required to populate `s.Desc` before calling `writeInitialState` (or pass that descriptor into `writeInitialState` instead of just the range ID)

keys/keys.go, line 164 [r23] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

You're right, the renames are a little too drive-by. I've removed them since they don't need to be in this PR. In principle though, I think the new names are better - both keys are counted as replicated keys because they are replicated - they are present in the snapshot, and they are mutated only through Raft. As such, they are part of the replica state (as in, the replica state machine state), and there's no difference between the applied index, and, say, the leader lease or frozen status. Our naming for these keys is RangeSomethingSomethingKey, so I wanted to homogenize in that way. Maybe I don't understand the naming here right?

BTW, what's up with RaftTombstoneKey above? Does that need to/should it be replicated?

I don't feel strongly about the names. Our current naming is kind of a mess, but even with this change, starting with `Range` is not a reliable indicator for replicated range-id-local keys (`AbortCacheKey` is another one in this category, while `RangeLast{ReplicaGC,Verification}TimestampKey` start with `Range` but are not replicated. If we want to standardize on `Range` for replicated local keys (or `RangeID` to distinguish from the range descriptor, range tree, and transaction keys) and `Replica` for unreplicated ones I'd be OK with that.

I think the replicated status of RaftTombstoneKey is a bug, that manages to not matter in practice because it appears only on stores that are not currently a member of the group.

Comments from Reviewable

This was not done previously. The handling of truncated state is still a bit sketchy but will cleaned up further.

Seems like we could phase it out if it was worth the trouble.

Change the initialization of Raft groups so that whenever we create a Replica for a "new" Raft group (as opposed to creating a Replica that will receive a snapshot), we write an initial ReplicaState and associated Raft state.

tbg · 2016-06-21T16:11:23Z

Review status: 0 of 10 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.

storage/replica_state.go, line 333 [r23] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The descriptor isn't written to disk here, so s.Desc shouldn't be set here either. The caller should be required to populate s.Desc before calling writeInitialState (or pass that descriptor into writeInitialState instead of just the range ID)

Passing the descriptor in works fine, did that.

keys/keys.go, line 164 [r23] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I don't feel strongly about the names. Our current naming is kind of a mess, but even with this change, starting with Range is not a reliable indicator for replicated range-id-local keys (AbortCacheKey is another one in this category, while RangeLast{ReplicaGC,Verification}TimestampKey start with Range but are not replicated. If we want to standardize on Range for replicated local keys (or RangeID to distinguish from the range descriptor, range tree, and transaction keys) and Replica for unreplicated ones I'd be OK with that.

I think the replicated status of RaftTombstoneKey is a bug, that manages to not matter in practice because it appears only on stores that are not currently a member of the group.

Filed https://github.com//issues/7357.

Comments from Reviewable

bdarnell · 2016-06-21T16:15:51Z

LGTM

Review status: 0 of 10 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.

Comments from Reviewable

tamird · 2016-06-21T16:16:32Z

Reviewed 3 of 13 files at r24, 7 of 10 files at r34, 3 of 3 files at r35.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.

storage/replica_state.go, line 32 [r35] (raw file):

)

// loadState loads a ReplicaState from disc. The exception is the Desc field,

pedantic nit: disk?

Comments from Reviewable

tbg · 2016-06-21T16:44:08Z

Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.

storage/replica_state.go, line 32 [r35] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

pedantic nit: disk?

Making mental note, but not ready to invest in CI cycle.

Comments from Reviewable

tbg · 2016-06-22T02:30:25Z

Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.

storage/replica_raftstorage.go, line 59 [r16] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

That sounds like there is room for assertions. I'm going to leave as is now but send a follow-up which returns a bool from all of the loadXYZ methods.

I think I didn't properly follow this thread. @bdarnell, you were saying that it's normal not to find a HardState, but that when a HardState is found, it will not be empty, right? So there's not much to check (except that we never persist an empty HardState, but that isn't a problem even if it happened, I think).

Comments from Reviewable

tamird · 2016-06-22T02:32:23Z

Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.

storage/replica_raftstorage.go, line 59 [r16] (raw file):

when a HardState is found, it will not be empty

isn't that assertable?

Comments from Reviewable

tbg · 2016-06-22T02:51:42Z

Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.

storage/replica_raftstorage.go, line 59 [r16] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

when a HardState is found, it will not be empty

isn't that assertable?

Sure, but it's not a very useful assertion. I thought we had a better grip on when the on-disk state could be missing.

Comments from Reviewable

bdarnell · 2016-06-22T08:16:11Z

Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.

storage/replica_raftstorage.go, line 59 [r16] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Sure, but it's not a very useful assertion. I thought we had a better grip on when the on-disk state could be missing.

Correct. We should never write an empty HardState. We could assert on that, but it doesn't seem very useful to do so.

Comments from Reviewable

In cockroachdb#7310, we added (and enforced) assumptions on what was persisted to disk, but did not ensure that older versions were migrated properly. This in turn lead to cockroachdb#7389. A migration is only necessary for ranges which had never been truncated, and so it should be fairly rare in practice. The migration simply writes the truncated state that would otherwise been synthesized by the old code, which is technically a consistency violation, but should work well in practice because all nodes will write that state on startup when they are rebooted.

Also straightened out some kinks with the existing reference tests, namely that they were creating two single-node clusters per test which were never used. Instead, we can now create zero-node clusters and run one-shot containers on it (one would argue that that's still weird, but it seems acceptable for the time being). The newly added reference test will crash when run for commits between cockroachdb#7310 and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the test.

In cockroachdb#7310, we added (and enforced) assumptions on what was persisted to disk, but did not ensure that older versions were migrated properly. This in turn lead to cockroachdb#7389. A migration is only necessary for ranges which had never been truncated, and so it should be fairly rare in practice. The migration simply writes the truncated state that would otherwise been synthesized by the old code, which is technically a consistency violation, but should work well in practice because all nodes will write that state on startup when they are rebooted.

Also straightened out some kinks with the existing reference tests, namely that they were creating two single-node clusters per test which were never used. Instead, we can now create zero-node clusters and run one-shot containers on it (one would argue that that's still weird, but it seems acceptable for the time being). The newly added reference test will crash when run for commits between cockroachdb#7310 and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the test.

In cockroachdb#7310, we added (and enforced) assumptions on what was persisted to disk, but did not ensure that older versions were migrated properly. This in turn lead to cockroachdb#7389. A migration is only necessary for ranges which had never been truncated, and so it should be fairly rare in practice. The migration simply writes the truncated state that would otherwise been synthesized by the old code, which is technically a consistency violation, but should work well in practice because all nodes will write that state on startup when they are rebooted.

Also straightened out some kinks with the existing reference tests, namely that they were creating two single-node clusters per test which were never used. Instead, we can now create zero-node clusters and run one-shot containers on it (one would argue that that's still weird, but it seems acceptable for the time being). The newly added reference test will crash when run for commits between cockroachdb#7310 and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the test.

tbg assigned bdarnell Jun 17, 2016

tbg force-pushed the state branch from cf286b4 to fb54979 Compare June 18, 2016 02:02

tbg force-pushed the state branch 2 times, most recently from e7d375b to a42f482 Compare June 20, 2016 22:19

tbg changed the title ~~WIP: eagerly write Raft state for new Ranges~~ eagerly write Raft state for new Ranges Jun 21, 2016

tbg force-pushed the state branch 2 times, most recently from 2cf0d9c to f379bcf Compare June 21, 2016 12:01

tbg force-pushed the state branch from f379bcf to 5b1f907 Compare June 21, 2016 15:25

tbg added 6 commits June 21, 2016 12:08

storage: update split comments

6e03d6f

storage: move state helpers to own file

3684886

add truncatedState helpers and use in loadState

dbd8a60

This was not done previously. The handling of truncated state is still a bit sketchy but will cleaned up further.

storage: add comment on lastReplicaGCTimestamp

01ff451

Seems like we could phase it out if it was worth the trouble.

storage: eagerly write Raft state

5f78d71

Change the initialization of Raft groups so that whenever we create a Replica for a "new" Raft group (as opposed to creating a Replica that will receive a snapshot), we write an initial ReplicaState and associated Raft state.

storage: add state assertions

d1bdb19

tbg mentioned this pull request Jun 21, 2016

keys: standardize replicated and unreplicated key names #7357

Closed

tbg force-pushed the state branch from 5b1f907 to d1bdb19 Compare June 21, 2016 16:11

tbg merged commit 2b3c12d into cockroachdb:master Jun 21, 2016

tbg deleted the state branch June 21, 2016 16:45

tbg mentioned this pull request Jun 22, 2016

stability: crash at startup: panic: [group 903] entries[1:12) is unavailable from storage #7389

Closed

tbg mentioned this pull request Jun 23, 2016

storage: migrate raft state on startup #7429

Merged

tbg mentioned this pull request Jun 24, 2016

storage: fix preemptive snapshots #7468

Merged

Conversation

tbg commented Jun 17, 2016 • edited by petermattis Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bdarnell commented Jun 19, 2016

Uh oh!

tamird commented Jun 20, 2016

Uh oh!

tbg commented Jun 20, 2016

Uh oh!

tamird commented Jun 20, 2016

Uh oh!

tbg commented Jun 21, 2016

Uh oh!

bdarnell commented Jun 21, 2016

Uh oh!

tamird commented Jun 21, 2016

Uh oh!

bdarnell commented Jun 21, 2016

Uh oh!

tamird commented Jun 21, 2016

Uh oh!

bdarnell commented Jun 21, 2016

Uh oh!

tbg commented Jun 21, 2016

Uh oh!

bdarnell commented Jun 21, 2016

Uh oh!

tbg commented Jun 21, 2016

Uh oh!

bdarnell commented Jun 21, 2016

Uh oh!

tamird commented Jun 21, 2016

Uh oh!

tbg commented Jun 21, 2016

Uh oh!

tbg commented Jun 22, 2016

Uh oh!

tamird commented Jun 22, 2016

Uh oh!

tbg commented Jun 22, 2016

Uh oh!

bdarnell commented Jun 22, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tbg commented Jun 17, 2016 •

edited by petermattis

Loading