Skip to content

engine: better randomized MVCCStats testing; fix bugs#21070

Merged
tbg merged 7 commits intocockroachdb:masterfrom
tbg:fix-gcbytesage-stage2
Dec 29, 2017
Merged

engine: better randomized MVCCStats testing; fix bugs#21070
tbg merged 7 commits intocockroachdb:masterfrom
tbg:fix-gcbytesage-stage2

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Dec 27, 2017

This PR removes TestMVCCStatsWithRandomRuns and replaces it with a
more principled test that achieves much better coverage. This can be seen
from the amount of additional tests it inspired, each of which explores one
family of failures of the new TestMVCCStatsRandomized.

Release note: None

@tbg tbg requested review from a team December 27, 2017 19:08
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@tbg tbg requested a review from nvb December 27, 2017 19:09
@nvb
Copy link
Copy Markdown
Contributor

nvb commented Dec 27, 2017

Reviewed 3 of 3 files at r1, 7 of 7 files at r2, 9 of 9 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 1 of 1 files at r6, 2 of 2 files at r7, 1 of 1 files at r8.
Review status: all files reviewed at latest revision, 19 unresolved discussions, all commit checks successful.


pkg/sqlmigrations/migrations.go, line 150 at r3 (raw file):

	},
	{
		name:   "remove cluster setting `kv.gc.batch_size`",

In commit message: s/the fact that add the time/the fact that at the time/.

Also, this will break anyone who is expecting that the cluster setting exists. I doubt anyone is actually using it, but at the very least we should mark this as a backwards-incompatible change in the release notes.


pkg/sqlmigrations/migrations_test.go, line 450 at r3 (raw file):

	mt.start(t, base.TestServerArgs{})

	mt.sqlDB.Exec(t, `INSERT INTO system.settings (name, "lastUpdated", "valueType", value) `+

Why the quotes on only some column names?


pkg/storage/engine/mvcc.go, line 276 at r1 (raw file):

	// meta.Timestamp.WallTime < orig.Timestamp.WallTime. This wouldn't happen
	// outside of tests (due to our semantics of txn.OrigTimestamp, which never
	// decreases) but it sure does happen in randomized testing. An earlier

Is this worth supporting/testing then? If the randomized testing is diverging from what it's trying to simulate then it seems like we should fix that instead. Maybe I'm missing a reason why we'd want to support this. I guess it's more general.


pkg/storage/engine/mvcc.go, line 341 at r1 (raw file):

	if sys {
		ms.SysBytes += metaKeySize + metaValSize - origMetaValSize - origMetaKeySize
		ms.AgeTo(meta.Timestamp.WallTime)

ms.AgeTo(meta.Timestamp.WallTime) why are we able to remove this?


pkg/storage/engine/mvcc.go, line 350 at r1 (raw file):

		ms.ValBytes -= origMetaValSize + orig.ValBytes

		ms.IntentBytes -= orig.KeyBytes + orig.ValBytes

nit, add comment that parallels the others and move this above the if !meta.Deleted { ... block.


pkg/storage/engine/mvcc.go, line 2258 at r3 (raw file):

		// that the (implicit or explicit) meta key (and thus all versions) are
		// being removed. We had this faulty functionality at some point; it
		// should no more be necessary since the higher levels already make sure

s/no more be necessary/no longer be necessary/


pkg/storage/engine/mvcc.go, line 241 at r7 (raw file):

				// If the original value was an intent, we're replacing the
				// intent.
				ms.SysBytes -= orig.KeyBytes + orig.ValBytes

This doesn't have an impact on IntentBytes or IntentCount like it does below? I think it might be because these system keys are implicit, in which case you might want to note that in a comment.


pkg/storage/engine/mvcc_stats_test.go, line 171 at r1 (raw file):

		LiveBytes:       mKeySize + mValSize + vKeySize + vValSize, // 2+44+12+10 = 68
		LiveCount:       1,
		KeyBytes:        mKeySize + vKeySize, // 14

nit: 2 + 12 = 14 to be consistent.


pkg/storage/engine/mvcc_stats_test.go, line 176 at r1 (raw file):

		ValCount:        1,
		IntentCount:     1,
		IntentBytes:     vKeySize + vValSize, // 12+10 = 22

Add GCBytesAge: 0 here so that it's clear how this changes between the two assertions.


pkg/storage/engine/mvcc_stats_test.go, line 183 at r1 (raw file):

	// push-commit as it would happen for a SNAPSHOT txn)
	ts4 := hlc.Timestamp{WallTime: 4 * 1E9}
	txn.Status = roachpb.COMMITTED

This is the only line that differs between this test and the next. Think it's worth collapsing the tests into one?


pkg/storage/engine/mvcc_stats_test.go, line 240 at r1 (raw file):

		LiveBytes:       mKeySize + mValSize + vKeySize + vValSize, // 2+44+12+10 = 68
		LiveCount:       1,
		KeyBytes:        mKeySize + vKeySize, // 14

Ditto.


pkg/storage/engine/mvcc_stats_test.go, line 245 at r1 (raw file):

		ValCount:        1,
		IntentCount:     1,
		IntentBytes:     vKeySize + vValSize, // 12+10 = 22

Same as above, add IntentAge: 0 here.


pkg/storage/engine/mvcc_stats_test.go, line 361 at r1 (raw file):

// TestMVCCStatsDocumentNegativeWrites documents that things go wrong when you
// write at a negative timestamp. We shouldn't do that in practice and perhaps
// we should have it error outright.

What's the argument for not having this error outright?


pkg/storage/engine/mvcc_stats_test.go, line 743 at r1 (raw file):

	defer eng.Close()

	s := &randomTest{

nit: We might want to pull most of this into a constructor for the randomTest struct so that it can be used in different kinds of tests. Your call.


pkg/storage/engine/mvcc_stats_test.go, line 324 at r2 (raw file):

committing/aborting intents

committing/aborting intents from other transactions?


pkg/storage/engine/mvcc_stats_test.go, line 279 at r3 (raw file):

DelDelGC

Was this intentional?


pkg/storage/engine/mvcc_stats_test.go, line 309 at r3 (raw file):

		KeyCount:        1,
		ValCount:        2,
		GCBytesAge:      vKeySize, // first tombstone, aged from ts1 to ts2

nit: add some indication that this is only equal to vKeySize because (ts2-ts1)/1E9 = 1 (in other words, there's a coefficient that's being hidden here).


pkg/storage/engine/mvcc_stats_test.go, line 841 at r6 (raw file):

		return ""
	}
	s.actions["DelRange"] = func(s *state) string {

Any reason not to fold this into the first commit to keep it all together?


pkg/storage/engine/enginepb/mvcc.go, line 64 at r1 (raw file):

}

// ForceAge encapsulates the complexity of computing the increment in age

Think this needs a test?


Comments from Reviewable

@nvb
Copy link
Copy Markdown
Contributor

nvb commented Dec 27, 2017

Overall this looks good. I'm a big supporter of randomized testing for things like this where the number of different event permutations gets too large to reason about or test exhaustively.


Review status: all files reviewed at latest revision, 19 unresolved discussions, all commit checks successful.


Comments from Reviewable

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Dec 28, 2017

Review status: all files reviewed at latest revision, 19 unresolved discussions, all commit checks successful.


pkg/sqlmigrations/migrations.go, line 150 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

In commit message: s/the fact that add the time/the fact that at the time/.

Also, this will break anyone who is expecting that the cluster setting exists. I doubt anyone is actually using it, but at the very least we should mark this as a backwards-incompatible change in the release notes.

Done.


pkg/sqlmigrations/migrations_test.go, line 450 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why the quotes on only some column names?

Because it's necessary for camel-cased columns (or it'll look for lastupdated which doesn't exist).


pkg/storage/engine/mvcc.go, line 276 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this worth supporting/testing then? If the randomized testing is diverging from what it's trying to simulate then it seems like we should fix that instead. Maybe I'm missing a reason why we'd want to support this. I guess it's more general.

Many of the bugs here were because our testing assumed that only very specific operations happen. The way I see it, at the MVCC layer, everything that returns OK should be fair game. Just because we don't happen to move intents in certain ways doesn't mean it should be allowed to return bogus results. (In this case, the decision is further made easy by the fact that the more general code is also not less intuitive).


pkg/storage/engine/mvcc.go, line 341 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

ms.AgeTo(meta.Timestamp.WallTime) why are we able to remove this?

MVCCStats obtained via AgeTo are all equivalent, and nothing relies on the LastUpdateNanos being a certain number. When you need that, you call AgeTo anyway.


pkg/storage/engine/mvcc.go, line 350 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit, add comment that parallels the others and move this above the if !meta.Deleted { ... block.

Done.


pkg/storage/engine/mvcc.go, line 2258 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/no more be necessary/no longer be necessary/

Done.


pkg/storage/engine/mvcc.go, line 241 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This doesn't have an impact on IntentBytes or IntentCount like it does below? I think it might be because these system keys are implicit, in which case you might want to note that in a comment.

Done.


pkg/storage/engine/mvcc_stats_test.go, line 171 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: 2 + 12 = 14 to be consistent.

Done.


pkg/storage/engine/mvcc_stats_test.go, line 176 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Add GCBytesAge: 0 here so that it's clear how this changes between the two assertions.

Done.


pkg/storage/engine/mvcc_stats_test.go, line 183 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is the only line that differs between this test and the next. Think it's worth collapsing the tests into one?

I thought about that too, but I think these tests are hard enough to follow that I don't mind the duplication. (All of this stuff is also covered by the randomized tests, so they're really just documentation of things once wrong and to hopefully catch any regressions in the future).


pkg/storage/engine/mvcc_stats_test.go, line 240 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Ditto.

Done.


pkg/storage/engine/mvcc_stats_test.go, line 245 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Same as above, add IntentAge: 0 here.

Done.


pkg/storage/engine/mvcc_stats_test.go, line 361 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What's the argument for not having this error outright?

There's no good argument except having to add lots of random checks. I should do it. Filed #21112


pkg/storage/engine/mvcc_stats_test.go, line 743 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: We might want to pull most of this into a constructor for the randomTest struct so that it can be used in different kinds of tests. Your call.

I'd be happy to see some of those, but I think I'll defer that refactoring to when they're actually used.


pkg/storage/engine/mvcc_stats_test.go, line 324 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

committing/aborting intents

committing/aborting intents from other transactions?

no, our own intents. A snapshot txn with OrigTimestamp=1 and eventually Timestamp=10 would keep writing at 1 and only the commit would move the intents to 10. Let me know if you have an update to the comment in mind that would clarify.


pkg/storage/engine/mvcc_stats_test.go, line 279 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

DelDelGC

Was this intentional?

Yep. Added a comment though.


pkg/storage/engine/mvcc_stats_test.go, line 309 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: add some indication that this is only equal to vKeySize because (ts2-ts1)/1E9 = 1 (in other words, there's a coefficient that's being hidden here).

Added a coefficient 1. BTW, It's ts2/1E9 - ts1/1E9 which is not the same as (ts2-ts1)/1E9 generally (it is in this test, of course 😄 )


pkg/storage/engine/mvcc_stats_test.go, line 841 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Any reason not to fold this into the first commit to keep it all together?

I think it's nice to see which things tickled bugs and which didn't. Definitely a matter of taste, but I'd just leave as-is unless you feel strongly.


pkg/storage/engine/enginepb/mvcc.go, line 64 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Think this needs a test?

Good point, done.


Comments from Reviewable

@nvb
Copy link
Copy Markdown
Contributor

nvb commented Dec 28, 2017

:lgtm: I'd double check that tests pass between each commit since there has been a lot of code movement between them.


Reviewed 13 of 13 files at r9, 7 of 7 files at r10, 9 of 9 files at r11, 1 of 1 files at r12, 1 of 1 files at r13, 2 of 2 files at r14, 1 of 1 files at r15.
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/storage/engine/mvcc.go, line 276 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Many of the bugs here were because our testing assumed that only very specific operations happen. The way I see it, at the MVCC layer, everything that returns OK should be fair game. Just because we don't happen to move intents in certain ways doesn't mean it should be allowed to return bogus results. (In this case, the decision is further made easy by the fact that the more general code is also not less intuitive).

👍


pkg/storage/engine/mvcc.go, line 350 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Done.

I also meant moving the ms.KeyBytes -= and ms.ValBytes -= down so that subtraction and addition structures are the same. Still a small nit.


pkg/storage/engine/mvcc.go, line 362 at r9 (raw file):

		if !commit {
			// If not committing, the intent reappears (but at meta.Timestamp).

Make a note of when this can happen (a push, right?)


pkg/storage/engine/mvcc_stats_test.go, line 183 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

I thought about that too, but I think these tests are hard enough to follow that I don't mind the duplication. (All of this stuff is also covered by the randomized tests, so they're really just documentation of things once wrong and to hopefully catch any regressions in the future).

👍


pkg/storage/engine/mvcc_stats_test.go, line 324 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

no, our own intents. A snapshot txn with OrigTimestamp=1 and eventually Timestamp=10 would keep writing at 1 and only the commit would move the intents to 10. Let me know if you have an update to the comment in mind that would clarify.

So does the timestamp in the txn and the timestamp passed directly to MVCCPut differ for serializable transactions?


Comments from Reviewable

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Dec 28, 2017

TFTR! git rebase -i --exec 'make test PKG=./pkg/storage/engine...' :/Merge had no complaints.


Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/storage/engine/mvcc.go, line 350 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I also meant moving the ms.KeyBytes -= and ms.ValBytes -= down so that subtraction and addition structures are the same. Still a small nit.

Ah, gotcha. I think I got it this time!


pkg/storage/engine/mvcc.go, line 362 at r9 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Make a note of when this can happen (a push, right?)

Done.


pkg/storage/engine/mvcc_stats_test.go, line 324 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

So does the timestamp in the txn and the timestamp passed directly to MVCCPut differ for serializable transactions?

Yes and no. Yes in general because serializable transactions can have their timestamp pushed too, except that then of course they'll have to restart eventually. So in a way, when looking at MVCCStats, everything looks like a snapshot transaction for the most part.


Comments from Reviewable

@tbg tbg force-pushed the fix-gcbytesage-stage2 branch from 4f6245b to 5b773c9 Compare December 28, 2017 22:55
tbg added 7 commits December 28, 2017 18:05
This commit removes `TestMVCCStatsWithRandomRuns` and replaces it with
a more principled test that achieves much better coverage. This can be
seen from the amount of additional tests it inspired, each of which
explores one family of failures of the new `TestMVCCStatsRandomized`.

Release note: None
the preceding commit has shown that AgeTo (which works as a kind of
`Forward()`)is often not what we want. In fact, just moving the stats to the
specified wall time is the correct choice almost all the time, perhaps except
when updating the replica state's stats (as we prefer to see the highest
received timestamp as opposed to the most recent one). Change the semantics
accordingly and preserve the old ones via the more descriptive `.Forward()`.

Simplify TestCmdClearRangeBytesThreshold: Remove variable `delta` and do not
write only inline values.

Release note: None
Found via `TestMVCCStatsRandomized`. We weren't correctly updating the
stats in the case in which the cluster setting limited the number of
keys to delete, but the GC Timestamp covered the first entry. In that
case, the stats update would assume implicitly that all values were
going to be deleted.

Since cockroachdb#20373 and others around the same time have made the cluster setting
obsolete, the course of action taken here was simply the removal of this
limiting functionality. This is congruent with the fact that at the time at
which it was added, `kv.gc.batch_size` was a stopgap fix.

Release note (backwards-incompatible change): removed the obsolete
`kv.gc.batch_size` cluster setting.
Release note: None
Unsurprisingly, this just worked (it calls `MVCCDelete` under the hood,
which in turn has already been tested).

Release note: None
When adding in a new value, we weren't taking into account `ms.{Key,Val}Bytes`
when subtracting the contribution of the original value. In particular, the
existing code wasn't paying attention to the fact that system keys can hold
intents as well (such as the range descriptor key would).

Exposed via `TestMVCCStatsRandomized`.

Release note: None
As expected, this just worked (ranged resolutions simply call the corresponding
single-key code for ever intent they discover).

Also exercises `InitPut`, predictably without problems.

Release note: None
@tbg tbg force-pushed the fix-gcbytesage-stage2 branch from 5b773c9 to 50d7474 Compare December 28, 2017 23:06
@tbg tbg merged commit eb5b878 into cockroachdb:master Dec 29, 2017
@tbg tbg deleted the fix-gcbytesage-stage2 branch December 29, 2017 01:21
tbg added a commit to tbg/cockroach that referenced this pull request Dec 29, 2017
RaftTombstoneKey was accidentally made a replicated key when it was first
introduced, a problem we first realized existed when it was [included in
snapshots].

At the time, we included workarounds to skip this key in various places
(snapshot application, consistency checker) but of course we have failed to
insert further hacks of the same kind elsewhere since (the one that prompting
this PR being the stats recomputation on splits, which I'm looking into as part
of cockroachdb#20181 -- unfortunately this commit doesn't seem to pertain to that problem)

It feels sloppy that we didn't follow through back then, but luckily the damage
appears to be limited; it is likely that the replicated existence of this key
results in MVCCStats SysBytes inconsistencies, but as it happens, these stats
are [already] [very] [inconsistent].

This commit does a few things:

- renames the old tombstone key to `RaftIncorrectLegacyTombstoneKey`
- introduces a (correctly unreplicated) `RaftTombstoneKey`
- introduces a migration that rewrites all legacy keys early in the node
  start sequence; as a result, a node running this commit or later will
  never see any legacy keys, except when they come in through a snapshot
- when applying a snapshot, forcibly delete any legacy tombstone contained
  within.

`RaftIncorrectLegacyTombstoneKey` can be purged from the codebase in binaries
post v2.0, as at that point all peers have version at least v2.0 (which has this
commit and will never send legacy tombstones in snapshots).

Since sending legacy keys in snapshots was never intended (and isn't relied
upon), none of this needs a cluster version migration.

Fixes cockroachdb#12154.

Release note: None

[included in snapshots]: cockroachdb#12131
[already]: cockroachdb#20554
[very]: cockroachdb#20996
[inconsistent]: cockroachdb#21070
tbg added a commit to tbg/cockroach that referenced this pull request Dec 29, 2017
RaftTombstoneKey was accidentally made a replicated key when it was first
introduced, a problem we first realized existed when it was [included in
snapshots].

At the time, we included workarounds to skip this key in various places
(snapshot application, consistency checker) but of course we have failed to
insert further hacks of the same kind elsewhere since (the one that prompting
this PR being the stats recomputation on splits, which I'm looking into as part
of cockroachdb#20181 -- unfortunately this commit doesn't seem to pertain to that problem)

It feels sloppy that we didn't follow through back then, but luckily the damage
appears to be limited; it is likely that the replicated existence of this key
results in MVCCStats SysBytes inconsistencies, but as it happens, these stats
are [already] [very] [inconsistent].

This commit does a few things:

- renames the old tombstone key to `RaftIncorrectLegacyTombstoneKey`
- introduces a (correctly unreplicated) `RaftTombstoneKey`
- introduces a migration that rewrites all legacy keys early in the node
  start sequence; as a result, a node running this commit or later will
  never see any legacy keys, except when they come in through a snapshot
- when applying a snapshot, forcibly delete any legacy tombstone contained
  within.

`RaftIncorrectLegacyTombstoneKey` can be purged from the codebase in binaries
post v2.0, as at that point all peers have version at least v2.0 (which has this
commit and will never send legacy tombstones in snapshots).

Since sending legacy keys in snapshots was never intended (and isn't relied
upon), none of this needs a cluster version migration.

Fixes cockroachdb#12154.

Release note: None

[included in snapshots]: cockroachdb#12131
[already]: cockroachdb#20554
[very]: cockroachdb#20996
[inconsistent]: cockroachdb#21070
tbg added a commit to tbg/cockroach that referenced this pull request Dec 29, 2017
RaftTombstoneKey was accidentally made a replicated key when it was first
introduced, a problem we first realized existed when it was [included in
snapshots].

At the time, we included workarounds to skip this key in various places
(snapshot application, consistency checker) but of course we have failed to
insert further hacks of the same kind elsewhere since (the one that prompting
this PR being the stats recomputation on splits, which I'm looking into as part
of cockroachdb#20181 -- unfortunately this commit doesn't seem to pertain to that problem)

It feels sloppy that we didn't follow through back then, but luckily the damage
appears to be limited; it is likely that the replicated existence of this key
results in MVCCStats SysBytes inconsistencies, but as it happens, these stats
are [already] [very] [inconsistent].

This commit does a few things:

- renames the old tombstone key to `RaftIncorrectLegacyTombstoneKey`
- introduces a (correctly unreplicated) `RaftTombstoneKey`
- introduces a migration that rewrites all legacy keys early in the node
  start sequence; as a result, a node running this commit or later will
  never see any legacy keys, except when they come in through a snapshot
- when applying a snapshot, forcibly delete any legacy tombstone contained
  within.

`RaftIncorrectLegacyTombstoneKey` can be purged from the codebase in binaries
post v2.0, as at that point all peers have version at least v2.0 (which has this
commit and will never send legacy tombstones in snapshots).

Since sending legacy keys in snapshots was never intended (and isn't relied
upon), none of this needs a cluster version migration.

Fixes cockroachdb#12154.

Release note: None

[included in snapshots]: cockroachdb#12131
[already]: cockroachdb#20554
[very]: cockroachdb#20996
[inconsistent]: cockroachdb#21070
tbg added a commit to tbg/cockroach that referenced this pull request Dec 30, 2017
RaftTombstoneKey was accidentally made a replicated key when it was first
introduced, a problem we first realized existed when it was [included in
snapshots].

At the time, we included workarounds to skip this key in various places
(snapshot application, consistency checker) but of course we have failed to
insert further hacks of the same kind elsewhere since (the one that prompting
this PR being the stats recomputation on splits, which I'm looking into as part
of cockroachdb#20181 -- unfortunately this commit doesn't seem to pertain to that problem)

It feels sloppy that we didn't follow through back then, but luckily the damage
appears to be limited; it is likely that the replicated existence of this key
results in MVCCStats SysBytes inconsistencies, but as it happens, these stats
are [already] [very] [inconsistent].

This commit does a few things:

- renames the old tombstone key to `RaftIncorrectLegacyTombstoneKey`
- introduces a (correctly unreplicated) `RaftTombstoneKey`
- introduces a migration that rewrites all legacy keys early in the node
  start sequence; as a result, a node running this commit or later will
  never see any legacy keys, except when they come in through a snapshot
- when applying a snapshot, forcibly delete any legacy tombstone contained
  within.

`RaftIncorrectLegacyTombstoneKey` can be purged from the codebase in binaries
post v2.0, as at that point all peers have version at least v2.0 (which has this
commit and will never send legacy tombstones in snapshots).

Since sending legacy keys in snapshots was never intended (and isn't relied
upon), none of this needs a cluster version migration.

Fixes cockroachdb#12154.

Release note: None

[included in snapshots]: cockroachdb#12131
[already]: cockroachdb#20554
[very]: cockroachdb#20996
[inconsistent]: cockroachdb#21070
tbg added a commit to tbg/cockroach that referenced this pull request Jan 3, 2018
RaftTombstoneKey was accidentally made a replicated key when it was first
introduced, a problem we first realized existed when it was [included in
snapshots].

At the time, we included workarounds to skip this key in various places
(snapshot application, consistency checker) but of course we have failed to
insert further hacks of the same kind elsewhere since (the one that prompting
this PR being the stats recomputation on splits, which I'm looking into as part
of cockroachdb#20181 -- unfortunately this commit doesn't seem to pertain to that problem)

It feels sloppy that we didn't follow through back then, but luckily the damage
appears to be limited; it is likely that the replicated existence of this key
results in MVCCStats SysBytes inconsistencies, but as it happens, these stats
are [already] [very] [inconsistent].

This commit does a few things:

- renames the old tombstone key to `RaftIncorrectLegacyTombstoneKey`
- introduces a (correctly unreplicated) `RaftTombstoneKey`
- introduces a migration that rewrites all legacy keys, to be activated
  when the cluster version surpasses the newly introduced version. Until
  then, both tombstones are checked, but only the old one written.
- when applying a snapshot, forcibly delete any legacy tombstone contained
  within (even before the cluster version is bumped).

`RaftIncorrectLegacyTombstoneKey` can be purged from the codebase in binaries
post v2.1, as at that point all peers have booted with a version that runs the
migration (it's morally true in `v2.0` as well, but it's possible that the
cluster version update would interleave with a write of a legacy tombstone).

In v2.0, there may be both legacy and old-style tombstones, but they'll never
proliferate through snapshots. This implies that if a tombstone is left around
from before the `v2.0` migration runs, then it is in a replica that since hasn't
been recreated (for if it had, the tombstone would have been wiped), and thus it
can't trip up the consistency checker. Thus, post v2.0, the replica consistency
checker can stop skipping the legacy tombstone key.

Fixes cockroachdb#12154.

Release note: None

[included in snapshots]: cockroachdb#12131
[already]: cockroachdb#20554
[very]: cockroachdb#20996
[inconsistent]: cockroachdb#21070
tbg added a commit to tbg/cockroach that referenced this pull request Jan 8, 2018
RaftTombstoneKey was accidentally made a replicated key when it was first
introduced, a problem we first realized existed when it was [included in
snapshots].

At the time, we included workarounds to skip this key in various places
(snapshot application, consistency checker) but of course we have failed to
insert further hacks of the same kind elsewhere since (the one that prompting
this PR being the stats recomputation on splits, which I'm looking into as part
of cockroachdb#20181 -- unfortunately this commit doesn't seem to pertain to that problem)

It feels sloppy that we didn't follow through back then, but luckily the damage
appears to be limited; it is likely that the replicated existence of this key
results in MVCCStats SysBytes inconsistencies, but as it happens, these stats
are [already] [very] [inconsistent].

This commit does a few things:

- renames the old tombstone key to `RaftIncorrectLegacyTombstoneKey`
- introduces a (correctly unreplicated) `RaftTombstoneKey`
- introduces a migration. Once activated, only the new tombstone is written,
  but both tombstones are checked. Additionally, as the node restarts, all
  legacy tombstones are replaced by correct non-legacy ones.
- when applying a snapshot, forcibly delete any legacy tombstone contained
  within (even before the cluster version is bumped). This prevents new
  legacy tombstones to trickle on from other nodes.

`RaftIncorrectLegacyTombstoneKey` can be purged from the codebase in binaries
post v2.1, as at that point all peers have booted with a version that runs the
migration.

Thus, post v2.1, the replica consistency checker can stop skipping the legacy
tombstone key.

Fixes cockroachdb#12154.

Release note: None

[included in snapshots]: cockroachdb#12131
[already]: cockroachdb#20554
[very]: cockroachdb#20996
[inconsistent]: cockroachdb#21070
tbg added a commit to tbg/cockroach that referenced this pull request Jan 9, 2018
RaftTombstoneKey was accidentally made a replicated key when it was first
introduced, a problem we first realized existed when it was [included in
snapshots].

At the time, we included workarounds to skip this key in various places
(snapshot application, consistency checker) but of course we have failed to
insert further hacks of the same kind elsewhere since (the one that prompting
this PR being the stats recomputation on splits, which I'm looking into as part
of cockroachdb#20181 -- unfortunately this commit doesn't seem to pertain to that problem)

It feels sloppy that we didn't follow through back then, but luckily the damage
appears to be limited; it is likely that the replicated existence of this key
results in MVCCStats SysBytes inconsistencies, but as it happens, these stats
are [already] [very] [inconsistent].

This commit does a few things:

- renames the old tombstone key to `RaftIncorrectLegacyTombstoneKey`
- introduces a (correctly unreplicated) `RaftTombstoneKey`
- introduces a migration. Once activated, only the new tombstone is written,
  but both tombstones are checked. Additionally, as the node restarts, all
  legacy tombstones are replaced by correct non-legacy ones.
- when applying a snapshot, forcibly delete any legacy tombstone contained
  within (even before the cluster version is bumped). This prevents new
  legacy tombstones to trickle on from other nodes.

`RaftIncorrectLegacyTombstoneKey` can be purged from the codebase in binaries
post v2.1, as at that point all peers have booted with a version that runs the
migration.

Thus, post v2.1, the replica consistency checker can stop skipping the legacy
tombstone key.

Fixes cockroachdb#12154.

Release note: None

[included in snapshots]: cockroachdb#12131
[already]: cockroachdb#20554
[very]: cockroachdb#20996
[inconsistent]: cockroachdb#21070
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants