Skip to content

engine: fix divergent MVCCStats updates#20996

Merged
tbg merged 4 commits intocockroachdb:masterfrom
tbg:fix-gcbytesage
Dec 27, 2017
Merged

engine: fix divergent MVCCStats updates#20996
tbg merged 4 commits intocockroachdb:masterfrom
tbg:fix-gcbytesage

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Dec 21, 2017

Found while investigating #20554 (though this does not at all fix the issue
discovered there; that will be a separate PR).

We were incorrectly updating GCBytesAge when moving an intent.

This change raises the question of what to do about the divergent stats
that exist in real-world clusters. As part of addressing #20554, we'll need
a mechanism to correct the stats anyway, and so I will defer its
introduction.

You'll want to view this diff with ?w=1 (insensitive to whitespace
changes).

Release note: None.

@tbg tbg requested a review from a team December 21, 2017 17:57
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@nvb nvb self-assigned this Dec 21, 2017
@nvb
Copy link
Copy Markdown
Contributor

nvb commented Dec 23, 2017

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 2 of 2 files at r5, 3 of 3 files at r6, 1 of 1 files at r7.
Review status: all files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.


pkg/storage/engine/mvcc.go, line 1847 at r2 (raw file):

}

// unsafePeekNextVersion positions the iterator at the successor to latestKey. If this value

I'm not sure I like the term "peek" here. That usually implies that the iterator is not moved by the function (for instance, that's the case for sql/parser.Scanner), which is not true in this case. unsafeNextVersion is probably a better name.


pkg/storage/engine/mvcc.go, line 1859 at r2 (raw file):

	}
	unsafeKey := iter.UnsafeKey()
	if !unsafeKey.IsValue() {

Is this case actually possible or is it just a sanity check? We already verified that the next value had the same key.


pkg/storage/engine/mvcc.go, line 326 at r6 (raw file):

	if sys {
		ms.SysBytes += metaKeySize + metaValSize - origMetaValSize - origMetaKeySize

minor nit here and below: (metaKeySize + metaValSize) - (origMetaValSize + origMetaKeySize) is a bit easier to read.


pkg/storage/engine/mvcc.go, line 333 at r6 (raw file):

		}
		// At orig.Timestamp, the original meta key disappears.
		ms.KeyBytes -= origMetaKeySize + orig.KeyBytes

Were we missing the orig.KeyBytes and meta.KeyBytes handling before? That wasn't the fundamental problem here, right?


pkg/storage/engine/mvcc.go, line 342 at r6 (raw file):

		}

		ms.AgeTo(meta.Timestamp.WallTime)

I think I understand what's going on here after staring at this for 10 minutes, but since this always takes so long to page back in, do you mind expanding the commit message to explain what the "divergent MVCCStats update" was? Maybe also add a comment about why this ms.AgeTo needs to be above the KeyBytes and ValBytes change below.


pkg/storage/engine/mvcc_stats_test.go, line 1 at r5 (raw file):

// Copyright 2014 The Cockroach Authors.

There were no changes in this code movement, right?


pkg/storage/engine/mvcc_test.go, line 3414 at r3 (raw file):

}

func verifyStats(debug string, ms, expMS *enginepb.MVCCStats, t *testing.T) {

nit: consider moving t *testing.T to be the first argument.


pkg/storage/engine/mvcc_test.go, line 3416 at r3 (raw file):

func verifyStats(debug string, ms, expMS *enginepb.MVCCStats, t *testing.T) {
	t.Helper()
	if !ms.Equal(expMS) {

Huh, why weren't we just doing this before?


pkg/storage/engine/mvcc_test.go, line 3436 at r4 (raw file):

		verifyStats(debug, ms, expMS, t)

		it := engine.NewIterator(false)

Think it makes sense to put this all in verifyStats so that we make the assertion everywhere?


pkg/storage/engine/mvcc_test.go, line 3452 at r4 (raw file):

		}

		if t.Failed() {

What's this about?


pkg/storage/engine/mvcc_test.go, line 3491 at r4 (raw file):

	// Delete the value using a transaction.
	// TODO(tschottdorf): this case is interesting: we write at ts2, bt the timestamp is ts1.

s/bt/but/


Comments from Reviewable

@tbg tbg requested review from a team December 24, 2017 03:07
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Dec 24, 2017

@nvanbenschoten I'll get to your comments, but in the meantime, I dumped an additional number of commits that finds and addresses a boatload of bugs.

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Dec 24, 2017

Review status: 0 of 13 files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.


pkg/storage/engine/mvcc.go, line 1847 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm not sure I like the term "peek" here. That usually implies that the iterator is not moved by the function (for instance, that's the case for sql/parser.Scanner), which is not true in this case. unsafeNextVersion is probably a better name.

Done. I moved this to #21036 because this PR is becoming large and I assume it won't land for a while.


pkg/storage/engine/mvcc.go, line 1859 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this case actually possible or is it just a sanity check? We already verified that the next value had the same key.

Sanity check. This is also the previous code.
I moved this to #21036 because this PR is becoming large and I assume it won't land for a while.


pkg/storage/engine/mvcc_stats_test.go, line 1 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

There were no changes in this code movement, right?

Correct.


pkg/storage/engine/mvcc_test.go, line 3416 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Huh, why weren't we just doing this before?

I think this code predated pretty.Diff.


pkg/storage/engine/mvcc_test.go, line 3452 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What's this about?

You want to see the numbers from all the computations above (once they are upgraded to Errorf), but then bail.


Comments from Reviewable

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Dec 24, 2017

Review status: 0 of 13 files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.


pkg/storage/engine/mvcc.go, line 326 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

minor nit here and below: (metaKeySize + metaValSize) - (origMetaValSize + origMetaKeySize) is a bit easier to read.

Done.


pkg/storage/engine/mvcc.go, line 333 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Were we missing the orig.KeyBytes and meta.KeyBytes handling before? That wasn't the fundamental problem here, right?

Yep. I added to the commit message.


pkg/storage/engine/mvcc.go, line 342 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think I understand what's going on here after staring at this for 10 minutes, but since this always takes so long to page back in, do you mind expanding the commit message to explain what the "divergent MVCCStats update" was? Maybe also add a comment about why this ms.AgeTo needs to be above the KeyBytes and ValBytes change below.

Yep. I added to the commit message. By the way, testing gets a lot better in later commits, with more descriptive and smaller tests.


pkg/storage/engine/mvcc_test.go, line 3414 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: consider moving t *testing.T to be the first argument.

Good point, but with the code movement and the fact that it's cleaned up in later commits, I'd like to leave it as is in this one.


pkg/storage/engine/mvcc_test.go, line 3436 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Think it makes sense to put this all in verifyStats so that we make the assertion everywhere?

This


pkg/storage/engine/mvcc_test.go, line 3491 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/bt/but/

Done.


Comments from Reviewable

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Dec 24, 2017

Actually I just nuked these extra commits again. I'll open a separate PR with them when this one is in. Sorry about the diff noise.


Review status: 1 of 4 files reviewed at latest revision, 11 unresolved discussions.


Comments from Reviewable

This keeps tests passing, but exposes the problem.

Release note: None
tbg added 3 commits December 26, 2017 14:52
Found while investigating cockroachdb#20554 (though this does not at all fix the
issue discovered there; that will be a separate PR).

We were incorrectly updating GCBytesAge when moving an intent. The old code was
pretty broken (and, with hindsight, it is still after this commit, as is exposed
by the various child commits). Its main problem was that it failed to account
for the `GCBytesAge` difference that would result from moving the intent due
to the incorrect assumption that the size of the intent would be the same. The
code was also somewhat intransparent and an attempt has been made to improve
its legibility.

This change raises the question of what to do about the divergent stats that
exist in real-world clusters. As part of addressing cockroachdb#20554, we'll need a
mechanism to correct the stats anyway, and so I will defer its introduction.

You'll want to view this diff with `?w=1` (insensitive to whitespace changes).

Release note: None.
Release note: None
@nvb
Copy link
Copy Markdown
Contributor

nvb commented Dec 27, 2017

:lgtm:


Reviewed 4 of 4 files at r8, 2 of 2 files at r9, 3 of 3 files at r10, 1 of 1 files at r11.
Review status: :shipit: all files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Dec 27, 2017

TFTR @nvanbenschoten!

@tbg tbg merged commit 1d48d74 into cockroachdb:master Dec 27, 2017
@tbg tbg deleted the fix-gcbytesage branch December 27, 2017 17:35
tbg added a commit to tbg/cockroach that referenced this pull request Jan 3, 2018
This lays the foundation for planned work in which the consistency checker can
trigger a stats recomputation should the (consistently replicated) stats be
found to be incorrect. In light of cockroachdb#20996, this is evidently something we need.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Jan 3, 2018
This lays the foundation for planned work in which the consistency checker can
trigger a stats recomputation should the (consistently replicated) stats be
found to be incorrect. In light of cockroachdb#20996, this is evidently something we need.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Jan 3, 2018
This lays the foundation for planned work in which the consistency checker can
trigger a stats recomputation should the (consistently replicated) stats be
found to be incorrect. In light of cockroachdb#20996, this is evidently something we need.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Jan 3, 2018
This lays the foundation for planned work in which the consistency checker can
trigger a stats recomputation should the (consistently replicated) stats be
found to be incorrect. In light of cockroachdb#20996, this is evidently something we need.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Jan 8, 2018
RaftTombstoneKey was accidentally made a replicated key when it was first
introduced, a problem we first realized existed when it was [included in
snapshots].

At the time, we included workarounds to skip this key in various places
(snapshot application, consistency checker) but of course we have failed to
insert further hacks of the same kind elsewhere since (the one that prompting
this PR being the stats recomputation on splits, which I'm looking into as part
of cockroachdb#20181 -- unfortunately this commit doesn't seem to pertain to that problem)

It feels sloppy that we didn't follow through back then, but luckily the damage
appears to be limited; it is likely that the replicated existence of this key
results in MVCCStats SysBytes inconsistencies, but as it happens, these stats
are [already] [very] [inconsistent].

This commit does a few things:

- renames the old tombstone key to `RaftIncorrectLegacyTombstoneKey`
- introduces a (correctly unreplicated) `RaftTombstoneKey`
- introduces a migration. Once activated, only the new tombstone is written,
  but both tombstones are checked. Additionally, as the node restarts, all
  legacy tombstones are replaced by correct non-legacy ones.
- when applying a snapshot, forcibly delete any legacy tombstone contained
  within (even before the cluster version is bumped). This prevents new
  legacy tombstones to trickle on from other nodes.

`RaftIncorrectLegacyTombstoneKey` can be purged from the codebase in binaries
post v2.1, as at that point all peers have booted with a version that runs the
migration.

Thus, post v2.1, the replica consistency checker can stop skipping the legacy
tombstone key.

Fixes cockroachdb#12154.

Release note: None

[included in snapshots]: cockroachdb#12131
[already]: cockroachdb#20554
[very]: cockroachdb#20996
[inconsistent]: cockroachdb#21070
tbg added a commit to tbg/cockroach that referenced this pull request Jan 9, 2018
RaftTombstoneKey was accidentally made a replicated key when it was first
introduced, a problem we first realized existed when it was [included in
snapshots].

At the time, we included workarounds to skip this key in various places
(snapshot application, consistency checker) but of course we have failed to
insert further hacks of the same kind elsewhere since (the one that prompting
this PR being the stats recomputation on splits, which I'm looking into as part
of cockroachdb#20181 -- unfortunately this commit doesn't seem to pertain to that problem)

It feels sloppy that we didn't follow through back then, but luckily the damage
appears to be limited; it is likely that the replicated existence of this key
results in MVCCStats SysBytes inconsistencies, but as it happens, these stats
are [already] [very] [inconsistent].

This commit does a few things:

- renames the old tombstone key to `RaftIncorrectLegacyTombstoneKey`
- introduces a (correctly unreplicated) `RaftTombstoneKey`
- introduces a migration. Once activated, only the new tombstone is written,
  but both tombstones are checked. Additionally, as the node restarts, all
  legacy tombstones are replaced by correct non-legacy ones.
- when applying a snapshot, forcibly delete any legacy tombstone contained
  within (even before the cluster version is bumped). This prevents new
  legacy tombstones to trickle on from other nodes.

`RaftIncorrectLegacyTombstoneKey` can be purged from the codebase in binaries
post v2.1, as at that point all peers have booted with a version that runs the
migration.

Thus, post v2.1, the replica consistency checker can stop skipping the legacy
tombstone key.

Fixes cockroachdb#12154.

Release note: None

[included in snapshots]: cockroachdb#12131
[already]: cockroachdb#20554
[very]: cockroachdb#20996
[inconsistent]: cockroachdb#21070
tbg added a commit to tbg/cockroach that referenced this pull request Jan 9, 2018
A number of bugs in our MVCCStats logic has been fixed recently (see for example
\cockroachdb#20996) and others are still present (cockroachdb#20554). For both, and also potentially for
future bugs or deliberate adjustments of the computations, we need a mechanism
to recompute the stats in order to purge incorrect numbers from the system over
time. Such a mechanism is introduced here.

It consists of two main components:

- A new RPC `AdjustStats`, which applies to a single Range and computes the difference
  between the persisted stats and the recomputed ones; it can "inject" a suitable delta
  into the stats (thus fixing the divergence) or do a "dry run".
- A trigger in the consistency checker that runs on the coordinating node (the lease
  holder). The consistency checker already recomputes the stats, and it can compare
  them against the persisted stats and judge whether there is a divergence. If there
  is one, naively one may hope to just insert the newly computed diff into the range,
  but this is not ideal due to concerns about double application and racing consistency
  checks. Instead, use `AdjustStats` (which, of course, was introduced for this purpose)
  which strikes a balance between efficiency and correctness.

Updates cockroachdb#20554.

Release note (general change): added a mechanism to recompute range stats on demand.
tbg added a commit to tbg/cockroach that referenced this pull request Jan 12, 2018
A number of bugs in our MVCCStats logic has been fixed recently (see for example
\cockroachdb#20996) and others are still present (cockroachdb#20554). For both, and also potentially for
future bugs or deliberate adjustments of the computations, we need a mechanism
to recompute the stats in order to purge incorrect numbers from the system over
time. Such a mechanism is introduced here.

It consists of two main components:

- A new RPC `RecomputeStats`, which applies to a single Range and
computes the difference between the persisted stats and the recomputed
ones; it can "inject" a suitable delta into the stats (thus fixing the
divergence) or do a "dry run".
- A trigger in the consistency checker that runs on the coordinating
node (the lease holder). The consistency checker already recomputes the
stats, and it can compare them against the persisted stats and judge
whether there is a divergence. If there is one, naively one may hope to
just insert the newly computed diff into the range, but this is not
ideal due to concerns about double application and racing consistency
checks. Instead, use `RecomputeStats` (which, of course, was introduced
for this purpose) which strikes a balance between efficiency and
correctness.

Updates cockroachdb#20554.

Release note (general change): added a mechanism to recompute range
stats on demand.
tbg added a commit to tbg/cockroach that referenced this pull request Jan 17, 2018
A number of bugs in our MVCCStats logic has been fixed recently (see for example
\cockroachdb#20996) and others are still present (cockroachdb#20554). For both, and also potentially for
future bugs or deliberate adjustments of the computations, we need a mechanism
to recompute the stats in order to purge incorrect numbers from the system over
time. Such a mechanism is introduced here.

It consists of two main components:

- A new RPC `RecomputeStats`, which applies to a single Range and
computes the difference between the persisted stats and the recomputed
ones; it can "inject" a suitable delta into the stats (thus fixing the
divergence) or do a "dry run".
- A trigger in the consistency checker that runs on the coordinating
node (the lease holder). The consistency checker already recomputes the
stats, and it can compare them against the persisted stats and judge
whether there is a divergence. If there is one, naively one may hope to
just insert the newly computed diff into the range, but this is not
ideal due to concerns about double application and racing consistency
checks. Instead, use `RecomputeStats` (which, of course, was introduced
for this purpose) which strikes a balance between efficiency and
correctness.

Updates cockroachdb#20554.

Release note (general change): added a mechanism to recompute range
stats on demand.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants