engine: fix divergent MVCCStats updates#20996
Conversation
|
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 2 of 2 files at r5, 3 of 3 files at r6, 1 of 1 files at r7. pkg/storage/engine/mvcc.go, line 1847 at r2 (raw file):
I'm not sure I like the term pkg/storage/engine/mvcc.go, line 1859 at r2 (raw file):
Is this case actually possible or is it just a sanity check? We already verified that the next value had the same key. pkg/storage/engine/mvcc.go, line 326 at r6 (raw file):
minor nit here and below: pkg/storage/engine/mvcc.go, line 333 at r6 (raw file):
Were we missing the pkg/storage/engine/mvcc.go, line 342 at r6 (raw file):
I think I understand what's going on here after staring at this for 10 minutes, but since this always takes so long to page back in, do you mind expanding the commit message to explain what the "divergent MVCCStats update" was? Maybe also add a comment about why this pkg/storage/engine/mvcc_stats_test.go, line 1 at r5 (raw file):
There were no changes in this code movement, right? pkg/storage/engine/mvcc_test.go, line 3414 at r3 (raw file):
nit: consider moving pkg/storage/engine/mvcc_test.go, line 3416 at r3 (raw file):
Huh, why weren't we just doing this before? pkg/storage/engine/mvcc_test.go, line 3436 at r4 (raw file):
Think it makes sense to put this all in pkg/storage/engine/mvcc_test.go, line 3452 at r4 (raw file):
What's this about? pkg/storage/engine/mvcc_test.go, line 3491 at r4 (raw file):
Comments from Reviewable |
|
@nvanbenschoten I'll get to your comments, but in the meantime, I dumped an additional number of commits that finds and addresses a boatload of bugs. |
|
Review status: 0 of 13 files reviewed at latest revision, 11 unresolved discussions, some commit checks failed. pkg/storage/engine/mvcc.go, line 1847 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. I moved this to #21036 because this PR is becoming large and I assume it won't land for a while. pkg/storage/engine/mvcc.go, line 1859 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Sanity check. This is also the previous code. pkg/storage/engine/mvcc_stats_test.go, line 1 at r5 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Correct. pkg/storage/engine/mvcc_test.go, line 3416 at r3 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think this code predated pkg/storage/engine/mvcc_test.go, line 3452 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
You want to see the numbers from all the computations above (once they are upgraded to Comments from Reviewable |
|
Review status: 0 of 13 files reviewed at latest revision, 11 unresolved discussions, some commit checks failed. pkg/storage/engine/mvcc.go, line 326 at r6 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/storage/engine/mvcc.go, line 333 at r6 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Yep. I added to the commit message. pkg/storage/engine/mvcc.go, line 342 at r6 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Yep. I added to the commit message. By the way, testing gets a lot better in later commits, with more descriptive and smaller tests. pkg/storage/engine/mvcc_test.go, line 3414 at r3 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Good point, but with the code movement and the fact that it's cleaned up in later commits, I'd like to leave it as is in this one. pkg/storage/engine/mvcc_test.go, line 3436 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This pkg/storage/engine/mvcc_test.go, line 3491 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. Comments from Reviewable |
|
Actually I just nuked these extra commits again. I'll open a separate PR with them when this one is in. Sorry about the diff noise. Review status: 1 of 4 files reviewed at latest revision, 11 unresolved discussions. Comments from Reviewable |
This keeps tests passing, but exposes the problem. Release note: None
Release note: None
Found while investigating cockroachdb#20554 (though this does not at all fix the issue discovered there; that will be a separate PR). We were incorrectly updating GCBytesAge when moving an intent. The old code was pretty broken (and, with hindsight, it is still after this commit, as is exposed by the various child commits). Its main problem was that it failed to account for the `GCBytesAge` difference that would result from moving the intent due to the incorrect assumption that the size of the intent would be the same. The code was also somewhat intransparent and an attempt has been made to improve its legibility. This change raises the question of what to do about the divergent stats that exist in real-world clusters. As part of addressing cockroachdb#20554, we'll need a mechanism to correct the stats anyway, and so I will defer its introduction. You'll want to view this diff with `?w=1` (insensitive to whitespace changes). Release note: None.
Release note: None
|
Reviewed 4 of 4 files at r8, 2 of 2 files at r9, 3 of 3 files at r10, 1 of 1 files at r11. Comments from Reviewable |
|
TFTR @nvanbenschoten! |
This lays the foundation for planned work in which the consistency checker can trigger a stats recomputation should the (consistently replicated) stats be found to be incorrect. In light of cockroachdb#20996, this is evidently something we need. Release note: None
This lays the foundation for planned work in which the consistency checker can trigger a stats recomputation should the (consistently replicated) stats be found to be incorrect. In light of cockroachdb#20996, this is evidently something we need. Release note: None
This lays the foundation for planned work in which the consistency checker can trigger a stats recomputation should the (consistently replicated) stats be found to be incorrect. In light of cockroachdb#20996, this is evidently something we need. Release note: None
This lays the foundation for planned work in which the consistency checker can trigger a stats recomputation should the (consistently replicated) stats be found to be incorrect. In light of cockroachdb#20996, this is evidently something we need. Release note: None
RaftTombstoneKey was accidentally made a replicated key when it was first introduced, a problem we first realized existed when it was [included in snapshots]. At the time, we included workarounds to skip this key in various places (snapshot application, consistency checker) but of course we have failed to insert further hacks of the same kind elsewhere since (the one that prompting this PR being the stats recomputation on splits, which I'm looking into as part of cockroachdb#20181 -- unfortunately this commit doesn't seem to pertain to that problem) It feels sloppy that we didn't follow through back then, but luckily the damage appears to be limited; it is likely that the replicated existence of this key results in MVCCStats SysBytes inconsistencies, but as it happens, these stats are [already] [very] [inconsistent]. This commit does a few things: - renames the old tombstone key to `RaftIncorrectLegacyTombstoneKey` - introduces a (correctly unreplicated) `RaftTombstoneKey` - introduces a migration. Once activated, only the new tombstone is written, but both tombstones are checked. Additionally, as the node restarts, all legacy tombstones are replaced by correct non-legacy ones. - when applying a snapshot, forcibly delete any legacy tombstone contained within (even before the cluster version is bumped). This prevents new legacy tombstones to trickle on from other nodes. `RaftIncorrectLegacyTombstoneKey` can be purged from the codebase in binaries post v2.1, as at that point all peers have booted with a version that runs the migration. Thus, post v2.1, the replica consistency checker can stop skipping the legacy tombstone key. Fixes cockroachdb#12154. Release note: None [included in snapshots]: cockroachdb#12131 [already]: cockroachdb#20554 [very]: cockroachdb#20996 [inconsistent]: cockroachdb#21070
RaftTombstoneKey was accidentally made a replicated key when it was first introduced, a problem we first realized existed when it was [included in snapshots]. At the time, we included workarounds to skip this key in various places (snapshot application, consistency checker) but of course we have failed to insert further hacks of the same kind elsewhere since (the one that prompting this PR being the stats recomputation on splits, which I'm looking into as part of cockroachdb#20181 -- unfortunately this commit doesn't seem to pertain to that problem) It feels sloppy that we didn't follow through back then, but luckily the damage appears to be limited; it is likely that the replicated existence of this key results in MVCCStats SysBytes inconsistencies, but as it happens, these stats are [already] [very] [inconsistent]. This commit does a few things: - renames the old tombstone key to `RaftIncorrectLegacyTombstoneKey` - introduces a (correctly unreplicated) `RaftTombstoneKey` - introduces a migration. Once activated, only the new tombstone is written, but both tombstones are checked. Additionally, as the node restarts, all legacy tombstones are replaced by correct non-legacy ones. - when applying a snapshot, forcibly delete any legacy tombstone contained within (even before the cluster version is bumped). This prevents new legacy tombstones to trickle on from other nodes. `RaftIncorrectLegacyTombstoneKey` can be purged from the codebase in binaries post v2.1, as at that point all peers have booted with a version that runs the migration. Thus, post v2.1, the replica consistency checker can stop skipping the legacy tombstone key. Fixes cockroachdb#12154. Release note: None [included in snapshots]: cockroachdb#12131 [already]: cockroachdb#20554 [very]: cockroachdb#20996 [inconsistent]: cockroachdb#21070
A number of bugs in our MVCCStats logic has been fixed recently (see for example \cockroachdb#20996) and others are still present (cockroachdb#20554). For both, and also potentially for future bugs or deliberate adjustments of the computations, we need a mechanism to recompute the stats in order to purge incorrect numbers from the system over time. Such a mechanism is introduced here. It consists of two main components: - A new RPC `AdjustStats`, which applies to a single Range and computes the difference between the persisted stats and the recomputed ones; it can "inject" a suitable delta into the stats (thus fixing the divergence) or do a "dry run". - A trigger in the consistency checker that runs on the coordinating node (the lease holder). The consistency checker already recomputes the stats, and it can compare them against the persisted stats and judge whether there is a divergence. If there is one, naively one may hope to just insert the newly computed diff into the range, but this is not ideal due to concerns about double application and racing consistency checks. Instead, use `AdjustStats` (which, of course, was introduced for this purpose) which strikes a balance between efficiency and correctness. Updates cockroachdb#20554. Release note (general change): added a mechanism to recompute range stats on demand.
A number of bugs in our MVCCStats logic has been fixed recently (see for example \cockroachdb#20996) and others are still present (cockroachdb#20554). For both, and also potentially for future bugs or deliberate adjustments of the computations, we need a mechanism to recompute the stats in order to purge incorrect numbers from the system over time. Such a mechanism is introduced here. It consists of two main components: - A new RPC `RecomputeStats`, which applies to a single Range and computes the difference between the persisted stats and the recomputed ones; it can "inject" a suitable delta into the stats (thus fixing the divergence) or do a "dry run". - A trigger in the consistency checker that runs on the coordinating node (the lease holder). The consistency checker already recomputes the stats, and it can compare them against the persisted stats and judge whether there is a divergence. If there is one, naively one may hope to just insert the newly computed diff into the range, but this is not ideal due to concerns about double application and racing consistency checks. Instead, use `RecomputeStats` (which, of course, was introduced for this purpose) which strikes a balance between efficiency and correctness. Updates cockroachdb#20554. Release note (general change): added a mechanism to recompute range stats on demand.
A number of bugs in our MVCCStats logic has been fixed recently (see for example \cockroachdb#20996) and others are still present (cockroachdb#20554). For both, and also potentially for future bugs or deliberate adjustments of the computations, we need a mechanism to recompute the stats in order to purge incorrect numbers from the system over time. Such a mechanism is introduced here. It consists of two main components: - A new RPC `RecomputeStats`, which applies to a single Range and computes the difference between the persisted stats and the recomputed ones; it can "inject" a suitable delta into the stats (thus fixing the divergence) or do a "dry run". - A trigger in the consistency checker that runs on the coordinating node (the lease holder). The consistency checker already recomputes the stats, and it can compare them against the persisted stats and judge whether there is a divergence. If there is one, naively one may hope to just insert the newly computed diff into the range, but this is not ideal due to concerns about double application and racing consistency checks. Instead, use `RecomputeStats` (which, of course, was introduced for this purpose) which strikes a balance between efficiency and correctness. Updates cockroachdb#20554. Release note (general change): added a mechanism to recompute range stats on demand.
Found while investigating #20554 (though this does not at all fix the issue
discovered there; that will be a separate PR).
We were incorrectly updating GCBytesAge when moving an intent.
This change raises the question of what to do about the divergent stats
that exist in real-world clusters. As part of addressing #20554, we'll need
a mechanism to correct the stats anyway, and so I will defer its
introduction.
You'll want to view this diff with
?w=1(insensitive to whitespacechanges).
Release note: None.