storage: recompute and adjust stats via consistency checker #21345

tbg merged 2 commits into cockroachdb:master
Conversation
Reviewed 1 of 2 files at r1, 20 of 20 files at r2.

pkg/roachpb/api.proto, line 397 at r2 (raw file):
This looks unused.

pkg/roachpb/api.proto, line 400 at r2 (raw file):
Do you expect to use this anywhere but tests? We're going to be issuing this command without dry_run in an automated fashion, and it should be safe and idempotent, so I don't think a dry run flag is necessary. (A dry run is not any cheaper than a full run, since even a "dry run" goes through Raft as a write.)

pkg/storage/replica_consistency.go, line 132 at r2 (raw file):
If we've lost the lease, this request may go to another node, and during an upgrade it may not have the AdjustStats method. We should make sure that this doesn't log in a scary way.

pkg/storage/batcheval/cmd_adjust_stats.go, line 40 at r2 (raw file):
The start key is a real key, so this will interfere with at least some regular transactions. Maybe we should target this command to the range descriptor instead?

pkg/storage/batcheval/cmd_adjust_stats.go, line 68 at r2 (raw file):
This could check

Comments from Reviewable
Reviewed 2 of 2 files at r1, 20 of 20 files at r2.

pkg/roachpb/api.proto, line 386 at r2 (raw file):
or what?

pkg/roachpb/api.proto, line 393 at r2 (raw file):

pkg/storage/consistency_queue_test.go, line 37 at r2 (raw file):
Sorry, it just sticks out like a sore thumb :)

pkg/storage/consistency_queue_test.go, line 328 at r2 (raw file):
Should we force a run of the

pkg/storage/helpers_test.go, line 28 at r2 (raw file):
ditto

pkg/storage/replica_command.go, line 26 at r2 (raw file):
Is this a goland issue?

pkg/storage/replica_consistency.go, line 122 at r2 (raw file):
We should note that this doesn't mean that it will never be run twice at the same time.

pkg/storage/replica_consistency.go, line 132 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I was also surprised that this didn't require a new cluster version.

pkg/storage/batcheval/cmd_adjust_stats.go, line 45 at r2 (raw file):
If we wanted to avoid concurrent adjustments, couldn't we just make one of these declared keys a write?

pkg/storage/batcheval/cmd_adjust_stats.go, line 72 at r2 (raw file):
Could you expand the first sentence to explain why we need to unwrap it? Something like "unwrap it because we lied to it in declareKeysAdjustStats".

Comments from Reviewable
The semantics for computing GCBytesAge were incorrect and are fixed in this commit.

Prior to this commit, a non-live write would accrue GCBytesAge from its own timestamp on. That is, if you wrote two versions of a key at 1s and 2s, then when the older version is replaced (at 2s) it would start out with one second of age (from 1s to 2s). However, the key *really* became non-live at 2s, and should have had an age of zero. By extension, imagine a table with lots of writes all dating back to early 2017, and assume that today (early 2018) all these writes are deleted (i.e. a tombstone is placed on top of them). Prior to this commit, each key would immediately get assigned an age of `(early 2018) - (early 2017)`, i.e. a very large number. Yet, the GC queue could only garbage collect them after `(early 2018) + TTL`, so by default 25 hours after the deletion. We use GCBytesAge to trigger the GC queue, so that would cause the GC queue to run without ever getting to remove anything for the duration of the TTL. This was a big problem bound to be noticed by users.

This commit changes the semantics to what the GC queue (and the layman) expects:

1. When a version is shadowed, it becomes non-live at that point and also starts accruing GCBytesAge from that point on.
2. Deletion tombstones are an exception: they accrue age from their own timestamp on. This makes sense because a tombstone can be deleted whenever it's older than the TTL (as opposed to a value, which can only be deleted once it's been *shadowed* for longer than the TTL).

This work started out by updating `ComputeStatsGo` to have the desired semantics, fixing up existing tests, and then stress-testing `TestMVCCStatsRandomized` with short history lengths to discover failure modes, which were then transcribed into small unit tests. When no more such failures were discoverable, the resulting logic in the various incremental MVCCStats update helpers was simplified and documented, and `ComputeStats` was updated accordingly.

In turn, `TestMVCCStatsBasic` was removed: it was notoriously hard to read and maintain, and does not add any coverage at this point.

The recomputation of the stats in existing clusters is addressed in cockroachdb#21345.

Fixes cockroachdb#20554.
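The accrual rules above can be sketched as a toy calculation. This is an illustrative model in Go, not the actual CockroachDB implementation; the function name and parameters are invented:

```go
package main

import "fmt"

// gcBytesAgeAt models the new GCBytesAge accrual rules: a shadowed
// value accrues age from the timestamp at which it was shadowed,
// while a deletion tombstone accrues age from its own timestamp.
// Timestamps are in whole seconds for simplicity.
func gcBytesAgeAt(now, writtenAt, shadowedAt int64, isTombstone bool) int64 {
	from := shadowedAt
	if isTombstone {
		// Rule 2: tombstones accrue from their own write timestamp.
		from = writtenAt
	}
	if now < from {
		return 0
	}
	return now - from
}

func main() {
	// Value written at 1s, shadowed at 2s: at t=2s its age is zero
	// (under the old semantics it would already have one second).
	fmt.Println(gcBytesAgeAt(2, 1, 2, false)) // prints 0
	// At t=5s it has accrued three seconds of age.
	fmt.Println(gcBytesAgeAt(5, 1, 2, false)) // prints 3
	// A tombstone written at 2s accrues from its own timestamp.
	fmt.Println(gcBytesAgeAt(5, 2, 0, true)) // prints 3
}
```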
I still have to figure out what happens when introducing new batch requests and sending them to a node that doesn't know them. I hope I addressed the rest.

Review status: 1 of 21 files reviewed at latest revision, 14 unresolved discussions.

pkg/roachpb/api.proto, line 386 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done.

pkg/roachpb/api.proto, line 393 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done.

pkg/roachpb/api.proto, line 397 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done.

pkg/roachpb/api.proto, line 400 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Yeah, only in tests.

pkg/storage/consistency_queue_test.go, line 37 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done.

pkg/storage/consistency_queue_test.go, line 328 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The queue is trying to run quickly. I think there's just a lot of stuff going on in

pkg/storage/helpers_test.go, line 28 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I've checked

pkg/storage/replica_command.go, line 26 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
https://youtrack.jetbrains.com/issue/GO-3092 :( Interestingly,

```diff
diff --git a/pkg/storage/replica_command.go b/pkg/storage/replica_command.go
index 9ede04f7c..16513f7a5 100644
--- a/pkg/storage/replica_command.go
+++ b/pkg/storage/replica_command.go
@@ -23,10 +23,11 @@ import (
 	"sync/atomic"
 	"time"

-	"github.com/cockroachdb/cockroach/pkg/storage/rditer"
 	"github.com/coreos/etcd/raft/raftpb"
 	"github.com/pkg/errors"

+	"github.com/cockroachdb/cockroach/pkg/storage/rditer"
+
 	"github.com/cockroachdb/cockroach/pkg/base"
 	"github.com/cockroachdb/cockroach/pkg/internal/client"
 	"github.com/cockroachdb/cockroach/pkg/keys"
```

I'll humor you in this PR, but generally I'm not super excited by these nits (though I can understand them).

pkg/storage/replica_consistency.go, line 122 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done.

pkg/storage/replica_consistency.go, line 132 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Hmm, that's a good point. I've tried to test this out, but probably didn't do it right. What I think will happen is that we send a

pkg/storage/batcheval/cmd_adjust_stats.go, line 40 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
PTAL. I think declaring a read on the range descriptor (for splits) and a write on some key that only we use is the way to go.

pkg/storage/batcheval/cmd_adjust_stats.go, line 45 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We could, but then we'd block a whole lot more. I did something, though; see the comment thread above this one (and the updated code).

pkg/storage/batcheval/cmd_adjust_stats.go, line 68 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done.

pkg/storage/batcheval/cmd_adjust_stats.go, line 72 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done.

Comments from Reviewable
LGTM aside from the question of migration. Reviewed 1 of 20 files at r2, 21 of 21 files at r4.

pkg/storage/replica_consistency.go, line 132 at r2 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
I think it'll be a non-empty

Comments from Reviewable
Breaks as expected. I'll fix this regression, send a patch to the v1.1 series, and add migrations for

Review status: all files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.

Comments from Reviewable
Ah, this panic actually doesn't apply to realistic requests (it's the test-only store-driven replica lookup path), but I'm sure it would break in other ways and should be prevented from getting into the system in the first place.

Review status: all files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.

Comments from Reviewable
When a new BatchRequest type is introduced, we have to be careful not to send it to older versions of CockroachDB, because the new request will simply be dropped when unmarshaling the proto. This leads to a `ba.Requests` slice containing zero values, which leads to panics.

This commit adds a guard against unknown requests to `(*Node).Batch` so that, beginning in v2.0, it is less dramatic to forget the cluster migration that properly gates the use of new request types. (I am not suggesting leaving out these migrations once this commit has matured.)

See: cockroachdb#21345 (comment).

Release note: None
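A minimal sketch of such a guard, assuming a toy `RequestUnion` type (the real one is a protobuf oneof wrapper; all names here are illustrative, not CockroachDB's actual API):

```go
package main

import (
	"errors"
	"fmt"
)

// RequestUnion mimics a proto oneof wrapper: exactly one field should
// be set. A request type unknown to this node is dropped during proto
// unmarshaling, leaving all fields nil (a zero value).
type RequestUnion struct {
	Get *struct{}
	Put *struct{}
}

// GetInner returns the request that is set, or nil for a zero value.
func (ru RequestUnion) GetInner() interface{} {
	if ru.Get != nil {
		return ru.Get
	}
	if ru.Put != nil {
		return ru.Put
	}
	return nil
}

// checkNoUnknownRequest sketches the guard: reject batches containing
// zero-value requests up front instead of panicking deeper in the stack.
func checkNoUnknownRequest(reqs []RequestUnion) error {
	for _, ru := range reqs {
		if ru.GetInner() == nil {
			return errors.New("batch contains an unknown request type")
		}
	}
	return nil
}

func main() {
	known := []RequestUnion{{Get: &struct{}{}}}
	fmt.Println(checkNoUnknownRequest(known)) // prints <nil>
	// A batch holding a zero-value request is rejected.
	fmt.Println(checkNoUnknownRequest([]RequestUnion{{}})) // prints the error
}
```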
Added the migration. Will merge on green, TFTRs!
Release note (bug fix): fix a problem that could cause spurious GC activity, in particular after dropping a table.
Fixing up that test was more interesting than I thought it was going to be, but now everything should be hunky-dory. @nvanbenschoten, could you give that a look?

Review status: 1 of 24 files reviewed at latest revision, 12 unresolved discussions.

Comments from Reviewable
Release note: None
A number of bugs in our MVCCStats logic have been fixed recently (see for example cockroachdb#20996) and others are still present (cockroachdb#20554). For both, and also potentially for future bugs or deliberate adjustments of the computations, we need a mechanism to recompute the stats in order to purge incorrect numbers from the system over time. Such a mechanism is introduced here. It consists of two main components:

- A new RPC `RecomputeStats`, which applies to a single Range and computes the difference between the persisted stats and the recomputed ones; it can "inject" a suitable delta into the stats (thus fixing the divergence) or do a "dry run".
- A trigger in the consistency checker that runs on the coordinating node (the lease holder). The consistency checker already recomputes the stats, and it can compare them against the persisted stats and judge whether there is a divergence. If there is one, naively one may hope to just insert the newly computed diff into the range, but this is not ideal due to concerns about double application and racing consistency checks. Instead, use `RecomputeStats` (which, of course, was introduced for this purpose), which strikes a balance between efficiency and correctness.

Updates cockroachdb#20554.

Release note (general change): added a mechanism to recompute range stats on demand.
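The delta-injection idea can be sketched as follows. This is a simplified model, not CockroachDB's code: `MVCCStats` here is a two-field stand-in and the function names are invented.

```go
package main

import "fmt"

// MVCCStats is a pared-down stand-in for the real stats struct.
type MVCCStats struct {
	LiveBytes, GCBytesAge int64
}

// delta returns the adjustment that, added to persisted, yields recomputed.
func delta(persisted, recomputed MVCCStats) MVCCStats {
	return MVCCStats{
		LiveBytes:  recomputed.LiveBytes - persisted.LiveBytes,
		GCBytesAge: recomputed.GCBytesAge - persisted.GCBytesAge,
	}
}

// recomputeStats sketches the RPC: in dry-run mode it only reports the
// delta; otherwise it also injects the delta into the persisted stats,
// fixing the divergence.
func recomputeStats(persisted *MVCCStats, recomputed MVCCStats, dryRun bool) MVCCStats {
	d := delta(*persisted, recomputed)
	if !dryRun {
		persisted.LiveBytes += d.LiveBytes
		persisted.GCBytesAge += d.GCBytesAge
	}
	return d
}

func main() {
	persisted := MVCCStats{LiveBytes: 100, GCBytesAge: 50}
	recomputed := MVCCStats{LiveBytes: 90, GCBytesAge: 70}
	d := recomputeStats(&persisted, recomputed, false /* dryRun */)
	// After injection, the persisted stats match the recomputed ones.
	fmt.Printf("delta=%+v persisted=%+v\n", d, persisted)
}
```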
Reviewed 1 of 20 files at r2, 1 of 21 files at r4, 6 of 21 files at r5, 23 of 23 files at r6.

pkg/storage/batcheval/cmd_recompute_stats.go, line 49 at r6 (raw file):
Do any other commands declare this key? Or is it completely synthetic?

pkg/storage/batcheval/cmd_recompute_stats.go, line 79 at r6 (raw file):
Nice, this actually makes things simpler!

Comments from Reviewable
Review status: all files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.

pkg/storage/batcheval/cmd_recompute_stats.go, line 49 at r6 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Synthetic, though I assume a split would use it; that's very unlikely with a nil ID, however.

Comments from Reviewable
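The key-declaration scheme discussed in the review (a read on the range descriptor plus a write on a synthetic key only this command uses) can be modeled with a toy conflict check. All names and keys here are invented for illustration; this is not the actual command-queue logic:

```go
package main

import "fmt"

// access models how a command touches a declared key.
type access int

const (
	read access = iota
	write
)

// decl is a toy key declaration: which key, and in what mode.
type decl struct {
	key  string
	mode access
}

// conflicts reports whether two declarations force serialization:
// any overlap in which at least one side is a write conflicts.
func conflicts(a, b decl) bool {
	return a.key == b.key && (a.mode == write || b.mode == write)
}

func main() {
	// The command declares a read on the range descriptor (so it
	// serializes with splits) and a write on a synthetic key that
	// only it uses (so concurrent recomputations serialize with each
	// other without blocking regular traffic).
	descRead := decl{"/Local/Range/Descriptor", read}
	synthWrite := decl{"/Local/Range/RecomputeStats", write}

	fmt.Println(conflicts(descRead, decl{"/Local/Range/Descriptor", write}))      // prints true (waits for splits)
	fmt.Println(conflicts(synthWrite, decl{"/Local/Range/RecomputeStats", write})) // prints true (mutual exclusion)
	fmt.Println(conflicts(descRead, decl{"/Local/Range/Descriptor", read}))        // prints false (reads don't block)
}
```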
A number of bugs in our MVCCStats logic have been fixed recently (e.g. #20996)
and others are still present (#20554). For both, and also potentially for future
bugs or deliberate adjustments of the computations, we need a mechanism to
recompute the stats in order to purge incorrect numbers from the system over
time. Such a mechanism is introduced here.

It consists of two main components:

- A new RPC `AdjustStats`, which applies to a single Range and computes the
  difference between the persisted stats and the recomputed ones; it can "inject"
  a suitable delta into the stats (thus fixing the divergence) or do a "dry run".
- A trigger in the consistency checker that runs on the coordinating node (the
  lease holder). The consistency checker already recomputes the stats, and it can
  compare them against the persisted stats and judge whether there is a divergence.
  If there is one, naively one may hope to just insert the newly computed diff into
  the range, but this is not ideal due to concerns about double application and racing
  consistency checks. Instead, use `AdjustStats` (which, of course, was introduced for
  this purpose), which strikes a balance between efficiency and correctness.

Updates #20554.

Release note (general change): introduced a mechanism to recompute range stats
automatically over time to reflect changes in the underlying logic.