storage: recompute and adjust stats via consistency checker#21345

Merged
tbg merged 2 commits into cockroachdb:master from tbg:cons-check-adjust-stats
Jan 17, 2018
Conversation

@tbg
Member

@tbg tbg commented Jan 9, 2018

A number of bugs in our MVCCStats logic have been fixed recently (e.g. #20996)
and others are still present (#20554). For both, and also potentially for future
bugs or deliberate adjustments of the computations, we need a mechanism to
recompute the stats in order to purge incorrect numbers from the system over
time. Such a mechanism is introduced here.

It consists of two main components:

  • A new RPC AdjustStats, which applies to a single Range and computes the
    difference between the persisted stats and the recomputed ones; it can "inject"
    a suitable delta into the stats (thus fixing the divergence) or do a "dry run".
  • A trigger in the consistency checker that runs on the coordinating node (the
    lease holder). The consistency checker already recomputes the stats, and it can
    compare them against the persisted stats and judge whether there is a divergence.
    If there is one, naively one may hope to just insert the newly computed diff into
    the range, but this is not ideal due to concerns about double application and racing
    consistency checks. Instead, AdjustStats (introduced for precisely this
    purpose) is used, striking a balance between efficiency and correctness.

Updates #20554.

Release note (general change): introduced a mechanism to recompute range stats
automatically over time to reflect changes in the underlying logic.
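The delta injection described above can be sketched in a few lines. This is a minimal, illustrative stand-in (the `Stats` struct and helpers below are hypothetical simplifications, not the actual `enginepb.MVCCStats` API, which tracks many more fields):

```go
package main

import "fmt"

// Stats is a simplified stand-in for enginepb.MVCCStats (illustrative only).
type Stats struct {
	LiveBytes, KeyBytes, ValBytes int64
}

// delta returns the adjustment that, when added to the persisted stats,
// yields the recomputed (ground-truth) stats.
func delta(persisted, recomputed Stats) Stats {
	return Stats{
		LiveBytes: recomputed.LiveBytes - persisted.LiveBytes,
		KeyBytes:  recomputed.KeyBytes - persisted.KeyBytes,
		ValBytes:  recomputed.ValBytes - persisted.ValBytes,
	}
}

// add injects a delta into the stats, fixing any divergence.
func (s *Stats) add(d Stats) {
	s.LiveBytes += d.LiveBytes
	s.KeyBytes += d.KeyBytes
	s.ValBytes += d.ValBytes
}

func main() {
	persisted := Stats{LiveBytes: 100, KeyBytes: 40, ValBytes: 60} // what the range has on disk
	recomputed := Stats{LiveBytes: 90, KeyBytes: 40, ValBytes: 50} // result of a full scan
	d := delta(persisted, recomputed)
	persisted.add(d)
	fmt.Println(persisted == recomputed) // true: the divergence is gone
}
```

Applying the same delta twice would over-correct, which is why the RPC recomputes the delta under latches rather than accepting one from the caller.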

@tbg tbg requested a review from a team as a code owner January 9, 2018 15:49
@tbg tbg requested a review from a team January 9, 2018 15:49
@cockroach-teamcity
Member

This change is Reviewable

@tbg tbg force-pushed the cons-check-adjust-stats branch 3 times, most recently from d16989d to 633d8f5 Compare January 9, 2018 17:36
@bdarnell
Contributor

bdarnell commented Jan 9, 2018

:lgtm:


Reviewed 1 of 2 files at r1, 20 of 20 files at r2.
Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/roachpb/api.proto, line 397 at r2 (raw file):

  Span header = 1 [(gogoproto.nullable) = false, (gogoproto.embed) = true];
  bytes computation_end_key = 2 [(gogoproto.casttype) = "Key"];

This looks unused.


pkg/roachpb/api.proto, line 400 at r2 (raw file):

  // When dry_run is true, the stats delta is computed, but no stats adjustment
  // is performed.
  bool dry_run = 3;

Do you expect to use this anywhere but tests? We're going to be issuing this command without dry_run in an automated fashion and it should be safe and idempotent, so I don't think a dry run flag is necessary. (a dry run is not any cheaper than a full run since even a "dry run" goes through raft as a write)


pkg/storage/replica_consistency.go, line 132 at r2 (raw file):

		b.AddRawRequest(&req)

		err := r.store.db.Run(ctx, &b)

If we've lost the lease, this request may go to another node, and during an upgrade it may not have the AdjustStats method. We should make sure that this doesn't log in a scary way.


pkg/storage/batcheval/cmd_adjust_stats.go, line 40 at r2 (raw file):

	// Declare only the target key, even though we're really iterating over a key range. This is OK
	// since all we're doing is computing a stats delta, and applying this delta commutes with other
	// operations on the same key space (except splits, which always touch the start key).

The start key is a real key, so this will interfere with at least some regular transactions. Maybe we should target this command to the range descriptor instead?


pkg/storage/batcheval/cmd_adjust_stats.go, line 68 at r2 (raw file):

	}

	if !descSpan.Equal(reqSpan) {

This could check args.Key.Equal(desc.StartKey) since the EndKey is set identically in both cases.


Comments from Reviewable

@nvb
Contributor

nvb commented Jan 9, 2018

Reviewed 2 of 2 files at r1, 20 of 20 files at r2.
Review status: all files reviewed at latest revision, 14 unresolved discussions, some commit checks failed.


pkg/roachpb/api.proto, line 386 at r2 (raw file):

//
// Since this request targets a specific Range, the start key must equal the
// start key of the target Range.

or what?


pkg/roachpb/api.proto, line 393 at r2 (raw file):

// it is safe to invoke at any time, including repeatedly, though it should be
// used conservatively due to performing a full scan of the Range.
message AdjustStatsRequest {

AdjustStatsRequest sounds a bit off to me. If this request took a stats delta to apply without scanning the entire range it would seem appropriate, but as is, I don't think this conveys the right information. RecomputeStatsRequest seems like a better name because it implies how expensive this might be.


pkg/storage/consistency_queue_test.go, line 37 at r2 (raw file):

	"github.com/cockroachdb/cockroach/pkg/util/log"
	"github.com/cockroachdb/cockroach/pkg/util/retry"
	"github.com/pkg/errors"

Sorry, it just sticks out like a sore thumb :)


pkg/storage/consistency_queue_test.go, line 328 at r2 (raw file):

	// The stats should magically repair themselves. The high timeout is only
	// relevant in stress testing; it's usually quick.
	if err := retry.ForDuration(3*time.Minute, func() error {

Should we force a run of the consistencyQueue? 3 minutes seems like a long time.


pkg/storage/helpers_test.go, line 28 at r2 (raw file):

	"time"

	"github.com/cockroachdb/cockroach/pkg/storage/rditer"

ditto


pkg/storage/replica_command.go, line 26 at r2 (raw file):

	"time"

	"github.com/cockroachdb/cockroach/pkg/storage/rditer"

Is this a goland issue?


pkg/storage/replica_consistency.go, line 122 at r2 (raw file):

		// Note that this code runs only on the lease holder (at the time of initiating
		// the computation), so this work isn't duplicated. Also, we're essentially
		// paced by the consistency checker.

We should note that this doesn't mean that it will never be run twice at the same time.


pkg/storage/replica_consistency.go, line 132 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

If we've lost the lease, this request may go to another node, and during an upgrade it may not have the AdjustStats method. We should make sure that this doesn't log in a scary way.

I was also surprised that this didn't require a new cluster version.


pkg/storage/batcheval/cmd_adjust_stats.go, line 45 at r2 (raw file):

	// shorten the range, so our recomputation would be bogus), though declaring the start key
	// (above this comment) already has that effect.
	spans.Add(spanset.SpanReadOnly, roachpb.Span{Key: keys.RangeDescriptorKey(desc.StartKey)})

If we wanted to avoid concurrent adjustments, couldn't we just make one of these declared keys a write?


pkg/storage/batcheval/cmd_adjust_stats.go, line 72 at r2 (raw file):

	}

	// If this is a SpanSetBatch, unwrap it. We're intentionally reading without

Could you expand the first sentence to explain why we need to unwrap it? Something like "unwrap it because we lied to it in declareKeysAdjustStats".


Comments from Reviewable

tbg added a commit to tbg/cockroach that referenced this pull request Jan 11, 2018
The semantics for computing GCBytesAge were incorrect and are fixed in
this commit. Prior to this commit, a non-live write would accrue
GCBytesAge from its own timestamp on. That is, if you wrote two versions
of a key at 1s and 2s, then when the older version is replaced (at 2s)
it would start out with one second of age (from 1s to 2s). However, the
key *really* became non-live at 2s, and should have had an age of zero.

By extension, imagine a table with lots of writes all dating back to
early 2017, and assume that today (early 2018) all these writes are
deleted (i.e. a tombstone placed on top of them). Prior to this commit,
each key would immediately get assigned an age of `(early 2018) - (early
2017)`, i.e. a very large number. Yet, the GC queue could only garbage
collect them after `(early 2018) + TTL`, so by default 25 hours after
the deletion. We use GCBytesAge to trigger the GC queue, so that would
cause the GC queue to run without ever getting to remove anything, for
the TTL. This was a big problem bound to be noticed by users.

This commit changes the semantics to what the GCQueue (and the layman)
expects:

1. when a version is shadowed, it becomes non-live at that point and
   also starts accruing GCBytesAge from that point on.
2. deletion tombstones are an exception: They accrue age from their
   own timestamp on. This makes sense because a tombstone can be
   deleted whenever it's older than the TTL (as opposed to a value,
   which can only be deleted once it's been *shadowed* for longer
   than the TTL).

This work started out by updating `ComputeStatsGo` to have the desired
semantics, fixing up existing tests, and then stress testing
`TestMVCCStatsRandomized` with short history lengths to discover failure
modes which were then transcribed into small unit tests. When no more
such failures were discoverable, the resulting logic in the various
incremental MVCCStats update helpers was simplified and documented, and
`ComputeStats` updated accordingly. In turn, `TestMVCCStatsBasic` was
removed: it was notoriously hard to read and maintain, and does not add
any coverage at this point.

The recomputation of the stats in existing clusters is addressed in
cockroachdb#21345.

Fixes cockroachdb#20554.
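The corrected semantics in the commit message above can be sketched as a small helper. This is an illustrative model only (`ageAt` is hypothetical; the real accounting lives in the incremental `MVCCStats` update helpers), with times in seconds:

```go
package main

import "fmt"

// ageAt returns the GCBytesAge accrued by a single version at time now,
// under the corrected semantics: values accrue age from the moment they are
// shadowed; deletion tombstones accrue age from their own timestamp.
func ageAt(writtenAt, shadowedAt int64, isTombstone bool, now int64) int64 {
	start := shadowedAt
	if isTombstone {
		start = writtenAt
	}
	if now < start {
		return 0
	}
	return now - start
}

func main() {
	// A value written at 1s and shadowed at 2s has zero age at 2s
	// (previously it would incorrectly start out with one second of age).
	fmt.Println(ageAt(1, 2, false, 2)) // 0
	fmt.Println(ageAt(1, 2, false, 5)) // 3
	// A tombstone written at 2s accrues age from 2s.
	fmt.Println(ageAt(2, 2, true, 5)) // 3
}
```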
@tbg tbg force-pushed the cons-check-adjust-stats branch from 633d8f5 to 3d07b83 Compare January 12, 2018 22:53
@tbg tbg requested review from a team January 12, 2018 22:53
@tbg
Member Author

tbg commented Jan 12, 2018

I still have to figure out what happens when introducing new batch requests and sending them to a node that doesn't know them. I hope I addressed the rest.


Review status: 1 of 21 files reviewed at latest revision, 14 unresolved discussions.


pkg/roachpb/api.proto, line 386 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

or what?

Done.


pkg/roachpb/api.proto, line 393 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

AdjustStatsRequest sounds a bit off to me. If this request took a stats delta to apply without scanning the entire range it would seem appropriate, but as is, I don't think this conveys the right information. RecomputeStatsRequest seems like a better name because it implies how expensive this might be.

Done.


pkg/roachpb/api.proto, line 397 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This looks unused.

Done.


pkg/roachpb/api.proto, line 400 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Do you expect to use this anywhere but tests? We're going to be issuing this command without dry_run in an automated fashion and it should be safe and idempotent, so I don't think a dry run flag is necessary. (a dry run is not any cheaper than a full run since even a "dry run" goes through raft as a write)

Yeah, only in tests.


pkg/storage/consistency_queue_test.go, line 37 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Sorry, it just sticks out like a sore thumb :)

Done.


pkg/storage/consistency_queue_test.go, line 328 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we force a run of the consistencyQueue? 3 minutes seems like a long time.

The queue is trying to run quickly. I think there's just a lot of stuff going on in testrace that slows it down. 3 minutes is just to catch the tail end of it.


pkg/storage/helpers_test.go, line 28 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

ditto

I've checked


pkg/storage/replica_command.go, line 26 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this a goland issue?

https://youtrack.jetbrains.com/issue/GO-3092 :(

Interestingly, goimports -local doesn't completely fix this:

diff --git a/pkg/storage/replica_command.go b/pkg/storage/replica_command.go
index 9ede04f7c..16513f7a5 100644
--- a/pkg/storage/replica_command.go
+++ b/pkg/storage/replica_command.go
@@ -23,10 +23,11 @@ import (
 	"sync/atomic"
 	"time"

-	"github.com/cockroachdb/cockroach/pkg/storage/rditer"
 	"github.com/coreos/etcd/raft/raftpb"
 	"github.com/pkg/errors"

+	"github.com/cockroachdb/cockroach/pkg/storage/rditer"
+
 	"github.com/cockroachdb/cockroach/pkg/base"
 	"github.com/cockroachdb/cockroach/pkg/internal/client"
 	"github.com/cockroachdb/cockroach/pkg/keys"

I'll humor you in this PR, but generally I'm not super excited by these nits (though I can understand them).


pkg/storage/replica_consistency.go, line 122 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We should note that this doesn't mean that it will never be run twice at the same time.

Done.


pkg/storage/replica_consistency.go, line 132 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I was also surprised that this didn't require a new cluster version.

Hmm, that's a good point. I've tried to test this out, but probably didn't do this right. What I think will happen is that we send a BatchRequest with a length-one Requests that will look like a zero-length Requests at the receiver. That will either receive an error (empty batch) or an empty response. In the latter case we're likely going to shoot ourselves in the foot because we rely on there being a response in the BatchResponse.

ClearRange would have a similar problem. I've added a TODO, intending to look at this in this PR.


pkg/storage/batcheval/cmd_adjust_stats.go, line 40 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The start key is a real key, so this will interfere with at least some regular transactions. Maybe we should target this command to the range descriptor instead?

PTAL. I think declaring a read on the range descriptor (for splits) and a write on some key that only we use is the way to go.


pkg/storage/batcheval/cmd_adjust_stats.go, line 45 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we wanted to avoid concurrent adjustments, couldn't we just make one of these declared keys a write?

We could, but then we'd block a whole lot more. I did something though, see the comment thread above this one (and the updated code).


pkg/storage/batcheval/cmd_adjust_stats.go, line 68 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This could check args.Key.Equal(desc.StartKey) since the EndKey is set identically in both cases.

Done.


pkg/storage/batcheval/cmd_adjust_stats.go, line 72 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could you expand the first sentence to explain why we need to unwrap it? Something like "unwrap it because we lied to it in declareKeysAdjustStats".

Done.


Comments from Reviewable

@bdarnell
Contributor

LGTM aside from the question of migration.


Reviewed 1 of 20 files at r2, 21 of 21 files at r4.
Review status: all files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.


pkg/storage/replica_consistency.go, line 132 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Hmm, that's a good point. I've tried to test this out, but probably didn't do this right. What I think will happen is that we send a BatchRequest with a length-one Requests that will look like a zero-length Requests at the receiver. That will either receive an error (empty batch) or an empty response. In the latter case we're likely going to shoot ourselves in the foot because we rely on there being a response in the BatchResponse.

ClearRange would have a similar problem. I've added a TODO, intending to look at this in this PR.

I think it'll be a non-empty Requests slice, but attempting to extract the request will result in nil, which I'm pretty sure will panic somewhere. If it makes it far enough, Replica.Send will treat it as read-only since it doesn't contain any requests with the write or admin flags, then evaluateBatch will panic in RequestUnion.GetInner. So I think we will need a migration for this case. Adding a check so that unknown commands return a clean error instead of a panic would make future cases like this easier since we could catch the error instead of killing the node.


Comments from Reviewable

@tbg
Member Author

tbg commented Jan 16, 2018

Breaks as expected. I'll fix this regression, send a patch to the v1.1 series, and add migrations for ClearRange and RecomputeStats.

E180116 09:56:14.048458 1796 util/log/crash_reporting.go:83  [n1] a panic has occurred!
E180116 09:56:14.177092 1796 util/log/crash_reporting.go:176  [n1] Reported as error 4b753b8a92df46698d28f184860cbb08
panic: interface conversion: interface is nil, not roachpb.Request [recovered]
        panic: interface conversion: interface is nil, not roachpb.Request

goroutine 1796 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc420568500, 0x6e6b9e8, 0xc420c2eb40)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:200 +0xb1
panic(0x533ed00, 0xc420db8180)
        /usr/local/go/src/runtime/panic.go:489 +0x2cf
github.com/cockroachdb/cockroach/pkg/roachpb.RequestUnion.GetInner(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.go:367 +0x4f
github.com/cockroachdb/cockroach/pkg/keys.Range(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/keys/keys.go:637 +0x17f
github.com/cockroachdb/cockroach/pkg/storage.(*Stores).Send(0xc4202b4080, 0x6e6b9e8, 0xc420c2eba0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/stores.go:148 +0x66
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal.func1(0x6e6b9e8, 0xc420c2eba0, 0x0, 0x0)

Review status: all files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.


Comments from Reviewable

@tbg
Member Author

tbg commented Jan 16, 2018

Ah, this panic actually doesn't apply to realistic requests (it's the test-only store-driven replica lookup path), but I'm sure it would break in other ways and should be prevented from getting into the system in the first place.


Review status: all files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.


Comments from Reviewable

tbg added a commit to tbg/cockroach that referenced this pull request Jan 16, 2018
When a new BatchRequest type is introduced, we have to be careful to
not send it to older versions of CockroachDB because the new request
will simply be dropped when unmarshaling the proto. This leads to a
`ba.Requests` slice containing zero values, which leads to panics.

This commit adds a guard against unknown requests into `(*Node).Batch`
so that beginning in v2.0, it is less dramatic to forget the correct
cluster migration which gates use of new request types properly.
(I am not suggesting leaving out these migrations when this commit
has matured).

See:
cockroachdb#21345 (comment).

Release note: None
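The guard described in the commit message above can be sketched as follows. This is a hypothetical simplification (the names `Request` and `checkBatch` are illustrative, not the actual `(*Node).Batch` code), modeling a dropped request payload as a nil interface value:

```go
package main

import "fmt"

// Request stands in for roachpb.Request. When a proto containing an unknown
// request type is unmarshaled by an older node, the RequestUnion's payload is
// dropped and GetInner returns nil -- modeled here as a nil interface value.
type Request interface{ Method() string }

type getRequest struct{}

func (getRequest) Method() string { return "Get" }

// checkBatch rejects batches containing requests this node does not
// understand, returning a clean error instead of panicking deep inside
// request evaluation.
func checkBatch(reqs []Request) error {
	for i, r := range reqs {
		if r == nil {
			return fmt.Errorf(
				"request at index %d not recognized; the sender is likely running a newer version", i)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkBatch([]Request{getRequest{}}) == nil)      // true
	fmt.Println(checkBatch([]Request{getRequest{}, nil}) != nil) // true
}
```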
@tbg tbg force-pushed the cons-check-adjust-stats branch from 3d07b83 to 67f016f Compare January 16, 2018 15:40
@tbg tbg requested a review from a team January 16, 2018 15:40
@tbg
Member Author

tbg commented Jan 16, 2018

Added the migration (ClearRange was OK already, I had overlooked the migration earlier).
Refusing unknown requests is in #21466.

Will merge on green, TFTRs!

tbg added a commit to tbg/cockroach that referenced this pull request Jan 16, 2018
@tbg tbg force-pushed the cons-check-adjust-stats branch from 67f016f to 3f3133d Compare January 16, 2018 17:40
@tbg tbg force-pushed the cons-check-adjust-stats branch from 3f3133d to d55ca52 Compare January 16, 2018 19:12
tbg added a commit to tbg/cockroach that referenced this pull request Jan 16, 2018

Release note (bug fix): fix a problem that could cause spurious GC
activity, in particular after dropping a table.
tbg added a commit to tbg/cockroach that referenced this pull request Jan 16, 2018
@tbg tbg force-pushed the cons-check-adjust-stats branch from d55ca52 to ffd20e7 Compare January 16, 2018 20:54
@tbg tbg force-pushed the cons-check-adjust-stats branch from ffd20e7 to 5bf5f49 Compare January 17, 2018 15:30
@tbg
Member Author

tbg commented Jan 17, 2018

Fixing up that test was more interesting than I thought it was going to be, but now everything should be hunky dory. @nvanbenschoten could you give that a look?


Review status: 1 of 24 files reviewed at latest revision, 12 unresolved discussions.


Comments from Reviewable

@tbg tbg force-pushed the cons-check-adjust-stats branch from 5bf5f49 to 9b8b807 Compare January 17, 2018 15:44
tbg added 2 commits January 17, 2018 10:52
A number of bugs in our MVCCStats logic have been fixed recently (see for example
cockroachdb#20996) and others are still present (cockroachdb#20554). For both, and also potentially for
future bugs or deliberate adjustments of the computations, we need a mechanism
to recompute the stats in order to purge incorrect numbers from the system over
time. Such a mechanism is introduced here.

It consists of two main components:

- A new RPC `RecomputeStats`, which applies to a single Range and
computes the difference between the persisted stats and the recomputed
ones; it can "inject" a suitable delta into the stats (thus fixing the
divergence) or do a "dry run".
- A trigger in the consistency checker that runs on the coordinating
node (the lease holder). The consistency checker already recomputes the
stats, and it can compare them against the persisted stats and judge
whether there is a divergence. If there is one, naively one may hope to
just insert the newly computed diff into the range, but this is not
ideal due to concerns about double application and racing consistency
checks. Instead, `RecomputeStats` (introduced for precisely this purpose)
is used, striking a balance between efficiency and correctness.

Updates cockroachdb#20554.

Release note (general change): added a mechanism to recompute range
stats on demand.
@tbg tbg force-pushed the cons-check-adjust-stats branch from 9b8b807 to aaa6acd Compare January 17, 2018 16:04
@nvb
Contributor

nvb commented Jan 17, 2018

:lgtm_strong:


Reviewed 1 of 20 files at r2, 1 of 21 files at r4, 6 of 21 files at r5, 23 of 23 files at r6.
Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks pending.


pkg/storage/batcheval/cmd_recompute_stats.go, line 49 at r6 (raw file):

a write on a transaction anchored at the range descriptor key

Do any other commands declare this key? Or is it completely synthetic?


pkg/storage/batcheval/cmd_recompute_stats.go, line 79 at r6 (raw file):

	// we read from the range state if concurrent writes are inflight[1].
	//
	// Note that in doing so, we also circumvent the assertions (present in both

Nice, this actually makes things simpler!


Comments from Reviewable

@tbg tbg merged commit 5354ac2 into cockroachdb:master Jan 17, 2018
@tbg tbg deleted the cons-check-adjust-stats branch January 17, 2018 16:36
@tbg
Member Author

tbg commented Jan 17, 2018

Review status: all files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.


pkg/storage/batcheval/cmd_recompute_stats.go, line 49 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

a write on a transaction anchored at the range descriptor key

Do any other commands declare this key? Or is it completely synthetic?

Synthetic. I assume a split could use it too, but that's very unlikely with a nil ID.


Comments from Reviewable

@tbg tbg restored the cons-check-adjust-stats branch April 16, 2018 15:23
@tbg tbg deleted the cons-check-adjust-stats branch May 8, 2018 15:05