Skip to content

storage: fix (the semantics of) MVCCStats.GCBytesAge#21407

Merged
tbg merged 1 commit intocockroachdb:masterfrom
tbg:gcbytesage
Jan 16, 2018
Merged

storage: fix (the semantics of) MVCCStats.GCBytesAge#21407
tbg merged 1 commit intocockroachdb:masterfrom
tbg:gcbytesage

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Jan 11, 2018

The semantics for computing GCBytesAge were incorrect and are fixed in
this commit. Prior to this commit, a non-live write would accrue
GCBytesAge from its own timestamp on. That is, if you wrote two versions
of a key at 1s and 2s, then when the older version is replaced (at 2s)
it would start out with one second of age (from 1s to 2s). However, the
key really became non-live at 2s, and should have had an age of zero.

By extension, imagine a table with lots of writes all dating back to
early 2017, and assume that today (early 2018) all these writes are
deleted (i.e. a tombstone placed on top of them). Prior to this commit,
each key would immediately get assigned an age of (early 2018) - early 2017), i.e. a very large number. Yet, the GC queue could only garbage
collect them after (early 2018) + TTL`, so by default 25 hours after
the deletion. We use GCBytesAge to trigger the GC queue, so that would
cause the GC queue to run without ever getting to remove anything, for
the TTL. This was a big problem bound to be noticed by users.

This commit changes the semantics to what the GCQueue (and the layman)
expects:

  1. when a version is shadowed, it becomes non-live at that point and
    also starts accruing GCBytesAge from that point on.
  2. deletion tombstones are an exception: They accrue age from their
    own timestamp on. This makes sense because a tombstone can be
    deleted whenever it's older than the TTL (as opposed to a value,
    which can only be deleted once it's been shadowed for longer
    than the TTL).

This work started out by updating ComputeStatsGo to have the desired
semantics, fixing up existing tests, and then stress testing
TestMVCCStatsRandomized with short history lengths to discover failure
modes which were then transcribed into small unit tests. When no more
such failures were discoverable, the resulting logic in the various
incremental MVCCStats update helpers was simplified and documented, and
ComputeStats updated accordingly. In turn, TestMVCCStatsBasic was
removed: it was notoriously hard to read and maintain, and does not add
any coverage at this point.

The recomputation of the stats in existing clusters is addressed in
#21345.

Fixes #20554.

@tbg tbg requested a review from a team January 11, 2018 22:45
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@tbg tbg requested a review from nvb January 11, 2018 22:45
@bdarnell
Copy link
Copy Markdown
Contributor

:lgtm:


Reviewed 3 of 3 files at r1.
Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


pkg/storage/engine/mvcc.go, line 234 at r1 (raw file):

	var ms enginepb.MVCCStats

	if false {

Remove this before merging?


pkg/storage/engine/mvcc.go, line 255 at r1 (raw file):

	if isSysLocal(key) {
		// Handling system-local keys is straightforward because no ageable quantities

If there are transactions and intents (which we handle below), there are age-able quantities. Specifically, the range-local copy of the range descriptor has historical revisions that need to be GC'd. Maybe it's small enough to ignore that for the purposes of GCBytesAge, but the comment should indicate that that's what it's doing.


pkg/storage/engine/mvcc.go, line 413 at r1 (raw file):

	var ms enginepb.MVCCStats

	if false {

Either delete this or put in a comment about why this disabled code is here.


pkg/storage/engine/mvcc.go, line 446 at r1 (raw file):

	// Jump into the method below for extensive commentary on their semantics
	// and "rules one and two".
	_ = updateStatsOnPut

Nice hyperlink :)


pkg/storage/engine/mvcc.go, line 2666 at r1 (raw file):

	var prevKey []byte
	first := false
	var deletionNanos int64

s/deletionNanos/overwriteNanos/?

This could also use a comment about how we're iterating in reverse chronological order, so this tracks the timestamp of a value for use in computing the GC timestamp of its predecessor.


pkg/storage/engine/mvcc.go, line 2702 at r1 (raw file):

		if !isValue || implicitMeta {
			metaKeySize := int64(len(unsafeKey.Key)) + 1 // unsafeKey.EncodedSize()

What's up with this comment?


Comments from Reviewable

@nvb
Copy link
Copy Markdown
Contributor

nvb commented Jan 16, 2018

Reviewed 1 of 3 files at r1.
Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


pkg/storage/engine/mvcc.go, line 227 at r1 (raw file):

func updateStatsOnPut(
	key roachpb.Key,
	prevKeySize, prevValSize int64,

It doesn't look like we need prevKeySize. In fact, it only looks like we need prevValSize to determine whether the previous value is a deletion tombstone. Should we trim these arguments? Perhaps just down to prevIsTombstone bool? That would also make a lot of the if prevValSize > 0 { checks easier to read.


pkg/storage/engine/mvcc.go, line 405 at r1 (raw file):

func updateStatsOnResolve(
	key roachpb.Key,
	prevKeySize, prevValSize int64,

Ditto


Comments from Reviewable

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Jan 16, 2018

Review status: 0 of 4 files reviewed at latest revision, 8 unresolved discussions.


pkg/storage/engine/mvcc.go, line 227 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It doesn't look like we need prevKeySize. In fact, it only looks like we need prevValSize to determine whether the previous value is a deletion tombstone. Should we trim these arguments? Perhaps just down to prevIsTombstone bool? That would also make a lot of the if prevValSize > 0 { checks easier to read.

We use prevValSize for arithmetic so I wasn't able to turn it into a bool, but we were not using prevKeySize and I removed it. I'm also using a local bool to simplify reading the logic, see if that makes more sense now.


pkg/storage/engine/mvcc.go, line 255 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

If there are transactions and intents (which we handle below), there are age-able quantities. Specifically, the range-local copy of the range descriptor has historical revisions that need to be GC'd. Maybe it's small enough to ignore that for the purposes of GCBytesAge, but the comment should indicate that that's what it's doing.

I clarified that we just don't track them but could.


pkg/storage/engine/mvcc.go, line 405 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Ditto

Done.


pkg/storage/engine/mvcc.go, line 413 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Either delete this or put in a comment about why this disabled code is here.

Done, 🤞 that we won't need it any more.


pkg/storage/engine/mvcc.go, line 2666 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/deletionNanos/overwriteNanos/?

This could also use a comment about how we're iterating in reverse chronological order, so this tracks the timestamp of a value for use in computing the GC timestamp of its predecessor.

Added comment. Renamed to accrueGCAgeNanos which I like because it specifies what we're using it for.


pkg/storage/engine/mvcc.go, line 2702 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What's up with this comment?

I toyed around with using EncodedSize() here instead, but decided it wasn't worth it. Removed.


Comments from Reviewable

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Jan 16, 2018

Review status: 0 of 4 files reviewed at latest revision, 7 unresolved discussions.


pkg/storage/engine/mvcc.go, line 234 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Remove this before merging?

Done.


Comments from Reviewable

@tbg tbg force-pushed the gcbytesage branch 2 times, most recently from a9aff1f to e3977cb Compare January 16, 2018 20:20
@bdarnell
Copy link
Copy Markdown
Contributor

Reviewed 4 of 4 files at r2.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


c-deps/libroach/db.cc, line 2361 at r2 (raw file):

  // NB: making this uninitialized triggers compiler warnings
  // with `-Werror=maybe-uninitialized`. This warning seems like
  // a false positive (changing the above line to `first=false`

The above line is first=false, so maybe this comment meant something else?


c-deps/libroach/db.cc, line 2456 at r2 (raw file):

          stats.gc_bytes_age += total_bytes * age_factor(wall_time, now_nanos);
        } else {
          stats.gc_bytes_age += total_bytes * age_factor(accrue_gc_age_nanos, now_nanos);

Maybe assert here that accrue_gc_age_nanos is non-zero to ensure we didn't hit the uninitialized path that the compiler was warning about.


Comments from Reviewable

The semantics for computing GCBytesAge were incorrect and are fixed in
this commit. Prior to this commit, a non-live write would accrue
GCBytesAge from its own timestamp on. That is, if you wrote two versions
of a key at 1s and 2s, then when the older version is replaced (at 2s)
it would start out with one second of age (from 1s to 2s). However, the
key *really* became non-live at 2s, and should have had an age of zero.

By extension, imagine a table with lots of writes all dating back to
early 2017, and assume that today (early 2018) all these writes are
deleted (i.e. a tombstone placed on top of them). Prior to this commit,
each key would immediately get assigned an age of `(early 2018) - early
2017)`, i.e. a very large number. Yet, the GC queue could only garbage
collect them after (early 2018) + TTL`, so by default 25 hours after
the deletion. We use GCBytesAge to trigger the GC queue, so that would
cause the GC queue to run without ever getting to remove anything, for
the TTL. This was a big problem bound to be noticed by users.

This commit changes the semantics to what the GCQueue (and the layman)
expects:

1. when a version is shadowed, it becomes non-live at that point and
   also starts accruing GCBytesAge from that point on.
2. deletion tombstones are an exception: They accrue age from their
   own timestamp on. This makes sense because a tombstone can be
   deleted whenever it's older than the TTL (as opposed to a value,
   which can only be deleted once it's been *shadowed* for longer
   than the TTL).

This work started out by updating `ComputeStatsGo` to have the desired
semantics, fixing up existing tests, and then stress testing
`TestMVCCStatsRandomized` with short history lengths to discover failure
modes which were then transcribed into small unit tests. When no more
such failures were discoverable, the resulting logic in the various
incremental MVCCStats update helpers was simplified and documented, and
`ComputeStats` updated accordingly. In turn, `TestMVCCStatsBasic` was
removed: it was notoriously hard to read and maintain, and does not add
any coverage at this point.

The recomputation of the stats in existing clusters is addressed in
cockroachdb#21345.

Fixes cockroachdb#20554.

Release note (bug fix): fix a problem that could cause spurious GC
activity, in particular after dropping a table.
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Jan 16, 2018

Review status: 3 of 4 files reviewed at latest revision, 4 unresolved discussions.


c-deps/libroach/db.cc, line 2361 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The above line is first=false, so maybe this comment meant something else?

Nope, just wanted to write true. Fixed.


c-deps/libroach/db.cc, line 2456 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Maybe assert here that accrue_gc_age_nanos is non-zero to ensure we didn't hit the uninitialized path that the compiler was warning about.

Done.


Comments from Reviewable

@tbg tbg merged commit d55669f into cockroachdb:master Jan 16, 2018
@tbg tbg deleted the gcbytesage branch January 16, 2018 21:18
@tbg tbg restored the gcbytesage branch April 16, 2018 15:25
@tbg tbg deleted the gcbytesage branch May 8, 2018 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

storage: unexpected GC queue activity immediately after DROP

4 participants