Skip to content

gcjob: issue DeleteRange tombstones and then wait for GC#85878

Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom
ajwerner:ajwerner/del-range-gc-job
Aug 14, 2022
Merged

gcjob: issue DeleteRange tombstones and then wait for GC#85878
craig[bot] merged 3 commits intocockroachdb:masterfrom
ajwerner:ajwerner/del-range-gc-job

Conversation

@ajwerner
Copy link
Copy Markdown
Contributor

Note that this does not change anything about tenant GC.

Fixes #70427

Release note (sql change): The asynchronous garbage collection process has
been changed such that very soon after dropping a table, index, or database, or
after refreshing a materialized view, the system will issue range deletion
tombstones over the dropped data. These tombstones will result in the KV
statistics properly counting these bytes as garbage. Before this change, the
asynchronous "gc job" would wait out the TTL and then issue a lower-level
operation to clear out the data. That meant that while the job was waiting
out the TTL, the data would appear in the statistics to still be live. This
was confusing.

@ajwerner ajwerner requested a review from a team August 10, 2022 05:25
@ajwerner ajwerner requested review from a team as code owners August 10, 2022 05:25
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just very few nits.

}

// performGC GCs any schema elements that are in the DELETING state and returns
// a bool indicating if it GC'd any elements.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it does not return a bool.

return err
}); err != nil {
if errors.Is(err, catalog.ErrDescriptorNotFound) {
// This can happen if another GC job created for the same table got to
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a table can be dropped twice?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had bugs in the past where we'd create extra GC jobs.


// TODO(ajwerner): How does this happen?
if !table.Dropped() {
// We shouldn't drop this table yet.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we error out if this can't happen?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erroring out in the GC job just leads to it retrying. I don't know how we get here. I think I'll log it.

@ajwerner ajwerner force-pushed the ajwerner/del-range-gc-job branch from fcd2a8d to 3f9904f Compare August 11, 2022 06:11
@ajwerner ajwerner requested a review from a team as a code owner August 11, 2022 06:11
@ajwerner ajwerner requested a review from a team August 11, 2022 06:11
@ajwerner ajwerner requested a review from a team as a code owner August 11, 2022 06:11
@ajwerner ajwerner requested review from HonoreDB and adityamaru and removed request for a team August 11, 2022 06:11
@ajwerner ajwerner force-pushed the ajwerner/del-range-gc-job branch 3 times, most recently from 5d2889c to 875c130 Compare August 12, 2022 15:16
@ajwerner ajwerner requested a review from a team as a code owner August 12, 2022 15:16
Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reviewed this at a high level, but LGTM. Thanks!

Comment on lines +130 to +134
// of interest. A case where one will show up in a changefeed is when
// the primary index changes while we're watching it and then the old
// primary index is dropped. In this case, we'll get a schema event to
// restart into the new primary index, but the DeleteRange may come
// through before the schema event.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something tells me this will be an unwelcome surprise once we start using MVCC range tombstones for other stuff too, but we can cross that bridge when we get there.

if err != nil || len(jobIDs) == 0 {
return err
}
log.Infof(ctx, "waiting for %d GC jobs to adopt the new protocol: %v", len(jobIDs), jobIDs)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any of these jobs are in backoff due to errors, can we end up waiting for hours here? Or are they accounted for by nonTerminalStatusTupleString? If yes, would it be safe to ignore these since they would adopt the new protocol once they resume?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a good suggestion! Thanks.

@ajwerner ajwerner force-pushed the ajwerner/del-range-gc-job branch 7 times, most recently from acd4357 to ad75d48 Compare August 12, 2022 22:44
We're going to need some new language to capture the states which correspond to
ClearRange so we can differentiate them from the DelRange states.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/del-range-gc-job branch from ad75d48 to 4530f4b Compare August 13, 2022 20:46
Note that this does not change anything about tenant GC.

Fixes cockroachdb#70427

Release note (sql change): The asynchronous garbage collection process has
been changed such that very soon after dropping a table, index, or database, or
after refreshing a materialized view, the system will issue range deletion
tombstones over the dropped data. These tombstones will result in the KV
statistics properly counting these bytes as garbage. Before this change, the
asynchronous "gc job" would wait out the TTL and then issue a lower-level
operation to clear out the data. That meant that while the job was waiting
out the TTL, the data would appear in the statistics to still be live. This
was confusing.
@ajwerner ajwerner force-pushed the ajwerner/del-range-gc-job branch from 4530f4b to de12cee Compare August 14, 2022 19:04
@ajwerner
Copy link
Copy Markdown
Contributor Author

TFTR!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 14, 2022

Build failed:

@ajwerner
Copy link
Copy Markdown
Contributor Author

The test failure was in //pkg/ui/workspaces/db-console. I can't see how that'd be related, so I'm going to try again.

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 14, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: migrate schema object removals and GC to DeleteRange

4 participants