gcjob: issue DeleteRange tombstones and then wait for GC #85878

craig[bot] merged 3 commits into cockroachdb:master
Conversation
chengxiong-ruan
left a comment
LGTM, just a few nits.
    // performGC GCs any schema elements that are in the DELETING state and returns
    // a bool indicating if it GC'd any elements.
looks like it does not return a bool.
    	return err
    }); err != nil {
    	if errors.Is(err, catalog.ErrDescriptorNotFound) {
    		// This can happen if another GC job created for the same table got to
a table can be dropped twice?
We had bugs in the past where we'd create extra GC jobs.
    // TODO(ajwerner): How does this happen?
    if !table.Dropped() {
    	// We shouldn't drop this table yet.
should we error out if this can't happen?
Erroring out in the GC job just leads to it retrying. I don't know how we get here. I think I'll log it.
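The trade-off here is that a returned error only makes the job registry retry the GC job, so a "can't happen" state is better logged and skipped. A small illustrative Go sketch of that shape (all names invented, not the actual cockroachdb code):

```go
package main

import "log"

// tableState is a stand-in for the descriptor state checked in the real code.
type tableState struct {
	name    string
	dropped bool
}

// gcTable skips tables that are unexpectedly not in the DROPPED state.
// Returning an error would only make the GC job retry forever against the
// same surprising state, so we log it and move on instead.
func gcTable(t tableState) (gced bool) {
	if !t.dropped {
		log.Printf("gcjob: table %s is not dropped; skipping", t.name)
		return false
	}
	// ... issue the actual deletion for the dropped table here ...
	return true
}

func main() {
	gcTable(tableState{name: "t", dropped: false}) // logged and skipped
	gcTable(tableState{name: "u", dropped: true})  // GC'd
}
```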
force-pushed from fcd2a8d to 3f9904f
force-pushed from 5d2889c to 875c130
erikgrinaker
left a comment
Only reviewed this at a high level, but LGTM. Thanks!
    // of interest. A case where one will show up in a changefeed is when
    // the primary index changes while we're watching it and then the old
    // primary index is dropped. In this case, we'll get a schema event to
    // restart into the new primary index, but the DeleteRange may come
    // through before the schema event.
Something tells me this will be an unwelcome surprise once we start using MVCC range tombstones for other stuff too, but we can cross that bridge when we get there.
    if err != nil || len(jobIDs) == 0 {
    	return err
    }
    log.Infof(ctx, "waiting for %d GC jobs to adopt the new protocol: %v", len(jobIDs), jobIDs)
If any of these jobs are in backoff due to errors, can we end up waiting for hours here? Or are they accounted for by nonTerminalStatusTupleString? If yes, would it be safe to ignore these since they would adopt the new protocol once they resume?
This was a good suggestion! Thanks.
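The suggestion amounts to filtering which legacy GC jobs the migration actually blocks on. A purely illustrative Go sketch of that filter (the status set and names are stand-ins; the real states live in cockroachdb's jobs package):

```go
package main

import "fmt"

// jobStatus mirrors a few of the non-terminal job states relevant here.
type jobStatus string

const (
	statusRunning  jobStatus = "running"
	statusPaused   jobStatus = "paused"
	statusRetrying jobStatus = "retrying" // backing off after an error
)

// shouldWaitFor reports whether the migration needs to block on a legacy GC
// job. Jobs that are paused or backing off can be skipped: they will pick up
// the new DeleteRange protocol when they next resume, so waiting on them
// (possibly for hours of backoff) buys nothing.
func shouldWaitFor(s jobStatus) bool {
	return s == statusRunning
}

func main() {
	for _, s := range []jobStatus{statusRunning, statusPaused, statusRetrying} {
		fmt.Printf("%s: wait=%v\n", s, shouldWaitFor(s))
	}
}
```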
force-pushed from acd4357 to ad75d48
We're going to need some new language to capture the states which correspond to ClearRange so we can differentiate them from the DelRange states.

Release note: None
force-pushed from ad75d48 to 4530f4b
Note that this does not change anything about tenant GC.

Fixes cockroachdb#70427

Release note (sql change): The asynchronous garbage collection process has been changed such that very soon after dropping a table, index, or database, or after refreshing a materialized view, the system will issue range deletion tombstones over the dropped data. These tombstones will result in the KV statistics properly counting these bytes as garbage. Before this change, the asynchronous "GC job" would wait out the TTL and then issue a lower-level operation to clear out the data. That meant that while the job was waiting out the TTL, the data would appear in the statistics to still be live. This was confusing.
…upgrade Release note: None
force-pushed from 4530f4b to de12cee
TFTR! bors r+

Build failed:

The test failure was in bors r+

Build succeeded:
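The new protocol described in the release note can be sketched as a two-phase flow: write the range tombstone immediately so the statistics count the data as garbage, then wait out the TTL before MVCC GC physically reclaims it. A minimal illustrative Go sketch, with an invented `rangeDeleter` interface standing in for the KV DelRange request that writes an MVCC range tombstone (none of these names are the actual cockroachdb API):

```go
package main

import (
	"fmt"
	"time"
)

// rangeDeleter abstracts the single KV capability the new flow relies on.
type rangeDeleter interface {
	DeleteRange(start, end string) error
}

// fakeKV records issued tombstones so the flow can be demonstrated.
type fakeKV struct{ tombstones []string }

func (f *fakeKV) DeleteRange(start, end string) error {
	f.tombstones = append(f.tombstones, start+"-"+end)
	return nil
}

// gcDroppedTable sketches the new protocol: issue the range tombstone
// first, and only then wait out the TTL. Under the old protocol the order
// was reversed: wait the TTL, then clear the data, so the dropped bytes
// looked live in the statistics the whole time.
func gcDroppedTable(kv rangeDeleter, start, end string, ttl time.Duration) error {
	if err := kv.DeleteRange(start, end); err != nil {
		return err
	}
	time.Sleep(ttl) // stand-in for "wait for the GC TTL to expire"
	// From here on, background MVCC GC reclaims the tombstoned data.
	return nil
}

func main() {
	kv := &fakeKV{}
	if err := gcDroppedTable(kv, "/Table/52", "/Table/53", 0); err != nil {
		panic(err)
	}
	fmt.Println(kv.tombstones)
}
```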