
bulk: large index validation after backfill can be starved by GC TTL, bounding index creation scalability #84911

@nvb

Description


After a backfill, we run a series of validation queries (SchemaChanger.validateIndexes) to check the integrity of the new index. These amount to full scans of both the new index and the primary index.
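As an illustrative sketch (not CockroachDB's actual implementation), the validation step amounts to counting the entries reachable through the primary index and through the new secondary index and requiring the counts to agree; the `primary`/`secondary` structures and the `validate_index` helper below are hypothetical:

```python
# Toy model of index-count validation. Both "scans" here are full
# traversals, which is why the real queries take so long on large tables.
primary = {1: "a", 2: "b", 3: "c"}          # primary index: pk -> row
secondary = {("a", 1), ("b", 2), ("c", 3)}  # new index: (indexed col, pk)

def validate_index(primary_index, secondary_index):
    # Mirror of the "verify-idx-count" check: entry counts must agree.
    if len(primary_index) != len(secondary_index):
        raise ValueError("verify-idx-count: entry count mismatch")
    return len(primary_index)

print(validate_index(primary, secondary))  # prints 3
```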

When running an index creation on a 4TB table (32,194,851,254 rows) with a 50-minute GC TTL, I found that these queries could be starved and prevented from ever completing. Instead, they were retried every 50 minutes indefinitely, with logs like the following:

I220722 11:59:49.467458 69188175 sql/backfill.go:2115 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 250073  finished backfilling indexes
I220722 11:59:49.467505 69188175 sql/backfill.go:1398 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 250074  validating new indexes
W220722 12:51:35.499342 69188175 sql/schema_changer.go:2614 ⋮ [n1,job=781044268655935489] 254915  error while running schema change, retrying: ‹verify-idx-count›: ‹batch timestamp 1658491189.517372325,0 must be after replica GC threshold 1658491263.919029492,0›
I220722 12:51:54.252033 69188175 sql/schema_changer.go:663 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254943  schema change on ‹"cash_transaction"› (v29) starting execution...
I220722 12:51:54.287300 69188175 sql/schema_changer.go:1085 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254944  stepping through state machine
I220722 12:51:54.294906 69188175 sql/schema_changer.go:1173 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254945  finished stepping through state machine
I220722 12:51:54.317960 69188175 sql/backfill.go:242 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254946  running backfill for ‹"cash_transaction"›, v=29
I220722 12:51:54.318006 69188175 sql/backfill.go:2051 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254947  backfilling 1 indexes: [‹/Table/115/{8-9}›] (writeAtRequestTimestamp: false)
I220722 12:51:54.391779 69188175 sql/backfill.go:2115 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254948  finished backfilling indexes
I220722 12:51:54.391804 69188175 sql/backfill.go:1398 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254949  validating new indexes
W220722 13:43:45.978629 69188175 sql/schema_changer.go:2614 ⋮ [n1,job=781044268655935489] 259866  error while running schema change, retrying: ‹verify-idx-count›: ‹batch timestamp 1658494314.415510637,0 must be after replica GC threshold 1658494384.053000856,0›
I220722 13:44:04.081576 69188175 sql/schema_changer.go:663 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 259889  schema change on ‹"cash_transaction"› (v29) starting execution...
I220722 13:44:04.088672 69188175 sql/schema_changer.go:1085 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 259890  stepping through state machine
I220722 13:44:04.090402 69188175 sql/schema_changer.go:1173 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 259891  finished stepping through state machine
I220722 13:44:04.128474 69188175 sql/backfill.go:242 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 259893  running backfill for ‹"cash_transaction"›, v=29
I220722 13:44:04.128532 69188175 sql/backfill.go:2051 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 259894  backfilling 1 indexes: [‹/Table/115/{8-9}›] (writeAtRequestTimestamp: false)

At one point, I did see one side of the validation complete just in time:

I220722 11:07:31.448156 788495333 sql/backfill.go:1716 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 245898  validation: index ‹cash_transaction›/‹cash_transaction_ct_dts_idx› row count = 32194851254, time so far 50m55.088212272s

This works out to scanning roughly 10.5M rows per second, which is not terrible. I also confirmed that the query was using DistSQL.
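That rate checks out against the log line above (row count and elapsed time are taken verbatim from it):

```python
# Back-of-envelope check of the scan rate implied by the validation log.
rows = 32_194_851_254          # row count from the validation log line
elapsed_s = 50 * 60 + 55.088   # "time so far 50m55.088s"
rate = rows / elapsed_s
print(f"{rate / 1e6:.1f}M rows/sec")  # prints 10.5M rows/sec
```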

There's a larger question of whether SQL execution should create a protected timestamp whenever it runs a long-lived query that might fall behind the GC threshold. In lieu of that, I think the schema changer should manually place a protected timestamp to cover these validation queries. Without one, we have an effective cap on the size of a table that can be given a new index: the validation scans must complete within a single GC TTL window, or they will be retried forever.
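A back-of-envelope sketch of that cap, assuming the scan rate observed above and this cluster's 50-minute TTL (both numbers are specific to this run):

```python
# If validation must finish within one GC TTL window and scans at a
# fixed rate, the largest validatable table is roughly rate * TTL.
scan_rate_rows_per_s = 10.5e6  # observed rate from the log line above
gc_ttl_s = 50 * 60             # this cluster's 50-minute GC TTL
max_rows = scan_rate_rows_per_s * gc_ttl_s
print(f"~{max_rows / 1e9:.0f}B rows")  # ~32B rows: about this table's size
```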

I don't know whether this lives in Bulk or SQL Schema, so adding to both for now.

Jira issue: CRDB-17945

Metadata


Labels

A-disaster-recovery, A-schema-changes, C-bug (code not up to spec/doc; specs & docs deemed correct, solution expected to change code/behavior), T-sql-foundations (SQL Foundations Team, formerly SQL Schema + SQL Sessions)
