bulk: large index validation after backfill can be starved by GC TTL, bounding index creation scalability #84911
Description
After a backfill, we run a series of validation queries (SchemaChanger.validateIndexes) to ensure the integrity of the new index. These amount to full scans of both the new index and the primary index, comparing their row counts.
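As an illustrative sketch (not the real SchemaChanger code), the check amounts to counting rows on both sides at the same timestamp and requiring the counts to match; `rowCount` and `validateIndex` here are hypothetical stand-ins:

```go
package main

import "fmt"

// rowCount stands in for a full scan of one index at a fixed timestamp.
func rowCount(index []int64) int64 { return int64(len(index)) }

// validateIndex mirrors the shape of the check: the new index must contain
// exactly as many entries as the primary index when both are read at the
// same timestamp; a mismatch means the backfill missed or duplicated rows.
func validateIndex(primary, newIndex []int64) error {
	if p, n := rowCount(primary), rowCount(newIndex); p != n {
		return fmt.Errorf("validation failed: expected %d rows, got %d", p, n)
	}
	return nil
}

func main() {
	primary := []int64{1, 2, 3}
	fmt.Println(validateIndex(primary, []int64{1, 2, 3})) // <nil>
	fmt.Println(validateIndex(primary, []int64{1, 2}))    // validation failed: expected 3 rows, got 2
}
```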
When running an index creation on a 4TB table (32,194,851,254 rows) with a 50m GC TTL, I found that these queries could be starved and prevented from completing successfully. Instead, they would be retried every 50 minutes indefinitely with logs like the following:
I220722 11:59:49.467458 69188175 sql/backfill.go:2115 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 250073 finished backfilling indexes
I220722 11:59:49.467505 69188175 sql/backfill.go:1398 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 250074 validating new indexes
W220722 12:51:35.499342 69188175 sql/schema_changer.go:2614 ⋮ [n1,job=781044268655935489] 254915 error while running schema change, retrying: ‹verify-idx-count›: ‹batch timestamp 1658491189.517372325,0 must be after replica GC threshold 1658491263.919029492,0›
I220722 12:51:54.252033 69188175 sql/schema_changer.go:663 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254943 schema change on ‹"cash_transaction"› (v29) starting execution...
I220722 12:51:54.287300 69188175 sql/schema_changer.go:1085 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254944 stepping through state machine
I220722 12:51:54.294906 69188175 sql/schema_changer.go:1173 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254945 finished stepping through state machine
I220722 12:51:54.317960 69188175 sql/backfill.go:242 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254946 running backfill for ‹"cash_transaction"›, v=29
I220722 12:51:54.318006 69188175 sql/backfill.go:2051 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254947 backfilling 1 indexes: [‹/Table/115/{8-9}›] (writeAtRequestTimestamp: false)
I220722 12:51:54.391779 69188175 sql/backfill.go:2115 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254948 finished backfilling indexes
I220722 12:51:54.391804 69188175 sql/backfill.go:1398 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 254949 validating new indexes
W220722 13:43:45.978629 69188175 sql/schema_changer.go:2614 ⋮ [n1,job=781044268655935489] 259866 error while running schema change, retrying: ‹verify-idx-count›: ‹batch timestamp 1658494314.415510637,0 must be after replica GC threshold 1658494384.053000856,0›
I220722 13:44:04.081576 69188175 sql/schema_changer.go:663 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 259889 schema change on ‹"cash_transaction"› (v29) starting execution...
I220722 13:44:04.088672 69188175 sql/schema_changer.go:1085 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 259890 stepping through state machine
I220722 13:44:04.090402 69188175 sql/schema_changer.go:1173 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 259891 finished stepping through state machine
I220722 13:44:04.128474 69188175 sql/backfill.go:242 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 259893 running backfill for ‹"cash_transaction"›, v=29
I220722 13:44:04.128532 69188175 sql/backfill.go:2051 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 259894 backfilling 1 indexes: [‹/Table/115/{8-9}›] (writeAtRequestTimestamp: false)
At one point, I did see a single side of the validation complete just in time:
I220722 11:07:31.448156 788495333 sql/backfill.go:1716 ⋮ [n1,job=781044268655935489,scExec,id=115,mutation=5] 245898 validation: index ‹cash_transaction›/‹cash_transaction_ct_dts_idx› row count = 32194851254, time so far 50m55.088212272s
This revealed that we were scanning about 10M rows per second. Not terrible. I also confirmed that the query is using DistSQL.
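The rate, and the resulting cap on table size, can be checked directly from the numbers in that log line:

```go
package main

import "fmt"

func main() {
	// From the log line above: 32,194,851,254 rows scanned in 50m55.088s
	// on one side of the validation.
	const rows = 32194851254.0
	elapsed := 50*60 + 55.088 // seconds
	rate := rows / elapsed
	fmt.Printf("scan rate: %.1fM rows/s\n", rate/1e6)

	// At that rate, a 50m GC TTL bounds how many rows one side of the
	// validation can scan before its read timestamp falls behind the
	// GC threshold.
	ttl := 50.0 * 60 // seconds
	fmt.Printf("rows validatable within TTL: ~%.1fB\n", rate*ttl/1e9)
}
```

This table sits just above that bound, which is consistent with one side finishing "just in time" and the other retrying forever.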
There's a larger question of whether SQL execution should create protected timestamps whenever it sees long-running queries that might run into the GC threshold. Short of that, I think the schema changer should manually place a protected timestamp to protect these queries. Without one, we have an effective cap on the size of a table that can be given a new index.
I don't know whether this lives in Bulk or SQL Schema, so adding to both for now.
Jira issue: CRDB-17945