ttl,admission: reduce performance impact of large row-level TTL jobs #98722
Description
Is your feature request related to a problem? Please describe.
We've seen in several internal incidents that row-level TTL can be a performance footgun on large tables. We observe throughput and latency impact in two situations: when row-level TTL is first applied to an existing table using ttl_expire_after (which kicks off a column backfill; using ttl_expiration_expression does not cause a backfill), or when a TTL job is unpaused and starts aggressively issuing row deletes through SQL. The same impact can also appear during routine row deletions, without any prolonged job pause. This issue focuses on the latter case (row deletes issued by the TTL job), which can look like the following:
For the column backfill case, we expect replication admission control to help (#95563).
Internal incidents:
- https://github.com/cockroachlabs/support/issues/1961
- https://cockroachdb.zendesk.com/agent/tickets/15684
- https://github.com/cockroachlabs/support/issues/1628
- https://github.com/cockroachlabs/support/issues/2050
Jira issue: CRDB-25463
Epic CRDB-25458


