ttl,admission: reduce performance impact of large row-level TTL jobs #98722

@irfansharif

Is your feature request related to a problem? Please describe.

We've seen in several internal incidents that row-level TTL can be a performance footgun on large tables. We observe throughput/latency impact in two situations: when row-level TTL is first applied to an existing table using ttl_expire_after (which kicks off a column backfill; using ttl_expiration_expression does not cause a backfill), and when a TTL job is unpaused and starts aggressively issuing row deletes through SQL. The row-delete impact can also appear during routine row deletions, without any prolonged job pause. This issue focuses on the row-deletion case, which can look like the following:

[3 images]

For the column backfill case, we need replication admission control to help (#95563). Internal incidents:

Jira issue: CRDB-25463

Epic CRDB-25458

Metadata

Labels

A-admission-control
A-row-level-ttl
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv: KV Team
