feat: add retention period to catalog#26479

Merged
waynr merged 3 commits into main from feat/add-retention-period-to-catalog_core
Jun 3, 2025
Conversation

Contributor

@waynr waynr commented May 31, 2025

This is the first step in implementing #26169

This adds catalog-level and http-level support for setting retention periods on databases. It does not implement retention period handling in management of query buffers or object store contents. The intent here is just to support the catalog types as a base for subsequent work.

The major change to note here is a new retention_period field on databases, represented by a RetentionPeriod enum with two variants:

  • RetentionPeriod::Indefinite
    • the default
    • effectively no retention period
  • RetentionPeriod::Duration(t)
    • must be manually set by a user via the HTTP API (in a later PR, via CLI)
    • data is kept for no longer than the specified duration
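The shape of the type described above can be sketched roughly as follows. The variant names come from the PR description; the derive set, the cutoff helper, and its signature are assumptions for illustration, not the PR's actual code.

```rust
use std::time::Duration;

/// Sketch of the catalog-level retention type described in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum RetentionPeriod {
    /// The default: effectively no retention period.
    #[default]
    Indefinite,
    /// Data is kept for no longer than the specified duration.
    Duration(Duration),
}

impl RetentionPeriod {
    /// Oldest allowable timestamp (nanoseconds since the Unix epoch) given
    /// the current time in nanoseconds, or None when retention is indefinite.
    /// Hypothetical helper, not part of the PR.
    pub fn cutoff_ts_nanos(&self, now_nanos: i64) -> Option<i64> {
        match self {
            Self::Indefinite => None,
            Self::Duration(d) => i64::try_from(d.as_nanos())
                .ok()
                .and_then(|n| now_nanos.checked_sub(n)),
        }
    }
}
```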

@waynr waynr force-pushed the feat/add-retention-period-to-catalog_core branch from 5ee8193 to b0d33f9 Compare May 31, 2025 00:53
@waynr waynr requested a review from a team May 31, 2025 00:56
@waynr waynr force-pushed the feat/add-retention-period-to-catalog_core branch from b0d33f9 to 7d9cf2a Compare June 2, 2025 21:53
Contributor

@stuartcarnie stuartcarnie left a comment

Does what it says on the box – nice work!

Was a good study for me too, as I'll eventually have to add something similar when adding an API to set / update the hard delete date.

// Return the oldest allowable timestamp for the given table according to the
// currently-available set of retention policies. This is returned as a number of nanoseconds
// since the Unix Epoch.
pub fn get_retention_period_cutoff_ts_nanos(&self, db_id: &DbId, _: &TableId) -> Option<i64> {
Contributor

Note

Just my musing, not something that needs to be changed for this PR

I have been thinking that adding a specific UnixTimestamp would be beneficial. We handle nanosecond UNIX epoch timestamps regularly, but an i64 doesn't carry any semantic meaning. Passing around UnixTimestamp would be clearer. I found a crate, unix-ts, but that stores the value as an i128. We could consider adding our own for i64.
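A minimal sketch of the newtype floated here, assuming an i64 nanosecond representation; the name UnixTimestamp matches the comment, but everything else (derives, method names) is hypothetical and not part of this PR.

```rust
/// Nanoseconds since the Unix epoch, as an i64 (unlike the unix-ts crate's
/// i128). A newtype like this carries the semantic meaning a bare i64 lacks.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct UnixTimestamp(i64);

impl UnixTimestamp {
    /// Construct from a raw nanosecond count.
    pub fn from_nanos(nanos: i64) -> Self {
        Self(nanos)
    }

    /// Recover the raw nanosecond count.
    pub fn as_nanos(&self) -> i64 {
        self.0
    }
}
```

The derived Ord means cutoff comparisons read naturally (`ts < cutoff`) instead of comparing anonymous integers.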

Contributor Author

I like the sound of that as a general repo-wide refactor, but for this purpose I just stuck with the type used to represent timestamps wherever these values will be used.

Contributor

Same – definitely should be a deliberate PR. Start with developing the type with a good API, and then a separate PR to do the repo-wide refactor.

Comment on lines +338 to +341
retention_period: snap
.retention_period
.map(Snapshot::from_snapshot)
.unwrap_or(RetentionPeriod::Indefinite),
Contributor

✅ defaults to Indefinite if snap.retention_period is None

@waynr waynr merged commit acdb8f6 into main Jun 3, 2025
12 checks passed
Contributor

@hiltontj hiltontj left a comment

I had a couple of comments (none blocking).

Comment on lines +609 to +610
let now = self.time_provider.now().timestamp_nanos();
Some(now - retention_period as i64)
Contributor

FWIW there is a Time::checked_sub API that you could use here to directly subtract the retention period Duration from self.time_provider.now().
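A std-only illustration of the suggestion: subtract the retention Duration from "now" via checked operations rather than a raw `as i64` cast and subtraction. The real code would use iox_time's Time::checked_sub; this standalone analog over plain nanosecond counts is an assumption for illustration.

```rust
use std::time::Duration;

/// Cutoff = now - retention, with overflow handled by returning None
/// instead of wrapping. Hypothetical helper mirroring the suggestion above.
fn cutoff_nanos(now_nanos: i64, retention: Duration) -> Option<i64> {
    i64::try_from(retention.as_nanos())
        .ok()
        .and_then(|r| now_nanos.checked_sub(r))
}
```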

db_name: &str,
duration: Duration,
) -> Result<OrderedCatalogBatch> {
info!("create new retention policy");
Contributor

It would be useful to log the db_name and duration provided.
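In the real code this would likely be a structured-field log call such as `info!(db_name, ?duration, "create new retention policy")` with the tracing macros. As a dependency-free illustration of the same idea, a hypothetical helper building the equivalent message text:

```rust
use std::time::Duration;

/// Builds the log line with the context the reviewer asks for.
/// Illustrative only; the PR's code would use the info! macro directly.
fn retention_log_line(db_name: &str, duration: Duration) -> String {
    format!("create new retention policy db_name={db_name} duration={duration:?}")
}
```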

&self,
db_name: &str,
) -> Result<OrderedCatalogBatch> {
info!("delete retention policy");
Contributor

It would be useful to log the db_name here as well.

mgattozzi added a commit that referenced this pull request Sep 8, 2025
* fix: Automatic intermediate directory cleanup for file object store (#26480)

Removes empty intermediate directories when a key is removed from
local file system object storage, which matches cloud-based providers.

* feat: add retention period to catalog (#26479)

* feat: add retention period to catalog

* fix: handle humantime parsing error properly

* refactor: use new iox_http_util types

---------

Co-authored-by: Michael Gattozzi <mgattozzi@influxdata.com>

* chore: address a couple post-merge PR comments (#26489)

* feat: add concurrency limit for WAL replay (#26483)

WAL replay currently loads _all_ WAL files concurrently running into
OOM. This commit adds a CLI parameter `--wal-replay-concurrency-limit`
that would allow the user to set a lower limit and run WAL replay again.

closes: #26481

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: Michael Gattozzi <mgattozzi@influxdata.com>
Co-authored-by: praveen-influx <pkumar@influxdata.com>