Skip to content

sql: Add exclude_data_from_backup storage option to ALTER TABLE#75295

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
adityamaru:add-ephemeral-on-table-desc
Feb 2, 2022
Merged

sql: Add exclude_data_from_backup storage option to ALTER TABLE#75295
craig[bot] merged 1 commit intocockroachdb:masterfrom
adityamaru:add-ephemeral-on-table-desc

Conversation

@adityamaru
Copy link
Copy Markdown
Contributor

@adityamaru adityamaru commented Jan 21, 2022

This change adds a storage parameter exclude_data_from_backup
to mark a table's row data as excluded from a backup.

ALTER TABLE foo SET (exclude_data_from_backup = true);

The above ALTER TABLE stmt sets the ExcludeDataFromBackup field
on a table descriptor

ExcludeDataFromBackup specifies if the table's row data can be excluded
from a backup targeting this table. This in turn means that the protected
timestamp record written during the backup will exclude this table as a
target, thereby not holding up GC on this table for the runtime of the
backup. This is particularly useful for table's with high-churn that can
then be configured with a low GC-TTL.

It is important to note that we will still backup the table descriptor, and
all other objects that reference this table egs: jobs, and so restoring
such a table will result in an empty table on the restoring cluster.

This change does not teach any part of the system to use this information
from the table descriptor, that will come in follow up PRs.

Informs: #73536

Release note (sql change): Adds new ALTER TABLE <table> SET (exclude_data_from_backup = true);
syntax. This allows a user to exclude row data for a particular table during a backup. On restoring this backup, the table will be created in the restoring cluster but will be empty. This flag also prevents a cluster or database backup from holding up the garbage collection time to live (GC TTL) on the key span representing this table. This is particularly useful for high churn tables that wish to keep a much shorter GC TTL to prevent the buildup of garbage.

@adityamaru adityamaru requested review from a team, ajwerner, dt and stevendanna January 21, 2022 21:26
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@adityamaru adityamaru force-pushed the add-ephemeral-on-table-desc branch from 30cbe8c to b02120e Compare January 21, 2022 21:27
@adityamaru adityamaru force-pushed the add-ephemeral-on-table-desc branch from b02120e to 547e02c Compare January 28, 2022 01:51
@adityamaru adityamaru changed the title sql: Add ALTER TABLE ... SET EPHEMERAL DATA sql: Add exclude_data_from_backup storage option to ALTER TABLE Jan 28, 2022
@adityamaru
Copy link
Copy Markdown
Contributor Author

@otan I'm going to hold off on merging this until your commits have landed so that I don't step on your toes. I just wanted some syntax in place so that I can build #75451 on top of this.

@adityamaru adityamaru requested review from otan and removed request for a team January 28, 2022 01:54
@otan
Copy link
Copy Markdown
Contributor

otan commented Jan 28, 2022

@otan I'm going to hold off on merging this until your commits have landed so that I don't step on your toes

SGTM!

Copy link
Copy Markdown
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor suggestions with easy fixes; see #75779 if you need a ref to pull from.

return errors.New("cannot set data in a table with inbound foreign key constraints to be excluded from backup")
}

s, err := GetSingleBool(key, datum)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to allow 'on' as well, as we do with outer variables?

if so, you may want to pull in 3bc72cf and use boolFromDatum

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm not sure exclude_data_from_backup = on makes sense so leaving this as is for now

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it also allows 'true' as a string, and is standard for session vars (e.g. set experimental_enable_temp_tables = 'on')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(although at the end of the day up to you)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah makes sense, adding it in. Thanks for the pointers.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done! sprinkled 'on' and 'true' in the logictest too.

@adityamaru adityamaru force-pushed the add-ephemeral-on-table-desc branch 2 times, most recently from 933dd52 to 738b23d Compare February 2, 2022 00:07
@adityamaru adityamaru requested review from a team as code owners February 2, 2022 00:07
@adityamaru adityamaru removed request for ajwerner and dt February 2, 2022 00:09
This change adds a storage parameter `exclude_data_from_backup`
to mark a table's row data as excluded from a backup.

`ALTER TABLE foo SET (exclude_data_from_backup = true);`

The above ALTER TABLE stmt sets the `ExcludeDataFromBackup` field
on a table descriptor

ExcludeDataFromBackup specifies if the table's row data can be excluded
from a backup targeting this table. This in turn means that the protected
timestamp record written during the backup will exclude this table as a
target, thereby not holding up GC on this table for the runtime of the
backup. This is particularly useful for table's with high-churn that can
then be configured with a low GC-TTL.

It is important to note that we will still backup the table descriptor, and
all other objects that reference this table egs: jobs, and so restoring
such a table will result in an empty table on the restoring cluster.

This change does not teach any part of the system to use this information
from the table descriptor, that will come in follow up PRs.

Informs: cockroachdb#73536

Release note: None
@adityamaru adityamaru force-pushed the add-ephemeral-on-table-desc branch from 738b23d to b557f36 Compare February 2, 2022 14:37
@adityamaru
Copy link
Copy Markdown
Contributor Author

TFTR!

bors r=otan

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 2, 2022

Build succeeded:

@craig craig bot merged commit 577e9b0 into cockroachdb:master Feb 2, 2022
@adityamaru adityamaru deleted the add-ephemeral-on-table-desc branch February 2, 2022 17:49
@adityamaru
Copy link
Copy Markdown
Contributor Author

adityamaru commented Apr 12, 2022

cc: @kathancox I just updated this Release Note now that the feature is complete and being shipped in 22.1. Happy to file a docs issue with details if you prefer that?

@katmayb
Copy link
Copy Markdown

katmayb commented Apr 14, 2022

Hey @adityamaru, if you could open a docs issue that would be great. It would be good to start this in the next week!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants