backup: use separate incremental directory by default#75970
backup: use separate incremental directory by default#75970craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
a844458 to
e4b346b
Compare
e4b346b to
286dc4f
Compare
There was a problem hiding this comment.
Left a few comments on the backup and restore code. I still need to review the tests and the show backup code. Let's discuss if you have questions!
I think it's also worth writing out the whole policy of destination resolution in the commit message, or maybe by revising the backup destination resolution doc I sent you that Liv originally wrote. I'm still not totally clear what the intended policy is, i.e.: given the user passes these params for Backup or Restore, this is what happens.
|
A few nits on commit message style (I promise these become second nature eventually):
Putting these nits together, I suggest something like:
|
c1ab9d9 to
ed89ea8
Compare
|
Thanks David! I updated the message. |
ed89ea8 to
56c4a96
Compare
|
One small nit on the commit message: this feature only applies to backups with the |
msbutler
left a comment
There was a problem hiding this comment.
left another round of comments in Backup/Restore code. Again, haven't looked at tests and show backup yet.
Waiting to refactor internal variables until cockroachdb#75970 merges. Release note (sql change): refactored BACKUP, SHOW BACKUP, and RESTORE incremental_storage option to incremental_location.
76345: import: default to writing at ingest time r=dt a=dt Release note: none. 76368: build: turn off metamorphic constants for more gen rules r=ajwerner a=stevendanna This add `COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING=true` to more tool invocations to cut down on log spam. Informs #76363 Release note: None 76416: backupccl: rename incremental_storage option to incremental_location r=dt a=msbutler Waiting to refactor internal variables until #75970 merges. Release note (sql change): refactored BACKUP, SHOW BACKUP, and RESTORE incremental_storage option to incremental_location. Co-authored-by: David Taylor <tinystatemachine@gmail.com> Co-authored-by: Steven Danna <danna@cockroachlabs.com> Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Waiting to refactor internal variables until cockroachdb#75970 merges. Release note (sql change): refactored BACKUP, SHOW BACKUP, and RESTORE incremental_storage option to incremental_location.
052cf5b to
5f307e9
Compare
| // make sure to resolve the paths correctly here. We don't want to accidentally | ||
| // correct an unauthorized file path to an authorized one, then write a backup | ||
| // to an unexpected place or print the wrong error message on a restore. | ||
| func JoinURLPath(args ...string) string { |
There was a problem hiding this comment.
i'm not totally clear on the subtle differences between filepath.Join and path.join and this helper function. Perhaps a few unit tests for this func would help illustrate? Maybe this function belongs in pkg/cloud/uris.go ? Perhaps @dt has more to say on this.
There was a problem hiding this comment.
We don't want to be using filepath for anything since it is client dependent -- e.g. on a windows node it'd use \ as its separator when joining -- and we're working with network paths. I think we just want to use path.Join?
There was a problem hiding this comment.
Thanks david - good catch with the separator, I didn't think that through.
path.Join doesn't quite work - I tweaked the comments and added some unit tests. This any clearer?
There was a problem hiding this comment.
Ah, I see, yeah, I remember this somewhat unexpected "path.Join calls path.Clean" behavior from working with nodelocal but it internally avoids that.
We already use path.Join throughout backup resolution though, and changing it will be a user-facing change to how ../ behaves, so my vote would be we don't do that in this change but rather address it, if we decide we should, separately?
There was a problem hiding this comment.
I think this actually maintains the expected user behavior - https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/backupccl/backup_test.go#L5430 fails without it. (I think it passed before because path.Join wasn't actually called on that particular path, but with the refactoring path.Join is now called.)
| // make sure to resolve the paths correctly here. We don't want to accidentally | ||
| // correct an unauthorized file path to an authorized one, then write a backup | ||
| // to an unexpected place or print the wrong error message on a restore. | ||
| func JoinURLPath(args ...string) string { |
There was a problem hiding this comment.
We don't want to be using filepath for anything since it is client dependent -- e.g. on a windows node it'd use \ as its separator when joining -- and we're working with network paths. I think we just want to use path.Join?
2527fdf to
70f8889
Compare
msbutler
left a comment
There was a problem hiding this comment.
nearly there! I think it's also worth adding a test to show_test to capture the regression Aditya mentioned in slack.
| @@ -238,10 +225,10 @@ func TestBackupRestoreResolveDestination(t *testing.T) { | |||
| { | |||
| to := localizeURI(t, baseDir, localities) | |||
| // The full backup should go into the baseDir. | |||
There was a problem hiding this comment.
The comment indicates we're using the old BACKUP TO syntax. Why would this subtest get changed by this PR, which I thought only affects BACKUP INTO?
There was a problem hiding this comment.
Discussed in slack - this doesn't get us much from the UX perspective, but the code is probably cleaner since we don't have to make copies and fork more on syntax type. Your thoughts?
There was a problem hiding this comment.
just read through the slack thread. I'm ok with this change! To clarify for Kathryn and future users, I'd amend the commit message so it's clear that this change modifies both BACKUP TO and BACKUP INTO.
|
|
||
| sib := fmt.Sprintf("BACKUP DATABASE fkdb INTO LATEST IN %s WITH incremental_location = %s", dest, inc) | ||
| sqlDB.Exec(t, sib) | ||
| sir := fmt.Sprintf("RESTORE DATABASE fkdb FROM LATEST IN %s WITH new_db_name = 'inc_fkdb'", dest) |
There was a problem hiding this comment.
to make this test even stronger, what if after you check the query results below, you did the following: ensure the same restore command will behave the same, even in the presence of an inc backup in the default location. I.e. after checkQueryResults:
sqlDB.Exec(t, `INSERT INTO fkdb.fk (ind) VALUES ($1)`, 200)
dib := fmt.Sprintf("BACKUP DATABASE fkdb INTO LATEST IN %s WITH incremental_location = %s", dest, defaultinc)
sqlDB.Exec(t, sib)
sir2 := fmt.Sprintf("RESTORE DATABASE fkdb FROM LATEST IN %s WITH new_db_name = 'inc_fkdb2'", dest)
sqlDB.Exec(t, sir2)
sqlDB.CheckQueryResults(t, `SELECT * FROM inc_fkdb.fk`, sqlDB.QueryStr(t, `SELECT * FROM inc_fkdb2.fk`))
There was a problem hiding this comment.
I don't think that I like that so much - while I agree the behavior would right now be as you describe, it's not part of the API contract. RESTORE makes no promises about which incrementals dir it will use if both have layers; in fact, BACKUP promises never to do that by default. I'd rather consider this behavior as "undefined" - not because I think it's likely we'll want to change it, but because I think it makes the functionality simpler and clearer for callers.
Have I convinced?
There was a problem hiding this comment.
One worry I have is that it's perfectly legal for the user to use the incremental_location parameter to get into this undefined state. Given that we can detect this undefined state, what if we throw an error during RESTORE and SHOW BACKUPS if we detect this, and ask the user to pass the incremental_location parameter?
There was a problem hiding this comment.
OK, this I like! Seems like an easy win.
e6a6b23 to
258abc5
Compare
Waiting to refactor internal variables until cockroachdb#75970 merges. Release note (sql change): refactored BACKUP, SHOW BACKUP, and RESTORE incremental_storage option to incremental_location.
|
@adityamaru @msbutler I think this rounds out the revisions! I fixed the regression Aditya caught (thanks again!), added a unit test, and tweaked a few other things. I left the commits separate so it's easier to see the differences - if you think we're good here, I'll accept your ✅ , squash and send to David for stability release approval! Thanks! |
5db157b to
4adf8af
Compare
| } | ||
| incPaths, err = FindPriorBackups(ctx, incStore, IncludeManifest) | ||
|
|
||
| collection, computedSubdir := CollectionAndSubdir(dest, subdir) |
There was a problem hiding this comment.
When do we need this? If someone does SHOW BACKUP <collection>/<backup-date> instead of SHOW BACKUP <backup-date> IN <collection> ?
I'm temped to say that with the move to separated per-collection incremental storage, pointing directly to a raw full backup path like the former should be discouraged, or at least, shouldn't try to be magically collection aware, and maybe just shows the full backup at that dest. That implies implying SHOW BACKUP abc IN col would show something different (all the incs) than SHOW BACKUP col/abc but that seems... maybe okay? i.e. we moved incremtnals to be collection level, and so use the collection syntax to interact with them?
I dunno. Not blocking, just something to chew on / maybe room to remove some code/complexity of supporting both here.
There was a problem hiding this comment.
Yes, exactly. Aditya caught the regression, discussed in https://cockroachlabs.slack.com/archives/CJXBR3693/p1646276263127099?thread_ts=1646246621.037659&cid=CJXBR3693.
Naively, I don't think I'd expect those two SHOW commands to have different outputs. It seems like a lot of work for a simple IN to convey "this is the modern collection syntax," in ways I'd expect to surprise users. But I do agree this code introduces a lot of complexity to get the "it just works" magic, and I would love to think of ways to simplify by reducing functionality.
Previously support was added to optionally cause interactions with a backup to search for its incremental layers in an alternative path. This change expands that behavior, adding a default path for incremental layers to interactions with backups in a collection. This new default is the '/incrementals' subdirectory of the collection root. This default may be overridden by the prevoiusly-added incremental_storage parameter. This default means that an operator can easily apply different policies to incremental backups retroactively. This behavior also provides an alternative to passing matching incremental_storage parameters to every interaction with those backups, which improves usability. During the upgrade, when interacting with existing backups, this behavior remains disabled. After the upgrade, if an existing chain of incremental backups exists in the old default collection, this behavior also remains disabled. Also after the upgrade, if no such chain exists, or the chain exists in the new default location, the new behavior will be enabled. Release note (enterprise change): Incremental backups created by BACKUP TO LATEST IN or BACKUP TO are now stored by default under the path 'incrementals' within the backup destination, rather than under each backup's path. This enables easier management of cloud-storage provider policies specifically applied to incremental backups. Release justification: This feature is very well-tested, and the code refactoring makes it more verifiably correct.
4adf8af to
f36dd64
Compare
|
bors r+ |
|
Build failed (retrying...): |
|
Build failed (retrying...): |
|
Build failed (retrying...): |
|
Build succeeded: |
|
Hey @benbardin, would you mind updating the release note here, to clarify that this is an update to the |
It's actually both; I updated the release note to reflect. (Updating the old syntax didn't get us any meaningful functionality, I think, but the code was easier to organize and understand.) Thanks for catching! |
backup: use separate incremental directory by default
Previously support was added to optionally cause interactions with a backup to search for its incremental layers in an alternative path.
This change expands that behavior, adding a default path for incremental layers to interactions with backups in a collection. This new default is the '/incrementals' subdirectory of the collection root. This default may be overridden by the prevoiusly-added incremental_storage parameter. This default means that an operator can easily apply different policies to incremental backups retroactively. This behavior also provides an alternative to passing matching incremental_storage parameters to every interaction with those backups, which improves usability.
During the upgrade, when interacting with existing backups, this behavior remains disabled.
After the upgrade, if an existing chain of incremental backups exists in the old default collection, this behavior also remains disabled.
Also after the upgrade, if no such chain exists, or the chain exists in the new default location, the new behavior will be enabled.
Release note (enterprise change): Incremental backups created by
BACKUP INTOandBACKUP TOare now stored by default under the path 'incrementals' within the backup destination, rather than under each backup's path. This enables easier application of cloud-storage provider policies specifically to incremental backups.Release justification: This feature is very well-tested, and the code refactoring makes it more verifiably correct.