Skip to content

Add a vtbackup batch command to take a backup and exit#4858

Merged
dkhenry merged 26 commits intovitessio:masterfrom
planetscale:dk-backup-only
Jun 29, 2019
Merged

Add a vtbackup batch command to take a backup and exit#4858
dkhenry merged 26 commits intovitessio:masterfrom
planetscale:dk-backup-only

Conversation

@dkhenry
Copy link
Copy Markdown
Contributor

@dkhenry dkhenry commented May 7, 2019

This adds a new vtbackup binary, which is a batch command that allows taking a backup without shutting down a running tablet. The vtbackup command will restore from the latest backup, silently replicate the latest changes without joining the Vitess cluster as a tablet, then take a new backup and exit.

Signed-off-by: Dan Kozlowski koz@planetscale.com

Signed-off-by: Dan Kozlowski <koz@planetscale.com>
@dkhenry dkhenry requested a review from sougou as a code owner May 7, 2019 20:19
@dkhenry dkhenry requested review from deepthi and enisoc May 7, 2019 20:19
Removing whitespace and renaming flag

Signed-off-by: Dan Kozlowski <koz@planetscale.com>
@deepthi
Copy link
Copy Markdown
Collaborator

deepthi commented May 8, 2019

Q: what is the motivation for this feature?

@dkhenry
Copy link
Copy Markdown
Contributor Author

dkhenry commented May 9, 2019

The motivation for this feature is to have the backup workflow of

  1. Provision a new tablet
  2. Restore from the Previous Backup, which will validate that the previous backup is functional
  3. Take a new backup from the restored tablet
  4. Cleanly exit

@deepthi
Copy link
Copy Markdown
Collaborator

deepthi commented May 9, 2019

The motivation for this feature is to have the backup workflow of

  1. Provision a new tablet
  2. Restore from the Previous Backup, which will validate that the previous backup is functional
  3. Take a new backup from the restored tablet
  4. Cleanly exit

This seems like a nice way to update your backups to be more current.
Is there any expectation around replica lag before taking a new backup?

@dkhenry
Copy link
Copy Markdown
Contributor Author

dkhenry commented May 9, 2019

There is. The busy loop that is there is to wait until the tablet catches up. It won't set SERVING until it is caught up, however on @enisoc 's suggestion I am swithing this to work with explicit BACKUP types so I am going to change the test slightly to be when SERVING would be set

Dan Kozlowski added 2 commits May 9, 2019 14:44
… of tablet

Signed-off-by: Dan Kozlowski <koz@planetscale.com>
Signed-off-by: Dan Kozlowski <koz@planetscale.com>
Copy link
Copy Markdown
Contributor

@sougou sougou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how you did this. It's elegant and clean.

However, I have to agree with @enisoc. This brings in a fundamental change to what vttablet is: instead of being a server, it becomes a job. I think this will cause problems in the future.

This job has to be done from outside, ideally by an automation layer like a script or an operator.

This removes the backup_only from vttablet and creates a new binary
only for backups. This binary does not register itsself with the topo
server, but it does still require a valid tablet alias to facilitate
replication.

This can also be though of as Just Enough Vitess for Backup

Signed-off-by: Dan Kozlowski <koz@planetscale.com>
@dkhenry
Copy link
Copy Markdown
Contributor Author

dkhenry commented May 11, 2019

@enisoc @deepthi - I totally redid this. There is now a vtbackup job that only takes backups. This will require some more cleanup and tests before it is safe to merge, but this should generally be an acceptable solution

@enisoc
Copy link
Copy Markdown
Member

enisoc commented May 11, 2019

I haven't reviewed it in detail yet, but the new approach looks awesome! This seems like it'll have all the nice properties we wanted in the long-term solution. Let me know when you've got it ready for final review.

@enisoc enisoc changed the title Adding a new option to vttablet to enable it to take a backup and exit Add a vtbackup batch command to take a backup and exit May 29, 2019
enisoc added 6 commits June 4, 2019 16:18
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
The lag value can lie. Also we're guaranteed to hit the snapshot
position eventually as long as replication is making progress. If our
goal is a lag value, we may never catch up if replication is slower than
the rate of new transactions on the master.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
enisoc and others added 3 commits June 7, 2019 12:41
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
vtbackup: Clean up and add policy enforcement
@dkhenry dkhenry requested a review from derekperkins as a code owner June 10, 2019 21:44
Copy link
Copy Markdown
Collaborator

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enisoc @dkhenry can one of you update the PR description right at the top? It is no longer accurate.

enisoc and others added 4 commits June 14, 2019 15:32
Requiring that no tablets are running at all was too strict. It
prevented using the new -wait_for_backup_interval flag to start up
tablets concurrently while the initial backup is going.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
This makes it better match the state that you'd have if you used
InitShardMaster instead.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Also adding in support for non initial_backup backup being taken with vtbackup

Signed-off-by: Dan Kozlowski <koz@planetscale.com>
…k-backup-only

Signed-off-by: Dan Kozlowski <koz@planetscale.com>
@dkhenry
Copy link
Copy Markdown
Contributor Author

dkhenry commented Jun 19, 2019

@deepthi The title and description look accurate.
@enisoc the tests have been updated. I had to made a few more changes to vtbackup and the tests. I added a new command line flag to allow this binary to take the first backup of an existing shard.

Signed-off-by: Dan Kozlowski <koz@planetscale.com>
Teardown option is no longer used

Signed-off-by: Dan Kozlowski <koz@planetscale.com>
@dkhenry dkhenry requested review from deepthi and enisoc June 28, 2019 02:20
Dan Kozlowski and others added 3 commits June 27, 2019 19:22
…-only

Signed-off-by: Dan Kozlowski <koz@planetscale.com>
This block of code doesn't make any sense the way
its written and I feel something has been lost in translation.
I have modified it so at least I can explain it.

If you have no backups you will return ErrNoBackup.
If you fail to reset replication you will log that message
If you fail to Populate the Metadata tables we will return that error

Signed-off-by: Dan Kozlowski <koz@planetscale.com>
We need to create a backup with the new timestamp so we know that our
backups are fresh.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
@enisoc
Copy link
Copy Markdown
Member

enisoc commented Jun 28, 2019

@dkhenry FYI I added another commit to remove a check I had added earlier, which turned out to be problematic: 33b3035

By the way, is this ready for re-review?

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
@dkhenry
Copy link
Copy Markdown
Contributor Author

dkhenry commented Jun 28, 2019

@enisoc yes this is ready for re-review

Signed-off-by: Dan Kozlowski <koz@planetscale.com>
Copy link
Copy Markdown
Collaborator

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Signed-off-by: Dan Kozlowski <koz@planetscale.com>
@dkhenry dkhenry merged commit 22c64df into vitessio:master Jun 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants