roach{prod,test}: add first-class support for disk snapshots #103757
craig[bot] merged 3 commits into cockroachdb:master
Conversation
Pure code movement. We'll make use of it outside this file in subsequent commits. Release note: None
These tests have been stable for a few months now. Reduce to a weekly cadence. Release note: None
0a8ba7a to f13b01d
smg260
left a comment
Nice! Doubtlessly will result in significant time (and cost) savings. Just minor observations/questions from me.
Reviewed 2 of 21 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @srosenberg, and @sumeerbhola)
pkg/cmd/roachtest/cluster/cluster_interface.go line 144 at r1 (raw file):
// prefix. These snapshots can later be retrieved, deleted or applied to
// already instantiated clusters.
CreateSnapshot(ctx context.Context, snapshotPrefix string) error
Seems like also returning a vm.VolumeSnapshot would make sense here.
pkg/cmd/roachtest/tests/admission_control_index_backfill.go line 51 at r1 (raw file):
// TODO(irfansharif): Search by taking in the other parts of the
// snapshot fingerprint, i.e. the node count, the version, etc.
Name: snapshotPrefix,
Nit: Name would be clearer as NamePrefix
pkg/cmd/roachtest/tests/admission_control_index_backfill.go line 59 at r1 (raw file):
t.L().Printf("no existing snapshots found for %s (%s), doing pre-work", t.Name(), snapshotPrefix)
// TODO(irfansharif): Add validation that we're some released
// version, probably the predecessor one. Also ensure that any
This is where we'd do the "fixture" work to get the disk in a state to snapshot, right?
Doesn't it make sense to only c.ApplySnapshots(..) when not in this branch (as opposed to always, below)?
pkg/cmd/roachtest/tests/prune_dangling_snapshots_and_disks.go line 28 at r1 (raw file):
// (GCE) let you store volume snapshots in buckets with a pre-configured TTL. So
// we use this nightly roachtest as a poor man's cron job.
func registerPruneDanglingSnapshotsAndDisks(r registry.Registry) {
A nice idea that ultimately should belong elsewhere as part of the framework - perhaps in the test_runner, during clean up?
pkg/cmd/roachtest/tests/prune_dangling_snapshots_and_disks.go line 51 at r1 (raw file):
for _, snapshot := range snapshots {
	if err := c.DeleteSnapshots(ctx, snapshot); err != nil {
Why loop if the 2nd arg is variadic? A slice of err could be returned.
sumeerbhola
left a comment
drive-by review -- admission_control*.go files look good.
Reviewed 21 of 21 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @srosenberg)
pkg/cmd/roachtest/cluster/cluster_interface.go line 143 at r1 (raw file):
// CreateSnapshot creates volume snapshots of the cluster using the given
// prefix. These snapshots can later be retrieved, deleted or applied to
// already instantiated clusters.
Maybe worth documenting the noop case for the local setup. The assumption this interface makes is that the roachtest that needed to do CreateSnapshot will then proceed to use the already populated disk and not rely on the rest of these snapshot methods actually doing something.
pkg/cmd/roachtest/cluster/cluster_interface.go line 154 at r1 (raw file):
// concerned - all already-attached volumes are detached and deleted to make
// room for new snapshot-derived volumes. The new volumes are created using
// the same specs (size, disk type, etc.) as the original cluster.
is this assumed to be one volume per node? Some of the nodes may not be running CockroachDB -- does that matter?
I see the implementation assumes equal nodes in the cluster. Worth adding to the code comment.
b1c58e7 to 5dbd1bf
irfansharif
left a comment
TFTR!
@smg260, PTAL -- there are a few more non admission_control*.go changes (around the tpce.go harness, and exposing top-level subcommands in roachprod) in addition to addressing your comments below.
@sumeerbhola, @bananabrick I fleshed out admission_control_index_backfill.go to be fully fledged. You could take another look. It's entirely optional for both y'all.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @smg260, @srosenberg, and @sumeerbhola)
pkg/cmd/roachtest/cluster/cluster_interface.go line 143 at r1 (raw file):
Previously, sumeerbhola wrote…
Maybe worth documenting the noop case for the local setup. The assumption this interface makes is that the roachtest that needed to do CreateSnapshot will then proceed to use the already populated disk and not rely on the rest of these snapshot methods actually doing something.
Done.
pkg/cmd/roachtest/cluster/cluster_interface.go line 144 at r1 (raw file):
Previously, smg260 (Miral Gadani) wrote…
Seems like also returning a vm.VolumeSnapshot would make sense here.
Done, great suggestion!
pkg/cmd/roachtest/cluster/cluster_interface.go line 154 at r1 (raw file):
Previously, sumeerbhola wrote…
is this assumed to be one volume per node? Some of the nodes may not be running CockroachDB -- does that matter?
I see the implementation assumes equal nodes in the cluster. Worth adding to the code comment.
It's assuming one volume per node but can be changed, so added a TODO. Ditto for different kinds of volumes. It doesn't matter whether CRDB is running or not - it falls back to using "unknown" as the fingerprint version. But that's not desirable -- and I'll leave it to test authors to fiddle with it and improve safeguards. Also, CRDB can't be running, else we'd risk taking inconsistent disk snapshots cluster wide and cause badness. So the CRDB version is retrieved by looking at binary directly.
pkg/cmd/roachtest/tests/admission_control_index_backfill.go line 51 at r1 (raw file):
Previously, smg260 (Miral Gadani) wrote…
Nit: Name would be clearer as NamePrefix
Done.
pkg/cmd/roachtest/tests/admission_control_index_backfill.go line 59 at r1 (raw file):
Previously, smg260 (Miral Gadani) wrote…
This is where we'd do the "fixture" work to get the disk in a state to snapshot, right?
Doesn't it make sense to only c.ApplySnapshots(..) when not in this branch (as opposed to always, below)?
Done. I'd written it that way originally to always exercise the apply-snapshot slow path, to increase robustness, but it just looks awkward.
pkg/cmd/roachtest/tests/prune_dangling_snapshots_and_disks.go line 28 at r1 (raw file):
Previously, smg260 (Miral Gadani) wrote…
A nice idea that ultimately should belong elsewhere as part of the framework - perhaps in the test_runner, during clean up?
Unless you insist strongly otherwise, I feel like this is better. These things add up to some non-trivial latency and the test runner clean up is invoked too frequently for this checking. Also it's a bit nice to be able to run this manually through just:
roachtest run prune-dangling --cockroach ./cockroach --cluster irfansharif-prune
And get specific logging for it.
pkg/cmd/roachtest/tests/prune_dangling_snapshots_and_disks.go line 51 at r1 (raw file):
Previously, smg260 (Miral Gadani) wrote…
Why loop if the 2nd arg is variadic? A slice of
errcould be returned.
It's just to get the per-pruning logging below, which I've found useful. A slice of errors is unconventional and difficult to work with -- you've got to figure out which indexes succeeded and which didn't.
5dbd1bf to d6074df
smg260
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @srosenberg, and @sumeerbhola)
pkg/cmd/roachtest/tests/prune_dangling_snapshots_and_disks.go line 28 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Unless you insist strongly otherwise, I feel like this is better. These things add up to some non-trivial latency and the test runner clean up is invoked too frequently for this checking. Also it's a bit nice to be able to run this manually through just:
roachtest run prune-dangling --cockroach ./cockroach --cluster irfansharif-prune
And get specific logging for it.
Fair.
Not a strong opinion; I was alluding to a clean-up phase at the end of running all tests, and it was more of a general comment about how we could improve roachtest -- definitely not for this PR.
d6074df to 5ddb79b
Spoke to @smg260 offline -- running the nightly roachtest suite manually showed that our TC agents need updated GCE permissions to be able to run these gcloud snapshot/volume related operations. So checking it in as skipped for now. Internal thread: https://cockroachlabs.atlassian.net/browse/DEVINF-768.
Long-lived disk snapshots can drastically reduce testing time for scale
tests. Tests, whether run by hand or through CI, need only run the
long-running fixture-generating code (importing some dataset, generating
it organically through workload, etc.) when snapshot fingerprints change;
the fingerprints incorporate the major crdb version that generated them.
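The fingerprint scheme can be sketched as a naming function. This is a minimal illustration inferred from the snapshot names in the logs below (e.g. ac-index-backfill-0001-vunknown-1-n2-standard-8); snapshotName and its parameter order are hypothetical, not the PR's actual helper:

```go
package main

import "fmt"

// snapshotName builds a snapshot fingerprint of the shape
// <prefix>-<seq>-v<version>-<nodes>-<machineType>, mirroring names like
// "ac-index-backfill-0001-vunknown-1-n2-standard-8" seen in the logs.
// A "vunknown" version is what the fallback produces when the crdb
// binary's version can't be determined.
func snapshotName(prefix string, seq int, version string, nodes int, machineType string) string {
	return fmt.Sprintf("%s-%04d-v%s-%d-%s", prefix, seq, version, nodes, machineType)
}

func main() {
	// → ac-index-backfill-0001-vunknown-1-n2-standard-8
	fmt.Println(snapshotName("ac-index-backfill", 1, "unknown", 1, "n2-standard-8"))
}
```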
Here's an example run that freshly generates disk snapshots:
=== RUN admission-control/index-backfill
no existing snapshots found for admission-control/index-backfill (ac-index-backfill), doing pre-work
created volume snapshot ac-index-backfill-0001-vunknown-1-n2-standard-8 for volume irfansharif-snapshot-0001-1 on irfansharif-snapshot-0001-1/n1
using 1 newly created snapshot(s) with prefix "ac-index-backfill"
detached and deleted volume irfansharif-snapshot-0001-1 from irfansharif-snapshot-0001
created volume irfansharif-snapshot-0001-1
attached volume irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
mounted irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
--- PASS: admission-control/index-backfill (79.14s)
Here's a subsequent run that makes use of the aforementioned disk
snapshots:
=== RUN admission-control/index-backfill
using 1 pre-existing snapshot(s) with prefix "ac-index-backfill"
detached and deleted volume irfansharif-snapshot-0001-1 from irfansharif-snapshot-0001
created volume irfansharif-snapshot-0001-1
attached volume irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
mounted irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
--- PASS: admission-control/index-backfill (43.47s)
We add the following APIs to the roachtest.Cluster interface, for tests
to interact with disk snapshots. admission-control/index-backfill is an
example test making use of these APIs.
type Cluster interface {
// ...
// CreateSnapshot creates volume snapshots of the cluster using
// the given prefix. These snapshots can later be retrieved,
// deleted or applied to already instantiated clusters.
CreateSnapshot(ctx context.Context, snapshotPrefix string) error
// ListSnapshots lists the individual volume snapshots that
// satisfy the search criteria.
ListSnapshots(
ctx context.Context, vslo vm.VolumeSnapshotListOpts,
) ([]vm.VolumeSnapshot, error)
// DeleteSnapshots permanently deletes the given snapshots.
DeleteSnapshots(
ctx context.Context, snapshots ...vm.VolumeSnapshot,
) error
// ApplySnapshots applies the given volume snapshots to the
// underlying cluster. This is a destructive operation as far as
// existing state is concerned - all already-attached volumes are
// detached and deleted to make room for new snapshot-derived
// volumes. The new volumes are created using the same specs
// (size, disk type, etc.) as the original cluster.
ApplySnapshots(
ctx context.Context, snapshots []vm.VolumeSnapshot,
) error
}
These Cluster APIs are in turn powered by the following additions to
the vm.Provider interface, implemented by each cloud provider. GCE is
the fully spec-ed out one for now.
type Provider interface {
// ...
// CreateVolume creates a new volume using the given options.
CreateVolume(l *logger.Logger, vco VolumeCreateOpts) (Volume, error)
// ListVolumes lists all volumes already attached to the given VM.
ListVolumes(l *logger.Logger, vm *VM) ([]Volume, error)
// DeleteVolume detaches and deletes the given volume from the
// given VM.
DeleteVolume(l *logger.Logger, volume Volume, vm *VM) error
// AttachVolume attaches the given volume to the given VM.
AttachVolume(l *logger.Logger, volume Volume, vm *VM) (string, error)
// CreateVolumeSnapshot creates a snapshot of the given volume,
// using the given options.
CreateVolumeSnapshot(
l *logger.Logger, volume Volume, vsco VolumeSnapshotCreateOpts,
) (VolumeSnapshot, error)
// ListVolumeSnapshots lists the individual volume snapshots that
// satisfy the search criteria.
ListVolumeSnapshots(
l *logger.Logger, vslo VolumeSnapshotListOpts,
) ([]VolumeSnapshot, error)
// DeleteVolumeSnapshot permanently deletes the given snapshot.
DeleteVolumeSnapshot(l *logger.Logger, snapshot VolumeSnapshot) error
}
Since these snapshots necessarily outlive the tests, and we don't want
them dangling perpetually, we introduce a prune-dangling roachtest that
acts as a poor man's cron job, sifting through expired snapshots
(>30 days old) and deleting them. For GCE at least it's not obvious to me how
to create these snapshots in cloud buckets with a TTL built in, hence
this hack. It looks like this (with a change to the TTL):
=== RUN prune-dangling
pruned old snapshot ac-index-backfill-0001-vunknown-1-n2-standard-8
--- PASS: prune-dangling (8.59s)
---
We also expose some of these APIs through the roachprod binary directly.
$ roachprod snapshot --help
snapshot enables creating/listing/deleting/applying cluster snapshots
Usage:
roachprod snapshot [command]
Available Commands:
create snapshot a named cluster, using the given snapshot name and description
list list all snapshots for the given cloud provider, optionally filtering by the given name
delete delete all snapshots for the given cloud provider optionally filtering by the given name
apply apply the named snapshots from the given cloud provider to the named cluster
---
About admission-control/index-backfill. It's a fully featured test that
uses the TPC-C 100k dataset and runs a foreground load for 20k
customers. It takes >4hrs to import this data set; with disk snapshots
this step is skipped entirely and takes a few minutes. The actual test
is trivial, we run the foreground load for 1hr and run a large index
backfill concurrently. Before cockroachdb#98308, this results in wild performance
oscillations. It's still a bit wild after flow control, but less so.
We slightly extend the tpc-e harness to make this happen, adding a few
smarts: exposing a 'during' helper to run backfills concurrently with
foreground load, integrating with --skip-init, estimated setup times,
prometheus, and of course disk snapshots.
Release note: None
5ddb79b to 23110d7
smg260
left a comment
For posterity: OK to merge as is. @irfansharif to revisit once required permissions for TC agents are granted.
Reviewed 28 of 28 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @srosenberg, and @sumeerbhola)
smg260
left a comment
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @srosenberg, and @sumeerbhola)
Thanks! bors r+
Build succeeded.
104573: roachprod: minor fixes for snapshots r=srosenberg a=srosenberg

Support for disk snapshots was recently added in [1]. This change adds minor fixes that were necessary to run in a different environment than was previously tested. Specifically, several commands were missing `--project`, which resulted in (silent) failures. Error propagation from `c.Parallel` was signaling an internal error instead of passing on the command's error. (Granted, the existing mechanism for error propagation is to be revisited in [2].)

[1] #103757
[2] #104312

Epic: none
Release note: None

Co-authored-by: Stan Rosenberg <stan.rosenberg@gmail.com>
Part of #89978. Pre-cursor to #83826. Part of #98703.