roachtest: add admission-control/index-backfill #103816
Closed
irfansharif wants to merge 5 commits into cockroachdb:master
Pure code movement. We'll make use of it outside this file in subsequent commits. Release note: None
These tests have been stable for a few months now. Reduce to a weekly cadence. Release note: None
Long-lived disk snapshots can drastically reduce testing time for scale
tests. Tests, whether run by hand or through CI, need only run the
long-running fixture-generating code (importing some dataset, generating
it organically through a workload, etc.) when the snapshot fingerprints
change. The fingerprints incorporate the major crdb version that
generated them.
Here's an example run that freshly generates disk snapshots:
=== RUN admission-control/index-backfill
03:57:19 admission_control_index_backfill.go:53: no existing snapshots found for admission-control/index-backfill (ac-index-backfill), doing pre-work
03:57:54 roachprod.go:1626: created volume snapshot ac-index-backfill-0001-vunknown-1-n2-standard-8 (id=6426236595187320652) for volume irfansharif-snapshot-0001-1 on irfansharif-snapshot-0001-1/n1
03:57:55 admission_control_index_backfill.go:61: using 1 newly created snapshot(s) with prefix "ac-index-backfill"
03:58:02 roachprod.go:1716: detached and deleted volume irfansharif-snapshot-0001-1 from irfansharif-snapshot-0001
03:58:28 roachprod.go:1764: created volume irfansharif-snapshot-0001-1
03:58:33 roachprod.go:1770: attached volume irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
03:58:36 roachprod.go:1783: mounted irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
--- PASS: admission-control/index-backfill (79.14s)
Here's a subsequent run that makes use of the aforementioned disk
snapshot:
=== RUN admission-control/index-backfill
04:00:40 admission_control_index_backfill.go:63: using 1 pre-existing snapshot(s) with prefix "ac-index-backfill"
04:00:47 roachprod.go:1716: detached and deleted volume irfansharif-snapshot-0001-1 from irfansharif-snapshot-0001
04:01:14 roachprod.go:1763: created volume irfansharif-snapshot-0001-1
04:01:19 roachprod.go:1769: attached volume irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
04:01:22 roachprod.go:1782: mounted irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
--- PASS: admission-control/index-backfill (43.47s)
We add the following APIs to the roachtest.Cluster interface, for tests
to interact with disk snapshots. admission-control/index-backfill is a
placeholder test making use of these APIs.
    type Cluster interface {
        // ...

        // CreateSnapshot creates volume snapshots of the cluster using
        // the given prefix. These snapshots can later be retrieved,
        // deleted or applied to already instantiated clusters.
        CreateSnapshot(ctx context.Context, snapshotPrefix string) error

        // ListSnapshots lists the individual volume snapshots that
        // satisfy the search criteria.
        ListSnapshots(
            ctx context.Context, vslo vm.VolumeSnapshotListOpts,
        ) ([]vm.VolumeSnapshot, error)

        // DeleteSnapshots permanently deletes the given snapshots.
        DeleteSnapshots(
            ctx context.Context, snapshots ...vm.VolumeSnapshot,
        ) error

        // ApplySnapshots applies the given volume snapshots to the
        // underlying cluster. This is a destructive operation as far as
        // existing state is concerned - all already-attached volumes are
        // detached and deleted to make room for new snapshot-derived
        // volumes. The new volumes are created using the same specs
        // (size, disk type, etc.) as the original cluster.
        ApplySnapshots(
            ctx context.Context, snapshots []vm.VolumeSnapshot,
        ) error
    }
This in turn is powered by the following additions to the vm.Provider
interface, implemented by each cloud provider.
    type Provider interface {
        // ...

        // CreateVolume creates a new volume using the given options.
        CreateVolume(l *logger.Logger, vco VolumeCreateOpts) (Volume, error)

        // ListVolumes lists all volumes already attached to the given VM.
        ListVolumes(l *logger.Logger, vm *VM) ([]Volume, error)

        // DeleteVolume detaches and deletes the given volume from the
        // given VM.
        DeleteVolume(l *logger.Logger, volume Volume, vm *VM) error

        // AttachVolume attaches the given volume to the given VM.
        AttachVolume(l *logger.Logger, volume Volume, vm *VM) (string, error)

        // CreateVolumeSnapshot creates a snapshot of the given volume,
        // using the given options.
        CreateVolumeSnapshot(
            l *logger.Logger, volume Volume, vsco VolumeSnapshotCreateOpts,
        ) (VolumeSnapshot, error)

        // ListVolumeSnapshots lists the individual volume snapshots that
        // satisfy the search criteria.
        ListVolumeSnapshots(
            l *logger.Logger, vslo VolumeSnapshotListOpts,
        ) ([]VolumeSnapshot, error)

        // DeleteVolumeSnapshot permanently deletes the given snapshot.
        DeleteVolumeSnapshot(l *logger.Logger, snapshot VolumeSnapshot) error
    }
Since these snapshots necessarily outlive the tests, and we don't want
them dangling perpetually, we introduce a prune-dangling roachtest that
acts as a poor man's cron job, sifting through expired snapshots
(>30 days old) and deleting them. For GCE at least, it's not obvious to me
how to create these snapshots in cloud buckets with a TTL built in, hence
this hack. It looks like this (with a change to the TTL):
=== RUN prune-dangling
06:22:48 prune_dangling_snapshots_and_disks.go:54: pruned old snapshot ac-index-backfill-0001-vunknown-1-n2-standard-8 (id=7962137245497025996)
06:22:48 test_runner.go:1023: tearing down after success; see teardown.log
--- PASS: prune-dangling (8.59s)
Subsequent commits will:
- [ ] Fill out admission-control/index-backfill, a non-trivial use of
disk snapshots. It will cut down the test time from >4hrs to <25m.
- [ ] Expose top-level commands in roachprod to manipulate these
snapshots.
Release note: None
And make it use disk snapshots. Add a few smarts to the TPC-E harness: expose a 'during' helper to run backfills concurrently with foreground load, and integrate with --skip-init, --local, estimated setup times, prometheus, and of course disk snapshots. Release note: None
Release note: None
Pulled it into #103757.
And make it use disk snapshots. Add a few smarts to the TPC-E harness
(expose a 'during' helper; integrate with --skip-init, --local,
estimated setup times, prometheus, and disk snapshots) while here.
Release note: None