roachtest: ClusterSpec should support user-specified clouds _compatible_ with a test #104029
Description
Currently, roachtests don't have an established mechanism for specifying the set of cloud providers that are compatible with a given test. Theoretically, a roachtest should be cloud-agnostic since it doesn't interact with cloud APIs directly; that task is delegated to roachprod. In practice, several roachtests are incompatible with some cloud providers. E.g.,
- schemachange/mixed-versions-compat uses gsutil to copy corpus data from a GCS bucket [1]
- restore variant uses AWS-specific zones [2]
- variants of clearrange and ycsb/A use ZFS, available only in GCE [3], [4], [5]
Note that while gsutil in the first example may seem like a superficial incompatibility, large-scale backup/restore tests may induce large egress if data is pulled from a cloud provider other than the one where the test is scheduled to execute. Hence, we need to ensure that either the data is sufficiently replicated (i.e., local to the test's cloud provider) or the test is declared incompatible with the cloud providers that lack the required test data (fixtures).
In practice, there may be additional, albeit rare, reasons for incompatibility, e.g., quota, price, or a specific machine type.
Consequently, there must be an established mechanism both for specifying when a test is incompatible with a cloud provider and for preventing the test from executing against any incompatible cloud provider. Currently, ClusterSpec.Cloud denotes the cloud provider passed via roachtest run --cloud, not a compatible cloud provider specified by the test author. That is, the framework implicitly (and wrongly) assumes that every test is executable against ClusterSpec.Cloud. As a further confounding factor, CI uses tags to select roachtests for a given cloud provider [6].
As for skipping incompatible tests, test authors have come up with ad hoc workarounds, e.g., [7], [8], which complicate both the setup logic and future refactoring.
[1]
cockroach/pkg/cmd/roachtest/tests/mixed_version_decl_schemachange_compat.go
Lines 67 to 70 in 87c6775
    err = c.RunE(ctx, c.Node(1),
        fmt.Sprintf(" gsutil cp gs://cockroach-corpus/corpus-%s/corpus %s",
            version,
            corpusFilePath))
[2]
cockroach/pkg/cmd/roachtest/tests/restore.go
Lines 299 to 304 in 87c6775
    hardware: makeHardwareSpecs(hardwareSpecs{
        nodes: 9,
        zones: []string{"us-east-2b", "us-west-2b", "eu-west-1b"}}), // These zones are AWS-specific.
    backup:  makeBackupSpecs(backupSpecs{}),
    timeout: 90 * time.Minute,
    tags:    registry.Tags("aws"),
[3]
    Cluster: r.MakeClusterSpec(10, spec.CPU(16), spec.SetFileSystem(spec.Zfs)),
[4]
cockroach/pkg/cmd/roachtest/tests/ycsb.go
Line 118 in 87c6775
    Cluster: r.MakeClusterSpec(4, spec.CPU(cpus), spec.SetFileSystem(spec.Zfs)),
[5]
cockroach/pkg/cmd/roachtest/spec/cluster_spec.go
Lines 271 to 274 in 87c6775
    if s.Cloud != GCE {
        return vm.CreateOpts{}, nil, errors.Errorf(
            "node creation with zfs file system not yet supported on %s", s.Cloud,
        )
[6]
cockroach/build/teamcity/util/roachtest_util.sh
Lines 66 to 77 in 87c6775
    gce)
      # Confusing due to how we've handled tags in the past where it has been assumed that all tests should
      # be run on GCE. Now with refactoring of how tags are handled, we need:
      # - "default" to ensure we select tests that don't have any user specified tags (preserve old behavior)
      # - "aws" to ensure we select tests that now no longer have "default" because they have the "aws" tag
      # Ideally, refactor the tags themselves to be explicit about what cloud they are for and when they can run.
      # https://github.com/cockroachdb/cockroach/issues/100605
      FILTER="tag:aws tag:default"
      ;;
    aws)
      if [ -z "${FILTER}" ]; then
        FILTER="tag:aws"
[7]
cockroach/pkg/cmd/roachtest/tests/ycsb.go
Lines 50 to 53 in 87c6775
    // For now, we only want to run the zfs tests on GCE, since only GCE supports
    // starting roachprod instances on zfs.
    if c.Spec().FileSystem == spec.Zfs && c.Spec().Cloud != spec.GCE {
        t.Skip("YCSB zfs benchmark can only be run on GCE", "")
[8]
cockroach/pkg/cmd/roachtest/tests/restore.go
Lines 725 to 727 in 87c6775
    if rd.c.Spec().Cloud != rd.sp.backup.cloud {
        // For now, only run the test on the cloud provider that also stores the backup.
        rd.t.Skip("test configured to run on %s", rd.sp.backup.cloud)
Jira issue: CRDB-28322