roachtest: ClusterSpec should support user-specified clouds _compatible_ with a test #104029

@srosenberg


Currently, roachtests don't have an established mechanism for specifying the set of cloud providers that are compatible with a given test. Theoretically, a roachtest should be cloud-agnostic since it doesn't directly interact with cloud APIs, a task that's delegated to roachprod. In practice, several roachtests are incompatible with one or more cloud providers. E.g.,

  • schemachange/mixed-versions-compat uses gsutil to copy corpus data from a GCS bucket [1]
  • a restore variant uses AWS-specific zones [2]
  • variants of clearrange and ycsb/A use ZFS, which is supported only on GCE [3], [4], [5]

Note that while the use of gsutil in the first example may seem like a superficial incompatibility, large-scale backup/restore tests can incur substantial egress when data is pulled from a cloud provider other than the one where the test is scheduled to execute. Hence, we need to ensure that either the data is sufficiently replicated (i.e., local to the test's cloud provider), or the test is declared incompatible with the cloud providers that lack the required test data (fixtures).
In practice, there may be additional, albeit rare, reasons for incompatibility; e.g., quota, price, specific machine type, etc.
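The fixture-locality requirement could be checked mechanically. The following is a minimal sketch, not existing roachtest code: `fixtureLocations`, `LocalFixture`, the fixture names, and the bucket URIs are all hypothetical, chosen only to illustrate the idea that a test is compatible with a cloud when a replica of its fixture lives there.

```go
package main

import "fmt"

// fixtureLocations maps a fixture name to the clouds that hold a replica,
// and to the replica's URI on that cloud. Every entry is a made-up example.
var fixtureLocations = map[string]map[string]string{
	"corpus": {
		"gce": "gs://example-fixtures/corpus",
	},
	"backup-small": {
		"gce": "gs://example-fixtures/backup-small",
		"aws": "s3://example-fixtures/backup-small",
	},
}

// LocalFixture returns the URI of the replica that is local to cloud.
// The second result is false when no local replica exists, i.e. when
// running the test on that cloud would pull data across providers.
func LocalFixture(name, cloud string) (string, bool) {
	uri, ok := fixtureLocations[name][cloud]
	return uri, ok
}

func main() {
	for _, cloud := range []string{"gce", "aws"} {
		if uri, ok := LocalFixture("corpus", cloud); ok {
			fmt.Printf("%s: use %s\n", cloud, uri)
		} else {
			fmt.Printf("%s: no local replica, treat test as incompatible\n", cloud)
		}
	}
}
```

Under this assumption, replicating a fixture to another provider would widen a test's compatibility without touching the test itself.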

Consequently, there must be an established mechanism both for specifying when a test is incompatible with a cloud provider and for skipping the test on every incompatible cloud provider. Currently, ClusterSpec.Cloud denotes the cloud provider passed via roachtest run --cloud, not a compatible cloud provider specified by the test author. That is, the framework implicitly (and wrongly) assumes that every test is executable against ClusterSpec.Cloud. As a further confounding factor, CI uses tags to select roachtests for a given cloud provider [6].
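One possible shape for such a mechanism is sketched below. It assumes a hypothetical `CompatibleClouds` field declared by the test author and a hypothetical `Partition` helper that the framework would call with the value of --cloud; none of these names exist in roachtest as described in this issue, and the second and third test names are invented.

```go
package main

import (
	"fmt"
	"sort"
)

// Cloud names a cloud provider. The constants are illustrative.
type Cloud string

const (
	GCE Cloud = "gce"
	AWS Cloud = "aws"
)

// CloudSet is the set of clouds a test author declares compatible.
// In this sketch a nil set means "no restriction" (an assumed convention).
type CloudSet map[Cloud]struct{}

// Clouds builds a CloudSet from the given providers.
func Clouds(cs ...Cloud) CloudSet {
	s := make(CloudSet, len(cs))
	for _, c := range cs {
		s[c] = struct{}{}
	}
	return s
}

// Contains reports whether c is compatible according to the set.
func (s CloudSet) Contains(c Cloud) bool {
	if s == nil {
		return true
	}
	_, ok := s[c]
	return ok
}

// TestSpec is a pared-down, hypothetical stand-in for a registered test.
type TestSpec struct {
	Name             string
	CompatibleClouds CloudSet
}

// Partition splits test names into those to run on cloud and those to skip.
func Partition(tests []TestSpec, cloud Cloud) (run, skip []string) {
	for _, t := range tests {
		if t.CompatibleClouds.Contains(cloud) {
			run = append(run, t.Name)
		} else {
			skip = append(skip, t.Name)
		}
	}
	sort.Strings(run)
	sort.Strings(skip)
	return run, skip
}

func main() {
	tests := []TestSpec{
		{Name: "schemachange/mixed-versions-compat", CompatibleClouds: Clouds(GCE)},
		{Name: "restore/example-aws-zones", CompatibleClouds: Clouds(AWS)}, // invented name
		{Name: "example/cloud-agnostic"},                                   // invented name
	}
	run, skip := Partition(tests, AWS)
	fmt.Println("run:", run)
	fmt.Println("skip:", skip)
}
```

Keeping the declaration separate from the cloud selected at run time is the point of the sketch: the selected cloud stays an input, and the author's set decides whether the test is scheduled at all.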

As for skipping incompatible tests, test authors have come up with ad hoc workarounds, e.g., [7], [8], which complicate both the setup logic and future refactoring.
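To illustrate how checks like those in [7] and [8] might move out of individual tests, here is a rough sketch of a single framework-level function. `Requirements`, `zfsClouds`, and `SkipReason` are hypothetical names; the two rules simply restate the conditions from the referenced snippets.

```go
package main

import "fmt"

// Requirements captures what a test needs from its environment.
// Both fields are hypothetical simplifications of the real cluster spec.
type Requirements struct {
	FileSystem  string // "" means the default file system
	BackupCloud string // "" means the test reads no cloud-resident backup
}

// zfsClouds lists the clouds assumed to support ZFS node creation,
// mirroring the GCE-only condition in [5].
var zfsClouds = map[string]bool{"gce": true}

// SkipReason returns a non-empty explanation when the test should be
// skipped on cloud, and "" when it can run there.
func SkipReason(req Requirements, cloud string) string {
	if req.FileSystem == "zfs" && !zfsClouds[cloud] {
		return fmt.Sprintf("zfs file system not supported on %s", cloud)
	}
	if req.BackupCloud != "" && req.BackupCloud != cloud {
		return fmt.Sprintf("backup is stored on %s, not %s", req.BackupCloud, cloud)
	}
	return ""
}

func main() {
	ycsbZfs := Requirements{FileSystem: "zfs"}
	restore := Requirements{BackupCloud: "aws"}
	for _, cloud := range []string{"gce", "aws"} {
		fmt.Printf("%s: ycsb-zfs skip=%q restore skip=%q\n",
			cloud, SkipReason(ycsbZfs, cloud), SkipReason(restore, cloud))
	}
}
```

With a function of this kind evaluated before cluster creation, the individual tests would no longer need their own t.Skip calls for cloud mismatches.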

[1]

err = c.RunE(ctx, c.Node(1),
	fmt.Sprintf(" gsutil cp gs://cockroach-corpus/corpus-%s/corpus %s",
		version,
		corpusFilePath))

[2]
hardware: makeHardwareSpecs(hardwareSpecs{
	nodes: 9,
	zones: []string{"us-east-2b", "us-west-2b", "eu-west-1b"}}), // These zones are AWS-specific.
backup:  makeBackupSpecs(backupSpecs{}),
timeout: 90 * time.Minute,
tags:    registry.Tags("aws"),

[3]
Cluster: r.MakeClusterSpec(10, spec.CPU(16), spec.SetFileSystem(spec.Zfs)),

[4]
Cluster: r.MakeClusterSpec(4, spec.CPU(cpus), spec.SetFileSystem(spec.Zfs)),

[5]
if s.Cloud != GCE {
	return vm.CreateOpts{}, nil, errors.Errorf(
		"node creation with zfs file system not yet supported on %s", s.Cloud,
	)

[6]
gce)
  # Confusing due to how we've handled tags in the past where it has been assumed that all tests should
  # be run on GCE. Now with refactoring of how tags are handled, we need:
  # - "default" to ensure we select tests that don't have any user specified tags (preserve old behavior)
  # - "aws" to ensure we select tests that now no longer have "default" because they have the "aws" tag
  # Ideally, refactor the tags themselves to be explicit about what cloud they are for and when they can run.
  # https://github.com/cockroachdb/cockroach/issues/100605
  FILTER="tag:aws tag:default"
  ;;
aws)
  if [ -z "${FILTER}" ]; then
    FILTER="tag:aws"

[7]
// For now, we only want to run the zfs tests on GCE, since only GCE supports
// starting roachprod instances on zfs.
if c.Spec().FileSystem == spec.Zfs && c.Spec().Cloud != spec.GCE {
	t.Skip("YCSB zfs benchmark can only be run on GCE", "")

[8]
if rd.c.Spec().Cloud != rd.sp.backup.cloud {
	// For now, only run the test on the cloud provider that also stores the backup.
	rd.t.Skip("test configured to run on %s", rd.sp.backup.cloud)

Jira issue: CRDB-28322

Labels

A-testing (Testing tools and infrastructure), C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), T-testeng (TestEng Team), v23.1.15