roachtest/mixedversion: sync workload binary versions with the cluster version #153269
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.
Not all tests passing yet (5/29 failing); debugging multi-cluster scenarios, but opening up for review for general feedback.
```go
// exists, assume the binary has already been uploaded previously. Returns the
// path of the uploaded binaries on the nodes.
//
// If --versions-binary-override option is set and if version v is contained in
```

nit: only if we are uploading the cockroach binary
So, in other words, for mixedversion to use `--versions-binary-override`, you have to use `--cockroach ...`? I don't know the CI roachtest call / whether `--cockroach-stage ...` is used; I'm just going off reading the logic of the override binary version. I don't see right away why it wouldn't work if `--cockroach-stage` is set. No worries on finding that logic though, I believe you.
Just wondering: if those args aren't meant to be used together, should I mark them as incompatible in main?
(The reason I wrote that comment in the first place was that the behavior wasn't the clearest to me on a first pass, so I'm just trying to make things more digestible.)
I more so meant that we only support overriding the cockroach binary, i.e. you can't override the DeprecatedWorkload()
```go
// TODO(testeng): Replace with https://github.com/cockroachdb/cockroach/issues/147374
// By default, the binary used to run the command(s) will be the same as
// the current version of the cluster at the time this hook is executed.
// This is because the binary version is no longer backwards compatible as of
```

nit: this is a bit overgeneralizing. Only some of the workloads, e.g. bank, are no longer backwards compatible. We can just say that we assume the workload is not backwards compatible.
pkg/cmd/roachtest/tests/tpcc.go

```diff
 // version as the cockroach cluster.
 // TODO(testeng): Replace with https://github.com/cockroachdb/cockroach/issues/147374
-binary := uploadCockroach(ctx, t, c, randomNode, h.System.FromVersion)
+binary := clusterupgrade.BinaryPathForVersion(t, h.System.FromVersion, "cockroach")
```
I think we can hide this behind a `h.CockroachBinary` method. That way the test can just call `cmd := roachtestutil.NewCommand("%s workload fixtures import bank", h.CockroachBinary)` and not think about what version we need.
So this works for workloads defined in user-defined hooks like admission-control/elastic-workload/mixed-version that have access to the helper:

```go
runForeground := func(ctx context.Context, duration time.Duration, h *mixedversion.Helper) error {
	cmd := roachtestutil.NewCommand("%s workload run kv "+
		"%s --concurrency=500 "+
		"--max-rate=5000 --read-percent=5 "+
		"--min-block-bytes=512 --max-block-bytes=1024 "+
		"--txn-qos='regular' "+
		"--duration=%v {pgurl%s}",
		h.CockroachBinaryForWorkload(t), roachtestutil.GetWorkloadHistogramArgs(t, c, labels), duration,
		c.CRDBNodes())
	return c.RunE(ctx, option.WithNodes(c.WorkloadNode()), cmd.String())
}
```
But in workloads that are defined in the outer register_test() like in backup-restore/mixed-version, there's no access to Helper (if this is wrong let me know):

```go
func bankWorkloadCmd(
	l *logger.Logger, testRNG *rand.Rand, roachNodes option.NodeListOption, mock bool,
) (init *roachtestutil.Command, run *roachtestutil.Command) {
	...
	init = roachtestutil.NewCommand("./cockroach workload init bank").
		Flag("rows", bankRows).
		MaybeFlag(bankPayload != 0, "payload-bytes", bankPayload).
		Flag("ranges", 0).
		Arg("{pgurl%s}", roachNodes)
```
So because of this, I think the override logic will still be needed in Workload. For all usages in user-defined hooks I'll switch to the helper.
> there's no access to Helper

Right, it's used outside of mixed version tests so we don't want to pass the Helper there. Can we instead just pass a binary string?
I'm not sure, because `clusterupgrade.BinaryPathForVersion(t, h.System.FromVersion, "cockroach")` needs a version that I was relying on Helper to get. Calling Workload gets around this because it defines the command in the hook, which has access to Helper.
Will see if I can think of another way.
If the main concern is to get rid of the override boolean in Workload, could also have a Workload function and a WorkloadWithOverrideBinary function
Ahh I see the issue, that makes sense. In that case, I'd still lean towards removing it since we currently have nothing that sets OverrideBinary to true. Also remember that Workload is a convenience wrapper so tests can always just do it manually via BackgroundFunc if you really need to.
```go
p.setFinalizing(service, false)
p.setStage(service, AfterUpgradeFinalizedStage)

if len(p.cluster.All()) != len(p.cluster.CRDBNodes()) {
```
What do you think about just passing in the workload node(s) as an argument in the test? I'm a bit uneasy about relying on this since:
- In the future we might make the workload node(s) its own cluster
- If the test doesn't care about syncing the binaries, this adds extra work
- WorkloadNode() as of right now is purely an optional convenience thing.
Pass it in as an arg in mixedversion.NewTest(...) as a type option.NodeListOption? Sure, sounds better than relying on math. Alternatively, before, I was thinking of using the lower-level framework cluster.WorkloadNode, but unfortunately that calls Fatal if there's not a workload node. So adding a new cluster method could also work. Any preference on working in mixedversion vs. cluster?
- If we add it to mixedversion's `test.Test` (and/or helper), we can rip out cluster from the main code path.
- If we add it to cluster, it might be simpler because there's less abstraction, but it would add a reliance on cluster.

Leaning towards your suggestion, but curious if you have any thoughts.
Yeah, adding it to mixedversion test.Test was what I was thinking. I'm not sure I follow your second option though. If you want to see if a workload node was declared, you can check the test spec. However, mixedversion doesn't have access to the roachtest test spec because it has its own test spec (test.Test).
> we can rip out cluster from the main code path
I think we still need cluster for things like cluster.Run but I could be wrong.
Something similar to this: cockroach/pkg/cmd/roachtest/cluster.go, line 1202 (at 797f4bb).
But leaning towards your initial suggestion anyways. Also, I just meant ripping out the cluster stuff that I added, which is the reason I had to add the cluster mock in the unit test, so I can take that out too.
> passing in the workload node(s) as an argument in the test
So 2 approaches I thought of here:

1. Changing the function signature
   - Wouldn't make sense for non-workload tests to have to pass in a nil.
   - Could use a new constructor that takes in workload nodes, but having 2 separate constructors might be a little confusing, AND we would need logic to add them to testOptions so the planner can see the objects, which seems like an antipattern in this framework as testOptions seems to be entirely user supplied.

```go
func NewTestWithWorkload(
	ctx context.Context,
	t test.Test,
	l *logger.Logger,
	c cluster.Cluster,
	crdbNodes option.NodeListOption,
	workloadNodes option.NodeListOption,
	options ...CustomOption,
) *Test { }
```

2. Create an option
   - Aligned with framework design. It could be seen as slightly odd that the crdb nodes are passed in as a required param and the workload nodes aren't, but not every mvt uses workload nodes, so that seems ok.

```go
mvt := mixedversion.NewTest(
	ctx, t, t.L(), c, c.CRDBNodes(),
	// We use a longer upgrade timeout in this test to give the
	// migrations enough time to finish considering all the data
	// that might exist in the cluster by the time the upgrade is
	// attempted.
	mixedversion.WithWorkloadNodes(c.WorkloadNode()),
```

Leaning towards the option.
Option makes sense and is what I was thinking. WorkloadNode itself is already an option.
```go
// This is because the binary version is no longer backwards compatible as of
// v25.3
//
// If overrideBinary is true, the binary used to run the command(s) will be
```

This bool sounds like it needs to be flipped to make sense. If it's true and we use the binary passed in the command, then it sounds like we aren't overriding anything at all.
We also might not need it at all anymore; instead we can just make sure to change the tests to use h.Cockroach() instead of ./cockroach, which should give the versioned binary. This gives the same flexibility but without this extra bool.
```go
func (s stageWorkloadBinaryStep) Run(
	ctx context.Context, l *logger.Logger, _ *rand.Rand, h *Helper,
) error {
	numWorkloadNodes := len(h.runner.cluster.All()) - len(h.runner.cluster.CRDBNodes())
```

Same as above, it'd be nice to not rely on node arithmetic here if possible.
I don't think we wanna run the staging step concurrently with anything.

I also wonder if we could just add a step at the very start to stage all the versions needed across the entire test at once. That way we don't add a bunch of new steps that are purely framework details.

There's also this CDC failure you should fix that we (I) missed a while ago:

Have a fix that's working locally, need to verify on CI @DarrylWong

Makes sense, I'll give that a try. Is that what's being done for all the binaries for cluster nodes too?

I assumed that user hooks running in

No, but for the cluster nodes there's a logical spot to upload them, i.e. when we do the rolling restart for each node, so we don't need a separate step. It's not a huge deal though - I'm partially thinking ahead to our new modular framework. There's no dependency for uploading the workload binaries, so we could upload them all at the start if we wanted.
```go
Flags map[string]*string
UseEquals bool
// InlineEnvVarAssignment e.g. COCKROACH_RANDOM_SEED=%d ./workload ...
InlineEnvVarAssignments map[string]*string
```

nit: just EnvVars is probably clear enough
```go
// See: https://github.com/cockroachdb/cockroach/issues/121411.
mixedversion.AlwaysUseLatestPredecessors,
)
tester := newSchemaChangeMixedVersionTester(t, c, maxOps, concurrency)
```

The schemachange workload is a bit different since it has backwards compatibility built in. Let me think about it more but I think we actually want to use the tip of master here.
Before I changed that file to use the versioned workload binary, this was the error I was seeing, different from the unknown flag/arg error I saw when using `cockroach workload ...`, but I assumed it was the same root cause because of using the newer binary. And when I ran locally with my change with the same seed, it started to pass.

```
Wraps: (5) full command output in run_193441.767109628_n4_COCKROACHRANDOMSEED2.log
Wraps: (6) Node 4. Command with error:
  | COCKROACH_RANDOM_SEED=2207545676704137517 ./workload run schemachange {pgurl:1-4} --concurrency 5 --max-ops 1000 --verbose 1
  | <truncated> ... 60 1 workload/cli/run.go:590 [-] 3 + | src/runtime/asm_amd64.s:1700
  | E250917 19:35:30.793260 1 workload/cli/run.go:590 [-] 3 +Wraps: (4) ***UNEXPECTED ERROR; Received an unexpected execution error.
  | E250917 19:35:30.793260 1 workload/cli/run.go:590 [-] 3 +Wraps: (5) ERROR: function udf_w3_23("char",bit,"char",bit,bool,box2d,bpchar,bytes,citext,string,date,decimal,float,geography,geometry,inet,int,int2,int2vector,int,interval,jsonb,string,oidvector,pg_lsn,refcursor,string,time) does not exist (SQLSTATE 42883)
  | E250917 19:35:30.793260 1 workload/cli/run.go:590 [-] 3 +Error types: (1) *markers.withMark (2) *schemachange.ErrorState (3) *withstack.withStack (4) *errutil.withPrefix (5) *pgconn.PgError
  | Error: ***UNEXPECTED ERROR; Received an unexpected execution error.: ERROR: function udf_w3_23("char",bit,"char",bit,bool,box2d,bpchar,bytes,citext,string,date,decimal,float,geography,geometry,inet,int,int2,int2vector,int,interval,jsonb,string,oidvector,pg_lsn,refcursor,string,time) does not exist (SQLSTATE 42883)
Wraps: (7) COMMAND_PROBLEM
Wraps: (8) exit status 1
```

But yeah, I also don't know about the nuance of using `workload` vs `cockroach workload` besides it being a different binary, so if I should do something different let me know.
`cockroach workload` contains a subset of the workloads found in `workload`. They aren't 1:1, to ensure the cockroach binary isn't too large. However, to make sure we don't upload an unnecessary binary unless needed (since we already upload cockroach to every node), we opt for `cockroach` if possible.
Kicked off a new run with a fix for
```diff
 // initializeSchemaAndIDs ensures schema objects are created in the cluster, and determines
 // the various IDs that will be used as placeholders for the endpoints.
-func initializeSchemaAndIDs(ctx context.Context, c cluster.Cluster, l *logger.Logger) error {
+func initializeSchemaAndIDs(
```

nit: could just pass in the binaryPath instead of the entire tpcc command
```go
	}
}

// WithWorkloadNodes allows callers to specify workload nodes which
```

Looks like an unfinished comment :)
```go
	})
}

// DepricatedWorkload is this needed? TODO remove if not needed
```

GitHub will remember for us, so no need to keep commented code around. We'll just pull from history if we need to restore it.
```diff
 Name: "decommission/mixed-versions",
 Owner: registry.OwnerKV,
-Cluster: r.MakeClusterSpec(numNodes),
+Cluster: r.MakeClusterSpec(numNodes, spec.WorkloadNode()),
```

This test doesn't look like it was using a workload node. It was importing some data into the cluster using the workload binary, but just doing so on a random CRDB node, since it's not heavyweight enough to require its own VM.
(I don't think my prev comment sent for whatever reason, but this might be a dup resp.)
Kk, I can change it. I was initially thinking from a functional separation-of-concerns point of view, but if there's no clear benefit to having it and we can save a VM, then I'll take it out.
spec.WorkloadNode just indicates that the last node in the cluster is denoted as a workload node; it doesn't actually add a VM to the cluster (i.e. the cluster is still numNodes in size).
Maybe I'd do it differently now since people have been confused about this, but at the time I did it this way because:
- I didn't want to change hundreds of tests that were already adding the workload node in the `ClusterSpec` or calling `clusterSpec.NodeCount`. Test writers were already used to it, was my thinking.
- It was (optimistically) just a temporary change until we got the ability to spin up multiple clusters per roachtest.
Ah ok, thanks for the info! I feel like I remember reading that or someone telling me a while back, but for some reason when I see spec.WorkloadNode, in my head I default to thinking it's adding a node.
I feel like it does make sense to initialize the cluster with all the nodes you need and declare node(s) as workload; maybe a function name like SetNodesAsWorkload vs WorkloadNode could help? I don't know if you could tell, but I tend to lean towards more verbose names haha.
```diff
 runWorkloadCmd := tester.runWorkloadCmd(mvt.RNG())
-_ = mvt.BackgroundCommand("run workload", tester.workloadNodes, runWorkloadCmd)
+_ = mvt.Workload("bank", tester.workloadNodes, nil, runWorkloadCmd)
```

nit: Why did this need to change from BackgroundCommand to Workload? CDCMixedVersionCheckpointing still has it as BackgroundCommand so we should make it consistent whatever we choose.
Ah, thanks for the catch, I missed CDCMixedVersionCheckpointing; I'll make the same change.
As for why: I changed to using Workload so workload can handle replacing the binary path with the correct versioned one. Without calling Workload, there's no way to get the versioned binary path, because the logic currently exists in the test's main body and not in a user hook, so it doesn't have access to Helper. I just thought it would be simpler to change this call vs. moving the workload-command-building logic into a custom user hook. Also, Workload calls BackgroundFunc, which calls BackgroundCommand, so it looks like we get to the same thing in the end; but let me know if there's anything to be aware of with that call chain. It looks like it's mainly just abstracting which node the command is run on, but since we're just passing the 1 workload node I think it should be consistent.

```go
func Workload(...) {
	...
	initCmd.Binary = h.CockroachBinaryForWorkload(t.rt)
}
```

```diff
 Name: "import/mixed-versions",
 Owner: registry.OwnerSQLQueries,
-Cluster: r.MakeClusterSpec(4),
+Cluster: r.MakeClusterSpec(4, spec.WorkloadNode()),
```
Same as above, this test wasn't using a Workload node since it just imports cold data but doesn't actually run the workload.
Kk, same as above, I can remove it.
DarrylWong left a comment

Some more nits and unit-test-only comments. I think the core functionality looks good though!
```go
	return c.Flag(name, falseVal)
}

func (c *Command) InlineEnvVarAssignment(name string, val interface{}) *Command {
```

nit: Just EnvVar is probably fine here too. IMO "inline" is implied, since it would be surprising if it e.g. exported the env var.
Oops, should've changed that when I changed the field name.
```go
Flags map[string]*string
UseEquals bool
// EnvVars e.g. COCKROACH_RANDOM_SEED=%d ./workload ...
EnvVars map[string]*string
```

There's a commandbuilder_test already, wouldn't hurt to add an example using the env var option.
```go
	mvt = createDataDrivenMixedVersionTest(t, d.CmdArgs)
case "mixed-version-test-with-workload":
	mc := mockcluster.NewMockCluster(ctrl)
	mc.EXPECT().WorkloadNode().AnyTimes().Return(option.NewNodeListOptionRange(1, 1))
```
I'm all for using gomock more in unit tests (I've wanted to lift the mock out of clusterstats for a while), but it seems overkill if this is all we use it for.

Something like this should work too without the need for a mock:

```go
case "workload-node":
	mvt = newTest(ctrl, WithWorkloadNodes(option.NodeListOption{5}))
```

Better yet, you can make it an arg in createDataDrivenMixedVersionTest:

```go
case "workload_node":
	workloadNodes := option.NodeListOption{}
	for _, nodeStr := range arg.Vals {
		n, err := strconv.Atoi(nodeStr)
		require.NoError(t, err)
		workloadNodes = append(workloadNodes, n)
	}
	opts = append(opts, WithWorkloadNodes(workloadNodes))
```

This would let the test itself specify node(s) as well as let you keep just one mixed-version-test hook:

```
mixed-version-test workload_node=(1,3)
----
ok
```
Ah, I didn't consider just skipping the cluster part and passing in a list of nodes directly, thanks! And yeah, I agree; originally with my first approach where I was relying on cluster methods I had to mock out those calls in all the unit tests, but now it's not needed so I ripped everything out.
```go
	return numReleases > 1
}

// CockroachBinaryForWorkload returns the correct binary path to use when
```
nit: what do you think about something like VersionedCockroachPath?
- Just a hypothetical, but maybe we want to run some `./cockroach debug` commands that also need to be versioned. This would work for that use case too, so we can make it generic.
- Makes it more obvious that this is a versioned binary from just the name.
Makes sense! Changed.
```go
// schemaChangeMixedVersionTester bundles state and configuration to pass to
// test hooks
type schemaChangeMixedVersionTester struct {
```
Looks like this was just refactoring and not any actual test changes? Just making sure I'm not missing anything.
The change was using a versioned dedicated workload binary instead of the default, which looks like it just gets built in build/teamcity/cockroach/ci/builds/build_impl.sh for CI. Also, we had a thread on this before: https://github.com/cockroachdb/cockroach/pull/153269/files#r2363296130
But I was seeing an odd workload run failure that seemed to go away after I started using the versioned binary (logs in the linked thread).
Basically, I did something similar to your original workaround. In every hook that is doing a `./workload {run || init}`, I call a roachtest-level function to get the versioned binary:

```go
binaryPath, _, err := clusterupgrade.UploadWorkload(
	ctx, t.t, l, t.c, t.c.WorkloadNode(), h.System.FromVersion)
if err != nil {
	return err
}
```

To do this, I needed to create the struct and have the mixedversion hooks be methods on that struct, so I can pass along the test.Test (roachtest) inside the mixedversion hook and call UploadWorkload, because it needs t.t as a parameter.
The mixedversion CDC tests do something similar, which is where I got the inspiration from: https://github.com/cockroachdb/cockroach/pull/153269/files#diff-187d25c5a34e03c63c9633e5b04ca4c2ee071761b055df84c7d05e95bfcfa5c0L147
On that note though, I am seeing this test fail in the most recent CI run, taking a look
Also, I forgot about this case when doing the original change to do all the binary downloads up front. I can make this another mixedversion test option, but before I do that:

> The schemachange workload is a bit different since it has backwards compatibility built in. Let me think about it more but I think we actually want to use the tip of master here.

Should we even be using the versioned binary here? I don't have the logs from the original run I did with this test, so I can do some runs from before this change to first verify the failure was happening. (The reason I didn't properly verify it was failing before making the change originally was because we decided to just go ahead and try to use the versioned binary as a default.)
But I'm going to look at the other test failures first because this seems like a special case.
Okay, so after some digging in the original PR #113890:
It seems like we do want this to be versioned, and we currently set:

```go
// Disable version skipping and limit the test to only one upgrade as the workload is only
// compatible with the branch it was built from and the major version before that.
mixedversion.NumUpgrades(1),
mixedversion.DisableSkipVersionUpgrades,
```

However, since the schemachange workload is compatible with one version previous, I think that actually means we want to stage the h.System.ToVersion? If we do that, then we can get rid of the above mixedversion.NumUpgrades limitation.
IMO it also wouldn't hurt to do this as a followup, that way we don't hold up the PR while you are testing/figuring this out.
Kk, thanks for the follow-up info! I'll go ahead and revert the file on this branch and open up a new followup issue.
DarrylWong left a comment

LGTM modulo the commits being cleaned up before merging.
Ty for the extensive review @DarrylWong!! Running another round of tests; will merge if everything looks good: https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestNightlyGceBazel/20528160?hideProblemsFromDependencies=false&hideTestsFromDependencies=false&expandBuildChangesSection=true

Looks like cdc/mixed-version is failing on 25.4, so let's try to backport it if it goes through cleanly.
roachtest/mixedversion: sync workload binary versions with the cluster version

The current workload binary is no longer backwards compatible for certain workloads. Previously, the workload binary would just use whatever default binary was staged on the workload node. Currently, there's a quick fix that installs the current system version's binary on the workload node when a workload cmd is being executed.

This change bakes that behavior into the test framework itself, so we can remove the binary staging from the Workload hook. Workload nodes are staged with all versioned binaries in the upgrade plan, so all workload commands can use a versioned binary that matches the current state of the cluster.

Resolves cockroachdb#147374
Epic: None
Release note: None
resolved merge conflicts, just had to update the

blathers auth-check

You shall pass

TFTR!! bors r=DarrylWong

Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches. Issue #147374: branch-release-25.4. 🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.

Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.

error creating merge commit from 9b2f590 to blathers/backport-release-25.4-153269: POST https://api.github.com/repos/williamchoe3/cockroach/merges: 409 Merge conflict []

You may need to manually resolve merge conflicts with the backport tool. Backport to branch 25.4.x failed. See errors above. 🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.
Fixes #147374
Problem
The current workload binary is no longer backwards compatible with certain workloads, i.e. `bank` (see original issue above for more information). Previously, the workload binary would just use whatever default binary was staged on the workload node. In a test, the test author could explicitly download an older version of workload to circumvent version compatibility issues, or use `Workload`'s `override` (PR) parameter to download a binary matching the cluster version to the workload node when the workload hook is being executed.

Change

While the latter solution works, it makes more sense to always use a workload binary that matches the cluster. If a test wants to use a specific workload binary version, that functionality remains possible as well.

During `startStep`, while we are staging the cluster at its initial version, we also stage all the versioned binaries in the upgrade plan on the workload node(s). Previously, we just left the workload node alone at this step. Then, introduced a new test-framework-specific step, `stageWorkloadBinaryStep`, which gets added in `test.Planner.afterUpgradeSteps`, which get added to the plan after a cluster finalizes its upgrade. `stageWorkloadBinaryStep` stages the finalized cluster version's binary on the workload node(s).

Removed the call to `UploadCockroach` from `Workload`, as when the workload init or run hooks are called, the cluster binary will already exist on the workload node. Kept the `override` parameter, but now if override is set, `Workload` will simply respect the binary that gets passed into the command instead of defaulting to the current cluster version.

Added a new helper method to `Helper` which can be used in hooks to abstract a lower-level roachtest framework call.

Added a new mixedversion test option which will add the framework step to stage all versioned cockroach binaries in the upgrade plan to the WorkloadNode.
Example

Given a plan, e.g. during the `run startup hooks concurrently` set of steps.
Adhoc Nightly against all mixed version tests https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestNightlyGceBazel/20482572?buildTab=overview&expandBuildDeploymentsSection=false&expandBuildChangesSection=true&hideTestsFromDependencies=false&expandBuildTestsSection=true&hideProblemsFromDependencies=false&expandBuildProblemsSection=true
Added a `gomock` for Cluster in cluster/mock for unit tests.

Added a datadriven unit test that captures the new post-upgrade internal framework step.

While looking at `Test` I saw these test-specific fields (not sure what this specific methodology is called), but just curious: my initial thought was to add a mock Cluster implementation, because I have workload-node-related logic that needs to call cluster interface methods. I guess I could've gone with something like below, but I personally don't like intertwining prod logic and test logic. Is there any reason we wouldn't want the mock cluster, besides the boilerplate it creates in the unit tests?
https://github.com/cockroachdb/cockroach/blob/b4738823bcc55547ff431b99684576370760e56d/pkg/cmd/roachtest/roachtestutil/mixedversion/planner_test.go
mixedversion test refactor
Key
⚠️ : can't add versioned call because workload logic not defined in a user hook.
👌: no workload, no change needed
➕: added mixedversion.WithWorkloadNodes(c.WorkloadNode()) test option to stage versioned binaries on workload node
✅: replaced hardcoded `"./cockroach workload ..."` calls with something like `"%s workload ...", h.CockroachBinaryForWorkload(t)`
❌: not refactoring because it's not using bank and the logic is tightly coupled with shared non-mixedversion roachtests, so it would need decoupling, which I think would make the test logic harder to understand and isn't worth having a 1-line helper
Test Changes
Notes
- `pkg/cmd/roachtest/clusterstats/BUILD.bazel`: I made some edits initially, removed them, and now ./dev gen is generating go_library / go_test deps in a different order 🤷‍♂️
- `pkg/cmd/roachtest/roachtestutil/clusterupgrade/clusterupgrade.go`: the `clusterupgrade` pkg has `UploadWorkload`, but I chose not to use it because it's a currently unused path that also eventually calls a deprecated Cluster API function, and `UploadCockroach` & `UploadWorkload` are both just wrappers around `uploadBinaryVersion`.