roachtest: port tpcc/mixed-headroom to the new framework#113706
roachtest: port tpcc/mixed-headroom to the new framework#113706craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
48e0bc2 to
292a48c
Compare
|
RFAL Passed on gceworker + TC job |
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go line 359 at r1 (raw file):
// `bank` dataset, and runs one or multiple database upgrades while a // TPCC workload is running. The number of database upgrades is // controlled by the `versionsToUpgrade` parameter.
This needs updating (no more versionsToUpgrade). It would be good to include a high level idea of how it works (I assume mixedversion chooses a random older but supported version to start with?)
pkg/cmd/roachtest/tests/tpcc.go line 430 at r1 (raw file):
} uploadVersion(ctx, t, c, workloadNode, clusterupgrade.CurrentVersion())
At which point in this test sequence do we start the upgrade process? Is it at some point during the runTPCCWorkload because that is the InMixedVersion step?
DarrylWong
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go line 359 at r1 (raw file):
Previously, RaduBerinde wrote…
This needs updating (no more versionsToUpgrade). It would be good to include a high level idea of how it works (I assume
mixedversionchooses a random older but supported version to start with?)
Good catch. Done.
pkg/cmd/roachtest/tests/tpcc.go line 430 at r1 (raw file):
Previously, RaduBerinde wrote…
At which point in this test sequence do we start the upgrade process? Is it at some point during the
runTPCCWorkloadbecause that is theInMixedVersionstep?
The upgrade process is started after OnStartup but before InMixedVersion, where InMixedVersion hooks will be called once per upgrade step. It's randomized when, but the general idea is if we are going from v23.1 to v23.2, then at any point before it upgrades an individual node, it could call runTPCCWorkload, doing so once per cluster upgrade/downgrade step.
Taking a peek at the planning output in test.log probably gives a better idea of a potential plan than what I can explain. (I had to save it locally for the formatting to be nice btw)
pkg/cmd/roachtest/tests/tpcc.go
Outdated
| runTPCCWorkload := func(ctx context.Context, l *logger.Logger, rng *rand.Rand, h *mixedversion.Helper) error { | ||
| cmd := roachtestutil.NewCommand("./cockroach workload run tpcc"). | ||
| Arg("{pgurl%s}", crdbNodes). | ||
| Flag("duration", rampDuration(c.IsLocal())). |
There was a problem hiding this comment.
Probably better to have the duration be separate from rampDuration.
Also, I think the following change would be welcome in order to test the migrations:
workloadDur := 10 * time.Minute
rampDur := rampDuration(c.IsLocal())
if h.Context().Finalizing && !c.IsLocal() {
rampDur = 1 * time.Minute
workloadDur = 100 * time.Minute
}We want the workload to ramp up faster when migrations are running to expose them to more concurrent load. Also letting the workload run for longer in that case is good, and mimics what the previous version of this test was doing.
Does that make sense?
There was a problem hiding this comment.
Done.
I'll give 100 minutes a test run but are we concerned at all that this will take too long? I'm not sure what the chances are but could it possibly call runTPCCWorkload every time while migrations are happening and all of a sudden this test takes 10+ hours to finish?
renatolabs
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @RaduBerinde, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go line 430 at r1 (raw file):
I had to save it locally for the formatting to be nice btw
If anyone uses firefox, the Repair Text Encoding option should fix it:
Not sure if there is a way to tell TC an artifact is UTF-8 encoded so that this wouldn't be necessary.
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go line 359 at r1 (raw file):
Previously, DarrylWong wrote…
Good catch. Done.
I don't think you pushed the new version.
pkg/cmd/roachtest/tests/tpcc.go line 430 at r1 (raw file):
Previously, renatolabs (Renato Costa) wrote…
I had to save it locally for the formatting to be nice btw
If anyone uses firefox, the
Repair Text Encodingoption should fix it:Not sure if there is a way to tell TC an artifact is UTF-8 encoded so that this wouldn't be necessary.
Very cool, thanks!
DarrylWong
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go line 359 at r1 (raw file):
Previously, RaduBerinde wrote…
I don't think you pushed the new version.
Ah my bad, saw Renato's comment last second and got distracted, it's pushed now.
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewed 1 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DarrylWong, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go line 359 at r1 (raw file):
Previously, DarrylWong wrote…
Ah my bad, saw Renato's comment last second and got distracted, it's pushed now.
Thanks, looks great!
renatolabs
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DarrylWong and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go line 412 at r1 (raw file):
every time while migrations are happening and all of a sudden this test takes 10+ hours to finish
We perform up to 3 upgrades by default, so worst case scenario would be 300 minutes or 5h, which is long but not the longest roachtest to exist. That said, I didn't realize the existing roachtest only uses 100 minutes if we are upgrading to current, so we can do that here too, i.e., only bump the duration if we are upgrading to the current version:
workloadDur := 10 * time.Minute
rampDur := rampDuration(c.IsLocal())
if h.Context().Finalizing && !c.IsLocal() {
rampDur = 1 * time.Minute
if h.Context().ToVersion.IsCurrent() {
workloadDur = 100 * time.Minute
}
}Note that the new framework randomizes how many version upgrades occur. This removes the need for both a single upgrade and multiple upgrade test and the two were merged. Release note: None Fixes: cockroachdb#110537
DarrylWong
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go line 412 at r1 (raw file):
Previously, renatolabs (Renato Costa) wrote…
every time while migrations are happening and all of a sudden this test takes 10+ hours to finish
We perform up to 3 upgrades by default, so worst case scenario would be 300 minutes or 5h, which is long but not the longest roachtest to exist. That said, I didn't realize the existing roachtest only uses 100 minutes if we are upgrading to current, so we can do that here too, i.e., only bump the duration if we are upgrading to the current version:
workloadDur := 10 * time.Minute rampDur := rampDuration(c.IsLocal()) if h.Context().Finalizing && !c.IsLocal() { rampDur = 1 * time.Minute if h.Context().ToVersion.IsCurrent() { workloadDur = 100 * time.Minute } }
Done.
renatolabs
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DarrylWong, @RaduBerinde, and @srosenberg)
|
TFTRs! bors r=renatolabs, RaduBerinde |
|
Build succeeded: |
|
blathers backport 23.1 |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 8cdcf77 to blathers/backport-release-23.1-113706: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1 failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |

Note that the new framework randomizes how many version upgrades occur. This removes the need for both a single upgrade and multiple upgrade test and the two were merged.
Release note: None
Fixes: #110537