Skip to content

roachtest: move schemachange/mixed-version to new framework#113890

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
annrpom:refactor-schng-mv
Nov 20, 2023
Merged

roachtest: move schemachange/mixed-version to new framework#113890
craig[bot] merged 1 commit intocockroachdb:masterfrom
annrpom:refactor-schng-mv

Conversation

@annrpom
Copy link
Copy Markdown
Contributor

@annrpom annrpom commented Nov 6, 2023

This patch updates the schemachange/mixed-versions roachtest to use the new mixed version testing framework - which in turn provides testing enhancements.

Fixes: #110533
Epic: None

Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@annrpom annrpom changed the title roachtest: move schemachange/mixed-version to new framework [ignore] roachtest: move schemachange/mixed-version to new framework Nov 6, 2023
@annrpom annrpom force-pushed the refactor-schng-mv branch 6 times, most recently from a580aa6 to b462311 Compare November 8, 2023 16:29
@annrpom annrpom changed the title [ignore] roachtest: move schemachange/mixed-version to new framework roachtest: move schemachange/mixed-version to new framework Nov 8, 2023
@annrpom annrpom force-pushed the refactor-schng-mv branch 8 times, most recently from 58dd2cc to eb244c2 Compare November 9, 2023 21:24
@annrpom annrpom requested a review from rafiss November 9, 2023 21:25
@annrpom annrpom marked this pull request as ready for review November 9, 2023 21:29
@annrpom annrpom requested a review from a team as a code owner November 9, 2023 21:29
@annrpom annrpom requested review from DarrylWong, herkolategan and renatolabs and removed request for a team November 9, 2023 21:29
Copy link
Copy Markdown
Contributor

@DarrylWong DarrylWong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let rafi or renato comment on correctness since my understanding of schemachange is limited here. I left a few comments though. Looks good 👍

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @annrpom, @herkolategan, @rafiss, and @renatolabs)


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 58 at r1 (raw file):

}

func newSchemaChangeMixedVersionTester(

In my opinion I don't think this test is complicated enough to need a tester struct/function. Unless there's future plans to add more tests with different parameters you could get away with removing it and moving uploadAndInitSchemaChangeWorkload and runSchemaChangeWorkloadStepAndValidate into runSchemaChangeMixedVersions.

But i'm not married to the idea if someone else feels strongly about it 🤷


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 76 at r1 (raw file):

	ctx context.Context, l *logger.Logger, rand *rand.Rand, h *mixedversion.Helper,
) error {
	if err := scmvt.c.PutE(ctx, l, scmvt.t.DeprecatedWorkload(), "./workload", scmvt.c.All()); err != nil {

I think the other tests upload the workload outside of the mixed version testing framework? I don't think it would be wrong to do it in a startup hook but maybe we should keep it consistent.


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 79 at r1 (raw file):

		return err
	}
	if err := scmvt.c.PutE(ctx, l, scmvt.t.Cockroach(), "./cockroach-doctor", scmvt.c.All()); err != nil {

I wonder if we if need to upload a new cockroach binary here or if you could just use test.DefaultCockroachPath? My memory is a little fuzzy but I think the non current releases shouldn't overwrite the default binary path that is uploaded at the start of every test.


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 92 at r1 (raw file):

	scmvt.numFeatureRuns += 1
	l.Printf("Workload step run: %d", scmvt.numFeatureRuns)
	runCmd := []string{

Not a blocker or necessary but it would be nice if we started using roachtestutil.NewCommand for stuff like this. Just makes it nicer to read IMO.

Code snippet:

runCmd := roachtestutil.NewCommand("./workload run schemachange").
	Flag("verbose", 1).
	Flag("max-ops", scmvt.maxOps).
	Flag("concurrency", scmvt.concurrency).
	Arg("{pgurl:1-%d}", scmvt.c.Spec().NodeCount).
	String()

pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 109 at r1 (raw file):

	}
	// Now we validate that nothing is broken after the random schema changes have been run.
	runCmd = []string{

Same as above wrt using roachtestutil.


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 128 at r1 (raw file):

) {
	scmvt := newSchemaChangeMixedVersionTester(c, t, maxOps, concurrency, 0 /* numFeatureRuns */)
	mvt := mixedversion.NewTest(ctx, t, t.L(), c, c.All(), mixedversion.AlwaysUseLatestPredecessors, mixedversion.AlwaysUseFixtures)

Do we need to always use fixtures here? I see that's how it was before but unless it's a correctness issue it'd be better to leave it out. (I have no idea if it is).


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 131 at r1 (raw file):

	schemaChangeAndValidationStep := scmvt.runSchemaChangeWorkloadStepAndValidate
	if buildVersion.Major() < 20 {

I don't think the new framework will ever run on a version this old.

Copy link
Copy Markdown
Contributor Author

@annrpom annrpom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your review & for your patience with me/my questions heh

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @herkolategan, @rafiss, and @renatolabs)


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 58 at r1 (raw file):

Previously, DarrylWong wrote…

In my opinion I don't think this test is complicated enough to need a tester struct/function. Unless there's future plans to add more tests with different parameters you could get away with removing it and moving uploadAndInitSchemaChangeWorkload and runSchemaChangeWorkloadStepAndValidate into runSchemaChangeWorkloadStepAndValidate.

But i'm not married to the idea if someone else feels strongly about it 🤷

true - we spoke offline about this, but for the sake of tracking: i did this for my own readability 💯, but i kind of realize that yes it is a bit unnecessary~ i just went ahead and changed it


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 76 at r1 (raw file):

Previously, DarrylWong wrote…

I think the other tests upload the workload outside of the mixed version testing framework? I don't think it would be wrong to do it in a startup hook but maybe we should keep it consistent.

done


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 79 at r1 (raw file):

I wonder if we if need to upload a new cockroach binary here or if you could just use test.DefaultCockroachPath

right, it seems to me like we can just use test.DefaultCockroachPath here


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 92 at r1 (raw file):

Previously, DarrylWong wrote…

Not a blocker or necessary but it would be nice if we started using roachtestutil.NewCommand for stuff like this. Just makes it nicer to read IMO.

done


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 109 at r1 (raw file):

Previously, DarrylWong wrote…

Same as above wrt using roachtestutil.

done


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 128 at r1 (raw file):

Previously, DarrylWong wrote…

Do we need to always use fixtures here? I see that's how it was before but unless it's a correctness issue it'd be better to leave it out. (I have no idea if it is).

i am under the impression that the random schemachange workload should be able to perform schema changes on some data that exists previously (but not necessarily need such data), but i am also under the impression that what we want would be a correctness check (which doesn't seem to align with needing to always use fixtures)

anyone feel free to correct me if i am wrong!


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 131 at r1 (raw file):

Previously, DarrylWong wrote…

I don't think the new framework will ever run on a version this old.

ah good point 💯

@annrpom annrpom requested a review from DarrylWong November 10, 2023 20:16
Copy link
Copy Markdown

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few questions and small modifications to the test. Thanks for working on this!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @annrpom, @DarrylWong, @herkolategan, and @rafiss)


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 79 at r1 (raw file):

Previously, annrpom (annie pompa) wrote…

I wonder if we if need to upload a new cockroach binary here or if you could just use test.DefaultCockroachPath

right, it seems to me like we can just use test.DefaultCockroachPath here

Confirming this is safe, previous release binaries are namespaced in their own directories.


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 33 at r2 (raw file):

		// This tests the work done for 20.1 that made schema changes jobs and in
		// addition prevented making any new schema changes on a mixed cluster in
		// order to prevent bugs during upgrades.

This comment seems outdated now?


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 40 at r2 (raw file):

		Run: func(ctx context.Context, t test.Test, c cluster.Cluster) {
			maxOps := 100
			concurrency := 5

For consideration and my own understanding: why do we choose these values? 100 ops seems like very little, could we change this to at least 1000?


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 55 at r2 (raw file):

) {
	numFeatureRuns := 0
	mvt := mixedversion.NewTest(ctx, t, t.L(), c, c.All(), mixedversion.AlwaysUseLatestPredecessors)

Just curious: do we "rediscover" bugs if we run it without AlwaysUseLatestPredecessors? If so, which ones?


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 57 at r2 (raw file):

	mvt := mixedversion.NewTest(ctx, t, t.L(), c, c.All(), mixedversion.AlwaysUseLatestPredecessors)

	if err := c.PutE(ctx, t.L(), t.DeprecatedWorkload(), "./workload", c.All()); err != nil {

A more common setup for these tests is to have the workload running on an arbitrary (but generally the last) node in the cluster and having the workload itself pick the gateway crdb node (which most workloads already do).

In other words, we don't need to upload workload to every node 🙂


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 57 at r2 (raw file):

	mvt := mixedversion.NewTest(ctx, t, t.L(), c, c.All(), mixedversion.AlwaysUseLatestPredecessors)

	if err := c.PutE(ctx, t.L(), t.DeprecatedWorkload(), "./workload", c.All()); err != nil {

Nit: can be just c.Put()


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 63 at r2 (raw file):

	// Run the schemachange workload on a random node, along with validating the schema changes for the cluster on a random node.
	schemaChangeAndValidationStep := func(
		ctx context.Context, l *logger.Logger, rand *rand.Rand, h *mixedversion.Helper,

Small comment: I thought we had a linter that stopped us from having variable names that shadowed package names? It seems we don't anymore. I could swear I've hit that linter before 😄


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 66 at r2 (raw file):

	) error {
		numFeatureRuns += 1
		l.Printf("Workload step run: %d", numFeatureRuns)

For my own understanding: what value do you get out of this counter? It seems easy enough to know how many times the workload has run by looking at the test plan.

Nothing wrong with the counter, just want to understand if there's some usability problem to solve.


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 67 at r2 (raw file):

		numFeatureRuns += 1
		l.Printf("Workload step run: %d", numFeatureRuns)
		runCmd := roachtestutil.NewCommand("./workload run schemachange").

Now that you added support for deterministic schemachange operations (:tada:), we should make use of it here to help reproducing test failures with something like:

workloadSeed := rng.Int63()
roachtestutil.NewCommand("COCKROACH_RANDOM_SEED=%d ./workload run schemachange", workloadSeed)

pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 71 at r2 (raw file):

			Flag("max-ops", maxOps).
			Flag("concurrency", concurrency).
			Arg("{pgurl:1-%d}", c.Spec().NodeCount).

Nit: This could be ("{pgurl:%s}", c.All()). Preferred because it's nice to keep the representation of the nodes internal.


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 81 at r2 (raw file):

		if c.IsLocal() {
			// TODO(before merge): is there a c.localCertsDir equivalent to use here?
			fmt.Println("ignore lint err for now")

Not sure I understand what's happening here. Let me know if you have questions about the secure cluster setup.


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 84 at r2 (raw file):

		}
		// Now we validate that nothing is broken after the random schema changes have been run.
		runCmd = roachtestutil.NewCommand("%s debug doctor examine cluster", test.DefaultCockroachPath).

For my own education: what does this command do?


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 88 at r2 (raw file):

			String()
		return c.RunE(ctx,
			option.NodeListOption{h.RandomNode(rand, workloadNodes)},

Reminder that we should probably just have a single workload node.


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 96 at r2 (raw file):

		//if err := c.PutE(ctx, t.L(), t.Cockroach(), "./cockroach-doctor", c.All()); err != nil {
		//	return err
		//}

Can be removed?


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 97 at r2 (raw file):

		//	return err
		//}
		return c.RunE(ctx, c.All(), fmt.Sprintf("./workload init schemachange {pgurl%s}", c.All()))

This is running workload init on every node. Probably makes more sense to run it only once.

Copy link
Copy Markdown
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good overall! my comments are all pretty minor

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @annrpom, @DarrylWong, @herkolategan, and @renatolabs)


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 55 at r2 (raw file):

Just curious: do we "rediscover" bugs if we run it without AlwaysUseLatestPredecessors? If so, which ones?

more of a meta-question, but what value do we get out of the default behavior of not using the latest predecessor? i see two cases:

  • we have not discovered a bug yet. in this case, the bug will exist in the latest predecessor, as well as in versions before the latest predecessor.
  • we have discovered the bug. in this case, only more recent predecessor versions will have the fix, and if we test with older versions, we are liable to "re-discover" the bug.

cc: @rafiss would you by chance know why this is happening? this "alter column irrelevantcolumnname" seems to have existed during operation generation from 21.2.0

I am not sure.. but I see that the error is ERROR: index "IrrelevantIndexName" does not exist, but the commands above are actually touching IrrelevantColumnName. is something missing from the logs?


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 63 at r2 (raw file):

Previously, annrpom (annie pompa) wrote…

O.O

nit: rename rand to r or rng


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 29 at r4 (raw file):

	r.Add(registry.TestSpec{
		// schemachange/mixed-versions tests random schema changes (via the schemachange workload)
		// in a mixed version state, validating that the cluster is still healthy (via .

nit: the comment is incomplete


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 64 at r4 (raw file):

		numFeatureRuns += 1
		l.Printf("Workload step run: %d", numFeatureRuns)
		workloadSeed := rand.Int63()

nit: let's get the seed using _, workloadSeed := randutil.NewTestRand()

i believe that does something a bit smarter so that tests are seeded consistently across runs based on their name. actually, i'm not sure if that applies to roachtests or only to unit tests, but probably better to make the way we get a seed consistent across the codebase.

@renatolabs renatolabs requested a review from rafiss November 14, 2023 17:32
Copy link
Copy Markdown

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @annrpom, @DarrylWong, @herkolategan, and @rafiss)


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 55 at r2 (raw file):

what value do we get out of the default behavior of not using the latest predecessor

  • An upgrade path closer to what customers might actually go through. Not everyone is running the latest patch release at all times, especially in self-hosted.
  • A bug may only become apparent later. For example, v22.2.4 and below could have some behaviour that only becomes relevant when we start running 23.1.
  • Just because a bug exists across all patch releases does not mean they are equally likely to manifest in all of them. This is especially the case historically when larger backports happened somewhat regularly.

These were some of the motivations for the default behaviour. That said, it does come with the trade-off that we might re-discover bugs. In cases where that's a problem for a test, the "always use latest" option is available. I think tests should use the default when possible.


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 64 at r4 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: let's get the seed using _, workloadSeed := randutil.NewTestRand()

i believe that does something a bit smarter so that tests are seeded consistently across runs based on their name. actually, i'm not sure if that applies to roachtests or only to unit tests, but probably better to make the way we get a seed consistent across the codebase.

I'd recommend not using that. That package is primarily written with unit tests in mind, doing string matching on test function names, and that doesn't translate well to roachtests.

In addition, these mixedversion functions should see the same rand.Rand instances when the same COCKROACH_RANDOM_SEED is passed. That should be the main way to reproduce test failures and get consistent random choices across runs:

https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/roachtestutil/mixedversion/README.md#reproducing-failures

Copy link
Copy Markdown
Contributor Author

@annrpom annrpom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @herkolategan, @rafiss, and @renatolabs)


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 40 at r2 (raw file):

Previously, annrpom (annie pompa) wrote…

good question - I think Stan asked this question 2 years ago (https://cockroachlabs.slack.com/archives/C0331MYJ8BW/p1649129613095089?thread_ts=1649084584.440829&cid=C0331MYJ8BW). I asked a colleague and they said: "The only reason was intermittent failures might get worse".

i shall bump the number of maxOps 👯 and stress it

1000 ops seems ok - i don't see any intermittent failures getting worse and throwing an unexpected error or anything. let's stick with this and see how the nightlies like it


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 55 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

what value do we get out of the default behavior of not using the latest predecessor

  • An upgrade path closer to what customers might actually go through. Not everyone is running the latest patch release at all times, especially in self-hosted.
  • A bug may only become apparent later. For example, v22.2.4 and below could have some behaviour that only becomes relevant when we start running 23.1.
  • Just because a bug exists across all patch releases does not mean they are equally likely to manifest in all of them. This is especially the case historically when larger backports happened somewhat regularly.

These were some of the motivations for the default behaviour. That said, it does come with the trade-off that we might re-discover bugs. In cases where that's a problem for a test, the "always use latest" option is available. I think tests should use the default when possible.

For posterity: we were running into some weird command failure with cockroach debug doctor examine:

run\_154213.137443734\_n1\_cockroach-debug-doct: 15:42:13 cluster.go:2189: > ./cockroach debug doctor examine cluster --certs-dir certs
ERROR: current transaction is aborted, commands ignored until end of transaction block
SQLSTATE: 25P02

The trend looked like it fell under the fact that we would stage the current binary on every mixed-version setup - meaning that there would be times where we would see failures due to the fact that the workload really only works 1 back from the current version it is on.

The solution discussed with Rafi was to set AlwaysUseLatestPredecessor, but Renato mentioned we could just stage the workload binary to align with the upgrade being performed.

I will do the latter in a follow-up PR

I tried working this out with

if err := c.Stage(ctx,  t.L(),  "workload",  helper.Context().FromVersion.Version.String(),  "",  workloadNode,  ); err != nil {  
return err  
}

in my OnStartup hook, but end up with an incorrect workload link - it points to cockroach-edge storage bucket and the suffix of the URL looks something like cockroach/workload.v23.1.11 - am I doing something wrong here?


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 63 at r2 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: rename rand to r or rng

done


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 29 at r4 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: the comment is incomplete

done

Copy link
Copy Markdown
Contributor

@DarrylWong DarrylWong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too familiar with the latest predecessor issue but besides that it lgtm

Copy link
Copy Markdown

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice work! 👏

The only comment I'd recommend addressing before merging is the one about --local mode. Others are nice to have but can be follow up work.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @annrpom, @DarrylWong, @herkolategan, and @rafiss)


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 55 at r2 (raw file):
A couple of comments:

The solution discussed with Rafi was to set AlwaysUseLatestPredecessor

I'm confused how this is related to the schemachange workload only working up to 1 version back. AlwaysUseLatestPredecessors forces the test to always to pick the latest patch release when using previous releases; workload would not be involved.

FWIW, I ran this test 20 times on my gceworker without AlwaysUseLatestPredecessors and didn't observe any failures. I still think it's a good idea to not use this option unless it flakes. It's your team's call, but IMO the test is stronger without it.

Renato mentioned we could just stage the workload binary to align with the upgrade being performed

This suggestion was to deal with the fact that workload schemachange only works up to 1 version back. It would unfortunately be a little more complicated than the code you shared, which makes me think that roachtest should provide a higher level API. I'll add this to test eng's backlog, so you can ignore this suggestion for now.

looked like it fell under the fact that we would stage the current binary on every mixed-version setup

Do we understand why debug doctor would fail in this way? And more generally, if the test failed in a similar way in a nightly run, would SQL Foundations have enough information in the artifacts to diagnose the issue? This debug doctor command should be a bit more verbose and mention which statement failed; unless this information is available somewhere other than stdout/stderr.


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 84 at r5 (raw file):

		runCmd = roachtestutil.NewCommand("%s debug doctor examine cluster", test.DefaultCockroachPath).
			Flag("certs-dir", certs).
			String()

I'd strongly recommend adding something like the code below so that this test will work in local mode:

randomNode := h.RandomNode(r, c.All())
doctorURL := fmt.Sprintf("{pgurl:%d}", randomNode)

  // ...
  Flag("url", doctorURL).
  String()

Copy link
Copy Markdown
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @annrpom, @DarrylWong, @herkolategan, and @renatolabs)


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 55 at r2 (raw file):

The solution discussed with Rafi was to set AlwaysUseLatestPredecessor, but Renato mentioned we could just stage the workload binary to align with the upgrade being performed.

small correction: we had discussed using MaxUpgrades(1) to solve the problem of the workload only being compatible with the branch it was build from and the major version before that.

@annrpom
Copy link
Copy Markdown
Contributor Author

annrpom commented Nov 15, 2023

we had discussed using MaxUpgrades(1) to solve the problem of the workload only being compatible with the branch it was build from and the major version before that.

my memory serves me wrong everytime (apologies) - i just had a misunderstanding of AlwaysUseLatestPredecessors and accidentally put some words in Rafi's mouth 💯

Copy link
Copy Markdown
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @annrpom, @DarrylWong, @herkolategan, and @renatolabs)


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 55 at r2 (raw file):

Just because a bug exists across all patch releases does not mean they are equally likely to manifest in all of them. This is especially the case historically when larger backports happened somewhat regularly.

i don't really follow the intuition for this. if a bug is undiscovered, i'd expect it to be just as likely to appear in the latest patch release

also, i feel like not using the latest predecessor will stop making sense as soon as we discover the first bug for which we need to backport a fix. after that happens, then we know for certain that running on earlier predecessor versions is likely to hit the same bug.

@renatolabs renatolabs requested a review from rafiss November 15, 2023 20:00
Copy link
Copy Markdown

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @annrpom, @DarrylWong, @herkolategan, and @rafiss)


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 55 at r2 (raw file):

if a bug is undiscovered, i'd expect it to be just as likely to appear in the latest patch release

I think there is enough non-determinism in every component of the database that backported changes can make certain ordering of events (and their implications) more or less likely to occur. I unfortunately don't have a good example offhand of a bug that would only be revealed by using older versions, but the codebase is complex enough that I believe it's not hard to imagine that it's not impossible to have certain things be more likely in certain patch releases than in others.

I also think this is more important when the feature tested involves dealing with persisted state (in this case, jobs, descriptors, etc).

not using the latest predecessor will stop making sense as soon as we discover the first bug for which we need to backport a fix

I agree; My (again, optional) suggestion is to not have that option unless it's known to be necessary.

Copy link
Copy Markdown
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @annrpom, @DarrylWong, @herkolategan, and @renatolabs)


pkg/cmd/roachtest/tests/mixed_version_schemachange.go line 55 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

if a bug is undiscovered, i'd expect it to be just as likely to appear in the latest patch release

I think there is enough non-determinism in every component of the database that backported changes can make certain ordering of events (and their implications) more or less likely to occur. I unfortunately don't have a good example offhand of a bug that would only be revealed by using older versions, but the codebase is complex enough that I believe it's not hard to imagine that it's not impossible to have certain things be more likely in certain patch releases than in others.

I also think this is more important when the feature tested involves dealing with persisted state (in this case, jobs, descriptors, etc).

not using the latest predecessor will stop making sense as soon as we discover the first bug for which we need to backport a fix

I agree; My (again, optional) suggestion is to not have that option unless it's known to be necessary.

i see. well i guess i'd be fine either way, just getting paranoid about avoiding flakes

This patch updates the `schemachange/mixed-versions` roachtest to use
the new mixed version testing framework - which in turn provides testing
enhancements.

Fixes: cockroachdb#110533
Epic: None

Release note: None
@annrpom
Copy link
Copy Markdown
Contributor Author

annrpom commented Nov 20, 2023

I'd strongly recommend adding something like the code below so that this test will work in local mode:

done!

I have an issue open on the debug doctor command here: #114722

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 20, 2023

Build succeeded:

@craig craig bot merged commit 724e09b into cockroachdb:master Nov 20, 2023
@renatolabs
Copy link
Copy Markdown

Should we backport this to 23.2 / 23.1?

@annrpom
Copy link
Copy Markdown
Contributor Author

annrpom commented Nov 22, 2023

blathers backport 23.2 23.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roachtest: port schemachange/mixed-versions to new mixed-version framework

5 participants