Skip to content

roachtest: port follower-reads/mixed-version/single-region to new framework#115317

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
nvb:nvanbenschoten/fix115133
Dec 11, 2023
Merged

roachtest: port follower-reads/mixed-version/single-region to new framework#115317
craig[bot] merged 1 commit intocockroachdb:masterfrom
nvb:nvanbenschoten/fix115133

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented Nov 30, 2023

Fixes #115133.

This commit ports the follower-reads/mixed-version/single-region roachtest to the new mixed-version framework, as described in #110528.

One note for reviewers is that the new mixed-version framework requires clusters to have 4 nodes (unless we want to disable fixtures), so part of the work here was to adapt the test to work with 4 nodes. That's worthwhile anyway, because testing the case where a gateway has no replica (leaseholder or follower) for the table is interesting.

Release note: None

@nvb nvb requested a review from kvoli November 30, 2023 06:33
@nvb nvb requested a review from a team as a code owner November 30, 2023 06:33
@nvb nvb requested review from DarrylWong and renatolabs and removed request for a team November 30, 2023 06:33
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@nvb nvb force-pushed the nvanbenschoten/fix115133 branch from 57a429d to b7312a5 Compare November 30, 2023 06:34
@nvb nvb changed the title roachtest: port ollower-reads/mixed-version/single-region to new framework roachtest: port follower-reads/mixed-version/single-region to new framework Nov 30, 2023
Copy link
Copy Markdown
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new mixed-version framework requires clusters to have 4 nodes (unless we want to disable fixtures)

I thought any subset of the fixtures is usable; i.e., InstallFixtures should work for 1<= |node| <= 4, no?

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @kvoli, @nvanbenschoten, and @renatolabs)


pkg/cmd/roachtest/tests/follower_reads.go line 899 at r1 (raw file):

		// well with secure clusters. As of the time of writing, they return
		// either of the following errors:
		//  tls: failed to verify certificate: x509: “node” certificate is not standards compliant

@renatolabs Do we have a fix for this? Perhaps, something like,

tls.Config{InsecureSkipVerify: true}

since generating client certs signed by known authority sounds like more work.

Copy link
Copy Markdown
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought any subset of the fixtures is usable; i.e., InstallFixtures should work for 1<= |node| <= 4, no?

Answering my own question, unfortunately the answer is "no". The fixtures contain the gossiped metadata which refers to n4. Thus, while it's in theory possible to bootstrap 3-node clusters, upgrade and other things may fail until the reference to n4 is removed. We could attempt to remove it via,

s.db.PutInline(ctx, keys.NodeStatusKey(nodeID), nil)

as is done in the node decommissioning logic, but it seems like a can of worms. I suppose sticking with 4-nodes in this case is not a deficiency. For future use-cases, we could also generate fixtures for 3-nodes.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @kvoli, @nvanbenschoten, and @renatolabs)


pkg/cmd/roachtest/tests/follower_reads.go line 899 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

@renatolabs Do we have a fix for this? Perhaps, something like,

tls.Config{InsecureSkipVerify: true}

since generating client certs signed by known authority sounds like more work.

Unfortunately, it goes much deeper than that. getFollowerReadCounts (uses $adminURL/_status/vars) doesn't actually require authentication, which seem a bit surprising... I guess we allow metrics to be read by the whole internet?! However, verifySQLLatency (uses $adminURL/ts/query) requires password-based authentication. We currently don't have a nice programmatic API for doing that. I've documented the findings [1].

[1] #63145 (comment)

Copy link
Copy Markdown
Contributor

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @nvanbenschoten, and @renatolabs)


pkg/cmd/roachtest/tests/follower_reads.go line 741 at r1 (raw file):

	var response tspb.TimeSeriesQueryResponse
	if err := httputil.PostProtobuf(ctx, http.Client{}, url, &request, &response); err != nil {

This may begin failing when querying v22.2 versions or earlier from #98077:

                                           n1          n2          n3
#released versions           v22.1.7    v22.1.7     v22.1.7
#logical binary versions      22.1        22.1        22.1
#cluster versions                 22.1        22.1        22.1
# Querying n1 here
....
400 Bad Request, content-type: application/json, body: {
  "error": "unknown field \"tenantId\" in tspb.Query",
  "code": 3,
  "message": "unknown field \"tenantId\" in tspb.Query",
  "details": [
  ]
}

A workaround is limiting the max upgrades, to prevent the cluster version from starting pre-22.2.

mixedversion.MaxUpgrades(1)

Copy link
Copy Markdown
Contributor

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @nvanbenschoten, and @renatolabs)


pkg/cmd/roachtest/tests/follower_reads.go line 741 at r1 (raw file):

Previously, kvoli (Austen) wrote…

This may begin failing when querying v22.2 versions or earlier from #98077:

                                           n1          n2          n3
#released versions           v22.1.7    v22.1.7     v22.1.7
#logical binary versions      22.1        22.1        22.1
#cluster versions                 22.1        22.1        22.1
# Querying n1 here
....
400 Bad Request, content-type: application/json, body: {
  "error": "unknown field \"tenantId\" in tspb.Query",
  "code": 3,
  "message": "unknown field \"tenantId\" in tspb.Query",
  "details": [
  ]
}

A workaround is limiting the max upgrades, to prevent the cluster version from starting pre-22.2.

mixedversion.MaxUpgrades(1)

This is only an issue with the JSON http method used by roachtests/tests/ts_utils.go, ignore the above suggestion.

@nvb nvb requested a review from kvoli December 5, 2023 22:43
Copy link
Copy Markdown
Contributor Author

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering my own question, unfortunately the answer is "no".

Thanks for looking into this. Sticking with 4-nodes is actually a nice improvement to test coverage, so I'll keep it here. For other tests which are more sensitive to the node count, I imagine we will be more motivated to generate fixtures for 3-nodes.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @kvoli, and @renatolabs)


pkg/cmd/roachtest/tests/follower_reads.go line 741 at r1 (raw file):

Previously, kvoli (Austen) wrote…

This is only an issue with the JSON http method used by roachtests/tests/ts_utils.go, ignore the above suggestion.

This test does use httputil.PostJSON up above in verifySQLLatency. I'll switch that to PostProtobuf like you're doing in #115559.

I'm not sure why I didn't run into issues while stressing.

Copy link
Copy Markdown

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @DarrylWong, @kvoli, and @srosenberg)


pkg/cmd/roachtest/tests/follower_reads.go line 741 at r1 (raw file):

I'm not sure why I didn't run into issues while stressing.

I believe that's because verifySQLLatency is only called in multi-region mode, which is not used in the mixed-version test.


pkg/cmd/roachtest/tests/follower_reads.go line 899 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Unfortunately, it goes much deeper than that. getFollowerReadCounts (uses $adminURL/_status/vars) doesn't actually require authentication, which seem a bit surprising... I guess we allow metrics to be read by the whole internet?! However, verifySQLLatency(uses$adminURL/ts/query`) requires password-based authentication. We currently don't have a nice programmatic API for doing that. I've documented the findings [1].

[1] #63145 (comment)

FWIW, we have some prior art in calling HTTP endpoints in secure roachtests in [1]. Not a big deal in this case, at least for now. We should package this functionality in a more consumable way and at that point we can come back and update this test.

[1] https://github.com/cockroachdb/cockroach/blob/release-23.1/pkg/cmd/roachtest/tests/mixed_version_tenant_span_stats.go#L140

@nvb nvb force-pushed the nvanbenschoten/fix115133 branch from b7312a5 to fd25719 Compare December 11, 2023 04:38
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Dec 11, 2023

TFTRs!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 11, 2023

Build failed:

…mework

Fixes cockroachdb#115133.

This commit ports the `follower-reads/mixed-version/single-region`
roachtest to the new mixed-version framework, as described in cockroachdb#110528.

One note for reviewers is that the new mixed-version framework requires
clusters to have 4 nodes (unless we want to disable fixtures), so part
of the work here was to adapt the test to work with 4 nodes. That's
worthwhile anyway, because testing the case where a gateway has no
replica (leaseholder or follower) for the table is interesting.

Another note is that the fixtures that the framework restores already
have a database called "test", so this commit renames its own multi-region
database to "mr_db" to avoid confusion.

Release note: None
@nvb nvb force-pushed the nvanbenschoten/fix115133 branch from fd25719 to bbf5ad4 Compare December 11, 2023 05:15
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Dec 11, 2023

Fixed unused lint warning for upgradeNodes, which was due to skew with 55e2d6f.

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 11, 2023

Build succeeded:

@craig craig bot merged commit 8f91e03 into cockroachdb:master Dec 11, 2023
@renatolabs
Copy link
Copy Markdown

Should we backport this to 23.2 (or even 23.1)?

@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Jan 4, 2024

Backported to v23.2 and v23.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roachtest: port follower-reads/mixed-version/single-region to the new mixed-version framework

5 participants