roachtest: port follower-reads/mixed-version/single-region to new framework#115317
roachtest: port follower-reads/mixed-version/single-region to new framework#115317craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
57a429d to
b7312a5
Compare
srosenberg
left a comment
There was a problem hiding this comment.
new mixed-version framework requires clusters to have 4 nodes (unless we want to disable fixtures)
I thought any subset of the fixtures is usable; i.e., InstallFixtures should work for 1<= |node| <= 4, no?
Reviewed all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @kvoli, @nvanbenschoten, and @renatolabs)
pkg/cmd/roachtest/tests/follower_reads.go line 899 at r1 (raw file):
// well with secure clusters. As of the time of writing, they return // either of the following errors: // tls: failed to verify certificate: x509: “node” certificate is not standards compliant
@renatolabs Do we have a fix for this? Perhaps, something like,
tls.Config{InsecureSkipVerify: true}since generating client certs signed by known authority sounds like more work.
There was a problem hiding this comment.
I thought any subset of the fixtures is usable; i.e.,
InstallFixturesshould work for1<= |node| <= 4, no?
Answering my own question, unfortunately the answer is "no". The fixtures contain the gossiped metadata which refers to n4. Thus, while it's in theory possible to bootstrap 3-node clusters, upgrade and other things may fail until the reference to n4 is removed. We could attempt to remove it via,
s.db.PutInline(ctx, keys.NodeStatusKey(nodeID), nil)
as is done in the node decommissioning logic, but it seems like a can of worms. I suppose sticking with 4-nodes in this case is not a deficiency. For future use-cases, we could also generate fixtures for 3-nodes.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @kvoli, @nvanbenschoten, and @renatolabs)
pkg/cmd/roachtest/tests/follower_reads.go line 899 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
@renatolabs Do we have a fix for this? Perhaps, something like,
tls.Config{InsecureSkipVerify: true}since generating client certs signed by known authority sounds like more work.
Unfortunately, it goes much deeper than that. getFollowerReadCounts (uses $adminURL/_status/vars) doesn't actually require authentication, which seem a bit surprising... I guess we allow metrics to be read by the whole internet?! However, verifySQLLatency (uses $adminURL/ts/query) requires password-based authentication. We currently don't have a nice programmatic API for doing that. I've documented the findings [1].
[1] #63145 (comment)
kvoli
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @nvanbenschoten, and @renatolabs)
pkg/cmd/roachtest/tests/follower_reads.go line 741 at r1 (raw file):
var response tspb.TimeSeriesQueryResponse if err := httputil.PostProtobuf(ctx, http.Client{}, url, &request, &response); err != nil {
This may begin failing when querying v22.2 versions or earlier from #98077:
n1 n2 n3
#released versions v22.1.7 v22.1.7 v22.1.7
#logical binary versions 22.1 22.1 22.1
#cluster versions 22.1 22.1 22.1
# Querying n1 here
....
400 Bad Request, content-type: application/json, body: {
"error": "unknown field \"tenantId\" in tspb.Query",
"code": 3,
"message": "unknown field \"tenantId\" in tspb.Query",
"details": [
]
}
A workaround is limiting the max upgrades, to prevent the cluster version from starting pre-22.2.
mixedversion.MaxUpgrades(1)
kvoli
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @nvanbenschoten, and @renatolabs)
pkg/cmd/roachtest/tests/follower_reads.go line 741 at r1 (raw file):
Previously, kvoli (Austen) wrote…
This may begin failing when querying v22.2 versions or earlier from #98077:
n1 n2 n3 #released versions v22.1.7 v22.1.7 v22.1.7 #logical binary versions 22.1 22.1 22.1 #cluster versions 22.1 22.1 22.1 # Querying n1 here .... 400 Bad Request, content-type: application/json, body: { "error": "unknown field \"tenantId\" in tspb.Query", "code": 3, "message": "unknown field \"tenantId\" in tspb.Query", "details": [ ] }A workaround is limiting the max upgrades, to prevent the cluster version from starting pre-22.2.
mixedversion.MaxUpgrades(1)
This is only an issue with the JSON http method used by roachtests/tests/ts_utils.go, ignore the above suggestion.
nvb
left a comment
There was a problem hiding this comment.
Answering my own question, unfortunately the answer is "no".
Thanks for looking into this. Sticking with 4-nodes is actually a nice improvement to test coverage, so I'll keep it here. For other tests which are more sensitive to the node count, I imagine we will be more motivated to generate fixtures for 3-nodes.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @kvoli, and @renatolabs)
pkg/cmd/roachtest/tests/follower_reads.go line 741 at r1 (raw file):
Previously, kvoli (Austen) wrote…
This is only an issue with the JSON http method used by
roachtests/tests/ts_utils.go, ignore the above suggestion.
This test does use httputil.PostJSON up above in verifySQLLatency. I'll switch that to PostProtobuf like you're doing in #115559.
I'm not sure why I didn't run into issues while stressing.
renatolabs
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @DarrylWong, @kvoli, and @srosenberg)
pkg/cmd/roachtest/tests/follower_reads.go line 741 at r1 (raw file):
I'm not sure why I didn't run into issues while stressing.
I believe that's because verifySQLLatency is only called in multi-region mode, which is not used in the mixed-version test.
pkg/cmd/roachtest/tests/follower_reads.go line 899 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Unfortunately, it goes much deeper than that.
getFollowerReadCounts(uses$adminURL/_status/vars) doesn't actually require authentication, which seem a bit surprising... I guess we allow metrics to be read by the whole internet?! However,verifySQLLatency(uses$adminURL/ts/query`) requires password-based authentication. We currently don't have a nice programmatic API for doing that. I've documented the findings [1].[1] #63145 (comment)
FWIW, we have some prior art in calling HTTP endpoints in secure roachtests in [1]. Not a big deal in this case, at least for now. We should package this functionality in a more consumable way and at that point we can come back and update this test.
b7312a5 to
fd25719
Compare
|
TFTRs! bors r+ |
|
Build failed: |
…mework Fixes cockroachdb#115133. This commit ports the `follower-reads/mixed-version/single-region` roachtest to the new mixed-version framework, as described in cockroachdb#110528. One note for reviewers is that the new mixed-version framework requires clusters to have 4 nodes (unless we want to disable fixtures), so part of the work here was to adapt the test to work with 4 nodes. That's worthwhile anyway, because testing the case where a gateway has no replica (leaseholder or follower) for the table is interesting. Another note is that the fixtures that the framework restores already have a database called "test", so this commit renames its own multi-region database to "mr_db" to avoid confusion. Release note: None
fd25719 to
bbf5ad4
Compare
|
Fixed unused lint warning for bors r+ |
|
Build succeeded: |
|
Should we backport this to 23.2 (or even 23.1)? |
|
Backported to v23.2 and v23.1. |
Fixes #115133.
This commit ports the
follower-reads/mixed-version/single-regionroachtest to the new mixed-version framework, as described in #110528.One note for reviewers is that the new mixed-version framework requires clusters to have 4 nodes (unless we want to disable fixtures), so part of the work here was to adapt the test to work with 4 nodes. That's worthwhile anyway, because testing the case where a gateway has no replica (leaseholder or follower) for the table is interesting.
Release note: None