roachprod: remove default port assumption in start and start-sql#114097
roachprod: remove default port assumption in start and start-sql#114097craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
2e2af7a to
5392f16
Compare
|
drive-by comment: a lot of the ORM tests rely on a known port. i do think there might be a way to refactor a lot of them to be aware of this. |
|
Thanks for the heads up! And yeah, the ORM tests aren't alone in this. I got over 100 roachtest fails 😢 so refactoring them to be compatible will be the bulk of this PR. |
5b7e957 to
df4e698
Compare
Another drive-by comment: the intention is just to change the default port and remove assumptions on any particular assignment. Individual tests that rely on a specific configuration, such as ORM tests, should be able to choose a ports to bind but that is now made explicit by setting |
8923adf to
822a673
Compare
2a7184b to
a022097
Compare
27617fc to
a8f2747
Compare
DarrylWong
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @herkolategan and @srosenberg)
pkg/cmd/roachtest/roachtestutil/utils.go line 40 at r1 (raw file):
I suppose most tests don't care about the node id, right?
Correct.
we can randomize the index
Yes, I think that would be easy enough. Unfortunately I think the problem is that this randomization needs to happen outside of this helper.
i.e., if we're calling c.Run(ctx, c.Node(1), "./cockroach workload fixtures import tpcc '{url}'"), the url would have to be for node 1.
The solution would be to refactor all the c.Node(1) calls in the code to instead use RandNode(). but given "In any case, this is probably a moot point since we plan to switch to load-balanced MIGs", I'm not sure if this is worth it, and definitely seems out of the scope of this PR to me.
pkg/cmd/roachtest/tests/admission_control_intent_resolution.go line 74 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
I wonder if we should encode this into
TestSpec, since it's a property/requirement of a specific test? By making this requirement implicit, we can't readily extract all tests which support non-default ports, other than through source code analysis.
Hmm, I agree that it would be nice, but it's not obvious to me the best way to do this. These port values are used in c.Start() which has no knowledge of the test spec.
One solution would be to rework DefaultStartOpts() to take in the TestSpec, and return appropriate start opts there based on the TestSpec. The DefaultStartOptsNoBackups() DefaultStartSingleNodeOpts() and DefaultStartOptsInMemory() functions fall under the same line of thinking so we could make those part of TestSpec while we're at it. But that would touch every roachtest and might be out of the scope of this PR?
I'm open to other solutions or thoughts on the above solution! For what it's worth, I do think this would be a far less troublesome refactor than randomizing the node used.
pkg/cmd/roachtest/tests/alterpk.go line 116 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Remove?
Done.
pkg/cmd/roachtest/tests/tpchvec.go line 580 at r1 (raw file):
I see now that I can combine this cmd and the one below, but I don't think I quite get what you mean by refactoring maybeExpandPgPort. Do you mean making a helper to find the port using c.DiscoverService that can be used both here and in maybeExpandPgPort?
unless this is a one-off.
It's also done in npgsql, so one-off given the above combining commands change.
|
Above comments addressed offline with Stan. Reran/fixed the failing roachtests, and confirmed they all pass except multitenant-upgrade which appears to be a unrelated issue. Will leave this open a bit longer though in case anyone else has comments 😄 |
Previously, the default sql port and admin ui port were set to 26257 and 26258 respectively in start and start-sql. This change now removes that default assignment and now randomly assigns an available open port. Many roachtests rely on this default port assumption and fail after this change. To fix this, we now explicity specify the port when applicable. Some roachtests use a third party library, where the port cannot be easily passed in and is hardcoded. In these cases, we retain the previous behaviour by specifying StartOpts.RoachprodOpts.SQLPort.
c4055ea to
b9f2d19
Compare
|
TFTR! bors r=srosenberg |
|
Build succeeded: |
|
Reminder to self: cherrypick #116452 when this is backported |
After cockroachdb#114097, we need to specify ports for cockroach node connections. This one was missed and timeseries were not being collected in test failures. Release note: none Epic: none
116218: roachtest: run consistency checks after tests r=erikgrinaker a=erikgrinaker This patch runs replica consistency checks during post-test assertions. Previously, this only checked for stats inconsistencies, it now checks for actual data inconsistencies. This is useful to detect data corruption bugs, typically in e.g. Pebble or Raft snapshots. The consistency check runs at full speed for up to 20 minutes before timing out, and will report any inconsistencies found during a partial run. This is sufficient to check about 200 GB of data. Tests can opt out by skipping `registry.PostValidationReplicaDivergence`. Storage checkpoints are not yet taken when inconsistencies are detected, since it requires additional support in SQL builtins to take them at the same Raft log index across nodes. This will be addressed separately. Touches #115770. Epic: none Release note: None 116614: sql: add telemetry for mixed DDL/DML transactions r=rafiss a=rafiss This patch adds feature counter telemetry for explicit transactions that have schema changes. We track if a transaction has DDL only or a mixture of DDL and DML, and if it succeeded or failed. fixes #115012 Release note: None 116672: sql: skip TestExperimentalRelocateNonVoters under duress r=rafiss a=rafiss fixes #115935 Release note: None 116695: roachtest: specify port for post test timeseries collection r=renatolabs a=DarrylWong After #114097, we need to specify ports for cockroach node connections. This one was missed and timeseries were not being collected in test failures. Release note: none Epic: none Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com> Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com> Co-authored-by: DarrylWong <darryl@cockroachlabs.com>
The change cockroachdb#114097 removed default port assumptions for SQLPort and AdminUIPort, and instead finds open ports to assign. However, Prometheus scraping relies on this hardcoded AdminUIPort as it can't discover the port itself. This change temporarily reverts AdminUIPort back to the hardcoded port. Release note: none Epic: none Informs: cockroachdb#117125
117141: roachprod: revert AdminUIPort back to DefaultAdminUIPort r=srosenberg a=DarrylWong The change #114097 removed default port assumptions for SQLPort and AdminUIPort, and instead finds open ports to assign. However, Prometheus scraping relies on this hardcoded AdminUIPort as it can't discover the port itself. This change temporarily reverts AdminUIPort back to the hardcoded port. Release note: none Epic: none Informs: #117125 Co-authored-by: DarrylWong <darryl@cockroachlabs.com>
This was originally removed in cockroachdb#115599 due to cockroachdb#114097 merging, but adminui was reverted in cockroachdb#117141 and mistakenly did not revert the special case for virtual clusters. Release note: None
This was originally removed in cockroachdb#115599 due to cockroachdb#114097 merging, but adminui was reverted in cockroachdb#117141 and mistakenly did not revert the special case for virtual clusters. Release note: None
Previously, the default sql port and admin ui port were set to 26257
and 26258 respectively in start and start-sql. This change now removes
that default assignment and now randomly assigns an available open port.
Many roachtests rely on this default port assumption and fail after this
change. To fix this, we now explicity specify the port when applicable.
Some roachtests use a third party library, where the port cannot be easily
passed in and is hardcoded. In these cases, we retain the previous behaviour
by specifying StartOpts.RoachprodOpts.SQLPort.
Epic: None
Fixes: #111052
Release note: none