sql: setup http router after version initialization#138343
sql: setup http router after version initialization#138343craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
dec70d6 to
db46929
Compare
| func registerHTTPRestart(r registry.Registry) { | ||
| r.Add(registry.TestSpec{ | ||
| Name: "http-restart/mixed-version", | ||
| Owner: registry.OwnerTestEng, |
There was a problem hiding this comment.
nit: should test eng own this or obs? I'd argue that obs should own it since its testing that the http server is registering routes at the proper time.
There was a problem hiding this comment.
Above reasoning makes sense to me; +1 for obs.
There was a problem hiding this comment.
Ack. Since we want to keep ownership I kept it in its own test instead of adding to version-upgrade.
| } | ||
| } | ||
|
|
||
| for i := 1; i <= 4; i++ { |
There was a problem hiding this comment.
I think we should use for _, node := range c.CRDBNodes() {... instead of a hardcoded size
There was a problem hiding this comment.
Yep, hardcoding should be eschewed because it makes the test configuration brittle, i.e., difficult to refactor.
DarrylWong
left a comment
There was a problem hiding this comment.
Left a few comments, mostly on my own code 😅
The regression test feels pretty lightweight, I wonder if it needs to be it's own test or if we can stick it in version-upgrade. Maybe only in non local runs if we are worried about keeping the acceptance test light.
| } | ||
| if err := client.PostProtobuf(ctx, url, &request, &response); err != nil { | ||
| if logEvery.ShouldLog() { | ||
| l.Printf("Error posting protobuf: %s", err) |
There was a problem hiding this comment.
nit: maybe prefix message with node ID
There was a problem hiding this comment.
Also wonder if this error checking should be smarter i.e. check that it suceeds at least once. Could definitely see a papercut in the url such that the http req always fails and we never catch it.
| return | ||
| default: | ||
| } | ||
| // Copied from elsewhere, actual HTTP req shouldn't matter |
There was a problem hiding this comment.
Interesting, I tried changing this to a simpler request (/_status/vars) and couldn't get it to fail after stressing 10 times. Is a tsdb request much faster so we hit the race condition (much) more often?
There was a problem hiding this comment.
TSDB request doesn't require auth. from what I recall. (Unless that changed recently.)
There was a problem hiding this comment.
@DarrylWong You were just unlucky in selecting /_status/vars as it doesn't use the authenticated handler.
TSDB reqs don't require any sort of sql role but they do go through the auth http handler.
| } | ||
|
|
||
| for i := 1; i <= 4; i++ { | ||
| mvt.BackgroundFunc("HTTP requests", func(ctx context.Context, l *logger.Logger, rng *rand.Rand, h *mixedversion.Helper) error { |
There was a problem hiding this comment.
Why not OnStartup? Its execution always precedes any background func. (See testPlanner.testStartSteps.)
There was a problem hiding this comment.
The test seemed to stall when I put it in OnStartup so I kept it in BackgroundFunc.
There was a problem hiding this comment.
BackgroundFunc runs in a goroutine managed by framework. The framework knows to not block on these steps and expects them to run for the duration of the test. OnStartup is a normal step so the test will block until the step is finished. If you wanted to achieve something similar in OnStartup you'd have to add your own concurrency control with h.GoWithCancel().
To answer Stan's question, there are no startup steps here so the ordering of the hooks is the exact same. So I just went with BackgroundFunc since the ergonomics of it are slightly cleaner imo, no strong preference though.
Ideally it should run before any nodes are started, that way we don't miss out on testing the first time it starts up. There's currently no way to do that in framework though 😞
There was a problem hiding this comment.
Ideally it should run before any nodes are started, that way we don't miss out on testing the first time it starts up. There's currently no way to do that in framework though.
Right, that was my thinking too...BackgroundFunc may not be able to catch a regression on every run. I suppose that's a reasonable compromise for now. I wonder if we should create a follow-up issue to add the missing feature? To my knowledge, this is the first such use-case where we do want to execute a user-defined func. immediately at startup.
There was a problem hiding this comment.
I think the timing to pull that off makes it complicated. c.Start handles both the selecting the adminUIPorts and starting CRDB, but we'd need to interleave this hook between the two steps. A workaround would be to hardcode the ports used, but that also isn't supported out of the box since mixed version framework controls the creation of clusters/tenants.
There was a problem hiding this comment.
Yep. We'd need to refactor a bit. E.g., c.Start will take an optional callback func; it's called immediately after we know the port(s), i.e., after generateStartArgs. Then, we expose a new user hook inside the mixedversion framework; it gets invoked before the crdb process is started. It seems rather contrived to satisfy this particular use-case, but some other use-case will probably come up in the future. Let's file it away in a new issue so that we don't have to fish for it; worst-case it will linger like many others from ages ago :)
db46929 to
b2564a6
Compare
e75abbb to
030804e
Compare
Previously the http routes for secondary tenants were set up after sql version initialization. As of cockroachdb#127170, the version initialization occurs in the sql server prestart, which occurs just after the router setup. It became possible to hit the http routes before the version was set, leading to fatal errors as the http routes rely on the sql version being set. Epic: none Fixes: cockroachdb#138342 Release note (bug fix): Secondary tenants will not fatal when issuing HTTP requests during tenant startup.
030804e to
331596c
Compare
|
TFTR! |
|
Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches. Issue #138342: branch-release-24.2, branch-release-24.3. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error setting reviewers, but backport branch blathers/backport-release-24.2-138343 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/138754/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. [] Backport to branch 24.2.x failed. See errors above. error setting reviewers, but backport branch blathers/backport-release-24.3-138343 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/138755/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. [] Backport to branch 24.3.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Previously the http routes were set up after sql version initialization. As of #127170, the version initialization occurs in the sql server prestart, which occurs just after router setup. It became possible to hit the http routes before the version was set, leading to panics as the http routes rely on the sql version being set.
Epic: none
Fixes: #138342