server: reduce API discrepancies between SQL-only servers and regular servers#75852
server: reduce API discrepancies between SQL-only servers and regular servers#75852craig[bot] merged 8 commits intocockroachdb:masterfrom
Conversation
de46182 to
cf8a39b
Compare
f286bad to
ae05f35
Compare
dhartunian
left a comment
There was a problem hiding this comment.
Very exciting!
Reviewed 3 of 3 files at r1, 5 of 5 files at r2, 4 of 4 files at r3, 3 of 3 files at r4, 6 of 6 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @knz, @rimadeodhar, and @srosenberg)
-- commits, line 16 at r3:
nit: maybe mention that this is moving configs from KVConfig into BaseConfig. Seems like those flags should have been in base anyway.
pkg/server/server_http.go, line 116 at r4 (raw file):
s.mux.Handle(adminHealth, handleRequestsUnauthenticated) // The /_status/vars endpoint is not authenticated either. Useful for monitoring. s.mux.Handle(statusVars, http.HandlerFunc(varsHandler{metricSource, s.cfg.Settings}.handleVars))
I saw this pattern in the httpServer as well that I am modifying in my PR here: https://github.com/cockroachdb/cockroach/pull/72659/files#diff-104a11712684e520f0affa95e6a053faeb713306a10f69ee99b9729cc9dd1996R908
If you make varsHandler implement http.Handler you no longer need to wrap in http.HandlerFunc. Or is there a reason to prefer the wrapping? Since we already have a struct representing the service, it seems natural to have it implement the Handler interface.
pkg/server/tenant.go, line 181 at r6 (raw file):
s.stopper, grpcMain, baseCfg.AdvertiseAddr,
Just to confirm my understanding for this change: all addresses should come frombaseCfg which gets updated when we run startListenRPCAndSQL above and the pgL listener it returns should not be used to query for addresses.
pkg/server/tenant.go, line 198 at r6 (raw file):
httpServer := newHTTPServer(baseCfg) httpServer.handleHealth(gwMux)
should we consider creating the server and handling health earlier than here? the non-tenant code seems to call this a lot earlier than setupRoutes.
knz
left a comment
There was a problem hiding this comment.
thanks for the early review! I hope to have the next iteration ready tomorrow.
Do you have some ideas about why TestTenantGRPC is unhappy in CI?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, @rimadeodhar, and @srosenberg)
pkg/server/server_http.go, line 116 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
I saw this pattern in the httpServer as well that I am modifying in my PR here: https://github.com/cockroachdb/cockroach/pull/72659/files#diff-104a11712684e520f0affa95e6a053faeb713306a10f69ee99b9729cc9dd1996R908
If you make
varsHandlerimplementhttp.Handleryou no longer need to wrap inhttp.HandlerFunc. Or is there a reason to prefer the wrapping? Since we already have a struct representing the service, it seems natural to have it implement theHandlerinterface.
Let's discuss this out of the review and come back to it after the discussion completes.
pkg/server/tenant.go, line 181 at r6 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Just to confirm my understanding for this change: all addresses should come from
baseCfgwhich gets updated when we runstartListenRPCAndSQLabove and thepgLlistener it returns should not be used to query for addresses.
yes
rimadeodhar
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, and @srosenberg)
pkg/server/tenant_admin.go, line 55 at r1 (raw file):
// TODO(knz): add Drain implementation here. // TODO(rima): add Nodes implementation here for debug zip.
I'm confused about why we would want to add Nodes implementation to the tenantAdminServer as opposed to the tenantStatusServer? It is a part of the statusClient currently. Also, would we want to implement the Nodes API at all for SQL only servers given that the NodesResponse object contains a fair number of fields that don't have much meaning in the SQL only server context? If it is mainly about having parity between the tenantStatusServer and statusServer then that makes sense but the usability of the Nodes API would be fairly restricted for the SQL only servers
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, @rimadeodhar, and @srosenberg)
pkg/server/tenant.go, line 198 at r6 (raw file):
Previously, dhartunian (David Hartunian) wrote…
should we consider creating the server and handling health earlier than here? the non-tenant code seems to call this a lot earlier than
setupRoutes.
Yes that would perhaps be a good idea, at least after we split the constructor code from the "start" code.
(As is done for non-tenant code, but not currently the case here.)
However, that would be a different type of change, which I believe is beyond the scope of the PR here. I wanted to keep this one neutral wrt initialization order so as to limit the amount of code movement.
pkg/server/tenant_admin.go, line 55 at r1 (raw file):
It is a part of the statusClient currently.
Oh my, you are right. I got myself confused too. Removed this comment.
Also, would we want to implement the Nodes API at all for SQL only servers given that the NodesResponse object contains a fair number of fields that don't have much meaning in the SQL only server context?
What you did in the other PR, adding a new endpoint, is fine. However, I can imagine a transition period, where we increase coverage of the existing API to bridge compatibility with the existing UI code (which uses Nodes), and only seek to remove/switch APIs in a second phase.
24f4fc4 to
ca97c3e
Compare
|
@maryliag @dhartunian can you have folk in your team review the test API changes and the unit test changes in ccl/serverccl. (And conversely, we need to train every member of every team to understand that any network API needs authentication + authorization checks. It's unfortunate that unit tests get written and approved during reviews without checks for authn/authz.) |
ca97c3e to
664ab79
Compare
664ab79 to
d13f54e
Compare
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @rimadeodhar, and @srosenberg)
Previously, dhartunian (David Hartunian) wrote…
nit: maybe mention that this is moving configs from
KVConfigintoBaseConfig. Seems like those flags should have been in base anyway.
Done.
knz
left a comment
There was a problem hiding this comment.
Do you have some ideas about why TestTenantGRPC is unhappy in CI?
This was the missing authentication. Fixed it.
The PR is now ready for review. Thanks
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @rimadeodhar, and @srosenberg)
d13f54e to
e7fca53
Compare
knz
left a comment
There was a problem hiding this comment.
If so desired, I can break it up in smaller PRs, one per commit?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @rimadeodhar, and @srosenberg)
e1fff17 to
8756f14
Compare
8756f14 to
2d9b5e0
Compare
b46ac6b to
a94e77b
Compare
maryliag
left a comment
There was a problem hiding this comment.
Regarding the unit test changes in ccl/serverccl:
Reviewed 3 of 6 files at r14, 1 of 4 files at r15, 1 of 3 files at r16, 1 of 6 files at r17, 6 of 11 files at r18, 3 of 3 files at r19, 3 of 3 files at r20, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @dhartunian, @rimadeodhar, and @srosenberg)
Release note: None
This ensures that the login/logout endpoints become available. Release note: None
... since it does not need a KVConfig. This moves `EnableDemoLoginEndpoint` and `EnableWebSessionAuthentication` from KVConfig to BaseConfig, where they had belonged from the start. Release note: None
Release note: None
Release note (api change): The `/_status/load` endpoint, which delivers an instant measurement of CPU load, is now available for regular CockroachDB nodes and not just multitenant SQL-only servers. (This is arguably a bug fix - this endpoint should have been available from the start.)
This change reuses the common HTTP server infrastructure for SQL-only servers, the same one used for regular nodes. This achieves multiple milestones: - it makes it possible to serve the DB console on SQL-only servers. - it makes it possible to navigate SQL sessions and statements using a web browser. - it properly authenticates the `/debug` APIs. At this point, many parts of the UI are still "blank" (or report "method not implemented") because the corresponding server-side support is not yet ... implemented. This will come incrementally. Release note: None
... and tenant.go, into `newSQLServer()` where it belongs. Release note: None
Release note: None
a94e77b to
d6d3b32
Compare
dhartunian
left a comment
There was a problem hiding this comment.
(And conversely, we need to train every member of every team to understand that any network API needs authentication + authorization checks. It's unfortunate that unit tests get written and approved during reviews without checks for authn/authz.)
Regarding this issue, to what extent can this be nudged by making auth the default mode? Renaming GetHTTPClient to GetUnauthenticatedHTTPClient() could be enough to help people have awareness that security boundaries are being crossed.
Reviewed 17 of 17 files at r13, 6 of 6 files at r14, 22 of 22 files at r21, 5 of 5 files at r22, 4 of 4 files at r23, 3 of 3 files at r24, 6 of 6 files at r25, 11 of 11 files at r26, 3 of 3 files at r27, 3 of 3 files at r28, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @knz, @rimadeodhar, and @srosenberg)
pkg/server/testserver.go, line 844 at r26 (raw file):
} // AdminURL implements TestServerInterface.
Do we need to take off these comments?
pkg/testutils/serverutils/test_server_shim.go, line 143 at r26 (raw file):
// GetAuthenticatedHTTPClient returns an http client which has been // authenticated to access Admin API methods (via a cookie). GetAuthenticatedHTTPClient(isAdmin bool) (http.Client, error)
I don't quite follow why these methods get removed from TestServerInterface but get added to TestTenantInterface.
knz
left a comment
There was a problem hiding this comment.
Renaming GetHTTPClient to GetUnauthenticatedHTTPClient() could be enough to help people have awareness that security boundaries are being crossed.
Ooh this is a great idea. Let's do this. Filed as #76241
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @dhartunian, @rimadeodhar, and @srosenberg)
pkg/server/testserver.go, line 844 at r26 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Do we need to take off these comments?
Which comments? I think they are removed in the final commit.
pkg/testutils/serverutils/test_server_shim.go, line 143 at r26 (raw file):
Previously, dhartunian (David Hartunian) wrote…
I don't quite follow why these methods get removed from
TestServerInterfacebut get added toTestTenantInterface.
TestServerInterface embeds TestTenantInterface.
Maybe we need to rename these two interfaces (maybe a different PR):
- TestTenantInterface contains the common APIs, valid for both sql-only and mixed servers.
- TestServerInterface contains the APIs that are only useful for mixed KV/SQL servers.
|
I'm going to go ahead and merge this, and then we can follow-up with more PRs as needed for more polish. bors r=dhartunian,rimadheodar,maryliag |
|
Build succeeded: |
In cockroachdb#75852 these methods have recently been added to the underlying TestTenantInterface, so we no longer need to stub them out in this shim. Release note: None
76210: util/tracing: reduce allocations in EnsureForkSpan r=andreimatei a=andreimatei This patch pre-allocates capacity for the options slice in EnsureForkSpan. Before, we were reallocating the slice storage repeatedly, as it was growing from 0 to 1 and from 1 to 2 elements. This patch also simplifies, and slighly speeds up, EnsureChildSpan. This guy was using a sync.Pool for its options slice. That is not necessary, since the slice storage does not escape. In the benchmarks below, detached-child=false is EnsureChildSpan and detached-child=true is EnsureForkSpan. > make bench PKG=./pkg/util/tracing BENCHES=BenchmarkSpanCreation TESTFLAGS="-count=5 -cpu=1,2,4,8,16" name old time/op new time/op delta SpanCreation/detached-child=false 1.70µs ± 1% 1.61µs ± 1% -5.06% (p=0.008 n=5+5) SpanCreation/detached-child=false-2 1.17µs ± 7% 1.04µs ± 3% -10.70% (p=0.008 n=5+5) SpanCreation/detached-child=false-4 958ns ± 2% 893ns ±10% -6.76% (p=0.008 n=5+5) SpanCreation/detached-child=false-8 1.11µs ±38% 0.95µs ± 1% ~ (p=0.310 n=5+5) SpanCreation/detached-child=false-16 925ns ± 1% 919ns ± 1% ~ (p=0.286 n=4+5) SpanCreation/detached-child=true 2.27µs ± 2% 1.88µs ± 0% -17.27% (p=0.008 n=5+5) SpanCreation/detached-child=true-2 1.82µs ±10% 1.33µs ±22% -26.81% (p=0.008 n=5+5) SpanCreation/detached-child=true-4 1.29µs ± 2% 1.02µs ± 1% -21.35% (p=0.008 n=5+5) SpanCreation/detached-child=true-8 1.16µs ± 1% 0.91µs ± 5% -21.56% (p=0.008 n=5+5) SpanCreation/detached-child=true-16 1.14µs ± 2% 0.97µs ± 1% -15.08% (p=0.008 n=5+5) name old alloc/op new alloc/op delta SpanCreation/detached-child=false 288B ± 0% 288B ± 0% ~ (all equal) SpanCreation/detached-child=false-2 288B ± 0% 288B ± 0% ~ (all equal) SpanCreation/detached-child=false-4 288B ± 0% 288B ± 0% ~ (all equal) SpanCreation/detached-child=false-8 288B ± 0% 288B ± 0% ~ (all equal) SpanCreation/detached-child=false-16 288B ± 0% 288B ± 0% ~ (all equal) SpanCreation/detached-child=true 768B ± 0% 288B ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-2 768B ± 0% 288B ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-4 768B ± 0% 288B ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-8 768B ± 0% 288B ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-16 769B ± 0% 288B ± 0% -62.55% (p=0.008 n=5+5) name old allocs/op new allocs/op delta SpanCreation/detached-child=false 6.00 ± 0% 6.00 ± 0% ~ (all equal) SpanCreation/detached-child=false-2 6.00 ± 0% 6.00 ± 0% ~ (all equal) SpanCreation/detached-child=false-4 6.00 ± 0% 6.00 ± 0% ~ (all equal) SpanCreation/detached-child=false-8 6.00 ± 0% 6.00 ± 0% ~ (all equal) SpanCreation/detached-child=false-16 6.00 ± 0% 6.00 ± 0% ~ (all equal) SpanCreation/detached-child=true 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-2 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-4 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-8 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-16 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5) Release note: None 76308: changefeedccl: remove methods from testServerShim r=miretskiy a=stevendanna In #75852 these methods have recently been added to the underlying TestTenantInterface, so we no longer need to stub them out in this shim. Release note: None 76309: build: disable metamorphic constants when calling execgen r=irfansharif a=stevendanna Commands such as `./dev test --verbose` might previously produce two log lines everytime it happens to have to call the genrule for execgen targets: ``` INFO: From Executing genrule //pkg/sql/colexec/colexecwindow:gen-window-framer: initialized metamorphic constant "span-reuse-rate" with value 10 ``` This is because execgen has the tracing package in its dependency tree ``` > go list -f '{{ join .Deps "\n" }}' ./pkg/sql/colexec/execgen/ | grep tracing | head -1 github.com/cockroachdb/cockroach/pkg/util/tracing ``` and that package instantiates this metamorphic constant. This PR sets COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING=true when calling the generator so we don't get metamorphic constants enabled. Something like #75065 would give us more complete coverage of commands that might be generating noise. Release note: None 76338: jobs: De-flake TestTransientTxnErrors test. r=miretskiy a=miretskiy Fix rare, but silly flake in TestTransientTxnErrors where the `executeSchedules` go routine could run before we were ready. Fixes #65045 Release Notes: None Co-authored-by: Andrei Matei <andrei@cockroachlabs.com> Co-authored-by: Steven Danna <danna@cockroachlabs.com> Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
In cockroachdb#75852 these methods have recently been added to the underlying TestTenantInterface, so we no longer need to stub them out in this shim. Release note: None
As of this PR, all the RPCs are left unimplemented (by inheriting from UnimplementedAdminServer).
This also serves the UI (DB Console) on SQL servers, using the same code as regular servers.
Now we can incrementally add more API endpoints as needed.