Skip to content

server: reduce API discrepancies between SQL-only servers and regular servers#75852

Merged
craig[bot] merged 8 commits intocockroachdb:masterfrom
knz:20220202-tenant-admin
Feb 8, 2022
Merged

server: reduce API discrepancies between SQL-only servers and regular servers#75852
craig[bot] merged 8 commits intocockroachdb:masterfrom
knz:20220202-tenant-admin

Conversation

@knz
Copy link
Copy Markdown
Contributor

@knz knz commented Feb 2, 2022

As of this PR, all the RPCs are left unimplemented (by inheriting from UnimplementedAdminServer).

This also serves the UI (DB Console) on SQL servers, using the same code as regular servers.

Now we can incrementally add more API endpoints as needed.

@knz knz requested review from a team, dhartunian and rimadeodhar February 2, 2022 11:56
@blathers-crl blathers-crl bot added the T-server-and-security DB Server & Security label Feb 2, 2022
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@knz knz changed the title server: deliver the AdminServer RPC interface on SQL-only servers. server: deliver the AuthServer & AdminServer RPC interfaces on SQL-only servers. Feb 2, 2022
@knz knz force-pushed the 20220202-tenant-admin branch from de46182 to cf8a39b Compare February 2, 2022 12:12
@knz knz requested a review from a team as a code owner February 2, 2022 12:59
@knz knz requested review from srosenberg and removed request for a team February 2, 2022 12:59
@knz knz changed the title server: deliver the AuthServer & AdminServer RPC interfaces on SQL-only servers. server: reduce API discrepancies between SQL-only servers and regular servers Feb 2, 2022
@knz knz force-pushed the 20220202-tenant-admin branch 2 times, most recently from f286bad to ae05f35 Compare February 2, 2022 15:05
Copy link
Copy Markdown
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very exciting!

Reviewed 3 of 3 files at r1, 5 of 5 files at r2, 4 of 4 files at r3, 3 of 3 files at r4, 6 of 6 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @rimadeodhar, and @srosenberg)


-- commits, line 16 at r3:
nit: maybe mention that this is moving configs from KVConfig into BaseConfig. Seems like those flags should have been in base anyway.


pkg/server/server_http.go, line 116 at r4 (raw file):

	s.mux.Handle(adminHealth, handleRequestsUnauthenticated)
	// The /_status/vars endpoint is not authenticated either. Useful for monitoring.
	s.mux.Handle(statusVars, http.HandlerFunc(varsHandler{metricSource, s.cfg.Settings}.handleVars))

I saw this pattern in the httpServer as well that I am modifying in my PR here: https://github.com/cockroachdb/cockroach/pull/72659/files#diff-104a11712684e520f0affa95e6a053faeb713306a10f69ee99b9729cc9dd1996R908

If you make varsHandler implement http.Handler you no longer need to wrap in http.HandlerFunc. Or is there a reason to prefer the wrapping? Since we already have a struct representing the service, it seems natural to have it implement the Handler interface.


pkg/server/tenant.go, line 181 at r6 (raw file):

		s.stopper,
		grpcMain,
		baseCfg.AdvertiseAddr,

Just to confirm my understanding for this change: all addresses should come frombaseCfg which gets updated when we run startListenRPCAndSQL above and the pgL listener it returns should not be used to query for addresses.


pkg/server/tenant.go, line 198 at r6 (raw file):

	httpServer := newHTTPServer(baseCfg)

	httpServer.handleHealth(gwMux)

should we consider creating the server and handling health earlier than here? the non-tenant code seems to call this a lot earlier than setupRoutes.

Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the early review! I hope to have the next iteration ready tomorrow.

Do you have some ideas about why TestTenantGRPC is unhappy in CI?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, @rimadeodhar, and @srosenberg)


pkg/server/server_http.go, line 116 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

I saw this pattern in the httpServer as well that I am modifying in my PR here: https://github.com/cockroachdb/cockroach/pull/72659/files#diff-104a11712684e520f0affa95e6a053faeb713306a10f69ee99b9729cc9dd1996R908

If you make varsHandler implement http.Handler you no longer need to wrap in http.HandlerFunc. Or is there a reason to prefer the wrapping? Since we already have a struct representing the service, it seems natural to have it implement the Handler interface.

Let's discuss this out of the review and come back to it after the discussion completes.


pkg/server/tenant.go, line 181 at r6 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Just to confirm my understanding for this change: all addresses should come frombaseCfg which gets updated when we run startListenRPCAndSQL above and the pgL listener it returns should not be used to query for addresses.

yes

Copy link
Copy Markdown
Collaborator

@rimadeodhar rimadeodhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, and @srosenberg)


pkg/server/tenant_admin.go, line 55 at r1 (raw file):

// TODO(knz): add Drain implementation here.
// TODO(rima): add Nodes implementation here for debug zip.

I'm confused about why we would want to add Nodes implementation to the tenantAdminServer as opposed to the tenantStatusServer? It is a part of the statusClient currently. Also, would we want to implement the Nodes API at all for SQL only servers given that the NodesResponse object contains a fair number of fields that don't have much meaning in the SQL only server context? If it is mainly about having parity between the tenantStatusServer and statusServer then that makes sense but the usability of the Nodes API would be fairly restricted for the SQL only servers

Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, @rimadeodhar, and @srosenberg)


pkg/server/tenant.go, line 198 at r6 (raw file):

Previously, dhartunian (David Hartunian) wrote…

should we consider creating the server and handling health earlier than here? the non-tenant code seems to call this a lot earlier than setupRoutes.

Yes that would perhaps be a good idea, at least after we split the constructor code from the "start" code.
(As is done for non-tenant code, but not currently the case here.)

However, that would be a different type of change, which I believe is beyond the scope of the PR here. I wanted to keep this one neutral wrt initialization order so as to limit the amount of code movement.


pkg/server/tenant_admin.go, line 55 at r1 (raw file):

It is a part of the statusClient currently.

Oh my, you are right. I got myself confused too. Removed this comment.

Also, would we want to implement the Nodes API at all for SQL only servers given that the NodesResponse object contains a fair number of fields that don't have much meaning in the SQL only server context?

What you did in the other PR, adding a new endpoint, is fine. However, I can imagine a transition period, where we increase coverage of the existing API to bridge compatibility with the existing UI code (which uses Nodes), and only seek to remove/switch APIs in a second phase.

@knz knz force-pushed the 20220202-tenant-admin branch 2 times, most recently from 24f4fc4 to ca97c3e Compare February 4, 2022 10:07
@knz knz requested a review from a team as a code owner February 4, 2022 10:07
@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 4, 2022

@maryliag @dhartunian can you have folk in your team review the test API changes and the unit test changes in ccl/serverccl.
I think it's important for folk to understand that unit tests too need to use authentication to use HTTP APIs.

(And conversely, we need to train every member of every team to understand that any network API needs authentication + authorization checks. It's unfortunate that unit tests get written and approved during reviews without checks for authn/authz.)

@knz knz force-pushed the 20220202-tenant-admin branch from ca97c3e to 664ab79 Compare February 4, 2022 10:29
@knz knz requested a review from a team February 4, 2022 10:29
@knz knz force-pushed the 20220202-tenant-admin branch from 664ab79 to d13f54e Compare February 4, 2022 12:42
Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @rimadeodhar, and @srosenberg)


-- commits, line 16 at r3:

Previously, dhartunian (David Hartunian) wrote…

nit: maybe mention that this is moving configs from KVConfig into BaseConfig. Seems like those flags should have been in base anyway.

Done.

Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have some ideas about why TestTenantGRPC is unhappy in CI?

This was the missing authentication. Fixed it.

The PR is now ready for review. Thanks

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @rimadeodhar, and @srosenberg)

@knz knz force-pushed the 20220202-tenant-admin branch from d13f54e to e7fca53 Compare February 4, 2022 12:52
Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so desired, I can break it up in smaller PRs, one per commit?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @rimadeodhar, and @srosenberg)

@knz knz force-pushed the 20220202-tenant-admin branch 2 times, most recently from e1fff17 to 8756f14 Compare February 4, 2022 13:13
@knz knz force-pushed the 20220202-tenant-admin branch from 8756f14 to 2d9b5e0 Compare February 4, 2022 13:54
@knz knz force-pushed the 20220202-tenant-admin branch 2 times, most recently from b46ac6b to a94e77b Compare February 7, 2022 11:29
Copy link
Copy Markdown
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the unit test changes in ccl/serverccl: :lgtm:

Reviewed 3 of 6 files at r14, 1 of 4 files at r15, 1 of 3 files at r16, 1 of 6 files at r17, 6 of 11 files at r18, 3 of 3 files at r19, 3 of 3 files at r20, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dhartunian, @rimadeodhar, and @srosenberg)

knz added 8 commits February 8, 2022 17:22
This ensures that the login/logout endpoints become available.

Release note: None
... since it does not need a KVConfig.

This moves `EnableDemoLoginEndpoint` and
`EnableWebSessionAuthentication` from KVConfig to BaseConfig, where
they had belonged from the start.

Release note: None
Release note (api change): The `/_status/load` endpoint, which
delivers an instant measurement of CPU load, is now available
for regular CockroachDB nodes and not just multitenant SQL-only
servers.
(This is arguably a bug fix - this endpoint should have been available
from the start.)
This change reuses the common HTTP server infrastructure for SQL-only
servers, the same one used for regular nodes.

This achieves multiple milestones:

- it makes it possible to serve the DB console on SQL-only servers.
- it makes it possible to navigate SQL sessions and statements
  using a web browser.
- it properly authenticates the `/debug` APIs.

At this point, many parts of the UI are still "blank" (or report
"method not implemented") because the corresponding server-side
support is not yet ... implemented. This will come incrementally.

Release note: None
... and tenant.go, into `newSQLServer()` where it belongs.

Release note: None
@knz knz force-pushed the 20220202-tenant-admin branch from a94e77b to d6d3b32 Compare February 8, 2022 16:24
Copy link
Copy Markdown
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: just a few small questions

(And conversely, we need to train every member of every team to understand that any network API needs authentication + authorization checks. It's unfortunate that unit tests get written and approved during reviews without checks for authn/authz.)

Regarding this issue, to what extent can this be nudged by making auth the default mode? Renaming GetHTTPClient to GetUnauthenticatedHTTPClient() could be enough to help people have awareness that security boundaries are being crossed.

Reviewed 17 of 17 files at r13, 6 of 6 files at r14, 22 of 22 files at r21, 5 of 5 files at r22, 4 of 4 files at r23, 3 of 3 files at r24, 6 of 6 files at r25, 11 of 11 files at r26, 3 of 3 files at r27, 3 of 3 files at r28, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @knz, @rimadeodhar, and @srosenberg)


pkg/server/testserver.go, line 844 at r26 (raw file):

}

// AdminURL implements TestServerInterface.

Do we need to take off these comments?


pkg/testutils/serverutils/test_server_shim.go, line 143 at r26 (raw file):

	// GetAuthenticatedHTTPClient returns an http client which has been
	// authenticated to access Admin API methods (via a cookie).
	GetAuthenticatedHTTPClient(isAdmin bool) (http.Client, error)

I don't quite follow why these methods get removed from TestServerInterface but get added to TestTenantInterface.

Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming GetHTTPClient to GetUnauthenticatedHTTPClient() could be enough to help people have awareness that security boundaries are being crossed.

Ooh this is a great idea. Let's do this. Filed as #76241

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @dhartunian, @rimadeodhar, and @srosenberg)


pkg/server/testserver.go, line 844 at r26 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Do we need to take off these comments?

Which comments? I think they are removed in the final commit.


pkg/testutils/serverutils/test_server_shim.go, line 143 at r26 (raw file):

Previously, dhartunian (David Hartunian) wrote…

I don't quite follow why these methods get removed from TestServerInterface but get added to TestTenantInterface.

TestServerInterface embeds TestTenantInterface.

Maybe we need to rename these two interfaces (maybe a different PR):

  • TestTenantInterface contains the common APIs, valid for both sql-only and mixed servers.
  • TestServerInterface contains the APIs that are only useful for mixed KV/SQL servers.

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 8, 2022

I'm going to go ahead and merge this, and then we can follow-up with more PRs as needed for more polish.

bors r=dhartunian,rimadheodar,maryliag

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 8, 2022

Build succeeded:

@craig craig bot merged commit f20c79e into cockroachdb:master Feb 8, 2022
@knz knz deleted the 20220202-tenant-admin branch February 8, 2022 21:39
stevendanna added a commit to stevendanna/cockroach that referenced this pull request Feb 9, 2022
In cockroachdb#75852 these methods have recently been added to the underlying
TestTenantInterface, so we no longer need to stub them out in this
shim.

Release note: None
craig bot pushed a commit that referenced this pull request Feb 10, 2022
76210: util/tracing: reduce allocations in EnsureForkSpan r=andreimatei a=andreimatei

This patch pre-allocates capacity for the options slice in
EnsureForkSpan. Before, we were reallocating the slice storage
repeatedly, as it was growing from 0 to 1 and from 1 to 2 elements.

This patch also simplifies, and slighly speeds up, EnsureChildSpan. This
guy was using a sync.Pool for its options slice. That is not necessary,
since the slice storage does not escape.

In the benchmarks below, detached-child=false is EnsureChildSpan and
detached-child=true is EnsureForkSpan.

> make bench PKG=./pkg/util/tracing BENCHES=BenchmarkSpanCreation TESTFLAGS="-count=5 -cpu=1,2,4,8,16"
name                                  old time/op    new time/op    delta
SpanCreation/detached-child=false       1.70µs ± 1%    1.61µs ± 1%   -5.06%  (p=0.008 n=5+5)
SpanCreation/detached-child=false-2     1.17µs ± 7%    1.04µs ± 3%  -10.70%  (p=0.008 n=5+5)
SpanCreation/detached-child=false-4      958ns ± 2%     893ns ±10%   -6.76%  (p=0.008 n=5+5)
SpanCreation/detached-child=false-8     1.11µs ±38%    0.95µs ± 1%     ~     (p=0.310 n=5+5)
SpanCreation/detached-child=false-16     925ns ± 1%     919ns ± 1%     ~     (p=0.286 n=4+5)
SpanCreation/detached-child=true        2.27µs ± 2%    1.88µs ± 0%  -17.27%  (p=0.008 n=5+5)
SpanCreation/detached-child=true-2      1.82µs ±10%    1.33µs ±22%  -26.81%  (p=0.008 n=5+5)
SpanCreation/detached-child=true-4      1.29µs ± 2%    1.02µs ± 1%  -21.35%  (p=0.008 n=5+5)
SpanCreation/detached-child=true-8      1.16µs ± 1%    0.91µs ± 5%  -21.56%  (p=0.008 n=5+5)
SpanCreation/detached-child=true-16     1.14µs ± 2%    0.97µs ± 1%  -15.08%  (p=0.008 n=5+5)

name                                  old alloc/op   new alloc/op   delta
SpanCreation/detached-child=false         288B ± 0%      288B ± 0%     ~     (all equal)
SpanCreation/detached-child=false-2       288B ± 0%      288B ± 0%     ~     (all equal)
SpanCreation/detached-child=false-4       288B ± 0%      288B ± 0%     ~     (all equal)
SpanCreation/detached-child=false-8       288B ± 0%      288B ± 0%     ~     (all equal)
SpanCreation/detached-child=false-16      288B ± 0%      288B ± 0%     ~     (all equal)
SpanCreation/detached-child=true          768B ± 0%      288B ± 0%  -62.50%  (p=0.008 n=5+5)
SpanCreation/detached-child=true-2        768B ± 0%      288B ± 0%  -62.50%  (p=0.008 n=5+5)
SpanCreation/detached-child=true-4        768B ± 0%      288B ± 0%  -62.50%  (p=0.008 n=5+5)
SpanCreation/detached-child=true-8        768B ± 0%      288B ± 0%  -62.50%  (p=0.008 n=5+5)
SpanCreation/detached-child=true-16       769B ± 0%      288B ± 0%  -62.55%  (p=0.008 n=5+5)

name                                  old allocs/op  new allocs/op  delta
SpanCreation/detached-child=false         6.00 ± 0%      6.00 ± 0%     ~     (all equal)
SpanCreation/detached-child=false-2       6.00 ± 0%      6.00 ± 0%     ~     (all equal)
SpanCreation/detached-child=false-4       6.00 ± 0%      6.00 ± 0%     ~     (all equal)
SpanCreation/detached-child=false-8       6.00 ± 0%      6.00 ± 0%     ~     (all equal)
SpanCreation/detached-child=false-16      6.00 ± 0%      6.00 ± 0%     ~     (all equal)
SpanCreation/detached-child=true          16.0 ± 0%       6.0 ± 0%  -62.50%  (p=0.008 n=5+5)
SpanCreation/detached-child=true-2        16.0 ± 0%       6.0 ± 0%  -62.50%  (p=0.008 n=5+5)
SpanCreation/detached-child=true-4        16.0 ± 0%       6.0 ± 0%  -62.50%  (p=0.008 n=5+5)
SpanCreation/detached-child=true-8        16.0 ± 0%       6.0 ± 0%  -62.50%  (p=0.008 n=5+5)
SpanCreation/detached-child=true-16       16.0 ± 0%       6.0 ± 0%  -62.50%  (p=0.008 n=5+5)

Release note: None

76308: changefeedccl: remove methods from testServerShim r=miretskiy a=stevendanna

In #75852 these methods have recently been added to the underlying
TestTenantInterface, so we no longer need to stub them out in this
shim.

Release note: None

76309: build: disable metamorphic constants when calling execgen r=irfansharif a=stevendanna

Commands such as `./dev test --verbose` might previously produce
two log lines everytime it happens to have to call the genrule for
execgen targets:

```
INFO: From Executing genrule //pkg/sql/colexec/colexecwindow:gen-window-framer:
initialized metamorphic constant "span-reuse-rate" with value 10
```

This is because execgen has the tracing package in its dependency tree

```
> go list -f '{{ join .Deps "\n" }}' ./pkg/sql/colexec/execgen/ | grep tracing | head -1
github.com/cockroachdb/cockroach/pkg/util/tracing
```

and that package instantiates this metamorphic constant.

This PR sets COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING=true when
calling the generator so we don't get metamorphic constants enabled.

Something like #75065 would give us more complete coverage of commands
that might be generating noise.

Release note: None

76338: jobs: De-flake TestTransientTxnErrors test. r=miretskiy a=miretskiy

Fix rare, but silly flake in TestTransientTxnErrors where the
`executeSchedules` go routine could run before we were ready.

Fixes #65045

Release Notes: None

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
RajivTS pushed a commit to RajivTS/cockroach that referenced this pull request Mar 6, 2022
In cockroachdb#75852 these methods have recently been added to the underlying
TestTenantInterface, so we no longer need to stub them out in this
shim.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

T-server-and-security DB Server & Security

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants