rpc: run grpc interceptors for local RPCs #76306
craig[bot] merged 4 commits into cockroachdb:master
Conversation
Awesome, thanks for tackling this! Will review tomorrow.
knz
left a comment
Reviewed 5 of 7 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/rpc/context.go, line 290 at r1 (raw file):
ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler,
) (interface{}, error) {
	return handler(ctx, req)
I'm confused, what is this interceptor good for? It looks like the identity function to me.
pkg/rpc/context.go, line 727 at r1 (raw file):
i := 0
var callNextInterceptor grpc.UnaryHandler
callNextInterceptor = func(ctx context.Context, req interface{}) (interface{}, error) {
I think it would be useful here and below to explain that you (we) did not invent this code, you essentially copy-pasted it from grpc/server.go, chain.*Interceptor().
Also explain that we're not relying on the grpc mechanism here because we want the chaining to be used also for local calls, not only remote, and grpc only calls the interceptors on remote calls.
(Essentially what you explain in the commit message, but copy-pasted in comments around this file.)
tbg
left a comment
I don't know whether we should try to make this interceptor work or not.
I think we should. Bypassing the auth interceptor makes me a teeny bit nervous, and the special casing around it (with the idx), while not exactly brittle (if it broke, we'd be using the auth interceptors on local RPCs, which would be obvious since they would prevent the local path from working), is unsavory. Also, shouldn't this already work?
Lines 100 to 106 in d7569da
The auth interceptor will treat anything with that context bit set as the system tenant, which is exactly what we want. This also shouldn't incur any additional allocations (if it does, they should be preventable).
Going to hold off on the line-by-line review, but I am generally liking what I see here.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, and @tbg)
pkg/rpc/context.go, line 162 at r1 (raw file):
}
// ServerInterceptorInfo contains the serve interceptors that a server created
server?
pkg/rpc/context.go, line 167 at r1 (raw file):
// UnaryInterceptors lists the interceptors for regular (unary) RPCs.
UnaryInterceptors []grpc.UnaryServerInterceptor
// UnaryAuthIdx is the position of the Auth interceptor withing
within
Code quote:
withing
pkg/rpc/context.go, line 172 at r1 (raw file):
// StreamInterceptors lists the interceptors for streaming RPCs.
StreamInterceptors []grpc.StreamServerInterceptor
// StreamAuthIdx is the position of the Auth interceptor withing
within
Code quote:
withing
What I've done here is broken, unfortunately. I need to work on this more.
1abeee2 to 90474af (force-pushed)
90474af to fa4d730 (force-pushed)
Ready for another look.
The auth interceptor is no longer excluded from the local call path.
knz
left a comment
LGTM with nits
Reviewed 1 of 7 files at r1, 15 of 15 files at r2, 8 of 8 files at r3, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
-- commits, line 56 at r3:
Can you explain in the commit message also why it's OK not to run the client interceptors at this point.
pkg/rpc/context.go, line 568 at r2 (raw file):
ba.AdmissionHeader.SourceLocation = roachpb.AdmissionHeader_LOCAL
// Create a new context from the existing one with the "local request" field set.
// This tells the handler that this is an in-process request, bypassing ctx.Peer checks.
nit: make the context on a standalone line, so it's clear(er) what the comment refers to.
pkg/rpc/context.go, line 577 at r2 (raw file):
) (roachpb.Internal_RangeFeedClient, error) {
ctx, cancel := context.WithCancel(ctx)
ctx, sp := tracing.ChildSpan(ctx, "/cockroach.roachpb.Internal/RangeFeed")
nit: is it possible to extract this method name string from the roachpb package somehow?
pkg/rpc/context.go, line 586 at r2 (raw file):
go func() {
	defer cancel()
	defer sp.Finish()
nit: finish the span before canceling the parent context
pkg/rpc/context.go, line 705 at r3 (raw file):
info := &grpc.StreamServerInfo{
	FullMethod: "/cockroach.roachpb.Internal/RangeFeed",
ditto, use a go const and share the string in both places.
fa4d730 to 0fea769 (force-pushed)
andreimatei
left a comment
I've now added the client interceptors too. PTAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz, @shermanCRL, and @tbg)
Previously, knz (kena) wrote…
Can you explain in the commit message also why it's OK not to run the client interceptors at this point.
I've added the client interceptors too now.
pkg/rpc/context.go line 162 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
server?
done
pkg/rpc/context.go line 167 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
within
removed
pkg/rpc/context.go line 172 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
within
removed
pkg/rpc/context.go line 290 at r1 (raw file):
Previously, knz (kena) wrote…
I'm confused, what is this interceptor good for? It looks like the identity function to me.
this was leftover :S
Removed.
pkg/rpc/context.go line 727 at r1 (raw file):
Previously, knz (kena) wrote…
I think it would be useful here and below to explain that you (we) did not invent this code, you essentially copy-pasted it from
grpc/server.go, chain.*Interceptor().
Also explain that we're not relying on the grpc mechanism here because we want the chaining to be used also for local calls, not only remote, and grpc only calls the interceptors on remote calls.
(Essentially what you explain in the commit message, but copy-pasted in comments around this file.)
I've added a link to gRPC here.
I've added a note to the definition of internalClientAdapter about how it runs interceptors even though it otherwise doesn't use gRPC.
pkg/rpc/context.go line 568 at r2 (raw file):
Previously, knz (kena) wrote…
nit: make the context on a standalone line, so it's clear(er) what the comment refers to.
done
pkg/rpc/context.go line 577 at r2 (raw file):
Previously, knz (kena) wrote…
nit: is it possible to extract this method name string from the roachpb package somehow?
unfortunately not
pkg/rpc/context.go line 586 at r2 (raw file):
Previously, knz (kena) wrote…
nit: finish the span before canceling the parent context
this code is gone. But the Finish() was before the cancel...
pkg/rpc/context.go line 705 at r3 (raw file):
Previously, knz (kena) wrote…
ditto, use a go const and share the string in both places.
done
Before this patch, the gRPC interceptors that we normally run on the client and the server were not being run for "RPCs" to the "Internal" service going to the local node. Such RPCs are done through the localClientAdapter, which bypasses gRPC for performance reasons. Not running these interceptors is bad: they do useful things around creating tasks, tracing, and authentication/authorization. Some of the things done by the interceptors were done separately and redundantly in random places to account for the local RPC case.

This patch unifies the paths of local and remote RPCs by having the localClientAdapter run the client and server interceptors. Care was taken to avoid a performance regression on the local path; still, one new allocation is imposed by the client interceptor interface for (unary) RPC calls: the result of the RPC needs to be allocated before the RPC is made, and the proto that the server returns needs to be copied over. Benchmarks of this change didn't show any regression.

Node.batchInternal was simplified slightly to not create a task, since the server interceptor now does it in all cases. Before, creating the task was redundant in the non-local RPC case.

Fixes cockroachdb#74732

Release note: None
3a729ba to 3ebf6ee (force-pushed)
andreimatei
left a comment
TFTRs!
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @erikgrinaker, @knz, and @tbg)
Build failed (retrying...)
Build succeeded
We recently started running grpc interceptors on local RPCs (in cockroachdb#76306). There's a bug in there causing server-side streaming interceptors to not work as intended.

A server-side streaming interceptor is supposed to wrap a streaming RPC handler, and it has the option of overriding the ServerStream provided to the handler with a custom one. An RPC handler uses the ServerStream to communicate with the caller; custom implementations of ServerStream generally wrap the gRPC implementation, adding more transparent functionality (for example, they can override the Context that the handler runs in).

The bug was causing the custom ServerStream provided by the interceptors to not actually be passed to the RPC handler for local RPCs (i.e. for streaming RPCs going through the internalClientAdapter - and there's only one such RPC: RangeFeed). Instead, the handler was always getting the vanilla gRPC ServerStream. The bug was in the following lines in internalClientAdapter.RangeFeed():

  rfAdapter := rangeFeedServerAdapter{rangeFeedPipe: streamPipe}
  handler := func(srv interface{}, stream grpc.ServerStream) error {
    return a.server.RangeFeed(args, rfAdapter)
  }

We're passing `rfAdapter` to a.server.RangeFeed() when we should be passing `stream`. Making that change needs a bit of massaging, though. I've also cleaned up the code surrounding the streaming interceptors and tried to explain it better, as it was hard to follow. I've also added a test. The code still remains pretty convoluted, but I believe that's inherently necessary given the gRPC interfaces we're working with.

The bug was manifesting itself as span use-after-finish crashes in tests for the RangeFeed RPC. Because the RangeFeed handler was not using the ServerStream provided by the tracing server-side streaming interceptor, the handler was running in the client's Context, with the client-side tracing span. The client's (DistSender's) span is sometimes finished while the server-side RPC is still running, resulting in the use-after-finish.

Fixes cockroachdb#82603 Fixes cockroachdb#82579 Fixes cockroachdb#82581 Fixes cockroachdb#82585 Fixes cockroachdb#82577

Release note: None
80090: streamingccl: make completeStreamIngestion use jobs.Update API r=gh-casper a=gh-casper
Currently, completeStreamIngestion uses direct queries against the jobs table to select the job and update the job record. Now we are using the jobs.Update API. Release note: None Closes: #78415

82293: sql: support geospatial index for index recommendation indexrec r=wenyihu6 a=wenyihu6
Previously, we did not consider geospatial inverted indexes or box2d indexes for index recommendation. But for queries that use index-accelerated geospatial functions or box2d operations, the spatial index should be recommended. This PR adds spatial indexes to the index candidates. Fixes: #80934 Release note (sql change): index recommendations are now supported for spatial indexes

82482: bazel: add more detail to `dev test` cobra msg r=irfansharif,rickystewart a=msbutler
Fixes #82411 Release note: None

82490: kvserver: improve the delegate snapshot code r=andreimatei a=andreimatei
See individual commits.

82532: sql/sem/builtins: fix span double finish r=andreimatei a=andreimatei
The tracing span was finished twice on the error path, which is illegal. Release note: None

82621: rpc: fix passing of ServerStreams to RangeFeed local RPCs r=andreimatei a=andreimatei
We recently started running grpc interceptors on local RPCs (in #76306). There's a bug in there causing server-side streaming interceptors to not work as intended; see the full commit message above. Fixes #82603 Fixes #82579 Fixes #82581 Fixes #82585 Fixes #82577 Release note: None

82637: bazel: bump size of `schemachangerccl`, `pkg/sql/catalog/lease` tests r=rail a=rickystewart
Release note: None

Co-authored-by: Casper <casper@cockroachlabs.com>
Co-authored-by: wenyihu3 <wenyi.hu@cockroachlabs.com>
Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Previous commit cockroachdb#76306 arranged for interceptors to run for all RPCs (local or otherwise). In doing so, it added an additional goroutine for each RPC whose purpose was to detect context completion and to arrange for the cleanup function to run when that happened. `ctxutil.WhenDone` (added by cockroachdb#103866/commits/dfe4ef82810aaf81ae21afecce83cbc1fb6783dd) allows for the same functionality without the additional goroutine cost.

Informs cockroachdb#107053

Release note: None
107079: grpcinterceptor: Use ctxutil to detect context cancellation r=miretskiy a=miretskiy
Previous commit #76306 arranged for interceptors to run for all RPCs (local or otherwise). In doing so, it added an additional goroutine for each RPC whose purpose was to detect context completion and to arrange for the cleanup function to run when that happened. `ctxutil.WhenDone` (added by #103866/commits/dfe4ef82810aaf81ae21afecce83cbc1fb6783dd) allows for the same functionality without the additional goroutine cost. Informs #107053 Release note: None

107267: kvcoord: make rangefeed connection class `TenantReadOnly` r=erikgrinaker a=erikgrinaker
Otherwise, the setting won't have any effect for tenants, always falling through to the default. Resolves #106895. Epic: none Release note: None

107273: build: use absolute path to pnpm-lock.yaml r=maryliag a=sjbarag
Previously, actions/setup-node@v3 was used with a relative path for `cache-dependency-path`. This had an unexpected interaction with the `.defaults.run.working-directory: "pkg/ui/workspaces/cluster-ui"` declaration, which caused setup-node's cleanup action to look for pkg/ui/workspaces/cluster-ui/pkg/ui/pnpm-lock.yaml. Naturally, this file didn't exist. Use an absolute path (thanks, `github.workspace`!) to pnpm-lock.yaml instead. Release note: None Epic: none

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Co-authored-by: Sean Barag <barag@cockroachlabs.com>
…tion Backport 1/2 commits from cockroachdb#103866. Backport 1/1 commits from cockroachdb#107079.

Previous commit cockroachdb#76306 arranged for interceptors to run for all RPCs (local or otherwise). In doing so, it added an additional goroutine for each RPC whose purpose was to detect context completion and to arrange for the cleanup function to run when that happened. `ctxutil.WhenDone` (added by cockroachdb#103866/commits/dfe4ef82810aaf81ae21afecce83cbc1fb6783dd) allows for the same functionality without the additional goroutine cost.

Informs cockroachdb#107053

Release note: None

Release justification: Important performance improvement for large-scale rangefeed workloads to remove an inadvertently introduced goroutine.