Skip to content

cli: add tracing to debug send-kv-batch#75519

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
knz:20220125-kv-req
Feb 9, 2022
Merged

cli: add tracing to debug send-kv-batch#75519
craig[bot] merged 1 commit intocockroachdb:masterfrom
knz:20220125-kv-req

Conversation

@knz
Copy link
Copy Markdown
Contributor

@knz knz commented Jan 25, 2022

This extends the debug send-kv-batch CLI utility to optionally
collect and print a trace of the remote processing.

(big thanks to @andreimatei for the help)

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@knz knz force-pushed the 20220125-kv-req branch 2 times, most recently from 76609fe to 678282d Compare January 25, 2022 21:56
Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you! Out of curiosity, was there a particular need that motivated this?

Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)


pkg/cli/debug.go, line 1704 at r1 (raw file):

	f.BoolVar(&debugSendKVBatchOpts.verboseTracing, "verbose", debugSendKVBatchOpts.verboseTracing, "whether to set the verbose flag in the trace span before the request")
	f.BoolVar(&debugSendKVBatchOpts.printRecording, "print-recording", debugSendKVBatchOpts.printRecording, "whether to print the trace recording at the end")
	f.BoolVar(&debugSendKVBatchOpts.keepCollectedSpans, "keep-collected-spans", debugSendKVBatchOpts.keepCollectedSpans, "whether to keep the CollectedSpans field on the response")

nit: consider breaking these lines up.


pkg/cli/debug_send_kv_batch.go, line 104 at r1 (raw file):

	if debugSendKVBatchOpts.printRecording {
		// Not useful unless verbose is set.
		debugSendKVBatchOpts.verboseTracing = true

nit: is there a standard Cobra way to do argument pre-processing before the main work function?


pkg/cli/debug_send_kv_batch.go, line 140 at r1 (raw file):

	admin := serverpb.NewAdminClient(conn)

	br, rec, err := setupTraceThenSendKVBatchRequest(ctx, tracer, admin, jsonpb, &ba)

jsonpb doesn't appear to be used in setupTraceThenSendKVBatchRequest, so remove the parameter.


pkg/cli/debug_send_kv_batch.go, line 184 at r1 (raw file):

	// Run the request and collect the trace.
	br, err := doSendKVBatchRequestInternal(ctx, admin, jsonpb, ba)

nit: I think I'd prefer to inline doSendKVBatchRequestInternal, since that's also mostly fiddling with the trace.


pkg/server/node.go, line 956 at r1 (raw file):

}

func executeBatchRPCInternalFn(

nit: since this is now shared both by Node and adminServer, consider pulling it out into a different file.


pkg/server/node.go, line 966 at r1 (raw file):

	processorName redact.SafeString,
	senderName redact.SafeString,
	sender interface {

nit: we already have a named interface for this, kv.Sender.

Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: the commit title is a bit vague, consider e.g. "add tracing support for debug send-kv-batch"

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/cli/debug.go, line 1702 at r1 (raw file):

	f = debugSendKVBatchCmd.Flags()
	f.BoolVar(&debugSendKVBatchOpts.verboseTracing, "verbose", debugSendKVBatchOpts.verboseTracing, "whether to set the verbose flag in the trace span before the request")

I don't think anybody will ever want a non-verbose trace. I'd replace --verbose and --print-recording with a single flag called --trace, that prints a verbose recording.

But another flag that I think would be useful would be a flag for a jaeger file name. You can use Recording.ToJaegerJSON() to produce it.


pkg/cli/debug.go, line 1704 at r1 (raw file):

	f.BoolVar(&debugSendKVBatchOpts.verboseTracing, "verbose", debugSendKVBatchOpts.verboseTracing, "whether to set the verbose flag in the trace span before the request")
	f.BoolVar(&debugSendKVBatchOpts.printRecording, "print-recording", debugSendKVBatchOpts.printRecording, "whether to print the trace recording at the end")
	f.BoolVar(&debugSendKVBatchOpts.keepCollectedSpans, "keep-collected-spans", debugSendKVBatchOpts.keepCollectedSpans, "whether to keep the CollectedSpans field on the response")

Nobody will understand what this flag does. Let's get rid of it and always wipe br.CollectedSpans if tracing was enabled.


pkg/cli/debug_send_kv_batch.go, line 27 at r1 (raw file):

)

// TODO(knz): this struct belongs elsewhere.

I clicked on the linked and didn't immediately see what that issues has to do with the TODO. I doubt that this TODO adds anything. I'd say either address it or remove it.


pkg/cli/debug_send_kv_batch.go, line 168 at r1 (raw file):

}

func setupTraceThenSendKVBatchRequest(

just sendRequest :)


pkg/cli/debug_send_kv_batch.go, line 177 at r1 (raw file):

	// Set up a tracing span and enable verbose tracing if requested by
	// configuration.
	ctx, rootSpan := tracing.EnsureChildSpan(ctx, tracer, "debug-send-kv-batch", tracing.WithForceRealSpan())

WithForceRealSpan is mostly historical. If you want a recording span, say so. I'm surprised that this works at all as written in the non-verbose case. I would have expected no recording to come back from the server. Can you show me what recording you get? As written, does --print-recording without --verbose make any sense?
If you want a recording, you're supposed to use WithRecordingMode(RecordingStructured), or WithRecordingMode(RecordingVerbose).

Like I was saying in the flags comment, I think we should always do WithRecordingMode(RecordingVerbose) if tracing is wanted.


pkg/cli/debug_send_kv_batch.go, line 177 at r1 (raw file):

	// Set up a tracing span and enable verbose tracing if requested by
	// configuration.
	ctx, rootSpan := tracing.EnsureChildSpan(ctx, tracer, "debug-send-kv-batch", tracing.WithForceRealSpan())

You don't need tracing.EnsureChildSpan. Use the simpler tracer.StartSpanCtx().
As written, it reads like you're asking for a child span, but then are naming it rootSpan.

also nit: s/rootSpan/sp. The vast majority of the code calls the span sp.


pkg/cli/debug_send_kv_batch.go, line 191 at r1 (raw file):

}

func doSendKVBatchRequestInternal(

do and internal seem like too many random words.
How about sendRequestInternal? But why does this function exist at all - particularly since it has tracing concerns that overlap with the caller and could be written more nicely in the other function (i.e. you don't need tracing.SpanFromContext(); you know the span cause you've just created it)? Fold it into the caller.


pkg/cli/start.go, line 1231 at r1 (raw file):

func getClientGRPCConn(
	ctx context.Context, cfg server.Config,
) (*grpc.ClientConn, *tracing.Tracer, func(), error) {

Did you actually need the tracer used by the stopper used by the ClientConn to be the same as the Tracer used by the rest of the program? Does that stopper end up interacting with the the the traced operation?
If you do need it, let's pass the Tracer in. But if you don't actually need them to be the same, I think it'd be better to leave this code alone, and have two different Tracers. This rpcContext would be better of not doing any tracing; since it insist on using a Tracer, let it use a different one if you can get away with it.

@knz knz force-pushed the 20220125-kv-req branch from 678282d to e140866 Compare February 1, 2022 23:24
Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was there a particular need that motivated this?

Education / onboarding.

the commit title is a bit vague, consider e.g. "add tracing support"

Done.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @erikgrinaker)


pkg/cli/debug.go, line 1702 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't think anybody will ever want a non-verbose trace. I'd replace --verbose and --print-recording with a single flag called --trace, that prints a verbose recording.

But another flag that I think would be useful would be a flag for a jaeger file name. You can use Recording.ToJaegerJSON() to produce it.

Great ideas. Done.


pkg/cli/debug.go, line 1704 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Nobody will understand what this flag does. Let's get rid of it and always wipe br.CollectedSpans if tracing was enabled.

So as we've discussed today at team meeting, it's useful to have an option to keep the spans for education purposes (to explain how tracing works).


pkg/cli/debug.go, line 1704 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

nit: consider breaking these lines up.

Done.


pkg/cli/debug_send_kv_batch.go, line 27 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I clicked on the linked and didn't immediately see what that issues has to do with the TODO. I doubt that this TODO adds anything. I'd say either address it or remove it.

It's part of a larger project to refactor the initialization code for all debug sub-commands. That's out of scope here.


pkg/cli/debug_send_kv_batch.go, line 104 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

nit: is there a standard Cobra way to do argument pre-processing before the main work function?

We have a thorough architecture for command-line argument handling, it's just currently not available for all the debug stuff. There's a project to clean that up. Hence the TODO at the top.


pkg/cli/debug_send_kv_batch.go, line 140 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

jsonpb doesn't appear to be used in setupTraceThenSendKVBatchRequest, so remove the parameter.

Good catch. Done.


pkg/cli/debug_send_kv_batch.go, line 168 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

just sendRequest :)

This is a shared package with all the other CLI commands. This name is too generic.


pkg/cli/debug_send_kv_batch.go, line 177 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

WithForceRealSpan is mostly historical. If you want a recording span, say so. I'm surprised that this works at all as written in the non-verbose case. I would have expected no recording to come back from the server. Can you show me what recording you get? As written, does --print-recording without --verbose make any sense?
If you want a recording, you're supposed to use WithRecordingMode(RecordingStructured), or WithRecordingMode(RecordingVerbose).

Like I was saying in the flags comment, I think we should always do WithRecordingMode(RecordingVerbose) if tracing is wanted.

TIL, thanks. Done.


pkg/cli/debug_send_kv_batch.go, line 177 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

You don't need tracing.EnsureChildSpan. Use the simpler tracer.StartSpanCtx().
As written, it reads like you're asking for a child span, but then are naming it rootSpan.

also nit: s/rootSpan/sp. The vast majority of the code calls the span sp.

Done.


pkg/cli/debug_send_kv_batch.go, line 184 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

nit: I think I'd prefer to inline doSendKVBatchRequestInternal, since that's also mostly fiddling with the trace.

Done.


pkg/cli/debug_send_kv_batch.go, line 191 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

do and internal seem like too many random words.
How about sendRequestInternal? But why does this function exist at all - particularly since it has tracing concerns that overlap with the caller and could be written more nicely in the other function (i.e. you don't need tracing.SpanFromContext(); you know the span cause you've just created it)? Fold it into the caller.

What Erik said. Done.


pkg/cli/start.go, line 1231 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Did you actually need the tracer used by the stopper used by the ClientConn to be the same as the Tracer used by the rest of the program? Does that stopper end up interacting with the the the traced operation?
If you do need it, let's pass the Tracer in. But if you don't actually need them to be the same, I think it'd be better to leave this code alone, and have two different Tracers. This rpcContext would be better of not doing any tracing; since it insist on using a Tracer, let it use a different one if you can get away with it.

Yes, I needed it. I was getting panics otherwise.
I'm also not going to change the itnerface to getClientGRPCConn which is used in many places. I got lucky changing the type of the 2nd return value, because it wasn't used anywhere. I don't want to extend the work in this PR to a larger-scale refactoring.


pkg/server/node.go, line 956 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

nit: since this is now shared both by Node and adminServer, consider pulling it out into a different file.

I was trying to keep the diff small for ease of review, but sure. Done.


pkg/server/node.go, line 966 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

nit: we already have a named interface for this, kv.Sender.

TIL. Thanks.

@knz knz force-pushed the 20220125-kv-req branch 2 times, most recently from 8edb268 to 1c9c6ce Compare February 1, 2022 23:44
@knz knz marked this pull request as ready for review February 1, 2022 23:44
@knz knz requested review from a team as code owners February 1, 2022 23:44
@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 1, 2022

also added unit tests now. RFAL.

@knz knz force-pushed the 20220125-kv-req branch 2 times, most recently from 0b0ac7a to 2532591 Compare February 2, 2022 10:10
Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Will defer to @andreimatei on the tracing details.

Reviewed 5 of 8 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @knz)


pkg/server/exec_kv_batch.go, line 41 at r3 (raw file):

// txn logic, or the store directly, when the batch is executed
// "under" the txn logic.
func executeBatchRPCInternalFn(

nit: the Fn suffix seems odd and unnecessary -- does it serve any purpose?

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @knz)


-- commits, line 2 at r3:
looks like this commit message has diverged from the pr comment.


pkg/cli/debug_send_kv_batch.go, line 27 at r1 (raw file):

Previously, knz (kena) wrote…

It's part of a larger project to refactor the initialization code for all debug sub-commands. That's out of scope here.

be that as it may, this TODO is worse than nothing in my opinion :)


pkg/cli/debug_send_kv_batch.go, line 168 at r1 (raw file):

Previously, knz (kena) wrote…

This is a shared package with all the other CLI commands. This name is too generic.

ok, but you do have to find something better :)
The name doesn't need to mention tracing.


pkg/cli/debug_send_kv_batch.go, line 35 at r3 (raw file):

	traceFormat string
	// The output file to use.
	traceOutput string

traceFile ?


pkg/cli/debug_send_kv_batch.go, line 105 at r3 (raw file):

func runSendKVBatch(cmd *cobra.Command, args []string) error {
	verboseTrace := false

by naming this verboseTrace, you're suggesting that there's also an option for some other type of trace (e.g. a structured recording). But there's isn't such an option, is there? Name this trace.


pkg/cli/debug_send_kv_batch.go, line 243 at r3 (raw file):

		// because otherwise the unit test TestSendKVBatch becomes non-deterministic
		// on the contents of the traceInfo JSON field in the request.
		ctx, sp = tracer.StartSpanCtx(ctx, "debug-send-kv-batch", tracing.WithRecording(tracing.RecordingVerbose))

nit: it'd be cleaner if you don't overwrite ctx; you're not getting any benefit from overwriting it, are you? Just do sp := tracer.StartSpan(...).

But better yet, do you even need a span at all? I don't think anyone is interested in seeing this client span in the trace; it probably be better to represent only the server side trace (the client-side span in principle can tell you something about network latency, but I don't think it's interesting).
So consider populating ba.TraceInfo by hand. My guess is that populating RecordingMode should be sufficient. Maybe you also need to generate a TraceID. And then, on the response, you can just cast br.CollectedSpans to a Recording.


pkg/server/exec_kv_batch.go, line 27 at r3 (raw file):

)

// executeBatchRPCInternalFn is a helper function which executes a KV

I don't think this refactoring is a good idea. Making this function generic over the sender will make the flow hard to trace.
Why was this necessary again? If we make adminServer.SendKVBatchcall setupSpanForIncomingRPC, does the reason go away?

@knz knz changed the title cli: improve debug send-kv-batch cli: add tracing to debug send-kv-batch Feb 4, 2022
@knz knz force-pushed the 20220125-kv-req branch 2 times, most recently from fd2378b to f9862ce Compare February 4, 2022 14:10
Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @erikgrinaker)


-- commits, line 2 at r3:

Previously, andreimatei (Andrei Matei) wrote…

looks like this commit message has diverged from the pr comment.

Done.


pkg/cli/debug_send_kv_batch.go, line 27 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

be that as it may, this TODO is worse than nothing in my opinion :)

We have precedent elsewhere. The project to bring all debug commands under the fold wrt handling of command-line arguments is orthogonal to this PR.


pkg/cli/debug_send_kv_batch.go, line 168 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

ok, but you do have to find something better :)
The name doesn't need to mention tracing.

Done.


pkg/cli/debug_send_kv_batch.go, line 35 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

traceFile ?

Done.


pkg/cli/debug_send_kv_batch.go, line 105 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

by naming this verboseTrace, you're suggesting that there's also an option for some other type of trace (e.g. a structured recording). But there's isn't such an option, is there? Name this trace.

cool, done.


pkg/cli/debug_send_kv_batch.go, line 243 at r3 (raw file):

it'd be cleaner if you don't overwrite ctx; you're not getting any benefit from overwriting it, are you?

Well yes absolutely I do, otherwise the verbose flag does not propagate. That's where grpc picks it up from.

do you even need a span at all?

I've tried both with and without. With the span, the result trace is clearer, because it highlights the client-server network latency of the RPC call.


pkg/server/exec_kv_batch.go, line 27 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't think this refactoring is a good idea. Making this function generic over the sender will make the flow hard to trace.
Why was this necessary again? If we make adminServer.SendKVBatchcall setupSpanForIncomingRPC, does the reason go away?

Doing so essentially copy-pastes the body of the new function in two places. Given that it's not trivial to read (nor to maintain), I think that sharing the code is better.


pkg/server/exec_kv_batch.go, line 41 at r3 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

nit: the Fn suffix seems odd and unnecessary -- does it serve any purpose?

Done.

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @erikgrinaker and @knz)


pkg/cli/debug_send_kv_batch.go, line 243 at r3 (raw file):

Well yes absolutely I do, otherwise the verbose flag does not propagate. That's where grpc picks it up from.

I don't think that's true. We're not relying on gRPC to "pick up" anything, are we (since we're explicitly setting ba.TraceInfo)? I've just tried it context.TODO() in the admin.SendKVBatch() call below, and I get the trace fine. In fact, that's why I think it'd be better to not overwrite ctx - to not give the impression that that's needed.


pkg/cli/start.go, line 1231 at r1 (raw file):

Yes, I needed it. I was getting panics otherwise.

Can you show me please? I've stressed the new tests without this change and it all seems to work fine.

I'm also not going to change the itnerface to getClientGRPCConn

If we do need to control the tracer, I don't think we'd need to change the interface. This function already takes in a tracer through cfg.


pkg/server/exec_kv_batch.go, line 27 at r3 (raw file):

Previously, knz (kena) wrote…

Doing so essentially copy-pastes the body of the new function in two places. Given that it's not trivial to read (nor to maintain), I think that sharing the code is better.

I have to insist on this; I think it's important that the call to Stores.Send() be easily findable. The fact that both the server side senders and the client-side senders was, in my opinion, a mistake - they're forced to share an interface (which is inconvenient for the client-side ones which would rather not operate on pErr) and also their contracts are different despite sharing the interface. Let's not use this unfortunate history more than it's already used.

There's minimal copy-pasting needed to avoid this, and also minimal changes to the existing code (as opposed to your patch which changes a lot of code). A lot of what's in this executeBatchRPCInternal function is not particularly needed by the admin RPC; the code didn't deal with a couple of things here before, and in my opinion it was fine. I think the only thing that the admin RPC needs to do is below. I've tried it superficially.

	ctx, finishSpan := s.server.node.setupSpanForIncomingRPC(ctx, roachpb.SystemTenantID, ba)
	var br *roachpb.BatchResponse
	defer func() { finishSpan(ctx, br) }()
	br, pErr := s.server.db.NonTransactionalSender().Send(ctx, *ba)
	if pErr != nil {
		br = &roachpb.BatchResponse{}
		log.VErrEventf(ctx, 3, "error from stores.Send: %s", pErr)
	}
	br.Error = pErr
	return br, nil

@knz knz force-pushed the 20220125-kv-req branch from f9862ce to 704aa5e Compare February 7, 2022 22:00
Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @erikgrinaker)


pkg/cli/debug_send_kv_batch.go, line 243 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Well yes absolutely I do, otherwise the verbose flag does not propagate. That's where grpc picks it up from.

I don't think that's true. We're not relying on gRPC to "pick up" anything, are we (since we're explicitly setting ba.TraceInfo)? I've just tried it context.TODO() in the admin.SendKVBatch() call below, and I get the trace fine. In fact, that's why I think it'd be better to not overwrite ctx - to not give the impression that that's needed.

With the custom ctx, my trace starts like this:

     0.000ms      0.000ms    === operation:debug-send-kv-batch _unfinished:1 _verbose:1
     0.009ms      0.009ms        === operation:/cockroach.server.serverpb.Admin/SendKVBatch _verbose:1 span.kind:client
     0.261ms      0.253ms        === operation:/cockroach.server.serverpb.Admin/SendKVBatch _verbose:1 span.kind:server
     0.290ms      0.029ms        event:server/exec_kv_batch.go:62 adminServer received request: 1 Get, 1 Put
     0.299ms      0.009ms            === operation:dist sender send _verbose:1 node:1

Without the custom ctx, it starts like this:

     0.000ms      0.000ms    === operation:debug-send-kv-batch _unfinished:1 _verbose:1
     0.336ms      0.336ms        === operation:/cockroach.server.serverpb.Admin/SendKVBatch _verbose:1 span.kind:server
     0.364ms      0.028ms        event:server/exec_kv_batch.go:62 adminServer received request: 1 Get, 1 Put
     0.371ms      0.007ms            === operation:dist sender send _verbose:1 node:1

So you're right about the verbose flag, but there does seem to be a difference somehow. Can you explain that to me?


pkg/cli/start.go, line 1231 at r1 (raw file):

This function already takes in a tracer through cfg.

Oh you're right, I just changed that in this PR! 😅
Simplified accordingly.


pkg/server/exec_kv_batch.go, line 27 at r3 (raw file):

I think it's important that the call to Stores.Send() be easily findable.

I'm OK with that! Changed the interface accordingly.

your patch which changes a lot of code

The first version of the patch had only minimal changes, because it was simply splitting a function in two. The diff was minimal. The diff is big now because Erik also asked me to move the thing to a different file.

If you think we need to optimize for diff size, I can move the function back to the place in the file where it makes the diff minimal again.

I think the only thing that the admin RPC needs to do is below.

That's the minimal thing, that's right, but it does not give us:

  • a task that can be identified in the stopper while the batch is running.
  • the trace event about which request was received.
  • the assertion on br.Error at the end.

I really would prefer if the trace we get from the utility had the same structure as the trace we get from SQL (for education purposes). With your simplification, it does not any more.

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @erikgrinaker and @knz)


pkg/cli/debug_send_kv_batch.go, line 243 at r3 (raw file):

Previously, knz (kena) wrote…

With the custom ctx, my trace starts like this:

     0.000ms      0.000ms    === operation:debug-send-kv-batch _unfinished:1 _verbose:1
     0.009ms      0.009ms        === operation:/cockroach.server.serverpb.Admin/SendKVBatch _verbose:1 span.kind:client
     0.261ms      0.253ms        === operation:/cockroach.server.serverpb.Admin/SendKVBatch _verbose:1 span.kind:server
     0.290ms      0.029ms        event:server/exec_kv_batch.go:62 adminServer received request: 1 Get, 1 Put
     0.299ms      0.009ms            === operation:dist sender send _verbose:1 node:1

Without the custom ctx, it starts like this:

     0.000ms      0.000ms    === operation:debug-send-kv-batch _unfinished:1 _verbose:1
     0.336ms      0.336ms        === operation:/cockroach.server.serverpb.Admin/SendKVBatch _verbose:1 span.kind:server
     0.364ms      0.028ms        event:server/exec_kv_batch.go:62 adminServer received request: 1 Get, 1 Put
     0.371ms      0.007ms            === operation:dist sender send _verbose:1 node:1

So you're right about the verbose flag, but there does seem to be a difference somehow. Can you explain that to me?

The span.kind=client span is opened by the client grpc interceptor here:

clientSpan := tracer.StartSpan(

That span is kind of useless. The way this is supposed to be is that the server rpc span is a child of a client rpc span, but that's not how our trace is constructed. I might get rid of this client span.


pkg/cli/start.go, line 1231 at r1 (raw file):

Previously, knz (kena) wrote…

This function already takes in a tracer through cfg.

Oh you're right, I just changed that in this PR! 😅
Simplified accordingly.

But are you really sure even the current change is needed? The change to this function is fine but, unless we can't get away without it, I'd rather we didn't use servercfg.Tracer in the call to sendKVBatchRequestWithTracingOption. I don't think the client needs to use any Tracer, and if it does need to use one, I'd rather it not be important that it uses the same one as the rpcContext (because I think that ideally these client RPCContexts didn't use tracing at all). You've said above that you got panics without it, but I think I've tried it and didn't get any panic (which also matched my superficial impression that there shouldn't be any interaction between the stopper's tracer and the rpc caller's). Can you please check one more time that it is indeed needed to align these tracers?


pkg/server/exec_kv_batch.go, line 27 at r3 (raw file):

I'm OK with that! Changed the interface accordingly.

I was afraid that this is what you might do. I still don't like it :); there's still unneeded indirection.

a task that can be identified in the stopper while the batch is running.

Indeed. Screw that task. Or if you really want it, put it in by hand with another 3 lines.

the trace event about which request was received.

I think this is backwards; see below.

the assertion on br.Error at the end.

Screw it.

I really would prefer if the trace we get from the utility had the same structure as the trace we get from SQL (for education purposes). With your simplification, it does not any more.

I think it's the other way around. With this patch as written, this received request event that you want in the name of the traces "having the same structure" will appear twice - once in the adminServer span and once in the node span (right?). That is confusing. As you probably realize, the requests sent through AdminServer.SendKVBatch eventually make their way to a Node (after going through all the kvclient stack), where setupSpanForIncomingRPC is called. So the kvserver side of the trace is the same as what SQL gets.
Let's not couple the SendKVBatch and Batch RPCs; they're very different. I don't know what Erik said, but I really think the code was fine the way it was.

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @erikgrinaker and @knz)


pkg/server/exec_kv_batch.go, line 27 at r3 (raw file):

a task that can be identified in the stopper while the batch is running.

Indeed. Screw that task. Or if you really want it, put it in by hand with another 3 lines.

Btw, we already have tasks for all RPCs here:

if err := rpcCtx.Stopper.RunTaskWithErr(ctx, info.FullMethod, func(ctx context.Context) error {

so the task in batchInternal is only useful for local batch RPC which bypass gRPC.

FWIW, I'm trying to unify the remote and local RPC paths by running the grpc interceptors for local requests too, to address #74732. I hope I'll be able to get rid for setupSpanForIncomingRPC in the process.

Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @erikgrinaker, and @knz)


pkg/server/exec_kv_batch.go, line 27 at r3 (raw file):

I don't know what Erik said, but I really think the code was fine the way it was.

I agree that it's unfortunate to split this function out in this way, since it makes the code harder to follow. Your point about the request traversing this twice (in AdminServer and Node) makes sense. If we can avoid this then I'm all for it.

Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @erikgrinaker)


pkg/cli/start.go, line 1231 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

But are you really sure even the current change is needed? The change to this function is fine but, unless we can't get away without it, I'd rather we didn't use servercfg.Tracer in the call to sendKVBatchRequestWithTracingOption. I don't think the client needs to use any Tracer, and if it does need to use one, I'd rather it not be important that it uses the same one as the rpcContext (because I think that ideally these client RPCContexts didn't use tracing at all). You've said above that you got panics without it, but I think I've tried it and didn't get any panic (which also matched my superficial impression that there shouldn't be any interaction between the stopper's tracer and the rpc caller's). Can you please check one more time that it is indeed needed to align these tracers?

yes, now it works. the panic was from an earlier generation of the code. I have simplified accordingly.


pkg/server/exec_kv_batch.go, line 27 at r3 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

I don't know what Erik said, but I really think the code was fine the way it was.

I agree that it's unfortunate to split this function out in this way, since it makes the code harder to follow. Your point about the request traversing this twice (in AdminServer and Node) makes sense. If we can avoid this then I'm all for it.

ok I have simplified it as you suggest

@knz knz force-pushed the 20220125-kv-req branch from 704aa5e to 2ceb672 Compare February 8, 2022 21:14
Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @erikgrinaker, and @knz)


pkg/server/admin.go, line 2716 at r6 (raw file):

	if pErr != nil {
		br = &roachpb.BatchResponse{}
		log.VErrEventf(ctx, 3, "error from stores.Send: %s", pErr)

I think this message is wrong.
Also, is there a reason for this change? I don't think br.Error is used outside of KV. I think return nil, pErr.GoError() was the right thing.

This extends the `debug send-kv-batch` CLI utility to optionally
collect and print a trace of the remote processing.

The new command-line flags are as follows:

- `--trace`: enable/disable tracing and optionally configure
  the output trace format.
- `--keep-collected-spans`: whether to keep the collected spans on the
  response object. By default they are elided, but they can be preserved
  with this flag for troubleshooting or education purposes.
- `--trace-output`: to configure the output file for traces.
  By default traces are emitted to stderr.

Release note: None
@knz knz force-pushed the 20220125-kv-req branch from 2ceb672 to c6d292d Compare February 9, 2022 16:52
Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @erikgrinaker)


pkg/server/admin.go, line 2716 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think this message is wrong.
Also, is there a reason for this change? I don't think br.Error is used outside of KV. I think return nil, pErr.GoError() was the right thing.

I copy-pasted your code from https://reviewable.io/reviews/cockroachdb/cockroach/75519#-Muvaxtw1h8jwgbxvjE2:-Mv4VLJfDABOVXUmsbVJ:b-n6m6r9 ...

anyway, fixed.

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 9, 2022

bors r=andreimatei,erikgrinaker

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 9, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants