kvserver/concurrency: apply goroutine labels before waiting for push#71622
craig[bot] merged 1 commit into cockroachdb:master from
Conversation
@nvanbenschoten this was a random Friday musing based on an experience of debugging a "stuck" job, where we saw in the stack trace that one of our job's goroutines had been in the
petermattis left a comment
This is really neat!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt and @nvanbenschoten)
pkg/kv/kvserver/concurrency/concurrency_manager.go, line 334 at r1 (raw file):
var err *roachpb.Error
labels := pprof.Labels("pushee", t.PusheeTxn.ID.String(), "pusher", t.PusherTxn.ID.String())
pprof.Do(ctx, labels, func(ctx context.Context) {
I wonder how much faster it is to use pprof.SetGoroutineLabels. Doing so would avoid the closure allocation. Look at the source of pprof.Do; it is a pretty small convenience function, and we're performance sensitive.
Setting the profile labels for a goroutine compiles down into a function call which does getg().labels = labels. So I think the primary overhead here is in the pprof.Labels call and the pprof.WithLabels call. Unfortunate allocations, but if used rarely on paths which are likely to be slow I think it makes a lot of sense.
nvb left a comment
This is cool! It's a good idea for any cold code paths where goroutines can get stuck waiting for long periods of time.
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @nvanbenschoten, and @petermattis)
pkg/kv/kvserver/concurrency/concurrency_manager.go, line 333 at r1 (raw file):
var resp *roachpb.PushTxnResponse
var err *roachpb.Error
labels := pprof.Labels("pushee", t.PusheeTxn.ID.String(), "pusher", t.PusherTxn.ID.String())
Let's push this into MaybeWaitForPush. We can extract an inner Queue.waitForPush method that is called in the !ShouldPushImmediately case.
Then we can also handle the case where PusherTxn is empty better without cluttering this code, by giving it a label like "pusher": "non-txn".
pkg/kv/kvserver/concurrency/concurrency_manager.go, line 334 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I wonder how much faster it is to use pprof.SetGoroutineLabels. Doing so would avoid the closure allocation. Look at the source of pprof.Do; it is a pretty small convenience function, and we're performance sensitive.
Setting the profile labels for a goroutine compiles down into a function call which does getg().labels = labels. So I think the primary overhead here is in the pprof.Labels call and the pprof.WithLabels call. Unfortunate allocations, but if used rarely on paths which are likely to be slow, I think it makes a lot of sense.
This is a cold enough code path that I wouldn't be too worried about an extra allocation or two. With lock wait-queues, requests only end up here if they have been waiting for more than 50ms on a conflicting transaction.
Nice! I had no idea labels were now included in the ?debug=1 output. Feels like this wasn't always true.
They are unfortunately only in the debug=1 output, not the debug=2 output that we usually look at, collect in debug.zips, etc. But I figure once I find my stuck stack in the =2 output, I can then switch over to the =1 output to figure out who it is stuck on.
Another useful goroutine to tag in this way would be the
Does debug.zip contain |
David, can you file an issue for that (unless you're picking it up in this PR) and assign it to the server team? Thanks.
Often I find a stack that ends in a goroutine waiting on a push, but I can't correlate that back to which txns are involved. Ideally we'd pprof-label every single request we send, but the allocation overhead of constructing the labels probably makes that a bad idea. Instead, maybe it is okay to just label right before going into the waiting step of push. Release note: none.
1117ccd to 2d0e95c
dt left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)
pkg/kv/kvserver/concurrency/concurrency_manager.go, line 333 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Let's push this into MaybeWaitForPush. We can extract an inner Queue.waitForPush method that is called in the !ShouldPushImmediately case.
Then we can also handle the case where PusherTxn is empty better without cluttering this code, by giving it a label like "pusher": "non-txn".
Done: I pulled the inner helper out a little further down, starting after the existing log message, so I could reuse the existing pusherStr string that already handles that case, as suggested.
nvb left a comment
Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @petermattis)
TFTRs! bors r+
Build succeeded: |
blathers backport 21.2 |
Often I find a stack that ends in a goroutine waiting on a push, but I can't correlate that back
to which txns are involved. Ideally we'd pprof-label every single request we send, but the allocation
overhead of constructing the labels probably makes that a bad idea. Instead, maybe it is okay to just
label right before going into the waiting step of push.
With this change, /debug/pprof/goroutine?debug=1 shows something like this:
Release note: none.