Skip to content

sql: Make cancel checker more efficient#103866

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
miretskiy:cancelchecker
May 31, 2023
Merged

sql: Make cancel checker more efficient#103866
craig[bot] merged 2 commits intocockroachdb:masterfrom
miretskiy:cancelchecker

Conversation

@miretskiy
Copy link
Copy Markdown
Contributor

@miretskiy miretskiy commented May 24, 2023

Arrange for cancel checker to be notified once context
is done.

Old implementation:
BenchmarkCancelChecker-10    	133185742	   8.790 ns/op
New Implementation:
BenchmarkCancelChecker-10    	1000000000	 0.5804 ns/op

Epic: None

Release note: None

@miretskiy miretskiy requested a review from yuzefovich May 24, 2023 20:06
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@miretskiy miretskiy force-pushed the cancelchecker branch 8 times, most recently from 22fec9e to 8e3f34b Compare May 25, 2023 15:01
Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I like it. Curious though what prompted you to look into this?

(Also, not sure if you were planning to address those, but there are a few other places where we have a similar logic - e.g. colexecutils/cancel_checker.go.)

Reviewed 7 of 7 files at r1, 11 of 11 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy)


pkg/util/cancelchecker/cancel_checker.go line 50 at r2 (raw file):

			atomic.StoreUint32(&done, 1)
		}); err != nil {
			panic(err)

nit: add a comment here that err could only be non-nil if ctx.Done() is nil which we know is false.

@miretskiy
Copy link
Copy Markdown
Contributor Author

Nice! I like it. Curious though what prompted you to look into this?

I started working on conditional mutex -- where I wanted to lock the mutex when some condition
is true, but also wanted to respect context cancellation (#103846)
Then realized, that just the context portion of that PR is useful, and found this cancel checker
as a direct use case that can benefit from this change.

(Also, not sure if you were planning to address those, but there are a few other places where we have a similar logic - e.g. colexecutils/cancel_checker.go.)

I can add it to this PR if you prefer.

Reviewed 7 of 7 files at r1, 11 of 11 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy)

pkg/util/cancelchecker/cancel_checker.go line 50 at r2 (raw file):

			atomic.StoreUint32(&done, 1)
		}); err != nil {
			panic(err)

nit: add a comment here that err could only be non-nil if ctx.Done() is nil which we know is false.

Ack. Done.

Add `ctxutil.WhenDone` function which invokes a callback when
parent context becomes done.

```
parent, cancelParent := context.WithTimeout(....)
ctxutil.WhenDone(parent, func() {....})
```

Normally, in order to invoke a function when context becomes
done, a go routine should be spun up that listens on
`ctx.Done()` channel.  Sometimes, this is too heavy weight.
`ctxutil.WhenDone` adds such a mechanism, which under most
circumstances, simply invokes the function when context completes.

When parent context becomes done, the funciton is invoked
with the result (Err()) of the parent context.

If the parent context never completes, `WhenDone` returns an error.

Go versions 1.20 or higher also add ability to augment
context error message with "cause" error.

Epic: None

Release note: None
@miretskiy miretskiy force-pushed the cancelchecker branch 2 times, most recently from a125731 to 0c5d534 Compare May 31, 2023 14:42
@miretskiy
Copy link
Copy Markdown
Contributor Author

I can add it to this PR if you prefer.

added.

@miretskiy miretskiy force-pushed the cancelchecker branch 2 times, most recently from 2f51263 to 26d0df0 Compare May 31, 2023 15:40
@miretskiy miretskiy marked this pull request as ready for review May 31, 2023 15:40
@miretskiy miretskiy requested a review from a team as a code owner May 31, 2023 15:40
Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: thanks!

Reviewed 14 of 14 files at r3, 18 of 19 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @miretskiy)


pkg/util/ctxutil/context.go line 35 at r4 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

@yuzefovich -- do you think I should allow this case -- like: you call whenDone when context is never done -- that's on you :)
Alternatively, maybe add a method such as MustInvokeWhenDone -- which has this check and have regular "WhenDone" work with any context?

Discussed offline, the current approach seems good to me.


pkg/util/cancelchecker/cancel_checker_test.go line 44 at r5 (raw file):

}

// Old implementation:

nit: I'd extract this comment from the code into the commit message (I don't think we usually put things like this into the code).

@miretskiy
Copy link
Copy Markdown
Contributor Author

nit: I'd extract this comment from the code into the commit message (I don't think we usually put things like this into the code).

done

Arrange for cancel checker to be notified once context
is done.

```
Old implementation:
BenchmarkCancelChecker-10            133185742          8.790 ns/op

New Implementation:
BenchmarkCancelChecker-10            1000000000       0.5804 ns/op
```

Epic: None

Release note: None
@miretskiy
Copy link
Copy Markdown
Contributor Author

bors r+

@shermanCRL
Copy link
Copy Markdown
Contributor

I’d be interested @yuzefovich to know if this would be a visible improvement in SQL performance? Any sort of macro benchmark that may be interesting?

@yuzefovich
Copy link
Copy Markdown
Member

My guess is that it won't be really visible in an end-to-end testing. For an OLTP query we're likely to do only a single cancellation check throughout the execution, so a few nanoseconds won't be visible; for an OLAP query we're likely to do many cancellation checks, so we could shave off microseconds or perhaps even milliseconds, yet the query latency would be in seconds, so it still wouldn't be visible.

For the former, one could run BenchmarkFlowSetup before and after; for the latter, probably tpchvec/perf roachtest on the order of 50 times and take the average. I could kick this off if you're interested.

@miretskiy
Copy link
Copy Markdown
Contributor Author

Yeah; the overall change is not great; colexec calls to CheckEvery might have been more expensive... but still, it's hard to see those if you're talking seconds. A solid maybe :)

@shermanCRL
Copy link
Copy Markdown
Contributor

shermanCRL commented May 31, 2023

If you don’t foresee visible macro improvements, doesn’t sound like a good use of your time @yuzefovich! I was just curious. If it were in a hot loop the answer might be different. Thanks.

@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 31, 2023

Build succeeded:

@craig craig bot merged commit 145bb3b into cockroachdb:master May 31, 2023
@yuzefovich
Copy link
Copy Markdown
Member

Interestingly, this seems to lead to an increase in allocations which actually increases the latency on the hot path:

yuzefovich@gceworker-yuzefovich:~/go/src/github.com/cockroachdb/cockroach$ BENCHTIMEOUT=72h PKG=./pkg/sql/flowinfra BENCHES=BenchmarkFlowSetup ./scripts/bench HEAD~1 HEAD~0
Comparing HEAD~0 (new) with HEAD~1 (old)

Writing to /tmp/tmp.Y7b21cdmu3
Switching to HEAD~1
+ ./dev bench ./pkg/sql/flowinfra --timeout=72h --filter=BenchmarkFlowSetup --count=30 --bench-mem -v --stream-output --ignore-cache
Switching to HEAD~0
+ ./dev bench ./pkg/sql/flowinfra --timeout=72h --filter=BenchmarkFlowSetup --count=30 --bench-mem -v --stream-output --ignore-cache
name                                           old time/op    new time/op    delta
FlowSetup/vectorize=true/distribute=true-24       204µs ± 3%     210µs ± 4%  +2.89%  (p=0.000 n=28+27)
FlowSetup/vectorize=true/distribute=false-24      205µs ± 8%     209µs ± 8%  +2.22%  (p=0.008 n=26+26)
FlowSetup/vectorize=false/distribute=true-24      205µs ± 5%     210µs ± 6%  +2.57%  (p=0.000 n=29+30)
FlowSetup/vectorize=false/distribute=false-24     201µs ± 6%     206µs ± 3%  +2.70%  (p=0.000 n=28+27)

name                                           old alloc/op   new alloc/op   delta
FlowSetup/vectorize=true/distribute=true-24      24.8kB ±13%    25.2kB ±13%  +1.49%  (p=0.005 n=29+29)
FlowSetup/vectorize=true/distribute=false-24     23.6kB ±13%    23.4kB ±11%  -0.56%  (p=0.001 n=27+25)
FlowSetup/vectorize=false/distribute=true-24     28.7kB ±10%    28.9kB ±11%    ~     (p=0.842 n=28+29)
FlowSetup/vectorize=false/distribute=false-24    27.0kB ± 1%    27.0kB ± 1%    ~     (p=0.370 n=24+24)

name                                           old allocs/op  new allocs/op  delta
FlowSetup/vectorize=true/distribute=true-24         265 ±15%       273 ±14%  +2.81%  (p=0.001 n=30+30)
FlowSetup/vectorize=true/distribute=false-24        253 ±17%       253 ± 1%  +0.10%  (p=0.001 n=27+23)
FlowSetup/vectorize=false/distribute=true-24        262 ±10%       267 ±16%  +1.66%  (p=0.001 n=28+30)
FlowSetup/vectorize=false/distribute=false-24       246 ± 1%       247 ± 0%  +0.51%  (p=0.000 n=24+19)

(This is on 145bb3b, and I manually modified scripts/bench to use 30 iterations instead of 10.)

@miretskiy
Copy link
Copy Markdown
Contributor Author

miretskiy commented Jun 1, 2023 via email

@knz
Copy link
Copy Markdown
Contributor

knz commented Jun 19, 2023

Interestingly, this seems to lead to an increase in allocations which actually increases the latency on the hot path:

Have we resolved this latency increase? I don't think this PR is bringing us forward if we keep this latency increase on the hot path.

Should I send a revert for this PR?

@miretskiy
Copy link
Copy Markdown
Contributor Author

miretskiy commented Jun 19, 2023 via email

miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Jul 18, 2023
Previous commit cockroachdb#76306 arranged for interceptors to run
for all rpcs (local or otherwise).  In doing so, it added
an additional goroutine for each RPC whose purpose was to
detect context completion, and to arrange for the cleanup
function to run when that happened.

`ctxutil.WhenDone` (added by cockroachdb#103866/commits/dfe4ef82810aaf81ae21afecce83cbc1fb6783dd)
allows for the same functionality without the additional
goroutine cost.

Informs cockroachdb#107053

Release note: None
miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Jul 19, 2023
Previous commit cockroachdb#76306 arranged for interceptors to run
for all rpcs (local or otherwise).  In doing so, it added
an additional goroutine for each RPC whose purpose was to
detect context completion, and to arrange for the cleanup
function to run when that happened.

`ctxutil.WhenDone` (added by cockroachdb#103866/commits/dfe4ef82810aaf81ae21afecce83cbc1fb6783dd)
allows for the same functionality without the additional
goroutine cost.

Informs cockroachdb#107053

Release note: None
miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Jul 19, 2023
Previous commit cockroachdb#76306 arranged for interceptors to run
for all rpcs (local or otherwise).  In doing so, it added
an additional goroutine for each RPC whose purpose was to
detect context completion, and to arrange for the cleanup
function to run when that happened.

`ctxutil.WhenDone` (added by cockroachdb#103866/commits/dfe4ef82810aaf81ae21afecce83cbc1fb6783dd)
allows for the same functionality without the additional
goroutine cost.

Informs cockroachdb#107053

Release note: None
miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Jul 20, 2023
Previous commit cockroachdb#76306 arranged for interceptors to run
for all rpcs (local or otherwise).  In doing so, it added
an additional goroutine for each RPC whose purpose was to
detect context completion, and to arrange for the cleanup
function to run when that happened.

`ctxutil.WhenDone` (added by cockroachdb#103866/commits/dfe4ef82810aaf81ae21afecce83cbc1fb6783dd)
allows for the same functionality without the additional
goroutine cost.

Informs cockroachdb#107053

Release note: None
craig bot pushed a commit that referenced this pull request Jul 20, 2023
107079: grpcinterceptor: Use ctxutil to detect context cancellation r=miretskiy a=miretskiy

Previous commit #76306 arranged for interceptors to run for all rpcs (local or otherwise).  In doing so, it added an additional go routine for each RPC whose purpose was to detect context completion, and to arrange for the cleanup function to run when that happened.

`ctxutil.WhenDone` (added by #103866/commits/dfe4ef82810aaf81ae21afecce83cbc1fb6783dd) allows for the same functionality without the additional goroutine cost.

Informs #107053

Release note: None

107267: kvcoord: make rangefeed connection class `TenantReadOnly` r=erikgrinaker a=erikgrinaker

Otherwise, the setting won't have any effect for tenants, always falling through to the default.

Resolves #106895.
Epic: none
Release note: None

107273: build: use absolute path to pnpm-lock.yaml r=maryliag a=sjbarag

Previously, actions/setup-node@v3 was used with a relative path for `cache-dependency-path`. This had an unexpected interaction with the `.defaults.run.working-directory: "pkg/ui/workspaces/cluster-ui"` declaration, which caused setup-node's cleanup action to look for pkg/ui/workspaces/cluster-ui/pkg/ui/pnpm-lock.yaml. Naturally, this file didn't exist. Use an absolute path (thanks, `github.workspace`!) to pnpm-lock.yaml instead.

Release note: None
Epic: none

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Co-authored-by: Sean Barag <barag@cockroachlabs.com>
THardy98 pushed a commit to zachlite/cockroach that referenced this pull request Jul 24, 2023
Previous commit cockroachdb#76306 arranged for interceptors to run
for all rpcs (local or otherwise).  In doing so, it added
an additional goroutine for each RPC whose purpose was to
detect context completion, and to arrange for the cleanup
function to run when that happened.

`ctxutil.WhenDone` (added by cockroachdb#103866/commits/dfe4ef82810aaf81ae21afecce83cbc1fb6783dd)
allows for the same functionality without the additional
goroutine cost.

Informs cockroachdb#107053

Release note: None
miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Aug 2, 2023
…tion

Backport 1/2 commits from
cockroachdb#103866
Backport 1/1 commits from
cockroachdb#107079

Previous commit cockroachdb#76306 arranged for interceptors to run
for all rpcs (local or otherwise).  In doing so, it added
an additional goroutine for each RPC whose purpose was to
detect context completion, and to arrange for the cleanup
function to run when that happened.

`ctxutil.WhenDone` (added by cockroachdb#103866/commits/dfe4ef82810aaf81ae21afecce83cbc1fb6783dd)
allows for the same functionality without the additional
goroutine cost.

Informs cockroachdb#107053

Release note: None
miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Aug 2, 2023
…tion

Backport 1/2 commits from
cockroachdb#103866
Backport 1/1 commits from
cockroachdb#107079

Previous commit cockroachdb#76306 arranged for interceptors to run
for all rpcs (local or otherwise).  In doing so, it added
an additional goroutine for each RPC whose purpose was to
detect context completion, and to arrange for the cleanup
function to run when that happened.

`ctxutil.WhenDone` (added by cockroachdb#103866/commits/dfe4ef82810aaf81ae21afecce83cbc1fb6783dd)
allows for the same functionality without the additional
goroutine cost.

Informs cockroachdb#107053

Release note: None
Release justification: Important performance improvement for large scale
rangefeed workloads to remove inadvertently introduced goroutine.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants