
kvccl: re-order enterprise check in canSendToFollower #62465

Merged: craig[bot] merged 1 commit into cockroachdb:master from nvb:nvanbenschoten/followerReadEnterprise on Mar 23, 2021

Conversation

@nvb (Contributor) commented Mar 23, 2021

Fixes #62447.

In #62447, Erik found that #59571 had re-ordered the call to
`utilccl.CheckEnterpriseEnabled` to occur before checking the batch in
`canSendToFollower`, instead of after. This added an error allocation
into the hot path of all reads, which showed up in CPU profiles and
caused an 8% performance regression on `kv95`. This commit fixes this by
moving the enterprise check back out of the hot path for all non-stale
read-only batches.

A follow-up to this PR would be to make `utilccl.CheckEnterpriseEnabled`
cheaper by avoiding the error allocation for callers that don't need an
error. That work is not done in this commit.

@nvb nvb requested a review from erikgrinaker March 23, 2021 16:06
@cockroach-teamcity (Member)

This change is Reviewable

@nvb (Contributor, Author) commented Mar 23, 2021

I still need to run kv95 to demonstrate that this has fixed the perf regression.

@erikgrinaker (Contributor) left a comment

:lgtm: Good place to start; we can look into the enterprise check follow-up. Let me know if you'd like a hand with kv95 benchmarks as well.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@nvb (Contributor, Author) commented Mar 23, 2021

> Let me know if you'd like a hand with kv95 benchmarks as well.

All good, I can kick them off. Do you have a pointer to which kv95 roachtest you were running? `kv95/enc=false/nodes=1/cpu=32` on GCE?

@erikgrinaker (Contributor)

Yep. We usually do `--count 5 --cpu-quota 320` and pick the median.

@nvb (Contributor, Author) commented Mar 23, 2021

I'm not seeing much of a top-line improvement with this change when using benchstat to look at statistical significance over 10 runs, but the improvement is pretty clear from CPU profiles taken before and after:

name                           old ops/s    new ops/s    delta
kv95/enc=false/nodes=1/cpu=32   65.3k ± 4%   66.5k ± 5%  +1.86%  (p=0.094 n=9+9)

name                           old avg(ms)  new avg(ms)  delta
kv95/enc=false/nodes=1/cpu=32    1.00 ± 0%    1.00 ± 0%    ~     (all equal)

name                           old p99(ms)  new p99(ms)  delta
kv95/enc=false/nodes=1/cpu=32    5.37 ± 8%    5.08 ± 8%  -5.38%  (p=0.045 n=9+9)

Before: [CPU profile screenshot]

After: [CPU profile screenshot]

It's possible that the call to `(*hlc.Clock).Now` is more expensive than the call to `timeutil.Since`, and that this is also contributing to the regression we're seeing. I'll try mocking out the entire function and seeing how that performs.

@nvb (Contributor, Author) commented Mar 23, 2021

> I'll try mocking out the entire function and seeing how that performs.

I ran another 10 runs with the following patch, which makes closedTimestampLikelySufficient a no-op.

diff --git a/pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go b/pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go
index 8d4695a..eda6adc 100644
--- a/pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go
+++ b/pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go
@@ -102,6 +102,9 @@ func closedTimestampLikelySufficient(
        ctPolicy roachpb.RangeClosedTimestampPolicy,
        requiredFrontierTS hlc.Timestamp,
 ) bool {
+       if true {
+               return false
+       }
        var offset time.Duration
        switch ctPolicy {
        case roachpb.LAG_BY_CLUSTER_SETTING:

This resulted in the following change in performance:

name                           old ops/s    new ops/s    delta
kv95/enc=false/nodes=1/cpu=32   66.5k ± 5%   66.9k ± 3%    ~     (p=0.661 n=9+10)

name                           old avg(ms)  new avg(ms)  delta
kv95/enc=false/nodes=1/cpu=32    1.00 ± 0%    0.95 ± 5%  -5.00%  (p=0.007 n=7+10)

name                           old p99(ms)  new p99(ms)  delta
kv95/enc=false/nodes=1/cpu=32    5.08 ± 8%    5.00 ± 0%    ~     (p=0.286 n=9+6)

So while there are likely some wins to be had here, probably around exposing a version of `(*hlc.Clock).Now` that does not tick the HLC, that isn't responsible for the rest of the 8% perf hit referenced in #62447, so I don't think there's anything more to do here for now.

@erikgrinaker (Contributor)

Hm, if we're hitting this often enough for the clock to be significant here, I'd be even more worried about hitting this for users that do have a license:

// FIXME(tschottdorf): see whether it makes sense to cache the decoded
// license.
if str := enterpriseLicense.Get(&st.SV); str != "" {
	var err error
	if lic, err = decode(str); err != nil {
		return err
	}
}

Maybe we need to consider caching the enterprise settings?

@nvb (Contributor, Author) commented Mar 23, 2021

Agreed. With this PR, the clock is hit on the hot path for all requests while the enterprise license check is only hit for follower reads, rather than the other way around. That said, I agree we should make that check cheaper if it's going to be on the hot path of any type of request.

@nvb (Contributor, Author) commented Mar 23, 2021

I ran `kv95/enc=false/nodes=1/cpu=32` 10 times on 0ac8ab9 and then on its parent be115b9, and am seeing a small regression, but not on the same order of magnitude as the 8% drop cited in #62447.

name                           old ops/s    new ops/s    delta
kv95/enc=false/nodes=1/cpu=32   72.6k ± 1%   72.0k ± 2%  -0.83%  (p=0.089 n=10+10)

name                           old avg(ms)  new avg(ms)  delta
kv95/enc=false/nodes=1/cpu=32    0.90 ± 0%    0.90 ± 0%    ~     (all equal)

name                           old p99(ms)  new p99(ms)  delta
kv95/enc=false/nodes=1/cpu=32    4.61 ± 8%    4.82 ± 8%    ~     (p=0.109 n=10+10)

So I feel comfortable with merging this fix and calling it a day. Thanks again for finding this @erikgrinaker!

bors r+

@erikgrinaker (Contributor)

Thanks! I'll do a couple more runs as well to recheck the numbers -- kv95 does tend to be a bit unreliable.

@craig (bot) commented Mar 23, 2021

Build failed (retrying...):

@craig (bot) commented Mar 23, 2021

Build succeeded:

@craig craig bot merged commit 822187c into cockroachdb:master Mar 23, 2021
@erikgrinaker (Contributor)

Opened #62489 for CheckEnterpriseEnabled optimizations.

@erikgrinaker (Contributor)

I see significant gains with this patch, going from 61656 ops/s before to 65841 ops/s after on kv95/enc=false/nodes=1/cpu=32. This is consistent with the original drop in performance.



Successfully merging this pull request may close these issues.

kvfollowerreadsccl: excessive overhead in canSendToFollower
