Skip to content

util: optimize FastIntSet#86053

Merged
craig[bot] merged 4 commits intocockroachdb:masterfrom
mgartner:optimize-fast-int-set
Aug 15, 2022
Merged

util: optimize FastIntSet#86053
craig[bot] merged 4 commits intocockroachdb:masterfrom
mgartner:optimize-fast-int-set

Conversation

@mgartner
Copy link
Copy Markdown
Contributor

This PR contains several commits which speed up util.FastIntSet and reduce the
allocations it makes in some cases. See the individual commit messages for
details.

The speedup and reduction in allocations can be observed in the optimizer's
BenchmarkPhases:

name                                                                   old time/op    new time/op    delta
kv-read/Simple/OptBuildNorm-16                                    34.9µs ± 2%    35.0µs ± 4%     ~     (p=1.000 n=5+5)
kv-read/Simple/Explore-16                                         43.9µs ± 2%    43.7µs ± 1%     ~     (p=0.222 n=5+5)
kv-read/Prepared/ExecBuild-16                                     1.18µs ± 1%    1.13µs ± 1%   -3.72%  (p=0.008 n=5+5)
kv-read-const/Simple/OptBuildNorm-16                              35.2µs ± 2%    34.9µs ± 3%     ~     (p=0.151 n=5+5)
kv-read-const/Simple/Explore-16                                   45.2µs ± 5%    44.3µs ± 1%     ~     (p=0.056 n=5+5)
kv-read-const/Prepared/ExecBuild-16                               1.02µs ± 4%    0.97µs ± 1%   -5.18%  (p=0.008 n=5+5)
tpcc-new-order/Simple/OptBuildNorm-16                             71.1µs ± 1%    70.8µs ± 0%     ~     (p=0.548 n=5+5)
tpcc-new-order/Simple/Explore-16                                   117µs ± 1%     116µs ± 2%     ~     (p=0.095 n=5+5)
tpcc-new-order/Prepared/ExecBuild-16                              1.48µs ± 2%    1.44µs ± 1%   -2.67%  (p=0.008 n=5+5)
tpcc-delivery/Simple/OptBuildNorm-16                              65.5µs ± 2%    63.9µs ± 1%   -2.37%  (p=0.032 n=5+4)
tpcc-delivery/Simple/Explore-16                                   96.2µs ± 1%    95.7µs ± 1%     ~     (p=0.690 n=5+5)
tpcc-delivery/Prepared/AssignPlaceholdersNorm-16                  12.5µs ± 1%    12.3µs ± 1%     ~     (p=0.095 n=5+5)
tpcc-delivery/Prepared/Explore-16                                 40.5µs ± 1%    39.4µs ± 0%   -2.75%  (p=0.008 n=5+5)
tpcc-stock-level/Simple/OptBuildNorm-16                            291µs ± 2%     288µs ± 2%     ~     (p=0.310 n=5+5)
tpcc-stock-level/Simple/Explore-16                                 424µs ± 0%     417µs ± 2%   -1.79%  (p=0.016 n=5+5)
tpcc-stock-level/Prepared/AssignPlaceholdersNorm-16               53.8µs ± 0%    53.3µs ± 1%     ~     (p=0.151 n=5+5)
tpcc-stock-level/Prepared/Explore-16                               181µs ± 2%     176µs ± 3%   -2.44%  (p=0.032 n=5+5)
many-columns-and-indexes-a/Simple/OptBuildNorm-16                  309µs ± 0%     254µs ± 2%  -17.78%  (p=0.008 n=5+5)
many-columns-and-indexes-a/Simple/Explore-16                       467µs ± 1%     389µs ± 1%  -16.65%  (p=0.008 n=5+5)
many-columns-and-indexes-a/Prepared/AssignPlaceholdersNorm-16      275µs ± 1%     223µs ± 1%  -18.93%  (p=0.008 n=5+5)
many-columns-and-indexes-a/Prepared/Explore-16                     434µs ± 2%     358µs ± 1%  -17.68%  (p=0.008 n=5+5)
many-columns-and-indexes-b/Simple/OptBuildNorm-16                  331µs ± 1%     281µs ± 2%  -14.91%  (p=0.008 n=5+5)
many-columns-and-indexes-b/Simple/Explore-16                       500µs ± 2%     424µs ± 1%  -15.23%  (p=0.008 n=5+5)
many-columns-and-indexes-b/Prepared/AssignPlaceholdersNorm-16      279µs ± 1%     228µs ± 1%  -17.98%  (p=0.008 n=5+5)
many-columns-and-indexes-b/Prepared/Explore-16                     449µs ± 5%     367µs ± 0%  -18.20%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Simple/OptBuildNorm-16                 13.7ms ± 1%    10.6ms ± 1%  -22.68%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Simple/Explore-16                      45.7ms ± 2%    40.0ms ± 2%  -12.42%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Prepared/AssignPlaceholdersNorm-16     11.1ms ± 1%     8.6ms ± 2%  -22.77%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Prepared/Explore-16                    43.3ms ± 0%    37.8ms ± 2%  -12.65%  (p=0.008 n=5+5)
ored-preds-100/Simple/OptBuildNorm-16                             2.59ms ± 1%    2.60ms ± 1%     ~     (p=0.690 n=5+5)
ored-preds-100/Simple/Explore-16                                  3.29ms ± 3%    3.31ms ± 2%     ~     (p=0.548 n=5+5)
ored-preds-100/Prepared/ExecBuild-16                              85.5µs ± 1%    84.7µs ± 1%     ~     (p=0.095 n=5+5)
ored-preds-using-params-100/Simple/OptBuildNorm-16                1.88ms ± 1%    1.88ms ± 1%     ~     (p=0.421 n=5+5)
ored-preds-using-params-100/Simple/Explore-16                     2.56ms ± 1%    2.57ms ± 1%     ~     (p=0.413 n=4+5)
ored-preds-using-params-100/Prepared/AssignPlaceholdersNorm-16     468µs ± 3%     472µs ± 2%     ~     (p=0.421 n=5+5)
ored-preds-using-params-100/Prepared/Explore-16                   1.16ms ± 1%    1.17ms ± 1%     ~     (p=0.222 n=5+5)

name                                                                   old alloc/op   new alloc/op   delta
kv-read/Simple/OptBuildNorm-16                                    12.3kB ± 0%    12.3kB ± 0%     ~     (all equal)
kv-read/Simple/Explore-16                                         14.7kB ± 0%    14.7kB ± 0%     ~     (all equal)
kv-read/Prepared/ExecBuild-16                                       704B ± 0%      704B ± 0%     ~     (all equal)
kv-read-const/Simple/OptBuildNorm-16                              12.4kB ± 0%    12.4kB ± 0%     ~     (all equal)
kv-read-const/Simple/Explore-16                                   14.8kB ± 0%    14.8kB ± 0%     ~     (all equal)
kv-read-const/Prepared/ExecBuild-16                                 520B ± 0%      520B ± 0%     ~     (all equal)
tpcc-new-order/Simple/OptBuildNorm-16                             24.0kB ± 0%    24.0kB ± 0%     ~     (p=0.881 n=5+5)
tpcc-new-order/Simple/Explore-16                                  34.0kB ± 0%    33.9kB ± 0%   -0.01%  (p=0.032 n=5+5)
tpcc-new-order/Prepared/ExecBuild-16                                768B ± 0%      768B ± 0%     ~     (all equal)
tpcc-delivery/Simple/OptBuildNorm-16                              19.8kB ± 0%    19.8kB ± 0%     ~     (p=0.722 n=5+5)
tpcc-delivery/Simple/Explore-16                                   26.3kB ± 0%    26.3kB ± 0%     ~     (p=0.802 n=5+5)
tpcc-delivery/Prepared/AssignPlaceholdersNorm-16                  5.44kB ± 0%    5.44kB ± 0%     ~     (all equal)
tpcc-delivery/Prepared/Explore-16                                 13.1kB ± 0%    13.1kB ± 0%     ~     (p=1.000 n=5+5)
tpcc-stock-level/Simple/OptBuildNorm-16                           89.8kB ± 0%    89.8kB ± 0%     ~     (p=0.817 n=5+5)
tpcc-stock-level/Simple/Explore-16                                 120kB ± 0%     120kB ± 0%     ~     (p=0.937 n=5+5)
tpcc-stock-level/Prepared/AssignPlaceholdersNorm-16               18.8kB ± 0%    18.8kB ± 0%     ~     (p=1.000 n=5+5)
tpcc-stock-level/Prepared/Explore-16                              46.9kB ± 0%    46.9kB ± 0%     ~     (p=0.262 n=5+5)
many-columns-and-indexes-a/Simple/OptBuildNorm-16                 16.4kB ± 0%    16.4kB ± 0%     ~     (all equal)
many-columns-and-indexes-a/Simple/Explore-16                      23.1kB ± 0%    23.1kB ± 0%     ~     (p=1.000 n=5+5)
many-columns-and-indexes-a/Prepared/AssignPlaceholdersNorm-16     3.81kB ± 0%    3.81kB ± 0%     ~     (all equal)
many-columns-and-indexes-a/Prepared/Explore-16                    10.5kB ± 0%    10.5kB ± 0%     ~     (p=1.000 n=5+5)
many-columns-and-indexes-b/Simple/OptBuildNorm-16                 23.6kB ± 0%    23.6kB ± 0%     ~     (p=0.730 n=5+5)
many-columns-and-indexes-b/Simple/Explore-16                      30.0kB ± 0%    30.0kB ± 0%     ~     (p=0.159 n=5+5)
many-columns-and-indexes-b/Prepared/AssignPlaceholdersNorm-16     6.05kB ± 0%    6.05kB ± 0%     ~     (p=1.000 n=5+5)
many-columns-and-indexes-b/Prepared/Explore-16                    12.4kB ± 0%    12.4kB ± 0%     ~     (p=0.540 n=5+5)
many-columns-and-indexes-c/Simple/OptBuildNorm-16                 5.43MB ± 0%    4.75MB ± 0%  -12.49%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Simple/Explore-16                      27.0MB ± 0%    26.3MB ± 0%   -2.58%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Prepared/AssignPlaceholdersNorm-16     4.27MB ± 0%    3.65MB ± 0%  -14.54%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Prepared/Explore-16                    25.8MB ± 0%    25.2MB ± 0%   -2.48%  (p=0.008 n=5+5)
ored-preds-100/Simple/OptBuildNorm-16                              979kB ± 0%     979kB ± 0%     ~     (p=0.889 n=5+5)
ored-preds-100/Simple/Explore-16                                  1.37MB ± 0%    1.37MB ± 0%     ~     (p=0.690 n=5+5)
ored-preds-100/Prepared/ExecBuild-16                              29.6kB ± 0%    29.6kB ± 0%     ~     (all equal)
ored-preds-using-params-100/Simple/OptBuildNorm-16                 568kB ± 0%     568kB ± 0%     ~     (p=0.794 n=5+5)
ored-preds-using-params-100/Simple/Explore-16                      962kB ± 0%     962kB ± 0%     ~     (p=0.841 n=5+5)
ored-preds-using-params-100/Prepared/AssignPlaceholdersNorm-16     348kB ± 0%     348kB ± 0%     ~     (p=0.548 n=5+5)
ored-preds-using-params-100/Prepared/Explore-16                    742kB ± 0%     742kB ± 0%     ~     (p=0.841 n=5+5)

name                                                                   old allocs/op  new allocs/op  delta
kv-read/Simple/OptBuildNorm-16                                      77.0 ± 0%      77.0 ± 0%     ~     (all equal)
kv-read/Simple/Explore-16                                           88.0 ± 0%      88.0 ± 0%     ~     (all equal)
kv-read/Prepared/ExecBuild-16                                       9.00 ± 0%      9.00 ± 0%     ~     (all equal)
kv-read-const/Simple/OptBuildNorm-16                                79.0 ± 0%      79.0 ± 0%     ~     (all equal)
kv-read-const/Simple/Explore-16                                     90.0 ± 0%      90.0 ± 0%     ~     (all equal)
kv-read-const/Prepared/ExecBuild-16                                 6.00 ± 0%      6.00 ± 0%     ~     (all equal)
tpcc-new-order/Simple/OptBuildNorm-16                                128 ± 0%       128 ± 0%     ~     (all equal)
tpcc-new-order/Simple/Explore-16                                     171 ± 0%       171 ± 0%     ~     (all equal)
tpcc-new-order/Prepared/ExecBuild-16                                9.00 ± 0%      9.00 ± 0%     ~     (all equal)
tpcc-delivery/Simple/OptBuildNorm-16                                 120 ± 0%       120 ± 0%     ~     (all equal)
tpcc-delivery/Simple/Explore-16                                      169 ± 0%       169 ± 0%     ~     (all equal)
tpcc-delivery/Prepared/AssignPlaceholdersNorm-16                    27.0 ± 0%      27.0 ± 0%     ~     (all equal)
tpcc-delivery/Prepared/Explore-16                                   77.0 ± 0%      77.0 ± 0%     ~     (all equal)
tpcc-stock-level/Simple/OptBuildNorm-16                              552 ± 0%       552 ± 0%     ~     (all equal)
tpcc-stock-level/Simple/Explore-16                                   711 ± 0%       711 ± 0%     ~     (all equal)
tpcc-stock-level/Prepared/AssignPlaceholdersNorm-16                  105 ± 0%       105 ± 0%     ~     (all equal)
tpcc-stock-level/Prepared/Explore-16                                 264 ± 0%       264 ± 0%     ~     (all equal)
many-columns-and-indexes-a/Simple/OptBuildNorm-16                   69.0 ± 0%      69.0 ± 0%     ~     (all equal)
many-columns-and-indexes-a/Simple/Explore-16                        89.0 ± 0%      89.0 ± 0%     ~     (all equal)
many-columns-and-indexes-a/Prepared/AssignPlaceholdersNorm-16       21.0 ± 0%      21.0 ± 0%     ~     (all equal)
many-columns-and-indexes-a/Prepared/Explore-16                      41.0 ± 0%      41.0 ± 0%     ~     (all equal)
many-columns-and-indexes-b/Simple/OptBuildNorm-16                    128 ± 0%       128 ± 0%     ~     (all equal)
many-columns-and-indexes-b/Simple/Explore-16                         145 ± 0%       146 ± 0%     ~     (p=0.167 n=5+5)
many-columns-and-indexes-b/Prepared/AssignPlaceholdersNorm-16       36.0 ± 0%      36.0 ± 0%     ~     (all equal)
many-columns-and-indexes-b/Prepared/Explore-16                      53.0 ± 0%      53.0 ± 0%     ~     (all equal)
many-columns-and-indexes-c/Simple/OptBuildNorm-16                  72.3k ± 0%     61.7k ± 0%  -14.66%  (p=0.029 n=4+4)
many-columns-and-indexes-c/Simple/Explore-16                        295k ± 0%      284k ± 0%   -3.68%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Prepared/AssignPlaceholdersNorm-16      66.1k ± 0%     56.4k ± 0%  -14.69%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Prepared/Explore-16                      289k ± 0%      279k ± 0%   -3.46%  (p=0.008 n=5+5)
ored-preds-100/Simple/OptBuildNorm-16                              16.5k ± 0%     16.5k ± 0%     ~     (p=0.333 n=5+4)
ored-preds-100/Simple/Explore-16                                   17.9k ± 0%     17.9k ± 0%     ~     (all equal)
ored-preds-100/Prepared/ExecBuild-16                                 413 ± 0%       413 ± 0%     ~     (all equal)
ored-preds-using-params-100/Simple/OptBuildNorm-16                 4.08k ± 0%     4.08k ± 0%     ~     (all equal)
ored-preds-using-params-100/Simple/Explore-16                      5.54k ± 0%     5.54k ± 0%     ~     (all equal)
ored-preds-using-params-100/Prepared/AssignPlaceholdersNorm-16     1.44k ± 0%     1.44k ± 0%     ~     (all equal)
ored-preds-using-params-100/Prepared/Explore-16                    2.89k ± 0%     2.89k ± 0%     ~     (all equal)

@mgartner mgartner requested review from DrewKimball and nvb August 12, 2022 18:55
@mgartner mgartner requested a review from a team as a code owner August 12, 2022 18:55
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@mgartner mgartner requested a review from michae2 August 12, 2022 19:01
Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

This is cool! Have you run the slow queries benchmark to see if there's any difference there?

Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @nvanbenschoten)


pkg/util/fast_int_set.go line 103 at r2 (raw file):

// added to both the large and small sets.
func (s *FastIntSet) Add(i int) {
	withinSmallBounds := i >= 0 && i < smallCutoff

[nit] This condition can be inlined now.


pkg/util/fast_int_set.go line 170 at r2 (raw file):

// Len returns the number of the elements in the set.
func (s FastIntSet) Len() int {
	len := s.small.OnesCount()

[nit] feels weird to use len as a variable name; could you use length instead?

`util.FastIntSet` now stores values in the range `[0, 128)` only in its
`small` bitmap, and never in its `large *intsets.Sparse`. This better
utilizes memory allocated for `large` and can significantly reduce
allocations in some cases. Prior to this change, the set `{0, 200, 300}`
would require two allocations. After this change, only a single
allocation is required. See the comments added in the code for more
details.

This change also reduces the complexity of `fitsInSmall` which could
account for up to 16% of CPU-time in some optimizer benchmarks. This
function previously used `intsets.Sparse.Min` and `intsets.Sparse.Max`
whenever `large` was allocated, which accounted for the majority of
CPU-time spent in `fitsInSmall`. Now, `fitsInSmall` can simply check
whether `large` is empty, which is a cheaper operation.

Release note: None
Previously, `FastIntSet.Copy` and `FastIntSet.CopyFrom` would always
allocate new `large` sets if the set being copied had a non-nil `large`
set. However, this set could be non-nil and empty in cases were elements
were removed from a set, e.g., with `Remove` or `IntersectionWith`. This
caused unnecessary allocations for `large` sets that ended up being
empty. Now, the `large` set is only allocated if necessary.

Release note: None
`FastIntSet.toLarge` is now so simple that it doesn't provide any
significant benefit to the readability of the code. It has been removed.

`FastIntSet.DifferenceWith` now avoids a potential allocation if `rhs`
has a nil `large` set.

Release note: None
@mgartner mgartner force-pushed the optimize-fast-int-set branch from 24aa613 to e22a528 Compare August 12, 2022 20:16
Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you run the slow queries benchmark to see if there's any difference there?

They show some improvement:

name                         old time/op    new time/op    delta
SlowQueries/slow-query-1-16    35.6ms ± 4%    31.9ms ± 9%  -10.38%  (p=0.008 n=5+5)
SlowQueries/slow-query-2-16     597ms ± 5%     552ms ± 2%   -7.56%  (p=0.008 n=5+5)

name                         old alloc/op   new alloc/op   delta
SlowQueries/slow-query-1-16    13.3MB ± 0%    12.9MB ± 0%   -2.71%  (p=0.008 n=5+5)
SlowQueries/slow-query-2-16     181MB ± 0%     181MB ± 0%     ~     (p=0.310 n=5+5)

name                         old allocs/op  new allocs/op  delta
SlowQueries/slow-query-1-16      147k ± 0%      141k ± 0%   -3.84%  (p=0.008 n=5+5)
SlowQueries/slow-query-2-16     1.65M ± 0%     1.65M ± 0%   -0.01%  (p=0.008 n=5+5)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @michae2, and @nvanbenschoten)


pkg/util/fast_int_set.go line 103 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] This condition can be inlined now.

Done.


pkg/util/fast_int_set.go line 170 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] feels weird to use len as a variable name; could you use length instead?

Heh, oops. Changed to l.

@mgartner
Copy link
Copy Markdown
Contributor Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 14, 2022

Build failed:

@rytaft
Copy link
Copy Markdown
Collaborator

rytaft commented Aug 15, 2022

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 15, 2022

Build succeeded:

@craig craig bot merged commit 1d8f18c into cockroachdb:master Aug 15, 2022
@mgartner mgartner deleted the optimize-fast-int-set branch August 18, 2022 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants