util/intsets: use new intsets.Sparse implementation for util.FastIntSet#90009
util/intsets: use new intsets.Sparse implementation for util.FastIntSet#90009craig[bot] merged 5 commits intocockroachdb:masterfrom
Conversation
|
Optimizer microbenchmarks show some promising improvements: https://gist.github.com/mgartner/bafce7fa99cf96644304bceb23303919 I'm planning on running more end-to-end benchmarks to ensure this doesn't cause any significant regressions. |
d556f16 to
2fa62ae
Compare
|
👍 Minor impact on test-eng code |
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewed 172 of 172 files at r1, 121 of 121 files at r2, 3 of 3 files at r3, 6 of 6 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @herkolategan, @HonoreDB, @mgartner, @msbutler, and @smg260)
pkg/util/intsets/sparse.go line 29 at r4 (raw file):
// Max method allows us to use a singly-linked list here instead of a // circular, doubly-linked list. type Sparse struct {
I wonder whether it would be worth it to add an explicit unit test for this new struct. In particular, it seems that writing a randomized test that compares the new implementation against intsets.Sparse would be pretty low effort and would give us more confidence.
pkg/util/intsets/sparse.go line 51 at r4 (raw file):
// offset returns the block offset for the given integer. // Note: Bitwise AND NOT only works here because smallCutoff is a power of two.
nit: should we add an explicit assertion for this? I.e. something like
func init() {
if (smallCutoff & (smallCutoff - 1)) != 0 {
panic("boom")
}
}
pkg/util/intsets/sparse.go line 95 at r4 (raw file):
return nil } s.root.offset = b.next.offset
nit: can we do s.root = *b.next?
pkg/util/intsets/sparse.go line 129 at r4 (raw file):
func (s *Sparse) Remove(i int) { o := offset(i) b := bit(i)
nit: not sure whether it is worth it, but we could lazily compute value of b. In case when a block that would contain i is not present in the set, we'd avoid computing the bit altogether. Ditto for Contains and LowerBound.
pkg/util/intsets/sparse.go line 267 at r4 (raw file):
} } if sb == &s.root {
The split of the logic between a loop and outside seems tricky - could you add some more commentary here? Like what sb and last are expected to point at, etc.
2fa62ae to
685fcdf
Compare
mgartner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @herkolategan, @HonoreDB, @msbutler, @smg260, and @yuzefovich)
pkg/util/intsets/sparse.go line 29 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I wonder whether it would be worth it to add an explicit unit test for this new struct. In particular, it seems that writing a randomized test that compares the new implementation against
intsets.Sparsewould be pretty low effort and would give us more confidence.
Good idea. I'll work on adding those.
pkg/util/intsets/sparse.go line 51 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: should we add an explicit assertion for this? I.e. something like
func init() { if (smallCutoff & (smallCutoff - 1)) != 0 { panic("boom") } }
Good idea. Done.
pkg/util/intsets/sparse.go line 95 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: can we do
s.root = *b.next?
We can, but it compiles to basically the same instructions:
MOVQ 24(CX), DX
- MOVQ 8(DX), BX
- MOVQ 16(DX), DX
+ MOVQ 8(CX), BX
+ MOVQ 16(CX), SI
+ MOVQ (CX), CX
+ MOVQ CX, (AX)
MOVQ BX, 8(AX)
- MOVQ DX, 16(AX)
- MOVQ 24(CX), CX
- MOVQ 24(CX), CX
+ MOVQ SI, 16(AX)
PCDATA $0, $-2
CMPL runtime.writeBarrier(SB), $0
- JNE command-line-arguments_Sparse_removeBlock_pc126
- MOVQ CX, 24(AX)
- JMP command-line-arguments_Sparse_removeBlock_pc135
-command-line-arguments_Sparse_removeBlock_pc126:
+ JNE command-line-arguments_Sparse_removeBlock_pc118
+ MOVQ DX, 24(AX)
+ JMP command-line-arguments_Sparse_removeBlock_pc127
+command-line-arguments_Sparse_removeBlock_pc118:
LEAQ 24(AX), DI
- CALL runtime.gcWriteBarrierCX(SB)
-command-line-arguments_Sparse_removeBlock_pc135:
+ CALL runtime.gcWriteBarrierDX(SB)
+command-line-arguments_Sparse_removeBlock_pc127:But if you think it's more clear and concise, I can make the change.
pkg/util/intsets/sparse.go line 129 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: not sure whether it is worth it, but we could lazily compute value of
b. In case when a block that would containiis not present in the set, we'd avoid computing the bit altogether. Ditto forContainsandLowerBound.
This doesn't seem to change the compiled assembly at all, so I'll leave as-is.
pkg/util/intsets/sparse.go line 267 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
The split of the logic between a loop and outside seems tricky - could you add some more commentary here? Like what
sbandlastare expected to point at, etc.
This was the trickiest part to get correct. I've left a note - let me know if it helps.
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewed 134 of 134 files at r6, 121 of 121 files at r7, 3 of 3 files at r8, 6 of 6 files at r9, 1 of 1 files at r10, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @herkolategan, @HonoreDB, @mgartner, @msbutler, and @smg260)
pkg/util/intsets/sparse.go line 95 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
We can, but it compiles to basically the same instructions:
MOVQ 24(CX), DX - MOVQ 8(DX), BX - MOVQ 16(DX), DX + MOVQ 8(CX), BX + MOVQ 16(CX), SI + MOVQ (CX), CX + MOVQ CX, (AX) MOVQ BX, 8(AX) - MOVQ DX, 16(AX) - MOVQ 24(CX), CX - MOVQ 24(CX), CX + MOVQ SI, 16(AX) PCDATA $0, $-2 CMPL runtime.writeBarrier(SB), $0 - JNE command-line-arguments_Sparse_removeBlock_pc126 - MOVQ CX, 24(AX) - JMP command-line-arguments_Sparse_removeBlock_pc135 -command-line-arguments_Sparse_removeBlock_pc126: + JNE command-line-arguments_Sparse_removeBlock_pc118 + MOVQ DX, 24(AX) + JMP command-line-arguments_Sparse_removeBlock_pc127 +command-line-arguments_Sparse_removeBlock_pc118: LEAQ 24(AX), DI - CALL runtime.gcWriteBarrierCX(SB) -command-line-arguments_Sparse_removeBlock_pc135: + CALL runtime.gcWriteBarrierDX(SB) +command-line-arguments_Sparse_removeBlock_pc127:But if you think it's more clear and concise, I can make the change.
Thanks for checking, for me it's a minor stylistic preference, but please feel free to ignore this nit.
pkg/util/intsets/sparse.go line 267 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This was the trickiest part to get correct. I've left a note - let me know if it helps.
Yes, now it's clear, thanks!
DrewKimball
left a comment
There was a problem hiding this comment.
Thanks for pulling it into separate commits! I'm excited to see what sort of additional gains we can see by using arena allocation for
intsets.Sparse
Reviewed 44 of 172 files at r1, 134 of 134 files at r6, 121 of 121 files at r7, 3 of 3 files at r8, 6 of 6 files at r9.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @herkolategan, @HonoreDB, @mgartner, @msbutler, @smg260, and @yuzefovich)
pkg/util/intsets/sparse.go line 58 at r9 (raw file):
// offset returns the block offset for the given integer. // Note: Bitwise AND NOT only works here because smallCutoff is a power of two. func offset(i int) int {
[nit] Should we add gcassert:inline to some of these methods?
|
@mgartner do you have any plans to bring this one across the finish line? I'm happy to help if it means getting those sweet perf improvements. |
8aa0939 to
6f24912
Compare
mgartner
left a comment
There was a problem hiding this comment.
@DrewKimball I've just rebased this and I think the last remaining thing is writing a bunch of randomized tests for sparse. Let me see if I can make some progress on that today.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @herkolategan, @HonoreDB, @msbutler, @smg260, and @yuzefovich)
pkg/util/intsets/sparse.go line 267 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Yes, now it's clear, thanks!
Done.
pkg/util/intsets/sparse.go line 58 at r9 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] Should we add gcassert:inline to some of these methods?
I've added them, but I don't think the TestGCAssert linter is running under bazel, so they won't do anything until that is fixed. See #65485.
6f24912 to
c89e7ca
Compare
mgartner
left a comment
There was a problem hiding this comment.
I've added tests and fixed a bug that they uncovered in UnionWith. I think the last thing to do it run the optimizer micro-benchmarks again and make sure that we still see the same improvement after the minor change that fixed the bug.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @herkolategan, @HonoreDB, @msbutler, @smg260, and @yuzefovich)
pkg/util/intsets/sparse.go line 29 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Good idea. I'll work on adding those.
I've added randomized tests for intsets.Sparse using an oracle with the same API but backed by a map. It's a good thing I did it too - it revealed a bug in Sparse.UnionWith that was hidden by intsets.Fast.
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewed 25 of 152 files at r11, 121 of 124 files at r12, 1 of 3 files at r13, 7 of 9 files at r14, 1 of 1 files at r15, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @herkolategan, @HonoreDB, @mgartner, @msbutler, @smg260, and @yuzefovich)
pkg/util/intsets/sparse_test.go line 29 at r15 (raw file):
func TestSparse(t *testing.T) { for _, minVal := range []int{-100_000, -smallCutoff * 10, -smallCutoff, 0, smallCutoff, smallCutoff * 10} {
What's with the underscore in -100_000?
|
Previously, DrewKimball (Drew Kimball) wrote…
It has no semantic meaning, it's just a way to make it easier to read long numbers - just like the comma in "100,000". You can put them wherever you want, e.g., |
c89e7ca to
79538e0
Compare
|
Unfortunately, the new benchmark results are less promising than the original ones. For example, we were seeing a -24% latency delta in I'm not sure what explains the difference in results. It could be the addition of |
|
I think this is still worth merging for the increased optimization opportunities, as you say. I think any minor regressions will be overshadowed by potential to block-allocate - that's going to be a huge win. |
79538e0 to
e63f198
Compare
Release note: None
Release note: None
Release note: None
e63f198 to
24b3ac3
Compare
This commit replaces usages of the Sparse type in
golang.org/x/tools/container/intsets with a new `intsets.Sparse` type.
The new type is inspired by the `x/tools` type, but differs in several
ways:
1. The new `Sparse` type provides a smaller API than the `x/tools`
`Sparse` type, only containing the methods required by
`intsets.Fast`.
2. The new `Sparse` type is implemented as a singly-linked list of
blocks rather than a circular, doubly-linked list.
3. The new `Sparse` type reuses the `bitmap` type used in
`intsets.Fast`. As a result, each block can store up to 128
integers instead of 256.
This simpler implementation yields a performance boost in query
optimization of some types of queries.
Release note: None
Previously, `intsets.Fast` shifted values by 128 when storing them in `Fast.large` to minimize allocations within `Fast.large` (see the deleted comments in the diff). Now that `intsets.Sparse`'s uses blocks with 128-bit bitmaps instead of 256-bit bitmaps, this shift no longer provides any benefit, so it has been removed. Release note: None
24b3ac3 to
c760db7
Compare
|
Ok, sounds good. TFTRs! bors r+ |
|
Build succeeded: |
In cockroachdb#90009, `x/tools/Sparse` was replaced by a new `Sparse` integer set library. In that commit we forgot to update the test-only implementation of `Fast` to use the new library correctly. This was not caught earlier because CI and roachtests are not run with the `fast_int_set_small` and `fast_int_set_large` build tags. Release note: None
101873: opt: do not add derived FK equalities to lookup join ON filters r=mgartner a=mgartner This commit fixes a minor bug introduced in #90599 which can unnecessarily add derived FK equality filters to a lookup join's ON condition. The bug is minor because it does not cause incorrect results. It only adds a bit of extra work to evaluate the equality. This commit ensures that the derived FK filters are only used as equality columns in the lookup join. Fixes #101844 Release note: None 101908: util/intsets: fix test-only Fast r=mgartner a=mgartner In #90009, `x/tools/Sparse` was replaced by a new `Sparse` integer set library. In that commit we forgot to update the test-only implementation of `Fast` to use the new library correctly. This was not caught earlier because CI and roachtests are not run with the `fast_int_set_small` and `fast_int_set_large` build tags. Epic: None Release note: None Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
In #90009, `x/tools/Sparse` was replaced by a new `Sparse` integer set library. In that commit we forgot to update the test-only implementation of `Fast` to use the new library correctly. This was not caught earlier because CI and roachtests are not run with the `fast_int_set_small` and `fast_int_set_large` build tags. Release note: None
util: move FastIntSet to new pkg/util/intsets
Release note: None
intsets: rename FastIntSet to Fast
Release note: None
intsets: move bitmap into a separate file
Release note: None
intsets: add Sparse and use it in Fast
This commit replaces usages of the Sparse type in
golang.org/x/tools/container/intsets with a new
intsets.Sparsetype.The new type is inspired by the
x/toolstype, but differs in severalways:
Sparsetype provides a smaller API than thex/toolsSparsetype, only containing the methods required byintsets.Fast.Sparsetype is implemented as a singly-linked list ofblocks rather than a circular, doubly-linked list.
Sparsetype reuses thebitmaptype used inintsets.Fast. As a result, each block can store up to 128integers instead of 256.
This simpler implementation yields a performance boost in query
optimization of some types of queries.
Release note: None
intsets: do not shift values in Fast.large
Previously,
intsets.Fastshifted values by 128 when storing them inFast.largeto minimize allocations withinFast.large(see thedeleted comments in the diff). Now that
intsets.Sparse's usesblocks with 128-bit bitmaps instead of 256-bit bitmaps, this shift no
longer provides any benefit, so it has been removed.
Release note: None
Epic: None