Skip to content

util/intsets: use new intsets.Sparse implementation for util.FastIntSet#90009

Merged
craig[bot] merged 5 commits intocockroachdb:masterfrom
mgartner:marcus-sparse-set
Dec 20, 2022
Merged

util/intsets: use new intsets.Sparse implementation for util.FastIntSet#90009
craig[bot] merged 5 commits intocockroachdb:masterfrom
mgartner:marcus-sparse-set

Conversation

@mgartner
Copy link
Copy Markdown
Contributor

@mgartner mgartner commented Oct 14, 2022

util: move FastIntSet to new pkg/util/intsets

Release note: None

intsets: rename FastIntSet to Fast

Release note: None

intsets: move bitmap into a separate file

Release note: None

intsets: add Sparse and use it in Fast

This commit replaces usages of the Sparse type in
golang.org/x/tools/container/intsets with a new intsets.Sparse type.
The new type is inspired by the x/tools type, but differs in several
ways:

  1. The new Sparse type provides a smaller API than the x/tools
    Sparse type, only containing the methods required by
    intsets.Fast.
  2. The new Sparse type is implemented as a singly-linked list of
    blocks rather than a circular, doubly-linked list.
  3. The new Sparse type reuses the bitmap type used in
    intsets.Fast. As a result, each block can store up to 128
    integers instead of 256.

This simpler implementation yields a performance boost in query
optimization of some types of queries.

Release note: None

intsets: do not shift values in Fast.large

Previously, intsets.Fast shifted values by 128 when storing them in
Fast.large to minimize allocations within Fast.large (see the
deleted comments in the diff). Now that intsets.Sparse's uses
blocks with 128-bit bitmaps instead of 256-bit bitmaps, this shift no
longer provides any benefit, so it has been removed.

Release note: None

Epic: None

@mgartner mgartner requested review from a team as code owners October 14, 2022 20:38
@mgartner mgartner requested review from a team, HonoreDB, herkolategan, msbutler and smg260 and removed request for a team October 14, 2022 20:38
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@mgartner
Copy link
Copy Markdown
Contributor Author

Optimizer microbenchmarks show some promising improvements: https://gist.github.com/mgartner/bafce7fa99cf96644304bceb23303919

I'm planning on running more end-to-end benchmarks to ensure this doesn't cause any significant regressions.

@smg260
Copy link
Copy Markdown
Contributor

smg260 commented Oct 17, 2022

👍 Minor impact on test-eng code

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! :lgtm:

Reviewed 172 of 172 files at r1, 121 of 121 files at r2, 3 of 3 files at r3, 6 of 6 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @herkolategan, @HonoreDB, @mgartner, @msbutler, and @smg260)


pkg/util/intsets/sparse.go line 29 at r4 (raw file):

// Max method allows us to use a singly-linked list here instead of a
// circular, doubly-linked list.
type Sparse struct {

I wonder whether it would be worth it to add an explicit unit test for this new struct. In particular, it seems that writing a randomized test that compares the new implementation against intsets.Sparse would be pretty low effort and would give us more confidence.


pkg/util/intsets/sparse.go line 51 at r4 (raw file):

// offset returns the block offset for the given integer.
// Note: Bitwise AND NOT only works here because smallCutoff is a power of two.

nit: should we add an explicit assertion for this? I.e. something like

func init() {
  if (smallCutoff & (smallCutoff - 1)) != 0 {
    panic("boom")
  }
}

pkg/util/intsets/sparse.go line 95 at r4 (raw file):

			return nil
		}
		s.root.offset = b.next.offset

nit: can we do s.root = *b.next?


pkg/util/intsets/sparse.go line 129 at r4 (raw file):

func (s *Sparse) Remove(i int) {
	o := offset(i)
	b := bit(i)

nit: not sure whether it is worth it, but we could lazily compute value of b. In case when a block that would contain i is not present in the set, we'd avoid computing the bit altogether. Ditto for Contains and LowerBound.


pkg/util/intsets/sparse.go line 267 at r4 (raw file):

		}
	}
	if sb == &s.root {

The split of the logic between a loop and outside seems tricky - could you add some more commentary here? Like what sb and last are expected to point at, etc.

Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @herkolategan, @HonoreDB, @msbutler, @smg260, and @yuzefovich)


pkg/util/intsets/sparse.go line 29 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I wonder whether it would be worth it to add an explicit unit test for this new struct. In particular, it seems that writing a randomized test that compares the new implementation against intsets.Sparse would be pretty low effort and would give us more confidence.

Good idea. I'll work on adding those.


pkg/util/intsets/sparse.go line 51 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: should we add an explicit assertion for this? I.e. something like

func init() {
  if (smallCutoff & (smallCutoff - 1)) != 0 {
    panic("boom")
  }
}

Good idea. Done.


pkg/util/intsets/sparse.go line 95 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: can we do s.root = *b.next?

We can, but it compiles to basically the same instructions:

         MOVQ    24(CX), DX
-        MOVQ    8(DX), BX
-        MOVQ    16(DX), DX
+        MOVQ    8(CX), BX
+        MOVQ    16(CX), SI
+        MOVQ    (CX), CX
+        MOVQ    CX, (AX)
         MOVQ    BX, 8(AX)
-        MOVQ    DX, 16(AX)
-        MOVQ    24(CX), CX
-        MOVQ    24(CX), CX
+        MOVQ    SI, 16(AX)
         PCDATA  $0, $-2
         CMPL    runtime.writeBarrier(SB), $0
-        JNE     command-line-arguments_Sparse_removeBlock_pc126
-        MOVQ    CX, 24(AX)
-        JMP     command-line-arguments_Sparse_removeBlock_pc135
-command-line-arguments_Sparse_removeBlock_pc126:
+        JNE     command-line-arguments_Sparse_removeBlock_pc118
+        MOVQ    DX, 24(AX)
+        JMP     command-line-arguments_Sparse_removeBlock_pc127
+command-line-arguments_Sparse_removeBlock_pc118:
         LEAQ    24(AX), DI
-        CALL    runtime.gcWriteBarrierCX(SB)
-command-line-arguments_Sparse_removeBlock_pc135:
+        CALL    runtime.gcWriteBarrierDX(SB)
+command-line-arguments_Sparse_removeBlock_pc127:

But if you think it's more clear and concise, I can make the change.


pkg/util/intsets/sparse.go line 129 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: not sure whether it is worth it, but we could lazily compute value of b. In case when a block that would contain i is not present in the set, we'd avoid computing the bit altogether. Ditto for Contains and LowerBound.

This doesn't seem to change the compiled assembly at all, so I'll leave as-is.


pkg/util/intsets/sparse.go line 267 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

The split of the logic between a loop and outside seems tricky - could you add some more commentary here? Like what sb and last are expected to point at, etc.

This was the trickiest part to get correct. I've left a note - let me know if it helps.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 134 of 134 files at r6, 121 of 121 files at r7, 3 of 3 files at r8, 6 of 6 files at r9, 1 of 1 files at r10, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @herkolategan, @HonoreDB, @mgartner, @msbutler, and @smg260)


pkg/util/intsets/sparse.go line 95 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

We can, but it compiles to basically the same instructions:

         MOVQ    24(CX), DX
-        MOVQ    8(DX), BX
-        MOVQ    16(DX), DX
+        MOVQ    8(CX), BX
+        MOVQ    16(CX), SI
+        MOVQ    (CX), CX
+        MOVQ    CX, (AX)
         MOVQ    BX, 8(AX)
-        MOVQ    DX, 16(AX)
-        MOVQ    24(CX), CX
-        MOVQ    24(CX), CX
+        MOVQ    SI, 16(AX)
         PCDATA  $0, $-2
         CMPL    runtime.writeBarrier(SB), $0
-        JNE     command-line-arguments_Sparse_removeBlock_pc126
-        MOVQ    CX, 24(AX)
-        JMP     command-line-arguments_Sparse_removeBlock_pc135
-command-line-arguments_Sparse_removeBlock_pc126:
+        JNE     command-line-arguments_Sparse_removeBlock_pc118
+        MOVQ    DX, 24(AX)
+        JMP     command-line-arguments_Sparse_removeBlock_pc127
+command-line-arguments_Sparse_removeBlock_pc118:
         LEAQ    24(AX), DI
-        CALL    runtime.gcWriteBarrierCX(SB)
-command-line-arguments_Sparse_removeBlock_pc135:
+        CALL    runtime.gcWriteBarrierDX(SB)
+command-line-arguments_Sparse_removeBlock_pc127:

But if you think it's more clear and concise, I can make the change.

Thanks for checking, for me it's a minor stylistic preference, but please feel free to ignore this nit.


pkg/util/intsets/sparse.go line 267 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This was the trickiest part to get correct. I've left a note - let me know if it helps.

Yes, now it's clear, thanks!

Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Thanks for pulling it into separate commits! I'm excited to see what sort of additional gains we can see by using arena allocation for intsets.Sparse

Reviewed 44 of 172 files at r1, 134 of 134 files at r6, 121 of 121 files at r7, 3 of 3 files at r8, 6 of 6 files at r9.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @herkolategan, @HonoreDB, @mgartner, @msbutler, @smg260, and @yuzefovich)


pkg/util/intsets/sparse.go line 58 at r9 (raw file):

// offset returns the block offset for the given integer.
// Note: Bitwise AND NOT only works here because smallCutoff is a power of two.
func offset(i int) int {

[nit] Should we add gcassert:inline to some of these methods?

@DrewKimball
Copy link
Copy Markdown
Collaborator

@mgartner do you have any plans to bring this one across the finish line? I'm happy to help if it means getting those sweet perf improvements.

@mgartner mgartner force-pushed the marcus-sparse-set branch 2 times, most recently from 8aa0939 to 6f24912 Compare December 16, 2022 20:37
Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DrewKimball I've just rebased this and I think the last remaining thing is writing a bunch of randomized tests for sparse. Let me see if I can make some progress on that today.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @herkolategan, @HonoreDB, @msbutler, @smg260, and @yuzefovich)


pkg/util/intsets/sparse.go line 267 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Yes, now it's clear, thanks!

Done.


pkg/util/intsets/sparse.go line 58 at r9 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Should we add gcassert:inline to some of these methods?

I've added them, but I don't think the TestGCAssert linter is running under bazel, so they won't do anything until that is fixed. See #65485.

@mgartner mgartner requested a review from a team December 16, 2022 22:55
Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added tests and fixed a bug that they uncovered in UnionWith. I think the last thing to do it run the optimizer micro-benchmarks again and make sure that we still see the same improvement after the minor change that fixed the bug.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @herkolategan, @HonoreDB, @msbutler, @smg260, and @yuzefovich)


pkg/util/intsets/sparse.go line 29 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Good idea. I'll work on adding those.

I've added randomized tests for intsets.Sparse using an oracle with the same API but backed by a map. It's a good thing I did it too - it revealed a bug in Sparse.UnionWith that was hidden by intsets.Fast.

Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test looks great! :lgtm:

Reviewed 25 of 152 files at r11, 121 of 124 files at r12, 1 of 3 files at r13, 7 of 9 files at r14, 1 of 1 files at r15, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @herkolategan, @HonoreDB, @mgartner, @msbutler, @smg260, and @yuzefovich)


pkg/util/intsets/sparse_test.go line 29 at r15 (raw file):

func TestSparse(t *testing.T) {
	for _, minVal := range []int{-100_000, -smallCutoff * 10, -smallCutoff, 0, smallCutoff, smallCutoff * 10} {

What's with the underscore in -100_000?

@mgartner
Copy link
Copy Markdown
Contributor Author

pkg/util/intsets/sparse_test.go line 29 at r15 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

What's with the underscore in -100_000?

It has no semantic meaning, it's just a way to make it easier to read long numbers - just like the comma in "100,000". You can put them wherever you want, e.g., 1000 == 1_0_0_0 => true. I've seen it used in other languages, and I think this feature was added in Go 1.13, here's the proposal: golang/go#28493

@mgartner
Copy link
Copy Markdown
Contributor Author

Unfortunately, the new benchmark results are less promising than the original ones. For example, we were seeing a -24% latency delta in Phases/many-columns-and-indexes-c/Prepared/ExecBuild-24, and now we see only -8%. All the SlowQueries benchmarks originally had reduced latency, and now only SlowQueries/slow-query-1-24 does. I've lost some confidence that this is justified to merge, what do you think @DrewKimball. One justification that this opens the door to attempt to pool some of these block allocations, or create a version with a custom allocator.

I'm not sure what explains the difference in results. It could be the addition of if rhs.Empty() { return; } to Sparse.UnionWith that fixes the bug found by the Sparse tests, our recent upgrade to Go 1.19, or something else.

@DrewKimball
Copy link
Copy Markdown
Collaborator

I think this is still worth merging for the increased optimization opportunities, as you say. I think any minor regressions will be overshadowed by potential to block-allocate - that's going to be a huge win.

This commit replaces usages of the Sparse type in
golang.org/x/tools/container/intsets with a new `intsets.Sparse` type.
The new type is inspired by the `x/tools` type, but differs in several
ways:

  1. The new `Sparse` type provides a smaller API than the `x/tools`
     `Sparse` type, only containing the methods required by
     `intsets.Fast`.
  2. The new `Sparse` type is implemented as a singly-linked list of
     blocks rather than a circular, doubly-linked list.
  3. The new `Sparse` type reuses the `bitmap` type used in
     `intsets.Fast`. As a result, each block can store up to 128
     integers instead of 256.

This simpler implementation yields a performance boost in query
optimization of some types of queries.

Release note: None
Previously, `intsets.Fast` shifted values by 128 when storing them in
`Fast.large` to minimize allocations within `Fast.large` (see the
deleted comments in the diff). Now that `intsets.Sparse`'s uses blocks
with 128-bit bitmaps instead of 256-bit bitmaps, this shift no longer
provides any benefit, so it has been removed.

Release note: None
@mgartner
Copy link
Copy Markdown
Contributor Author

Ok, sounds good. TFTRs!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 20, 2022

Build succeeded:

@craig craig bot merged commit 48ef0d8 into cockroachdb:master Dec 20, 2022
@mgartner mgartner deleted the marcus-sparse-set branch December 20, 2022 19:09
mgartner added a commit to mgartner/cockroach that referenced this pull request Apr 20, 2023
In cockroachdb#90009, `x/tools/Sparse` was replaced by a new `Sparse` integer set
library. In that commit we forgot to update the test-only implementation
of `Fast` to use the new library correctly. This was not caught earlier
because CI and roachtests are not run with the `fast_int_set_small` and
`fast_int_set_large` build tags.

Release note: None
craig bot pushed a commit that referenced this pull request Apr 20, 2023
101873: opt: do not add derived FK equalities to lookup join ON filters r=mgartner a=mgartner

This commit fixes a minor bug introduced in #90599 which can
unnecessarily add derived FK equality filters to a lookup join's ON
condition. The bug is minor because it does not cause incorrect results.
It only adds a bit of extra work to evaluate the equality. This commit
ensures that the derived FK filters are only used as equality columns in
the lookup join.

Fixes #101844

Release note: None


101908: util/intsets: fix test-only Fast r=mgartner a=mgartner

In #90009, `x/tools/Sparse` was replaced by a new `Sparse` integer set
library. In that commit we forgot to update the test-only implementation
of `Fast` to use the new library correctly. This was not caught earlier
because CI and roachtests are not run with the `fast_int_set_small` and
`fast_int_set_large` build tags.

Epic: None

Release note: None


Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
blathers-crl bot pushed a commit that referenced this pull request Apr 20, 2023
In #90009, `x/tools/Sparse` was replaced by a new `Sparse` integer set
library. In that commit we forgot to update the test-only implementation
of `Fast` to use the new library correctly. This was not caught earlier
because CI and roachtests are not run with the `fast_int_set_small` and
`fast_int_set_large` build tags.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants