util/interval: generate type-safe specializations of interval B-tree#43850
util/interval: generate type-safe specializations of interval B-tree#43850craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
7b47bdf to
fc64462
Compare
petermattis
left a comment
There was a problem hiding this comment.
This is fairly neat. Not very intrusive and the generated code is perfectly readable. I have no objections to this approach if you can define the generic type appropriately for the lock table use case.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/util/interval/generic/internal/contract.go, line 14 at r1 (raw file):
// T is a Template type. The methods in the interface make up its contract. type T = interface {
What is this about? I didn't see mention of it in the go_generics readme. That documentation is admittedly sparse.
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/util/interval/generic/internal/contract.go, line 14 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
What is this about? I didn't see mention of it in the go_generics readme. That documentation is admittedly sparse.
Oh, I see that this is just the type being replaced. What through me is the type T =. I see other examples of go_generics which use type T interface. Are both styles supported?
nvb
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @petermattis)
pkg/util/interval/generic/internal/contract.go, line 14 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Oh, I see that this is just the type being replaced. What through me is the
type T =. I see other examples ofgo_genericswhich usetype T interface. Are both styles supported?
Yes, the type T is replaced. I'm not sure what the difference between the type alias and the type definition is. https://github.com/scylladb/go-set uses the former and https://github.com/mmatczuk/go_generics uses the latter in its tests. Based on the code in the generator itself, it doesn't look like there's a difference in how they're handled - either way, they're removed from the resulting AST.
The interface serves two purposes: to specify a contract for the parameterized type and allow the templated go files to compile.
sumeerbhola
left a comment
There was a problem hiding this comment.
Looks great. Thanks.
Reviewed 14 of 15 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)
pkg/util/interval/generic/doc.go, line 27 at r1 (raw file):
3. Include a go generate declaration that invokes the gen.sh script with the type name as the first argument and the package name as the second argument. 4. Invoke go gnerate.
nit: generate
pkg/util/interval/generic/internal/contract.go, line 21 at r1 (raw file):
SetID(uint64) SetKey([]byte) SetEndKey([]byte)
why does the btree implementation need the Set* methods? And I don't see the code in example_interval_btree.go calling the Set* methods.
Ah, I now see it is needed by the test. Can you add a comment here?
pkg/util/interval/generic/internal/interval_btree.tmpl.go, line 166 at r1 (raw file):
} // incRef acquires a reference to the node.
I have a hard time understanding the ref counting happening in this file and reasoning about its correctness.
A snapshot of the tree only increments the ref count on the root node. And the snapshot happens before insertion of the latches. So presumably the root has a refcount of at least 2 (if the baseline is a refcount of 1). Then the insertion needs a mutable root node which must mean a copy is made of the root, and this copy increments the refs of its children, but not a recursive increment. But mut() also does a recursive decrement -- why (when were these recursive refs acquired)? I am assuming that mut() acquiring refs on its children is what ensures that whatever path the insert takes will see a node that does not have a ref count of 1 and so have to make a copy, while the paths not taken don't need a recursive refcount increment -- so a refcount of > 1 at a node also protects its subtree.
Could you add a high-level comment somewhere in this file?
fc64462 to
4c6b73b
Compare
nvb
left a comment
There was a problem hiding this comment.
Thanks for taking a look. I still need to make a few tweaks here to get it passing all lint checks. I'll ping the PR when that's done.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @petermattis and @sumeerbhola)
pkg/util/interval/generic/doc.go, line 27 at r1 (raw file):
Previously, sumeerbhola wrote…
nit: generate
Done.
pkg/util/interval/generic/internal/contract.go, line 21 at r1 (raw file):
Previously, sumeerbhola wrote…
why does the btree implementation need the
Set*methods? And I don't see the code inexample_interval_btree.gocalling theSet*methods.Ah, I now see it is needed by the test. Can you add a comment here?
Done.
pkg/util/interval/generic/internal/interval_btree.tmpl.go, line 166 at r1 (raw file):
Your understanding sounds almost exactly correct. The one thing that it sounds like you're still unsure about is why mut() does a recursive decrement on the cloned node's refcount. On the path that you're referencing, mut() is performing a clone so that it can write. Because mut() is cloning a node, it can release it's reference to the old node. If this is the final reference to the old node, which is only possible if we're racing with another call to decRef (I added a comment), then that should recursively release references on the children it points to and be recycled.
Could you add a high-level comment somewhere in this file?
I added the following comment to btree.Clone:
Incrementing the reference count on the root node is sufficient to ensure that
no node in the cloned tree can be mutated by an actor holding a reference to the
original tree and vice versa. This property is upheld because the root node in
the receiver btree and the returned btree will both necessarily have a reference
count of at least 2 when this method returns. All tree mutations recursively
acquire mutable node references (see mut) as they traverse down the tree. The
act of acquiring a mutable node reference performs a clone if a node's reference
count is greater than one. Cloning a node (see clone) increases the reference
count on each of its children, ensuring that they have a reference count of at
least 2. This, in turn, ensures that any of the child nodes that are modified
will also be copied-on-write, recursively ensuring the immutability property
over the entire tree.```
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 1 of 15 files at r1, 2 of 2 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/util/interval/generic/internal/interval_btree.tmpl.go, line 166 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Your understanding sounds almost exactly correct. The one thing that it sounds like you're still unsure about is why
mut()does a recursive decrement on the cloned node's refcount. On the path that you're referencing,mut()is performing a clone so that it can write. Becausemut()is cloning a node, it can release it's reference to the old node. If this is the final reference to the old node, which is only possible if we're racing with another call todecRef(I added a comment), then that should recursively release references on the children it points to and be recycled.Could you add a high-level comment somewhere in this file?
I added the following comment to
btree.Clone:Incrementing the reference count on the root node is sufficient to ensure that no node in the cloned tree can be mutated by an actor holding a reference to the original tree and vice versa. This property is upheld because the root node in the receiver btree and the returned btree will both necessarily have a reference count of at least 2 when this method returns. All tree mutations recursively acquire mutable node references (see mut) as they traverse down the tree. The act of acquiring a mutable node reference performs a clone if a node's reference count is greater than one. Cloning a node (see clone) increases the reference count on each of its children, ensuring that they have a reference count of at least 2. This, in turn, ensures that any of the child nodes that are modified will also be copied-on-write, recursively ensuring the immutability property over the entire tree.``` </blockquote></details> This comment is helpful. I was still a bit lost, so I reread the code. The key thing that I missed earlier (a code comment would help) is that a ref represents how many parents a node has. The one special case is the root ref is how many btree structs point to it. Having one parent does not mean the node can be mutated since that parent may be part of two trees. - when mutating, when see a node N with more than one parent, eagerly clone it. This propagates information of what is shared (that is known higher up in the tree), recursively downwards by ensuring anything along the mutation path (worst case log N) below this parent is also cloned. - This cloning ensures the children of N have multiple parents -- the parents are the cloned node N' and any earlier parent nodes (which include N). Any of these children that will need to be modified will also need to be cloned. - Clone decrements the ref count of N. If the ref count of N is becoming 0 it will be deleted and no longer a parent of its children, so by the definition of what a ref count means it should decrement the ref count of its children. <!-- Sent from Reviewable.io -->
danhhz
left a comment
There was a problem hiding this comment.
I like the //go:generate ../../util/interval/generic/gen.sh lines, that's a nice way to do this
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/util/interval/generic/gen.sh, line 10 at r3 (raw file):
fi which go_generics > /dev/null || (echo "Could not find go_generics, to install run: \"go get -u github.com/mmatczuk/go_generics/cmd/go_generics\""; exit 1)
shouldn't the makefile install them into ./bin and then this use the ones from there? otherwise we have hermetic build issues, no?
4c6b73b to
816bc02
Compare
nvb
left a comment
There was a problem hiding this comment.
I spent a little too long trying to figure out why some comments were being stripped from the generated files. First I thought they were being stripped somewhere in go_generics. I then realized they were being stripped earlier, in go_merge when we merged the contract.go file with the template files. go_merge passes ast.FilterUnassociatedComments to the stdlib's ast.MergePackageFiles here. Removing that didn't do the trick though, because comments were then positioned incorrectly after the file merge, completely mangling the code. It turns out that ast.MergePackageFiles doesn't make any attempt to reposition comments as it merges other AST nodes. I gave fixing this a shot for a little too long before eventually finding out that this is a "major design flaw" in how Go handles the relation between free-floating comments and AST nodes: golang/go#20744. In fact, according to Rob Griesemer, this is the single biggest issue when manipulating the AST.
So long story short, we're dropping the dependency on go_merge and doing the simpler thing of a small grep and then a file concatenation. That works and avoids the issues around missing comments. With that, all of the lint warnings are resolved and this is ready to go in.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @danhhz, @petermattis, and @sumeerbhola)
pkg/util/interval/generic/gen.sh, line 10 at r3 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
shouldn't the makefile install them into
./binand then this use the ones from there? otherwise we have hermetic build issues, no?
Yes, the Makefile change ensures that these are installed into ./bin, this language was just off. Fixed (by removing).
816bc02 to
045ed93
Compare
petermattis
left a comment
There was a problem hiding this comment.
modulo a small comment on the the
*_tmpl.go files. I want to find a reason to use go_generics myself now.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @danhhz, @nvanbenschoten, and @sumeerbhola)
pkg/util/interval/generic/internal/unused.go, line 15 at r5 (raw file):
// Silence unused warnings. The linter doesn't notice that these are test // functions because their file is not named *_test.go. We don't want it // to be because then it would be run by `go test`.
Show the *_tmpl.go files contain a build tag preventing them from being built by default? Not sure if that would work. Alternately, doesn't Go ignore files that start with _?
045ed93 to
e967bc8
Compare
nvb
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @danhhz, @petermattis, and @sumeerbhola)
pkg/util/interval/generic/internal/unused.go, line 15 at r5 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Show the
*_tmpl.gofiles contain a build tag preventing them from being built by default? Not sure if that would work. Alternately, doesn't Go ignore files that start with_?
That's a good question. Changing the files to start with _ didn't appease the linter without this file. Neither did adding a // +build ignore build tag.
However, switching the test file from interval_btree_test_tmpl.go to interval_btree_tmpl_test.go allowed it to get picked up as a test by the linter, so we could then get rid of this file. That then required the use of a // +build ignore build tag, which I added and now strip out in gen.sh.
That seems like a nicer solution to me. Thanks for the suggestion.
e967bc8 to
1dc5772
Compare
|
CI is trying to pick up and stress this new bors r=petermattis |
Build failed (retrying...) |
|
^ Everything passed except |
Build failed (retrying...) |
Build failed (retrying...) |
Build failed |
See cockroachdb#43740. This PR pulls the existing immutable interval B-Tree structure out of the spanlatch package and into a utility library. In order to avoid a loss of both type safety and performance, it opts for code generation by using https://github.com/mmatczuk/go_generics. The new util library exposes a script that is meant to be invoked by go generate to create specializations of the ordered tree structure for individual use cases. This ensures that each use can achieve the performance of being hand-written without needing to continue the pattern of copying the btree code and manually modifying it each time we want to use it.
This commit adds a comment to `btree.Clone` explaining why it is safe. Release note: None
1dc5772 to
f16ed52
Compare
|
We're no longer avoiding the compilation of bors r=petermattis |
43850: util/interval: generate type-safe specializations of interval B-tree r=petermattis a=nvanbenschoten See #43740. This PR pulls the existing immutable interval B-Tree structure out of the spanlatch package and into a utility library. In order to avoid a loss of both type safety and performance, it opts for code generation by using https://github.com/mmatczuk/go_generics. The new util library exposes a script that is meant to be invoked by go generate to create specializations of the ordered tree structure for individual use cases. This ensures that each use can achieve the performance of being hand-written without needing to continue the pattern of copying the btree code and manually modifying it each time we want to use it. cc. @petermattis @sumeerbhola I'm curious to get your thoughts on this experiment. I'm hoping this also unblocks @sumeerbhola on #43740. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Build succeeded |
44617: util/interval/generic: avoid new allocations in btree benchmarks r=nvanbenschoten a=nvanbenschoten The generic template for interval btree makes an attempt to work with both value and pointer parameterized types. To facilitate this option in tests, it used reflection to create new instances of the item type. This reflection was preventing allocations of the item type from being inlined and avoided by escape analysis, which was skewing benchmarks. This commit fixes this by avoiding the reflection and instead adding a constructor to the contract of the parameterized type. This is cleaner and allows the allocations to be avoided in these benchmarks. Here's the new `benchdiff` output of the latch manager's btree from before #43850 to after this change: [benchdiff sheet](https://docs.google.com/spreadsheets/d/1ZxtPPSWV0z76msCiwIyqL0xnUQF530ZwnbnfrvEzNAE/edit#gid=4). Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
See #43740.
This PR pulls the existing immutable interval B-Tree structure out of the
spanlatch package and into a utility library.
In order to avoid a loss of both type safety and performance, it opts for
code generation by using https://github.com/mmatczuk/go_generics. The new
util library exposes a script that is meant to be invoked by go generate
to create specializations of the ordered tree structure for individual
use cases. This ensures that each use can achieve the performance of being
hand-written without needing to continue the pattern of copying the btree
code and manually modifying it each time we want to use it.
cc. @petermattis @sumeerbhola I'm curious to get your thoughts on this experiment.
I'm hoping this also unblocks @sumeerbhola on #43740.