storage/diskmap: remove SortedDiskMapIterator.{Key,Value}#43145
storage/diskmap: remove SortedDiskMapIterator.{Key,Value}#43145craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @nvanbenschoten)
pkg/sql/rowcontainer/disk_row_container.go, line 311 at r1 (raw file):
k := r.UnsafeKey() v := r.UnsafeValue() // TODO(asubiotto): the "true ||" should not be necessary. We should be to
@asubiotto This deserves a look. Seems like a caller of Row() is misusing the API. If the results from Row() need to be stable, we could add something like a util/bufalloc.ByteAllocator here, though it would be better to track down the offending caller.
nvb
left a comment
There was a problem hiding this comment.
other than the condition in
diskRowIterator.Row. Thanks for tracking this down.
Reviewed 4 of 4 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)
asubiotto
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @petermattis)
pkg/sql/rowcontainer/disk_row_container.go, line 311 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
@asubiotto This deserves a look. Seems like a caller of
Row()is misusing the API. If the results fromRow()need to be stable, we could add something like autil/bufalloc.ByteAllocatorhere, though it would be better to track down the offending caller.
I think what's going on here is that this type of contract (you may only reuse the row until the next call) is a bit unclear. CopyRows of EncDatums are only implemented as shallow copies, so the reference to this reuse only applies to the EncDatumRow, but not to the encoded field, which is what is rewritten in this case. The encoded field will not be copied, so I think the tests are probably failing due to that. This is unfortunate and there doesn't seem to be a good reason for it. To implement deep copies, we will have to implement deep copies for Datums as well. I think we should use a ByteAllocator here with a TODO referencing #43182
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/sql/rowcontainer/disk_row_container.go, line 311 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I think what's going on here is that this type of contract (you may only reuse the row until the next call) is a bit unclear.
CopyRows ofEncDatums are only implemented as shallow copies, so the reference to this reuse only applies to theEncDatumRow, but not to theencodedfield, which is what is rewritten in this case. Theencodedfield will not be copied, so I think the tests are probably failing due to that. This is unfortunate and there doesn't seem to be a good reason for it. To implement deep copies, we will have to implement deep copies forDatums as well. I think we should use aByteAllocatorhere with a TODO referencing #43182
The comment above diskRowIterator.Row seems to make the contract for the lifetime of the result pretty explicit. Do you mind if I leave this TODO as-is for you (or someone else on the SQL exec team) to fix? Doing a copy here as this PR is doing, is equivalent to what the prior code was doing. In fact, this PR is slightly better as it is replacing 2 allocations with 1.
asubiotto
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @petermattis)
pkg/sql/rowcontainer/disk_row_container.go, line 311 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
The comment above
diskRowIterator.Rowseems to make the contract for the lifetime of the result pretty explicit. Do you mind if I leave this TODO as-is for you (or someone else on the SQL exec team) to fix? Doing a copy here as this PR is doing, is equivalent to what the prior code was doing. In fact, this PR is slightly better as it is replacing 2 allocations with 1.
👍
Remove `SortedDiskMapIterator.{Key,Value}` as these accessors are
horribly slow due to performing an allocation on every call. Change the
existing uses of these methods to `Unsafe{Key,Value}` adding copying
when necessary. Most of the use cases were easy to fix, though note that
`diskRowIterator.Row()` contains a TODO because the removal of the
allocation there caused many test failures.
The `PebbleMapIteration` benchmarks still show a regression in
comparison to f5009c8. That regression
is entirely due to Pebble's new memtable sizing heuristics which start
with a small memtable size and dynamically grow the size up to the
configured limit. Adding a knob to disable that behavior for the
purposes of a benchmark does not seem worthwhile.
name old time/op new time/op delta
RocksDBMapIteration/InputSize4096-16 1.18ms ± 3% 0.81ms ± 0% -31.56% (p=0.000 n=10+8)
RocksDBMapIteration/InputSize16384-16 5.83ms ± 1% 4.14ms ± 3% -29.13% (p=0.000 n=9+10)
RocksDBMapIteration/InputSize65536-16 24.8ms ± 1% 17.7ms ± 3% -28.54% (p=0.000 n=9+10)
RocksDBMapIteration/InputSize262144-16 137ms ± 0% 105ms ± 2% -23.65% (p=0.000 n=9+9)
RocksDBMapIteration/InputSize1048576-16 547ms ± 1% 430ms ± 1% -21.44% (p=0.000 n=8+9)
PebbleMapIteration/InputSize4096-16 594µs ± 1% 323µs ± 2% -45.65% (p=0.000 n=9+9)
PebbleMapIteration/InputSize16384-16 3.29ms ± 1% 2.15ms ± 1% -34.70% (p=0.000 n=10+9)
PebbleMapIteration/InputSize65536-16 16.0ms ± 7% 11.2ms ±11% -30.26% (p=0.000 n=10+10)
PebbleMapIteration/InputSize262144-16 96.7ms ± 3% 76.5ms ± 5% -20.88% (p=0.000 n=10+10)
PebbleMapIteration/InputSize1048576-16 267ms ± 0% 185ms ± 1% -30.60% (p=0.000 n=9+10)
name old alloc/op new alloc/op delta
RocksDBMapIteration/InputSize4096-16 262kB ± 0% 0kB ± 0% -99.97% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize16384-16 1.31MB ± 0% 0.00MB ± 0% -99.99% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize65536-16 5.51MB ± 0% 0.00MB ± 3% -100.00% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize262144-16 22.3MB ± 0% 0.0MB ± 0% -100.00% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize1048576-16 89.4MB ± 0% 0.0MB ± 0% -100.00% (p=0.000 n=10+9)
PebbleMapIteration/InputSize4096-16 263kB ± 0% 0kB ± 0% -99.91% (p=0.000 n=10+10)
PebbleMapIteration/InputSize16384-16 1.31MB ± 0% 0.00MB ± 0% -99.98% (p=0.000 n=10+8)
PebbleMapIteration/InputSize65536-16 5.50MB ± 0% 0.00MB ± 3% -99.99% (p=0.000 n=10+9)
PebbleMapIteration/InputSize262144-16 22.3MB ± 0% 0.0MB ± 0% -99.99% (p=0.000 n=10+7)
PebbleMapIteration/InputSize1048576-16 89.3MB ± 0% 0.0MB ±26% -100.00% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
RocksDBMapIteration/InputSize4096-16 8.20k ± 0% 0.00k ± 0% -99.96% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize16384-16 41.0k ± 0% 0.0k ± 0% -99.99% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize65536-16 172k ± 0% 0k ± 0% -100.00% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize262144-16 696k ± 0% 0k ± 0% -100.00% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize1048576-16 2.79M ± 0% 0.00M ± 0% -100.00% (p=0.000 n=9+9)
PebbleMapIteration/InputSize4096-16 8.20k ± 0% 0.01k ± 0% -99.94% (p=0.000 n=10+10)
PebbleMapIteration/InputSize16384-16 41.0k ± 0% 0.0k ± 0% -99.99% (p=0.000 n=10+10)
PebbleMapIteration/InputSize65536-16 172k ± 0% 0k ± 0% -100.00% (p=0.000 n=10+10)
PebbleMapIteration/InputSize262144-16 696k ± 0% 0k ± 0% -100.00% (p=0.000 n=10+10)
PebbleMapIteration/InputSize1048576-16 2.79M ± 0% 0.00M ± 9% -100.00% (p=0.000 n=10+10)
Release note: None
c3108ed to
ff890ab
Compare
|
TFTR! bors r+ |
Build failed (retrying...) |
Build failed (retrying...) |
Build failed (retrying...) |
Build failed |
|
4 different CI failures, 3 of which are unrelated to this PR. The 4th is out of space on bors r+ |
43145: storage/diskmap: remove SortedDiskMapIterator.{Key,Value} r=petermattis a=petermattis
Remove `SortedDiskMapIterator.{Key,Value}` as these accessors are
horribly slow due to performing an allocation on every call. Change the
existing uses of these methods to `Unsafe{Key,Value}` adding copying
when necessary. Most of the use cases were easy to fix, though note that
`diskRowIterator.Row()` contains a TODO because the removal of the
allocation there caused many test failures.
The `PebbleMapIteration` benchmarks still show a regression in
comparison to f5009c8. That regression
is entirely due to Pebble's new memtable sizing heuristics which start
with a small memtable size and dynamically grow the size up to the
configured limit. Adding a knob to disable that behavior for the
purposes of a benchmark does not seem worthwhile.
```
name old time/op new time/op delta
RocksDBMapIteration/InputSize4096-16 1.18ms ± 3% 0.81ms ± 0% -31.56% (p=0.000 n=10+8)
RocksDBMapIteration/InputSize16384-16 5.83ms ± 1% 4.14ms ± 3% -29.13% (p=0.000 n=9+10)
RocksDBMapIteration/InputSize65536-16 24.8ms ± 1% 17.7ms ± 3% -28.54% (p=0.000 n=9+10)
RocksDBMapIteration/InputSize262144-16 137ms ± 0% 105ms ± 2% -23.65% (p=0.000 n=9+9)
RocksDBMapIteration/InputSize1048576-16 547ms ± 1% 430ms ± 1% -21.44% (p=0.000 n=8+9)
PebbleMapIteration/InputSize4096-16 594µs ± 1% 323µs ± 2% -45.65% (p=0.000 n=9+9)
PebbleMapIteration/InputSize16384-16 3.29ms ± 1% 2.15ms ± 1% -34.70% (p=0.000 n=10+9)
PebbleMapIteration/InputSize65536-16 16.0ms ± 7% 11.2ms ±11% -30.26% (p=0.000 n=10+10)
PebbleMapIteration/InputSize262144-16 96.7ms ± 3% 76.5ms ± 5% -20.88% (p=0.000 n=10+10)
PebbleMapIteration/InputSize1048576-16 267ms ± 0% 185ms ± 1% -30.60% (p=0.000 n=9+10)
name old alloc/op new alloc/op delta
RocksDBMapIteration/InputSize4096-16 262kB ± 0% 0kB ± 0% -99.97% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize16384-16 1.31MB ± 0% 0.00MB ± 0% -99.99% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize65536-16 5.51MB ± 0% 0.00MB ± 3% -100.00% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize262144-16 22.3MB ± 0% 0.0MB ± 0% -100.00% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize1048576-16 89.4MB ± 0% 0.0MB ± 0% -100.00% (p=0.000 n=10+9)
PebbleMapIteration/InputSize4096-16 263kB ± 0% 0kB ± 0% -99.91% (p=0.000 n=10+10)
PebbleMapIteration/InputSize16384-16 1.31MB ± 0% 0.00MB ± 0% -99.98% (p=0.000 n=10+8)
PebbleMapIteration/InputSize65536-16 5.50MB ± 0% 0.00MB ± 3% -99.99% (p=0.000 n=10+9)
PebbleMapIteration/InputSize262144-16 22.3MB ± 0% 0.0MB ± 0% -99.99% (p=0.000 n=10+7)
PebbleMapIteration/InputSize1048576-16 89.3MB ± 0% 0.0MB ±26% -100.00% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
RocksDBMapIteration/InputSize4096-16 8.20k ± 0% 0.00k ± 0% -99.96% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize16384-16 41.0k ± 0% 0.0k ± 0% -99.99% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize65536-16 172k ± 0% 0k ± 0% -100.00% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize262144-16 696k ± 0% 0k ± 0% -100.00% (p=0.000 n=10+10)
RocksDBMapIteration/InputSize1048576-16 2.79M ± 0% 0.00M ± 0% -100.00% (p=0.000 n=9+9)
PebbleMapIteration/InputSize4096-16 8.20k ± 0% 0.01k ± 0% -99.94% (p=0.000 n=10+10)
PebbleMapIteration/InputSize16384-16 41.0k ± 0% 0.0k ± 0% -99.99% (p=0.000 n=10+10)
PebbleMapIteration/InputSize65536-16 172k ± 0% 0k ± 0% -100.00% (p=0.000 n=10+10)
PebbleMapIteration/InputSize262144-16 696k ± 0% 0k ± 0% -100.00% (p=0.000 n=10+10)
PebbleMapIteration/InputSize1048576-16 2.79M ± 0% 0.00M ± 9% -100.00% (p=0.000 n=10+10)
```
Release note: None
43207: roachprod/gce: use 2 SSDs for c2 machine types r=nvanbenschoten a=nvanbenschoten
This is required to spin up c2 instances, just as it is to spin up
n2 instances.
Release note: None
43214: sql: stop swallowing errors from UncachedPhysicalAccessor.IsValidSchema r=andreimatei a=ajwerner
Before this commit we'd swallow retriable errors during execution. This can
be very problematic as it can lead to lost writes and other general funkiness.
We're opting to not write a test for this specific case as there is ongoing
work to change interfaces to preclude this sort of bug.
Fixes #43067
Fixes #37883 (comment)
Release note: None
43219: pgwire/pgcode: fix DeprecatedInternalConnectionFailure r=nvanbenschoten a=nvanbenschoten
This was moved from "08006" in #41451, not "XXC03".
Release note: None
Co-authored-by: Peter Mattis <petermattis@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Build succeeded |
This commit addresses an old TODO about figuring out why we cannot reuse the same buffer in `diskRowIterator.Row` method. The contract of that method was previously confusing - it says that the call to `Row` invalidates the row returned on the previous call; however, the important piece was missing - that the datums themselves cannot be mutated (this is what we assume elsewhere and perform only the "shallow" copies). This commit clarifies the contract of the method and explicitly explains why we need to allocate fresh byte slices (which is done via `ByteAllocator` to reduce the number of allocations). Additional context can be found in cockroachdb#43145 which added this copy in the first place. Here is a relevant quote from Alfonso: ``` I think what's going on here is that this type of contract (you may only reuse the row until the next call) is a bit unclear. `CopyRow`s of `EncDatum`s are only implemented as shallow copies, so the reference to this reuse only applies to the `EncDatumRow`, but not to the `encoded` field, which is what is rewritten in this case. The `encoded` field will not be copied, so I think the tests are probably failing due to that. This is unfortunate and there doesn't seem to be a good reason for it. To implement deep copies, we will have to implement deep copies for `Datum`s as well. ``` I think we were under the impression that we'd implement the "deep copy" in `CopyRow`, but I highly doubt we'll do so given that we're mostly investing in the columnar infrastructure nowadays, and the current way of performing shallow copying has worked well long enough. Epic: None Release note: None
This commit addresses an old TODO about figuring out why we cannot reuse the same buffer in `diskRowIterator.Row` method. The contract of that method was previously confusing - it says that the call to `Row` invalidates the row returned on the previous call; however, the important piece was missing - that the datums themselves cannot be mutated (this is what we assume elsewhere and perform only the "shallow" copies). This commit clarifies the contract of the method and explicitly explains why we need to allocate fresh byte slices (which is done via `ByteAllocator` to reduce the number of allocations). Additional context can be found in cockroachdb#43145 which added this copy in the first place. Here is a relevant quote from Alfonso: ``` I think what's going on here is that this type of contract (you may only reuse the row until the next call) is a bit unclear. `CopyRow`s of `EncDatum`s are only implemented as shallow copies, so the reference to this reuse only applies to the `EncDatumRow`, but not to the `encoded` field, which is what is rewritten in this case. The `encoded` field will not be copied, so I think the tests are probably failing due to that. This is unfortunate and there doesn't seem to be a good reason for it. To implement deep copies, we will have to implement deep copies for `Datum`s as well. ``` I think we were under the impression that we'd implement the "deep copy" in `CopyRow`, but I highly doubt we'll do so given that we're mostly investing in the columnar infrastructure nowadays, and the current way of performing shallow copying has worked well long enough. Epic: None Release note: None
92713: sql: fix left semi and left anti virtual lookup joins r=yuzefovich a=yuzefovich This commit fixes the execution of the left semi and left anti virtual lookup joins. The bug was that we forgot to project away the looked up columns (coming from the "right" side) which could then lead to wrong columns being used higher up the tree. The bug was introduced during 22.1 release cycle where we added the optimizer support for generating plans that could contain left semi and left anti virtual lookup joins. This commit fixes that issue as well as the output columns of such joins (I'm not sure whether there is a user facing impact of having incorrect "output columns"). Additionally, this commit fixes the execution of these virtual lookup joins to correctly return the input row only once. Previously, for left anti joins we'd be producing an output row if there was a match (which is wrong), and for both left semi and left anti we would emit an output row every time there was a match (but this should be done only once). (Although I'm not sure whether it is possible for virtual indexes to result in multiple looked up rows.) Also, as a minor simplification this commit makes it so that the output rows are not added into the row container for left semi and left anti and the container is not instantiated at all. Fixes: #91012. Fixes: #88096. Release note (bug fix): CockroachDB previously could incorrectly evaluate queries that performed left semi and left anti "virtual lookup" joins on tables in `pg_catalog` or `information_schema`. These join types can be planned when a subquery is used inside of a filter condition. The bug was introduced in 22.1.0 and is now fixed. 92783: nogo: detect deprecated go standard library packages r=rickystewart a=healthy-pod In #87327, we patched `staticcheck` to detect deprecated "objects" from the standard library. This patch ensures that we also detect deprecated "packages". Closes #84877 Release note: None 92859: rowcontainer: address an old TODO r=yuzefovich a=yuzefovich This commit addresses an old TODO about figuring out why we cannot reuse the same buffer in `diskRowIterator.Row` method. The contract of that method was previously confusing - it says that the call to `Row` invalidates the row returned on the previous call; however, the important piece was missing - that the datums themselves cannot be mutated (this is what we assume elsewhere and perform only the "shallow" copies). This commit clarifies the contract of the method and explicitly explains why we need to allocate fresh byte slices (which is done via `ByteAllocator` to reduce the number of allocations). Additional context can be found in #43145 which added this copy in the first place. Here is a relevant quote from Alfonso: ``` I think what's going on here is that this type of contract (you may only reuse the row until the next call) is a bit unclear. `CopyRow`s of `EncDatum`s are only implemented as shallow copies, so the reference to this reuse only applies to the `EncDatumRow`, but not to the `encoded` field, which is what is rewritten in this case. The `encoded` field will not be copied, so I think the tests are probably failing due to that. This is unfortunate and there doesn't seem to be a good reason for it. To implement deep copies, we will have to implement deep copies for `Datum`s as well. ``` I think we were under the impression that we'd implement the "deep copy" in `CopyRow`, but I highly doubt we'll do so given that we're mostly investing in the columnar infrastructure nowadays, and the current way of performing shallow copying has worked well long enough. Epic: None Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: healthy-pod <ahmad@cockroachlabs.com>
This commit addresses an old TODO about figuring out why we cannot reuse the same buffer in `diskRowIterator.Row` method. The contract of that method was previously confusing - it says that the call to `Row` invalidates the row returned on the previous call; however, the important piece was missing - that the datums themselves cannot be mutated (this is what we assume elsewhere and perform only the "shallow" copies). This commit clarifies the contract of the method and explicitly explains why we need to allocate fresh byte slices (which is done via `ByteAllocator` to reduce the number of allocations). Additional context can be found in #43145 which added this copy in the first place. Here is a relevant quote from Alfonso: ``` I think what's going on here is that this type of contract (you may only reuse the row until the next call) is a bit unclear. `CopyRow`s of `EncDatum`s are only implemented as shallow copies, so the reference to this reuse only applies to the `EncDatumRow`, but not to the `encoded` field, which is what is rewritten in this case. The `encoded` field will not be copied, so I think the tests are probably failing due to that. This is unfortunate and there doesn't seem to be a good reason for it. To implement deep copies, we will have to implement deep copies for `Datum`s as well. ``` I think we were under the impression that we'd implement the "deep copy" in `CopyRow`, but I highly doubt we'll do so given that we're mostly investing in the columnar infrastructure nowadays, and the current way of performing shallow copying has worked well long enough. Epic: None Release note: None
This commit addresses an old TODO about figuring out why we cannot reuse the same buffer in `diskRowIterator.Row` method. The contract of that method was previously confusing - it says that the call to `Row` invalidates the row returned on the previous call; however, the important piece was missing - that the datums themselves cannot be mutated (this is what we assume elsewhere and perform only the "shallow" copies). This commit clarifies the contract of the method and explicitly explains why we need to allocate fresh byte slices (which is done via `ByteAllocator` to reduce the number of allocations). Additional context can be found in cockroachdb#43145 which added this copy in the first place. Here is a relevant quote from Alfonso: ``` I think what's going on here is that this type of contract (you may only reuse the row until the next call) is a bit unclear. `CopyRow`s of `EncDatum`s are only implemented as shallow copies, so the reference to this reuse only applies to the `EncDatumRow`, but not to the `encoded` field, which is what is rewritten in this case. The `encoded` field will not be copied, so I think the tests are probably failing due to that. This is unfortunate and there doesn't seem to be a good reason for it. To implement deep copies, we will have to implement deep copies for `Datum`s as well. ``` I think we were under the impression that we'd implement the "deep copy" in `CopyRow`, but I highly doubt we'll do so given that we're mostly investing in the columnar infrastructure nowadays, and the current way of performing shallow copying has worked well long enough. Epic: None Release note: None
Remove
SortedDiskMapIterator.{Key,Value}as these accessors arehorribly slow due to performing an allocation on every call. Change the
existing uses of these methods to
Unsafe{Key,Value}adding copyingwhen necessary. Most of the use cases were easy to fix, though note that
diskRowIterator.Row()contains a TODO because the removal of theallocation there caused many test failures.
The
PebbleMapIterationbenchmarks still show a regression incomparison to f5009c8. That regression
is entirely due to Pebble's new memtable sizing heuristics which start
with a small memtable size and dynamically grow the size up to the
configured limit. Adding a knob to disable that behavior for the
purposes of a benchmark does not seem worthwhile.
Release note: None