
[WIP] kv: merge latch spans and lock spans #52610

Draft

nvb wants to merge 2 commits into cockroachdb:master from nvb:nvanbenschoten/removeLockSpans

Conversation

@nvb (Contributor) commented Aug 11, 2020

name                          old time/op    new time/op    delta
KV/Scan/Native/rows=10000-16    49.9ms ± 4%    46.8ms ± 6%   -6.21%  (p=0.000 n=10+10)
KV/Scan/Native/rows=1000-16     4.04ms ± 3%    3.82ms ± 3%   -5.48%  (p=0.000 n=10+9)
KV/Scan/Native/rows=100-16       412µs ± 5%     397µs ± 3%   -3.65%  (p=0.013 n=10+8)
KV/Scan/Native/rows=10-16       64.1µs ± 4%    61.4µs ± 2%   -4.33%  (p=0.000 n=10+9)
KV/Scan/Native/rows=1-16        28.8µs ± 4%    28.5µs ± 3%     ~     (p=0.133 n=10+9)

name                          old alloc/op   new alloc/op   delta
KV/Scan/Native/rows=10000-16    19.8MB ± 2%    16.5MB ± 3%  -16.64%  (p=0.000 n=10+10)
KV/Scan/Native/rows=1000-16     1.37MB ± 0%    1.23MB ± 1%  -10.40%  (p=0.000 n=8+10)
KV/Scan/Native/rows=100-16       150kB ± 1%     135kB ± 2%   -9.70%  (p=0.000 n=10+10)
KV/Scan/Native/rows=10-16       21.3kB ± 0%    19.0kB ± 0%  -10.49%  (p=0.000 n=10+9)
KV/Scan/Native/rows=1-16        8.19kB ± 0%    8.06kB ± 0%   -1.59%  (p=0.000 n=8+9)

name                          old allocs/op  new allocs/op  delta
KV/Scan/Native/rows=1-16          72.0 ± 0%      69.0 ± 0%   -4.17%  (p=0.000 n=9+10)
KV/Scan/Native/rows=10-16          236 ± 0%       229 ± 0%   -2.97%  (p=0.001 n=8+9)
KV/Scan/Native/rows=100-16       1.78k ± 0%     1.77k ± 0%   -0.62%  (p=0.000 n=8+8)
KV/Scan/Native/rows=1000-16      17.1k ± 0%     17.1k ± 0%   -0.10%  (p=0.000 n=10+8)
KV/Scan/Native/rows=10000-16      171k ± 0%      171k ± 0%   -0.03%  (p=0.031 n=9+9)

@cockroach-teamcity (Member)

This change is Reviewable

@nvb (Contributor, Author) commented Aug 18, 2020

@sumeerbhola did you manage to take this for a spin with the geospatial queries that you are looking at? I'm curious whether this is something we should be pushing on for this release.

@sumeerbhola (Collaborator) left a comment

It is taking me longer than expected to get to this -- I'll try to run with this today to see how it affects some sample queries.

Reviewed 7 of 17 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/spanset/spanset.go, line 175 at r2 (raw file):

// BoundarySpan returns a span containing all the spans with the given params.
func (s *SpanSet) BoundarySpan(scope SpanScope) roachpb.Span {

The scope parameter isn't used below.
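
(For context, a version of this function that does consult the scope might look like the sketch below. It assumes the spanset package's existing `GetSpans`, `SpanAccess`, and `roachpb.Span.Combine`; it is illustrative, not necessarily the PR's actual body.)

```go
// BoundarySpan returns a span encompassing all spans in the set for the
// given scope. Sketch only: a first-seen flag seeds the boundary, then
// Combine widens it to cover each subsequent span.
func (s *SpanSet) BoundarySpan(scope SpanScope) roachpb.Span {
	var boundary roachpb.Span
	var seen bool
	for sa := SpanAccess(0); sa < NumSpanAccess; sa++ {
		for _, cur := range s.GetSpans(sa, scope) {
			if !seen {
				boundary, seen = cur.Span, true
				continue
			}
			boundary = boundary.Combine(cur.Span)
		}
	}
	return boundary
}
```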

Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to help with a couple of sample queries, but there is enough noise that the slowdown indicated below may not be real. I probably won't have time to run more extensive performance comparisons this week before the code freeze, so unless you have time, I think we should look into this afterwards.

$ benchstat bench_before_span_opt/bench_CockroachDB.txt bench_after_span_opt/bench_CockroachDB.txt 
name          old ms     new ms     delta
Test/nyc/nyc   118 ±11%   133 ±14%  +12.06%  (p=0.000 n=96+99)
Test/osm/osm  86.2 ±14%  94.9 ±16%  +10.14%  (p=0.000 n=96+98)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

@nvb (Contributor, Author) commented Aug 18, 2020

Huh, I'm pretty surprised by the slowdown. I could see this not helping, but don't see how it could hurt unless there's just a bug in the PR. There's no chance you got "before" and "after" swapped, is there? 😄

Anyway, thanks for testing.

@sumeerbhola (Collaborator) left a comment

There's no chance you got "before" and "after" swapped, is there?

Unfortunately not -- I looked quite carefully.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

nvb added 2 commits August 20, 2020 20:13
@nvb nvb force-pushed the nvanbenschoten/removeLockSpans branch from 93d7ba3 to 328a527 Compare August 21, 2020 00:13
@nvb (Contributor, Author) commented Aug 21, 2020

I tested this out because I couldn't figure out a way that this could hurt performance as indicated (and didn't have anything better to do on a Thursday night in quarantine). In my experiments following @sumeerbhola's instructions, it does appear to help:

name                                old ms    new ms    delta
Test/postgis_geometry_tutorial/nyc  142 ±14%  138 ±14%  -2.87%  (p=0.000 n=968+969)

old is 904b7cb
new is 328a527 (904b7cb + these two commits)

Those results make a lot more sense to me, as they line up closely with what we see in KV/Scan/Native. @sumeerbhola I made sure to run these 1000 iterations over a series of interleaved 100 run chunks. All tests were run on the same machine. Is it possible that the test results posted above came from two different VMs?

@sumeerbhola (Collaborator) left a comment

They came from the same VMs within a few minutes of each other -- but I've noticed enough variability in latency numbers to not be surprised.

Do you have time to polish this up for a proper review?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)

@nvb (Contributor, Author) commented Aug 21, 2020

Do you have time to polish this up for a proper review?

I can try to polish it up tomorrow if you're on board with trying to get this into the upcoming release.

Out of curiosity, is there a latency target you're interested in? How fast is Postgres on this query?

@sumeerbhola (Collaborator) commented Aug 21, 2020

Out of curiosity, is there a latency target you're interested in? How fast is Postgres on this query?

I don't have a latency target, but < 2x would be good. This query is already there (running on my macbook; after I bump Pebble -- which gets us down from 108ms), but there are others that are > 4x slower, so any optimization here will help.

(old is postgres and new is cockroachdb)
name                     old ms     new ms     delta
Test/nyc/nyc-intersects  49.5 ±10%  82.8 ±18%  +67.18%  (p=0.000 n=93+97)

@nvb (Contributor, Author) commented Aug 21, 2020

I don't think I'm going to be able to get this into a good state for this release. The change still has a few open questions around the timestamps of latches for ResolveIntent, ResolveIntentRange, and GCRequest. These questions all arise from these requests declaring arguably incorrect latches, but that's not a problem we can solve immediately. So I'm going to hold off on this.

nvb added a commit to nvb/cockroach that referenced this pull request Aug 22, 2020
In cockroachdb#52610, we explored merging the latch and lock spans declared by
requests. Doing so is the long-term goal in this area, but it doesn't
seem like that's going to be possible for v20.2. Meanwhile, the
collection of lock spans is currently less efficient than the collection
of latch spans. This is because we have a heuristic to eagerly allocate
latch span slices ahead of time, instead of letting them grow as they
are appended to. This optimization does not exist for lock spans.

We can see this by looking at an alloc_space heap profile while running
a geospatial query of interest. We see that the lock spans account for
**9.76%** of all allocated space while the latch spans account for only
**2.65%** of all allocated space.

```
----------------------------------------------------------+-------------
                                          646.37MB   100% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval.DefaultDeclareIsolatedKeys /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval/declare.go:90
         0     0%   100%   646.37MB  9.76%                | github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).AddNonMVCC /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:127
                                          646.37MB   100% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).AddMVCC /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:139
----------------------------------------------------------+-------------
                                          175.37MB   100% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).collectSpans /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:691
  175.37MB  2.65% 61.51%   175.37MB  2.65%                | github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).Reserve /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:120
----------------------------------------------------------+-------------
```

To improve this, we can do the same ahead of time slice resizing for
lock spans by reserving an estimate for the number and type of lock
spans we expect a request to declare. This guess may be a little less
accurate than our guess for latch spans, because there are a few request
types that declare latch spans but not lock spans, but these are rare
requests that are much less important to optimize than the requests that
do declare a latch and a lock span.

With this change, we shave off about **6.57%** of total allocated space
in this workload. We no longer see allocations in `SpanSet.AddNonMVCC`
and the amount of heap memory allocated by `SpanSet.Reserve` roughly
doubles, as we would expect.

```
----------------------------------------------------------+-------------
                                          190.99MB 53.34% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).collectSpans /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:692
                                          167.09MB 46.66% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).collectSpans /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:693
  358.08MB  5.84% 36.53%   358.08MB  5.84%                | github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).Reserve /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:120
----------------------------------------------------------+-------------
```
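
For reference, a minimal sketch of what such an ahead-of-time reservation can look like, assuming `SpanSet` keeps its spans in a `spans[access][scope]` matrix (the layout is an assumption; the real `SpanSet.Reserve` may differ in detail):

```go
// Reserve grows the capacity of the span slice for the given access and
// scope by n, so the following n Add* calls append without re-allocating.
func (s *SpanSet) Reserve(access SpanAccess, scope SpanScope, n int) {
	existing := s.spans[access][scope]
	s.spans[access][scope] = make([]Span, len(existing), n+len(existing))
	copy(s.spans[access][scope], existing)
}
```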
craig bot pushed a commit that referenced this pull request Aug 24, 2020
52972: cli: command to deserialize descriptors r=spaskob a=spaskob

We currently export the descriptors and job payloads in debug zips with
hex encoding.  It is often very valuable to decode these protos into
JSON for inspection.

Fixes #52063.

Release note (cli change):
The new debug command `decode-proto` reads a descriptor from stdin in hex
or base64 format (auto-detected), accepts a flag --schema=\<fully qualified name to decode\>
defaulting to `cockroach.sql.sqlbase.Descriptor`, and writes the
deserialized proto to stdout in JSON format.
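
Based on that note, usage would look something like the following (the input file name is hypothetical):

```
$ cat descriptor.hex | cockroach debug decode-proto --schema=cockroach.sql.sqlbase.Descriptor
```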

53226: colexec: make hash table more dynamic and fix hash join in some cases r=yuzefovich a=yuzefovich

**colexec: make hash table more dynamic**

This commit replaces the hash table's fixed, constant-size allocations
with dynamically sized slices, making the hash table itself more
dynamic. This also lets the logic for clearing those slices between
iterations live in a single place, which simplifies the code a bit.

Additionally it moves `buckets` slice that is only used by the hash
joiner out of the hash table.

Release note: None

**colexec: fix LEFT ANTI hash join when right eq cols are key**

Release note (bug fix): Previously, CockroachDB could return incorrect
results when executing a LEFT ANTI hash join via the vectorized engine
in the case where the right equality columns form a key. This has now
been fixed.

**colexec: fix set-op hash joins and optimize them a bit**

This commit fixes several problems with set-operation hash joins:
1. Similar to how LEFT ANTI hash joins are handled, INTERSECT ALL and
EXCEPT ALL hash joins rely on `headID` to be populated. That step is
skipped if the right side is distinct (meaning the right equality
columns form a key). This would result in incorrect output, and we now
override the `rightDistinct` flag to get the desired behavior (probably
sub-optimal but correct).
2. Actually, before we could get an incorrect result, we would hit an
out-of-bounds error because `hashTable.visited` would not have been
created previously. That slice is used by set-op joins to track which
tuples from the right side have already been "deleted" (i.e. they were
used for a match). This is now also fixed.

However, currently the information whether the right equality columns
form a key is not propagated for set-operation joins, so the bug would
not have occurred in the non-test environment.

Additionally, this commit optimizes the handling of LEFT ANTI and EXCEPT
ALL joins by not allocating `same` slice because those two join types
have separate `collectAnti*` methods that don't use that slice.

Release note: None

53248: release: fix adding libgeos files to .tar.gz and compress .zip r=jlinder a=otan

* Fix issue where only the first file was added to a .tar.gz.
* Compress files better when adding to zips.

Release note: None

53252: sql/kv: optimize heap allocations for inverted joins and scans with many spans r=nvanbenschoten a=nvanbenschoten

This PR contains a collection of optimizations that build on #53181,
#53215, and #53237 to reduce the number and size of heap allocations
performed by inverted joins.

Combined, these changes result in a **1.46%** speedup on the following
geospatial query:
```sql
SELECT Count(census.blkid), Count(subways.name)
FROM nyc_census_blocks AS census
JOIN nyc_subway_stations AS subways
ON ST_Intersects(subways.geom, census.geom);
```
```
name                                old ms    new ms    delta
Test/postgis_geometry_tutorial/nyc  144 ±15%  142 ±16%  -1.46%  (p=0.000 n=972+974)
```

----

### kv: reserve lock spans in SpanSet ahead of time

In #52610, we explored merging the latch and lock spans declared by
requests. Doing so is the long-term goal in this area, but it doesn't
seem like that's going to be possible for v20.2. Meanwhile, the
collection of lock spans is currently less efficient than the collection
of latch spans. This is because we have a heuristic to eagerly allocate
latch span slices ahead of time, instead of letting them grow as they
are appended to. This optimization does not exist for lock spans.

We can see this by looking at an alloc_space heap profile while running
a geospatial query of interest. We see that the lock spans account for
**9.76%** of all allocated space while the latch spans account for only
**2.65%** of all allocated space.

```
----------------------------------------------------------+-------------
                                          646.37MB   100% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval.DefaultDeclareIsolatedKeys /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval/declare.go:90
         0     0%   100%   646.37MB  9.76%                | github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).AddNonMVCC /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:127
                                          646.37MB   100% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).AddMVCC /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:139
----------------------------------------------------------+-------------
                                          175.37MB   100% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).collectSpans /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:691
  175.37MB  2.65% 61.51%   175.37MB  2.65%                | github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).Reserve /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:120
----------------------------------------------------------+-------------
```

To improve this, we can do the same ahead of time slice resizing for
lock spans by reserving an estimate for the number and type of lock
spans we expect a request to declare. This guess may be a little less
accurate than our guess for latch spans, because there are a few request
types that declare latch spans but not lock spans, but these are rare
requests that are much less important to optimize than the requests that
do declare a latch and a lock span.

With this change, we shave off about **6.57%** of total allocated space
in this workload. We no longer see allocations in `SpanSet.AddNonMVCC`
and the amount of heap memory allocated by `SpanSet.Reserve` roughly
doubles, as we would expect.

```
----------------------------------------------------------+-------------
                                          190.99MB 53.34% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).collectSpans /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:692
                                          167.09MB 46.66% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).collectSpans /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:693
  358.08MB  5.84% 36.53%   358.08MB  5.84%                | github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).Reserve /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:120
----------------------------------------------------------+-------------
```

### sql/row: batch allocate RequestUnion wrapper structs in txnKVFetcher.fetch

A RequestUnion's approach to representing a union type results in double
indirection. For instance, a ScanRequest is represented as:
```
RequestUnion{Value: &RequestUnion_Scan{Scan: &ScanRequest{}}}
```

The code in txnKVFetcher.fetch was batch allocating the inner struct in
this structure across all scans in a batch, but was not batch allocating
the outer "union" struct. The effect of this is that the outer struct's
heap allocation showed up as 4.78% of the alloc_objects heap profile of
an interesting geospatial query:

```
----------------------------------------------------------+-------------
                                           2621460   100% |   github.com/cockroachdb/cockroach/pkg/sql/row.(*txnKVFetcher).fetch /go/src/github.com/cockroachdb/cockroach/pkg/sql/row/kv_batch_fetcher.go:297
         0     0%   100%    2621460  4.78%                | github.com/cockroachdb/cockroach/pkg/roachpb.(*RequestUnion).MustSetInner /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.go:604
                                           2621460   100% |   github.com/cockroachdb/cockroach/pkg/roachpb.(*RequestUnion).SetInner /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/batch_generated.go:354
----------------------------------------------------------+-------------
```

This commit avoids this source of per-request allocations by batch
allocating these union structs across all requests in a batch just
like the code was previously batch allocating the inner structs.
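
A hedged sketch of the resulting pattern; `roachpb.RequestUnion`, `RequestUnion_Scan`, and `ScanRequest` are the real types, but the surrounding loop and variable names are illustrative rather than the actual txnKVFetcher code:

```go
// Two batch allocations cover every scan in the batch: one backing array
// for the inner ScanRequest + RequestUnion_Scan pairs, and one for the
// outer RequestUnion wrappers, instead of a heap object per request.
reqs := make([]roachpb.RequestUnion, len(spans))
scans := make([]struct {
	req   roachpb.ScanRequest
	union roachpb.RequestUnion_Scan
}, len(spans))
for i := range spans {
	scans[i].req.Key = spans[i].Key
	scans[i].req.EndKey = spans[i].EndKey
	scans[i].union.Scan = &scans[i].req
	reqs[i].Value = &scans[i].union
}
```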

### sql/rowexec: avoid allocating slice header on sort in fragmentPendingSpans

batchedInvertedExprEvaluator's fragmentPendingSpans performs a
`sort.Slice` to sort a list of pending spans by their end key.
`sort.Slice` is convenient, but it has some trade-offs. For one, it uses
reflection to mimic parameterization over all slice types. Second, it
accepts the slice as an interface, which causes the slice header to
escape to the heap.

The effect of this is that the call was showing up as 4.35% of the
alloc_objects heap profile of an interesting geospatial query:

```
----------------------------------------------------------+-------------
                                           2387773   100% |   github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*batchedInvertedExprEvaluator).init /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_expr_evaluator.go:406
   2162754  3.94% 40.58%    2387773  4.35%                | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*batchedInvertedExprEvaluator).fragmentPendingSpans /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_expr_evaluator.go:284
                                            225019  9.42% |   sort.Slice /usr/local/go/src/sort/slice.go:15
----------------------------------------------------------+-------------
```

This commit avoids this heap allocation by assigning the slice to a
field on the batchedInvertedExprEvaluator (which was already on the
heap) and using the old `sort.Sort` function.
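
In simplified form, the pattern looks like this (a sketch with stand-in types, not the actual rowexec code):

```go
package main

import (
	"bytes"
	"sort"
)

// invertedSpan is a stand-in for the evaluator's span type.
type invertedSpan struct{ Start, End []byte }

// evaluator stands in for batchedInvertedExprEvaluator. It implements
// sort.Interface over its own pendingSpans field, so sort.Sort only
// boxes the receiver pointer (already heap-resident); no slice header
// escapes and no reflection-backed closure is allocated.
type evaluator struct {
	pendingSpans []invertedSpan
}

func (e *evaluator) Len() int { return len(e.pendingSpans) }
func (e *evaluator) Swap(i, j int) {
	e.pendingSpans[i], e.pendingSpans[j] = e.pendingSpans[j], e.pendingSpans[i]
}
func (e *evaluator) Less(i, j int) bool {
	return bytes.Compare(e.pendingSpans[i].End, e.pendingSpans[j].End) < 0
}

func (e *evaluator) fragmentPendingSpans() {
	sort.Sort(e) // equivalent to sort.Slice(e.pendingSpans, ...), minus the allocations
}

func main() {
	e := &evaluator{pendingSpans: []invertedSpan{
		{End: []byte("c")}, {End: []byte("a")}, {End: []byte("b")},
	}}
	e.fragmentPendingSpans() // now ordered a, b, c by End
}
```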

----

@otan an area I didn't touch but wanted to talk to you about was the
memory representation of `DGeography` and `DGeometry`. Specifically, I'm
wondering why we're embedding pointers instead of values inside of these
datums, effectively creating a double-boxing situation. I see that most
of the geo operators work off pointers and that the "canonical"
representation of the objects exported by `pkg/geo` is a pointer (e.g.
the package exports "New..." methods instead of "Make..." methods). I
don't understand why this is though. Both `Geography` and `Geometry` are
only 40 bytes large, which is well within reason to pass on the stack.

I ask because this makes working with these datums more expensive than
strictly necessary. We make an effort to batch allocate datum objects
themselves (see `DatumAlloc`), but don't do anything special with
internal heap allocations. I suspect that there would be a moderate win
here if we eliminated the double-boxing on any query that scans a large
number of rows containing one of these data types, as I'm seeing the
heap allocations in DecodeUntaggedDatum fairly prominently in profiles.
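
To illustrate the double-boxing concern with simplified stand-ins (not the actual tree/geo types):

```go
// Geometry stands in for the ~40-byte geo.Geometry value.
type Geometry struct {
	spatialObject [40]byte
}

// Today: the datum embeds a pointer, so each datum costs two heap
// objects, and DatumAlloc can only batch the outer one.
type DGeometryPtr struct {
	*Geometry
}

// Alternative: embed by value. One allocation covers both, and 40 bytes
// is well within reason to pass on the stack.
type DGeometryVal struct {
	Geometry
}
```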

Co-authored-by: Spas Bojanov <spas@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvb added a commit to nvb/cockroach that referenced this pull request Nov 25, 2020
This commit is a partial revert of cockroachdb#40600 and cockroachdb#46830. It solves the same
problem that those PRs were solving, but in a different way.

Those two PRs were handling the case where a reading transaction
observes an intent in its uncertainty interval. Before those fixes, we
were not considering intents in a scan's uncertainty interval to be
uncertain. This had the potential to cause stale reads because an
unresolved intent doesn't indicate that its transaction hasn’t been
committed and is not a causal ancestor of the scan.

The approach the first PR took was to throw a WriteIntentError on
intents in a scan's uncertainty interval. Effectively, it made scans
consider intents in their uncertainty interval to be write-read
conflicts. This had the benefit that the scan would push the intent and
eventually resolve the conflict, either by aborting the intent, pushing
it out of the read's uncertainty interval, or waiting for it to commit.
In this last case (which is by the far the most common), the scan would
then throw an uncertainty error, because it now had a committed value in
its uncertainty interval.

The second PR introduced some awkward code to deal with the fallout from
this decision. Once we started trying to keep the lock table in sync
with the MVCC state, we needed to use latches to ensure proper
synchronization between any operations that could conflict because such
conflicts would influence the state of the lock table. This meant that
we needed to start holding latches across a read's uncertainty interval,
because intent writes in this interval were considered write-read
conflicts. This led to some moderately gross code and always felt a little
wrong.

Now that we are complicating the logic around uncertainty intervals even
further, this becomes even more of a burden. This motivates a reworking
of these interactions. This commit replaces the logic that considers
intents in a transaction's uncertainty interval to be write-read
conflicts with logic that considers such intents to be... uncertain.
Doing so means that a transaction will immediately refresh/restart
above the uncertain timestamp and will only then begin conflicting
with the intent.
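
In rough pseudocode, the reworked decision during MVCC iteration looks
like this (a sketch; the helper and variable names are illustrative, not
the actual pkg/storage code):

```go
// decideIntentInteraction sketches how a scan classifies an intent it
// encounters, given its read timestamp and uncertainty limit.
func decideIntentInteraction(intentTS, readTS, uncertaintyLimit hlc.Timestamp) string {
	switch {
	case intentTS.LessEq(readTS):
		return "write-read conflict" // push/wait via the lock table, as before
	case intentTS.LessEq(uncertaintyLimit):
		return "uncertain" // new: refresh/restart above intentTS, then conflict
	default:
		return "ignore" // above the uncertainty interval; not visible to the scan
	}
}
```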

This has a number of nice benefits:
1. it keeps all considerations of read uncertainty intervals down in
   MVCC iteration logic. The lock table no longer needs to be aware of it.
   This is a big win now and an even bigger win once we introduce synthetic
   timestamps.
2. read latches no longer need to be acquired up to the edge of a read's
   uncertainty interval, they only need to be acquired up to its read
   timestamp. Similarly, the lock table only needs to consider true
   write-read conflicts.
3. readers that are almost certainly bound to hit an uncertainty error
   and need to restart will now do so sooner. In rare cases, this may
   result in wasted work. In the vast majority of cases, this will allow
   the reader to be more responsive to the commit of its conflict.
4. uncertainty errors will only be thrown for locks in the uncertainty
   interval of a read that are protecting a provisional write (intents).
   Before, any form of a lock in a read's uncertainty interval would be
   considered a write-read conflict, which was pessimistic and not needed
   for correctness.

   In a future with a fully segregated lock table, the change in semantic
   meaning here becomes even clearer. Instead of detecting the lock
   associated with an intent in a read's uncertainty interval and declaring
   a write-read conflict, the read will instead pass through the lock table
   untouched, detect the provisional value associated with the intent, and
   declare uncertainty. This seems closer to what we're actually trying
   to say about these interactions.
5. partially unblocks a change like cockroachdb#52610. Now that latches and locks
   consider the same time ranges for write-read conflicts, merging the
   latch spans and lock spans will be easier.

Before making this change, I intend to validate the hypothesis that it
will not affect performance (or may even slightly improve performance)
by running it on the YCSB benchmark suite.
@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label May 6, 2021