sql/kv: optimize heap allocations for inverted joins and scans with many spans#53252

Merged
craig[bot] merged 3 commits into cockroachdb:master from nvb:nvanbenschoten/geoOpt
Aug 24, 2020
Conversation

@nvb nvb commented Aug 22, 2020

This PR contains a collection of optimizations that build on #53181,
#53215, and #53237 to reduce the number and size of heap allocations
performed by inverted joins.

Combined, these changes result in a 1.46% speedup on the following
geospatial query:

```sql
SELECT Count(census.blkid), Count(subways.name)
FROM nyc_census_blocks AS census
JOIN nyc_subway_stations AS subways
ON ST_Intersects(subways.geom, census.geom);
```

```
name                                old ms    new ms    delta
Test/postgis_geometry_tutorial/nyc  144 ±15%  142 ±16%  -1.46%  (p=0.000 n=972+974)
```

kv: reserve lock spans in SpanSet ahead of time

In #52610, we merged the latch and lock spans declared by requests. Doing
so remains the long-term goal in this area, but it doesn't seem like that's
going to be possible for v20.2. Meanwhile, the collection of lock spans
is currently less efficient than the collection of latch spans. This is
because we have a heuristic to eagerly allocate latch span slices ahead
of time, instead of letting them grow as they are appended to. This
optimization does not exist for lock spans.

We can see this by looking at an alloc_space heap profile while running
a geospatial query of interest. We see that the lock spans account for
9.76% of all allocated space while the latch spans account for only
2.65% of all allocated space.

```
----------------------------------------------------------+-------------
                                          646.37MB   100% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval.DefaultDeclareIsolatedKeys /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval/declare.go:90
         0     0%   100%   646.37MB  9.76%                | github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).AddNonMVCC /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:127
                                          646.37MB   100% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).AddMVCC /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:139
----------------------------------------------------------+-------------
                                          175.37MB   100% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).collectSpans /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:691
  175.37MB  2.65% 61.51%   175.37MB  2.65%                | github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).Reserve /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:120
----------------------------------------------------------+-------------
```

To improve this, we can apply the same ahead-of-time slice sizing to
lock spans by reserving an estimate of the number and type of lock
spans we expect a request to declare. This guess may be a little less
accurate than our guess for latch spans, because a few request types
declare latch spans but not lock spans. However, those requests are rare
and much less important to optimize than the requests that declare both
a latch and a lock span.

With this change, we shave off about 6.57% of total allocated space
in this workload. We no longer see allocations in SpanSet.AddNonMVCC
and the amount of heap memory allocated by SpanSet.Reserve roughly
doubles, as we would expect.

```
----------------------------------------------------------+-------------
                                          190.99MB 53.34% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).collectSpans /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:692
                                          167.09MB 46.66% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).collectSpans /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:693
  358.08MB  5.84% 36.53%   358.08MB  5.84%                | github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).Reserve /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:120
----------------------------------------------------------+-------------
```
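The reservation idea can be sketched in plain Go. The `span` and `spanSet` types below are simplified stand-ins for `roachpb.Span` and `spanset.SpanSet` (not the real API): sizing the backing slice once up front replaces the series of grow-and-copy allocations that `append` would otherwise perform.

```go
package main

import "fmt"

// span is a simplified stand-in for a key span.
type span struct{ key, endKey string }

// spanSet is a minimal sketch of the SpanSet growth behavior: Reserve
// sizes the backing slice once, so later adds never reallocate.
type spanSet struct{ spans []span }

// Reserve grows capacity ahead of time for n additional spans,
// mirroring the role of SpanSet.Reserve.
func (s *spanSet) Reserve(n int) {
	s.spans = append(make([]span, 0, len(s.spans)+n), s.spans...)
}

// Add appends a span; after a sufficient Reserve, this never allocates.
func (s *spanSet) Add(sp span) { s.spans = append(s.spans, sp) }

func main() {
	var s spanSet
	s.Reserve(4) // one allocation up front instead of append doubling
	for i := 0; i < 4; i++ {
		s.Add(span{key: fmt.Sprintf("k%d", i)})
	}
	fmt.Println(len(s.spans), cap(s.spans)) // prints "4 4"
}
```

The real change estimates the reservation from the request types in the batch; the sketch just shows why one up-front `make` beats repeated `append` growth.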

sql/row: batch allocate RequestUnion wrapper structs in txnKVFetcher.fetch

A RequestUnion's approach to representing a union type results in double
indirection. For instance, a ScanRequest is represented like:

```go
RequestUnion{Value: &RequestUnion_Scan{Scan: &ScanRequest{}}}
```

The code in txnKVFetcher.fetch was batch allocating the inner struct in
this structure across all scans in a batch, but was not batch allocating
the outer "union" struct. The effect of this is that the outer struct's
heap allocation showed up as 4.78% of the alloc_objects heap profile of
an interesting geospatial query:

```
----------------------------------------------------------+-------------
                                           2621460   100% |   github.com/cockroachdb/cockroach/pkg/sql/row.(*txnKVFetcher).fetch /go/src/github.com/cockroachdb/cockroach/pkg/sql/row/kv_batch_fetcher.go:297
         0     0%   100%    2621460  4.78%                | github.com/cockroachdb/cockroach/pkg/roachpb.(*RequestUnion).MustSetInner /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.go:604
                                           2621460   100% |   github.com/cockroachdb/cockroach/pkg/roachpb.(*RequestUnion).SetInner /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/batch_generated.go:354
----------------------------------------------------------+-------------
```

This commit avoids this source of per-request allocations by batch
allocating these union structs across all requests in a batch just
like the code was previously batch allocating the inner structs.
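The batching pattern looks roughly like this sketch, with hypothetical `scanRequest`/`requestUnion` types standing in for roachpb's generated structs: both the inner requests and the outer union wrappers come from two backing slices, so n requests cost two heap allocations instead of 2n.

```go
package main

import "fmt"

// Hypothetical stand-ins for the generated request types.
type scanRequest struct{ key string }
type requestUnion struct{ value *scanRequest }

// buildRequests batch-allocates the inner ScanRequest structs and the
// outer union wrappers, then wires pointers into the batch allocation.
func buildRequests(keys []string) []requestUnion {
	reqs := make([]requestUnion, len(keys))   // one allocation for all wrappers
	scans := make([]scanRequest, len(keys))   // one allocation for all inner structs
	for i, k := range keys {
		scans[i].key = k
		reqs[i].value = &scans[i] // point into the batch, no per-request alloc
	}
	return reqs
}

func main() {
	reqs := buildRequests([]string{"a", "b", "c"})
	fmt.Println(len(reqs), reqs[2].value.key) // prints "3 c"
}
```

The prior code already did this for the inner structs; the commit extends the same two-slice pattern to the outer wrappers.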

sql/rowexec: avoid allocating slice header on sort in fragmentPendingSpans

batchedInvertedExprEvaluator's fragmentPendingSpans performs a
`sort.Slice` to sort a list of pending spans by their end key.
`sort.Slice` is convenient, but it has some trade-offs. For one, it uses
reflection to mimic parameterization over all slice types. Second, it
accepts the slice as an interface, which causes the slice header to
escape to the heap.

The effect of this is that the call was showing up as 4.35% of the
alloc_objects heap profile of an interesting geospatial query:

```
----------------------------------------------------------+-------------
                                           2387773   100% |   github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*batchedInvertedExprEvaluator).init /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_expr_evaluator.go:406
   2162754  3.94% 40.58%    2387773  4.35%                | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*batchedInvertedExprEvaluator).fragmentPendingSpans /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_expr_evaluator.go:284
                                            225019  9.42% |   sort.Slice /usr/local/go/src/sort/slice.go:15
----------------------------------------------------------+-------------
```

This commit avoids this heap allocation by assigning the slice to a
field on the batchedInvertedExprEvaluator (which was already on the
heap) and using the old `sort.Sort` function.
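The trick can be sketched as follows. The `pendingSpan`, `byEndKey`, and `evaluator` names are illustrative, not the real types: a named slice type implements `sort.Interface` on a pointer receiver, so `sort.Sort` is handed a one-word pointer into an already-heap-allocated struct rather than a three-word slice header that would escape.

```go
package main

import (
	"fmt"
	"sort"
)

// pendingSpan is a hypothetical stand-in for the evaluator's span type.
type pendingSpan struct{ endKey string }

// byEndKey implements sort.Interface on a pointer receiver, so passing
// &field to sort.Sort boxes a single word instead of a slice header.
type byEndKey []pendingSpan

func (s *byEndKey) Len() int           { return len(*s) }
func (s *byEndKey) Less(i, j int) bool { return (*s)[i].endKey < (*s)[j].endKey }
func (s *byEndKey) Swap(i, j int)      { (*s)[i], (*s)[j] = (*s)[j], (*s)[i] }

// evaluator mimics batchedInvertedExprEvaluator: the slice is parked in
// a field on a struct that is already on the heap, so nothing new escapes.
type evaluator struct{ pendingSpansToSort byEndKey }

func (e *evaluator) sortPending(spans []pendingSpan) {
	e.pendingSpansToSort = byEndKey(spans)
	sort.Sort(&e.pendingSpansToSort)
}

func main() {
	e := &evaluator{}
	spans := []pendingSpan{{"c"}, {"a"}, {"b"}}
	e.sortPending(spans) // sorts the shared backing array in place
	fmt.Println(spans[0].endKey, spans[1].endKey, spans[2].endKey) // prints "a b c"
}
```

This trades `sort.Slice`'s convenience for an interface value that is free to construct, at the cost of a little boilerplate.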


@otan an area I didn't touch but wanted to talk to you about is the
memory representation of DGeography and DGeometry. Specifically, I'm
wondering why we embed pointers instead of values inside these
datums, effectively creating a double-boxing situation. I see that most
of the geo operators work off pointers and that the "canonical"
representation of the objects exported by pkg/geo is a pointer (e.g.
the package exports "New..." methods instead of "Make..." methods). I
don't understand why that is, though. Both Geography and Geometry are
only 40 bytes, which is well within reason to pass on the stack.

I ask because this makes working with these datums more expensive than
strictly necessary. We make an effort to batch allocate datum objects
themselves (see DatumAlloc), but don't do anything special with
internal heap allocations. I suspect that there would be a moderate win
here if we eliminated the double-boxing on any query that scans a large
number of rows containing one of these data types, as I'm seeing the
heap allocations in DecodeUntaggedDatum fairly prominently in profiles.
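The double-boxing cost can be illustrated with a toy sketch (the types below are hypothetical, with a 40-byte struct standing in for the geo payload): pointer embedding makes each datum two heap objects, while value embedding folds the payload into the datum's own allocation, which DatumAlloc already amortizes.

```go
package main

import "fmt"

// geometry is a hypothetical 40-byte payload, standing in for the
// value wrapped by DGeometry.
type geometry struct{ a, b, c, d, e int64 }

// dGeometryPtr embeds a pointer: building the datum allocates the
// datum and, separately, the boxed payload (the double-boxing above).
type dGeometryPtr struct{ g *geometry }

// dGeometryVal embeds the value: the payload lives inline, so the
// datum's single allocation covers both.
type dGeometryVal struct{ g geometry }

func main() {
	p := &dGeometryPtr{g: &geometry{a: 1}} // two heap objects
	v := &dGeometryVal{g: geometry{a: 1}}  // one heap object
	fmt.Println(p.g.a, v.g.a)              // prints "1 1"
}
```

Either layout reads identically at the call site; the difference only shows up in allocation counts and pointer-chasing, which is the point of the question.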

@nvb nvb requested a review from sumeerbhola August 22, 2020 06:11

otan commented Aug 22, 2020

@otan an area I didn't touch but wanted to talk to you about was the memory representation of DGeography and DGeometry.

I will work on refactoring this out during the stability period!

(Dan talked me into it...)

@nvb nvb force-pushed the nvanbenschoten/geoOpt branch from 0518e30 to 50e4816 Compare August 22, 2020 17:17

nvb commented Aug 22, 2020

(Dan talked me into it...)

Out of curiosity, what were the arguments for the pointer?


@sumeerbhola sumeerbhola left a comment


I will work on refactoring this out during the stability period!
(Dan talked me into it...)

Interestingly this has come up twice in the last week (though when I mentioned this I was thinking about the extra indirection overhead instead of the allocation).

I'm going to optimize allocations for ewkb.Unmarshal -- there are many slice allocations and we can reuse the memory of the previous slices when iterating over the geometries on the left side of the join.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


nvb commented Aug 22, 2020

I'm going to optimize allocations for ewkb.Unmarshal -- there are many slice allocations and we can reuse the memory of the previous slices when iterating over the geometries on the left side of the join.

I stumbled onto this yesterday as well. Nothing concrete, but in case you're trying to optimize the slice allocation in `SpatialObject.Unmarshal`: the method uses an append to build the `EWKB` slice, so if you assigned some capacity to the field from a recycled/batch-allocated slice (maybe a `ByteAllocator`) in `DecodeUntaggedGeoValue`, you should be able to avoid the fragmented slice allocations per datum.

Actually, maybe we're talking about different things. I now see that ewkb.Unmarshal is something separate.


otan commented Aug 22, 2020 via email

nvb added 2 commits August 23, 2020 18:23
@nvb nvb force-pushed the nvanbenschoten/geoOpt branch from 50e4816 to e639c97 Compare August 23, 2020 22:23
@sumeerbhola

in case you're trying to optimize the slice allocation in SpatialObject.Unmarshal, the method uses an append to build the EWKB slice, so if you assigned some capacity to the field from a recycled/batch allocated slice (maybe a ByteAllocator) in DecodeUntaggedGeoValue, you should be able to avoid the fragmented slice allocations per datum.

I'm trying to wrap my head around this: once @otan has changed DGeometry to contain a geopb.SpatialObject, the DatumAlloc will take care of amortizing that allocation. But DatumAlloc only amortizes allocations; it doesn't recycle the same slice (say, []tree.DGeometry) for reuse, so the geopb.SpatialObjects in there will never have been used before and their EWKB field will be nil. EWKB is only appended to once, so if we don't get the capacity to be sufficient up front, the allocation is wasted, and we could potentially over-allocate and waste space. If we could hand the Unmarshal routine a ByteAllocator, it could use the amortization built into ByteAllocator, but this is generated code. Can we assume that when ExprHelper.IndexedVarEval is called, we can reuse the previously allocated Datums in its DatumAlloc? I'm guessing not, since some of those may have escaped as the result of the evaluation.


nvb commented Aug 24, 2020

I'm trying to wrap my head around this

It's not a fully thought out idea. I don't know enough about how this works for it to be.

EWKB is only appended to once, so if we don't get the capacity to be sufficient up-front, the allocation is wasted

That was something I was unsure about. I didn't know whether EWKB bytes had a fixed size or a variable-length size. If it's fixed then this would be easier.

One question I have with all of this, which is more of a SQL question than a geo-specific question, is whether DecodeUntaggedDatum can assume mutable ownership over the bytes associated with the datum that it is decoding. There was prior discussion about this in #50117 (comment), but it was never fully clear to me. I ask because we seem to treat the input slice to DecodeUntaggedDatum as an immutable temporary reference, so we copy out of it whenever we need to. But this is probably more pessimistic than strictly necessary if we were clearer about ownership in this code, and this has a real performance impact (again, see #50117 (comment)). If we handed over ownership of the bytes to the decoded datum then we could do something like:

```diff
--- a/pkg/util/encoding/encoding.go
+++ b/pkg/util/encoding/encoding.go
@@ -2586,6 +2586,7 @@ func DecodeUntaggedGeoValue(
        if err != nil {
                return b, geopb.SpatialObject{}, err
        }
+       spatialObject.EWKB = data[:0:len(data)]
        err = protoutil.Unmarshal(data, &spatialObject)
        return remaining, spatialObject, err
 }
```

This would avoid any interior allocations without needing to touch the generated code. It would leave an unfortunate memcpy, but that's probably a price we'd be ok with.
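The `data[:0:len(data)]` expression is Go's three-index (full) slice expression: it yields a zero-length slice that still owns `data`'s capacity, so appends into it reuse `data`'s memory rather than allocating. Whether the proto-generated unmarshal actually appends into a pre-seeded field is the assumption being probed above; the slicing mechanics themselves look like this:

```go
package main

import "fmt"

func main() {
	data := []byte("payload")
	// Three-index slicing: length 0, capacity len(data), sharing data's
	// backing array -- the shape of the EWKB seeding in the diff above.
	dst := data[:0:len(data)]
	// Appends that fit within that capacity write into data's memory
	// instead of allocating a fresh buffer.
	dst = append(dst, 'x', 'y')
	fmt.Println(len(dst), cap(dst))  // prints "2 7"
	fmt.Println(&dst[0] == &data[0]) // prints "true": buffer was reused
}
```

The third index also caps the slice, so an append past `len(data)` would fall back to allocating a new buffer rather than clobbering memory beyond the original bytes.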

nvb added a commit to nvb/cockroach that referenced this pull request Aug 24, 2020
This commit shaves off a heap allocation in each of the following functions:
- `EncodeTableKey`
- `EncodeGeoAscending`
- `EncodeGeoDescending`
- `DecodeGeoAscending`
- `DecodeGeoDescending`
- `EncodeGeoValue`
- `EncodeUntaggedGeoValue`
- `DecodeUntaggedGeoValue`

In each of these, we were letting a local SpatialObject variable escape
to the heap when passed through protoutil.Marshal or protoutil.Unmarshal.
This was unnecessary, as we already have a Datum on the heap nearby any
of the callers of these functions, so with a bit of restructuring, we
can avoid the allocations entirely.

The callers of the decode functions are a little more awkward than
strictly necessary right now. They are written the way that they are so
that we don't accidentally regress on this improvement when we remove
the double-boxing being discussed in cockroachdb#53252.

I don't have any benchmarks set up, so I wasn't able to measure the
exact effect of this change.

@sumeerbhola sumeerbhola left a comment


I didn't know whether EWKB bytes had a fixed size or a variable-length size. If it's fixed then this would be easier.

It is variable-sized.

:lgtm:

Reviewed 1 of 1 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/sql/rowexec/inverted_expr_evaluator.go, line 299 at r3 (raw file):

```go
	// end keys. Assign slice to a field on the receiver before sorting to avoid
	// a heap allocation when the slice header passes through an interface.
	b.pendingSpansToSort = invertedSpanRoutingInfosByEndKey(pendingSpans)
```

this was clever.


nvb commented Aug 24, 2020

TFTR!

bors r+


craig bot commented Aug 24, 2020

Build failed (retrying...):


craig bot commented Aug 24, 2020

Build succeeded:

@craig craig bot merged commit a40c8cb into cockroachdb:master Aug 24, 2020
craig bot pushed a commit that referenced this pull request Aug 24, 2020
53287: builtins: implement ST_Multi and ST_Collection(Extract|Homogenize) r=otan a=erikgrinaker

Release note (sql change): Implement the geometry builtins `ST_Multi`,
`ST_CollectionExtract`, and `ST_CollectionHomogenize`.

Closes #48904.
Closes #48905.
Closes #48991.

53298: backupccl: checksum backup manifest r=dt a=pbardea

This commit checksums the backup manifest when it is written. This
helps detect issues such as random bit flips of the manifest file.

When writing a backup manifest, it will also write the checksum of that
file to a file with a "-CHECKSUM" suffix appended to the manifest file
name. When reading manifests, we check for the '...-CHECKSUM' file,
and if one is present, we compare the checksums. If there isn't one
present, we will continue as we did before. This allows backwards
compatibility with old backups.

Fixes #52061.

Release note: None

53304: encoding: avoid heap allocation when encoding/decoding SpatialObject r=nvanbenschoten a=nvanbenschoten

This commit shaves off a heap allocation in each of the following functions:
- `EncodeTableKey`
- `EncodeGeoAscending`
- `EncodeGeoDescending`
- `DecodeGeoAscending`
- `DecodeGeoDescending`
- `EncodeGeoValue`
- `EncodeUntaggedGeoValue`
- `DecodeUntaggedGeoValue`

In each of these, we were letting a local SpatialObject variable escape
to the heap when passed through `protoutil.Marshal` or `protoutil.Unmarshal`.
This was unnecessary, as we already have a Datum on the heap nearby any
of the callers of these functions, so with a bit of restructuring, we
can avoid the allocations entirely.

The callers of the decode functions are a little more awkward than
strictly necessary right now. They are written the way that they are so
that we don't accidentally regress on this improvement when we remove
the double-boxing being discussed in #53252.

I don't have any benchmarks set up, so I wasn't able to measure the
exact effect of this change.

53324: security: add a missing file info label r=nvanbenschoten a=knz

Release note: None

53326: builtins: implement ST_Expand for Geometry types r=sumeerbhola a=otan

Release note (sql change): implement ST_Expand for Geometry based types.

Co-authored-by: Erik Grinaker <erik@grinaker.org>
Co-authored-by: Paul Bardea <pbardea@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
@nvb nvb deleted the nvanbenschoten/geoOpt branch August 25, 2020 01:23