Skip to content

sqlliveness/slstorage,timeutil: various improvements#53196

Merged
craig[bot] merged 6 commits intocockroachdb:masterfrom
ajwerner:ajwerner/tighten-up-sqlliveness-behavior
Aug 23, 2020
Merged

sqlliveness/slstorage,timeutil: various improvements#53196
craig[bot] merged 6 commits intocockroachdb:masterfrom
ajwerner:ajwerner/tighten-up-sqlliveness-behavior

Conversation

@ajwerner
Copy link
Copy Markdown
Contributor

See individual commits. This PR addresses some anxieties I had about the implementation of slstorage. In particular, it makes the testing deterministic by giving it a new table independent of the host cluster's state and allows injecting time. It then changes the GC to be less frequent and instead relies on failing out sessions when it determines they are expired. This is important to avoid spinning between when a session is deemed expired and when it would later be deleted by a gc interval. It also jitters that GC interval to avoid problems due to contention in larger clusters.

@ajwerner ajwerner requested a review from spaskob August 21, 2020 13:59
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish I had set it up to review each commit separately, they are self-contained :(

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @spaskob)

Copy link
Copy Markdown
Contributor

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 15 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @spaskob)

@ajwerner
Copy link
Copy Markdown
Contributor Author

TFTR!

bors r=spaskob

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 21, 2020

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request Aug 21, 2020
46407: kvcoord: small fixes r=andreimatei a=andreimatei

See individual commits.

52740: sql: owning a schema gives privilege to drop tables in schema r=RichardJCai a=RichardJCai

sql: owning a schema gives privilege to drop tables in schema

Release note (sql change): Schema owners can drop tables inside
the schema without explicit DROP privilege on the table.

Fixes #51931

53127: builtins: use ST_Union as aggregate r=rytaft,sumeerbhola a=otan

Add the ST_Union aggregate, removing the two-argument version
temporarily as we cannot currently have an aggregate and non-aggregate
at the same time. This is ok since we haven't released yet, and from
reading it seems more likely people will use the aggregate version.

Release note (sql change): Implement the ST_Union builtin as an
aggregate. The previous alpha-available ST_Union for two arguments is
deprecated.

53162: builtins: implement ST_GeomFromGeoHash/ST_Box2DFromGeoHash r=sumeerbhola a=otan

Resolves #48806

Release note (sql change): Implemented the ST_GeomFromGeoHash and
ST_Box2DFromGeoHash builtins.

53181: sql/kv: avoid expensive heap allocations with inverted joins r=nvanbenschoten a=nvanbenschoten

This PR contains three optimizations that reduce heap allocations by
inverted join operations. These were found by profiling the following
geospatial query:
```sql
SELECT Count(census.blkid), Count(subways.name)
FROM nyc_census_blocks AS census
JOIN nyc_subway_stations AS subways
ON ST_Intersects(subways.geom, census.geom);
```

Combined, these three commits speed up the query by a little over **3%**.

```
name                                old ms    new ms    delta
Test/postgis_geometry_tutorial/nyc  139 ±12%  135 ±11%  -3.04%  (p=0.000 n=241+243)
```

#### sql/rowexec: avoid heap allocations in batchedInvertedExprEvaluator

This commit restructures the slice manipulation performed in
`(*batchedInvertedExprEvaluator).init` to avoid heap allocations. Before
the change, this method used to contain the first and fifth most
expensive memory allocations by total allocation size (`alloc_space`)
in a geospatial query of interest:
```
----------------------------------------------------------+-------------
                                         3994.06MB   100% |   github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*invertedJoiner).readInput /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_joiner.go:418
 3994.06MB 10.70% 10.70%  3994.06MB 10.70%                | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*batchedInvertedExprEvaluator).init /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_expr_evaluator.go:366
----------------------------------------------------------+-------------

----------------------------------------------------------+-------------
                                          954.07MB   100% |   github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*invertedJoiner).readInput /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_joiner.go:418
  954.07MB  4.61% 41.71%   954.07MB  4.61%                | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*batchedInvertedExprEvaluator).init /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_expr_evaluator.go:409
----------------------------------------------------------+-------------
```

The largest offender here was the `routingSpans` slice, which made no
attempt to recycle memory. The other offender was the `pendingSpans`
slice. We made an attempt to recycle the memory in this slice, except
we truncated it from the front instead of the back, so we ended us
not actually recycling anything.

This commit addresses this in two ways. First, it updates the `pendingSpans`
slice to point into the `routingSpans` slice so the two can share memory.
`pendingSpans` becomes a mutable window into `routingSpans`, so it no
longer requires its own heap allocation. Next, the commit updates the
memory recycling to recycle `routingSpans` instead of `pendingSpans`.
Since we never truncate `routingSpans` from the front, the recycling
now works.

With these two changes, the two heap allocations, which used to account
for **15.31%** of total allocated space combined drops to a single heap
allocation that accounts for only **2.43%** of total allocated space on
the geospatial query.

```
----------------------------------------------------------+-------------
                                          182.62MB   100% |   github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*invertedJoiner).readInput /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_joiner.go:418
  182.62MB  2.43% 58.36%   182.62MB  2.43%                | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*batchedInvertedExprEvaluator).init /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_expr_evaluator.go:366
----------------------------------------------------------+-------------
```

#### sql: properly copy span slices

This commit fixes two locations where we not optimally performing copies
from one slice to another.

In `invertedJoiner.generateSpans`, we were not pre-allocating the
destination slice and were using append instead of direct assignment,
which is measurably slower due to the extra writes to the slice header.

In `getProtoSpans`, we again were appending to a slice with zero length
and sufficient capacity instead of assigning to a slice with sufficient
length.

#### kv: check for expensive logging before VEventf in appendRefreshSpans

Before this change, we called VEventf directly with the refresh span
corresponding to each request in a batch. This caused each span object
to allocate while passing through an `interface{}`. This could be seen
in heap profiles, where it accounted for **1.89%** of total allocated
memory (`alloc_space`) in a geospatial query of interest:

```
----------------------------------------------------------+-------------
                                          142.01MB   100% |   github.com/cockroachdb/cockroach/pkg/roachpb.(*BatchRequest).RefreshSpanIterate /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/batch.go:407
  142.01MB  1.89% 68.71%   142.01MB  1.89%                | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSpanRefresher).appendRefreshSpans.func1 /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:510
----------------------------------------------------------+-------------
```

This commit adds a guard around the VEventf to avoid the allocation.

53196: sqlliveness/slstorage,timeutil: various improvements r=spaskob a=ajwerner

See individual commits. This PR addresses some anxieties I had about the implementation of slstorage. In particular, it makes the testing deterministic by giving it a new table independent of the host cluster's state and allows injecting time. It then changes the GC to be less frequent and instead relies on failing out sessions when it determines they are expired. This is important to avoid spinning between when a session is deemed expired and when it would later be deleted by a gc interval. It also jitters that GC interval to avoid problems due to contention in larger clusters. 

53214: serverpb: add docstrings to `HotRangesRequest` and `HotRangesResponse` r=mjibson a=knz

This is part of a project to document the public HTTP APIs.

@mjibson you'll notice in this PR that the doc generator doesn't deal well with newline characters in docstrings.

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: richardjcai <caioftherichard@gmail.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 21, 2020

Build failed (retrying...):

@otan
Copy link
Copy Markdown
Contributor

otan commented Aug 22, 2020

merge conflict

bors r-

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 22, 2020

Canceled.

@ajwerner ajwerner force-pushed the ajwerner/tighten-up-sqlliveness-behavior branch from f0f85e5 to f108b75 Compare August 22, 2020 01:44
@ajwerner
Copy link
Copy Markdown
Contributor Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 22, 2020

Build failed (retrying...):

@ajwerner
Copy link
Copy Markdown
Contributor Author

bors r-

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 22, 2020

Canceled.

@ajwerner ajwerner force-pushed the ajwerner/tighten-up-sqlliveness-behavior branch from f108b75 to eb3fbf8 Compare August 22, 2020 05:20
Andrew Werner added 2 commits August 22, 2020 15:59
Once a second seems quite fast. Even our epoch-based leases aren't that fast
and they time out after 7 seconds.

Release note: None
These are generally useful tools to control not just the clock but associated
timers. These tools are helpful when testing code that attempts to do something
with some periodicity.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/tighten-up-sqlliveness-behavior branch from eb3fbf8 to 72b9a9a Compare August 22, 2020 20:01
Andrew Werner added 4 commits August 22, 2020 16:28
The old testing facilities relied on interacting with the state of the
subsystem in a running cluster. It also relied on real clocks. This
made it difficult to write precise tests without serious hacks.

Release note: None
…on fetch

Running the GC every is very extreme in order to be responsive to failures.
In general, if we observe that an expiration hasn't changed when we go to
fetch, we should just delete it then. This makes the GC interval much less
important for not a lot of cost. The GC process is not cheap and will contend
with updates.

Release note: None
Periodic loops which contend for the same data can cause real problems if
they are aligned with system startup. Let's jitter the loop and not run it
immediately upon startup.

Release note: None
This wasn't really a correctness problem as we already checked the dead
cache but if we determine a session is dead, why keep it around?

Also adds a TODO that was in my head.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/tighten-up-sqlliveness-behavior branch from 72b9a9a to d0a0140 Compare August 22, 2020 20:28
@ajwerner
Copy link
Copy Markdown
Contributor Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 22, 2020

Build failed:

@ajwerner
Copy link
Copy Markdown
Contributor Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 23, 2020

Build succeeded:

@craig craig bot merged commit d121978 into cockroachdb:master Aug 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants