
storage: avoid blocking the command queue for scans with limits #31904

Closed
spencerkimball wants to merge 1 commit into cockroachdb:master from
spencerkimball:efficient-scan-limits

Conversation

@spencerkimball
Member

SQL has a tendency to create scans which cover a range's entire key
span, though they only need to return a finite number of results.
These requests end up blocking writes in the command queue, which in
turn block other reads, causing requests to effectively serialize. In
reality, when a scan has a limit, the key span it actually affects
ends up being a small subset of the range's key span.

This change creates a new "pre-evaluation" execution path for read-
only requests. When pre-evaluating, the batch still interacts with
the command queue, but it neither waits on prerequisites nor causes
successor, dependent write batches to wait on it. Instead, any
prereqs or dependents are tracked while the read-only batch is
immediately evaluated without blocking. On successful evaluation,
the span(s) actually covered, as recorded in the read-only batch's
BatchResponse, are compared against the prerequisite and dependent
commands. If there is no overlap, the timestamp cache is updated and
the read-only batch response is returned to the caller. If there is
overlap, the read-only batch is re-executed with the pre-evaluation
option disabled, so that it interacts with the command queue in the
traditional fashion, waiting on prereqs and making dependents wait
on it.

Release note: improved performance on workloads which mix OLAP queries
with updates.
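The optimistic flow described above can be sketched in Go. This is a simplified stand-in, not CockroachDB's actual types or signatures: `span`, `preEvaluate`, and `executeReadOnly` are hypothetical names, and the real code works with `roachpb.Span` and the command queue rather than a plain slice of write spans.

```go
package main

import "fmt"

// span is a simplified half-open key span [key, endKey).
type span struct{ key, endKey string }

// overlaps reports whether two half-open key spans intersect.
func overlaps(a, b span) bool {
	return a.key < b.endKey && b.key < a.endKey
}

// preEvaluate simulates evaluating a limited scan without waiting in the
// command queue: it returns the span actually covered before the limit
// was hit, typically much smaller than the requested span.
func preEvaluate(requested span, limitEnd string) span {
	return span{requested.key, limitEnd}
}

// executeReadOnly models the two-phase flow: pre-evaluate first, then
// fall back to the traditional blocking path only if the covered span
// overlaps a concurrent write's span.
func executeReadOnly(requested span, limitEnd string, writeSpans []span) (covered span, retried bool) {
	covered = preEvaluate(requested, limitEnd)
	for _, w := range writeSpans {
		if overlaps(covered, w) {
			// Conflict: re-execute with pre-evaluation disabled,
			// covering the full requested span this time.
			return requested, true
		}
	}
	return covered, false
}

func main() {
	// A scan over ["a", "z") whose limit is reached at "c" does not
	// conflict with a concurrent write at ["m", "n").
	covered, retried := executeReadOnly(span{"a", "z"}, "c", []span{{"m", "n"}})
	fmt.Println(covered, retried)
}
```

The key property is that the conflict check runs against the span the scan actually touched, not the span it declared, so most limited scans commit their timestamp-cache update without ever waiting.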

@spencerkimball spencerkimball requested review from a team and nvb October 25, 2018 22:48
@cockroach-teamcity
Member

This change is Reviewable

Collaborator

@petermattis petermattis left a comment


In the contending case, we'll perform the read twice, correct? Seems like that should usually be a win. Echoing my comment on #31663, I worry that this could be destabilizing (though the concern here is less than on #31663). Perhaps we should hold off on merging this until we've got the roachtest stability issues under control.

PS I think this fixes #9521.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@tbg
Member

tbg commented Oct 26, 2018

What @petermattis said. Other than worrying about the stability aspect though, I'm pretty excited about this change. (This being a perf change, it'll really need some numbers, though!).

Contributor

@bdarnell bdarnell left a comment


Yes, this fixes #9521.

Needs tests, and I'd like to see a detailed overview comment of pre-evaluation somewhere.

Release note: improved performance on workloads which mix OLAP queries
with updates.

I don't see this change as being particularly OLAP-focused. OLAP queries tend to not have limits, or to have limits after a GROUP BY or other post-processing. The LIMIT 1 seen in #9521 is definitely an OLTP use case. (The OLAP queries that run into this issue are likely indications that SQL is using a batch limit that is too small, which is something we should consider as we rethink how we handle parallelism and memory limits here)

A related topic (I thought we had an issue for this but can't find it now) is that if we acquired a rocksdb snapshot at the right time, we would not need to keep reads in the command queue while they evaluate. This might be a pessimization for the low-contention use case (especially because we have some reservations about rocksdb snapshots in general), but it might be worth considering and evaluating.

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/command_queue.go, line 108 at r1 (raw file):

	prereqsAndDeps []*cmd

	pending chan struct{} // closed when complete, or if pre-evaluating

Where exactly does pending get closed for pre-evaluating commands?


pkg/storage/command_queue_test.go, line 37 at r1 (raw file):

func add(cq *CommandQueue, from, to roachpb.Key, readOnly bool, prereqs []*cmd) *cmd {
	return cq.add(readOnly, false /* preEvaluate */, zeroTS, prereqs, []roachpb.Span{{Key: from, EndKey: to}})

There should be some tests in command_queue_test.go with preEvaluate set to true.


pkg/storage/replica.go, line 2430 at r1 (raw file):

					// If either the command or its prereq are pre-evaluating,
					// resolve the prereq immediately instead of waiting.
					if newCmd.preEvaluate || pre.preEvaluate {

If pending is closed for preevaluating commands (as described in the comment above), we don't need the second part of this check and can fall through into the select.
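The fall-through Ben describes relies on a Go property worth spelling out: a receive from a closed channel never blocks. So if `pending` is closed up front for pre-evaluating commands, a waiter's `select` returns immediately without any `preEvaluate` flag check. A minimal illustration, independent of the CockroachDB types (`waitForPrereq` is a hypothetical name):

```go
package main

import (
	"fmt"
	"time"
)

// waitForPrereq blocks until the prereq's pending channel is closed, or
// times out. If the channel is already closed (as it would be for a
// pre-evaluating command), the receive case fires immediately.
func waitForPrereq(pending <-chan struct{}) string {
	select {
	case <-pending:
		return "proceeded"
	case <-time.After(50 * time.Millisecond):
		return "timed out"
	}
}

func main() {
	pending := make(chan struct{})
	close(pending) // closed up front, as for a pre-evaluating command
	fmt.Println(waitForPrereq(pending)) // returns without waiting
}
```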


pkg/storage/replica.go, line 3054 at r1 (raw file):

) (br *roachpb.BatchResponse, pErr *roachpb.Error) {
	br, pErr = r.tryExecuteReadOnlyBatch(ctx, ba, true /* preEvaluateOnLimits */)
	if pErr == nil || pErr != retryWithoutPreEvaluationError {

The pErr == nil part of this is unnecessary.

Contributor

@nvb nvb left a comment


I'd like to see some tests and benchmarks demonstrating the effect of this. We'll certainly want to do something like this, but I do get the feeling that the CommandQueue is getting in your way and resulting in some artificial overhead.

A related topic (I thought we had an issue for this but can't find it now) is that if we acquired a rocksdb snapshot at the right time, we would not need to keep reads in the command queue while they evaluate. This might be a pessimization for the low-contention use case (especially because we have some reservations about rocksdb snapshots in general), but it might be worth considering and evaluating.

You might be thinking about https://forum.cockroachlabs.com/t/why-do-we-keep-read-commands-in-the-command-queue/360. There's definitely something here, and addressing it would probably simplify the code in this PR significantly. We could synchronously grab a snapshot of overlapping write spans in the CommandQueue while grabbing a RocksDB snapshot. We would then pre-evaluate and check whether our results overlap with these write spans.

We might also consider not leaving the CommandQueue while pre-evaluating. That wouldn't fit into the current code structure, but would prevent later writes from slipping in front of a pre-evaluating scan.
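That alternative could be sketched as follows. All names here (`queue`, `snapshotWrites`, `conflicts`) are hypothetical stand-ins, not the real CommandQueue or engine APIs: the point is that the overlapping write spans are copied atomically, in the same critical section where the engine snapshot would be grabbed, so the conflict set and the snapshotted data are consistent with each other.

```go
package main

import (
	"fmt"
	"sync"
)

// span is a simplified half-open key span [key, endKey).
type span struct{ key, endKey string }

func overlaps(a, b span) bool {
	return a.key < b.endKey && b.key < a.endKey
}

// queue is a toy stand-in for the CommandQueue's write-span tracking.
type queue struct {
	mu     sync.Mutex
	writes []span
}

// snapshotWrites atomically copies the write spans that overlap the
// requested read span. In the real system this critical section would
// also grab the RocksDB snapshot the read evaluates against.
func (q *queue) snapshotWrites(read span) []span {
	q.mu.Lock()
	defer q.mu.Unlock()
	var out []span
	for _, w := range q.writes {
		if overlaps(read, w) {
			out = append(out, w)
		}
	}
	return out
}

// conflicts reports whether the span actually covered by the evaluated
// read intersects any write span captured at snapshot time.
func conflicts(covered span, writes []span) bool {
	for _, w := range writes {
		if overlaps(covered, w) {
			return true
		}
	}
	return false
}

func main() {
	q := &queue{writes: []span{{"m", "n"}}}
	writes := q.snapshotWrites(span{"a", "z"})
	// Suppose the limited scan only actually covered ["a", "c").
	fmt.Println(conflicts(span{"a", "c"}, writes))
}
```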

Reviewed 2 of 4 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/replica.go, line 2430 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

If pending is closed for preevaluating commands (as described in the comment above), we don't need the second part of this check and can fall through into the select.

That's a nice generalization.


pkg/storage/replica.go, line 3078 at r1 (raw file):

func (r *Replica) tryExecuteReadOnlyBatch(
	ctx context.Context, ba roachpb.BatchRequest, preEvaluateOnLimits bool,
) (br *roachpb.BatchResponse, pErr *roachpb.Error) {

Instead of making retryWithoutPreEvaluationError an error, I'd draw a parallel with proposalRetryReason.


pkg/storage/replica.go, line 3142 at r1 (raw file):

	defer func() {
		// If we pre-evaluated and skipped waiting in the command queue,
		// check for concurrent write conflicts.

This is relying on the fact that none of the read request types have the updatesTSCacheOnError flag, otherwise, a retryWithoutPreEvaluationError would still update the timestamp cache, which would be an issue. Let's make that very clear.

@spencerkimball
Member Author

Looks like we get about 15% improvement on throughput with a modified KV which reads a value > $1 LIMIT 1. However, the p99 latencies are worse, suggesting retries. It's a contrived test.

Before:

_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
      1s        0        10805.0        10804.8      1.4      2.8      3.5     10.5
      2s        0        11804.5        11304.2      1.3      2.5      3.1      5.0
      3s        0        11787.0        11465.0      1.3      2.5      3.1      5.0
      4s        0        11510.1        11476.2      1.4      2.5      3.3      5.0
      5s        0        10966.0        11374.2      1.4      2.6      3.4     11.5
      6s        0        10544.4        11235.9      1.4      2.8      3.7     22.0
      7s        0        10806.0        11174.5      1.4      2.8      3.4      5.5
      8s        0        10994.6        11152.0      1.4      2.6      3.4      6.3
      9s        0        10871.5        11120.9      1.4      2.8      3.5      5.8
     10s        0        11375.2        11146.3      1.4      2.6      3.3      5.8
     11s        0        11108.9        11142.9      1.4      2.6      3.3     21.0
     12s        0        11337.1        11159.1      1.4      2.6      3.3      6.0
     13s        0        10646.1        11119.7      1.4      2.9      3.7      6.3
     14s        0        11148.8        11121.8      1.4      2.6      3.4      5.0
     15s        0        11445.7        11143.4      1.4      2.6      3.3      6.0
     16s        0        11232.8        11149.0      1.4      2.6      3.5     17.8
     17s        0        11253.2        11155.1      1.3      2.8      3.7      7.1
     18s        0        11998.0        11201.9      1.3      2.5      3.1      5.2
     19s        0        11711.3        11228.7      1.3      2.5      3.3      4.7
     20s        0        11027.8        11218.7      1.4      2.8      3.4      5.5
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
     21s        0        10893.1        11203.2      1.4      2.8      3.4      6.0
     22s        0        11082.8        11197.7      1.4      2.8      3.4      6.0
     23s        0        10746.1        11178.1      1.4      2.8      3.7     26.2
     24s        0        11102.5        11174.9      1.4      2.6      3.5      5.5
     25s        0        10487.7        11147.5      1.4      2.9      3.7     18.9
     26s        0        10642.3        11128.0      1.4      2.8      3.5     25.2
     27s        0        10923.1        11120.4      1.4      2.8      3.5      6.3
     28s        0        11374.2        11129.5      1.4      2.6      3.3      5.0
     29s        0        11563.1        11144.4      1.3      2.6      3.3      5.5
     30s        0        11290.1        11149.3      1.4      2.6      3.5      6.3
     31s        0        11160.0        11149.6      1.4      2.8      3.5      5.5
     32s        0        10223.6        11120.7      1.5      2.9      3.8      7.3
     33s        0        10420.6        11099.5      1.4      3.0      4.2      7.9
     34s        0        11381.8        11107.8      1.4      2.6      3.3      5.0
     35s        0        12124.9        11136.8      1.3      2.5      3.0      4.5
     36s        0        11296.6        11141.3      1.3      2.6      3.4     25.2
     37s        0        10771.8        11131.3      1.4      2.8      3.5      5.5
     38s        0        10724.1        11120.6      1.4      2.8      3.5      4.7
     39s        0        11241.0        11123.6      1.4      2.6      3.3      5.2
     40s        0        11403.4        11130.6      1.4      2.6      3.3      5.2
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
     41s        0        11077.9        11129.3      1.4      2.6      3.4      6.0
     42s        0        11350.9        11134.6      1.4      2.6      3.4      5.2
     43s        0        11639.1        11146.4      1.3      2.5      3.1      6.0
     44s        0        10879.2        11140.3      1.4      2.8      3.5      5.8
     45s        0        11548.9        11149.3      1.3      2.5      3.3      5.2
     46s        0        10887.0        11143.6      1.4      2.6      3.4     26.2
     47s        0        10626.9        11132.6      1.4      2.9      3.5      5.8
     48s        0        10161.0        11112.4      1.5      2.9      3.8     17.8
     49s        0        10727.1        11104.5      1.4      2.8      3.5     24.1
     50s        0        10412.1        11090.7      1.4      2.9      3.7      5.2
     51s        0        10266.7        11074.5      1.5      2.9      3.7      5.8
     52s        0        10460.9        11062.7      1.5      2.8      3.7      6.6
     53s        0        10458.8        11051.3      1.4      2.9      3.7     13.6
     54s        0        10825.1        11047.2      1.4      2.8      3.5      6.6
     55s        0        10556.5        11038.2      1.4      2.8      3.5     13.1
     56s        0        10031.3        11020.3      1.4      3.0      3.8     10.5
     57s        0        10142.8        11004.9      1.5      2.9      3.7     30.4
     58s        0        11007.8        11004.9      1.4      2.8      3.5      6.3
     59s        0        11275.0        11009.5      1.4      2.6      3.3      5.2
    1m0s        0        11034.1        11009.9      1.4      2.8      3.4      5.5
^C
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
   60.5s        0         665834        11005.7      1.5      1.4      2.8      3.5     30.4

After:

_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
      1s        0        12822.9        12822.6      1.2      2.9      4.7     15.2
      2s        0        12549.8        12686.4      1.3      2.9      4.7      8.1
      3s        0        12544.5        12639.1      1.2      2.9      4.7      8.1
      4s        0        12340.1        12564.4      1.3      2.9      4.7      7.1
      5s        0        12404.5        12532.4      1.2      3.0      5.0      9.4
      6s        0        12185.0        12474.5      1.3      3.0      5.0      7.9
      7s        0        12435.1        12468.8      1.3      3.0      4.7      8.9
      8s        0        12558.3        12480.0      1.3      2.9      5.0     17.8
      9s        0        12929.9        12530.0      1.2      2.8      4.5      7.9
     10s        0        12864.7        12563.4      1.4      2.8      4.5      7.9
     11s        0        13181.2        12619.6      1.0      2.8      4.5      7.3
     12s        0        12835.3        12637.6      1.2      2.8      4.5     14.7
     13s        0        12969.9        12663.1      1.1      2.8      4.7      7.3
     14s        0        12802.3        12673.1      1.2      2.9      4.5      7.9
     15s        0        12747.5        12678.0      1.2      3.0      4.7      8.9
     16s        0        13536.9        12731.7      1.2      2.8      4.5      7.9
     17s        0        13399.5        12771.0      1.2      2.8      4.5      9.4
     18s        0        12957.4        12781.3      1.1      2.9      4.7     21.0
     19s        0        13106.3        12798.4      1.2      2.8      4.5      9.4
     20s        0        13119.0        12814.5      1.2      2.8      4.5      9.4
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
     21s        0        11813.9        12766.9      1.2      3.0      5.2     26.2
     22s        0        13217.0        12787.4      1.2      2.8      4.5      7.3
     23s        0        12794.3        12787.7      1.2      2.9      4.5      7.9
     24s        0        12770.5        12786.9      1.2      2.9      4.7      8.9
     25s        0        11596.0        12739.2      1.3      3.1      5.0     10.0
     26s        0        13034.1        12750.6      1.1      2.8      4.5      8.4
     27s        0        13416.3        12775.2      1.0      2.8      4.5      7.6
     28s        0        12703.9        12772.7      1.2      2.8      4.7     23.1
     29s        0        12377.5        12759.0      1.2      3.0      5.0     10.0
     30s        0        12116.5        12737.6      1.3      3.0      5.0      8.4
     31s        0        12219.2        12720.9      1.2      3.1      4.7      9.4
     32s        0        12125.2        12702.3      1.2      3.1      5.0     25.2
     33s        0        12361.1        12691.9      1.3      3.0      5.0      8.9
     34s        0        13018.1        12701.5      1.2      2.8      4.5      8.1
     35s        0        12881.3        12706.7      1.2      2.9      4.7      7.9
     36s        0        12810.5        12709.6      1.2      2.8      4.5      8.9
     37s        0        13176.7        12722.2      1.2      2.8      4.5      7.6
     38s        0        12765.0        12723.3      1.1      2.9      4.7     31.5
     39s        0        12874.0        12727.2      1.2      2.8      4.5      7.9
     40s        0        13287.0        12741.1      1.2      2.8      4.5      8.9
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
     41s        0        12988.8        12747.2      1.2      2.8      4.7      8.1
     42s        0        12914.7        12751.2      1.2      2.9      4.5      8.1
     43s        0        11907.8        12731.5      1.3      3.3      5.0      8.9
     44s        0        13273.3        12743.8      1.1      2.8      4.7      7.9
     45s        0        12901.8        12747.3      1.2      2.8      4.7     24.1
     46s        0        13019.0        12753.2      1.2      2.8      4.5      7.3
     47s        0        13124.4        12761.1      1.2      2.8      4.5      6.8
     48s        0        12619.6        12758.1      1.2      2.9      4.7     29.4
     49s        0        12477.3        12752.4      1.3      2.9      4.5     10.5
     50s        0        12249.1        12742.3      1.3      3.0      5.0      8.4
     51s        0         9809.9        12684.9      1.4      4.1      6.6     14.2
     52s        0        12410.3        12679.6      1.2      2.9      4.7      9.4
     53s        0        13540.9        12695.8      1.2      2.6      4.2      7.1
     54s        0        13003.8        12701.5      1.2      2.8      4.5      7.6
     55s        0        13203.9        12710.7      1.1      2.8      4.5      8.1
     56s        0        13270.9        12720.7      1.2      2.8      4.5     10.0
     57s        0        11965.4        12707.5      1.2      2.9      4.7     62.9
     58s        0        13795.0        12726.2      1.0      2.5      3.4     29.4
     59s        0        14251.7        12752.1      1.0      2.5      3.4      8.4
    1m0s        0        14451.4        12780.4      1.0      2.4      3.3      5.2
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
    1m1s        0        13751.9        12796.3      1.1      2.6      3.4      7.6
^C
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
   61.1s        0         781401        12795.6      1.3      1.2      2.9      4.7     62.9

@spencerkimball
Member Author

I also ran with LIMIT=10000. All of these are from fresh data directories, so performance drops steadily as the key space fills up and successive SELECT statements return more rows.

Before:

_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
      1s        0         4916.7         4916.5      2.5      8.4     12.1     22.0
      2s        0         2594.6         3756.4      5.0     15.2     23.1     41.9
      3s        0         2016.4         3176.8      6.8     19.9     27.3     52.4
      4s        0         1618.8         2787.2      7.3     25.2     39.8     60.8
      5s        0         1396.9         2509.3      8.9     31.5     50.3     88.1
      6s        0         1354.0         2316.9      9.4     29.4     39.8     56.6
      7s        0         1205.9         2158.2     10.0     33.6     56.6    192.9
      8s        0         1168.1         2034.5     11.0     37.7     46.1     58.7
      9s        0         1101.1         1930.6     11.0     37.7     54.5     71.3
     10s        0         1035.8         1841.3     11.0     44.0     60.8    130.0
     11s        0          967.8         1761.9     12.1     48.2     67.1    100.7
     12s        0          898.2         1689.9     13.1     50.3     71.3     96.5
     13s        0          860.0         1626.1     14.2     50.3     71.3     96.5
     14s        0          831.0         1569.3     14.2     54.5     75.5    113.2
     15s        0          764.0         1515.6     14.2     67.1     88.1    121.6
     16s        0          798.3         1470.8     13.1     62.9     79.7    130.0
     17s        0          719.7         1426.6     16.3     62.9     92.3    121.6
     18s        0          727.5         1387.8     14.7     71.3     92.3    151.0
     19s        0          817.5         1357.8     14.2     58.7     83.9    100.7
     20s        0          660.0         1322.9     17.8     67.1     92.3    134.2
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
     21s        0          704.3         1292.8     15.2     71.3     96.5    125.8
     22s        0          665.5         1264.9     17.8     67.1     88.1    125.8
     23s        0          644.0         1237.9     18.9     71.3     92.3    151.0
     24s        0          690.9         1215.1     16.8     67.1     83.9    109.1
     25s        0          631.0         1191.8     17.8     75.5    121.6    184.5
     26s        0          616.0         1169.6     19.9     75.5    109.1    134.2
     27s        0          616.0         1149.1     19.9     75.5    104.9    134.2
     28s        0          623.0         1130.3     18.9     71.3     96.5    142.6
     29s        0          672.9         1114.5     18.9     67.1     92.3    104.9
     30s        0          565.9         1096.3     21.0     79.7    104.9    167.8
     31s        0          564.9         1079.1     18.9     79.7    117.4    142.6
     32s        0          626.2         1065.0     18.9     71.3     88.1    109.1
     33s        0          556.0         1049.6     22.0     79.7    104.9    125.8
     34s        0          574.5         1035.5     18.9     79.7    104.9    142.6
     35s        0          505.7         1020.4     22.0    100.7    130.0    218.1
     36s        0          568.7         1007.9     18.9     83.9    100.7    121.6
     37s        0          529.8          995.0     22.0     88.1    117.4    184.5
     38s        0          545.3          983.1     19.9     92.3    113.2    159.4
     39s        0          530.0          971.5     21.0     83.9    104.9    130.0
     40s        0          569.0          961.5     18.9     83.9    142.6    159.4
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
     41s        0          557.3          951.6     19.9     88.1    113.2    176.2
     42s        0          530.6          941.6     23.1     83.9    117.4    142.6
     43s        0          495.8          931.2     27.3     92.3    130.0    151.0
     44s        0          545.1          922.4     21.0     79.7    117.4    151.0
     45s        0          451.0          912.0     26.2    100.7    159.4    201.3
     46s        0          501.8          903.0     23.1     96.5    142.6    184.5
     47s        0          547.3          895.5     22.0     79.7    113.2    142.6
     48s        0          485.9          887.0     25.2     96.5    121.6    184.5
     49s        0          497.6          879.0     25.2     83.9    113.2    151.0
     50s        0          494.4          871.3     22.0     88.1    121.6    159.4
     51s        0          541.0          864.8     21.0     83.9    113.2    130.0
     52s        0          503.9          857.9     24.1     88.1    109.1    142.6
     53s        0          463.0          850.4     23.1    100.7    134.2    176.2
     54s        0          496.0          843.9     22.0    100.7    121.6    184.5
     55s        0          463.9          837.0     24.1     96.5    121.6    142.6
     56s        0          506.5          831.1     23.1     92.3    130.0    159.4
     57s        0          487.6          825.0     23.1     92.3    130.0    184.5
     58s        0          461.0          818.8     25.2     96.5    142.6    218.1
     59s        0          486.6          813.1     23.1    100.7    134.2    159.4
    1m0s        0          481.4          807.6     26.2     96.5    121.6    167.8
^C
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
   60.3s        0          48580          806.1     19.8     11.0     71.3    104.9    218.1

After:

_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
      1s        0         4651.0         4650.9      2.8      8.4     12.1     21.0
      2s        0         2503.3         3578.0      5.5     15.2     23.1     35.7
      3s        0         1877.1         3011.0      7.3     21.0     29.4     39.8
      4s        0         1627.2         2666.3      7.9     25.2     35.7     56.6
      5s        0         1409.8         2414.9      8.9     30.4     41.9     71.3
      6s        0         1296.2         2228.7     10.0     30.4     41.9    104.9
      7s        0         1089.9         2066.0     11.5     39.8     54.5     83.9
      8s        0          983.0         1930.7     13.1     41.9     54.5     71.3
      9s        0         1019.6         1829.4     12.6     39.8     56.6     92.3
     10s        0          905.6         1737.1     13.6     48.2     75.5    125.8
     11s        0          884.8         1659.6     13.6     52.4     65.0    113.2
     12s        0          838.6         1591.0     14.7     50.3     71.3     88.1
     13s        0          772.9         1528.2     14.7     58.7     92.3    121.6
     14s        0          743.7         1472.2     15.2     60.8     92.3    151.0
     15s        0          614.4         1415.0     18.9     75.5     96.5    159.4
     16s        0          681.1         1369.2     16.8     62.9    109.1    159.4
     17s        0          690.0         1329.2     16.8     58.7     83.9    104.9
     18s        0          708.8         1294.8     17.8     60.8     96.5    142.6
     19s        0          638.2         1260.2     17.8     71.3    104.9    209.7
     20s        0          620.7         1228.2     19.9     75.5    104.9    117.4
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
     21s        0          695.9         1202.9     16.3     62.9     88.1    134.2
     22s        0          660.7         1178.2     17.8     67.1    100.7    159.4
     23s        0          661.8         1155.8     17.8     71.3     92.3    176.2
     24s        0          652.4         1134.8     18.9     71.3    104.9    142.6
     25s        0          623.3         1114.3     19.9     71.3    100.7    176.2
     26s        0          652.2         1096.6     18.9     71.3     92.3    125.8
     27s        0          595.6         1078.0     19.9     75.5     96.5    117.4
     28s        0          634.4         1062.2     18.9     71.3    100.7    192.9
     29s        0          572.2         1045.3     22.0     71.3     96.5    142.6
     30s        0          563.2         1029.2     19.9     88.1    113.2    159.4
     31s        0          612.0         1015.7     17.8     71.3     96.5    159.4
     32s        0          607.6         1003.0     19.9     71.3    104.9    121.6
     33s        0          609.2          991.1     19.9     71.3     92.3    100.7
     34s        0          535.4          977.7     24.1     75.5     96.5    142.6
     35s        0          509.8          964.3     21.0     96.5    134.2    184.5
     36s        0          530.7          952.2     24.1     75.5    121.6    142.6
     37s        0          573.8          942.0     22.0     71.3     96.5    121.6
     38s        0          528.2          931.2     25.2     79.7    104.9    142.6
     39s        0          536.0          921.0     22.0     83.9    125.8    192.9
     40s        0          537.8          911.4     21.0     79.7    134.2    176.2
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
     41s        0          561.1          902.9     21.0     75.5    109.1    117.4
     42s        0          549.1          894.5     22.0     83.9    125.8    167.8
     43s        0          551.8          886.5     19.9     79.7    104.9    134.2
     44s        0          511.6          878.0     24.1     83.9    117.4    142.6
     45s        0          495.5          869.5     24.1     88.1    134.2    167.8
     46s        0          490.1          861.2     24.1     92.3    130.0    167.8
     47s        0          452.8          852.5     24.1     96.5    125.8    167.8
     48s        0          483.0          844.9     26.2     92.3    142.6    218.1
     49s        0          457.7          836.9     25.2     92.3    125.8    184.5
     50s        0          444.3          829.1     24.1    113.2    142.6    209.7
     51s        0          532.2          823.3     21.0     79.7    117.4    167.8
     52s        0          462.8          816.3     28.3     96.5    117.4    159.4
     53s        0          469.5          809.8     25.2    100.7    134.2    159.4
     54s        0          489.2          803.9     23.1     92.3    130.0    184.5
     55s        0          409.1          796.7     29.4    113.2    176.2    226.5
     56s        0          442.1          790.4     27.3     96.5    134.2    167.8
     57s        0          469.6          784.7     25.2     92.3    121.6    176.2
     58s        0          460.0          779.1     26.2     96.5    130.0    167.8
     59s        0          485.2          774.1     23.1     88.1    113.2    151.0
    1m0s        0          432.0          768.4     30.4     96.5    130.0    184.5
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
    1m1s        0          497.3          764.0     25.2     92.3    117.4    142.6
    1m2s        0          448.8          758.9     29.4     92.3    130.0    176.2
    1m3s        0          456.0          754.1     27.3    100.7    125.8    167.8
^C
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
   63.1s        0          47594          753.7     21.2     12.6     71.3    104.9    226.5

@spencerkimball
Member Author

I'm happy to leave this PR alone. It doesn't seem crucially important; I just had an idea that seemed workable to address the issue and wanted to give it form. I'm going to leave this branch for future consideration / testing by someone who can really own this work.

nvb added a commit to nvb/cockroach that referenced this pull request Nov 22, 2018
Informs cockroachdb#4768.
Informs cockroachdb#31904.

This change was inspired by cockroachdb#31904 and is a progression of the thinking
started in cockroachdb#4768 (comment).

The change introduces `spanlatch.Manager`, which will replace the `CommandQueue`
**in a future PR**. The new type isn't hooked up yet because doing so will
require a lot of plumbing changes in the storage package that are best kept
in a separate PR. The structure uses a new strategy that reduces lock contention,
simplifies the code, avoids allocations, and makes cockroachdb#31904 easier to implement.

The primary objective, reducing lock contention, is addressed by minimizing
the amount of work we perform under the exclusive "sequencing" mutex while
locking the structure. This is made possible by employing a copy-on-write
strategy. Before this change, commands would lock the queue, create a large
slice of prerequisites, insert into the queue and unlock. After the change,
commands lock the manager, grab an immutable snapshot of the manager's trees in
O(1) time, insert into the manager, and unlock. They can then iterate over the
immutable tree snapshot outside of the lock. Effectively, this means that
the work performed under lock is linear with respect to the number of spans
that a command declares but NO LONGER linear with respect to the number of
other commands that it will wait on. This is important because `Replica.beginCmds`
repeatedly comes up as the largest source of mutex contention in our system,
especially on hot ranges.

The use of immutable snapshots also simplifies the code significantly. We're
no longer copying our prereqs into a slice so we no longer need to carefully
determine which transitive dependencies we do or don't need to wait on
explicitly. This also makes lock cancellation trivial because we no longer
explicitly hold on to our prereqs at all. Instead, we simply iterate through
the snapshot outside of the lock.

While rewriting the structure, I also spent some time optimizing its allocations.
Under normal operation, acquiring a latch now incurs only a single allocation -
that being for the `spanlatch.Guard`. All other allocations are avoided through
object pooling where appropriate. The overhead of using a copy-on-write
technique is almost entirely avoided by atomically reference counting btree nodes,
which allows us to release them back into the btree node pools when they're no
longer referenced by any btree snapshots. This means that we don't expect any
allocations when inserting into the internal trees, even with the COW policy.

Finally, this will make the approach taken in cockroachdb#31904 much more natural.
Instead of tracking dependents and prerequisites for speculative reads
and then iterating through them to find overlaps after, we can use the
immutable snapshots directly! We can grab a snapshot and sequence ourselves
as usual, but avoid waiting for prereqs. We then execute optimistically
before finally checking whether we overlapped any of our prereqs. The
great thing about this is that we already have the prereqs in an interval
tree structure, so we get an efficient validation check for free.

### Naming changes

| Before                     | After                             |
|----------------------------|-----------------------------------|
| `CommandQueue`             | `spanlatch.Manager`               |
| "enter the command queue"  | "acquire span latches"            |
| "exit the command queue"   | "release span latches"            |
| "wait for prereq commands" | "wait for latches to be released" |

The use of the word "latch" is based on the definition of
latches presented by Goetz Graefe in https://15721.courses.cs.cmu.edu/spring2016/papers/a16-graefe.pdf
(see https://i.stack.imgur.com/fSRzd.png). An important reason
for avoiding the word "lock" here is that it is critical for
understanding that we don't confuse the operational locking
performed by the CommandQueue/spanlatch.Manager with the
transaction-scoped locking enforced by intents and our
transactional concurrency control model.

### Microbenchmarks

NOTE: these are single-threaded benchmarks that don't benefit at all
from the concurrency improvements enabled by this new structure.

```
name                              cmdq time/op    spanlatch time/op    delta
ReadOnlyMix/size=1-4                  897ns ±21%           917ns ±18%     ~     (p=0.897 n=8+10)
ReadOnlyMix/size=4-4                  827ns ±22%           772ns ±15%     ~     (p=0.448 n=10+10)
ReadOnlyMix/size=16-4                 905ns ±19%           770ns ±10%  -14.90%  (p=0.004 n=10+10)
ReadOnlyMix/size=64-4                 907ns ±20%           730ns ±15%  -19.51%  (p=0.001 n=10+10)
ReadOnlyMix/size=128-4                926ns ±17%           731ns ±11%  -21.04%  (p=0.000 n=9+10)
ReadOnlyMix/size=256-4                977ns ±19%           726ns ± 9%  -25.65%  (p=0.000 n=10+10)
ReadWriteMix/readsPerWrite=0-4       12.5µs ± 4%           0.7µs ±17%  -94.70%  (p=0.000 n=8+9)
ReadWriteMix/readsPerWrite=1-4       8.18µs ± 5%          0.63µs ± 6%  -92.24%  (p=0.000 n=10+9)
ReadWriteMix/readsPerWrite=4-4       3.80µs ± 2%          0.66µs ± 5%  -82.58%  (p=0.000 n=8+10)
ReadWriteMix/readsPerWrite=16-4      1.82µs ± 2%          0.70µs ± 5%  -61.43%  (p=0.000 n=9+10)
ReadWriteMix/readsPerWrite=64-4       894ns ±12%           514ns ± 6%  -42.48%  (p=0.000 n=10+10)
ReadWriteMix/readsPerWrite=128-4      717ns ± 5%           472ns ± 1%  -34.21%  (p=0.000 n=10+8)
ReadWriteMix/readsPerWrite=256-4      607ns ± 5%           453ns ± 3%  -25.35%  (p=0.000 n=7+10)

name                              cmdq alloc/op   spanlatch alloc/op   delta
ReadOnlyMix/size=1-4                   223B ± 0%            191B ± 0%  -14.35%  (p=0.000 n=10+10)
ReadOnlyMix/size=4-4                   223B ± 0%            191B ± 0%  -14.35%  (p=0.000 n=10+10)
ReadOnlyMix/size=16-4                  223B ± 0%            191B ± 0%  -14.35%  (p=0.000 n=10+10)
ReadOnlyMix/size=64-4                  223B ± 0%            191B ± 0%  -14.35%  (p=0.000 n=10+10)
ReadOnlyMix/size=128-4                 223B ± 0%            191B ± 0%  -14.35%  (p=0.000 n=10+10)
ReadOnlyMix/size=256-4                 223B ± 0%            191B ± 0%  -14.35%  (p=0.000 n=10+10)
ReadWriteMix/readsPerWrite=0-4         915B ± 0%            144B ± 0%  -84.26%  (p=0.000 n=10+10)
ReadWriteMix/readsPerWrite=1-4         730B ± 0%            144B ± 0%  -80.29%  (p=0.000 n=10+10)
ReadWriteMix/readsPerWrite=4-4         486B ± 0%            144B ± 0%  -70.35%  (p=0.000 n=10+10)
ReadWriteMix/readsPerWrite=16-4        350B ± 0%            144B ± 0%  -58.86%  (p=0.000 n=9+10)
ReadWriteMix/readsPerWrite=64-4        222B ± 0%            144B ± 0%  -35.14%  (p=0.000 n=10+10)
ReadWriteMix/readsPerWrite=128-4       199B ± 0%            144B ± 0%  -27.64%  (p=0.000 n=10+10)
ReadWriteMix/readsPerWrite=256-4       188B ± 0%            144B ± 0%  -23.40%  (p=0.000 n=10+10)

name                              cmdq allocs/op  spanlatch allocs/op  delta
ReadOnlyMix/size=1-4                   1.00 ± 0%            1.00 ± 0%     ~     (all equal)
ReadOnlyMix/size=4-4                   1.00 ± 0%            1.00 ± 0%     ~     (all equal)
ReadOnlyMix/size=16-4                  1.00 ± 0%            1.00 ± 0%     ~     (all equal)
ReadOnlyMix/size=64-4                  1.00 ± 0%            1.00 ± 0%     ~     (all equal)
ReadOnlyMix/size=128-4                 1.00 ± 0%            1.00 ± 0%     ~     (all equal)
ReadOnlyMix/size=256-4                 1.00 ± 0%            1.00 ± 0%     ~     (all equal)
ReadWriteMix/readsPerWrite=0-4         34.0 ± 0%             1.0 ± 0%  -97.06%  (p=0.000 n=10+10)
ReadWriteMix/readsPerWrite=1-4         22.0 ± 0%             1.0 ± 0%  -95.45%  (p=0.000 n=10+10)
ReadWriteMix/readsPerWrite=4-4         10.0 ± 0%             1.0 ± 0%  -90.00%  (p=0.000 n=10+10)
ReadWriteMix/readsPerWrite=16-4        4.00 ± 0%            1.00 ± 0%  -75.00%  (p=0.000 n=10+10)
ReadWriteMix/readsPerWrite=64-4        1.00 ± 0%            1.00 ± 0%     ~     (all equal)
ReadWriteMix/readsPerWrite=128-4       1.00 ± 0%            1.00 ± 0%     ~     (all equal)
ReadWriteMix/readsPerWrite=256-4       1.00 ± 0%            1.00 ± 0%     ~     (all equal)
```

Release note: None
nvb added a commit to nvb/cockroach that referenced this pull request Nov 28, 2018
nvb added a commit to nvb/cockroach that referenced this pull request Nov 29, 2018
craig bot pushed a commit that referenced this pull request Nov 29, 2018
31997: storage/spanlatch: create spanlatch.Manager using immutable btrees r=nvanbenschoten a=nvanbenschoten

Informs #4768.
Informs #31904.

This change was inspired by #31904 and is a progression of the thinking started in #4768 (comment).

The change introduces `spanlatch.Manager`, which will replace the `CommandQueue` **in a future PR**. The new type isn't hooked up yet because doing so will require a lot of plumbing changes in that storage package that are best kept in a separate PR. The structure uses a new strategy that reduces lock contention, simplifies the code, avoids allocations, and makes #31904 easier to implement.

The primary objective, reducing lock contention, is addressed by minimizing the amount of work we perform under the exclusive "sequencing" mutex while locking the structure. This is made possible by employing a copy-on-write strategy. Before this change, commands would lock the queue, create a large slice of prerequisites, insert into the queue and unlock. After the change, commands lock the manager, grab an immutable snapshot of the manager's trees in O(1) time, insert into the manager, and unlock. They can then iterate over the immutable tree snapshot outside of the lock. Effectively, this means that the work performed under lock is linear with respect to the number of spans that a command declares but NO LONGER linear with respect to the number of other commands that it will wait on. This is important because `Replica.beginCmds` repeatedly comes up as the largest source of mutex contention in our system, especially on hot ranges.

The use of immutable snapshots also simplifies the code significantly. We're no longer copying our prereqs into a slice so we no longer need to carefully determine which transitive dependencies we do or don't need to wait on explicitly. This also makes lock cancellation trivial because we no longer explicitly hold on to our prereqs at all. Instead, we simply iterate through the snapshot outside of the lock.

While rewriting the structure, I also spent some time optimizing its allocations. Under normal operation, acquiring a latch now incurs only a single allocation - that being for the `spanlatch.Guard`. All other allocations are avoided through object pooling where appropriate. The overhead of using a copy-on-write technique is almost entirely avoided by atomically reference counting immutable btree nodes, which allows us to release them back into the btree node pools when they're no longer needed. This means that we don't expect any allocations when inserting into the internal trees, even with the copy-on-write policy.

Finally, this will make the approach taken in #31904 much more natural. Instead of tracking dependents and prerequisites for speculative reads and then iterating through them to find overlaps after, we can use the immutable snapshots directly! We can grab a snapshot and sequence ourselves as usual, but avoid waiting for prereqs. We then execute optimistically before finally checking whether we overlapped any of our prereqs. The great thing about this is that we already have the prereqs in an interval tree structure, so we get an efficient validation check for free.

### Naming changes

| Before                     | After                             |
|----------------------------|-----------------------------------|
| `CommandQueue`             | `spanlatch.Manager`               |
| "enter the command queue"  | "acquire span latches"            |
| "exit the command queue"   | "release span latches"            |
| "wait for prereq commands" | "wait for latches to be released" |

The use of the word "latch" is based on the definition of latches presented by Goetz Graefe in https://15721.courses.cs.cmu.edu/spring2016/papers/a16-graefe.pdf (see https://i.stack.imgur.com/fSRzd.png). An important reason for avoiding the word "lock" here is that it is critical for understanding that we don't confuse the operational locking performed by the CommandQueue/spanlatch.Manager with the transaction-scoped locking enforced by intents and our transactional concurrency control model.

### Microbenchmarks

NOTE: these are single-threaded benchmarks that don't benefit at all from the concurrency improvements enabled by this new structure.

```
name                              old time/op    new time/op    delta
ReadOnlyMix/size=1-4                 706ns ±20%     404ns ±10%  -42.81%  (p=0.008 n=5+5)
ReadOnlyMix/size=4-4                 649ns ±23%     382ns ± 5%  -41.13%  (p=0.008 n=5+5)
ReadOnlyMix/size=16-4                611ns ±16%     367ns ± 5%  -39.83%  (p=0.008 n=5+5)
ReadOnlyMix/size=64-4                692ns ±14%     370ns ± 1%  -46.49%  (p=0.016 n=5+4)
ReadOnlyMix/size=128-4               637ns ±22%     398ns ±14%  -37.48%  (p=0.008 n=5+5)
ReadOnlyMix/size=256-4               676ns ±15%     385ns ± 4%  -43.01%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=0-4      12.2µs ± 4%     0.6µs ±17%  -94.85%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=1-4      7.88µs ± 2%    0.55µs ± 7%  -92.99%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=4-4      4.19µs ± 3%    0.58µs ± 5%  -86.26%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=16-4     2.09µs ± 6%    0.54µs ±13%  -74.13%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=64-4      875ns ±17%     423ns ±29%  -51.64%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=128-4     655ns ± 6%     362ns ±16%  -44.71%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=256-4     549ns ±16%     314ns ±13%  -42.73%  (p=0.008 n=5+5)

name                              old alloc/op   new alloc/op   delta
ReadOnlyMix/size=1-4                  223B ± 0%      160B ± 0%  -28.25%  (p=0.079 n=4+5)
ReadOnlyMix/size=4-4                  223B ± 0%      160B ± 0%  -28.25%  (p=0.008 n=5+5)
ReadOnlyMix/size=16-4                 223B ± 0%      160B ± 0%  -28.25%  (p=0.008 n=5+5)
ReadOnlyMix/size=64-4                 223B ± 0%      160B ± 0%  -28.25%  (p=0.008 n=5+5)
ReadOnlyMix/size=128-4                217B ± 4%      160B ± 0%  -26.27%  (p=0.008 n=5+5)
ReadOnlyMix/size=256-4                223B ± 0%      160B ± 0%  -28.25%  (p=0.079 n=4+5)
ReadWriteMix/readsPerWrite=0-4      1.25kB ± 0%    0.16kB ± 0%  -87.15%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=1-4      1.00kB ± 0%    0.16kB ± 0%  -84.00%  (p=0.079 n=4+5)
ReadWriteMix/readsPerWrite=4-4        708B ± 0%      160B ± 0%  -77.40%  (p=0.079 n=4+5)
ReadWriteMix/readsPerWrite=16-4       513B ± 0%      160B ± 0%  -68.81%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=64-4       264B ± 0%      160B ± 0%  -39.39%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=128-4      221B ± 0%      160B ± 0%  -27.60%  (p=0.079 n=4+5)
ReadWriteMix/readsPerWrite=256-4      198B ± 0%      160B ± 0%  -19.35%  (p=0.008 n=5+5)

name                              old allocs/op  new allocs/op  delta
ReadOnlyMix/size=1-4                  1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ReadOnlyMix/size=4-4                  1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ReadOnlyMix/size=16-4                 1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ReadOnlyMix/size=64-4                 1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ReadOnlyMix/size=128-4                1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ReadOnlyMix/size=256-4                1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ReadWriteMix/readsPerWrite=0-4        38.0 ± 0%       1.0 ± 0%  -97.37%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=1-4        24.0 ± 0%       1.0 ± 0%  -95.83%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=4-4        12.0 ± 0%       1.0 ± 0%  -91.67%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=16-4       5.00 ± 0%      1.00 ± 0%  -80.00%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=64-4       2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.008 n=5+5)
ReadWriteMix/readsPerWrite=128-4      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ReadWriteMix/readsPerWrite=256-4      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
```

There are a few interesting things to point out about these benchmark results:
- The `ReadOnlyMix` results demonstrate a fixed improvement, regardless of size. This is due to the replacement of the hash-map with a linked-list for the readSet structure.
- The `ReadWriteMix` is more interesting. We see that the spanlatch implementation is faster across the board. This is especially true with a high write/read ratio.
- We see that the allocated memory stays constant regardless of the write/read ratio in the spanlatch implementation. This is due to the memory recycling that it performs on btree nodes. This is not the case for the CommandQueue implementation.
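The node-recycling point can be illustrated with a minimal sketch. This is not the actual spanlatch code; the `node` type and pool here are hypothetical stand-ins showing how returning zeroed nodes to a `sync.Pool` keeps steady-state alloc/op flat:

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical btree node standing in for the spanlatch
// manager's interval-btree nodes (illustrative, not the real type).
type node struct {
	items [16]int
	count int
}

// nodePool recycles nodes so that steady-state latch acquisition
// performs no new allocations, which is what keeps alloc/op constant
// across write/read ratios in the results above.
var nodePool = sync.Pool{
	New: func() interface{} { return new(node) },
}

func getNode() *node { return nodePool.Get().(*node) }

func putNode(n *node) {
	*n = node{} // zero before returning so stale data never leaks
	nodePool.Put(n)
}

func main() {
	n := getNode()
	n.items[0], n.count = 42, 1
	putNode(n)
	m := getNode() // possibly the same backing memory, now zeroed
	fmt.Println(m.count) // 0
}
```

Whether `getNode` returns a recycled node or a fresh one, the caller always sees a zeroed value, so recycling is invisible to correctness.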

Release note: None

32416:  scripts: enhance the release notes r=knz a=knz

Fixes #25180.

With this the amount of release notes for the first 2.2 alpha in cockroachdb/docs#4051 is reduced to just under two pages.

Also this PR makes it easier to monitor progress during the execution of the script.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
nvb added a commit to nvb/cockroach that referenced this pull request Dec 27, 2018
Fixes cockroachdb#9521.
Supersedes cockroachdb#31904.

SQL has a tendency to create scans which cover a range's entire key
span, though looking only to return a finite number of results. These
requests end up blocking on writes that are holding latches over keys
that the scan will not actually touch. In reality, when there is a
scan with a limit, the actual affected key span ends up being a small
subset of the range's key span.

This change creates a new "optimistic evaluation" path for read-only
requests. When evaluating optimistically, the batch will sequence itself
with the latch manager, but will not wait to acquire all of its latches.
Instead, it begins evaluating immediately and verifies that it would not
have needed to wait on any latch acquisitions after-the-fact. When
performing this verification, it uses knowledge about the limited set
of keys that the scan actually looked at. If there are no conflicts, the
scan succeeds. If there are conflicts, the request waits for all of its
latch acquisition attempts to finish and re-evaluates.
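The flow described above can be sketched roughly as follows. The types and function names here are illustrative, not the actual cockroachdb API; the point is the shape of evaluate-first, verify-after:

```go
package main

import "fmt"

// span is a simplified key span; the real code uses roachpb.Span.
type span struct{ start, end string }

// overlaps reports whether two half-open key spans intersect.
func overlaps(a, b span) bool {
	return a.start < b.end && b.start < a.end
}

// evaluateOptimistically mirrors the commit message: evaluate the
// read immediately without waiting on held write latches, then check
// the keys it actually touched against those latches after the fact.
// On conflict the caller falls back to waiting and re-evaluating.
func evaluateOptimistically(
	declared span, heldWrites []span, eval func(span) span,
) (span, bool) {
	touched := eval(declared) // no waiting before evaluation
	for _, w := range heldWrites {
		if overlaps(touched, w) {
			return span{}, false // conflict: must wait and retry
		}
	}
	return touched, true
}

func main() {
	// A limited scan declared over the whole range ["a","z") that
	// only touches ["a","c"), while a write latch covers ["m","n").
	eval := func(s span) span { return span{"a", "c"} }
	_, ok := evaluateOptimistically(span{"a", "z"}, []span{{"m", "n"}}, eval)
	fmt.Println(ok) // true: the scan never reached the write's keys
}
```

This is why limited scans benefit most: the declared span is large, but the touched span that verification checks is small.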

This PR replaces cockroachdb#31904. The major difference between the two is that
this PR exploits the structure of the latch manager to efficiently
perform optimistic latch acquisition and after-the-fact verification
of conflicts. Doing this requires keeping no extra state because it
uses the immutable snapshots that the latch manager now captures during
sequencing. The other major difference is that this PR does not release
latches after a failed optimistic evaluation.

NOTE: a prevalent theory of the pathological case with this behavior
was that overestimated read latches would serialize with write latches,
causing all requests on a range to serialize. I wasn't seeing this
in practice. It turns out that the "timestamp awareness" in the
latch manager should avoid this behavior in most cases because later
writes will have higher timestamps than earlier reads. The effect of
this is that they won't be considered to interfere by the latch manager.
Still, large clusters with a high amount of clock skew could see a
bounded variant of this situation.
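The "timestamp awareness" mentioned above reduces to a simple interference rule, sketched here with plain integer timestamps (the real latch manager uses hlc.Timestamp; this predicate is an illustration of the rule, not the actual code):

```go
package main

import "fmt"

// readBlocksWrite reports whether a write at writeTS must wait on an
// earlier read latch at readTS. A write interferes only when it is at
// or below the read's timestamp, i.e. when it could change what the
// read observes; later writes at higher timestamps pass through.
func readBlocksWrite(readTS, writeTS int) bool {
	return writeTS <= readTS
}

func main() {
	// A later write at a higher timestamp does not interfere with an
	// earlier read, so it need not wait on the read's latch.
	fmt.Println(readBlocksWrite(10, 15)) // false
	// A write at or below the read's timestamp does interfere.
	fmt.Println(readBlocksWrite(10, 10)) // true
}
```

This is why the pathological serialization was rare in practice: under well-synchronized clocks, later writes almost always carry higher timestamps than earlier reads.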

### Benchmark Results

```
name                                   old ops/sec  new ops/sec  delta
kv95/cores=16/nodes=3/splits=3          51.9k ± 0%   51.7k ± 1%     ~     (p=0.400 n=3+3)
kvS70-L1/cores=16/nodes=3/splits=3      24.1k ± 4%   27.7k ± 1%  +14.75%  (p=0.100 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3      24.5k ± 1%   27.5k ± 1%  +12.08%  (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3   16.0k ± 1%   16.6k ± 2%   +3.79%  (p=0.100 n=3+3)

name                                   old p50(ms)  new p50(ms)  delta
kv95/cores=16/nodes=3/splits=3           0.70 ± 0%    0.70 ± 0%     ~     (all equal)
kvS70-L1/cores=16/nodes=3/splits=3       1.07 ± 6%    0.90 ± 0%  -15.62%  (p=0.100 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3       1.10 ± 0%    0.90 ± 0%  -18.18%  (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3    1.80 ± 0%    1.67 ± 4%   -7.41%  (p=0.100 n=3+3)

name                                   old p99(ms)  new p99(ms)  delta
kv95/cores=16/nodes=3/splits=3           1.80 ± 0%    1.80 ± 0%     ~     (all equal)
kvS70-L1/cores=16/nodes=3/splits=3       5.77 ±32%    4.70 ± 0%     ~     (p=0.400 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3       5.00 ± 0%    4.70 ± 0%   -6.00%  (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3    6.90 ± 3%    7.33 ± 8%     ~     (p=0.400 n=3+3)
```

_S<num> = --span-percent=<num>, L<num> = --span-limit=<num>_

Release note (performance improvement): improved performance on workloads
which mix OLAP queries with inserts and updates.
@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label Jun 19, 2019
nvb added a commit to nvb/cockroach that referenced this pull request Jul 11, 2019
@bobvawter
Contributor

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@nvb nvb closed this Jun 8, 2020