storage: Use optimized SeekGE in CheckSSTConflicts#73981
storage: Use optimized SeekGE in CheckSSTConflicts#73981craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
|
This can't be merged until cockroachdb/pebble#1412 lands, and the vendor ref is bumped in Cockroach. |
|
I ran a single TPCE import benchmark with and without this patch. For posterity, the import workload is: CREATE TABLE trade (
t_id INT8 NOT NULL PRIMARY KEY,
t_dts TIMESTAMP NOT NULL,
t_st_id VARCHAR(4) NOT NULL,
t_tt_id VARCHAR(3) NOT NULL,
t_is_cash BOOL NOT NULL,
t_s_symb VARCHAR(15) NOT NULL,
t_qty INT4 NOT NULL CHECK (t_qty > 0),
t_bid_price DECIMAL(8,2) NOT NULL CHECK (t_bid_price > 0),
t_ca_id INT8 NOT NULL,
t_exec_name VARCHAR(49) NOT NULL,
t_trade_price DECIMAL(8,2),
t_chrg DECIMAL(10,2) NOT NULL CHECK (t_chrg >= 0),
t_comm DECIMAL(10,2) NOT NULL CHECK (t_comm >= 0),
t_tax DECIMAL(10,2) NOT NULL CHECK (t_tax >= 0),
t_lifo BOOL NOT NULL,
FAMILY (t_id),
FAMILY (t_comm, t_dts, t_st_id, t_trade_price, t_exec_name, t_tt_id, t_is_cash, t_s_symb, t_qty, t_bid_price, t_ca_id, t_chrg, t_lifo),
FAMILY (t_tax)
);
CREATE INDEX ON trade (t_ca_id, t_dts DESC) STORING (t_st_id, t_tt_id, t_is_cash, t_s_symb, t_qty, t_bid_price, t_exec_name, t_trade_price, t_chrg);
CREATE INDEX ON trade (t_s_symb, t_dts ASC) STORING (t_ca_id, t_exec_name, t_is_cash, t_trade_price, t_qty, t_tt_id);
IMPORT INTO trade CSV DATA ('gs://cockroach-fixtures/tpce-csv/customers=100000/*/Trade.txt?AUTH=implicit') WITH delimiter = '|', DETACHED;This took 4h43m46s without this patch, and 3h54m37s with it -- a pretty significant reduction of 17.3%. A few relevant graphs, where the first import started at 21:40 and the second one at 08:55, show that we're getting better resource utilization both in terms of CPU and IO: I'll do another run and see if the results are consistent. |
|
Second run was consistent: 4h55m23s vs 4h11m23s, so a 14.9% reduction. Nice work! |
d6e9e0d to
4c6df3e
Compare
|
First commit is #74745 , which this change depends on, but otherwise this PR is ready for review. |
4c6df3e to
095addc
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 2 of 2 files at r2.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @dt and @itsbilal)
pkg/storage/sst_iterator.go, line 95 at r2 (raw file):
trySeekUsingNext := false if r.seekGELastOp { // trySeekUsingNext = r.prevSeekKey <= key
leftover code?
pkg/storage/sst_iterator.go, line 97 at r2 (raw file):
// trySeekUsingNext = r.prevSeekKey <= key trySeekUsingNext = !key.Less(r.prevSeekKey) }
maybe add a comment like
// NB: seekGELastOp may still be true, and we haven't updated prevSeekKey. So be careful not to return before the end of the function that sets these fields up for the next SeekGE.
Changes pulled in:
```
3d0ff924d13a3d5fdf6e56a391c5c178c18ff196 *: Add trySeekUsingNext optimization to SeekGE
0c503048eb0365981929177c30178add8a56ae3e sstable: add (*sstable.Writer).RangeKey{Set,Unset,Delete} methods
fe52b49cc28df62dce9b00c382a5ce217936be56 tool/logs: aggregate compaction logs by node and store ID
8ab4358bc59dfa62e5e34e4b0e5ce81a68f5fe91 sstable: return err from `(BlockPropertyCollector).FinishTable`
91c18ef0ee999980c2869d11e5ce468410acbe8d internal/keyspan: add FragmentIterator interface
953fdb078ff0585489206ae96e1d80ca9f6f90c7 internal/keyspan: implement SetBounds on Iter
aa376a819bf67cd6766ee827feed4bf0bd508f1f tool: add compaction log event aggregation tool
```
Release note: None.
This nearly reverts cockroachdb#73514 by moving back to calling SeekGE on the engine to skip past any empty spans on either the engine or the SSTable. This is the more optimal approach on average, and given optimizations in cockroachdb/pebble#1412 which this change depends on, it also ends up performing better than a SeekPrefixGE-driven appraoch and the pre-cockroachdb#73514 approach. Improvement when running BenchmarkCheckSSTConflicts against the pre-cockroachdb#73514 revision (vs. this one): ``` name old time/op new time/op delta CheckSSTConflicts/keys=1000/versions=8/sstKeys=1000/overlap=false-24 72.6µs ±11% 66.3µs ± 1% -8.67% (p=0.008 n=5+5) CheckSSTConflicts/keys=1000/versions=8/sstKeys=1000/overlap=true-24 12.2ms ± 1% 1.7ms ± 1% -86.41% (p=0.008 n=5+5) CheckSSTConflicts/keys=1000/versions=8/sstKeys=10000/overlap=false-24 69.8µs ± 2% 67.4µs ± 1% -3.48% (p=0.008 n=5+5) CheckSSTConflicts/keys=1000/versions=8/sstKeys=10000/overlap=true-24 13.3ms ± 3% 2.8ms ± 1% -78.97% (p=0.008 n=5+5) CheckSSTConflicts/keys=1000/versions=64/sstKeys=1000/overlap=false-24 75.8µs ± 3% 63.8µs ± 1% -15.86% (p=0.008 n=5+5) CheckSSTConflicts/keys=1000/versions=64/sstKeys=1000/overlap=true-24 13.0ms ± 1% 1.9ms ± 1% -85.11% (p=0.008 n=5+5) CheckSSTConflicts/keys=1000/versions=64/sstKeys=10000/overlap=false-24 69.8µs ±11% 64.6µs ± 1% -7.45% (p=0.008 n=5+5) CheckSSTConflicts/keys=1000/versions=64/sstKeys=10000/overlap=true-24 14.8ms ± 9% 3.1ms ± 2% -79.05% (p=0.008 n=5+5) CheckSSTConflicts/keys=10000/versions=8/sstKeys=1000/overlap=false-24 66.1µs ± 2% 63.7µs ± 1% -3.65% (p=0.008 n=5+5) CheckSSTConflicts/keys=10000/versions=8/sstKeys=1000/overlap=true-24 14.2ms ± 9% 1.9ms ± 1% -86.55% (p=0.008 n=5+5) CheckSSTConflicts/keys=10000/versions=8/sstKeys=10000/overlap=false-24 72.3µs ±10% 64.5µs ± 0% -10.77% (p=0.008 n=5+5) CheckSSTConflicts/keys=10000/versions=8/sstKeys=10000/overlap=true-24 122ms ± 2% 17ms ± 1% -86.03% (p=0.008 n=5+5) CheckSSTConflicts/keys=10000/versions=64/sstKeys=1000/overlap=false-24 69.0µs ± 9% 62.4µs ± 1% -9.57% (p=0.032 n=5+5) CheckSSTConflicts/keys=10000/versions=64/sstKeys=1000/overlap=true-24 14.0ms ± 1% 2.3ms ± 2% -83.46% (p=0.016 n=4+5) CheckSSTConflicts/keys=10000/versions=64/sstKeys=10000/overlap=false-24 69.4µs ± 9% 62.7µs ± 1% -9.63% (p=0.016 n=5+5) CheckSSTConflicts/keys=10000/versions=64/sstKeys=10000/overlap=true-24 140ms ± 5% 26ms ± 1% -81.70% (p=0.008 n=5+5) CheckSSTConflicts/keys=100000/versions=8/sstKeys=1000/overlap=false-24 69.2µs ±10% 62.5µs ± 1% -9.66% (p=0.008 n=5+5) CheckSSTConflicts/keys=100000/versions=8/sstKeys=1000/overlap=true-24 15.3ms ±11% 2.3ms ± 1% -85.21% (p=0.008 n=5+5) CheckSSTConflicts/keys=100000/versions=8/sstKeys=10000/overlap=false-24 69.7µs ±12% 63.6µs ± 1% ~ (p=0.095 n=5+5) CheckSSTConflicts/keys=100000/versions=8/sstKeys=10000/overlap=true-24 148ms ± 6% 28ms ± 2% -80.90% (p=0.008 n=5+5) CheckSSTConflicts/keys=100000/versions=64/sstKeys=1000/overlap=false-24 67.1µs ±10% 61.1µs ± 2% -8.93% (p=0.016 n=5+5) CheckSSTConflicts/keys=100000/versions=64/sstKeys=1000/overlap=true-24 14.4ms ± 2% 2.5ms ± 5% -82.45% (p=0.016 n=4+5) CheckSSTConflicts/keys=100000/versions=64/sstKeys=10000/overlap=false-24 68.9µs ±21% 62.2µs ± 1% -9.76% (p=0.008 n=5+5) CheckSSTConflicts/keys=100000/versions=64/sstKeys=10000/overlap=true-24 204ms ±14% 42ms ± 5% -79.44% (p=0.008 n=5+5) ``` Fixes cockroachdb#66410. Release note: None.
095addc to
65cc09e
Compare
itsbilal
left a comment
There was a problem hiding this comment.
TFTR!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @dt and @sumeerbhola)
pkg/storage/sst_iterator.go, line 95 at r2 (raw file):
Previously, sumeerbhola wrote…
leftover code?
Nope, it's just writing the below line in mathematical form, as the code form is a little weird with the !Less.
pkg/storage/sst_iterator.go, line 97 at r2 (raw file):
Previously, sumeerbhola wrote…
maybe add a comment like
// NB: seekGELastOp may still be true, and we haven't updated prevSeekKey. So be careful not to return before the end of the function that sets these fields up for the next SeekGE.
Done.
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 1 of 2 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @itsbilal, and @sumeerbhola)
pkg/storage/sst_iterator.go, line 95 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Nope, it's just writing the below line in mathematical form, as the code form is a little weird with the
!Less.
ah yes, I was too hasty in my reading.
|
Thanks! Seeing as both this and #74745 are green and approved, I'll merge just this PR to save time. bors r=sumeerbhola |
|
Build failed (retrying...): |
|
Build succeeded: |




This nearly reverts #73514 by moving back to calling SeekGE on
the engine to skip past any empty spans on either the engine
or the SSTable. This is the more optimal approach on average,
and given optimizations in cockroachdb/pebble#1412 which this
change depends on, it also ends up performing better than
a SeekPrefixGE-driven appraoch and the pre-#73514 approach.
Improvement when running BenchmarkCheckSSTConflicts
against the pre-#73514 revision (vs. this one):
Fixes #66410.
Release note: None.