importccl: parallelize CSV-skipping workload IMPORT #36106
craig[bot] merged 1 commit into cockroachdb:master
Conversation
f7320c3 to 3c2156d
Should we remove the env var?

I'd want us to run some tests first. At the very least, a

I've run that one, with

Oh? How long did the IMPORTs take?

I don't have the 1k perf numbers handy, but looking through what I have in my scrollback, I have side-by-side local runs for 75 warehouses -- I'll rerun on roachprod with 1k today though. For w=75 on mbp:

Eh, actually, I don't think I want to include swapping the default in the same PR. I'll send a separate one with check/benchmark results.

Cool. I'm definitely interested in the 1k numbers, so let me know when you have them. Agreed we should not hold up this PR on switching the default; in fact, I'd prefer it be a separate PR anyway.
This was on a 4-node GCE roachprod cluster. I had to use n1-standard-8 machines since otherwise I consistently get an OOM in direct when doing

Not too surprising that the speedup of Workload compared to the CSV reader is more noticeable in distsql, where we read everything twice, than in direct, where it is read only once. It looks like in bigger ingestion jobs, we're still seeing the no-sort version spend more time in ingest -- recomputing stats, repeated compactions, and poor splitting -- but that's a separate issue (and I think mostly down to the same known issues that bigger index backfills are currently trying to solve). I let the checks finish on the direct+workload run and they were satisfied.
This spins up multiple workers to import workload data when skipping CSV encoding and directly calling the production function.

Compared to the non-parallelized reader:

```
name                  old time/op    new time/op    delta
ImportFixtureTPCC-12     2.77s ± 4%     2.00s ± 2%  -27.75%  (p=0.008 n=5+5)

name                  old speed      new speed      delta
ImportFixtureTPCC-12  31.9MB/s ± 4%  44.1MB/s ± 2%  +38.38%  (p=0.008 n=5+5)

name                  old alloc/op   new alloc/op   delta
ImportFixtureTPCC-12    2.32GB ± 0%    2.37GB ± 0%   +1.95%  (p=0.008 n=5+5)

name                  old allocs/op  new allocs/op  delta
ImportFixtureTPCC-12     26.1M ± 0%     26.0M ± 0%   -0.37%  (p=0.008 n=5+5)
```

and compared to the parallelized CSV reader:

```
name                  old time/op    new time/op    delta
ImportFixtureTPCC-12     2.22s ± 6%     2.00s ± 2%   -9.70%  (p=0.008 n=5+5)

name                  old speed      new speed      delta
ImportFixtureTPCC-12  39.9MB/s ± 6%  44.1MB/s ± 2%  +10.66%  (p=0.008 n=5+5)

name                  old alloc/op   new alloc/op   delta
ImportFixtureTPCC-12    2.58GB ± 0%    2.37GB ± 0%   -8.47%  (p=0.008 n=5+5)

name                  old allocs/op  new allocs/op  delta
ImportFixtureTPCC-12     32.0M ± 0%     26.0M ± 0%  -18.66%  (p=0.008 n=5+5)
```

Release note: none.
Thanks! Let's get this merged! \o/

bors r+
36106: importccl: parallelize CSV-skipping workload IMPORT r=dt a=dt

(only last commit, first two commits are #36042)

Reopening #36060 after accidentally deleting the branch, after which GitHub said it cannot be re-opened :/

This spins up multiple workers, each importing every i'th batch, to do workload IMPORT.

As noted inline, this execution order, as opposed to assigning large spans of batches to each worker, should mean that adjacent batches are processed at roughly the same time and thus end up in the same sort-batch for SST creation, preserving the non-overlapping SSTs when the workload's batches are ordered and non-overlapping.

Release note: none.

Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Build succeeded