
importccl: parallelize CSV-skipping workload IMPORT#36106

Merged
craig[bot] merged 1 commit into cockroachdb:master from dt:workload-par
Mar 27, 2019

Conversation

@dt
Contributor

@dt dt commented Mar 25, 2019

(only last commit, first two commits are #36042)

Reopening #36060 after accidentally deleting branch and then github said it cannot re-open :/

This spins up multiple workers, each importing every i'th batch, to do
workload IMPORT.

As noted inline, this execution order, as opposed to assigning large
spans of batches to each worker, should mean that adjacent batches are
processed at roughly the same time and thus end up in the same
sort-batch for SST creation, preserving the non-overlapping SSTs when
the workload's batches are ordered and non-overlapping.

Release note: none.

@dt dt requested review from a team and danhhz March 25, 2019 17:47
@cockroach-teamcity
Member

This change is Reviewable

@dt dt force-pushed the workload-par branch 2 times, most recently from f7320c3 to 3c2156d Compare March 25, 2019 17:49
@dt
Contributor Author

dt commented Mar 25, 2019

Should we remove the env var?

@danhhz
Contributor

danhhz commented Mar 25, 2019

I'd want us to run some tests first. At the very least, a fixtures import tpcc --warehouses=1000 --checks=true as a sanity check and to make sure it really is at least as fast for real usage.

@dt
Contributor Author

dt commented Mar 25, 2019

I've run that one, with --checks=true, but nothing other than tpcc.

@danhhz
Contributor

danhhz commented Mar 25, 2019

Oh? How long did the IMPORTs take?

@dt
Contributor Author

dt commented Mar 25, 2019

I don't have the 1k perf numbers handy, but looking through what I have in my scrollback, I have side-by-side local runs for 75 warehouses -- I'll rerun on roachprod with 1k today though.

For w=75 on mbp:

```
csv:   imported 5.6 GiB bytes in 9 tables (took 3m36.383236s, 26.60 MiB/s)
nopar: imported 5.6 GiB bytes in 9 tables (took 3m50.354778s, 24.98 MiB/s)
par:   imported 5.6 GiB bytes in 9 tables (took 3m08.931523s, 30.46 MiB/s)
```

@dt
Contributor Author

dt commented Mar 25, 2019

Eh, actually, I don't think I want to include swapping the default in the same PR. I'll send a separate one with check/benchmark results.

@danhhz
Contributor

danhhz commented Mar 25, 2019

Cool. I'm definitely interested in the 1k numbers, so let me know when you have them. Agree we should not hold up this PR on switching the default. In fact, I'd prefer it be a separate PR anyway.

@dt
Contributor Author

dt commented Mar 26, 2019

This was on a 4 node gce roachprod cluster. I had to use n1-standard-8 machines since otherwise I consistently get an OOM in direct when doing warehouses=1000.

| ingest     | reader   | time |
|------------|----------|------|
| distsql    | csv      | 55m  |
| distsql    | workload | 41m  |
| direct SST | csv      | 67m  |
| direct SST | workload | 65m  |

Not too surprising that the speedup of the workload reader compared to the CSV reader is more noticeable in distsql, where we read everything twice, than in direct, where it is read only once.

It looks like in bigger ingestion jobs, we're still seeing the no-sort version spend more time in ingest -- recomputing stats, repeated compactions, and poor splitting -- but that's a separate issue (and I think mostly down to the same known issues that bigger index backfills are currently trying to solve).

I let the checks finish on the direct+workload run and they were satisfied.

Full timing output

This spins up multiple workers to import workload data when skipping CSV
encoding and directly calling the production function.

Compared to the non-parallelized reader:
```
name                  old time/op    new time/op    delta
ImportFixtureTPCC-12     2.77s ± 4%     2.00s ± 2%  -27.75%  (p=0.008 n=5+5)

name                  old speed      new speed      delta
ImportFixtureTPCC-12  31.9MB/s ± 4%  44.1MB/s ± 2%  +38.38%  (p=0.008 n=5+5)

name                  old alloc/op   new alloc/op   delta
ImportFixtureTPCC-12    2.32GB ± 0%    2.37GB ± 0%   +1.95%  (p=0.008 n=5+5)

name                  old allocs/op  new allocs/op  delta
ImportFixtureTPCC-12     26.1M ± 0%     26.0M ± 0%   -0.37%  (p=0.008 n=5+5)
```

and compared to the parallelized CSV reader:
```
name                  old time/op    new time/op    delta
ImportFixtureTPCC-12     2.22s ± 6%     2.00s ± 2%   -9.70%  (p=0.008 n=5+5)

name                  old speed      new speed      delta
ImportFixtureTPCC-12  39.9MB/s ± 6%  44.1MB/s ± 2%  +10.66%  (p=0.008 n=5+5)

name                  old alloc/op   new alloc/op   delta
ImportFixtureTPCC-12    2.58GB ± 0%    2.37GB ± 0%   -8.47%  (p=0.008 n=5+5)

name                  old allocs/op  new allocs/op  delta
ImportFixtureTPCC-12     32.0M ± 0%     26.0M ± 0%  -18.66%  (p=0.008 n=5+5)
```

Release note: none.
@danhhz
Contributor

danhhz commented Mar 26, 2019

Thanks! Let's get this merged! \o/

@dt
Contributor Author

dt commented Mar 27, 2019

bors r+

craig bot pushed a commit that referenced this pull request Mar 27, 2019
36106: importccl: parallelize CSV-skipping workload IMPORT r=dt a=dt

(only last commit, first two commits are #36042)

Reopening #36060 after accidentally deleting branch and then github said it cannot re-open :/

This spins up multiple workers, each importing every i'th batch, to do
workload IMPORT.

As noted inline, this execution order, as opposed to assigning large
spans of batches to each worker, should mean that adjacent batches are
processed at roughly the same time and thus end up in the same
sort-batch for SST creation, preserving the non-overlapping SSTs when
the workload's batches are ordered and non-overlapping.

Release note: none.

Co-authored-by: David Taylor <tinystatemachine@gmail.com>
@craig
Contributor

craig bot commented Mar 27, 2019

Build succeeded

@craig craig bot merged commit 95b41d1 into cockroachdb:master Mar 27, 2019
@dt dt deleted the workload-par branch March 27, 2019 22:30