workload/tpcc: initial data load perf improvements #35322

Merged
craig[bot] merged 3 commits into cockroachdb:master from danhhz:workload_tpcc_perf on Mar 4, 2019

@danhhz
Contributor

@danhhz danhhz commented Mar 2, 2019

See individual commits for details

Touches #34809

@danhhz danhhz requested review from a team, jordanlewis and nvb March 2, 2019 03:28
@cockroach-teamcity
Member

This change is Reviewable

@danhhz danhhz changed the title workload/tpcc: perf improvements workload/tpcc: initial data load perf improvements Mar 2, 2019
@danhhz danhhz force-pushed the workload_tpcc_perf branch from ccb586b to 3461767 Compare March 3, 2019 05:32
@danhhz
Contributor Author

danhhz commented Mar 4, 2019

Nathan's out this week, removing him as a possible reviewer

@danhhz danhhz requested review from petermattis and removed request for nvb March 4, 2019 15:27
danhhz added 3 commits March 4, 2019 07:36
Use bufalloc.ByteAllocator in randStringFromAlphabet instead of
allocating many small byte arrays.
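
The idea can be sketched as follows. This is a minimal illustration, not the code from the commit: `byteAllocator`, `alloc`, and `chunkSize` are hypothetical names standing in for bufalloc.ByteAllocator, and the chunk size is an assumption. Many small slices are carved out of one larger chunk, so the Go allocator is hit once per chunk rather than once per string.

```go
package main

import "fmt"

// chunkSize is an assumed size for each backing chunk.
const chunkSize = 4096

// byteAllocator is a hypothetical stand-in for bufalloc.ByteAllocator.
// It hands out small byte slices carved from a larger shared chunk.
type byteAllocator struct {
	buf []byte
}

// alloc returns a slice of length n plus the updated allocator.
// A new chunk is allocated only when the current one is exhausted.
func (a byteAllocator) alloc(n int) (byteAllocator, []byte) {
	if cap(a.buf)-len(a.buf) < n {
		size := chunkSize
		if n > size {
			size = n
		}
		a.buf = make([]byte, 0, size)
	}
	start := len(a.buf)
	a.buf = a.buf[:start+n]
	// Limit the capacity so an append by the caller cannot
	// overwrite a neighboring allocation.
	return a, a.buf[start : start+n : start+n]
}

func main() {
	var a byteAllocator
	var x, y []byte
	a, x = a.alloc(3)
	a, y = a.alloc(5)
	copy(x, "abc")
	copy(y, "hello")
	fmt.Println(string(x), string(y)) // prints "abc hello"
}
```

Both slices above share one 4096-byte backing array, which is the kind of saving the allocs/op column below reflects.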

Benchmark results:
- InitTPCC only measures generating initial data
- ImportFixtureTPCC measures loading initial data with
  COCKROACH_IMPORT_WORKLOAD_FASTER=true, which skips the CSV roundtrip.

    name                     old time/op    new time/op    delta
    InitTPCC/warehouses=1-8     2.33s ± 0%     2.25s ± 1%   -3.04%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8         5.00s ± 2%     4.83s ± 2%   -3.32%  (p=0.008 n=5+5)

    name                     old speed      new speed      delta
    InitTPCC/warehouses=1-8  30.4MB/s ± 0%  31.4MB/s ± 1%   +3.14%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8      17.7MB/s ± 2%  18.3MB/s ± 2%   +3.40%  (p=0.008 n=5+5)

    name                     old alloc/op   new alloc/op   delta
    InitTPCC/warehouses=1-8     329MB ± 0%     246MB ± 0%  -25.28%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8        3.70GB ± 0%    3.61GB ± 0%   -2.20%  (p=0.008 n=5+5)

    name                     old allocs/op  new allocs/op  delta
    InitTPCC/warehouses=1-8     9.38M ± 0%     5.60M ± 0%  -40.27%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8         36.0M ± 0%     32.2M ± 0%  -10.49%  (p=0.008 n=5+5)

Touches cockroachdb#34809

Release note: None
Background: golang/go#21835

The current thinking for go2 is to switch to a Permuted Congruential
Generator (PCG), which is much faster than the current "math/rand"
generator. It's smaller in memory (128-bit vs 607 64-bit words) and
_much_ faster to seed (which is a common operation in workload, which
uses seeding to make deterministic results).

The linked issue claims "The current implementation, in pure Go, is
somewhat slower than math/rand, but could be comparably fast or even
faster given compiler support for the 128-bit multiply inside." but for
our needs it seems to already be faster.
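
A small sketch of seeded, deterministic generation with a PCG source. To stay self-contained it uses the standard library's "math/rand/v2" (available from Go 1.22), not the "golang.org/x/exp/rand" package; `newRowRNG` and the choice of seeding by row number are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// newRowRNG is a hypothetical helper: it builds a generator whose
// output depends only on (seed, row), so a row can be regenerated
// independently of every other row. Seeding a PCG source only sets
// two 64-bit words, which is why reseeding is cheap.
func newRowRNG(seed, row uint64) *rand.Rand {
	return rand.New(rand.NewPCG(seed, row))
}

func main() {
	a := newRowRNG(1, 42)
	b := newRowRNG(1, 42)
	// Same seed and row give the same sequence.
	fmt.Println(a.Uint64() == b.Uint64()) // prints "true"
}
```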

Benchmark results:
- InitTPCC only measures generating initial data
- ImportFixtureTPCC measures loading initial data with
  COCKROACH_IMPORT_WORKLOAD_FASTER=true, which skips the CSV roundtrip.

    name                     old time/op    new time/op    delta
    InitTPCC/warehouses=1-8     2.25s ± 1%     2.06s ± 1%  -8.07%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8         4.90s ± 3%     4.64s ± 2%  -5.44%  (p=0.008 n=5+5)

    name                     old speed      new speed      delta
    InitTPCC/warehouses=1-8  31.5MB/s ± 1%  34.3MB/s ± 1%  +8.80%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8      18.0MB/s ± 3%  19.1MB/s ± 2%  +5.77%  (p=0.008 n=5+5)

    name                     old alloc/op   new alloc/op   delta
    InitTPCC/warehouses=1-8     246MB ± 0%     246MB ± 0%    ~     (p=0.095 n=5+5)
    ImportFixtureTPCC-8        3.61GB ± 0%    3.61GB ± 0%    ~     (p=0.548 n=5+5)

    name                     old allocs/op  new allocs/op  delta
    InitTPCC/warehouses=1-8     5.60M ± 0%     5.60M ± 0%  +0.06%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8         32.2M ± 0%     32.3M ± 0%  +0.08%  (p=0.008 n=5+5)

Touches cockroachdb#34809

Release note: None
The slowest part of generating random strings is generating the random
number. Previously, we made a uint64 and threw away most of it to pick
one character from an alphabet. Instead, pick as many characters as we
can get from 64 bits: floor(log(math.MaxUint64)/log(len(alphabet))).

New random string generation microbenchmark:

    name                      time/op
    RandStringFast/letters-8    75.3ns ± 2%
    RandStringFast/numbers-8    74.2ns ± 1%
    RandStringFast/aChars-8     84.5ns ± 1%

    name                      speed
    RandStringFast/letters-8   345MB/s ± 2%
    RandStringFast/numbers-8   351MB/s ± 1%
    RandStringFast/aChars-8    308MB/s ± 1%

Diffs in TPCC data generation and TPCC IMPORT:

    name                     old time/op    new time/op     delta
    InitTPCC/warehouses=1-8     2.03s ± 2%      0.65s ± 1%   -67.85%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8         4.11s ± 5%      3.75s ± 2%    -8.67%  (p=0.008 n=5+5)

    name                     old speed      new speed       delta
    InitTPCC/warehouses=1-8  34.9MB/s ± 2%  108.5MB/s ± 1%  +211.01%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8      21.5MB/s ± 5%   23.6MB/s ± 2%    +9.45%  (p=0.008 n=5+5)

Touches cockroachdb#34809

Release note: None
@danhhz danhhz force-pushed the workload_tpcc_perf branch from 3461767 to f602397 Compare March 4, 2019 15:38
@danhhz danhhz mentioned this pull request Mar 4, 2019
6 tasks
Collaborator

@petermattis petermattis left a comment

:lgtm:

Reviewed 3 of 3 files at r1, 12 of 12 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @petermattis)

@danhhz
Contributor Author

danhhz commented Mar 4, 2019

Thanks for the review!

bors r=petermattis

craig bot pushed a commit that referenced this pull request Mar 4, 2019
35322: workload/tpcc: initial data load perf improvements r=petermattis a=danhhz

See individual commits for details

Touches #34809

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
@craig
Contributor

craig bot commented Mar 4, 2019

Build succeeded

@craig craig bot merged commit f602397 into cockroachdb:master Mar 4, 2019
danhhz added a commit to danhhz/cockroach that referenced this pull request Mar 6, 2019
Give bank's initial table data generation the same love that tpcc just
got in cockroachdb#35322. We use bank for some of the roachtests that just want a
bunch of data, so it's worth speeding up. The generated "payload" field
has changed a bit (a-zA-Z instead of hex encoded), but it wasn't that
way before for any particular reason, so this should be okay.

In summary:
- The slowest part of generating random strings is generating the random
  number. Previously, we made a uint64 and threw away most of it to pick
  one character from an alphabet. Instead, pick as many characters as we
  can get from 64 bits: floor(log(math.MaxUint64)/log(len(alphabet))).
- The current thinking for go2 is to switch to a Permuted Congruential
  Generator (PCG), which is much faster than the current "math/rand"
  generator. It's smaller in memory (128-bit vs 607 64-bit words) and
  _much_ faster to seed. Use the implementation of that proposal that's
  in "golang.org/x/exp/rand"

Benchmark:

    name        old time/op    new time/op      delta
    InitBank-8    12.8ms ± 1%       0.7ms ± 1%    -94.92%  (p=0.008 n=5+5)

    name        old speed      new speed        delta
    InitBank-8  7.88MB/s ± 1%  158.21MB/s ± 1%  +1907.77%  (p=0.008 n=5+5)

    name        old alloc/op   new alloc/op     delta
    InitBank-8    5.88MB ± 0%      0.28MB ± 0%    -95.24%  (p=0.000 n=4+5)

    name        old allocs/op  new allocs/op    delta
    InitBank-8     9.00k ± 0%       7.00k ± 0%    -22.21%  (p=0.008 n=5+5)

Release note: None
danhhz added a commit to danhhz/cockroach that referenced this pull request Mar 6, 2019
Give bank's initial table data generation the same love that tpcc just
got in cockroachdb#35322. We use bank for some of the roachtests that just want a
bunch of data, so it's worth speeding up. The generated "payload" field
has changed a bit (a-zA-Z instead of hex encoded), but it wasn't that
way before for any particular reason, so this should be okay.

In summary:
- The slowest part of generating random strings is generating the random
  number. Previously, we made a uint64 and threw away most of it to pick
  one character from an alphabet. Instead, pick as many characters as we
  can get from 64 bits: floor(log(math.MaxUint64)/log(len(alphabet))).
- The current thinking for go2 is to switch to a Permuted Congruential
  Generator (PCG), which is much faster than the current "math/rand"
  generator. It's smaller in memory (128-bit vs 607 64-bit words) and
  _much_ faster to seed. Use the implementation of that proposal that's
  in "golang.org/x/exp/rand"

Benchmark:

    name        old time/op    new time/op      delta
    InitBank-8    12.6ms ± 1%       0.6ms ± 2%    -95.54%  (p=0.008 n=5+5)

    name        old speed      new speed        delta
    InitBank-8  8.02MB/s ± 1%  183.44MB/s ± 2%  +2187.28%  (p=0.008 n=5+5)

    name        old alloc/op   new alloc/op     delta
    InitBank-8    5.88MB ± 0%      0.23MB ± 0%    -96.05%  (p=0.008 n=5+5)

    name        old allocs/op  new allocs/op    delta
    InitBank-8     9.00k ± 0%       6.00k ± 0%    -33.32%  (p=0.008 n=5+5)

Release note: None
craig bot pushed a commit that referenced this pull request Mar 11, 2019
35441: workload/bank: generate random strings much faster r=nvanbenschoten a=danhhz

Give bank's initial table data generation the same love that tpcc just
got in #35322. We use bank for some of the roachtests that just want a
bunch of data, so it's worth speeding up. The generated "payload" field
has changed a bit (a-zA-Z instead of hex encoded), but it wasn't that
way before for any particular reason, so this should be okay.

In summary:
- The slowest part of generating random strings is generating the random
  number. Previously, we made a uint64 and threw away most of it to pick
  one character from an alphabet. Instead, pick as many characters as we
  can get from 64 bits: floor(log(math.MaxUint64)/log(len(alphabet))).
- The current thinking for go2 is to switch to a Permuted Congruential
  Generator (PCG), which is much faster than the current "math/rand"
  generator. It's smaller in memory (128-bit vs 607 64-bit words) and
  _much_ faster to seed. Use the implementation of that proposal that's
  in "golang.org/x/exp/rand"

Benchmark:

    name        old time/op    new time/op      delta
    InitBank-8    12.6ms ± 1%       0.6ms ± 2%    -95.54%  (p=0.008 n=5+5)

    name        old speed      new speed        delta
    InitBank-8  8.02MB/s ± 1%  183.44MB/s ± 2%  +2187.28%  (p=0.008 n=5+5)

    name        old alloc/op   new alloc/op     delta
    InitBank-8    5.88MB ± 0%      0.23MB ± 0%    -96.05%  (p=0.008 n=5+5)

    name        old allocs/op  new allocs/op    delta
    InitBank-8     9.00k ± 0%       6.00k ± 0%    -33.32%  (p=0.008 n=5+5)

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
@danhhz danhhz deleted the workload_tpcc_perf branch March 11, 2019 20:16