workload/tpcc: initial data load perf improvements#35322
Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom Mar 4, 2019
Nathan's out this week, removing him as a possible reviewer.
Use bufalloc.ByteAllocator in randStringFromAlphabet instead of
allocating many small byte arrays.
Benchmark results:
- InitTPCC only measures generating initial data
- ImportFixtureTPCC measures loading initial data with
COCKROACH_IMPORT_WORKLOAD_FASTER=true, which skips the CSV roundtrip.
name old time/op new time/op delta
InitTPCC/warehouses=1-8 2.33s ± 0% 2.25s ± 1% -3.04% (p=0.008 n=5+5)
ImportFixtureTPCC-8 5.00s ± 2% 4.83s ± 2% -3.32% (p=0.008 n=5+5)
name old speed new speed delta
InitTPCC/warehouses=1-8 30.4MB/s ± 0% 31.4MB/s ± 1% +3.14% (p=0.008 n=5+5)
ImportFixtureTPCC-8 17.7MB/s ± 2% 18.3MB/s ± 2% +3.40% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
InitTPCC/warehouses=1-8 329MB ± 0% 246MB ± 0% -25.28% (p=0.008 n=5+5)
ImportFixtureTPCC-8 3.70GB ± 0% 3.61GB ± 0% -2.20% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
InitTPCC/warehouses=1-8 9.38M ± 0% 5.60M ± 0% -40.27% (p=0.008 n=5+5)
ImportFixtureTPCC-8 36.0M ± 0% 32.2M ± 0% -10.49% (p=0.008 n=5+5)
Touches cockroachdb#34809
Release note: None
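As a rough illustration of why a chunked byte allocator cuts allocs/op, here is a minimal sketch. The `chunkAllocator` type, its `alloc` method, and `chunkSize` are hypothetical names for this example; this is not the actual `bufalloc.ByteAllocator` API, only the general idea of carving many small slices out of one larger allocation.

```go
package main

import "fmt"

// chunkSize is an assumed chunk size chosen for illustration only.
const chunkSize = 4096

// chunkAllocator is a hypothetical stand-in for a chunked byte allocator.
// It hands out small slices carved from one larger backing array.
type chunkAllocator struct {
	buf []byte
}

// alloc returns a slice of length n. A new backing array is allocated only
// when the current chunk does not have n bytes of spare capacity.
func (a *chunkAllocator) alloc(n int) []byte {
	if cap(a.buf)-len(a.buf) < n {
		size := chunkSize
		if n > size {
			size = n
		}
		a.buf = make([]byte, 0, size)
	}
	start := len(a.buf)
	a.buf = a.buf[:start+n]
	// The three-index slice caps the result so appends cannot spill into
	// the bytes handed out to the next caller.
	return a.buf[start : start+n : start+n]
}

func main() {
	var a chunkAllocator
	x := a.alloc(4)
	y := a.alloc(4)
	copy(x, "aaaa")
	copy(y, "bbbb")
	fmt.Println(string(x), string(y))
}
```

Both slices above come from a single `make` call, which is the effect the allocs/op column is measuring.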
Background: golang/go#21835
The current thinking for go2 is to switch to a Permuted Congruential
Generator (PCG), which is much faster than the current "math/rand"
generator. It's smaller in memory (128-bit vs 607 64-bit words) and
_much_ faster to seed (which is a common operation in workload, which
uses seeding to make deterministic results).
The linked issue claims "The current implementation, in pure Go, is
somewhat slower than math/rand, but could be comparably fast or even
faster given compiler support for the 128-bit multiply inside." but for
our needs it seems to already be faster.
Benchmark results:
- InitTPCC only measures generating initial data
- ImportFixtureTPCC measures loading initial data with
COCKROACH_IMPORT_WORKLOAD_FASTER=true, which skips the CSV roundtrip.
name old time/op new time/op delta
InitTPCC/warehouses=1-8 2.25s ± 1% 2.06s ± 1% -8.07% (p=0.008 n=5+5)
ImportFixtureTPCC-8 4.90s ± 3% 4.64s ± 2% -5.44% (p=0.008 n=5+5)
name old speed new speed delta
InitTPCC/warehouses=1-8 31.5MB/s ± 1% 34.3MB/s ± 1% +8.80% (p=0.008 n=5+5)
ImportFixtureTPCC-8 18.0MB/s ± 3% 19.1MB/s ± 2% +5.77% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
InitTPCC/warehouses=1-8 246MB ± 0% 246MB ± 0% ~ (p=0.095 n=5+5)
ImportFixtureTPCC-8 3.61GB ± 0% 3.61GB ± 0% ~ (p=0.548 n=5+5)
name old allocs/op new allocs/op delta
InitTPCC/warehouses=1-8 5.60M ± 0% 5.60M ± 0% +0.06% (p=0.008 n=5+5)
ImportFixtureTPCC-8 32.2M ± 0% 32.3M ± 0% +0.08% (p=0.008 n=5+5)
Touches cockroachdb#34809
Release note: None
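To show the seeding pattern this commit message refers to, here is a small sketch of building a fresh, deterministic generator per seed. It uses the PCG source from the standard library's `math/rand/v2` (Go 1.22 or later) as a stand-in; it is not the code from this PR. The helper names `newSeeded` and `firstN` are made up for the example.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// newSeeded returns a PCG-backed generator whose output depends only on
// seed. The second PCG seed word is fixed at 0 here as an assumption.
func newSeeded(seed uint64) *rand.Rand {
	return rand.New(rand.NewPCG(seed, 0))
}

// firstN returns the first n values produced by a freshly seeded generator.
func firstN(seed uint64, n int) []uint64 {
	rng := newSeeded(seed)
	out := make([]uint64, n)
	for i := range out {
		out[i] = rng.Uint64()
	}
	return out
}

func main() {
	a, b := firstN(42, 3), firstN(42, 3)
	// The same seed reproduces the same sequence.
	fmt.Println(a[0] == b[0], a[1] == b[1], a[2] == b[2])
}
```

Because the PCG state is only two 64-bit words, creating a generator per batch of rows like this stays cheap, which is the property the commit message highlights.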
The slowest part of generating random strings is generating the random
number. Previously, we made a uint64 and threw away most of it to pick
one character from an alphabet. Instead, pick as many characters as we
can get from 64 bits: floor(log(math.MaxUint64)/log(len(alphabet))).
New random string generation microbenchmark:
name time/op
RandStringFast/letters-8 75.3ns ± 2%
RandStringFast/numbers-8 74.2ns ± 1%
RandStringFast/aChars-8 84.5ns ± 1%
name speed
RandStringFast/letters-8 345MB/s ± 2%
RandStringFast/numbers-8 351MB/s ± 1%
RandStringFast/aChars-8 308MB/s ± 1%
Diffs in TPCC data generation and TPCC IMPORT:
name old time/op new time/op delta
InitTPCC/warehouses=1-8 2.03s ± 2% 0.65s ± 1% -67.85% (p=0.008 n=5+5)
ImportFixtureTPCC-8 4.11s ± 5% 3.75s ± 2% -8.67% (p=0.008 n=5+5)
name old speed new speed delta
InitTPCC/warehouses=1-8 34.9MB/s ± 2% 108.5MB/s ± 1% +211.01% (p=0.008 n=5+5)
ImportFixtureTPCC-8 21.5MB/s ± 5% 23.6MB/s ± 2% +9.45% (p=0.008 n=5+5)
Touches cockroachdb#34809
Release note: None
petermattis
approved these changes
Mar 4, 2019
Collaborator
petermattis
left a comment
Reviewed 3 of 3 files at r1, 12 of 12 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @petermattis)
Contributor
Author
Thanks for the review!
bors r=petermattis
Contributor
Build succeeded
danhhz
added a commit
to danhhz/cockroach
that referenced
this pull request
Mar 6, 2019
Give bank's initial table data generation the same love that tpcc just
got in cockroachdb#35322. We use bank for some of the roachtests that
just want a bunch of data, so it's worth speeding up.
The generated "payload" field has changed a bit (a-zA-Z instead of hex
encoded), but it wasn't that way before for any particular reason, so
this should be okay.
In summary:
- The slowest part of generating random strings is generating the random
number. Previously, we made a uint64 and threw away most of it to pick
one character from an alphabet. Instead, pick as many characters as we
can get from 64 bits: floor(log(math.MaxUint64)/log(len(alphabet))).
- The current thinking for go2 is to switch to a Permuted Congruential
Generator (PCG), which is much faster than the current "math/rand"
generator. It's smaller in memory (128-bit vs 607 64-bit words) and
_much_ faster to seed. Use the implementation of that proposal that's in
"golang.org/x/exp/rand".
Benchmark:
name old time/op new time/op delta
InitBank-8 12.8ms ± 1% 0.7ms ± 1% -94.92% (p=0.008 n=5+5)
name old speed new speed delta
InitBank-8 7.88MB/s ± 1% 158.21MB/s ± 1% +1907.77% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
InitBank-8 5.88MB ± 0% 0.28MB ± 0% -95.24% (p=0.000 n=4+5)
name old allocs/op new allocs/op delta
InitBank-8 9.00k ± 0% 7.00k ± 0% -22.21% (p=0.008 n=5+5)
Release note: None
craig bot
pushed a commit
that referenced
this pull request
Mar 11, 2019
35441: workload/bank: generate random strings much faster r=nvanbenschoten a=danhhz
Give bank's initial table data generation the same love that tpcc just
got in #35322. We use bank for some of the roachtests that just want a
bunch of data, so it's worth speeding up.
The generated "payload" field has changed a bit (a-zA-Z instead of hex
encoded), but it wasn't that way before for any particular reason, so
this should be okay.
In summary:
- The slowest part of generating random strings is generating the random
number. Previously, we made a uint64 and threw away most of it to pick
one character from an alphabet. Instead, pick as many characters as we
can get from 64 bits: floor(log(math.MaxUint64)/log(len(alphabet))).
- The current thinking for go2 is to switch to a Permuted Congruential
Generator (PCG), which is much faster than the current "math/rand"
generator. It's smaller in memory (128-bit vs 607 64-bit words) and
_much_ faster to seed. Use the implementation of that proposal that's in
"golang.org/x/exp/rand".
Benchmark:
name old time/op new time/op delta
InitBank-8 12.6ms ± 1% 0.6ms ± 2% -95.54% (p=0.008 n=5+5)
name old speed new speed delta
InitBank-8 8.02MB/s ± 1% 183.44MB/s ± 2% +2187.28% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
InitBank-8 5.88MB ± 0% 0.23MB ± 0% -96.05% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
InitBank-8 9.00k ± 0% 6.00k ± 0% -33.32% (p=0.008 n=5+5)
Release note: None
Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
See individual commits for details
Touches #34809