workload/bank: generate random strings much faster #35441
craig[bot] merged 1 commit into cockroachdb:master from
Conversation
nvb
left a comment
Reviewed 2 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @tbg)
cpu.prof, line 0 at r1 (raw file):
Woops 😄
pkg/workload/bank/bank.go, line 121 at r1 (raw file):
b.rows, func(rowIdx int) []interface{} {
    rng := rand.New(rand.NewSource(b.seed + uint64(rowIdx)))
If we're just using this as a rand.Source, I bet we can remove an allocation per row by changing this to:
rng := rand.NewSource(b.seed + uint64(rowIdx))
Or if we want to go crazy, pull the variable out of this function and just Seed it each time. The second suggestion is only safe if we never call this function concurrently.
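The suggestion can be sketched with the standard library's math/rand standing in for golang.org/x/exp/rand (the function names perRowRand/perRowSource are hypothetical, for illustration only):

```go
package main

import (
	"fmt"
	"math/rand"
)

// Variant 1: wrap the Source in a *rand.Rand, which costs an extra
// allocation on every row.
func perRowRand(seed, rowIdx int64) uint64 {
	rng := rand.New(rand.NewSource(seed + rowIdx))
	return rng.Uint64()
}

// Variant 2: use the Source directly. The stdlib's default source also
// implements rand.Source64, so we can draw uint64s without the
// *rand.Rand wrapper.
func perRowSource(seed, rowIdx int64) uint64 {
	src := rand.NewSource(seed + rowIdx).(rand.Source64)
	return src.Uint64()
}

func main() {
	// Both variants draw the same first value for the same seed; the
	// second simply skips the wrapper allocation.
	fmt.Println(perRowRand(42, 7), perRowSource(42, 7))
}
```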
pkg/workload/bank/bank.go, line 122 at r1 (raw file):
func(rowIdx int) []interface{} {
    rng := rand.New(rand.NewSource(b.seed + uint64(rowIdx)))
    payload := make([]byte, b.payloadBytes)
Do we get any speedup from batch allocating this? Something like:
func (b *bank) Tables() []workload.Table {
    var payloadAlloc []byte
    ...
    func(rowIdx int) []interface{} {
        if len(payloadAlloc) < b.payloadBytes {
            payloadAlloc = make([]byte, 64*b.payloadBytes)
        }
        payload = payloadAlloc[:b.payloadBytes]
        payloadAlloc = payloadAlloc[b.payloadBytes:]
Or if we're confident that this function will be called b.rows, just start with payloadAlloc sized to b.rows*b.payloadBytes?
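The batching idea above can be sketched as a small standalone type (chunkAlloc is a hypothetical name, not from the PR): one make call serves 64 rows instead of one.

```go
package main

import "fmt"

// chunkAlloc carves fixed-size payload slices out of a larger backing
// array, amortizing allocations across rows.
type chunkAlloc struct {
	buf          []byte
	payloadBytes int
}

// next returns the next payloadBytes-sized slice, refilling the backing
// array (64 payloads at a time) when it runs low.
func (a *chunkAlloc) next() []byte {
	if len(a.buf) < a.payloadBytes {
		a.buf = make([]byte, 64*a.payloadBytes)
	}
	p := a.buf[:a.payloadBytes]
	a.buf = a.buf[a.payloadBytes:]
	return p
}

func main() {
	a := &chunkAlloc{payloadBytes: 8}
	// Two 8-byte payloads from a single backing allocation.
	fmt.Println(len(a.next()), len(a.next()))
}
```

Note this sketch is not safe for concurrent callers; that caveat comes up below.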
Give bank's initial table data generation the same love that tpcc just got in cockroachdb#35322. We use bank for some of the roachtests that just want a bunch of data, so it's worth speeding up. The generated "payload" field has changed a bit (a-zA-Z instead of hex encoded), but it wasn't that way before for any particular reason, so this should be okay.

In summary:

- The slowest part of generating random strings is generating the random number. Previously, we made a uint64 and threw away most of it to pick one character from an alphabet. Instead, pick as many characters as we can get from 64 bits: floor(log(math.MaxUint64)/log(len(alphabet))).
- The current thinking for go2 is to switch to a Permuted Congruential Generator (PCG), which is much faster than the current "math/rand" generator. It's smaller in memory (128-bit vs 607 64-bit words) and _much_ faster to seed. Use the implementation of that proposal that's in "golang.org/x/exp/rand".

Benchmark:

name        old time/op    new time/op      delta
InitBank-8  12.6ms ± 1%    0.6ms ± 2%       -95.54%    (p=0.008 n=5+5)

name        old speed      new speed        delta
InitBank-8  8.02MB/s ± 1%  183.44MB/s ± 2%  +2187.28%  (p=0.008 n=5+5)

name        old alloc/op   new alloc/op     delta
InitBank-8  5.88MB ± 0%    0.23MB ± 0%      -96.05%    (p=0.008 n=5+5)

name        old allocs/op  new allocs/op    delta
InitBank-8  9.00k ± 0%     6.00k ± 0%       -33.32%    (p=0.008 n=5+5)

Release note: None
Force-pushed from ed4e814 to dc09a4c
danhhz
left a comment
RFAL
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
cpu.prof, line 0 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Woops 😄
🙈
pkg/workload/bank/bank.go, line 121 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we're just using this as a rand.Source, I bet we can remove an allocation per row by changing this to: rng := rand.NewSource(b.seed + uint64(rowIdx)). Or if we want to go crazy, pull the variable out of this function and just Seed it each time. The second suggestion is only safe if we never call this function concurrently.
Ah, great eye. The reason that method takes a rand.Source in the first place is because when I was doing #35322, I noticed that it was faster when you gave it a *rand.PCGSource instead of a *rand.Rand. But when I tried plumbing the source all the way down, I didn't see any difference in the overall InitTPCC benchmark. It does however make a difference here, both in cpu and allocs! Diff from simply removing the rand.New wrapper:
name old time/op new time/op delta
InitBank-8 642µs ± 1% 561µs ± 2% -12.68% (p=0.008 n=5+5)
name old speed new speed delta
InitBank-8 160MB/s ± 1% 183MB/s ± 2% +14.53% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
InitBank-8 280kB ± 0% 232kB ± 0% -17.13% (p=0.000 n=4+5)
name old allocs/op new allocs/op delta
InitBank-8 7.00k ± 0% 6.00k ± 0% -14.28% (p=0.008 n=5+5)
pkg/workload/bank/bank.go, line 122 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we get any speedup from batch allocating this? Something like:
func (b *bank) Tables() []workload.Table {
    var payloadAlloc []byte
    ...
    func(rowIdx int) []interface{} {
        if len(payloadAlloc) < b.payloadBytes {
            payloadAlloc = make([]byte, 64*b.payloadBytes)
        }
        payload = payloadAlloc[:b.payloadBytes]
        payloadAlloc = payloadAlloc[b.payloadBytes:]

Or if we're confident that this function will be called b.rows times, just start with payloadAlloc sized to b.rows*b.payloadBytes?
Those functions have to be concurrency-safe. TPCC gets around this by putting them in a pool using bufalloc.BytesAllocator, which is straightforward, but I'll leave it for a followup.
(FWIW, BytesAllocator is probably a better idea here than allocating b.rows * b.payloadBytes, since we seem to be using this workload when we need some large sized cluster but don't particularly care what's in it, so it tends to get a combo that means a very large b.rows * b.payloadBytes.)
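A minimal sketch of the pool-based approach described above, using the stdlib's sync.Pool and a hypothetical byteAlloc type in place of bufalloc's allocator (names and layout are illustrative, not the TPCC implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// byteAlloc batches payload allocations, as in the earlier suggestion:
// one make call serves 64 payloads.
type byteAlloc struct{ buf []byte }

func (a *byteAlloc) alloc(n int) []byte {
	if len(a.buf) < n {
		a.buf = make([]byte, 64*n)
	}
	p := a.buf[:n]
	a.buf = a.buf[n:]
	return p
}

// payloadPool makes the batching concurrency-safe: each concurrent
// caller checks out its own allocator, so no two goroutines carve
// slices from the same backing array at once.
var payloadPool = sync.Pool{
	New: func() interface{} { return &byteAlloc{} },
}

func makePayload(n int) []byte {
	a := payloadPool.Get().(*byteAlloc)
	p := a.alloc(n)
	payloadPool.Put(a)
	return p
}

func main() {
	fmt.Println(len(makePayload(16)))
}
```

Because allocators are returned to the pool after use, peak memory scales with the number of concurrent callers rather than with b.rows * b.payloadBytes.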
nvb
left a comment
Reviewed 2 of 3 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @tbg)
TFTR! bors r=nvanbenschoten
35441: workload/bank: generate random strings much faster r=nvanbenschoten a=danhhz

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Build succeeded