storage/concurrency: benchmark for lockTable #44964
craig[bot] merged 1 commit into cockroachdb:master from
Conversation
nvb
left a comment
this looks really good. Do you mind posting the results to the PR and in the commit message? Ideally, you'd run with
-benchmem and then pass the output to benchstat.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/storage/concurrency/lock_table_test.go, line 998 at r1 (raw file):
&item.Txn.TxnMeta
This will allocate if we don't pass benchWorkItem in as a pointer to this function.
To do this, switch:
go doBenchWork(items[i], env, requestDoneCh)
with
go doBenchWork(&items[i], env, requestDoneCh)
pkg/storage/concurrency/lock_table_test.go, line 1003 at r1 (raw file):
}
}
env.lm.Release(lg)
Was there a reason you're acquiring latches regardless of whether there are any locks to acquire or not?
Most of the allocations in lockTable are due to the temporary *lockState created for btree lookup. And the total bytes allocated is roughly equal to the bytes allocated by spanlatch.allocGuardAndLatches in the contended benchmarks. The cpu is mostly in runtime.mcall, in pthread_cond_wait and pthread_cond_signal.
Benchmark output on my machine:
BenchmarkLockTable/groups=1,outstanding=1,read=0/-16     200000   12149 ns/op   7257 B/op   34 allocs/op
BenchmarkLockTable/groups=1,outstanding=1,read=1/-16     200000    9288 ns/op   6294 B/op   28 allocs/op
BenchmarkLockTable/groups=1,outstanding=1,read=2/-16     200000   10132 ns/op   5331 B/op   22 allocs/op
BenchmarkLockTable/groups=1,outstanding=1,read=3/-16     200000    6380 ns/op   4367 B/op   16 allocs/op
BenchmarkLockTable/groups=1,outstanding=1,read=4/-16     300000    4207 ns/op   2501 B/op    9 allocs/op
BenchmarkLockTable/groups=1,outstanding=1,read=5/-16     300000    4826 ns/op   2501 B/op    9 allocs/op
BenchmarkLockTable/groups=1,outstanding=2,read=0/-16     100000   19704 ns/op   9350 B/op   43 allocs/op
BenchmarkLockTable/groups=1,outstanding=2,read=1/-16     100000   13420 ns/op   8400 B/op   37 allocs/op
BenchmarkLockTable/groups=1,outstanding=2,read=2/-16     100000   12233 ns/op   7427 B/op   31 allocs/op
BenchmarkLockTable/groups=1,outstanding=2,read=3/-16     200000   11819 ns/op   6446 B/op   25 allocs/op
BenchmarkLockTable/groups=1,outstanding=2,read=4/-16     500000    2810 ns/op   2503 B/op    9 allocs/op
BenchmarkLockTable/groups=1,outstanding=2,read=5/-16     500000    2796 ns/op   2503 B/op    9 allocs/op
BenchmarkLockTable/groups=1,outstanding=4,read=0/-16     100000   18419 ns/op   9379 B/op   43 allocs/op
BenchmarkLockTable/groups=1,outstanding=4,read=1/-16     100000   14616 ns/op   8402 B/op   37 allocs/op
BenchmarkLockTable/groups=1,outstanding=4,read=2/-16     100000   12598 ns/op   7430 B/op   31 allocs/op
BenchmarkLockTable/groups=1,outstanding=4,read=3/-16     200000   11091 ns/op   6463 B/op   25 allocs/op
BenchmarkLockTable/groups=1,outstanding=4,read=4/-16    1000000    1964 ns/op   2505 B/op    9 allocs/op
BenchmarkLockTable/groups=1,outstanding=4,read=5/-16    1000000    2079 ns/op   2505 B/op    9 allocs/op
BenchmarkLockTable/groups=1,outstanding=8,read=0/-16     100000   16523 ns/op   9362 B/op   43 allocs/op
BenchmarkLockTable/groups=1,outstanding=8,read=1/-16     100000   15131 ns/op   8395 B/op   37 allocs/op
BenchmarkLockTable/groups=1,outstanding=8,read=2/-16     100000   14093 ns/op   7429 B/op   31 allocs/op
BenchmarkLockTable/groups=1,outstanding=8,read=3/-16     100000   12182 ns/op   6463 B/op   25 allocs/op
BenchmarkLockTable/groups=1,outstanding=8,read=4/-16    1000000    1768 ns/op   2505 B/op    9 allocs/op
BenchmarkLockTable/groups=1,outstanding=8,read=5/-16    1000000    2016 ns/op   2505 B/op    9 allocs/op
BenchmarkLockTable/groups=1,outstanding=16,read=0/-16    100000   17909 ns/op   9357 B/op   43 allocs/op
BenchmarkLockTable/groups=1,outstanding=16,read=1/-16    100000   15952 ns/op   8392 B/op   37 allocs/op
BenchmarkLockTable/groups=1,outstanding=16,read=2/-16    100000   14637 ns/op   7426 B/op   31 allocs/op
BenchmarkLockTable/groups=1,outstanding=16,read=3/-16    100000   12950 ns/op   6461 B/op   25 allocs/op
BenchmarkLockTable/groups=1,outstanding=16,read=4/-16   1000000    1849 ns/op   2506 B/op    9 allocs/op
BenchmarkLockTable/groups=1,outstanding=16,read=5/-16   1000000    1943 ns/op   2505 B/op    9 allocs/op
BenchmarkLockTable/groups=16,outstanding=1,read=0/-16    100000   18541 ns/op   7316 B/op   34 allocs/op
BenchmarkLockTable/groups=16,outstanding=1,read=1/-16    100000   14632 ns/op   6349 B/op   28 allocs/op
BenchmarkLockTable/groups=16,outstanding=1,read=2/-16    200000   11921 ns/op   5383 B/op   22 allocs/op
BenchmarkLockTable/groups=16,outstanding=1,read=3/-16    200000    8337 ns/op   4407 B/op   16 allocs/op
BenchmarkLockTable/groups=16,outstanding=1,read=4/-16   1000000    1727 ns/op   2506 B/op    9 allocs/op
BenchmarkLockTable/groups=16,outstanding=1,read=5/-16   1000000    1871 ns/op   2505 B/op    9 allocs/op
BenchmarkLockTable/groups=16,outstanding=2,read=0/-16     50000   27195 ns/op   9479 B/op   47 allocs/op
BenchmarkLockTable/groups=16,outstanding=2,read=1/-16    100000   21031 ns/op   8442 B/op   40 allocs/op
BenchmarkLockTable/groups=16,outstanding=2,read=2/-16    100000   14650 ns/op   6569 B/op   28 allocs/op
BenchmarkLockTable/groups=16,outstanding=2,read=3/-16    200000    9725 ns/op   4972 B/op   18 allocs/op
BenchmarkLockTable/groups=16,outstanding=2,read=4/-16   1000000    1858 ns/op   2506 B/op    9 allocs/op
BenchmarkLockTable/groups=16,outstanding=2,read=5/-16   1000000    1887 ns/op   2506 B/op    9 allocs/op
BenchmarkLockTable/groups=16,outstanding=4,read=0/-16     50000   27303 ns/op   9484 B/op   47 allocs/op
BenchmarkLockTable/groups=16,outstanding=4,read=1/-16    100000   21513 ns/op   8502 B/op   40 allocs/op
BenchmarkLockTable/groups=16,outstanding=4,read=2/-16    100000   16280 ns/op   7267 B/op   31 allocs/op
BenchmarkLockTable/groups=16,outstanding=4,read=3/-16    100000   12648 ns/op   6029 B/op   23 allocs/op
BenchmarkLockTable/groups=16,outstanding=4,read=4/-16   1000000    1762 ns/op   2506 B/op    9 allocs/op
BenchmarkLockTable/groups=16,outstanding=4,read=5/-16   1000000    1846 ns/op   2506 B/op    9 allocs/op
BenchmarkLockTable/groups=16,outstanding=8,read=0/-16     50000   27690 ns/op   9493 B/op   48 allocs/op
BenchmarkLockTable/groups=16,outstanding=8,read=1/-16    100000   22154 ns/op   8582 B/op   41 allocs/op
BenchmarkLockTable/groups=16,outstanding=8,read=2/-16    100000   17375 ns/op   7524 B/op   32 allocs/op
BenchmarkLockTable/groups=16,outstanding=8,read=3/-16    100000   14593 ns/op   6631 B/op   26 allocs/op
BenchmarkLockTable/groups=16,outstanding=8,read=4/-16   1000000    1778 ns/op   2505 B/op    9 allocs/op
BenchmarkLockTable/groups=16,outstanding=8,read=5/-16   1000000    1864 ns/op   2505 B/op    9 allocs/op
BenchmarkLockTable/groups=16,outstanding=16,read=0/-16    50000   34606 ns/op   9798 B/op   48 allocs/op
BenchmarkLockTable/groups=16,outstanding=16,read=1/-16    50000   26477 ns/op   9048 B/op   42 allocs/op
BenchmarkLockTable/groups=16,outstanding=16,read=2/-16   100000   23607 ns/op   7821 B/op   32 allocs/op
BenchmarkLockTable/groups=16,outstanding=16,read=3/-16   100000   20832 ns/op   7002 B/op   27 allocs/op
BenchmarkLockTable/groups=16,outstanding=16,read=4/-16  1000000    1905 ns/op   2505 B/op    9 allocs/op
BenchmarkLockTable/groups=16,outstanding=16,read=5/-16  1000000    2007 ns/op   2504 B/op    9 allocs/op

Release note: None
sumeerbhola
left a comment
Do you mind posting the results to the PR and in the commit message? Ideally, you'd run with -benchmem and then pass the output to benchstat.
Done. There isn't any before number to compare with using benchstat.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/storage/concurrency/lock_table_test.go, line 998 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
&item.Txn.TxnMeta
This will allocate if we don't pass benchWorkItem in as a pointer to this function. To do this, switch:
go doBenchWork(items[i], env, requestDoneCh)
with
go doBenchWork(&items[i], env, requestDoneCh)
Done.
How do I get escape analysis output for the test itself? When I do go build -gcflags "-m -m" it does not produce anything for lock_table_test.go.
pkg/storage/concurrency/lock_table_test.go, line 1003 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Was there a reason you're acquiring latches regardless of whether there are any locks to acquire or not?
Oversight. Fixed.
sumeerbhola
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/storage/concurrency/lock_table_test.go, line 1112 at r2 (raw file):
// RunParallel() -- it doesn't seem possible to get parallelism between these
// two values when using B.RunParallel() since B.SetParallelism() accepts an
// integer multiplier to GOMAXPROCS.
btw, do we have an existing test framework to do better than this?
I was first going to vary the concurrency in smaller increments and have a shared variable incremented by the goroutine for each group and when one of them detects the total count has been reached it would signal (via closing a shared channel) to all the other groups. It was more code so I did the simpler thing here.
bors r+
44964: storage/concurrency: benchmark for lockTable r=sumeerbhola a=sumeerbhola

Most of the allocations in lockTable are due to the temporary *lockState created for btree lookup. And the total bytes allocated is roughly equal to the bytes allocated by spanlatch.allocGuardAndLatches in the contended benchmarks. The cpu is mostly in runtime.mcall, in pthread_cond_wait and pthread_cond_signal.

Release note: None

Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
nvb
left a comment
There isn't any before number to compare with using benchstat.
That's fine, it is still useful with only a single point of comparison because it aggregates across trials.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/concurrency/lock_table_test.go, line 998 at r1 (raw file):
How do I get escape analysis output for the test itself? When I do go build -gcflags "-m -m" it does not produce anything for lock_table_test.go.
go test -gcflags "-m -m" should give you what you want, but I've never used that so I might be mistaken.
nvb
left a comment
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/concurrency/lock_table_test.go, line 1112 at r2 (raw file):
Previously, sumeerbhola wrote…
btw, do we have an existing test framework to do better than this?
I was first going to vary the concurrency in smaller increments and have a shared variable incremented by the goroutine for each group and when one of them detects the total count has been reached it would signal (via closing a shared channel) to all the other groups. It was more code so I did the simpler thing here.
I'm not aware of any test framework for doing this. We generally don't use B.RunParallel, in part because of the integer limitation you pointed out here. Instead, we usually just do our own thing with WaitGroups and a manual division of work between each goroutine.
Build succeeded