util/tracing: reduce allocations in EnsureForkSpan#76210
util/tracing: reduce allocations in EnsureForkSpan#76210craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
This patch pre-allocates capacity for the options slice in EnsureForkSpan. Before, we were reallocating the slice storage repeatedly, as it was growing from 0 to 1 and from 1 to 2 elements. This patch also simplifies, and slighly speeds up, EnsureChildSpan. This guy was using a sync.Pool for its options slice. That is not necessary, since the slice storage does not escape. In the benchmarks below, detached-child=false is EnsureChildSpan and detached-child=true is EnsureForkSpan. > make bench PKG=./pkg/util/tracing BENCHES=BenchmarkSpanCreation TESTFLAGS="-count=5 -cpu=1,2,4,8,16" name old time/op new time/op delta SpanCreation/detached-child=false 1.70µs ± 1% 1.61µs ± 1% -5.06% (p=0.008 n=5+5) SpanCreation/detached-child=false-2 1.17µs ± 7% 1.04µs ± 3% -10.70% (p=0.008 n=5+5) SpanCreation/detached-child=false-4 958ns ± 2% 893ns ±10% -6.76% (p=0.008 n=5+5) SpanCreation/detached-child=false-8 1.11µs ±38% 0.95µs ± 1% ~ (p=0.310 n=5+5) SpanCreation/detached-child=false-16 925ns ± 1% 919ns ± 1% ~ (p=0.286 n=4+5) SpanCreation/detached-child=true 2.27µs ± 2% 1.88µs ± 0% -17.27% (p=0.008 n=5+5) SpanCreation/detached-child=true-2 1.82µs ±10% 1.33µs ±22% -26.81% (p=0.008 n=5+5) SpanCreation/detached-child=true-4 1.29µs ± 2% 1.02µs ± 1% -21.35% (p=0.008 n=5+5) SpanCreation/detached-child=true-8 1.16µs ± 1% 0.91µs ± 5% -21.56% (p=0.008 n=5+5) SpanCreation/detached-child=true-16 1.14µs ± 2% 0.97µs ± 1% -15.08% (p=0.008 n=5+5) name old alloc/op new alloc/op delta SpanCreation/detached-child=false 288B ± 0% 288B ± 0% ~ (all equal) SpanCreation/detached-child=false-2 288B ± 0% 288B ± 0% ~ (all equal) SpanCreation/detached-child=false-4 288B ± 0% 288B ± 0% ~ (all equal) SpanCreation/detached-child=false-8 288B ± 0% 288B ± 0% ~ (all equal) SpanCreation/detached-child=false-16 288B ± 0% 288B ± 0% ~ (all equal) SpanCreation/detached-child=true 768B ± 0% 288B ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-2 768B ± 0% 288B ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-4 768B ± 0% 288B ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-8 768B ± 0% 288B ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-16 769B ± 0% 288B ± 0% -62.55% (p=0.008 n=5+5) name old allocs/op new allocs/op delta SpanCreation/detached-child=false 6.00 ± 0% 6.00 ± 0% ~ (all equal) SpanCreation/detached-child=false-2 6.00 ± 0% 6.00 ± 0% ~ (all equal) SpanCreation/detached-child=false-4 6.00 ± 0% 6.00 ± 0% ~ (all equal) SpanCreation/detached-child=false-8 6.00 ± 0% 6.00 ± 0% ~ (all equal) SpanCreation/detached-child=false-16 6.00 ± 0% 6.00 ± 0% ~ (all equal) SpanCreation/detached-child=true 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-2 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-4 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-8 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5) SpanCreation/detached-child=true-16 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5) Release note: None
dhartunian
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
-- commits, line 15 at r1:
for my learning: it best practice to benchmark with many cpu values like this? what value does running with different values for -cpu values offer here? is the goal to show the consistency of the impact?gh
pkg/util/tracing/tracer.go, line 1496 at r1 (raw file):
) (context.Context, *Span) { // NB: Making the capacity dynamic, based on len(os), makes this allocate. // With a fixed length, it doesn't allocate.
trying to understand how this works...when you call make with a fixed integer length, no allocation is done, and the allocation happens when you append for the first time? but what if you called make([]SpanOption, 0, len(os)+1) wouldn't that give you exactly the capacity you need below?
andreimatei
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @nvanbenschoten)
Previously, dhartunian (David Hartunian) wrote…
for my learning: it best practice to benchmark with many cpu values like this? what value does running with different values for
-cpuvalues offer here? is the goal to show the consistency of the impact?gh
For benchmarks that have been programmed for parallelism (b.RunParallel(func))), varying the number of cpus varies the concurrency. So, for example, you can see changes that only help or hurt at high concurrency. In this particular case, I probably could have run with a single cpu count; there's not much interesting happening as you vary it since span creation doesn't scale beyond two cores.
pkg/util/tracing/tracer.go, line 1496 at r1 (raw file):
when you call make with a fixed integer length, no allocation is done, and the allocation happens when you append for the first time?
When you use a fixed length/capacity, if that size is small enough and also if the storage for the slice doesn't need to escape to the heap (i.e. nobody's referencing slice elements beyond the lifetime of the function), then the storage will be allocated on the stack. append will not allocate while there's available capacity; it'll allocate once the original capacity is exhausted.
but what if you called make([]SpanOption, 0, len(os)+1) wouldn't that give you exactly the capacity you need below?
That would give me the capacity I want, but the storage would be allocated on the heap. As far as I can tell, the go compiler never allocates dynamically-sized objects on the stack. There's no alloca like in C.
That's why I made this capacity static, in the hope that it's not often exceeded.
dhartunian
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @dhartunian and @nvanbenschoten)
andreimatei
left a comment
There was a problem hiding this comment.
TFTR!
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @dhartunian and @nvanbenschoten)
|
Build succeeded: |
This patch pre-allocates capacity for the options slice in
EnsureForkSpan. Before, we were reallocating the slice storage
repeatedly, as it was growing from 0 to 1 and from 1 to 2 elements.
This patch also simplifies, and slighly speeds up, EnsureChildSpan. This
guy was using a sync.Pool for its options slice. That is not necessary,
since the slice storage does not escape.
In the benchmarks below, detached-child=false is EnsureChildSpan and
detached-child=true is EnsureForkSpan.
name old alloc/op new alloc/op delta
SpanCreation/detached-child=false 288B ± 0% 288B ± 0% ~ (all equal)
SpanCreation/detached-child=false-2 288B ± 0% 288B ± 0% ~ (all equal)
SpanCreation/detached-child=false-4 288B ± 0% 288B ± 0% ~ (all equal)
SpanCreation/detached-child=false-8 288B ± 0% 288B ± 0% ~ (all equal)
SpanCreation/detached-child=false-16 288B ± 0% 288B ± 0% ~ (all equal)
SpanCreation/detached-child=true 768B ± 0% 288B ± 0% -62.50% (p=0.008 n=5+5)
SpanCreation/detached-child=true-2 768B ± 0% 288B ± 0% -62.50% (p=0.008 n=5+5)
SpanCreation/detached-child=true-4 768B ± 0% 288B ± 0% -62.50% (p=0.008 n=5+5)
SpanCreation/detached-child=true-8 768B ± 0% 288B ± 0% -62.50% (p=0.008 n=5+5)
SpanCreation/detached-child=true-16 769B ± 0% 288B ± 0% -62.55% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
SpanCreation/detached-child=false 6.00 ± 0% 6.00 ± 0% ~ (all equal)
SpanCreation/detached-child=false-2 6.00 ± 0% 6.00 ± 0% ~ (all equal)
SpanCreation/detached-child=false-4 6.00 ± 0% 6.00 ± 0% ~ (all equal)
SpanCreation/detached-child=false-8 6.00 ± 0% 6.00 ± 0% ~ (all equal)
SpanCreation/detached-child=false-16 6.00 ± 0% 6.00 ± 0% ~ (all equal)
SpanCreation/detached-child=true 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5)
SpanCreation/detached-child=true-2 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5)
SpanCreation/detached-child=true-4 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5)
SpanCreation/detached-child=true-8 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5)
SpanCreation/detached-child=true-16 16.0 ± 0% 6.0 ± 0% -62.50% (p=0.008 n=5+5)
Release note: None