Add optimized mapreduce implementation for SkipMissing #27743
```julia
i > ilast && return mapreduce_first(f, op, a1)
a2 = ai::eltype(itr)
# Unexpectedly, the following assertion allows SIMD instructions to be emitted
A[i]::eltype(itr)
```
The fact that this line is required to get SIMD is really intriguing. In the present case, we know `A[i]` is non-missing, so it should be perfectly fine to keep. And this assertion doesn't seem to force the compiler to incorrectly assume that all entries in `A` are non-missing, since we do get missing values in the loop below.
Minimal reproducer:

```julia
function g(A)
    v = 0
    # Uncomment to get SIMD
    # A[1]::Base.nonmissingtype(eltype(A))
    @simd for i = 1:length(A)
        @inbounds ai = A[i]
        if ai !== missing
            v += ai
        end
    end
    return v
end

@code_llvm g([1, missing])
```
Maybe open the minimal reproducer in a standalone issue in order to get more attention?
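For reference, here is a sketch of the reproducer with the workaround assertion uncommented; `g_simd` is a hypothetical name used only for this illustration, not something from the PR:

```julia
# Variant of the reproducer above with the type assertion uncommented.
# `g_simd` is a hypothetical name for illustration only.
function g_simd(A)
    v = 0
    # This assertion is the workaround discussed in this thread: it doesn't
    # change the loop's semantics, but it allows SIMD to be emitted.
    A[1]::Base.nonmissingtype(eltype(A))
    @simd for i = 1:length(A)
        @inbounds ai = A[i]
        if ai !== missing
            v += ai
        end
    end
    return v
end

g_simd(Union{Int,Missing}[1, 2, missing, 4])  # sums the non-missing entries: 7
```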
Seems like something might be preventing LICM (loop-invariant code motion): in the non-SIMD version, it's reloading the data pointer on every iteration, and any non-functional code that uses it in the header will allow for SIMD.
```julia
julia> _false = false

julia> function g(A)
           v = 0
           # Uncomment to get SIMD
           x = A[1]
           if isa(x, Missing) && _false
               error()
           end
           @simd for i = 1:length(A)
               @inbounds ai = A[i]
               if ai !== missing
                   v += ai
               end
           end
           return v
       end
```
Yes, I can file a separate issue if that's useful, but @vtjnash can probably write a much more useful description than I can.
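As a rough local check (not part of the PR), one can search the generated LLVM IR for vector types to see whether a loop was vectorized. This sketch uses `InteractiveUtils.code_llvm`; the function `h` and the regex heuristic are assumptions for illustration:

```julia
using InteractiveUtils  # for code_llvm

# Same shape as the reproducer above, without the workaround assertion.
function h(A)
    v = 0
    @simd for i = 1:length(A)
        @inbounds ai = A[i]
        if ai !== missing
            v += ai
        end
    end
    return v
end

# Capture the LLVM IR as a string and look for SIMD vector types
# such as "<4 x i64>"; absence of a match suggests no vectorization.
ir = sprint(code_llvm, h, Tuple{Vector{Union{Int,Missing}}})
uses_simd = occursin(r"<\d+ x ", ir)
```

Whether `uses_simd` is true will depend on the Julia version and CPU, so this is only a diagnostic aid, not a guarantee.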
@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Nanosoldier results are really inconsistent. @ararslan Could something weird be going on with the comparison? I ask because at #27386 there are improvements which I cannot reproduce locally either.
Probable explanation here: #27386 (comment). Let's run it again. @nanosoldier

Restarted the server. @nanosoldier

Unfortunately, I don't think that was the case here. I've not (yet) known Nanosoldier to blatantly lie to us. In this case, it reported comparing 9e7f85d against 0978251. Those are the correct commits — neither the PR nor master had the random change incorporated.

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Indeed, the regressions are still there. I'm really stuck. I've tried again locally after rebasing on master, and for example I get this:

```julia
julia> master["array"]["(\"skipmissing\", sum, Float64, false)"]
BenchmarkTools.Trial:
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     1.462 μs (0.00% GC)
  median time:      1.524 μs (0.00% GC)
  mean time:        1.568 μs (0.00% GC)
  maximum time:     33.038 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> pr["array"]["(\"skipmissing\", sum, Float64, false)"]
BenchmarkTools.Trial:
  memory estimate:  64 bytes
  allocs estimate:  4
  --------------
  minimum time:     732.000 ns (0.00% GC)
  median time:      793.000 ns (0.00% GC)
  mean time:        5.053 μs (0.00% GC)
  maximum time:     42.571 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

What can we do about that? Would somebody check locally whether my results are confirmed?
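For anyone wanting to check locally without the full BaseBenchmarks setup, a rough Base-only timing sketch along these lines might help; the data layout, missing rate, and iteration count here are arbitrary choices, and the real numbers above come from the much more careful BenchmarkTools harness:

```julia
# Rough Base-only timing sketch; the trials quoted above were produced
# by BenchmarkTools, which handles warm-up and noise much better.
x = Union{Float64,Missing}[rand() < 0.1 ? missing : rand() for _ in 1:1000]

sum(skipmissing(x))  # warm up compilation first
t = @elapsed for _ in 1:10_000
    sum(skipmissing(x))
end
t  # total seconds for 10_000 reductions
```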
@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
I've found out why Nanosoldier finds regressions. I'm still not sure why I wasn't able to reproduce the problem before, but at least now I see it when running BaseBenchmarks. It only happens for vectors of ~1000 entries or less (as in the benchmark, but not in my other checks), and only when … I've fixed it by only using the new method when … There's still a 20% regression locally for …

@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Good news: most regressions have disappeared and turned into improvements. The only remaining related regressions are for … So barring objections I'll merge this tomorrow.
Awesome! But, just to keep in mind, another common operation on datasets is to compute the mean per group, where the size per group may typically be < 1000.
Yeah, I realized that too. Anyway the regression isn't too bad, and the goal is really to enable SIMD, which will be faster in all cases (except maybe for very small vectors).
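To illustrate the small-group scenario mentioned above, a per-group mean over vectors with missing values might look like this (the group names and data here are hypothetical, purely for illustration):

```julia
using Statistics  # for mean

# Hypothetical small groups with missing values, as in the per-group
# mean use case discussed above (each group well under 1000 elements).
groups = Dict(
    :a => Union{Int,Missing}[1, 2, missing, 4],
    :b => Union{Int,Missing}[missing, 10],
)

# mean(skipmissing(v)) averages only the non-missing entries of each group.
group_means = Dict(k => mean(skipmissing(v)) for (k, v) in groups)
```

Each `mean(skipmissing(v))` call here reduces a very short iterator, which is exactly the regime where the small-vector regression would be felt.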
```julia
i > ilast && return Some(mapreduce_first(f, op, a1))
a2 = ai::eltype(itr)
# Unexpectedly, the following assertion allows SIMD instructions to be emitted
A[i]::eltype(itr)
```
Based on the AbstractArray methods. Allows the compiler to emit SIMD instructions for `SkipMissing{Vector{Int}}`, but also improves performance for `SkipMissing{Vector{Float64}}`.

The performance improvement is clear for `Int` (3×-5× less time), but less so for `Float64` (-30%). However, since SIMD now works for `Int`, there's a chance it could eventually be enabled for `Float64`, and it should be quite a bit easier than with the generic `mapreduce` fallback. Also, these 30% are enough to get as fast as R (maybe even a bit faster), which is an important reference point (see this Discourse thread).

Fixes #27679.
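As a quick illustration of the kind of reductions this PR optimizes (a sketch; the expected results in the comments were worked out by hand):

```julia
# Reductions over a skipmissing wrapper ignore the missing entries.
x = Union{Int,Missing}[1, missing, 3, missing, 5]

sum(skipmissing(x))                 # 1 + 3 + 5 = 9
mapreduce(abs2, +, skipmissing(x))  # 1 + 9 + 25 = 35
```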