Move all platforms to use llvm.minimum/llvm.maximum for fmin/fmax #56371

oscardssmith merged 5 commits into master from
Conversation
---
oh wow

```julia
julia> versioninfo()
Julia Version 1.12.0-DEV.1506
Commit 2cdfe062952 (2024-10-28 11:32 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, znver4)
Threads: 16 default, 0 interactive, 16 GC (on 16 virtual cores)

julia> function f(dij)
           vmin = Inf
           for x in dij
               vmin = min(x, vmin)
           end
           return vmin
       end
f (generic function with 2 methods)

julia> @be rand(Float64, 512000) f samples=500 evals=100
Benchmark: 6 samples with 100 evaluations
 min    1.764 ms
 median 1.768 ms
 mean   1.775 ms
 max    1.805 ms

julia> function f_llvm(dij)
           vmin = Inf
           for x in dij
               vmin = llvm_min(x, vmin)
           end
           return vmin
       end
f_llvm (generic function with 1 method)

julia> Base.@assume_effects :total @inline llvm_min(x::Float64, y::Float64) = ccall("llvm.minimum.f64", llvmcall, Float64, (Float64, Float64), x, y)
llvm_min (generic function with 1 method)

julia> @be rand(Float64, 512000) f_llvm samples=500 evals=100
Benchmark: 163 samples with 100 evaluations
 min    49.408 μs
 median 51.382 μs
 mean   51.667 μs
 max    55.443 μs
```
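Part of why the `llvm_min` loop vectorizes while the `min` loop historically did not is that `min` must handle NaN and signed zeros, which a bare floating-point comparison gets wrong. A minimal sketch (the `naive_min` helper below is hypothetical, for illustration only, not Base's definition):

```julia
# Hypothetical min built from a single plain IEEE `<` comparison.
naive_min(x, y) = y < x ? y : x

# Base.min orders signed zeros and propagates NaN; a bare `<` does neither:
min(0.0, -0.0)        # -0.0
naive_min(0.0, -0.0)  # 0.0, because -0.0 < 0.0 is false under IEEE `<`
min(1.0, NaN)         # NaN
naive_min(1.0, NaN)   # 1.0, because NaN < 1.0 is false
```

`llvm.minimum.f64` is documented to propagate NaN and treat -0.0 as less than +0.0, which is what makes it a semantics-preserving replacement that LLVM can still vectorize.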
---
related question, what would it take to make

```julia
julia> function f_isless(dij)
           vmin = Inf
           for x in dij
               pred = isless(x, vmin)
               vmin = pred ? x : vmin
           end
           return vmin
       end
f_isless (generic function with 2 methods)

julia> @be rand(Float64, 512000) f_isless samples=500 evals=100
Benchmark: 5 samples with 100 evaluations
 min    2.108 ms
 median 2.114 ms
 mean   2.118 ms
 max    2.140 ms

julia> function f_llvm(dij)
           vmin = Inf
           for x in dij
               vmin = llvm_min(x, vmin)
           end
           return vmin
       end
f_llvm (generic function with 5 methods)

julia> @be rand(Float64, 512000) f_llvm samples=500 evals=100
Benchmark: 162 samples with 100 evaluations
 min    49.071 μs
 median 51.354 μs
 mean   51.752 μs
 max    71.813 μs
```
---
Are you sure the
---
it's not because of branching:

```julia
julia> function f_isless(dij)
           vmin = Inf
           for x in dij
               pred = isless(x, vmin)
               vmin = ifelse(pred, x, vmin)
           end
           return vmin
       end
f_isless (generic function with 1 method)

julia> @be rand(512000) f_isless samples=100 evals=50
Benchmark: 10 samples with 50 evaluations
 min    2.090 ms
 median 2.101 ms
 mean   2.116 ms
 max    2.211 ms
```
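Even branch-free, `isless` costs more than a plain `<` because it implements a total order. A simplified sketch of such a total-order comparison (hypothetical; Base's actual definition is different and faster):

```julia
# Simplified total-order comparison in the spirit of isless(::Float64, ::Float64):
# NaNs sort after everything, -0.0 sorts before 0.0, otherwise IEEE `<`.
# (Hypothetical sketch; Base implements this differently.)
function isless_sketch(x::Float64, y::Float64)
    isnan(x) && return false          # NaN is never less than anything
    isnan(y) && return true           # everything non-NaN is less than NaN
    if x == y == 0.0                  # distinguish the signed zeros
        return signbit(x) & !signbit(y)
    end
    return x < y
end
```

It is these extra NaN and signed-zero checks that the optimizer fails to fold into a vectorized min reduction, whereas `llvm.minimum` carries the same semantics in a single intrinsic the backend understands.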
---
PowerPC failures are fixed in LLVM 19, so we may want to wait for that to land before merging this.
---
Should these get tfuncs? IIUC, making these be
---
Maybe. I'm not sure we gain much by making them tfuncs. They get annotated
---
This fixes #48487 (CC: @mikmoore) on x86_64 (I think it was already fixed on aarch64):

```asm
# [...]
# %bb.0:                                # %top
        #DEBUG_VALUE: #2:x <- $xmm0
        push    rbp
        mov     rbp, rsp
        movabs  rax, offset .LCPI0_0
        vmovsd  xmm1, qword ptr [rax]   # xmm1 = mem[0],zero
        vmaxsd  xmm0, xmm1, xmm0
        pop     rbp
        ret
```

```asm
# [...]
# %bb.0:                                # %top
        #DEBUG_VALUE: #5:x <- $xmm0
        push    rbp
        mov     rbp, rsp
        movabs  rax, offset .LCPI0_0
        vmovsd  xmm1, qword ptr [rax]   # xmm1 = mem[0],zero
        vmaxsd  xmm0, xmm1, xmm0
        pop     rbp
        ret
```
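The snippets above lower `max` to a single `vmaxsd` rather than a branchy compare/select sequence. Semantically this hinges on `max(-Inf, x)` returning `x` for every Float64, including NaN and -0.0, which `llvm.maximum`'s NaN-propagating semantics preserve. A quick sanity check of that identity (plain Julia, nothing PR-specific):

```julia
# max(-Inf, x) must return x bit-for-bit for every Float64, including
# NaN (which max propagates) and -0.0 (which sorts above -Inf).
for x in (1.0, -1.0, 0.0, -0.0, Inf, -Inf, NaN)
    @assert max(-Inf, x) === x
end
```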
---
Hopefully this also gives us a path to better performance on
---
as far as I can tell. However, I'm continuously interested in what we can do for
---
@Moelf part of the issue is that For example, if one eliminates the branches here, then a factor of 2 speedup is possible. Alas, the ordering of floats becomes
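One known way to eliminate the branches while keeping a total order is to compare order-preserving integer keys instead of the floats themselves, as is done when sorting Float64 arrays. A hedged sketch (the `order_key` helper is hypothetical, and note the caveat that it disagrees with `isless` on negative NaNs, which it sorts first rather than last):

```julia
# Branch-free monotone map from Float64 to Int64: flipping the low 63 bits
# of negative values makes signed integer `<` agree with the float ordering,
# with -0.0 < 0.0 and NaNs at the extremes (not all-last as in isless).
order_key(x::Float64) = (b = reinterpret(Int64, x); b ⊻ ((b >> 63) & typemax(Int64)))
```

With this, e.g. `order_key(-0.0) < order_key(0.0)` and `order_key(-Inf) < order_key(1.0) < order_key(Inf) < order_key(NaN)` all hold, so a reduction can use a single integer compare per element.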
---
Right, but I don't think use
---
I think this discussion should continue in a dedicated ticket 🙂
---
Given that LLVM 19 is taking a while to merge and PowerPC is currently listed as tier 4 (used to build), can we rebase and merge this?
---
I've pushed one additional commit that removes some of the remaining complexity from when this was implemented only with ccalls.
…n but as a test. This used to not work but LLVM now has support for this on all platforms we care about.

Force-pushed from 639a157 to a0af124.
giordano left a comment
Looks good to me as far as I can tell.
I can confirm this works also on riscv64. On this PR:

```julia
julia> code_llvm(x->max(-Inf,x), Tuple{Float64}; debuginfo=:none)
; Function Signature: var"#20"(Float64)
define double @"julia_#20_2131"(double %"x::Float64") #0 {
top:
    #dbg_value(double %"x::Float64", !2, !DIExpression(), !14)
  ret double %"x::Float64"
}
```

while on current master:

```julia
julia> code_llvm(x->max(-Inf,x), Tuple{Float64}; debuginfo=:none)
; Function Signature: var"#26"(Float64)
define double @"julia_#26_4754"(double %"x::Float64") #0 {
top:
    #dbg_value(double %"x::Float64", !3, !DIExpression(), !15)
  %0 = fsub double 0xFFF0000000000000, %"x::Float64"
  %bitcast_coercion = bitcast double %0 to i64
  %1 = icmp sgt i64 %bitcast_coercion, -1
  %2 = select i1 %1, double 0xFFF0000000000000, double %"x::Float64"
  %3 = fcmp ord double %"x::Float64", 0.000000e+00
  %4 = select i1 %3, double %2, double %0
  ret double %4
}
```

which is another confirmation that #48487 is indeed fixed. I see exactly the same improvement on x86_64 (on aarch64 we were already using the LLVM functions, so there's no change there).

Note that tests are passing on all platforms: the x86_64-apple-darwin job is only erroring during a cleanup step at the end, and upload jobs are failing because of the ongoing GitHub outage.
…liaLang#56371)

This used to not work but LLVM now has support for this on all platforms we care about.

Maybe this should be a builtin.

This allows for more vectorization opportunities since LLVM understands the code better.

Fix JuliaLang#48487.

---------

Co-authored-by: Mosè Giordano <765740+giordano@users.noreply.github.com>
Co-authored-by: oscarddssmith <oscar.smith@juliacomputing.com>