make A[:] = x::Number fall back to fill!#13459
make A[:] = x::Number fall back to fill!#13459KristofferC wants to merge 1 commit intoJuliaLang:masterfrom
Conversation
|
I don't think this should be necessary. From what I can see, the performance difference is caused by #13301 (although it seems to be a store to the GC frame this time....). LLVM IR below, for whatever reason there's a julia> Base.code_llvm_raw(Base._unsafe_batchsetindex!, Tuple{Base.LinearFast, Matrix{Float64}, Base.Repeated{Int64}, Colon})
define %jl_value_t* @"julia__unsafe_batchsetindex!_22134"(%jl_value_t*, %jl_value_t**, i32) {
top:
call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !2, metadata !14), !dbg !15
call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !16, metadata !17), !dbg !15
call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !18, metadata !19), !dbg !15
%3 = alloca [3 x %jl_value_t*], align 8, !dbg !15
%.sub = getelementptr inbounds [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 0
%4 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 2, !dbg !15
%5 = bitcast [3 x %jl_value_t*]* %3 to i64*, !dbg !15
store i64 2, i64* %5, align 8, !dbg !15
%6 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 1, !dbg !15
%7 = load i64, i64* bitcast (%jl_value_t*** @jl_pgcstack to i64*), align 8, !dbg !15
%8 = bitcast %jl_value_t** %6 to i64*, !dbg !15
store i64 %7, i64* %8, align 8, !dbg !15
store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8, !dbg !15
store %jl_value_t* null, %jl_value_t** %4, align 8, !dbg !15
%9 = icmp eq i32 %2, 3, !dbg !20
br i1 %9, label %fail, label %pass, !dbg !20
fail: ; preds = %top
%10 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 3, !dbg !20
call void @jl_bounds_error_tuple_int(%jl_value_t** %10, i64 0, i64 1), !dbg !20
unreachable, !dbg !20
pass: ; preds = %top
%11 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 1, !dbg !15
%12 = load %jl_value_t*, %jl_value_t** %11, align 8, !dbg !15
%13 = getelementptr inbounds %jl_value_t, %jl_value_t* %12, i64 1, !dbg !15
%14 = bitcast %jl_value_t* %13 to i64*, !dbg !15
%15 = load i64, i64* %14, align 8, !dbg !15, !tbaa !21
call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !22, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !25, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !26, metadata !24), !dbg !15
%16 = icmp slt i64 %15, 1, !dbg !27
br i1 %16, label %L.4, label %L.preheader, !dbg !27
L.preheader: ; preds = %pass
%17 = bitcast %jl_value_t* %12 to double**, !dbg !15
%18 = load double*, double** %17, align 8, !dbg !15, !tbaa !32
%19 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 2, !dbg !15
%20 = bitcast %jl_value_t** %19 to i64**, !dbg !15
%21 = load i64*, i64** %20, align 8, !dbg !15
br label %L, !dbg !27
L: ; preds = %L, %L.preheader
%"#s2.0" = phi i64 [ %22, %L ], [ 1, %L.preheader ]
%22 = add i64 %"#s2.0", 1, !dbg !27
call void @llvm.dbg.value(metadata i64 %"#s2.0", i64 0, metadata !35, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 %22, i64 0, metadata !26, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 %"#s2.0", i64 0, metadata !36, metadata !24), !dbg !15
%23 = load i64, i64* %21, align 16, !dbg !37, !tbaa !38
call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !39, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 %23, i64 0, metadata !40, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 2, i64 0, metadata !39, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 3, i64 0, metadata !39, metadata !24), !dbg !15
%24 = add i64 %"#s2.0", -1, !dbg !41
%25 = sitofp i64 %23 to double, !dbg !41
%26 = getelementptr double, double* %18, i64 %24, !dbg !41
store double %25, double* %26, align 8, !dbg !41, !tbaa !42
store %jl_value_t* %12, %jl_value_t** %4, align 8, !dbg !41
%27 = icmp eq i64 %"#s2.0", %15, !dbg !43
br i1 %27, label %L.4.loopexit, label %L, !dbg !43
L.4.loopexit: ; preds = %L
br label %L.4, !dbg !45
L.4: ; preds = %L.4.loopexit, %pass
%28 = load i64, i64* %8, align 8, !dbg !45
store i64 %28, i64* bitcast (%jl_value_t*** @jl_pgcstack to i64*), align 8, !dbg !45
ret %jl_value_t* %12, !dbg !45
} |
|
OK I can see this is happening on release 0.4 as well so it's not #13301. The AST is a little bit strange and it seems to generate a temporary variable that is never used (which causes the GC frame store) For $(Expr(:boundscheck, false))
_var0 = (Base.arrayset)(A::Array{Float64,2},(Base.box)(Float64,(Base.sitofp)(Float64,v::Int64)::Any)::Float64,offset_0::Int64)::Array{Float64,2}
goto 26
_var0 = $(Expr(:boundscheck, :(Base.pop)))
26:
_var0::Array{Float64,2} # cartesian.jl, line 34:For $(Expr(:boundscheck, false))
(Base.arrayset)(a::Array{Float64,2},x::Float64,i::Int64)::Array{Float64,2}
$(Expr(:boundscheck, :(Base.pop)))Note that both functions uses a loop and inbound assignment but seems that inlining of |
|
Close in favour of #13463? |
|
Thanks for doing this, @KristofferC! I went with the more general solution in #13461 for now, but this was a great to get that ball rolling. |
|
I still think this is a good idea. As long as we have a specialized Moreover, |
|
You're right, I agree this is still a good idea.
Perhaps a PR that introduces a |
@mbauman mentioned this in https://groups.google.com/forum/#!topic/julia-users/ub08AnpIutw
A[:]previously fell back to the AbstractArray function.Before there was a performance difference: