
make A[:] = x::Number fall back to fill!#13459

Closed
KristofferC wants to merge 1 commit into JuliaLang:master from KristofferC:kc/colon_num

Conversation

@KristofferC
Member

@mbauman mentioned this in https://groups.google.com/forum/#!topic/julia-users/ub08AnpIutw

A[:] previously fell back to the generic AbstractArray setindex! method, which caused a performance difference:

julia> @time fill!(A, 0.);
  0.007387 seconds (4 allocations: 160 bytes)

julia> @time A[:] = 0.
  0.010574 seconds (5 allocations: 176 bytes)
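For context, here is a minimal sketch of the kind of specialization this PR proposes. It is illustrated on a hypothetical wrapper type (MyVec, not part of Base) rather than redefining a Base method; the actual PR adds the fallback for arrays directly.

```julia
# Hypothetical wrapper type, used only for illustration.
struct MyVec{T}
    data::Vector{T}
end

# A[:] = x::Number lowers to setindex!(A, x, :); route it straight to fill!.
Base.setindex!(A::MyVec, x::Number, ::Colon) = (fill!(A.data, x); A)

A = MyVec(zeros(Float64, 4))
A[:] = 1.5
```

With this definition, assigning a scalar through the colon hits the specialized fill! path instead of the generic elementwise loop.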

@yuyichao
Contributor

yuyichao commented Oct 5, 2015

I don't think this should be necessary. From what I can see, the performance difference is caused by #13301 (although it seems to be a store to the GC frame this time...). LLVM IR below; for whatever reason there's a store %jl_value_t* %12, %jl_value_t** %4, align 8, !dbg !41 in the loop that is storing a constant value to the GC frame.

julia> Base.code_llvm_raw(Base._unsafe_batchsetindex!, Tuple{Base.LinearFast, Matrix{Float64}, Base.Repeated{Int64}, Colon})

define %jl_value_t* @"julia__unsafe_batchsetindex!_22134"(%jl_value_t*, %jl_value_t**, i32) {
top:                                                                              
  call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !2, metadata !14), !dbg !15
  call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !16, metadata !17), !dbg !15
  call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !18, metadata !19), !dbg !15
  %3 = alloca [3 x %jl_value_t*], align 8, !dbg !15                               
  %.sub = getelementptr inbounds [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 0
  %4 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 2, !dbg !15
  %5 = bitcast [3 x %jl_value_t*]* %3 to i64*, !dbg !15                           
  store i64 2, i64* %5, align 8, !dbg !15                                         
  %6 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 1, !dbg !15
  %7 = load i64, i64* bitcast (%jl_value_t*** @jl_pgcstack to i64*), align 8, !dbg !15
  %8 = bitcast %jl_value_t** %6 to i64*, !dbg !15                                 
  store i64 %7, i64* %8, align 8, !dbg !15                                        
  store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8, !dbg !15       
  store %jl_value_t* null, %jl_value_t** %4, align 8, !dbg !15                    
  %9 = icmp eq i32 %2, 3, !dbg !20
  br i1 %9, label %fail, label %pass, !dbg !20

fail:                                             ; preds = %top
  %10 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 3, !dbg !20
  call void @jl_bounds_error_tuple_int(%jl_value_t** %10, i64 0, i64 1), !dbg !20
  unreachable, !dbg !20

pass:                                             ; preds = %top
  %11 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 1, !dbg !15
  %12 = load %jl_value_t*, %jl_value_t** %11, align 8, !dbg !15
  %13 = getelementptr inbounds %jl_value_t, %jl_value_t* %12, i64 1, !dbg !15
  %14 = bitcast %jl_value_t* %13 to i64*, !dbg !15
  %15 = load i64, i64* %14, align 8, !dbg !15, !tbaa !21
  call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !22, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !25, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !26, metadata !24), !dbg !15
  %16 = icmp slt i64 %15, 1, !dbg !27
  br i1 %16, label %L.4, label %L.preheader, !dbg !27

L.preheader:                                      ; preds = %pass
  %17 = bitcast %jl_value_t* %12 to double**, !dbg !15
  %18 = load double*, double** %17, align 8, !dbg !15, !tbaa !32
  %19 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 2, !dbg !15
  %20 = bitcast %jl_value_t** %19 to i64**, !dbg !15
  %21 = load i64*, i64** %20, align 8, !dbg !15
  br label %L, !dbg !27

L:                                                ; preds = %L, %L.preheader
  %"#s2.0" = phi i64 [ %22, %L ], [ 1, %L.preheader ]
  %22 = add i64 %"#s2.0", 1, !dbg !27
  call void @llvm.dbg.value(metadata i64 %"#s2.0", i64 0, metadata !35, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 %22, i64 0, metadata !26, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 %"#s2.0", i64 0, metadata !36, metadata !24), !dbg !15
  %23 = load i64, i64* %21, align 16, !dbg !37, !tbaa !38
  call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !39, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 %23, i64 0, metadata !40, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 2, i64 0, metadata !39, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 3, i64 0, metadata !39, metadata !24), !dbg !15
  %24 = add i64 %"#s2.0", -1, !dbg !41
  %25 = sitofp i64 %23 to double, !dbg !41
  %26 = getelementptr double, double* %18, i64 %24, !dbg !41
  store double %25, double* %26, align 8, !dbg !41, !tbaa !42
  store %jl_value_t* %12, %jl_value_t** %4, align 8, !dbg !41
  %27 = icmp eq i64 %"#s2.0", %15, !dbg !43
  br i1 %27, label %L.4.loopexit, label %L, !dbg !43

L.4.loopexit:                                     ; preds = %L
  br label %L.4, !dbg !45

L.4:                                              ; preds = %L.4.loopexit, %pass
  %28 = load i64, i64* %8, align 8, !dbg !45
  store i64 %28, i64* bitcast (%jl_value_t*** @jl_pgcstack to i64*), align 8, !dbg !45
  ret %jl_value_t* %12, !dbg !45
}

@yuyichao
Contributor

yuyichao commented Oct 5, 2015

OK I can see this is happening on release 0.4 as well so it's not #13301.

The AST is a little strange; it seems to generate a temporary variable that is never used, which causes the GC-frame store.

For _unsafe_batchsetindex!

      $(Expr(:boundscheck, false))
      _var0 = (Base.arrayset)(A::Array{Float64,2},(Base.box)(Float64,(Base.sitofp)(Float64,v::Int64)::Any)::Float64,offset_0::Int64)::Array{Float64,2}
      goto 26
      _var0 = $(Expr(:boundscheck, :(Base.pop)))
      26: 
      _var0::Array{Float64,2} # cartesian.jl, line 34:

For fill!

      $(Expr(:boundscheck, false))
      (Base.arrayset)(a::Array{Float64,2},x::Float64,i::Int64)::Array{Float64,2}
      $(Expr(:boundscheck, :(Base.pop)))

Note that both functions use a loop with inbounds assignment, but it seems that inlining of unsafe_setindex! confuses the compiler. We could probably work around this, but maybe we should just fix the compiler/type-inference inliner?
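Semantically, both code paths reduce to the same inbounds fill loop; a minimal equivalent (fill_loop! is a hypothetical name for illustration) looks like:

```julia
# Minimal equivalent of the loop inside both fill! and
# _unsafe_batchsetindex!: an inbounds elementwise store. The AST
# difference described above is that the batch-setindex version binds
# the (unused) result of arrayset to a temporary, which is what forces
# the extra GC-frame store visible in the LLVM IR.
function fill_loop!(a::AbstractArray, x)
    @inbounds for i in eachindex(a)
        a[i] = x
    end
    return a
end
```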

@KristofferC
Member Author

Close in favour of #13463?

@yuyichao
Contributor

yuyichao commented Oct 6, 2015

Not sure, maybe this can be replaced by #13461 if we don't have #13359.

@mbauman
Member

mbauman commented Oct 7, 2015

Thanks for doing this, @KristofferC! I went with the more general solution in #13461 for now, but this was a great way to get the ball rolling.

@mbauman mbauman closed this Oct 7, 2015
@KristofferC KristofferC deleted the kc/colon_num branch October 8, 2015 10:27
@stevengj
Member

stevengj commented May 10, 2016

I still think this is a good idea. As long as we have a specialized fill! function, it is really strange not to use it for A[:] = x, which is the most natural way for users to do a fill! operation. Especially since in the common case A[:] = 0, I think fill! calls specialized memset code?

Moreover, A[:] = x is currently quite slow (it does a heap allocation), as mentioned in #16302, although this will hopefully be fixed. It's rather unexpected for A[:] = x to perform differently from fill!.

@mbauman
Member

mbauman commented May 10, 2016

You're right, I agree this is still a good idea.

fill! is also defined for all AbstractArrays in a sensible way, so we can do this for everyone (and not just Array). Unfortunately, the most sensible place to do this is immediately after we incur the splatting penalty.

Perhaps a PR that introduces a @generated function to work around the lack of call-site specialization will light a fire here? In general, setindex! shouldn't need to allocate.
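A hedged sketch of the kind of @generated workaround suggested here (setall! is a hypothetical name, not an actual Base function): because a generated function sees the index types at compile time, the all-Colon case can collapse to a fill! call without paying the splatting penalty at run time.

```julia
# Hypothetical helper; in a @generated function, I is a tuple of the
# argument *types*, so the branch below is resolved at compile time.
@generated function setall!(A::AbstractArray, x::Number, I...)
    if !isempty(I) && all(T -> T === Colon, I)
        # Every index is a Colon: the whole array is being set.
        return :(fill!(A, x))
    else
        # Otherwise fall back to ordinary scalar setindex!.
        return :(setindex!(A, x, I...); A)
    end
end
```

For example, setall!(A, 0.0, :, :) would compile down to fill!(A, 0.0), while setall!(A, 5.0, 1) keeps the normal setindex! behavior.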
