make A[:] = x::Number fall back to fill! by KristofferC · Pull Request #13459 · JuliaLang/julia

KristofferC · 2015-10-05T21:42:39Z

@mbauman mentioned this in https://groups.google.com/forum/#!topic/julia-users/ub08AnpIutw

A[:] previously fell back to the AbstractArray function.

Before there was a performance difference:

julia> @time fill!(A, 0.);
  0.007387 seconds (4 allocations: 160 bytes)

julia> @time A[:] = 0.
  0.010574 seconds (5 allocations: 176 bytes)

yuyichao · 2015-10-05T21:49:57Z

I don't think this should be necessary. From what I can see, the performance difference is caused by #13301 (although it seems to be a store to the GC frame this time....). LLVM IR below, for whatever reason there's a store %jl_value_t* %12, %jl_value_t** %4, align 8, !dbg !41 in the loop that is storying a constant value to the GC frame.

julia> Base.code_llvm_raw(Base._unsafe_batchsetindex!, Tuple{Base.LinearFast, Matrix{Float64}, Base.Repeated{Int64}, Colon})

define %jl_value_t* @"julia__unsafe_batchsetindex!_22134"(%jl_value_t*, %jl_value_t**, i32) {
top:                                                                              
  call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !2, metadata !14), !dbg !15
  call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !16, metadata !17), !dbg !15
  call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !18, metadata !19), !dbg !15
  %3 = alloca [3 x %jl_value_t*], align 8, !dbg !15                               
  %.sub = getelementptr inbounds [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 0
  %4 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 2, !dbg !15
  %5 = bitcast [3 x %jl_value_t*]* %3 to i64*, !dbg !15                           
  store i64 2, i64* %5, align 8, !dbg !15                                         
  %6 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 1, !dbg !15
  %7 = load i64, i64* bitcast (%jl_value_t*** @jl_pgcstack to i64*), align 8, !dbg !15
  %8 = bitcast %jl_value_t** %6 to i64*, !dbg !15                                 
  store i64 %7, i64* %8, align 8, !dbg !15                                        
  store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8, !dbg !15       
  store %jl_value_t* null, %jl_value_t** %4, align 8, !dbg !15                    
  %9 = icmp eq i32 %2, 3, !dbg !20
  br i1 %9, label %fail, label %pass, !dbg !20

fail:                                             ; preds = %top
  %10 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 3, !dbg !20
  call void @jl_bounds_error_tuple_int(%jl_value_t** %10, i64 0, i64 1), !dbg !20
  unreachable, !dbg !20

pass:                                             ; preds = %top
  %11 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 1, !dbg !15
  %12 = load %jl_value_t*, %jl_value_t** %11, align 8, !dbg !15
  %13 = getelementptr inbounds %jl_value_t, %jl_value_t* %12, i64 1, !dbg !15
  %14 = bitcast %jl_value_t* %13 to i64*, !dbg !15
  %15 = load i64, i64* %14, align 8, !dbg !15, !tbaa !21
  call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !22, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !25, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !26, metadata !24), !dbg !15
  %16 = icmp slt i64 %15, 1, !dbg !27
  br i1 %16, label %L.4, label %L.preheader, !dbg !27

L.preheader:                                      ; preds = %pass
  %17 = bitcast %jl_value_t* %12 to double**, !dbg !15
  %18 = load double*, double** %17, align 8, !dbg !15, !tbaa !32
  %19 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 2, !dbg !15
  %20 = bitcast %jl_value_t** %19 to i64**, !dbg !15
  %21 = load i64*, i64** %20, align 8, !dbg !15
  br label %L, !dbg !27

L:                                                ; preds = %L, %L.preheader
  %"#s2.0" = phi i64 [ %22, %L ], [ 1, %L.preheader ]
  %22 = add i64 %"#s2.0", 1, !dbg !27
  call void @llvm.dbg.value(metadata i64 %"#s2.0", i64 0, metadata !35, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 %22, i64 0, metadata !26, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 %"#s2.0", i64 0, metadata !36, metadata !24), !dbg !15
  %23 = load i64, i64* %21, align 16, !dbg !37, !tbaa !38
  call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !39, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 %23, i64 0, metadata !40, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 2, i64 0, metadata !39, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 3, i64 0, metadata !39, metadata !24), !dbg !15
  %24 = add i64 %"#s2.0", -1, !dbg !41
  %25 = sitofp i64 %23 to double, !dbg !41
  %26 = getelementptr double, double* %18, i64 %24, !dbg !41
  store double %25, double* %26, align 8, !dbg !41, !tbaa !42
  store %jl_value_t* %12, %jl_value_t** %4, align 8, !dbg !41
  %27 = icmp eq i64 %"#s2.0", %15, !dbg !43
  br i1 %27, label %L.4.loopexit, label %L, !dbg !43

L.4.loopexit:                                     ; preds = %L
  br label %L.4, !dbg !45

L.4:                                              ; preds = %L.4.loopexit, %pass
  %28 = load i64, i64* %8, align 8, !dbg !45
  store i64 %28, i64* bitcast (%jl_value_t*** @jl_pgcstack to i64*), align 8, !dbg !45
  ret %jl_value_t* %12, !dbg !45
}

yuyichao · 2015-10-05T22:14:29Z

OK I can see this is happening on release 0.4 as well so it's not #13301.

The AST is a little bit strange and it seems to generate a temporary variable that is never used (which causes the GC frame store)

For _unsafe_batchsetindex!

      $(Expr(:boundscheck, false))
      _var0 = (Base.arrayset)(A::Array{Float64,2},(Base.box)(Float64,(Base.sitofp)(Float64,v::Int64)::Any)::Float64,offset_0::Int64)::Array{Float64,2}
      goto 26
      _var0 = $(Expr(:boundscheck, :(Base.pop)))
      26: 
      _var0::Array{Float64,2} # cartesian.jl, line 34:

For fill!

      $(Expr(:boundscheck, false))
      (Base.arrayset)(a::Array{Float64,2},x::Float64,i::Int64)::Array{Float64,2}
      $(Expr(:boundscheck, :(Base.pop)))

Note that both functions uses a loop and inbound assignment but seems that inlining of unsafe_setindex! confuses the compiler. We could probably work around this but maybe we should just fix the compiler/type inference inliner?

KristofferC · 2015-10-06T09:47:31Z

Close in favour of #13463?

yuyichao · 2015-10-06T12:14:16Z

Not sure, maybe this can be replaced by #13461 if we don't have #13359.

mbauman · 2015-10-07T22:15:12Z

Thanks for doing this, @KristofferC! I went with the more general solution in #13461 for now, but this was a great to get that ball rolling.

stevengj · 2016-05-10T21:27:28Z

I still think this is a good idea. As long as we have a specialized fill! function, it is really strange not to use it for A[:] = x, which is the most natural way for users to do a fill! operation. Especially since in the common case A[:] = 0, I think fill! calls specialized memset code?

Moreover, A[:] = x is currently quite slow (doing a heap allocation), as mentioned in #16302, although this will hopefully be fixed. It's rather unexpected for A[:] = x to perform differently than fill!

mbauman · 2016-05-10T22:14:33Z

You're right, I agree this is still a good idea.

fill! is also defined for all AbstractArrays in a sensible way, so we can do this for everyone (and not just Array). Unfortunately, the most sensible place to do this is immediately after we incur the splatting penalty.

Perhaps a PR that introduces a @generated function to work around the lack of call-site specialization will light a fire here? In general, setindex! shouldn't need to allocate.

make A[:] = x::Number fall back to fill!

b3c47b0

This was referenced Oct 5, 2015

Avoid introducing local variable (and GC frame store) in unsafe_setindex! #13461

Merged

tbaa_gcframe #13463

Merged

mbauman closed this Oct 7, 2015

KristofferC deleted the kc/colon_num branch October 8, 2015 10:27

yuyichao mentioned this pull request May 10, 2016

"A[:] = ..." allocates memory #16302

Closed

KristofferC mentioned this pull request May 11, 2016

make setindex with colon fall back to fill! #16310

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

make A[:] = x::Number fall back to fill!#13459

make A[:] = x::Number fall back to fill!#13459
KristofferC wants to merge 1 commit intoJuliaLang:masterfrom
KristofferC:kc/colon_num

KristofferC commented Oct 5, 2015

Uh oh!

yuyichao commented Oct 5, 2015

Uh oh!

yuyichao commented Oct 5, 2015

Uh oh!

KristofferC commented Oct 6, 2015

Uh oh!

yuyichao commented Oct 6, 2015

Uh oh!

mbauman commented Oct 7, 2015

Uh oh!

stevengj commented May 10, 2016 •

edited

Loading

Uh oh!

mbauman commented May 10, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

KristofferC commented Oct 5, 2015

Uh oh!

yuyichao commented Oct 5, 2015

Uh oh!

yuyichao commented Oct 5, 2015

Uh oh!

KristofferC commented Oct 6, 2015

Uh oh!

yuyichao commented Oct 6, 2015

Uh oh!

mbauman commented Oct 7, 2015

Uh oh!

stevengj commented May 10, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mbauman commented May 10, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

stevengj commented May 10, 2016 •

edited

Loading