optimizer: enhance SROA, handle partially-initialized allocations by aviatesk · Pull Request #42834 · JuliaLang/julia

aviatesk · 2021-10-28T15:28:45Z

While adding more test cases for our SROA pass, I found that our SROA doesn't
handle allocation sites with uninitialized fields at all.
This commit is built on top of #42833 and tries to handle such "unsafe" allocations
when there are safe setfield! definitions.

For example, this commit allows the allocation r = Ref{Int}() to be
eliminated in the following code (adapted from https://hackmd.io/bZz8k6SHQQuNUW-Vs7rqfw?view):

julia> code_typed() do
           r = Ref{Int}()
           r[] = 42
           b = sin(r[])
           return b
       end |> only
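One way to check whether the allocation really disappears is to look for `:new` expressions in the optimized code, using `Meta.isexpr(x, :new)`. The sketch below is a minimal helper under that assumption; `count_new_allocs` is a hypothetical name, not part of Base, and the count you see depends on the Julia version you run it on.

```julia
# Hypothetical helper (not part of Base): count the `:new` allocation
# expressions left in the statements of an optimized CodeInfo.
function count_new_allocs(code::Vector{Any})
    return count(x -> Meta.isexpr(x, :new), code)
end

# Inspect the example from the PR description.
src, rettype = code_typed() do
    r = Ref{Int}()
    r[] = 42
    b = sin(r[])
    return b
end |> only

println("return type: ", rettype)
println("remaining :new allocations: ", count_new_allocs(src.code))
```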

This commit also comes with plenty of basic test cases for our SROA pass.

Two test cases depend on #42831 as well.

ianatol · 2021-10-28T19:36:36Z

base/compiler/ssair/passes.jl

-                break
+                for use in du.uses
+                    def = find_def_for_use(ir, domtree, allblocks, du, use)[1]
+                    (def == 0 || def == idx) && @goto skip


If def == 0 here, doesn't that mean this field is truly uninitialized? Maybe I am misunderstanding.

def is the index of the statement where the object field is defined, so it can be:

the object allocation site: def >= 1

a setfield! call: def >= 1

a PhiNode (i.e. the field can be defined along different control-flow paths, and the possibilities are merged at a PhiNode): def == 0

So the def == 0 case here corresponds to something like:

```julia
using Test

let src = code_typed((Bool,)) do cond
        r = Ref{Any}()
        if cond
            r[] = 42
        else
            r[] = 32
        end
        return r[] # ::PhiNode(32, 42)
    end |> only |> first
    @test_broken !any(src.code) do @nospecialize x
        Meta.isexpr(x, :new)
    end
end
```

Since traversing such PhiNodes could be complicated (I think?), I just leave it as a TODO.
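For straight-line code, the def lookup described above can be modeled very simply. The sketch below uses a hypothetical tuple-based toy IR, and `toy_find_def` is an illustrative name, not the real `find_def_for_use`: it returns the closest preceding `setfield!` to the same object, or the allocation index itself when no store happened in between. Only code with branches needs a PhiNode to merge stores (the `def == 0` case), which this toy does not model.

```julia
# Hypothetical toy IR (not Core.Compiler's IR). Statements are tuples:
#   (:new,)                 allocation with its single field uninitialized
#   (:new, val)             allocation with the field initialized to `val`
#   (:setfield!, obj, val)  store `val` into the object allocated at statement `obj`
#   (:getfield, obj)        load the field of the object allocated at statement `obj`

# Straight-line code only: walk backwards from `use` and return the index of
# the closest preceding store to `alloc`, or `alloc` itself if there is none.
function toy_find_def(stmts::AbstractVector, alloc::Int, use::Int)
    for i in (use - 1):-1:(alloc + 1)
        st = stmts[i]
        if st[1] === :setfield! && st[2] == alloc
            return i
        end
    end
    return alloc
end

# The Ref{Int}() example: the load at 3 is defined by the store at 2.
toy_find_def([(:new,), (:setfield!, 1, 42), (:getfield, 1)], 1, 3)
```

A result equal to the allocation index for a field that `new` left uninitialized means the load may read an undefined field, which is why such a def cannot be used for scalar replacement.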

Maybe this is worth leaving as a comment?

ianatol · 2021-10-28T19:37:32Z

base/compiler/ssair/passes.jl

        # Everything accounted for. Go field by field and perform idf
-        for (fidx, du) in pairs(fielddefuse)
+        for fidx in 1:ndefuse
+            du = fielddefuse[fidx]


A bit nitpicky, but is there any difference between the original and this change?

Ah, no difference. If fielddefuse were type-unstable and could contain objects of arbitrary type (e.g. user-given expressions, as in fielddefuse = x.args where x::Expr), then iterators like pairs(fielddefuse) might produce objects with very dynamic types (which can cause lots of dynamic dispatch for almost no benefit). Here we don't need to worry about that, since fielddefuse::Vector{SSADefUse}.
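To make the equivalence concrete, here is a sketch with a hypothetical `ToyDefUse` stand-in for `SSADefUse` (its single field is assumed for illustration). For a concretely typed vector both loop forms visit the same indices and elements; the element type reported by `pairs` is only imprecise when the container itself is untyped, like `x.args`.

```julia
# Hypothetical stand-in for Core.Compiler's SSADefUse; only the shape matters here.
struct ToyDefUse
    uses::Vector{Int}
end

# Loop form before the change: destructure the (index => element) pairs.
function nuses_pairs(fielddefuse::Vector{ToyDefUse})
    out = Tuple{Int,Int}[]
    for (fidx, du) in pairs(fielddefuse)
        push!(out, (fidx, length(du.uses)))
    end
    return out
end

# Loop form after the change: iterate over indices and index explicitly.
function nuses_indexed(fielddefuse::Vector{ToyDefUse})
    out = Tuple{Int,Int}[]
    for fidx in 1:length(fielddefuse)
        du = fielddefuse[fidx]
        push!(out, (fidx, length(du.uses)))
    end
    return out
end

v = [ToyDefUse([3, 5]), ToyDefUse(Int[])]
nuses_pairs(v) == nuses_indexed(v)

# The element type of `pairs` follows the container's element type:
# concrete for Vector{ToyDefUse}, but Pair{Int, Any} for a Vector{Any}.
eltype(pairs(v)), eltype(pairs(Any[:a, 1]))
```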

ianatol · 2021-10-28T19:38:17Z

lgtm --- basically move doing idf back so as to include structs with potentially uninitialized fields, then handle those structs by making sure that any field we think could be uninitialized was defined somewhere

oscardssmith · 2021-11-01T12:22:46Z

Will this break the ability to use Ref(x)[] with benchmarking to prevent @btime from turning expressions into constants?

aviatesk · 2021-11-01T12:24:22Z

Maybe? It could, if the definition is available in the analysis scope where Ref(x)[] is defined (i.e. the analysis scope after inlining).

aviatesk · 2021-11-01T12:26:54Z

lgtm --- basically move doing idf back so as to include structs with potentially uninitialized fields, then handle those structs by making sure that any field we think could be uninitialized was defined somewhere

Yeah, the core change is here:

julia/base/compiler/ssair/passes.jl, lines 826 to 829 in a2742c9:

```julia
for use in du.uses
    def = find_def_for_use(ir, domtree, allblocks, du, use)[1]
    (def == 0 || def == idx) && @goto skip
end
```

Previously we gave up whenever an allocation contained an uninitialized field (fidx + 1 > length(defexpr.args)); this PR instead checks whether each use of that field has a "definitive" definition (!(def == 0 || def == idx)) and, if so, tries scalar replacement.
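The per-field decision can be sketched in isolation. `toy_field_ok` is a hypothetical helper, not code from passes.jl, and `use_defs` is assumed to be the list of def indices already found for each use of field `fidx`; it only mirrors the two conditions quoted above.

```julia
# Hypothetical sketch of the per-field check, not the code in passes.jl.
# `defexpr` is the `:new` expression at statement `idx`; `use_defs` holds the
# def index found for each use of field `fidx`
# (0 = merged at a PhiNode, idx = the allocation site itself).
function toy_field_ok(defexpr::Expr, idx::Int, fidx::Int, use_defs)
    @assert Meta.isexpr(defexpr, :new)
    # args[1] is the allocated type and field values follow it, so field `fidx`
    # was passed to `new` iff fidx + 1 <= length(defexpr.args).
    fidx + 1 <= length(defexpr.args) && return true
    # Uninitialized at the allocation: the old pass gave up here. The new check
    # accepts the field only if every use has a "definitive" def.
    for def in use_defs
        (def == 0 || def == idx) && return false
    end
    return true
end

T = Base.RefValue{Int}
toy_field_ok(Expr(:new, T), 1, 1, [2])   # field stored by a setfield! at statement 2
```

Here `Expr(:new, Base.RefValue{Int})` is meant to resemble what `Ref{Int}()` lowers to after inlining; a def of `0` (PhiNode) or `1` (the allocation itself) makes the helper return `false`.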

oscardssmith · 2021-11-01T12:28:53Z

Can you post results of @btime (1+1) versus @btime 1+Ref(1)[] with this PR? If this breaks the ability to hide info from profiling tools I think we need to be careful about merging it.

aviatesk · 2021-11-01T12:47:49Z

Ah, Ref(1)[] should be replaced with 1 even without this PR, since Ref(1) initializes its field at the allocation site.

aviatesk · 2021-11-01T12:57:31Z

On this PR:

julia> @btime (1+1)
  0.032 ns (0 allocations: 0 bytes)
2

julia> @btime 1+Ref(1)[]
  1.235 ns (0 allocations: 0 bytes)
2

The essential difference is that 1 + 1 is constant folded but 1 + Ref(1)[] isn't:

julia> code_typed() do
           a = 1 + Ref(1)[]
           b = 1 + 1
           a, b
       end
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = Base.add_int(1, 1)::Int64
│   %2 = Core.tuple(%1, 2)::Tuple{Int64, Int64}
└──      return %2
) => Tuple{Int64, Int64}

aviatesk · 2021-11-01T12:58:20Z

@nanosoldier runbenchmarks("array" || "foldl" || "sort" || "union", vs = ":master")

vtjnash · 2021-11-01T13:03:29Z

It seems odd that those aren't the same. We should fix that. The benchmarks are supposed to interpolate that Ref object to prevent the optimizer from removing it.

aviatesk · 2021-11-01T13:09:03Z

Hm, I don't see the oddness. Whether or not Ref(1)[] gets optimized into 1, 1 + 1 will be fully constant-folded anyway and will run strictly faster than 1 + Ref(1)[].

vchuravy · 2021-11-01T14:27:08Z

Can you post results of @btime (1+1) versus @btime 1+Ref(1)[] with this PR? If this breaks the ability to hide info from profiling tools I think we need to be careful about merging it.

@oscardssmith the correct invocation is @btime 1 + $(Ref(1))[] (yes, I know, horrible).

vtjnash · 2021-11-01T14:27:44Z

I don't think that is the key difference, since:

julia> code_llvm() do; 1+Ref(1)[]; end
;  @ REPL[16]:1 within `#14`
define i64 @"julia_#14_630"() #0 {
top:
  ret i64 2
}

But maybe something with the effect-free computation interacting with no-inline?

vchuravy · 2021-11-01T14:30:08Z

The essential difference is that 1 + 1 is constant folded but 1 + Ref(1)[] isn't:

This is what @Roger-luo observed in Yao. Constant propagation happens during type inference, but later optimizations can form new constant expressions, which are then left for LLVM to optimize. That is okay in the normal case, but it caused us to write a constant-propagation pass on SSA IR for YaoCompiler.

aviatesk · 2021-11-01T15:41:19Z

But maybe something with the effect-free computation interacting with no-inline?

I think the actual difference in performance stems from:

julia> f() = 1+1; g() = 1+Ref(1)[]; f(); g();

julia> Core.Compiler.invoke_api(only(methods(f)).specializations[1].cache) # uses the constant calling convention
2

julia> Core.Compiler.invoke_api(only(methods(g)).specializations[1].cache) # just a usual function call
-1

So the question might be what Valentin pointed out: constant-folding at optimization time (or rather, re-running inference to maximize general optimization opportunities?), or introducing mutation analysis into inference so that it can constant-fold more.

vchuravy · 2021-11-01T16:11:52Z

So the question might be what Valentin pointed out: constant-folding at optimization time (rather, re-inference to maximize general optimization opportunities?). Or introduce mutation-analysis into inference and constant-fold more.

I don't think re-running inference is possible (opens up a whole bag of convergence questions). Doing constant-folding at optimization time is likely beneficial and cheap, and can enable other optimizations.

Or introduce mutation-analysis into inference and constant-fold more.

That might be the way to go, but I am cautious about the cost/benefit here. I would rather implement this as an SSA IR optimization so that we can also do it after inlining.
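As a rough illustration of what constant-folding at optimization time could look like, here is a sketch over a hypothetical toy SSA IR (not Julia's actual `IRCode`): once SROA has replaced `Ref(1)[]` with the literal `1`, the remaining `1 + 1` becomes foldable. `SSA`, `operand_value`, and `toy_constfold` are illustrative names only.

```julia
# Hypothetical toy SSA IR, not Core.Compiler's IRCode.
# A statement is an Int literal, an opaque value (anything else), or
# (:add, a, b) where each operand is an Int literal or SSA(i),
# a reference to the result of statement i.
struct SSA
    id::Int
end

# Constant value of an operand, or `nothing` if it is not known to be constant.
operand_value(x::Int, folded) = x
operand_value(x::SSA, folded) = folded[x.id]
operand_value(x, folded) = nothing

# Statements only refer to earlier statements in this toy (no control flow),
# so a single forward sweep is enough to propagate constants.
function toy_constfold(stmts::Vector{Any})
    folded = Vector{Union{Int,Nothing}}(nothing, length(stmts))
    out = copy(stmts)
    for (i, st) in enumerate(stmts)
        if st isa Int
            folded[i] = st
        elseif st isa Tuple && length(st) == 3 && st[1] === :add
            a = operand_value(st[2], folded)
            b = operand_value(st[3], folded)
            if a !== nothing && b !== nothing
                folded[i] = a + b
                out[i] = a + b   # replace the addition with its constant result
            end
        end
    end
    return out
end

# `1 + Ref(1)[]` after SROA, roughly: the load has already become the literal 1.
ir = Any[(:add, 1, 1), (:call, :rand), (:add, SSA(2), SSA(1))]
toy_constfold(ir)
```

The opaque `(:call, :rand)` result is never treated as constant, so the last addition is left alone; a real pass would also need to handle control flow, effects, and many more operations.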

nanosoldier · 2021-11-01T16:40:12Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

aviatesk · 2021-11-06T08:59:45Z

@nanosoldier runbenchmarks("array" || "foldl" || "sort" || "union", vs = ":master")

nanosoldier · 2021-11-06T12:30:21Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.


aviatesk force-pushed the avi/moresroa branch from 7ccb600 to a9ae9f2 October 28, 2021 15:50

Base automatically changed from avi/optopt to master October 28, 2021 16:31

aviatesk force-pushed the avi/moresroa branch from a9ae9f2 to 72715ee October 28, 2021 16:33

aviatesk changed the base branch from master to avi/partialtype October 28, 2021 16:33

ianatol reviewed Oct 28, 2021

aviatesk force-pushed the avi/partialtype branch 5 times, most recently from 16b1154 to 3c6dfeb November 1, 2021 06:25

Base automatically changed from avi/partialtype to master November 1, 2021 10:49

aviatesk force-pushed the avi/moresroa branch from 72715ee to a2742c9 November 1, 2021 12:23

aviatesk force-pushed the avi/moresroa branch from 0b5f426 to 2520aa2 November 6, 2021 08:26

aviatesk added the merge me (PR is reviewed. Merge when all tests are passing) and compiler:optimizer (Optimization passes, mostly in base/compiler/ssair/) labels Nov 6, 2021

aviatesk added 2 commits November 7, 2021 15:50

add comment (b228e48)

aviatesk force-pushed the avi/moresroa branch from 2520aa2 to b228e48 November 7, 2021 06:50

aviatesk merged commit 310de1c into master Nov 7, 2021

aviatesk deleted the avi/moresroa branch November 7, 2021 13:39

aviatesk added a commit to aviatesk/EscapeAnalysis.jl that referenced this pull request Nov 7, 2021: update to JuliaLang/julia#42834 (0db0c77)

DilumAluthge removed the merge me label Nov 7, 2021

7 participants