Hoist data pointer loads from array allocations #43547

Closed
pchintalapudi wants to merge 22 commits into JuliaLang:master from pchintalapudi:pc/data-hoist

Conversation

@pchintalapudi
Member

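When an array is indexed, its data pointer is loaded at the indexing site. GVN typically moves this load out of loops, but we can hoist it more eagerly for 2D/3D array allocations, and for 1D array allocations that do not escape. This lets more of the optimization pipeline (e.g. loop vectorization) come into play. Multidimensional arrays cannot be resized, so their data pointer never changes after allocation. A 1D array can be grown (e.g. with push!), which may reallocate its buffer, so its data pointer is only known to be stable when the array does not escape.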
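To illustrate the transformation outside of Julia's codegen, here is a minimal C++ sketch. `ArrayHeader` is a hypothetical, simplified stand-in for an array object (not Julia's actual `jl_array_t` layout), and all function names are illustrative only. `sum_reload` re-reads the data pointer at every indexing site, `sum_hoisted` loads it once before the loop, and `sum_while_growing` uses `std::vector` as an analogue of a growable 1D array to show why hoisting is unsafe when the array may be resized inside the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified array object; NOT Julia's real jl_array_t layout.
struct ArrayHeader {
    int64_t* data;       // pointer to element storage
    std::size_t length;  // total number of elements
};

// Unhoisted: the data pointer is re-read from the header at every indexing
// site, mirroring a load emitted for each a[i] before optimization.
int64_t sum_reload(const ArrayHeader* a) {
    assert(a != nullptr);
    int64_t s = 0;
    for (std::size_t i = 0; i < a->length; ++i) {
        s += a->data[i];  // load a->data, then load the element
    }
    return s;
}

// Hoisted: the data pointer (and length) are loaded once, before the loop.
// This is only valid because nothing in the loop can replace a->data.
int64_t sum_hoisted(const ArrayHeader* a) {
    assert(a != nullptr);
    const int64_t* data = a->data;
    const std::size_t n = a->length;
    int64_t s = 0;
    for (std::size_t i = 0; i < n; ++i) {
        s += data[i];
    }
    return s;
}

// Counter-example: a growable 1D array (std::vector as an analogue).
// push_back may reallocate the buffer, so v.data() must be re-read on each
// iteration; hoisting it above the loop could leave a dangling pointer.
int64_t sum_while_growing(std::vector<int64_t>& v, std::size_t n) {
    assert(v.size() >= n);
    int64_t s = 0;
    for (std::size_t i = 0; i < n; ++i) {
        s += v.data()[i];  // reload after any possible reallocation
        v.push_back(0);
    }
    return s;
}
```

A C++ compiler may well hoist the load in `sum_reload` on its own, since no store in that loop can modify `a->data`; the sketch only makes the transformation, and the resizing hazard that blocks it, explicit.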

This PR depends on #43487 for the initial array identification routine and the ArrayOpt pass, and therefore also on #43057 for the escape analysis improvements.

Example:

function f(d1, d2, d3)
    s = 0
    # Freshly allocated 3D array: its data pointer cannot change after allocation
    a = zeros(Int, d1, d2, d3)
    for i in eachindex(a)
        s += a[i]
    end
    s
end

Baseline (#43487):
Godbolt: https://godbolt.org/z/9sr36nsvq
Benchmark:

julia> @benchmark f(100,100,100)
BenchmarkTools.Trial: 3347 samples with 1 evaluation.
 Range (min … max):  975.691 μs …   7.981 ms  ┊ GC (min … max):  0.00% …  0.00%
 Time  (median):       1.015 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.488 ms ± 836.460 μs  ┊ GC (mean ± σ):  10.71% ± 16.41%

  █▆▃                                 ▂▃▁         ▃▃         ▃▃ ▁
  ███▆▃█▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▁▁▃▁▁▁▁▃███▅▃▆▇▃▁▁▁▇██▆▅▃▇▇▃▁▁▄██ █
  976 μs        Histogram: log(frequency) by time       3.33 ms <

 Memory estimate: 7.63 MiB, allocs estimate: 2.

This PR:
Godbolt: https://godbolt.org/z/8qr8rYv14
Benchmark:

julia> @benchmark f(100,100,100)
BenchmarkTools.Trial: 4292 samples with 1 evaluation.
 Range (min … max):  688.914 μs …   6.848 ms  ┊ GC (min … max):  0.00% … 63.87%
 Time  (median):     723.949 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.159 ms ± 764.938 μs  ┊ GC (mean ± σ):  10.78% ± 17.55%

  █▆▄   ▁                          ▃▂                ▃▂    ▃▃▁  ▁
  ███▆▄▆█▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▄▁▄████▅▃▇█▇▄▃▁▄▁▁▃▁▃▇████▅▆████ █
  689 μs        Histogram: log(frequency) by time       2.86 ms <

 Memory estimate: 7.63 MiB, allocs estimate: 2.
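For a quick comparison of the two runs above: both allocate the same 7.63 MiB in 2 allocations (100 × 100 × 100 `Int`s at 8 bytes each is 8,000,000 bytes ≈ 7.63 MiB), and the ratios of the reported baseline to PR timings are:

$$
\frac{t_{\text{median}}^{\#43487}}{t_{\text{median}}^{\text{PR}}} = \frac{1015\ \mu\text{s}}{723.949\ \mu\text{s}} \approx 1.40,\qquad
\frac{t_{\text{min}}^{\#43487}}{t_{\text{min}}^{\text{PR}}} = \frac{975.691\ \mu\text{s}}{688.914\ \mu\text{s}} \approx 1.42,\qquad
\frac{t_{\text{mean}}^{\#43487}}{t_{\text{mean}}^{\text{PR}}} = \frac{1.488\ \text{ms}}{1.159\ \text{ms}} \approx 1.28
$$

That is, the median time drops by roughly 29%.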

@pchintalapudi
Member Author

Since my branches live on a fork, I've opened a PR there that compares this branch against the array length propagation PR immediately preceding it, so only this PR's changes show up: pchintalapudi#3

@pchintalapudi added the arrays, compiler:codegen, and performance labels on Dec 29, 2021
@pchintalapudi force-pushed the pc/data-hoist branch 5 times, most recently from 487457a to 00a9b49 on January 8, 2022 07:04
Member

@vchuravy left a comment:


Can you add some unit tests specifically for the new llvm-array-opt pass?

src/ccall.cpp (outdated)
static_rt);
if (auto array_alloc = dyn_cast<CallInst>(retval.V)) {
array_alloc->addAttribute(AttributeList::ReturnIndex, Attribute::NoAlias);
array_alloc->setMetadata("allocation.array", MDNode::get(jl_LLVMContext, {}));
Member


I would prefer this metadata name to have a julia prefix.

}
}

//FIXME: This doesn't actually work on Windows, as Windows inserts a trampoline
Member


Is this comment outdated now? It should instead explain why the metadata is used.

pchintalapudi pushed a commit to pchintalapudi/julia that referenced this pull request on Jan 9, 2022
pchintalapudi pushed a commit to pchintalapudi/julia that referenced this pull request on Jan 12, 2022
2 participants