Experiment with a slightly adjusted pipeline #52850
Conversation
@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Force-pushed ad90755 to 30ed1f0 (Compare)
@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Force-pushed 6a92ba1 to e4a27bd (Compare)
@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
FPM.addPass(InstCombinePass());
FPM.addPass(AggressiveInstCombinePass());
Does it make sense to do 2 instcombines right next to each other?
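For intuition on why two adjacent combine passes can both make progress: each runs a different rule set to its own fixpoint, so the second pass can fold idioms the first one's rules never match. A toy sketch of that idea (purely illustrative; this is not LLVM's actual mechanism, and the rule names are made up):

```python
# Toy fixpoint rewriter: two "combine" passes run back to back can both
# fire when they carry different rule sets, even though each pass runs
# its own rules to a fixpoint before handing off.
def run_to_fixpoint(expr, rules):
    changed = True
    while changed:
        changed = False
        for pattern, replacement in rules:
            if pattern in expr:
                expr = expr.replace(pattern, replacement)
                changed = True
    return expr

# Plain "instcombine": cheap local identities.
basic_rules = [(" + 0", ""), (" * 1", "")]
# "Aggressive" pass: an extra, more expensive idiom fold (hypothetical).
aggressive_rules = [("bit_twiddling_loop", "ctpop")]

expr = "bit_twiddling_loop * 1 + 0"
after_basic = run_to_fixpoint(expr, basic_rules)
after_aggressive = run_to_fixpoint(after_basic, aggressive_rules)
print(after_basic)       # → bit_twiddling_loop
print(after_aggressive)  # → ctpop
```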
It might be worth customizing the AggressiveInstCombinePass slightly, since the defaults include some options that are likely not useful for us specifically (from https://llvm.org/doxygen/AggressiveInstCombine_8cpp.html):

- `foldSqrt` is probably useless because we generate LLVM `sqrt`
- `tryToRecognizePopCount` probably isn't useful since we have `count_ones`
- `foldMemChr`: I don't think we use `memchr` (but not sure)
This is unlikely to matter much, but probably could save a bit of compile time here and there.
Looks like you need to fix a couple of tests. Also rerunning nanosoldier, since a lot of changes have happened since:
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Looks overall pretty good, but there are a couple of 10x regressions (they look like vectorization failures). Is there an easy way from nanosoldier for us to test compile time, to make sure it's comparable?
Isn't that what the inference benchmarks are for? They look like no change to me.
I took a thorough look at it. There are still a couple of regressions, but it seems to be a pretty clear overall win, if anyone wants to take a further look. The 16x regression is now gone with my latest commit.
Force-pushed d6a2afa to b447d87 (Compare)
Do we want to run a PkgEval? I'm slightly worried about the fact that I had to modify passes.
@nanosoldier
The package evaluation job you requested has completed - possible new issues were detected.
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Nice to see that this improves the allocation elimination :)
Force-pushed a114171 to 99ad967 (Compare)
I can confirm this PR on top of #57380 addresses most of #56145: stores are vectorised. There's still an out-of-bounds section in the preamble; it shouldn't be a big deal performance-wise, although it probably taints effects since it can throw:

```llvm
; Function Signature: var"#59"(Memory{Float64})
; @ REPL[17]:2 within `#59`
define void @"julia_#59_3153"(ptr noundef nonnull align 8 dereferenceable(16) %"v::GenericMemory") local_unnamed_addr #0 {
top:
  %thread_ptr = call ptr asm "movq %fs:0, $0", "=r"() #11
  %tls_ppgcstack = getelementptr inbounds i8, ptr %thread_ptr, i64 -8
  %tls_pgcstack = load ptr, ptr %tls_ppgcstack, align 8
; ┌ @ range.jl:917 within `iterate`
; │┌ @ range.jl:688 within `isempty`
; ││┌ @ operators.jl:425 within `>`
; │││┌ @ int.jl:83 within `<`
  %.unbox = load i64, ptr %"v::GenericMemory", align 8
  %0 = icmp slt i64 %.unbox, 1
; └└└└
  br i1 %0, label %L29, label %preloop.pseudo.exit
[...]
oob:                                              ; preds = %L11.postloop, %L11
  %value_phi3.lcssa = phi i64 [ %value_phi3.postloop, %L11.postloop ], [ %6, %L11 ]
; @ REPL[17]:3 within `#59`
; ┌ @ genericmemory.jl:260 within `setindex!`
; │┌ @ genericmemory.jl:252 within `_setindex!`
  %ptls_field = getelementptr inbounds i8, ptr %tls_pgcstack, i64 16
  %ptls_load = load ptr, ptr %ptls_field, align 8
  %"box::GenericMemoryRef" = call noalias nonnull align 8 dereferenceable(32) ptr @ijl_gc_small_alloc(ptr %ptls_load, i32 408, i32 32, i64 140577962397856) #8
  %"box::GenericMemoryRef.tag_addr" = getelementptr inbounds i64, ptr %"box::GenericMemoryRef", i64 -1
  store atomic i64 140577962397856, ptr %"box::GenericMemoryRef.tag_addr" unordered, align 8
  store ptr %memoryref_data, ptr %"box::GenericMemoryRef", align 8
  %.repack16 = getelementptr inbounds { ptr, ptr }, ptr %"box::GenericMemoryRef", i64 0, i32 1
  store ptr %"v::GenericMemory", ptr %.repack16, align 8
  call void @ijl_bounds_error_int(ptr nonnull %"box::GenericMemoryRef", i64 %value_phi3.lcssa)
  unreachable
load:                                             ; preds = %L11
  %memoryref_offset = shl i64 %value_phi3, 3
; ││ @ genericmemory.jl:253 within `_setindex!`
  %gep = getelementptr i8, ptr %invariant.gep, i64 %memoryref_offset
  store i64 4607182418800017408, ptr %gep, align 8
; └└
; @ REPL[17]:4 within `#59`
; ┌ @ range.jl:921 within `iterate`
  %1 = add nuw nsw i64 %value_phi3, 1
; └
  %exitcond39.not = icmp eq i64 %value_phi3, %umax
  br i1 %exitcond39.not, label %main.exit.selector, label %L11
[...]
vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %offset.idx = shl i64 %index, 3
  %12 = or disjoint i64 %offset.idx, 8
; ││ @ genericmemory.jl:253 within `_setindex!`
  %13 = getelementptr i8, ptr %invariant.gep, i64 %12
  %14 = getelementptr i64, ptr %13, i64 4
  %15 = getelementptr i64, ptr %13, i64 8
  %16 = getelementptr i64, ptr %13, i64 12
  store <4 x i64> <i64 4607182418800017408, i64 4607182418800017408, i64 4607182418800017408, i64 4607182418800017408>, ptr %13, align 8
  store <4 x i64> <i64 4607182418800017408, i64 4607182418800017408, i64 4607182418800017408, i64 4607182418800017408>, ptr %14, align 8
  store <4 x i64> <i64 4607182418800017408, i64 4607182418800017408, i64 4607182418800017408, i64 4607182418800017408>, ptr %15, align 8
  store <4 x i64> <i64 4607182418800017408, i64 4607182418800017408, i64 4607182418800017408, i64 4607182418800017408>, ptr %16, align 8
  %index.next = add nuw i64 %index, 16
  %17 = icmp eq i64 %index.next, %n.vec
  br i1 %17, label %L11, label %vector.body
[...]
```

It'd be nice to have a test to make sure this doesn't regress, once #57380 is merged.
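As a side note on reading the IR above: the vector stores splat the integer constant 4607182418800017408, which is just the IEEE-754 bit pattern of the `Float64` value 1.0. A quick check (Python used here purely for illustration):

```python
import struct

# Reinterpret the 64-bit integer constant from the IR as an IEEE-754 double.
bits = 4607182418800017408  # 0x3FF0000000000000
value = struct.unpack("<d", struct.pack("<Q", bits))[0]
print(value)  # → 1.0
```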
There are cases where we optimize the SRet more than the pass expected, so try and handle those. I'm trying to get a test for this; this is separated from #52850 to make merging both easier.

---------

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Needs #57380 to merge first