enable/improve constant propagation through varargs methods #26826
base/compiler/inferenceresult.jl (Outdated)
```julia
if isvarargtype(va)
    # assumes that we should never see Vararg{T, x}, where x is a constant (should be guaranteed by construction)
    va = rewrap_unionall(va, linfo.specTypes)
    vararg_type_vec = Any[va]
```
This vec type is wrong - there’s basically nothing we can do in this case (just keep the empty Tuple)
actually, looks like the inference code will handle this, so this is OK
So to make sure I follow, you're saying that we can't do anything if we don't actually have the types of the "trailing" arguments in atypes, so this should just be:

```julia
if nargs > laty
    vararg_type_vec = Any[]
    vararg_type = Tuple{}
else
    ⋮ # the stuff that's already here, but fixed wrt your other comment
end
```

EDIT: oops, I made this comment before refreshing the page, so I didn't see your other comment.
```diff
- if cache_code.linfo === code && length(cache_args) >= nargs
+ if cache_code.linfo === code && length(cache_args) == length(argtypes)
      cache_match = true
      # verify that the trailing args (va) aren't Const
```
Still need to verify that the vargs list matches
Verifying this correctly would essentially require fully recomputing the vararg_type_vec part of get_argtypes, right?
Should I actually do that, or is there a more lightweight check I should be doing?
EDIT: This would also be necessary for checking the last element of cache_args, which is the vararg_type that we computed in get_argtypes.
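For reference, a lightweight version of the check under discussion might look like the sketch below. The names (`cache_matches`, `cache_args`, `argtypes`) are illustrative only: this is plain Julia over vectors of types, not the actual compiler code, and the real predicate would also need the `vararg_type_vec` recomputation mentioned above.

```julia
# Hedged sketch (not the real compiler code): a cached result only
# matches when every cached argument type, including the collapsed
# vararg tuple in the last slot, is identical to the requested one.
function cache_matches(cache_args::Vector{Any}, argtypes::Vector{Any})
    length(cache_args) == length(argtypes) || return false
    for i in 1:length(argtypes)
        cache_args[i] === argtypes[i] || return false
    end
    return true
end
```

Element-wise `===` is deliberately strict here; the real lookup also has to handle lattice elements such as `Const` in the trailing slots, which this sketch ignores.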
base/compiler/inferenceresult.jl (Outdated)
```diff
- for i in 1:nargs
+ for i in 1:length(argtypes)
      a = argtypes[i]
      ca = cache_args[i]
```
base/compiler/inferenceresult.jl (Outdated)
```diff
  # try to search cache first
  cache_args = cache_code.args
- if cache_code.linfo === code && length(cache_args) >= nargs
+ if cache_code.linfo === code && length(cache_args) == length(argtypes)
```
`length(cache_args) === nargs`

This change would filter out any varargs method from being found in the cache (slow, but not otherwise broken).
```julia
vararg_type_vec = Any[rewrap_unionall(p, linfo.specTypes) for p in atypes[nargs:laty]]
vararg_type = tuple_tfunc(Tuple{vararg_type_vec...})
for i in 1:length(vararg_type_vec)
    atyp = vararg_type_vec[i]
```
Actually, this seems wrong too - `atypes` can be a Vararg here, and I don't recall seeing that being checked for in the downstream users.
(Just talked to Jameson in person, turns out this is actually okay)
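To make the shape of this transformation concrete, here is a standalone sketch with hypothetical values (not the compiler's actual code path): the trailing argument types from slot `nargs` onward collapse into a single tuple-typed vararg slot.

```julia
# Hypothetical call f(1, 2, 3.0) to a method f(x, args...):
atypes = Any[Int, Int, Float64]  # argument types at the call site
nargs  = 2                       # the vararg begins at slot 2
laty   = length(atypes)

# Collapse the trailing types into one tuple type, as the diff above
# does via rewrap_unionall/tuple_tfunc (omitted here for simplicity).
vararg_type_vec = Any[atypes[i] for i in nargs:laty]
vararg_type = Tuple{vararg_type_vec...}
args = Any[atypes[1:nargs-1]..., vararg_type]
# args is now Any[Int, Tuple{Int, Float64}]
```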
```julia
    result.vargs = vararg_type_vec
end
args[nargs] = vararg_type
nargs -= 1
```
If this got put back, aren’t we missing detection of the Varargs argument as Const then?
Ah, wait, no. I think `tuple_tfunc` took care of this.
BSD CI got an unkillable process from this PR, and it ate 100% CPU on a core.
415b59f to
5572926
5572926 to
d7a79bf
Looks like this passes all tests except for
Okay, I've tracked down the offending test here: `julia/test/compiler/compiler.jl`, line 1248 (at 1c5ed70). Interrupting that test while it hangs yields:
c4e106e to
efc42a4
Woohoo, tests here now pass locally.
The only things left here now are adding tests and investigating the potential cache problems pointed out at JuliaLabs/Cassette.jl#41 (comment).
d863bba to
3424b31
Okay, tests added and caching problem fixed (thanks to @vtjnash). I rebased and squashed so that this doesn't leave a bunch of intermediate broken commits. The only thing left now is to make sure CI passes, and then this is good to go.
From running tests locally, it looks like everything passes except a number of tests in

EDIT: Here's a gist of the failures logged by the test suite: https://gist.github.com/jrevels/197ed8b8bf71d72acd8bebe246116655

EDIT 2: ...and here's a gist that reproduces the failures in a more digestible format (ripped from the failing test code): https://gist.github.com/jrevels/20f5e1d7ac64c0b51099960ded3e4687
Good luck. I've lost many hours to failures in there. Typically it's been due to a failure to inline. Check
So, on master (see https://gist.github.com/jrevels/20f5e1d7ac64c0b51099960ded3e4687 for the test setup):

```julia
julia> @test (@allocated broadcast!(+, Q, X, Y, Z)) == 0
Test Passed

julia> @test (@allocated broadcast!(*, Q, X, Y, Z)) == 0
Test Passed

julia> @test (@allocated broadcast!(f, Q, X, Y, Z)) == 0
Test Failed at REPL[10]:1
  Expression: #= REPL[10]:1 =# @allocated(broadcast!(f, Q, X, Y, Z)) == 0
   Evaluated: 16 == 0
ERROR: There was an error during testing
```

On this PR currently:

```julia
julia> @test (@allocated broadcast!(+, Q, X, Y, Z)) == 0
Test Failed at REPL[8]:1
  Expression: #= REPL[8]:1 =# @allocated(broadcast!(+, Q, X, Y, Z)) == 0
   Evaluated: 6688 == 0
ERROR: There was an error during testing

julia> @test (@allocated broadcast!(*, Q, X, Y, Z)) == 0
Test Failed at REPL[9]:1
  Expression: #= REPL[9]:1 =# @allocated(broadcast!(*, Q, X, Y, Z)) == 0
   Evaluated: 6688 == 0
ERROR: There was an error during testing

julia> @test (@allocated broadcast!(f, Q, X, Y, Z)) == 0
Test Failed at REPL[10]:1
  Expression: #= REPL[10]:1 =# @allocated(broadcast!(f, Q, X, Y, Z)) == 0
   Evaluated: 1744 == 0
ERROR: There was an error during testing
```

EDIT: I was dumb, see comment below.
D'oh, I was dumb and didn't rebuild the sysimg. It turns out the culprit is entirely the change to `inlineable` that reduced `InferenceResult` cache misses, which makes sense. Here's this PR on the MWE tests with the offending change reverted:

```julia
julia> @test (@allocated broadcast!(+, Q, X, Y, Z)) == 0
Test Passed

julia> @test (@allocated broadcast!(*, Q, X, Y, Z)) == 0
Test Passed

julia> @test (@allocated broadcast!(f, Q, X, Y, Z)) == 0
Test Passed
```

So I just need to figure out how to fix those cache misses without affecting inlining...
3424b31 to
26911c1
Okay, I just pushed a rewrite of the cache lookup fix which passes both the new compiler tests and the MWE above. @vtjnash points out that the previous cache lookup fix also fixed another problem that is currently on master: the

I'm not going to consider fixing this issue necessary for merging this PR, since the issue already exists on master, but it might be worth looking into later.
26911c1 to
1a9f8e8
Okay, tests all pass locally now. This PR is basically done, so I'm going to merge once CI turns up green unless there are any objections.
Worth running Nanosoldier?
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
base/compiler/inferenceresult.jl (Outdated)
```julia
mutable struct InferenceResult
    linfo::MethodInstance
    args::Vector{Any}
    vargs::Vector{Any}
```
This field could use an explanatory comment.
- Store varargs type information in the `InferenceResult` object, so that the info can be used during inference/optimization.
- Hack in a more precise return type for `getfield` of a vararg tuple. Ideally, we would handle this by teaching inference to track the types of the individual fields of a tuple, which would make this unnecessary, but until then, this hack is helpful.
- Spoof parents as well as children during recursion limiting, so that higher-degree cycles are appropriately spoofed.
- A broadcast test marked as broken is now no longer broken, presumably due to the optimizations in this commit.
- Fix the relationship between depth/mindepth in `limit_type_size`/`is_derived_type`. The relationship should have been inverse over the domain in which they overlap, but was not maintained consistently. An example of a problematic case was:

  ```julia
  t = Tuple{X,X} where X<:Tuple{Tuple{Int64,Vararg{Int64,N} where N},Tuple{Int64,Vararg{Int64,N} where N}}
  c = Tuple{X,X} where X<:Tuple{Int64,Vararg{Int64,N} where N}
  ```

  because `is_derived_type` was computing the depth of usage rather than the depth of definition. This change thus makes the depth/mindepth calculations more consistent, and causes the limiting heuristic to return strictly wider types than it did before.
- Move the optimizer's "varargs types to tuple type" rewrite to after cache lookup. Inference is populating the `InferenceResult` cache using the varargs form, so the optimizer needs to do the lookup before rewriting the atypes in order to avoid cache misses.
Co-authored-by: Jameson Nash <vtjnash@users.noreply.github.com>
Co-authored-by: Keno Fischer <keno@alumni.harvard.edu>
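The "more precise return type for `getfield` of a vararg tuple" item above can be illustrated with public reflection alone (the inference internals are not shown here, and the exact inferred result can vary by Julia version):

```julia
# When the element types of a tuple are known, getfield with a constant
# index can be inferred as the concrete element type rather than the
# union of all element types.
rts = Base.return_types(t -> getfield(t, 1), Tuple{Tuple{Int, Float64}})
# On versions with precise getfield inference, this is Any[Int].
```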
1a9f8e8 to
b4b4d21
Added the requested comment. Investigated the benchmark regressions; none seemed significant. Even the 3.65x one showed no regression locally, and had the same

CI was totally green (besides an unrelated Windows timeout) before I added the requested comment, so I'll go ahead and merge (I did rebuild and run tests locally just to be absolutely sure, but of course adding a comment didn't break anything).
* fix InferenceResult cache lookup for new optimizer
* utilize cached vararg type info in new inlining pass
* fix test to work with new optimizer
* fix predicate for exploiting cached varargs type info
* atypes no longer needs to be modified to match cached types
* don't require varargs to be in last position to exploit cached type info
* restore missing isva boolean
- inferenceresult.jl: just dead allocations
- inlining.jl: `UnionSplitSignature` (and its `SimpleCartesian` helper) is no longer used
- abstractinterpretation.jl: the use of `precise_container_type` seems to have been introduced in #26826, but I think `getfield_tfunc` at this moment is precise enough to propagate elements of constant tuples.
This should allow constant propagation through varargs functions. I was just a keyboard monkey following @Keno's/@vtjnash's direction for this PR, so they can probably answer any questions better than I can.
EDIT: We've now added some other stuff that expands the scope here a bit (mainly so we don't forget some of these hacks), but this PR can be broken up into several others later if need be
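As a concrete illustration of what "constant propagation through varargs methods" buys (a sketch using only public reflection; the exact inferred result depends on the Julia version):

```julia
# A varargs method whose result is fully determined by constant arguments.
sumargs(args...) = +(args...)
g() = sumargs(1, 2, 3)

# With constant propagation through the vararg tuple, inference can fold
# the call inside g, inferring at least Int (ideally the constant 6).
rts = Base.return_types(g, Tuple{})
```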