allow @nospecialize-d push! to take arbitrary items#45790
Conversation
|
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
test/compiler/inline.jl
Outdated
| @test count(iscall((src, push!)), src.code) == 0 | ||
| @test count(src.code) do @nospecialize x | ||
| isa(x, Core.GotoNode) || isa(x, Core.GotoIfNot) | ||
| end == 0 # the loop should be optimized away for a single item |
There was a problem hiding this comment.
this seems to be failing on many CI bots
There was a problem hiding this comment.
It seems like push!(::Vector{Any}, ::Any) produces the following IR:
julia> code_typed((Vector{Any},Any)) do xs, x
push!(xs, x)
end
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = Base.arraylen(xs)::Int64
│ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing
└── goto #3 if not true
2 ─ %4 = Base.add_int(%1, 1)::Int64
└── Base.arrayset(true, xs, x, %4)::Vector{Any}
3 ┄ goto #4
4 ─ return xs
) => Vector{Any}So the loop can't be fully optimized on Julia level actually, but I confirmed LLVM can further optimize it down the road and the performance doesn't regress.
EDIT: A nanosolidier result suggested that loop can be left unoptimized. For now, I'd like to specialize the single-arg case and optimize it manually.
|
@nanosoldier |
|
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
|
@nanosoldier |
|
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
Currently the `@nospecialize`-d `push!(::Vector{Any}, ...)` can only
take a single item and we will end up with runtime dispatch when we try
to call it with multiple items:
```julia
julia> code_typed(push!, (Vector{Any}, Any))
1-element Vector{Any}:
CodeInfo(
1 ─ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing
│ %2 = Base.arraylen(a)::Int64
│ Base.arrayset(true, a, item, %2)::Vector{Any}
└── return a
) => Vector{Any}
julia> code_typed(push!, (Vector{Any}, Any, Any))
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = Base.append!(a, iter)::Vector{Any}
└── return %1
) => Vector{Any}
```
This commit extends it so that it can take arbitrary-length items.
Our compiler should still be able to optimize the single-input case as
before by unrolling using the constant item length information:
```julia
julia> code_typed(push!, (Vector{Any}, Any))
1-element Vector{Any}:
CodeInfo(
1 ─ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing
│ %2 = Base.arraylen(a)::Int64
│ Base.arrayset(true, a, item, %2)::Vector{Any}
└── return a
) => Vector{Any}
julia> code_typed(push!, (Vector{Any}, Any, Any))
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = Base.arraylen(a)::Int64
│ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000002, 0x0000000000000002))::Nothing
└── goto #7 if not true
2 ┄ %4 = φ (#1 => 1, #6 => %14)::Int64
│ %5 = φ (#1 => 1, #6 => %15)::Int64
│ %6 = Base.getfield(x, %4, true)::Any
│ %7 = Base.add_int(%1, %4)::Int64
│ Base.arrayset(true, a, %6, %7)::Vector{Any}
│ %9 = (%5 === 2)::Bool
└── goto #4 if not %9
3 ─ goto #5
4 ─ %12 = Base.add_int(%5, 1)::Int64
└── goto #5
5 ┄ %14 = φ (#4 => %12)::Int64
│ %15 = φ (#4 => %12)::Int64
│ %16 = φ (#3 => true, #4 => false)::Bool
│ %17 = Base.not_int(%16)::Bool
└── goto #7 if not %17
6 ─ goto #2
7 ┄ return a
) => Vector{Any}
```
|
@nanosoldier |
|
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. |
For `Vector{Any}` to avoid runtime dispatch.
|
@nanosoldier |
|
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
…Lang#45790) Currently the `@nospecialize`-d `push!(::Vector{Any}, ...)` can only take a single item and we will end up with runtime dispatch when we try to call it with multiple items: ```julia julia> code_typed(push!, (Vector{Any}, Any)) 1-element Vector{Any}: CodeInfo( 1 ─ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing │ %2 = Base.arraylen(a)::Int64 │ Base.arrayset(true, a, item, %2)::Vector{Any} └── return a ) => Vector{Any} julia> code_typed(push!, (Vector{Any}, Any, Any)) 1-element Vector{Any}: CodeInfo( 1 ─ %1 = Base.append!(a, iter)::Vector{Any} └── return %1 ) => Vector{Any} ``` This commit adds a new specialization that it can take arbitrary-length items. Our compiler should still be able to optimize the single-input case as before via the dispatch mechanism. ```julia julia> code_typed(push!, (Vector{Any}, Any)) 1-element Vector{Any}: CodeInfo( 1 ─ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing │ %2 = Base.arraylen(a)::Int64 │ Base.arrayset(true, a, item, %2)::Vector{Any} └── return a ) => Vector{Any} julia> code_typed(push!, (Vector{Any}, Any, Any)) 1-element Vector{Any}: CodeInfo( 1 ─ %1 = Base.arraylen(a)::Int64 │ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000002, 0x0000000000000002))::Nothing └── goto JuliaLang#7 if not true 2 ┄ %4 = φ (JuliaLang#1 => 1, JuliaLang#6 => %14)::Int64 │ %5 = φ (JuliaLang#1 => 1, JuliaLang#6 => %15)::Int64 │ %6 = Base.getfield(x, %4, true)::Any │ %7 = Base.add_int(%1, %4)::Int64 │ Base.arrayset(true, a, %6, %7)::Vector{Any} │ %9 = (%5 === 2)::Bool └── goto JuliaLang#4 if not %9 3 ─ goto JuliaLang#5 4 ─ %12 = Base.add_int(%5, 1)::Int64 └── goto JuliaLang#5 5 ┄ %14 = φ (JuliaLang#4 => %12)::Int64 │ %15 = φ (JuliaLang#4 => %12)::Int64 │ %16 = φ (JuliaLang#3 => true, JuliaLang#4 => false)::Bool │ %17 = Base.not_int(%16)::Bool └── goto JuliaLang#7 if not %17 6 ─ goto JuliaLang#2 7 ┄ return a ) => Vector{Any} ``` This commit also adds the equivalent implementations for `pushfirst!`.
Currently the
@nospecialize-dpush!(::Vector{Any}, ...)can onlytake a single item and we will end up with runtime dispatch when we try
to call it with multiple items:
This commit extends it so that it can take arbitrarily long items.
Our compiler should still be able to optimize the single-input case as
before by unrolling using the constant item length information:
@nanosoldier
runbenchmarks("inference", vs=":master")