speed up `hvcat_fill!` by unrolling internal iteration for type-stability by adienes · Pull Request #61426 · JuliaLang/julia

adienes · 2026-03-28T18:51:41Z

julia> using BenchmarkTools

julia> @btime [1.0 2; 3 4.0];
  75.248 ns (8 allocations: 336 bytes) # master
  9.593 ns (2 allocations: 112 bytes) # PR

julia> @btime [1 2 3im; 4 5 6im;;;]
  104.256 ns (10 allocations: 720 bytes) # master
  14.236 ns (2 allocations: 176 bytes) # PR

replaces #52028 with suggestion given in #52028 (comment)
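for readers unfamiliar with the technique in the title, here is a minimal sketch of unrolling tuple iteration by recursion. It is not the code in this PR: `fill_unrolled!` and `fill_loop!` are hypothetical names used only for illustration, and the sketch assumes a flat destination vector rather than the row-major fill that `hvcat` actually performs.

```julia
# Hypothetical helper, not the PR's implementation: copy a tuple into `dest`
# starting at index `i`, peeling off one element per call.
fill_unrolled!(dest, i, ::Tuple{}) = dest
function fill_unrolled!(dest, i, xs::Tuple)
    # `first(xs)` has a concrete type known from the tuple type, so the
    # conversion performed by `setindex!` can be resolved at compile time.
    dest[i] = first(xs)
    return fill_unrolled!(dest, i + 1, Base.tail(xs))
end

# Loop-based version for comparison: with a runtime index `k`, `xs[k]` may be
# inferred as a Union or abstract type when the tuple mixes element types.
function fill_loop!(dest, xs::Tuple)
    for k in eachindex(xs)
        dest[k] = xs[k]
    end
    return dest
end

xs = (1.0, 2, 3, 4.0)
a = fill_unrolled!(Vector{Float64}(undef, 4), 1, xs)
b = fill_loop!(Vector{Float64}(undef, 4), xs)
```

both versions produce the same array; the difference is only in what the compiler can infer about each element access.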


adienes · 2026-03-28T19:25:19Z

it looks like this is slightly slower in heterogeneous cases where the types don't promote to a concrete type, e.g.

julia> @btime ['a' 2; 'b' 3];
  282.313 ns (25 allocations: 1.16 KiB) # master
  302.454 ns (25 allocations: 1.16 KiB) # PR

this seems acceptable to me given that it's much more rare, and will be slow anyway since you end up with an abstractly-typed collection. but just noting.
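to make the "abstractly-typed collection" point concrete, the element types of the two kinds of literal can be checked directly. This is only an illustration of standard promotion behaviour, not part of the PR; the variable names are made up for the example.

```julia
homogeneous   = [1.0 2; 3 4.0]
heterogeneous = ['a' 2; 'b' 3]

eltype(homogeneous)    # Float64: Int and Float64 promote to a concrete type
eltype(heterogeneous)  # Any: Char and Int have no common concrete type

isconcretetype(eltype(homogeneous))    # true
isconcretetype(eltype(heterogeneous))  # false
```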


in a similar vein to #61426, we can speed up `allequal` by unrolling the loop (up to a cap, 32 chosen by convention). I suppose this is not particularly a super common bottleneck, but we may as well be faster where possible.

master:
```
julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 5))
BenchmarkTools.Trial: 10000 samples with 998 evaluations per sample.
 Range (min … max):  13.861 ns … 8.303 μs  ┊ GC (min … max): 0.00% … 99.17%
 Time  (median):     18.412 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   33.582 ns ± 122.345 ns ┊ GC (mean ± σ): 6.08% ± 1.71%
 13.9 ns  Histogram: log(frequency) by time  83.2 ns <
 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 12))
BenchmarkTools.Trial: 624 samples with 997 evaluations per sample.
 Range (min … max):  16.090 ns … 42.490 μs  ┊ GC (min … max): 0.00% … 73.54%
 Time  (median):     10.193 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.034 μs ± 4.193 μs    ┊ GC (mean ± σ):  0.62% ± 2.94%
 16.1 ns  Histogram: frequency by time  11.3 μs <
 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 56))
BenchmarkTools.Trial: 480 samples with 1 evaluation per sample.
 Range (min … max):  9.840 ms … 48.062 ms  ┊ GC (min … max): 0.00% … 76.38%
 Time  (median):     10.312 ms             ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.399 ms ± 1.744 ms  ┊ GC (mean ± σ):  0.74% ± 3.49%
 9.84 ms  Histogram: frequency by time  11.5 ms <
 Memory estimate: 1.45 MiB, allocs estimate: 27954.
```

PR:
```
julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 5))
BenchmarkTools.Trial: 10000 samples with 998 evaluations per sample.
 Range (min … max):  14.445 ns … 91.516 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.868 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   16.809 ns ± 1.603 ns   ┊ GC (mean ± σ):  0.00% ± 0.00%
 14.4 ns  Histogram: frequency by time  19.6 ns <
 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 12))
BenchmarkTools.Trial: 952 samples with 998 evaluations per sample.
 Range (min … max):  15.697 ns … 20.862 μs  ┊ GC (min … max): 0.00% … 62.59%
 Time  (median):     6.387 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.256 μs ± 3.257 μs    ┊ GC (mean ± σ):  0.48% ± 2.84%
 15.7 ns  Histogram: frequency by time  9.37 μs <
 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 56))
BenchmarkTools.Trial: 645 samples with 1 evaluation per sample.
 Range (min … max):  6.847 ms … 23.438 ms  ┊ GC (min … max): 0.00% … 62.03%
 Time  (median):     7.830 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.730 ms ± 827.062 μs ┊ GC (mean ± σ):  0.29% ± 2.44%
 6.85 ms  Histogram: frequency by time  9.08 ms <
 Memory estimate: 488.16 KiB, allocs estimate: 9482.
```
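the "unroll up to a cap" idea described for `allequal` could look roughly like the sketch below. This is an assumption-laden illustration, not the code merged in #61433: `my_allequal`, `_allequal_unrolled`, and `UNROLL_CAP` are hypothetical names, and only the value 32 comes from the description above.

```julia
# Hypothetical names throughout; 32 mirrors the cap mentioned in the text.
const UNROLL_CAP = 32

# Recursive comparison against the first element `x`. Each call sees the
# concrete type of `first(rest)`, so `isequal` dispatches statically.
_allequal_unrolled(x, ::Tuple{}) = true
_allequal_unrolled(x, rest::Tuple) =
    isequal(x, first(rest)) && _allequal_unrolled(x, Base.tail(rest))

function my_allequal(t::Tuple)
    isempty(t) && return true
    if length(t) <= UNROLL_CAP
        return _allequal_unrolled(first(t), Base.tail(t))
    end
    # Long tuples fall back to a plain loop to limit generated code size.
    x = first(t)
    for y in t
        isequal(x, y) || return false
    end
    return true
end
```

the cap trades a little speed on very long tuples for bounded compile time and code size, which is the same concern the "add threshold to limit codegen size" commit in this PR addresses.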

speed up hvcat_fill! by unrolling internal iteration for type-stability

c9bdae8

adienes added the performance (Must go faster) and arrays ([a, r, r, a, y, s]) labels Mar 28, 2026

adienes added 5 commits March 28, 2026 16:04

partially revert bootstrap order changes

6b9ee0a

add ndim path and make homogenous tuples do the same

87b2004

add threshold to limit codegen size

44a5675

fix return value for ndim path

71d5e70

unify hvcat_fill! and hvncat_fill! for more DRY and better maintainability

f8e4500

This was referenced Mar 29, 2026

Type stabilize hvcat_fill! #52028

Open

unroll tuple allequal for performance #61433

Merged

allocated doesn't include compilation

911b988

adienes added the status: waiting for PR reviewer label Mar 30, 2026

BioTurboNick mentioned this pull request Mar 30, 2026

Fix performance regression in hvcat of simple matrices #57422

Merged

adienes wants to merge 7 commits into JuliaLang:master from adienes:type_stable_hvcat_fill
