speed up hvcat_fill! by unrolling internal iteration for type-stability#61426
Open
adienes wants to merge 7 commits intoJuliaLang:masterfrom
Open
speed up hvcat_fill! by unrolling internal iteration for type-stability#61426adienes wants to merge 7 commits intoJuliaLang:masterfrom
hvcat_fill! by unrolling internal iteration for type-stability#61426adienes wants to merge 7 commits intoJuliaLang:masterfrom
Conversation
Member
Author
|
it looks like this is slightly slightly slower in heterogenous cases where the types don't promote to a concrete type, e.g. this seems acceptable to me given that it's much more rare, and will be slow anyway since you end up with an abstractly-typed collection. but just noting. |
This was referenced Mar 29, 2026
adienes
added a commit
that referenced
this pull request
Mar 31, 2026
in a similar vein to #61426, we can speed up `allequal` by unrolling the loop (up to a cap, 32 chosen by convention) I suppose this is not particularly a super common bottleneck but we may as well be faster where possible. master: ``` julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 5)) BenchmarkTools.Trial: 10000 samples with 998 evaluations per sample. Range (min … max): 13.861 ns … 8.303 μs ┊ GC (min … max): 0.00% … 99.17% Time (median): 18.412 ns ┊ GC (median): 0.00% Time (mean ± σ): 33.582 ns ± 122.345 ns ┊ GC (mean ± σ): 6.08% ± 1.71% ▅▇█▇▅▂ ▁▄▄▄▃▃▁ ▁▄▅▄▃▃▂▁ ▃▄▄▃▂▁▁▂▂▁ ▁▂▂▁ ▁▁▃▂ ▂ ██████▅▅▃▅▃▁▄▅███████▇▆████████▇▇▇███████████████▇▆▆█████▆▇▆ █ 13.9 ns Histogram: log(frequency) by time 83.2 ns < Memory estimate: 0 bytes, allocs estimate: 0. julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 12)) BenchmarkTools.Trial: 624 samples with 997 evaluations per sample. Range (min … max): 16.090 ns … 42.490 μs ┊ GC (min … max): 0.00% … 73.54% Time (median): 10.193 μs ┊ GC (median): 0.00% Time (mean ± σ): 8.034 μs ± 4.193 μs ┊ GC (mean ± σ): 0.62% ± 2.94% █ ▆▅▆ █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▆▅█▃▂▁▁▁▁▁▁▁▆███▆▃▃ ▃ 16.1 ns Histogram: frequency by time 11.3 μs < Memory estimate: 0 bytes, allocs estimate: 0. julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 56)) BenchmarkTools.Trial: 480 samples with 1 evaluation per sample. Range (min … max): 9.840 ms … 48.062 ms ┊ GC (min … max): 0.00% … 76.38% Time (median): 10.312 ms ┊ GC (median): 0.00% Time (mean ± σ): 10.399 ms ± 1.744 ms ┊ GC (mean ± σ): 0.74% ± 3.49% ▁▇ ▁▆▄▁▂▃▂▃▃▆█▆▃▄▂▁▁▁ ▁ ▄▄▃▅▇██▇██████████████████▇█▄▄▄▃▂▂▁▃▃▃▂▁▂▂▁▁▁▂▁▁▂▁▂▃▂▁▁▁▁▁▂ ▄ 9.84 ms Histogram: frequency by time 11.5 ms < Memory estimate: 1.45 MiB, allocs estimate: 27954. ``` PR ``` julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 5)) BenchmarkTools.Trial: 10000 samples with 998 evaluations per sample. Range (min … max): 14.445 ns … 91.516 ns ┊ GC (min … max): 0.00% … 0.00% Time (median): 16.868 ns ┊ GC (median): 0.00% Time (mean ± σ): 16.809 ns ± 1.603 ns ┊ GC (mean ± σ): 0.00% ± 0.00% ▅▃▁ █▁▁▂▁ ▁▂▄▅▄▃▄▄▄▃▂▂▂▁▂▄▇█████▇▅▃▃▂▃▇█████▇▄▃▂▂▃▃▃▄▄▄▅▄▄▃▂▂▁▁▁▁▁▁▁▁ ▃ 14.4 ns Histogram: frequency by time 19.6 ns < Memory estimate: 0 bytes, allocs estimate: 0. julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 12)) BenchmarkTools.Trial: 952 samples with 998 evaluations per sample. Range (min … max): 15.697 ns … 20.862 μs ┊ GC (min … max): 0.00% … 62.59% Time (median): 6.387 μs ┊ GC (median): 0.00% Time (mean ± σ): 5.256 μs ± 3.257 μs ┊ GC (mean ± σ): 0.48% ± 2.84% █ █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▃▂▂▃▃▃▃▄▄▃▄▄▄▃▃▄▄▅▄▄▃▃▃▄▃▃▃▃▃▂ ▃ 15.7 ns Histogram: frequency by time 9.37 μs < Memory estimate: 0 bytes, allocs estimate: 0. julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 56)) BenchmarkTools.Trial: 645 samples with 1 evaluation per sample. Range (min … max): 6.847 ms … 23.438 ms ┊ GC (min … max): 0.00% … 62.03% Time (median): 7.830 ms ┊ GC (median): 0.00% Time (mean ± σ): 7.730 ms ± 827.062 μs ┊ GC (mean ± σ): 0.29% ± 2.44% ▁▂▃▁ ▅█▄▁▁ ▃▇████▆█▇▆▄▄▄▄▃▄▃▃▃▃▃▄▄▄▇█████▇▇▆▇▄▅▄▃▄▅▄▃▄▃▃▄▃▃▃▄▃▃▃▃▂▃▁▃▂ ▄ 6.85 ms Histogram: frequency by time 9.08 ms < Memory estimate: 488.16 KiB, allocs estimate: 9482. ```
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
replaces #52028 with suggestion given in #52028 (comment)