unroll tuple allequal for performance#61433

Merged: adienes merged 4 commits into JuliaLang:master from adienes:tuple_allequal on Mar 31, 2026

@adienes (Member) commented Mar 29, 2026

In a similar vein to #61426, we can speed up allequal on tuples by unrolling the loop (up to a cap; 32 is chosen by convention).

This is probably not a common bottleneck, but we may as well be faster where possible.
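As a rough illustration of the idea, here is a minimal sketch of an unrolled tuple comparison with a length cap. It is not the code merged in this PR: the names `allequal_sketch`, `_alleq_unrolled`, and `UNROLL_CAP` are hypothetical, and the recursive "peel off the first element" pattern is just one common way to give the compiler a chance to unroll work over short tuples. It compares with `isequal`, as `allequal` is documented to do.

```julia
# Hypothetical sketch, not the implementation from this PR.
const UNROLL_CAP = 32  # cap value mentioned in the PR description

# Recursive comparison over the remaining elements. Because the tuple length is
# part of its type, the compiler can expand this recursion for short tuples.
_alleq_unrolled(x, ::Tuple{}) = true
_alleq_unrolled(x, rest::Tuple) =
    isequal(x, first(rest)) && _alleq_unrolled(x, Base.tail(rest))

function allequal_sketch(t::Tuple)
    # Empty and single-element tuples are trivially all-equal.
    length(t) <= 1 && return true
    if length(t) <= UNROLL_CAP
        # Short tuple: compare every later element against the first one.
        return _alleq_unrolled(first(t), Base.tail(t))
    end
    # Long tuple: fall back to a plain loop to avoid excessive code generation.
    x = first(t)
    for i in 2:length(t)
        isequal(x, t[i]) || return false
    end
    return true
end
```

The cap trades compile time and code size against run time: below it the comparisons are expanded, above it an ordinary loop is used.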

master:

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 5))
BenchmarkTools.Trial: 10000 samples with 998 evaluations per sample.
 Range (min … max):  13.861 ns …   8.303 μs  ┊ GC (min … max): 0.00% … 99.17%
 Time  (median):     18.412 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   33.582 ns ± 122.345 ns  ┊ GC (mean ± σ):  6.08% ±  1.71%

  ▅▇█▇▅▂        ▁▄▄▄▃▃▁  ▁▄▅▄▃▃▂▁   ▃▄▄▃▂▁▁▂▂▁ ▁▂▂▁   ▁▁▃▂     ▂
  ██████▅▅▃▅▃▁▄▅███████▇▆████████▇▇▇███████████████▇▆▆█████▆▇▆ █
  13.9 ns       Histogram: log(frequency) by time      83.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 12))
BenchmarkTools.Trial: 624 samples with 997 evaluations per sample.
 Range (min … max):  16.090 ns … 42.490 μs  ┊ GC (min … max): 0.00% … 73.54%
 Time  (median):     10.193 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):    8.034 μs ±  4.193 μs  ┊ GC (mean ± σ):  0.62% ±  2.94%

  █                                                    ▆▅▆     
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▆▅█▃▂▁▁▁▁▁▁▁▆███▆▃▃ ▃
  16.1 ns         Histogram: frequency by time        11.3 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 56))
BenchmarkTools.Trial: 480 samples with 1 evaluation per sample.
 Range (min … max):   9.840 ms … 48.062 ms  ┊ GC (min … max): 0.00% … 76.38%
 Time  (median):     10.312 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.399 ms ±  1.744 ms  ┊ GC (mean ± σ):  0.74% ±  3.49%

       ▁▇ ▁▆▄▁▂▃▂▃▃▆█▆▃▄▂▁▁▁ ▁                                 
  ▄▄▃▅▇██▇██████████████████▇█▄▄▄▃▂▂▁▃▃▃▂▁▂▂▁▁▁▂▁▁▂▁▂▃▂▁▁▁▁▁▂ ▄
  9.84 ms         Histogram: frequency by time        11.5 ms <

 Memory estimate: 1.45 MiB, allocs estimate: 27954.

PR:

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 5))
BenchmarkTools.Trial: 10000 samples with 998 evaluations per sample.
 Range (min … max):  14.445 ns … 91.516 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.868 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   16.809 ns ±  1.603 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     ▅▃▁       █▁▁▂▁                           
  ▁▂▄▅▄▃▄▄▄▃▂▂▂▁▂▄▇█████▇▅▃▃▂▃▇█████▇▄▃▂▂▃▃▃▄▄▄▅▄▄▃▂▂▁▁▁▁▁▁▁▁ ▃
  14.4 ns         Histogram: frequency by time        19.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 12))
BenchmarkTools.Trial: 952 samples with 998 evaluations per sample.
 Range (min … max):  15.697 ns … 20.862 μs  ┊ GC (min … max): 0.00% … 62.59%
 Time  (median):      6.387 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):    5.256 μs ±  3.257 μs  ┊ GC (mean ± σ):  0.48% ±  2.84%

  █                                                            
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▃▂▂▃▃▃▃▄▄▃▄▄▄▃▃▄▄▅▄▄▃▃▃▄▃▃▃▃▃▂ ▃
  15.7 ns         Histogram: frequency by time        9.37 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 56))
BenchmarkTools.Trial: 645 samples with 1 evaluation per sample.
 Range (min … max):  6.847 ms …  23.438 ms  ┊ GC (min … max): 0.00% … 62.03%
 Time  (median):     7.830 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.730 ms ± 827.062 μs  ┊ GC (mean ± σ):  0.29% ±  2.44%

    ▁▂▃▁                   ▅█▄▁▁                               
  ▃▇████▆█▇▆▄▄▄▄▃▄▃▃▃▃▃▄▄▄▇█████▇▇▆▇▄▅▄▃▄▅▄▃▄▃▃▄▃▃▃▄▃▃▃▃▂▃▁▃▂ ▄
  6.85 ms         Histogram: frequency by time        9.08 ms <

 Memory estimate: 488.16 KiB, allocs estimate: 9482.
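When reading these numbers, it may help to look at what the benchmark `setup` expression produces. The sketch below reuses the same generator, `ntuple(i -> rand((1.0, 2)), n)`, and inspects one sample; the variable names are illustrative only. Each element is drawn from the tuple `(1.0, 2)`, so a sample can mix `Float64` and `Int` elements, and the concrete tuple type can differ from one sample to the next.

```julia
# Same element generator as the benchmark setup in the PR description.
t = ntuple(i -> rand((1.0, 2)), 5)

# Each element is drawn from the tuple (1.0, 2), so it is a Float64 or an Int.
element_types = typeof.(t)

# Number of distinct element types in this particular sample: 1 or 2.
n_distinct_types = length(unique(element_types))

println(typeof(t), " has ", n_distinct_types, " distinct element type(s)")
```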

@adienes added the performance, collections, and equality labels on Mar 29, 2026
@adienes added the merge me label and removed the status: waiting for PR reviewer label on Mar 31, 2026
@adienes merged commit 582a6e6 into JuliaLang:master on Mar 31, 2026
8 of 10 checks passed
@adienes deleted the tuple_allequal branch on March 31, 2026 23:36
@adienes removed the merge me label on Mar 31, 2026

Labels

collections: Data structures holding multiple items, e.g. sets
equality: Issues relating to equality relations: ==, ===, isequal
performance: Must go faster

2 participants