
Add Neon implementation of is_sorted_until #6018

Merged: StephanTLavavej merged 2 commits into microsoft:main from hazzlim:is-sorted-until-pr on Jan 20, 2026


@hazzlim (Contributor) commented Jan 16, 2026

This PR adds a vectorized implementation of is_sorted_until using Neon intrinsics 🚀
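For readers unfamiliar with the technique, here is a minimal sketch of how is_sorted_until can be vectorized with Neon. It is not the code from this PR. It assumes AArch64, handles only std::int32_t, and the names is_sorted_until_i32 and sorted_prefix_length are hypothetical. Each iteration compares four adjacent pairs using two overlapping loads. Once a descending pair is detected, a scalar loop pinpoints its exact position; the same loop also handles the tail. The guard checks only the GCC/Clang macros __aarch64__ and __ARM_NEON, so on other targets (including MSVC, which defines _M_ARM64 instead) only the scalar loop is compiled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: returns a pointer to the first element that is less than
// its predecessor, or `last` if [first, last) is sorted in non-descending order.
inline const std::int32_t* is_sorted_until_i32(const std::int32_t* first, const std::int32_t* last) {
    assert(first <= last);
    if (last - first < 2) {
        return last; // zero or one element is always sorted
    }
    const std::int32_t* p = first;
#if defined(__aarch64__) && defined(__ARM_NEON)
    // Each iteration checks the 4 adjacent pairs (p[k], p[k + 1]) for k = 0..3.
    // The two loads overlap, so 5 readable elements are required.
    while (last - p >= 5) {
        const int32x4_t cur  = vld1q_s32(p);
        const int32x4_t next = vld1q_s32(p + 1);
        // Lane k is all-ones where p[k + 1] < p[k], i.e. an out-of-order pair.
        const uint32x4_t out_of_order = vcltq_s32(next, cur);
        if (vmaxvq_u32(out_of_order) != 0) {
            break; // the scalar loop below finds the exact lane
        }
        p += 4;
    }
#endif
    // Scalar loop: handles the tail and pinpoints the first out-of-order pair.
    for (; p + 1 != last; ++p) {
        if (p[1] < p[0]) {
            return p + 1;
        }
    }
    return last;
}

// Hypothetical convenience wrapper: length of the longest sorted prefix.
inline std::size_t sorted_prefix_length(const std::vector<std::int32_t>& v) {
    const std::int32_t* first = v.data();
    const std::int32_t* last  = first + v.size();
    return static_cast<std::size_t>(is_sorted_until_i32(first, last) - first);
}
```

For example, for {1, 2, 3, 4, 5, 6, 0, 8, 9}, sorted_prefix_length returns 6, the index of the 0. Breaking out to the scalar loop keeps the vector loop simple; a production version would more likely extract the lane index directly from the comparison mask.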

Performance numbers (speedup relative to the existing code, which is not manually vectorized; higher is better):

| Benchmark | MSVC Speedup | Clang Speedup |
|---|---|---|
| `bm_is_sorted_until<std::int8_t, AlgType::Std>/3000/1800` | 10 | 12.895 |
| `bm_is_sorted_until<std::int8_t, AlgType::Rng>/3000/1800` | 11.795 | 10.204 |
| `bm_is_sorted_until<std::int16_t, AlgType::Std>/3000/1800` | 5.674 | 5.6 |
| `bm_is_sorted_until<std::int16_t, AlgType::Rng>/3000/1800` | 6.551 | 5.442 |
| `bm_is_sorted_until<std::int32_t, AlgType::Std>/3000/1800` | 3.039 | 2.908 |
| `bm_is_sorted_until<std::int32_t, AlgType::Rng>/3000/1800` | 3.566 | 2.908 |
| `bm_is_sorted_until<std::int64_t, AlgType::Std>/3000/1800` | 1.549 | 1.507 |
| `bm_is_sorted_until<std::int64_t, AlgType::Rng>/3000/1800` | 1.899 | 1.581 |
| `bm_is_sorted_until<std::uint8_t, AlgType::Std>/3000/1800` | 9.673 | 12.436 |
| `bm_is_sorted_until<std::uint8_t, AlgType::Rng>/3000/1800` | 11.5 | 10.459 |
| `bm_is_sorted_until<std::uint16_t, AlgType::Std>/3000/1800` | 5.463 | 6.389 |
| `bm_is_sorted_until<std::uint16_t, AlgType::Rng>/3000/1800` | 6.389 | 6.944 |
| `bm_is_sorted_until<std::uint32_t, AlgType::Std>/3000/1800` | 3.017 | 3.172 |
| `bm_is_sorted_until<std::uint32_t, AlgType::Rng>/3000/1800` | 3.636 | 3.178 |
| `bm_is_sorted_until<std::uint64_t, AlgType::Std>/3000/1800` | 1.549 | 1.739 |
| `bm_is_sorted_until<std::uint64_t, AlgType::Rng>/3000/1800` | 1.818 | 1.581 |
| `bm_is_sorted_until<float, AlgType::Std>/3000/1800` | 3.939 | 3.297 |
| `bm_is_sorted_until<float, AlgType::Rng>/3000/1800` | 3.883 | 3.475 |
| `bm_is_sorted_until<double, AlgType::Std>/3000/1800` | 2.026 | 1.663 |
| `bm_is_sorted_until<double, AlgType::Rng>/3000/1800` | 2.016 | 1.7 |

@hazzlim requested a review from a team as a code owner January 16, 2026 13:23
github-project-automation bot moved this to Initial Review in STL Code Reviews Jan 16, 2026
@StephanTLavavej added the performance (Must go faster) and ARM64 (Related to the ARM64 architecture) labels Jan 16, 2026
@StephanTLavavej self-assigned this Jan 16, 2026
@StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Jan 17, 2026
@StephanTLavavej removed their assignment Jan 17, 2026
@StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Jan 20, 2026
@StephanTLavavej (Member)

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej merged commit d8e8e6f into microsoft:main Jan 20, 2026 (45 checks passed)
github-project-automation bot moved this from Merging to Done in STL Code Reviews Jan 20, 2026
@StephanTLavavej (Member)

🦾 6️⃣ 4️⃣

