Add Neon implementation of std::swap_ranges by hazzlim · Pull Request #5819 · microsoft/STL

hazzlim · 2025-10-31T11:02:09Z

Add an implementation of std::swap_ranges using Neon intrinsics.
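For readers unfamiliar with the approach, here is a minimal sketch of swapping two byte ranges 16 bytes at a time with Neon loads and stores. It is not the code from this PR: the names `swap_block_16` and `swap_ranges_sketch` are hypothetical, the header selection is an assumption about the toolchain, and a `memcpy` fallback is included only so the sketch builds on non-ARM64 targets.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: GCC/Clang (including clang-cl) expose Neon via <arm_neon.h>,
// and MSVC targeting ARM64 exposes it via <arm64_neon.h>.
#if defined(__aarch64__)
#include <arm_neon.h>
#define SKETCH_HAS_NEON 1
#elif defined(_M_ARM64)
#include <arm64_neon.h>
#define SKETCH_HAS_NEON 1
#else
#define SKETCH_HAS_NEON 0
#endif

// Hypothetical helper: swap one 16-byte block between a and b.
inline void swap_block_16(std::uint8_t* a, std::uint8_t* b) {
#if SKETCH_HAS_NEON
    const uint8x16_t va = vld1q_u8(a); // 128-bit load, no alignment requirement
    const uint8x16_t vb = vld1q_u8(b);
    vst1q_u8(a, vb);
    vst1q_u8(b, va);
#else
    // Portable fallback so the sketch also compiles on other architectures.
    std::uint8_t tmp[16];
    std::memcpy(tmp, a, 16);
    std::memcpy(a, b, 16);
    std::memcpy(b, tmp, 16);
#endif
}

// Hypothetical driver: vector blocks first, then a scalar tail.
// The two ranges must not overlap, as std::swap_ranges requires.
inline void swap_ranges_sketch(std::uint8_t* a, std::uint8_t* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        swap_block_16(a + i, b + i);
    }
    for (; i < n; ++i) {
        const std::uint8_t t = a[i];
        a[i] = b[i];
        b[i] = t;
    }
}
```

Both blocks are loaded before either store, so no temporary buffer is needed on the Neon path.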

hazzlim · 2025-10-31T11:05:07Z

@microsoft-github-policy-service agree company="Arm"

hazzlim · 2025-10-31T11:11:45Z

Hopefully I’ve done something reasonable here - please let me know if you would prefer a different approach when adding new implementations to vector_algorithms.cpp

The performance numbers are below. They look good apart from the size 1 case, which I think is due to the added overhead of the function call and the conditional checks now that std::swap_ranges is no longer inlined into the benchmark.

Benchmark	MSVC Speedup
std_swap_ranges<uint8_t, highly_aligned_allocator>/1	0.5x
std_swap_ranges<uint8_t, highly_aligned_allocator>/5	0.9x
std_swap_ranges<uint8_t, highly_aligned_allocator>/15	1.9x
std_swap_ranges<uint8_t, highly_aligned_allocator>/26	4.1x
std_swap_ranges<uint8_t, highly_aligned_allocator>/38	4.8x
std_swap_ranges<uint8_t, highly_aligned_allocator>/60	7.3x
std_swap_ranges<uint8_t, highly_aligned_allocator>/125	12.3x
std_swap_ranges<uint8_t, highly_aligned_allocator>/800	21.9x
std_swap_ranges<uint8_t, highly_aligned_allocator>/3000	22.4x
std_swap_ranges<uint8_t, highly_aligned_allocator>/9000	22.5x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/1	0.5x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/5	1x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/15	1.9x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/26	3.9x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/38	4.9x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/60	7.5x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/125	11.8x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/800	14.6x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/3000	15.1x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/9000	14.7x


Benchmark	clang-cl Speedup
std_swap_ranges<uint8_t, highly_aligned_allocator>/1	0.6x
std_swap_ranges<uint8_t, highly_aligned_allocator>/5	1.1x
std_swap_ranges<uint8_t, highly_aligned_allocator>/15	1.5x
std_swap_ranges<uint8_t, highly_aligned_allocator>/26	1.4x
std_swap_ranges<uint8_t, highly_aligned_allocator>/38	1.4x
std_swap_ranges<uint8_t, highly_aligned_allocator>/60	1.5x
std_swap_ranges<uint8_t, highly_aligned_allocator>/125	1.5x
std_swap_ranges<uint8_t, highly_aligned_allocator>/800	1x
std_swap_ranges<uint8_t, highly_aligned_allocator>/3000	1x
std_swap_ranges<uint8_t, highly_aligned_allocator>/9000	1x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/1	0.6x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/5	1.2x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/15	1.5x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/26	1x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/38	1.2x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/60	1.4x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/125	1.4x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/800	1x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/3000	1x
std_swap_ranges<uint8_t, not_highly_aligned_allocator>/9000	1.1x

StephanTLavavej · 2025-11-04T18:43:17Z

Thanks @hazzlim, this is awesome! 😻 I pushed some commits, can you rerun your benchmark measurements? We finally have ARM64 runtime testing in PR checks, but I won't be able to gather perf measurements myself until Feb 2026-ish. The changes to override /Os for ARM64, and to eliminate unnecessary loops, should improve performance but perhaps by an unobservable amount.

StephanTLavavej · 2025-11-04T23:42:30Z

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

hazzlim · 2025-11-05T10:32:56Z

> Thanks @hazzlim, this is awesome! 😻 I pushed some commits, can you rerun your benchmark measurements? We finally have ARM64 runtime testing in PR checks, but I won't be able to gather perf measurements myself until Feb 2026-ish. The changes to override /Os for ARM64, and to eliminate unnecessary loops, should improve performance but perhaps by an unobservable amount.

Nice, thanks for doing this! Not sure how I missed that we could remove the loops for len < 64, nice one 😺

I will re-run the perf and report, it may well be unobservable but I think it should also improve the LDP generation so that's a win :)
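The point about removing the loops for len < 64 can be illustrated with a portable sketch. Once a loop has consumed every full 64-byte block, fewer than 64 bytes remain, so each smaller power-of-two step can run at most once and needs only a bit test. This is an illustration under assumptions, not the PR's code: `swap_fixed` and `swap_bytes_sketch` are hypothetical names, `memcpy` stands in for the Neon loads and stores, and the exact set of tail sizes in the real implementation may differ.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: swap N bytes through a small stack temporary.
template <std::size_t N>
inline void swap_fixed(unsigned char* a, unsigned char* b) {
    unsigned char tmp[N];
    std::memcpy(tmp, a, N);
    std::memcpy(a, b, N);
    std::memcpy(b, tmp, N);
}

// Hypothetical driver: the ranges must not overlap.
inline void swap_bytes_sketch(void* first1, void* first2, std::size_t n) {
    auto* a = static_cast<unsigned char*>(first1);
    auto* b = static_cast<unsigned char*>(first2);

    // Only the 64-byte step loops.
    for (; n >= 64; n -= 64, a += 64, b += 64) {
        swap_fixed<64>(a, b);
    }

    // Here n < 64, so each set bit of n names a block that occurs exactly once.
    if (n & 32) { swap_fixed<32>(a, b); a += 32; b += 32; }
    if (n & 16) { swap_fixed<16>(a, b); a += 16; b += 16; }
    if (n & 8)  { swap_fixed<8>(a, b);  a += 8;  b += 8;  }
    if (n & 4)  { swap_fixed<4>(a, b);  a += 4;  b += 4;  }
    if (n & 2)  { swap_fixed<2>(a, b);  a += 2;  b += 2;  }
    if (n & 1)  { swap_fixed<1>(a, b); }
}
```

Replacing the tail loops with single conditional steps removes loop-carried branches on the short-input path, which is where the benchmark above showed the least benefit.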

StephanTLavavej · 2025-11-05T18:19:33Z

Thanks for working towards #813 and congratulations on your first microsoft/STL commit! 😻 🎉 🦾

This will ship in the MSVC Build Tools 14.51 in a future update to Visual Studio 2026.

hazzlim requested a review from a team as a code owner October 31, 2025 11:02

github-project-automation bot moved this to Initial Review in STL Code Reviews Oct 31, 2025

github-project-automation bot added this to STL Code Reviews Oct 31, 2025

Add Neon implementation of std::swap_ranges

7e0a46c

Add an implementation of std::swap_ranges using Neon intrinsics.

hazzlim force-pushed the swap-ranges-neon branch from d3e9269 to 7e0a46c Compare October 31, 2025 11:15

StephanTLavavej added the performance (Must go faster) and ARM64 (Related to the ARM64 architecture) labels Oct 31, 2025

Address whitespace & preprocessor review comments

55a3715

StephanTLavavej self-assigned this Oct 31, 2025

AlexGuteniev reviewed Oct 31, 2025

cpplearner reviewed Oct 31, 2025

hazzlim and others added 5 commits November 3, 2025 15:11

Remove unecessary ifdef

86896d1

ARM64 doesn't need legacy __std_swap_ranges_trivially_swappable.

250a17a

Override /Os for all architectures, before any function defns, cite GH 2108.

a16cb5a

Improve perf: Only the 64-byte step needs to loop.

2239670

Reduce the scope of _Mask_64.

3a42a4f

StephanTLavavej reviewed Nov 4, 2025

StephanTLavavej approved these changes Nov 4, 2025

StephanTLavavej removed their assignment Nov 4, 2025

StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Nov 4, 2025

StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Nov 4, 2025

StephanTLavavej merged commit 8784104 into microsoft:main Nov 5, 2025
41 checks passed

github-project-automation bot moved this from Merging to Done in STL Code Reviews Nov 5, 2025

StephanTLavavej removed this from STL Code Reviews Mar 4, 2026

StephanTLavavej merged 7 commits into microsoft:main from hazzlim:swap-ranges-neon
