Conversation
Tagging subscribers to this area: @dotnet/area-system-io-hashing, @bartonjs, @vcsjones
Pull request overview
This PR refactors Adler-32’s SIMD implementation in System.IO.Hashing to a new strategy-based vectorized core, updates tests to stress delayed-modulo overflow scenarios, and wires the new SIMD source file into the build.
Changes:
- Added a new SIMD implementation (`Adler32Simd.cs`) with AVX2 / SSSE3 / Arm64 (incl. DP) selection and a shared vectorized update core.
- Updated `Adler32` to route vectorized updates through the new implementation and adjusted constant visibility.
- Modified the Adler32 tests to better stress overflow safety and expanded length coverage.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/libraries/System.IO.Hashing/tests/Adler32Tests.cs | Updates the large-input overflow-stress test and expands length coverage; removes a previous all-0xFF reference test. |
| src/libraries/System.IO.Hashing/src/System/IO/Hashing/Adler32Simd.cs | Introduces the new SIMD implementation and strategy abstractions for vectorized Adler32 updates. |
| src/libraries/System.IO.Hashing/src/System/IO/Hashing/Adler32.cs | Simplifies vectorization gating and delegates SIMD updates to the new implementation; exposes ModBase internally. |
| src/libraries/System.IO.Hashing/src/System.IO.Hashing.csproj | Includes the new Adler32Simd.cs in .NETCoreApp builds. |
```csharp
{
    data[i] = (byte)('a' + (i % 26));
}
```
This test (and the other removed test) didn't check the actual boundary condition. For example, if NMax is changed to 8192 in the Adler32 implementation, the tests still pass.
The updated test correctly breaks if NMax is set as small as 5553.
```csharp
    => Vector128.IsHardwareAccelerated && source.Length >= Vector128<byte>.Count;

private static uint UpdateVectorized(uint adler, ReadOnlySpan<byte> source)
    => Adler32Simd.UpdateVectorized(adler, source);
```
Why separate it out like this?
The JIT tends to special case 1 level of inlining differently from 2+ levels of inlining and so simple forwarders like this can hurt things more than help.
The SIMD implementation is all in file-scoped types, so it has to be called from something in this file. I could make those types nested private, but since there are so many, I was trying to keep them entirely local. If you prefer the nested approach, I can easily change it, though I don't foresee any issues with inlining limits here given the core method is intentionally marked NoInlining.
```csharp
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static uint UpdateVectorized(uint adler, ReadOnlySpan<byte> source)
{
    if (Vector256.IsHardwareAccelerated && Avx2.IsSupported)
```
What hardware were you testing AVX512 on? I wouldn't expect it to be slower than Vector256 at all and at worst the same speed here.
I'm on an AMD Zen5 (see full benchmark details in the PR description).
The AVX-512 implementation adds some extra high-latency calculations to the inner loop, so it's expected to be slower. It can't be made to match the perf of AVX2 using the same logic widened, because vpmaddubsw will overflow with the larger multipliers required for the wider inputs, and any adjustment made to prevent overflow breaks the pipelining. The whole thing is very latency-sensitive.
A fast Vector512 implementation could be done with the AVX10.2 unsigned dot product instructions (vpdpbuud), but I didn't bother with that since the hardware still doesn't exist in the wild. I could add it preemptively if you like.
You should be able to treat it as 2x256 instead of as 1x512 to avoid the wider multiplier issue and still get the perf gains.
Yes, that's what this PR does to improve 2x over main. Main uses 1x256 or 1x512. 2x256 is faster than either.
I would expect that actual 2x256 (i.e. effectively unrolling) should be slightly slower than using actual 512 and treating it as 2x256, namely due to the denser code and not needing to manually pipeline the instructions. I would not expect the V512 path to be slower.
This already gets instruction-level parallelism. At best, you could match the perf with V512, but that's a lot of extra complexity for nothing.
```csharp
if (Dp.IsSupported)
{
    return UpdateCore<AdlerVector128, AccumulateArm64, DotProductArm64Dp>(adler, source);
}

return UpdateCore<AdlerVector128, AccumulateArm64, DotProductArm64>(adler, source);
```
What is the perf difference between these two paths? Is it worth the additional complexity here?
The DP implementation is roughly twice as fast as base AdvSimd. Without it, this PR is only about 5% faster than Main on Arm64.
```csharp
    return UpdateCore<AdlerVector128, AccumulateArm64, DotProductArm64>(adler, source);
}

return UpdateCore<AdlerVector128, AccumulateXplat, DotProductXplat>(adler, source);
```
What is the perf difference of the above code paths with the xplat path (all platforms)?
Xplat is around 1/3 the speed of native on both x64 (if restricted to Vector128) and Arm64 (if restricted to AdvSimd base).
```csharp
// This is further optimized to: `high * 16 - high + low`
// and implemented as: `(high << 4) - high + low`.

Vector128<uint> vlo = values & (Vector128<uint>.AllBitsSet >>> 16);
```
Why not `Vector128.Create<uint>(ushort.MaxValue)`?
I actually wrote the code as I wanted it to be interpreted by JIT, i.e. pcmpeqd+psrld, but it gets constant folded and treated as a memory load anyway.
```csharp
{
    Vector128<byte> bytes1 = Vector128.LoadUnsafe(ref sourceRef);
    Vector128<byte> bytes2 = Vector128.LoadUnsafe(ref sourceRef, (uint)Vector128<byte>.Count);
    sourceRef = ref Unsafe.Add(ref sourceRef, (uint)Vector128<byte>.Count * 2);
```
We're looking at ways to reduce or otherwise remove unsafe code like this. While we can't really remove LoadUnsafe, we have found it significantly less error-prone to never update sourceRef and to track the relevant offset indices instead, as it reduces risk of accidental GC holes.
This is the exact same reference math done in the current implementation. What's changed between last week when that was approved and now?
A comment was given then too. Last week's change was mainly taking the existing code "as is" and extending it for the parameterization.
This PR touches the code with a more significant, non-critical rewrite (even with quite a lot of it being the same and just moved down for sharing). Since we're actively working to reduce unsafe usage where feasible, ideally we fix this up rather than continuing to carry the problematic pattern forward.
I think you've mixed up the Adler32 and parameterized CRC32/64 PRs. Vectorized Adler32 was 100% new in #124409.
It's certainly possible to move to a buffer offset, but I think any code using LoadUnsafe requires scrutiny, and it's trivially provable that this code does not have the potential to create a GC hole.
```csharp
public static Vector256<uint> DotProduct(Vector256<uint> addend, Vector256<byte> left, Vector256<byte> right)
{
    Vector256<short> mad = Avx2.MultiplyAddAdjacent(left, right.AsSByte());
    return Avx2.MultiplyAddAdjacent(mad, Vector256<short>.One).AsUInt32() + addend;
```
Shouldn't this second one be HorizontalAddSaturate? The multiply by 1 is unnecessarily adding 2 cycles.
It's a widen + add pairwise. There's actually a single dot product instruction that does it all in AvxVnni (vpdpbusd), but that creates a dependency on the uint accumulator that ends up making it just slightly slower than this form on the two Intel machines and one AMD machine I tried. Though, as mentioned above, I believe the unsigned form of dot product could be used to make a faster Vector512 implementation.
> It's a widen + add pairwise
Right, but the whole setup here is effectively just doing Sum(left * right) (reducing down to multiple 32-bit results, instead of one 8-bit result), which I'm pretty sure can be simplified to less than 5 cycles on anything Skylake or newer
Right, and that's a dot product. The only widening dot product instructions I'm familiar with for x86 are VNNI. What exact instruction sequence are you thinking of?
```csharp
wps += ws1;

ws1 = Accumulate(ws1, bytes1, bytes2);
```
Why do we need to be doing a full reduction every loop iteration here for wps?
It seems like a simple widen + add, with the reduction done only outside the loop, should be plenty sufficient here and would likely provide a bigger perf win.
These are different accumulators that move at different rates. The only easy thing to factor out is the multiplication of the previous sum by the number of bytes that it would be added to each iteration, and that's already done.
```csharp
ws2 = DotProduct(ws2, bytes1, weights1);
ws3 = DotProduct(ws3, bytes2, weights2);
```
Similar here. This is effectively just doing ws2 + Sum(bytes1 * weights1) and ws3 + Sum(bytes2 * weights2)
It isn't clear why the sum at this point is actually needed every inner loop iteration and why it couldn't be hoisted "out" so that it's done only in the outer loop
It could be hoisted out, but then you still have to widen each element before accumulating, which is still expensive. See the first attempt at Arm64 acceleration in Stephen's PR for an idea what that looks like. It's not faster.
Widening is significantly cheaper and more pipelineable (and at least on AVX-512 has single-instruction, single-cycle versions that go to wider registers).
I would expect decent savings if we were only widening and not doing the reductions per inner loop iteration, particularly that would simplify the algorithm and allow better code sharing across all these platforms.
I gave that a quick try again (tried it before a long time ago but didn't keep the result as it wasn't worthwhile). It's still slower.
If you think you can do better than this implementation, be my guest. This was the best perf I could get, on every platform.
We're notably not strictly looking for the "best perf" on every platform.
Rather, we're looking for good enough perf given the complexity, expected payload sizes, and real world impact (not every function is going to be a bottleneck or matter if its taking 200ns vs 400ns).
So part of what's being considered here is whether the extra code complexity, generics, impact to NAOT image size, or other scenarios, etc are meaningful enough compared to just having the simpler and very slightly slower code.
With this being a case where I expect we can remove most of the per-ISA customization and still get "close enough" or even matching performance on most hardware, particularly when not simply looking at the latest Intel or AMD, but rather at the broad range of typical hardware targets, which tend to be a bit older (Haswell, Skylake, Ice Lake, Zen 2, etc.).
> extra code complexity
Although this is more lines of code, I'd argue it's less complex, simply because it's more consistent. The current implementation uses different logic for different vector widths, inconsistent variable names, etc. The new implementation uses the same skeleton for all, with only very small kernels abstracted away per-platform and with names that are easy to follow.
> impact to NAOT image size
It should be noted, this generic approach is an improvement for NAOT code size, because e.g. on x64, instead of dynamically dispatching between up to 3 different implementations depending on ISA support and input size, this will choose exactly 1, which will always be used for any input >= 16 bytes.
In the case of Arm64, it will compile up to 2 potential versions of the core method, but it moves the ISA check out to the dispatcher, avoiding dynamic checks in the inner loop. And if 2x performance isn't good enough to justify a second copy, why are we bothering to implement Vector256 (not to mention the Vector512 implementation I got rid of and which you've argued to bring back)?
This replaces the vectorized Adler32 implementation added in #124409
Major Differences
- Removed the `Vector512` implementation, which was about 20% slower than the `Vector256` implementation on compatible hardware.
- Improved the `Vector256` implementation by taking better advantage of pipelining to compensate for high-latency instructions.
- Defers the modulo reduction for up to `NMax` bytes, speeding large input processing.

In all, this amounts to a roughly 2x perf increase on large inputs, and even more on small inputs that are not an even multiple of the vector size.
Benchmark Summary
x64
Arm64
Detailed Benchmark Results
AMD AVX-512 (Zen 5)
Arm64 (Windows Dev Kit 2023)
Intel AVX2 (Skylake)