-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Description
The ProbabilisticMap (bloom filter) implementation we use in CoreLib to speed up IndexOfAny(char[]) operations has a vectorized path (Sse41.IsSupported || AdvSimd.Arm64.IsSupported).
The process is split into two parts: 1) building the map, and 2) searching through text using said map.
The actual data structure we end up with depends on whether vectorization is supported. That is, we must make the same decision re: vectorization support when building the map and when we're searching.
If they don't agree (e.g. SetCharBit assumes these intrinsics aren't supported, but IsCharBitSet does) we will report wrong results.
runtime/src/libraries/System.Private.CoreLib/src/System/SearchValues/ProbabilisticMap.cs
Lines 78 to 94 in 643087b
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | |
| private static void SetCharBit(ref uint charMap, byte value) | |
| { | |
| if (Sse41.IsSupported || AdvSimd.Arm64.IsSupported) | |
| { | |
| Unsafe.Add(ref Unsafe.As<uint, byte>(ref charMap), value & VectorizedIndexMask) |= (byte)(1u << (value >> VectorizedIndexShift)); | |
| } | |
| else | |
| { | |
| Unsafe.Add(ref charMap, value & PortableIndexMask) |= 1u << (value >> PortableIndexShift); | |
| } | |
| } | |
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | |
| private static bool IsCharBitSet(ref uint charMap, byte value) => Sse41.IsSupported || AdvSimd.Arm64.IsSupported | |
| ? (Unsafe.Add(ref Unsafe.As<uint, byte>(ref charMap), value & VectorizedIndexMask) & (1u << (value >> VectorizedIndexShift))) != 0 | |
| : (Unsafe.Add(ref charMap, value & PortableIndexMask) & (1u << (value >> PortableIndexShift))) != 0; |
This situation is called out in BOTR docs: https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/vectors-and-intrinsics.md#code-review-and-analyzer-rules-for-code-written-in-systemprivatecorelibdll
Within a single function that uses platform intrinsics, unless marked with the
CompExactlyDependsOnattribute it must behave identically regardless of whether IsSupported returns true or not. This allows the R2R compiler to compile with a lower set of intrinsics support, and yet expect that the behavior of the function will remain unchanged in the presence of tiered compilation.
The issue is that if we do apply CompExactlyDependsOn to these methods, we'll run into a viral attribute explosion.
The code analyzer in corelib will tell us that ProbabilisticMap.Contains should be marked, then all its callers like string.MakeSeparatorListAny, then string.SplitInternal, then all string.Split overloads, then ...
What should we do in such situations?
Just mark the two root methods (SetCharBit, IsCharBitSet) as AggressiveOptimization?