Description
Background and motivation
It turns out to be quite common to search for whitespace (char.IsWhiteSpace) or for anything other than whitespace (!char.IsWhiteSpace). This is true not only in regex (\s and \S; according to our NuGet regex database there are ~13,000 occurrences of a regex that's simply \s) but also in many open-coded loops, e.g.
- https://github.com/dotnet/sdk/blob/4a27d47440dd3eaefbdb31135ef8867f6758161f/src/Tasks/Microsoft.NET.Build.Tasks/LockFileExtensions.cs#L127-L138
- https://github.com/JimmyCushnie/JimmysUnityUtilities/blob/834059548b2b392d692ddcf28194692e3ae7b2c1/Scripts/Extensions/Csharp%20types/StringBuilderExtensions.cs#L68-L77
- https://github.com/cake-build/cake/blob/a0298c0b5f76f819f0cc0d16ac9ef55d8b26adf9/src/Cake.Core/Configuration/Parser/ConfigurationParser.cs#L107-L117
- https://github.com/rubberduck-vba/Rubberduck/blob/3a9b233cf6ab519773d188e77d09ee8d8111bf49/Rubberduck.Core/UI/Refactorings/AnnotateDeclaration/AnnotationArgumentViewModel.cs#L195-L198
- https://github.com/aspnet/Razor/blob/5439cfe540084edd673b7ed626f2ec9cf3f13b18/src/Microsoft.AspNetCore.Razor.Language/DirectiveTokenEditHandler.cs#L36-L47
- https://github.com/OmniSharp/omnisharp-roslyn/blob/3ae5c8acd7ea3f03ab9e24c28280a320c573721a/src/OmniSharp.Cake/Configuration/Parser/ConfigurationParser.cs#L94-L104
- https://github.com/Unity-Technologies/UnityCsReference/blob/332310b494c5416cdae6c1209dbae7cfa6847c8d/Editor/Mono/Scripting/Compilers/MicrosoftResponseFileParser.cs#L195-L202
- https://github.com/PowerShell/PowerShell/blob/a2ee05400f8cb4a44cd87742f95ebc2c3472e649/src/System.Management.Automation/engine/parser/DebugViewWriter.cs#L1197-L1205
- https://github.com/mono/mono/blob/e2c5f4b0ad1a6b21ca0735f0b35b8611d4ad87b3/mcs/class/referencesource/System.Core/Microsoft/Scripting/Ast/DebugViewWriter.cs#L1155-L1162
- https://github.com/stripe/stripe-dotnet/blob/42dbc8371c5a4ee36df8933e6d72c2c2e3e41d2e/src/Stripe.net/Infrastructure/StringUtils.cs#L9
- https://github.com/mono/mono/blob/e2c5f4b0ad1a6b21ca0735f0b35b8611d4ad87b3/mcs/class/referencesource/System.Web/UI/Util.cs#L995-L1002
- https://github.com/VahidN/EFSecondLevelCache.Core/blob/1de038417ba22c40d9ebe411b67c9e1a7e4ad838/src/EFSecondLevelCache.Core/EFQueryExpressionVisitor.cs#L883-L893
- https://github.com/InstaSharp/InstaSharp/blob/7ab2aad6bdef175dd63620bf39f74fcf02696898/src/InstaSharp/Extensions/StringExtensions.cs#L13-L18
- https://github.com/FirelyTeam/Fhir.Metrics/blob/dd574b76077280299fd7104c754481a2e143ca72/src/Fhir.Metrics/Utils/Parser.cs#L57
- https://github.com/baohaojun/beagrep/blob/b1d56ef14d1d663d43b6af198600caa21623d2f2/Util/StringFu.cs#L388-L394
runtime/src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.Globalization.cs
Lines 17 to 25 in 264d739
public static bool IsWhiteSpace(this ReadOnlySpan<char> span)
{
    for (int i = 0; i < span.Length; i++)
    {
        if (!char.IsWhiteSpace(span[i]))
            return false;
    }
    return true;
}
runtime/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/DebugViewWriter.cs
Lines 1194 to 1204 in 264d739
private static bool ContainsWhiteSpace(string name)
{
    foreach (char c in name)
    {
        if (char.IsWhiteSpace(c))
        {
            return true;
        }
    }
    return false;
}
runtime/src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.Trim.cs
Lines 568 to 589 in 264d739
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span)
{
    int start = 0;
    for (; start < span.Length; start++)
    {
        if (!char.IsWhiteSpace(span[start]))
        {
            break;
        }
    }

    int end = span.Length - 1;
    for (; end > start; end--)
    {
        if (!char.IsWhiteSpace(span[end]))
        {
            break;
        }
    }

    return span.Slice(start, end - start + 1);
}
Etc. We should expose these as dedicated helpers, whether or not we're able to improve performance over a simple loop (we might be able to, for at least some kinds of input).
API Proposal
namespace System;
public static class MemoryExtensions
{
+ public static int IndexOfAnyWhiteSpace(this ReadOnlySpan<char> span);
+ public static int IndexOfAnyExceptWhiteSpace(this ReadOnlySpan<char> span);
+ public static int LastIndexOfAnyWhiteSpace(this ReadOnlySpan<char> span);
+ public static int LastIndexOfAnyExceptWhiteSpace(this ReadOnlySpan<char> span);
}
- This is only proposed for ReadOnlySpan<char> and not also Span<char>, since the most common case by far is expected to be spans derived from strings. The existing MemoryExtensions.IsWhiteSpace is also only exposed for ReadOnlySpan<char>.
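For illustration only, here is a minimal sketch of what the four proposed methods could do if implemented as plain loops over char.IsWhiteSpace. The class name WhiteSpaceSearchSketch and the TrimSketch helper are hypothetical and not part of the proposal; a real implementation would live on MemoryExtensions and might be vectorized for some inputs.

```csharp
using System;

// Hypothetical stand-in for the proposed MemoryExtensions additions.
// Every method is a simple scalar loop; no performance claims are implied.
public static class WhiteSpaceSearchSketch
{
    // Index of the first whitespace char, or -1 if there is none.
    public static int IndexOfAnyWhiteSpace(ReadOnlySpan<char> span)
    {
        for (int i = 0; i < span.Length; i++)
        {
            if (char.IsWhiteSpace(span[i])) return i;
        }
        return -1;
    }

    // Index of the first non-whitespace char, or -1 if there is none.
    public static int IndexOfAnyExceptWhiteSpace(ReadOnlySpan<char> span)
    {
        for (int i = 0; i < span.Length; i++)
        {
            if (!char.IsWhiteSpace(span[i])) return i;
        }
        return -1;
    }

    // Index of the last whitespace char, or -1 if there is none.
    public static int LastIndexOfAnyWhiteSpace(ReadOnlySpan<char> span)
    {
        for (int i = span.Length - 1; i >= 0; i--)
        {
            if (char.IsWhiteSpace(span[i])) return i;
        }
        return -1;
    }

    // Index of the last non-whitespace char, or -1 if there is none.
    public static int LastIndexOfAnyExceptWhiteSpace(ReadOnlySpan<char> span)
    {
        for (int i = span.Length - 1; i >= 0; i--)
        {
            if (!char.IsWhiteSpace(span[i])) return i;
        }
        return -1;
    }

    // Hypothetical example: a Trim-like helper built on the methods above.
    public static ReadOnlySpan<char> TrimSketch(ReadOnlySpan<char> span)
    {
        int start = IndexOfAnyExceptWhiteSpace(span);
        if (start < 0)
        {
            // Empty or all-whitespace input trims to an empty span.
            return ReadOnlySpan<char>.Empty;
        }

        int end = LastIndexOfAnyExceptWhiteSpace(span);
        return span.Slice(start, end - start + 1);
    }
}
```

The sketch uses ordinary static methods rather than extension methods so that it cannot clash with any same-named members on MemoryExtensions; with the proposed API the calls would read span.IndexOfAnyWhiteSpace() and so on.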
API Usage
e.g. MemoryExtensions.IsWhiteSpace could be rewritten as simply:
public static bool IsWhiteSpace(this ReadOnlySpan<char> span) => span.IndexOfAnyExceptWhiteSpace() < 0;
Alternative Designs
If we want to expose these but don't want them to be so prominent, once #68328 is implemented (assuming it sticks with the proposed design), this could instead be exposed as a static property on IndexOfAnyValues:
public static class IndexOfAnyValues
{
+ public static IndexOfAnyValues<char> WhiteSpace { get; }
}
in which case the same functionality could be achieved with:
int wsIndex = span.IndexOfAny(IndexOfAnyValues.WhiteSpace); // or IndexOfAnyExcept
The WhiteSpace property would cache a specialized concrete implementation that does what the proposed IndexOfAnyWhiteSpace would do.
Risks
No response