Enable regex to use IndexOf(..., OrdinalIgnoreCase) for prefix searching #85438
stephentoub merged 3 commits into dotnet:main
As one of its many ways of finding the next possible match starting location, Regex recognizes a string known to start the expression and uses IndexOf to find it. With this change, it can also do so for OrdinalIgnoreCase. With improvements to IndexOf(..., OrdinalIgnoreCase), this now yields significantly faster searches through longer inputs, in addition to leading to simpler code in source generated regexes.
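To make the idea concrete, here is a minimal sketch of prefix-based candidate finding. It is not the Regex implementation; `PrefixSearchSketch` and `CandidateStarts` are hypothetical names, and the sample input and prefix are made up. It only shows how a known leading string can be located with `IndexOf(..., StringComparison.OrdinalIgnoreCase)` so that the full match logic would only need to run at those positions.

```csharp
using System;
using System.Collections.Generic;

static class PrefixSearchSketch
{
    // Hypothetical helper (not part of Regex): yields every index at which
    // `prefix` occurs in `input`, compared with OrdinalIgnoreCase.
    public static IEnumerable<int> CandidateStarts(string input, string prefix)
    {
        int pos = 0;
        while (pos <= input.Length - prefix.Length)
        {
            // string.IndexOf(string, int, StringComparison) searches from `pos` onward.
            int i = input.IndexOf(prefix, pos, StringComparison.OrdinalIgnoreCase);
            if (i < 0)
            {
                yield break; // no further candidates
            }

            yield return i;  // a real engine would try the rest of the pattern here
            pos = i + 1;     // resume just past this candidate
        }
    }

    static void Main()
    {
        // Assumed sample: an expression known to start with "error", matched case-insensitively.
        foreach (int start in CandidateStarts("info: ok; ERROR42 seen; Error7 seen", "error"))
        {
            Console.WriteLine(start);
        }
    }
}
```

Every position the helper skips is one where the full expression could not have matched, which is why the speed of the underlying `IndexOf` call matters most on longer inputs.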
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
With #85437, here's the benchmark https://github.com/dotnet/performance/blob/6dccc9979e9a99ebabee2a9b8b9e657c08c3f4a0/src/benchmarks/micro/libraries/System.Text.RegularExpressions/Perf.Regex.Industry.cs#L86 on my machine:
Note that without #85437, this PR will make some usage slower: the compiler / source generator already takes the same approach that IndexOf(..., OrdinalIgnoreCase) takes today, searching for a set of characters with IndexOfAny, but it frequently picks a better set of characters to search for based on frequency analysis. So we shouldn't merge this without the other PR (though this does have other benefits, like simpler codegen).
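For readers unfamiliar with the IndexOfAny-based approach mentioned in the note, the following is a rough sketch under stated assumptions, not the code the compiler or source generator emits. `IndexOfAnySketch`, `Find`, and `anchorOffset` are hypothetical names, and the sketch assumes an ASCII-letter prefix. It scans for either casing of one chosen character of the prefix and then verifies the whole prefix; choosing which character to scan for is where something like frequency analysis would come in.

```csharp
using System;

static class IndexOfAnySketch
{
    // Hypothetical helper: returns the index of the first case-insensitive
    // occurrence of `prefix` in `input`, or -1. It scans for either casing of
    // the prefix character at `anchorOffset` (assumed to be an ASCII letter).
    public static int Find(ReadOnlySpan<char> input, string prefix, int anchorOffset)
    {
        char lower = char.ToLowerInvariant(prefix[anchorOffset]);
        char upper = char.ToUpperInvariant(prefix[anchorOffset]);

        // Start at anchorOffset so the computed prefix start is never negative.
        int pos = anchorOffset;
        while (pos < input.Length)
        {
            int i = input.Slice(pos).IndexOfAny(lower, upper);
            if (i < 0)
            {
                return -1;
            }

            // Translate the anchor hit back to where the prefix would begin.
            int start = pos + i - anchorOffset;
            if (start + prefix.Length <= input.Length &&
                input.Slice(start, prefix.Length).Equals(prefix, StringComparison.OrdinalIgnoreCase))
            {
                return start;
            }

            pos += i + 1; // continue after this anchor hit
        }

        return -1;
    }

    static void Main()
    {
        // Anchoring on 'r' (offset 1) instead of 'e' (offset 0) is an arbitrary
        // choice for illustration; a rarer character means fewer false candidates.
        Console.WriteLine(Find("info: ok; ERROR42 seen", "error", 1));
    }
}
```

A rarely occurring anchor character means fewer verification attempts, which is the advantage the note attributes to the existing codegen when IndexOf(..., OrdinalIgnoreCase) picks its characters less carefully.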