-
Notifications
You must be signed in to change notification settings - Fork 5.4k
Use the Boyer-Moore search algorithm for String.IndexOf of large strings #6560
Description
The current implementation of String.IndexOf(string) uses a naive loop to check whether a substring of some given text matches a pattern. Although this implementation is simple and easy to understand, it is very inefficient because it has to iterate through a minimum of m - n + 1 characters of the text, where m is the length of the text and n is the length of the pattern string.
We should instead use the Boyer-Moore search algorithm to implement this function. It performs much better for larger patterns, since it allows us to 'skip over' some characters based on what we know from pre-processing the string. Here is an answer on StackOverflow that explains how it works, and the Wikipedia link has sample implementations in C/Java.
edit: Took me a day to wrap my head around the algorithm, but I have an implementation here and it seems to work well.
edit 2: Maybe this would benefit functions like Replace or Split more. Those functions have to make a pass through the entire string anyway, whereas with IndexOf we stop at the first match.