Use the Boyer-Moore search algorithm for String.IndexOf of large strings #6560

The current implementation of `String.IndexOf(string)` [uses a naive loop](https://github.com/dotnet/coreclr/blob/master/src/utilcode/newapis.cpp#L598-L608) to check whether a substring of some given text matches a pattern. Although this implementation is simple and easy to understand, it is inefficient for large inputs: when there is no match it has to try all _m - n + 1_ alignments of the pattern against the text, and in the worst case it compares up to _n_ characters at each one, for _O(mn)_ character comparisons overall. Here _m_ is the length of the text and _n_ is the length of the pattern string.
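For reference, the naive approach can be sketched roughly as follows. This is an illustration in Python, not the actual coreclr code; the name `naive_index_of` is made up for this example, and it compares characters ordinally, ignoring the culture-sensitive comparison the real API may perform.

```python
def naive_index_of(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(text), len(pattern)
    # Try every alignment of the pattern against the text: m - n + 1 in total.
    for start in range(m - n + 1):
        matched = 0
        # Compare left to right until a mismatch or a full match.
        while matched < n and text[start + matched] == pattern[matched]:
            matched += 1
        if matched == n:
            return start
    return -1
```

After a mismatch the loop always advances by exactly one position, no matter what it has already learned about the text, which is where the wasted work comes from.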

We should instead use the [Boyer-Moore search algorithm](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm) to implement this function. It performs much better for larger patterns, since it lets us skip over some characters of the text based on tables built by pre-processing the pattern. [This Stack Overflow answer](http://stackoverflow.com/questions/6207819/boyer-moore-algorithm-understanding-and-example) explains how it works, and the Wikipedia article has sample implementations in C and Java.
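To make the "skip over" idea concrete, here is a minimal Python sketch that uses only the bad-character rule. Full Boyer-Moore also has a good-suffix rule, which is omitted here, so treat this as a simplified variant rather than the complete algorithm or the implementation proposed in this issue. The function names are hypothetical and the comparison is ordinal.

```python
def build_last_occurrence(pattern):
    """Map each character to the index of its last occurrence in pattern."""
    # Later indices overwrite earlier ones, so the rightmost position wins.
    return {ch: i for i, ch in enumerate(pattern)}


def bad_char_index_of(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(text), len(pattern)
    if n == 0:
        return 0
    last = build_last_occurrence(pattern)  # the pre-processing step
    start = 0
    while start <= m - n:
        # Compare right to left, starting at the end of the pattern.
        j = n - 1
        while j >= 0 and text[start + j] == pattern[j]:
            j -= 1
        if j < 0:
            return start
        # Mismatch on text[start + j]: shift so the pattern's last occurrence
        # of that character lines up with it, or jump past it entirely if the
        # character does not occur in the pattern. Always move at least 1.
        shift = j - last.get(text[start + j], -1)
        start += max(1, shift)
    return -1


print(bad_char_index_of("here is a simple example", "example"))  # 17
```

When the mismatched text character does not appear in the pattern at all, the shift is `j + 1`, which can be as large as the whole pattern length. That is why longer patterns tend to benefit most.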

**edit:** Took me a day to wrap my head around the algorithm, but I have an implementation [here](https://gist.github.com/jamesqo/9a34e9edc4a9452a021653ff357a5a24) and it seems to work well.

**edit 2:** Maybe this would benefit functions like `Replace` or `Split` more. Those functions have to make a pass through the entire string anyway, whereas with `IndexOf` we stop at the first match.
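As a rough illustration of that point, the sketch below finds every non-overlapping match in one pass, which is the kind of scan a `Replace` or `Split` would need. The table is built once and reused for the whole text, so the pre-processing cost is spread over all matches. It is a Python sketch using only the bad-character rule, with a hypothetical name `find_all`; returning an empty list for an empty pattern is an assumption made for this example.

```python
def find_all(text, pattern):
    """Return start indices of all non-overlapping matches, left to right."""
    m, n = len(text), len(pattern)
    if n == 0:
        return []  # assumption: an empty pattern yields no matches here
    # Built once, then reused for every alignment in the scan.
    last = {ch: i for i, ch in enumerate(pattern)}
    matches = []
    start = 0
    while start <= m - n:
        j = n - 1
        while j >= 0 and text[start + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(start)
            start += n  # continue after the match, as a replace would
        else:
            start += max(1, j - last.get(text[start + j], -1))
    return matches
```

For example, `find_all("a,b,,c", ",")` yields the separator positions a split on `","` would cut at.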

