Skip to content

Use the Boyer-Moore search algorithm for String.IndexOf of large strings #6560

@jamesqo

Description

@jamesqo

The current implementation of String.IndexOf(string) uses a naive loop to check whether a substring of some given text matches a pattern. Although this implementation is simple and easy to understand, it is very inefficient because it has to iterate through a minimum of m - n + 1 characters of the text, where m is the length of the text and n is the length of the pattern string.

We should instead use the Boyer-Moore search algorithm to implement this function. It performs much better for larger patterns, since it allows us to 'skip over' some characters based on what we know from pre-processing the string. Here is an answer on StackOverflow that explains how it works, and the Wikipedia link has sample implementations in C/Java.

edit: Took me a day to wrap my head around the algorithm, but I have an implementation here and it seems to work well.

edit 2: Maybe this would benefit functions like Replace or Split more. Those functions have to make a pass through the entire string anyway, whereas with IndexOf we stop at the first match.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area-System.Runtimedesign-discussionOngoing discussion about design without consensusenhancementProduct code improvement that does NOT require public API changes/additionstenet-performancePerformance related issue

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions