Fix #186. by BurntSushi · Pull Request #223 · rust-lang/regex

BurntSushi · 2016-05-01T18:05:57Z

This enables RegexSets to short-circuit when:

1. All patterns are anchored to the beginning of the input.
2. All patterns have either matched or will never match.

We make this happen by checking whether all NFA states in a DFA state
are match states, when a DFA match is observed. If all NFA states are
match states, and since all match states are final states, we know that
the current set of matches will never change. Since we don't care about
reporting location information, we can quit.

N.B. If no matches can be found, then the DFA will short circuit using its
normal mechanism.
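As a rough illustration of the idea (not the code in this PR), here is a minimal standard-library sketch. It assumes a toy model in which every "pattern" is a plain literal anchored at the start of the input, and the `pending` list stands in for the non-match NFA states of the current DFA state. The function name `anchored_set_matches` and its return shape are hypothetical and are not part of the regex crate.

```rust
/// Toy model of the short-circuit, assuming each pattern is a literal
/// anchored at the start of the input. Returns which patterns matched and
/// how many input bytes were examined before the scan stopped.
fn anchored_set_matches(patterns: &[&str], input: &str) -> (Vec<bool>, usize) {
    let mut matched = vec![false; patterns.len()];
    // Patterns that have neither matched nor failed yet. These play the
    // role of the non-match NFA states in the current DFA state.
    let mut pending: Vec<usize> = Vec::new();
    for (i, p) in patterns.iter().enumerate() {
        if p.is_empty() {
            matched[i] = true; // an empty literal matches immediately
        } else {
            pending.push(i);
        }
    }

    let bytes = input.as_bytes();
    let mut examined = 0;
    while examined < bytes.len() {
        // Short-circuit: every pattern has either matched or can never
        // match, so the set of matches cannot change. Since no match
        // locations are reported, the rest of the input can be skipped.
        // (If nothing matched at all, this is the ordinary "dead state" exit.)
        if pending.is_empty() {
            break;
        }
        let pos = examined;
        let b = bytes[pos];
        examined += 1;

        let mut still_pending = Vec::new();
        for &i in &pending {
            let p = patterns[i].as_bytes();
            if p[pos] != b {
                continue; // this pattern can never match now
            }
            if pos + 1 == p.len() {
                matched[i] = true; // reached a final match state
            } else {
                still_pending.push(i);
            }
        }
        pending = still_pending;
    }
    (matched, examined)
}
```

Under these assumptions, calling `anchored_set_matches(&["ab", "abc", "x"], "abcdefgh")` stops once the third byte has been read, because by then each literal has either matched or failed. A real DFA tracks sets of NFA states rather than literal positions, so this only mirrors the stopping condition, not the implementation.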


BurntSushi · 2016-05-01T18:06:32Z

cc @dprien @birkenfeld

birkenfeld · 2016-05-01T18:28:49Z

Thanks for the note! You'll be pleased to know that for my use case, the timings dropped from

test test::benches::highlight_html_001x ... bench:   4,413,488 ns/iter (+/- 97,944)

to a nice

test test::benches::highlight_html_001x ... bench:     278,142 ns/iter (+/- 31,189)

Alas, there's still no speedup compared to sequentially matching all expressions, which clocks in at

test test::benches::highlight_html_001x ... bench:     272,105 ns/iter (+/- 10,361)

This is possibly quite dependent on the individual lexer (the number of regexes per state, for example), so I'll definitely continue to try RegexSet with different configurations as I add them.

BurntSushi · 2016-05-01T18:38:19Z

That is great news that RegexSet now at least has comparable performance.

And you're right, it is dependent. In particular, on the number of mismatches before finding a match. For example, if you order your regexes by most likely to match to least likely, then depending on the distribution, one could definitely get comparable performance.

In any case, thank you so much for testing out this PR and confirming that it is, in fact, fixed. :-)

BurntSushi merged commit 4458410 into master May 1, 2016

BurntSushi deleted the fix-186 branch May 1, 2016 19:04

This was referenced Oct 13, 2025

chore(deps): bump regex from 1.11.1 to 1.12.1 dirvine/sb#40

Closed

chore(deps): bump regex from 1.11.1 to 1.12.2 dirvine/sb#42

Open
