<regex>: Move saved match state to the heap#5682
Merged
StephanTLavavej merged 1 commit intomicrosoft:mainfrom Aug 25, 2025
Merged
<regex>: Move saved match state to the heap#5682StephanTLavavej merged 1 commit intomicrosoft:mainfrom
<regex>: Move saved match state to the heap#5682StephanTLavavej merged 1 commit intomicrosoft:mainfrom
Conversation
StephanTLavavej
approved these changes
Aug 22, 2025
Member
|
Thanks as always for the detailed explanation and careful changes, looks perfect! 😻 |
Member
|
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
Member
|
Thanks for taking the first step towards this long-awaited impossible dream! 😻 🏃 🎉 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is the first minor step on the journey of making the matcher non-recursive to deal with #997 and #1528.
This moves all instances of
_Tgt_stateor_Bt_statefrom the stack to the heap (into avector). These objects store the current match state and are used to restore it when backtracking. In functions, the_Tgt_stateobjects in the vector are referenced using their indices (and not using pointers because the vector might reallocate).Note that we have to pop the frames from the stack whenever we return during backtracking, but we don't have to worry about exceptions: If an exception occurs, the whole
_Matcher2object will be destructed.Since these objects are large, this slightly reduces stack pressure: Copying #997 to a test and running it with the usual matrix, the first asan configs now start failing after 258 instead of 249 repetitions.
This PR adds no additional tests because all existing regex tests already provide coverage for this change and there is no obvious gap where further tests are needed. It does add internal checks, though, to verify that we don't accidentally fail to pop from the stack somewhere when backtracking.
The impact on performance is mixed: On the one hand, the change can improve performance when regex matching involves a substantial amount of repeated backtracking, because it avoids reallocations (and
regex_searchusually backtracks completely at least once because the initial match fails). On the other hand, the new vector performs additional allocations, which noticeably worsens performance when individual_Tgt_state/_Bt_stateobjects don't allocate at all after #5518, such as the patterns(:?bibe)+and((?!lorem)bibe)in theregex_searchbenchmark. (In the long run, we probably want to use a vector variant implementing small vector optimization here.)Benchmark for
regex_search: