<regex>: Successful negative lookahead assertions do not matche to capture groups#5269
<regex>: Successful negative lookahead assertions do not matche to capture groups#5269alexprabhat99 wants to merge 4 commits intomicrosoft:mainfrom
Conversation
|
I'd recommend running tests locally. |
| _Node_end_group* _Node = static_cast<_Node_end_group*>(_Nx); | ||
| _Node_capture* _Node0 = static_cast<_Node_capture*>(_Node->_Back); | ||
| if (_Cap || _Node0->_Idx != 0) { // update capture data | ||
| _Tgt_state._Grp_valid[_Node0->_Idx] = true; |
There was a problem hiding this comment.
This deletion does not just correct the matches for negative assertions. With this change, no capture group at all will ever be matched.
Changes to fix #5245 actually have to be applied here:
Lines 3575 to 3590 in fc15609
Specifically, _Tgt_state._Cur = _Ch; insufficiently resets the state after a successful negative assertion. (But it is the appropriate reset for positive assertions.)
There was a problem hiding this comment.
I have restored the initial state after a successful negative assertion.
There was a problem hiding this comment.
I think a test (like the one in #5245) should be added to verify that the behavior is correct after this PR.
Edit: It would also be nice to add test coverage which verifies that captures are recorded correctly for positive assertions as well.
| @@ -3583,7 +3580,11 @@ bool _Matcher<_BidIt, _Elem, _RxTraits, _It>::_Match_pat(_Node_base* _Nx) { // c | |||
| _Tgt_state = _St; | |||
There was a problem hiding this comment.
Pre-existing: This state reset after a match failure is both unnecessary and pointless, so I think it should be removed. No caller of _Match_pat() can assume that the match state hasn't changed after failure, because the function does not fully reset the state after trying to match two or more NFA nodes anyway.
stl/inc/regex
Outdated
| @@ -3583,7 +3580,11 @@ bool _Matcher<_BidIt, _Elem, _RxTraits, _It>::_Match_pat(_Node_base* _Nx) { // c | |||
| _Tgt_state = _St; | |||
| _Failed = true; | |||
| } else { | |||
There was a problem hiding this comment.
The opening brace after else leads to unnecessary nesting. You can just remove the braces and write else if instead.
stl/inc/regex
Outdated
| if (_Neg) { | ||
| _Tgt_state = _St; | ||
| } else { | ||
| _Tgt_state._Cur = _Ch; |
There was a problem hiding this comment.
Pre-existing: We can assign _St._Cur to _Tgt_state._Cur here instead of _Ch and eliminate the local variable _Ch. This should also decrease the amount of stack this function uses per call a bit, slightly reducing the danger of stack overflow (as in #997).
| } else if (!_Neg) { | ||
| _Tgt_state._Cur = _St._Cur; |
There was a problem hiding this comment.
This is not equivalent to the prior commit. Now the state is no longer being reset for a successful negative lookahead assertion at all.
(My comment about a superfluous _Tgt_state = _St; statement points a different line, namely the one in the failure case. The one removed by the last commit in the successful case isn't superfluous at all.)
This suggests we really need more test coverage for negative assertions if no test catches this.
|
Hi @alexprabhat99, do you have time to address the feedback here, or do you need help to get this ready? |
|
@muellerj2 There's absolutely no obligation or expectation, but if you know what needs to happen here and would like to pick up this PR, you can create a new one by starting with this one's commit and then adding whatever's necessary. @alexprabhat99 agreed to the CLA in #5261 (comment) so this should be perfectly fine, and keeping the original commits will result in co-author credit during merging. If you're too busy or would prefer not to, that's perfectly fine - I will look into this when I have time. |
Fixes: #5245