Optimize the start position of bounded look-behinds#11
Open
shilangyu wants to merge 7 commits intocaptureless-lookbehindsfrom
Open
Optimize the start position of bounded look-behinds#11shilangyu wants to merge 7 commits intocaptureless-lookbehindsfrom
shilangyu wants to merge 7 commits intocaptureless-lookbehindsfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR optimizes the starting position calculation for bounded look-behinds in the regex automata engine. Key changes include:
- Replacing the flat list of look-behind start states with a tree structure to capture nesting.
- Constructing and sorting a vector of look-behind tuples in PikeVM to ensure correct evaluation order.
- Updating the compiler and builder components to handle the new look-behind tree structure.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| regex-automata/src/nfa/thompson/pikevm.rs | Updated look-behind evaluation logic and added sorting of look-behind states. |
| regex-automata/src/nfa/thompson/nfa.rs | Refactored look-behind representation from a vector of state IDs to a tree structure. |
| regex-automata/src/nfa/thompson/compiler.rs | Modified look-around compilation to incorporate look-behind offset and nesting. |
| regex-automata/src/nfa/thompson/builder.rs | Updated builder API to construct the look-behind tree according to nesting paths. |
Comments suppressed due to low confidence (3)
regex-automata/src/nfa/thompson/nfa.rs:1579
- [nitpick] The condition using preorder in the Debug implementation is not immediately clear. Adding an inline comment to clarify its intent would help future maintainers understand the logic.
.iter().any(|i| !i.preorder(&|e| e.start_id() != sid))
regex-automata/src/nfa/thompson/compiler.rs:1060
- [nitpick] Clarify in a comment the assumption behind combining relative_start and maximum_len to calculate start_offset, ensuring that the resulting value correctly bounds the look-around start position.
let start_offset = match (relative_start, maximum_len) {
regex-automata/src/nfa/thompson/builder.rs:722
- [nitpick] The new start_lookbehind API relies on a nesting path. Adding a brief example or more detailed inline comment about the structure and expected format of 'path' would improve clarity.
pub fn start_lookbehind(&mut self, start_id: StateID, offset_from_start: Option<usize>, path: &[usize]) {
shilangyu
commented
Jun 5, 2025
Comment on lines
+977
to
+979
| *self.lookaround_index.borrow_mut() = SmallIndex::ZERO; | ||
| *self.lookbehind_nesting_path.borrow_mut() = vec![0]; | ||
| *self.current_lookbehind_offset_from_start.borrow_mut() = Some(0); |
Collaborator
Author
There was a problem hiding this comment.
Can/should be moved to the Builder?
a4b50d3 to
4740ceb
Compare
b97fb5a to
3d13971
Compare
4740ceb to
7bf7ba8
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.