Implement TC39 duplicate named capturing groups proposal#136
Implement TC39 duplicate named capturing groups proposal#136ridiculousfish merged 4 commits intoridiculousfish:masterfrom
Conversation
Implements the Stage 4 TC39 proposal for duplicate named capturing groups
in regular expressions. This allows the same group name to be used in
different alternatives separated by the | operator.
Examples that now work:
- /(?<year>[0-9]{4})-[0-9]{2}|[0-9]{2}-(?<year>[0-9]{4})/
- /(?<x>a)(?<y>b)|(?<x>c)(?<y>d)/
Duplicate names are only allowed when they appear in different alternatives.
Using the same name twice in the same alternative (e.g., /(?<name>a)(?<name>b)/)
still correctly produces an error.
The implementation tracks alternative paths during parsing and deduplicates
named groups in match results, ensuring only one property per name appears
in the groups object, as specified by the TC39 proposal.
Changes:
- Modified parse_capture_groups() to track alternative branches
- Updated NamedGroups iterator to deduplicate group names
- Added comprehensive test cases
Note: Backreferences with duplicate named groups require additional
matcher changes and are deferred to a future enhancement.
Refs: https://github.com/tc39/proposal-duplicate-named-capturing-groups
|
Hi! I'm the It looks like you correctly set up a CI job that uses the autofix.ci GitHub Action, but the autofix.ci GitHub App has not been installed for this repository. This means that autofix.ci unfortunately does not have the permissions to fix this pull request. If you are the repository owner, please install the app and then restart the CI workflow! 😃 |
15fdc45 to
7cd6cd7
Compare
This implements the TC39 Stage 4 proposal for duplicate named capturing groups in regular expressions. Implementation is in the regress crate (ridiculousfish/regress#136). Once that PR is merged and a new version is published, this PR will be updated with the regress version bump. Fixes boa-dev#4442 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
This implements the TC39 Stage 4 proposal for duplicate named capturing groups in regular expressions. Implementation is in the regress crate (ridiculousfish/regress#136). Once that PR is merged and a new version is published, this PR will be updated with the regress version bump. Fixes boa-dev#4442
ridiculousfish
left a comment
There was a problem hiding this comment.
Great start and less invasive than I would have expected!
I found the alt_indices and named_groups "paths" in parse_capture_groups to be hard to follow - let's try to simplify this. But this looks very promising!
| // Check if we've already returned this name (by looking backwards) | ||
| let already_seen = self.mat.group_names[..idx] | ||
| .iter() | ||
| .any(|n| n.as_ref() == name && !n.is_empty()); |
There was a problem hiding this comment.
Can omit the !n.is_empty() check as name is never empty.
| // Two paths conflict if they share the same alternative indices | ||
| // at all common depth levels (i.e., one is a prefix of the other | ||
| // or they're identical up to the shallower depth). | ||
| if let Some(existing_paths) = named_groups.get(&name) { |
There was a problem hiding this comment.
I think it is likely simpler to construct the set of NamedCaptureGroups up-front and then look for conflicts in a separate path, rather than intermingling them.
| alt_indices.insert(0, 0); | ||
|
|
||
| // Map from group name to the set of alternative paths where it appears. | ||
| let mut named_groups: HashMap<String, Vec<Vec<(usize, usize)>>> = HashMap::new(); |
There was a problem hiding this comment.
I find this named_groups type to be too confusing - please think of ways to simplify it. In particular I'm not sure what "paths" are in this context.
(Another possibility is that parse_capture_groups doesn't actually do the check - instead that check happens at the IR level - it would be straightforward to do this analysis with the recursive Node structure. But if it can be made to work cleanly here that's also good)
There was a problem hiding this comment.
It's in reference to the alternative paths, there's some edgecases involving those. I've made it a bit clearer and added some more comments, as well as addressed the other feedback. Thanks!
e12d571 to
ff2f83e
Compare
- Refactor parse_capture_groups into two-pass approach for clarity - Add AlternativePath struct instead of Vec<(usize, usize)> - Separate collection phase from conflict checking phase - Optimize NamedGroups iterator: remove redundant check and add break - Fix clippy warning about iterating map values
8d767c9 to
e8c7e15
Compare
|
Merged, thank you!! Awesome contribution |
Implements the Stage 4 TC39 proposal for duplicate named capturing groups in regular expressions. This allows the same group name to be used in different alternatives separated by the | operator.
Examples that now work:
Duplicate names are only allowed when they appear in different alternatives. Using the same name twice in the same alternative (e.g., /(?a)(?b)/) still correctly produces an error.
The implementation tracks alternative paths during parsing and deduplicates named groups in match results, ensuring only one property per name appears in the groups object, as specified by the TC39 proposal.
Changes:
Note: Backreferences with duplicate named groups require additional matcher changes and are deferred to a future enhancement.
Refs: https://github.com/tc39/proposal-duplicate-named-capturing-groups