Skip to content

Implement TC39 duplicate named capturing groups proposal#136

Merged
ridiculousfish merged 4 commits intoridiculousfish:masterfrom
n-faria:duplicate-named-groups
Dec 7, 2025
Merged

Implement TC39 duplicate named capturing groups proposal#136
ridiculousfish merged 4 commits intoridiculousfish:masterfrom
n-faria:duplicate-named-groups

Conversation

@n-faria
Copy link
Copy Markdown
Contributor

@n-faria n-faria commented Nov 27, 2025

Implements the Stage 4 TC39 proposal for duplicate named capturing groups in regular expressions. This allows the same group name to be used in different alternatives separated by the | operator.

Examples that now work:

  • /(?[0-9]{4})-[0-9]{2}|[0-9]{2}-(?[0-9]{4})/
  • /(?a)(?b)|(?c)(?d)/

Duplicate names are only allowed when they appear in different alternatives. Using the same name twice in the same alternative (e.g., /(?a)(?b)/) still correctly produces an error.

The implementation tracks alternative paths during parsing and deduplicates named groups in match results, ensuring only one property per name appears in the groups object, as specified by the TC39 proposal.

Changes:

  • Modified parse_capture_groups() to track alternative branches
  • Updated NamedGroups iterator to deduplicate group names
  • Added comprehensive test cases

Note: Backreferences with duplicate named groups require additional matcher changes and are deferred to a future enhancement.

Refs: https://github.com/tc39/proposal-duplicate-named-capturing-groups

Implements the Stage 4 TC39 proposal for duplicate named capturing groups
in regular expressions. This allows the same group name to be used in
different alternatives separated by the | operator.

Examples that now work:
- /(?<year>[0-9]{4})-[0-9]{2}|[0-9]{2}-(?<year>[0-9]{4})/
- /(?<x>a)(?<y>b)|(?<x>c)(?<y>d)/

Duplicate names are only allowed when they appear in different alternatives.
Using the same name twice in the same alternative (e.g., /(?<name>a)(?<name>b)/)
still correctly produces an error.

The implementation tracks alternative paths during parsing and deduplicates
named groups in match results, ensuring only one property per name appears
in the groups object, as specified by the TC39 proposal.

Changes:
- Modified parse_capture_groups() to track alternative branches
- Updated NamedGroups iterator to deduplicate group names
- Added comprehensive test cases

Note: Backreferences with duplicate named groups require additional
matcher changes and are deferred to a future enhancement.

Refs: https://github.com/tc39/proposal-duplicate-named-capturing-groups
@autofix-troubleshooter
Copy link
Copy Markdown

Hi! I'm the autofix logoautofix.ci troubleshooter bot.

It looks like you correctly set up a CI job that uses the autofix.ci GitHub Action, but the autofix.ci GitHub App has not been installed for this repository. This means that autofix.ci unfortunately does not have the permissions to fix this pull request. If you are the repository owner, please install the app and then restart the CI workflow! 😃

@n-faria n-faria force-pushed the duplicate-named-groups branch from 15fdc45 to 7cd6cd7 Compare November 27, 2025 12:19
n-faria pushed a commit to n-faria/boa that referenced this pull request Nov 27, 2025
This implements the TC39 Stage 4 proposal for duplicate named capturing
groups in regular expressions.

Implementation is in the regress crate (ridiculousfish/regress#136).
Once that PR is merged and a new version is published, this PR will be
updated with the regress version bump.

Fixes boa-dev#4442

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
n-faria pushed a commit to n-faria/boa that referenced this pull request Nov 27, 2025
This implements the TC39 Stage 4 proposal for duplicate named capturing
groups in regular expressions.

Implementation is in the regress crate (ridiculousfish/regress#136).
Once that PR is merged and a new version is published, this PR will be
updated with the regress version bump.

Fixes boa-dev#4442
Copy link
Copy Markdown
Owner

@ridiculousfish ridiculousfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start and less invasive than I would have expected!

I found the alt_indices and named_groups "paths" in parse_capture_groups to be hard to follow - let's try to simplify this. But this looks very promising!

Comment thread src/api.rs Outdated
// Check if we've already returned this name (by looking backwards)
let already_seen = self.mat.group_names[..idx]
.iter()
.any(|n| n.as_ref() == name && !n.is_empty());
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can omit the !n.is_empty() check as name is never empty.

Comment thread src/api.rs
Comment thread src/parse.rs Outdated
// Two paths conflict if they share the same alternative indices
// at all common depth levels (i.e., one is a prefix of the other
// or they're identical up to the shallower depth).
if let Some(existing_paths) = named_groups.get(&name) {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is likely simpler to construct the set of NamedCaptureGroups up-front and then look for conflicts in a separate path, rather than intermingling them.

Comment thread src/parse.rs Outdated
alt_indices.insert(0, 0);

// Map from group name to the set of alternative paths where it appears.
let mut named_groups: HashMap<String, Vec<Vec<(usize, usize)>>> = HashMap::new();
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this named_groups type to be too confusing - please think of ways to simplify it. In particular I'm not sure what "paths" are in this context.

(Another possibility is that parse_capture_groups doesn't actually do the check - instead that check happens at the IR level - it would be straightforward to do this analysis with the recursive Node structure. But if it can be made to work cleanly here that's also good)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in reference to the alternative paths, there's some edgecases involving those. I've made it a bit clearer and added some more comments, as well as addressed the other feedback. Thanks!

- Refactor parse_capture_groups into two-pass approach for clarity
- Add AlternativePath struct instead of Vec<(usize, usize)>
- Separate collection phase from conflict checking phase
- Optimize NamedGroups iterator: remove redundant check and add break
- Fix clippy warning about iterating map values
@n-faria n-faria force-pushed the duplicate-named-groups branch from 8d767c9 to e8c7e15 Compare December 1, 2025 18:01
@ridiculousfish ridiculousfish merged commit 5096dc9 into ridiculousfish:master Dec 7, 2025
33 checks passed
@ridiculousfish
Copy link
Copy Markdown
Owner

Merged, thank you!! Awesome contribution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants