Skip to content

Implement regset#210

Closed
Keats wants to merge 4 commits into
rust-onig:mainfrom
Keats:main
Closed

Implement regset#210
Keats wants to merge 4 commits into
rust-onig:mainfrom
Keats:main

Conversation

@Keats

@Keats Keats commented Nov 15, 2025

Copy link
Copy Markdown
Contributor

No description provided.

Comment thread onig/examples/regset.rs
@@ -0,0 +1,50 @@
use onig::RegSet;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: i've put all the regset code in that file so it's self-contained, I don't know if you prefer to impl in eg find.rs for the captures

@Keats

Keats commented Nov 29, 2025

Copy link
Copy Markdown
Contributor Author

ping @iwillspeak for this PR as well, bigger. Let me know if there's anything missing. I've been using that branch for https://github.com/getzola/giallo which is a textmate highlighter.

@iwillspeak iwillspeak self-assigned this Dec 1, 2025
@iwillspeak iwillspeak requested a review from Copilot December 1, 2025 13:46

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a RegSet API for rust-onig, enabling efficient matching of multiple regular expressions against text in a single pass. The implementation wraps Oniguruma's native regset functionality.

Key Changes:

  • Introduces RegSet and RegSetLead types for multi-pattern regex matching
  • Adds methods for pattern management (add, replace) and searching with various options
  • Provides both simple matching and full capture group support
  • Includes comprehensive test coverage and example code

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
onig/src/regset.rs New module implementing RegSet with pattern management, search operations, and capture support
onig/src/lib.rs Exports new RegSet and RegSetLead types; adds internal as_raw() method to Regex; updates documentation with RegSet example
onig/src/find.rs Adds internal new() constructor to Captures for use by RegSet
onig/examples/regset.rs Example demonstrating RegSet usage with pattern matching and capture groups
onig/src/syntax.rs Fixed typo: "dfines" → "defines"
CHANGELOG.md Documents RegSet feature; fixes typo: "falgs" → "flags"
Cargo.toml Formatting correction (trailing comma)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onig/src/regset.rs
pub enum RegSetLead {
/// Return the match that occurs first in the text (position priority)
Position,
/// Same results as Position I think but slower

Copilot AI Dec 1, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation is unclear and uncertain. Replace with a concrete explanation of how Regex lead mode differs from Position mode based on the Oniguruma library's behavior.

Suggested change
/// Same results as Position I think but slower
/// Searches each regex in order and returns the first match found, even if it is not the earliest in the text.
/// Differs from `Position` mode, which returns the match that occurs earliest in the text, regardless of which regex matched.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly i don't know the difference between Position and Regex. From oniguruma docs: https://github.com/kkos/oniguruma/blob/f95747b462de672b6f8dbdeb478245ddf061ca53/doc/API#L580-L581 it's not clear either

Comment thread onig/src/regset.rs
Comment on lines +329 to +331
// Pre-allocate region with reasonable capacity
// Most regexes have < 10 capture groups and it's not worth adding an option for that
// for RegSet

Copilot AI Dec 1, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The comment spans three lines unnecessarily. Consolidate into a single-line comment: // Pre-allocate region with capacity 10 (sufficient for most regexes with < 10 capture groups)

Suggested change
// Pre-allocate region with reasonable capacity
// Most regexes have < 10 capture groups and it's not worth adding an option for that
// for RegSet
// Pre-allocate region with capacity 10 (sufficient for most regexes with < 10 capture groups)

Copilot uses AI. Check for mistakes.
Comment thread onig/src/regset.rs Outdated
@iwillspeak

Copy link
Copy Markdown
Collaborator

I'll try and have a proper read through the rest of this sometime later this week but my initial reaction was: "don't we have regex set support already", but I can't find it so I guess i've made it up. I've implemented it for so many libraries i've lost track.

Overal though this PR looks to be heading in the right direction. Excellent work!

@Keats

Keats commented Dec 1, 2025

Copy link
Copy Markdown
Contributor Author

Thanks! Don't hesitate to ping me if something needs to be changed.

@Keats

Keats commented Dec 1, 2025

Copy link
Copy Markdown
Contributor Author

One thing that I don't think is possible (but I could be wrong) is allowing passing directly compiled regexes to regset. I_think_ regset wants to fully own the regex afaik though

@Keats

Keats commented Dec 14, 2025

Copy link
Copy Markdown
Contributor Author

@iwillspeak any idea when you can check it? I need that in a new version to release https://github.com/getzola/giallo

@Keats

Keats commented Jan 20, 2026

Copy link
Copy Markdown
Contributor Author

Closing in favour of #216 so I don't forget any changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants