Implement regset#210
Conversation
| @@ -0,0 +1,50 @@ | |||
| use onig::RegSet; | |||
There was a problem hiding this comment.
Note: i've put all the regset code in that file so it's self-contained, I don't know if you prefer to impl in eg find.rs for the captures
|
ping @iwillspeak for this PR as well, bigger. Let me know if there's anything missing. I've been using that branch for https://github.com/getzola/giallo which is a textmate highlighter. |
There was a problem hiding this comment.
Pull request overview
This PR implements a RegSet API for rust-onig, enabling efficient matching of multiple regular expressions against text in a single pass. The implementation wraps Oniguruma's native regset functionality.
Key Changes:
- Introduces
RegSetandRegSetLeadtypes for multi-pattern regex matching - Adds methods for pattern management (add, replace) and searching with various options
- Provides both simple matching and full capture group support
- Includes comprehensive test coverage and example code
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| onig/src/regset.rs | New module implementing RegSet with pattern management, search operations, and capture support |
| onig/src/lib.rs | Exports new RegSet and RegSetLead types; adds internal as_raw() method to Regex; updates documentation with RegSet example |
| onig/src/find.rs | Adds internal new() constructor to Captures for use by RegSet |
| onig/examples/regset.rs | Example demonstrating RegSet usage with pattern matching and capture groups |
| onig/src/syntax.rs | Fixed typo: "dfines" → "defines" |
| CHANGELOG.md | Documents RegSet feature; fixes typo: "falgs" → "flags" |
| Cargo.toml | Formatting correction (trailing comma) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub enum RegSetLead { | ||
| /// Return the match that occurs first in the text (position priority) | ||
| Position, | ||
| /// Same results as Position I think but slower |
There was a problem hiding this comment.
This documentation is unclear and uncertain. Replace with a concrete explanation of how Regex lead mode differs from Position mode based on the Oniguruma library's behavior.
| /// Same results as Position I think but slower | |
| /// Searches each regex in order and returns the first match found, even if it is not the earliest in the text. | |
| /// Differs from `Position` mode, which returns the match that occurs earliest in the text, regardless of which regex matched. |
There was a problem hiding this comment.
Honestly i don't know the difference between Position and Regex. From oniguruma docs: https://github.com/kkos/oniguruma/blob/f95747b462de672b6f8dbdeb478245ddf061ca53/doc/API#L580-L581 it's not clear either
| // Pre-allocate region with reasonable capacity | ||
| // Most regexes have < 10 capture groups and it's not worth adding an option for that | ||
| // for RegSet |
There was a problem hiding this comment.
[nitpick] The comment spans three lines unnecessarily. Consolidate into a single-line comment: // Pre-allocate region with capacity 10 (sufficient for most regexes with < 10 capture groups)
| // Pre-allocate region with reasonable capacity | |
| // Most regexes have < 10 capture groups and it's not worth adding an option for that | |
| // for RegSet | |
| // Pre-allocate region with capacity 10 (sufficient for most regexes with < 10 capture groups) |
|
I'll try and have a proper read through the rest of this sometime later this week but my initial reaction was: "don't we have regex set support already", but I can't find it so I guess i've made it up. I've implemented it for so many libraries i've lost track. Overal though this PR looks to be heading in the right direction. Excellent work! |
|
Thanks! Don't hesitate to ping me if something needs to be changed. |
|
One thing that I don't think is possible (but I could be wrong) is allowing passing directly compiled regexes to regset. I_think_ regset wants to fully own the regex afaik though |
|
@iwillspeak any idea when you can check it? I need that in a new version to release https://github.com/getzola/giallo |
|
Closing in favour of #216 so I don't forget any changes |
No description provided.