Implement regset#216
Conversation
| repository = "https://github.com/iwillspeak/rust-onig" | ||
| documentation = "https://docs.rs/onig/" | ||
| repository = "https://github.com/Keats/rust-onig" | ||
| documentation = "https://docs.rs/onig-regset/" |
There was a problem hiding this comment.
I will need to revert those changes before it's merged
|
What are the next steps for this? |
|
Is this blocked on something? |
There was a problem hiding this comment.
Pull request overview
This PR adds a new RegSet API to the onig Rust bindings, enabling efficient “match any of many regexes” searches and exposing additional search options, along with documentation, tests, and an example.
Changes:
- Introduces
RegSet/RegSetLead(with tests) and exports it from the crate. - Adds
SEARCH_OPTION_NOT_BEGIN_STRINGandSEARCH_OPTION_NOT_BEGIN_POSITION. - Updates docs/examples and bumps crate metadata/changelog.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
onig/src/regset.rs |
Adds the new RegSet API, lead modes, search/captures helpers, and tests. |
onig/src/lib.rs |
Exports RegSet, adds docs, and exposes Regex::as_raw for internal FFI use. |
onig/src/find.rs |
Adds an internal Captures::new constructor to support RegSet captures. |
onig/src/flags.rs |
Adds two new SearchOptions bitflags. |
onig/src/syntax.rs |
Fixes a doc typo (“dfines” → “defines”). |
onig/examples/regset.rs |
Adds a usage example for RegSet. |
onig/Cargo.toml |
Renames package to onig-regset, bumps version, and updates links. |
Cargo.toml |
Workspace formatting change. |
CHANGELOG.md |
Adds an unreleased entry and fixes a typo. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -0,0 +1,50 @@ | |||
| use onig::RegSet; | |||
There was a problem hiding this comment.
This example imports onig::RegSet, but the package was renamed to onig-regset (crate name onig_regset unless [lib] name = "onig" is set). As-is, cargo run --example regset will not compile; update the import (and any other references) to match the actual crate name, or keep the crate name as onig explicitly in Cargo.toml.
| use onig::RegSet; | |
| use onig_regset::RegSet; |
There was a problem hiding this comment.
I think Copilot is picking up on some of your local-only changes here.
There was a problem hiding this comment.
I renamed the crate back in the catchup merge (no idea I could do that). Feel free to change it back if you want to keep your fork though @Keats .
| pub enum RegSetLead { | ||
| /// Return the match that occurs first in the text (position priority) | ||
| Position, | ||
| /// Same results as Position I think but slower |
There was a problem hiding this comment.
The RegSetLead::Regex variant doc comment is speculative ("I think") and doesn't describe the actual semantics. Please replace it with a precise description (or link to the Oniguruma behavior) so users can choose the lead mode confidently.
| /// Same results as Position I think but slower | |
| /// Use Oniguruma's `ONIG_REGSET_REGEX_LEAD` selection mode. | |
| /// | |
| /// This crate forwards the setting directly to Oniguruma without adding | |
| /// extra behavior; consult the Oniguruma regset documentation for the | |
| /// exact match-selection semantics of regex-lead mode. |
| /// Find the first match with full capture group information and encoding support | ||
| /// | ||
| /// Returns a tuple of `(regex_index, captures)` if a match is found, | ||
| /// or `None` if no match is found. | ||
| /// | ||
| /// # Examples | ||
| /// | ||
| /// ```rust | ||
| /// use onig::{RegSet, RegSetLead, SearchOptions, EncodedBytes}; |
There was a problem hiding this comment.
The docs for captures_with_options mention "encoding support" and the example imports EncodedBytes, but the function currently only accepts &str. Either make this API generic over EncodedChars (like Regex::search_with_encoding) or update the docs/example to reflect that this method is &str-only.
| /// Find the first match with full capture group information and encoding support | |
| /// | |
| /// Returns a tuple of `(regex_index, captures)` if a match is found, | |
| /// or `None` if no match is found. | |
| /// | |
| /// # Examples | |
| /// | |
| /// ```rust | |
| /// use onig::{RegSet, RegSetLead, SearchOptions, EncodedBytes}; | |
| /// Find the first match with full capture group information. | |
| /// | |
| /// This method searches `&str` text and returns a tuple of | |
| /// `(regex_index, captures)` if a match is found, or `None` if no | |
| /// match is found. | |
| /// | |
| /// # Examples | |
| /// | |
| /// ```rust | |
| /// use onig::{RegSet, RegSetLead, SearchOptions}; |
There was a problem hiding this comment.
I think the fix here is we should accept EncodedBytes, rather than updating the docs.
It would bring this in line with the rest of the API.
| } else { | ||
| None |
There was a problem hiding this comment.
onig_regset_search can return negative values for both mismatch and real errors. The current code treats any negative return as None, which can silently hide engine errors (contrast with Regex::search_with_encoding, which panics on error and only returns None for ONIG_MISMATCH). Consider checking explicitly for ONIG_MISMATCH and panicking (or otherwise surfacing) other error codes so failures aren't masked.
| } else { | |
| None | |
| } else if result == onig_sys::ONIG_MISMATCH { | |
| None | |
| } else { | |
| panic!("onig_regset_search failed with error code {}", result); |
| ## 6.6.0 (unreleased) | ||
|
|
||
| * Expose RegSet in rust-onig | ||
| * Expose 2 more SearchOption: SEARCH_OPTION_NOT_BEGIN_STRING and SEARCH_OPTION_NOT_BEGIN_POSITION | ||
|
|
There was a problem hiding this comment.
Changelog version header (6.6.0 (unreleased)) doesn't match the crate version bump to 6.7.0 in onig/Cargo.toml. Please reconcile these so the changelog corresponds to the actual release/version being published.
There was a problem hiding this comment.
I'm wondering if we merge this and the Regex<'syntax> into one big 7.0 release. Thoughts?
There was a problem hiding this comment.
There’s a separate active PR that proposes adding a lifetime to the main regex type to close a correctness gap with the underlying c library’s behaviour.
I’m suggesting that if we merge both we’d just bump to 7.0.
|
Fine for me |
This pull request introduces several new features and improvements to the
onigcrate, most notably exposing theRegSetAPI for working with multiple regular expressions at once, and adding new search options. It also updates documentation and metadata to reflect these changes and provides an example for usingRegSet.New Features:
RegSetAPI, allowing users to efficiently match against multiple regex patterns. This includes public exports inonig/src/lib.rs, a new module, and a usage example inonig/examples/regset.rs. [1] [2] [3]SEARCH_OPTION_NOT_BEGIN_STRINGandSEARCH_OPTION_NOT_BEGIN_POSITION, making the search API more flexible.Documentation and Examples:
onig/src/lib.rsto include usage examples for both single regex andRegSet. [1] [2]RegSetwith multiple patterns and capture groups.Crate Metadata and Changelog:
onig-regsetand version to6.7.0, with updated repository and documentation links. [1] [2]Internal Improvements:
Capturesto support internal use byRegSet.MetaChardocumentation.Cargo.toml.