Skip to content

Implement regset#216

Open
Keats wants to merge 10 commits into
rust-onig:mainfrom
Keats:fork-release
Open

Implement regset#216
Keats wants to merge 10 commits into
rust-onig:mainfrom
Keats:fork-release

Conversation

@Keats

@Keats Keats commented Jan 20, 2026

Copy link
Copy Markdown
Contributor

This pull request introduces several new features and improvements to the onig crate, most notably exposing the RegSet API for working with multiple regular expressions at once, and adding new search options. It also updates documentation and metadata to reflect these changes and provides an example for using RegSet.

New Features:

  • Exposed the RegSet API, allowing users to efficiently match against multiple regex patterns. This includes public exports in onig/src/lib.rs, a new module, and a usage example in onig/examples/regset.rs. [1] [2] [3]
  • Added two new search options: SEARCH_OPTION_NOT_BEGIN_STRING and SEARCH_OPTION_NOT_BEGIN_POSITION, making the search API more flexible.

Documentation and Examples:

  • Expanded documentation in onig/src/lib.rs to include usage examples for both single regex and RegSet. [1] [2]
  • Added a practical example demonstrating how to use RegSet with multiple patterns and capture groups.

Crate Metadata and Changelog:

  • Updated crate name to onig-regset and version to 6.7.0, with updated repository and documentation links. [1] [2]
  • Added a changelog entry for the new release and fixed a typo in a previous entry. [1] [2]

Internal Improvements:

  • Added a new constructor for Captures to support internal use by RegSet.
  • Exposed a method to get the raw Oniguruma regex pointer for internal usage.
  • Fixed a typo in the MetaChar documentation.
  • Minor formatting fix in the workspace Cargo.toml.

@Keats Keats mentioned this pull request Jan 20, 2026
Comment thread onig/Cargo.toml
repository = "https://github.com/iwillspeak/rust-onig"
documentation = "https://docs.rs/onig/"
repository = "https://github.com/Keats/rust-onig"
documentation = "https://docs.rs/onig-regset/"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need to revert those changes before it's merged

@iwillspeak iwillspeak left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM so far.

@ofek

ofek commented Feb 18, 2026

Copy link
Copy Markdown

What are the next steps for this?

@ofek

ofek commented Apr 5, 2026

Copy link
Copy Markdown

Is this blocked on something?

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new RegSet API to the onig Rust bindings, enabling efficient “match any of many regexes” searches and exposing additional search options, along with documentation, tests, and an example.

Changes:

  • Introduces RegSet/RegSetLead (with tests) and exports it from the crate.
  • Adds SEARCH_OPTION_NOT_BEGIN_STRING and SEARCH_OPTION_NOT_BEGIN_POSITION.
  • Updates docs/examples and bumps crate metadata/changelog.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
onig/src/regset.rs Adds the new RegSet API, lead modes, search/captures helpers, and tests.
onig/src/lib.rs Exports RegSet, adds docs, and exposes Regex::as_raw for internal FFI use.
onig/src/find.rs Adds an internal Captures::new constructor to support RegSet captures.
onig/src/flags.rs Adds two new SearchOptions bitflags.
onig/src/syntax.rs Fixes a doc typo (“dfines” → “defines”).
onig/examples/regset.rs Adds a usage example for RegSet.
onig/Cargo.toml Renames package to onig-regset, bumps version, and updates links.
Cargo.toml Workspace formatting change.
CHANGELOG.md Adds an unreleased entry and fixes a typo.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onig/examples/regset.rs
@@ -0,0 +1,50 @@
use onig::RegSet;

Copilot AI Apr 26, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example imports onig::RegSet, but the package was renamed to onig-regset (crate name onig_regset unless [lib] name = "onig" is set). As-is, cargo run --example regset will not compile; update the import (and any other references) to match the actual crate name, or keep the crate name as onig explicitly in Cargo.toml.

Suggested change
use onig::RegSet;
use onig_regset::RegSet;

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Copilot is picking up on some of your local-only changes here.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the crate back in the catchup merge (no idea I could do that). Feel free to change it back if you want to keep your fork though @Keats .

Comment thread onig/src/regset.rs
pub enum RegSetLead {
/// Return the match that occurs first in the text (position priority)
Position,
/// Same results as Position I think but slower

Copilot AI Apr 26, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RegSetLead::Regex variant doc comment is speculative ("I think") and doesn't describe the actual semantics. Please replace it with a precise description (or link to the Oniguruma behavior) so users can choose the lead mode confidently.

Suggested change
/// Same results as Position I think but slower
/// Use Oniguruma's `ONIG_REGSET_REGEX_LEAD` selection mode.
///
/// This crate forwards the setting directly to Oniguruma without adding
/// extra behavior; consult the Oniguruma regset documentation for the
/// exact match-selection semantics of regex-lead mode.

Copilot uses AI. Check for mistakes.
Comment thread onig/src/regset.rs
Comment on lines +287 to +295
/// Find the first match with full capture group information and encoding support
///
/// Returns a tuple of `(regex_index, captures)` if a match is found,
/// or `None` if no match is found.
///
/// # Examples
///
/// ```rust
/// use onig::{RegSet, RegSetLead, SearchOptions, EncodedBytes};

Copilot AI Apr 26, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs for captures_with_options mention "encoding support" and the example imports EncodedBytes, but the function currently only accepts &str. Either make this API generic over EncodedChars (like Regex::search_with_encoding) or update the docs/example to reflect that this method is &str-only.

Suggested change
/// Find the first match with full capture group information and encoding support
///
/// Returns a tuple of `(regex_index, captures)` if a match is found,
/// or `None` if no match is found.
///
/// # Examples
///
/// ```rust
/// use onig::{RegSet, RegSetLead, SearchOptions, EncodedBytes};
/// Find the first match with full capture group information.
///
/// This method searches `&str` text and returns a tuple of
/// `(regex_index, captures)` if a match is found, or `None` if no
/// match is found.
///
/// # Examples
///
/// ```rust
/// use onig::{RegSet, RegSetLead, SearchOptions};

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fix here is we should accept EncodedBytes, rather than updating the docs.

It would bring this in line with the rest of the API.

Comment thread onig/src/regset.rs
Comment on lines +378 to +379
} else {
None

Copilot AI Apr 26, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

onig_regset_search can return negative values for both mismatch and real errors. The current code treats any negative return as None, which can silently hide engine errors (contrast with Regex::search_with_encoding, which panics on error and only returns None for ONIG_MISMATCH). Consider checking explicitly for ONIG_MISMATCH and panicking (or otherwise surfacing) other error codes so failures aren't masked.

Suggested change
} else {
None
} else if result == onig_sys::ONIG_MISMATCH {
None
} else {
panic!("onig_regset_search failed with error code {}", result);

Copilot uses AI. Check for mistakes.
Comment thread CHANGELOG.md Outdated
Comment on lines +7 to +11
## 6.6.0 (unreleased)

* Expose RegSet in rust-onig
* Expose 2 more SearchOption: SEARCH_OPTION_NOT_BEGIN_STRING and SEARCH_OPTION_NOT_BEGIN_POSITION

Copilot AI Apr 26, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changelog version header (6.6.0 (unreleased)) doesn't match the crate version bump to 6.7.0 in onig/Cargo.toml. Please reconcile these so the changelog corresponds to the actual release/version being published.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we merge this and the Regex<'syntax> into one big 7.0 release. Thoughts?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there breaking changes?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s a separate active PR that proposes adding a lifetime to the main regex type to close a correctness gap with the underlying c library’s behaviour.

I’m suggesting that if we merge both we’d just bump to 7.0.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds fine to me

Comment thread onig/Cargo.toml Outdated
Comment thread onig/src/lib.rs
@iwillspeak

Copy link
Copy Markdown
Collaborator

@Keats any objections to me merging #222 then? It might change how this PR lands. I’ll follow it up with a PR to start the CHANGELOG for 7.0

@Keats

Keats commented Apr 27, 2026

Copy link
Copy Markdown
Contributor Author

Fine for me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants