feat(core): add UTF-8 boundary validation to Fix constructors (#463) by avifenesh · Pull Request #505 · agent-sh/agnix

avifenesh · 2026-02-18T02:22:25Z

Summary

Add debug_assert!(start <= end) to the 4 range-based Fix constructors (replace, replace_with_confidence, delete, delete_with_confidence) to catch inverted byte ranges early in development
Add 6 new _checked constructor variants (replace_checked, replace_with_confidence_checked, insert_checked, insert_with_confidence_checked, delete_checked, delete_with_confidence_checked) that accept a content: &str parameter and validate UTF-8 char boundary alignment

Why

Previously, invalid byte offsets were only detected at apply time in fixes.rs, which silently skipped bad fixes in production. This shifts detection left - validator bugs that produce mid-codepoint offsets or reversed ranges are now caught at construction time in debug builds, matching the existing pattern in span_utils.rs:497.

All checks use debug_assert! - zero cost in release builds, no change to release behavior. The existing runtime safety net in fixes.rs remains as the production backstop.

What the `_checked` variants add

// Existing (base constructors still work, now have start <= end check):
Fix::replace(start, end, replacement, description, safe)

// New - validates UTF-8 boundaries when content is available:
Fix::replace_checked(content, start, end, replacement, description, safe)
Fix::insert_checked(content, position, text, description, safe)
Fix::delete_checked(content, start, end, description, safe)
// Plus _with_confidence variants for all three

The _checked variants delegate to the base constructors after asserting:

start <= end (ordering)
start <= content.len() and content.is_char_boundary(start) (UTF-8 boundary)
end <= content.len() and content.is_char_boundary(end) (UTF-8 boundary)

Test coverage

34 new tests covering: reversed ranges, mid-codepoint offsets (2-byte é and 4-byte emoji 😀), valid boundaries, out-of-bounds, zero-width replacements, and ASCII content
Panic-path tests are gated under #[cfg(debug_assertions)]
2705 debug tests pass, 2931 release tests pass (24 debug-only tests compile out cleanly)

Closes #463

Add debug_assert!(start <= end) to Fix::replace, replace_with_confidence, delete, and delete_with_confidence to catch reversed byte ranges in debug builds. Introduce six new _checked constructor variants (replace_checked, replace_with_confidence_checked, insert_checked, insert_with_confidence_checked, delete_checked, delete_with_confidence_checked) that additionally validate UTF-8 char boundary alignment via debug_assert!. These are zero-cost in release builds and delegate to the unchecked originals. Add 24 tests covering reversed-range panics, valid boundary usage, mid-codepoint panics (2-byte and 4-byte), and out-of-bounds detection.

…463)

…nts (#463)

chatgpt-codex-connector · 2026-02-18T02:22:29Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

gemini-code-assist · 2026-02-18T02:22:39Z

Summary of Changes

Hello @avifenesh, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the robustness of Fix object creation by shifting validation left in the development cycle. It introduces debug-only assertions to catch common errors like inverted byte ranges and misaligned UTF-8 character boundaries at construction time, rather than silently failing at apply time in production. This change helps developers identify and fix issues earlier, leading to more reliable code without impacting the performance of release builds.

Highlights

Enhanced Fix Constructor Validation: Added debug_assert!(start <= end) to four existing range-based Fix constructors (replace, replace_with_confidence, delete, delete_with_confidence) to catch inverted byte ranges early in debug builds.
New UTF-8 Boundary Checked Constructors: Introduced six new _checked Fix constructor variants (replace_checked, replace_with_confidence_checked, insert_checked, insert_with_confidence_checked, delete_checked, delete_with_confidence_checked) that accept a content: &str parameter and validate UTF-8 character boundary alignment in debug builds.
Debug-Only Assertions: All new validations are implemented using debug_assert!, meaning they incur zero cost in release builds and do not alter release behavior, maintaining the existing runtime safety net for production.
Comprehensive Test Coverage: Added 34 new tests covering reversed ranges, mid-codepoint offsets (for 2-byte and 4-byte UTF-8 characters), valid boundaries, out-of-bounds scenarios, zero-width replacements, and ASCII content, with panic-path tests gated under #[cfg(debug_assertions)].

Changelog

CHANGELOG.md
- Added a new feature entry for the UTF-8 boundary _checked Fix constructors.
- Added a new fix entry for the Fix constructor range assertions.
crates/agnix-core/src/diagnostics.rs
- Added debug_assert!(start <= end) to Fix::replace, Fix::replace_with_confidence, Fix::delete, and Fix::delete_with_confidence constructors.
- Introduced replace_checked, replace_with_confidence_checked, insert_checked, insert_with_confidence_checked, delete_checked, and delete_with_confidence_checked methods for Fix.
- Added comprehensive test cases for debug_assert! panics and _checked constructor validations, including UTF-8 boundary checks and out-of-bounds scenarios.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request is a great improvement for robustness. Adding debug_assert! for range checks in existing Fix constructors and introducing new _checked variants for UTF-8 boundary validation will help catch bugs early during development. The test coverage for these new additions is thorough. I have one suggestion to reduce code duplication in the newly added _checked methods by using private helper functions.

gemini-code-assist · 2026-02-18T02:25:15Z

crates/agnix-core/src/diagnostics.rs

+    /// Create a replacement fix, asserting UTF-8 char boundary alignment in debug builds.
+    ///
+    /// Validates that both `start` and `end` land on UTF-8 char boundaries in `content`.
+    /// These checks are no-ops in release builds; the function otherwise behaves identically to its unchecked counterpart. Use [`Self::replace`] when `content` is not available.
+    pub fn replace_checked(
+        content: &str,
+        start: usize,
+        end: usize,
+        replacement: impl Into<String>,
+        description: impl Into<String>,
+        safe: bool,
+    ) -> Self {
+        debug_assert!(
+            start <= end,
+            "Fix::replace_checked: start_byte ({start}) must be <= end_byte ({end})"
+        );
+        debug_assert!(
+            start <= content.len(),
+            "Fix::replace_checked: start_byte ({start}) is out of bounds (len={})", content.len()
+        );
+        debug_assert!(
+            content.is_char_boundary(start),
+            "Fix::replace_checked: start_byte ({start}) is not on a UTF-8 char boundary"
+        );
+        debug_assert!(
+            end <= content.len(),
+            "Fix::replace_checked: end_byte ({end}) is out of bounds (len={})", content.len()
+        );
+        debug_assert!(
+            content.is_char_boundary(end),
+            "Fix::replace_checked: end_byte ({end}) is not on a UTF-8 char boundary"
+        );
+        Self::replace(start, end, replacement, description, safe)
+    }
+
+    /// Create a replacement fix with explicit confidence, asserting UTF-8 char boundary
+    /// alignment in debug builds.
+    ///
+    /// Validates that both `start` and `end` land on UTF-8 char boundaries in `content`.
+    /// These checks are no-ops in release builds; the function otherwise behaves identically to its unchecked counterpart. Use [`Self::replace_with_confidence`] when
+    /// `content` is not available.
+    pub fn replace_with_confidence_checked(
+        content: &str,
+        start: usize,
+        end: usize,
+        replacement: impl Into<String>,
+        description: impl Into<String>,
+        confidence: f32,
+    ) -> Self {
+        debug_assert!(
+            start <= end,
+            "Fix::replace_with_confidence_checked: start_byte ({start}) must be <= end_byte ({end})"
+        );
+        debug_assert!(
+            start <= content.len(),
+            "Fix::replace_with_confidence_checked: start_byte ({start}) is out of bounds (len={})", content.len()
+        );
+        debug_assert!(
+            content.is_char_boundary(start),
+            "Fix::replace_with_confidence_checked: start_byte ({start}) is not on a UTF-8 char boundary"
+        );
+        debug_assert!(
+            end <= content.len(),
+            "Fix::replace_with_confidence_checked: end_byte ({end}) is out of bounds (len={})", content.len()
+        );
+        debug_assert!(
+            content.is_char_boundary(end),
+            "Fix::replace_with_confidence_checked: end_byte ({end}) is not on a UTF-8 char boundary"
+        );
+        Self::replace_with_confidence(start, end, replacement, description, confidence)
+    }
+
+    /// Create an insertion fix, asserting UTF-8 char boundary alignment in debug builds.
+    ///
+    /// Validates that `position` lands on a UTF-8 char boundary in `content`.
+    /// These checks are no-ops in release builds; the function otherwise behaves identically to its unchecked counterpart. Use [`Self::insert`] when `content` is not available.
+    pub fn insert_checked(
+        content: &str,
+        position: usize,
+        text: impl Into<String>,
+        description: impl Into<String>,
+        safe: bool,
+    ) -> Self {
+        debug_assert!(
+            position <= content.len(),
+            "Fix::insert_checked: position ({position}) is out of bounds (len={})", content.len()
+        );
+        debug_assert!(
+            content.is_char_boundary(position),
+            "Fix::insert_checked: position ({position}) is not on a UTF-8 char boundary"
+        );
+        Self::insert(position, text, description, safe)
+    }
+
+    /// Create an insertion fix with explicit confidence, asserting UTF-8 char boundary
+    /// alignment in debug builds.
+    ///
+    /// Validates that `position` lands on a UTF-8 char boundary in `content`.
+    /// These checks are no-ops in release builds; the function otherwise behaves identically to its unchecked counterpart. Use [`Self::insert_with_confidence`] when
+    /// `content` is not available.
+    pub fn insert_with_confidence_checked(
+        content: &str,
+        position: usize,
+        text: impl Into<String>,
+        description: impl Into<String>,
+        confidence: f32,
+    ) -> Self {
+        debug_assert!(
+            position <= content.len(),
+            "Fix::insert_with_confidence_checked: position ({position}) is out of bounds (len={})", content.len()
+        );
+        debug_assert!(
+            content.is_char_boundary(position),
+            "Fix::insert_with_confidence_checked: position ({position}) is not on a UTF-8 char boundary"
+        );
+        Self::insert_with_confidence(position, text, description, confidence)
+    }
+
+    /// Create a deletion fix, asserting UTF-8 char boundary alignment in debug builds.
+    ///
+    /// Validates that both `start` and `end` land on UTF-8 char boundaries in `content`.
+    /// These checks are no-ops in release builds; the function otherwise behaves identically to its unchecked counterpart. Use [`Self::delete`] when `content` is not available.
+    pub fn delete_checked(
+        content: &str,
+        start: usize,
+        end: usize,
+        description: impl Into<String>,
+        safe: bool,
+    ) -> Self {
+        debug_assert!(
+            start <= end,
+            "Fix::delete_checked: start_byte ({start}) must be <= end_byte ({end})"
+        );
+        debug_assert!(
+            start <= content.len(),
+            "Fix::delete_checked: start_byte ({start}) is out of bounds (len={})", content.len()
+        );
+        debug_assert!(
+            content.is_char_boundary(start),
+            "Fix::delete_checked: start_byte ({start}) is not on a UTF-8 char boundary"
+        );
+        debug_assert!(
+            end <= content.len(),
+            "Fix::delete_checked: end_byte ({end}) is out of bounds (len={})", content.len()
+        );
+        debug_assert!(
+            content.is_char_boundary(end),
+            "Fix::delete_checked: end_byte ({end}) is not on a UTF-8 char boundary"
+        );
+        Self::delete(start, end, description, safe)
+    }
+
+    /// Create a deletion fix with explicit confidence, asserting UTF-8 char boundary
+    /// alignment in debug builds.
+    ///
+    /// Validates that both `start` and `end` land on UTF-8 char boundaries in `content`.
+    /// These checks are no-ops in release builds; the function otherwise behaves identically to its unchecked counterpart. Use [`Self::delete_with_confidence`] when
+    /// `content` is not available.
+    pub fn delete_with_confidence_checked(
+        content: &str,
+        start: usize,
+        end: usize,
+        description: impl Into<String>,
+        confidence: f32,
+    ) -> Self {
+        debug_assert!(
+            start <= end,
+            "Fix::delete_with_confidence_checked: start_byte ({start}) must be <= end_byte ({end})"
+        );
+        debug_assert!(
+            start <= content.len(),
+            "Fix::delete_with_confidence_checked: start_byte ({start}) is out of bounds (len={})", content.len()
+        );
+        debug_assert!(
+            content.is_char_boundary(start),
+            "Fix::delete_with_confidence_checked: start_byte ({start}) is not on a UTF-8 char boundary"
+        );
+        debug_assert!(
+            end <= content.len(),
+            "Fix::delete_with_confidence_checked: end_byte ({end}) is out of bounds (len={})", content.len()
+        );
+        debug_assert!(
+            content.is_char_boundary(end),
+            "Fix::delete_with_confidence_checked: end_byte ({end}) is not on a UTF-8 char boundary"
+        );
+        Self::delete_with_confidence(start, end, description, confidence)
+    }


There's significant duplication of validation logic across the new _checked methods. You can refactor this into private helper functions to improve maintainability and reduce code duplication. One helper can validate spans (start and end) and another can validate single positions.

Copilot

Pull request overview

Adds debug-only validation to Fix construction to catch invalid byte ranges/UTF-8 boundary issues earlier (during fix creation), while keeping the existing runtime safety net at fix-application time.

Changes:

Added debug_assert!(start <= end) to range-based Fix constructors (replace*, delete*).
Introduced 6 new _checked Fix constructors that validate bounds + UTF-8 char boundaries against provided content: &str in debug builds.
Added comprehensive tests for reversed ranges, out-of-bounds, and mid-codepoint offsets (debug-only panic tests gated by #[cfg(debug_assertions)]).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
crates/agnix-core/src/diagnostics.rs	Adds new debug assertions, introduces `_checked` constructors, and adds extensive unit tests for boundary/range validation.
CHANGELOG.md	Documents the new `_checked` constructors and the new debug range assertions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-18T02:27:57Z

crates/agnix-core/src/diagnostics.rs

+    /// Create a replacement fix, asserting UTF-8 char boundary alignment in debug builds.
+    ///
+    /// Validates that both `start` and `end` land on UTF-8 char boundaries in `content`.
+    /// These checks are no-ops in release builds; the function otherwise behaves identically to its unchecked counterpart. Use [`Self::replace`] when `content` is not available.
+    pub fn replace_checked(
+        content: &str,
+        start: usize,
+        end: usize,
+        replacement: impl Into<String>,
+        description: impl Into<String>,
+        safe: bool,
+    ) -> Self {
+        debug_assert!(
+            start <= end,
+            "Fix::replace_checked: start_byte ({start}) must be <= end_byte ({end})"
+        );
+        debug_assert!(
+            start <= content.len(),
+            "Fix::replace_checked: start_byte ({start}) is out of bounds (len={})", content.len()
+        );
+        debug_assert!(
+            content.is_char_boundary(start),
+            "Fix::replace_checked: start_byte ({start}) is not on a UTF-8 char boundary"
+        );
+        debug_assert!(
+            end <= content.len(),
+            "Fix::replace_checked: end_byte ({end}) is out of bounds (len={})", content.len()
+        );
+        debug_assert!(
+            content.is_char_boundary(end),
+            "Fix::replace_checked: end_byte ({end}) is not on a UTF-8 char boundary"
+        );


The _checked constructors repeat the same debug-only range/bounds/UTF-8 boundary assertions across multiple methods. Consider extracting a small private helper to reduce duplication and prevent these checks/error messages from drifting over time.

Suggested change

/// Create a replacement fix, asserting UTF-8 char boundary alignment in debug builds.

///

/// Validates that both `start` and `end` land on UTF-8 char boundaries in `content`.

/// These checks are no-ops in release builds; the function otherwise behaves identically to its unchecked counterpart. Use [`Self::replace`] when `content` is not available.

pub fn replace_checked(

content: &str,

start: usize,

end: usize,

replacement: impl Into<String>,

description: impl Into<String>,

safe: bool,

) -> Self {

debug_assert!(

start <= end,

"Fix::replace_checked: start_byte ({start}) must be <= end_byte ({end})"

);

debug_assert!(

start <= content.len(),

"Fix::replace_checked: start_byte ({start}) is out of bounds (len={})", content.len()

);

debug_assert!(

content.is_char_boundary(start),

"Fix::replace_checked: start_byte ({start}) is not on a UTF-8 char boundary"

);

debug_assert!(

end <= content.len(),

"Fix::replace_checked: end_byte ({end}) is out of bounds (len={})", content.len()

);

debug_assert!(

content.is_char_boundary(end),

"Fix::replace_checked: end_byte ({end}) is not on a UTF-8 char boundary"

);

/// Internal helper for debug-only validation of byte ranges and UTF-8 char boundaries.

///

/// This is used by `_checked` constructors to keep assertions and their error

/// messages consistent in one place. It is a no-op in release builds since it

/// only contains `debug_assert!` calls.

fn debug_assert_valid_byte_range_for_content(

content: &str,

start: usize,

end: usize,

context: &str,

) {

debug_assert!(

start <= end,

"{context}: start_byte ({start}) must be <= end_byte ({end})"

);

debug_assert!(

start <= content.len(),

"{context}: start_byte ({start}) is out of bounds (len={})",

content.len()

);

debug_assert!(

content.is_char_boundary(start),

"{context}: start_byte ({start}) is not on a UTF-8 char boundary"

);

debug_assert!(

end <= content.len(),

"{context}: end_byte ({end}) is out of bounds (len={})",

content.len()

);

debug_assert!(

content.is_char_boundary(end),

"{context}: end_byte ({end}) is not on a UTF-8 char boundary"

);

}

/// Create a replacement fix, asserting UTF-8 char boundary alignment in debug builds.

///

/// Validates that both `start` and `end` land on UTF-8 char boundaries in `content`.

/// These checks are no-ops in release builds; the function otherwise behaves identically to its unchecked counterpart. Use [`Self::replace`] when `content` is not available.

pub fn replace_checked(

content: &str,

start: usize,

end: usize,

replacement: impl Into<String>,

description: impl Into<String>,

safe: bool,

) -> Self {

Self::debug_assert_valid_byte_range_for_content(content, start, end, "Fix::replace_checked");

…463)

avifenesh

Addressed Copilot and Gemini suggestions: extracted private helpers debug_assert_valid_range() and debug_assert_valid_position() to consolidate duplicate debug_assert! logic across the _checked constructors.

avifenesh added 5 commits February 18, 2026 04:00

test(core): fill test gaps for Fix _checked constructors (#463)

e5a8fd8

fix(core): address review findings for Fix UTF-8 boundary validation (#…

258dd92

…463)

test(core): add missing boundary panic tests for delete_checked varia…

d0b7d63

…nts (#463)

docs: add CHANGELOG entries for Fix UTF-8 boundary validation (#463)

13fac4c

Copilot AI review requested due to automatic review settings February 18, 2026 02:22

Copilot started reviewing on behalf of avifenesh February 18, 2026 02:22 View session

gemini-code-assist bot reviewed Feb 18, 2026

View reviewed changes

Copilot AI reviewed Feb 18, 2026

View reviewed changes

refactor(core): extract helper functions for Fix _checked validation (#…

6eb07e3

…463)

avifenesh commented Feb 18, 2026

View reviewed changes

avifenesh merged commit 88c050f into main Feb 18, 2026
23 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(core): add UTF-8 boundary validation to Fix constructors (#463)#505

feat(core): add UTF-8 boundary validation to Fix constructors (#463)#505
avifenesh merged 6 commits intomainfrom
feature/utf8-fix-constructors-463

avifenesh commented Feb 18, 2026

Uh oh!

chatgpt-codex-connector bot commented Feb 18, 2026

Uh oh!

gemini-code-assist bot commented Feb 18, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Feb 18, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 18, 2026

Uh oh!

avifenesh left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

avifenesh commented Feb 18, 2026

Summary

Why

What the _checked variants add

Test coverage

Uh oh!

chatgpt-codex-connector bot commented Feb 18, 2026

Uh oh!

gemini-code-assist bot commented Feb 18, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

avifenesh left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

What the `_checked` variants add