test(core): add precise boundary tests for MAX_REGEX_INPUT_SIZE ReDoS guard (#457) by avifenesh · Pull Request #508 · agent-sh/agnix

avifenesh · 2026-02-19T12:23:21Z

Summary

Add 28 boundary tests covering the 65536-byte limit for regex-guarded functions and unguarded functions across `markdown.rs`, `prompt.rs`, and `cross_platform.rs`
Each of the 12 guarded regex functions gets two tests: content at exactly `MAX_REGEX_INPUT_SIZE` (processed) and at `MAX_REGEX_INPUT_SIZE + 1` (rejected)
Adds tests confirming `extract_imports`, `extract_markdown_links`, and `check_markdown_structure` handle content above 64KB without restriction (they use byte-by-byte scanning, pulldown-cmark, and a simple anchored line-by-line regex respectively - none susceptible to catastrophic backtracking)
Fixes `MAX_REGEX_INPUT_SIZE` doc comment to clarify exactly which functions are guarded vs. unguarded
Documents intentional exclusion of `check_markdown_structure` from the size guard in both code comments and a regression test

Test Plan

`cargo test -p agnix-core -- 64kb` - 15 exact-boundary and above-limit tests pass
`cargo test -p agnix-core -- one_byte_over` - 13 over-limit tests pass
Full test suite: 2742 tests pass, 0 failures

Related Issues

Closes #457

… guard Add 26 boundary tests covering the exact 65536-byte limit for all guarded regex functions across markdown.rs, prompt.rs, and cross_platform.rs. Each guarded function gets two tests: content at exactly MAX_REGEX_INPUT_SIZE (processed) and at MAX_REGEX_INPUT_SIZE+1 (rejected). Also adds tests confirming extract_imports and extract_markdown_links handle content above 64KB without restriction since they use pulldown-cmark parsing instead of regex. Closes #457

- Replace magic numbers 19/18 in xml_tags tests with computed overhead - Fix inaccurate extract_imports comment (byte-by-byte, not pulldown-cmark) - Add design-intent comment for check_markdown_structure exclusion - Fix inaccurate "50%" comment in find_critical_in_middle test - Add self-documenting trigger comments for cross_platform.rs tests - Improve categorize_layer assertion messages - Add explanatory comments for unguarded function boundary tests

The filler appended to the base content contains no newlines, so total_lines stays at 20 after padding. Line 10 is at exactly 50.0%, not the previously claimed ~47.6% (which assumed 21 lines).

gemini-code-assist · 2026-02-19T12:23:40Z

Summary of Changes

Hello @avifenesh, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness of the ReDoS protection mechanism by introducing comprehensive boundary tests for regex-guarded functions. It ensures that the system correctly handles inputs at the exact 64KB size limit and gracefully rejects inputs exceeding it, preventing potential denial-of-service attacks. Additionally, it verifies the expected behavior of non-regex-based parsing functions with large inputs and clarifies specific exclusions from the size guard.

Highlights

Enhanced ReDoS Protection Testing: Added 27 new boundary tests to validate the MAX_REGEX_INPUT_SIZE ReDoS guard across 12 regex functions in markdown.rs, prompt.rs, and cross_platform.rs.
Non-Regex Function Validation: Introduced tests to confirm that functions like extract_imports and extract_markdown_links, which do not use regex, correctly process content exceeding the 64KB limit.
Documentation of Exclusions: Explicitly documented the rationale for check_markdown_structure's intentional exclusion from the MAX_REGEX_INPUT_SIZE guard.

Changelog

crates/agnix-core/src/parsers/markdown.rs
- Added precise boundary tests for extract_xml_tags at and one byte over the MAX_REGEX_INPUT_SIZE limit.
- Added tests for extract_imports and extract_markdown_links to confirm processing of content above the 64KB limit.
crates/agnix-core/src/schemas/cross_platform.rs
- Added precise boundary tests for find_claude_specific_features, find_hard_coded_paths, extract_build_commands, extract_tool_constraints, and categorize_layer at and one byte over the MAX_REGEX_INPUT_SIZE limit.
- Added a note explaining why check_markdown_structure is intentionally not guarded.
crates/agnix-core/src/schemas/prompt.rs
- Added precise boundary tests for find_critical_in_middle, find_cot_on_simple_tasks, find_weak_imperative_language, find_ambiguous_instructions, find_redundant_instructions, and find_negative_only_instructions at and one byte over the MAX_REGEX_INPUT_SIZE limit.

Activity

No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

Fix long format! line in markdown.rs boundary tests that cargo fmt reformats, and add the required CHANGELOG.md entry for #457.

Copilot

Pull request overview

Adds precise boundary-condition test coverage around the MAX_REGEX_INPUT_SIZE (64KB) ReDoS guard, ensuring regex-based detectors accept input at exactly the limit and safely skip processing one byte over, while also validating that non-regex markdown parsing continues to work above the limit.

Changes:

Add exact-at-limit and one-byte-over tests for PE-* prompt schema regex detectors in prompt.rs.
Add exact-at-limit and one-byte-over tests for XP-* cross-platform schema regex detectors and categorize_layer precedence detection in cross_platform.rs.
Add exact boundary tests for XML tag extraction and >64KB “still processes” regression tests for non-regex markdown parsing helpers in markdown.rs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
crates/agnix-core/src/schemas/prompt.rs	Adds exact-boundary tests for prompt-engineering regex detectors guarded by `MAX_REGEX_INPUT_SIZE`.
crates/agnix-core/src/schemas/cross_platform.rs	Adds exact-boundary tests for cross-platform regex detectors and precedence detection; documents intentional non-guarded markdown structure check.
crates/agnix-core/src/parsers/markdown.rs	Adds exact-boundary tests for regex-guarded XML tag extraction and verifies non-regex import/link extraction works above 64KB.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-19T12:27:48Z

crates/agnix-core/src/parsers/markdown.rs

+        );
+    }
+
+    #[test]
+    fn test_extract_imports_processes_above_64kb_limit() {
+        // No exactly-at-64kb companion: extract_imports has no size guard and always
+        // processes content of any size.
+        // extract_imports uses byte-by-byte scanning (not regex) inside pulldown-cmark
+        // token callbacks, so MAX_REGEX_INPUT_SIZE does not apply.
+        let imports_prefix = "@styles.css\n";
+        let needed = MAX_REGEX_INPUT_SIZE - imports_prefix.len() + 1;
+        let content = format!("{}{}", imports_prefix, "a".repeat(needed));
+        assert!(
+            content.len() > MAX_REGEX_INPUT_SIZE,
+            "Content must exceed the limit"
+        );
+        let imports = extract_imports(&content);
+        assert!(
+            !imports.is_empty(),
+            "extract_imports should process content beyond the regex size limit"
+        );
+    }
+
+    #[test]
+    fn test_extract_markdown_links_processes_above_64kb_limit() {


The new tests state that extract_imports/extract_markdown_links are not subject to MAX_REGEX_INPUT_SIZE, but the module-level docs in this file currently say the limit applies to "import detection" and "link parsing". Please reconcile the documentation (either update the module docs or adjust wording here) so readers don’t get conflicting guidance about which functions are guarded.

Fixed in commit 824b128: updated the MAX_REGEX_INPUT_SIZE doc comment to explicitly state that only extract_xml_tags is guarded, while extract_imports and extract_markdown_links are not (they use byte-by-byte scanning and pulldown-cmark respectively).

Copilot · 2026-02-19T12:27:49Z

crates/agnix-core/src/schemas/cross_platform.rs

+    // ===== Precise Boundary Tests for MAX_REGEX_INPUT_SIZE =====
+    //
+    // NOTE: check_markdown_structure intentionally has no MAX_REGEX_INPUT_SIZE guard.
+    // It uses markdown_header_pattern (r"^#+\s+.+"), a simple anchored regex that
+    // is not susceptible to catastrophic backtracking (ReDoS). Only functions using
+    // complex alternation, lookahead, or nested quantifiers require the size guard.
+


The PR description mentions "27 boundary tests", but the changes here appear to add 12 guarded functions × 2 boundary tests = 24, plus 2 above-limit tests for extract_imports/extract_markdown_links = 26. Either update the PR description/test plan count, or add the missing test if one is intended (e.g., a >64KB regression test for check_markdown_structure, since this block documents its intentional lack of a size guard).

Fixed in commit 824b128: added test_check_markdown_structure_processes_above_64kb_limit as the 27th (now 28th overall) boundary test, confirming check_markdown_structure processes content above 64KB since its simple anchored line-by-line regex cannot exhibit catastrophic backtracking. The PR description has also been updated to reflect the accurate count (28 tests: 15 at-limit/above-limit + 13 over-limit).

gemini-code-assist

Code Review

The pull request introduces comprehensive boundary tests for MAX_REGEX_INPUT_SIZE across various functions in markdown.rs, cross_platform.rs, and prompt.rs. These tests ensure that functions correctly handle inputs exactly at the 64KB limit (processing them) and inputs one byte over the limit (rejecting them), which is crucial for ReDoS protection. Additionally, tests are added to confirm that functions not using regex (like extract_imports and extract_markdown_links) process content beyond this limit without restriction. The changes improve the robustness and security of the regex-dependent parsing logic.

Fix MAX_REGEX_INPUT_SIZE doc comment to clarify that extract_imports and extract_markdown_links are NOT guarded (they use byte-scanning/pulldown-cmark). Add test_check_markdown_structure_processes_above_64kb_limit as the 27th boundary test, confirming check_markdown_structure processes content above 64KB since its simple anchored regex is not susceptible to ReDoS.

avifenesh · 2026-02-19T12:53:42Z

The `Fuzz fuzz_markdown` check is a pre-existing flaky failure caused by a bug in `pulldown-cmark` v0.13.0 (panic at `parse.rs:2199` on certain malformed inputs). This is unrelated to this PR - the same failure occurs on `main` (run 22026644595). The previous two fuzz runs on this branch passed successfully (runs 22181621570 and 22181753848), confirming the non-deterministic nature of the failure.

The `catch_unwind` wrapper in `extract_markdown_links` is designed to handle upstream panics, but libfuzzer intercepts panics as deadly signals before `catch_unwind` can process them. This is a known limitation when fuzzing with sanitizers enabled.

All other 25 CI checks pass.

avifenesh added 3 commits February 19, 2026 14:02

fix(tests): correct inaccurate line-percentage comment in boundary test

29a2b09

The filler appended to the base content contains no newlines, so total_lines stays at 20 after padding. Line 10 is at exactly 50.0%, not the previously claimed ~47.6% (which assumed 21 lines).

Copilot AI review requested due to automatic review settings February 19, 2026 12:23

Copilot started reviewing on behalf of avifenesh February 19, 2026 12:23 View session

fix(ci): apply cargo fmt and add CHANGELOG entry for boundary tests

30b4762

Fix long format! line in markdown.rs boundary tests that cargo fmt reformats, and add the required CHANGELOG.md entry for #457.

Copilot AI reviewed Feb 19, 2026

View reviewed changes

gemini-code-assist bot reviewed Feb 19, 2026

View reviewed changes

avifenesh merged commit 359caf4 into main Feb 19, 2026
29 of 30 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(core): add precise boundary tests for MAX_REGEX_INPUT_SIZE ReDoS guard (#457)#508

test(core): add precise boundary tests for MAX_REGEX_INPUT_SIZE ReDoS guard (#457)#508
avifenesh merged 5 commits intomainfrom
feature/redos-boundary-tests-457

avifenesh commented Feb 19, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Feb 19, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 19, 2026

Uh oh!

avifenesh Feb 19, 2026

Uh oh!

Copilot AI Feb 19, 2026

Uh oh!

avifenesh Feb 19, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

avifenesh commented Feb 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

avifenesh commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Related Issues

Uh oh!

gemini-code-assist bot commented Feb 19, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

avifenesh Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

avifenesh Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

avifenesh commented Feb 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

avifenesh commented Feb 19, 2026 •

edited

Loading