Skip to content

test(core): add precise boundary tests for MAX_REGEX_INPUT_SIZE ReDoS guard (#457)#508

Merged
avifenesh merged 5 commits intomainfrom
feature/redos-boundary-tests-457
Feb 19, 2026
Merged

test(core): add precise boundary tests for MAX_REGEX_INPUT_SIZE ReDoS guard (#457)#508
avifenesh merged 5 commits intomainfrom
feature/redos-boundary-tests-457

Conversation

@avifenesh
Copy link
Collaborator

@avifenesh avifenesh commented Feb 19, 2026

Summary

  • Add 28 boundary tests covering the 65536-byte limit for regex-guarded functions and unguarded functions across `markdown.rs`, `prompt.rs`, and `cross_platform.rs`
  • Each of the 12 guarded regex functions gets two tests: content at exactly `MAX_REGEX_INPUT_SIZE` (processed) and at `MAX_REGEX_INPUT_SIZE + 1` (rejected)
  • Adds tests confirming `extract_imports`, `extract_markdown_links`, and `check_markdown_structure` handle content above 64KB without restriction (they use byte-by-byte scanning, pulldown-cmark, and a simple anchored line-by-line regex respectively - none susceptible to catastrophic backtracking)
  • Fixes `MAX_REGEX_INPUT_SIZE` doc comment to clarify exactly which functions are guarded vs. unguarded
  • Documents intentional exclusion of `check_markdown_structure` from the size guard in both code comments and a regression test

Test Plan

  • `cargo test -p agnix-core -- 64kb` - 15 exact-boundary and above-limit tests pass
  • `cargo test -p agnix-core -- one_byte_over` - 13 over-limit tests pass
  • Full test suite: 2742 tests pass, 0 failures

Related Issues

Closes #457

… guard

Add 26 boundary tests covering the exact 65536-byte limit for all
guarded regex functions across markdown.rs, prompt.rs, and
cross_platform.rs. Each guarded function gets two tests: content at
exactly MAX_REGEX_INPUT_SIZE (processed) and at MAX_REGEX_INPUT_SIZE+1
(rejected). Also adds tests confirming extract_imports and
extract_markdown_links handle content above 64KB without restriction
since they use pulldown-cmark parsing instead of regex.

Closes #457
- Replace magic numbers 19/18 in xml_tags tests with computed overhead
- Fix inaccurate extract_imports comment (byte-by-byte, not pulldown-cmark)
- Add design-intent comment for check_markdown_structure exclusion
- Fix inaccurate "50%" comment in find_critical_in_middle test
- Add self-documenting trigger comments for cross_platform.rs tests
- Improve categorize_layer assertion messages
- Add explanatory comments for unguarded function boundary tests
The filler appended to the base content contains no newlines, so
total_lines stays at 20 after padding. Line 10 is at exactly 50.0%,
not the previously claimed ~47.6% (which assumed 21 lines).
Copilot AI review requested due to automatic review settings February 19, 2026 12:23
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @avifenesh, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness of the ReDoS protection mechanism by introducing comprehensive boundary tests for regex-guarded functions. It ensures that the system correctly handles inputs at the exact 64KB size limit and gracefully rejects inputs exceeding it, preventing potential denial-of-service attacks. Additionally, it verifies the expected behavior of non-regex-based parsing functions with large inputs and clarifies specific exclusions from the size guard.

Highlights

  • Enhanced ReDoS Protection Testing: Added 27 new boundary tests to validate the MAX_REGEX_INPUT_SIZE ReDoS guard across 12 regex functions in markdown.rs, prompt.rs, and cross_platform.rs.
  • Non-Regex Function Validation: Introduced tests to confirm that functions like extract_imports and extract_markdown_links, which do not use regex, correctly process content exceeding the 64KB limit.
  • Documentation of Exclusions: Explicitly documented the rationale for check_markdown_structure's intentional exclusion from the MAX_REGEX_INPUT_SIZE guard.
Changelog
  • crates/agnix-core/src/parsers/markdown.rs
    • Added precise boundary tests for extract_xml_tags at and one byte over the MAX_REGEX_INPUT_SIZE limit.
    • Added tests for extract_imports and extract_markdown_links to confirm processing of content above the 64KB limit.
  • crates/agnix-core/src/schemas/cross_platform.rs
    • Added precise boundary tests for find_claude_specific_features, find_hard_coded_paths, extract_build_commands, extract_tool_constraints, and categorize_layer at and one byte over the MAX_REGEX_INPUT_SIZE limit.
    • Added a note explaining why check_markdown_structure is intentionally not guarded.
  • crates/agnix-core/src/schemas/prompt.rs
    • Added precise boundary tests for find_critical_in_middle, find_cot_on_simple_tasks, find_weak_imperative_language, find_ambiguous_instructions, find_redundant_instructions, and find_negative_only_instructions at and one byte over the MAX_REGEX_INPUT_SIZE limit.
Activity
  • No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Fix long format! line in markdown.rs boundary tests that cargo fmt
reformats, and add the required CHANGELOG.md entry for #457.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds precise boundary-condition test coverage around the MAX_REGEX_INPUT_SIZE (64KB) ReDoS guard, ensuring regex-based detectors accept input at exactly the limit and safely skip processing one byte over, while also validating that non-regex markdown parsing continues to work above the limit.

Changes:

  • Add exact-at-limit and one-byte-over tests for PE-* prompt schema regex detectors in prompt.rs.
  • Add exact-at-limit and one-byte-over tests for XP-* cross-platform schema regex detectors and categorize_layer precedence detection in cross_platform.rs.
  • Add exact boundary tests for XML tag extraction and >64KB “still processes” regression tests for non-regex markdown parsing helpers in markdown.rs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
crates/agnix-core/src/schemas/prompt.rs Adds exact-boundary tests for prompt-engineering regex detectors guarded by MAX_REGEX_INPUT_SIZE.
crates/agnix-core/src/schemas/cross_platform.rs Adds exact-boundary tests for cross-platform regex detectors and precedence detection; documents intentional non-guarded markdown structure check.
crates/agnix-core/src/parsers/markdown.rs Adds exact-boundary tests for regex-guarded XML tag extraction and verifies non-regex import/link extraction works above 64KB.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +964 to +988
);
}

#[test]
fn test_extract_imports_processes_above_64kb_limit() {
// No exactly-at-64kb companion: extract_imports has no size guard and always
// processes content of any size.
// extract_imports uses byte-by-byte scanning (not regex) inside pulldown-cmark
// token callbacks, so MAX_REGEX_INPUT_SIZE does not apply.
let imports_prefix = "@styles.css\n";
let needed = MAX_REGEX_INPUT_SIZE - imports_prefix.len() + 1;
let content = format!("{}{}", imports_prefix, "a".repeat(needed));
assert!(
content.len() > MAX_REGEX_INPUT_SIZE,
"Content must exceed the limit"
);
let imports = extract_imports(&content);
assert!(
!imports.is_empty(),
"extract_imports should process content beyond the regex size limit"
);
}

#[test]
fn test_extract_markdown_links_processes_above_64kb_limit() {
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new tests state that extract_imports/extract_markdown_links are not subject to MAX_REGEX_INPUT_SIZE, but the module-level docs in this file currently say the limit applies to "import detection" and "link parsing". Please reconcile the documentation (either update the module docs or adjust wording here) so readers don’t get conflicting guidance about which functions are guarded.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 824b128: updated the MAX_REGEX_INPUT_SIZE doc comment to explicitly state that only extract_xml_tags is guarded, while extract_imports and extract_markdown_links are not (they use byte-by-byte scanning and pulldown-cmark respectively).

Comment on lines +2520 to +2526
// ===== Precise Boundary Tests for MAX_REGEX_INPUT_SIZE =====
//
// NOTE: check_markdown_structure intentionally has no MAX_REGEX_INPUT_SIZE guard.
// It uses markdown_header_pattern (r"^#+\s+.+"), a simple anchored regex that
// is not susceptible to catastrophic backtracking (ReDoS). Only functions using
// complex alternation, lookahead, or nested quantifiers require the size guard.

Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description mentions "27 boundary tests", but the changes here appear to add 12 guarded functions × 2 boundary tests = 24, plus 2 above-limit tests for extract_imports/extract_markdown_links = 26. Either update the PR description/test plan count, or add the missing test if one is intended (e.g., a >64KB regression test for check_markdown_structure, since this block documents its intentional lack of a size guard).

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 824b128: added test_check_markdown_structure_processes_above_64kb_limit as the 27th (now 28th overall) boundary test, confirming check_markdown_structure processes content above 64KB since its simple anchored line-by-line regex cannot exhibit catastrophic backtracking. The PR description has also been updated to reflect the accurate count (28 tests: 15 at-limit/above-limit + 13 over-limit).

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request introduces comprehensive boundary tests for MAX_REGEX_INPUT_SIZE across various functions in markdown.rs, cross_platform.rs, and prompt.rs. These tests ensure that functions correctly handle inputs exactly at the 64KB limit (processing them) and inputs one byte over the limit (rejecting them), which is crucial for ReDoS protection. Additionally, tests are added to confirm that functions not using regex (like extract_imports and extract_markdown_links) process content beyond this limit without restriction. The changes improve the robustness and security of the regex-dependent parsing logic.

Fix MAX_REGEX_INPUT_SIZE doc comment to clarify that extract_imports and
extract_markdown_links are NOT guarded (they use byte-scanning/pulldown-cmark).
Add test_check_markdown_structure_processes_above_64kb_limit as the 27th
boundary test, confirming check_markdown_structure processes content above
64KB since its simple anchored regex is not susceptible to ReDoS.
@avifenesh
Copy link
Collaborator Author

The `Fuzz fuzz_markdown` check is a pre-existing flaky failure caused by a bug in `pulldown-cmark` v0.13.0 (panic at `parse.rs:2199` on certain malformed inputs). This is unrelated to this PR - the same failure occurs on `main` (run 22026644595). The previous two fuzz runs on this branch passed successfully (runs 22181621570 and 22181753848), confirming the non-deterministic nature of the failure.

The `catch_unwind` wrapper in `extract_markdown_links` is designed to handle upstream panics, but libfuzzer intercepts panics as deadly signals before `catch_unwind` can process them. This is a known limitation when fuzzing with sanitizers enabled.

All other 25 CI checks pass.

@avifenesh avifenesh merged commit 359caf4 into main Feb 19, 2026
29 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add boundary tests for MAX_REGEX_INPUT_SIZE (ReDoS protection)

2 participants