Conversation
… guard Add 26 boundary tests covering the exact 65536-byte limit for all guarded regex functions across markdown.rs, prompt.rs, and cross_platform.rs. Each guarded function gets two tests: content at exactly MAX_REGEX_INPUT_SIZE (processed) and at MAX_REGEX_INPUT_SIZE+1 (rejected). Also adds tests confirming extract_imports and extract_markdown_links handle content above 64KB without restriction since they use pulldown-cmark parsing instead of regex. Closes #457
- Replace magic numbers 19/18 in xml_tags tests with computed overhead - Fix inaccurate extract_imports comment (byte-by-byte, not pulldown-cmark) - Add design-intent comment for check_markdown_structure exclusion - Fix inaccurate "50%" comment in find_critical_in_middle test - Add self-documenting trigger comments for cross_platform.rs tests - Improve categorize_layer assertion messages - Add explanatory comments for unguarded function boundary tests
The filler appended to the base content contains no newlines, so total_lines stays at 20 after padding. Line 10 is at exactly 50.0%, not the previously claimed ~47.6% (which assumed 21 lines).
Summary of ChangesHello @avifenesh, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness of the ReDoS protection mechanism by introducing comprehensive boundary tests for regex-guarded functions. It ensures that the system correctly handles inputs at the exact 64KB size limit and gracefully rejects inputs exceeding it, preventing potential denial-of-service attacks. Additionally, it verifies the expected behavior of non-regex-based parsing functions with large inputs and clarifies specific exclusions from the size guard. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Fix long format! line in markdown.rs boundary tests that cargo fmt reformats, and add the required CHANGELOG.md entry for #457.
There was a problem hiding this comment.
Pull request overview
Adds precise boundary-condition test coverage around the MAX_REGEX_INPUT_SIZE (64KB) ReDoS guard, ensuring regex-based detectors accept input at exactly the limit and safely skip processing one byte over, while also validating that non-regex markdown parsing continues to work above the limit.
Changes:
- Add exact-at-limit and one-byte-over tests for PE-* prompt schema regex detectors in
prompt.rs. - Add exact-at-limit and one-byte-over tests for XP-* cross-platform schema regex detectors and
categorize_layerprecedence detection incross_platform.rs. - Add exact boundary tests for XML tag extraction and >64KB “still processes” regression tests for non-regex markdown parsing helpers in
markdown.rs.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/agnix-core/src/schemas/prompt.rs | Adds exact-boundary tests for prompt-engineering regex detectors guarded by MAX_REGEX_INPUT_SIZE. |
| crates/agnix-core/src/schemas/cross_platform.rs | Adds exact-boundary tests for cross-platform regex detectors and precedence detection; documents intentional non-guarded markdown structure check. |
| crates/agnix-core/src/parsers/markdown.rs | Adds exact-boundary tests for regex-guarded XML tag extraction and verifies non-regex import/link extraction works above 64KB. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ); | ||
| } | ||
|
|
||
| #[test] | ||
| fn test_extract_imports_processes_above_64kb_limit() { | ||
| // No exactly-at-64kb companion: extract_imports has no size guard and always | ||
| // processes content of any size. | ||
| // extract_imports uses byte-by-byte scanning (not regex) inside pulldown-cmark | ||
| // token callbacks, so MAX_REGEX_INPUT_SIZE does not apply. | ||
| let imports_prefix = "@styles.css\n"; | ||
| let needed = MAX_REGEX_INPUT_SIZE - imports_prefix.len() + 1; | ||
| let content = format!("{}{}", imports_prefix, "a".repeat(needed)); | ||
| assert!( | ||
| content.len() > MAX_REGEX_INPUT_SIZE, | ||
| "Content must exceed the limit" | ||
| ); | ||
| let imports = extract_imports(&content); | ||
| assert!( | ||
| !imports.is_empty(), | ||
| "extract_imports should process content beyond the regex size limit" | ||
| ); | ||
| } | ||
|
|
||
| #[test] | ||
| fn test_extract_markdown_links_processes_above_64kb_limit() { |
There was a problem hiding this comment.
The new tests state that extract_imports/extract_markdown_links are not subject to MAX_REGEX_INPUT_SIZE, but the module-level docs in this file currently say the limit applies to "import detection" and "link parsing". Please reconcile the documentation (either update the module docs or adjust wording here) so readers don’t get conflicting guidance about which functions are guarded.
There was a problem hiding this comment.
Fixed in commit 824b128: updated the MAX_REGEX_INPUT_SIZE doc comment to explicitly state that only extract_xml_tags is guarded, while extract_imports and extract_markdown_links are not (they use byte-by-byte scanning and pulldown-cmark respectively).
| // ===== Precise Boundary Tests for MAX_REGEX_INPUT_SIZE ===== | ||
| // | ||
| // NOTE: check_markdown_structure intentionally has no MAX_REGEX_INPUT_SIZE guard. | ||
| // It uses markdown_header_pattern (r"^#+\s+.+"), a simple anchored regex that | ||
| // is not susceptible to catastrophic backtracking (ReDoS). Only functions using | ||
| // complex alternation, lookahead, or nested quantifiers require the size guard. | ||
|
|
There was a problem hiding this comment.
The PR description mentions "27 boundary tests", but the changes here appear to add 12 guarded functions × 2 boundary tests = 24, plus 2 above-limit tests for extract_imports/extract_markdown_links = 26. Either update the PR description/test plan count, or add the missing test if one is intended (e.g., a >64KB regression test for check_markdown_structure, since this block documents its intentional lack of a size guard).
There was a problem hiding this comment.
Fixed in commit 824b128: added test_check_markdown_structure_processes_above_64kb_limit as the 27th (now 28th overall) boundary test, confirming check_markdown_structure processes content above 64KB since its simple anchored line-by-line regex cannot exhibit catastrophic backtracking. The PR description has also been updated to reflect the accurate count (28 tests: 15 at-limit/above-limit + 13 over-limit).
There was a problem hiding this comment.
Code Review
The pull request introduces comprehensive boundary tests for MAX_REGEX_INPUT_SIZE across various functions in markdown.rs, cross_platform.rs, and prompt.rs. These tests ensure that functions correctly handle inputs exactly at the 64KB limit (processing them) and inputs one byte over the limit (rejecting them), which is crucial for ReDoS protection. Additionally, tests are added to confirm that functions not using regex (like extract_imports and extract_markdown_links) process content beyond this limit without restriction. The changes improve the robustness and security of the regex-dependent parsing logic.
Fix MAX_REGEX_INPUT_SIZE doc comment to clarify that extract_imports and extract_markdown_links are NOT guarded (they use byte-scanning/pulldown-cmark). Add test_check_markdown_structure_processes_above_64kb_limit as the 27th boundary test, confirming check_markdown_structure processes content above 64KB since its simple anchored regex is not susceptible to ReDoS.
|
The `Fuzz fuzz_markdown` check is a pre-existing flaky failure caused by a bug in `pulldown-cmark` v0.13.0 (panic at `parse.rs:2199` on certain malformed inputs). This is unrelated to this PR - the same failure occurs on `main` (run 22026644595). The previous two fuzz runs on this branch passed successfully (runs 22181621570 and 22181753848), confirming the non-deterministic nature of the failure. The `catch_unwind` wrapper in `extract_markdown_links` is designed to handle upstream panics, but libfuzzer intercepts panics as deadly signals before `catch_unwind` can process them. This is a known limitation when fuzzing with sanitizers enabled. All other 25 CI checks pass. |
Summary
Test Plan
Related Issues
Closes #457