Skip to content

fix(engine): sanitize error messages in checkpoint reconciliation and compaction summaries#632

Merged
Aureliolo merged 4 commits intomainfrom
fix/sanitize-engine-error-messages
Mar 20, 2026
Merged

fix(engine): sanitize error messages in checkpoint reconciliation and compaction summaries#632
Aureliolo merged 4 commits intomainfrom
fix/sanitize-engine-error-messages

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

@Aureliolo Aureliolo commented Mar 20, 2026

Summary

  • Extract shared sanitize_message() helper from _handle_fatal_error's inline regex chain into engine/sanitization.py -- redacts file paths, URLs, UNC paths, and non-HTTP URI schemes, strips non-printable chars, enforces length limit
  • Apply sanitization to error_message in checkpoint reconciliation before injecting into LLM context (fix(engine): sanitize error messages in checkpoint reconciliation #615)
  • Apply sanitization to assistant content snippets in compaction summaries before joining (fix(engine): sanitize assistant content snippets in compaction summaries #614)
  • Refactor _handle_fatal_error to use the shared helper (no behavior change)
  • Broaden path patterns (general Unix absolute paths, UNC paths) and URL patterns (postgresql, redis, mongodb, etc.) based on security review
  • Fix summarizer to sanitize full content before truncating (prevents mid-path truncation bypass)

Closes #614
Closes #615

Test plan

  • 28 unit tests for sanitize_message() covering paths (Windows, Unix, UNC, relative), URLs (HTTP, data-store URIs), mixed content, truncation, non-printable stripping, fallback, edge cases, and passthrough
  • 2 integration tests for checkpoint reconciliation sanitization (path + URL)
  • 2 integration tests for compaction summarizer sanitization (path + URL)
  • 1 integration test for _handle_fatal_error sanitization wiring
  • Full suite: 9729 passed, 93.89% coverage
  • mypy strict: no issues
  • ruff lint + format: clean

Review coverage

Pre-reviewed by 10 agents (docs-consistency, code-reviewer, python-reviewer, pr-test-analyzer, conventions-enforcer, logging-audit, resilience-audit, security-reviewer, comment-analyzer, issue-resolution-verifier). 15 findings addressed.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Error messages, agent fatal errors, checkpoint recovery messages, and compacted assistant summaries now redact file paths and URLs before being stored or injected into model context.
  • Documentation

    • Engine design docs updated to clarify sanitization, truncation, and where sanitization is applied in compaction and recovery flows.
  • Tests

    • New tests validate sanitization, truncation, redaction tokens, and fallback behavior for messages and summaries.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 20, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 253cfde9-cd9d-430e-a141-de18be2d9103

📥 Commits

Reviewing files that changed from the base of the PR and between 8094057 and 53892f8.

📒 Files selected for processing (3)
  • tests/unit/engine/checkpoint/test_resume.py
  • tests/unit/engine/compaction/test_summarizer.py
  • tests/unit/engine/test_sanitization.py

Walkthrough

A new sanitization helper sanitize_message was added in src/synthorg/engine/sanitization.py to redact file paths and URLs and truncate long text. The helper replaces inline redaction in agent_engine._handle_fatal_error, is applied to error_message before constructing the reconciliation message in src/synthorg/engine/checkpoint/resume.py, and is used when building assistant snippets in src/synthorg/engine/compaction/summarizer.py. Documentation (docs/design/engine.md) was updated to describe these call sites, and unit tests were added to verify redaction, truncation, and edge-case behavior.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 40.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(engine): sanitize error messages in checkpoint reconciliation and compaction summaries' accurately describes the main objective of the PR—extracting and applying a sanitization helper across error handling paths.
Linked Issues check ✅ Passed All coding objectives from #614 and #615 are met: sanitize_message() is extracted and shared; compaction summarizer sanitizes assistant snippets before truncation; checkpoint reconciliation sanitizes error_message before LLM context injection.
Out of Scope Changes check ✅ Passed All changes remain within scope: new sanitization module, refactored uses in three error-handling paths (_handle_fatal_error, checkpoint reconciliation, compaction summarizer), documentation updates, and comprehensive test coverage—no unrelated alterations detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the security and privacy of the system by centralizing and applying robust message sanitization across critical components. It introduces a dedicated sanitize_message utility to redact sensitive information such as file paths and URLs from error messages and LLM-facing content. This ensures that internal system details are not inadvertently exposed to external systems or stored in logs, thereby improving data protection and reducing potential attack surfaces.

Highlights

  • Centralized Message Sanitization: A shared sanitize_message() helper function was extracted to engine/sanitization.py. This utility redacts file paths, URLs, UNC paths, and non-HTTP URI schemes, strips non-printable characters, and enforces length limits for messages.
  • Enhanced Checkpoint Reconciliation Security: Sanitization is now applied to error_message content during checkpoint reconciliation, preventing sensitive data from being inadvertently exposed or injected into LLM context.
  • Improved Compaction Summary Privacy: Assistant content snippets in conversation compaction summaries are now sanitized before being joined, ensuring that private information is redacted from summarized content.
  • Refactored Error Handling: The _handle_fatal_error function was refactored to use the new shared sanitize_message() helper, standardizing error message sanitization without altering its existing behavior.
  • Expanded Redaction Patterns: The regex patterns for path and URL detection were broadened to include general Unix absolute paths, UNC paths, and various data-store URI schemes (e.g., postgresql, redis, mongodb), based on security review recommendations.
  • Robust Truncation Logic: The summarizer logic was fixed to ensure that full content sanitization occurs before truncation, preventing potential bypasses where sensitive information might remain if truncated mid-path.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 20, 2026 06:31 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant security improvement by centralizing and enhancing message sanitization to prevent leaking sensitive information like file paths and URLs into LLM contexts. The new sanitize_message helper is well-implemented and applied correctly in _handle_fatal_error, checkpoint reconciliation, and compaction summaries. The refactoring is clean, and the accompanying unit and integration tests are comprehensive, covering a wide range of cases and potential bypasses. I have one suggestion to further improve the robustness of the path redaction logic.

Comment on lines +10 to +16
_PATH_PATTERN = re.compile(
r"[A-Za-z]:\\[^\s,;)\"']+"
r"|\\\\[^\s,;)\"']+"
r"|//[^\s,;)\"']{2,}"
r"|/[^\s,;)\"']{2,}"
r"|\.\.?/[^\s,;)\"']+"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current regex for path sanitization does not handle unquoted paths containing spaces. For example, an error message like Cannot open C:\Program Files\app.exe would result in partial redaction (Cannot open [REDACTED_PATH] Files\app.exe), potentially leaking parts of the file path. This is a security weakness.

While perfectly parsing all possible unquoted paths with spaces is very difficult with a single regex, we can improve robustness by explicitly matching quoted paths, which are a common way to represent paths with spaces. This suggestion adds patterns for single- and double-quoted paths before the existing patterns for unquoted paths.

_PATH_PATTERN = re.compile(
    r'"(?:[A-Za-z]:\\|\\\\|//|/|\./|\.\./)[^"]*"'
    r"|'(?:[A-Za-z]:\\|\\\\|//|/|\./|\.\./)[^']*'"
    r"|[A-Za-z]:\\[^\s,;)\"'\"]+"
    r"|\\\\[^\s,;)\"'\"]+"
    r"|//[^\s,;)\"'\"]{2,}"
    r"|/[^\s,;)\"'\"]{2,}"
    r"|\.\.?/[^\s,;)\"'\"]+"
)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/sanitization.py`:
- Around line 10-20: sanitize_message() currently only uses _PATH_PATTERN and
_URL_PATTERN and misses plain credentials (e.g., api_key=, token=, password=,
Authorization: Bearer) — add one or more credential regexes (e.g., for key=value
query/header forms and Authorization: Bearer/.+ and Bearer tokens in
URLs/headers) and apply them in sanitize_message() after/_alongside
_PATH_PATTERN/_URL_PATTERN to replace matches with a consistent placeholder like
"<REDACTED_CREDENTIAL>"; update/remove overlaps with _URL_PATTERN as needed to
avoid double-redaction and ensure functions/classes referenced (_PATH_PATTERN,
_URL_PATTERN, sanitize_message) handle headers, query params, and inline tokens.
- Around line 8-9: The module src/synthorg/engine/sanitization.py is missing the
standard module logger; add the required import and logger initialization by
importing get_logger from synthorg.observability and creating logger =
get_logger(__name__) at module scope so the module follows the convention used
across synthorg; place the import alongside existing imports (e.g., next to
import re) and define logger = get_logger(__name__) near the top of the file.

In `@tests/unit/engine/compaction/test_summarizer.py`:
- Around line 241-313: Add a regression test that verifies snippet truncation
cannot leak a partial path prefix: create a new async test (e.g.
test_assistant_snippet_truncation_does_not_leak_path_prefix) that uses the same
CompactionConfig and make_compaction_callback setup as the existing tests, then
build messages with a very long assistant content where a full path (e.g.
"C:\\Users\\dev\\project\\secrets.yaml") begins just before the 100-char snippet
boundary so truncation would have previously exposed a partial prefix; call
_build_context and the callback, then assert the resulting summary
message.content does not contain any partial path fragment like "C:\\Users" or
the original filename and that "[REDACTED_PATH]" appears instead; reuse symbols
_msg, _build_context, CompactionConfig, make_compaction_callback and the
sample_agent_with_personality fixture so the test matches the existing pattern.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: d61d0284-68ee-4e12-8c4b-9953cb3fca68

📥 Commits

Reviewing files that changed from the base of the PR and between 1a36eca and 3f30023.

📒 Files selected for processing (10)
  • CLAUDE.md
  • docs/design/engine.md
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/sanitization.py
  • tests/unit/engine/checkpoint/test_resume.py
  • tests/unit/engine/compaction/test_summarizer.py
  • tests/unit/engine/test_agent_engine_errors.py
  • tests/unit/engine/test_sanitization.py

Comment on lines +8 to +9
import re

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add module logger initialization to match engine module conventions.

This module has business logic but does not initialize logger = get_logger(__name__).

As per coding guidelines, src/synthorg/**/*.py: “Every module with business logic must have: from synthorg.observability import get_logger then logger = get_logger(__name__).”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/sanitization.py` around lines 8 - 9, The module
src/synthorg/engine/sanitization.py is missing the standard module logger; add
the required import and logger initialization by importing get_logger from
synthorg.observability and creating logger = get_logger(__name__) at module
scope so the module follows the convention used across synthorg; place the
import alongside existing imports (e.g., next to import re) and define logger =
get_logger(__name__) near the top of the file.

Comment on lines +10 to +20
_PATH_PATTERN = re.compile(
r"[A-Za-z]:\\[^\s,;)\"']+"
r"|\\\\[^\s,;)\"']+"
r"|//[^\s,;)\"']{2,}"
r"|/[^\s,;)\"']{2,}"
r"|\.\.?/[^\s,;)\"']+"
)
_URL_PATTERN = re.compile(
r"(?:https?|postgresql|postgres|mysql|redis|mongodb|amqp|ftp|sftp|file)"
r"://[^\s,;)\"']+"
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add credential/token redaction to the shared sanitizer.

sanitize_message() currently redacts paths/URLs only. Plain secrets like
api_key=..., password=..., or Authorization: Bearer ... can still be
injected into LLM context through all three integration points.

🔒 Proposed patch
 _URL_PATTERN = re.compile(
     r"(?:https?|postgresql|postgres|mysql|redis|mongodb|amqp|ftp|sftp|file)"
     r"://[^\s,;)\"']+"
 )
+_SECRET_PATTERN = re.compile(
+    r"(?i)(?:"
+    r"\b(?:api[_-]?key|token|secret|password)\b\s*[:=]\s*[^\s,;)\"']+"
+    r"|\bauthorization\s*:\s*bearer\s+[^\s,;)\"']+"
+    r")"
+)
@@
 def sanitize_message(raw: str, *, max_length: int = 200) -> str:
@@
     sanitized = _URL_PATTERN.sub("[REDACTED_URL]", raw)
     sanitized = _PATH_PATTERN.sub("[REDACTED_PATH]", sanitized)
+    sanitized = _SECRET_PATTERN.sub("[REDACTED_SECRET]", sanitized)
     sanitized = "".join(c for c in sanitized[:max_length] if c.isprintable())

Also applies to: 36-38

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/sanitization.py` around lines 10 - 20, sanitize_message()
currently only uses _PATH_PATTERN and _URL_PATTERN and misses plain credentials
(e.g., api_key=, token=, password=, Authorization: Bearer) — add one or more
credential regexes (e.g., for key=value query/header forms and Authorization:
Bearer/.+ and Bearer tokens in URLs/headers) and apply them in
sanitize_message() after/_alongside _PATH_PATTERN/_URL_PATTERN to replace
matches with a consistent placeholder like "<REDACTED_CREDENTIAL>";
update/remove overlaps with _URL_PATTERN as needed to avoid double-redaction and
ensure functions/classes referenced (_PATH_PATTERN, _URL_PATTERN,
sanitize_message) handle headers, query params, and inline tokens.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.49%. Comparing base (2e51b60) to head (53892f8).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #632   +/-   ##
=======================================
  Coverage   92.48%   92.49%           
=======================================
  Files         553      554    +1     
  Lines       27823    27836   +13     
  Branches     2669     2670    +1     
=======================================
+ Hits        25733    25747   +14     
+ Misses       1648     1647    -1     
  Partials      442      442           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Aureliolo and others added 3 commits March 20, 2026 08:02
… compaction summaries

Extract shared sanitize_message() helper from _handle_fatal_error's
inline regex chain into engine/sanitization.py. Apply to checkpoint
resume (error_message before LLM context injection) and compaction
summarizer (assistant content snippets before joining into summary).

Closes #614
Closes #615

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Broaden Unix path pattern to catch all absolute paths (not just
  allowlisted prefixes), add UNC path support
- Extend URL pattern to cover data-store URI schemes (postgresql,
  redis, mongodb, etc.)
- Reorder URL/path redaction (URLs first) to prevent path pattern
  from mangling URL scheme separators
- Fix summarizer to sanitize full content before truncating (prevents
  mid-path truncation bypass)
- Add integration test for _handle_fatal_error sanitization wiring
- Convert parametrize labels to pytest.param IDs, add edge cases
- Add pytestmark timeout to summarizer tests
- Update CLAUDE.md and docs/design/engine.md for new module

Pre-reviewed by 10 agents, 15 findings addressed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… reviewers

- Fix overly broad Unix path regex: add (?<!\w) lookbehind to prevent
  false positives on ratios (100/min), content-types (application/json),
  and date formats (2024/01/15)
- Add quoted Windows path detection for paths with spaces
- Fix truncation ordering: cap raw input before redaction to prevent
  partial redaction tokens like [REDACTED_PA in output
- Add max_length negative value validation with ValueError
- Handle all-redacted snippets in compaction summarizer: filter out
  useless "details redacted" placeholders, fall through to generic
  fallback when all snippets are redacted
- Fix "stack traces" claim in engine.md: function only redacts paths/URLs,
  truncation limits stack trace exposure
- Add sanitization mentions to Compaction Hook and Checkpoint Recovery
  doc sections
- Update _build_summary docstring to mention sanitization step
- Update max_length docstring to clarify it caps raw input, not output
- Fix resume.py inline comment precision (broader than just paths/URLs)
- Add 13 new tests: non-standard Unix prefixes, false-positive prevention,
  quoted Windows paths, additional URL schemes (ftp, sftp, amqp, mysql),
  file:// URI ordering, negative max_length, truncation boundary regression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Aureliolo Aureliolo force-pushed the fix/sanitize-engine-error-messages branch from 3f30023 to 8094057 Compare March 20, 2026 07:07
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 20, 2026 07:08 — with GitHub Actions Inactive
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
src/synthorg/engine/sanitization.py (2)

8-9: ⚠️ Potential issue | 🟠 Major

Initialize module logger for this business-logic module.

This module introduces engine business logic but does not define the required module logger.

🔧 Proposed fix
 import re
+
+from synthorg.observability import get_logger
+
+logger = get_logger(__name__)

As per coding guidelines, src/synthorg/**/*.py: “Every module with business logic must have: from synthorg.observability import get_logger then logger = get_logger(__name__).”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/sanitization.py` around lines 8 - 9, This module is
missing the required module logger: add the import "from synthorg.observability
import get_logger" at top of src/synthorg/engine/sanitization.py and initialize
"logger = get_logger(__name__)" at module scope so business-logic in this file
uses the standard get_logger and logger symbols.

10-29: ⚠️ Potential issue | 🟠 Major

Add credential/token redaction in shared sanitizer.

sanitize_message() currently redacts paths/URLs but still allows inline secrets (e.g., api_key=..., password=..., bearer tokens) to pass through to LLM context.

🔒 Proposed fix
 _URL_PATTERN = re.compile(
     r"(?:https?|postgresql|postgres|mysql|redis|mongodb|amqp|ftp|sftp|file)"
     r"://[^\s,;)\"']+"
 )
+_SECRET_PATTERN = re.compile(
+    r"(?i)(?:"
+    r"\b(?:api[_-]?key|token|secret|password)\b\s*[:=]\s*[^\s,;)\"']+"
+    r"|\bauthorization\s*:\s*bearer\s+[^\s,;)\"']+"
+    r")"
+)
@@
     sanitized = _URL_PATTERN.sub("[REDACTED_URL]", capped)
     sanitized = _PATH_PATTERN.sub("[REDACTED_PATH]", sanitized)
+    sanitized = _SECRET_PATTERN.sub("[REDACTED_SECRET]", sanitized)
     sanitized = "".join(c for c in sanitized if c.isprintable())

Also applies to: 54-56

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/sanitization.py` around lines 10 - 29, sanitize_message
currently only strips paths/URLs via _PATH_PATTERN/_URL_PATTERN but doesn't
redact inline credentials or bearer tokens; add one or more regex patterns to
match common secrets (e.g., query/key-value pairs like api_key=...,
access_token=..., token=..., password=..., secret=..., auth=..., passwd=...),
Authorization headers with "Bearer <token>" and header-style
"Authorization:Bearer ..." and include them in the sanitizer pipeline so matches
are replaced with a stable placeholder like "[REDACTED]"; update
sanitize_message to apply these new credential patterns (after or alongside
_URL_PATTERN) and ensure functions/variables mentioned (_PATH_PATTERN,
_URL_PATTERN, sanitize_message) are the points to modify so any matched secrets
are redacted before returning the message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/sanitization.py`:
- Around line 50-52: The code raises ValueError for an invalid max_length but
doesn't log the error; before raising, call the module logger (e.g.,
logger.warning or logger.error) to record the invalid-input with context
including the provided max_length and any relevant function name (reference the
max_length check and msg variable in sanitization.py), then raise the same
ValueError; ensure you import or use the existing logger instance in the module
and include a clear message like "invalid max_length" plus the max_length value.

In `@tests/unit/engine/checkpoint/test_resume.py`:
- Around line 128-167: Combine the two nearly identical tests
test_error_with_path_is_sanitized and test_error_with_url_is_sanitized into a
single parametrized test using pytest.mark.parametrize: keep the shared setup
using _make_ctx_json and deserialize_and_reconcile (referencing AgentIdentity
and Task fixtures), and parametrize the error_message input and expected
redaction token (e.g., ("Failed at
C:\\Users\\dev\\secret.key","[REDACTED_PATH]") and ("Timeout calling
https://api.internal.io/v1/completions","[REDACTED_URL]")), then assert the
redaction appears and the raw substring (like "C:\\Users" or "https://") does
not appear; ensure the new test keeps the same agent_id/task_id and checks
last_msg.content as before.

In `@tests/unit/engine/compaction/test_summarizer.py`:
- Around line 241-314: Combine the two near-duplicate tests
test_assistant_snippet_with_path_is_sanitized and
test_assistant_snippet_with_url_is_sanitized into a single parametrized test
using pytest.mark.parametrize: iterate over tuples like (assistant_text,
forbidden_substring, expected_redaction) and reuse the same setup
(CompactionConfig, make_compaction_callback, _build_context, _msg, MessageRole)
inside the test body; assert that forbidden_substring is not in
summary_msg.content and expected_redaction is present. Keep the same
capacity/fill/preserve parameters and preserve_recent_turns, and name the new
test to reflect both cases (e.g.,
test_assistant_snippet_sanitization_parametrized).

In `@tests/unit/engine/test_sanitization.py`:
- Around line 216-222: Add a unit test that exercises sanitize_message's
redaction-expansion edge case: construct input where a short redactable pattern
(e.g., a path like "/ab") falls right before the truncate boundary used by
test_default_max_length/test_custom_max_length (use max_length=200) so that the
replacement token (e.g., "[REDACTED_PATH]") can expand past the limit; call
sanitize_message(raw, max_length=200), assert the redaction token appears in the
result and allow the length to exceed 200 as documented. Reference
sanitize_message and reuse the existing test naming style (e.g.,
test_redaction_near_boundary_may_exceed_max_length) so the new test is
discoverable alongside test_default_max_length and test_custom_max_length.

---

Duplicate comments:
In `@src/synthorg/engine/sanitization.py`:
- Around line 8-9: This module is missing the required module logger: add the
import "from synthorg.observability import get_logger" at top of
src/synthorg/engine/sanitization.py and initialize "logger =
get_logger(__name__)" at module scope so business-logic in this file uses the
standard get_logger and logger symbols.
- Around line 10-29: sanitize_message currently only strips paths/URLs via
_PATH_PATTERN/_URL_PATTERN but doesn't redact inline credentials or bearer
tokens; add one or more regex patterns to match common secrets (e.g.,
query/key-value pairs like api_key=..., access_token=..., token=...,
password=..., secret=..., auth=..., passwd=...), Authorization headers with
"Bearer <token>" and header-style "Authorization:Bearer ..." and include them in
the sanitizer pipeline so matches are replaced with a stable placeholder like
"[REDACTED]"; update sanitize_message to apply these new credential patterns
(after or alongside _URL_PATTERN) and ensure functions/variables mentioned
(_PATH_PATTERN, _URL_PATTERN, sanitize_message) are the points to modify so any
matched secrets are redacted before returning the message.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: cee47d61-b3bf-419d-b67c-de6bd421113d

📥 Commits

Reviewing files that changed from the base of the PR and between 3f30023 and 8094057.

📒 Files selected for processing (10)
  • CLAUDE.md
  • docs/design/engine.md
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/sanitization.py
  • tests/unit/engine/checkpoint/test_resume.py
  • tests/unit/engine/compaction/test_summarizer.py
  • tests/unit/engine/test_agent_engine_errors.py
  • tests/unit/engine/test_sanitization.py

Comment on lines +50 to +52
if max_length < 0:
msg = f"max_length must be >= 0, got {max_length}"
raise ValueError(msg)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Log invalid-input error path before raising.

On Line 50-Line 52, ValueError is raised without a WARNING/ERROR log entry and context.

As per coding guidelines, src/synthorg/**/*.py: “All error paths must log at WARNING or ERROR with context before raising.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/sanitization.py` around lines 50 - 52, The code raises
ValueError for an invalid max_length but doesn't log the error; before raising,
call the module logger (e.g., logger.warning or logger.error) to record the
invalid-input with context including the provided max_length and any relevant
function name (reference the max_length check and msg variable in
sanitization.py), then raise the same ValueError; ensure you import or use the
existing logger instance in the module and include a clear message like "invalid
max_length" plus the max_length value.

- Parametrize duplicate resume sanitization tests (path + URL)
- Parametrize duplicate compaction sanitization tests (path + URL)
- Add redaction-expansion boundary test verifying output may exceed
  max_length when redaction tokens expand (documented behavior)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 20, 2026 07:18 — with GitHub Actions Inactive
@Aureliolo Aureliolo merged commit 5394ed7 into main Mar 20, 2026
33 checks passed
@Aureliolo Aureliolo deleted the fix/sanitize-engine-error-messages branch March 20, 2026 07:38
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 20, 2026 07:38 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request Mar 20, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.3.10](v0.3.9...v0.3.10)
(2026-03-20)


### Bug Fixes

* **ci:** generate required secrets in DAST workflow
([#623](#623))
([6ae297f](6ae297f))
* **cli:** doctor image check reads compose file and fix API docs URL
([#625](#625))
([5202e53](5202e53))
* **engine:** sanitize error messages in checkpoint reconciliation and
compaction summaries
([#632](#632))
([5394ed7](5394ed7))
* mitigate TOCTOU DNS rebinding gap in git clone SSRF prevention
([#633](#633))
([1846f6e](1846f6e))
* resolve post-startup log loss, add provider model discovery, and
improve setup wizard UX
([#634](#634))
([2df8d11](2df8d11))


### Maintenance

* bump https://github.com/astral-sh/ruff-pre-commit from v0.15.6 to
0.15.7 ([#628](#628))
([c641d2c](c641d2c))
* bump python from `584e89d` to `fb83750` in /docker/backend
([#627](#627))
([1a36eca](1a36eca))
* bump python from `584e89d` to `fb83750` in /docker/sandbox
([#629](#629))
([fd3e69a](fd3e69a))
* bump the minor-and-patch group across 2 directories with 3 updates
([#630](#630))
([67d14c4](67d14c4))
* bump the minor-and-patch group with 2 updates
([#631](#631))
([2e51b60](2e51b60))
* **ci:** add timeout-minutes, harden fuzz script, extend CVE audit
([#626](#626))
([25420e2](25420e2))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(engine): sanitize error messages in checkpoint reconciliation fix(engine): sanitize assistant content snippets in compaction summaries

1 participant