feat: add explicit ScanOutcome signal to OutputScanResult#394
Conversation
Add a ScanOutcome enum (CLEAN/REDACTED/WITHHELD/LOG_ONLY) to OutputScanResult so the ToolInvoker can distinguish intentional policy withholding from scanner failure. Previously both produced has_sensitive_data=True + redacted_content=None, leading to a misleading "no redacted content available" log message.

- ScanOutcome enum in security/models.py with consistency validation
- OutputScanner sets REDACTED, WithholdPolicy sets WITHHELD, LogOnlyPolicy sets LOG_ONLY on sensitive results
- ToolInvoker branches on WITHHELD first with a dedicated event (TOOL_OUTPUT_WITHHELD) and a clear user-facing message
- 5 new model validation tests, 2 new invoker tests, outcome assertions added across all affected test files
Pre-reviewed by 14 agents, 12 findings addressed:

- Tighten OutputScanResult validator: REDACTED requires redacted_content, LOG_ONLY requires has_sensitive_data=False
- Extract _handle_sensitive_scan from _scan_output (50-line limit)
- Rewrite _scan_output docstring to cover all outcome paths
- Add TOOL_OUTPUT_WITHHELD to CLAUDE.md event catalog
- Update design spec Output Scan Response Policies with ScanOutcome
- Add tests: LOG_ONLY pass-through, WITHHELD priority over redacted_content, new model validation constraints
- Remove redundant in-function imports in tests
Dependency Review
✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.
Scanned files: none
📝 Walkthrough (Summary by CodeRabbit)
Adds a ScanOutcome enum and outcome field to OutputScanResult; policies and the scanner set outcomes; introduces handle_sensitive_scan to centralize outcome-based handling; ToolInvoker delegates sensitive-output handling to that handler and branches on outcomes; docs and tests updated accordingly.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Invoker as ToolInvoker
    participant Tool as Tool
    participant Scanner as OutputScanner
    participant Policy as SecurityPolicy
    participant Handler as ScanHandler
    Client->>Invoker: invoke_tool(tool_call)
    Invoker->>Tool: execute_tool(tool_call)
    Tool-->>Invoker: ToolExecutionResult(output)
    Invoker->>Scanner: scan(output)
    Scanner-->>Invoker: OutputScanResult(outcome=CLEAN|REDACTED|WITHHELD|LOG_ONLY)
    Invoker->>Policy: apply_policy(scan_result)
    Policy-->>Invoker: OutputScanResult(outcome=REDACTED|WITHHELD|LOG_ONLY|CLEAN)
    Invoker->>Handler: handle_sensitive_scan(tool_call, result, scan_result)
    alt scan_result.outcome == WITHHELD
        Handler-->>Invoker: ToolExecutionResult(error="output withheld by security policy", metadata: output_withheld)
        Invoker-->>Client: error response
    else scan_result.outcome == REDACTED
        Handler-->>Invoker: ToolExecutionResult(output=redacted_content, metadata: output_redacted)
        Invoker-->>Client: redacted output
    else scan_result.outcome == LOG_ONLY or CLEAN
        Invoker-->>Client: ToolExecutionResult(output=original)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Summary of Changes (Gemini Code Assist)
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the handling of sensitive data within tool outputs by introducing an explicit ScanOutcome signal.
Code Review
This pull request introduces a new ScanOutcome enum (CLEAN, REDACTED, WITHHELD, LOG_ONLY) to explicitly track the result of security output scanning policies. The OutputScanResult model is updated to include this outcome, along with new validation rules. Policy implementations (WithholdPolicy, LogOnlyPolicy) are modified to set the appropriate ScanOutcome. The ToolInvoker now uses this ScanOutcome to handle sensitive tool outputs, providing distinct responses such as specific error messages and metadata for withheld content, returning redacted content, or passing through original output for LOG_ONLY or CLEAN outcomes. Documentation and unit tests are updated to reflect these new behaviors and validations.
Pull request overview
Adds an explicit ScanOutcome enum to OutputScanResult so the tool invoker can distinguish intentional output withholding (policy) from scanner failures, improving logging, user-facing messaging, and downstream observability.
Changes:
- Introduce `ScanOutcome` (CLEAN/REDACTED/WITHHELD/LOG_ONLY) and enforce outcome/content consistency via model validation.
- Update scanner/policies/invoker to set and branch on outcomes (including a dedicated `TOOL_OUTPUT_WITHHELD` event + message).
- Expand unit tests and docs to cover `WITHHELD` and `LOG_ONLY` behavior.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/tools/test_invoker_security.py | Adds invoker tests for WITHHELD and LOG_ONLY outcome behavior and messaging. |
| tests/unit/security/test_service.py | Updates service tests to include explicit ScanOutcome on scan results. |
| tests/unit/security/test_output_scanner.py | Asserts scanner returns CLEAN for no findings and REDACTED when findings occur. |
| tests/unit/security/test_output_scan_policy.py | Verifies policies set/transform ScanOutcome correctly (e.g., Withhold → WITHHELD, LogOnly → LOG_ONLY). |
| tests/unit/security/test_models.py | Adds validation tests ensuring outcome/result field consistency. |
| src/ai_company/tools/invoker.py | Branches on ScanOutcome.WITHHELD first and extracts sensitive-scan handling helper. |
| src/ai_company/security/output_scanner.py | Sets outcome=REDACTED when findings are detected. |
| src/ai_company/security/output_scan_policy.py | Withhold sets WITHHELD; LogOnly returns clean result with LOG_ONLY. |
| src/ai_company/security/models.py | Adds ScanOutcome and validates invariants between has_sensitive_data, redacted_content, and outcome. |
| src/ai_company/security/__init__.py | Re-exports ScanOutcome as part of the security public API. |
| src/ai_company/observability/events/tool.py | Adds TOOL_OUTPUT_WITHHELD event constant. |
| docs/design/operations.md | Documents ScanOutcome semantics and policy/outcome mapping. |
| CLAUDE.md | Updates logging/event-constant guidance examples to include TOOL_OUTPUT_WITHHELD. |
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@ Coverage Diff @@
##            main     #394   +/-   ##
==========================================
+ Coverage   93.83%   93.84%   +0.01%
==========================================
  Files         462      463       +1
  Lines       21653    21691      +38
  Branches     2079     2086       +7
==========================================
+ Hits        20319    20357      +38
  Misses       1032     1032
  Partials      302      302
```

☔ View full report in Codecov by Sentry.
Greptile Summary
This PR introduces a `ScanOutcome` enum to `OutputScanResult` so intentional policy withholding can be distinguished from scanner failure. Two minor issues found:
Confidence Score: 4/5

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/ai_company/tools/scan_result_handler.py
Line: 58-73
Comment:
**Implicit REDACTED dispatch — asymmetric with WITHHELD branch**
The WITHHELD branch dispatches on `scan_result.outcome == ScanOutcome.WITHHELD` (an explicit outcome check), but the REDACTED branch dispatches on `scan_result.redacted_content is not None` (an implicit field check). Today these are semantically equivalent for valid `OutputScanResult` objects because the model validator enforces `REDACTED ↔ redacted_content is not None`, but the asymmetry has a forward-compatibility risk.
If a future `ScanOutcome` member (e.g. `PARTIAL`) is added that also carries `redacted_content`, it would silently hit this branch and be logged as `TOOL_OUTPUT_REDACTED`, misleading any monitoring that distinguishes outcomes. Consider making the dispatch explicit:
```python
if scan_result.outcome == ScanOutcome.WITHHELD:
# … existing WITHHELD path …
if scan_result.outcome == ScanOutcome.REDACTED and scan_result.redacted_content is not None:
# … existing REDACTED path …
# Defensive: model_copy() can produce REDACTED with redacted_content=None.
```
The dual condition (`outcome == REDACTED and redacted_content is not None`) preserves the current fail-closed behaviour for the broken-model-copy case — a `REDACTED` outcome with `None` content still falls through to the defensive fallback — while also making it explicit that any unrecognised future outcome falls closed.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/ai_company/security/service.py
Line: 277-292
Comment:
**Misleading fallback note — `LogOnlyPolicy` failure is stricter, not looser**
The note logged on policy failure says `"may be less strict than intended policy"`, but this characterisation is only accurate when a strict policy (e.g. `WithholdPolicy`) fails. When a permissive policy fails the direction is reversed:
| Intended policy | Fallback (raw scanner result) | Relative strictness |
|---|---|---|
| `WithholdPolicy` | `REDACTED` — partial content returned | **less strict** ✓ |
| `LogOnlyPolicy` | `REDACTED` — content now blocked | **more strict** ✗ |
| `RedactPolicy` | `REDACTED` — same behaviour | no change |
Operators monitoring this log event (keyed on `SECURITY_INTERCEPTOR_ERROR`) could be misled into thinking the system degraded toward permissiveness when a `LogOnlyPolicy` failure actually makes it more restrictive. A neutral note avoids the ambiguity:
```suggestion
note="Output scan policy application failed "
"— returning raw scanner result "
"(strictness may differ from intended policy)",
```
How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 5b51f36
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/ai_company/tools/invoker.py (1)
280-383: 🛠️ Refactor suggestion | 🟠 Major

Split the new scan helper again.
`_handle_sensitive_scan` still spans roughly 60 lines on its own, and `src/ai_company/tools/invoker.py` is still above the repo's 800-line cap. Please break the withheld/redacted/fail-closed branches into smaller helpers or move scan-result adaptation into a dedicated collaborator.

♻️ Refactor sketch

```diff
-    def _handle_sensitive_scan(
-        self,
-        tool_call: ToolCall,
-        result: ToolExecutionResult,
-        scan_result: OutputScanResult,
-    ) -> ToolExecutionResult:
-        ...
+    def _handle_sensitive_scan(
+        self,
+        tool_call: ToolCall,
+        result: ToolExecutionResult,
+        scan_result: OutputScanResult,
+    ) -> ToolExecutionResult:
+        if scan_result.outcome == ScanOutcome.WITHHELD:
+            return self._withheld_scan_result(tool_call, scan_result)
+        if scan_result.redacted_content is not None:
+            return self._redacted_scan_result(tool_call, result, scan_result)
+        return self._fail_closed_scan_result(tool_call, scan_result)
+
+    def _withheld_scan_result(
+        self,
+        tool_call: ToolCall,
+        scan_result: OutputScanResult,
+    ) -> ToolExecutionResult:
+        ...
+
+    def _redacted_scan_result(
+        self,
+        tool_call: ToolCall,
+        result: ToolExecutionResult,
+        scan_result: OutputScanResult,
+    ) -> ToolExecutionResult:
+        ...
+
+    def _fail_closed_scan_result(
+        self,
+        tool_call: ToolCall,
+        scan_result: OutputScanResult,
+    ) -> ToolExecutionResult:
+        ...
```

As per coding guidelines: "Functions must be less than 50 lines; files must be less than 800 lines."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/tools/invoker.py` around lines 280 - 383, The _handle_sensitive_scan method is too large and should be split into smaller helpers to meet the <50-line function> rule; extract the three branches into distinct methods such as _handle_withheld_output(tool_call, scan_result), _handle_redacted_output(tool_call, result, scan_result), and _handle_failclosed_output(tool_call, scan_result) (or move that adaptation into a new collaborator class like OutputScanHandler) and have _handle_sensitive_scan delegate to them; ensure each extracted helper constructs the same ToolExecutionResult shapes and logs using the same logger messages (TOOL_OUTPUT_WITHHELD, TOOL_OUTPUT_REDACTED) and metadata keys ("output_withheld","output_redacted","redaction_findings","output_scan_failed") so behavior is unchanged, and update _scan_output to call the refactored _handle_sensitive_scan as before.

tests/unit/tools/test_invoker_security.py (1)
382-404: 🧹 Nitpick | 🔵 Trivial

Don't reuse the old fail-closed test for the `WITHHELD` path.
This case now exercises the explicit `WITHHELD` behavior, which is already covered again in `TestWithheldOutcome`, and no longer matches the "no redacted content fails closed" scenario named here. Keep this test targeted at the fallback/no-usable-redaction case and let the dedicated class own the `WITHHELD` assertions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/tools/test_invoker_security.py` around lines 382 - 404, The test function test_sensitive_but_no_redacted_content_fails_closed currently constructs an OutputScanResult with outcome=ScanOutcome.WITHHELD which duplicates the dedicated WITHHELD path; change that outcome to a non-WITHHELD enum (e.g. ScanOutcome.UNKNOWN) so this test exercises the fallback/no-usable-redaction case, and remove the assertion that checks for the explicit "withheld by security policy" message (keep the is_error True and the "executed:" not-in-content checks). Update the instantiation of OutputScanResult in this test accordingly and leave TestWithheldOutcome to own WITHHELD-specific assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/tools/invoker.py`:
- Around line 325-339: The logger call in the fail-closed branch currently emits
TOOL_OUTPUT_REDACTED which conflates withheld outputs with redactions; change
the event to a failure-specific event constant (e.g. TOOL_OUTPUT_WITHHELD or
TOOL_OUTPUT_WITHHELD_NO_PAYLOAD) instead of TOOL_OUTPUT_REDACTED and keep the
same context (tool_call_id=tool_call.id, tool_name=tool_call.name,
findings=scan_result.findings, note="no redacted content available — withholding
output"); leave the returned ToolExecutionResult and metadata
({"output_scan_failed": True}) as-is so metrics reflect a failure-withheld event
rather than a redaction.
In `@tests/unit/security/test_models.py`:
- Around line 525-586: Replace the multiple near-duplicate test methods that
exercise OutputScanResult with a parametrized table using
pytest.mark.parametrize: create a single test (e.g.,
test_output_scanresult_outcome_matrix) that iterates rows of
(has_sensitive_data, findings, outcome, expect_valid, expect_error_field) and
for each row either constructs OutputScanResult and asserts
result.outcome/redacted_content when expect_valid is True or wraps construction
in pytest.raises(ValidationError, match=expect_error_field) when expect_valid is
False; reference ScanOutcome enum values for outcomes and preserve checks used
in the originals (outcome and redacted_content) so behavior remains identical.
In `@tests/unit/tools/test_invoker_security.py`:
- Around line 816-925: This test file exceeds the 800-line limit by adding two
large test classes; extract TestWithheldOutcome and TestLogOnlyOutcome (and
their tests that call _make_interceptor, _make_verdict, ToolInvoker,
ToolExecutionResult, SecurityContext) into a new test module (e.g.,
tests/unit/tools/test_invoker_output_scan.py), update imports at top of the new
file to reuse existing fixtures (security_registry, tool_call) and helper
symbols, and remove those classes from the original file so the original stays
under 800 lines and each function remains <50 lines; ensure pytest discovery by
matching the new file name pattern and run tests to confirm no missing imports.
---
Outside diff comments:
In `@src/ai_company/tools/invoker.py`:
- Around line 280-383: The _handle_sensitive_scan method is too large and should
be split into smaller helpers to meet the <50-line function> rule; extract the
three branches into distinct methods such as _handle_withheld_output(tool_call,
scan_result), _handle_redacted_output(tool_call, result, scan_result), and
_handle_failclosed_output(tool_call, scan_result) (or move that adaptation into
a new collaborator class like OutputScanHandler) and have _handle_sensitive_scan
delegate to them; ensure each extracted helper constructs the same
ToolExecutionResult shapes and logs using the same logger messages
(TOOL_OUTPUT_WITHHELD, TOOL_OUTPUT_REDACTED) and metadata keys
("output_withheld","output_redacted","redaction_findings","output_scan_failed")
so behavior is unchanged, and update _scan_output to call the refactored
_handle_sensitive_scan as before.
In `@tests/unit/tools/test_invoker_security.py`:
- Around line 382-404: The test function
test_sensitive_but_no_redacted_content_fails_closed currently constructs an
OutputScanResult with outcome=ScanOutcome.WITHHELD which duplicates the
dedicated WITHHELD path; change that outcome to a non-WITHHELD enum (e.g.
ScanOutcome.UNKNOWN) so this test exercises the fallback/no-usable-redaction
case, and remove the assertion that checks for the explicit "withheld by
security policy" message (keep the is_error True and the "executed:"
not-in-content checks). Update the instantiation of OutputScanResult in this
test accordingly and leave TestWithheldOutcome to own WITHHELD-specific
assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 66c3c5a4-0c20-4fa5-8696-e71e41163bef
📒 Files selected for processing (13)
- CLAUDE.md
- docs/design/operations.md
- src/ai_company/observability/events/tool.py
- src/ai_company/security/__init__.py
- src/ai_company/security/models.py
- src/ai_company/security/output_scan_policy.py
- src/ai_company/security/output_scanner.py
- src/ai_company/tools/invoker.py
- tests/unit/security/test_models.py
- tests/unit/security/test_output_scan_policy.py
- tests/unit/security/test_output_scanner.py
- tests/unit/security/test_service.py
- tests/unit/tools/test_invoker_security.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Agent
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Greptile Review
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (6)
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Build docs with: uv run zensical build (output: _site/docs/) — use without --strict until zensical/backlog#72 is resolved
Files:
docs/design/operations.md
docs/**
📄 CodeRabbit inference engine (CLAUDE.md)
Docs source in docs/ (Markdown, built with Zensical); design spec in docs/design/ (7 pages: index, agents, organization, communication, engine, memory, operations)
Files:
docs/design/operations.md
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python 3.14+ required; use PEP 649 native lazy annotations (no `from __future__ import annotations`)
Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14
Add type hints to all public functions; enforce mypy strict mode
Use Google style docstrings required on all public classes and functions — enforced by ruff D rules
Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use `@computed_field` for derived values instead of storing redundant fields; use NotBlankStr for all identifier/name fields (including optional and tuple variants)
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task; existing code is being migrated incrementally
Enforce 88 character line length — ruff enforces this
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
tests/unit/security/test_output_scan_policy.py, src/ai_company/tools/invoker.py, src/ai_company/observability/events/tool.py, tests/unit/security/test_models.py, src/ai_company/security/models.py, tests/unit/tools/test_invoker_security.py, src/ai_company/security/output_scan_policy.py, tests/unit/security/test_output_scanner.py, src/ai_company/security/__init__.py, src/ai_company/security/output_scanner.py, tests/unit/security/test_service.py
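The immutability guideline above (deep-copy at construction, `MappingProxyType` for read-only enforcement) can be sketched as follows. The class name and field are hypothetical illustrations, not code from this repository.

```python
import copy
from types import MappingProxyType


class ToolRegistry:
    """Illustrative read-only registry: deep-copy the input at
    construction, then wrap it in MappingProxyType so callers can
    read but never mutate the mapping."""

    def __init__(self, tools: dict[str, str]) -> None:
        self._tools = MappingProxyType(copy.deepcopy(tools))

    @property
    def tools(self) -> MappingProxyType:
        return self._tools


source = {"search": "v1"}
registry = ToolRegistry(source)
source["search"] = "mutated"      # later mutation of the caller's dict...
print(registry.tools["search"])   # ...does not leak in: prints v1
```

Because the copy is taken at the boundary, creating a new object instead of sharing the caller's dict, the registry's view stays stable; any write through `registry.tools` raises `TypeError`.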
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, and `@pytest.mark.slow`
Use pytest-xdist via -n auto for parallel test execution — ALWAYS include -n auto when running pytest, never run tests sequentially
Prefer `@pytest.mark.parametrize` for testing similar cases
Use test-provider, test-small-001, etc. in tests instead of real vendor names
Run unit tests with: uv run pytest tests/ -m unit -n auto
Run integration tests with: uv run pytest tests/ -m integration -n auto
Run e2e tests with: uv run pytest tests/ -m e2e -n auto
Run full test suite with coverage: uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80
Files:
tests/unit/security/test_output_scan_policy.py, tests/unit/security/test_models.py, tests/unit/tools/test_invoker_security.py, tests/unit/security/test_output_scanner.py, tests/unit/security/test_service.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import get_logger from ai_company.observability and define logger = get_logger(name); never use import logging or logging.getLogger() or print() in application code
Always use logger variable name (not _logger or log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly
Use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Maintain 80% minimum code coverage — enforced in CI
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases
Lint Python code with: uv run ruff check src/ tests/
Auto-fix lint issues with: uv run ruff check src/ tests/ --fix
Format code with: uv run ruff format src/ tests/
Type-check with strict mode: uv run mypy src/ tests/
Files:
src/ai_company/tools/invoker.py, src/ai_company/observability/events/tool.py, src/ai_company/security/models.py, src/ai_company/security/output_scan_policy.py, src/ai_company/security/__init__.py, src/ai_company/security/output_scanner.py
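The structured-logging guideline above (event constants plus keyword context, never printf-style interpolation) can be sketched like this. The `StructuredLogger` class is a stand-in written for this example, since the project's `ai_company.observability.get_logger` is not available here; the assumption is that the real logger is structlog-style and accepts an event name plus keyword context.

```python
TOOL_OUTPUT_WITHHELD = "TOOL_OUTPUT_WITHHELD"  # event constant, as in events.tool


class StructuredLogger:
    """Minimal stand-in for the project's get_logger for illustration."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: list[tuple[str, str, dict]] = []

    def warning(self, event: str, **context) -> None:
        # Structured kwargs: logger.warning(EVENT, key=value),
        # never logger.warning('msg %s', val).
        self.records.append(("warning", event, context))


logger = StructuredLogger("ai_company.tools.invoker")
logger.warning(
    TOOL_OUTPUT_WITHHELD,
    tool_call_id="call-1",          # hypothetical values for illustration
    tool_name="example_tool",
    findings=("potential leak",),
)
```

Keeping the event name as a constant and the context as kwargs makes log records machine-filterable, which is what allows monitoring to key on events like `TOOL_OUTPUT_WITHHELD`.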
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError; non-retryable errors raise immediately without retry; RetryExhaustedError signals all retries failed
Rate limiter respects RateLimitError.retry_after from providers — automatically pauses future requests
Files:
src/ai_company/tools/invoker.py, src/ai_company/observability/events/tool.py, src/ai_company/security/models.py, src/ai_company/security/output_scan_policy.py, src/ai_company/security/__init__.py, src/ai_company/security/output_scanner.py
🧠 Learnings (8)
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : Every module with business logic must import get_logger from ai_company.observability and define logger = get_logger(__name__); never use import logging or logging.getLogger() or print() in application code
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : Use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : Always use logger variable name (not _logger or log)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : Pure data models, enums, and re-exports do NOT need logging
Applied to files:
CLAUDE.md
🧬 Code graph analysis (7)
tests/unit/security/test_output_scan_policy.py (1)
- src/ai_company/security/models.py (3): OutputScanResult (172-216), ScanOutcome (23-40), SecurityContext (90-120)

src/ai_company/tools/invoker.py (8)
- src/ai_company/providers/models.py (2): ToolCall (96-119), ToolResult (122-135)
- src/ai_company/security/models.py (2): OutputScanResult (172-216), ScanOutcome (23-40)
- src/ai_company/tools/base.py (2): ToolExecutionResult (25-54), name (123-125)
- src/ai_company/security/output_scan_policy.py (5): name (41-43), name (69-71), name (105-107), name (146-148), name (249-251)
- src/ai_company/security/rules/credential_detector.py (1): name (88-90)
- src/ai_company/security/rules/destructive_op_detector.py (1): name (94-96)
- src/ai_company/security/rules/data_leak_detector.py (1): name (79-81)
- src/ai_company/security/rules/path_traversal_detector.py (1): name (66-68)

tests/unit/security/test_models.py (1)
- src/ai_company/security/models.py (2): ScanOutcome (23-40), OutputScanResult (172-216)

src/ai_company/security/output_scan_policy.py (1)
- src/ai_company/security/models.py (2): OutputScanResult (172-216), ScanOutcome (23-40)

src/ai_company/security/__init__.py (1)
- src/ai_company/security/models.py (1): ScanOutcome (23-40)

src/ai_company/security/output_scanner.py (1)
- src/ai_company/security/models.py (2): OutputScanResult (172-216), ScanOutcome (23-40)

tests/unit/security/test_service.py (1)
- src/ai_company/security/models.py (1): ScanOutcome (23-40)
🪛 LanguageTool
CLAUDE.md
[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...
(EG_NO_COMMA)
🔇 Additional comments (6)

src/ai_company/security/__init__.py (1)
32-35: LGTM. Re-exporting `ScanOutcome` at the package root keeps the public surface aligned with the new `OutputScanResult` contract.
Also applies to: 56-70

tests/unit/security/test_output_scanner.py (1)
5-5: LGTM. These assertions lock in the new CLEAN/REDACTED outcome contract at the scanner boundary, which is the right regression coverage for this enum change.
Also applies to: 25-31, 73-79

src/ai_company/security/output_scanner.py (1)
66-70: LGTM. Assigning `ScanOutcome.REDACTED` where the scanner actually produces a redacted payload removes the old `redacted_content is None` inference from downstream code.

src/ai_company/observability/events/tool.py (1)
43-47: LGTM. Adding a dedicated `TOOL_OUTPUT_WITHHELD` event gives the invoker a clean observability hook for intentional policy withholding.

src/ai_company/security/output_scan_policy.py (1)
130-132: LGTM. Encoding `WITHHELD` and `LOG_ONLY` directly in the returned `OutputScanResult` makes policy intent explicit without mutating the original model.
Also applies to: 167-183

src/ai_company/security/models.py (1)
191-215: `OutputScanResult` now encodes the outcome matrix cleanly. The validator makes the `REDACTED`/`WITHHELD`/`LOG_ONLY` states explicit and prevents the old ambiguous "sensitive result with no usable redaction" shape from being constructed accidentally.
- Fix wrong event constant in defensive fallback (TOOL_OUTPUT_REDACTED → TOOL_OUTPUT_WITHHELD)
- Fix misleading comment about model validator preventing defensive fallback
- Add WITHHELD requires redacted_content=None validator rule
- Preserve result.metadata in WITHHELD branch (consistency with REDACTED)
- Update WithholdPolicy docstring (no longer "fail-closed")
- Add ScanOutcome to security/__init__.py module docstring
- Clarify LOG_ONLY docstring (always has_sensitive_data=False)
- Log fallback_outcome in service.py policy failure path
- Fix _handle_sensitive_scan and _scan_output docstrings
- Log unexpected outcome value in defensive fallback
- Extract handle_sensitive_scan to scan_result_handler.py (invoker.py 848→779 lines)
- Split TestWithheldOutcome/TestLogOnlyOutcome to test_invoker_output_scan.py
- Add TestDefensiveFallback with model_copy-based tests for fail-closed path
- Repurpose old fail-closed test to verify WITHHELD behavior
- Inline helper tool classes via mocking (test file 926→785 lines)
- Parametrize outcome validator tests
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/tools/invoker.py`:
- Around line 283-284: Update the docstring for _scan_output to reference the
current function name handle_sensitive_scan (instead of the removed
_handle_sensitive_scan) so implementation and docs match; locate the
_scan_output function and replace the obsolete symbol in its docstring with
handle_sensitive_scan and adjust any wording to reflect that _scan_output
delegates to handle_sensitive_scan which branches on outcome (WITHHELD vs
REDACTED).
In `@tests/unit/tools/test_invoker_output_scan.py`:
- Around line 4-5: Update the module docstring to reference the currently used
public method name instead of the removed private one: replace references to
ToolInvoker._handle_sensitive_scan with ToolInvoker.handle_sensitive_scan (or
the exact public entrypoint used by the invoker flow) so the docstring matches
the tests that exercise the handler via the invoker; ensure the wording reflects
that the test exercises the handler through the invoker flow rather than calling
a private method directly.
In `@tests/unit/tools/test_invoker_security.py`:
- Around line 619-622: The test mutates BaseTool instances by reassigning
execute (e.g., failing_tool.execute = AsyncMock(...)) which bypasses typing and
risks global state; instead create a dedicated test subclass or factory that
overrides the async execute method on _SecurityTestTool (or produce a new
subclass like FailingSecurityTestTool) and implement execute to raise
RuntimeError("intentional failure"), and update other occurrences (lines around
733-739 and 765-771) to use the subclass/factory so no method rebinding or type:
ignore is needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4f043bce-3709-49da-bbf3-d72a38ce545e
📒 Files selected for processing (9)
- src/ai_company/security/__init__.py
- src/ai_company/security/models.py
- src/ai_company/security/output_scan_policy.py
- src/ai_company/security/service.py
- src/ai_company/tools/invoker.py
- src/ai_company/tools/scan_result_handler.py
- tests/unit/security/test_models.py
- tests/unit/tools/test_invoker_output_scan.py
- tests/unit/tools/test_invoker_security.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Greptile Review
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python 3.14+ required; use PEP 649 native lazy annotations (no `from __future__ import annotations`)
Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14
Add type hints to all public functions; enforce mypy strict mode
Use Google style docstrings required on all public classes and functions — enforced by ruff D rules
Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use `@computed_field` for derived values instead of storing redundant fields; use NotBlankStr for all identifier/name fields (including optional and tuple variants)
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task; existing code is being migrated incrementally
Enforce 88 character line length — ruff enforces this
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
- src/ai_company/security/__init__.py
- src/ai_company/security/output_scan_policy.py
- src/ai_company/security/service.py
- src/ai_company/tools/invoker.py
- src/ai_company/tools/scan_result_handler.py
- tests/unit/tools/test_invoker_security.py
- tests/unit/security/test_models.py
- src/ai_company/security/models.py
- tests/unit/tools/test_invoker_output_scan.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import get_logger from ai_company.observability and define `logger = get_logger(__name__)`; never use import logging or logging.getLogger() or print() in application code
Always use logger variable name (not _logger or log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly
Use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Maintain 80% minimum code coverage — enforced in CI
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases
Lint Python code with: uv run ruff check src/ tests/
Auto-fix lint issues with: uv run ruff check src/ tests/ --fix
Format code with: uv run ruff format src/ tests/
Type-check with strict mode: uv run mypy src/ tests/
Files:
- src/ai_company/security/__init__.py
- src/ai_company/security/output_scan_policy.py
- src/ai_company/security/service.py
- src/ai_company/tools/invoker.py
- src/ai_company/tools/scan_result_handler.py
- src/ai_company/security/models.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError; non-retryable errors raise immediately without retry; RetryExhaustedError signals all retries failed
Rate limiter respects RateLimitError.retry_after from providers — automatically pauses future requests
Files:
- src/ai_company/security/__init__.py
- src/ai_company/security/output_scan_policy.py
- src/ai_company/security/service.py
- src/ai_company/tools/invoker.py
- src/ai_company/tools/scan_result_handler.py
- src/ai_company/security/models.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, and `@pytest.mark.slow`
Use pytest-xdist via -n auto for parallel test execution — ALWAYS include -n auto when running pytest, never run tests sequentially
Prefer `@pytest.mark.parametrize` for testing similar cases
Use test-provider, test-small-001, etc. in tests instead of real vendor names
Run unit tests with: uv run pytest tests/ -m unit -n auto
Run integration tests with: uv run pytest tests/ -m integration -n auto
Run e2e tests with: uv run pytest tests/ -m e2e -n auto
Run full test suite with coverage: uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80
Files:
- tests/unit/tools/test_invoker_security.py
- tests/unit/security/test_models.py
- tests/unit/tools/test_invoker_output_scan.py
🧠 Learnings (1)
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to tests/**/*.py : Prefer pytest.mark.parametrize for testing similar cases
Applied to files:
tests/unit/security/test_models.py
🧬 Code graph analysis (7)
src/ai_company/security/__init__.py (1)
src/ai_company/security/models.py (1)
ScanOutcome(23-44)
src/ai_company/security/output_scan_policy.py (1)
src/ai_company/security/models.py (2)
OutputScanResult(176-223), ScanOutcome(23-44)
src/ai_company/tools/invoker.py (3)
src/ai_company/tools/scan_result_handler.py (1)
handle_sensitive_scan(24-90)
tests/unit/tools/test_invoker_output_scan.py (1)
tool_call(101-106)
tests/unit/tools/test_invoker_security.py (1)
tool_call(108-113)
src/ai_company/tools/scan_result_handler.py (4)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)
src/ai_company/security/models.py (2)
OutputScanResult(176-223), ScanOutcome(23-44)
src/ai_company/tools/base.py (1)
ToolExecutionResult(25-54)
src/ai_company/providers/models.py (1)
ToolCall(96-119)
tests/unit/tools/test_invoker_security.py (1)
src/ai_company/security/models.py (3)
ScanOutcome(23-44), SecurityVerdictType(47-56), OutputScanResult(176-223)
tests/unit/security/test_models.py (1)
src/ai_company/security/models.py (2)
ScanOutcome(23-44), OutputScanResult(176-223)
tests/unit/tools/test_invoker_output_scan.py (4)
src/ai_company/core/enums.py (1)
ToolCategory(294-308)
src/ai_company/security/models.py (5)
OutputScanResult(176-223), ScanOutcome(23-44), SecurityContext(94-124), SecurityVerdict(59-91), SecurityVerdictType(47-56)
src/ai_company/tools/invoker.py (3)
ToolInvoker(73-779), invoke(317-339), _scan_output(274-315)
src/ai_company/security/output_scan_policy.py (5)
name(41-43), name(69-71), name(107-109), name(148-150), name(251-253)
🔇 Additional comments (9)
src/ai_company/security/service.py (1)
283-290: Good fallback telemetry on policy failure. Including both `policy` and `fallback_outcome` in the exception context makes postmortems much clearer when policy application fails and raw scanner output is returned.

src/ai_company/tools/scan_result_handler.py (1)
45-90: Outcome routing is clean and fail-closed behavior is solid. WITHHELD precedence, redaction handling, and the defensive fallback together provide clear behavior for intentional policy withholding vs. malformed sensitive-scan states.
src/ai_company/security/__init__.py (1)
33-37: Public API export is correctly wired. `ScanOutcome` is now imported and re-exported via `__all__`, keeping the package surface aligned with the new model semantics.

Also applies to: 71-72
src/ai_company/security/output_scan_policy.py (1)
132-134: Policy outputs now encode intent explicitly. Setting `WITHHELD` and `LOG_ONLY` directly in policy outputs gives downstream consumers a precise signal instead of inferring from nullable payload fields.

Also applies to: 179-185
tests/unit/security/test_models.py (1)
525-614: Strong parametric coverage for the outcome matrix. The new accept/reject parametrized tests capture the model's cross-field constraints clearly and should scale well as outcome semantics evolve.
src/ai_company/security/models.py (1)
23-45: The model consistency matrix is well enforced. `ScanOutcome` plus `_check_consistency()` now explicitly guards the key combinations (CLEAN, REDACTED, WITHHELD, LOG_ONLY) and prevents ambiguous states at construction time.

Also applies to: 193-223
tests/unit/tools/test_invoker_output_scan.py (1)
247-305: Defensive invalid-state coverage is excellent. The `model_copy()`-based broken-state tests are a strong defense-in-depth check for fail-closed behavior in sensitive output handling.

tests/unit/tools/test_invoker_security.py (2)
13-13: Outcome-aware fixtures are correctly aligned with the new scan model. Good update: these constructors now explicitly set `ScanOutcome` where sensitive data is present, matching the new `OutputScanResult` invariants and avoiding false-negative test setup failures.

Also applies to: 337-338, 394-395, 706-707, 745-746
382-404: WITHHELD-path assertion now validates the intended user-facing behavior. This correctly checks the explicit policy-withheld branch instead of conflating it with scanner fail-closed behavior.
- Add missing inverse findings constraint: reject empty findings when has_sensitive_data=True (closes validator asymmetry)
- Preserve result.metadata in defensive fallback branch (consistency with WITHHELD and REDACTED branches)
- Fix _scan_output docstring: reference handle_sensitive_scan (not the removed _handle_sensitive_scan)
- Fix test module docstring: reference handle_sensitive_scan
- Replace execute method reassignment with proper subclasses (_FailingSecurityTool, _SoftErrorSecurityTool) — eliminates type: ignore comments
Pull request overview
Adds an explicit ScanOutcome signal to OutputScanResult so the tool invoker can distinguish intentional policy withholding (WITHHELD) from scanner/handling failures, and routes sensitive-scan results through a dedicated handler.
Changes:
- Introduce `ScanOutcome` (CLEAN/REDACTED/WITHHELD/LOG_ONLY) and enforce outcome/field consistency via model validation.
- Update scanner and output-scan policies to set `outcome` appropriately, and update `ToolInvoker` to branch on `WITHHELD` with a dedicated event/message.
- Add/adjust unit tests and documentation to cover the new outcome-driven behavior (including a new focused invoker output-scan test module).
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/tools/test_invoker_security.py | Updates invoker security tests for ScanOutcome and refactors helper tools. |
| tests/unit/tools/test_invoker_output_scan.py | New tests covering WITHHELD, LOG_ONLY, and defensive fail-closed branches. |
| tests/unit/security/test_service.py | Updates service tests to include outcome where sensitive results are constructed. |
| tests/unit/security/test_output_scanner.py | Asserts scanner sets CLEAN/REDACTED outcomes. |
| tests/unit/security/test_output_scan_policy.py | Verifies policies set WITHHELD/LOG_ONLY and preserve expected semantics. |
| tests/unit/security/test_models.py | Adds ScanOutcome assertions and validation test cases. |
| src/ai_company/tools/scan_result_handler.py | New handler that centralizes sensitive scan result routing and logging. |
| src/ai_company/tools/invoker.py | Delegates sensitive handling to the new handler and adjusts scan failure behavior/docs. |
| src/ai_company/security/service.py | Improves policy-failure logging by including fallback outcome and clarifying strictness. |
| src/ai_company/security/output_scanner.py | Sets outcome=REDACTED when findings are detected. |
| src/ai_company/security/output_scan_policy.py | Policies now set outcome (WITHHELD, LOG_ONLY) when transforming results. |
| src/ai_company/security/models.py | Defines ScanOutcome and validates OutputScanResult consistency. |
| src/ai_company/security/__init__.py | Re-exports ScanOutcome and updates module documentation list. |
| src/ai_company/observability/events/tool.py | Adds TOOL_OUTPUT_WITHHELD event constant. |
| docs/design/operations.md | Documents ScanOutcome semantics and policy/outcome mapping. |
| CLAUDE.md | Updates logging-event guidance to include TOOL_OUTPUT_WITHHELD. |
src/ai_company/tools/invoker.py (outdated suggestion)

```diff
-            content=("Output scan failed (fail-closed). Tool output withheld."),
+            content="Output scan failed (fail-closed). Tool output withheld.",
             is_error=True,
             metadata={"output_scan_failed": True},
```
Copilot correctly identified that the scanner-exception path in _scan_output drops tool execution metadata. Align with handle_sensitive_scan by merging result.metadata.
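The fix amounts to spreading the original result's metadata into the fail-closed payload instead of replacing it. A minimal sketch, assuming plain-dict metadata (the actual `ToolExecutionResult` is a Pydantic model, so the real fix builds a new instance rather than a bare dict):

```python
def build_fail_closed_metadata(original: dict[str, object]) -> dict[str, object]:
    """Merge the scan-failure flag into existing execution metadata.

    Spreading ``original`` first means keys the tool already recorded
    (timings, retries, etc.) survive the fail-closed path instead of
    being replaced by a bare ``{"output_scan_failed": True}`` dict.
    """
    return {**original, "output_scan_failed": True}
```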
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ai_company/tools/invoker.py (1)
43-43: ⚠️ Potential issue | 🟠 Major

Fail closed on `outcome`, not just `has_sensitive_data`.

The PR already acknowledges that `model_copy()` can bypass `OutputScanResult` validation. If a malformed result carries `outcome=WITHHELD` or `outcome=REDACTED` with `has_sensitive_data=False`, this guard skips `handle_sensitive_scan()` and returns raw tool output.

🔒 Suggested hardening

```diff
-from ai_company.security.models import SecurityContext, SecurityVerdictType
+from ai_company.security.models import (
+    ScanOutcome,
+    SecurityContext,
+    SecurityVerdictType,
+)
@@
-        if scan_result.has_sensitive_data:
+        if scan_result.has_sensitive_data or scan_result.outcome in (
+            ScanOutcome.REDACTED,
+            ScanOutcome.WITHHELD,
+        ):
             return handle_sensitive_scan(tool_call, result, scan_result)
         return result
```

Also applies to: 313-315
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/tools/invoker.py` at line 43, The code currently only treats results as sensitive when OutputScanResult.has_sensitive_data is true, which allows a crafted model_copy() to bypass handling if outcome is WITHHELD/REDACTED but has_sensitive_data=false; update the guard where the tool output is processed (the block that calls handle_sensitive_scan and the similar logic around lines 313-315) to fail-closed by checking both conditions: call handle_sensitive_scan when result.has_sensitive_data is true OR result.outcome is SecurityVerdictType.WITHHELD or SecurityVerdictType.REDACTED; keep existing import of SecurityVerdictType and use that enum constant(s) and the same result variable name used in the diff to locate the code to change.
♻️ Duplicate comments (1)
tests/unit/tools/test_invoker_output_scan.py (1)
241-246: ⚠️ Potential issue | 🟡 Minor

Update this docstring to the extracted helper name. `_handle_sensitive_scan` no longer exists; this branch now exercises `ai_company.tools.scan_result_handler.handle_sensitive_scan` via `ToolInvoker._scan_output`.
Verify each finding against the current code and only fix it if needed. In `@tests/unit/tools/test_invoker_output_scan.py` around lines 241 - 246, Update the test docstring to reference the extracted helper function instead of `_handle_sensitive_scan`; specifically mention `ai_company.tools.scan_result_handler.handle_sensitive_scan` (and that it's exercised via `ToolInvoker._scan_output`) so the docstring accurately describes the branch being tested.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/tools/scan_result_handler.py`:
- Around line 24-90: The function handle_sensitive_scan is over 50 lines; split
each branch into its own private helper to reduce size: create
_handle_withheld(tool_call, result, scan_result) to log TOOL_OUTPUT_WITHHELD and
return the WITHHELD ToolExecutionResult, _handle_redacted(tool_call, result,
scan_result) to log TOOL_OUTPUT_REDACTED and return the redacted
ToolExecutionResult (including "output_redacted" and "redaction_findings" in
metadata), and _handle_fail_closed(tool_call, result, scan_result) to log
TOOL_OUTPUT_WITHHELD with outcome and return the fail-closed ToolExecutionResult
(with "output_scan_failed" metadata); then simplify handle_sensitive_scan to
branch on scan_result.outcome and scan_result.redacted_content and delegate to
these helpers, preserving use of ScanOutcome, logger, ToolExecutionResult,
TOOL_OUTPUT_WITHHELD, and TOOL_OUTPUT_REDACTED.
In `@tests/unit/tools/test_invoker_output_scan.py`:
- Around line 159-171: The tests currently construct
ToolExecutionResult(content="raw output") without preexisting metadata so they
don't catch regressions where existing metadata could be dropped; update both
tests that call invoker._scan_output (the primary one around ToolExecutionResult
usage and the defensive-fallback test around lines 298-305) to initialize
tool_exec_result with a sentinel metadata key (e.g., tool_exec_result =
ToolExecutionResult(content="raw output", metadata={"sentinel": True})) before
calling invoker._scan_output, and add assertions that scan_exec.metadata still
contains that sentinel (e.g., assert scan_exec.metadata.get("sentinel") is True)
in addition to the existing checks for output_withheld and absence of
output_scan_failed so you verify existing metadata is preserved by the
_scan_output path.
---
Outside diff comments:
In `@src/ai_company/tools/invoker.py`:
- Line 43: The code currently only treats results as sensitive when
OutputScanResult.has_sensitive_data is true, which allows a crafted model_copy()
to bypass handling if outcome is WITHHELD/REDACTED but has_sensitive_data=false;
update the guard where the tool output is processed (the block that calls
handle_sensitive_scan and the similar logic around lines 313-315) to fail-closed
by checking both conditions: call handle_sensitive_scan when
result.has_sensitive_data is true OR result.outcome is
SecurityVerdictType.WITHHELD or SecurityVerdictType.REDACTED; keep existing
import of SecurityVerdictType and use that enum constant(s) and the same result
variable name used in the diff to locate the code to change.
---
Duplicate comments:
In `@tests/unit/tools/test_invoker_output_scan.py`:
- Around line 241-246: Update the test docstring to reference the extracted
helper function instead of `_handle_sensitive_scan`; specifically mention
`ai_company.tools.scan_result_handler.handle_sensitive_scan` (and that it's
exercised via `ToolInvoker._scan_output`) so the docstring accurately describes
the branch being tested.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ffb9a485-a305-4561-8406-6ca22aaa3d47
📒 Files selected for processing (6)
- src/ai_company/security/models.py
- src/ai_company/tools/invoker.py
- src/ai_company/tools/scan_result_handler.py
- tests/unit/security/test_models.py
- tests/unit/tools/test_invoker_output_scan.py
- tests/unit/tools/test_invoker_security.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Agent
- GitHub Check: Greptile Review
- GitHub Check: Test (Python 3.14)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python 3.14+ required; use PEP 649 native lazy annotations (no `from __future__ import annotations`)
Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14
Add type hints to all public functions; enforce mypy strict mode
Use Google style docstrings required on all public classes and functions — enforced by ruff D rules
Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use `@computed_field` for derived values instead of storing redundant fields; use NotBlankStr for all identifier/name fields (including optional and tuple variants)
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task; existing code is being migrated incrementally
Enforce 88 character line length — ruff enforces this
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
- src/ai_company/tools/scan_result_handler.py
- src/ai_company/tools/invoker.py
- src/ai_company/security/models.py
- tests/unit/security/test_models.py
- tests/unit/tools/test_invoker_security.py
- tests/unit/tools/test_invoker_output_scan.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import get_logger from ai_company.observability and define `logger = get_logger(__name__)`; never use import logging or logging.getLogger() or print() in application code
Always use logger variable name (not _logger or log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly
Use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Maintain 80% minimum code coverage — enforced in CI
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases
Lint Python code with: uv run ruff check src/ tests/
Auto-fix lint issues with: uv run ruff check src/ tests/ --fix
Format code with: uv run ruff format src/ tests/
Type-check with strict mode: uv run mypy src/ tests/
Files:
- src/ai_company/tools/scan_result_handler.py
- src/ai_company/tools/invoker.py
- src/ai_company/security/models.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError; non-retryable errors raise immediately without retry; RetryExhaustedError signals all retries failed
Rate limiter respects RateLimitError.retry_after from providers — automatically pauses future requests
Files:
- src/ai_company/tools/scan_result_handler.py
- src/ai_company/tools/invoker.py
- src/ai_company/security/models.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, and `@pytest.mark.slow`
Use pytest-xdist via -n auto for parallel test execution — ALWAYS include -n auto when running pytest, never run tests sequentially
Prefer `@pytest.mark.parametrize` for testing similar cases
Use test-provider, test-small-001, etc. in tests instead of real vendor names
Run unit tests with: uv run pytest tests/ -m unit -n auto
Run integration tests with: uv run pytest tests/ -m integration -n auto
Run e2e tests with: uv run pytest tests/ -m e2e -n auto
Run full test suite with coverage: uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80
Files:
- tests/unit/security/test_models.py
- tests/unit/tools/test_invoker_security.py
- tests/unit/tools/test_invoker_output_scan.py
🧠 Learnings (1)
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to tests/**/*.py : Prefer pytest.mark.parametrize for testing similar cases
Applied to files:
tests/unit/security/test_models.py
🧬 Code graph analysis (3)
src/ai_company/tools/invoker.py (1)
src/ai_company/tools/scan_result_handler.py (1)
handle_sensitive_scan(24-90)
tests/unit/security/test_models.py (1)
src/ai_company/security/models.py (2)
ScanOutcome(23-44), OutputScanResult(176-226)
tests/unit/tools/test_invoker_security.py (2)
src/ai_company/security/models.py (3)
ScanOutcome(23-44), SecurityVerdictType(47-56), OutputScanResult(176-226)
tests/unit/tools/test_invoker_output_scan.py (5)
test_withheld_outcome_returns_policy_message(117-138), security_registry(97-98), tool_call(102-107), _make_interceptor(77-90), _make_verdict(59-74)
🔇 Additional comments (1)
tests/unit/tools/test_invoker_security.py (1)
55-84: Nice cleanup: these dedicated test tools are much safer than method rebinding. The failure and soft-error scenarios are now explicit, type-safe, and no longer mutate `BaseTool.execute` in place.

As per coding guidelines,
**/*.py: "Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement."
```python
def handle_sensitive_scan(
    tool_call: ToolCall,
    result: ToolExecutionResult,
    scan_result: OutputScanResult,
) -> ToolExecutionResult:
    """Route a sensitive scan result to the correct handler.

    Branches on ``ScanOutcome``:

    - ``WITHHELD``: return error with "withheld by policy" message.
    - ``redacted_content`` present: return redacted content.
    - Defensive fallback: withhold output (fail-closed).

    Args:
        tool_call: The tool call being processed.
        result: The original tool execution result.
        scan_result: The scan result with ``has_sensitive_data=True``.

    Returns:
        A new ``ToolExecutionResult`` reflecting the scan outcome.
    """
    if scan_result.outcome == ScanOutcome.WITHHELD:
        logger.warning(
            TOOL_OUTPUT_WITHHELD,
            tool_call_id=tool_call.id,
            tool_name=tool_call.name,
            findings=scan_result.findings,
            note="content withheld by security policy",
        )
        return ToolExecutionResult(
            content=("Sensitive data detected — content withheld by security policy."),
            is_error=True,
            metadata={**result.metadata, "output_withheld": True},
        )
    if scan_result.redacted_content is not None:
        logger.warning(
            TOOL_OUTPUT_REDACTED,
            tool_call_id=tool_call.id,
            tool_name=tool_call.name,
            findings=scan_result.findings,
        )
        return ToolExecutionResult(
            content=scan_result.redacted_content,
            is_error=result.is_error,
            metadata={
                **result.metadata,
                "output_redacted": True,
                "redaction_findings": list(scan_result.findings),
            },
        )
    # Defensive: model_copy() skips model validators, so a policy
    # that clears redacted_content without updating outcome could
    # produce REDACTED with redacted_content=None. This branch
    # catches that case (and future outcome values) — fail-closed.
    logger.warning(
        TOOL_OUTPUT_WITHHELD,
        tool_call_id=tool_call.id,
        tool_name=tool_call.name,
        findings=scan_result.findings,
        outcome=scan_result.outcome.value,
        note="no redacted content available — withholding output",
    )
    return ToolExecutionResult(
        content="Sensitive data detected (fail-closed). Tool output withheld.",
        is_error=True,
        metadata={**result.metadata, "output_scan_failed": True},
    )
```
🛠️ Refactor suggestion | 🟠 Major
handle_sensitive_scan() still exceeds the 50-line cap.
Moving this logic out of ToolInvoker fixed the caller, but the extracted helper itself now spans 67 lines. Split the WITHHELD, REDACTED, and fail-closed branches into private helpers so the new module also satisfies the repo limit.
As per coding guidelines, **/*.py: "Functions must be less than 50 lines; files must be less than 800 lines."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/tools/scan_result_handler.py` around lines 24 - 90, The
function handle_sensitive_scan is over 50 lines; split each branch into its own
private helper to reduce size: create _handle_withheld(tool_call, result,
scan_result) to log TOOL_OUTPUT_WITHHELD and return the WITHHELD
ToolExecutionResult, _handle_redacted(tool_call, result, scan_result) to log
TOOL_OUTPUT_REDACTED and return the redacted ToolExecutionResult (including
"output_redacted" and "redaction_findings" in metadata), and
_handle_fail_closed(tool_call, result, scan_result) to log TOOL_OUTPUT_WITHHELD
with outcome and return the fail-closed ToolExecutionResult (with
"output_scan_failed" metadata); then simplify handle_sensitive_scan to branch on
scan_result.outcome and scan_result.redacted_content and delegate to these
helpers, preserving use of ScanOutcome, logger, ToolExecutionResult,
TOOL_OUTPUT_WITHHELD, and TOOL_OUTPUT_REDACTED.
…ests

- Add sentinel metadata key to ToolExecutionResult in _scan_output tests so regressions in metadata merging are caught
- Fix TestDefensiveFallback docstring to reference handle_sensitive_scan
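The sentinel-key technique in that commit can be illustrated in isolation: a key the scanner never touches is planted in the input metadata, and any merge that drops it is a regression. A minimal sketch, where `merge_scan_metadata` is a hypothetical stand-in for the handler's `{**result.metadata, ...}` merge:

```python
def merge_scan_metadata(original: dict, scan_extras: dict) -> dict:
    # Hypothetical stand-in for the handler's {**result.metadata, ...} merge;
    # a correct merge keeps every caller-supplied key.
    return {**original, **scan_extras}


def test_sentinel_survives_merge() -> None:
    # The sentinel key means nothing to the scanner; if it disappears,
    # the merge dropped caller metadata.
    merged = merge_scan_metadata({"sentinel": "keep-me"}, {"output_redacted": True})
    assert merged["sentinel"] == "keep-me"
    assert merged["output_redacted"] is True
```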
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/tools/test_invoker_output_scan.py`:
- Around line 97-103: Add Google-style docstrings to the two public fixtures
security_registry and tool_call: for security_registry document that it returns
a ToolRegistry containing the _OutputScanTestTool and any brief description of
its purpose; for tool_call document that it returns a ToolCall fixture
describing the default ToolCall used in tests and any key parameter meanings.
Update the docstrings above the function definitions for security_registry and
tool_call using Google-style formatting (short summary line, optional blank
line, and parameter/return descriptions as appropriate).
📒 Files selected for processing (2)
- src/ai_company/tools/invoker.py
- tests/unit/tools/test_invoker_output_scan.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Greptile Review
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python 3.14+ required; use PEP 649 native lazy annotations (no `from __future__ import annotations`)
Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14
Add type hints to all public functions; enforce mypy strict mode
Use Google style docstrings required on all public classes and functions — enforced by ruff D rules
Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use `@computed_field` for derived values instead of storing redundant fields; use NotBlankStr for all identifier/name fields (including optional and tuple variants)
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task; existing code is being migrated incrementally
Enforce 88 character line length — ruff enforces this
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
- tests/unit/tools/test_invoker_output_scan.py
- src/ai_company/tools/invoker.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, and `@pytest.mark.slow`
Use pytest-xdist via -n auto for parallel test execution — ALWAYS include -n auto when running pytest, never run tests sequentially
Prefer@pytest.mark.parametrizefor testing similar cases
Use test-provider, test-small-001, etc. in tests instead of real vendor names
Run unit tests with: uv run pytest tests/ -m unit -n auto
Run integration tests with: uv run pytest tests/ -m integration -n auto
Run e2e tests with: uv run pytest tests/ -m e2e -n auto
Run full test suite with coverage: uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80
Files:
tests/unit/tools/test_invoker_output_scan.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import get_logger from ai_company.observability and define `logger = get_logger(__name__)`; never use `import logging` or `logging.getLogger()` or `print()` in application code
Always use logger variable name (not _logger or log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly
Use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Maintain 80% minimum code coverage — enforced in CI
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases
Lint Python code with: uv run ruff check src/ tests/
Auto-fix lint issues with: uv run ruff check src/ tests/ --fix
Format code with: uv run ruff format src/ tests/
Type-check with strict mode: uv run mypy src/ tests/
Files:
src/ai_company/tools/invoker.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError; non-retryable errors raise immediately without retry; RetryExhaustedError signals all retries failed
Rate limiter respects RateLimitError.retry_after from providers — automatically pauses future requests
Files:
src/ai_company/tools/invoker.py
🧬 Code graph analysis (2)
tests/unit/tools/test_invoker_output_scan.py (2)
src/ai_company/security/models.py (5)
- OutputScanResult (176-226)
- ScanOutcome (23-44)
- SecurityContext (94-124)
- SecurityVerdict (59-91)
- SecurityVerdictType (47-56)

src/ai_company/tools/invoker.py (3)
- ToolInvoker (73-779)
- invoke (317-339)
- _scan_output (274-315)
src/ai_company/tools/invoker.py (1)
src/ai_company/tools/scan_result_handler.py (1)
- handle_sensitive_scan (24-90)
🔇 Additional comments (1)
src/ai_company/tools/invoker.py (1)
307-315: Good fail-closed metadata preservation and outcome delegation. This correctly preserves existing `result.metadata` on scan exceptions and routes sensitive outputs through `handle_sensitive_scan`, matching the new `ScanOutcome` flow.
```python
def security_registry() -> ToolRegistry:
    return ToolRegistry([_OutputScanTestTool()])


@pytest.fixture
def tool_call() -> ToolCall:
    return ToolCall(
```
Add docstrings to public fixtures for guideline compliance.
security_registry and tool_call are public module-level functions and should include Google-style docstrings.
✏️ Suggested patch

```diff
 @pytest.fixture
 def security_registry() -> ToolRegistry:
+    """Create a registry with the output-scan test tool."""
     return ToolRegistry([_OutputScanTestTool()])


 @pytest.fixture
 def tool_call() -> ToolCall:
+    """Create a default tool call used by output-scan tests."""
     return ToolCall(
         id="call_scan_001",
         name="secure_tool",
         arguments={"cmd": "ls"},
     )
```

As per coding guidelines, "Use Google style docstrings required on all public classes and functions — enforced by ruff D rules".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/tools/test_invoker_output_scan.py` around lines 97 - 103, Add
Google-style docstrings to the two public fixtures security_registry and
tool_call: for security_registry document that it returns a ToolRegistry
containing the _OutputScanTestTool and any brief description of its purpose; for
tool_call document that it returns a ToolCall fixture describing the default
ToolCall used in tests and any key parameter meanings. Update the docstrings
above the function definitions for security_registry and tool_call using
Google-style formatting (short summary line, optional blank line, and
parameter/return descriptions as appropriate).
```python
if scan_result.redacted_content is not None:
    logger.warning(
        TOOL_OUTPUT_REDACTED,
        tool_call_id=tool_call.id,
        tool_name=tool_call.name,
        findings=scan_result.findings,
    )
    return ToolExecutionResult(
        content=scan_result.redacted_content,
        is_error=result.is_error,
        metadata={
            **result.metadata,
            "output_redacted": True,
            "redaction_findings": list(scan_result.findings),
        },
    )
```
Implicit REDACTED dispatch — asymmetric with WITHHELD branch
The WITHHELD branch dispatches on scan_result.outcome == ScanOutcome.WITHHELD (an explicit outcome check), but the REDACTED branch dispatches on scan_result.redacted_content is not None (an implicit field check). Today these are semantically equivalent for valid OutputScanResult objects because the model validator enforces REDACTED ↔ redacted_content is not None, but the asymmetry has a forward-compatibility risk.
If a future ScanOutcome member (e.g. PARTIAL) is added that also carries redacted_content, it would silently hit this branch and be logged as TOOL_OUTPUT_REDACTED, misleading any monitoring that distinguishes outcomes. Consider making the dispatch explicit:
```python
if scan_result.outcome == ScanOutcome.WITHHELD:
    # … existing WITHHELD path …

if scan_result.outcome == ScanOutcome.REDACTED and scan_result.redacted_content is not None:
    # … existing REDACTED path …

# Defensive: model_copy() can produce REDACTED with redacted_content=None.
```

The dual condition (`outcome == REDACTED and redacted_content is not None`) preserves the current fail-closed behaviour for the broken-model-copy case — a `REDACTED` outcome with `None` content still falls through to the defensive fallback — while also making it explicit that any unrecognised future outcome falls closed.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/scan_result_handler.py
Line: 58-73
How can I resolve this? If you propose a fix, please make it concise.

🤖 I have created a release *beep* *boop*

---

## [0.2.0](v0.1.4...v0.2.0) (2026-03-15)

## First probably usable release? Most likely not, no, and everything will break

### Features

* add /get/ installation page for CLI installer ([#413](#413)) ([6a47e4a](6a47e4a))
* add cross-platform Go CLI for container lifecycle management ([#401](#401)) ([0353d9e](0353d9e)), closes [#392](#392)
* add explicit ScanOutcome signal to OutputScanResult ([#394](#394)) ([be33414](be33414)), closes [#284](#284)
* add meeting scheduler, event-triggered meetings, and Go CLI lint fixes ([#407](#407)) ([5550fa1](5550fa1))
* wire MultiAgentCoordinator into runtime ([#396](#396)) ([7a9e516](7a9e516))

### Bug Fixes

* CLA signatures branch + declutter repo root ([#409](#409)) ([cabe953](cabe953))
* correct Release Please branch name in release workflow ([#410](#410)) ([515d816](515d816))
* replace slsa-github-generator with attest-build-provenance, fix DAST ([#424](#424)) ([eeaadff](eeaadff))
* resolve CodeQL path-injection alerts in Go CLI ([#412](#412)) ([f41bf16](f41bf16))

### Refactoring

* rename package from ai_company to synthorg ([#422](#422)) ([df27c6e](df27c6e)), closes [#398](#398)

### Tests

* add fuzz and property-based testing across all layers ([#421](#421)) ([115a742](115a742))

### CI/CD

* add SLSA L3 provenance for CLI binaries and container images ([#423](#423)) ([d3dc75d](d3dc75d))
* bump the major group with 4 updates ([#405](#405)) ([20c7a04](20c7a04))

### Maintenance

* bump github.com/spf13/cobra from 1.9.1 to 1.10.2 in /cli in the minor-and-patch group ([#402](#402)) ([e31edbb](e31edbb))
* narrow BSL Additional Use Grant and add CLA ([#408](#408)) ([5ab15bd](5ab15bd)), closes [#406](#406)

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
--------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary

- Add a `ScanOutcome` enum (`CLEAN`/`REDACTED`/`WITHHELD`/`LOG_ONLY`) to `OutputScanResult` so `ToolInvoker` can distinguish intentional policy withholding from scanner failure
- `OutputScanner` sets `REDACTED`, `WithholdPolicy` sets `WITHHELD`, `LogOnlyPolicy` sets `LOG_ONLY`
- `ToolInvoker._scan_output` branches on `WITHHELD` first with a dedicated `TOOL_OUTPUT_WITHHELD` event and a clear user-facing message
- Tighten the model validator: `REDACTED` requires `redacted_content`, `LOG_ONLY` requires `has_sensitive_data=False`
- Extract `_handle_sensitive_scan` from `_scan_output` to meet the 50-line function limit

Review coverage

Pre-reviewed by 14 agents (code-reviewer, python-reviewer, security-reviewer, silent-failure-hunter, type-design-analyzer, comment-analyzer, conventions-enforcer, logging-audit, resilience-audit, async-concurrency-reviewer, pr-test-analyzer, test-quality-reviewer, docs-consistency, issue-resolution-verifier). 12 findings addressed, 1 false positive (PEP 758 `as` binding requires parentheses), 1 deferred (test file splitting).

Test plan
Closes #284