feat: add explicit ScanOutcome signal to OutputScanResult#394
Conversation
Add a ScanOutcome enum (CLEAN/REDACTED/WITHHELD/LOG_ONLY) to OutputScanResult so the ToolInvoker can distinguish intentional policy withholding from scanner failure. Previously both produced has_sensitive_data=True + redacted_content=None, leading to a misleading "no redacted content available" log message.

- ScanOutcome enum in security/models.py with consistency validation
- OutputScanner sets REDACTED, WithholdPolicy sets WITHHELD, LogOnlyPolicy sets LOG_ONLY on sensitive results
- ToolInvoker branches on WITHHELD first with a dedicated event (TOOL_OUTPUT_WITHHELD) and a clear user-facing message
- 5 new model validation tests, 2 new invoker tests, outcome assertions added across all affected test files
Pre-reviewed by 14 agents, 12 findings addressed:

- Tighten OutputScanResult validator: REDACTED requires redacted_content, LOG_ONLY requires has_sensitive_data=False
- Extract _handle_sensitive_scan from _scan_output (50-line limit)
- Rewrite _scan_output docstring to cover all outcome paths
- Add TOOL_OUTPUT_WITHHELD to CLAUDE.md event catalog
- Update design spec Output Scan Response Policies with ScanOutcome
- Add tests: LOG_ONLY pass-through, WITHHELD priority over redacted_content, new model validation constraints
- Remove redundant in-function imports in tests
Dependency Review
✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.
Scanned files: none
📝 Walkthrough (Summary by CodeRabbit)
Adds a ScanOutcome enum and outcome field to OutputScanResult; policies and the scanner set outcomes; introduces handle_sensitive_scan to centralize outcome-based handling; ToolInvoker delegates sensitive-output handling to that handler and branches on outcomes; docs and tests updated accordingly.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Invoker as ToolInvoker
    participant Tool as Tool
    participant Scanner as OutputScanner
    participant Policy as SecurityPolicy
    participant Handler as ScanHandler
    Client->>Invoker: invoke_tool(tool_call)
    Invoker->>Tool: execute_tool(tool_call)
    Tool-->>Invoker: ToolExecutionResult(output)
    Invoker->>Scanner: scan(output)
    Scanner-->>Invoker: OutputScanResult(outcome=CLEAN|REDACTED|WITHHELD|LOG_ONLY)
    Invoker->>Policy: apply_policy(scan_result)
    Policy-->>Invoker: OutputScanResult(outcome=REDACTED|WITHHELD|LOG_ONLY|CLEAN)
    Invoker->>Handler: handle_sensitive_scan(tool_call, result, scan_result)
    alt scan_result.outcome == WITHHELD
        Handler-->>Invoker: ToolExecutionResult(error="output withheld by security policy", metadata: output_withheld)
        Invoker-->>Client: error response
    else scan_result.outcome == REDACTED
        Handler-->>Invoker: ToolExecutionResult(output=redacted_content, metadata: output_redacted)
        Invoker-->>Client: redacted output
    else scan_result.outcome == LOG_ONLY or CLEAN
        Invoker-->>Client: ToolExecutionResult(output=original)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Summary of Changes (Gemini Code Assist)
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the handling of sensitive data within tool outputs by introducing an explicit ScanOutcome signal.
Code Review
This pull request introduces a new ScanOutcome enum (CLEAN, REDACTED, WITHHELD, LOG_ONLY) to explicitly track the result of security output scanning policies. The OutputScanResult model is updated to include this outcome, along with new validation rules. Policy implementations (WithholdPolicy, LogOnlyPolicy) are modified to set the appropriate ScanOutcome. The ToolInvoker now uses this ScanOutcome to handle sensitive tool outputs, providing distinct responses such as specific error messages and metadata for withheld content, returning redacted content, or passing through original output for LOG_ONLY or CLEAN outcomes. Documentation and unit tests are updated to reflect these new behaviors and validations.
Pull request overview
Adds an explicit ScanOutcome enum to OutputScanResult so the tool invoker can distinguish intentional output withholding (policy) from scanner failures, improving logging, user-facing messaging, and downstream observability.
Changes:
- Introduce `ScanOutcome` (CLEAN/REDACTED/WITHHELD/LOG_ONLY) and enforce outcome/content consistency via model validation.
- Update scanner/policies/invoker to set and branch on outcomes (including a dedicated `TOOL_OUTPUT_WITHHELD` event + message).
- Expand unit tests and docs to cover `WITHHELD` and `LOG_ONLY` behavior.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/tools/test_invoker_security.py | Adds invoker tests for WITHHELD and LOG_ONLY outcome behavior and messaging. |
| tests/unit/security/test_service.py | Updates service tests to include explicit ScanOutcome on scan results. |
| tests/unit/security/test_output_scanner.py | Asserts scanner returns CLEAN for no findings and REDACTED when findings occur. |
| tests/unit/security/test_output_scan_policy.py | Verifies policies set/transform ScanOutcome correctly (e.g., Withhold → WITHHELD, LogOnly → LOG_ONLY). |
| tests/unit/security/test_models.py | Adds validation tests ensuring outcome/result field consistency. |
| src/ai_company/tools/invoker.py | Branches on ScanOutcome.WITHHELD first and extracts sensitive-scan handling helper. |
| src/ai_company/security/output_scanner.py | Sets outcome=REDACTED when findings are detected. |
| src/ai_company/security/output_scan_policy.py | Withhold sets WITHHELD; LogOnly returns clean result with LOG_ONLY. |
| src/ai_company/security/models.py | Adds ScanOutcome and validates invariants between has_sensitive_data, redacted_content, and outcome. |
| src/ai_company/security/__init__.py | Re-exports ScanOutcome as part of the security public API. |
| src/ai_company/observability/events/tool.py | Adds TOOL_OUTPUT_WITHHELD event constant. |
| docs/design/operations.md | Documents ScanOutcome semantics and policy/outcome mapping. |
| CLAUDE.md | Updates logging/event-constant guidance examples to include TOOL_OUTPUT_WITHHELD. |
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@ Coverage Diff @@
##            main     #394   +/-   ##
==========================================
+ Coverage   93.83%   93.84%   +0.01%
==========================================
  Files         462      463       +1
  Lines       21653    21691      +38
  Branches     2079     2086       +7
==========================================
+ Hits        20319    20357      +38
  Misses       1032     1032
  Partials      302      302
```

☔ View full report in Codecov by Sentry.
Greptile Summary
This PR introduces a `ScanOutcome` enum to `OutputScanResult` so intentional policy withholding can be distinguished from scanner failure. Two minor issues found:
Confidence Score: 4/5

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/ai_company/tools/scan_result_handler.py
Line: 58-73
Comment:
**Implicit REDACTED dispatch — asymmetric with WITHHELD branch**
The WITHHELD branch dispatches on `scan_result.outcome == ScanOutcome.WITHHELD` (an explicit outcome check), but the REDACTED branch dispatches on `scan_result.redacted_content is not None` (an implicit field check). Today these are semantically equivalent for valid `OutputScanResult` objects because the model validator enforces `REDACTED ↔ redacted_content is not None`, but the asymmetry has a forward-compatibility risk.
If a future `ScanOutcome` member (e.g. `PARTIAL`) is added that also carries `redacted_content`, it would silently hit this branch and be logged as `TOOL_OUTPUT_REDACTED`, misleading any monitoring that distinguishes outcomes. Consider making the dispatch explicit:
```python
if scan_result.outcome == ScanOutcome.WITHHELD:
# … existing WITHHELD path …
if scan_result.outcome == ScanOutcome.REDACTED and scan_result.redacted_content is not None:
# … existing REDACTED path …
# Defensive: model_copy() can produce REDACTED with redacted_content=None.
```
The dual condition (`outcome == REDACTED and redacted_content is not None`) preserves the current fail-closed behaviour for the broken-model-copy case — a `REDACTED` outcome with `None` content still falls through to the defensive fallback — while also making it explicit that any unrecognised future outcome falls closed.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/ai_company/security/service.py
Line: 277-292
Comment:
**Misleading fallback note — `LogOnlyPolicy` failure is stricter, not looser**
The note logged on policy failure says `"may be less strict than intended policy"`, but this characterisation is only accurate when a strict policy (e.g. `WithholdPolicy`) fails. When a permissive policy fails the direction is reversed:
| Intended policy | Fallback (raw scanner result) | Relative strictness |
|---|---|---|
| `WithholdPolicy` | `REDACTED` — partial content returned | **less strict** ✓ |
| `LogOnlyPolicy` | `REDACTED` — content now blocked | **more strict** ✗ |
| `RedactPolicy` | `REDACTED` — same behaviour | no change |
Operators monitoring this log event (keyed on `SECURITY_INTERCEPTOR_ERROR`) could be misled into thinking the system degraded toward permissiveness when a `LogOnlyPolicy` failure actually makes it more restrictive. A neutral note avoids the ambiguity:
```suggestion
note="Output scan policy application failed "
"— returning raw scanner result "
"(strictness may differ from intended policy)",
```
How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 5b51f36
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/ai_company/tools/invoker.py (1)
280-383: 🛠️ Refactor suggestion | 🟠 Major

Split the new scan helper again.
`_handle_sensitive_scan` still spans roughly 60 lines on its own, and `src/ai_company/tools/invoker.py` is still above the repo's 800-line cap. Please break the withheld/redacted/fail-closed branches into smaller helpers or move scan-result adaptation into a dedicated collaborator.

♻️ Refactor sketch

```diff
-    def _handle_sensitive_scan(
-        self,
-        tool_call: ToolCall,
-        result: ToolExecutionResult,
-        scan_result: OutputScanResult,
-    ) -> ToolExecutionResult:
-        ...
+    def _handle_sensitive_scan(
+        self,
+        tool_call: ToolCall,
+        result: ToolExecutionResult,
+        scan_result: OutputScanResult,
+    ) -> ToolExecutionResult:
+        if scan_result.outcome == ScanOutcome.WITHHELD:
+            return self._withheld_scan_result(tool_call, scan_result)
+        if scan_result.redacted_content is not None:
+            return self._redacted_scan_result(tool_call, result, scan_result)
+        return self._fail_closed_scan_result(tool_call, scan_result)
+
+    def _withheld_scan_result(
+        self,
+        tool_call: ToolCall,
+        scan_result: OutputScanResult,
+    ) -> ToolExecutionResult:
+        ...
+
+    def _redacted_scan_result(
+        self,
+        tool_call: ToolCall,
+        result: ToolExecutionResult,
+        scan_result: OutputScanResult,
+    ) -> ToolExecutionResult:
+        ...
+
+    def _fail_closed_scan_result(
+        self,
+        tool_call: ToolCall,
+        scan_result: OutputScanResult,
+    ) -> ToolExecutionResult:
+        ...
```

As per coding guidelines: "Functions must be less than 50 lines; files must be less than 800 lines."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/tools/invoker.py` around lines 280 - 383, The _handle_sensitive_scan method is too large and should be split into smaller helpers to meet the <50-line function> rule; extract the three branches into distinct methods such as _handle_withheld_output(tool_call, scan_result), _handle_redacted_output(tool_call, result, scan_result), and _handle_failclosed_output(tool_call, scan_result) (or move that adaptation into a new collaborator class like OutputScanHandler) and have _handle_sensitive_scan delegate to them; ensure each extracted helper constructs the same ToolExecutionResult shapes and logs using the same logger messages (TOOL_OUTPUT_WITHHELD, TOOL_OUTPUT_REDACTED) and metadata keys ("output_withheld","output_redacted","redaction_findings","output_scan_failed") so behavior is unchanged, and update _scan_output to call the refactored _handle_sensitive_scan as before.

tests/unit/tools/test_invoker_security.py (1)
382-404: 🧹 Nitpick | 🔵 Trivial

Don't reuse the old fail-closed test for the `WITHHELD` path.
This case now exercises the explicit `WITHHELD` behavior, which is already covered again in `TestWithheldOutcome`, and no longer matches the "no redacted content fails closed" scenario named here. Keep this test targeted at the fallback/no-usable-redaction case and let the dedicated class own the `WITHHELD` assertions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/tools/test_invoker_security.py` around lines 382 - 404, The test function test_sensitive_but_no_redacted_content_fails_closed currently constructs an OutputScanResult with outcome=ScanOutcome.WITHHELD which duplicates the dedicated WITHHELD path; change that outcome to a non-WITHHELD enum (e.g. ScanOutcome.UNKNOWN) so this test exercises the fallback/no-usable-redaction case, and remove the assertion that checks for the explicit "withheld by security policy" message (keep the is_error True and the "executed:" not-in-content checks). Update the instantiation of OutputScanResult in this test accordingly and leave TestWithheldOutcome to own WITHHELD-specific assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/tools/invoker.py`:
- Around line 325-339: The logger call in the fail-closed branch currently emits
TOOL_OUTPUT_REDACTED which conflates withheld outputs with redactions; change
the event to a failure-specific event constant (e.g. TOOL_OUTPUT_WITHHELD or
TOOL_OUTPUT_WITHHELD_NO_PAYLOAD) instead of TOOL_OUTPUT_REDACTED and keep the
same context (tool_call_id=tool_call.id, tool_name=tool_call.name,
findings=scan_result.findings, note="no redacted content available — withholding
output"); leave the returned ToolExecutionResult and metadata
({"output_scan_failed": True}) as-is so metrics reflect a failure-withheld event
rather than a redaction.
In `@tests/unit/security/test_models.py`:
- Around line 525-586: Replace the multiple near-duplicate test methods that
exercise OutputScanResult with a parametrized table using
pytest.mark.parametrize: create a single test (e.g.,
test_output_scanresult_outcome_matrix) that iterates rows of
(has_sensitive_data, findings, outcome, expect_valid, expect_error_field) and
for each row either constructs OutputScanResult and asserts
result.outcome/redacted_content when expect_valid is True or wraps construction
in pytest.raises(ValidationError, match=expect_error_field) when expect_valid is
False; reference ScanOutcome enum values for outcomes and preserve checks used
in the originals (outcome and redacted_content) so behavior remains identical.
In `@tests/unit/tools/test_invoker_security.py`:
- Around line 816-925: This test file exceeds the 800-line limit by adding two
large test classes; extract TestWithheldOutcome and TestLogOnlyOutcome (and
their tests that call _make_interceptor, _make_verdict, ToolInvoker,
ToolExecutionResult, SecurityContext) into a new test module (e.g.,
tests/unit/tools/test_invoker_output_scan.py), update imports at top of the new
file to reuse existing fixtures (security_registry, tool_call) and helper
symbols, and remove those classes from the original file so the original stays
under 800 lines and each function remains <50 lines; ensure pytest discovery by
matching the new file name pattern and run tests to confirm no missing imports.
---
Outside diff comments:
In `@src/ai_company/tools/invoker.py`:
- Around line 280-383: The _handle_sensitive_scan method is too large and should
be split into smaller helpers to meet the <50-line function> rule; extract the
three branches into distinct methods such as _handle_withheld_output(tool_call,
scan_result), _handle_redacted_output(tool_call, result, scan_result), and
_handle_failclosed_output(tool_call, scan_result) (or move that adaptation into
a new collaborator class like OutputScanHandler) and have _handle_sensitive_scan
delegate to them; ensure each extracted helper constructs the same
ToolExecutionResult shapes and logs using the same logger messages
(TOOL_OUTPUT_WITHHELD, TOOL_OUTPUT_REDACTED) and metadata keys
("output_withheld","output_redacted","redaction_findings","output_scan_failed")
so behavior is unchanged, and update _scan_output to call the refactored
_handle_sensitive_scan as before.
In `@tests/unit/tools/test_invoker_security.py`:
- Around line 382-404: The test function
test_sensitive_but_no_redacted_content_fails_closed currently constructs an
OutputScanResult with outcome=ScanOutcome.WITHHELD which duplicates the
dedicated WITHHELD path; change that outcome to a non-WITHHELD enum (e.g.
ScanOutcome.UNKNOWN) so this test exercises the fallback/no-usable-redaction
case, and remove the assertion that checks for the explicit "withheld by
security policy" message (keep the is_error True and the "executed:"
not-in-content checks). Update the instantiation of OutputScanResult in this
test accordingly and leave TestWithheldOutcome to own WITHHELD-specific
assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 66c3c5a4-0c20-4fa5-8696-e71e41163bef
📒 Files selected for processing (13)
- CLAUDE.md
- docs/design/operations.md
- src/ai_company/observability/events/tool.py
- src/ai_company/security/__init__.py
- src/ai_company/security/models.py
- src/ai_company/security/output_scan_policy.py
- src/ai_company/security/output_scanner.py
- src/ai_company/tools/invoker.py
- tests/unit/security/test_models.py
- tests/unit/security/test_output_scan_policy.py
- tests/unit/security/test_output_scanner.py
- tests/unit/security/test_service.py
- tests/unit/tools/test_invoker_security.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Agent
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Greptile Review
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (6)
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Build docs with: uv run zensical build (output: _site/docs/) — use without --strict until zensical/backlog#72 is resolved
Files:
docs/design/operations.md
docs/**
📄 CodeRabbit inference engine (CLAUDE.md)
Docs source in docs/ (Markdown, built with Zensical); design spec in docs/design/ (7 pages: index, agents, organization, communication, engine, memory, operations)
Files:
docs/design/operations.md
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python 3.14+ required; use PEP 649 native lazy annotations (no `from __future__ import annotations`)
Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14
Add type hints to all public functions; enforce mypy strict mode
Use Google style docstrings required on all public classes and functions — enforced by ruff D rules
Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use `@computed_field` for derived values instead of storing redundant fields; use NotBlankStr for all identifier/name fields (including optional and tuple variants)
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task; existing code is being migrated incrementally
Enforce 88 character line length — ruff enforces this
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
tests/unit/security/test_output_scan_policy.py, src/ai_company/tools/invoker.py, src/ai_company/observability/events/tool.py, tests/unit/security/test_models.py, src/ai_company/security/models.py, tests/unit/tools/test_invoker_security.py, src/ai_company/security/output_scan_policy.py, tests/unit/security/test_output_scanner.py, src/ai_company/security/__init__.py, src/ai_company/security/output_scanner.py, tests/unit/security/test_service.py
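The immutability guideline above (deep-copy at construction, `MappingProxyType` for read-only enforcement) can be sketched as follows. The class name and field are hypothetical illustrations, not code from this repository.

```python
import copy
from types import MappingProxyType


class ToolRegistry:
    """Illustrative read-only registry: deep-copy the input at
    construction, then wrap it in MappingProxyType so callers can
    read but never mutate the mapping."""

    def __init__(self, tools: dict[str, str]) -> None:
        self._tools = MappingProxyType(copy.deepcopy(tools))

    @property
    def tools(self) -> MappingProxyType:
        return self._tools


source = {"search": "v1"}
registry = ToolRegistry(source)
source["search"] = "mutated"      # later mutation of the caller's dict...
print(registry.tools["search"])   # ...does not leak in: prints v1
```

Because the copy is taken at the boundary, creating a new object instead of sharing the caller's dict, the registry's view stays stable; any write through `registry.tools` raises `TypeError`.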
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, and `@pytest.mark.slow`
Use pytest-xdist via -n auto for parallel test execution — ALWAYS include -n auto when running pytest, never run tests sequentially
Prefer `@pytest.mark.parametrize` for testing similar cases
Use test-provider, test-small-001, etc. in tests instead of real vendor names
Run unit tests with: uv run pytest tests/ -m unit -n auto
Run integration tests with: uv run pytest tests/ -m integration -n auto
Run e2e tests with: uv run pytest tests/ -m e2e -n auto
Run full test suite with coverage: uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80
Files:
tests/unit/security/test_output_scan_policy.py, tests/unit/security/test_models.py, tests/unit/tools/test_invoker_security.py, tests/unit/security/test_output_scanner.py, tests/unit/security/test_service.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import get_logger from ai_company.observability and define logger = get_logger(name); never use import logging or logging.getLogger() or print() in application code
Always use logger variable name (not _logger or log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly
Use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Maintain 80% minimum code coverage — enforced in CI
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases
Lint Python code with: uv run ruff check src/ tests/
Auto-fix lint issues with: uv run ruff check src/ tests/ --fix
Format code with: uv run ruff format src/ tests/
Type-check with strict mode: uv run mypy src/ tests/
Files:
src/ai_company/tools/invoker.py, src/ai_company/observability/events/tool.py, src/ai_company/security/models.py, src/ai_company/security/output_scan_policy.py, src/ai_company/security/__init__.py, src/ai_company/security/output_scanner.py
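The structured-logging guideline above (event constants plus keyword context, never printf-style interpolation) can be sketched like this. The `StructuredLogger` class is a stand-in written for this example, since the project's `ai_company.observability.get_logger` is not available here; the assumption is that the real logger is structlog-style and accepts an event name plus keyword context.

```python
TOOL_OUTPUT_WITHHELD = "TOOL_OUTPUT_WITHHELD"  # event constant, as in events.tool


class StructuredLogger:
    """Minimal stand-in for the project's get_logger for illustration."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: list[tuple[str, str, dict]] = []

    def warning(self, event: str, **context) -> None:
        # Structured kwargs: logger.warning(EVENT, key=value),
        # never logger.warning('msg %s', val).
        self.records.append(("warning", event, context))


logger = StructuredLogger("ai_company.tools.invoker")
logger.warning(
    TOOL_OUTPUT_WITHHELD,
    tool_call_id="call-1",          # hypothetical values for illustration
    tool_name="example_tool",
    findings=("potential leak",),
)
```

Keeping the event name as a constant and the context as kwargs makes log records machine-filterable, which is what allows monitoring to key on events like `TOOL_OUTPUT_WITHHELD`.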
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError; non-retryable errors raise immediately without retry; RetryExhaustedError signals all retries failed
Rate limiter respects RateLimitError.retry_after from providers — automatically pauses future requests
Files:
src/ai_company/tools/invoker.py, src/ai_company/observability/events/tool.py, src/ai_company/security/models.py, src/ai_company/security/output_scan_policy.py, src/ai_company/security/__init__.py, src/ai_company/security/output_scanner.py
🧠 Learnings (8)
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : Every module with business logic must import get_logger from ai_company.observability and define logger = get_logger(__name__); never use import logging or logging.getLogger() or print() in application code
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : Use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : Always use logger variable name (not _logger or log)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to src/**/*.py : Pure data models, enums, and re-exports do NOT need logging
Applied to files:
CLAUDE.md
🧬 Code graph analysis (7)
tests/unit/security/test_output_scan_policy.py (1)
- src/ai_company/security/models.py (3): OutputScanResult (172-216), ScanOutcome (23-40), SecurityContext (90-120)

src/ai_company/tools/invoker.py (8)
- src/ai_company/providers/models.py (2): ToolCall (96-119), ToolResult (122-135)
- src/ai_company/security/models.py (2): OutputScanResult (172-216), ScanOutcome (23-40)
- src/ai_company/tools/base.py (2): ToolExecutionResult (25-54), name (123-125)
- src/ai_company/security/output_scan_policy.py (5): name (41-43), name (69-71), name (105-107), name (146-148), name (249-251)
- src/ai_company/security/rules/credential_detector.py (1): name (88-90)
- src/ai_company/security/rules/destructive_op_detector.py (1): name (94-96)
- src/ai_company/security/rules/data_leak_detector.py (1): name (79-81)
- src/ai_company/security/rules/path_traversal_detector.py (1): name (66-68)

tests/unit/security/test_models.py (1)
- src/ai_company/security/models.py (2): ScanOutcome (23-40), OutputScanResult (172-216)

src/ai_company/security/output_scan_policy.py (1)
- src/ai_company/security/models.py (2): OutputScanResult (172-216), ScanOutcome (23-40)

src/ai_company/security/__init__.py (1)
- src/ai_company/security/models.py (1): ScanOutcome (23-40)

src/ai_company/security/output_scanner.py (1)
- src/ai_company/security/models.py (2): OutputScanResult (172-216), ScanOutcome (23-40)

tests/unit/security/test_service.py (1)
- src/ai_company/security/models.py (1): ScanOutcome (23-40)
🪛 LanguageTool
CLAUDE.md
[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...
(EG_NO_COMMA)
🔇 Additional comments (6)

src/ai_company/security/__init__.py (1)
32-35: LGTM. Re-exporting `ScanOutcome` at the package root keeps the public surface aligned with the new `OutputScanResult` contract.
Also applies to: 56-70

tests/unit/security/test_output_scanner.py (1)
5-5: LGTM. These assertions lock in the new CLEAN/REDACTED outcome contract at the scanner boundary, which is the right regression coverage for this enum change.
Also applies to: 25-31, 73-79

src/ai_company/security/output_scanner.py (1)
66-70: LGTM. Assigning `ScanOutcome.REDACTED` where the scanner actually produces a redacted payload removes the old `redacted_content is None` inference from downstream code.

src/ai_company/observability/events/tool.py (1)
43-47: LGTM. Adding a dedicated `TOOL_OUTPUT_WITHHELD` event gives the invoker a clean observability hook for intentional policy withholding.

src/ai_company/security/output_scan_policy.py (1)
130-132: LGTM. Encoding `WITHHELD` and `LOG_ONLY` directly in the returned `OutputScanResult` makes policy intent explicit without mutating the original model.
Also applies to: 167-183

src/ai_company/security/models.py (1)
191-215: `OutputScanResult` now encodes the outcome matrix cleanly. The validator makes the `REDACTED`/`WITHHELD`/`LOG_ONLY` states explicit and prevents the old ambiguous "sensitive result with no usable redaction" shape from being constructed accidentally.
- Fix wrong event constant in defensive fallback (TOOL_OUTPUT_REDACTED → TOOL_OUTPUT_WITHHELD)
- Fix misleading comment about model validator preventing defensive fallback
- Add WITHHELD requires redacted_content=None validator rule
- Preserve result.metadata in WITHHELD branch (consistency with REDACTED)
- Update WithholdPolicy docstring (no longer "fail-closed")
- Add ScanOutcome to security/__init__.py module docstring
- Clarify LOG_ONLY docstring (always has_sensitive_data=False)
- Log fallback_outcome in service.py policy failure path
- Fix _handle_sensitive_scan and _scan_output docstrings
- Log unexpected outcome value in defensive fallback
- Extract handle_sensitive_scan to scan_result_handler.py (invoker.py 848→779 lines)
- Split TestWithheldOutcome/TestLogOnlyOutcome to test_invoker_output_scan.py
- Add TestDefensiveFallback with model_copy-based tests for fail-closed path
- Repurpose old fail-closed test to verify WITHHELD behavior
- Inline helper tool classes via mocking (test file 926→785 lines)
- Parametrize outcome validator tests
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/tools/invoker.py`:
- Around line 283-284: Update the docstring for _scan_output to reference the
current function name handle_sensitive_scan (instead of the removed
_handle_sensitive_scan) so implementation and docs match; locate the
_scan_output function and replace the obsolete symbol in its docstring with
handle_sensitive_scan and adjust any wording to reflect that _scan_output
delegates to handle_sensitive_scan which branches on outcome (WITHHELD vs
REDACTED).
In `@tests/unit/tools/test_invoker_output_scan.py`:
- Around line 4-5: Update the module docstring to reference the currently used
public method name instead of the removed private one: replace references to
ToolInvoker._handle_sensitive_scan with ToolInvoker.handle_sensitive_scan (or
the exact public entrypoint used by the invoker flow) so the docstring matches
the tests that exercise the handler via the invoker; ensure the wording reflects
that the test exercises the handler through the invoker flow rather than calling
a private method directly.
In `@tests/unit/tools/test_invoker_security.py`:
- Around line 619-622: The test mutates BaseTool instances by reassigning
execute (e.g., failing_tool.execute = AsyncMock(...)) which bypasses typing and
risks global state; instead create a dedicated test subclass or factory that
overrides the async execute method on _SecurityTestTool (or produce a new
subclass like FailingSecurityTestTool) and implement execute to raise
RuntimeError("intentional failure"), and update other occurrences (lines around
733-739 and 765-771) to use the subclass/factory so no method rebinding or type:
ignore is needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4f043bce-3709-49da-bbf3-d72a38ce545e
📒 Files selected for processing (9)
- src/ai_company/security/__init__.py
- src/ai_company/security/models.py
- src/ai_company/security/output_scan_policy.py
- src/ai_company/security/service.py
- src/ai_company/tools/invoker.py
- src/ai_company/tools/scan_result_handler.py
- tests/unit/security/test_models.py
- tests/unit/tools/test_invoker_output_scan.py
- tests/unit/tools/test_invoker_security.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Greptile Review
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python 3.14+ required; use PEP 649 native lazy annotations (no `from __future__ import annotations`)
Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14
Add type hints to all public functions; enforce mypy strict mode
Use Google style docstrings required on all public classes and functions — enforced by ruff D rules
Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use `@computed_field` for derived values instead of storing redundant fields; use NotBlankStr for all identifier/name fields (including optional and tuple variants)
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task; existing code is being migrated incrementally
Enforce 88 character line length — ruff enforces this
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
- src/ai_company/security/__init__.py
- src/ai_company/security/output_scan_policy.py
- src/ai_company/security/service.py
- src/ai_company/tools/invoker.py
- src/ai_company/tools/scan_result_handler.py
- tests/unit/tools/test_invoker_security.py
- tests/unit/security/test_models.py
- src/ai_company/security/models.py
- tests/unit/tools/test_invoker_output_scan.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import get_logger from ai_company.observability and define `logger = get_logger(__name__)`; never use import logging or logging.getLogger() or print() in application code
Always use logger variable name (not _logger or log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly
Use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Maintain 80% minimum code coverage — enforced in CI
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases
Lint Python code with: uv run ruff check src/ tests/
Auto-fix lint issues with: uv run ruff check src/ tests/ --fix
Format code with: uv run ruff format src/ tests/
Type-check with strict mode: uv run mypy src/ tests/
Files:
- src/ai_company/security/__init__.py
- src/ai_company/security/output_scan_policy.py
- src/ai_company/security/service.py
- src/ai_company/tools/invoker.py
- src/ai_company/tools/scan_result_handler.py
- src/ai_company/security/models.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError; non-retryable errors raise immediately without retry; RetryExhaustedError signals all retries failed
Rate limiter respects RateLimitError.retry_after from providers — automatically pauses future requests
Files:
- src/ai_company/security/__init__.py
- src/ai_company/security/output_scan_policy.py
- src/ai_company/security/service.py
- src/ai_company/tools/invoker.py
- src/ai_company/tools/scan_result_handler.py
- src/ai_company/security/models.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, and `@pytest.mark.slow`
Use pytest-xdist via -n auto for parallel test execution — ALWAYS include -n auto when running pytest, never run tests sequentially
Prefer `@pytest.mark.parametrize` for testing similar cases
Use test-provider, test-small-001, etc. in tests instead of real vendor names
Run unit tests with: uv run pytest tests/ -m unit -n auto
Run integration tests with: uv run pytest tests/ -m integration -n auto
Run e2e tests with: uv run pytest tests/ -m e2e -n auto
Run full test suite with coverage: uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80
Files:
- tests/unit/tools/test_invoker_security.py
- tests/unit/security/test_models.py
- tests/unit/tools/test_invoker_output_scan.py
🧠 Learnings (1)
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to tests/**/*.py : Prefer pytest.mark.parametrize for testing similar cases
Applied to files:
tests/unit/security/test_models.py
🧬 Code graph analysis (7)
src/ai_company/security/__init__.py (1)
src/ai_company/security/models.py (1)
ScanOutcome(23-44)
src/ai_company/security/output_scan_policy.py (1)
src/ai_company/security/models.py (2)
OutputScanResult(176-223), ScanOutcome(23-44)
src/ai_company/tools/invoker.py (3)
src/ai_company/tools/scan_result_handler.py (1)
handle_sensitive_scan(24-90)
tests/unit/tools/test_invoker_output_scan.py (1)
tool_call(101-106)
tests/unit/tools/test_invoker_security.py (1)
tool_call(108-113)
src/ai_company/tools/scan_result_handler.py (4)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)
src/ai_company/security/models.py (2)
OutputScanResult(176-223), ScanOutcome(23-44)
src/ai_company/tools/base.py (1)
ToolExecutionResult(25-54)
src/ai_company/providers/models.py (1)
ToolCall(96-119)
tests/unit/tools/test_invoker_security.py (1)
src/ai_company/security/models.py (3)
ScanOutcome(23-44), SecurityVerdictType(47-56), OutputScanResult(176-223)
tests/unit/security/test_models.py (1)
src/ai_company/security/models.py (2)
ScanOutcome(23-44), OutputScanResult(176-223)
tests/unit/tools/test_invoker_output_scan.py (4)
src/ai_company/core/enums.py (1)
ToolCategory(294-308)
src/ai_company/security/models.py (5)
OutputScanResult(176-223), ScanOutcome(23-44), SecurityContext(94-124), SecurityVerdict(59-91), SecurityVerdictType(47-56)
src/ai_company/tools/invoker.py (3)
ToolInvoker(73-779), invoke(317-339), _scan_output(274-315)
src/ai_company/security/output_scan_policy.py (5)
name(41-43), name(69-71), name(107-109), name(148-150), name(251-253)
🔇 Additional comments (9)
src/ai_company/security/service.py (1)
283-290: Good fallback telemetry on policy failure. Including both `policy` and `fallback_outcome` in the exception context makes postmortems much clearer when policy application fails and raw scanner output is returned.

src/ai_company/tools/scan_result_handler.py (1)
45-90: Outcome routing is clean and fail-closed behavior is solid. WITHHELD precedence, redaction handling, and the defensive fallback together provide clear behavior for intentional policy withholding vs. malformed sensitive-scan states.
src/ai_company/security/__init__.py (1)
33-37: Public API export is correctly wired. `ScanOutcome` is now imported and re-exported via `__all__`, keeping the package surface aligned with the new model semantics.

Also applies to: 71-72
src/ai_company/security/output_scan_policy.py (1)
132-134: Policy outputs now encode intent explicitly. Setting `WITHHELD` and `LOG_ONLY` directly in policy outputs gives downstream consumers a precise signal instead of inferring from nullable payload fields.

Also applies to: 179-185
tests/unit/security/test_models.py (1)
525-614: Strong parametric coverage for the outcome matrix. The new accept/reject parametrized tests capture the model's cross-field constraints clearly and should scale well as outcome semantics evolve.
src/ai_company/security/models.py (1)
23-45: The model consistency matrix is well enforced. `ScanOutcome` plus `_check_consistency()` now explicitly guards the key combinations (CLEAN, REDACTED, WITHHELD, LOG_ONLY) and prevents ambiguous states at construction time.

Also applies to: 193-223
tests/unit/tools/test_invoker_output_scan.py (1)
247-305: Defensive invalid-state coverage is excellent. The `model_copy()`-based broken-state tests are a strong defense-in-depth check for fail-closed behavior in sensitive output handling.

tests/unit/tools/test_invoker_security.py (2)
13-13: Outcome-aware fixtures are correctly aligned with the new scan model. Good update: these constructors now explicitly set `ScanOutcome` where sensitive data is present, matching the new `OutputScanResult` invariants and avoiding false-negative test setup failures.

Also applies to: 337-338, 394-395, 706-707, 745-746
382-404: WITHHELD-path assertion now validates the intended user-facing behavior. This correctly checks the explicit policy-withheld branch instead of conflating it with scanner fail-closed behavior.
- Add missing inverse findings constraint: reject empty findings when has_sensitive_data=True (closes validator asymmetry)
- Preserve result.metadata in defensive fallback branch (consistency with WITHHELD and REDACTED branches)
- Fix _scan_output docstring: reference handle_sensitive_scan (not the removed _handle_sensitive_scan)
- Fix test module docstring: reference handle_sensitive_scan
- Replace execute method reassignment with proper subclasses (_FailingSecurityTool, _SoftErrorSecurityTool) — eliminates type: ignore comments
Pull request overview
Adds an explicit ScanOutcome signal to OutputScanResult so the tool invoker can distinguish intentional policy withholding (WITHHELD) from scanner/handling failures, and routes sensitive-scan results through a dedicated handler.
Changes:
- Introduce `ScanOutcome` (CLEAN/REDACTED/WITHHELD/LOG_ONLY) and enforce outcome/field consistency via model validation.
- Update scanner and output-scan policies to set `outcome` appropriately, and update `ToolInvoker` to branch on `WITHHELD` with a dedicated event/message.
- Add/adjust unit tests and documentation to cover the new outcome-driven behavior (including a new focused invoker output-scan test module).
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/tools/test_invoker_security.py | Updates invoker security tests for ScanOutcome and refactors helper tools. |
| tests/unit/tools/test_invoker_output_scan.py | New tests covering WITHHELD, LOG_ONLY, and defensive fail-closed branches. |
| tests/unit/security/test_service.py | Updates service tests to include outcome where sensitive results are constructed. |
| tests/unit/security/test_output_scanner.py | Asserts scanner sets CLEAN/REDACTED outcomes. |
| tests/unit/security/test_output_scan_policy.py | Verifies policies set WITHHELD/LOG_ONLY and preserve expected semantics. |
| tests/unit/security/test_models.py | Adds ScanOutcome assertions and validation test cases. |
| src/ai_company/tools/scan_result_handler.py | New handler that centralizes sensitive scan result routing and logging. |
| src/ai_company/tools/invoker.py | Delegates sensitive handling to the new handler and adjusts scan failure behavior/docs. |
| src/ai_company/security/service.py | Improves policy-failure logging by including fallback outcome and clarifying strictness. |
| src/ai_company/security/output_scanner.py | Sets outcome=REDACTED when findings are detected. |
| src/ai_company/security/output_scan_policy.py | Policies now set outcome (WITHHELD, LOG_ONLY) when transforming results. |
| src/ai_company/security/models.py | Defines ScanOutcome and validates OutputScanResult consistency. |
| src/ai_company/security/__init__.py | Re-exports ScanOutcome and updates module documentation list. |
| src/ai_company/observability/events/tool.py | Adds TOOL_OUTPUT_WITHHELD event constant. |
| docs/design/operations.md | Documents ScanOutcome semantics and policy/outcome mapping. |
| CLAUDE.md | Updates logging-event guidance to include TOOL_OUTPUT_WITHHELD. |
src/ai_company/tools/invoker.py (outdated suggestion)

```diff
-            content=("Output scan failed (fail-closed). Tool output withheld."),
+            content="Output scan failed (fail-closed). Tool output withheld.",
             is_error=True,
             metadata={"output_scan_failed": True},
```
Copilot correctly identified that the scanner-exception path in _scan_output drops tool execution metadata. Align with handle_sensitive_scan by merging result.metadata.
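The fix amounts to spreading the original result's metadata into the fail-closed payload instead of replacing it. A minimal sketch, assuming plain-dict metadata (the actual `ToolExecutionResult` is a Pydantic model, so the real fix builds a new instance rather than a bare dict):

```python
def build_fail_closed_metadata(original: dict[str, object]) -> dict[str, object]:
    """Merge the scan-failure flag into existing execution metadata.

    Spreading ``original`` first means keys the tool already recorded
    (timings, retries, etc.) survive the fail-closed path instead of
    being replaced by a bare ``{"output_scan_failed": True}`` dict.
    """
    return {**original, "output_scan_failed": True}
```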
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ai_company/tools/invoker.py (1)
43-43: ⚠️ Potential issue | 🟠 Major

Fail closed on `outcome`, not just `has_sensitive_data`.

The PR already acknowledges that `model_copy()` can bypass `OutputScanResult` validation. If a malformed result carries `outcome=WITHHELD` or `outcome=REDACTED` with `has_sensitive_data=False`, this guard skips `handle_sensitive_scan()` and returns raw tool output.

🔒 Suggested hardening

```diff
-from ai_company.security.models import SecurityContext, SecurityVerdictType
+from ai_company.security.models import (
+    ScanOutcome,
+    SecurityContext,
+    SecurityVerdictType,
+)
@@
-        if scan_result.has_sensitive_data:
+        if scan_result.has_sensitive_data or scan_result.outcome in (
+            ScanOutcome.REDACTED,
+            ScanOutcome.WITHHELD,
+        ):
             return handle_sensitive_scan(tool_call, result, scan_result)
         return result
```

Also applies to: 313-315
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/tools/invoker.py` at line 43, The code currently only treats results as sensitive when OutputScanResult.has_sensitive_data is true, which allows a crafted model_copy() to bypass handling if outcome is WITHHELD/REDACTED but has_sensitive_data=false; update the guard where the tool output is processed (the block that calls handle_sensitive_scan and the similar logic around lines 313-315) to fail-closed by checking both conditions: call handle_sensitive_scan when result.has_sensitive_data is true OR result.outcome is SecurityVerdictType.WITHHELD or SecurityVerdictType.REDACTED; keep existing import of SecurityVerdictType and use that enum constant(s) and the same result variable name used in the diff to locate the code to change.
♻️ Duplicate comments (1)
tests/unit/tools/test_invoker_output_scan.py (1)
241-246: ⚠️ Potential issue | 🟡 Minor

Update this docstring to the extracted helper name. `_handle_sensitive_scan` no longer exists; this branch now exercises `ai_company.tools.scan_result_handler.handle_sensitive_scan` via `ToolInvoker._scan_output`.
Verify each finding against the current code and only fix it if needed. In `@tests/unit/tools/test_invoker_output_scan.py` around lines 241 - 246, Update the test docstring to reference the extracted helper function instead of `_handle_sensitive_scan`; specifically mention `ai_company.tools.scan_result_handler.handle_sensitive_scan` (and that it's exercised via `ToolInvoker._scan_output`) so the docstring accurately describes the branch being tested.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/tools/scan_result_handler.py`:
- Around line 24-90: The function handle_sensitive_scan is over 50 lines; split
each branch into its own private helper to reduce size: create
_handle_withheld(tool_call, result, scan_result) to log TOOL_OUTPUT_WITHHELD and
return the WITHHELD ToolExecutionResult, _handle_redacted(tool_call, result,
scan_result) to log TOOL_OUTPUT_REDACTED and return the redacted
ToolExecutionResult (including "output_redacted" and "redaction_findings" in
metadata), and _handle_fail_closed(tool_call, result, scan_result) to log
TOOL_OUTPUT_WITHHELD with outcome and return the fail-closed ToolExecutionResult
(with "output_scan_failed" metadata); then simplify handle_sensitive_scan to
branch on scan_result.outcome and scan_result.redacted_content and delegate to
these helpers, preserving use of ScanOutcome, logger, ToolExecutionResult,
TOOL_OUTPUT_WITHHELD, and TOOL_OUTPUT_REDACTED.
In `@tests/unit/tools/test_invoker_output_scan.py`:
- Around line 159-171: The tests currently construct
ToolExecutionResult(content="raw output") without preexisting metadata so they
don't catch regressions where existing metadata could be dropped; update both
tests that call invoker._scan_output (the primary one around ToolExecutionResult
usage and the defensive-fallback test around lines 298-305) to initialize
tool_exec_result with a sentinel metadata key (e.g., tool_exec_result =
ToolExecutionResult(content="raw output", metadata={"sentinel": True})) before
calling invoker._scan_output, and add assertions that scan_exec.metadata still
contains that sentinel (e.g., assert scan_exec.metadata.get("sentinel") is True)
in addition to the existing checks for output_withheld and absence of
output_scan_failed so you verify existing metadata is preserved by the
_scan_output path.
---
Outside diff comments:
In `@src/ai_company/tools/invoker.py`:
- Line 43: The code currently only treats results as sensitive when
OutputScanResult.has_sensitive_data is true, which allows a crafted model_copy()
to bypass handling if outcome is WITHHELD/REDACTED but has_sensitive_data=false;
update the guard where the tool output is processed (the block that calls
handle_sensitive_scan and the similar logic around lines 313-315) to fail-closed
by checking both conditions: call handle_sensitive_scan when
result.has_sensitive_data is true OR result.outcome is
SecurityVerdictType.WITHHELD or SecurityVerdictType.REDACTED; keep existing
import of SecurityVerdictType and use that enum constant(s) and the same result
variable name used in the diff to locate the code to change.
---
Duplicate comments:
In `@tests/unit/tools/test_invoker_output_scan.py`:
- Around line 241-246: Update the test docstring to reference the extracted
helper function instead of `_handle_sensitive_scan`; specifically mention
`ai_company.tools.scan_result_handler.handle_sensitive_scan` (and that it's
exercised via `ToolInvoker._scan_output`) so the docstring accurately describes
the branch being tested.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ffb9a485-a305-4561-8406-6ca22aaa3d47
📒 Files selected for processing (6)
- src/ai_company/security/models.py
- src/ai_company/tools/invoker.py
- src/ai_company/tools/scan_result_handler.py
- tests/unit/security/test_models.py
- tests/unit/tools/test_invoker_output_scan.py
- tests/unit/tools/test_invoker_security.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Agent
- GitHub Check: Greptile Review
- GitHub Check: Test (Python 3.14)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python 3.14+ required; use PEP 649 native lazy annotations (no `from __future__ import annotations`)
Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14
Add type hints to all public functions; enforce mypy strict mode
Use Google style docstrings required on all public classes and functions — enforced by ruff D rules
Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use `@computed_field` for derived values instead of storing redundant fields; use NotBlankStr for all identifier/name fields (including optional and tuple variants)
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task; existing code is being migrated incrementally
Enforce 88 character line length — ruff enforces this
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
- src/ai_company/tools/scan_result_handler.py
- src/ai_company/tools/invoker.py
- src/ai_company/security/models.py
- tests/unit/security/test_models.py
- tests/unit/tools/test_invoker_security.py
- tests/unit/tools/test_invoker_output_scan.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import get_logger from ai_company.observability and define `logger = get_logger(__name__)`; never use import logging or logging.getLogger() or print() in application code
Always use logger variable name (not _logger or log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly
Use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Maintain 80% minimum code coverage — enforced in CI
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases
Lint Python code with: uv run ruff check src/ tests/
Auto-fix lint issues with: uv run ruff check src/ tests/ --fix
Format code with: uv run ruff format src/ tests/
Type-check with strict mode: uv run mypy src/ tests/
Files:
- src/ai_company/tools/scan_result_handler.py
- src/ai_company/tools/invoker.py
- src/ai_company/security/models.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError; non-retryable errors raise immediately without retry; RetryExhaustedError signals all retries failed
Rate limiter respects RateLimitError.retry_after from providers — automatically pauses future requests
Files:
- src/ai_company/tools/scan_result_handler.py
- src/ai_company/tools/invoker.py
- src/ai_company/security/models.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, and `@pytest.mark.slow`
Use pytest-xdist via -n auto for parallel test execution — ALWAYS include -n auto when running pytest, never run tests sequentially
Prefer `@pytest.mark.parametrize` for testing similar cases
Use test-provider, test-small-001, etc. in tests instead of real vendor names
Run unit tests with: uv run pytest tests/ -m unit -n auto
Run integration tests with: uv run pytest tests/ -m integration -n auto
Run e2e tests with: uv run pytest tests/ -m e2e -n auto
Run full test suite with coverage: uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80
Files:
- tests/unit/security/test_models.py
- tests/unit/tools/test_invoker_security.py
- tests/unit/tools/test_invoker_output_scan.py
🧠 Learnings (1)
📚 Learning: 2026-03-14T11:20:53.699Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T11:20:53.699Z
Learning: Applies to tests/**/*.py : Prefer pytest.mark.parametrize for testing similar cases
Applied to files:
tests/unit/security/test_models.py
🧬 Code graph analysis (3)
src/ai_company/tools/invoker.py (1)
src/ai_company/tools/scan_result_handler.py (1)
handle_sensitive_scan(24-90)
tests/unit/security/test_models.py (1)
src/ai_company/security/models.py (2)
ScanOutcome(23-44), OutputScanResult(176-226)
tests/unit/tools/test_invoker_security.py (2)
src/ai_company/security/models.py (3)
ScanOutcome(23-44), SecurityVerdictType(47-56), OutputScanResult(176-226)
tests/unit/tools/test_invoker_output_scan.py (5)
test_withheld_outcome_returns_policy_message(117-138), security_registry(97-98), tool_call(102-107), _make_interceptor(77-90), _make_verdict(59-74)
🔇 Additional comments (1)
tests/unit/tools/test_invoker_security.py (1)
55-84: Nice cleanup: these dedicated test tools are much safer than method rebinding. The failure and soft-error scenarios are now explicit, type-safe, and no longer mutate `BaseTool.execute` in place.

As per coding guidelines,
**/*.py: "Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement."
```python
def handle_sensitive_scan(
    tool_call: ToolCall,
    result: ToolExecutionResult,
    scan_result: OutputScanResult,
) -> ToolExecutionResult:
    """Route a sensitive scan result to the correct handler.

    Branches on ``ScanOutcome``:

    - ``WITHHELD``: return error with "withheld by policy" message.
    - ``redacted_content`` present: return redacted content.
    - Defensive fallback: withhold output (fail-closed).

    Args:
        tool_call: The tool call being processed.
        result: The original tool execution result.
        scan_result: The scan result with ``has_sensitive_data=True``.

    Returns:
        A new ``ToolExecutionResult`` reflecting the scan outcome.
    """
    if scan_result.outcome == ScanOutcome.WITHHELD:
        logger.warning(
            TOOL_OUTPUT_WITHHELD,
            tool_call_id=tool_call.id,
            tool_name=tool_call.name,
            findings=scan_result.findings,
            note="content withheld by security policy",
        )
        return ToolExecutionResult(
            content=("Sensitive data detected — content withheld by security policy."),
            is_error=True,
            metadata={**result.metadata, "output_withheld": True},
        )
    if scan_result.redacted_content is not None:
        logger.warning(
            TOOL_OUTPUT_REDACTED,
            tool_call_id=tool_call.id,
            tool_name=tool_call.name,
            findings=scan_result.findings,
        )
        return ToolExecutionResult(
            content=scan_result.redacted_content,
            is_error=result.is_error,
            metadata={
                **result.metadata,
                "output_redacted": True,
                "redaction_findings": list(scan_result.findings),
            },
        )
    # Defensive: model_copy() skips model validators, so a policy
    # that clears redacted_content without updating outcome could
    # produce REDACTED with redacted_content=None. This branch
    # catches that case (and future outcome values) — fail-closed.
    logger.warning(
        TOOL_OUTPUT_WITHHELD,
        tool_call_id=tool_call.id,
        tool_name=tool_call.name,
        findings=scan_result.findings,
        outcome=scan_result.outcome.value,
        note="no redacted content available — withholding output",
    )
    return ToolExecutionResult(
        content="Sensitive data detected (fail-closed). Tool output withheld.",
        is_error=True,
        metadata={**result.metadata, "output_scan_failed": True},
    )
```
🛠️ Refactor suggestion | 🟠 Major
handle_sensitive_scan() still exceeds the 50-line cap.
Moving this logic out of ToolInvoker fixed the caller, but the extracted helper itself now spans 67 lines. Split the WITHHELD, REDACTED, and fail-closed branches into private helpers so the new module also satisfies the repo limit.
As per coding guidelines, **/*.py: "Functions must be less than 50 lines; files must be less than 800 lines."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/tools/scan_result_handler.py` around lines 24 - 90, The
function handle_sensitive_scan is over 50 lines; split each branch into its own
private helper to reduce size: create _handle_withheld(tool_call, result,
scan_result) to log TOOL_OUTPUT_WITHHELD and return the WITHHELD
ToolExecutionResult, _handle_redacted(tool_call, result, scan_result) to log
TOOL_OUTPUT_REDACTED and return the redacted ToolExecutionResult (including
"output_redacted" and "redaction_findings" in metadata), and
_handle_fail_closed(tool_call, result, scan_result) to log TOOL_OUTPUT_WITHHELD
with outcome and return the fail-closed ToolExecutionResult (with
"output_scan_failed" metadata); then simplify handle_sensitive_scan to branch on
scan_result.outcome and scan_result.redacted_content and delegate to these
helpers, preserving use of ScanOutcome, logger, ToolExecutionResult,
TOOL_OUTPUT_WITHHELD, and TOOL_OUTPUT_REDACTED.
…ests

- Add sentinel metadata key to ToolExecutionResult in _scan_output tests so regressions in metadata merging are caught
- Fix TestDefensiveFallback docstring to reference handle_sensitive_scan
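The sentinel-key technique in that commit can be illustrated in isolation: a key the scanner never touches is planted in the input metadata, and any merge that drops it is a regression. A minimal sketch, where `merge_scan_metadata` is a hypothetical stand-in for the handler's `{**result.metadata, ...}` merge:

```python
def merge_scan_metadata(original: dict, scan_extras: dict) -> dict:
    # Hypothetical stand-in for the handler's {**result.metadata, ...} merge;
    # a correct merge keeps every caller-supplied key.
    return {**original, **scan_extras}


def test_sentinel_survives_merge() -> None:
    # The sentinel key means nothing to the scanner; if it disappears,
    # the merge dropped caller metadata.
    merged = merge_scan_metadata({"sentinel": "keep-me"}, {"output_redacted": True})
    assert merged["sentinel"] == "keep-me"
    assert merged["output_redacted"] is True
```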
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/tools/test_invoker_output_scan.py`:
- Around line 97-103: Add Google-style docstrings to the two public fixtures
security_registry and tool_call: for security_registry document that it returns
a ToolRegistry containing the _OutputScanTestTool and any brief description of
its purpose; for tool_call document that it returns a ToolCall fixture
describing the default ToolCall used in tests and any key parameter meanings.
Update the docstrings above the function definitions for security_registry and
tool_call using Google-style formatting (short summary line, optional blank
line, and parameter/return descriptions as appropriate).
📒 Files selected for processing (2)
- src/ai_company/tools/invoker.py
- tests/unit/tools/test_invoker_output_scan.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Greptile Review
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python 3.14+ required; use PEP 649 native lazy annotations (no `from __future__ import annotations`)
Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14
Add type hints to all public functions; enforce mypy strict mode
Use Google style docstrings required on all public classes and functions — enforced by ruff D rules
Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use `@computed_field` for derived values instead of storing redundant fields; use NotBlankStr for all identifier/name fields (including optional and tuple variants)
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task; existing code is being migrated incrementally
Enforce 88 character line length — ruff enforces this
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
- tests/unit/tools/test_invoker_output_scan.py
- src/ai_company/tools/invoker.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, and `@pytest.mark.slow`
Use pytest-xdist via -n auto for parallel test execution — ALWAYS include -n auto when running pytest, never run tests sequentially
Prefer@pytest.mark.parametrizefor testing similar cases
Use test-provider, test-small-001, etc. in tests instead of real vendor names
Run unit tests with: uv run pytest tests/ -m unit -n auto
Run integration tests with: uv run pytest tests/ -m integration -n auto
Run e2e tests with: uv run pytest tests/ -m e2e -n auto
Run full test suite with coverage: uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80
Files:
tests/unit/tools/test_invoker_output_scan.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import get_logger from ai_company.observability and define `logger = get_logger(__name__)`; never use `import logging` or `logging.getLogger()` or `print()` in application code
Always use logger variable name (not _logger or log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly
Use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Maintain 80% minimum code coverage — enforced in CI
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases
Lint Python code with: uv run ruff check src/ tests/
Auto-fix lint issues with: uv run ruff check src/ tests/ --fix
Format code with: uv run ruff format src/ tests/
Type-check with strict mode: uv run mypy src/ tests/
Files:
src/ai_company/tools/invoker.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError; non-retryable errors raise immediately without retry; RetryExhaustedError signals all retries failed
Rate limiter respects RateLimitError.retry_after from providers — automatically pauses future requests
Files:
src/ai_company/tools/invoker.py
🧬 Code graph analysis (2)
tests/unit/tools/test_invoker_output_scan.py (2)
src/ai_company/security/models.py (5)
- OutputScanResult (176-226)
- ScanOutcome (23-44)
- SecurityContext (94-124)
- SecurityVerdict (59-91)
- SecurityVerdictType (47-56)

src/ai_company/tools/invoker.py (3)
- ToolInvoker (73-779)
- invoke (317-339)
- _scan_output (274-315)
src/ai_company/tools/invoker.py (1)
src/ai_company/tools/scan_result_handler.py (1)
- handle_sensitive_scan (24-90)
🔇 Additional comments (1)
src/ai_company/tools/invoker.py (1)
307-315: Good fail-closed metadata preservation and outcome delegation. This correctly preserves existing `result.metadata` on scan exceptions and routes sensitive outputs through `handle_sensitive_scan`, matching the new `ScanOutcome` flow.
```python
def security_registry() -> ToolRegistry:
    return ToolRegistry([_OutputScanTestTool()])


@pytest.fixture
def tool_call() -> ToolCall:
    return ToolCall(
```
Add docstrings to public fixtures for guideline compliance.
security_registry and tool_call are public module-level functions and should include Google-style docstrings.
✏️ Suggested patch

```diff
 @pytest.fixture
 def security_registry() -> ToolRegistry:
+    """Create a registry with the output-scan test tool."""
     return ToolRegistry([_OutputScanTestTool()])


 @pytest.fixture
 def tool_call() -> ToolCall:
+    """Create a default tool call used by output-scan tests."""
     return ToolCall(
         id="call_scan_001",
         name="secure_tool",
         arguments={"cmd": "ls"},
     )
```

As per coding guidelines, "Use Google style docstrings required on all public classes and functions — enforced by ruff D rules".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/tools/test_invoker_output_scan.py` around lines 97 - 103, Add
Google-style docstrings to the two public fixtures security_registry and
tool_call: for security_registry document that it returns a ToolRegistry
containing the _OutputScanTestTool and any brief description of its purpose; for
tool_call document that it returns a ToolCall fixture describing the default
ToolCall used in tests and any key parameter meanings. Update the docstrings
above the function definitions for security_registry and tool_call using
Google-style formatting (short summary line, optional blank line, and
parameter/return descriptions as appropriate).
```python
if scan_result.redacted_content is not None:
    logger.warning(
        TOOL_OUTPUT_REDACTED,
        tool_call_id=tool_call.id,
        tool_name=tool_call.name,
        findings=scan_result.findings,
    )
    return ToolExecutionResult(
        content=scan_result.redacted_content,
        is_error=result.is_error,
        metadata={
            **result.metadata,
            "output_redacted": True,
            "redaction_findings": list(scan_result.findings),
        },
    )
```
Implicit REDACTED dispatch — asymmetric with WITHHELD branch
The WITHHELD branch dispatches on scan_result.outcome == ScanOutcome.WITHHELD (an explicit outcome check), but the REDACTED branch dispatches on scan_result.redacted_content is not None (an implicit field check). Today these are semantically equivalent for valid OutputScanResult objects because the model validator enforces REDACTED ↔ redacted_content is not None, but the asymmetry has a forward-compatibility risk.
If a future ScanOutcome member (e.g. PARTIAL) is added that also carries redacted_content, it would silently hit this branch and be logged as TOOL_OUTPUT_REDACTED, misleading any monitoring that distinguishes outcomes. Consider making the dispatch explicit:
```python
if scan_result.outcome == ScanOutcome.WITHHELD:
    # … existing WITHHELD path …

if scan_result.outcome == ScanOutcome.REDACTED and scan_result.redacted_content is not None:
    # … existing REDACTED path …

# Defensive: model_copy() can produce REDACTED with redacted_content=None.
```

The dual condition (`outcome == REDACTED and redacted_content is not None`) preserves the current fail-closed behaviour for the broken-model-copy case — a `REDACTED` outcome with `None` content still falls through to the defensive fallback — while also making it explicit that any unrecognised future outcome falls closed.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/scan_result_handler.py
Line: 58-73
How can I resolve this? If you propose a fix, please make it concise.

🤖 I have created a release *beep* *boop*

---

## [0.2.0](v0.1.4...v0.2.0) (2026-03-15)

## First probably usable release? Most likely not, no, and everything will break

### Features

* add /get/ installation page for CLI installer ([#413](#413)) ([6a47e4a](6a47e4a))
* add cross-platform Go CLI for container lifecycle management ([#401](#401)) ([0353d9e](0353d9e)), closes [#392](#392)
* add explicit ScanOutcome signal to OutputScanResult ([#394](#394)) ([be33414](be33414)), closes [#284](#284)
* add meeting scheduler, event-triggered meetings, and Go CLI lint fixes ([#407](#407)) ([5550fa1](5550fa1))
* wire MultiAgentCoordinator into runtime ([#396](#396)) ([7a9e516](7a9e516))

### Bug Fixes

* CLA signatures branch + declutter repo root ([#409](#409)) ([cabe953](cabe953))
* correct Release Please branch name in release workflow ([#410](#410)) ([515d816](515d816))
* replace slsa-github-generator with attest-build-provenance, fix DAST ([#424](#424)) ([eeaadff](eeaadff))
* resolve CodeQL path-injection alerts in Go CLI ([#412](#412)) ([f41bf16](f41bf16))

### Refactoring

* rename package from ai_company to synthorg ([#422](#422)) ([df27c6e](df27c6e)), closes [#398](#398)

### Tests

* add fuzz and property-based testing across all layers ([#421](#421)) ([115a742](115a742))

### CI/CD

* add SLSA L3 provenance for CLI binaries and container images ([#423](#423)) ([d3dc75d](d3dc75d))
* bump the major group with 4 updates ([#405](#405)) ([20c7a04](20c7a04))

### Maintenance

* bump github.com/spf13/cobra from 1.9.1 to 1.10.2 in /cli in the minor-and-patch group ([#402](#402)) ([e31edbb](e31edbb))
* narrow BSL Additional Use Grant and add CLA ([#408](#408)) ([5ab15bd](5ab15bd)), closes [#406](#406)

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
--------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary

- Add a `ScanOutcome` enum (`CLEAN`/`REDACTED`/`WITHHELD`/`LOG_ONLY`) to `OutputScanResult` so `ToolInvoker` can distinguish intentional policy withholding from scanner failure
- `OutputScanner` sets `REDACTED`, `WithholdPolicy` sets `WITHHELD`, `LogOnlyPolicy` sets `LOG_ONLY`
- `ToolInvoker._scan_output` branches on `WITHHELD` first with a dedicated `TOOL_OUTPUT_WITHHELD` event and a clear user-facing message
- Tighten the model validator: `REDACTED` requires `redacted_content`, `LOG_ONLY` requires `has_sensitive_data=False`
- Extract `_handle_sensitive_scan` from `_scan_output` to meet the 50-line function limit

Review coverage

Pre-reviewed by 14 agents (code-reviewer, python-reviewer, security-reviewer, silent-failure-hunter, type-design-analyzer, comment-analyzer, conventions-enforcer, logging-audit, resilience-audit, async-concurrency-reviewer, pr-test-analyzer, test-quality-reviewer, docs-consistency, issue-resolution-verifier). 12 findings addressed, 1 false positive (PEP 758 `as` binding requires parentheses), 1 deferred (test file splitting).

Test plan
Closes #284