feat: add pluggable output scan response policies#280
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (6)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds a pluggable post-output handling layer: new Changes
Sequence Diagram(s)sequenceDiagram
participant Tool as Tool
participant SecOps as SecOpsService
participant Scanner as OutputScanner
participant Audit as AuditLog
participant Policy as OutputScanPolicy
participant Context as SecurityContext
Tool->>SecOps: return output content
SecOps->>Scanner: scan(content)
Scanner-->>SecOps: OutputScanResult
SecOps->>Audit: record(scan_result)
Audit-->>SecOps: recorded
alt output_scan_policy configured
SecOps->>Policy: apply(scan_result, Context)
Policy-->>SecOps: transformed OutputScanResult
SecOps->>Context: emit SECURITY_OUTPUT_SCAN_POLICY_APPLIED
else no policy or policy fails
Note over SecOps: return raw (redacted) scan_result
end
SecOps-->>Tool: sanitized result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
✨ Simplify code
Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the security framework by introducing a pluggable policy system for handling the results of output scans. It allows the system to respond to detected sensitive data in a configurable manner, such as redacting, withholding, or merely logging the findings, with behavior potentially varying based on the agent's autonomy level. This provides greater control and flexibility in managing sensitive information, improving the overall security posture without disrupting core operations. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Greptile SummaryThis PR introduces a pluggable output scan response policy system for the Key highlights and findings:
Confidence Score: 1/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Invoker
participant SecOpsService
participant OutputScanner
participant AuditLog
participant OutputScanResponsePolicy
Invoker->>SecOpsService: scan_output(context, output)
SecOpsService->>OutputScanner: scan(output)
OutputScanner-->>SecOpsService: OutputScanResult
alt has_sensitive_data AND audit_enabled
SecOpsService->>AuditLog: record(AuditEntry)
end
SecOpsService->>OutputScanResponsePolicy: apply(scan_result, context)
note over OutputScanResponsePolicy: RedactPolicy → pass-through<br/>WithholdPolicy → clear redacted_content<br/>LogOnlyPolicy → return OutputScanResult()<br/>AutonomyTieredPolicy → delegate by level
OutputScanResponsePolicy-->>SecOpsService: transformed OutputScanResult
alt policy.apply() raises (non MemoryError/RecursionError)
SecOpsService-->>Invoker: raw scan_result (fail-safe)
else success
SecOpsService-->>Invoker: transformed OutputScanResult
end
Last reviewed commit: 2e3a3aa |
There was a problem hiding this comment.
Pull request overview
Introduces pluggable output scan response policies to control how sensitive-output findings are handled after OutputScanner runs, with supporting config surface and expanded unit/service tests. This extends the security subsystem’s post-tool output scanning so it can be tuned by strategy (redact/withhold/log-only/autonomy-tiered) rather than being hard-coded.
Changes:
- Add
OutputScanResponsePolicyprotocol + built-in policy implementations and a factory (build_output_scan_policy) for enum-driven selection. - Integrate an optional
output_scan_policyintoSecOpsService.scan_output()(policy applied after audit recording; failures fall back to raw scan result). - Add/extend unit tests covering policy behavior, factory behavior, and invoker/service integration gaps; update docs/events/exports accordingly.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/ai_company/security/output_scan_policy.py |
Adds policy protocol + concrete policy implementations (redact/withhold/log-only/autonomy-tiered). |
src/ai_company/security/output_scan_policy_factory.py |
Adds config-enum-to-policy factory for constructing policy instances. |
src/ai_company/security/service.py |
Applies an optional response policy after scanning/auditing in scan_output(). |
src/ai_company/security/config.py |
Adds OutputScanPolicyType enum + SecurityConfig.output_scan_policy_type default. |
src/ai_company/security/__init__.py |
Exposes new policy types/factory via package public API. |
src/ai_company/observability/events/security.py |
Adds a new event constant for policy-related logging. |
tests/unit/security/test_output_scan_policy.py |
New unit tests for each policy + protocol compliance checks. |
tests/unit/security/test_output_scan_policy_factory.py |
New tests validating factory outputs for each enum value. |
tests/unit/security/test_service.py |
Adds scan_output audit-failure isolation + policy integration tests. |
tests/unit/security/test_config.py |
Adds config default/roundtrip/validation coverage for OutputScanPolicyType. |
tests/unit/tools/test_invoker_security.py |
Adds invoker “gap tests” around output scanning behavior and error propagation. |
DESIGN_SPEC.md / CLAUDE.md |
Documents the new policy concept and updates implementation snapshot. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| approval_store: ApprovalStore | None = None, | ||
| effective_autonomy: EffectiveAutonomy | None = None, | ||
| risk_classifier: RiskTierClassifier | None = None, | ||
| output_scan_policy: OutputScanResponsePolicy | None = None, | ||
| ) -> None: |
There was a problem hiding this comment.
SecurityConfig now exposes output_scan_policy_type, but the default SecOpsService wiring won’t use it unless callers explicitly pass output_scan_policy. This means changing config has no effect in the common path. Consider constructing a default policy in SecOpsService.__init__ when output_scan_policy is None (via build_output_scan_policy(config.output_scan_policy_type, effective_autonomy=effective_autonomy)), while keeping constructor injection as an override.
| Required when ``policy_type`` is ``AUTONOMY_TIERED``; | ||
| ignored otherwise. |
There was a problem hiding this comment.
The docstring says effective_autonomy is “Required when policy_type is AUTONOMY_TIERED”, but the implementation explicitly allows None (it logs a warning and falls back to RedactPolicy). Update the docstring to match the actual behavior, or enforce the requirement by raising a ValueError when policy_type is AUTONOMY_TIERED and effective_autonomy is None.
| Required when ``policy_type`` is ``AUTONOMY_TIERED``; | |
| ignored otherwise. | |
| Used when ``policy_type`` is ``AUTONOMY_TIERED``. If ``None`` | |
| in that case, a warning is logged and the policy will fall | |
| back to ``RedactPolicy``. Ignored for other policy types. |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ai_company/security/__init__.py (1)
1-14: 🧹 Nitpick | 🔵 TrivialUpdate the module docstring to mention the new policy types.
The public API docstring lists
OutputScanResult / OutputScannerbut doesn't mention the newly exported policy types (OutputScanPolicyType,OutputScanResponsePolicy, policy implementations,build_output_scan_policy).📝 Suggested docstring update
- ``OutputScanResult`` / ``OutputScanner`` — post-tool output scanning. +- ``OutputScanResponsePolicy`` — protocol for output scan policies. +- ``RedactPolicy`` / ``WithholdPolicy`` / ``LogOnlyPolicy`` / ``AutonomyTieredPolicy`` — policy implementations. +- ``OutputScanPolicyType`` / ``build_output_scan_policy`` — config-driven policy selection.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/security/__init__.py` around lines 1 - 14, Update the module docstring at the top of src/ai_company/security/__init__.py to include the new output-scan policy exports by adding mentions of OutputScanPolicyType, OutputScanResponsePolicy, the policy implementations, and build_output_scan_policy alongside the existing OutputScanResult / OutputScanner entry; edit the public API list so it explicitly documents these symbols (OutputScanPolicyType, OutputScanResponsePolicy, build_output_scan_policy, plus any concrete policy class names) so they are discoverable in the module docstring.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/security/output_scan_policy.py`:
- Around line 203-216: The constructor currently stores the caller-supplied
policy_map by reference, allowing external mutation; in the __init__ where
policy_map is handled (the policy_map parameter and self._policy_map
assignment), make a deep copy of the provided mapping (use copy.deepcopy) and
then wrap that copy with MappingProxyType before assigning to self._policy_map;
when policy_map is None still assign a MappingProxyType-wrapped copy of
_DEFAULT_AUTONOMY_POLICY_MAP so the stored map is always frozen and cannot be
mutated by callers.
In `@tests/unit/security/test_output_scan_policy_factory.py`:
- Around line 30-73: Add a unit test verifying the factory's defensive TypeError
for unknown types: inside TestBuildOutputScanPolicy add a method (e.g.
test_unknown_policy_type_raises_type_error) that calls build_output_scan_policy
with an invalid value (e.g. a string, using type: ignore) and asserts it raises
TypeError with a message containing "Unknown output scan policy type"; reference
build_output_scan_policy and the defensive branch in
output_scan_policy_factory.py to locate the behavior to test.
In `@tests/unit/tools/test_invoker_security.py`:
- Around line 624-656: Combine the two nearly identical tests
test_memory_error_from_scan_propagates and
test_recursion_error_from_scan_propagates into one parametrized test (e.g.,
test_nonrecoverable_scan_exceptions) using `@pytest.mark.parametrize` to iterate
over the exception instances or types (MemoryError("oom"), RecursionError("max
depth")) and their corresponding exception classes; in the single test set
interceptor = _make_interceptor(...), set interceptor.scan_output =
AsyncMock(side_effect=exception), create the ToolInvoker and assert with
pytest.raises(expected_exception) that await invoker.invoke(tool_call)
propagates the error. Ensure you reference and reuse the same setup
(security_registry, tool_call, _make_interceptor, ToolInvoker,
interceptor.scan_output) so adding more non-recoverable exceptions is simple.
---
Outside diff comments:
In `@src/ai_company/security/__init__.py`:
- Around line 1-14: Update the module docstring at the top of
src/ai_company/security/__init__.py to include the new output-scan policy
exports by adding mentions of OutputScanPolicyType, OutputScanResponsePolicy,
the policy implementations, and build_output_scan_policy alongside the existing
OutputScanResult / OutputScanner entry; edit the public API list so it
explicitly documents these symbols (OutputScanPolicyType,
OutputScanResponsePolicy, build_output_scan_policy, plus any concrete policy
class names) so they are discoverable in the module docstring.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f6a44960-3110-454c-a6b5-7bfa251531a0
📒 Files selected for processing (13)
CLAUDE.mdDESIGN_SPEC.mdsrc/ai_company/observability/events/security.pysrc/ai_company/security/__init__.pysrc/ai_company/security/config.pysrc/ai_company/security/output_scan_policy.pysrc/ai_company/security/output_scan_policy_factory.pysrc/ai_company/security/service.pytests/unit/security/test_config.pytests/unit/security/test_output_scan_policy.pytests/unit/security/test_output_scan_policy_factory.pytests/unit/security/test_service.pytests/unit/tools/test_invoker_security.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 with native lazy annotations
Useexcept A, B:syntax (no parentheses) for exception handling — PEP 758 style, enforced by ruff on Python 3.14
Add type hints to all public functions; enforce with mypy strict mode
Use Google-style docstrings, required on public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones; for non-Pydantic internal collections usecopy.deepcopy()at construction and wrap withMappingProxyTypefor read-only enforcement
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention and usecopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models withmodel_copy(update=...)for runtime state that evolves
Use Pydantic v2BaseModel,model_validator,computed_field, andConfigDict; use@computed_fieldfor derived values instead of storing redundant fields
UseNotBlankStrfromcore.typesfor all identifier/name fields (including optionalNotBlankStr | Noneandtuple[NotBlankStr, ...]variants) instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls) over barecreate_task
Maintain line length of 88 characters (enforced by ruff)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently swallow errors
Validate at system boundaries: user input, external APIs, and config files
Every module with business logic MUST have:from ai_company.observability import get_loggerthenlogger = get_logger(__name__)
Always use variable namelogger(not_logger, notlog) for logger instances
Always use event name constants from domain-specific mo...
Files:
src/ai_company/security/__init__.pytests/unit/security/test_output_scan_policy.pytests/unit/security/test_config.pysrc/ai_company/security/output_scan_policy.pysrc/ai_company/security/output_scan_policy_factory.pytests/unit/security/test_service.pysrc/ai_company/security/service.pysrc/ai_company/observability/events/security.pytests/unit/security/test_output_scan_policy_factory.pysrc/ai_company/security/config.pytests/unit/tools/test_invoker_security.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Never use
import logging,logging.getLogger(), orprint()in application code
Files:
src/ai_company/security/__init__.pysrc/ai_company/security/output_scan_policy.pysrc/ai_company/security/output_scan_policy_factory.pysrc/ai_company/security/service.pysrc/ai_company/observability/events/security.pysrc/ai_company/security/config.py
{src,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples; use generic names:
example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2).claude/skill/agent files, (3) third-party import paths/module names. Tests must usetest-provider,test-small-001, etc.
Files:
src/ai_company/security/__init__.pytests/unit/security/test_output_scan_policy.pytests/unit/security/test_config.pysrc/ai_company/security/output_scan_policy.pysrc/ai_company/security/output_scan_policy_factory.pytests/unit/security/test_service.pysrc/ai_company/security/service.pysrc/ai_company/observability/events/security.pytests/unit/security/test_output_scan_policy_factory.pysrc/ai_company/security/config.pytests/unit/tools/test_invoker_security.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowfor test categorization
Useasyncio_mode = 'auto'for async tests — no manual@pytest.mark.asyncioneeded
Set test timeout to 30 seconds per test
Use pytest-xdist via-n autofor parallel test execution
Prefer@pytest.mark.parametrizefor testing similar cases
Files:
tests/unit/security/test_output_scan_policy.pytests/unit/security/test_config.pytests/unit/security/test_service.pytests/unit/security/test_output_scan_policy_factory.pytests/unit/tools/test_invoker_security.py
🧬 Code graph analysis (9)
tests/unit/security/test_output_scan_policy.py (4)
src/ai_company/core/enums.py (2)
AutonomyLevel(459-469)ToolCategory(294-308)src/ai_company/security/autonomy/models.py (1)
EffectiveAutonomy(154-191)src/ai_company/security/models.py (2)
OutputScanResult(152-177)SecurityContext(70-100)src/ai_company/security/output_scan_policy.py (10)
AutonomyTieredPolicy(193-267)LogOnlyPolicy(129-177)OutputScanResponsePolicy(29-55)RedactPolicy(58-88)WithholdPolicy(91-126)apply(41-55)apply(69-88)apply(105-126)apply(144-177)apply(224-267)
tests/unit/security/test_config.py (1)
src/ai_company/security/config.py (2)
OutputScanPolicyType(17-34)SecurityConfig(91-138)
src/ai_company/security/output_scan_policy.py (4)
src/ai_company/core/enums.py (1)
AutonomyLevel(459-469)src/ai_company/observability/_logger.py (1)
get_logger(8-28)src/ai_company/security/models.py (1)
OutputScanResult(152-177)src/ai_company/security/autonomy/models.py (1)
EffectiveAutonomy(154-191)
src/ai_company/security/output_scan_policy_factory.py (3)
src/ai_company/security/config.py (1)
OutputScanPolicyType(17-34)src/ai_company/security/output_scan_policy.py (4)
LogOnlyPolicy(129-177)OutputScanResponsePolicy(29-55)RedactPolicy(58-88)WithholdPolicy(91-126)src/ai_company/security/autonomy/models.py (1)
EffectiveAutonomy(154-191)
tests/unit/security/test_service.py (7)
src/ai_company/security/output_scan_policy.py (12)
WithholdPolicy(91-126)OutputScanResponsePolicy(29-55)apply(41-55)apply(69-88)apply(105-126)apply(144-177)apply(224-267)name(37-39)name(65-67)name(101-103)name(140-142)name(220-222)src/ai_company/security/output_scanner.py (2)
OutputScanner(31-70)scan(34-70)src/ai_company/security/rules/engine.py (2)
RuleEngine(32-174)evaluate(74-151)src/ai_company/security/service.py (2)
SecOpsService(70-435)scan_output(210-270)src/ai_company/security/models.py (1)
OutputScanResult(152-177)src/ai_company/security/config.py (1)
SecurityConfig(91-138)src/ai_company/security/audit.py (2)
AuditLog(19-136)record(49-69)
src/ai_company/security/service.py (1)
src/ai_company/security/output_scan_policy.py (11)
OutputScanResponsePolicy(29-55)apply(41-55)apply(69-88)apply(105-126)apply(144-177)apply(224-267)name(37-39)name(65-67)name(101-103)name(140-142)name(220-222)
tests/unit/security/test_output_scan_policy_factory.py (2)
src/ai_company/security/output_scan_policy.py (9)
AutonomyTieredPolicy(193-267)LogOnlyPolicy(129-177)RedactPolicy(58-88)WithholdPolicy(91-126)name(37-39)name(65-67)name(101-103)name(140-142)name(220-222)tests/unit/security/test_output_scan_policy.py (1)
_make_autonomy(138-144)
src/ai_company/security/config.py (2)
src/ai_company/core/enums.py (2)
ActionType(372-408)ApprovalRiskLevel(443-449)src/ai_company/security/models.py (1)
SecurityVerdictType(23-32)
tests/unit/tools/test_invoker_security.py (5)
src/ai_company/tools/registry.py (1)
ToolRegistry(30-122)src/ai_company/security/models.py (2)
SecurityVerdictType(23-32)OutputScanResult(152-177)src/ai_company/security/service.py (1)
scan_output(210-270)src/ai_company/security/protocol.py (1)
scan_output(40-54)src/ai_company/tools/invoker.py (2)
invoke(313-371)registry(119-121)
🔇 Additional comments (18)
CLAUDE.md (1)
80-80: Documentation matches the new security surface.This addition accurately captures the new output scan policy layer and its built-in modes in the package overview.
src/ai_company/observability/events/security.py (1)
33-33: LGTM!The new event constant follows the established naming convention and type annotation pattern used throughout the file.
src/ai_company/security/output_scan_policy_factory.py (2)
1-21: LGTM!Module setup follows conventions: proper imports, TYPE_CHECKING guard for
EffectiveAutonomy, and correctly named logger.
24-70: LGTM!The factory function is well-structured:
- Exhaustive match on all
OutputScanPolicyTypemembers- Appropriate warning when
AUTONOMY_TIEREDis used withouteffective_autonomy- Defensive fallback with
TypeErrorfor unknown types (unreachable under normal conditions)- Google-style docstring with complete Args/Returns/Raises documentation
src/ai_company/security/__init__.py (1)
34-43: LGTM!The new policy-related imports and
__all__entries are properly organized and maintain alphabetical ordering.Also applies to: 50-74
src/ai_company/security/config.py (2)
17-34: LGTM!The
OutputScanPolicyTypeenum is well-designed:
- Follows the established
StrEnumpattern used elsewhere in the codebase- Clear docstring with
Memberssection documenting each variant- Sensible default implied by docstring ("REDACT: Return redacted content — current behavior")
91-124: LGTM!The
output_scan_policy_typefield is properly integrated intoSecurityConfig:
- Type annotation matches the enum
- Default value
REDACTpreserves backward compatibility- Docstring updated to document the new field
tests/unit/security/test_config.py (2)
287-310: LGTM!Comprehensive test coverage for
OutputScanPolicyType:
- Parametrized test covering all enum values in config integration
- Validation error test for invalid types
- Direct enum value assertions
173-173: LGTM!Default value assertion correctly verifies
REDACTis the default policy type.tests/unit/security/test_output_scan_policy.py (5)
48-69: LGTM!
RedactPolicytests correctly verify pass-through behavior for both sensitive and clean results.
74-102: LGTM!
WithholdPolicytests verify the fail-closed behavior:redacted_contentis cleared whilefindingsare preserved for audit purposes.
107-128: LGTM!
LogOnlyPolicytests confirm the audit-only behavior: returns emptyOutputScanResultregardless of input findings.
133-231: LGTM!
AutonomyTieredPolicytests provide excellent coverage:
- Default policy map delegation for each autonomy level
- Fallback behavior when
effective_autonomyisNone- Custom policy map override functionality
- Fallback to
RedactPolicywhen a level is missing from a custom map
237-254: LGTM!Protocol compliance tests verify all concrete policy classes satisfy the
OutputScanResponsePolicyprotocol viaisinstancechecks.src/ai_company/security/service.py (3)
45-47: LGTM!Import of
OutputScanResponsePolicywith# noqa: TC001follows the established pattern for type-only imports in this file.
81-118: LGTM!Constructor extension is clean:
- New
output_scan_policyparameter with proper type annotation and defaultNone- Docstring clearly explains the parameter's purpose and behavior when
None- Stored in
_output_scan_policyfollowing the class's naming convention
256-269: LGTM!Policy application in
scan_outputis well-implemented:
- Correctly placed after audit recording (findings preserved in audit before policy transformation)
- Fail-safe error handling: logs the exception with policy name for debugging and returns raw scan result
- Consistent with exception handling patterns elsewhere in the class (
MemoryError, RecursionErrorre-raised)src/ai_company/security/output_scan_policy.py (1)
19-23: No action required — TYPE_CHECKING pattern is correct for Python 3.14.The code follows the coding guideline: "Do NOT use
from __future__ import annotations— Python 3.14 has PEP 649 with native lazy annotations." Under PEP 649, annotations are evaluated lazily only when accessed via introspection (e.g.,typing.get_type_hints()). Since this module and its tests perform no annotation introspection, the TYPE_CHECKING pattern forMapping,EffectiveAutonomy, andSecurityContextis safe and correct. No changes needed.
There was a problem hiding this comment.
Code Review
This pull request introduces a well-designed and robust system for pluggable output scan response policies. The addition of four distinct policies, a configuration-driven factory, and seamless integration into the SecOpsService significantly enhances the security capabilities of the framework. The error handling is commendable, particularly the fail-safe mechanism for policy application failures. The test coverage is comprehensive, covering unit, integration, and various edge cases, which ensures high confidence in the changes. I have one minor suggestion to improve logging consistency.
Note: Security Review did not run due to the size of the PR.
| logger.warning( | ||
| SECURITY_OUTPUT_SCAN_POLICY_APPLIED, | ||
| policy="autonomy_tiered", | ||
| autonomy_level=autonomy_level.value, | ||
| note=( | ||
| f"No policy mapped for autonomy level " | ||
| f"'{autonomy_level.value}' — falling back to " | ||
| f"'{self._fallback.name}'" | ||
| ), | ||
| ) |
There was a problem hiding this comment.
For consistency with the structured logging approach used elsewhere in the codebase (e.g., in LogOnlyPolicy), it's better to avoid f-strings in log messages. Passing dynamic values as separate keyword arguments makes logs more easily machine-readable and queryable.
logger.warning(
SECURITY_OUTPUT_SCAN_POLICY_APPLIED,
policy="autonomy_tiered",
autonomy_level=autonomy_level.value,
fallback_to=self._fallback.name,
note="No policy mapped for autonomy level — falling back",
)Add OutputScanResponsePolicy protocol with 4 concrete strategies (RedactPolicy, WithholdPolicy, LogOnlyPolicy, AutonomyTieredPolicy) for configurable per-autonomy output scan behavior. Wire policy into SecOpsService.scan_output and add OutputScanPolicyType enum to SecurityConfig. Fill 6 test gaps across invoker and service (31 new tests total).
…nd docs Pre-reviewed by 10 agents, 17 findings addressed: - Add build_output_scan_policy() factory to wire config enum to runtime - Wrap policy.apply() in try/except (fail-safe to raw scan result) - Upgrade LogOnlyPolicy logging to WARNING when sensitive data found - Add WARNING on AutonomyTieredPolicy fallback for unmapped levels - Wrap _DEFAULT_AUTONOMY_POLICY_MAP in MappingProxyType (immutability) - Add OutputScanPolicyType config tests, SUPERVISED level test, factory tests, policy-on-clean-result test, custom-map-fallback test - Update DESIGN_SPEC.md §12.3 with policy docs and §15.3 structure - Update CLAUDE.md package structure description - Improve docstrings (WithholdPolicy findings, LogOnlyPolicy clarity, SecurityConfig attributes)
…opilot, and Greptile - Add deepcopy + MappingProxyType for AutonomyTieredPolicy policy_map immutability - Auto-wire output_scan_policy from SecurityConfig in SecOpsService - Add import-time exhaustiveness check for AutonomyLevel ↔ policy map - Use safe getattr for policy name access in scan_output error handling - Update docstrings across service, config, factory, and __init__ modules - Parametrize non-recoverable exception tests (MemoryError/RecursionError) - Add audit-before-policy-clears test for LogOnlyPolicy integration - Add unknown policy type factory test - Add immutability assertion for WithholdPolicy - Update README security feature description
548439c to
43ce881
Compare
Mypy resolves _audit_log without the ignore — CI was failing on unused-ignore.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/security/config.py`:
- Line 123: The default for SecurityConfig.output_scan_policy_type is set to
OutputScanPolicyType.REDACT which causes FULL and LOCKED to behave like REDACT
and hides the new autonomy-aware mapping; change the default to
OutputScanPolicyType.AUTONOMY_TIERED (or remove the hard default so the factory
selects the appropriate policy based on autonomy level) so that the
autonomy-tiered behavior is the default; update any related initialization in
the factory that reads SecurityConfig.output_scan_policy_type to expect
AUTONOMY_TIERED and ensure FULL and LOCKED retain their distinct behaviors when
that value is set.
In `@tests/unit/security/test_service.py`:
- Around line 847-849: Remove the unnecessary type-ignore on the access to the
SecOpsService audit log: delete the " # type: ignore[attr-defined]" from the
line that assigns audit_log = service._audit_log in
tests/unit/security/test_service.py (the access to service._audit_log is valid
because _audit_log is a real instance attribute of SecOpsService), leaving the
direct assignment and subsequent assertions unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0db01553-c21f-4e1e-9c39-61ee4fda15d8
📒 Files selected for processing (14)
CLAUDE.mdDESIGN_SPEC.mdREADME.mdsrc/ai_company/observability/events/security.pysrc/ai_company/security/__init__.pysrc/ai_company/security/config.pysrc/ai_company/security/output_scan_policy.pysrc/ai_company/security/output_scan_policy_factory.pysrc/ai_company/security/service.pytests/unit/security/test_config.pytests/unit/security/test_output_scan_policy.pytests/unit/security/test_output_scan_policy_factory.pytests/unit/security/test_service.pytests/unit/tools/test_invoker_security.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Python 3.14+ (PEP 649 native lazy annotations) — do NOT usefrom __future__ import annotations
Use PEP 758 except syntax:except A, B:(no parentheses) — ruff enforces this on Python 3.14
All public functions and classes must have type hints — mypy strict mode is enforced
All public classes and functions must have Google-style docstrings — enforced by ruff D rules
Create new objects instead of mutating existing ones (immutability principle); for non-Pydantic internal collections, usecopy.deepcopy()at construction andMappingProxyTypewrapping for read-only enforcement
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention andcopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Separate frozen Pydantic models for config/identity from mutable-via-copy models for runtime state that evolves; never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict); use@computed_fieldfor derived values instead of storing redundant fields; useNotBlankStrfor all identifier/name fields instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls) over barecreate_taskfor structured concurrency
Line length: 88 characters maximum — enforced by ruff
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
All provider calls must go throughBaseCompletionProviderwhich applies retry and rate limiting automatically — never implement retry logic in driver subclasses or calling code
RetryConfig and RateLimiterConfig are set per-provider inProviderConfig; retryable errors (RateLimitError, ...
Files:
tests/unit/security/test_config.pytests/unit/security/test_output_scan_policy.pysrc/ai_company/security/output_scan_policy_factory.pytests/unit/security/test_output_scan_policy_factory.pytests/unit/security/test_service.pysrc/ai_company/observability/events/security.pysrc/ai_company/security/config.pysrc/ai_company/security/service.pysrc/ai_company/security/output_scan_policy.pytests/unit/tools/test_invoker_security.pysrc/ai_company/security/__init__.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowto categorize tests
Maintain 80% minimum code coverage — enforced in CI
Useasyncio_mode = "auto"in pytest configuration — no manual@pytest.mark.asyncioneeded for async tests
Set test timeout to 30 seconds per test; usepytest-xdistvia-n autofor parallel execution; prefer@pytest.mark.parametrizefor testing similar cases
Files:
tests/unit/security/test_config.pytests/unit/security/test_output_scan_policy.pytests/unit/security/test_output_scan_policy_factory.pytests/unit/security/test_service.pytests/unit/tools/test_invoker_security.py
**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Vendor names (Anthropic, OpenAI, Claude, GPT, etc.) may only appear in DESIGN_SPEC.md when listing supported providers (not in other markdown files)
Files:
README.mdCLAUDE.mdDESIGN_SPEC.md
DESIGN_SPEC.md
📄 CodeRabbit inference engine (CLAUDE.md)
ALWAYS read
DESIGN_SPEC.mdbefore implementing any feature or planning any issue — the design spec is the starting point for architecture, data models, and behavior
Files:
DESIGN_SPEC.md
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic MUST import logging fromai_company.observabilitywithlogger = get_logger(__name__)— never useimport logging,logging.getLogger(), orprint()in application code
Always use constants from domain-specific modules underai_company.observability.eventsfor event names (e.g.,PROVIDER_CALL_STARTfromevents.provider), importing directly:from ai_company.observability.events.<domain> import EVENT_CONSTANT
Use structured logging kwargs: alwayslogger.info(EVENT, key=value)— neverlogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Use absolute paths or run commands directly in Bash — never usecdin Bash commands; working directory is already set to project root
Files:
src/ai_company/security/output_scan_policy_factory.pysrc/ai_company/observability/events/security.pysrc/ai_company/security/config.pysrc/ai_company/security/service.pysrc/ai_company/security/output_scan_policy.pysrc/ai_company/security/__init__.py
🧠 Learnings (5)
📚 Learning: 2026-03-10T22:19:11.971Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T22:19:11.971Z
Learning: Applies to .claude/** : Vendor names (Anthropic, OpenAI, Claude, GPT, etc.) may appear in `.claude/` skill and agent files
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-10T22:19:11.970Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T22:19:11.970Z
Learning: Applies to DESIGN_SPEC.md : ALWAYS read `DESIGN_SPEC.md` before implementing any feature or planning any issue — the design spec is the starting point for architecture, data models, and behavior
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-10T22:19:11.971Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T22:19:11.971Z
Learning: Applies to **/*.py : Create new objects instead of mutating existing ones (immutability principle); for non-Pydantic internal collections, use `copy.deepcopy()` at construction and `MappingProxyType` wrapping for read-only enforcement
Applied to files:
src/ai_company/security/output_scan_policy.py
📚 Learning: 2026-03-10T22:19:11.971Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T22:19:11.971Z
Learning: Applies to **/*.py : For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` for field reassignment prevention and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Applied to files:
src/ai_company/security/output_scan_policy.py
📚 Learning: 2026-03-10T22:19:11.971Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T22:19:11.971Z
Learning: Applies to src/**/*.py : Use structured logging kwargs: always `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`
Applied to files:
src/ai_company/security/output_scan_policy.py
🧬 Code graph analysis (9)
tests/unit/security/test_config.py (1)
src/ai_company/security/config.py (2)
OutputScanPolicyType(17-34)SecurityConfig(91-138)
src/ai_company/security/output_scan_policy_factory.py (4)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)src/ai_company/security/config.py (1)
OutputScanPolicyType(17-34)src/ai_company/security/output_scan_policy.py (4)
AutonomyTieredPolicy(206-282)LogOnlyPolicy(130-178)OutputScanResponsePolicy(30-56)RedactPolicy(59-89)src/ai_company/security/autonomy/models.py (1)
EffectiveAutonomy(154-191)
tests/unit/security/test_output_scan_policy_factory.py (5)
src/ai_company/core/enums.py (1)
AutonomyLevel(459-469)src/ai_company/security/autonomy/models.py (1)
EffectiveAutonomy(154-191)src/ai_company/security/config.py (1)
OutputScanPolicyType(17-34)src/ai_company/security/output_scan_policy.py (9)
AutonomyTieredPolicy(206-282)LogOnlyPolicy(130-178)RedactPolicy(59-89)WithholdPolicy(92-127)name(38-40)name(66-68)name(102-104)name(141-143)name(235-237)src/ai_company/security/output_scan_policy_factory.py (1)
build_output_scan_policy(25-73)
tests/unit/security/test_service.py (9)
src/ai_company/security/output_scan_policy.py (13)
WithholdPolicy(92-127)OutputScanResponsePolicy(30-56)apply(42-56)apply(70-89)apply(106-127)apply(145-178)apply(239-282)name(38-40)name(66-68)name(102-104)name(141-143)name(235-237)LogOnlyPolicy(130-178)src/ai_company/security/output_scanner.py (2)
OutputScanner(31-70)scan(34-70)src/ai_company/security/rules/engine.py (2)
RuleEngine(32-174)evaluate(74-151)src/ai_company/security/service.py (2)
SecOpsService(74-456)scan_output(223-291)src/ai_company/security/models.py (1)
OutputScanResult(152-177)src/ai_company/security/rules/data_leak_detector.py (2)
evaluate(83-108)name(79-81)src/ai_company/security/rules/policy_validator.py (2)
evaluate(57-98)name(53-55)src/ai_company/security/audit.py (4)
AuditLog(19-136)record(49-69)count(129-131)entries(134-136)tests/unit/security/test_output_scan_policy.py (1)
_make_context(22-30)
src/ai_company/security/config.py (2)
src/ai_company/core/enums.py (2)
ActionType(372-408)ApprovalRiskLevel(443-449)src/ai_company/security/models.py (1)
SecurityVerdictType(23-32)
src/ai_company/security/service.py (3)
src/ai_company/security/output_scan_policy.py (6)
OutputScanResponsePolicy(30-56)apply(42-56)apply(70-89)apply(106-127)apply(145-178)apply(239-282)src/ai_company/security/output_scan_policy_factory.py (1)
build_output_scan_policy(25-73)src/ai_company/memory/errors.py (1)
MemoryError(13-14)
src/ai_company/security/output_scan_policy.py (3)
src/ai_company/core/enums.py (1)
AutonomyLevel(459-469)src/ai_company/security/models.py (2)
OutputScanResult(152-177)SecurityContext(70-100)src/ai_company/security/autonomy/models.py (1)
EffectiveAutonomy(154-191)
tests/unit/tools/test_invoker_security.py (4)
src/ai_company/security/models.py (2)
SecurityVerdictType(23-32)OutputScanResult(152-177)src/ai_company/security/service.py (1)
scan_output(223-291)src/ai_company/security/protocol.py (1)
scan_output(40-54)src/ai_company/tools/invoker.py (2)
invoke(313-371)registry(119-121)
src/ai_company/security/__init__.py (3)
src/ai_company/security/config.py (1)
OutputScanPolicyType(17-34)src/ai_company/security/output_scan_policy.py (5)
AutonomyTieredPolicy(206-282)LogOnlyPolicy(130-178)OutputScanResponsePolicy(30-56)RedactPolicy(59-89)WithholdPolicy(92-127)src/ai_company/security/output_scan_policy_factory.py (1)
build_output_scan_policy(25-73)
🪛 GitHub Actions: CI
tests/unit/security/test_service.py
[error] 847-847: mypy reported Unused 'type: ignore' comment [unused-ignore]. Command: 'uv run mypy src/ tests/'.
🪛 LanguageTool
README.md
[typographical] ~40-~40: To join two clauses or introduce examples, consider using an em dash.
Context: ...only - Security/Approval System (M7) - SecOps agent with rule engine (soft-allo...
(DASH_RULE)
🔇 Additional comments (10)
src/ai_company/security/output_scan_policy.py (1)
1-282: LGTM!The implementation is well-structured with clear separation of concerns:
- The
OutputScanResponsePolicyprotocol provides a clean contract- All four concrete policies (
RedactPolicy,WithholdPolicy,LogOnlyPolicy,AutonomyTieredPolicy) implement the protocol correctly- The import-time validation (lines 194-203) ensures
_DEFAULT_AUTONOMY_POLICY_MAPstays in sync withAutonomyLevel- The
AutonomyTieredPolicy.__init__now correctly freezes the policy map usingcopy.deepcopy()andMappingProxyType(lines 228-230), addressing the immutability requirementsrc/ai_company/security/output_scan_policy_factory.py (1)
25-73: LGTM!The factory function is well-implemented:
- Exhaustive handling of all
OutputScanPolicyTypemembers via match statement- Appropriate warning when
AUTONOMY_TIEREDis used withouteffective_autonomy- Defensive error handling for unknown types with proper logging before raising
TypeErrorsrc/ai_company/security/__init__.py (1)
11-78: LGTM!The public API surface is correctly expanded to expose:
OutputScanPolicyTypeenum for config-driven policy selectionOutputScanResponsePolicyprotocol and all concrete implementationsbuild_output_scan_policyfactory functionThe
__all__list maintains alphabetical ordering and the module docstring is updated to document the new surface.src/ai_company/security/service.py (2)
86-131: LGTM!The
__init__signature extension follows the established pattern of optional dependency injection with sensible defaults. Whenoutput_scan_policyisNone, the factory builds one fromconfig.output_scan_policy_type— this provides both config-driven and programmatic flexibility.
272-290: LGTM!The policy application logic is well-designed:
- Applied after audit recording, so original findings are preserved in audit
- Gracefully degrades to raw scan result on policy failure (scanner-level redaction still applies)
- Correctly re-raises
MemoryErrorandRecursionErroras non-recoverable- Error logging includes policy name for debuggability
tests/unit/security/test_output_scan_policy_factory.py (1)
30-78: LGTM!Comprehensive test coverage for the factory:
- All four policy types tested with correct instance and name assertions
AUTONOMY_TIEREDtested both with and withouteffective_autonomy- Verifies that
effective_autonomyis ignored for non-tiered types- Includes the unknown policy type error path test (line 75-78)
tests/unit/security/test_config.py (1)
284-310: LGTM!Good test coverage for
OutputScanPolicyType:
- Parametrized test ensures all enum members are accepted in
SecurityConfig- Invalid string value correctly rejected with
ValidationError- Enum values validated against expected strings
tests/unit/security/test_service.py (2)
612-665: LGTM!Good coverage of the audit failure scenario — verifies that
scan_outputreturns results even when audit recording fails, and that exceptions don't propagate.
668-822: LGTM!Comprehensive policy integration tests:
- Verifies
policy.applyis called with correct arguments- Tests policy transformation via
WithholdPolicy- Confirms policy is invoked on both sensitive and clean results
- Tests default policy passthrough behavior
- Covers both recoverable (returns raw result) and non-recoverable (
MemoryError/RecursionErrorpropagate) error pathstests/unit/security/test_output_scan_policy.py (1)
1-256: LGTM!Excellent test coverage for all policy implementations:
RedactPolicy: Verifies passthrough for both sensitive and clean resultsWithholdPolicy: Tests content clearing, findings preservation, and immutability contractLogOnlyPolicy: Confirms empty result returned regardless of inputAutonomyTieredPolicy: Covers all autonomy levels with default map, custom map override, and missing-level fallback- Protocol compliance: Parametrized test ensures all concrete classes satisfy
OutputScanResponsePolicy
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/security/test_service.py`:
- Line 827: Import of LogOnlyPolicy is done locally instead of alongside other
top-level policy imports; move the "LogOnlyPolicy" import to the module-level
imports to match "WithholdPolicy" and other policies. Edit the test module to
remove the local "from ai_company.security.output_scan_policy import
LogOnlyPolicy" and add it to the existing block of top-level imports so all
policy classes are imported consistently.
- Around line 675-693: The helper _make_service_with_policy returns a
SecOpsService but doesn't attach the test-only attribute _test_audit_log like
_make_service does, causing later tests to access service._audit_log directly;
update _make_service_with_policy to mirror _make_service by setting the returned
SecOpsService's _test_audit_log to the AuditLog instance (audit_log) so tests
can use the same test attribute access pattern when interacting with
service._audit_log or related assertions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 251f1133-0efc-44c7-81b2-d0b8cbc5e688
📒 Files selected for processing (1)
tests/unit/security/test_service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Python 3.14+ (PEP 649 native lazy annotations) — do NOT usefrom __future__ import annotations
Use PEP 758 except syntax:except A, B:(no parentheses) — ruff enforces this on Python 3.14
All public functions and classes must have type hints — mypy strict mode is enforced
All public classes and functions must have Google-style docstrings — enforced by ruff D rules
Create new objects instead of mutating existing ones (immutability principle); for non-Pydantic internal collections, usecopy.deepcopy()at construction andMappingProxyTypewrapping for read-only enforcement
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention andcopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Separate frozen Pydantic models for config/identity from mutable-via-copy models for runtime state that evolves; never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict); use@computed_fieldfor derived values instead of storing redundant fields; useNotBlankStrfor all identifier/name fields instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls) over barecreate_taskfor structured concurrency
Line length: 88 characters maximum — enforced by ruff
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
All provider calls must go throughBaseCompletionProviderwhich applies retry and rate limiting automatically — never implement retry logic in driver subclasses or calling code
RetryConfig and RateLimiterConfig are set per-provider inProviderConfig; retryable errors (RateLimitError, ...
Files:
tests/unit/security/test_service.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowto categorize tests
Maintain 80% minimum code coverage — enforced in CI
Useasyncio_mode = "auto"in pytest configuration — no manual@pytest.mark.asyncioneeded for async tests
Set test timeout to 30 seconds per test; usepytest-xdistvia-n autofor parallel execution; prefer@pytest.mark.parametrizefor testing similar cases
Files:
tests/unit/security/test_service.py
🔇 Additional comments (4)
tests/unit/security/test_service.py (4)
29-31: LGTM!The
TYPE_CHECKINGguard forOutputScanResponsePolicyis the correct pattern—it avoids runtime import overhead while providing type hints for the test helper signature at line 679.
615-666: LGTM!The tests correctly verify that
scan_outputfails safe when audit recording raises, returning the scan result without propagating the exception. This aligns with the PR objective's error handling that "fails safe to the raw scan result."
793-821: LGTM!The parametrized test correctly verifies that non-recoverable errors (
MemoryError,RecursionError) propagate as specified in the PR objectives. This aligns with the stated behavior: "MemoryError/RecursionError are re-raised."
823-850: LGTM!This test correctly verifies that audit entries capture original findings before the policy transforms the result. The
LogOnlyPolicyclears findings in the returned result (has_sensitive_data=False), but the audit entry preserves the original "Bearer token" finding. This ensures forensic traceability as mentioned in the PR objectives.The previous review comment about the unused
type: ignoreon line 847 has been properly addressed.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| raw = policy_map if policy_map is not None else _DEFAULT_AUTONOMY_POLICY_MAP | ||
| self._policy_map: Mapping[AutonomyLevel, OutputScanResponsePolicy] = ( | ||
| MappingProxyType(copy.deepcopy(dict(raw))) | ||
| ) |
There was a problem hiding this comment.
AutonomyTieredPolicy deep-copies policy_map values (copy.deepcopy(dict(raw))). This is unnecessary for immutability (a shallow dict(raw) already decouples from later mutations) and can break callers that pass non-deepcopyable policy instances (e.g., mocks, policies holding locks/resources), raising at runtime during policy construction. Prefer MappingProxyType(dict(raw)) (or a shallow copy) and document that policy instances are treated as immutable/stateless.
| ) | ||
| if not scan_result.has_sensitive_data: | ||
| return scan_result | ||
| return scan_result.model_copy(update={"redacted_content": None}) |
There was a problem hiding this comment.
WithholdPolicy forces redacted_content=None to trigger ToolInvoker's fail-closed branch, but ToolInvoker currently treats this as "no redacted content available" and logs/metrics that message. That becomes misleading when redaction was available but intentionally withheld by policy. Consider adding an explicit signal (e.g., a withheld flag on OutputScanResult or a distinct field/enum for outcome) and updating the invoker to log/report the correct reason when output is withheld by policy.
| return scan_result.model_copy(update={"redacted_content": None}) | |
| return scan_result.model_copy( | |
| update={ | |
| "redacted_content": None, | |
| "withheld": True, | |
| } | |
| ) |
README.md
Outdated
| - **Memory Backend Adapter (M5)** - Memory protocols, retrieval pipeline, org memory, and consolidation are complete; initial Mem0 adapter backend ([ADR-001](docs/decisions/ADR-001-memory-layer.md)) pending; research backends (GraphRAG, Temporal KG) planned | ||
| - **CLI Surface** - `cli/` package is placeholder-only | ||
| - **Security/Approval System (M7)** - SecOps agent with rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, risk classifier, and ToolInvoker integration are implemented; progressive trust (4 strategies), promotion/demotion, autonomy levels (5 tiers with presets, resolver, change strategies) and approval timeout policies (wait-forever, auto-deny, tiered, escalation-chain with task park/resume) are implemented; real authentication (JWT/OAuth) and approval workflow gates are planned | ||
| - **Security/Approval System (M7)** - SecOps agent with rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, and ToolInvoker integration are implemented; progressive trust (4 strategies), promotion/demotion, autonomy levels (5 tiers with presets, resolver, change strategies) and approval timeout policies (wait-forever, auto-deny, tiered, escalation-chain with task park/resume) are implemented; real authentication (JWT/OAuth) and approval workflow gates are planned |
There was a problem hiding this comment.
This bullet is under "Not implemented yet" but states the Security/Approval System features "are implemented". Either move it to the Implemented section (or reword it as planned/in-progress) to avoid contradictory project status documentation.
| - **Security/Approval System (M7)** - SecOps agent with rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, and ToolInvoker integration are implemented; progressive trust (4 strategies), promotion/demotion, autonomy levels (5 tiers with presets, resolver, change strategies) and approval timeout policies (wait-forever, auto-deny, tiered, escalation-chain with task park/resume) are implemented; real authentication (JWT/OAuth) and approval workflow gates are planned | |
| - **Security/Approval System (M7)** - SecOps agent with rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, and ToolInvoker integration are partially implemented and under active development; progressive trust (4 strategies), promotion/demotion, autonomy levels (5 tiers with presets, resolver, change strategies) and approval timeout policies (wait-forever, auto-deny, tiered, escalation-chain with task park/resume) are partially implemented and continuing to evolve; real authentication (JWT/OAuth) and approval workflow gates are planned |
- Change default output_scan_policy_type from REDACT to AUTONOMY_TIERED so autonomy-aware policy routing is active by default (CodeRabbit) - Replace deepcopy with shallow copy + MappingProxyType in AutonomyTieredPolicy to avoid breaking non-copyable policies (Copilot) - Document stateless/immutable expectation on OutputScanResponsePolicy protocol docstring (Greptile) - Add comment explaining intentional import-time integrity guard on _DEFAULT_AUTONOMY_POLICY_MAP (Greptile) - Add unit test asserting default map covers all AutonomyLevel members - Fix f-string in AutonomyTieredPolicy log to use structured kwargs (Gemini) - Move LogOnlyPolicy import to top-level in test_service.py (CodeRabbit) - Align _make_service_with_policy helper to attach _test_audit_log (CodeRabbit) - Fix README contradictory status: move security items to Implemented (Copilot)
| except MemoryError, RecursionError: | ||
| raise |
There was a problem hiding this comment.
Python 3 SyntaxError: invalid multi-exception catch syntax
except MemoryError, RecursionError: is the old Python 2 binding syntax (except ExcType, varname:), which was explicitly removed in Python 3 (PEP 3110). In Python 3, this is a SyntaxError, meaning the entire service.py module will fail to import. Parentheses are required to catch multiple exception types:
| except MemoryError, RecursionError: | |
| raise | |
| except (MemoryError, RecursionError): |
Note: the same invalid pattern appears at lines 179, 263, 381, and 433 in this file — those are pre-existing occurrences with the same problem and should be fixed as well.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/security/service.py
Line: 280-281
Comment:
**Python 3 SyntaxError: invalid multi-exception catch syntax**
`except MemoryError, RecursionError:` is the old Python 2 binding syntax (`except ExcType, varname:`), which was explicitly removed in Python 3 (PEP 3110). In Python 3, this is a `SyntaxError`, meaning the entire `service.py` module will fail to import. Parentheses are required to catch multiple exception types:
```suggestion
except (MemoryError, RecursionError):
```
Note: the same invalid pattern appears at lines 179, 263, 381, and 433 in this file — those are pre-existing occurrences with the same problem and should be fixed as well.
How can I resolve this? If you propose a fix, please make it concise.
Summary
OutputScanResponsePolicyprotocol) with 4 built-in strategies:RedactPolicy,WithholdPolicy,LogOnlyPolicy,AutonomyTieredPolicyOutputScanPolicyTypeconfig enum andbuild_output_scan_policy()factory for config-driven policy selectionSecOpsService.scan_output()with proper error handling (fail-safe to raw scan result)Output Scan Policies
Pre-PR Review Hardening (17 findings from 10 agents)
policy.apply()withMemoryError/RecursionErrorre-raiseLogOnlyPolicyupgraded from DEBUG to WARNING when sensitive data detectedAutonomyTieredPolicywarns on unmapped autonomy level fallback_DEFAULT_AUTONOMY_POLICY_MAPwrapped inMappingProxyType(immutability)OutputScanPolicyTypeconfig enum to runtime policy instancesTest plan
isinstancechecksCloses #263
🤖 Generated with Claude Code