refactor: harden output scan policies with factory, error handling, and docs

Aureliolo · Aureliolo · commit 548439ccfa1f · 2026-03-10T22:52:39.000+01:00
Pre-reviewed by 10 agents, 17 findings addressed:
- Add build_output_scan_policy() factory to wire config enum to runtime
- Wrap policy.apply() in try/except (fail-safe to raw scan result)
- Upgrade LogOnlyPolicy logging to WARNING when sensitive data found
- Add WARNING on AutonomyTieredPolicy fallback for unmapped levels
- Wrap _DEFAULT_AUTONOMY_POLICY_MAP in MappingProxyType (immutability)
- Add OutputScanPolicyType config tests, SUPERVISED level test, factory
  tests, policy-on-clean-result test, custom-map-fallback test
- Update DESIGN_SPEC.md §12.3 with policy docs and §15.3 structure
- Update CLAUDE.md package structure description
- Improve docstrings (WithholdPolicy findings, LogOnlyPolicy clarity,
  SecurityConfig attributes)
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -77,7 +77,7 @@ src/ai_company/
   persistence/    # Operational data persistence — pluggable PersistenceBackend protocol, SQLite initial (§7.6)
   observability/  # Structured logging, correlation tracking, log sinks
   providers/      # LLM provider abstraction (LiteLLM adapter)
-  security/       # SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume)
+  security/       # SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume)
   templates/      # Pre-built company templates, personality presets, and builder
   tools/          # Tool registry, built-in tools (file_system/, git, sandbox/, code_runner), MCP bridge (mcp/), role-based access
 ```
diff --git a/DESIGN_SPEC.md b/DESIGN_SPEC.md
@@ -80,7 +80,7 @@ The MVP validates the core hypothesis: **a single agent can complete a real task
 > **How to read this spec:** Sections describe the full vision. Each section with deferred features includes an **MVP** callout box indicating what ships in M3 and what is deferred. The full design is documented upfront to inform architecture decisions — protocol interfaces are designed even for features that won't be built until later milestones.
 
 > **Implementation snapshot (2026-03-10):**
-> - **Done:** M0–M6 (tooling, config/core, providers, single-agent engine, multi-agent orchestration, API/CLI surface) + Docker sandbox (#50), MCP bridge (#53), code runner + HR engine (hiring/firing/onboarding/offboarding/registry) + performance tracking (task metrics, quality scoring, collaboration scoring, trend detection, rolling windows). Memory layer backend selected ([ADR-001](docs/decisions/ADR-001-memory-layer.md)). Persistence backend (§7.6) completed. Memory retrieval pipeline (#41: ranking, token-budget formatting, context injection, non-inferable filtering) complete. Budget enforcement complete (BudgetEnforcer + configurable cost tiers + quota/subscription tracking). CFO cost optimization complete (CostOptimizer: anomaly detection, efficiency analysis, downgrade recommendations, routing optimization, approval decisions; ReportGenerator: multi-dimensional spending reports). Shared org memory (#125: HybridPromptRetrievalBackend, OrgFactStore, access control, factory) complete. Memory consolidation/archival (#48: ConsolidationService, SimpleConsolidationStrategy, RetentionEnforcer, ArchivalStore protocol) complete. SecOps agent (rule engine, audit log, output scanner, risk classifier, ToolInvoker integration), progressive trust (4 strategies: disabled/weighted/per-category/milestone behind TrustStrategy protocol), promotion/demotion (criteria evaluation, approval strategies, model mapping). Autonomy levels (#42: AutonomyLevel enum, presets, 3-level resolver, rule-based auto-downgrade/human-only promotion change strategy) + approval timeout policies (#126: 4 timeout policies, park/resume service, risk tier classifier, timeout checker) complete.
+> - **Done:** M0–M6 (tooling, config/core, providers, single-agent engine, multi-agent orchestration, API/CLI surface) + Docker sandbox (#50), MCP bridge (#53), code runner + HR engine (hiring/firing/onboarding/offboarding/registry) + performance tracking (task metrics, quality scoring, collaboration scoring, trend detection, rolling windows). Memory layer backend selected ([ADR-001](docs/decisions/ADR-001-memory-layer.md)). Persistence backend (§7.6) completed. Memory retrieval pipeline (#41: ranking, token-budget formatting, context injection, non-inferable filtering) complete. Budget enforcement complete (BudgetEnforcer + configurable cost tiers + quota/subscription tracking). CFO cost optimization complete (CostOptimizer: anomaly detection, efficiency analysis, downgrade recommendations, routing optimization, approval decisions; ReportGenerator: multi-dimensional spending reports). Shared org memory (#125: HybridPromptRetrievalBackend, OrgFactStore, access control, factory) complete. Memory consolidation/archival (#48: ConsolidationService, SimpleConsolidationStrategy, RetentionEnforcer, ArchivalStore protocol) complete. SecOps agent (rule engine, audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, ToolInvoker integration), progressive trust (4 strategies: disabled/weighted/per-category/milestone behind TrustStrategy protocol), promotion/demotion (criteria evaluation, approval strategies, model mapping). Autonomy levels (#42: AutonomyLevel enum, presets, 3-level resolver, rule-based auto-downgrade/human-only promotion change strategy) + approval timeout policies (#126: 4 timeout policies, park/resume service, risk tier classifier, timeout checker) complete.
 > - **Remaining:** JWT/OAuth auth, approval workflow gates.
 
 ### 1.5 Configuration Philosophy
@@ -2429,6 +2429,19 @@ A special meta-agent that reviews all actions before execution:
 > - **D4 — LLM vs Rule-based:** Hybrid approach. Rule engine for known patterns (credentials, path traversal, destructive ops) — sub-ms, covers ~95% of cases. LLM fallback only for uncertain cases (~5%). Full autonomy mode: rules + audit logging only, no LLM path. Hard safety rules (credential exposure, data destruction) **never bypass** regardless of autonomy level. Precedent: AWS GuardDuty, LlamaFirewall, NeMo Guardrails all use hybrid.
 > - **D5 — Integration Point:** Pluggable `SecurityInterceptionStrategy` protocol. Initial: before every tool invocation — slots into existing `ToolInvoker` between permission check and tool execution. Policy strictness (not interception point) configurable per autonomy level. Add post-tool-call scanning for sensitive data in outputs. Performance: sub-ms rule check is invisible against seconds of LLM inference. Future strategies: batch-level (before task step), assignment-only.
 
+#### Output Scan Response Policies
+
+After the output scanner detects sensitive data, a pluggable **`OutputScanResponsePolicy`** protocol decides how to handle the findings. Four built-in policies ship behind the protocol:
+
+| Policy | Behavior | Default for |
+|--------|----------|-------------|
+| **Redact** (default) | Return scanner's redacted content as-is | `SEMI`, `SUPERVISED` autonomy |
+| **Withhold** | Clear redacted content — fail-closed, no partial data returned | `LOCKED` autonomy |
+| **Log-only** | Discard findings (logs at WARNING), pass original output through | `FULL` autonomy |
+| **Autonomy-tiered** | Delegate to a sub-policy based on effective autonomy level | Composite policy |
+
+Policy selection is declarative via `SecurityConfig.output_scan_policy_type` (`OutputScanPolicyType` enum). A factory function (`build_output_scan_policy`) resolves the enum to a concrete policy instance. Runtime constructor injection on `SecOpsService` is also supported for full flexibility. The policy is applied *after* audit recording, preserving audit fidelity regardless of policy outcome.
+
 ### 12.4 Approval Timeout Policy
 
 When an action requires human approval (per autonomy level in §12.2), the agent must wait. The framework provides configurable timeout policies that determine what happens when a human doesn't respond. All policies implement a `TimeoutPolicy` protocol. The policy is configurable per autonomy level and per action risk tier.
@@ -3099,8 +3112,10 @@ ai-company/
 │       │   ├── action_type_mapping.py # Default ToolCategory → ActionType mapping
 │       │   ├── action_types.py     # ActionTypeCategory registry and validation
 │       │   ├── audit.py            # Append-only AuditLog with configurable eviction
-│       │   ├── config.py           # SecurityConfig, SecurityPolicyRule, RuleEngineConfig
+│       │   ├── config.py           # SecurityConfig, SecurityPolicyRule, RuleEngineConfig, OutputScanPolicyType
 │       │   ├── models.py           # SecurityVerdict, SecurityContext, AuditEntry, OutputScanResult
+│       │   ├── output_scan_policy.py # Output scan response policies (redact/withhold/log-only/autonomy-tiered)
+│       │   ├── output_scan_policy_factory.py # build_output_scan_policy() factory
 │       │   ├── output_scanner.py   # Post-tool output scanning (regex-based redaction)
 │       │   ├── protocol.py         # SecurityInterceptionStrategy protocol
 │       │   ├── service.py          # SecOpsService — meta-agent coordinating security
diff --git a/src/ai_company/security/__init__.py b/src/ai_company/security/__init__.py
@@ -38,6 +38,9 @@
     RedactPolicy,
     WithholdPolicy,
 )
+from ai_company.security.output_scan_policy_factory import (
+    build_output_scan_policy,
+)
 from ai_company.security.output_scanner import OutputScanner
 from ai_company.security.protocol import SecurityInterceptionStrategy
 from ai_company.security.rules.engine import RuleEngine
@@ -67,4 +70,5 @@
     "SecurityVerdict",
     "SecurityVerdictType",
     "WithholdPolicy",
+    "build_output_scan_policy",
 ]
diff --git a/src/ai_company/security/config.py b/src/ai_company/security/config.py
@@ -98,6 +98,8 @@ class SecurityConfig(BaseModel):
         post_tool_scanning_enabled: Scan tool output for secrets.
         hard_deny_action_types: Action types always denied.
         auto_approve_action_types: Action types always approved.
+        output_scan_policy_type: Output scan response policy
+            (default: ``REDACT``).
         custom_policies: User-defined policy rules.
     """
 
diff --git a/src/ai_company/security/output_scan_policy.py b/src/ai_company/security/output_scan_policy.py
@@ -6,6 +6,7 @@
 autonomy level.
 """
 
+from types import MappingProxyType
 from typing import TYPE_CHECKING, Protocol, runtime_checkable
 
 from ai_company.core.enums import AutonomyLevel
@@ -91,6 +92,9 @@ class WithholdPolicy:
     """Clear redacted content when sensitive data is found.
 
     Forces fail-closed in the invoker — no partial data is returned.
+    The ``findings`` tuple is deliberately preserved so that audit
+    consumers can categorise what was detected without seeing the
+    actual content.
     """
 
     @property
@@ -123,10 +127,13 @@ def apply(
 
 
 class LogOnlyPolicy:
-    """Return an empty result — findings are logged but output passes through.
+    """Discard scan findings, returning a clean result.
 
+    The caller should treat the original tool output as unmodified.
     Suitable for audit-only mode or high-trust agents where output
-    scanning is informational rather than enforced.
+    scanning is informational rather than enforced.  The audit entry
+    written by ``SecOpsService.scan_output`` before this policy runs
+    preserves the original findings.
     """
 
     @property
@@ -137,32 +144,50 @@ def name(self) -> str:
     def apply(
         self,
         scan_result: OutputScanResult,
-        context: SecurityContext,  # noqa: ARG002
+        context: SecurityContext,
     ) -> OutputScanResult:
-        """Return empty result regardless of findings.
+        """Return a clean ``OutputScanResult`` regardless of findings.
+
+        Suppresses enforcement while preserving the audit log entry
+        written by ``SecOpsService.scan_output``.
 
         Args:
             scan_result: Result from the output scanner.
-            context: Security context (unused).
+            context: Security context of the tool invocation.
 
         Returns:
-            Empty ``OutputScanResult``.
+            Clean ``OutputScanResult`` with ``has_sensitive_data=False``.
         """
-        logger.debug(
-            SECURITY_OUTPUT_SCAN_POLICY_APPLIED,
-            policy="log_only",
-            has_sensitive_data=scan_result.has_sensitive_data,
-        )
+        if scan_result.has_sensitive_data:
+            logger.warning(
+                SECURITY_OUTPUT_SCAN_POLICY_APPLIED,
+                policy="log_only",
+                has_sensitive_data=True,
+                findings=scan_result.findings,
+                tool_name=context.tool_name,
+                agent_id=context.agent_id,
+                note="Sensitive data detected but passed through by log_only policy",
+            )
+        else:
+            logger.debug(
+                SECURITY_OUTPUT_SCAN_POLICY_APPLIED,
+                policy="log_only",
+                has_sensitive_data=False,
+            )
         return OutputScanResult()
 
 
-# Default autonomy-to-policy mapping.
-_DEFAULT_AUTONOMY_POLICY_MAP: dict[AutonomyLevel, OutputScanResponsePolicy] = {
-    AutonomyLevel.FULL: LogOnlyPolicy(),
-    AutonomyLevel.SEMI: RedactPolicy(),
-    AutonomyLevel.SUPERVISED: RedactPolicy(),
-    AutonomyLevel.LOCKED: WithholdPolicy(),
-}
+# Default autonomy-to-policy mapping (read-only).
+_DEFAULT_AUTONOMY_POLICY_MAP: Mapping[AutonomyLevel, OutputScanResponsePolicy] = (
+    MappingProxyType(
+        {
+            AutonomyLevel.FULL: LogOnlyPolicy(),
+            AutonomyLevel.SEMI: RedactPolicy(),
+            AutonomyLevel.SUPERVISED: RedactPolicy(),
+            AutonomyLevel.LOCKED: WithholdPolicy(),
+        }
+    )
+)
 
 
 class AutonomyTieredPolicy:
@@ -212,18 +237,31 @@ def apply(
         """
         if self._effective_autonomy is None:
             delegate = self._fallback
+            autonomy_level = None
         else:
-            level = self._effective_autonomy.level
-            delegate = self._policy_map.get(level, self._fallback)
+            autonomy_level = self._effective_autonomy.level
+            mapped = self._policy_map.get(autonomy_level)
+            if mapped is not None:
+                delegate = mapped
+            else:
+                delegate = self._fallback
+                logger.warning(
+                    SECURITY_OUTPUT_SCAN_POLICY_APPLIED,
+                    policy="autonomy_tiered",
+                    autonomy_level=autonomy_level.value,
+                    note=(
+                        f"No policy mapped for autonomy level "
+                        f"'{autonomy_level.value}' — falling back to "
+                        f"'{self._fallback.name}'"
+                    ),
+                )
 
         logger.debug(
             SECURITY_OUTPUT_SCAN_POLICY_APPLIED,
             policy="autonomy_tiered",
             delegate=delegate.name,
             autonomy_level=(
-                self._effective_autonomy.level.value
-                if self._effective_autonomy is not None
-                else None
+                autonomy_level.value if autonomy_level is not None else None
             ),
         )
         return delegate.apply(scan_result, context)
diff --git a/src/ai_company/security/output_scan_policy_factory.py b/src/ai_company/security/output_scan_policy_factory.py
@@ -0,0 +1,70 @@
+"""Factory for creating output scan policy instances from configuration."""
+
+from typing import TYPE_CHECKING
+
+from ai_company.observability import get_logger
+from ai_company.observability.events.security import (
+    SECURITY_OUTPUT_SCAN_POLICY_APPLIED,
+)
+from ai_company.security.config import OutputScanPolicyType
+from ai_company.security.output_scan_policy import (
+    AutonomyTieredPolicy,
+    LogOnlyPolicy,
+    OutputScanResponsePolicy,
+    RedactPolicy,
+    WithholdPolicy,
+)
+
+if TYPE_CHECKING:
+    from ai_company.security.autonomy.models import EffectiveAutonomy
+
+logger = get_logger(__name__)
+
+
+def build_output_scan_policy(
+    policy_type: OutputScanPolicyType,
+    *,
+    effective_autonomy: EffectiveAutonomy | None = None,
+) -> OutputScanResponsePolicy:
+    """Create an output scan policy from its config enum value.
+
+    Args:
+        policy_type: Declarative policy selection from config.
+        effective_autonomy: Resolved autonomy for the current run.
+            Required when ``policy_type`` is ``AUTONOMY_TIERED``;
+            ignored otherwise.
+
+    Returns:
+        A configured output scan response policy instance.
+
+    Raises:
+        TypeError: If ``policy_type`` is not a recognized enum member.
+    """
+    match policy_type:
+        case OutputScanPolicyType.REDACT:
+            return RedactPolicy()
+        case OutputScanPolicyType.WITHHOLD:
+            return WithholdPolicy()
+        case OutputScanPolicyType.LOG_ONLY:
+            return LogOnlyPolicy()
+        case OutputScanPolicyType.AUTONOMY_TIERED:
+            if effective_autonomy is None:
+                logger.warning(
+                    SECURITY_OUTPUT_SCAN_POLICY_APPLIED,
+                    policy_type=policy_type.value,
+                    note="output_scan_policy_type=autonomy_tiered "
+                    "but no effective_autonomy — "
+                    "AutonomyTieredPolicy will fall back to "
+                    "RedactPolicy",
+                )
+            return AutonomyTieredPolicy(
+                effective_autonomy=effective_autonomy,
+            )
+
+    msg = f"Unknown output scan policy type: {policy_type!r}"  # type: ignore[unreachable]
+    logger.warning(
+        SECURITY_OUTPUT_SCAN_POLICY_APPLIED,
+        policy_type=str(policy_type),
+        note="Unknown output scan policy type",
+    )
+    raise TypeError(msg)
diff --git a/src/ai_company/security/service.py b/src/ai_company/security/service.py
@@ -254,7 +254,18 @@ async def scan_output(
                 )
 
         if self._output_scan_policy is not None:
-            result = self._output_scan_policy.apply(result, context)
+            try:
+                result = self._output_scan_policy.apply(result, context)
+            except MemoryError, RecursionError:
+                raise
+            except Exception:
+                logger.exception(
+                    SECURITY_INTERCEPTOR_ERROR,
+                    tool_name=context.tool_name,
+                    policy=self._output_scan_policy.name,
+                    note="Output scan policy application failed "
+                    "— returning raw scan result",
+                )
 
         return result
 
diff --git a/tests/unit/security/test_config.py b/tests/unit/security/test_config.py
@@ -5,6 +5,7 @@
 
 from ai_company.core.enums import ApprovalRiskLevel
 from ai_company.security.config import (
+    OutputScanPolicyType,
     RuleEngineConfig,
     SecurityConfig,
     SecurityPolicyRule,
@@ -169,6 +170,7 @@ def test_defaults(self) -> None:
             "code:read",
             "docs:write",
         )
+        assert cfg.output_scan_policy_type == OutputScanPolicyType.REDACT
         assert cfg.custom_policies == ()
 
     def test_disabled_state(self) -> None:
@@ -270,8 +272,39 @@ def test_json_roundtrip(self) -> None:
             post_tool_scanning_enabled=False,
             hard_deny_action_types=("org:fire",),
             auto_approve_action_types=(),
+            output_scan_policy_type=OutputScanPolicyType.WITHHOLD,
             custom_policies=(policy,),
         )
         json_str = cfg.model_dump_json()
         restored = SecurityConfig.model_validate_json(json_str)
         assert restored == cfg
+        assert restored.output_scan_policy_type == OutputScanPolicyType.WITHHOLD
+
+
+# ── OutputScanPolicyType ─────────────────────────────────────────
+
+
+@pytest.mark.unit
+class TestOutputScanPolicyType:
+    """Tests for OutputScanPolicyType enum values and config integration."""
+
+    @pytest.mark.parametrize(
+        "policy_type",
+        list(OutputScanPolicyType),
+    )
+    def test_all_policy_types_accepted_in_config(
+        self,
+        policy_type: OutputScanPolicyType,
+    ) -> None:
+        cfg = SecurityConfig(output_scan_policy_type=policy_type)
+        assert cfg.output_scan_policy_type == policy_type
+
+    def test_invalid_policy_type_rejected(self) -> None:
+        with pytest.raises(ValidationError):
+            SecurityConfig(output_scan_policy_type="nonexistent")  # type: ignore[arg-type]
+
+    def test_enum_values(self) -> None:
+        assert OutputScanPolicyType.REDACT.value == "redact"
+        assert OutputScanPolicyType.WITHHOLD.value == "withhold"
+        assert OutputScanPolicyType.LOG_ONLY.value == "log_only"
+        assert OutputScanPolicyType.AUTONOMY_TIERED.value == "autonomy_tiered"
diff --git a/tests/unit/security/test_output_scan_policy.py b/tests/unit/security/test_output_scan_policy.py
diff --git a/tests/unit/security/test_output_scan_policy_factory.py b/tests/unit/security/test_output_scan_policy_factory.py
diff --git a/tests/unit/security/test_service.py b/tests/unit/security/test_service.py