
feat: add autonomy levels and approval timeout policies (#42, #126) #197

Merged
Aureliolo merged 4 commits into main from feat/autonomy-levels-and-approval-timeout
Mar 10, 2026

Conversation

@Aureliolo
Owner

Summary

  • Autonomy levels: Three-level resolution chain (agent → department → company default) with AutonomyResolver, EffectiveAutonomy model, action type expansion via ActionTypeRegistry, seniority constraints (JUNIOR can't have FULL), and HumanOnlyPromotionStrategy for auto-downgrade
  • Approval timeout policies: Four pluggable TimeoutPolicy implementations (Wait Forever, Deny on Timeout, Tiered per risk level, Escalation Chain) with discriminated union config, DefaultRiskTierClassifier, and TimeoutChecker service
  • Task park/resume: ParkService serializes AgentContext to ParkedContext for persistence when awaiting approval; PARKED termination reason in ExecutionResult
  • Persistence: SQLite v3 migration adds parked_contexts table with indexes; SQLiteParkedContextRepository with full CRUD
  • API: AutonomyController (GET/POST /agents/{agent_id}/autonomy) registered with app router
  • Security integration: Autonomy pre-check in SecOpsService, security guard blocking auto-approve for HIGH/CRITICAL risk tiers
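The three-level resolution chain from the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's `AutonomyResolver`: the function name `resolve_level`, the enum string values, and the plain-string seniority argument are assumptions made for the example.

```python
from __future__ import annotations

from enum import Enum


class AutonomyLevel(str, Enum):
    # Level names appear in the PR discussion; the string values are assumed.
    FULL = "full"
    SEMI = "semi"
    SUPERVISED = "supervised"
    LOCKED = "locked"


def resolve_level(
    agent_level: AutonomyLevel | None,
    department_level: AutonomyLevel | None,
    company_default: AutonomyLevel,
    seniority: str,
) -> AutonomyLevel:
    """Pick the most specific configured level: agent, then department, then company."""
    if agent_level is not None:
        level = agent_level
    elif department_level is not None:
        level = department_level
    else:
        level = company_default
    # Seniority constraint from the summary: a JUNIOR agent cannot hold FULL.
    if seniority == "junior" and level is AutonomyLevel.FULL:
        raise ValueError("JUNIOR agents cannot be granted FULL autonomy")
    return level
```

A department override wins only when the agent has no override of its own, and the company default applies only when neither is set.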

Pre-PR review fixes (51 findings from 10 agents)

  • Fix autonomy controller returning requested level instead of current
  • Add disjoint validator on EffectiveAutonomy action sets
  • Add escalate_to consistency validator on TimeoutAction
  • Add MemoryError/RecursionError re-raise in security service broad except blocks
  • Fix _row_to_model to raise QueryError instead of returning None
  • Rename YamlRiskTierClassifier → DefaultRiskTierClassifier
  • Move Jinja2 env to module-level singleton in template renderer
  • Fix personality mutation pattern (return instead of mutate)
  • Fix immutability violations (deepcopy metadata, immutable dict patterns)
  • Enumerate columns explicitly in SELECT queries (no SELECT *)
  • Register AutonomyController with app router
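The disjoint validator on the `EffectiveAutonomy` action sets might look something like the sketch below. It uses a frozen dataclass as a stand-in for the real model, so the class name `EffectiveAutonomySketch` and the validation hook are assumptions; only the two field names come from the code quoted in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffectiveAutonomySketch:
    """Stand-in for EffectiveAutonomy; the real model may carry more fields."""

    auto_approve_actions: frozenset[str]
    human_approval_actions: frozenset[str]

    def __post_init__(self) -> None:
        # An action cannot be both auto-approved and human-gated.
        overlap = self.auto_approve_actions & self.human_approval_actions
        if overlap:
            raise ValueError(
                f"action sets must be disjoint, overlap: {sorted(overlap)}"
            )
```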

Closes #42
Closes #126

Test plan

  • 6396 tests pass (0 failures), 95% coverage
  • New tests: test_parked_context_repo.py (12 tests), test_autonomy.py (4 tests)
  • Edge case tests: negative timeout, zero timeout, disjoint overlap, escalate_to validator, double-downgrade, v3 migration
  • Event constants verified in test_events.py (autonomy, timeout, parked context persistence)
  • Rename propagated: DefaultRiskTierClassifier in all test files
  • ruff, mypy strict, pre-commit hooks all pass
  • Rebased on latest main, conflict resolved in prompt.py

Copilot AI review requested due to automatic review settings March 10, 2026 11:59
@github-actions
Contributor

github-actions bot commented Mar 10, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai

coderabbitai bot commented Mar 10, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a46085ec-9651-4a88-a35a-f6acdd55701c

📥 Commits

Reviewing files that changed from the base of the PR and between 1855156 and 4deae2a.

📒 Files selected for processing (63)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • README.md
  • src/ai_company/api/controllers/__init__.py
  • src/ai_company/api/controllers/autonomy.py
  • src/ai_company/config/schema.py
  • src/ai_company/core/agent.py
  • src/ai_company/core/company.py
  • src/ai_company/core/enums.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/loop_protocol.py
  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/prompt_template.py
  • src/ai_company/observability/events/autonomy.py
  • src/ai_company/observability/events/persistence.py
  • src/ai_company/observability/events/timeout.py
  • src/ai_company/persistence/protocol.py
  • src/ai_company/persistence/repositories.py
  • src/ai_company/persistence/sqlite/backend.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/persistence/sqlite/parked_context_repo.py
  • src/ai_company/security/autonomy/__init__.py
  • src/ai_company/security/autonomy/change_strategy.py
  • src/ai_company/security/autonomy/models.py
  • src/ai_company/security/autonomy/protocol.py
  • src/ai_company/security/autonomy/resolver.py
  • src/ai_company/security/service.py
  • src/ai_company/security/timeout/__init__.py
  • src/ai_company/security/timeout/config.py
  • src/ai_company/security/timeout/factory.py
  • src/ai_company/security/timeout/models.py
  • src/ai_company/security/timeout/park_service.py
  • src/ai_company/security/timeout/parked_context.py
  • src/ai_company/security/timeout/policies.py
  • src/ai_company/security/timeout/protocol.py
  • src/ai_company/security/timeout/risk_tier_classifier.py
  • src/ai_company/security/timeout/timeout_checker.py
  • src/ai_company/templates/renderer.py
  • tests/unit/api/conftest.py
  • tests/unit/api/controllers/test_autonomy.py
  • tests/unit/core/conftest.py
  • tests/unit/core/test_company.py
  • tests/unit/engine/test_loop_protocol.py
  • tests/unit/engine/test_prompt.py
  • tests/unit/memory/org/test_prompt_integration.py
  • tests/unit/observability/test_events.py
  • tests/unit/persistence/sqlite/test_migrations.py
  • tests/unit/persistence/sqlite/test_parked_context_repo.py
  • tests/unit/persistence/test_migrations_v2.py
  • tests/unit/persistence/test_protocol.py
  • tests/unit/security/autonomy/__init__.py
  • tests/unit/security/autonomy/test_change_strategy.py
  • tests/unit/security/autonomy/test_models.py
  • tests/unit/security/autonomy/test_resolver.py
  • tests/unit/security/test_service.py
  • tests/unit/security/timeout/__init__.py
  • tests/unit/security/timeout/test_config.py
  • tests/unit/security/timeout/test_factory.py
  • tests/unit/security/timeout/test_park_service.py
  • tests/unit/security/timeout/test_parked_context.py
  • tests/unit/security/timeout/test_policies.py
  • tests/unit/security/timeout/test_risk_tier_classifier.py
  • tests/unit/security/timeout/test_timeout_checker.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Agent autonomy subsystem with configurable levels and presets
    • API endpoints to view/update per-agent autonomy
    • Approval timeout policies (wait, deny, tiered, escalation) and park/resume of agent execution contexts
  • Configuration Changes

    • Autonomy moved to preset-based config and per-agent/department overrides
    • New approval timeout configuration options
  • Observability

    • New autonomy and timeout event types for monitoring lifecycle and decisions

Walkthrough

Adds a complete autonomy + approval-timeout subsystem: enums, models, resolver, change strategy, timeout policies, park/resume, persistence (ParkedContext + repo + migration), SecOps/autonomy wiring, prompt/engine threading, API controller, observability events, and comprehensive tests.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Core enums & configs | src/ai_company/core/enums.py, src/ai_company/config/schema.py, src/ai_company/core/company.py, src/ai_company/core/agent.py | Add AutonomyLevel, downgrade/timeout enums; company/agent config gains autonomy fields and approval_timeout; float→AutonomyConfig coercion; agent-level validation (seniority vs FULL). |
| Autonomy models & strategies | src/ai_company/security/autonomy/models.py, .../protocol.py, .../change_strategy.py, .../resolver.py, .../__init__.py | New AutonomyPreset/AutonomyConfig/EffectiveAutonomy/AutonomyOverride; AutonomyChangeStrategy protocol; HumanOnlyPromotionStrategy implementation; AutonomyResolver for agent→dept→company resolution and action expansion. |
| Timeout policy surface | src/ai_company/security/timeout/config.py, .../models.py, .../protocol.py, .../factory.py | Introduce discriminated ApprovalTimeoutConfig (wait/deny/tiered/escalation), TimeoutAction model, TimeoutPolicy/RiskTierClassifier protocols, and factory to instantiate policies. |
| Timeout policies & classifier | src/ai_company/security/timeout/policies.py, .../risk_tier_classifier.py, .../timeout_checker.py, .../__init__.py | Implement four policies (WaitForever, DenyOnTimeout, Tiered, EscalationChain), default risk-tier classifier, and TimeoutChecker to evaluate/resolve approval items. |
| Park/resume persistence | src/ai_company/security/timeout/parked_context.py, .../park_service.py | Add ParkedContext model (frozen, JSON context), and ParkService to park/resume AgentContext with logging/validation. |
| Persistence repo & SQLite impl | src/ai_company/persistence/repositories.py, .../protocol.py, .../sqlite/parked_context_repo.py, .../sqlite/backend.py, .../sqlite/migrations.py | Add ParkedContextRepository protocol; SQLite implementation SQLiteParkedContextRepository; backend property parked_contexts; DB migration v3 creating parked_contexts table and indexes. |
| Security integration | src/ai_company/security/service.py | Thread EffectiveAutonomy and RiskTierClassifier into SecOpsService; apply autonomy augmentation after rule evaluation to handle auto-approve / human-required actions and escalation logic. |
| Engine & prompt wiring | src/ai_company/engine/agent_engine.py, src/ai_company/engine/prompt.py, src/ai_company/engine/prompt_template.py, src/ai_company/engine/loop_protocol.py | Thread effective_autonomy through AgentEngine, prompt builders, and trimming; bump prompt template version and render autonomy block; add PARKED termination reason and validation. |
| API & templates | src/ai_company/api/controllers/autonomy.py, src/ai_company/api/controllers/__init__.py, src/ai_company/templates/renderer.py | New AutonomyController (GET/POST) and export; renderer refactor (module-level Jinja env), personality resolution change, and numeric-config backward compatibility. |
| Observability events | src/ai_company/observability/events/autonomy.py, .../persistence.py, .../timeout.py | Add AUTONOMY_* and TIMEOUT_* event constants and parked-context persistence events. |
| Tests & test fixtures | tests/unit/** (many files, see diff) | Extensive unit tests added/updated: autonomy models/resolver/change strategy, timeout configs/policies/checker, ParkService and parked-context repo, controller tests, prompt rendering, migrations, and test fixtures exposing fake parked_context repos. |
| Docs | DESIGN_SPEC.md, CLAUDE.md, README.md | Design and docs updated to describe autonomy, timeout policies, park/resume, and observability guidance. |
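To make the tiered policy in the table more concrete, here is a minimal sketch under stated assumptions. `TimeoutActionType` and `TierConfig` are names from the PR, but the field `on_timeout`, the helper `tiered_action`, the string risk-tier keys, and the choice to WAIT for an unconfigured tier are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class TimeoutActionType(str, Enum):
    WAIT = "wait"
    APPROVE = "approve"
    DENY = "deny"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class TierConfig:
    timeout_minutes: float
    on_timeout: TimeoutActionType  # hypothetical field name

    def __post_init__(self) -> None:
        # A tier names no escalation target, so ESCALATE is rejected up front.
        if self.on_timeout is TimeoutActionType.ESCALATE:
            raise ValueError("TierConfig cannot use ESCALATE")


def tiered_action(
    tiers: dict[str, TierConfig], risk_tier: str, elapsed_seconds: float
) -> TimeoutActionType:
    """Keep waiting until the tier's timeout has elapsed, then apply its action."""
    tier = tiers.get(risk_tier)
    # Assumption: a tier without configuration waits indefinitely.
    if tier is None or elapsed_seconds < tier.timeout_minutes * 60:
        return TimeoutActionType.WAIT
    return tier.on_timeout
```

For instance, a `low` tier could approve after five minutes while a `high` tier denies after an hour.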

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant AutonomyResolver
    participant AutonomyConfig
    participant ActionRegistry

    Client->>AutonomyResolver: resolve(agent_level, dept_level, seniority)
    AutonomyResolver->>AutonomyConfig: select preset (agent>dept>company)
    AutonomyConfig->>AutonomyConfig: fetch preset for level
    AutonomyResolver->>ActionRegistry: expand patterns -> concrete actions
    ActionRegistry-->>AutonomyResolver: return action set
    AutonomyResolver-->>Client: EffectiveAutonomy(level, auto_approve, human_approval, security_agent)
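The "expand patterns -> concrete actions" step in the diagram above could work along these lines. This is only a sketch: the `category:action` naming, the `"all"` wildcard spelling, and the dictionary-shaped registry are assumptions, not the actual `ActionTypeRegistry` API.

```python
def expand_patterns(
    patterns: list[str], registry: dict[str, frozenset[str]]
) -> frozenset[str]:
    """Expand category names and the 'all' wildcard into concrete action types."""
    expanded: set[str] = set()
    for pattern in patterns:
        if pattern == "all":
            # Wildcard: every action in every category.
            for actions in registry.values():
                expanded |= actions
        elif pattern in registry:
            # Category name: all actions registered under it.
            expanded |= registry[pattern]
        else:
            # Anything else is treated as an already-concrete action type.
            expanded.add(pattern)
    return frozenset(expanded)
```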
sequenceDiagram
    participant AgentEngine
    participant SecOpsService
    participant TimeoutChecker
    participant TimeoutPolicy
    participant ParkService
    participant Persistence

    AgentEngine->>SecOpsService: evaluate action with EffectiveAutonomy
    SecOpsService->>SecOpsService: is action auto_approved?
    alt Auto-approved
        SecOpsService-->>AgentEngine: ALLOW (proceed)
    else Human approval required
        SecOpsService->>TimeoutChecker: check_and_resolve(approval_item)
        TimeoutChecker->>TimeoutPolicy: determine_action(item, elapsed)
        TimeoutPolicy-->>TimeoutChecker: WAIT / APPROVE / DENY / ESCALATE
        alt WAIT
            SecOpsService->>ParkService: park(context)
            ParkService->>Persistence: save(ParkedContext)
            Persistence-->>ParkService: parked_id
            ParkService-->>SecOpsService: parked_context
            SecOpsService-->>AgentEngine: Task parked (agent continues others)
        else APPROVE/DENY/ESCALATE
            SecOpsService-->>AgentEngine: return verdict (proceed/escalate/deny)
        end
    end
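The park/resume leg of the diagram above amounts to a serialization round trip. The sketch below uses plain `json` and a dataclass in place of the PR's Pydantic models; the field `context_json`, the dict-shaped context, and the function names `park` and `resume` are assumptions for illustration.

```python
import json
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ParkedContextSketch:
    id: str
    agent_id: str
    approval_id: str
    context_json: str  # serialized execution context


def park(context: dict, agent_id: str, approval_id: str) -> ParkedContextSketch:
    """Serialize the running context so it can be persisted while awaiting approval."""
    return ParkedContextSketch(
        id=str(uuid.uuid4()),
        agent_id=agent_id,
        approval_id=approval_id,
        context_json=json.dumps(context, sort_keys=True),
    )


def resume(parked: ParkedContextSketch) -> dict:
    """Deserialize a parked context and check that it belongs to the recorded agent."""
    context = json.loads(parked.context_json)
    if context.get("agent_id") != parked.agent_id:
        raise ValueError("parked context does not match its agent_id")
    return context
```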

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes


@greptile-apps

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR implements the autonomy levels (#42) and approval timeout policies (#126) features — a substantial addition covering AutonomyResolver, four TimeoutPolicy implementations, ParkService/ParkedContext park-resume flow, a SQLite v3 migration, and the AutonomyController API endpoint. The overall architecture is well-structured: autonomy augmentation is applied strictly after the rule engine (so security detectors can never be bypassed), the discriminated-union timeout config is clean, and the persistence layer follows existing repository patterns correctly.

Two blocking issues require fixes before merging:

  • Python 3 SyntaxError in service.py and timeout_checker.py: except MemoryError, RecursionError: is Python 2 syntax. All five occurrences (lines 158, 239, 338, 390 in service.py and line 65 in timeout_checker.py) will cause a SyntaxError at module import time, making the entire security subsystem and timeout checker non-functional. The fix is except (MemoryError, RecursionError): throughout.
  • EscalationChainPolicy off-by-one role selection: When step N's timeout expires, the policy escalates to chain[N+1].role instead of chain[N].role. chain[0].role is dead code — never used as an escalation target. A single-step chain will never escalate to any role; it jumps straight to on_chain_exhausted. The escalation target should be the role of the step whose timeout just expired, not the next step.
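One reading of the role selection described in the second bullet is sketched below: the target is the role of the most recently expired step. `EscalationStep` and `escalation_target` are hypothetical names, and the sketch leaves `on_chain_exhausted` handling out, since the review does not pin down when exhaustion should trigger under this reading.

```python
from __future__ import annotations

from dataclasses import dataclass

_SECONDS_PER_MINUTE = 60


@dataclass(frozen=True)
class EscalationStep:
    role: str
    timeout_minutes: float


def escalation_target(
    chain: list[EscalationStep], elapsed_seconds: float
) -> str | None:
    """Return the role of the latest step whose timeout has expired, if any."""
    cumulative = 0.0
    target = None
    for step in chain:
        cumulative += step.timeout_minutes * _SECONDS_PER_MINUTE
        if elapsed_seconds < cumulative:
            break  # this step's timeout has not expired yet
        target = step.role  # step N expired, so escalate to chain[N].role
    return target
```

With this selection a single-step chain does escalate to its one configured role once that step's timeout passes.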

Additional non-blocking finding:

  • The POST /agents/{agent_id}/autonomy handler logs AUTONOMY_PROMOTION_DENIED for all requests, including demotions, which will produce misleading audit entries.

Confidence Score: 1/5

  • Not safe to merge — two blocking bugs: a Python 3 SyntaxError that disables the security subsystem at import time, and an off-by-one in the escalation chain that routes escalations to the wrong role.
  • The Python 2 except A, B: syntax in service.py and timeout_checker.py is a SyntaxError in Python 3 and will prevent the security module from loading at all. The EscalationChainPolicy incorrectly targets chain[N+1].role when chain[N]'s timeout expires, meaning every configured escalation role is wrong by one position and single-step chains never escalate. Both issues must be fixed before the PR is safe to deploy. The underlying architecture and the rest of the implementation are solid.
  • src/ai_company/security/service.py (SyntaxError, lines 158/239/338/390), src/ai_company/security/timeout/timeout_checker.py (SyntaxError, line 65), src/ai_company/security/timeout/policies.py (EscalationChainPolicy off-by-one role selection).
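The re-raise pattern behind the first issue can be written with a parenthesized tuple, which parses on every Python 3 release. The sketch below is illustrative only: `run_guarded` and its string fallback are hypothetical, not code from `service.py`. As far as I know, Python 3.14 and later also accept the unparenthesized spelling when there is no `as` clause (PEP 758), so whether the original lines fail at import depends on the interpreter the project targets.

```python
from typing import Any, Callable


def run_guarded(operation: Callable[[], Any]) -> Any:
    """Run an operation, failing safe on ordinary errors only."""
    try:
        return operation()
    except (MemoryError, RecursionError):
        # Non-recoverable conditions must propagate, not be swallowed below.
        raise
    except Exception as exc:
        # Broad fail-safe path for everything else.
        return f"fail-safe: {type(exc).__name__}"
```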

Important Files Changed

| Filename | Overview |
|---|---|
| src/ai_company/security/service.py | Four occurrences of invalid Python 3 except MemoryError, RecursionError: syntax will cause a SyntaxError at import time, making the entire security subsystem non-functional. |
| src/ai_company/security/timeout/policies.py | EscalationChainPolicy has an off-by-one role selection bug: when step[N] times out, the policy escalates to step[N+1].role instead of step[N].role; step[0].role is dead code and single-step chains never escalate to any role. |
| src/ai_company/security/timeout/timeout_checker.py | Contains one Python 2-style except MemoryError, RecursionError: (SyntaxError in Python 3); the rest of the service is well-structured with correct fail-safe fallback to WAIT. |
| src/ai_company/api/controllers/autonomy.py | GET handler reads company-level default instead of resolving per-agent/department (flagged in prior thread); POST endpoint always emits AUTONOMY_PROMOTION_DENIED even for demotions. Core wiring and guards are otherwise correct. |
| src/ai_company/security/autonomy/resolver.py | Three-level resolution chain (agent → department → company) and category expansion logic are clean and correct; seniority constraint and logging are properly implemented. |
| src/ai_company/security/autonomy/models.py | AutonomyPreset, EffectiveAutonomy, and AutonomyOverride models are well-validated (disjoint sets, downgrade direction), frozen, and include sensible built-in presets. |
| src/ai_company/security/timeout/config.py | Discriminated union of four timeout configs is well-designed; TierConfig rejects ESCALATE (no target), EscalationChainConfig rejects empty chains and ESCALATE as on_chain_exhausted; all validated correctly. |
| src/ai_company/persistence/sqlite/parked_context_repo.py | Clean SQLite CRUD implementation: explicit column SELECTs, raises QueryError instead of returning None on row failure, and properly double-encodes the metadata JSON string for storage. |
| src/ai_company/security/autonomy/change_strategy.py | HumanOnlyPromotionStrategy correctly prevents autonomy escalation via in-memory overrides with exhaustiveness check on _DOWNGRADE_MAP; in-memory storage is acknowledged as a known limitation in the docstring. |
| src/ai_company/security/timeout/park_service.py | ParkService correctly uses model_dump_json/model_validate_json for round-trip serialization; defensive ID-consistency checks after construction are a good addition. |
| src/ai_company/persistence/sqlite/migrations.py | V3 migration adding parked_contexts table is correct; includes indexes on agent_id and approval_id; schema version bump and migration registration are clean. |
| src/ai_company/security/timeout/risk_tier_classifier.py | DefaultRiskTierClassifier correctly defaults unknown action types to HIGH (fail-safe per D19); exhaustiveness check at module load time catches newly added ActionType members. |
| src/ai_company/core/company.py | Backward-compatible float→AutonomyConfig coercion via before-validator is clean; _float_to_autonomy_level thresholds and WaitForeverConfig default are sensible. |

Sequence Diagram

sequenceDiagram
    participant Engine as AgentEngine
    participant SecOps as SecOpsService
    participant Rules as RuleEngine
    participant Autonomy as AutonomyAugmentation
    participant Store as ApprovalStore
    participant Park as ParkService
    participant Repo as ParkedContextRepository
    participant Timeout as TimeoutChecker

    Engine->>SecOps: intercept(context, effective_autonomy)
    SecOps->>Rules: evaluate(context)
    Rules-->>SecOps: SecurityVerdict (ALLOW/DENY/ESCALATE)
    SecOps->>Autonomy: _apply_autonomy_augmentation(context, verdict)
    Note over Autonomy: DENY/ESCALATE from rules always wins<br/>ALLOW may be upgraded to ESCALATE<br/>based on human_approval_actions
    Autonomy-->>SecOps: verdict (possibly upgraded)
    alt verdict == ESCALATE
        SecOps->>Store: add(ApprovalItem)
        SecOps-->>Engine: verdict with approval_id
        Engine->>Park: park(context, approval_id)
        Park-->>Engine: ParkedContext
        Engine->>Repo: save(ParkedContext)
        Engine-->>Engine: ExecutionResult(PARKED)
    else verdict == ALLOW
        SecOps-->>Engine: allow
    else verdict == DENY
        SecOps-->>Engine: deny
    end

    Note over Timeout: Background / periodic poll
    Timeout->>Store: get pending items
    Timeout->>Timeout: check(item, elapsed_seconds)
    alt APPROVE / DENY
        Timeout->>Store: update item status
    else ESCALATE
        Timeout->>Store: re-route to next role
    end
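The note in the diagram above (rule-engine DENY/ESCALATE always wins, ALLOW may be upgraded) can be expressed as a small pure function. This is a sketch, not the PR's `_apply_autonomy_augmentation`: the `Verdict` enum and the function signature are simplified assumptions.

```python
from enum import Enum


class Verdict(str, Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"


def apply_autonomy_augmentation(
    rule_verdict: Verdict,
    action_type: str,
    human_approval_actions: frozenset[str],
) -> Verdict:
    """Autonomy can only tighten a rule-engine ALLOW, never loosen DENY/ESCALATE."""
    if rule_verdict is not Verdict.ALLOW:
        return rule_verdict  # rule-engine DENY or ESCALATE always wins
    if action_type in human_approval_actions:
        return Verdict.ESCALATE  # ALLOW upgraded to require a human
    return Verdict.ALLOW
```

Because the function never returns something weaker than its input, autonomy settings cannot bypass a security detector.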

Comments Outside Diff (2)

  1. src/ai_company/security/service.py, line 158-159 (link)

    Invalid Python 3 except syntax — SyntaxError at import time

    except MemoryError, RecursionError: is Python 2 syntax. In Python 3 this is a SyntaxError, not valid tuple-catching. The entire service.py module will fail to import, making the whole security subsystem non-functional regardless of what the tests report.

    The correct Python 3 syntax is to wrap both exception types in parentheses. All four occurrences in this file (lines 158, 239, 338, 390) and the fifth in src/ai_company/security/timeout/timeout_checker.py:65 share the same problem.

  2. src/ai_company/api/controllers/autonomy.py, line 284-291 (link)

    AUTONOMY_PROMOTION_DENIED logged for all change requests, including demotions

    AUTONOMY_PROMOTION_DENIED is emitted unconditionally regardless of whether the requested level is higher or lower than the current level. Requesting LOCKED (a demotion) or SUPERVISED from FULL would still log "Autonomy level changes require human approval" under the AUTONOMY_PROMOTION_DENIED event, making audit trails confusing.

    Consider branching on compare_autonomy(requested_level, current_level) and emitting a dedicated event (e.g., AUTONOMY_DEMOTION_PENDING) for downgrade requests vs. promotions.
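A rough sketch of that branching is shown below. `compare_autonomy` is named in the comment above, but its signature here is assumed, as are the level ordering (in particular the position of SEMI relative to SUPERVISED) and all three event strings.

```python
from enum import Enum


class AutonomyLevel(str, Enum):
    LOCKED = "locked"
    SUPERVISED = "supervised"
    SEMI = "semi"
    FULL = "full"


# Assumed ordering from least to most autonomous.
_ORDER = [
    AutonomyLevel.LOCKED,
    AutonomyLevel.SUPERVISED,
    AutonomyLevel.SEMI,
    AutonomyLevel.FULL,
]


def compare_autonomy(a: AutonomyLevel, b: AutonomyLevel) -> int:
    """Positive if a grants more autonomy than b, negative if less, zero if equal."""
    return _ORDER.index(a) - _ORDER.index(b)


def event_for_change_request(
    requested: AutonomyLevel, current: AutonomyLevel
) -> str:
    """Choose an audit event that matches the direction of the request."""
    delta = compare_autonomy(requested, current)
    if delta > 0:
        return "autonomy.promotion.denied"  # hypothetical event value
    if delta < 0:
        return "autonomy.demotion.pending"  # hypothetical event value
    return "autonomy.change.noop"  # hypothetical event value
```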


Last reviewed commit: 4deae2a


Copilot AI left a comment


Pull request overview

Implements autonomy-level based approval routing and configurable approval-timeout handling, including task park/resume persistence, and integrates these concepts into the engine prompt/security flow.

Changes:

  • Add autonomy subsystem (models, resolver, change strategy) and wire “effective autonomy” into SecOps pre-tool checks and system prompt rendering.
  • Add approval timeout subsystem (policies, classifier, checker, config/factory) plus parked-context model/service and SQLite persistence + migration v3.
  • Add API endpoint for reading/requesting autonomy changes, new observability event constants, and broad unit test coverage for the new subsystems.
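For orientation, a v3 migration of the kind described in the second bullet could look like the following sketch using the standard `sqlite3` module. Only the table name and the two indexed columns (`agent_id`, `approval_id`) come from this thread; the remaining columns, the index names, and the use of `PRAGMA user_version` for version tracking are assumptions.

```python
import sqlite3


def apply_v3_migration(conn: sqlite3.Connection) -> None:
    """Create the parked_contexts table and its lookup indexes."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS parked_contexts (
            id           TEXT PRIMARY KEY,
            agent_id     TEXT NOT NULL,
            approval_id  TEXT NOT NULL,
            context_json TEXT NOT NULL,  -- assumed column
            parked_at    TEXT NOT NULL   -- assumed column
        );
        CREATE INDEX IF NOT EXISTS idx_parked_contexts_agent_id
            ON parked_contexts (agent_id);
        CREATE INDEX IF NOT EXISTS idx_parked_contexts_approval_id
            ON parked_contexts (approval_id);
        PRAGMA user_version = 3;
        """
    )
```

The `IF NOT EXISTS` clauses make the sketch safe to run twice, which is convenient in tests against an in-memory database.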

Reviewed changes

Copilot reviewed 57 out of 59 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| tests/unit/security/timeout/test_timeout_checker.py | Unit tests for TimeoutChecker behavior and resolution updates. |
| tests/unit/security/timeout/test_risk_tier_classifier.py | Tests default/custom mappings and unknown-action fallback for risk tiers. |
| tests/unit/security/timeout/test_policies.py | Tests wait/deny/tiered/escalation timeout policies. |
| tests/unit/security/timeout/test_parked_context.py | Tests ParkedContext validation and immutability. |
| tests/unit/security/timeout/test_park_service.py | Tests park/resume serialization round-trip for AgentContext. |
| tests/unit/security/timeout/test_factory.py | Tests timeout policy factory returns correct implementations. |
| tests/unit/security/timeout/test_config.py | Tests timeout config discriminated union + TimeoutAction validator. |
| tests/unit/security/timeout/__init__.py | Test package marker for timeout tests. |
| tests/unit/security/test_service.py | Adds coverage for SecOpsService autonomy pre-check routing. |
| tests/unit/security/autonomy/test_resolver.py | Tests autonomy resolution chain + category/all expansion + seniority constraints. |
| tests/unit/security/autonomy/test_models.py | Tests autonomy preset/config/effective models and disjoint validation. |
| tests/unit/security/autonomy/test_change_strategy.py | Tests HumanOnlyPromotionStrategy promotion/downgrade/recovery behavior. |
| tests/unit/security/autonomy/__init__.py | Test package marker for autonomy tests. |
| tests/unit/persistence/test_protocol.py | Extends fake persistence backend/protocol coverage for parked contexts repo. |
| tests/unit/persistence/test_migrations_v2.py | Updates schema version assertions to v3. |
| tests/unit/persistence/sqlite/test_parked_context_repo.py | Adds CRUD/upsert/ordering/deserialization-failure tests for parked contexts repo. |
| tests/unit/persistence/sqlite/test_migrations.py | Adds assertions for v3 parked_contexts table + indexes. |
| tests/unit/observability/test_events.py | Expands expected event modules and asserts new autonomy/timeout/persistence events. |
| tests/unit/engine/test_prompt.py | Tests effective autonomy section inclusion/omission in system prompt. |
| tests/unit/engine/test_loop_protocol.py | Updates TerminationReason enum tests for new PARKED value. |
| tests/unit/core/test_company.py | Updates CompanyConfig autonomy field semantics + adds approval_timeout tests. |
| tests/unit/core/conftest.py | Updates factories to provide AutonomyConfig defaults. |
| tests/unit/api/controllers/test_autonomy.py | Tests new autonomy controller GET/POST and access controls. |
| tests/unit/api/conftest.py | Adds FakeParkedContextRepository into fake persistence backend for API tests. |
| src/ai_company/templates/renderer.py | Reuses module-level Jinja env; supports autonomy config dict passthrough; fixes personality handling to avoid mutation. |
| src/ai_company/security/timeout/timeout_checker.py | Adds TimeoutChecker for evaluating pending approvals against a TimeoutPolicy. |
| src/ai_company/security/timeout/risk_tier_classifier.py | Adds DefaultRiskTierClassifier with fail-safe HIGH fallback + logging. |
| src/ai_company/security/timeout/protocol.py | Defines TimeoutPolicy and RiskTierClassifier protocols. |
| src/ai_company/security/timeout/policies.py | Implements WaitForever/DenyOnTimeout/TieredTimeout/EscalationChain timeout policies. |
| src/ai_company/security/timeout/parked_context.py | Adds ParkedContext Pydantic model for serialized parked executions. |
| src/ai_company/security/timeout/park_service.py | Adds ParkService to serialize/deserialize AgentContext for park/resume. |
| src/ai_company/security/timeout/models.py | Adds TimeoutAction model + escalate_to consistency validator. |
| src/ai_company/security/timeout/factory.py | Adds create_timeout_policy factory for policy configs. |
| src/ai_company/security/timeout/config.py | Adds discriminated-union timeout policy configuration models. |
| src/ai_company/security/timeout/__init__.py | Exposes timeout subsystem public API. |
| src/ai_company/security/service.py | Adds effective autonomy integration and pre-check routing in SecOpsService. |
| src/ai_company/security/autonomy/resolver.py | Adds AutonomyResolver for agent→department→company resolution + pattern expansion + seniority validation. |
| src/ai_company/security/autonomy/protocol.py | Defines AutonomyChangeStrategy protocol. |
| src/ai_company/security/autonomy/models.py | Adds autonomy presets/config/effective models + runtime override model. |
| src/ai_company/security/autonomy/change_strategy.py | Adds HumanOnlyPromotionStrategy (deny promotions/recovery; apply auto-downgrades). |
| src/ai_company/security/autonomy/__init__.py | Exposes autonomy subsystem public API. |
| src/ai_company/persistence/sqlite/parked_context_repo.py | Adds SQLiteParkedContextRepository implementation. |
| src/ai_company/persistence/sqlite/migrations.py | Bumps schema to v3 and adds parked_contexts table + indexes migration. |
| src/ai_company/persistence/sqlite/backend.py | Wires parked_contexts repository into SQLite persistence backend. |
| src/ai_company/persistence/repositories.py | Adds ParkedContextRepository protocol. |
| src/ai_company/persistence/protocol.py | Extends PersistenceBackend protocol with parked_contexts repository. |
| src/ai_company/observability/events/timeout.py | Adds timeout event constants. |
| src/ai_company/observability/events/persistence.py | Adds parked-context persistence event constants. |
| src/ai_company/observability/events/autonomy.py | Adds autonomy subsystem event constants. |
| src/ai_company/engine/prompt_template.py | Adds effective autonomy section to system prompt template. |
| src/ai_company/engine/prompt.py | Passes effective autonomy into prompt context and renders it. |
| src/ai_company/engine/loop_protocol.py | Adds PARKED termination reason + validation rules. |
| src/ai_company/engine/agent_engine.py | Threads effective_autonomy into tool invoker/security interceptor and prompt building. |
| src/ai_company/core/enums.py | Adds AutonomyLevel, DowngradeReason, TimeoutActionType enums. |
| src/ai_company/core/company.py | Adds autonomy config and approval_timeout to CompanyConfig; adds dept autonomy override field. |
| src/ai_company/core/agent.py | Adds per-agent autonomy override field to AgentIdentity. |
| src/ai_company/config/schema.py | Adds autonomy_level to AgentConfig schema. |
| src/ai_company/api/controllers/autonomy.py | Adds AutonomyController GET/POST endpoints for autonomy level. |
| src/ai_company/api/controllers/__init__.py | Registers AutonomyController in controllers module exports/imports. |


Comment on lines +286 to +317
if action in autonomy.auto_approve_actions:
    logger.info(
        AUTONOMY_ACTION_AUTO_APPROVED,
        tool_name=context.tool_name,
        action_type=action,
        autonomy_level=autonomy.level.value,
    )
    return SecurityVerdict(
        verdict=SecurityVerdictType.ALLOW,
        reason=f"Auto-approved by autonomy level '{autonomy.level.value}'",
        risk_level=ApprovalRiskLevel.LOW,
        evaluated_at=now,
        evaluation_duration_ms=0.0,
    )

if action in autonomy.human_approval_actions:
    logger.info(
        AUTONOMY_ACTION_HUMAN_REQUIRED,
        tool_name=context.tool_name,
        action_type=action,
        autonomy_level=autonomy.level.value,
    )
    return SecurityVerdict(
        verdict=SecurityVerdictType.ESCALATE,
        reason=(
            f"Human approval required by autonomy level "
            f"'{autonomy.level.value}'"
        ),
        risk_level=ApprovalRiskLevel.MEDIUM,
        evaluated_at=now,
        evaluation_duration_ms=0.0,
    )

Copilot AI Mar 10, 2026


_check_autonomy() hard-codes risk_level to LOW for auto-approved actions and MEDIUM for human-required actions. With presets like SEMI auto-approving broad categories (e.g., "code", "vcs"), this can mislabel genuinely HIGH/CRITICAL actions (and also affects what gets stored on ApprovalItem when escalating). Use the existing risk classification logic from the security subsystem to derive risk_level from action_type (or run the rule engine and reuse its computed risk) instead of hard-coding tiers here.
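A minimal sketch of deriving the risk tier from the action type, rather than hard-coding it, might look like this. The mapping contents, the `category:action` strings, and the helper names `classify` and `may_auto_approve` are hypothetical; the fail-safe HIGH default and the HIGH/CRITICAL guard follow the behavior described elsewhere in this PR.

```python
from enum import Enum


class RiskTier(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Hypothetical mapping; the real classifier's table is not shown in this thread.
_RISK_BY_ACTION = {
    "code:read": RiskTier.LOW,
    "code:write": RiskTier.MEDIUM,
    "vcs:push": RiskTier.HIGH,
}


def classify(action_type: str) -> RiskTier:
    """Unknown action types fall back to HIGH as a fail-safe."""
    return _RISK_BY_ACTION.get(action_type, RiskTier.HIGH)


def may_auto_approve(action_type: str, auto_approve_actions: frozenset[str]) -> bool:
    """Auto-approve only listed actions whose derived tier is below HIGH."""
    if action_type not in auto_approve_actions:
        return False
    return classify(action_type) not in {RiskTier.HIGH, RiskTier.CRITICAL}
```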

Comment on lines +271 to +287
        cumulative_seconds = 0.0
        for step in self._chain:
            step_timeout = step.timeout_minutes * _SECONDS_PER_MINUTE
            if elapsed_seconds < cumulative_seconds + step_timeout:
                logger.debug(
                    TIMEOUT_WAITING,
                    approval_id=item.id,
                    escalation_role=step.role,
                    elapsed_seconds=elapsed_seconds,
                )
                return TimeoutAction(
                    action=TimeoutActionType.ESCALATE,
                    reason=(
                        f"Escalated to {step.role!r} — {elapsed_seconds:.0f}s elapsed"
                    ),
                    escalate_to=step.role,
                )

Copilot AI Mar 10, 2026

EscalationChainPolicy.determine_action() returns ESCALATE for the entire duration of the current step (elapsed_seconds < cumulative + step_timeout). If TimeoutChecker runs periodically, callers will repeatedly receive ESCALATE for the same escalate_to target and may re-escalate/notify on every tick (there’s no state on ApprovalItem to dedupe). Consider returning WAIT while within a step and only emitting ESCALATE when transitioning to the next step (or include enough state/metadata to make escalation idempotent).
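
One way to make escalation idempotent, sketched under the assumption that the caller can persist the index of the last step it escalated to (the review notes that `ApprovalItem` has no such state today). `Step`, `current_step_index`, `determine_action`, and `last_escalated_index` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

_SECONDS_PER_MINUTE = 60.0


@dataclass(frozen=True)
class Step:
    role: str
    timeout_minutes: float


def current_step_index(chain: Sequence[Step], elapsed_seconds: float) -> Optional[int]:
    """Index of the step whose time window contains elapsed_seconds, else None."""
    cumulative = 0.0
    for index, step in enumerate(chain):
        cumulative += step.timeout_minutes * _SECONDS_PER_MINUTE
        if elapsed_seconds < cumulative:
            return index
    return None  # chain exhausted


def determine_action(
    chain: Sequence[Step],
    elapsed_seconds: float,
    last_escalated_index: Optional[int],
) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (action, escalate_to, new_last_escalated_index)."""
    index = current_step_index(chain, elapsed_seconds)
    if index is None:
        return ("exhausted", None, last_escalated_index)
    if last_escalated_index is not None and index <= last_escalated_index:
        # Still inside a step that was already escalated: do not notify again.
        return ("wait", None, last_escalated_index)
    return ("escalate", chain[index].role, index)
```

A periodic checker calling this on every tick would emit one escalation per step and receive a wait result in between.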

Comment on lines +90 to +96
        """Request an autonomy level change for an agent.

        Validates seniority constraints and routes through the
        configured ``AutonomyChangeStrategy``. Returns 200 with the
        current level. If the change requires human approval, the
        response includes ``promotion_pending=True``.


Copilot AI Mar 10, 2026

update_autonomy() docstring says it “Validates seniority constraints and routes through the configured AutonomyChangeStrategy”, but the implementation doesn’t consult agent/department overrides, doesn’t validate seniority, and doesn’t call any strategy (it always returns promotion_pending=True). Please either implement the documented behavior (wire in resolver/strategy + validation) or adjust the docstring/response fields to match the current placeholder behavior to avoid misleading API consumers.

Comment on lines +373 to +383
    @model_validator(mode="before")
    @classmethod
    def _coerce_autonomy_float(cls, data: object) -> object:
        """Accept a bare float for autonomy and convert to AutonomyConfig."""
        if not isinstance(data, dict):
            return data
        raw = data.get("autonomy")
        if isinstance(raw, (int, float)) and not isinstance(raw, bool):
            level = _float_to_autonomy_level(float(raw))
            return {**data, "autonomy": {"level": level.value}}
        return data

Copilot AI Mar 10, 2026

CompanyConfig._coerce_autonomy_float() converts any numeric autonomy value into an AutonomyConfig without validating the old 0.0–1.0 contract. This regresses prior bounds checks (e.g., -0.1 or 2.0 will silently map to LOCKED/FULL), and NaN/inf will also map to FULL due to comparison semantics. Add explicit validation (finite + 0.0 <= value <= 1.0) before calling _float_to_autonomy_level(), and raise a clear ValueError on invalid inputs.
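
The bounds check described above could look roughly like the following. `validate_legacy_autonomy` is a hypothetical helper name; in the PR the check would sit inside `_coerce_autonomy_float()` before the call to `_float_to_autonomy_level()`.

```python
import math


def validate_legacy_autonomy(raw: float) -> float:
    """Reject non-finite or out-of-range legacy autonomy values."""
    value = float(raw)
    # NaN compares False against everything, so test finiteness explicitly.
    if not math.isfinite(value) or not 0.0 <= value <= 1.0:
        msg = f"autonomy must be a finite number in [0.0, 1.0], got {raw!r}"
        raise ValueError(msg)
    return value
```

Checking `math.isfinite` first keeps NaN and infinity from slipping through comparison-based mapping.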

Comment on lines +148 to +152
        # Autonomy pre-check: route based on effective autonomy before
        # the full rule engine. Hard-deny is always checked first.
        autonomy_result = await self._apply_autonomy_precheck(context)
        if autonomy_result is not None:
            return autonomy_result

Copilot AI Mar 10, 2026

The autonomy pre-check short-circuits the rule engine (evaluate_pre_tool() returns early when _apply_autonomy_precheck() yields a verdict). This bypasses the rule engine’s detectors (credential/path traversal/data leak, etc.) even for actions that are “auto-approved” by autonomy presets, which can materially weaken security guarantees compared to the existing PolicyValidator behavior (auto-approve should not skip remaining detection rules). Consider always running the rule engine first, then applying autonomy routing as a post-processing step (e.g., convert ALLOW→ESCALATE when autonomy requires human approval), while still respecting DENY/ESCALATE produced by detectors.
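
The ordering proposed here (rule engine first, autonomy routing as post-processing) can be sketched as a pure function. `Verdict` and `apply_autonomy_routing` are hypothetical names, and the rule engine result is passed in as a plain value to keep the example self-contained.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"
    DENY = "deny"


def apply_autonomy_routing(
    rule_verdict: Verdict,
    action_type: str,
    human_approval_actions: frozenset,
) -> Verdict:
    """Post-process a rule engine verdict with autonomy routing."""
    # DENY or ESCALATE produced by detectors is never relaxed by autonomy.
    if rule_verdict is not Verdict.ALLOW:
        return rule_verdict
    # Autonomy may only tighten an ALLOW into an ESCALATE.
    if action_type in human_approval_actions:
        return Verdict.ESCALATE
    return Verdict.ALLOW
```

Because autonomy can only tighten the result, the detectors always run and their findings always survive.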


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 37

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/ai_company/engine/agent_engine.py (1)

683-726: ⚠️ Potential issue | 🔴 Critical

Don’t make autonomy enforcement conditional on SecurityConfig.

effective_autonomy only reaches SecOpsService, but
_make_security_interceptor() returns None when security is absent/disabled.
In that branch _make_tool_invoker() still builds a ToolInvoker, so actions
that should park or require approval can run with only the static
tool-permission check. Fail closed here, or provide an autonomy-only
interceptor.

🚫 Minimal fail-closed fix
     def _make_security_interceptor(
         self,
         effective_autonomy: EffectiveAutonomy | None = None,
     ) -> SecurityInterceptionStrategy | None:
         """Build the SecOps security interceptor if configured."""
         if self._security_config is None:
+            if effective_autonomy is not None:
+                msg = (
+                    "effective_autonomy cannot be enforced without SecurityConfig"
+                )
+                logger.error(SECURITY_DISABLED, note=msg)
+                raise ExecutionStateError(msg)
             logger.warning(
                 SECURITY_DISABLED,
                 note="No SecurityConfig provided — all security checks skipped",
             )
             return None
         if not self._security_config.enabled:
+            if effective_autonomy is not None:
+                msg = (
+                    "effective_autonomy cannot be enforced when security is disabled"
+                )
+                logger.error(SECURITY_DISABLED, note=msg)
+                raise ExecutionStateError(msg)
             return None

Also applies to: 728-741

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/agent_engine.py` around lines 683 - 726, The current
_make_security_interceptor returns None when SecurityConfig is missing/disabled
which disables autonomy enforcement; change it to fail-closed by returning an
autonomy-only interceptor that implements SecurityInterceptionStrategy (instead
of None) so autonomy rules still apply even when other security detectors are
off: locate _make_security_interceptor and where it currently returns None for
self._security_config is None or not self._security_config.enabled, and replace
that branch with construction/return of a minimal interceptor (e.g., an
AutonomyEnforcementInterceptor or a SecOpsService instance configured only with
effective_autonomy, self._approval_store, and self._audit_log and no detectors)
so _make_tool_invoker can still rely on this interceptor to park/require
approval according to effective_autonomy.
tests/unit/persistence/test_protocol.py (1)

126-186: ⚠️ Potential issue | 🟡 Minor

Add the missing ParkedContextRepository conformance check.

test_fake_backend_is_persistence_backend() only proves that parked_contexts exists. It does not verify that _FakeParkedContextRepository itself satisfies the repository protocol, so signature drift on the fake can slip through this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/persistence/test_protocol.py` around lines 126 - 186, Update the
test to explicitly verify that the fake repository type matches the repository
protocol: import ParkedContextRepository and add a conformance assertion for
_FakeParkedContextRepository (e.g., assert
isinstance(_FakeParkedContextRepository(), ParkedContextRepository) or an
equivalent runtime/type-check that your test suite uses) inside
test_fake_backend_is_persistence_backend so that signature drift on
_FakeParkedContextRepository is caught; reference the symbols
_FakeParkedContextRepository and ParkedContextRepository when adding the check.
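
A sketch of the conformance check, assuming the repository protocol is (or can be made) `runtime_checkable`. The protocol below is a trimmed, hypothetical stand-in with only two methods; the real `ParkedContextRepository` has more.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class ParkedContextRepository(Protocol):
    """Hypothetical, trimmed-down stand-in for the real repository protocol."""

    async def save(self, context: object) -> None: ...

    async def get(self, parked_id: str) -> Optional[object]: ...


class FakeParkedContextRepository:
    """In-memory fake, analogous to the test suite's _FakeParkedContextRepository."""

    def __init__(self) -> None:
        self._items: dict = {}

    async def save(self, context: object) -> None:
        self._items[getattr(context, "id", None)] = context

    async def get(self, parked_id: str) -> Optional[object]:
        return self._items.get(parked_id)


def conforms(candidate: object) -> bool:
    """Structural check: True only if every protocol method is present."""
    return isinstance(candidate, ParkedContextRepository)
```

Note that `isinstance` against a runtime-checkable protocol verifies only that the methods exist, not their signatures, so catching signature drift would additionally need a static type check on an assignment such as `repo: ParkedContextRepository = FakeParkedContextRepository()`.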
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/api/controllers/autonomy.py`:
- Around line 73-81: get_autonomy() and update_autonomy() currently ignore
per-agent/department overrides and pending requests by always using
config.autonomy.level and flipping promotion_pending manually; replace this by
routing both handlers through the AutonomyResolver and AutonomyChangeStrategy so
they compute the effective level and validation logic, consult and update the
persistent pending-request store, and emit events only after the strategy
decides (e.g., persist pending vs deny vs apply). Specifically: in
get_autonomy() return AutonomyLevelResponse built from
AutonomyResolver.resolve(agent_id) plus promotion_pending read from the
pending-request store; in update_autonomy() call
AutonomyChangeStrategy.request_change(agent_id, requested_level, actor) (or the
resolver API) to validate/apply/persist the change, persist any pending request
in the store, and return the resulting effective level and promotion_pending
flag instead of echoing config.autonomy.level; ensure AUTONOMY_PROMOTION_DENIED
is emitted only when the strategy denies, not when a request is simply pending.

In `@src/ai_company/core/agent.py`:
- Around line 320-323: The AgentIdentity model currently allows an invalid
combination (level == JUNIOR with autonomy_level == FULL); add a Pydantic
validator (e.g., a model_validator(mode="after"), matching the Pydantic v2 style
used elsewhere in this PR) on the AgentIdentity class to detect when level is
SeniorityLevel.JUNIOR and autonomy_level is AutonomyLevel.FULL and raise a
ValueError describing the forbidden combination; reference the autonomy_level
and level fields in the check so construction fails fast and prevents creating
AgentIdentity(level=JUNIOR, autonomy_level=FULL).

In `@src/ai_company/core/company.py`:
- Around line 373-383: The before-validator _coerce_autonomy_float should reject
non-finite or out-of-range legacy numeric autonomy values before converting
them; import isfinite from math and, inside _coerce_autonomy_float, when raw is
a numeric (and not bool) first check isfinite(raw) and that raw is within the
valid range (e.g. 0.0 <= raw <= 1.0), and if not raise a ValueError with a clear
message instead of coercing; if it passes, continue to call
_float_to_autonomy_level and return the coerced dict as before.

In `@src/ai_company/engine/prompt_template.py`:
- Around line 144-154: The DEFAULT_TEMPLATE was modified to add autonomy fields
but PROMPT_TEMPLATE_VERSION was not updated; update the PROMPT_TEMPLATE_VERSION
constant (referenced as PROMPT_TEMPLATE_VERSION) to a new semver (e.g., "1.4.0")
so cache/snapshot/telemetry can distinguish the autonomy-aware template, and
ensure any tests or places that import PROMPT_TEMPLATE_VERSION are updated
accordingly; locate the constant near the top of the module and bump it to the
new version to match the DEFAULT_TEMPLATE change.

In `@src/ai_company/engine/prompt.py`:
- Around line 399-404: The projection of EffectiveAutonomy into ctx omits the
security_agent field; update the block that sets ctx["effective_autonomy"] (the
code using effective_autonomy.level, auto_approve_actions,
human_approval_actions) to also include security_agent (e.g., "security_agent":
effective_autonomy.security_agent or its serializable representation) so
templates receive the escalation reviewer info along with level and actions.

In `@src/ai_company/persistence/protocol.py`:
- Around line 114-116: The class docstring for PersistenceBackend is missing the
new public API attribute parked_contexts; update the PersistenceBackend
docstring Attributes section to add an entry for parked_contexts (similar style
to existing entries like collaboration_metrics) describing that it returns a
ParkedContextRepository for ParkedContext persistence and ensure it appears
after collaboration_metrics so generated docs reflect the new property.

In `@src/ai_company/security/autonomy/change_strategy.py`:
- Around line 92-108: The code unconditionally sets current_level =
_DOWNGRADE_MAP[reason], which can raise an agent's autonomy if a later downgrade
maps to a less restrictive level; modify auto_downgrade() logic so when
self._overrides.get(agent_id) exists you do not increase autonomy: compute
new_level = _DOWNGRADE_MAP[reason] but if existing is present set current_level
= the more restrictive of existing.current_level and new_level (i.e., do not
replace an equal-or-more-restrictive level such as LOCKED with a
less-restrictive one); update the AutonomyOverride creation to use that
non-escalating current_level and leave original_level unchanged (use symbols:
_DOWNGRADE_MAP, auto_downgrade(), self._overrides, AutonomyOverride).
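
The "never raise autonomy on a downgrade" rule can be sketched with an ordered enum. The level names and their ordering below are assumptions for illustration (only LOCKED, SEMI, and FULL appear in this PR discussion), and `clamp_downgrade` is a hypothetical helper.

```python
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    """Hypothetical ordering: lower value means more restrictive."""

    LOCKED = 0
    SUPERVISED = 1
    SEMI = 2
    FULL = 3


def clamp_downgrade(existing: Optional[Level], target: Level) -> Level:
    """Pick the level to store, never less restrictive than the existing one."""
    if existing is None:
        return target
    return min(existing, target)
```

With this helper, a second downgrade that maps to a looser level leaves an already-LOCKED agent locked.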

In `@src/ai_company/security/autonomy/models.py`:
- Around line 62-135: BUILTIN_PRESETS and AutonomyConfig.presets must be
immutable; change BUILTIN_PRESETS to an immutable Mapping (wrap the literal dict
in types.MappingProxyType) and update AutonomyConfig.presets to use a
Mapping[str, AutonomyPreset] type and a default_factory that returns an
immutable deep-copy of the builtin mapping (e.g., return
types.MappingProxyType(copy.deepcopy(BUILTIN_PRESETS))). Import copy and
types.MappingProxyType and ensure you reference BUILTIN_PRESETS, AutonomyConfig,
and the presets field when making these edits.
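
A standard-library sketch of the read-only mapping idea. The preset payloads are hypothetical plain dicts (the real values are `AutonomyPreset` models), and whether a Pydantic field accepts a `MappingProxyType` as-is is not shown here and would need checking against the project's model config.

```python
import copy
from types import MappingProxyType

# Hypothetical preset payloads standing in for AutonomyPreset instances.
BUILTIN_PRESETS = MappingProxyType(
    {
        "locked": {"auto_approve": ()},
        "semi": {"auto_approve": ("code", "vcs")},
    }
)


def default_presets() -> MappingProxyType:
    """Return a read-only mapping backed by a private deep copy."""
    # Convert to a plain dict first so the copy does not rely on
    # mappingproxy objects supporting deepcopy.
    return MappingProxyType(copy.deepcopy(dict(BUILTIN_PRESETS)))
```

The proxy only blocks top-level assignment and deletion; nested values stay as mutable as their own types allow, which is why the deep copy is still needed to isolate the builtin mapping.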

In `@src/ai_company/security/autonomy/resolver.py`:
- Around line 50-109: The resolve method and related pattern expansion exceed
the 50-line limit and do multiple responsibilities; refactor by extracting
preset lookup/validation and per-pattern expansion into small helper methods
(e.g., create a _get_preset_or_raise(level) that encapsulates the preset lookup,
warning log and ValueError, and a _expand_pattern_list(patterns) that wraps
_expand_patterns per-item branching), update resolve to call
_get_preset_or_raise and _expand_pattern_list, keep validate_seniority usage and
EffectiveAutonomy construction unchanged, and ensure new helpers are
unit-testable and maintain existing log fields (AUTONOMY_RESOLVED,
resolved_level, agent_override, department_override, counts).

In `@src/ai_company/security/service.py`:
- Around line 148-152: The early return after calling _apply_autonomy_precheck
prevents context from reaching self._rule_engine.evaluate(context); change the
flow so that _apply_autonomy_precheck is used only to short-circuit on an
explicit hard-deny, but otherwise do not return early—always call
self._rule_engine.evaluate(context) when autonomy_result is not a hard-deny, and
then merge or reconcile autonomy_result.risk_level (or related fields) into the
rule engine result so the final decision preserves the full security assessment
(use _apply_autonomy_precheck, _check_autonomy, and self._rule_engine.evaluate
to locate and implement the merge/short-circuit logic).

In `@src/ai_company/security/timeout/config.py`:
- Around line 82-86: The tiers mapping currently typed as dict[str, TierConfig]
allows typos (e.g., "critcal") which silently fall back at runtime; change the
field to use a constrained key type and validate entries explicitly: replace
dict[str, TierConfig] with dict[Literal["low","medium","high","critical"],
TierConfig] (or add a validator on the tiers field) in the model that declares
policy and tiers, and add a pydantic validator that raises a clear
ValidationError if any key is not one of the allowed risk levels
(low/medium/high/critical) so mis-typed config keys fail fast with a helpful
message.
- Around line 57-64: TierConfig.on_timeout and
EscalationChainConfig.on_chain_exhausted currently allow
TimeoutActionType.ESCALATE even though the models don't capture an escalate_to
target, which later causes TimeoutAction(action=ESCALATE, ...) to be constructed
without escalate_to in TieredTimeoutPolicy and EscalationChainPolicy; add
validation on TierConfig and EscalationChainConfig (or their pydantic model
validators) to reject or coerce ESCALATE when no escalate_to is provided: check
the fields TierConfig.on_timeout and EscalationChainConfig.on_chain_exhausted
for TimeoutActionType.ESCALATE and raise a validation error (or change default
to DENY) when escalate target/role is absent so downstream code in
TieredTimeoutPolicy and EscalationChainPolicy will never receive an ESCALATE
action without an escalate_to.
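
Both checks for this file can be sketched as one plain validation function. The tuple-based tier shape and the name `validate_tiers` are hypothetical simplifications; in the PR the same logic would live in Pydantic validators on `TierConfig` and the tiered config model.

```python
ALLOWED_TIERS = frozenset({"low", "medium", "high", "critical"})


def validate_tiers(tiers: dict) -> None:
    """Validate a mapping of tier name -> (on_timeout, escalate_to)."""
    unknown = sorted(set(tiers) - ALLOWED_TIERS)
    if unknown:
        msg = f"unknown risk tiers {unknown}; allowed: {sorted(ALLOWED_TIERS)}"
        raise ValueError(msg)
    for name, (on_timeout, escalate_to) in tiers.items():
        # ESCALATE is meaningless without a target to escalate to.
        if on_timeout == "escalate" and not escalate_to:
            msg = f"tier {name!r} uses 'escalate' without an escalate_to target"
            raise ValueError(msg)
```

Failing at configuration load time turns a typo such as "critcal" into an immediate, descriptive error instead of a silent runtime fallback.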

In `@src/ai_company/security/timeout/factory.py`:
- Around line 47-51: When building a TieredTimeoutPolicy from a
TieredTimeoutConfig, validate each tier in config.tiers and reject any tier
whose on_timeout/action is ESCALATE but which lacks an escalate_to target; raise
a clear exception (e.g., ValueError) describing the offending tier so the config
fails fast. Perform this check in the factory branch that returns
TieredTimeoutPolicy (where TieredTimeoutConfig is handled and
DefaultRiskTierClassifier is used) before constructing the TieredTimeoutPolicy.
- Around line 59-63: The logger.warning call in timeout.factory uses a raw event
name string ("timeout.factory.unknown_config"); replace it with a domain event
constant by adding/exporting a timeout event constant (e.g.,
TIMEOUT_FACTORY_UNKNOWN_CONFIG) under ai_company.observability.events.timeout
and then import it into src.ai_company.security.timeout.factory (from
ai_company.observability.events.timeout import TIMEOUT_FACTORY_UNKNOWN_CONFIG)
and use that constant in the logger.warning call (keep the same config_type
kwarg). Also remove the unused msg variable if no longer needed.
- Around line 47-50: TieredTimeoutConfig.tiers is a mutable dict on a frozen
Pydantic model; avoid sharing it directly with the runtime by deep-copying and
making it read-only before constructing the policy. In the branch that returns
TieredTimeoutPolicy, replace passing config.tiers directly with a deep copy
(copy.deepcopy(config.tiers)) and wrap the result with MappingProxyType to
produce an immutable mapping, and ensure you import copy and
types.MappingProxyType; apply this change where TieredTimeoutPolicy(...) is
constructed (referencing TieredTimeoutConfig, TieredTimeoutPolicy,
DefaultRiskTierClassifier, and the tiers field).

In `@src/ai_company/security/timeout/park_service.py`:
- Around line 37-83: After serializing the AgentContext in park(), validate that
the extracted internal identifiers match the outer fields: compare
context.execution_id, context.agent_id, and context.task_id against the values
being stored in ParkedContext (execution_id, agent_id, task_id) and raise
ValueError if any mismatch; when creating ParkedContext ensure you store the
canonical values only after this check. Likewise, in resume() when you
deserialize context_json back to an AgentContext, verify that the deserialized
context.execution_id, context.agent_id, and context.task_id match the
ParkedContext.execution_id, ParkedContext.agent_id, and ParkedContext.task_id
and reject resumption if they differ. Ensure checks reference the
ParkedContext class and the park() and resume() methods and keep metadata
handling unchanged.

In `@src/ai_company/security/timeout/parked_context.py`:
- Around line 29-44: The metadata dict on the frozen Pydantic model
ParkedContext can still be mutated by callers; fix this by deep-copying and
wrapping it with MappingProxyType at creation so it becomes immutable. Add a
field validator for "metadata" (e.g., a `@field_validator`("metadata") on
ParkedContext) that does: metadata = copy.deepcopy(metadata) and return
MappingProxyType(metadata); ensure imports for copy and MappingProxyType are
added and that the validator runs during model instantiation so persisted
snapshots cannot be mutated after creation.

In `@src/ai_company/security/timeout/policies.py`:
- Around line 148-149: determine_action() currently always uses
self._classifier.classify(item.action_type) and ignores TierConfig.actions;
change the logic to first check each TierConfig in self._tiers for a non-empty
actions tuple containing item.action_type and select that tier_config if found,
and only if no TierConfig.actions match fall back to calling
self._classifier.classify(item.action_type) and lookup tier_config by
risk_level.value; update references to tier_config, self._classifier.classify,
and TierConfig.actions accordingly so pinned actions are honored.
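
The lookup order described above (explicitly pinned actions first, classifier as fallback) might be sketched as follows. `select_tier` is a hypothetical helper, `pinned` stands in for the per-tier `TierConfig.actions` tuples, and the classifier is passed as a plain callable.

```python
from typing import Callable, Mapping, Sequence


def select_tier(
    action_type: str,
    pinned: Mapping[str, Sequence[str]],
    classify: Callable[[str], str],
) -> str:
    """Return the tier name for an action, honoring pinned actions first."""
    for tier_name, actions in pinned.items():
        # An empty actions tuple never matches, so the tier falls through.
        if action_type in actions:
            return tier_name
    return classify(action_type)
```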

In `@src/ai_company/security/timeout/risk_tier_classifier.py`:
- Around line 12-45: Remove the duplicated _DEFAULT_RISK_MAP definition and
instead import and reuse the shared risk map from the existing risk classifier
module (e.g., import DEFAULT_RISK_MAP or the exported map from risk_classifier).
Update references in this file (any uses of _DEFAULT_RISK_MAP, ApprovalRiskLevel
and ActionType lookups) to use the imported map so timeout classification and
security use the single source of truth defined in risk_classifier.py.

In `@src/ai_company/security/timeout/timeout_checker.py`:
- Around line 36-68: Add a guard at the start of check to skip policy evaluation
for items whose ApprovalItem.status is not ApprovalStatus.PENDING: return a
no-op/neutral TimeoutAction (and log the skip) instead of calling
self._policy.determine_action; also ensure check_and_resolve performs the same
status guard before applying any resolution so already-APPROVED/REJECTED/EXPIRED
items cannot be overwritten by timeout logic.
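
A sketch of the status guard, with the policy reduced to a simple deny-on-timeout rule. `ApprovalStatus` is redefined locally as a stand-in, and `check` returning a string is a hypothetical simplification of the real `TimeoutAction` result.

```python
from enum import Enum


class ApprovalStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"


def check(status: ApprovalStatus, elapsed_seconds: float, timeout_seconds: float) -> str:
    """Evaluate the timeout policy only for items that are still pending."""
    if status is not ApprovalStatus.PENDING:
        # Already resolved: return a neutral action so nothing is overwritten.
        return "wait"
    return "deny" if elapsed_seconds >= timeout_seconds else "wait"
```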

In `@src/ai_company/templates/renderer.py`:
- Around line 526-529: The branch that handles raw_autonomy dicts reuses the
parsed dict instance (raw_autonomy) and assigns it directly to autonomy, causing
aliasing if later normalization/validation mutates it; change the assignment to
make a deep copy (e.g., autonomy = copy.deepcopy(raw_autonomy)) and add an
import for the copy module at the top of the file so the config tree always
receives an isolated dict instance (refer to raw_autonomy and the autonomy
variable in renderer.py).
- Around line 675-678: The code may pass a non-string YAML scalar as preset_name
into get_personality_preset (which calls .strip()), causing AttributeError
instead of the renderer's TemplateRenderError; before calling
get_personality_preset(preset_name) validate that preset_name is an instance of
str and, if not, raise TemplateRenderError with a clear message about invalid
preset type; keep the existing KeyError handling for unknown preset names so the
try/except around get_personality_preset still catches KeyError but the type
check prevents AttributeError from escaping.
- Around line 68-70: The module-level Jinja2 filter "auto" on _JINJA_ENV
incorrectly uses "value or ''" which collapses valid falsy values like 0 or
False; change the filter to only treat None or Jinja2 Undefined as missing —
i.e., import Jinja2's Undefined and update the lambda for
_JINJA_ENV.filters["auto"] to return "" when value is None or isinstance(value,
Undefined), otherwise return the original value (preserving 0, 0.0, False).

In `@tests/unit/api/conftest.py`:
- Around line 198-211: The fake in-memory repo stores and returns the same
ParkedContext instances which share mutable metadata dicts; modify the methods
in the test fake (save, get, get_by_approval, get_by_agent) to deepcopy
ParkedContext objects at the persistence boundary (use import copy at top), i.e.
store copy.deepcopy(context) inside save and return copy.deepcopy(...) from get,
get_by_approval, and get_by_agent so callers receive independent copies
mirroring SQLiteParkedContextRepository serialization behavior.
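
Copying at the persistence boundary can be sketched like this. The store below is synchronous and holds plain dicts to stay self-contained; the actual fake is async and stores `ParkedContext` objects, so `FakeParkedContextStore` is only an illustration of the pattern.

```python
import copy
from typing import Optional


class FakeParkedContextStore:
    """In-memory store that never shares object identity with callers."""

    def __init__(self) -> None:
        self._items: dict = {}

    def save(self, key: str, context: dict) -> None:
        # Copy on the way in so later caller mutations do not leak into the store.
        self._items[key] = copy.deepcopy(context)

    def get(self, key: str) -> Optional[dict]:
        item = self._items.get(key)
        # Copy on the way out so callers cannot mutate stored state.
        return copy.deepcopy(item) if item is not None else None
```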

In `@tests/unit/engine/test_loop_protocol.py`:
- Around line 33-36: Add tests exercising the new PARKED rule: create one
passing test that constructs an ExecutionResult with
termination=TerminationReason.PARKED and asserts that result.error_message is
None (or that validation succeeds), and add one failing test that attempts to
create/validate an ExecutionResult with termination=TerminationReason.PARKED and
a non-None error_message and asserts the operation raises the expected
validation exception (e.g., ValueError or AssertionError); reference
ExecutionResult and TerminationReason.PARKED in the new tests so the
PARKED->error_message=None contract is enforced.

In `@tests/unit/persistence/sqlite/test_migrations.py`:
- Around line 94-115: Update the two tests to validate the parked_contexts
schema, not just names: after run_migrations in
test_v3_creates_parked_contexts_table call "PRAGMA table_info(parked_contexts)"
and assert the returned column names include the repository-required columns
(e.g., "id", "agent_id", "approval_id", plus any expected timestamp or payload
columns your code depends on); in test_v3_creates_parked_context_indexes after
finding index names run "PRAGMA index_info('idx_pc_agent_id')" and "PRAGMA
index_info('idx_pc_approval_id')" and assert those index_info results reference
the proper indexed columns ("agent_id" and "approval_id" respectively); keep
using run_migrations and the same test function names to locate where to add
these assertions.

In `@tests/unit/persistence/sqlite/test_parked_context_repo.py`:
- Around line 7-18: Add a module-level 30-second timeout by defining pytestmark
= pytest.mark.timeout(30) near the top of the test file (after the imports) so
all async tests in this module (which use SQLiteParkedContextRepository and
ParkedContext) get a global timeout; use the pytestmark symbol and
pytest.mark.timeout to implement this.

In `@tests/unit/persistence/test_migrations_v2.py`:
- Around line 31-32: The test test_schema_version_is_three currently only
asserts SCHEMA_VERSION == 3 and therefore misses verifying that v3 artifacts
were actually created; update this test to also query the database schema to
assert the parked_contexts table exists and that the two new indexes
idx_pc_agent_id and idx_pc_approval_id are present (on the upgrade path where
_apply_v3() should run). Locate test_schema_version_is_three and after
confirming SCHEMA_VERSION, run the same connection/PRAGMA or sqlite_master
queries used elsewhere in the suite to verify existence of the parked_contexts
table and that entries for idx_pc_agent_id and idx_pc_approval_id exist; fail
the test if any of those are missing so partial or skipped _apply_v3() is
caught. Ensure you reference the same DB handle/fixture used by other migration
tests so the checks run against the upgraded DB instance.

In `@tests/unit/security/autonomy/test_change_strategy.py`:
- Around line 62-70: Update the test_double_downgrade_preserves_original to
ensure the original_level is preserved: when using HumanOnlyPromotionStrategy
call auto_downgrade the first time with an explicit current_level (e.g.,
AutonomyLevel.HIGH or similar) for agent-1, then call auto_downgrade a second
time with a different DowngradeReason, fetch the override via get_override and
add an assertion that override.original_level equals the explicit level you
seeded; keep the existing assertions for override.current_level and
override.reason to verify the second downgrade replaced only the current state.

In `@tests/unit/security/autonomy/test_models.py`:
- Around line 122-125: The test_config_frozen currently only checks assignment
to the AutonomyConfig.level attribute but misses in-place mutation of the
mutable AutonomyConfig.presets dict; update the test_config_frozen to attempt an
in-place change to config.presets (e.g., mutating an existing key or adding a
key) and assert that this raises an exception or is prevented, and if the
underlying model does not yet protect presets, change the AutonomyConfig construction to
deep-copy incoming presets and wrap them with MappingProxyType (using
copy.deepcopy in the AutonomyConfig __init__ or validator) so presets is
read-only; reference AutonomyConfig, presets, test_config_frozen, copy.deepcopy,
and MappingProxyType when making the fixes.

In `@tests/unit/security/autonomy/test_resolver.py`:
- Around line 106-129: Add a test that exercises the public API by calling
resolver.resolve(...) to ensure seniority enforcement is applied there as well:
in TestSeniorityValidation add a case that calls
resolver.resolve(agent_level=AutonomyLevel.FULL,
seniority=SeniorityLevel.JUNIOR) and asserts it raises ValueError (matching
"FULL autonomy"); keep existing validate_seniority() checks but include this
resolve(...) call so the public resolve method is validated for the same JUNIOR
+ FULL rejection.

In `@tests/unit/security/test_service.py`:
- Around line 489-579: Add a new async test (e.g.,
test_auto_approve_blocked_for_high_or_critical_risk) that constructs an
EffectiveAutonomy with the action present in auto_approve_actions, then creates
a context for that action with a HIGH (and/or CRITICAL) risk level (use
_make_context(action_type="...", risk_level=RiskLevel.HIGH) or otherwise mock
the risk classifier), calls service.evaluate_pre_tool(ctx) and asserts the
result is NOT SecurityVerdictType.ALLOW and that auto-approval was not used
(e.g., service._test_rule_engine.evaluate.assert_called_once() or that the
verdict.reason mentions escalation/review); reference EffectiveAutonomy,
auto_approve_actions, evaluate_pre_tool, and SecurityVerdictType.ALLOW when
adding the test.

In `@tests/unit/security/timeout/test_factory.py`:
- Around line 30-47: Update each test to assert behavior/wiring, not just type:
after calling create_timeout_policy with DenyOnTimeoutConfig(timeout_minutes=60)
assert the returned DenyOnTimeoutPolicy has its internal timeout represented as
3600 seconds (verify the concrete attribute on DenyOnTimeoutPolicy that stores
seconds); for TieredTimeoutConfig assert the returned TieredTimeoutPolicy
preserved the provided tier configuration (compare the policy's tiers/config
property to the original TieredTimeoutConfig values); for EscalationChainConfig
assert the returned EscalationChainPolicy preserved the chain and
on_chain_exhausted values (verify the policy.chain contains the
EscalationStep(role="lead", timeout_minutes=30) data and
policy.on_chain_exhausted equals TimeoutActionType.DENY). Ensure you reference
create_timeout_policy, DenyOnTimeoutConfig, DenyOnTimeoutPolicy,
TieredTimeoutConfig, TieredTimeoutPolicy, EscalationChainConfig,
EscalationChainPolicy, EscalationStep, and TimeoutActionType when locating the
fields to assert.

In `@tests/unit/security/timeout/test_park_service.py`:
- Around line 75-92: The test test_resume_restores_context uses
_make_agent_context() which returns an AgentContext with task_execution=None, so
add a real task-bound context before parking: build an AgentContext with a
non-None task_execution containing a Task (with id "task-1" or similar) and use
that context when calling ParkService().park(...); after resume, assert that
restored.task_execution is not None and that restored.task_execution.task.id
equals the original task id to ensure the task survives the round-trip through
ParkService.park and ParkService.resume.

In `@tests/unit/security/timeout/test_parked_context.py`:
- Around line 54-59: Update the frozen-model test and the persistence
deserialization: in the test_frozen() for the ParkedContext model, add an
assertion that an in-place mutation like parked.metadata["key"] = "value" either
raises (or does not change the model when re-fetched) to cover dict mutation on
a frozen Pydantic model; and in the persistence layer method _row_to_model (the
function that converts DB rows into ParkedContext instances) wrap the
deserialized metadata with copy.deepcopy(json.loads(raw_meta)) before
constructing the ParkedContext so the model receives a deep-copied dict and
in-place mutations at call sites cannot mutate shared state.

In `@tests/unit/security/timeout/test_policies.py`:
- Around line 89-141: Add tests that ensure the HIGH and CRITICAL tiers cannot
be auto-approved even if their TierConfig.on_timeout is set to APPROVE: create
TieredTimeoutPolicy instances with tiers like {"high":
TierConfig(timeout_minutes=1, on_timeout=TimeoutActionType.APPROVE)} and
{"critical": TierConfig(...)} and use DefaultRiskTierClassifier plus
_make_item(...) with an action_type that maps to HIGH/CRITICAL (e.g.,
"secrets:write" or another classifier-recognized high/critical action); call
policy.determine_action(item, elapsed_seconds) with elapsed_seconds >
timeout_minutes*60 and assert the returned result.action is
TimeoutActionType.WAIT (repeat or parametrize for both HIGH and CRITICAL to
cover both branches).
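
A minimal sketch of such a guard test follows. It uses hypothetical stand-ins (`Action`, `TierConfig`, `GUARDED_TIERS`, `determine_action`) rather than the PR's real `TieredTimeoutPolicy`, and it assumes the guard falls back to waiting when an auto-approve is blocked.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    APPROVE = "approve"
    DENY = "deny"


@dataclass(frozen=True)
class TierConfig:
    """Hypothetical per-tier settings (not the PR's TierConfig)."""

    timeout_minutes: float
    on_timeout: Action


# Assumption: these tiers must never be auto-approved, whatever the config says.
GUARDED_TIERS = frozenset({"high", "critical"})


def determine_action(tier: str, config: TierConfig, elapsed_seconds: float) -> Action:
    """Return the action for an item in `tier` that has waited `elapsed_seconds`."""
    if elapsed_seconds <= config.timeout_minutes * 60:
        return Action.WAIT  # not timed out yet
    if config.on_timeout is Action.APPROVE and tier in GUARDED_TIERS:
        return Action.WAIT  # guard: keep waiting for a human instead of approving
    return config.on_timeout


def check_guard() -> None:
    cfg = TierConfig(timeout_minutes=1, on_timeout=Action.APPROVE)
    for tier in ("high", "critical"):
        assert determine_action(tier, cfg, elapsed_seconds=61) is Action.WAIT
    # A low-risk tier with the same config is still auto-approved.
    assert determine_action("low", cfg, elapsed_seconds=61) is Action.APPROVE


check_guard()
```

In the real suite the loop would more likely be a `pytest.mark.parametrize` over both tiers, so each branch fails independently.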

In `@tests/unit/security/timeout/test_risk_tier_classifier.py`:
- Around line 1-64: Add the module-level pytest timeout marker by defining
pytestmark = pytest.mark.timeout(30) immediately after the imports in this test
module; update the top of the file (near the imports that include pytest and
DefaultRiskTierClassifier) so the module-level marker applies to all tests (no
changes needed to DefaultRiskTierClassifier or individual test functions).

---

Outside diff comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 683-726: The current _make_security_interceptor returns None when
SecurityConfig is missing/disabled which disables autonomy enforcement; change
it to fail-closed by returning an autonomy-only interceptor that implements
SecurityInterceptionStrategy (instead of None) so autonomy rules still apply
even when other security detectors are off: locate _make_security_interceptor
and where it currently returns None for self._security_config is None or not
self._security_config.enabled, and replace that branch with construction/return
of a minimal interceptor (e.g., an AutonomyEnforcementInterceptor or a
SecOpsService instance configured only with effective_autonomy,
self._approval_store, and self._audit_log and no detectors) so
_make_tool_invoker can still rely on this interceptor to park/require approval
according to effective_autonomy.
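
One way to picture the fail-closed branch is the sketch below. All names (`Interceptor`, `AutonomyOnlyInterceptor`, `FullInterceptor`, `make_interceptor`) and the string verdicts are hypothetical; the PR's `SecurityInterceptionStrategy` and `SecOpsService` may have a different shape.

```python
from dataclasses import dataclass
from typing import Protocol


class Interceptor(Protocol):
    def check(self, action_type: str) -> str: ...


@dataclass(frozen=True)
class AutonomyOnlyInterceptor:
    """Hypothetical minimal interceptor: autonomy rules only, no detectors."""

    human_approval_actions: frozenset

    def check(self, action_type: str) -> str:
        return "escalate" if action_type in self.human_approval_actions else "allow"


@dataclass(frozen=True)
class FullInterceptor:
    """Hypothetical full interceptor: autonomy pre-check, then a deny list."""

    human_approval_actions: frozenset
    blocked_actions: frozenset = frozenset()

    def check(self, action_type: str) -> str:
        if action_type in self.human_approval_actions:
            return "escalate"
        return "deny" if action_type in self.blocked_actions else "allow"


def make_interceptor(security_enabled: bool, human_approval_actions: frozenset) -> Interceptor:
    if not security_enabled:
        # Fail closed: never return None, so autonomy rules are still enforced.
        return AutonomyOnlyInterceptor(human_approval_actions)
    return FullInterceptor(human_approval_actions)


interceptor = make_interceptor(False, frozenset({"deploy:production"}))
print(interceptor.check("deploy:production"))
```

The point of the design is that callers never have to handle a missing interceptor, so disabling detectors cannot silently disable approval routing.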

In `@tests/unit/persistence/test_protocol.py`:
- Around line 126-186: Update the test to explicitly verify that the fake
repository type matches the repository protocol: import ParkedContextRepository
and add a conformance assertion for _FakeParkedContextRepository (e.g., assert
isinstance(_FakeParkedContextRepository(), ParkedContextRepository) or an
equivalent runtime/type-check that your test suite uses) inside
test_fake_backend_is_persistence_backend so that signature drift on
_FakeParkedContextRepository is caught; reference the symbols
_FakeParkedContextRepository and ParkedContextRepository when adding the check.
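
As an illustration of the conformance check, the sketch below uses a hypothetical two-method protocol (`ParkedRepo`) and fakes. It assumes the real `ParkedContextRepository` is decorated with `@runtime_checkable`; without that decorator, `isinstance` against a `Protocol` raises `TypeError`.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class ParkedRepo(Protocol):
    """Hypothetical stand-in for the repository protocol."""

    def save(self, context_id: str, payload: str) -> None: ...

    def get(self, context_id: str) -> Optional[str]: ...


class FakeParkedRepo:
    """In-memory fake that implements every protocol method."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def save(self, context_id: str, payload: str) -> None:
        self._rows[context_id] = payload

    def get(self, context_id: str) -> Optional[str]:
        return self._rows.get(context_id)


class IncompleteRepo:
    """Fake that has drifted: `get` is missing."""

    def save(self, context_id: str, payload: str) -> None:
        pass


assert isinstance(FakeParkedRepo(), ParkedRepo)
assert not isinstance(IncompleteRepo(), ParkedRepo)
```

Note that a runtime `isinstance` check against a protocol only verifies that the methods exist, not their signatures, so changed parameter lists are caught by a static type checker rather than by this assertion.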

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 45709ea0-f32a-4c74-b095-2e041b61759b

📥 Commits

Reviewing files that changed from the base of the PR and between 02b5f4e and 1855156.

📒 Files selected for processing (59)
  • src/ai_company/api/controllers/__init__.py
  • src/ai_company/api/controllers/autonomy.py
  • src/ai_company/config/schema.py
  • src/ai_company/core/agent.py
  • src/ai_company/core/company.py
  • src/ai_company/core/enums.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/loop_protocol.py
  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/prompt_template.py
  • src/ai_company/observability/events/autonomy.py
  • src/ai_company/observability/events/persistence.py
  • src/ai_company/observability/events/timeout.py
  • src/ai_company/persistence/protocol.py
  • src/ai_company/persistence/repositories.py
  • src/ai_company/persistence/sqlite/backend.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/persistence/sqlite/parked_context_repo.py
  • src/ai_company/security/autonomy/__init__.py
  • src/ai_company/security/autonomy/change_strategy.py
  • src/ai_company/security/autonomy/models.py
  • src/ai_company/security/autonomy/protocol.py
  • src/ai_company/security/autonomy/resolver.py
  • src/ai_company/security/service.py
  • src/ai_company/security/timeout/__init__.py
  • src/ai_company/security/timeout/config.py
  • src/ai_company/security/timeout/factory.py
  • src/ai_company/security/timeout/models.py
  • src/ai_company/security/timeout/park_service.py
  • src/ai_company/security/timeout/parked_context.py
  • src/ai_company/security/timeout/policies.py
  • src/ai_company/security/timeout/protocol.py
  • src/ai_company/security/timeout/risk_tier_classifier.py
  • src/ai_company/security/timeout/timeout_checker.py
  • src/ai_company/templates/renderer.py
  • tests/unit/api/conftest.py
  • tests/unit/api/controllers/test_autonomy.py
  • tests/unit/core/conftest.py
  • tests/unit/core/test_company.py
  • tests/unit/engine/test_loop_protocol.py
  • tests/unit/engine/test_prompt.py
  • tests/unit/observability/test_events.py
  • tests/unit/persistence/sqlite/test_migrations.py
  • tests/unit/persistence/sqlite/test_parked_context_repo.py
  • tests/unit/persistence/test_migrations_v2.py
  • tests/unit/persistence/test_protocol.py
  • tests/unit/security/autonomy/__init__.py
  • tests/unit/security/autonomy/test_change_strategy.py
  • tests/unit/security/autonomy/test_models.py
  • tests/unit/security/autonomy/test_resolver.py
  • tests/unit/security/test_service.py
  • tests/unit/security/timeout/__init__.py
  • tests/unit/security/timeout/test_config.py
  • tests/unit/security/timeout/test_factory.py
  • tests/unit/security/timeout/test_park_service.py
  • tests/unit/security/timeout/test_parked_context.py
  • tests/unit/security/timeout/test_policies.py
  • tests/unit/security/timeout/test_risk_tier_classifier.py
  • tests/unit/security/timeout/test_timeout_checker.py

Comment on lines +73 to +81
        app_state: AppState = state.app_state
        config = app_state.config.config
        level = config.autonomy.level
        return ApiResponse(
            data=AutonomyLevelResponse(
                agent_id=agent_id,
                level=level,
            ),
        )

⚠️ Potential issue | 🟠 Major

These endpoints never read or write per-agent autonomy state.

get_autonomy() always returns config.autonomy.level with promotion_pending=False, and update_autonomy() only logs before echoing the same level back with promotion_pending=True. That ignores agent overrides, department overrides, active downgrade overrides, seniority validation, and any persisted pending request, so every agent reports the same value and a follow-up GET immediately loses the pending state. It also emits AUTONOMY_PROMOTION_DENIED for requests the API is treating as pending. This needs to go through the actual AutonomyResolver / AutonomyChangeStrategy flow plus a store for pending requests.

Also applies to: 105-132

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/api/controllers/autonomy.py` around lines 73 - 81,
get_autonomy() and update_autonomy() currently ignore per-agent/department
overrides and pending requests by always using config.autonomy.level and
flipping promotion_pending manually; replace this by routing both handlers
through the AutonomyResolver and AutonomyChangeStrategy so they compute the
effective level and validation logic, consult and update the persistent
pending-request store, and emit events only after the strategy decides (e.g.,
persist pending vs deny vs apply). Specifically: in get_autonomy() return
AutonomyLevelResponse built from AutonomyResolver.resolve(agent_id) plus
promotion_pending read from the pending-request store; in update_autonomy() call
AutonomyChangeStrategy.request_change(agent_id, requested_level, actor) (or the
resolver API) to validate/apply/persist the change, persist any pending request
in the store, and return the resulting effective level and promotion_pending
flag instead of echoing config.autonomy.level; ensure AUTONOMY_PROMOTION_DENIED
is emitted only when the strategy denies, not when a request is simply pending.
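
The flow described above could look roughly like the following sketch. `Level`, `Resolver`, `PendingStore`, `get_autonomy`, and `request_change` are hypothetical names, and the rule that every requested change is held for human approval is an assumption made for the example, not the PR's `AutonomyChangeStrategy` behavior.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Level(str, Enum):
    # Hypothetical level names; only FULL is mentioned in the PR summary.
    SUPERVISED = "supervised"
    SEMI = "semi"
    FULL = "full"


@dataclass
class Resolver:
    company_default: Level
    department_levels: dict = field(default_factory=dict)  # department -> Level
    agent_levels: dict = field(default_factory=dict)  # agent_id -> Level

    def resolve(self, agent_id: str, department: str) -> Level:
        """Agent override wins, then department, then the company default."""
        if agent_id in self.agent_levels:
            return self.agent_levels[agent_id]
        if department in self.department_levels:
            return self.department_levels[department]
        return self.company_default


@dataclass
class PendingStore:
    pending: dict = field(default_factory=dict)  # agent_id -> requested Level

    def put(self, agent_id: str, level: Level) -> None:
        self.pending[agent_id] = level

    def get(self, agent_id: str) -> Optional[Level]:
        return self.pending.get(agent_id)


def get_autonomy(resolver: Resolver, store: PendingStore, agent_id: str, department: str) -> dict:
    return {
        "agent_id": agent_id,
        "level": resolver.resolve(agent_id, department).value,
        "promotion_pending": store.get(agent_id) is not None,
    }


def request_change(resolver: Resolver, store: PendingStore, agent_id: str,
                   department: str, requested: Level) -> dict:
    # Assumed rule: the request is recorded and the effective level is unchanged
    # until a human approves it.
    store.put(agent_id, requested)
    return get_autonomy(resolver, store, agent_id, department)
```

Because the pending flag is read from the store rather than set by the handler, a follow-up GET for the same agent still reports the request as pending.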

Comment on lines +30 to +47
    @pytest.mark.unit
    def test_deny_on_timeout(self) -> None:
        result = create_timeout_policy(DenyOnTimeoutConfig(timeout_minutes=60))
        assert isinstance(result, DenyOnTimeoutPolicy)

    @pytest.mark.unit
    def test_tiered(self) -> None:
        result = create_timeout_policy(TieredTimeoutConfig())
        assert isinstance(result, TieredTimeoutPolicy)

    @pytest.mark.unit
    def test_escalation_chain(self) -> None:
        config = EscalationChainConfig(
            chain=(EscalationStep(role="lead", timeout_minutes=30),),
            on_chain_exhausted=TimeoutActionType.DENY,
        )
        result = create_timeout_policy(config)
        assert isinstance(result, EscalationChainPolicy)

⚠️ Potential issue | 🟡 Minor

Assert the factory wiring, not just the concrete class.

These cases only verify dispatch. A regression in the timeout_minutes -> seconds conversion or in forwarding chain / on_chain_exhausted would still pass as long as the returned class stays the same. Please add at least one behavior-level assertion per configurable policy so this suite catches broken wiring as well as wrong type selection.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/security/timeout/test_factory.py` around lines 30 - 47, Update
each test to assert behavior/wiring, not just type: after calling
create_timeout_policy with DenyOnTimeoutConfig(timeout_minutes=60) assert the
returned DenyOnTimeoutPolicy has its internal timeout represented as 3600
seconds (verify the concrete attribute on DenyOnTimeoutPolicy that stores
seconds); for TieredTimeoutConfig assert the returned TieredTimeoutPolicy
preserved the provided tier configuration (compare the policy's tiers/config
property to the original TieredTimeoutConfig values); for EscalationChainConfig
assert the returned EscalationChainPolicy preserved the chain and
on_chain_exhausted values (verify the policy.chain contains the
EscalationStep(role="lead", timeout_minutes=30) data and
policy.on_chain_exhausted equals TimeoutActionType.DENY). Ensure you reference
create_timeout_policy, DenyOnTimeoutConfig, DenyOnTimeoutPolicy,
TieredTimeoutConfig, TieredTimeoutPolicy, EscalationChainConfig,
EscalationChainPolicy, EscalationStep, and TimeoutActionType when locating the
fields to assert.
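
To make the difference between a type check and a wiring check concrete, here is a small sketch with hypothetical stand-ins (`DenyConfig`, `DenyPolicy`, `create_policy`). The attribute name `timeout_seconds` and the inclusive boundary are assumptions; the real `DenyOnTimeoutPolicy` may store and compare its timeout differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DenyConfig:
    timeout_minutes: float


class DenyPolicy:
    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds

    def determine_action(self, elapsed_seconds: float) -> str:
        # Assumed boundary: the item is denied once the timeout is reached.
        return "deny" if elapsed_seconds >= self.timeout_seconds else "wait"


def create_policy(config: DenyConfig) -> DenyPolicy:
    # The minutes -> seconds conversion is the wiring a type-only test misses.
    return DenyPolicy(timeout_seconds=config.timeout_minutes * 60)


def test_deny_wiring() -> None:
    policy = create_policy(DenyConfig(timeout_minutes=60))
    assert isinstance(policy, DenyPolicy)  # dispatch
    assert policy.determine_action(3599) == "wait"  # behavior just before timeout
    assert policy.determine_action(3600) == "deny"  # behavior at the timeout


test_deny_wiring()
```

If the factory forgot the `* 60`, the `isinstance` assertion would still pass while the two behavior assertions would fail.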

Comment on lines +75 to +92
    def test_resume_restores_context(self) -> None:
        """Parks then resumes, verifies round-trip fidelity."""
        context = _make_agent_context()
        service = ParkService()

        parked = service.park(
            context=context,
            approval_id="approval-1",
            agent_id="agent-1",
            task_id="task-1",
        )

        restored = service.resume(parked)

        assert restored.execution_id == context.execution_id
        assert restored.turn_count == context.turn_count
        assert restored.identity.name == context.identity.name
        assert restored.identity.role == context.identity.role

⚠️ Potential issue | 🟡 Minor

Round-trip coverage never exercises a task-bound context.

_make_agent_context() builds AgentContext(task_execution=None), so this test would still pass if park/resume silently dropped the blocked task. Since the feature is parking approval-gated work, please create the context with a real task and assert restored.task_execution.task.id survives the round-trip.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/security/timeout/test_park_service.py` around lines 75 - 92, The
test test_resume_restores_context uses _make_agent_context() which returns an
AgentContext with task_execution=None, so add a real task-bound context before
parking: build an AgentContext with a non-None task_execution containing a Task
(with id "task-1" or similar) and use that context when calling
ParkService().park(...); after resume, assert that restored.task_execution is
not None and that restored.task_execution.task.id equals the original task id to
ensure the task survives the round-trip through ParkService.park and
ParkService.resume.
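
The shape of a task-bound round-trip check can be sketched with plain dataclasses and JSON. `Task`, `Context`, `park`, and `resume` below are hypothetical simplifications, not the PR's `AgentContext` or `ParkService`.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    id: str
    title: str


@dataclass(frozen=True)
class Context:
    execution_id: str
    turn_count: int
    task: Optional[Task] = None


def park(context: Context) -> str:
    """Serialize the context, including any nested task, to a JSON string."""
    return json.dumps(asdict(context))


def resume(blob: str) -> Context:
    """Rebuild the context; the nested task must be reconstructed explicitly."""
    data = json.loads(blob)
    task = Task(**data["task"]) if data["task"] is not None else None
    return Context(
        execution_id=data["execution_id"],
        turn_count=data["turn_count"],
        task=task,
    )


original = Context(execution_id="exec-1", turn_count=3, task=Task(id="task-1", title="demo"))
restored = resume(park(original))
assert restored.task is not None
assert restored.task.id == original.task.id
```

A test built only on a context with `task=None` would pass even if `resume` dropped the task, which is the gap the comment points out.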

Comment on lines +54 to +59
    def test_frozen(self) -> None:
        """Cannot modify fields on a frozen model."""
        parked = _make_parked_context()
        with pytest.raises(ValidationError):
            parked.agent_id = "other"  # type: ignore[misc]


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Pydantic v2, does ConfigDict(frozen=True) prevent in-place mutation of dict or list fields, or only attribute reassignment?

💡 Result:

ConfigDict(frozen=True) only makes the model itself “faux-immutable” by blocking attribute setting (i.e., disallowing __setattr__, and thus preventing model.field = ...)—it does not deep-freeze contained values. So if a field value is a mutable object like a list or dict, in-place mutation still works (e.g., model.my_list.append(...), model.my_dict["k"]=...) because you’re not reassigning the attribute, you’re mutating the object it references. [1], [2]

Implication: if you need true immutability for containers, use immutable types (e.g., tuple, frozenset, immutable mapping wrappers) or wrap/convert during validation. [1], [2]

Sources

  1. Pydantic config docs (frozen: blocks __setattr__) https://docs.pydantic.dev/latest/api/config/
  2. Example noting underlying dict remains mutable even when model is frozen https://www.jujens.eu/posts/en/2025/Apr/19/using-custom-classes-pydantic/

🏁 Script executed:

cat -n src/ai_company/security/timeout/parked_context.py | head -50

Repository: Aureliolo/ai-company

Length of output: 2092


🏁 Script executed:

# Check where ParkedContext is used and how metadata is accessed/modified
rg -n "ParkedContext" --type py -A 2 -B 2 | head -100

Repository: Aureliolo/ai-company

Length of output: 7561


🏁 Script executed:

# Check if metadata is mutated anywhere or if deepcopy is used
rg -n "\.metadata" --type py -B 1 -A 1 | head -80

Repository: Aureliolo/ai-company

Length of output: 4971


🏁 Script executed:

# Check specifically for ParkedContext metadata mutations or deepcopy usage
rg -n "parked.*metadata|metadata.*parked" --type py -B 2 -A 2

Repository: Aureliolo/ai-company

Length of output: 3988


🏁 Script executed:

# Check the persistence repo to see if deepcopy is used when loading/saving ParkedContext
cat -n src/ai_company/persistence/sqlite/parked_context_repo.py | sed -n '40,90p'

Repository: Aureliolo/ai-company

Length of output: 2250


🏁 Script executed:

# Check the complete loading logic for ParkedContext
cat -n src/ai_company/persistence/sqlite/parked_context_repo.py | sed -n '84,115p'

Repository: Aureliolo/ai-company

Length of output: 1433


🏁 Script executed:

# Find the _row_to_model method
cat -n src/ai_company/persistence/sqlite/parked_context_repo.py | sed -n '1,40p'

Repository: Aureliolo/ai-company

Length of output: 1535


🏁 Script executed:

# Search for _row_to_model method
rg -n "_row_to_model" src/ai_company/persistence/sqlite/parked_context_repo.py -A 15

Repository: Aureliolo/ai-company

Length of output: 2986


🏁 Script executed:

# Verify the test file doesn't have any in-place mutation tests for metadata
cat -n tests/unit/security/timeout/test_parked_context.py | sed -n '54,85p'

Repository: Aureliolo/ai-company

Length of output: 1386


Extend the frozen model test to cover in-place metadata mutation, and apply copy.deepcopy() to the metadata field at the persistence boundary.

Pydantic v2's frozen=True only prevents attribute reassignment (line 58 checks this), but in-place mutation of dict fields still works. The metadata field on the frozen ParkedContext model can be mutated via parked.metadata["key"] = value, which the current test misses.

Additionally, per coding guidelines, dict/list fields on frozen Pydantic models must use copy.deepcopy() at system boundaries. The persistence layer's _row_to_model() method (line 171–181 in src/ai_company/persistence/sqlite/parked_context_repo.py) deserializes metadata without deepcopy, violating this requirement.

Fixes needed:

  1. Update test_frozen() to verify parked.metadata["key"] = value raises an error or does not persist
  2. Apply copy.deepcopy(json.loads(raw_meta)) in _row_to_model() before constructing ParkedContext
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/security/timeout/test_parked_context.py` around lines 54 - 59,
Update the frozen-model test and the persistence deserialization: in the
test_frozen() for the ParkedContext model, add an assertion that an in-place
mutation like parked.metadata["key"] = "value" either raises (or does not change
the model when re-fetched) to cover dict mutation on a frozen Pydantic model;
and in the persistence layer method _row_to_model (the function that converts DB
rows into ParkedContext instances) wrap the deserialized metadata with
copy.deepcopy(json.loads(raw_meta)) before constructing the ParkedContext so the
model receives a deep-copied dict and in-place mutations at call sites cannot
mutate shared state.
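
The aliasing problem can be shown without Pydantic. The sketch below uses a standard-library frozen dataclass (`Parked`, a hypothetical stand-in) as an analogy: like a frozen Pydantic model it blocks attribute reassignment but does not freeze the dict it holds. The exception type differs (`FrozenInstanceError` here, `ValidationError` in the test quoted above).

```python
import copy
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Parked:
    """Hypothetical frozen model with a mutable dict field."""

    agent_id: str
    metadata: dict


shared = {"source": "db"}
aliased = Parked(agent_id="agent-1", metadata=shared)  # holds the caller's dict
isolated = Parked(agent_id="agent-1", metadata=copy.deepcopy(shared))  # holds its own copy

try:
    aliased.agent_id = "other"  # attribute reassignment is blocked
    reassigned = True
except FrozenInstanceError:
    reassigned = False

aliased.metadata["key"] = "value"  # in-place mutation is not blocked

assert reassigned is False
assert shared == {"source": "db", "key": "value"}  # the caller's dict changed too
assert isolated.metadata == {"source": "db"}  # the deep copy is unaffected
```

Deep-copying at the boundary does not make the field immutable; it only guarantees that the model and its callers do not share the same dict object.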

Comment on lines +1 to +64
"""Tests for DefaultRiskTierClassifier."""

import pytest

from ai_company.core.enums import ActionType, ApprovalRiskLevel
from ai_company.security.timeout.risk_tier_classifier import DefaultRiskTierClassifier


class TestDefaultMapping:
    """Default risk tier mapping."""

    @pytest.mark.unit
    def test_critical_actions(self) -> None:
        classifier = DefaultRiskTierClassifier()
        expected = ApprovalRiskLevel.CRITICAL
        assert classifier.classify(ActionType.DEPLOY_PRODUCTION) == expected
        assert classifier.classify(ActionType.DB_ADMIN) == expected

    @pytest.mark.unit
    def test_high_actions(self) -> None:
        classifier = DefaultRiskTierClassifier()
        assert classifier.classify(ActionType.VCS_PUSH) == ApprovalRiskLevel.HIGH
        assert classifier.classify(ActionType.CODE_DELETE) == ApprovalRiskLevel.HIGH

    @pytest.mark.unit
    def test_medium_actions(self) -> None:
        classifier = DefaultRiskTierClassifier()
        assert classifier.classify(ActionType.CODE_WRITE) == ApprovalRiskLevel.MEDIUM

    @pytest.mark.unit
    def test_low_actions(self) -> None:
        classifier = DefaultRiskTierClassifier()
        assert classifier.classify(ActionType.CODE_READ) == ApprovalRiskLevel.LOW
        assert classifier.classify(ActionType.TEST_RUN) == ApprovalRiskLevel.LOW


class TestUnknownFallback:
    """Unknown action types default to HIGH (D19)."""

    @pytest.mark.unit
    def test_unknown_defaults_to_high(self) -> None:
        classifier = DefaultRiskTierClassifier()
        assert classifier.classify("unknown:action") == ApprovalRiskLevel.HIGH


class TestCustomMap:
    """Custom risk overrides."""

    @pytest.mark.unit
    def test_custom_override(self) -> None:
        classifier = DefaultRiskTierClassifier(
            custom_map={ActionType.CODE_READ: ApprovalRiskLevel.CRITICAL}
        )
        assert classifier.classify(ActionType.CODE_READ) == ApprovalRiskLevel.CRITICAL

    @pytest.mark.unit
    def test_custom_preserves_defaults(self) -> None:
        classifier = DefaultRiskTierClassifier(
            custom_map={"custom:action": ApprovalRiskLevel.LOW}
        )
        # Default still works.
        assert classifier.classify(ActionType.CODE_READ) == ApprovalRiskLevel.LOW
        # Custom also works.
        assert classifier.classify("custom:action") == ApprovalRiskLevel.LOW

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for timeout marker usage in test files
fd -e py "tests/" -type f | head -20 | xargs grep -l "timeout" 2>/dev/null | head -10

Repository: Aureliolo/ai-company

Length of output: 234


🏁 Script executed:

# Look for pytest configuration files
fd -e ini -e toml -e cfg "pytest|setup" -type f | head -10

Repository: Aureliolo/ai-company

Length of output: 234


🏁 Script executed:

# Search more broadly for timeout in all test files
rg "pytest\.mark\.timeout|timeout.*30" tests/ -t py -A 2 -B 2

Repository: Aureliolo/ai-company

Length of output: 50376


🏁 Script executed:

# Check if the specific file has the timeout marker
cat tests/unit/security/timeout/test_risk_tier_classifier.py | head -10

Repository: Aureliolo/ai-company

Length of output: 338


Add module-level timeout marker to align with test policy.

Add pytestmark = pytest.mark.timeout(30) after the imports. This is required for all unit test files per coding guidelines and is already consistently used throughout the test suite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/security/timeout/test_risk_tier_classifier.py` around lines 1 -
64, Add the module-level pytest timeout marker by defining pytestmark =
pytest.mark.timeout(30) immediately after the imports in this test module;
update the top of the file (near the imports that include pytest and
DefaultRiskTierClassifier) so the module-level marker applies to all tests (no
changes needed to DefaultRiskTierClassifier or individual test functions).

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agent system by introducing robust autonomy management and approval timeout mechanisms. It allows for dynamic control over agent decision-making, ensuring that actions requiring human oversight are properly routed and handled within defined timeframes. The changes also include a new persistence layer for parking agent execution contexts, enabling seamless resumption of tasks after approvals. This foundational work improves system security, operational control, and agent reliability.

Highlights

  • Autonomy Levels: Introduced a three-level resolution chain (agent → department → company default) for agent autonomy, managed by AutonomyResolver and EffectiveAutonomy model. This includes action type expansion via ActionTypeRegistry and seniority constraints (JUNIOR agents cannot have FULL autonomy). A HumanOnlyPromotionStrategy is implemented for auto-downgrades.
  • Approval Timeout Policies: Added four pluggable TimeoutPolicy implementations: Wait Forever, Deny on Timeout, Tiered per risk level, and Escalation Chain. These are configured via a discriminated union, supported by a DefaultRiskTierClassifier and managed by a TimeoutChecker service.
  • Task Park/Resume: Implemented a ParkService to serialize AgentContext into ParkedContext for persistence when an agent's task is awaiting approval. A new PARKED termination reason was added to ExecutionResult.
  • Persistence: The SQLite database schema was migrated to version 3, adding a parked_contexts table with appropriate indexes. A SQLiteParkedContextRepository was implemented to provide full CRUD operations for parked contexts.
  • API Endpoint: A new AutonomyController was added and registered with the application router, providing GET and POST endpoints for managing agent autonomy levels at /agents/{agent_id}/autonomy.
  • Security Integration: Integrated an autonomy pre-check into SecOpsService to route actions based on effective autonomy levels before the full rule engine. This includes a security guard blocking auto-approval for HIGH/CRITICAL risk tiers.
  • Pre-PR Review Fixes: Addressed 51 findings from 10 agents, including fixes for the autonomy controller, disjoint validator on EffectiveAutonomy action sets, escalate_to consistency validator on TimeoutAction, MemoryError/RecursionError re-raises, _row_to_model error handling, renaming YamlRiskTierClassifier, moving Jinja2 environment to a module-level singleton, personality mutation patterns, immutability violations, and explicit column enumeration in SELECT queries.
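
As a rough illustration of how an escalation-chain timeout policy might be evaluated, consider the sketch below. `Step` and `evaluate_chain` are hypothetical, and the semantics (each step's timeout is counted cumulatively, and the first step holds the item before any escalation) are assumptions rather than the PR's `EscalationChainPolicy` behavior.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    role: str
    timeout_minutes: float


def evaluate_chain(
    chain: Sequence[Step], elapsed_seconds: float, on_exhausted: str
) -> Tuple[str, Optional[str]]:
    """Return (action, role) for an item that has been pending `elapsed_seconds`.

    The item stays with each step's role until that step's window has elapsed,
    then moves to the next step; after the last window, `on_exhausted` applies.
    """
    deadline = 0.0
    for index, step in enumerate(chain):
        deadline += step.timeout_minutes * 60
        if elapsed_seconds < deadline:
            action = "wait" if index == 0 else "escalate"
            return action, step.role
    return on_exhausted, None


chain = (Step(role="lead", timeout_minutes=30), Step(role="director", timeout_minutes=60))
print(evaluate_chain(chain, 60, "deny"))
print(evaluate_chain(chain, 1800, "deny"))
print(evaluate_chain(chain, 5400, "deny"))
```

With these assumptions the item waits on the lead for the first 30 minutes, is escalated to the director for the next 60, and is denied once both windows have passed.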
Changelog
  • src/ai_company/api/controllers/__init__.py
    • Imported AutonomyController.
    • Registered AutonomyController for API routing.
  • src/ai_company/api/controllers/autonomy.py
    • Added AutonomyController with GET/POST endpoints for managing agent autonomy levels.
  • src/ai_company/config/schema.py
    • Imported AutonomyLevel, AutonomyConfig, and ApprovalTimeoutConfig.
    • Added autonomy_level field to AgentConfig.
    • Updated CompanyConfig to use AutonomyConfig and ApprovalTimeoutConfig.
    • Included a model_validator for backward compatibility with float autonomy values.
  • src/ai_company/core/agent.py
    • Imported AutonomyLevel.
    • Added autonomy_level field to AgentIdentity.
  • src/ai_company/core/company.py
    • Imported autonomy and timeout configuration models.
    • Added autonomy_level to Department model.
    • Updated CompanyConfig to use AutonomyConfig and ApprovalTimeoutConfig models.
    • Included a backward compatibility validator for autonomy float values.
  • src/ai_company/core/enums.py
    • Added new AutonomyLevel enum.
    • Added new DowngradeReason enum.
    • Added new TimeoutActionType enum.
  • src/ai_company/engine/agent_engine.py
    • Imported EffectiveAutonomy.
    • Modified run method to accept effective_autonomy parameter.
    • Modified _prepare_context method to accept effective_autonomy parameter.
    • Modified _make_security_interceptor method to accept effective_autonomy parameter.
    • Modified _make_tool_invoker method to accept effective_autonomy parameter.
  • src/ai_company/engine/loop_protocol.py
    • Added PARKED as a TerminationReason.
    • Updated _validate_error_message to handle the PARKED termination reason.
  • src/ai_company/engine/prompt.py
    • Imported EffectiveAutonomy.
    • Modified build_system_prompt to pass effective_autonomy.
    • Modified _build_core_context to pass effective_autonomy.
    • Modified _build_template_context to pass effective_autonomy.
    • Modified _trim_sections to pass effective_autonomy.
    • Modified _render_and_estimate to pass effective_autonomy.
  • src/ai_company/engine/prompt_template.py
    • Modified the Jinja2 prompt template to display effective_autonomy details.
  • src/ai_company/observability/events/autonomy.py
    • Added constants for various autonomy-related events.
  • src/ai_company/observability/events/persistence.py
    • Added new event constants for ParkedContext persistence operations.
  • src/ai_company/observability/events/timeout.py
    • Added constants for various approval timeout events.
  • src/ai_company/persistence/protocol.py
    • Added parked_contexts property to PersistenceBackend protocol.
  • src/ai_company/persistence/repositories.py
    • Added ParkedContextRepository protocol definition.
  • src/ai_company/persistence/sqlite/backend.py
    • Imported SQLiteParkedContextRepository.
    • Initialized and exposed _parked_contexts property in SQLitePersistenceBackend.
  • src/ai_company/persistence/sqlite/migrations.py
    • Bumped SCHEMA_VERSION to 3.
    • Added _V3_STATEMENTS to create the parked_contexts table and its indexes.
    • Included _apply_v3 in _MIGRATIONS.
  • src/ai_company/persistence/sqlite/parked_context_repo.py
    • Implemented SQLiteParkedContextRepository for CRUD operations on ParkedContext objects.
  • src/ai_company/security/autonomy/__init__.py
    • Defined __all__ for the new autonomy modules.
  • src/ai_company/security/autonomy/change_strategy.py
    • Implemented HumanOnlyPromotionStrategy to manage autonomy level changes.
  • src/ai_company/security/autonomy/models.py
    • Defined data models for AutonomyPreset, AutonomyConfig, EffectiveAutonomy, and AutonomyOverride.
  • src/ai_company/security/autonomy/protocol.py
    • Defined AutonomyChangeStrategy protocol.
  • src/ai_company/security/autonomy/resolver.py
    • Implemented AutonomyResolver to determine effective autonomy and expand action type patterns.
  • src/ai_company/security/service.py
    • Imported autonomy events and EffectiveAutonomy.
    • Added effective_autonomy to SecOpsService constructor.
    • Implemented _apply_autonomy_precheck to route actions based on autonomy.
    • Added MemoryError/RecursionError re-raises in audit and escalation handling.
  • src/ai_company/security/timeout/__init__.py
    • Defined __all__ for the new timeout modules.
  • src/ai_company/security/timeout/config.py
    • Defined a discriminated union ApprovalTimeoutConfig for four policy types.
  • src/ai_company/security/timeout/factory.py
    • Implemented create_timeout_policy function to instantiate TimeoutPolicy objects.
  • src/ai_company/security/timeout/models.py
    • Defined TimeoutAction model, including validation for escalate_to.
  • src/ai_company/security/timeout/park_service.py
    • Implemented ParkService to serialize/deserialize AgentContext to ParkedContext.
  • src/ai_company/security/timeout/parked_context.py
    • Defined ParkedContext model to store a snapshot of a parked agent execution.
  • src/ai_company/security/timeout/policies.py
    • Implemented four TimeoutPolicy types: WaitForeverPolicy, DenyOnTimeoutPolicy, TieredTimeoutPolicy, and EscalationChainPolicy.
  • src/ai_company/security/timeout/protocol.py
    • Defined TimeoutPolicy and RiskTierClassifier protocols.
  • src/ai_company/security/timeout/risk_tier_classifier.py
    • Implemented DefaultRiskTierClassifier to map action types to risk tiers.
  • src/ai_company/security/timeout/timeout_checker.py
    • Implemented TimeoutChecker service to evaluate pending approval items.
  • src/ai_company/templates/renderer.py
    • Moved the Jinja2 environment to a module-level singleton.
    • Updated _extract_numeric_config to handle autonomy as float or dictionary.
    • Refactored _resolve_agent_personality to return personality dictionary instead of mutating agent_dict.
  • tests/unit/api/conftest.py
    • Imported ParkedContext.
    • Added FakeParkedContextRepository to FakePersistenceBackend.
  • tests/unit/api/controllers/test_autonomy.py
    • Added unit tests for AutonomyController GET and POST endpoints.
  • tests/unit/core/conftest.py
    • Imported AutonomyConfig.
    • Updated CompanyConfigFactory to include autonomy.
    • Updated CompanyFactory to include config.
  • tests/unit/core/test_company.py
    • Imported AutonomyLevel, DenyOnTimeoutConfig, WaitForeverConfig.
    • Updated TestCompanyConfig to test autonomy as AutonomyConfig and its backward compatibility.
    • Added TestCompanyConfigApprovalTimeout for new timeout configurations.
  • tests/unit/engine/test_loop_protocol.py
    • Updated tests to reflect the addition of PARKED to TerminationReason.
  • tests/unit/engine/test_prompt.py
    • Imported AutonomyLevel and EffectiveAutonomy.
    • Added TestEffectiveAutonomyInPrompt to verify autonomy details in system prompts.
  • tests/unit/observability/test_events.py
    • Updated test_all_domain_modules_discovered to include new autonomy and timeout event modules.
    • Added test_autonomy_events_exist to verify new autonomy event constants.
    • Added test_timeout_events_exist to verify new timeout event constants.
    • Added test_parked_context_persistence_events_exist to verify new parked context persistence event constants.
  • tests/unit/persistence/sqlite/test_migrations.py
    • Added tests test_v3_creates_parked_contexts_table and test_v3_creates_parked_context_indexes.
  • tests/unit/persistence/sqlite/test_parked_context_repo.py
    • Added comprehensive unit tests for SQLiteParkedContextRepository.
  • tests/unit/persistence/test_migrations_v2.py
    • Updated test_schema_version_is_two to test_schema_version_is_three and adjusted expected schema version.
  • tests/unit/persistence/test_protocol.py
    • Imported ParkedContext.
    • Added _FakeParkedContextRepository to the fake persistence backend.
  • tests/unit/security/autonomy/test_change_strategy.py
    • Added unit tests for HumanOnlyPromotionStrategy.
  • tests/unit/security/autonomy/test_models.py
    • Added unit tests for autonomy-related data models.
  • tests/unit/security/autonomy/test_resolver.py
    • Added unit tests for AutonomyResolver.
  • tests/unit/security/test_service.py
    • Imported AutonomyLevel and EffectiveAutonomy.
    • Added TestAutonomyPrecheck to verify the new autonomy pre-check logic in SecOpsService.
  • tests/unit/security/timeout/test_config.py
    • Added unit tests for ApprovalTimeoutConfig and its constituent models.
  • tests/unit/security/timeout/test_factory.py
    • Added unit tests for create_timeout_policy factory function.
  • tests/unit/security/timeout/test_park_service.py
    • Added unit tests for ParkService.
  • tests/unit/security/timeout/test_parked_context.py
    • Added unit tests for the ParkedContext data model.
  • tests/unit/security/timeout/test_policies.py
    • Added unit tests for WaitForeverPolicy, DenyOnTimeoutPolicy, TieredTimeoutPolicy, and EscalationChainPolicy.
  • tests/unit/security/timeout/test_risk_tier_classifier.py
    • Added unit tests for DefaultRiskTierClassifier.
  • tests/unit/security/timeout/test_timeout_checker.py
    • Added unit tests for TimeoutChecker.
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces significant new features for agent autonomy and approval timeouts. The changes are extensive, adding new models, services, persistence layers, and API endpoints. The implementation of autonomy levels with a three-tier resolution chain and pluggable timeout policies is well-structured. The code also includes several important fixes, such as preventing MemoryError from being caught in broad exception blocks and improving the template rendering performance.

My review focuses on a few areas for improvement. The new AutonomyController currently uses the company-wide default autonomy level instead of resolving the agent-specific level, which is a correctness issue. I've also pointed out a couple of smaller maintainability improvements related to magic numbers and logging clarity. Overall, this is a substantial and well-executed feature addition.

Note: Security Review did not run due to the size of the PR.

"""
app_state: AppState = state.app_state
config = app_state.config.config
level = config.autonomy.level

high

This implementation returns the company-wide default autonomy level, not the agent-specific one. The endpoint GET /agents/{agent_id}/autonomy implies it should return the level for the specified agent_id. This could be misleading for clients of the API. Consider fetching the agent's data and using the AutonomyResolver to determine their effective autonomy level.
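
A minimal sketch of the lookup this comment asks for, assuming the three-level chain from the PR summary (agent → department → company default). The `resolve_level` helper, its parameters, and the enum string values are illustrative assumptions, not the project's actual `AutonomyResolver` API.

```python
from __future__ import annotations

from enum import Enum


class AutonomyLevel(Enum):
    # Level names come from the PR description; the string values are assumed.
    LOCKED = "locked"
    SUPERVISED = "supervised"
    SEMI = "semi"
    FULL = "full"


def resolve_level(
    agent_level: AutonomyLevel | None,
    department_level: AutonomyLevel | None,
    company_default: AutonomyLevel,
) -> AutonomyLevel:
    """Return the first level that is set, walking agent -> department -> company."""
    for candidate in (agent_level, department_level):
        if candidate is not None:
            return candidate
    return company_default
```

With a lookup of this shape, the endpoint would report the company default only when neither the agent nor its department carries an override.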

"""
app_state: AppState = state.app_state
config = app_state.config.config
current_level = config.autonomy.level

high

Similar to get_autonomy, this method uses the company-wide default autonomy level as the current_level. This is likely incorrect as it doesn't reflect the agent's actual current autonomy level, which might have overrides. This could lead to incorrect logging and behavior. You should resolve the agent-specific autonomy level here as well.

Thresholds: 0.0-0.24 → locked, 0.25-0.49 → supervised,
0.5-0.79 → semi, 0.8-1.0 → full.
"""
if value < 0.25: # noqa: PLR2004

medium

The thresholds 0.25, 0.5, and 0.8 used for converting a float to an AutonomyLevel are magic numbers. To improve readability and maintainability, consider defining them as named constants at the module level, for example _LOCKED_THRESHOLD = 0.25.
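
One way the suggestion could look, as a sketch only. The threshold values are the ones quoted in the docstring under review; `_LOCKED_THRESHOLD` is the name the comment proposes, while the other constant names, the `level_from_float` function, and the enum string values are hypothetical. Input is assumed to be already validated to the range 0.0 to 1.0.

```python
from enum import Enum


class AutonomyLevel(Enum):
    # String values are assumed for illustration.
    LOCKED = "locked"
    SUPERVISED = "supervised"
    SEMI = "semi"
    FULL = "full"


# Exclusive upper bounds for each level below FULL.
_LOCKED_THRESHOLD = 0.25
_SUPERVISED_THRESHOLD = 0.5
_SEMI_THRESHOLD = 0.8


def level_from_float(value: float) -> AutonomyLevel:
    """Map a legacy float setting onto a named autonomy level."""
    if value < _LOCKED_THRESHOLD:
        return AutonomyLevel.LOCKED
    if value < _SUPERVISED_THRESHOLD:
        return AutonomyLevel.SUPERVISED
    if value < _SEMI_THRESHOLD:
        return AutonomyLevel.SEMI
    return AutonomyLevel.FULL
```

Named constants also remove the need for the `# noqa: PLR2004` suppression on each comparison.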

except (sqlite3.Error, aiosqlite.Error) as exc:
msg = f"Failed to delete parked context {parked_id!r}"
logger.exception(
PERSISTENCE_PARKED_CONTEXT_QUERY_FAILED,

medium

The log event PERSISTENCE_PARKED_CONTEXT_QUERY_FAILED is used here for a delete operation failure. This is misleading for anyone debugging persistence issues. It would be better to use a more specific event like PERSISTENCE_PARKED_CONTEXT_DELETE_FAILED. You may need to define this new event constant.
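
A small sketch of per-operation event constants, assuming events are plain string constants. The two constant names are taken from this comment; their string values, the mapping, and the `failure_event` helper are hypothetical.

```python
# String values are assumptions; only the constant names appear in the review.
PERSISTENCE_PARKED_CONTEXT_QUERY_FAILED = "persistence.parked_context.query_failed"
PERSISTENCE_PARKED_CONTEXT_DELETE_FAILED = "persistence.parked_context.delete_failed"

# Hypothetical lookup so each repository operation logs its own failure event.
_FAILURE_EVENTS = {
    "query": PERSISTENCE_PARKED_CONTEXT_QUERY_FAILED,
    "delete": PERSISTENCE_PARKED_CONTEXT_DELETE_FAILED,
}


def failure_event(operation: str) -> str:
    """Return the failure event for a repository operation (KeyError if unknown)."""
    return _FAILURE_EVENTS[operation]
```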

Implement four autonomy levels (full/semi/supervised/locked) with
three-level resolution chain (agent→department→company), per-action
classification with category expansion, seniority validation, and
runtime changes via pluggable strategy. Add four timeout policies
(wait-forever/deny/tiered/escalation-chain) with risk tier
classification, parked context persistence, and timeout checker.

- Phase 1: AutonomyLevel/DowngradeReason enums, AutonomyPreset,
  AutonomyConfig, EffectiveAutonomy, AutonomyResolver,
  HumanOnlyPromotionStrategy, AutonomyChangeStrategy protocol
- Phase 2: CompanyConfig.autonomy float→AutonomyConfig migration,
  Department.autonomy_level, AgentIdentity.autonomy_level
- Phase 3: SecOpsService autonomy pre-check (auto-approve/escalate),
  AgentEngine effective_autonomy param, AutonomyController REST API
- Phase 4: Effective autonomy section in system prompt template
- Phase 5: TimeoutActionType enum, TimeoutPolicy protocol, four
  policy implementations, discriminated union config, factory
- Phase 6: ParkedContext model, ParkedContextRepository protocol,
  SQLite implementation, v3 migration
- Phase 7: CompanyConfig.approval_timeout field
- Phase 8: ParkService park/resume, TimeoutChecker, PARKED
  termination reason

Closes #42, Closes #126
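
The pluggable-policy idea in Phase 5 can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: the `decide` method name, its single `elapsed_seconds` argument, and the `DENY` member are hypothetical, and only two of the four policies are shown.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Protocol


class TimeoutActionType(Enum):
    WAIT = "wait"
    DENY = "deny"  # assumed member; the excerpts only show WAIT and ESCALATE


class TimeoutPolicy(Protocol):
    """Structural interface: anything with a matching decide() is a policy."""

    def decide(self, elapsed_seconds: float) -> TimeoutActionType: ...


class WaitForeverPolicy:
    def decide(self, elapsed_seconds: float) -> TimeoutActionType:
        # Never times out, regardless of how long the approval has been pending.
        return TimeoutActionType.WAIT


@dataclass(frozen=True)
class DenyOnTimeoutPolicy:
    timeout_seconds: float

    def decide(self, elapsed_seconds: float) -> TimeoutActionType:
        # Deny once the configured window has fully elapsed.
        if elapsed_seconds >= self.timeout_seconds:
            return TimeoutActionType.DENY
        return TimeoutActionType.WAIT
```

Because the protocol is structural, a checker can hold any `TimeoutPolicy` without the policies sharing a base class.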
Pre-reviewed by 10 agents, 51 findings addressed:
- Fix autonomy controller returning requested level instead of current
- Add disjoint validator on EffectiveAutonomy action sets
- Add escalate_to consistency validator on TimeoutAction
- Add seniority constraint enforcement in AutonomyResolver
- Add MemoryError/RecursionError re-raise in security service
- Fix _row_to_model to raise QueryError instead of returning None
- Rename YamlRiskTierClassifier to DefaultRiskTierClassifier
- Move Jinja2 env to module-level singleton in renderer
- Fix personality mutation pattern (return instead of mutate)
- Add security guard blocking auto-approve for HIGH/CRITICAL risk
- Fix immutability violations (deepcopy metadata, immutable dicts)
- Enumerate columns explicitly in SELECT queries
- Register AutonomyController with app router
- Add comprehensive tests for new code paths
…dates

- Fix circular import in security/autonomy/__init__.py (removed eager
  AutonomyResolver import that caused core→security→core cycle)
- Fix CompanyConfigFactory to pin approval_timeout=WaitForeverConfig()
- Update prompt template version assertions to 1.4.0
- Fix test_non_pending_item_raises to supply decided_at/decided_by
- Replace assert with restructured control flow in TieredTimeoutPolicy
- Update DESIGN_SPEC.md, CLAUDE.md, README.md for autonomy/timeout docs
@Aureliolo Aureliolo force-pushed the feat/autonomy-levels-and-approval-timeout branch from 1855156 to 4deae2a Compare March 10, 2026 13:30
@Aureliolo Aureliolo merged commit eecc25a into main Mar 10, 2026
8 of 9 checks passed
@Aureliolo Aureliolo deleted the feat/autonomy-levels-and-approval-timeout branch March 10, 2026 13:30
Comment on lines +239 to +240
except MemoryError, RecursionError:
raise

Same Python 2 except syntax error (second occurrence)

Suggested change
- except MemoryError, RecursionError:
-     raise
+ except (MemoryError, RecursionError):
+     raise
Path: src/ai_company/security/service.py
Line: 239-240
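
For context, a self-contained sketch of the re-raise pattern these comments concern, using the parenthesized tuple form. The `run_guarded` function and its `"escalate"` fallback are hypothetical. The parenthesized spelling parses on every Python 3 release, whereas the unparenthesized one is only accepted by interpreters that implement PEP 758 (Python 3.14 onward), which the project changelog mentions adopting.

```python
from collections.abc import Callable


def run_guarded(check: Callable[[], str]) -> str:
    """Run `check`, letting fatal errors propagate and mapping the rest to a fallback."""
    try:
        return check()
    except (MemoryError, RecursionError):
        # Fatal conditions must not be swallowed by the broad handler below.
        raise
    except Exception:
        # Any other failure degrades to a conservative verdict.
        return "escalate"
```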

Comment on lines +338 to +339
except MemoryError, RecursionError:
raise

Same Python 2 except syntax error (third occurrence)

Suggested change
- except MemoryError, RecursionError:
-     raise
+ except (MemoryError, RecursionError):
+     raise
Path: src/ai_company/security/service.py
Line: 338-339

Comment on lines +390 to +391
except MemoryError, RecursionError:
raise

Same Python 2 except syntax error (fourth occurrence)

Suggested change
- except MemoryError, RecursionError:
-     raise
+ except (MemoryError, RecursionError):
+     raise
Path: src/ai_company/security/service.py
Line: 390-391

Comment on lines +65 to +66
except MemoryError, RecursionError:
raise

Same Python 2 except syntax error in timeout_checker.py

Suggested change
- except MemoryError, RecursionError:
-     raise
+ except (MemoryError, RecursionError):
+     raise
Path: src/ai_company/security/timeout/timeout_checker.py
Line: 65-66

Comment on lines +285 to +314
if elapsed_seconds < step_end:
if idx == 0:
# First step hasn't timed out yet — WAIT.
logger.debug(
TIMEOUT_WAITING,
approval_id=item.id,
escalation_role=step.role,
elapsed_seconds=elapsed_seconds,
)
return TimeoutAction(
action=TimeoutActionType.WAIT,
reason=(
f"Waiting at {step.role!r} — "
f"{elapsed_seconds:.0f}s of "
f"{step_end:.0f}s elapsed"
),
)
# Previous step timed out — escalate to this step's role.
logger.info(
TIMEOUT_ESCALATED,
approval_id=item.id,
escalation_role=step.role,
elapsed_seconds=elapsed_seconds,
)
return TimeoutAction(
action=TimeoutActionType.ESCALATE,
reason=(
f"Escalated to {step.role!r} — {elapsed_seconds:.0f}s elapsed"
),
escalate_to=step.role,

EscalationChainPolicy escalates to the wrong role — off-by-one in chain resolution

When step idx-1's timeout expires and the loop reaches idx, the code escalates to step[idx].role (the current step's role). But the semantics of the chain are that expiring step N should escalate to step N's role, not step N+1's role. The current code always skips step[0].role as an escalation target entirely.

Consider a two-step chain [team_lead(10 min), manager(20 min)]:

| elapsed | Expected | Actual |
|---------|----------|--------|
| 0–10 min | WAIT | WAIT ✓ |
| 10–30 min | ESCALATE → team_lead | ESCALATE → manager |
| 30+ min | ESCALATE → manager / exhausted | exhausted ✗ |

For a single-step chain [cto(60 min)], cto is never escalated to at all: the policy immediately returns on_chain_exhausted after 60 minutes without ever issuing an ESCALATE action.

The fix is to use the previous step's role when deciding where to escalate:

for idx, step in enumerate(self._chain):
    step_timeout = step.timeout_minutes * _SECONDS_PER_MINUTE
    step_end = cumulative_seconds + step_timeout
    if elapsed_seconds < step_end:
        if idx == 0:
            # Waiting for the initial reviewer — no escalation yet.
            return TimeoutAction(
                action=TimeoutActionType.WAIT,
                reason=...,
            )
        # Previous step (chain[idx-1]) timed out — escalate to that step's role.
        prev_step = self._chain[idx - 1]
        return TimeoutAction(
            action=TimeoutActionType.ESCALATE,
            reason=f"Escalated to {prev_step.role!r} ...",
            escalate_to=prev_step.role,
        )
    cumulative_seconds += step_timeout
# Chain exhausted — escalate to the last step's role before on_chain_exhausted,
# or apply on_chain_exhausted directly depending on the design intent.
Path: src/ai_company/security/timeout/policies.py
Line: 285-314
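
A runnable sketch of the cumulative-window walk with the reviewer's proposed change (escalate to the previous step's role). It is an illustration only: `EscalationStep`, `resolve`, and the tuple return value are hypothetical stand-ins for the project's `TimeoutAction` model, and the post-loop branch returns a placeholder because the review leaves that behaviour to design intent.

```python
from __future__ import annotations

from dataclasses import dataclass

_SECONDS_PER_MINUTE = 60


@dataclass(frozen=True)
class EscalationStep:
    role: str
    timeout_minutes: float


def resolve(
    chain: list[EscalationStep], elapsed_seconds: float
) -> tuple[str, str | None]:
    """Return (action, escalate_to) for the time elapsed since the request."""
    cumulative_seconds = 0.0
    for idx, step in enumerate(chain):
        step_end = cumulative_seconds + step.timeout_minutes * _SECONDS_PER_MINUTE
        if elapsed_seconds < step_end:
            if idx == 0:
                # Still inside the first window: nothing has timed out yet.
                return ("WAIT", None)
            # The previous window expired: escalate to that step's role.
            return ("ESCALATE", chain[idx - 1].role)
        cumulative_seconds = step_end
    # Every window expired; placeholder for the on_chain_exhausted handling.
    return ("EXHAUSTED", None)
```

For the two-step chain in the table, this yields WAIT before 10 minutes and ESCALATE to `team_lead` between 10 and 30 minutes. Note that under this variant the last step's role is still never returned as an escalation target, so the exhausted branch would need to cover it if that is the intended behaviour.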

Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.1](ai-company-v0.1.0...ai-company-v0.1.1)
(2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](v0.0.0...v0.1.0)
(2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>