feat: add autonomy levels and approval timeout policies (#42, #126) #197
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Caution: review failed; the pull request is closed. Review profile: ASSERTIVE. Files selected for processing: 63.
📝 Summary by CodeRabbit

Walkthrough

Adds a complete autonomy + approval-timeout subsystem: enums, models, resolver, change strategy, timeout policies, park/resume, persistence (ParkedContext + repo + migration), SecOps/autonomy wiring, prompt/engine threading, API controller, observability events, and comprehensive tests.

Changes
Sequence Diagram(s)

sequenceDiagram
participant Client
participant AutonomyResolver
participant AutonomyConfig
participant ActionRegistry
Client->>AutonomyResolver: resolve(agent_level, dept_level, seniority)
AutonomyResolver->>AutonomyConfig: select preset (agent>dept>company)
AutonomyConfig->>AutonomyConfig: fetch preset for level
AutonomyResolver->>ActionRegistry: expand patterns -> concrete actions
ActionRegistry-->>AutonomyResolver: return action set
AutonomyResolver-->>Client: EffectiveAutonomy(level, auto_approve, human_approval, security_agent)
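The precedence step in the diagram above (agent > dept > company) can be sketched as a minimal resolver. This is illustrative only; the level names and function signature are assumptions, not the project's actual API:

```python
from typing import Optional


def resolve_level(
    agent_level: Optional[str],
    dept_level: Optional[str],
    company_level: str,
) -> str:
    """First explicit override wins: agent > department > company."""
    for candidate in (agent_level, dept_level):
        if candidate is not None:
            return candidate
    # No override anywhere: fall back to the company-wide default.
    return company_level
```

An agent-level override beats a department override, and the company default applies only when neither is set.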
sequenceDiagram
participant AgentEngine
participant SecOpsService
participant TimeoutChecker
participant TimeoutPolicy
participant ParkService
participant Persistence
AgentEngine->>SecOpsService: evaluate action with EffectiveAutonomy
SecOpsService->>SecOpsService: is action auto_approved?
alt Auto-approved
SecOpsService-->>AgentEngine: ALLOW (proceed)
else Human approval required
SecOpsService->>TimeoutChecker: check_and_resolve(approval_item)
TimeoutChecker->>TimeoutPolicy: determine_action(item, elapsed)
TimeoutPolicy-->>TimeoutChecker: WAIT / APPROVE / DENY / ESCALATE
alt WAIT
SecOpsService->>ParkService: park(context)
ParkService->>Persistence: save(ParkedContext)
Persistence-->>ParkService: parked_id
ParkService-->>SecOpsService: parked_context
SecOpsService-->>AgentEngine: Task parked (agent continues others)
else APPROVE/DENY/ESCALATE
SecOpsService-->>AgentEngine: return verdict (proceed/escalate/deny)
end
end
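The WAIT / APPROVE / DENY / ESCALATE decision that `TimeoutPolicy.determine_action` returns in the diagram above can be illustrated with a minimal deny-on-timeout variant. A sketch only: the constructor argument and enum values are assumptions:

```python
from enum import Enum


class TimeoutActionType(str, Enum):
    WAIT = "wait"
    APPROVE = "approve"
    DENY = "deny"
    ESCALATE = "escalate"


class DenyOnTimeoutPolicy:
    """Wait until the configured window closes, then deny the approval."""

    def __init__(self, timeout_minutes: float) -> None:
        self._timeout_seconds = timeout_minutes * 60

    def determine_action(self, elapsed_seconds: float) -> TimeoutActionType:
        # Still inside the window: keep the item parked and waiting.
        if elapsed_seconds < self._timeout_seconds:
            return TimeoutActionType.WAIT
        # Window exhausted: fail closed by denying.
        return TimeoutActionType.DENY
```

The other policies named in the PR (wait-forever, tiered, escalation chain) differ only in how they map elapsed time to one of these four actions.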
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Greptile Summary

This PR implements the autonomy levels (#42) and approval timeout policies (#126) features, a substantial addition. Two blocking issues require fixes before merging:
Additional non-blocking finding:
Confidence Score: 1/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant Engine as AgentEngine
participant SecOps as SecOpsService
participant Rules as RuleEngine
participant Autonomy as AutonomyAugmentation
participant Store as ApprovalStore
participant Park as ParkService
participant Repo as ParkedContextRepository
participant Timeout as TimeoutChecker
Engine->>SecOps: intercept(context, effective_autonomy)
SecOps->>Rules: evaluate(context)
Rules-->>SecOps: SecurityVerdict (ALLOW/DENY/ESCALATE)
SecOps->>Autonomy: _apply_autonomy_augmentation(context, verdict)
Note over Autonomy: DENY/ESCALATE from rules always wins<br/>ALLOW may be upgraded to ESCALATE<br/>based on human_approval_actions
Autonomy-->>SecOps: verdict (possibly upgraded)
alt verdict == ESCALATE
SecOps->>Store: add(ApprovalItem)
SecOps-->>Engine: verdict with approval_id
Engine->>Park: park(context, approval_id)
Park-->>Engine: ParkedContext
Engine->>Repo: save(ParkedContext)
Engine-->>Engine: ExecutionResult(PARKED)
else verdict == ALLOW
SecOps-->>Engine: allow
else verdict == DENY
SecOps-->>Engine: deny
end
Note over Timeout: Background / periodic poll
Timeout->>Store: get pending items
Timeout->>Timeout: check(item, elapsed_seconds)
alt APPROVE / DENY
Timeout->>Store: update item status
else ESCALATE
Timeout->>Store: re-route to next role
end
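The park → save → resume path in the diagram amounts to a serialization round-trip plus an identifier-consistency guard. A minimal sketch with hypothetical field names, not the project's real ParkService:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AgentContext:
    execution_id: str
    agent_id: str
    task_id: str
    state: dict


@dataclass
class ParkedContext:
    execution_id: str
    agent_id: str
    task_id: str
    context_json: str


def park(context: AgentContext) -> ParkedContext:
    # Serialize the full context; the outer fields mirror it so the
    # repository can index and query parked rows without deserializing.
    return ParkedContext(
        execution_id=context.execution_id,
        agent_id=context.agent_id,
        task_id=context.task_id,
        context_json=json.dumps(asdict(context)),
    )


def resume(parked: ParkedContext) -> AgentContext:
    context = AgentContext(**json.loads(parked.context_json))
    # Reject snapshots whose outer identifiers drifted from the payload.
    if (context.execution_id, context.agent_id, context.task_id) != (
        parked.execution_id,
        parked.agent_id,
        parked.task_id,
    ):
        raise ValueError("parked snapshot identifiers do not match context")
    return context
```

The round-trip must be lossless (`resume(park(ctx)) == ctx`) for the agent to continue exactly where it was parked.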
Pull request overview
Implements autonomy-level based approval routing and configurable approval-timeout handling, including task park/resume persistence, and integrates these concepts into the engine prompt/security flow.
Changes:
- Add autonomy subsystem (models, resolver, change strategy) and wire “effective autonomy” into SecOps pre-tool checks and system prompt rendering.
- Add approval timeout subsystem (policies, classifier, checker, config/factory) plus parked-context model/service and SQLite persistence + migration v3.
- Add API endpoint for reading/requesting autonomy changes, new observability event constants, and broad unit test coverage for the new subsystems.
Reviewed changes
Copilot reviewed 57 out of 59 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/unit/security/timeout/test_timeout_checker.py | Unit tests for TimeoutChecker behavior and resolution updates. |
| tests/unit/security/timeout/test_risk_tier_classifier.py | Tests default/custom mappings and unknown-action fallback for risk tiers. |
| tests/unit/security/timeout/test_policies.py | Tests wait/deny/tiered/escalation timeout policies. |
| tests/unit/security/timeout/test_parked_context.py | Tests ParkedContext validation and immutability. |
| tests/unit/security/timeout/test_park_service.py | Tests park/resume serialization round-trip for AgentContext. |
| tests/unit/security/timeout/test_factory.py | Tests timeout policy factory returns correct implementations. |
| tests/unit/security/timeout/test_config.py | Tests timeout config discriminated union + TimeoutAction validator. |
| tests/unit/security/timeout/__init__.py | Test package marker for timeout tests. |
| tests/unit/security/test_service.py | Adds coverage for SecOpsService autonomy pre-check routing. |
| tests/unit/security/autonomy/test_resolver.py | Tests autonomy resolution chain + category/all expansion + seniority constraints. |
| tests/unit/security/autonomy/test_models.py | Tests autonomy preset/config/effective models and disjoint validation. |
| tests/unit/security/autonomy/test_change_strategy.py | Tests HumanOnlyPromotionStrategy promotion/downgrade/recovery behavior. |
| tests/unit/security/autonomy/__init__.py | Test package marker for autonomy tests. |
| tests/unit/persistence/test_protocol.py | Extends fake persistence backend/protocol coverage for parked contexts repo. |
| tests/unit/persistence/test_migrations_v2.py | Updates schema version assertions to v3. |
| tests/unit/persistence/sqlite/test_parked_context_repo.py | Adds CRUD/upsert/ordering/deserialization-failure tests for parked contexts repo. |
| tests/unit/persistence/sqlite/test_migrations.py | Adds assertions for v3 parked_contexts table + indexes. |
| tests/unit/observability/test_events.py | Expands expected event modules and asserts new autonomy/timeout/persistence events. |
| tests/unit/engine/test_prompt.py | Tests effective autonomy section inclusion/omission in system prompt. |
| tests/unit/engine/test_loop_protocol.py | Updates TerminationReason enum tests for new PARKED value. |
| tests/unit/core/test_company.py | Updates CompanyConfig autonomy field semantics + adds approval_timeout tests. |
| tests/unit/core/conftest.py | Updates factories to provide AutonomyConfig defaults. |
| tests/unit/api/controllers/test_autonomy.py | Tests new autonomy controller GET/POST and access controls. |
| tests/unit/api/conftest.py | Adds FakeParkedContextRepository into fake persistence backend for API tests. |
| src/ai_company/templates/renderer.py | Reuses module-level Jinja env; supports autonomy config dict passthrough; fixes personality handling to avoid mutation. |
| src/ai_company/security/timeout/timeout_checker.py | Adds TimeoutChecker for evaluating pending approvals against a TimeoutPolicy. |
| src/ai_company/security/timeout/risk_tier_classifier.py | Adds DefaultRiskTierClassifier with fail-safe HIGH fallback + logging. |
| src/ai_company/security/timeout/protocol.py | Defines TimeoutPolicy and RiskTierClassifier protocols. |
| src/ai_company/security/timeout/policies.py | Implements WaitForever/DenyOnTimeout/TieredTimeout/EscalationChain timeout policies. |
| src/ai_company/security/timeout/parked_context.py | Adds ParkedContext Pydantic model for serialized parked executions. |
| src/ai_company/security/timeout/park_service.py | Adds ParkService to serialize/deserialize AgentContext for park/resume. |
| src/ai_company/security/timeout/models.py | Adds TimeoutAction model + escalate_to consistency validator. |
| src/ai_company/security/timeout/factory.py | Adds create_timeout_policy factory for policy configs. |
| src/ai_company/security/timeout/config.py | Adds discriminated-union timeout policy configuration models. |
| src/ai_company/security/timeout/__init__.py | Exposes timeout subsystem public API. |
| src/ai_company/security/service.py | Adds effective autonomy integration and pre-check routing in SecOpsService. |
| src/ai_company/security/autonomy/resolver.py | Adds AutonomyResolver for agent→department→company resolution + pattern expansion + seniority validation. |
| src/ai_company/security/autonomy/protocol.py | Defines AutonomyChangeStrategy protocol. |
| src/ai_company/security/autonomy/models.py | Adds autonomy presets/config/effective models + runtime override model. |
| src/ai_company/security/autonomy/change_strategy.py | Adds HumanOnlyPromotionStrategy (deny promotions/recovery; apply auto-downgrades). |
| src/ai_company/security/autonomy/__init__.py | Exposes autonomy subsystem public API. |
| src/ai_company/persistence/sqlite/parked_context_repo.py | Adds SQLiteParkedContextRepository implementation. |
| src/ai_company/persistence/sqlite/migrations.py | Bumps schema to v3 and adds parked_contexts table + indexes migration. |
| src/ai_company/persistence/sqlite/backend.py | Wires parked_contexts repository into SQLite persistence backend. |
| src/ai_company/persistence/repositories.py | Adds ParkedContextRepository protocol. |
| src/ai_company/persistence/protocol.py | Extends PersistenceBackend protocol with parked_contexts repository. |
| src/ai_company/observability/events/timeout.py | Adds timeout event constants. |
| src/ai_company/observability/events/persistence.py | Adds parked-context persistence event constants. |
| src/ai_company/observability/events/autonomy.py | Adds autonomy subsystem event constants. |
| src/ai_company/engine/prompt_template.py | Adds effective autonomy section to system prompt template. |
| src/ai_company/engine/prompt.py | Passes effective autonomy into prompt context and renders it. |
| src/ai_company/engine/loop_protocol.py | Adds PARKED termination reason + validation rules. |
| src/ai_company/engine/agent_engine.py | Threads effective_autonomy into tool invoker/security interceptor and prompt building. |
| src/ai_company/core/enums.py | Adds AutonomyLevel, DowngradeReason, TimeoutActionType enums. |
| src/ai_company/core/company.py | Adds autonomy config and approval_timeout to CompanyConfig; adds dept autonomy override field. |
| src/ai_company/core/agent.py | Adds per-agent autonomy override field to AgentIdentity. |
| src/ai_company/config/schema.py | Adds autonomy_level to AgentConfig schema. |
| src/ai_company/api/controllers/autonomy.py | Adds AutonomyController GET/POST endpoints for autonomy level. |
| src/ai_company/api/controllers/__init__.py | Registers AutonomyController in controllers module exports/imports. |
```python
if action in autonomy.auto_approve_actions:
    logger.info(
        AUTONOMY_ACTION_AUTO_APPROVED,
        tool_name=context.tool_name,
        action_type=action,
        autonomy_level=autonomy.level.value,
    )
    return SecurityVerdict(
        verdict=SecurityVerdictType.ALLOW,
        reason=f"Auto-approved by autonomy level '{autonomy.level.value}'",
        risk_level=ApprovalRiskLevel.LOW,
        evaluated_at=now,
        evaluation_duration_ms=0.0,
    )

if action in autonomy.human_approval_actions:
    logger.info(
        AUTONOMY_ACTION_HUMAN_REQUIRED,
        tool_name=context.tool_name,
        action_type=action,
        autonomy_level=autonomy.level.value,
    )
    return SecurityVerdict(
        verdict=SecurityVerdictType.ESCALATE,
        reason=(
            f"Human approval required by autonomy level "
            f"'{autonomy.level.value}'"
        ),
        risk_level=ApprovalRiskLevel.MEDIUM,
        evaluated_at=now,
        evaluation_duration_ms=0.0,
    )
```
_check_autonomy() hard-codes risk_level to LOW for auto-approved actions and MEDIUM for human-required actions. With presets like SEMI auto-approving broad categories (e.g., "code", "vcs"), this can mislabel genuinely HIGH/CRITICAL actions (and also affects what gets stored on ApprovalItem when escalating). Use the existing risk classification logic from the security subsystem to derive risk_level from action_type (or run the rule engine and reuse its computed risk) instead of hard-coding tiers here.
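A sketch of the suggested fix: derive `risk_level` from a shared classifier with a fail-safe HIGH fallback instead of hard-coding tiers at the call site. The action names and mapping below are illustrative, not the project's real registry:

```python
# Hypothetical risk registry; the real project would reuse the mapping
# from its security risk classifier as the single source of truth.
_RISK_BY_ACTION = {
    "code.read": "low",
    "vcs.push": "high",
    "deploy.production": "critical",
}


def classify_risk(action_type: str) -> str:
    # Fail-safe: unknown actions classify as HIGH rather than LOW.
    return _RISK_BY_ACTION.get(action_type, "high")


def auto_approve_verdict(action_type: str, level: str) -> dict:
    return {
        "verdict": "ALLOW",
        "reason": f"Auto-approved by autonomy level '{level}'",
        # Derived from the action, not hard-coded LOW:
        "risk_level": classify_risk(action_type),
    }
```

An auto-approved `deploy.production` would then still be recorded as critical risk, so audit logs and any later ApprovalItem carry the real tier.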
```python
cumulative_seconds = 0.0
for step in self._chain:
    step_timeout = step.timeout_minutes * _SECONDS_PER_MINUTE
    if elapsed_seconds < cumulative_seconds + step_timeout:
        logger.debug(
            TIMEOUT_WAITING,
            approval_id=item.id,
            escalation_role=step.role,
            elapsed_seconds=elapsed_seconds,
        )
        return TimeoutAction(
            action=TimeoutActionType.ESCALATE,
            reason=(
                f"Escalated to {step.role!r} — {elapsed_seconds:.0f}s elapsed"
            ),
            escalate_to=step.role,
        )
```
EscalationChainPolicy.determine_action() returns ESCALATE for the entire duration of the current step (elapsed_seconds < cumulative + step_timeout). If TimeoutChecker runs periodically, callers will repeatedly receive ESCALATE for the same escalate_to target and may re-escalate/notify on every tick (there’s no state on ApprovalItem to dedupe). Consider returning WAIT while within a step and only emitting ESCALATE when transitioning to the next step (or include enough state/metadata to make escalation idempotent).
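One way to make the escalation idempotent, as the comment suggests: track which role was already notified and return WAIT on subsequent ticks within the same step. A sketch only; the chain and state representation are assumptions:

```python
from typing import List, Optional, Tuple


def determine_action(
    elapsed_seconds: float,
    chain: List[Tuple[str, float]],                 # (role, timeout_minutes)
    already_escalated_to: Optional[str] = None,     # state needed to dedupe
) -> Tuple[str, Optional[str]]:
    """Emit ESCALATE once per step; WAIT on later ticks in the same step."""
    cumulative = 0.0
    for role, timeout_minutes in chain:
        cumulative += timeout_minutes * 60
        if elapsed_seconds < cumulative:
            if role == already_escalated_to:
                return ("WAIT", None)     # this role was already notified
            return ("ESCALATE", role)     # first tick inside this step
    return ("DENY", None)                 # chain exhausted: fail closed
```

The caller would persist `escalate_to` on the approval item after acting on an ESCALATE, so periodic polling stops re-notifying the same role.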
```python
"""Request an autonomy level change for an agent.

Validates seniority constraints and routes through the
configured ``AutonomyChangeStrategy``. Returns 200 with the
current level. If the change requires human approval, the
response includes ``promotion_pending=True``.
"""
```
update_autonomy() docstring says it “Validates seniority constraints and routes through the configured AutonomyChangeStrategy”, but the implementation doesn’t consult agent/department overrides, doesn’t validate seniority, and doesn’t call any strategy (it always returns promotion_pending=True). Please either implement the documented behavior (wire in resolver/strategy + validation) or adjust the docstring/response fields to match the current placeholder behavior to avoid misleading API consumers.
```python
@model_validator(mode="before")
@classmethod
def _coerce_autonomy_float(cls, data: object) -> object:
    """Accept a bare float for autonomy and convert to AutonomyConfig."""
    if not isinstance(data, dict):
        return data
    raw = data.get("autonomy")
    if isinstance(raw, (int, float)) and not isinstance(raw, bool):
        level = _float_to_autonomy_level(float(raw))
        return {**data, "autonomy": {"level": level.value}}
    return data
```
CompanyConfig._coerce_autonomy_float() converts any numeric autonomy value into an AutonomyConfig without validating the old 0.0–1.0 contract. This regresses prior bounds checks (e.g., -0.1 or 2.0 will silently map to LOCKED/FULL), and NaN/inf will also map to FULL due to comparison semantics. Add explicit validation (finite + 0.0 <= value <= 1.0) before calling _float_to_autonomy_level(), and raise a clear ValueError on invalid inputs.
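A sketch of the suggested guard: check finiteness and the legacy 0.0-1.0 bounds before mapping. The cut-points below are illustrative, since the real `_float_to_autonomy_level` mapping is not shown in the diff:

```python
from math import isfinite


def map_legacy_autonomy(value: float) -> str:
    """Validate the legacy 0.0-1.0 contract before mapping to a level."""
    # NaN fails both comparisons and isfinite; inf fails isfinite.
    if not isfinite(value) or not 0.0 <= value <= 1.0:
        raise ValueError(
            f"autonomy must be a finite value in [0.0, 1.0], got {value!r}"
        )
    # Illustrative thresholds only:
    if value == 0.0:
        return "locked"
    if value < 0.5:
        return "supervised"
    if value < 1.0:
        return "semi"
    return "full"
```

With this guard, `-0.1`, `2.0`, `nan`, and `inf` all raise a clear ValueError instead of silently mapping to LOCKED or FULL.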
src/ai_company/security/service.py (Outdated)
```python
# Autonomy pre-check: route based on effective autonomy before
# the full rule engine. Hard-deny is always checked first.
autonomy_result = await self._apply_autonomy_precheck(context)
if autonomy_result is not None:
    return autonomy_result
```
The autonomy pre-check short-circuits the rule engine (evaluate_pre_tool() returns early when _apply_autonomy_precheck() yields a verdict). This bypasses the rule engine’s detectors (credential/path traversal/data leak, etc.) even for actions that are “auto-approved” by autonomy presets, which can materially weaken security guarantees compared to the existing PolicyValidator behavior (auto-approve should not skip remaining detection rules). Consider always running the rule engine first, then applying autonomy routing as a post-processing step (e.g., convert ALLOW→ESCALATE when autonomy requires human approval), while still respecting DENY/ESCALATE produced by detectors.
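The suggested ordering can be sketched as: run the rule engine unconditionally, then let autonomy only tighten an ALLOW. The function shape is illustrative, not the real SecOpsService API:

```python
from typing import Callable, Set


def evaluate_pre_tool(
    context: dict,
    run_rules: Callable[[dict], str],
    human_approval_actions: Set[str],
) -> str:
    """Detectors always run; autonomy routing only tightens an ALLOW."""
    verdict = run_rules(context)  # "ALLOW" / "DENY" / "ESCALATE"
    if verdict != "ALLOW":
        return verdict  # DENY/ESCALATE from the rule engine always wins
    if context["action_type"] in human_approval_actions:
        return "ESCALATE"  # upgrade: autonomy requires human approval
    return verdict
```

Auto-approval then never skips the credential/path-traversal/data-leak detectors; it simply leaves an already-clean ALLOW in place.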
Actionable comments posted: 37
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/ai_company/engine/agent_engine.py (1)
683-726: ⚠️ Potential issue | 🔴 Critical

Don't make autonomy enforcement conditional on SecurityConfig.

effective_autonomy only reaches SecOpsService, but _make_security_interceptor() returns None when security is absent/disabled. In that branch _make_tool_invoker() still builds a ToolInvoker, so actions that should park or require approval can run with only the static tool-permission check. Fail closed here, or provide an autonomy-only interceptor.

🚫 Minimal fail-closed fix

```diff
 def _make_security_interceptor(
     self,
     effective_autonomy: EffectiveAutonomy | None = None,
 ) -> SecurityInterceptionStrategy | None:
     """Build the SecOps security interceptor if configured."""
     if self._security_config is None:
+        if effective_autonomy is not None:
+            msg = (
+                "effective_autonomy cannot be enforced without SecurityConfig"
+            )
+            logger.error(SECURITY_DISABLED, note=msg)
+            raise ExecutionStateError(msg)
         logger.warning(
             SECURITY_DISABLED,
             note="No SecurityConfig provided — all security checks skipped",
         )
         return None
     if not self._security_config.enabled:
+        if effective_autonomy is not None:
+            msg = (
+                "effective_autonomy cannot be enforced when security is disabled"
+            )
+            logger.error(SECURITY_DISABLED, note=msg)
+            raise ExecutionStateError(msg)
         return None
```

Also applies to: 728-741
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/engine/agent_engine.py` around lines 683-726: the current _make_security_interceptor returns None when SecurityConfig is missing/disabled, which disables autonomy enforcement; change it to fail closed by returning an autonomy-only interceptor that implements SecurityInterceptionStrategy (instead of None) so autonomy rules still apply even when other security detectors are off. Locate _make_security_interceptor and the branches that currently return None for self._security_config is None or not self._security_config.enabled, and replace them with construction/return of a minimal interceptor (e.g., an AutonomyEnforcementInterceptor or a SecOpsService instance configured only with effective_autonomy, self._approval_store, and self._audit_log and no detectors) so _make_tool_invoker can still rely on this interceptor to park/require approval according to effective_autonomy.

tests/unit/persistence/test_protocol.py (1)
126-186: ⚠️ Potential issue | 🟡 Minor

Add the missing ParkedContextRepository conformance check.

test_fake_backend_is_persistence_backend() only proves that parked_contexts exists. It does not verify that _FakeParkedContextRepository itself satisfies the repository protocol, so signature drift on the fake can slip through this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/persistence/test_protocol.py` around lines 126 - 186, Update the test to explicitly verify that the fake repository type matches the repository protocol: import ParkedContextRepository and add a conformance assertion for _FakeParkedContextRepository (e.g., assert isinstance(_FakeParkedContextRepository(), ParkedContextRepository) or an equivalent runtime/type-check that your test suite uses) inside test_fake_backend_is_persistence_backend so that signature drift on _FakeParkedContextRepository is caught; reference the symbols _FakeParkedContextRepository and ParkedContextRepository when adding the check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/api/controllers/autonomy.py`:
- Around line 73-81: get_autonomy() and update_autonomy() currently ignore
per-agent/department overrides and pending requests by always using
config.autonomy.level and flipping promotion_pending manually; replace this by
routing both handlers through the AutonomyResolver and AutonomyChangeStrategy so
they compute the effective level and validation logic, consult and update the
persistent pending-request store, and emit events only after the strategy
decides (e.g., persist pending vs deny vs apply). Specifically: in
get_autonomy() return AutonomyLevelResponse built from
AutonomyResolver.resolve(agent_id) plus promotion_pending read from the
pending-request store; in update_autonomy() call
AutonomyChangeStrategy.request_change(agent_id, requested_level, actor) (or the
resolver API) to validate/apply/persist the change, persist any pending request
in the store, and return the resulting effective level and promotion_pending
flag instead of echoing config.autonomy.level; ensure AUTONOMY_PROMOTION_DENIED
is emitted only when the strategy denies, not when a request is simply pending.
In `@src/ai_company/core/agent.py`:
- Around line 320-323: The AgentIdentity model currently allows an invalid
combination (level == JUNIOR with autonomy_level == FULL); add a Pydantic
validator (e.g., a root_validator or field validator) on the AgentIdentity class
to detect when level is AutonomyLevel.JUNIOR and autonomy_level is
AutonomyLevel.FULL and raise a ValueError describing the forbidden combination;
reference the autonomy_level and level fields in the check so construction fails
fast and prevents creating AgentIdentity(level=JUNIOR, autonomy_level=FULL).
In `@src/ai_company/core/company.py`:
- Around line 373-383: The before-validator _coerce_autonomy_float should reject
non-finite or out-of-range legacy numeric autonomy values before converting
them; import isfinite from math and, inside _coerce_autonomy_float, when raw is
a numeric (and not bool) first check isfinite(raw) and that raw is within the
valid range (e.g. 0.0 <= raw <= 1.0), and if not raise a ValueError with a clear
message instead of coercing; if it passes, continue to call
_float_to_autonomy_level and return the coerced dict as before.
In `@src/ai_company/engine/prompt_template.py`:
- Around line 144-154: The DEFAULT_TEMPLATE was modified to add autonomy fields
but PROMPT_TEMPLATE_VERSION was not updated; update the PROMPT_TEMPLATE_VERSION
constant (referenced as PROMPT_TEMPLATE_VERSION) to a new semver (e.g., "1.4.0")
so cache/snapshot/telemetry can distinguish the autonomy-aware template, and
ensure any tests or places that import PROMPT_TEMPLATE_VERSION are updated
accordingly; locate the constant near the top of the module and bump it to the
new version to match the DEFAULT_TEMPLATE change.
In `@src/ai_company/engine/prompt.py`:
- Around line 399-404: The projection of EffectiveAutonomy into ctx omits the
security_agent field; update the block that sets ctx["effective_autonomy"] (the
code using effective_autonomy.level, auto_approve_actions,
human_approval_actions) to also include security_agent (e.g., "security_agent":
effective_autonomy.security_agent or its serializable representation) so
templates receive the escalation reviewer info along with level and actions.
In `@src/ai_company/persistence/protocol.py`:
- Around line 114-116: The class docstring for PersistenceBackend is missing the
new public API attribute parked_contexts; update the PersistenceBackend
docstring Attributes section to add an entry for parked_contexts (similar style
to existing entries like collaboration_metrics) describing that it returns a
ParkedContextRepository for ParkedContext persistence and ensure it appears
after collaboration_metrics so generated docs reflect the new property.
In `@src/ai_company/security/autonomy/change_strategy.py`:
- Around line 92-108: The code unconditionally sets current_level =
_DOWNGRADE_MAP[reason], which can raise an agent's autonomy if a later downgrade
maps to a less restrictive level; modify auto_downgrade() logic so when
self._overrides.get(agent_id) exists you do not increase autonomy: compute
new_level = _DOWNGRADE_MAP[reason] but if existing is present set current_level
= the more restrictive of existing.current_level and new_level (i.e., do not
replace an equal-or-more-restrictive level such as LOCKED with a
less-restrictive one); update the AutonomyOverride creation to use that
non-escalating current_level and leave original_level unchanged (use symbols:
_DOWNGRADE_MAP, auto_downgrade(), self._overrides, AutonomyOverride).
In `@src/ai_company/security/autonomy/models.py`:
- Around line 62-135: BUILTIN_PRESETS and AutonomyConfig.presets must be
immutable; change BUILTIN_PRESETS to an immutable Mapping (wrap the literal dict
in types.MappingProxyType) and update AutonomyConfig.presets to use a
Mapping[str, AutonomyPreset] type and a default_factory that returns an
immutable deep-copy of the builtin mapping (e.g., return
types.MappingProxyType(copy.deepcopy(BUILTIN_PRESETS))). Import copy and
types.MappingProxyType and ensure you reference BUILTIN_PRESETS, AutonomyConfig,
and the presets field when making these edits.
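A sketch of the immutability change for the presets mapping; the preset contents are illustrative, not the real BUILTIN_PRESETS:

```python
import copy
from types import MappingProxyType

# Hypothetical preset shapes; the real presets hold action-pattern tuples.
_BUILTIN = {
    "semi": {"auto_approve": ("code.*", "vcs.*")},
    "full": {"auto_approve": ("*",)},
}

# Read-only view: callers cannot add or replace presets on the builtin map.
BUILTIN_PRESETS = MappingProxyType(_BUILTIN)


def default_presets() -> MappingProxyType:
    """default_factory for a presets field: an immutable deep copy.

    Deep-copying keeps per-config mutation of nested values from leaking
    back into the shared builtin mapping.
    """
    return MappingProxyType(copy.deepcopy(_BUILTIN))
```

MappingProxyType rejects item assignment with a TypeError, and the deep copy means two configs never share nested preset objects.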
In `@src/ai_company/security/autonomy/resolver.py`:
- Around line 50-109: The resolve method and related pattern expansion exceed
the 50-line limit and do multiple responsibilities; refactor by extracting
preset lookup/validation and per-pattern expansion into small helper methods
(e.g., create a _get_preset_or_raise(level) that encapsulates the preset lookup,
warning log and ValueError, and a _expand_pattern_list(patterns) that wraps
_expand_patterns per-item branching), update resolve to call
_get_preset_or_raise and _expand_pattern_list, keep validate_seniority usage and
EffectiveAutonomy construction unchanged, and ensure new helpers are
unit-testable and maintain existing log fields (AUTONOMY_RESOLVED,
resolved_level, agent_override, department_override, counts).
In `@src/ai_company/security/service.py`:
- Around line 148-152: The early return after calling _apply_autonomy_precheck
prevents context from reaching self._rule_engine.evaluate(context); change the
flow so that _apply_autonomy_precheck is used only to short-circuit on an
explicit hard-deny, but otherwise do not return early—always call
self._rule_engine.evaluate(context) when autonomy_result is not a hard-deny, and
then merge or reconcile autonomy_result.risk_level (or related fields) into the
rule engine result so the final decision preserves the full security assessment
(use _apply_autonomy_precheck, _check_autonomy, and self._rule_engine.evaluate
to locate and implement the merge/short-circuit logic).
In `@src/ai_company/security/timeout/config.py`:
- Around line 82-86: The tiers mapping currently typed as dict[str, TierConfig]
allows typos (e.g., "critcal") which silently fall back at runtime; change the
field to use a constrained key type and validate entries explicitly: replace
dict[str, TierConfig] with dict[Literal["low","medium","high","critical"],
TierConfig] (or add a validator on the tiers field) in the model that declares
policy and tiers, and add a pydantic validator that raises a clear
ValidationError if any key is not one of the allowed risk levels
(low/medium/high/critical) so mis-typed config keys fail fast with a helpful
message.
- Around line 57-64: TierConfig.on_timeout and
EscalationChainConfig.on_chain_exhausted currently allow
TimeoutActionType.ESCALATE even though the models don't capture an escalate_to
target, which later causes TimeoutAction(action=ESCALATE, ...) to be constructed
without escalate_to in TieredTimeoutPolicy and EscalationChainPolicy; add
validation on TierConfig and EscalationChainConfig (or their pydantic model
validators) to reject or coerce ESCALATE when no escalate_to is provided: check
the fields TierConfig.on_timeout and EscalationChainConfig.on_chain_exhausted
for TimeoutActionType.ESCALATE and raise a validation error (or change default
to DENY) when escalate target/role is absent so downstream code in
TieredTimeoutPolicy and EscalationChainPolicy will never receive an ESCALATE
action without an escalate_to.
In `@src/ai_company/security/timeout/factory.py`:
- Around line 47-51: When building a TieredTimeoutPolicy from a
TieredTimeoutConfig, validate each tier in config.tiers and reject any tier
whose on_timeout/action is ESCALATE but which lacks an escalate_to target; raise
a clear exception (e.g., ValueError) describing the offending tier so the config
fails fast. Perform this check in the factory branch that returns
TieredTimeoutPolicy (where TieredTimeoutConfig is handled and
DefaultRiskTierClassifier is used) before constructing the TieredTimeoutPolicy.
- Around line 59-63: The logger.warning call in timeout.factory uses a raw event
name string ("timeout.factory.unknown_config"); replace it with a domain event
constant by adding/exporting a timeout event constant (e.g.,
TIMEOUT_FACTORY_UNKNOWN_CONFIG) under ai_company.observability.events.timeout
and then import it into src.ai_company.security.timeout.factory (from
ai_company.observability.events.timeout import TIMEOUT_FACTORY_UNKNOWN_CONFIG)
and use that constant in the logger.warning call (keep the same config_type
kwarg). Also remove the unused msg variable if no longer needed.
- Around line 47-50: TieredTimeoutConfig.tiers is a mutable dict on a frozen
Pydantic model; avoid sharing it directly with the runtime by deep-copying and
making it read-only before constructing the policy. In the branch that returns
TieredTimeoutPolicy, replace passing config.tiers directly with a deep copy
(copy.deepcopy(config.tiers)) and wrap the result with MappingProxyType to
produce an immutable mapping, and ensure you import copy and
types.MappingProxyType; apply this change where TieredTimeoutPolicy(...) is
constructed (referencing TieredTimeoutConfig, TieredTimeoutPolicy,
DefaultRiskTierClassifier, and the tiers field).
In `@src/ai_company/security/timeout/park_service.py`:
- Around line 37-83: After serializing the AgentContext in park(), validate that
the extracted internal identifiers match the outer fields: compare
context.execution_id, context.agent_id, and context.task_id against the values
being stored in ParkedContext (execution_id, agent_id, task_id) and raise
ValueError if any mismatch; when creating ParkedContext ensure you store the
canonical values only after this check. Likewise, in resume() when you
deserialize context_json back to an AgentContext, verify that the deserialized
context.execution_id, context.agent_id, and context.task_id match the
ParkedContext.execution_id, ParkedContext.agent_id, and ParkedContext.task_id
and reject/resist resumption if they differ. Ensure checks reference the
ParkedContext class and the park() and resume() methods and keep metadata
handling unchanged.
In `@src/ai_company/security/timeout/parked_context.py`:
- Around line 29-44: The metadata dict on the frozen Pydantic model
ParkedContext can still be mutated by callers; fix this by deep-copying and
wrapping it with MappingProxyType at creation so it becomes immutable. Add a
field validator for "metadata" (e.g., a `@field_validator`("metadata") on
ParkedContext) that does: metadata = copy.deepcopy(metadata) and return
MappingProxyType(metadata); ensure imports for copy and MappingProxyType are
added and that the validator runs during model instantiation so persisted
snapshots cannot be mutated after creation.
In `@src/ai_company/security/timeout/policies.py`:
- Around line 148-149: determine_action() currently always uses
self._classifier.classify(item.action_type) and ignores TierConfig.actions;
change the logic to first check each TierConfig in self._tiers for a non-empty
actions tuple containing item.action_type and select that tier_config if found,
and only if no TierConfig.actions match fall back to calling
self._classifier.classify(item.action_type) and lookup tier_config by
risk_level.value; update references to tier_config, self._classifier.classify,
and TierConfig.actions accordingly so pinned actions are honored.
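The selection order described above (pinned actions first, classifier as fallback) can be sketched with simplified stand-ins for `TierConfig` and the classifier:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TierConfig:  # simplified stand-in for the real TierConfig
    timeout_minutes: int
    on_timeout: str
    actions: tuple = ()

class StubClassifier:
    """Stand-in classifier: every unpinned action falls back to HIGH."""
    def classify(self, action_type: str) -> str:
        return "high"

def select_tier(tiers: dict, classifier, action_type: str) -> TierConfig:
    """Pinned actions win; the classifier is only a fallback."""
    for cfg in tiers.values():
        if cfg.actions and action_type in cfg.actions:
            return cfg
    return tiers[classifier.classify(action_type)]

tiers = {
    "low": TierConfig(240, "approve", actions=("docs:write",)),
    "high": TierConfig(30, "wait"),
}
assert select_tier(tiers, StubClassifier(), "docs:write").on_timeout == "approve"
assert select_tier(tiers, StubClassifier(), "secrets:write").on_timeout == "wait"
```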
In `@src/ai_company/security/timeout/risk_tier_classifier.py`:
- Around line 12-45: Remove the duplicated _DEFAULT_RISK_MAP definition and
instead import and reuse the shared risk map from the existing risk classifier
module (e.g., import DEFAULT_RISK_MAP or the exported map from risk_classifier).
Update references in this file (any uses of _DEFAULT_RISK_MAP, ApprovalRiskLevel
and ActionType lookups) to use the imported map so timeout classification and
security use the single source of truth defined in risk_classifier.py.
In `@src/ai_company/security/timeout/timeout_checker.py`:
- Around line 36-68: Add a guard at the start of check to skip policy evaluation
for items whose ApprovalItem.status is not ApprovalStatus.PENDING: return a
no-op/neutral TimeoutAction (and log the skip) instead of calling
self._policy.determine_action; also ensure check_and_resolve performs the same
status guard before applying any resolution so already-APPROVED/REJECTED/EXPIRED
items cannot be overwritten by timeout logic.
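The status guard can be sketched as follows (`ApprovalStatus` mirrors the source enum; the neutral `"wait"` return value is an assumed representation of the no-op `TimeoutAction`):

```python
from enum import Enum
from types import SimpleNamespace

class ApprovalStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

def check(item, policy, elapsed_seconds: float) -> str:
    """Skip policy evaluation for anything that is no longer pending,
    so resolved items are never overwritten by timeout logic."""
    if item.status is not ApprovalStatus.PENDING:
        return "wait"  # neutral no-op action (assumed representation)
    return policy.determine_action(item, elapsed_seconds)

policy = SimpleNamespace(determine_action=lambda item, s: "deny")
assert check(SimpleNamespace(status=ApprovalStatus.APPROVED), policy, 9999) == "wait"
assert check(SimpleNamespace(status=ApprovalStatus.PENDING), policy, 9999) == "deny"
```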
In `@src/ai_company/templates/renderer.py`:
- Around line 526-529: The branch that handles raw_autonomy dicts reuses the
parsed dict instance (raw_autonomy) and assigns it directly to autonomy, causing
aliasing if later normalization/validation mutates it; change the assignment to
make a deep copy (e.g., autonomy = copy.deepcopy(raw_autonomy)) and add an
import for the copy module at the top of the file so the config tree always
receives an isolated dict instance (refer to raw_autonomy and the autonomy
variable in renderer.py).
- Around line 675-678: The code may pass a non-string YAML scalar as preset_name
into get_personality_preset (which calls .strip()), causing AttributeError
instead of the renderer's TemplateRenderError; before calling
get_personality_preset(preset_name) validate that preset_name is an instance of
str and, if not, raise TemplateRenderError with a clear message about invalid
preset type; keep the existing KeyError handling for unknown preset names so the
try/except around get_personality_preset still catches KeyError but the type
check prevents AttributeError from escaping.
- Around line 68-70: The module-level Jinja2 filter "auto" on _JINJA_ENV
incorrectly uses "value or ''" which collapses valid falsy values like 0 or
False; change the filter to only treat None or Jinja2 Undefined as missing —
i.e., import Jinja2's Undefined and update the lambda for
_JINJA_ENV.filters["auto"] to return "" when value is None or isinstance(value,
Undefined), otherwise return the original value (preserving 0, 0.0, False).
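The filter fix can be illustrated without Jinja2 (a local stand-in class replaces `jinja2.Undefined` here; the real filter should test against the imported Jinja2 type):

```python
class Undefined:
    """Stand-in for jinja2.Undefined, used only for this sketch."""

def auto(value):
    # Treat only None and Undefined as missing; keep falsy values intact.
    if value is None or isinstance(value, Undefined):
        return ""
    return value

assert auto(0) == 0          # previously collapsed to "" by `value or ''`
assert auto(False) is False  # likewise preserved now
assert auto(None) == ""
assert auto(Undefined()) == ""
```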
In `@tests/unit/api/conftest.py`:
- Around line 198-211: The fake in-memory repo stores and returns the same
ParkedContext instances which share mutable metadata dicts; modify the methods
in the test fake (save, get, get_by_approval, get_by_agent) to deepcopy
ParkedContext objects at the persistence boundary (use import copy at top), i.e.
store copy.deepcopy(context) inside save and return copy.deepcopy(...) from get,
get_by_approval, and get_by_agent so callers receive independent copies
mirroring SQLiteParkedContextRepository serialization behavior.
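A fake repository that copies at the persistence boundary can be sketched like this (`FakeParked` is a hypothetical stand-in for `ParkedContext`; only `save`/`get` are shown, with `get_by_approval` and `get_by_agent` following the same pattern):

```python
import copy
from dataclasses import dataclass, field

@dataclass
class FakeParked:  # hypothetical stand-in for ParkedContext
    id: str
    metadata: dict = field(default_factory=dict)

class FakeParkedContextRepo:
    """In-memory fake that deep-copies at the persistence boundary,
    mirroring the serialization behavior of the SQLite repository."""
    def __init__(self) -> None:
        self._rows: dict = {}

    def save(self, context: FakeParked) -> None:
        self._rows[context.id] = copy.deepcopy(context)

    def get(self, context_id: str):
        row = self._rows.get(context_id)
        return copy.deepcopy(row) if row is not None else None

repo = FakeParkedContextRepo()
original = FakeParked(id="p1", metadata={"k": "v"})
repo.save(original)
original.metadata["k"] = "mutated"          # caller mutation after save...
assert repo.get("p1").metadata["k"] == "v"  # ...is invisible to later readers
```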
In `@tests/unit/engine/test_loop_protocol.py`:
- Around line 33-36: Add tests exercising the new PARKED rule: create one
passing test that constructs an ExecutionResult with
termination=TerminationReason.PARKED and asserts that result.error_message is
None (or that validation succeeds), and add one failing test that attempts to
create/validate an ExecutionResult with termination=TerminationReason.PARKED and
a non-None error_message and asserts the operation raises the expected
validation exception (e.g., ValueError or AssertionError); reference
ExecutionResult and TerminationReason.PARKED in the new tests so the
PARKED->error_message=None contract is enforced.
In `@tests/unit/persistence/sqlite/test_migrations.py`:
- Around line 94-115: Update the two tests to validate the parked_contexts
schema, not just names: after run_migrations in
test_v3_creates_parked_contexts_table call "PRAGMA table_info(parked_contexts)"
and assert the returned column names include the repository-required columns
(e.g., "id", "agent_id", "approval_id", plus any expected timestamp or payload
columns your code depends on); in test_v3_creates_parked_context_indexes after
finding index names run "PRAGMA index_info('idx_pc_agent_id')" and "PRAGMA
index_info('idx_pc_approval_id')" and assert those index_info results reference
the proper indexed columns ("agent_id" and "approval_id" respectively); keep
using run_migrations and the same test function names to locate where to add
these assertions.
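The PRAGMA-based assertions can be tried against a toy schema (the column set below is illustrative, not the real migration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parked_contexts ("
    "id TEXT PRIMARY KEY, agent_id TEXT, approval_id TEXT, context_json TEXT)"
)
conn.execute("CREATE INDEX idx_pc_agent_id ON parked_contexts(agent_id)")

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
cols = {row[1] for row in conn.execute("PRAGMA table_info(parked_contexts)")}
assert {"id", "agent_id", "approval_id"} <= cols

# PRAGMA index_info rows are (seqno, cid, name).
indexed = [row[2] for row in conn.execute("PRAGMA index_info('idx_pc_agent_id')")]
assert indexed == ["agent_id"]
```

Asserting on the indexed column names, not just the index names, is what catches an index accidentally created over the wrong column.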
In `@tests/unit/persistence/sqlite/test_parked_context_repo.py`:
- Around line 7-18: Add a module-level 30-second timeout by defining pytestmark
= pytest.mark.timeout(30) near the top of the test file (after the imports) so
all async tests in this module (which use SQLiteParkedContextRepository and
ParkedContext) get a global timeout; use the pytestmark symbol and
pytest.mark.timeout to implement this.
In `@tests/unit/persistence/test_migrations_v2.py`:
- Around line 31-32: The test test_schema_version_is_three currently only
asserts SCHEMA_VERSION == 3 and therefore misses verifying that v3 artifacts
were actually created; update this test to also query the database schema to
assert the parked_contexts table exists and that the two new indexes
idx_pc_agent_id and idx_pc_approval_id are present (on the upgrade path where
_apply_v3() should run). Locate test_schema_version_is_three and after
confirming SCHEMA_VERSION, run the same connection/PRAGMA or sqlite_master
queries used elsewhere in the suite to verify existence of the parked_contexts
table and that entries for idx_pc_agent_id and idx_pc_approval_id exist; fail
the test if any of those are missing so partial or skipped _apply_v3() is
caught. Ensure you reference the same DB handle/fixture used by other migration
tests so the checks run against the upgraded DB instance.
In `@tests/unit/security/autonomy/test_change_strategy.py`:
- Around line 62-70: Update the test_double_downgrade_preserves_original to
ensure the original_level is preserved: when using HumanOnlyPromotionStrategy
call auto_downgrade the first time with an explicit current_level (e.g.,
AutonomyLevel.HIGH or similar) for agent-1, then call auto_downgrade a second
time with a different DowngradeReason, fetch the override via get_override and
add an assertion that override.original_level equals the explicit level you
seeded; keep the existing assertions for override.current_level and
override.reason to verify the second downgrade replaced only the current state.
In `@tests/unit/security/autonomy/test_models.py`:
- Around line 122-125: The test_config_frozen currently only checks assignment
to the AutonomyConfig.level attribute but misses in-place mutation of the
mutable AutonomyConfig.presets dict; update the test_config_frozen to attempt an
in-place change to config.presets (e.g., mutating an existing key or adding a
key) and assert that this raises an exception or is prevented, and if underlying
model does not yet protect presets, change the AutonomyConfig construction to
deep-copy incoming presets and wrap them with MappingProxyType (using
copy.deepcopy in the AutonomyConfig __init__ or validator) so presets is
read-only; reference AutonomyConfig, presets, test_config_frozen, copy.deepcopy,
and MappingProxyType when making the fixes.
In `@tests/unit/security/autonomy/test_resolver.py`:
- Around line 106-129: Add a test that exercises the public API by calling
resolver.resolve(...) to ensure seniority enforcement is applied there as well:
in TestSeniorityValidation add a case that calls
resolver.resolve(agent_level=AutonomyLevel.FULL,
seniority=SeniorityLevel.JUNIOR) and asserts it raises ValueError (matching
"FULL autonomy"); keep existing validate_seniority() checks but include this
resolve(...) call so the public resolve method is validated for the same JUNIOR
+ FULL rejection.
In `@tests/unit/security/test_service.py`:
- Around line 489-579: Add a new async test (e.g.,
test_auto_approve_blocked_for_high_or_critical_risk) that constructs an
EffectiveAutonomy with the action present in auto_approve_actions, then creates
a context for that action with a HIGH (and/or CRITICAL) risk level (use
_make_context(action_type="...", risk_level=RiskLevel.HIGH) or otherwise mock
the risk classifier), calls service.evaluate_pre_tool(ctx) and asserts the
result is NOT SecurityVerdictType.ALLOW and that auto-approval was not used
(e.g., service._test_rule_engine.evaluate.assert_called_once() or that the
verdict.reason mentions escalation/review); reference EffectiveAutonomy,
auto_approve_actions, evaluate_pre_tool, and SecurityVerdictType.ALLOW when
adding the test.
In `@tests/unit/security/timeout/test_factory.py`:
- Around line 30-47: Update each test to assert behavior/wiring, not just type:
after calling create_timeout_policy with DenyOnTimeoutConfig(timeout_minutes=60)
assert the returned DenyOnTimeoutPolicy has its internal timeout represented as
3600 seconds (verify the concrete attribute on DenyOnTimeoutPolicy that stores
seconds); for TieredTimeoutConfig assert the returned TieredTimeoutPolicy
preserved the provided tier configuration (compare the policy's tiers/config
property to the original TieredTimeoutConfig values); for EscalationChainConfig
assert the returned EscalationChainPolicy preserved the chain and
on_chain_exhausted values (verify the policy.chain contains the
EscalationStep(role="lead", timeout_minutes=30) data and
policy.on_chain_exhausted equals TimeoutActionType.DENY). Ensure you reference
create_timeout_policy, DenyOnTimeoutConfig, DenyOnTimeoutPolicy,
TieredTimeoutConfig, TieredTimeoutPolicy, EscalationChainConfig,
EscalationChainPolicy, EscalationStep, and TimeoutActionType when locating the
fields to assert.
In `@tests/unit/security/timeout/test_park_service.py`:
- Around line 75-92: The test test_resume_restores_context uses
_make_agent_context() which returns an AgentContext with task_execution=None, so
add a real task-bound context before parking: build an AgentContext with a
non-None task_execution containing a Task (with id "task-1" or similar) and use
that context when calling ParkService().park(...); after resume, assert that
restored.task_execution is not None and that restored.task_execution.task.id
equals the original task id to ensure the task survives the round-trip through
ParkService.park and ParkService.resume.
In `@tests/unit/security/timeout/test_parked_context.py`:
- Around line 54-59: Update the frozen-model test and the persistence
deserialization: in the test_frozen() for the ParkedContext model, add an
assertion that an in-place mutation like parked.metadata["key"] = "value" either
raises (or does not change the model when re-fetched) to cover dict mutation on
a frozen Pydantic model; and in the persistence layer method _row_to_model (the
function that converts DB rows into ParkedContext instances) wrap the
deserialized metadata with copy.deepcopy(json.loads(raw_meta)) before
constructing the ParkedContext so the model receives a deep-copied dict and
in-place mutations at call sites cannot mutate shared state.
In `@tests/unit/security/timeout/test_policies.py`:
- Around line 89-141: Add tests that ensure the HIGH and CRITICAL tiers cannot
be auto-approved even if their TierConfig.on_timeout is set to APPROVE: create
TieredTimeoutPolicy instances with tiers like {"high":
TierConfig(timeout_minutes=1, on_timeout=TimeoutActionType.APPROVE)} and
{"critical": TierConfig(...)} and use DefaultRiskTierClassifier plus
_make_item(...) with an action_type that maps to HIGH/CRITICAL (e.g.,
"secrets:write" or another classifier-recognized high/critical action); call
policy.determine_action(item, elapsed_seconds) with elapsed_seconds >
timeout_minutes*60 and assert the returned result.action is
TimeoutActionType.WAIT (repeat or parametrize for both HIGH and CRITICAL to
cover both branches).
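The invariant under test (HIGH/CRITICAL never auto-approve on timeout, whatever the tier config says) can be distilled into a sketch; the function and string-valued levels are assumptions, not the real policy API:

```python
def resolve_on_timeout(risk_level: str, configured: str) -> str:
    """HIGH/CRITICAL approvals are never auto-approved on timeout,
    regardless of the configured on_timeout action (assumed safety rule)."""
    if configured == "approve" and risk_level in ("high", "critical"):
        return "wait"
    return configured

assert resolve_on_timeout("high", "approve") == "wait"
assert resolve_on_timeout("critical", "approve") == "wait"
assert resolve_on_timeout("low", "approve") == "approve"
```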
In `@tests/unit/security/timeout/test_risk_tier_classifier.py`:
- Around line 1-64: Add the module-level pytest timeout marker by defining
pytestmark = pytest.mark.timeout(30) immediately after the imports in this test
module; update the top of the file (near the imports that include pytest and
DefaultRiskTierClassifier) so the module-level marker applies to all tests (no
changes needed to DefaultRiskTierClassifier or individual test functions).
---
Outside diff comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 683-726: The current _make_security_interceptor returns None when
SecurityConfig is missing/disabled which disables autonomy enforcement; change
it to fail-closed by returning an autonomy-only interceptor that implements
SecurityInterceptionStrategy (instead of None) so autonomy rules still apply
even when other security detectors are off: locate _make_security_interceptor
and where it currently returns None for self._security_config is None or not
self._security_config.enabled, and replace that branch with construction/return
of a minimal interceptor (e.g., an AutonomyEnforcementInterceptor or a
SecOpsService instance configured only with effective_autonomy,
self._approval_store, and self._audit_log and no detectors) so
_make_tool_invoker can still rely on this interceptor to park/require approval
according to effective_autonomy.
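The fail-closed shape being requested might look like this (a sketch under stated assumptions: `AutonomyOnlyInterceptor` and the string verdicts are hypothetical, and the enabled branch is elided):

```python
from types import SimpleNamespace

class AutonomyOnlyInterceptor:
    """Hypothetical minimal interceptor: autonomy rules, no detectors."""
    def __init__(self, effective_autonomy) -> None:
        self._autonomy = effective_autonomy

    def evaluate(self, action_type: str) -> str:
        if action_type in self._autonomy.auto_approve_actions:
            return "allow"
        return "require_approval"  # fail closed for anything not pre-approved

def make_security_interceptor(security_config, effective_autonomy):
    """Never return None: autonomy enforcement survives disabled security."""
    if security_config is None or not security_config.enabled:
        return AutonomyOnlyInterceptor(effective_autonomy)
    raise NotImplementedError("real code builds the full SecOpsService here")

autonomy = SimpleNamespace(auto_approve_actions={"code:read"})
interceptor = make_security_interceptor(None, autonomy)
assert interceptor.evaluate("code:read") == "allow"
assert interceptor.evaluate("deploy:production") == "require_approval"
```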
In `@tests/unit/persistence/test_protocol.py`:
- Around line 126-186: Update the test to explicitly verify that the fake
repository type matches the repository protocol: import ParkedContextRepository
and add a conformance assertion for _FakeParkedContextRepository (e.g., assert
isinstance(_FakeParkedContextRepository(), ParkedContextRepository) or an
equivalent runtime/type-check that your test suite uses) inside
test_fake_backend_is_persistence_backend so that signature drift on
_FakeParkedContextRepository is caught; reference the symbols
_FakeParkedContextRepository and ParkedContextRepository when adding the check.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 45709ea0-f32a-4c74-b095-2e041b61759b
📒 Files selected for processing (59)
- src/ai_company/api/controllers/__init__.py
- src/ai_company/api/controllers/autonomy.py
- src/ai_company/config/schema.py
- src/ai_company/core/agent.py
- src/ai_company/core/company.py
- src/ai_company/core/enums.py
- src/ai_company/engine/agent_engine.py
- src/ai_company/engine/loop_protocol.py
- src/ai_company/engine/prompt.py
- src/ai_company/engine/prompt_template.py
- src/ai_company/observability/events/autonomy.py
- src/ai_company/observability/events/persistence.py
- src/ai_company/observability/events/timeout.py
- src/ai_company/persistence/protocol.py
- src/ai_company/persistence/repositories.py
- src/ai_company/persistence/sqlite/backend.py
- src/ai_company/persistence/sqlite/migrations.py
- src/ai_company/persistence/sqlite/parked_context_repo.py
- src/ai_company/security/autonomy/__init__.py
- src/ai_company/security/autonomy/change_strategy.py
- src/ai_company/security/autonomy/models.py
- src/ai_company/security/autonomy/protocol.py
- src/ai_company/security/autonomy/resolver.py
- src/ai_company/security/service.py
- src/ai_company/security/timeout/__init__.py
- src/ai_company/security/timeout/config.py
- src/ai_company/security/timeout/factory.py
- src/ai_company/security/timeout/models.py
- src/ai_company/security/timeout/park_service.py
- src/ai_company/security/timeout/parked_context.py
- src/ai_company/security/timeout/policies.py
- src/ai_company/security/timeout/protocol.py
- src/ai_company/security/timeout/risk_tier_classifier.py
- src/ai_company/security/timeout/timeout_checker.py
- src/ai_company/templates/renderer.py
- tests/unit/api/conftest.py
- tests/unit/api/controllers/test_autonomy.py
- tests/unit/core/conftest.py
- tests/unit/core/test_company.py
- tests/unit/engine/test_loop_protocol.py
- tests/unit/engine/test_prompt.py
- tests/unit/observability/test_events.py
- tests/unit/persistence/sqlite/test_migrations.py
- tests/unit/persistence/sqlite/test_parked_context_repo.py
- tests/unit/persistence/test_migrations_v2.py
- tests/unit/persistence/test_protocol.py
- tests/unit/security/autonomy/__init__.py
- tests/unit/security/autonomy/test_change_strategy.py
- tests/unit/security/autonomy/test_models.py
- tests/unit/security/autonomy/test_resolver.py
- tests/unit/security/test_service.py
- tests/unit/security/timeout/__init__.py
- tests/unit/security/timeout/test_config.py
- tests/unit/security/timeout/test_factory.py
- tests/unit/security/timeout/test_park_service.py
- tests/unit/security/timeout/test_parked_context.py
- tests/unit/security/timeout/test_policies.py
- tests/unit/security/timeout/test_risk_tier_classifier.py
- tests/unit/security/timeout/test_timeout_checker.py
```python
app_state: AppState = state.app_state
config = app_state.config.config
level = config.autonomy.level
return ApiResponse(
    data=AutonomyLevelResponse(
        agent_id=agent_id,
        level=level,
    ),
)
```
These endpoints never read or write per-agent autonomy state.
get_autonomy() always returns config.autonomy.level with promotion_pending=False, and update_autonomy() only logs before echoing the same level back with promotion_pending=True. That ignores agent overrides, department overrides, active downgrade overrides, seniority validation, and any persisted pending request, so every agent reports the same value and a follow-up GET immediately loses the pending state. It also emits AUTONOMY_PROMOTION_DENIED for requests the API is treating as pending. This needs to go through the actual AutonomyResolver / AutonomyChangeStrategy flow plus a store for pending requests.
Also applies to: 105-132
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/api/controllers/autonomy.py` around lines 73 - 81,
get_autonomy() and update_autonomy() currently ignore per-agent/department
overrides and pending requests by always using config.autonomy.level and
flipping promotion_pending manually; replace this by routing both handlers
through the AutonomyResolver and AutonomyChangeStrategy so they compute the
effective level and validation logic, consult and update the persistent
pending-request store, and emit events only after the strategy decides (e.g.,
persist pending vs deny vs apply). Specifically: in get_autonomy() return
AutonomyLevelResponse built from AutonomyResolver.resolve(agent_id) plus
promotion_pending read from the pending-request store; in update_autonomy() call
AutonomyChangeStrategy.request_change(agent_id, requested_level, actor) (or the
resolver API) to validate/apply/persist the change, persist any pending request
in the store, and return the resulting effective level and promotion_pending
flag instead of echoing config.autonomy.level; ensure AUTONOMY_PROMOTION_DENIED
is emitted only when the strategy denies, not when a request is simply pending.
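A GET handler routed through the resolver plus a pending-request store might be shaped like this (a sketch: `PendingRequestStore` and the dict-shaped response are hypothetical stand-ins for the real store and `AutonomyLevelResponse`):

```python
from types import SimpleNamespace

class PendingRequestStore:
    """Hypothetical store for pending promotion requests."""
    def __init__(self) -> None:
        self._pending: dict = {}

    def mark_pending(self, agent_id: str, level: str) -> None:
        self._pending[agent_id] = level

    def is_pending(self, agent_id: str) -> bool:
        return agent_id in self._pending

def get_autonomy(agent_id: str, resolver, store: PendingRequestStore) -> dict:
    """GET handler shape: effective level comes from the resolver,
    not raw config, and pending state comes from the store."""
    effective = resolver.resolve(agent_id)
    return {
        "agent_id": agent_id,
        "level": effective,
        "promotion_pending": store.is_pending(agent_id),
    }

resolver = SimpleNamespace(resolve=lambda agent_id: "supervised")
store = PendingRequestStore()
store.mark_pending("agent-1", "high")
assert get_autonomy("agent-1", resolver, store)["promotion_pending"] is True
assert get_autonomy("agent-2", resolver, store)["promotion_pending"] is False
```

With this shape, a follow-up GET after `update_autonomy()` keeps reporting `promotion_pending=True` until the strategy actually applies or denies the request.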
```python
@pytest.mark.unit
def test_deny_on_timeout(self) -> None:
    result = create_timeout_policy(DenyOnTimeoutConfig(timeout_minutes=60))
    assert isinstance(result, DenyOnTimeoutPolicy)

@pytest.mark.unit
def test_tiered(self) -> None:
    result = create_timeout_policy(TieredTimeoutConfig())
    assert isinstance(result, TieredTimeoutPolicy)

@pytest.mark.unit
def test_escalation_chain(self) -> None:
    config = EscalationChainConfig(
        chain=(EscalationStep(role="lead", timeout_minutes=30),),
        on_chain_exhausted=TimeoutActionType.DENY,
    )
    result = create_timeout_policy(config)
    assert isinstance(result, EscalationChainPolicy)
```
Assert the factory wiring, not just the concrete class.
These cases only verify dispatch. A regression in the timeout_minutes -> seconds conversion or in forwarding chain / on_chain_exhausted would still pass as long as the returned class stays the same. Please add at least one behavior-level assertion per configurable policy so this suite catches broken wiring as well as wrong type selection.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/security/timeout/test_factory.py` around lines 30 - 47, Update
each test to assert behavior/wiring, not just type: after calling
create_timeout_policy with DenyOnTimeoutConfig(timeout_minutes=60) assert the
returned DenyOnTimeoutPolicy has its internal timeout represented as 3600
seconds (verify the concrete attribute on DenyOnTimeoutPolicy that stores
seconds); for TieredTimeoutConfig assert the returned TieredTimeoutPolicy
preserved the provided tier configuration (compare the policy's tiers/config
property to the original TieredTimeoutConfig values); for EscalationChainConfig
assert the returned EscalationChainPolicy preserved the chain and
on_chain_exhausted values (verify the policy.chain contains the
EscalationStep(role="lead", timeout_minutes=30) data and
policy.on_chain_exhausted equals TimeoutActionType.DENY). Ensure you reference
create_timeout_policy, DenyOnTimeoutConfig, DenyOnTimeoutPolicy,
TieredTimeoutConfig, TieredTimeoutPolicy, EscalationChainConfig,
EscalationChainPolicy, EscalationStep, and TimeoutActionType when locating the
fields to assert.
```python
def test_resume_restores_context(self) -> None:
    """Parks then resumes, verifies round-trip fidelity."""
    context = _make_agent_context()
    service = ParkService()

    parked = service.park(
        context=context,
        approval_id="approval-1",
        agent_id="agent-1",
        task_id="task-1",
    )

    restored = service.resume(parked)

    assert restored.execution_id == context.execution_id
    assert restored.turn_count == context.turn_count
    assert restored.identity.name == context.identity.name
    assert restored.identity.role == context.identity.role
```
Round-trip coverage never exercises a task-bound context.
_make_agent_context() builds AgentContext(task_execution=None), so this test would still pass if park/resume silently dropped the blocked task. Since the feature is parking approval-gated work, please create the context with a real task and assert restored.task_execution.task.id survives the round-trip.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/security/timeout/test_park_service.py` around lines 75 - 92, The
test test_resume_restores_context uses _make_agent_context() which returns an
AgentContext with task_execution=None, so add a real task-bound context before
parking: build an AgentContext with a non-None task_execution containing a Task
(with id "task-1" or similar) and use that context when calling
ParkService().park(...); after resume, assert that restored.task_execution is
not None and that restored.task_execution.task.id equals the original task id to
ensure the task survives the round-trip through ParkService.park and
ParkService.resume.
```python
def test_frozen(self) -> None:
    """Cannot modify fields on a frozen model."""
    parked = _make_parked_context()
    with pytest.raises(ValidationError):
        parked.agent_id = "other"  # type: ignore[misc]
```
🧩 Analysis chain
🌐 Web query:
> In Pydantic v2, does `ConfigDict(frozen=True)` prevent in-place mutation of `dict` or `list` fields, or only attribute reassignment?

💡 Result:
`ConfigDict(frozen=True)` only makes the model itself "faux-immutable" by blocking attribute setting (it disallows `__setattr__`, preventing `model.field = ...`). It does not deep-freeze contained values, so if a field holds a mutable object like a list or dict, in-place mutation still works (e.g., `model.my_list.append(...)`, `model.my_dict["k"] = ...`) because you are mutating the referenced object, not reassigning the attribute. [1], [2]

Implication: if you need true immutability for containers, use immutable types (e.g., tuple, frozenset, immutable mapping wrappers) or wrap/convert during validation. [1], [2]

Sources:
- Pydantic config docs (`frozen` blocks `__setattr__`): https://docs.pydantic.dev/latest/api/config/
- Example noting the underlying `dict` remains mutable even when the model is frozen: https://www.jujens.eu/posts/en/2025/Apr/19/using-custom-classes-pydantic/
🏁 Scripts executed against `Aureliolo/ai-company`:

```shell
cat -n src/ai_company/security/timeout/parked_context.py | head -50

# Check where ParkedContext is used and how metadata is accessed/modified
rg -n "ParkedContext" --type py -A 2 -B 2 | head -100

# Check if metadata is mutated anywhere or if deepcopy is used
rg -n "\.metadata" --type py -B 1 -A 1 | head -80

# Check specifically for ParkedContext metadata mutations or deepcopy usage
rg -n "parked.*metadata|metadata.*parked" --type py -B 2 -A 2

# Check the persistence repo to see if deepcopy is used when loading/saving ParkedContext
cat -n src/ai_company/persistence/sqlite/parked_context_repo.py | sed -n '40,90p'

# Check the complete loading logic for ParkedContext
cat -n src/ai_company/persistence/sqlite/parked_context_repo.py | sed -n '84,115p'

# Find the _row_to_model method
cat -n src/ai_company/persistence/sqlite/parked_context_repo.py | sed -n '1,40p'
rg -n "_row_to_model" src/ai_company/persistence/sqlite/parked_context_repo.py -A 15

# Verify the test file doesn't have any in-place mutation tests for metadata
cat -n tests/unit/security/timeout/test_parked_context.py | sed -n '54,85p'
```
Extend the frozen model test to cover in-place metadata mutation, and apply `copy.deepcopy()` to the metadata field at the persistence boundary.
Pydantic v2's `frozen=True` only prevents attribute reassignment (line 58 checks this), but in-place mutation of dict fields still works. The `metadata` field on the frozen `ParkedContext` model can be mutated via `parked.metadata["key"] = value`, which the current test misses.
Additionally, per coding guidelines, dict/list fields on frozen Pydantic models must use `copy.deepcopy()` at system boundaries. The persistence layer's `_row_to_model()` method (lines 171–181 in `src/ai_company/persistence/sqlite/parked_context_repo.py`) deserializes metadata without a deepcopy, violating this requirement.
Fixes needed:
- Update `test_frozen()` to verify `parked.metadata["key"] = value` raises an error or does not persist
- Apply `copy.deepcopy(json.loads(raw_meta))` in `_row_to_model()` before constructing `ParkedContext`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/security/timeout/test_parked_context.py` around lines 54 - 59,
Update the frozen-model test and the persistence deserialization: in the
test_frozen() for the ParkedContext model, add an assertion that an in-place
mutation like parked.metadata["key"] = "value" either raises (or does not change
the model when re-fetched) to cover dict mutation on a frozen Pydantic model;
and in the persistence layer method _row_to_model (the function that converts DB
rows into ParkedContext instances) wrap the deserialized metadata with
copy.deepcopy(json.loads(raw_meta)) before constructing the ParkedContext so the
model receives a deep-copied dict and in-place mutations at call sites cannot
mutate shared state.
```python
"""Tests for DefaultRiskTierClassifier."""

import pytest

from ai_company.core.enums import ActionType, ApprovalRiskLevel
from ai_company.security.timeout.risk_tier_classifier import DefaultRiskTierClassifier


class TestDefaultMapping:
    """Default risk tier mapping."""

    @pytest.mark.unit
    def test_critical_actions(self) -> None:
        classifier = DefaultRiskTierClassifier()
        expected = ApprovalRiskLevel.CRITICAL
        assert classifier.classify(ActionType.DEPLOY_PRODUCTION) == expected
        assert classifier.classify(ActionType.DB_ADMIN) == expected

    @pytest.mark.unit
    def test_high_actions(self) -> None:
        classifier = DefaultRiskTierClassifier()
        assert classifier.classify(ActionType.VCS_PUSH) == ApprovalRiskLevel.HIGH
        assert classifier.classify(ActionType.CODE_DELETE) == ApprovalRiskLevel.HIGH

    @pytest.mark.unit
    def test_medium_actions(self) -> None:
        classifier = DefaultRiskTierClassifier()
        assert classifier.classify(ActionType.CODE_WRITE) == ApprovalRiskLevel.MEDIUM

    @pytest.mark.unit
    def test_low_actions(self) -> None:
        classifier = DefaultRiskTierClassifier()
        assert classifier.classify(ActionType.CODE_READ) == ApprovalRiskLevel.LOW
        assert classifier.classify(ActionType.TEST_RUN) == ApprovalRiskLevel.LOW


class TestUnknownFallback:
    """Unknown action types default to HIGH (D19)."""

    @pytest.mark.unit
    def test_unknown_defaults_to_high(self) -> None:
        classifier = DefaultRiskTierClassifier()
        assert classifier.classify("unknown:action") == ApprovalRiskLevel.HIGH


class TestCustomMap:
    """Custom risk overrides."""

    @pytest.mark.unit
    def test_custom_override(self) -> None:
        classifier = DefaultRiskTierClassifier(
            custom_map={ActionType.CODE_READ: ApprovalRiskLevel.CRITICAL}
        )
        assert classifier.classify(ActionType.CODE_READ) == ApprovalRiskLevel.CRITICAL

    @pytest.mark.unit
    def test_custom_preserves_defaults(self) -> None:
        classifier = DefaultRiskTierClassifier(
            custom_map={"custom:action": ApprovalRiskLevel.LOW}
        )
        # Default still works.
        assert classifier.classify(ActionType.CODE_READ) == ApprovalRiskLevel.LOW
        # Custom also works.
        assert classifier.classify("custom:action") == ApprovalRiskLevel.LOW
```
🧩 Analysis chain
🏁 Scripts executed against `Aureliolo/ai-company`:

```shell
# Search for timeout marker usage in test files
fd -e py "tests/" -type f | head -20 | xargs grep -l "timeout" 2>/dev/null | head -10

# Look for pytest configuration files
fd -e ini -e toml -e cfg "pytest|setup" -type f | head -10

# Search more broadly for timeout in all test files
rg "pytest\.mark\.timeout|timeout.*30" tests/ -t py -A 2 -B 2

# Check if the specific file has the timeout marker
cat tests/unit/security/timeout/test_risk_tier_classifier.py | head -10
```
Add module-level timeout marker to align with test policy.
Add `pytestmark = pytest.mark.timeout(30)` after the imports. This is required for all unit test files per the coding guidelines and is already used consistently throughout the test suite.
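Applied to the top of the test module, the requested fix is a single line. A sketch of the file header (the import of `DefaultRiskTierClassifier` mirrors what the review says is already there; the rest is just the one-line marker):

```python
import pytest

# Module-level marker: pytest-timeout applies a 30-second limit to every
# test in this file, matching the suite-wide policy.
pytestmark = pytest.mark.timeout(30)
```

Because `pytestmark` is read by pytest at collection time, no individual test function needs its own `@pytest.mark.timeout` decorator.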
Summary of Changes

This pull request significantly enhances the agent system by introducing robust autonomy management and approval timeout mechanisms. It allows for dynamic control over agent decision-making, ensuring that actions requiring human oversight are properly routed and handled within defined timeframes. The changes also include a new persistence layer for parking agent execution contexts, enabling seamless resumption of tasks after approvals. This foundational work improves system security, operational control, and agent reliability.
Code Review
This pull request introduces significant new features for agent autonomy and approval timeouts. The changes are extensive, adding new models, services, persistence layers, and API endpoints. The implementation of autonomy levels with a three-tier resolution chain and pluggable timeout policies is well-structured. The code also includes several important fixes, such as preventing MemoryError from being caught in broad exception blocks and improving the template rendering performance.
My review focuses on a few areas for improvement. The new AutonomyController currently uses the company-wide default autonomy level instead of resolving the agent-specific level, which is a correctness issue. I've also pointed out a couple of smaller maintainability improvements related to magic numbers and logging clarity. Overall, this is a substantial and well-executed feature addition.
Note: Security Review did not run due to the size of the PR.
```python
    """
    app_state: AppState = state.app_state
    config = app_state.config.config
    level = config.autonomy.level
```
This implementation returns the company-wide default autonomy level, not the agent-specific one. The endpoint GET /agents/{agent_id}/autonomy implies it should return the level for the specified agent_id. This could be misleading for clients of the API. Consider fetching the agent's data and using the AutonomyResolver to determine their effective autonomy level.
```python
    """
    app_state: AppState = state.app_state
    config = app_state.config.config
    current_level = config.autonomy.level
```
Similar to get_autonomy, this method uses the company-wide default autonomy level as the current_level. This is likely incorrect as it doesn't reflect the agent's actual current autonomy level, which might have overrides. This could lead to incorrect logging and behavior. You should resolve the agent-specific autonomy level here as well.
```python
    Thresholds: 0.0-0.24 → locked, 0.25-0.49 → supervised,
    0.5-0.79 → semi, 0.8-1.0 → full.
    """
    if value < 0.25:  # noqa: PLR2004
```
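The docstring's threshold table maps the legacy float autonomy onto the four discrete levels. A self-contained sketch of that mapping (the function name and string return values are assumptions; the real migration presumably lives in a config model validator and returns `AutonomyLevel` members):

```python
def autonomy_level_from_float(value: float) -> str:
    """Map the legacy float autonomy (0.0-1.0) onto four discrete levels.

    Thresholds: 0.0-0.24 -> locked, 0.25-0.49 -> supervised,
    0.5-0.79 -> semi, 0.8-1.0 -> full.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"autonomy must be in [0.0, 1.0], got {value}")
    if value < 0.25:
        return "locked"
    if value < 0.5:
        return "supervised"
    if value < 0.8:
        return "semi"
    return "full"
```

The chained `<` comparisons make each boundary a single constant, which is also what triggers the `PLR2004` magic-number suppression flagged in the quoted code.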
```python
        except (sqlite3.Error, aiosqlite.Error) as exc:
            msg = f"Failed to delete parked context {parked_id!r}"
            logger.exception(
                PERSISTENCE_PARKED_CONTEXT_QUERY_FAILED,
```
The log event PERSISTENCE_PARKED_CONTEXT_QUERY_FAILED is used here for a delete operation failure. This is misleading for anyone debugging persistence issues. It would be better to use a more specific event like PERSISTENCE_PARKED_CONTEXT_DELETE_FAILED. You may need to define this new event constant.
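Defining the more specific event is a one-line addition next to the existing constant. A sketch, assuming string event-name constants in an events module (the exact names and dotted convention are assumptions about this codebase):

```python
# Observability event names: one constant per distinct failure mode, so that
# log queries can distinguish read failures from delete failures.
PERSISTENCE_PARKED_CONTEXT_QUERY_FAILED = "persistence.parked_context.query_failed"
PERSISTENCE_PARKED_CONTEXT_DELETE_FAILED = "persistence.parked_context.delete_failed"
```

The delete path would then log `PERSISTENCE_PARKED_CONTEXT_DELETE_FAILED` instead of reusing the query event.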
Implement four autonomy levels (full/semi/supervised/locked) with three-level resolution chain (agent→department→company), per-action classification with category expansion, seniority validation, and runtime changes via pluggable strategy. Add four timeout policies (wait-forever/deny/tiered/escalation-chain) with risk tier classification, parked context persistence, and timeout checker.

- Phase 1: AutonomyLevel/DowngradeReason enums, AutonomyPreset, AutonomyConfig, EffectiveAutonomy, AutonomyResolver, HumanOnlyPromotionStrategy, AutonomyChangeStrategy protocol
- Phase 2: CompanyConfig.autonomy float→AutonomyConfig migration, Department.autonomy_level, AgentIdentity.autonomy_level
- Phase 3: SecOpsService autonomy pre-check (auto-approve/escalate), AgentEngine effective_autonomy param, AutonomyController REST API
- Phase 4: Effective autonomy section in system prompt template
- Phase 5: TimeoutActionType enum, TimeoutPolicy protocol, four policy implementations, discriminated union config, factory
- Phase 6: ParkedContext model, ParkedContextRepository protocol, SQLite implementation, v3 migration
- Phase 7: CompanyConfig.approval_timeout field
- Phase 8: ParkService park/resume, TimeoutChecker, PARKED termination reason

Closes #42, Closes #126
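The agent→department→company resolution chain described in the commit message reduces to a first-non-None lookup. A minimal sketch under that assumption (the `resolve_level` function and its signature are illustrative, not the project's actual `AutonomyResolver` API):

```python
from enum import Enum
from typing import Optional


class AutonomyLevel(Enum):
    LOCKED = "locked"
    SUPERVISED = "supervised"
    SEMI = "semi"
    FULL = "full"


def resolve_level(
    agent_level: Optional[AutonomyLevel],
    department_level: Optional[AutonomyLevel],
    company_level: AutonomyLevel,
) -> AutonomyLevel:
    """Most specific override wins: agent, then department, then company."""
    if agent_level is not None:
        return agent_level
    if department_level is not None:
        return department_level
    # Company-wide default is always set, so the chain never falls through.
    return company_level
```

Making the company level non-optional in the signature encodes the invariant that resolution always terminates with a concrete level.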
Pre-reviewed by 10 agents, 51 findings addressed:

- Fix autonomy controller returning requested level instead of current
- Add disjoint validator on EffectiveAutonomy action sets
- Add escalate_to consistency validator on TimeoutAction
- Add seniority constraint enforcement in AutonomyResolver
- Add MemoryError/RecursionError re-raise in security service
- Fix _row_to_model to raise QueryError instead of returning None
- Rename YamlRiskTierClassifier to DefaultRiskTierClassifier
- Move Jinja2 env to module-level singleton in renderer
- Fix personality mutation pattern (return instead of mutate)
- Add security guard blocking auto-approve for HIGH/CRITICAL risk
- Fix immutability violations (deepcopy metadata, immutable dicts)
- Enumerate columns explicitly in SELECT queries
- Register AutonomyController with app router
- Add comprehensive tests for new code paths
…dates

- Fix circular import in security/autonomy/__init__.py (removed eager AutonomyResolver import that caused core→security→core cycle)
- Fix CompanyConfigFactory to pin approval_timeout=WaitForeverConfig()
- Update prompt template version assertions to 1.4.0
- Fix test_non_pending_item_raises to supply decided_at/decided_by
- Replace assert with restructured control flow in TieredTimeoutPolicy
- Update DESIGN_SPEC.md, CLAUDE.md, README.md for autonomy/timeout docs
Force-pushed from 1855156 to 4deae2a
```python
        except MemoryError, RecursionError:
            raise
```
Same Python 2 except syntax error (second occurrence)
```suggestion
        except (MemoryError, RecursionError):
            raise
```
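For context, the pattern the suggestions converge on: `except A, B:` is Python 2 syntax and a SyntaxError in Python 3, where multiple exception types must be a parenthesized tuple. A minimal standalone sketch of the re-raise-then-fallback shape (not the service's actual code; `evaluate_safely` and its deny-on-failure return are illustrative):

```python
import logging

logger = logging.getLogger(__name__)


def evaluate_safely(check):
    """Run a check, letting unrecoverable errors propagate.

    The tuple form `except (MemoryError, RecursionError):` matches either
    type; the bare `raise` re-raises before the broad handler can swallow
    interpreter-level failures.
    """
    try:
        return check()
    except (MemoryError, RecursionError):
        raise  # never swallow unrecoverable interpreter-level errors
    except Exception:
        logger.exception("check failed; treating as denial")
        return None
```

Ordering matters: the narrow re-raise clause must come before `except Exception`, since clauses are tested top to bottom.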
```python
        except MemoryError, RecursionError:
            raise
```
Same Python 2 except syntax error (third occurrence)
```suggestion
        except (MemoryError, RecursionError):
            raise
```
```python
        except MemoryError, RecursionError:
            raise
```
Same Python 2 except syntax error (fourth occurrence)
```suggestion
        except (MemoryError, RecursionError):
            raise
```
```python
        except MemoryError, RecursionError:
            raise
```
Same Python 2 except syntax error in timeout_checker.py
```suggestion
        except (MemoryError, RecursionError):
            raise
```
```python
            if elapsed_seconds < step_end:
                if idx == 0:
                    # First step hasn't timed out yet — WAIT.
                    logger.debug(
                        TIMEOUT_WAITING,
                        approval_id=item.id,
                        escalation_role=step.role,
                        elapsed_seconds=elapsed_seconds,
                    )
                    return TimeoutAction(
                        action=TimeoutActionType.WAIT,
                        reason=(
                            f"Waiting at {step.role!r} — "
                            f"{elapsed_seconds:.0f}s of "
                            f"{step_end:.0f}s elapsed"
                        ),
                    )
                # Previous step timed out — escalate to this step's role.
                logger.info(
                    TIMEOUT_ESCALATED,
                    approval_id=item.id,
                    escalation_role=step.role,
                    elapsed_seconds=elapsed_seconds,
                )
                return TimeoutAction(
                    action=TimeoutActionType.ESCALATE,
                    reason=(
                        f"Escalated to {step.role!r} — {elapsed_seconds:.0f}s elapsed"
                    ),
                    escalate_to=step.role,
```
EscalationChainPolicy escalates to the wrong role — off-by-one in chain resolution
When step idx-1's timeout expires and the loop reaches idx, the code escalates to step[idx].role (the current step's role). But the semantics of the chain are that expiring step N should escalate to step N's role, not step N+1's role. The current code always skips step[0].role as an escalation target entirely.
Consider a two-step chain [team_lead(10 min), manager(20 min)]:
| elapsed | Expected | Actual |
|---------|----------|--------|
| 0–10 min | WAIT | WAIT ✓ |
| 10–30 min | ESCALATE → `team_lead` | ESCALATE → `manager` ✗ |
| 30+ min | ESCALATE → `manager` / exhausted | exhausted ✗ |
For a single-step chain [cto(60 min)], cto is never escalated to at all — the policy immediately returns on_chain_exhausted after 60 minutes without ever issuing a ESCALATE action.
The fix is to use the **previous** step's role when deciding where to escalate:

```python
for idx, step in enumerate(self._chain):
    step_timeout = step.timeout_minutes * _SECONDS_PER_MINUTE
    step_end = cumulative_seconds + step_timeout
    if elapsed_seconds < step_end:
        if idx == 0:
            # Waiting for the initial reviewer — no escalation yet.
            return TimeoutAction(
                action=TimeoutActionType.WAIT,
                reason=...,
            )
        # Previous step (chain[idx-1]) timed out — escalate to that step's role.
        prev_step = self._chain[idx - 1]
        return TimeoutAction(
            action=TimeoutActionType.ESCALATE,
            reason=f"Escalated to {prev_step.role!r} ...",
            escalate_to=prev_step.role,
        )
    cumulative_seconds += step_timeout
# Chain exhausted — escalate to the last step's role before on_chain_exhausted,
# or apply on_chain_exhausted directly depending on the design intent.
```
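The corrected chain semantics can be exercised end-to-end against the reviewer's two-step example. This is a standalone sketch with simplified types: `Step` stands in for `EscalationStep`, and the `(action, role)` tuple stands in for `TimeoutAction`; the exhausted branch applies `on_chain_exhausted` directly, which the review notes is a design-intent choice.

```python
from dataclasses import dataclass

_SECONDS_PER_MINUTE = 60


@dataclass(frozen=True)
class Step:
    role: str
    timeout_minutes: int


def resolve(chain, elapsed_seconds):
    """Return ("wait"|"escalate"|"exhausted", role-or-None).

    Expiring step N escalates to step N's own role, so step[0].role is a
    reachable escalation target — the fix for the off-by-one above.
    """
    cumulative = 0.0
    for idx, step in enumerate(chain):
        step_end = cumulative + step.timeout_minutes * _SECONDS_PER_MINUTE
        if elapsed_seconds < step_end:
            if idx == 0:
                return ("wait", None)  # initial reviewer still within their window
            return ("escalate", chain[idx - 1].role)  # previous step timed out
        cumulative = step_end
    # All windows elapsed; apply on_chain_exhausted.
    return ("exhausted", None)
```

With `[team_lead(10 min), manager(20 min)]`, elapsed times of 5, 15, and 35 minutes now yield WAIT, ESCALATE → `team_lead`, and exhausted respectively, matching the expected column of the table above.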
How can I resolve this? If you propose a fix, please make it concise.🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) 
([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement 
meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs 
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, 
contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, 
loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive 
trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry 
([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe))
* implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca))
* implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2))
* implement model routing engine ([#99](#99)) ([d3c250b](d3c250b))
* implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3))
* implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85))
* implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30)
* implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1))
* implement tool permission checking ([#16](#16)) ([833c190](833c190))
* implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout ([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749))
* wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929))
* address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169)
* enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a))
* pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861))

### Performance

* harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188)

### Refactoring

* adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9))
* harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89))

### Documentation

* add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6))
* finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee))

### Tests

* add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4))
* add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4))

### CI/CD

* add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758))
* bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247))
* bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c))
* harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af))

### Maintenance

* add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337))
* add design spec context loading to research-link skill ([8ef9685](8ef9685))
* add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46))
* fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d))
* pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
## Summary
* `AutonomyResolver`, `EffectiveAutonomy` model, action type expansion via `ActionTypeRegistry`, seniority constraints (JUNIOR can't have FULL), and `HumanOnlyPromotionStrategy` for auto-downgrade
* `TimeoutPolicy` implementations (Wait Forever, Deny on Timeout, Tiered per risk level, Escalation Chain) with discriminated union config, `DefaultRiskTierClassifier`, and `TimeoutChecker` service
* `ParkService` serializes `AgentContext` to `ParkedContext` for persistence when awaiting approval; `PARKED` termination reason in `ExecutionResult`
* `parked_contexts` table with indexes; `SQLiteParkedContextRepository` with full CRUD
* `AutonomyController` (GET/POST `/agents/{agent_id}/autonomy`) registered with app router
* `SecOpsService` autonomy wiring, with a security guard blocking auto-approve for HIGH/CRITICAL risk tiers

## Pre-PR review fixes (51 findings from 10 agents)
* `EffectiveAutonomy` action sets
* `escalate_to` consistency validator on `TimeoutAction`
* `MemoryError`/`RecursionError` re-raise in security service broad except blocks
* `_row_to_model` to raise `QueryError` instead of returning `None`
* `YamlRiskTierClassifier` → `DefaultRiskTierClassifier`
* explicit column lists in repository queries (no `SELECT *`)
* `AutonomyController` registered with app router

Closes #42
Closes #126
## Test plan
* `test_parked_context_repo.py` (12 tests), `test_autonomy.py` (4 tests)
* `test_events.py` (autonomy, timeout, parked context persistence)
* `DefaultRiskTierClassifier` in all test files
* `prompt.py`
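
Of the four timeout policies listed in the summary, the Tiered one is the most involved: each risk tier gets its own approval deadline, and what happens after the deadline depends on the tier. A minimal sketch of that idea; the tier names, timeout values, and the exact tier-to-action mapping here are assumptions for illustration, not the project's actual configuration:

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class TimeoutAction(Enum):
    WAIT = "wait"
    APPROVE = "approve"
    DENY = "deny"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class TieredTimeoutPolicy:
    """Per-risk-tier timeout handling: low-risk items auto-approve once
    their deadline passes, critical items are denied, and the tiers in
    between are escalated to a human further up the chain."""

    timeout_s: dict[RiskTier, float]

    def determine_action(self, tier: RiskTier, elapsed_s: float) -> TimeoutAction:
        if elapsed_s < self.timeout_s[tier]:
            return TimeoutAction.WAIT  # still inside the approval window
        if tier is RiskTier.LOW:
            return TimeoutAction.APPROVE
        if tier is RiskTier.CRITICAL:
            return TimeoutAction.DENY
        return TimeoutAction.ESCALATE
```

A `TimeoutChecker`-style service would call `determine_action` for each pending approval item and, on `WAIT`, leave the parked context untouched; the discriminated-union config mentioned above would select between this and the Wait Forever / Deny on Timeout / Escalation Chain variants.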