Skip to content

feat: add approval workflow gates to TaskEngine#387

Merged
Aureliolo merged 18 commits intomainfrom
feat/task-engine-approval
Mar 14, 2026
Merged

feat: add approval workflow gates to TaskEngine#387
Aureliolo merged 18 commits intomainfrom
feat/task-engine-approval

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Adds approval-gated execution to the agent engine so SecOps ESCALATE verdicts and agent-initiated request_human_approval tool calls can park execution, persist context, and support resumption workflows.

  • ApprovalGate service: coordinates context parking (serialize → persist) and resumption (load → deserialize → cleanup) via ParkService + ParkedContextRepository
  • RequestHumanApprovalTool: agent-callable tool that creates an ApprovalItem and signals parking via ToolExecutionResult.metadata
  • ToolInvoker escalation tracking: detects ESCALATE verdicts and requires_parking metadata, exposes pending_escalations with deterministic ordering
  • Loop integration: execute_tool_calls in loop_helpers checks for escalations after each tool-call batch and returns PARKED or ERROR termination
  • API hardening: requested_by bound to authenticated user, save_if_pending atomic guard against double-decisions, prompt injection mitigation in resume messages
  • Frontend fix: WS handler uses approval_id (not id), fetches full items via API, re-fetches filtered queries on WS events
  • Observability: 16 structured event constants for the approval gate lifecycle
  • V7 migration: makes parked_contexts.task_id nullable for taskless agents (crash-safe rename-first approach)
  • Shared validation: is_valid_action_type() extracted to core/validation.py, used by both DTO and tool

Key architectural decisions

  • Resume is intentionally a signalling stub (_signal_resume_intent) — actual re-enqueue deferred to a future scheduler component (documented with TODO)
  • External execution_loop supplied alongside approval_store logs a warning — the gate is only auto-wired for the default ReactLoop
  • _track_parking_metadata returns error ToolResult instead of raising ToolExecutionError — prevents sibling task cancellation in invoke_all TaskGroup
  • Cleanup failure in _cleanup_parked does NOT prevent resume — the context is already deserialized

Test plan

  • 7632+ unit tests passing, 94.6% coverage
  • Approval gate park/resume lifecycle (happy path + all error paths)
  • ToolInvoker escalation tracking and deterministic ordering
  • RequestHumanApprovalTool creation, validation, risk classification, store errors
  • Controller helpers: resolve_decision, signal_resume_intent, publish_approval_event
  • ApprovalStore: save_if_pending, combined filters, on_expire callback, expiration
  • V7 migration: task_id nullable via PRAGMA, data preservation
  • Security factory: interceptor creation, approval tool registry composition
  • Frontend: WS event handling with filters, spy cleanup, duplicate prevention
  • Shared validation: action_type format validation
  • DTO: action_type format, TTL bounds, comment max_length

Closes #258, Closes #259

Implement the approval gate layer that parks agent execution when
human approval is required and provides infrastructure for resuming
after a decision. Also adds the `request_human_approval` tool for
agents to explicitly request approval.

- Add EscalationInfo/ResumePayload models and approval gate events
- Add ApprovalGate service (park/resume via ParkService)
- Add RequestHumanApprovalTool (creates ApprovalItem, signals parking)
- Add ToolInvoker escalation tracking (ESCALATE verdicts + tool metadata)
- Integrate approval gate into execute_tool_calls and both loop types
- Wire into AgentEngine, AppState, and approvals controller resume path

Closes #258
Closes #259
Critical: empty task_id ValidationError, parking metadata bypass,
PARKED→ERROR on park failure, wire parked_context_repo, TOCTOU race
via save_if_pending, resume trigger wiring in approve/reject.

Security: prompt injection mitigation, max_length on comment,
ttl_seconds cap, action_type format validation, remove phantom
requested_by field.

Logging: wrong event constants for init/risk/park-info, consistent
MemoryError logging, new INITIALIZED/RISK_CLASSIFIED/RESUME_TRIGGERED
constants.

Code quality: extract security/tool factories (962→860 lines), split
execute() into helpers, deterministic escalation order, boolean
parking_failed, check delete() result, risk classifier error handling.

Frontend: async handler type fix, WS test payload/await fix,
filter-aware WS, 404 vs transient error discrimination, total desync
fix, empty catch logging.

Tests: park failure expects ERROR, module-level markers, extracted
fixtures, simplified PropertyMock, updated resume message assertions.

Docs: persistence qualifier, CLAUDE.md events, docstring fixes.
- _trigger_resume no longer calls resume_context() which would
  delete the parked record before a scheduler can consume it —
  now only logs the resume trigger for scheduler observation
- Remove dead APPROVAL_RESUMED WsEventType (never emitted)
- Remove unused ApprovalGate import from controller
- Rename _trigger_resume → _signal_resume_intent with TODO for scheduler
- Add warning log before UnauthorizedError in controller auth checks
- Add DEBUG log on successful parked record deletion
- New APPROVAL_GATE_PARK_TASKLESS event for pre-park debug log
- New APPROVAL_GATE_RISK_CLASSIFY_FAILED event for classification failure
- Fix SQLite schema: parked_contexts.task_id TEXT NOT NULL → TEXT (nullable)
- Extract shared is_valid_action_type() to core/validation.py
- Fix filtered WS handler: re-fetch query instead of blind total increment
- Add afterEach(vi.restoreAllMocks()) for spy cleanup in frontend tests
- Add 10 missing event constants to CLAUDE.md reference list
- Fix test-model-001 → test-small-001 per project conventions
- Add comprehensive tests: validation, approval_store, controller helpers,
  security_factory, approval_gate init/resume, risk classification failure,
  action_type format validation, TTL bounds, ApproveRequest comment length,
  loop_helpers taskless path and IOError handling
- Split park_context/resume_context into smaller helpers (<50 lines each)
- Fix duplicate APPROVAL_GATE_CONTEXT_RESUMED emission on success path
- Add V6 migration to make parked_contexts.task_id nullable on existing DBs
- Warn when external execution_loop bypasses approval gate wiring
- Guard execute() against KeyError on missing arguments
- Use APPROVAL_GATE_RISK_CLASSIFY_FAILED for "no classifier" debug log
- Fix TOCTOU race on WS duplicate prevention with post-fetch re-check
- Fix double APPROVAL_GATE_RESUME_DELETE_FAILED emission: early return
  after exception log prevents fallthrough to if-not-deleted branch
- Add APPROVAL_GATE_LOOP_WIRING_WARNING constant for external loop
  misconfiguration (was misusing APPROVAL_GATE_INITIALIZED)
- Wire APPROVAL_GATE_RISK_CLASSIFIED to classifier success path; remove
  misleading usage for "no classifier" info log (was RISK_CLASSIFY_FAILED)
- Add APPROVAL_GATE_LOOP_WIRING_WARNING to CLAUDE.md event reference
- Fix save_if_pending() docstring to cover all None cases (not-found,
  expired, already-decided)
- Fix approve/reject ConflictError message: "no longer pending (already
  decided or expired)" instead of "decided by another request"
- Make V6 migration crash-safe: rename old→backup before replacing,
  explicit column list in INSERT to avoid schema-order coupling
- Add V6 migration tests: verify task_id nullable via PRAGMA, verify
  data preservation across migration
- Fix mypy arg-type/index errors: wrap fetchall() in list() for Sized
- Return store-backed saved object from approve/reject instead of
  the pre-save updated variable (defensive consistency)
ApprovalGate._serialize_context already logs CONTEXT_PARK_FAILED
before re-raising — remove the redundant log in _park_for_approval
to avoid double emission of the same event.
_track_parking_metadata now returns an error ToolResult instead of
raising ToolExecutionError — prevents uncaught exception from
cancelling sibling tasks in TaskGroup during invoke_all.
Checkpoint recovery (V6) landed on main before our parked_contexts
nullable migration, which is now V7.
Copilot AI review requested due to automatic review settings March 14, 2026 10:13
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 14, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 14, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Approval workflow gates: sensitive actions can escalate, park agents, and resume after human decisions
    • Agent-callable human-approval tool to request and track approvals
    • Frontend WebSocket handling updated to reconcile approvals by approval_id and re-fetch as needed
  • Documentation

    • README and design docs updated to describe approval gates and approval-resume scheduler
  • Observability

    • Event-driven logging extended with approval-gate events (park/resume/escalation)
  • Validation / API

    • Stronger validation for action_type and TTLs; improved approval request handling and responses

Walkthrough

Adds a human-approval workflow: new ApprovalGate service and models, a request_human_approval tool, ToolInvoker escalation tracking, engine loop wiring to park/resume contexts, API/DTO/controller changes for approvals, observability event constants, DB migration for parked_context.task_id, and frontend WS handling updates.

Changes

Cohort / File(s) Summary
Approval Gate Core
src/ai_company/engine/approval_gate.py, src/ai_company/engine/approval_gate_models.py, src/ai_company/api/approval_store.py
Add ApprovalGate (should_park/park_context/resume_context, serialize/deserialize, persistence hooks), EscalationInfo and ResumePayload models, and ApprovalStore.save_if_pending plus safer expiration-callback handling.
Engine Loop Integration
src/ai_company/engine/agent_engine.py, src/ai_company/engine/react_loop.py, src/ai_company/engine/plan_execute_loop.py, src/ai_company/engine/loop_helpers.py
Wire ApprovalGate into engine construction and loops, forward approval_gate into execute_tool_calls, add default loop builder, parked_context_repo injection, and parking control flow returning PARKED/ERROR results.
Tool System & Escalation Tracking
src/ai_company/tools/approval_tool.py, src/ai_company/tools/invoker.py, src/ai_company/tools/registry.py, src/ai_company/tools/__init__.py
Add RequestHumanApprovalTool producing parking metadata; ToolInvoker now tracks pending_escalations, centralizes per-call invocation, and surfaces pending_escalations; ToolRegistry adds all_tools.
Security Factory & Validation
src/ai_company/engine/_security_factory.py, src/ai_company/core/validation.py
New make_security_interceptor and registry_with_approval_tool to integrate approval tools into security wiring; add is_valid_action_type helper for "category:action" format.
API Controllers & DTOs
src/ai_company/api/controllers/approvals.py, src/ai_company/api/dto.py, src/ai_company/api/state.py
Require server-assigned requested_by, add auth-failure logging/events, centralize approval decision logging and resume signaling, enforce action_type format and stricter TTL/metadata validation, and make ApprovalGate injectable via AppState.
Persistence & Schema
src/ai_company/security/timeout/park_service.py, src/ai_company/security/timeout/parked_context.py, src/ai_company/persistence/sqlite/migrations.py
Make parked_context.task_id nullable and ParkService.task_id optional; add v7 migration to make task_id nullable and bump SCHEMA_VERSION to 7 with crash-safe DDL.
Observability & Exports
src/ai_company/engine/__init__.py, src/ai_company/observability/events/approval_gate.py
Export ApprovalGate, EscalationInfo, ResumePayload; add approval_gate-specific observability event constants for structured logging.
Frontend Approvals
web/src/stores/approvals.ts, web/src/__tests__/stores/approvals.test.ts
Revise WS handling to use approval_id payloads and async fetch/reconciliation; filter-aware re-fetchs and updated tests to match new payload shapes.
Docs & README
CLAUDE.md, README.md, docs/design/engine.md
Document approval-gate integration, event-constants-driven logging requirements, and engine parking/resume design notes.
Tests
tests/unit/... (api, engine, tools, persistence, core, observability)
Add/extend tests for ApprovalGate, models, RequestHumanApprovalTool, invoker escalation tracking, ApprovalStore.save_if_pending, v7 migrations, DTO validation, loop helper parking behavior, and controller helpers.

Sequence Diagram(s)

sequenceDiagram
    participant Agent as Agent (Execution Loop)
    participant Invoker as ToolInvoker
    participant Gate as ApprovalGate
    participant Park as ParkService
    participant Repo as ParkedContextRepo
    participant API as Approvals API
    participant Store as ApprovalStore

    Agent->>Invoker: invoke tool call(s)
    Invoker->>Invoker: execute tools & collect verdicts
    alt escalation detected
        Invoker-->>Agent: pending_escalations (EscalationInfo)
        Agent->>Gate: should_park(escalations)?
        Gate-->>Agent: escalation to park (EscalationInfo)
        Agent->>Gate: park_context(escalation, context)
        Gate->>Park: serialize(context)
        Park-->>Gate: ParkedContext
        Gate->>Repo: persist parked context (optional)
        Repo-->>Gate: persisted
        Gate-->>Agent: parked (approval_id)
        Agent-->>Agent: return PARKED ExecutionResult
    end

    rect rgba(50,150,200,0.5)
        API->>Store: create/list/decide ApprovalItem
        Store-->>API: decision saved
        API->>Gate: resume_context(approval_id)
        Gate->>Repo: load_by_approval_id
        Repo-->>Gate: ParkedContext
        Gate->>Park: deserialize -> AgentContext
        Park-->>Gate: AgentContext
        Gate->>Repo: cleanup(delete)
        Repo-->>Gate: deleted
        Gate-->>API: AgentContext
        API-->>Agent: inject resume message / continue execution
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 34.80% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the primary change: adding approval workflow gates to TaskEngine. It is concise, clear, and reflects the main objective of the PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset. It covers the key components, architectural decisions, test coverage, and linked issues.
Linked Issues check ✅ Passed The PR successfully addresses both linked issues #258 and #259. It implements the ApprovalGate service for engine-level blocking, request_human_approval tool, escalation tracking, loop integration, and all required features specified in the issues.
Out of Scope Changes check ✅ Passed All changes are within scope. Frontend WS handler updates, V7 migration, validation utilities, and observability events all support the core approval workflow gate implementation objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/task-engine-approval
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/task-engine-approval
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a critical approval workflow mechanism into the agent engine, enabling human oversight for sensitive operations. It allows agents to pause their execution, persist their current context, and await explicit human approval before proceeding. This enhancement significantly improves the safety and control of autonomous agent operations by integrating human-in-the-loop decision-making, ensuring that high-risk actions are vetted and approved, thereby reducing potential liabilities and increasing operational reliability.

Highlights

  • Approval-Gated Execution: Implemented a new ApprovalGate service within the agent engine to coordinate context parking and resumption. This allows agent execution to pause for human approval based on SecOps verdicts or agent-initiated tool calls, persist its state, and resume later.
  • Request Human Approval Tool: Introduced a new agent-callable tool, RequestHumanApprovalTool, which agents can use to explicitly request human approval for sensitive actions. This tool creates an ApprovalItem and signals the engine to park execution.
  • Tool Invoker Escalation Tracking: Enhanced the ToolInvoker to detect ESCALATE verdicts from SecOps and requires_parking metadata from tools. It now exposes pending_escalations with deterministic ordering, ensuring proper handling of approval requests.
  • Execution Loop Integration: Integrated the approval gate into the execution loops (ReactLoop and PlanExecuteLoop). The execute_tool_calls helper now checks for escalations after each tool-call batch and returns PARKED or ERROR termination reasons as appropriate.
  • API Hardening and Validation: Strengthened the approvals API by binding requested_by to the authenticated user, adding an atomic save_if_pending guard against double-decisions, and implementing prompt injection mitigation in resume messages. Also, a shared validation utility for action_type format was extracted and applied to DTOs and tools.
  • Frontend WebSocket Handling: Updated the frontend WebSocket handler to correctly use approval_id (instead of id), fetch full approval items via the API, and re-fetch filtered queries on WS events to ensure UI consistency.
  • Observability and Database Migration: Added 16 new structured event constants for the approval gate lifecycle to improve observability. A V7 database migration was included to make the parked_contexts.task_id column nullable, supporting taskless agents and ensuring data preservation.
Changelog
  • CLAUDE.md
    • Updated descriptions for api/, engine/, and tools/ directories to reflect approval gate integration.
  • README.md
    • Updated project status to indicate that approval workflow gates are now implemented.
  • docs/design/engine.md
    • Added a new section detailing how the ApprovalGate evaluates escalations and parks the agent context within the execution loop.
  • src/ai_company/api/approval_store.py
    • Added save_if_pending method for optimistic concurrency control when updating approval items.
    • Improved error handling for the _on_expire callback to log exceptions without preventing expiration.
  • src/ai_company/api/controllers/approvals.py
    • Imported new event constants API_AUTH_FAILED and APPROVAL_GATE_RESUME_TRIGGERED.
    • Refactored _resolve_decision to prioritize status check before user authentication.
    • Added _log_approval_decision and _signal_resume_intent helper functions for logging and future scheduler integration.
    • Enforced requested_by field to be populated from the authenticated user's username during approval creation.
    • Switched approve and reject endpoints to use save_if_pending for atomic updates and added ConflictError handling.
    • Integrated _log_approval_decision and _signal_resume_intent into approve and reject flows.
  • src/ai_company/api/dto.py
    • Imported field_validator for Pydantic model validation.
    • Imported is_valid_action_type for shared validation.
    • Removed requested_by from CreateApprovalRequest as it's now populated by the authenticated user.
    • Added action_type format validation to CreateApprovalRequest using is_valid_action_type.
    • Added ttl_seconds bounds (min 60, max 604800) to CreateApprovalRequest.
    • Added max_length constraint (4096) to ApproveRequest.comment field.
  • src/ai_company/api/state.py
    • Imported ApprovalGate.
    • Added _approval_gate to __slots__ for state management.
    • Included approval_gate as an optional parameter in AppState.__init__.
    • Added approval_gate property to AppState.
  • src/ai_company/core/validation.py
    • Added new file containing is_valid_action_type function for validating category:action string format.
  • src/ai_company/engine/init.py
    • Exported ApprovalGate, EscalationInfo, and ResumePayload from the engine package.
  • src/ai_company/engine/_security_factory.py
    • Added new file to encapsulate security interceptor and approval tool registry creation logic, reducing complexity in agent_engine.py.
  • src/ai_company/engine/agent_engine.py
    • Imported make_security_interceptor and registry_with_approval_tool from _security_factory.py.
    • Imported ApprovalGate and ParkedContextRepository.
    • Removed direct security interceptor creation logic, delegating to _security_factory.py.
    • Added _approval_store, _parked_context_repo, and _approval_gate attributes.
    • Implemented _make_approval_gate to instantiate ApprovalGate if an approval store is configured.
    • Implemented _make_default_loop to wire the ReactLoop with the ApprovalGate.
    • Updated _make_security_interceptor to use the new factory function.
    • Modified _make_tool_invoker to use registry_with_approval_tool to dynamically add the RequestHumanApprovalTool.
  • src/ai_company/engine/approval_gate.py
    • Added new file defining the ApprovalGate class, responsible for managing the parking and resumption of agent contexts based on approval requirements.
    • Includes methods for should_park, park_context, resume_context, and build_resume_message.
  • src/ai_company/engine/approval_gate_models.py
    • Added new file defining EscalationInfo and ResumePayload Pydantic models for structured data transfer related to approval events.
  • src/ai_company/engine/loop_helpers.py
    • Imported APPROVAL_GATE_PARK_TASKLESS event constant.
    • Added approval_gate parameter to execute_tool_calls function.
    • Implemented logic within execute_tool_calls to check for escalations and call _park_for_approval if parking is required.
    • Added _park_for_approval function to handle the serialization and persistence of agent context when an approval is needed.
  • src/ai_company/engine/plan_execute_loop.py
    • Added approval_gate as an optional parameter to PlanExecuteLoop.__init__.
    • Passed self._approval_gate to _handle_step_tool_calls.
  • src/ai_company/engine/react_loop.py
    • Added approval_gate as an optional parameter to ReactLoop.__init__.
    • Passed self._approval_gate to _process_turn_response.
  • src/ai_company/observability/events/approval_gate.py
    • Added new file defining 16 constants for various approval gate lifecycle events.
  • src/ai_company/persistence/sqlite/migrations.py
    • Updated SCHEMA_VERSION to 7.
    • Modified the parked_contexts table schema to make task_id nullable.
    • Added _V7_STATEMENTS for the migration, including creating a new table, copying data, renaming, and dropping the old table.
  • src/ai_company/security/timeout/park_service.py
    • Updated the park method to allow task_id to be None, accommodating taskless agents.
  • src/ai_company/security/timeout/parked_context.py
    • Modified the ParkedContext model to make the task_id field nullable.
  • src/ai_company/tools/init.py
    • Exported RequestHumanApprovalTool from the tools package.
  • src/ai_company/tools/approval_tool.py
    • Added new file defining the RequestHumanApprovalTool class, which allows agents to request human approval and signals the engine to park execution.
  • src/ai_company/tools/invoker.py
    • Imported ApprovalRiskLevel and EscalationInfo.
    • Added _pending_escalations property to track escalations.
    • Updated _check_security to populate _pending_escalations when an ESCALATE verdict with an approval_id is received.
    • Refactored invoke to use a new _invoke_single method, ensuring _pending_escalations is cleared at the start of each top-level invocation.
    • Added _track_parking_metadata to detect and record tool-initiated parking requests (e.g., from RequestHumanApprovalTool).
    • Implemented sorting for _pending_escalations in invoke_all to ensure deterministic ordering.
  • src/ai_company/tools/registry.py
    • Added all_tools method to return all registered tool instances.
  • tests/unit/api/controllers/test_approvals.py
    • Updated _create_payload to use the new category:action format for action_type and removed requested_by.
  • tests/unit/api/controllers/test_approvals_helpers.py
    • Added new test file for helper functions in the approvals controller, covering _resolve_decision, _log_approval_decision, _signal_resume_intent, and _publish_approval_event.
  • tests/unit/api/test_approval_store.py
    • Added tests for the save_if_pending method, covering success, already decided, not found, and expired scenarios.
    • Added tests for combined filters in list_items.
    • Added tests for the on_expire callback, including receiving expired items and handling exceptions.
  • tests/unit/api/test_dto.py
    • Added tests for CreateApprovalRequest action type format validation.
    • Added tests for CreateApprovalRequest ttl_seconds bounds validation.
    • Added tests for ApproveRequest comment max_length validation.
  • tests/unit/core/test_validation.py
    • Added new test file for is_valid_action_type utility function, covering valid and invalid formats.
  • tests/unit/engine/test_approval_gate.py
    • Added new test file for the ApprovalGate service, covering should_park, park_context, resume_context, build_resume_message, and initialization scenarios.
  • tests/unit/engine/test_approval_gate_models.py
    • Added new test file for EscalationInfo and ResumePayload models, covering construction, immutability, and validation.
  • tests/unit/engine/test_loop_helpers_approval.py
    • Added new test file for approval gate integration in loop_helpers, covering scenarios with and without the gate, escalation leading to parking, and parking failures.
  • tests/unit/engine/test_security_factory.py
    • Added new test file for the _security_factory module, covering make_security_interceptor and registry_with_approval_tool functions.
  • tests/unit/observability/test_events.py
    • Added approval_gate to the list of expected domain modules for event constant discovery.
  • tests/unit/persistence/sqlite/test_migrations_v6.py
    • Updated the schema version check to 7.
  • tests/unit/persistence/test_migrations_v2.py
    • Added a test to confirm the schema version is 7.
    • Added tests for the V7 migration, verifying parked_contexts.task_id becomes nullable and existing data is preserved.
  • tests/unit/tools/test_approval_tool.py
    • Added new test file for RequestHumanApprovalTool, covering tool creation, execution, parking metadata, risk classification, validation, and error handling.
  • tests/unit/tools/test_invoker_escalation.py
    • Added new test file for ToolInvoker escalation tracking, covering empty escalations, ESCALATE verdicts, clearing between calls, multiple escalations, and parking tool metadata.
  • web/src/tests/stores/approvals.test.ts
    • Updated WebSocket event handling tests to use approval_id from the payload.
    • Modified tests to simulate fetching full approval items via API after WS events.
    • Added tests for re-fetching filtered queries when filters are active and WS events occur.
    • Added tests for correctly decrementing total count when an item is removed due to a 404/410 from getApproval.
    • Added tests to ensure state is not changed on transient fetch errors during WS event processing.
  • web/src/stores/approvals.ts
    • Imported axios for error type checking.
    • Refactored handleWsEvent to use approval_id from the WebSocket payload.
    • Implemented logic to fetch the full approval item via approvalsApi.getApproval for approval.submitted events.
    • Implemented logic to re-fetch the full approval item for approval.approved, approval.rejected, and approval.expired events.
    • Added conditional re-fetching of the entire filtered list if activeFilters are present, ensuring UI consistency.
    • Added error handling for API calls within handleWsEvent, specifically for 404/410 responses to remove items from the local store and decrement the total count.
Activity
  • The author, Aureliolo, has implemented a significant new feature: approval workflow gates for the TaskEngine.
  • Extensive unit tests (7632+ passing, 94.6% coverage) have been added or updated to cover the new functionality, including the approval gate lifecycle, escalation tracking, tool creation, API helpers, and database migrations.
  • Architectural decisions regarding resume signalling, external loop wiring, and error handling have been documented and implemented.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 14, 2026

Codecov Report

❌ Patch coverage is 87.30570% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.83%. Comparing base (6c60f6c) to head (b36b9ca).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/ai_company/tools/invoker.py 62.96% 8 Missing and 2 partials ⚠️
src/ai_company/api/controllers/approvals.py 70.00% 8 Missing and 1 partial ⚠️
src/ai_company/persistence/sqlite/migrations.py 70.83% 3 Missing and 4 partials ⚠️
src/ai_company/engine/approval_gate.py 92.59% 5 Missing and 1 partial ⚠️
src/ai_company/tools/approval_tool.py 91.17% 5 Missing and 1 partial ⚠️
src/ai_company/engine/agent_engine.py 72.22% 3 Missing and 2 partials ⚠️
src/ai_company/engine/_security_factory.py 92.30% 0 Missing and 4 partials ⚠️
src/ai_company/api/approval_store.py 93.33% 1 Missing ⚠️
src/ai_company/engine/loop_helpers.py 94.73% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #387      +/-   ##
==========================================
+ Coverage   93.78%   93.83%   +0.05%     
==========================================
  Files         456      462       +6     
  Lines       21330    21653     +323     
  Branches     2046     2079      +33     
==========================================
+ Hits        20005    20319     +314     
+ Misses       1033     1032       -1     
- Partials      292      302      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive approval workflow, a crucial feature for managing agent actions. The changes are well-structured, including a new ApprovalGate service, a RequestHumanApprovalTool, and robust integration into the execution loop and API. The API hardening, particularly binding requested_by to the authenticated user and implementing atomic updates with save_if_pending, significantly improves security and data integrity. The frontend changes to handle WebSocket events asynchronously are also a major improvement.

However, there is a recurring critical issue across several new and modified Python files: the use of Python 2-style except syntax (e.g., except MemoryError, RecursionError:), which will cause a SyntaxError in Python 3. I've added comments with suggestions to fix this in all identified locations.

self._on_expire(expired)
try:
self._on_expire(expired)
except MemoryError, RecursionError:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This except syntax is for Python 2. For Python 3, you should use except (MemoryError, RecursionError):. This will cause a SyntaxError in Python 3.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

channels=[CHANNEL_APPROVALS],
)
except RuntimeError, OSError:
except MemoryError, RecursionError:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This except syntax is for Python 2. For Python 3, you should use except (MemoryError, RecursionError):. This will cause a SyntaxError in Python 3.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

"risk_level": escalation.risk_level.value,
},
)
except MemoryError, RecursionError:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This except A, B: syntax is for Python 2 and will cause a SyntaxError in Python 3. You should use tuple-based exception catching: except (A, B):.

This issue appears multiple times in this file (lines 173, 249, and 270). Please correct all instances.

        except (MemoryError, RecursionError):

agent_id=agent_id,
task_id=task_id,
)
except MemoryError, RecursionError:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This except syntax is for Python 2. For Python 3, you should use except (MemoryError, RecursionError):. This will cause a SyntaxError in Python 3.

    except (MemoryError, RecursionError):

metadata={"source": "request_human_approval"},
)
await self._approval_store.add(item)
except MemoryError, RecursionError:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This except A, B: syntax is for Python 2 and will cause a SyntaxError in Python 3. You should use tuple-based exception catching: except (A, B):.

This issue also appears on line 248.

        except (MemoryError, RecursionError):

reason="Agent requested human approval",
),
)
except MemoryError, RecursionError:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This except syntax is for Python 2. For Python 3, you should use except (MemoryError, RecursionError):. This will cause a SyntaxError in Python 3.

        except (MemoryError, RecursionError):

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an approval-gated “park/resume” workflow to the execution engine so SecOps ESCALATE verdicts and agent-initiated request_human_approval tool calls can pause execution, persist context, and support later resumption; includes API hardening, observability events, migration updates, and a frontend WS fix for approval events.

Changes:

  • Introduce ApprovalGate + models, integrate escalation detection into ToolInvoker, and wire parking behavior into execution loops.
  • Add request_human_approval tool, shared action_type validation, and API/store hardening (requested_by from auth, save_if_pending, resume intent signal).
  • Update persistence schema to v7 (nullable parked_contexts.task_id), add structured approval-gate events, and fix frontend WS approval event handling.

Reviewed changes

Copilot reviewed 40 out of 40 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
web/src/stores/approvals.ts Update WS handler to use approval_id, fetch full items via API, and reconcile filtered lists.
web/src/tests/stores/approvals.test.ts Expand WS handler tests to cover fetching, filtering re-fetches, and error paths.
tests/unit/tools/test_invoker_escalation.py New unit tests for invoker escalation tracking and deterministic ordering.
tests/unit/tools/test_approval_tool.py New unit tests for RequestHumanApprovalTool creation, validation, and error handling.
tests/unit/persistence/test_migrations_v2.py Add schema version assertion and v7 migration tests for nullable task_id.
tests/unit/persistence/sqlite/test_migrations_v6.py Update schema version expectation to 7.
tests/unit/observability/test_events.py Include approval_gate module in event constant discovery tests.
tests/unit/engine/test_security_factory.py New tests for extracted _security_factory helpers.
tests/unit/engine/test_loop_helpers_approval.py New tests for approval gate integration in execute_tool_calls.
tests/unit/engine/test_approval_gate_models.py New tests for EscalationInfo and ResumePayload validation/immutability.
tests/unit/engine/test_approval_gate.py New tests for ApprovalGate park/resume behavior and resume message building.
tests/unit/core/test_validation.py New tests for shared is_valid_action_type() utility.
tests/unit/api/test_dto.py Update DTO tests for action_type format and approve comment bounds.
tests/unit/api/test_approval_store.py Add tests for save_if_pending, combined filters, and on-expire callback behavior.
tests/unit/api/controllers/test_approvals_helpers.py New tests for approvals controller helper functions and auth pre-checks.
tests/unit/api/controllers/test_approvals.py Update controller tests for action_type format and server-side requested_by.
src/ai_company/tools/registry.py Add all_tools() to support building derived registries.
src/ai_company/tools/invoker.py Track pending escalations (SecOps + tool metadata), clear per call, and sort deterministically for batches.
src/ai_company/tools/approval_tool.py Add RequestHumanApprovalTool that creates ApprovalItem and signals parking via metadata.
src/ai_company/tools/init.py Export RequestHumanApprovalTool.
src/ai_company/security/timeout/parked_context.py Make ParkedContext.task_id nullable to support taskless agents.
src/ai_company/security/timeout/park_service.py Allow task_id=None when parking contexts.
src/ai_company/persistence/sqlite/migrations.py Bump schema to v7 and add migration to make parked_contexts.task_id nullable.
src/ai_company/observability/events/approval_gate.py Add approval-gate lifecycle event constants.
src/ai_company/engine/react_loop.py Wire optional approval_gate into React loop tool-call execution.
src/ai_company/engine/plan_execute_loop.py Wire optional approval_gate into plan/execute loop tool-call execution.
src/ai_company/engine/loop_helpers.py Add approval-gate checks after tool calls and park-on-escalation helper.
src/ai_company/engine/approval_gate_models.py Add EscalationInfo and ResumePayload frozen models.
src/ai_company/engine/approval_gate.py Implement gate logic: choose escalation, park (serialize+persist), resume (load+deserialize+cleanup), build resume injection message.
src/ai_company/engine/agent_engine.py Create/wire default ApprovalGate, inject approval tool into registry, extract security factory, and warn on external loop wiring.
src/ai_company/engine/_security_factory.py New module extracting security interceptor and approval-tool registry composition.
src/ai_company/engine/init.py Re-export approval gate types.
src/ai_company/core/validation.py Add shared is_valid_action_type() utility.
src/ai_company/api/state.py Add approval_gate to app state for controller access.
src/ai_company/api/dto.py Enforce action_type format in CreateApprovalRequest; bound TTL max; bound approve comment max length.
src/ai_company/api/controllers/approvals.py Bind requested_by to authenticated user; add save-if-pending conflict behavior; emit resume intent signal; improve auth logging.
src/ai_company/api/approval_store.py Add save_if_pending() and harden on-expire callback error handling.
docs/design/engine.md Document parking behavior in engine flow when escalations occur.
README.md Update status line to reflect approval workflow gates implementation.
CLAUDE.md Update codebase map/docs for approval-gate integration and approval tool.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if result.metadata.get("requires_parking") is not True:
return None
if not result.metadata.get("approval_id"):
return None
Comment on lines +279 to +283
FROM parked_contexts
""",
"ALTER TABLE parked_contexts RENAME TO parked_contexts_old",
"ALTER TABLE parked_contexts_new RENAME TO parked_contexts",
"DROP TABLE IF EXISTS parked_contexts_old",
Comment on lines +98 to +124
async def test_v6_makes_task_id_nullable(
self, memory_db: aiosqlite.Connection
) -> None:
"""V7 migration makes parked_contexts.task_id nullable."""
# Simulate a pre-v6 database with NOT NULL task_id
await memory_db.execute("""\
CREATE TABLE parked_contexts (
id TEXT PRIMARY KEY,
execution_id TEXT NOT NULL,
agent_id TEXT NOT NULL,
task_id TEXT NOT NULL,
approval_id TEXT NOT NULL,
parked_at TEXT NOT NULL,
context_json TEXT NOT NULL,
metadata TEXT NOT NULL DEFAULT '{}'
)""")
await set_user_version(memory_db, 6)
await memory_db.commit()

# Verify task_id is NOT NULL before migration
cursor = await memory_db.execute("PRAGMA table_info('parked_contexts')")
cols = {row[1]: row[3] for row in await cursor.fetchall()}
assert cols["task_id"] == 1 # notnull=1

# Run migrations (applies v6)
await run_migrations(memory_db)
assert await get_user_version(memory_db) == 7
Comment on lines +224 to +227
item = ApprovalItem(
id="exp-001",
action_type="code_merge",
title="Test",
@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 14, 2026

Greptile Summary

This PR implements a complete approval-gated execution system for the agent engine: a new ApprovalGate service parks execution context when SecOps ESCALATE verdicts or request_human_approval tool calls are detected, persists the context via ParkedContextRepository, and exposes a resume path. The change also hardens the API (binding requested_by to the authenticated user, atomic save_if_pending, prompt-injection mitigation), fixes the frontend WS handler to use approval_id and fetch full items via API, and adds a crash-safe V7 migration to make parked_contexts.task_id nullable.

Key areas of concern:

  • Multiple escalations per batch produce orphaned approval items: should_park returns only escalations[0]. When invoke_all triggers several ESCALATE verdicts or request_human_approval calls in one batch, the remaining ApprovalItem records are written to the store (PENDING) with no corresponding ParkedContext. A human who approves/rejects them triggers _signal_resume_intent, which logs but finds no parked context — the agent is silently never resumed.
  • Tool-initiated approval items have no TTL: RequestHumanApprovalTool._persist_item creates ApprovalItem with expires_at=None. Stale items accumulate indefinitely if an agent terminates after parking.
  • build_resume_message is currently dead code: The static method is well-designed but has no call site yet — _signal_resume_intent is a logging-only stub. A cross-reference comment would help future implementors locate the intended injection point.
  • Frontend WS handler race on filtered re-fetches: The fire-and-forget async IIFE can allow two concurrent fetchApprovals calls to race; the slower response overwrites the faster one's result with stale data.

Confidence Score: 3/5

  • Safe to merge with caution — core parking lifecycle is sound, but the multi-escalation orphan issue can produce silent no-resume failures in production that are hard to diagnose.
  • The fundamental park/resume mechanism, migration safety, API hardening, and frontend WS fix are all well-implemented and well-tested. The score is reduced because should_park silently drops all but the first escalation in a multi-tool batch, leaving dangling PENDING approval items with no recovery path — this is a correctness gap in a security-critical workflow. The no-TTL issue on tool-initiated items is a secondary quality concern. The known stub for resume scheduling is documented and acceptable.
  • Pay close attention to src/ai_company/engine/approval_gate.py (should_park multi-escalation handling) and src/ai_company/tools/approval_tool.py (missing TTL on created items).

Important Files Changed

Filename Overview
src/ai_company/engine/approval_gate.py New ApprovalGate service coordinates parking/resume via ParkService + ParkedContextRepository. Core park/resume lifecycle is correct, but should_park silently processes only the first escalation from a multi-escalation batch — remaining ApprovalItem records become orphaned (no parked context, not resumable). build_resume_message is dead code until the scheduler stub is filled in.
src/ai_company/tools/approval_tool.py New RequestHumanApprovalTool correctly creates ApprovalItems, validates action_type format, classifies risk, and signals parking metadata. Minor issue: items are created without expires_at (never expire), which can accumulate stale items if agents terminate abnormally.
src/ai_company/tools/invoker.py Escalation tracking added via _pending_escalations list, cleared per-batch at the start of invoke/invoke_all. ESCALATE verdict tracking and requires_parking metadata detection are both correct. Sorting by call-id order for determinism is sound. _invoke_single refactor cleanly separates escalation-clearing from the invocation logic.
src/ai_company/api/approval_store.py save_if_pending adds a correct optimistic-concurrency guard for approve/reject. on_expire exception swallowing is appropriate. Note: lazy expiration check result is not written back to _items (existing issue flagged in previous review thread).
src/ai_company/api/controllers/approvals.py requested_by binding to auth user is correct and closes an injection vector. ConflictError on double-decision replaces the weaker NotFoundError. _signal_resume_intent is clearly documented as a stub. _log_approval_decision is a clean separation of concerns. Exception broadening in _publish_approval_event is appropriate for best-effort WS publishing.
src/ai_company/persistence/sqlite/migrations.py _apply_v7 implements a crash-safe table-rebuild pattern for making task_id nullable. The three crash states are correctly handled: fresh start, mid-crash after step 3, and already-complete scenarios. Previous thread concerns about the has_new guard and the crash-after-rename window appear to have been addressed in this implementation.
web/src/stores/approvals.ts WS handler correctly uses approval_id from payload and fetches full items via API. Duplicate-insertion re-check after async fetch is sound. Race condition exists when multiple concurrent status-change events both trigger fetchApprovals(activeFilters) — responses can land out of order with no cancellation mechanism.

Comments Outside Diff (3)

  1. src/ai_company/engine/approval_gate.py, line 74-87 (link)

    Only first escalation is parked — others become permanently orphaned

    When invoke_all fires multiple tools concurrently and more than one triggers ESCALATE or request_human_approval, should_park returns only escalations[0]. The remaining ApprovalItem records are created in the approval store (PENDING) but no ParkedContext is ever saved for them.

    Consequences:

    • A human operator who approves/rejects those extra items triggers _signal_resume_intent, which logs APPROVAL_GATE_RESUME_TRIGGERED.
    • resume_context(approval_id) then calls _parked_context_repo.get_by_approval(approval_id), which returns None (no context was ever parked), so the agent is silently not resumed.
    • The operator gets no error feedback; the approval item stays decided but the agent stays dead.

    At a minimum, should_park should log a warning for every escalation beyond the first, or the store items for those escalations should be auto-rejected/expired to prevent them from appearing as actionable in the UI.

    def should_park(
        self,
        escalations: tuple[EscalationInfo, ...],
    ) -> EscalationInfo | None:
        if not escalations:
            return None
        if len(escalations) > 1:
            logger.warning(
                APPROVAL_GATE_ESCALATION_DETECTED,
                escalation_count=len(escalations),
                first_approval_id=escalations[0].approval_id,
                skipped_ids=[e.approval_id for e in escalations[1:]],
                note=(
                    "Only the first escalation will be parked; "
                    "remaining approval items will not be resumable"
                ),
            )
        else:
            logger.info(
                APPROVAL_GATE_ESCALATION_DETECTED,
                escalation_count=1,
                first_approval_id=escalations[0].approval_id,
            )
        return escalations[0]
  2. src/ai_company/tools/approval_tool.py, line 156-181 (link)

    Tool-initiated approval items never expire

    _persist_item creates the ApprovalItem with no expires_at (it defaults to None), so items created by request_human_approval never expire. If an agent parks on an approval and then terminates abnormally — crash, forced stop, deployment replacement — the corresponding ApprovalItem and ParkedContext will remain in PENDING status indefinitely. Over time, stale items accumulate in the queue with no way to correlate them back to a live agent.

    Consider accepting an optional ttl_seconds parameter in the tool's schema (or wiring a default from engine config), and passing it through to the item:

    # In parameters_schema["properties"]:
    "ttl_seconds": {
        "type": "integer",
        "minimum": 60,
        "maximum": 604800,
        "description": "Optional TTL in seconds (default: 24h)",
    },
    # In _persist_item:
    ttl = arguments.get("ttl_seconds")
    expires_at = datetime.now(UTC) + timedelta(seconds=ttl) if ttl else None
    item = ApprovalItem(
        ...
        expires_at=expires_at,
    )
  3. web/src/stores/approvals.ts, line 64-130 (link)

    Concurrent WS events can interleave fetchApprovals calls with no cancellation

    The void IIFE fires-and-forgets async work for each incoming WS event. When activeFilters.value is set, status-change events (approved/rejected/expired) trigger await fetchApprovals(activeFilters.value). If two such events arrive in rapid succession (e.g., a batch approve operation), two concurrent fetchApprovals calls race to completion:

    • Response 2 (stale) lands after Response 1 (fresh) → Response 2 overwrites approvals.value with older data.

    The test suite uses flushPromises() with a single settled promise, which prevents exposure of this race in tests.

    A common mitigation is an AbortController / cancellation token keyed on the filter state:

    let pendingFetchController: AbortController | null = null
    
    async function fetchApprovals(filters?: ApprovalFilters) {
      pendingFetchController?.abort()
      const controller = new AbortController()
      pendingFetchController = controller
      // ... pass signal through to axios
    }

    Alternatively, a debounce on the WS-triggered re-fetch would reduce the window significantly.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/ai_company/engine/approval_gate.py
Line: 74-87

Comment:
**Only first escalation is parked — others become permanently orphaned**

When `invoke_all` fires multiple tools concurrently and more than one triggers ESCALATE or `request_human_approval`, `should_park` returns only `escalations[0]`. The remaining `ApprovalItem` records are created in the approval store (`PENDING`) but no `ParkedContext` is ever saved for them.

Consequences:
- A human operator who approves/rejects those extra items triggers `_signal_resume_intent`, which logs `APPROVAL_GATE_RESUME_TRIGGERED`.
- `resume_context(approval_id)` then calls `_parked_context_repo.get_by_approval(approval_id)`, which returns `None` (no context was ever parked), so the agent is silently not resumed.
- The operator gets no error feedback; the approval item stays decided but the agent stays dead.

At a minimum, `should_park` should log a warning for every escalation beyond the first, or the store items for those escalations should be auto-rejected/expired to prevent them from appearing as actionable in the UI.

```python
def should_park(
    self,
    escalations: tuple[EscalationInfo, ...],
) -> EscalationInfo | None:
    if not escalations:
        return None
    if len(escalations) > 1:
        logger.warning(
            APPROVAL_GATE_ESCALATION_DETECTED,
            escalation_count=len(escalations),
            first_approval_id=escalations[0].approval_id,
            skipped_ids=[e.approval_id for e in escalations[1:]],
            note=(
                "Only the first escalation will be parked; "
                "remaining approval items will not be resumable"
            ),
        )
    else:
        logger.info(
            APPROVAL_GATE_ESCALATION_DETECTED,
            escalation_count=1,
            first_approval_id=escalations[0].approval_id,
        )
    return escalations[0]
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/ai_company/tools/approval_tool.py
Line: 156-181

Comment:
**Tool-initiated approval items never expire**

`_persist_item` creates the `ApprovalItem` with no `expires_at` (it defaults to `None`), so items created by `request_human_approval` never expire. If an agent parks on an approval and then terminates abnormally — crash, forced stop, deployment replacement — the corresponding `ApprovalItem` and `ParkedContext` will remain in `PENDING` status indefinitely. Over time, stale items accumulate in the queue with no way to correlate them back to a live agent.

Consider accepting an optional `ttl_seconds` parameter in the tool's schema (or wiring a default from engine config), and passing it through to the item:

```python
# In parameters_schema["properties"]:
"ttl_seconds": {
    "type": "integer",
    "minimum": 60,
    "maximum": 604800,
    "description": "Optional TTL in seconds (default: 24h)",
},
```

```python
# In _persist_item:
ttl = arguments.get("ttl_seconds")
expires_at = datetime.now(UTC) + timedelta(seconds=ttl) if ttl else None
item = ApprovalItem(
    ...
    expires_at=expires_at,
)
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/ai_company/engine/approval_gate.py
Line: 253-284

Comment:
**`build_resume_message` is never called — dead code until scheduler lands**

`build_resume_message` is a well-designed static method, but it is not called anywhere in the current codebase. `_signal_resume_intent` in `approvals.py` only logs `APPROVAL_GATE_RESUME_TRIGGERED` and returns — no message is built or injected into any context.

This isn't a bug (the PR description documents the stub clearly), but the method's presence without a call site could mislead a future implementor into thinking resume injection is already wired. A docstring note or a `# TODO:` comment cross-referencing `§12.4` would make the intent clearer:

```python
@staticmethod
def build_resume_message(
    approval_id: str,
    *,
    approved: bool,
    decided_by: str,
    decision_reason: str | None = None,
) -> str:
    """Build a system message for resume injection.

    .. note:: This method is not yet called. It will be invoked by
        the scheduler component once ``_signal_resume_intent`` is
        replaced with an actual re-enqueue call (see §12.4 TODO).
    ...
    """
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: web/src/stores/approvals.ts
Line: 64-130

Comment:
**Concurrent WS events can interleave `fetchApprovals` calls with no cancellation**

The `void IIFE` fires-and-forgets async work for each incoming WS event. When `activeFilters.value` is set, status-change events (approved/rejected/expired) trigger `await fetchApprovals(activeFilters.value)`. If two such events arrive in rapid succession (e.g., a batch approve operation), two concurrent `fetchApprovals` calls race to completion:

- Response 2 (stale) lands after Response 1 (fresh) → Response 2 overwrites `approvals.value` with older data.

The test suite uses `flushPromises()` with a single settled promise, which prevents exposure of this race in tests.

A common mitigation is an AbortController / cancellation token keyed on the filter state:

```ts
let pendingFetchController: AbortController | null = null

async function fetchApprovals(filters?: ApprovalFilters) {
  pendingFetchController?.abort()
  const controller = new AbortController()
  pendingFetchController = controller
  // ... pass signal through to axios
}
```

Alternatively, a debounce on the WS-triggered re-fetch would reduce the window significantly.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: b36b9ca

Comment on lines +178 to +186
self._on_expire(expired)
except MemoryError, RecursionError:
raise
except Exception:
logger.exception(
API_APPROVAL_EXPIRED,
approval_id=item.id,
note="on_expire callback failed",
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lazy expiration not written back to _items inside save_if_pending

When _check_expiration determines the stored item has expired, it returns an expired copy but does not update self._items. After save_if_pending returns None, self._items[item.id] still holds the original PENDING item. Any subsequent call to get() (or another save_if_pending) would retrieve the stale PENDING entry and re-run the expiry check again, but the expired on_expire callback would fire a second time.

If get() also applies lazy expiration this is harmless. If it doesn't, callers that check the status after a failed save_if_pending could observe an inconsistent PENDING state.

Consider writing back the expired record when expiration fires:

current = self._check_expiration(current)
if current.status != ApprovalStatus.PENDING:
    # Write back the expired state so subsequent reads are consistent.
    self._items[item.id] = current
    return None
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/api/approval_store.py
Line: 178-186

Comment:
**Lazy expiration not written back to `_items` inside `save_if_pending`**

When `_check_expiration` determines the stored item has expired, it returns an expired copy but does **not** update `self._items`. After `save_if_pending` returns `None`, `self._items[item.id]` still holds the original `PENDING` item. Any subsequent call to `get()` (or another `save_if_pending`) would retrieve the stale PENDING entry and re-run the expiry check again, but the expired `on_expire` callback would fire a second time.

If `get()` also applies lazy expiration this is harmless. If it doesn't, callers that check the status after a failed `save_if_pending` could observe an inconsistent `PENDING` state.

Consider writing back the expired record when expiration fires:

```python
current = self._check_expiration(current)
if current.status != ApprovalStatus.PENDING:
    # Write back the expired state so subsequent reads are consistent.
    self._items[item.id] = current
    return None
```

How can I resolve this? If you propose a fix, please make it concise.

@@ -237,7 +253,7 @@ class ApproveRequest(BaseModel):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RejectRequest.reason missing max_length bound

ApproveRequest.comment was correctly bounded with max_length=4096 in this PR, but RejectRequest.reason — which flows directly into decision_reason on the stored item — is not bounded. For consistency and to prevent unbounded payloads:

Suggested change
reason: NotBlankStr = Field(max_length=4096)
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/api/dto.py
Line: 253

Comment:
**`RejectRequest.reason` missing `max_length` bound**

`ApproveRequest.comment` was correctly bounded with `max_length=4096` in this PR, but `RejectRequest.reason` — which flows directly into `decision_reason` on the stored item — is not bounded. For consistency and to prevent unbounded payloads:

```suggestion
    reason: NotBlankStr = Field(max_length=4096)
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/ai_company/engine/plan_execute_loop.py (1)

80-96: 🧹 Nitpick | 🔵 Trivial

Update class docstring to document approval_gate parameter.

Similar to ReactLoop, the PlanExecuteLoop class docstring should document the new approval_gate parameter for completeness.

📝 Proposed docstring update
 class PlanExecuteLoop:
     """Plan-and-Execute execution loop.
 
     Decomposes a task into steps via LLM planning, then executes each
     step with a mini-ReAct sub-loop. Supports re-planning on failure.
+
+    Args:
+        config: Optional loop configuration.
+        checkpoint_callback: Optional async callback for checkpointing.
+        approval_gate: Optional gate for approval-required parking.
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/plan_execute_loop.py` around lines 80 - 96, The
PlanExecuteLoop class docstring is missing documentation for the new
approval_gate parameter; update the class-level docstring above class
PlanExecuteLoop to describe the approval_gate parameter (type ApprovalGate |
None, purpose and behavior) similar to how ReactLoop documents its
approval_gate, and mention that it can be passed to __init__ to gate step
execution; ensure the parameter name approval_gate appears in the docstring and
briefly state default value (None) and its effect on execution/re-planning.
src/ai_company/engine/react_loop.py (1)

66-78: 🧹 Nitpick | 🔵 Trivial

Update class docstring to document the new approval_gate parameter.

The Args section in the class docstring documents checkpoint_callback but omits the newly added approval_gate parameter. Per coding guidelines, Google-style docstrings are required on public classes.

📝 Proposed docstring update
     Args:
         checkpoint_callback: Optional async callback invoked after each
             completed turn; the callback itself decides whether to persist.
+        approval_gate: Optional gate for approval-required parking.
+            When provided, tool escalations trigger context parking.
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/react_loop.py` around lines 66 - 78, The class
docstring's Args section documents checkpoint_callback but omits the new
approval_gate parameter; update the class-level Google-style docstring to
include an Args entry for approval_gate describing its type (ApprovalGate |
None), that it's optional, and what it controls (used to gate/require approvals
between turns passed into __init__ as approval_gate), referencing the existing
CheckpointCallback/ checkpoint_callback description for parity and keeping
wording concise and consistent with the existing docstring style.
src/ai_company/api/controllers/approvals.py (1)

356-430: 🛠️ Refactor suggestion | 🟠 Major

Extract the shared decision flow out of approve() and reject().

These handlers now duplicate the same fetch → save_if_pending() → publish → log → signal path and both exceed the repo's 50-line limit, which makes future fixes easy to land in only one branch. As per coding guidelines, "Functions must be < 50 lines, files < 800 lines."

Also applies to: 437-511

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/api/controllers/approvals.py` around lines 356 - 430, The
approve() and reject() handlers duplicate the same fetch → save_if_pending() →
publish → log → signal flow and exceed 50 lines; extract that shared flow into a
new helper (e.g. _handle_approval_decision) that accepts (state/app_state,
request, approval_id, status: ApprovalStatus, approved: bool, decision_reason:
Optional[str], decided_by: str) and does: fetch via
app_state.approval_store.get, call _resolve_decision, build updated model_copy
with decided_at/decided_by/decision_reason/status, call save_if_pending, raise
NotFoundError/ConflictError with the same messages when appropriate, call
_publish_approval_event, _log_approval_decision, and await
_signal_resume_intent, then return the saved item; then simplify approve() and
reject() to call this helper with ApprovalStatus.APPROVED/REJECTED and the auth
user (from _resolve_decision or pass through) and return
ApiResponse(data=returned_item), preserving existing log keys/API messages and
error behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CLAUDE.md`:
- Line 154: Fix the punctuation typo in the Event names guidance line in
CLAUDE.md by adding the missing comma in the final clause; edit the sentence
that currently ends with "Import directly: `from
ai_company.observability.events.<domain> import EVENT_CONSTANT`" so it includes
the needed comma (e.g., "Import directly, `from
ai_company.observability.events.<domain> import EVENT_CONSTANT`") to satisfy
lint/style.

In `@README.md`:
- Line 133: The status line in the README overstates completeness by implying
the approval workflow is end-to-end; update the sentence that currently ends
with "Remaining: CLI" to clearly note that resume/re-enqueue scheduling is not
implemented (e.g., "Remaining: CLI and resume/scheduler for re-enqueueing
executions") so readers understand the approval gate can signal intent but a
scheduler is still required to resume work; edit the line containing the phrase
"Remaining: CLI" and adjust wording to mention both the CLI and the deferred
scheduler/resume mechanism.

In `@src/ai_company/api/approval_store.py`:
- Around line 177-186: The except block that re-raises MemoryError and
RecursionError should log context before re-raising: in the try around
self._on_expire(expired) catch the non-recoverable errors as a tuple (except
(MemoryError, RecursionError):), call logger.error(API_APPROVAL_EXPIRED,
approval_id=item.id, note="on_expire callback failed - non-recoverable error")
and then re-raise; keep the existing broad except Exception block that logs via
logger.exception unchanged.

In `@src/ai_company/api/controllers/approvals.py`:
- Around line 177-214: _signal_resume_intent currently only logs decisions;
update it to perform the actual handoff by calling and awaiting
ApprovalGate.resume_context on app_state.approval_gate (if not None), passing
the approval_id and a payload containing approved, decided_by and
decision_reason so the gate/scheduler can resume the parked agent; ensure you
handle/await the async resume_context call, catch and log any exceptions (using
logger.error) so failures don’t silence the resume attempt, and preserve the
existing log entry for telemetry.

In `@src/ai_company/engine/_security_factory.py`:
- Around line 39-117: The make_security_interceptor function is too long and
mixes autonomy validation, detector selection, and service construction; split
it into small helpers (e.g., validate_autonomy_with_config(security_config,
effective_autonomy), build_detectors(rule_engine_cfg) returning the list/tuple
of detector instances including PolicyValidator, CredentialDetector,
PathTraversalDetector, DestructiveOpDetector, DataLeakDetector, and
build_secops_service(cfg, rule_engine, audit_log, approval_store,
effective_autonomy) that constructs and returns the SecOpsService) then refactor
make_security_interceptor to call these helpers and return the SecOpsService;
keep existing behavior and exceptions (ExecutionStateError, logger messages) and
use the same symbols RuleEngine, RiskClassifier, OutputScanner,
DefaultRiskTierClassifier to assemble the final service.
- Around line 141-148: The code unconditionally appends a new
RequestHumanApprovalTool to the registry which will raise on duplicate names;
modify the logic around approval_tool / tool_registry so you check
tool_registry.all_tools() for an existing tool with the same name (or identity)
as approval_tool and either replace that entry or simply return the original
registry when a matching tool exists before constructing _ToolRegistry; update
the block that builds `existing = list(tool_registry.all_tools())` to filter out
any tool whose name matches approval_tool.name (or replace it) and then return
_ToolRegistry([...]) so duplicate registration is avoided.

In `@src/ai_company/engine/agent_engine.py`:
- Around line 887-903: The current _make_approval_gate() constructs an
ApprovalGate whenever _approval_store is set even if _parked_context_repo is
None, which enables approval parking without persistent storage; modify
_make_approval_gate() to check both self._approval_store and
self._parked_context_repo and only return an ApprovalGate when both are present,
otherwise return None (or raise/explicitly disable approval paths) so
ApprovalGate is never created with parked_context_repo=None; update references
to ApprovalGate, _approval_store, and _parked_context_repo accordingly.

In `@src/ai_company/engine/approval_gate.py`:
- Around line 205-213: resume_context currently deserializes and then calls
_cleanup_parked but ignores failures/False results, returning a resumable
context even when cleanup failed; change resume_context to catch exceptions from
_cleanup_parked and check its return value, and if cleanup fails (exception or
falsy) log an error and raise an exception (fail closed) instead of returning
the context; apply the same defensive check to the other method that calls
_cleanup_parked in the second block (the resume_* caller around lines 260-287)
so both paths refuse to resume when cleanup doesn't succeed.

In `@src/ai_company/engine/loop_helpers.py`:
- Around line 345-413: The _park_for_approval function is over the 50-line
limit; refactor it into small helpers: extract agent and task id (e.g., a new
_extract_agent_and_task_id(ctx: AgentContext) -> tuple[str, str|None] that
contains the logger.debug branch), perform the park attempt (e.g.,
_attempt_park(escalation: EscalationInfo, ctx: AgentContext, approval_gate)
which calls approval_gate.park_context and re-raises MemoryError/RecursionError
but returns False on other exceptions), and build the final ExecutionResult
(e.g., _build_park_result(ctx, turns, escalation.approval_id, parked: bool) that
calls build_result with TerminationReason.PARKED or TerminationReason.ERROR and
the proper metadata). Replace the body of _park_for_approval with calls to these
helpers so the top-level function stays small and delegates extraction, parking,
and result construction.

In `@src/ai_company/persistence/sqlite/migrations.py`:
- Line 155: The _V3_STATEMENTS constant was modified to make task_id nullable,
which alters historical v3 DDL; restore the original v3 schema by changing
task_id back to NOT NULL inside _V3_STATEMENTS so migrations replay the true
pre-v7 schema and let the v7 migration (not _V3_STATEMENTS) handle making
task_id nullable; locate _V3_STATEMENTS and the task_id column definition and
revert it to its original NOT NULL declaration.
- Around line 258-286: The v7 migration _V7_STATEMENTS is not resumable; replace
the fixed statement list with an imperative function _apply_v7(conn) that checks
current schema state (existence of parked_contexts, parked_contexts_new,
parked_contexts_old) and performs only the missing idempotent steps: create
parked_contexts_new if absent, copy rows from parked_contexts into
parked_contexts_new if parked_contexts exists and new is empty, rename
parked_contexts to parked_contexts_old if parked_contexts exists and not already
renamed, rename parked_contexts_new to parked_contexts if the current table is
missing, and drop parked_contexts_old when present; update run_migrations to
call _apply_v7 instead of executing _V7_STATEMENTS, and apply the same
imperative, state-checking replacement for the similar sequence referenced
around the other block (the statements at the second occurrence).

In `@src/ai_company/tools/approval_tool.py`:
- Around line 114-129: The execute() method currently coerces arguments to
strings (action_type, title, description) allowing None/objects to become valid
text; instead validate that arguments["action_type"], ["title"], and
["description"] are present, are instances of str, and are non-blank before
calling _validate_action_type; if any check fails, return a ToolExecutionResult
with is_error=True and a clear message naming the offending key(s) and expected
type/non-empty constraint; keep calling _validate_action_type(action_type) only
after these checks pass.

In `@tests/unit/engine/test_approval_gate_models.py`:
- Around line 90-141: Add tests mirroring TestEscalationInfo by ensuring
ResumePayload also rejects empty strings for approval_id and decided_by: update
or add a test (e.g., test_empty_string_rejected) that parametrizes the same
fields ["approval_id","decided_by"], constructs kwargs like in
test_blank_string_rejected but sets the field to "" and asserts
ResumePayload(**kwargs) raises ValidationError; reference the ResumePayload
class and existing test_blank_string_rejected to keep behavior consistent.

In `@tests/unit/persistence/test_migrations_v2.py`:
- Around line 31-32: Add a 30s pytest timeout to the new async migration tests
in tests/unit/persistence/test_migrations_v2.py: either set a file-scope marker
`pytestmark = pytest.mark.timeout(30)` near the top of the file or add
`@pytest.mark.timeout(30)` to each async test (e.g.,
`test_schema_version_is_seven` and the tests covering lines ~98-173) so each
migration test runs with the repository-required 30‑second timeout.

In `@tests/unit/tools/test_invoker_escalation.py`:
- Around line 136-175: Replace the three nearly identical tests
(test_not_populated_on_escalate_without_approval_id,
test_not_populated_on_allow_verdict, test_not_populated_on_deny_verdict) with a
single parametrized pytest test that iterates different verdict inputs; for each
parameter set construct the AsyncMock interceptor (using _verdict with
SecurityVerdictType values and ApprovalRiskLevel where needed), set scan_output
only for the ALLOW case, create the invoker via _make_invoker(_StubTool(),
security_interceptor=interceptor), call await invoker.invoke(_make_tool_call()),
and assert invoker.pending_escalations == (); use pytest.mark.parametrize to
supply the verdict factory args so coverage remains identical while removing
duplicated arrange/act/assert code.

In `@web/src/stores/approvals.ts`:
- Around line 78-80: WS-driven refreshes currently call
fetchApprovals(activeFilters.value) which mutates shared loading/error state and
lacks a last-response-wins guard; create a dedicated helper (e.g.,
refreshApprovalsForFiltersFromWS or wsRefreshApprovals) that snapshots
activeFilters.value into a local const, calls listApprovals using that snapshot
without touching the global loading/error flags, and stores results only if a
per-filter request token/sequence matches the latest token for that snapshot
(ignore stale responses). Update WS paths that currently call fetchApprovals
(the calls around fetchApprovals and the similar usage at the other location) to
call this new helper instead. Add one regression test that issues overlapping
listApprovals calls (simulate out-of-order resolution) and verifies the store
retains the newest response and that foreground errors/loading are not clobbered
by WS refreshes.

---

Outside diff comments:
In `@src/ai_company/api/controllers/approvals.py`:
- Around line 356-430: The approve() and reject() handlers duplicate the same
fetch → save_if_pending() → publish → log → signal flow and exceed 50 lines;
extract that shared flow into a new helper (e.g. _handle_approval_decision) that
accepts (state/app_state, request, approval_id, status: ApprovalStatus,
approved: bool, decision_reason: Optional[str], decided_by: str) and does: fetch
via app_state.approval_store.get, call _resolve_decision, build updated
model_copy with decided_at/decided_by/decision_reason/status, call
save_if_pending, raise NotFoundError/ConflictError with the same messages when
appropriate, call _publish_approval_event, _log_approval_decision, and await
_signal_resume_intent, then return the saved item; then simplify approve() and
reject() to call this helper with ApprovalStatus.APPROVED/REJECTED and the auth
user (from _resolve_decision or pass through) and return
ApiResponse(data=returned_item), preserving existing log keys/API messages and
error behavior.

In `@src/ai_company/engine/plan_execute_loop.py`:
- Around line 80-96: The PlanExecuteLoop class docstring is missing
documentation for the new approval_gate parameter; update the class-level
docstring above class PlanExecuteLoop to describe the approval_gate parameter
(type ApprovalGate | None, purpose and behavior) similar to how ReactLoop
documents its approval_gate, and mention that it can be passed to __init__ to
gate step execution; ensure the parameter name approval_gate appears in the
docstring and briefly state default value (None) and its effect on
execution/re-planning.

In `@src/ai_company/engine/react_loop.py`:
- Around line 66-78: The class docstring's Args section documents
checkpoint_callback but omits the new approval_gate parameter; update the
class-level Google-style docstring to include an Args entry for approval_gate
describing its type (ApprovalGate | None), that it's optional, and what it
controls (used to gate/require approvals between turns passed into __init__ as
approval_gate), referencing the existing CheckpointCallback/ checkpoint_callback
description for parity and keeping wording concise and consistent with the
existing docstring style.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 22002a6e-d586-49b7-ac6e-cbaa69d43879

📥 Commits

Reviewing files that changed from the base of the PR and between 6c60f6c and cc65e53.

📒 Files selected for processing (40)
  • CLAUDE.md
  • README.md
  • docs/design/engine.md
  • src/ai_company/api/approval_store.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/api/dto.py
  • src/ai_company/api/state.py
  • src/ai_company/core/validation.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/_security_factory.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/engine/approval_gate_models.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/security/timeout/park_service.py
  • src/ai_company/security/timeout/parked_context.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/tools/registry.py
  • tests/unit/api/controllers/test_approvals.py
  • tests/unit/api/controllers/test_approvals_helpers.py
  • tests/unit/api/test_approval_store.py
  • tests/unit/api/test_dto.py
  • tests/unit/core/test_validation.py
  • tests/unit/engine/test_approval_gate.py
  • tests/unit/engine/test_approval_gate_models.py
  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/engine/test_security_factory.py
  • tests/unit/observability/test_events.py
  • tests/unit/persistence/sqlite/test_migrations_v6.py
  • tests/unit/persistence/test_migrations_v2.py
  • tests/unit/tools/test_approval_tool.py
  • tests/unit/tools/test_invoker_escalation.py
  • web/src/__tests__/stores/approvals.test.ts
  • web/src/stores/approvals.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Agent
  • GitHub Check: Build Backend
  • GitHub Check: Greptile Review
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: use except A, B: (no parentheses) — ruff enforces this on Python 3.14.
Type hints required on all public functions; mypy strict mode enforced.
Google-style docstrings required on all public classes and functions (enforced by ruff D rules).
Line length: 88 characters (ruff).
Functions must be < 50 lines, files < 800 lines.

Files:

  • src/ai_company/tools/__init__.py
  • tests/unit/api/controllers/test_approvals_helpers.py
  • tests/unit/engine/test_approval_gate_models.py
  • tests/unit/persistence/sqlite/test_migrations_v6.py
  • tests/unit/api/controllers/test_approvals.py
  • tests/unit/tools/test_invoker_escalation.py
  • src/ai_company/engine/_security_factory.py
  • src/ai_company/api/state.py
  • src/ai_company/engine/approval_gate_models.py
  • tests/unit/api/test_dto.py
  • tests/unit/engine/test_approval_gate.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/security/timeout/parked_context.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/security/timeout/park_service.py
  • tests/unit/core/test_validation.py
  • tests/unit/engine/test_security_factory.py
  • tests/unit/tools/test_approval_tool.py
  • tests/unit/persistence/test_migrations_v2.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/api/approval_store.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/engine/__init__.py
  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/api/test_approval_store.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/tools/registry.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/engine/loop_helpers.py
  • tests/unit/observability/test_events.py
  • src/ai_company/api/dto.py
  • src/ai_company/core/validation.py
  • src/ai_company/engine/agent_engine.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__). Never use import logging / logging.getLogger() / print() in application code. Variable name: always logger (not _logger, not log).
Always use event name constants from domain-specific modules under ai_company.observability.events (e.g. PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT.
Use structured kwargs for logging: always logger.info(EVENT, key=value) — never logger.info('msg %s', val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG logging for object creation, internal flow, entry/exit of key functions.

Files:

  • src/ai_company/tools/__init__.py
  • src/ai_company/engine/_security_factory.py
  • src/ai_company/api/state.py
  • src/ai_company/engine/approval_gate_models.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/security/timeout/parked_context.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/security/timeout/park_service.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/api/approval_store.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/tools/registry.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/api/dto.py
  • src/ai_company/core/validation.py
  • src/ai_company/engine/agent_engine.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names. Tests must use test-provider, test-small-001, etc.

Files:

  • src/ai_company/tools/__init__.py
  • src/ai_company/engine/_security_factory.py
  • src/ai_company/api/state.py
  • src/ai_company/engine/approval_gate_models.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/security/timeout/parked_context.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/security/timeout/park_service.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/api/approval_store.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/tools/registry.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/api/dto.py
  • src/ai_company/core/validation.py
  • src/ai_company/engine/agent_engine.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow on test cases.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.
Vendor-agnostic everywhere: NEVER use real vendor names in tests. Tests must use test-provider, test-small-001, etc.

Files:

  • tests/unit/api/controllers/test_approvals_helpers.py
  • tests/unit/engine/test_approval_gate_models.py
  • tests/unit/persistence/sqlite/test_migrations_v6.py
  • tests/unit/api/controllers/test_approvals.py
  • tests/unit/tools/test_invoker_escalation.py
  • tests/unit/api/test_dto.py
  • tests/unit/engine/test_approval_gate.py
  • tests/unit/core/test_validation.py
  • tests/unit/engine/test_security_factory.py
  • tests/unit/tools/test_approval_tool.py
  • tests/unit/persistence/test_migrations_v2.py
  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/api/test_approval_store.py
  • tests/unit/observability/test_events.py
🧠 Learnings (15)
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Dependency review: license allow-list (permissive only), PR comment summaries.

Applied to files:

  • README.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Package structure: `src/ai_company/` contains api/, budget/, cli/, communication/, config/, core/, engine/, hr/, memory/, persistence/, observability/, providers/, security/, templates/, tools/ modules organized by domain.

Applied to files:

  • README.md
  • CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Web dashboard structure: Vue 3 + PrimeVue + Tailwind CSS with api/, components/, composables/, router/, stores/, styles/, utils/, views/, __tests__/ organized by feature.

Applied to files:

  • README.md
  • CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • src/ai_company/api/state.py
  • CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : Always use event name constants from domain-specific modules under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/ai_company/api/state.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/engine/loop_helpers.py
  • tests/unit/observability/test_events.py
  • CLAUDE.md
  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Web dashboard unit tests: `npm --prefix web run test` (Vitest).

Applied to files:

  • web/src/__tests__/stores/approvals.test.ts
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to tests/**/*.py : Use markers pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow on test cases.

Applied to files:

  • tests/unit/observability/test_events.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to tests/**/conftest.py : Use asyncio_mode = 'auto' — no manual pytest.mark.asyncio needed.

Applied to files:

  • tests/unit/observability/test_events.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.

Applied to files:

  • tests/unit/observability/test_events.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.

Applied to files:

  • src/ai_company/api/dto.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/**/*.py : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names. Tests must use test-provider, test-small-001, etc.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : Use structured kwargs for logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • CLAUDE.md
  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : All state transitions must log at INFO.

Applied to files:

  • CLAUDE.md
  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : DEBUG logging for object creation, internal flow, entry/exit of key functions.

Applied to files:

  • CLAUDE.md
🧬 Code graph analysis (31)
src/ai_company/tools/__init__.py (1)
src/ai_company/tools/approval_tool.py (1)
  • RequestHumanApprovalTool (34-263)
tests/unit/api/controllers/test_approvals_helpers.py (4)
src/ai_company/api/controllers/approvals.py (4)
  • _log_approval_decision (157-174)
  • _publish_approval_event (69-109)
  • _resolve_decision (112-154)
  • _signal_resume_intent (177-212)
src/ai_company/api/errors.py (2)
  • ConflictError (41-47)
  • UnauthorizedError (59-65)
src/ai_company/api/state.py (2)
  • AppState (23-163)
  • approval_gate (139-141)
src/ai_company/api/auth/models.py (1)
  • AuthenticatedUser (68-87)
tests/unit/engine/test_approval_gate_models.py (1)
src/ai_company/engine/approval_gate_models.py (2)
  • EscalationInfo (14-33)
  • ResumePayload (36-51)
tests/unit/persistence/sqlite/test_migrations_v6.py (1)
tests/unit/persistence/test_migrations_v2.py (1)
  • test_schema_version_is_seven (31-32)
tests/unit/tools/test_invoker_escalation.py (4)
src/ai_company/providers/models.py (1)
  • ToolCall (96-119)
src/ai_company/security/models.py (2)
  • SecurityVerdict (35-67)
  • SecurityVerdictType (23-32)
src/ai_company/tools/base.py (3)
  • BaseTool (57-184)
  • ToolExecutionResult (25-54)
  • description (138-140)
src/ai_company/tools/registry.py (1)
  • ToolRegistry (30-126)
src/ai_company/engine/_security_factory.py (6)
src/ai_company/engine/errors.py (1)
  • ExecutionStateError (17-18)
src/ai_company/security/service.py (1)
  • SecOpsService (74-456)
src/ai_company/api/approval_store.py (1)
  • ApprovalStore (27-188)
src/ai_company/security/config.py (1)
  • SecurityConfig (93-140)
src/ai_company/security/protocol.py (1)
  • SecurityInterceptionStrategy (18-54)
src/ai_company/tools/registry.py (2)
  • ToolRegistry (30-126)
  • all_tools (102-104)
src/ai_company/api/state.py (1)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-322)
src/ai_company/engine/approval_gate_models.py (1)
src/ai_company/tools/base.py (1)
  • action_type (133-135)
tests/unit/api/test_dto.py (3)
src/ai_company/api/dto.py (3)
  • ApiResponse (39-57)
  • ApproveRequest (247-256)
  • CreateApprovalRequest (196-244)
src/ai_company/tools/base.py (2)
  • action_type (133-135)
  • description (138-140)
src/ai_company/core/enums.py (1)
  • ApprovalRiskLevel (443-449)
web/src/__tests__/stores/approvals.test.ts (2)
web/src/stores/approvals.ts (1)
  • useApprovalStore (8-144)
web/src/api/types.ts (1)
  • WsEvent (519-524)
tests/unit/engine/test_approval_gate.py (3)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/persistence/repositories.py (1)
  • ParkedContextRepository (204-272)
src/ai_company/security/timeout/park_service.py (3)
  • ParkService (29-144)
  • park (37-106)
  • resume (108-144)
web/src/stores/approvals.ts (2)
src/ai_company/api/ws_models.py (1)
  • WsEvent (47-72)
web/src/api/types.ts (1)
  • WsEvent (519-524)
src/ai_company/security/timeout/parked_context.py (2)
src/ai_company/engine/parallel_models.py (1)
  • task_id (87-89)
src/ai_company/tools/base.py (1)
  • description (138-140)
src/ai_company/tools/invoker.py (5)
src/ai_company/core/enums.py (1)
  • ApprovalRiskLevel (443-449)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/tools/registry.py (2)
  • ToolRegistry (30-126)
  • get (71-96)
src/ai_company/tools/base.py (4)
  • name (123-125)
  • action_type (133-135)
  • ToolExecutionResult (25-54)
  • BaseTool (57-184)
src/ai_company/providers/models.py (2)
  • ToolCall (96-119)
  • ToolResult (122-135)
src/ai_company/security/timeout/park_service.py (1)
src/ai_company/engine/parallel_models.py (1)
  • task_id (87-89)
tests/unit/core/test_validation.py (1)
src/ai_company/core/validation.py (1)
  • is_valid_action_type (6-19)
src/ai_company/persistence/sqlite/migrations.py (2)
tests/unit/persistence/sqlite/test_user_repo.py (1)
  • db (22-28)
tests/unit/hr/test_persistence.py (1)
  • db (29-35)
src/ai_company/engine/react_loop.py (2)
src/ai_company/api/state.py (1)
  • approval_gate (139-141)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-322)
src/ai_company/api/approval_store.py (4)
src/ai_company/core/approval.py (1)
  • ApprovalItem (24-96)
src/ai_company/core/enums.py (1)
  • ApprovalStatus (434-440)
src/ai_company/api/app.py (1)
  • _on_expire (74-99)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
src/ai_company/engine/approval_gate.py (5)
src/ai_company/persistence/repositories.py (1)
  • ParkedContextRepository (204-272)
src/ai_company/security/timeout/park_service.py (3)
  • ParkService (29-144)
  • park (37-106)
  • resume (108-144)
src/ai_company/security/timeout/parked_context.py (1)
  • ParkedContext (19-66)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/engine/context.py (1)
  • AgentContext (87-307)
src/ai_company/engine/__init__.py (3)
src/ai_company/api/state.py (1)
  • approval_gate (139-141)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-322)
src/ai_company/engine/approval_gate_models.py (2)
  • EscalationInfo (14-33)
  • ResumePayload (36-51)
tests/unit/engine/test_loop_helpers_approval.py (4)
src/ai_company/engine/approval_gate.py (3)
  • ApprovalGate (40-322)
  • should_park (71-90)
  • park_context (92-122)
src/ai_company/engine/loop_helpers.py (1)
  • execute_tool_calls (240-342)
src/ai_company/engine/loop_protocol.py (2)
  • ExecutionResult (79-140)
  • TerminationReason (28-36)
src/ai_company/providers/enums.py (1)
  • FinishReason (15-22)
tests/unit/api/test_approval_store.py (3)
src/ai_company/api/approval_store.py (5)
  • ApprovalStore (27-188)
  • add (42-59)
  • save_if_pending (125-149)
  • list_items (75-104)
  • get (61-73)
src/ai_company/core/enums.py (2)
  • ApprovalStatus (434-440)
  • ApprovalRiskLevel (443-449)
tests/unit/core/test_approval.py (1)
  • _now (11-12)
src/ai_company/tools/approval_tool.py (5)
src/ai_company/core/enums.py (1)
  • ToolCategory (294-308)
src/ai_company/core/validation.py (1)
  • is_valid_action_type (6-19)
src/ai_company/tools/base.py (5)
  • BaseTool (57-184)
  • description (138-140)
  • category (128-130)
  • action_type (133-135)
  • parameters_schema (143-151)
src/ai_company/api/approval_store.py (2)
  • ApprovalStore (27-188)
  • add (42-59)
src/ai_company/security/timeout/risk_tier_classifier.py (1)
  • DefaultRiskTierClassifier (62-101)
src/ai_company/tools/registry.py (1)
src/ai_company/tools/base.py (2)
  • BaseTool (57-184)
  • name (123-125)
src/ai_company/api/controllers/approvals.py (6)
src/ai_company/api/state.py (1)
  • approval_gate (139-141)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
src/ai_company/api/errors.py (2)
  • UnauthorizedError (59-65)
  • ConflictError (41-47)
src/ai_company/api/approval_store.py (2)
  • get (61-73)
  • save_if_pending (125-149)
src/ai_company/api/auth/models.py (1)
  • AuthenticatedUser (68-87)
src/ai_company/api/ws_models.py (1)
  • WsEventType (20-44)
src/ai_company/engine/plan_execute_loop.py (2)
src/ai_company/api/state.py (1)
  • approval_gate (139-141)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-322)
src/ai_company/engine/loop_helpers.py (5)
src/ai_company/engine/approval_gate.py (1)
  • should_park (71-90)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/tools/invoker.py (1)
  • pending_escalations (128-136)
src/ai_company/engine/loop_protocol.py (3)
  • TurnRecord (39-76)
  • ExecutionResult (79-140)
  • TerminationReason (28-36)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
src/ai_company/api/dto.py (2)
src/ai_company/core/validation.py (1)
  • is_valid_action_type (6-19)
src/ai_company/engine/parallel_models.py (1)
  • task_id (87-89)
src/ai_company/core/validation.py (1)
src/ai_company/tools/base.py (1)
  • action_type (133-135)
src/ai_company/engine/agent_engine.py (4)
src/ai_company/engine/_security_factory.py (1)
  • make_security_interceptor (39-117)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-322)
src/ai_company/engine/loop_protocol.py (1)
  • ExecutionLoop (151-189)
src/ai_company/engine/react_loop.py (1)
  • ReactLoop (57-296)
🪛 LanguageTool
CLAUDE.md

[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...

(EG_NO_COMMA)

🔇 Additional comments (45)
web/src/__tests__/stores/approvals.test.ts (1)

262-399: Nice coverage for WS reconciliation edge cases.

The filtered re-fetch, 404-removal, and transient-error cases line up well with the new approval_id-driven store flow and should catch several regressions here.

src/ai_company/tools/registry.py (1)

102-104: Deterministic ordering here looks good.

Returning a tuple sorted by tool name keeps downstream registry augmentation predictable.

src/ai_company/persistence/sqlite/migrations.py (1)

26-26: Schema bump and migration registration are wired correctly.

This makes the new migration reachable on upgrade and keeps the target version explicit.

Also applies to: 371-371

src/ai_company/security/timeout/parked_context.py (1)

41-43: Optional task_id keeps the model aligned with taskless parking.

This preserves validation for non-empty IDs when a value is present while allowing the new taskless flow.

src/ai_company/security/timeout/park_service.py (1)

43-52: park() now matches the taskless-agent model.

Allowing task_id=None here keeps the service API consistent with the nullable parked-context shape.

tests/unit/persistence/sqlite/test_migrations_v6.py (1)

20-21: Version expectation is updated correctly.

This keeps the schema-version smoke test aligned with the new v7 migration.

src/ai_company/observability/events/approval_gate.py (1)

1-19: Good centralization of approval-gate event names.

Keeping the lifecycle events in one domain module will make the new logging paths easier to reuse consistently.

Based on learnings, "Always use event name constants from domain-specific modules under ai_company.observability.events."

tests/unit/observability/test_events.py (2)

140-140: LGTM!

The pytestmark list correctly combines the unit marker and timeout, adhering to test guidelines.


181-228: LGTM!

The approval_gate domain module addition aligns with the new observability events introduced in this PR. The expected set maintains alphabetical consistency.

tests/unit/api/controllers/test_approvals.py (1)

19-29: LGTM!

The action_type format change to "code:merge" aligns with the new category:action validation introduced in CreateApprovalRequest. The test payload defaults are correctly structured.

src/ai_company/tools/__init__.py (1)

3-3: LGTM!

The RequestHumanApprovalTool import and export are correctly added, maintaining alphabetical order in __all__.

Also applies to: 68-68

docs/design/engine.md (2)

450-454: LGTM!

The documentation clearly explains the approval gate integration point within the execution loop, describing how escalations trigger context parking via ParkService and ParkedContextRepository.


465-468: LGTM!

The PARKED termination reason is correctly documented alongside other reasons that preserve task state, with a clear explanation of the approval-timeout policy suspension semantics.

src/ai_company/api/approval_store.py (1)

125-149: LGTM!

The save_if_pending method correctly implements a race-free conditional update pattern. The lazy expiration check before status comparison prevents TOCTOU issues, and the method properly returns None for all failure cases (not found, expired, or no longer pending).

tests/unit/core/test_validation.py (1)

1-41: LGTM!

The test file follows all guidelines: proper markers (unit, timeout), parametrized tests for both valid and invalid cases, and comprehensive edge case coverage including empty strings, whitespace variations, and multiple colons.

src/ai_company/core/validation.py (1)

1-19: LGTM!

Clean implementation of the category:action format validator. The function is well-documented with a Google-style docstring, has proper type hints, and uses a clear validation approach with split and strip.

tests/unit/api/test_dto.py (5)

5-5: LGTM!

Import correctly updated to include the new ApproveRequest DTO.


40-41: LGTM!

The action_type values correctly updated to use the category:action format throughout the metadata tests.

Also applies to: 51-51, 62-62, 71-71


80-118: LGTM!

Comprehensive parametrized tests for action_type validation covering both valid formats and invalid edge cases (missing colon, empty parts, whitespace, multiple colons).


121-160: LGTM!

TTL validation tests properly cover boundary conditions: below minimum (30s rejected), above maximum (700000 rejected), within bounds (3600 accepted), and default None.


163-175: LGTM!

ApproveRequest DTO tests correctly validate the optional comment field with proper length constraints.

src/ai_company/engine/react_loop.py (1)

244-251: LGTM!

The approval gate is correctly forwarded to execute_tool_calls, maintaining consistency with the pattern established for checkpoint callbacks. The keyword-only parameter ensures explicit usage.

tests/unit/engine/test_approval_gate_models.py (1)

1-9: LGTM!

Test setup correctly applies @pytest.mark.unit and timeout markers at module level, following testing guidelines.

src/ai_company/engine/__init__.py (2)

9-10: LGTM!

New imports are correctly placed and follow the alphabetical ordering convention used throughout the file.


213-295: LGTM!

The __all__ exports maintain proper alphabetical ordering (ApprovalGate, EscalationInfo, ResumePayload), making the new public API surface discoverable.

tests/unit/engine/test_security_factory.py (3)

1-13: LGTM!

Test file properly applies module-level markers and follows testing conventions.


34-94: LGTM!

TestMakeSecurityInterceptor provides good coverage of the factory's key behaviors: returning None when disabled, raising errors on configuration mismatches with effective_autonomy, and returning interceptors when properly configured.


97-122: LGTM!

TestRegistryWithApprovalTool correctly verifies identity preservation when no store is provided and registry replacement when a store is configured.

tests/unit/api/test_approval_store.py (3)

177-239: LGTM!

TestSaveIfPending comprehensively covers the optimistic concurrency guard: successful save when pending, rejection when already decided, not found, and expired. The direct _items manipulation on line 234 is appropriate for testing lazy expiration without triggering premature checks.


242-301: LGTM!

TestApprovalStoreFilters provides good coverage of combined filter scenarios, including edge cases where no matches are found.


304-374: LGTM!

TestOnExpireCallback properly validates callback invocation, exception handling that doesn't prevent expiration, and correct status transitions in listing operations.

tests/unit/api/controllers/test_approvals_helpers.py (4)

1-18: LGTM!

Test file correctly sets up markers and imports the private helper functions under test.


51-79: LGTM!

TestResolveDecision covers the key validation paths: conflict on non-pending status, unauthorized on missing/wrong user type, and successful extraction of the authenticated user.


101-135: LGTM!

TestSignalResumeIntent correctly validates the signaling stub behavior: no-op without gate, logging with gate present for both approval and rejection scenarios.


138-187: LGTM!

TestPublishApprovalEvent properly tests the best-effort publishing pattern: graceful handling when no plugin exists, successful publishing when available, and exception resilience.

src/ai_company/engine/plan_execute_loop.py (1)

746-771: LGTM!

The conversion from @staticmethod to instance method is necessary to access self._approval_gate, and the approval gate is correctly forwarded to execute_tool_calls. This maintains consistency with the ReactLoop integration.

src/ai_company/tools/invoker.py (4)

120-137: LGTM!

The pending_escalations property is well-documented and correctly exposes an immutable tuple. Clearing at the start of each invoke cycle ensures clean state.


237-251: LGTM!

Escalation tracking for security ESCALATE verdicts is correctly implemented. The escalation is only tracked when approval_id is non-None, preventing incomplete escalation records.


364-421: LGTM!

The refactoring to separate invoke() and _invoke_single() cleanly handles escalation clearing at the appropriate level—once for single invokes and once at batch level for invoke_all.


782-788: LGTM!

Sorting escalations by original tool-call order ensures deterministic behavior when multiple tools trigger escalations in a batch. The fallback to len(calls) for unknown IDs is defensive.

src/ai_company/api/state.py (1)

39-41: ApprovalGate wiring in AppState is coherent and safe.

__slots__, constructor injection, and the nullable property are aligned and preserve backward compatibility for deployments where the gate is not configured.

Also applies to: 61-67, 138-141

src/ai_company/engine/approval_gate_models.py (1)

14-33: Model design is solid for approval-gate data flow.

Both models are immutable, strongly typed, and use NotBlankStr appropriately for identifier-like fields.

Also applies to: 36-51

src/ai_company/api/dto.py (1)

217-228: Validation tightening here looks correct and well-scoped.

The action_type format validator, ttl_seconds cap, metadata bounds checks, and optional approval comment constraints are consistent with the API hardening goals.

Also applies to: 229-244, 256-256

tests/unit/engine/test_loop_helpers_approval.py (1)

107-139: Good coverage for parked/error termination behavior.

These cases validate the critical approval-gate branch outcomes and metadata propagation paths in execute_tool_calls.

Also applies to: 171-206, 242-273

tests/unit/engine/test_approval_gate.py (1)

188-292: ApprovalGate unit coverage is comprehensive for core lifecycle and safety paths.

The suite meaningfully verifies resume cleanup semantics and untrusted reason-string handling in resume message construction.

Also applies to: 294-354

- **Required**: `mem0ai` (Mem0 memory backend — the default and currently only backend)
- **Install**: `uv sync` installs everything (dev group is default)
- **Web dashboard**: Node.js 20+, dependencies in `web/package.json` (Vue 3, PrimeVue, Tailwind CSS, Pinia, VueFlow, ECharts, Axios, vue-draggable-plus, Vitest, ESLint, vue-tsc)
- **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `API_ROUTE_NOT_FOUND` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`, `SECURITY_EVALUATE_START` from `events.security`, `HR_HIRING_REQUEST_CREATED` from `events.hr`, `PERF_METRIC_RECORDED` from `events.performance`, `TRUST_EVALUATE_START` from `events.trust`, `PROMOTION_EVALUATE_START` from `events.promotion`, `PROMPT_BUILD_START` from `events.prompt`, `MEMORY_RETRIEVAL_START` from `events.memory`, `MEMORY_BACKEND_CONNECTED` from `events.memory`, `MEMORY_ENTRY_STORED` from `events.memory`, `MEMORY_BACKEND_SYSTEM_ERROR` from `events.memory`, `AUTONOMY_ACTION_AUTO_APPROVED` from `events.autonomy`, `TIMEOUT_POLICY_EVALUATED` from `events.timeout`, `PERSISTENCE_AUDIT_ENTRY_SAVED` from `events.persistence`, `TASK_ENGINE_STARTED` from `events.task_engine`, `COORDINATION_STARTED` from `events.coordination`, `COMMUNICATION_DISPATCH_START` from `events.communication`, `COMPANY_STARTED` from `events.company`, `CONFIG_LOADED` from `events.config`, `CORRELATION_ID_CREATED` from `events.correlation`, `DECOMPOSITION_STARTED` from `events.decomposition`, `DELEGATION_STARTED` from `events.delegation`, `EXECUTION_LOOP_START` from `events.execution`, `CHECKPOINT_SAVED` from `events.checkpoint`, `PERSISTENCE_CHECKPOINT_SAVED` from `events.persistence`, `GIT_OPERATION_START` from `events.git`, `PARALLEL_GROUP_START` from `events.parallel`, `PERSONALITY_LOADED` from `events.personality`, `QUOTA_CHECKED` from `events.quota`, `ROLE_ASSIGNED` from `events.role`, `ROUTING_STARTED` from `events.routing`, `SANDBOX_EXECUTE_START` from `events.sandbox`, `TASK_CREATED` from `events.task`, `TASK_ASSIGNMENT_STARTED` from `events.task_assignment`, `TASK_ROUTING_STARTED` from `events.task_routing`, `TEMPLATE_LOADED` from `events.template`, `TOOL_INVOKE_START` from `events.tool`, `WORKSPACE_CREATED` from `events.workspace`, `APPROVAL_GATE_ESCALATION_DETECTED` from `events.approval_gate`, `APPROVAL_GATE_ESCALATION_FAILED` from `events.approval_gate`, `APPROVAL_GATE_INITIALIZED` from `events.approval_gate`, `APPROVAL_GATE_RISK_CLASSIFIED` from `events.approval_gate`, `APPROVAL_GATE_RISK_CLASSIFY_FAILED` from `events.approval_gate`, `APPROVAL_GATE_CONTEXT_PARKED` from `events.approval_gate`, `APPROVAL_GATE_CONTEXT_PARK_FAILED` from `events.approval_gate`, `APPROVAL_GATE_PARK_TASKLESS` from `events.approval_gate`, `APPROVAL_GATE_RESUME_STARTED` from `events.approval_gate`, `APPROVAL_GATE_CONTEXT_RESUMED` from `events.approval_gate`, `APPROVAL_GATE_RESUME_FAILED` from `events.approval_gate`, `APPROVAL_GATE_RESUME_DELETE_FAILED` from `events.approval_gate`, `APPROVAL_GATE_RESUME_TRIGGERED` from `events.approval_gate`, `APPROVAL_GATE_NO_PARKED_CONTEXT` from `events.approval_gate`, `APPROVAL_GATE_LOOP_WIRING_WARNING` from `events.approval_gate`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix the punctuation typo in the logging guidance line.

Line 154 has a comma omission in the long sentence (as flagged by LanguageTool). Please add the missing comma to keep docs lint/style clean.

🧰 Tools
🪛 LanguageTool

[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...

(EG_NO_COMMA)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` at line 154, Fix the punctuation typo in the Event names guidance
line in CLAUDE.md by adding the missing comma in the final clause; edit the
sentence that currently ends with "Import directly: `from
ai_company.observability.events.<domain> import EVENT_CONSTANT`" so it includes
the needed comma (e.g., "Import directly, `from
ai_company.observability.events.<domain> import EVENT_CONSTANT`") to satisfy
lint/style.

Comment on lines +177 to +186
try:
self._on_expire(expired)
except MemoryError, RecursionError:
raise
except Exception:
logger.exception(
API_APPROVAL_EXPIRED,
approval_id=item.id,
note="on_expire callback failed",
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if MemoryError is imported or shadowed in this file
rg -n "import.*MemoryError|from.*import.*MemoryError" src/ai_company/api/approval_store.py

# Check if there's a pattern of catching built-in MemoryError elsewhere
rg -nC2 "except.*MemoryError" src/ai_company --type py | head -40

Repository: Aureliolo/synthorg

Length of output: 2521


🏁 Script executed:

#!/bin/bash
# Get the imports at the top of approval_store.py
head -50 src/ai_company/api/approval_store.py

# Check if ai_company.memory.errors is imported anywhere
rg "from ai_company.memory.errors|import.*MemoryError" src/ai_company/api/approval_store.py

# Get the full context of the method containing the suspect code
sed -n '160,190p' src/ai_company/api/approval_store.py

Repository: Aureliolo/synthorg

Length of output: 2396


🏁 Script executed:

#!/bin/bash
# Verify if ai_company.memory.errors.MemoryError exists in codebase
find src -name "*.py" -exec grep -l "class MemoryError" {} \;

# Check ai_company.memory.errors module
fd "memory" src/ai_company --type d

Repository: Aureliolo/synthorg

Length of output: 116


🏁 Script executed:

#!/bin/bash
# Check if MemoryError is actually defined in ai_company.memory.errors
cat src/ai_company/memory/errors.py

Repository: Aureliolo/synthorg

Length of output: 1471


🏁 Script executed:

#!/bin/bash
# Check if approval_store.py imports from ai_company.memory.errors
rg "from ai_company.memory|import.*MemoryError" src/ai_company/api/approval_store.py

# Check the complete import section of approval_store.py
sed -n '1,30p' src/ai_company/api/approval_store.py

Repository: Aureliolo/synthorg

Length of output: 877


Add logging before re-raising MemoryError and RecursionError errors.

The exception handler re-raises MemoryError and RecursionError without logging context first. Per guidelines, all error paths must log at WARNING or ERROR with context before raising.

try:
    self._on_expire(expired)
except MemoryError, RecursionError:
    raise
except Exception:
    logger.exception(
        API_APPROVAL_EXPIRED,
        approval_id=item.id,
        note="on_expire callback failed",
    )

Log before the re-raise on line 179:

except MemoryError, RecursionError:
    logger.error(
        API_APPROVAL_EXPIRED,
        approval_id=item.id,
        note="on_expire callback failed - non-recoverable error",
    )
    raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/api/approval_store.py` around lines 177 - 186, The except
block that re-raises MemoryError and RecursionError should log context before
re-raising: in the try around self._on_expire(expired) catch the non-recoverable
errors as a tuple (except (MemoryError, RecursionError):), call
logger.error(API_APPROVAL_EXPIRED, approval_id=item.id, note="on_expire callback
failed - non-recoverable error") and then re-raise; keep the existing broad
except Exception block that logs via logger.exception unchanged.

Comment on lines +177 to +214
async def _signal_resume_intent(
app_state: AppState,
approval_id: str,
*,
approved: bool,
decided_by: str,
decision_reason: str | None = None,
) -> None:
"""Log that a decision was made so a scheduler can resume the agent.

This is intentionally a **signalling-only stub**. It does NOT call
``ApprovalGate.resume_context()`` or re-enqueue the parked agent —
that is the responsibility of a future scheduling component that
will observe status changes (via log events or store polling) and
perform the actual resume.

.. todo:: Wire to a real scheduler once one exists (see §12.4).

Args:
app_state: Application state containing the approval gate.
approval_id: The approval item identifier.
approved: Whether the action was approved.
decided_by: Who made the decision.
decision_reason: Optional reason for the decision.
"""
approval_gate = app_state.approval_gate
if approval_gate is None:
return

logger.info(
APPROVAL_GATE_RESUME_TRIGGERED,
approval_id=approval_id,
approved=approved,
decided_by=decided_by,
has_reason=decision_reason is not None,
)


Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Wire an actual resume handoff here.

_signal_resume_intent() is the only new post-decision hook in this PR, but it never calls ApprovalGate.resume_context() or hands the decision to any scheduler interface—it only logs. That means the code added here still has no concrete path from an approved/rejected item back to runnable agent state, so the approval flow stops at PARKED.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/api/controllers/approvals.py` around lines 177 - 214,
_signal_resume_intent currently only logs decisions; update it to perform the
actual handoff by calling and awaiting ApprovalGate.resume_context on
app_state.approval_gate (if not None), passing the approval_id and a payload
containing approved, decided_by and decision_reason so the gate/scheduler can
resume the parked agent; ensure you handle/await the async resume_context call,
catch and log any exceptions (using logger.error) so failures don’t silence the
resume attempt, and preserve the existing log entry for telemetry.

Comment on lines +39 to +117
def make_security_interceptor(
security_config: SecurityConfig | None,
audit_log: AuditLog,
*,
approval_store: ApprovalStore | None = None,
effective_autonomy: EffectiveAutonomy | None = None,
) -> SecurityInterceptionStrategy | None:
"""Build the SecOps security interceptor if configured.

Args:
security_config: Security configuration, or ``None`` to skip.
audit_log: Audit log for security events.
approval_store: Optional approval store for escalation items.
effective_autonomy: Optional autonomy level override.

Returns:
A ``SecOpsService`` interceptor, or ``None`` if security is
disabled or not configured.

Raises:
ExecutionStateError: If *effective_autonomy* is provided but
no SecurityConfig is configured.
"""
if security_config is None:
if effective_autonomy is not None:
msg = (
"effective_autonomy cannot be enforced without "
"SecurityConfig — configure security or remove autonomy"
)
logger.error(SECURITY_DISABLED, note=msg)
raise ExecutionStateError(msg)
logger.warning(
SECURITY_DISABLED,
note="No SecurityConfig provided — all security checks skipped",
)
return None
if not security_config.enabled:
if effective_autonomy is not None:
msg = "effective_autonomy cannot be enforced when security is disabled"
logger.error(SECURITY_DISABLED, note=msg)
raise ExecutionStateError(msg)
return None

cfg = security_config
re_cfg = cfg.rule_engine
policy_validator = PolicyValidator(
hard_deny_action_types=frozenset(cfg.hard_deny_action_types),
auto_approve_action_types=frozenset(cfg.auto_approve_action_types),
)
detectors: list[
PolicyValidator
| CredentialDetector
| PathTraversalDetector
| DestructiveOpDetector
| DataLeakDetector
] = [policy_validator]
if re_cfg.credential_patterns_enabled:
detectors.append(CredentialDetector())
if re_cfg.path_traversal_detection_enabled:
detectors.append(PathTraversalDetector())
if re_cfg.destructive_op_detection_enabled:
detectors.append(DestructiveOpDetector())
if re_cfg.data_leak_detection_enabled:
detectors.append(DataLeakDetector())

rule_engine = RuleEngine(
rules=tuple(detectors),
risk_classifier=RiskClassifier(),
config=re_cfg,
)
return SecOpsService(
config=cfg,
rule_engine=rule_engine,
audit_log=audit_log,
output_scanner=OutputScanner(),
approval_store=approval_store,
effective_autonomy=effective_autonomy,
risk_classifier=DefaultRiskTierClassifier(),
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Split make_security_interceptor() into smaller helpers.

This new function mixes autonomy validation, detector selection, and service construction across roughly 80 lines. Extracting those phases will satisfy the repo limit and make the wiring easier to test in isolation. As per coding guidelines, "Functions must be < 50 lines, files < 800 lines."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/_security_factory.py` around lines 39 - 117, The
make_security_interceptor function is too long and mixes autonomy validation,
detector selection, and service construction; split it into small helpers (e.g.,
validate_autonomy_with_config(security_config, effective_autonomy),
build_detectors(rule_engine_cfg) returning the list/tuple of detector instances
including PolicyValidator, CredentialDetector, PathTraversalDetector,
DestructiveOpDetector, DataLeakDetector, and build_secops_service(cfg,
rule_engine, audit_log, approval_store, effective_autonomy) that constructs and
returns the SecOpsService) then refactor make_security_interceptor to call these
helpers and return the SecOpsService; keep existing behavior and exceptions
(ExecutionStateError, logger messages) and use the same symbols RuleEngine,
RiskClassifier, OutputScanner, DefaultRiskTierClassifier to assemble the final
service.

Comment on lines +609 to +661
def _track_parking_metadata(
self,
result: ToolExecutionResult,
tool: BaseTool,
tool_call: ToolCall,
) -> ToolResult | None:
"""Detect ``requires_parking`` metadata and add to escalations.

Tools like ``request_human_approval`` signal parking via
``ToolExecutionResult.metadata``. Only tracks when both
``requires_parking=True`` and ``approval_id`` are present.

Returns:
``None`` on success, or an error ``ToolResult`` if tracking
fails — ensures the agent does not silently bypass the
approval gate.
"""
if result.metadata.get("requires_parking") is not True:
return None
if not result.metadata.get("approval_id"):
return None
try:
from ai_company.engine.approval_gate_models import ( # noqa: PLC0415
EscalationInfo as _EscalationInfo,
)

self._pending_escalations.append(
_EscalationInfo(
approval_id=str(result.metadata["approval_id"]),
tool_call_id=tool_call.id,
tool_name=tool.name,
action_type=tool.action_type,
risk_level=ApprovalRiskLevel(
result.metadata.get("risk_level", "high"),
),
reason="Agent requested human approval",
),
)
except MemoryError, RecursionError:
raise
except Exception as exc:
logger.exception(
TOOL_INVOKE_EXECUTION_ERROR,
tool_call_id=tool_call.id,
tool_name=tool.name,
note="Failed to track parking metadata",
)
return ToolResult(
tool_call_id=tool_call.id,
content=f"Approval escalation tracking failed: {exc}",
is_error=True,
)
return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Verify risk_level default value matches expected enum member.

The default "high" for risk_level (line 642) is valid per ApprovalRiskLevel enum, but consider whether tools that don't specify a risk level should default to HIGH or if this should be explicitly required in the metadata.

Also, the EscalationInfo import is duplicated (lines 238-240 and 631-633). Consider consolidating to a single helper or module-level lazy import pattern.

♻️ Optional: Consolidate lazy import
# At module level or as a helper
def _get_escalation_info_class() -> type[EscalationInfo]:
    from ai_company.engine.approval_gate_models import EscalationInfo
    return EscalationInfo

Comment on lines +136 to +175
async def test_not_populated_on_escalate_without_approval_id(self) -> None:
interceptor = AsyncMock()
interceptor.evaluate_pre_tool = AsyncMock(
return_value=_verdict(
SecurityVerdictType.ESCALATE,
reason="Needs approval",
),
)
invoker = _make_invoker(_StubTool(), security_interceptor=interceptor)
await invoker.invoke(_make_tool_call())
assert invoker.pending_escalations == ()

async def test_not_populated_on_allow_verdict(self) -> None:
interceptor = AsyncMock()
interceptor.evaluate_pre_tool = AsyncMock(
return_value=_verdict(
SecurityVerdictType.ALLOW,
reason="OK",
risk_level=ApprovalRiskLevel.LOW,
),
)
interceptor.scan_output = AsyncMock(
return_value=AsyncMock(has_sensitive_data=False),
)
invoker = _make_invoker(_StubTool(), security_interceptor=interceptor)
await invoker.invoke(_make_tool_call())
assert invoker.pending_escalations == ()

async def test_not_populated_on_deny_verdict(self) -> None:
interceptor = AsyncMock()
interceptor.evaluate_pre_tool = AsyncMock(
return_value=_verdict(
SecurityVerdictType.DENY,
reason="Blocked",
risk_level=ApprovalRiskLevel.CRITICAL,
),
)
invoker = _make_invoker(_StubTool(), security_interceptor=interceptor)
await invoker.invoke(_make_tool_call())
assert invoker.pending_escalations == ()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider parameterizing verdict non-escalation cases.

Lines 136–175 repeat the same arrange/act/assert flow with only verdict inputs changed. Converting these to a single @pytest.mark.parametrize test will reduce maintenance overhead while keeping coverage identical.

As per coding guidelines: "Prefer @pytest.mark.parametrize for testing similar cases."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/tools/test_invoker_escalation.py` around lines 136 - 175, Replace
the three nearly identical tests
(test_not_populated_on_escalate_without_approval_id,
test_not_populated_on_allow_verdict, test_not_populated_on_deny_verdict) with a
single parametrized pytest test that iterates different verdict inputs; for each
parameter set construct the AsyncMock interceptor (using _verdict with
SecurityVerdictType values and ApprovalRiskLevel where needed), set scan_output
only for the ALLOW case, create the invoker via _make_invoker(_StubTool(),
security_interceptor=interceptor), call await invoker.invoke(_make_tool_call()),
and assert invoker.pending_escalations == (); use pytest.mark.parametrize to
supply the verdict factory args so coverage remains identical while removing
duplicated arrange/act/assert code.

Comment on lines +78 to +80
if (activeFilters.value) {
// Filters active — re-fetch the filtered query to stay consistent
await fetchApprovals(activeFilters.value)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use a dedicated, sequenced refresh path for WS-driven filtered updates.

Line 80 and Line 104 reuse fetchApprovals() from detached WS tasks. That method mutates shared loading/error state and has no last-response-wins guard, so a burst of approval events can both hide a foreground action error and let an older filtered snapshot overwrite a newer one. Please route WS refreshes through a separate helper that snapshots the active filters and ignores stale responses; add one regression test with overlapping listApprovals calls that resolve out of order.

💡 Possible fix
+  let wsRefreshSeq = 0
+
+  async function refreshApprovalsFromWs(): Promise<void> {
+    if (!activeFilters.value) return
+
+    const seq = ++wsRefreshSeq
+    const filters = { ...activeFilters.value }
+
+    try {
+      const result = await approvalsApi.listApprovals(filters)
+      if (seq !== wsRefreshSeq) return
+
+      approvals.value = result.data
+      total.value = result.total
+    } catch (err) {
+      if (seq !== wsRefreshSeq) return
+      console.warn('Failed to refresh approvals:', err)
+    }
+  }
+
   function handleWsEvent(event: WsEvent): void {
     const payload = event.payload as Record<string, unknown> | null
     if (!payload || typeof payload !== 'object') return
     const approvalId = payload.approval_id
     if (typeof approvalId !== 'string' || !approvalId) return

     void (async () => {
       try {
         switch (event.event_type) {
           case 'approval.submitted':
             if (!approvals.value.some((a) => a.id === approvalId)) {
               if (activeFilters.value) {
-                await fetchApprovals(activeFilters.value)
+                await refreshApprovalsFromWs()
               } else {
                 // ...
               }
             }
             break
           case 'approval.approved':
           case 'approval.rejected':
           case 'approval.expired':
             if (activeFilters.value) {
-              await fetchApprovals(activeFilters.value)
+              await refreshApprovalsFromWs()
             } else {
               // ...
             }
             break
         }
       } catch (err) {
         console.warn('Unexpected error in WS event handler:', err)
       }
     })()
   }

Also applies to: 102-104

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/stores/approvals.ts` around lines 78 - 80, WS-driven refreshes
currently call fetchApprovals(activeFilters.value) which mutates shared
loading/error state and lacks a last-response-wins guard; create a dedicated
helper (e.g., refreshApprovalsForFiltersFromWS or wsRefreshApprovals) that
snapshots activeFilters.value into a local const, calls listApprovals using that
snapshot without touching the global loading/error flags, and stores results
only if a per-filter request token/sequence matches the latest token for that
snapshot (ignore stale responses). Update WS paths that currently call
fetchApprovals (the calls around fetchApprovals and the similar usage at the
other location) to call this new helper instead. Add one regression test that
issues overlapping listApprovals calls (simulate out-of-order resolution) and
verifies the store retains the newest response and that foreground
errors/loading are not clobbered by WS refreshes.

- Restore 89 lines accidentally deleted from CLAUDE.md during rebase
  (Resilience, Testing, Git, CI, Dependencies sections)
- Make V7 migration crash-safe: _apply_v7 checks intermediate states
  (parked_contexts, _old, _new) and only runs missing steps
- Return error ToolResult when requires_parking=True but approval_id
  is missing — prevents silent gate bypass
- Restore V3 schema to original NOT NULL task_id (V7 handles change)
- Rename test_v6 → test_v7 and fix action_type format in store tests
- Update README status to note deferred resume scheduler
- Validate approval_tool arguments are strings (not just present)
- Add approval_gate docstrings to ReactLoop and PlanExecuteLoop
- Add ResumePayload empty-string rejection tests
- Add module-level pytestmark timeout to migration tests
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
CLAUDE.md (1)

154-154: ⚠️ Potential issue | 🟡 Minor

Fix the punctuation: add comma after "e.g."

The style guide expects a comma after "e.g." in this context. This was flagged by LanguageTool and noted in a prior review.

✏️ Proposed fix
-**Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, ...
+**Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`, ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` at line 154, The sentence listing event constants uses "e.g."
without the required comma; update the sentence that begins with "**Event
names**: always use constants..." to insert a comma after "e.g." (so it reads
"e.g., `PROVIDER_CALL_START`...") and ensure the same punctuation is applied to
any other occurrences of "e.g." in that paragraph referencing constants like
PROVIDER_CALL_START, BUDGET_RECORD_ADDED, CFO_ANOMALY_DETECTED, etc.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/persistence/sqlite/migrations.py`:
- Around line 375-379: The current migration checks `has_original and not
has_new` before running `_V7_COPY_ROWS`, which can skip copying if
`parked_contexts_new` was created then the process crashed; instead, change the
logic to always execute `_V7_COPY_ROWS` whenever a source exists (i.e., if
`has_original` run `_V7_COPY_ROWS` with source="parked_contexts"; if `has_old`
run `_V7_COPY_ROWS` with source="parked_contexts_old`) regardless of `has_new`,
relying on `_V7_COPY_ROWS`'s INSERT OR IGNORE to make the operation idempotent
and crash-safe; update the conditionals around `has_original`, `has_old`,
`has_new` in the migration function in migrations.py accordingly.

In `@src/ai_company/tools/approval_tool.py`:
- Around line 127-138: The current validation in the approval tool only checks
types for action_type, title, and description but allows empty or
whitespace-only strings; update the check in the function containing the
existing isinstance guards (referencing action_type, title, description, and the
returned ToolExecutionResult) to ensure each is a str and that each stripped
string has length > 0 (e.g., verify action_type.strip(), title.strip(),
description.strip() are non-empty) and return the same ToolExecutionResult error
when any fails.

In `@tests/unit/api/test_approval_store.py`:
- Around line 179-374: Add a 30-second pytest timeout marker to each newly added
test class to satisfy the per-test timeout rule: annotate the classes
TestSaveIfPending, TestApprovalStoreFilters, and TestOnExpireCallback with
pytest.mark.timeout(30) (as a class decorator) so every test method inside each
class gets the 30s guard.

In `@tests/unit/engine/test_approval_gate_models.py`:
- Around line 145-152: Add a sibling unit test to ensure an empty string for
decision_reason is rejected the same as blank space: update tests in
test_approval_gate_models.py by adding a test (e.g.,
test_empty_decision_reason_rejected) that constructs
ResumePayload(approval_id="approval-1", approved=False, decided_by="admin",
decision_reason="") and asserts it raises ValidationError, matching the behavior
enforced by NotBlankStr and the existing test_blank_decision_reason_rejected.

---

Duplicate comments:
In `@CLAUDE.md`:
- Line 154: The sentence listing event constants uses "e.g." without the
required comma; update the sentence that begins with "**Event names**: always
use constants..." to insert a comma after "e.g." (so it reads "e.g.,
`PROVIDER_CALL_START`...") and ensure the same punctuation is applied to any
other occurrences of "e.g." in that paragraph referencing constants like
PROVIDER_CALL_START, BUDGET_RECORD_ADDED, CFO_ANOMALY_DETECTED, etc.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a56c33bb-d097-45bd-a066-1d5b2e581be1

📥 Commits

Reviewing files that changed from the base of the PR and between cc65e53 and 58dcc2a.

📒 Files selected for processing (10)
  • CLAUDE.md
  • README.md
  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/tools/invoker.py
  • tests/unit/api/test_approval_store.py
  • tests/unit/engine/test_approval_gate_models.py
  • tests/unit/persistence/test_migrations_v2.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Web
  • GitHub Check: Build Backend
  • GitHub Check: Greptile Review
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: use except A, B: (no parentheses) — ruff enforces this on Python 3.14.
Type hints required on all public functions; mypy strict mode enforced.
Google-style docstrings required on all public classes and functions (enforced by ruff D rules).
Line length: 88 characters (ruff).
Functions must be < 50 lines, files < 800 lines.

Files:

  • src/ai_company/tools/invoker.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/engine/plan_execute_loop.py
  • tests/unit/engine/test_approval_gate_models.py
  • tests/unit/api/test_approval_store.py
  • tests/unit/persistence/test_migrations_v2.py
  • src/ai_company/tools/approval_tool.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__). Never use import logging / logging.getLogger() / print() in application code. Variable name: always logger (not _logger, not log).
Always use event name constants from domain-specific modules under ai_company.observability.events (e.g. PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT.
Use structured kwargs for logging: always logger.info(EVENT, key=value) — never logger.info('msg %s', val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG logging for object creation, internal flow, entry/exit of key functions.

Files:

  • src/ai_company/tools/invoker.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/tools/approval_tool.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names. Tests must use test-provider, test-small-001, etc.

Files:

  • src/ai_company/tools/invoker.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/tools/approval_tool.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow on test cases.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.
Vendor-agnostic everywhere: NEVER use real vendor names in tests. Tests must use test-provider, test-small-001, etc.

Files:

  • tests/unit/engine/test_approval_gate_models.py
  • tests/unit/api/test_approval_store.py
  • tests/unit/persistence/test_migrations_v2.py
🧠 Learnings (12)
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Dependency review: license allow-list (permissive only), PR comment summaries.

Applied to files:

  • README.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Fix everything valid — never skip: When review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes), fix them all. No deferring, no 'out of scope' skipping.

Applied to files:

  • README.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Package structure: `src/ai_company/` contains api/, budget/, cli/, communication/, config/, core/, engine/, hr/, memory/, persistence/, observability/, providers/, security/, templates/, tools/ modules organized by domain.

Applied to files:

  • README.md
  • CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.

Applied to files:

  • tests/unit/persistence/test_migrations_v2.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : Always use event name constants from domain-specific modules under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Web dashboard structure: Vue 3 + PrimeVue + Tailwind CSS with api/, components/, composables/, router/, stores/, styles/, utils/, views/, __tests__/ organized by feature.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : Use structured kwargs for logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : All state transitions must log at INFO.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : DEBUG logging for object creation, internal flow, entry/exit of key functions.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14.

Applied to files:

  • src/ai_company/tools/approval_tool.py
🧬 Code graph analysis (5)
src/ai_company/tools/invoker.py (4)
src/ai_company/core/enums.py (1)
  • ApprovalRiskLevel (443-449)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/tools/base.py (2)
  • name (123-125)
  • action_type (133-135)
src/ai_company/providers/models.py (2)
  • ToolCall (96-119)
  • ToolResult (122-135)
src/ai_company/engine/react_loop.py (2)
src/ai_company/api/state.py (1)
  • approval_gate (139-141)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-322)
src/ai_company/engine/plan_execute_loop.py (2)
src/ai_company/api/state.py (1)
  • approval_gate (139-141)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-322)
tests/unit/engine/test_approval_gate_models.py (2)
src/ai_company/engine/approval_gate_models.py (2)
  • EscalationInfo (14-33)
  • ResumePayload (36-51)
src/ai_company/tools/base.py (1)
  • action_type (133-135)
tests/unit/api/test_approval_store.py (2)
src/ai_company/api/approval_store.py (3)
  • list_items (75-104)
  • ApprovalStore (27-188)
  • get (61-73)
src/ai_company/core/enums.py (2)
  • ApprovalStatus (434-440)
  • ApprovalRiskLevel (443-449)
🪛 LanguageTool
CLAUDE.md

[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...

(EG_NO_COMMA)

🔇 Additional comments (25)
tests/unit/engine/test_approval_gate_models.py (3)

12-88: LGTM!

The TestEscalationInfo class provides thorough coverage:

  • Valid construction with all field assertions
  • Frozen immutability verification
  • Parametrized blank/empty string rejection for all NotBlankStr fields
  • All ApprovalRiskLevel enum values accepted

Good use of @pytest.mark.parametrize for the validation tests.


134-143: Past review comment addressed.

The test_empty_string_rejected parametrized test for approval_id and decided_by has been added, matching the consistency with TestEscalationInfo.


1-9: LGTM!

Module setup is clean:

  • Appropriate imports with no unnecessary dependencies
  • pytestmark correctly applies @pytest.mark.unit and 30-second timeout to all tests per coding guidelines
tests/unit/api/test_approval_store.py (2)

110-120: Action-type test updates are aligned with the new format.

The switch to category:action values in filter/get/save test paths is consistent and improves coverage for the new action-type convention.

Also applies to: 127-127, 165-165


199-239: Parameterize similar save_if_pending negative-path cases.

The already_decided, not_found, and expired scenarios are structurally similar and are better expressed as parameterized cases to reduce duplication.

[ suggest_recommended_refactor ]
As per coding guidelines, "Prefer @pytest.mark.parametrize for testing similar cases."

README.md (1)

133-133: LGTM!

The status line now accurately reflects that the approval workflow can park agents but the resume/re-enqueue scheduler is still pending. This addresses the prior review concern about overstating completeness.

src/ai_company/tools/invoker.py (5)

120-136: LGTM!

The pending_escalations property correctly exposes a read-only tuple of escalation infos collected during tool invocation. The clearing at the start of invoke() and invoke_all() ensures fresh state per invocation batch.


237-251: LGTM!

The escalation tracking on ESCALATE verdict correctly captures the approval_id, tool context, and risk level from the security verdict. The conditional check for verdict.approval_id is not None prevents tracking incomplete escalations.


364-421: LGTM!

The refactor to separate invoke() (clears + delegates) from _invoke_single() (core logic) enables proper escalation collection across batch invocations while maintaining the single-call API contract.


609-674: LGTM!

The _track_parking_metadata method correctly:

  1. Returns early when requires_parking is not True
  2. Returns an error ToolResult when approval_id is missing (fail-closed)
  3. Creates EscalationInfo with appropriate defaults
  4. Propagates non-recoverable errors

The error-result approach ensures the agent cannot silently bypass the approval gate when metadata is inconsistent.


795-801: LGTM!

Sorting escalations by original tool-call order ensures deterministic behavior regardless of concurrent execution timing. The fallback len(calls) for unknown IDs is a reasonable edge-case handler.

CLAUDE.md (1)

101-115: LGTM!

The package structure documentation accurately reflects the new approval gate integration in the api/, engine/, and tools/ modules.

src/ai_company/persistence/sqlite/migrations.py (2)

357-400: LGTM! Crash-safe migration implementation.

The _apply_v7 function correctly handles three intermediate states:

  1. Normal: creates new table, copies rows, renames original→old, renames new→parked_contexts, drops old
  2. Mid-crash after rename: detects parked_contexts_old exists without original, copies from backup, completes rename sequence
  3. Already done: IF NOT EXISTS guards and conditional checks prevent duplicate operations

The state checks at lines 367-369 and conditional operations throughout make this resumable from any crash point.


258-278: LGTM!

The DDL and copy statement templates are well-structured. The INSERT OR IGNORE in _V7_COPY_ROWS provides idempotency for repeated copy attempts, and the {source} placeholder enables copying from either the original or backup table during crash recovery.

tests/unit/persistence/test_migrations_v2.py (3)

19-19: LGTM!

The module-level pytestmark correctly applies both pytest.mark.unit and pytest.mark.timeout(30) to all tests in this file, addressing the prior review comment about missing timeout markers.


99-130: LGTM!

The test correctly verifies the v7 migration behavior:

  1. Sets up pre-v7 schema with task_id TEXT NOT NULL
  2. Verifies NOT NULL constraint via PRAGMA table_info (notnull=1)
  3. Runs migration and asserts version becomes 7
  4. Verifies task_id is now nullable (notnull=0)

132-174: LGTM!

The data preservation test properly:

  1. Creates pre-v7 schema and inserts a test row
  2. Runs migrations
  3. Verifies the row persists with correct id and task_id values

This ensures the INSERT OR IGNORE copy operation preserves existing data.

src/ai_company/engine/plan_execute_loop.py (2)

94-103: LGTM!

The constructor correctly accepts approval_gate as an optional keyword-only parameter and stores it for use during step execution. The docstring accurately describes its purpose.


753-778: LGTM!

The method correctly:

  1. Changed from @staticmethod to instance method to access self._approval_gate
  2. Forwards approval_gate=self._approval_gate to execute_tool_calls

This enables the approval gate to detect escalations and park execution when needed during plan step execution.

src/ai_company/engine/react_loop.py (2)

74-81: LGTM!

The constructor signature mirrors PlanExecuteLoop, accepting approval_gate as an optional keyword-only parameter. This maintains API consistency across execution loops.


247-254: LGTM!

The approval_gate is correctly forwarded to execute_tool_calls, enabling escalation detection and parking during ReAct loop tool execution. This completes the approval gate integration in the core execution path.

src/ai_company/tools/approval_tool.py (4)

49-97: LGTM!

The constructor properly:

  1. Initializes the base tool with appropriate name, description, category, and JSON schema
  2. Stores dependencies (approval_store, risk_classifier, agent_id, task_id)
  3. Defines a clear parameters schema with maxLength constraints

159-202: LGTM!

The _persist_item method correctly:

  1. Creates an ApprovalItem with all required fields including metadata source
  2. Persists via the approval store
  3. Returns error result on failure with proper logging
  4. Re-raises non-recoverable errors (MemoryError, RecursionError)

204-233: LGTM!

The success result correctly includes requires_parking: True and approval_id in metadata, which the ToolInvoker._track_parking_metadata method will detect to trigger the escalation flow.


252-275: LGTM!

The risk classification correctly:

  1. Uses the classifier when available
  2. Falls back to HIGH on missing classifier or classification failure
  3. Logs at appropriate levels (exception for failures, debug for success)

This fail-closed behavior ensures sensitive actions default to high-risk when classification fails.

- Remove not-has_new guard in V7 migration step 2 — INSERT OR IGNORE
  is idempotent, always copy when source exists
- Validate approval_tool args are non-empty after stripping whitespace
- Add timeout(30) markers to approval_store test classes
- Add empty decision_reason rejection test for ResumePayload
Copilot AI review requested due to automatic review settings March 14, 2026 11:05
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 14, 2026 11:06 — with GitHub Actions Inactive
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an approval-gated execution path to the agent engine so SecOps ESCALATE verdicts and agent request_human_approval calls can park execution, persist context, and support later resumption workflows.

Changes:

  • Introduces ApprovalGate + models, integrates escalation parking into execution loops, and adds deterministic ToolInvoker.pending_escalations.
  • Adds request_human_approval tool, shared action_type validation, and expands approval API/controller/store behavior (auth-bound requested_by, save_if_pending, resume-intent signaling).
  • Updates SQLite schema to v7 (taskless parking) and fixes frontend approval WS handling by fetching full items via API.

Reviewed changes

Copilot reviewed 40 out of 40 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
web/src/stores/approvals.ts Reworks WS event handling to use approval_id and reconcile state via API re-fetches.
web/src/tests/stores/approvals.test.ts Adds async WS handler tests (fetching, filtered re-fetch, error cases).
tests/unit/tools/test_invoker_escalation.py New tests for invoker escalation tracking + deterministic ordering.
tests/unit/tools/test_approval_tool.py New tests for RequestHumanApprovalTool creation, validation, and error handling.
tests/unit/persistence/test_migrations_v2.py Expands migration tests to include v7 task_id nullability + data preservation.
tests/unit/persistence/sqlite/test_migrations_v6.py Updates schema version expectation to 7.
tests/unit/observability/test_events.py Ensures approval_gate event module is discovered and unit-marked.
tests/unit/engine/test_security_factory.py New tests for extracted security/tool factory helpers.
tests/unit/engine/test_loop_helpers_approval.py New tests for loop helper parking integration and failure paths.
tests/unit/engine/test_approval_gate_models.py New tests for frozen escalation/resume payload models.
tests/unit/engine/test_approval_gate.py New tests for ApprovalGate park/resume lifecycle and resume message formatting.
tests/unit/core/test_validation.py New tests for shared is_valid_action_type() validation helper.
tests/unit/api/test_dto.py Updates DTO tests (action_type format, TTL bounds, approve comment bounds).
tests/unit/api/test_approval_store.py Updates action_type fixtures + adds tests for save_if_pending, filters, on_expire.
tests/unit/api/controllers/test_approvals_helpers.py New tests for controller helpers (decision resolution, resume intent stub, WS publish).
tests/unit/api/controllers/test_approvals.py Updates controller payload defaults (action_type format, requested_by removed).
src/ai_company/tools/registry.py Adds ToolRegistry.all_tools() for tool enumeration/cloning.
src/ai_company/tools/invoker.py Tracks escalations from SecOps verdicts and tool parking metadata; exposes pending_escalations.
src/ai_company/tools/approval_tool.py Implements RequestHumanApprovalTool that creates approvals and signals parking via metadata.
src/ai_company/tools/init.py Exports the new approval tool.
src/ai_company/security/timeout/parked_context.py Makes ParkedContext.task_id nullable for taskless agents.
src/ai_company/security/timeout/park_service.py Allows task_id=None when parking contexts.
src/ai_company/persistence/sqlite/migrations.py Bumps schema to v7 and adds migration to make parked_contexts.task_id nullable.
src/ai_company/observability/events/approval_gate.py Adds structured event constants for approval gate lifecycle.
src/ai_company/engine/react_loop.py Wires optional approval_gate into React loop tool execution.
src/ai_company/engine/plan_execute_loop.py Wires optional approval_gate into plan/execute step tool execution.
src/ai_company/engine/loop_helpers.py Adds escalation check after tool execution and parks context via helper.
src/ai_company/engine/approval_gate_models.py Adds frozen Pydantic models for escalation info and resume payload.
src/ai_company/engine/approval_gate.py Implements park/resume coordination over ParkService + ParkedContextRepository.
src/ai_company/engine/agent_engine.py Integrates approval gate creation, loop wiring, and approval tool registration via factories.
src/ai_company/engine/_security_factory.py Extracts security interceptor factory + approval-tool registry composition.
src/ai_company/engine/init.py Exports approval gate service/models.
src/ai_company/core/validation.py Adds shared is_valid_action_type() helper.
src/ai_company/api/state.py Adds approval_gate to application state.
src/ai_company/api/dto.py Removes requested_by from create payload; validates action_type format; bounds TTL/comment.
src/ai_company/api/controllers/approvals.py Binds requested_by to auth user; adds conflict-safe save_if_pending; signals resume intent.
src/ai_company/api/approval_store.py Adds save_if_pending() and hardens on_expire callback error handling.
docs/design/engine.md Documents approval-gated parking behavior in engine run flow.
README.md Updates status/roadmap text to reflect approval gates + scheduler remaining.
CLAUDE.md Updates repo overview + event-constant guidance to include approval gate domain.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +649 to +656
_EscalationInfo(
approval_id=str(result.metadata["approval_id"]),
tool_call_id=tool_call.id,
tool_name=tool.name,
action_type=tool.action_type,
risk_level=ApprovalRiskLevel(
result.metadata.get("risk_level", "high"),
),
Comment on lines +386 to +391
# Step 4: rename new → parked_contexts.
if await _table_exists(db, "parked_contexts_new"):
await db.execute(
"ALTER TABLE parked_contexts_new RENAME TO parked_contexts",
)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/unit/api/test_approval_store.py (2)

49-50: 🧹 Nitpick | 🔵 Trivial

Consider adding timeout marker to existing test class for consistency.

The new test classes correctly include @pytest.mark.timeout(30), but TestApprovalStore lacks this marker. For consistency with the coding guidelines and the newly added classes, consider adding it here as well.

♻️ Proposed fix
 `@pytest.mark.unit`
+@pytest.mark.timeout(30)
 class TestApprovalStore:

As per coding guidelines, "Test timeout: 30 seconds per test."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/api/test_approval_store.py` around lines 49 - 50, Add the pytest
timeout marker to the TestApprovalStore class by annotating the class with
`@pytest.mark.timeout`(30) (i.e., add the decorator directly above the
TestApprovalStore class definition) so it follows the same 30-second per-test
timeout convention used by the other test classes.

18-21: 🛠️ Refactor suggestion | 🟠 Major

Update default action_type to match the new format.

The helper's default action_type="code_merge" uses the old underscore format, while the changed tests explicitly use the new colon format ("code:merge", "deploy:staging"). This inconsistency could cause confusion and tests using the default may not align with the new domain convention.

♻️ Proposed fix
 def _make_item(  # noqa: PLR0913
     *,
     approval_id: str = "approval-001",
-    action_type: str = "code_merge",
+    action_type: str = "code:merge",
     risk_level: ApprovalRiskLevel = ApprovalRiskLevel.MEDIUM,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/api/test_approval_store.py` around lines 18 - 21, The helper
function _make_item has an outdated default action_type ("code_merge"); update
its default to the new colon-delimited format ("code:merge") so tests that rely
on the helper match the updated domain convention (e.g., other tests that use
"code:merge" or "deploy:staging"); locate the _make_item definition and change
the action_type default value accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/persistence/sqlite/migrations.py`:
- Around line 387-390: The migration's Step 4 currently only checks
_table_exists(db, "parked_contexts_new") and blindly renames it, causing an
OperationalError if "parked_contexts" already exists; update the logic around
that check (the block using _table_exists, "parked_contexts_new" and the
db.execute renaming) to first test whether "parked_contexts" already exists and,
if so, drop "parked_contexts_new" (via await db.execute("DROP TABLE
parked_contexts_new")) instead of renaming; otherwise perform the existing
RENAME TO operation; keep using the same _table_exists helper and db.execute
calls so the change is localized to the Step 4 branch.

In `@src/ai_company/tools/approval_tool.py`:
- Around line 189-190: The except blocks that catch MemoryError and
RecursionError (the "except MemoryError, RecursionError:" handlers in
src/ai_company/tools/approval_tool.py) re-raise without logging; modify each
handler to log a warning or error with contextual information (include the
exception type and message and relevant operation/context) via the module logger
before re-raising the same exception so the error path is observable; ensure you
use the existing logger instance (or create one consistent with the file) and
call logger.warning/logger.error with the exception info (e.g., include
exc_info=True or include str(exc)) immediately prior to the raise in both the
first handler and the similar handler later in the file.
- Around line 99-160: Split execute into a thin orchestrator plus two helpers:
extract the argument parsing/validation into a new method (e.g.
_parse_and_validate_arguments(arguments) -> either (action_type, title,
description) or a ToolExecutionResult error) which contains the try/except
KeyError block and the type/empty-string checks, and leave validation of action
type via the existing _validate_action_type call inside that helper; then
extract the remaining flow that classifies risk, generates approval_id, calls
_persist_item, and returns results into a second helper (e.g.
_create_and_store_approval(action_type, title, description) ->
ToolExecutionResult or success payload) that uses _classify_risk, uuid4().hex to
build approval_id, _persist_item, and _build_success; finally make execute() a
small async method that calls these two helpers in sequence and returns early on
any ToolExecutionResult error.
- Around line 127-147: The input strings are validated for emptiness with
.strip() but the untrimmed values are still used later (e.g., passed to
_validate_action_type, _classify_risk, and persisted), so after the initial
validation block set normalized variables (e.g., action_type =
action_type.strip(), title = title.strip(), description = description.strip() or
another canonical normalization if required) and use these normalized values for
the subsequent calls to self._validate_action_type(action_type) and
self._classify_risk(action_type) and any persistence/ToolExecutionResult content
to avoid misclassification or inconsistent storage.
- Around line 199-203: The tool currently returns internal exception
class/message in the agent-facing ToolExecutionResult (the return constructing
ToolExecutionResult in approval_tool.py), which can leak internals; change this
to return a generic, non-sensitive error message (e.g., "Failed to create
approval request") in the content and set is_error=True, and move the detailed
exception information into an internal log call (use existing logger or
logger.exception/logger.error) so the full exception type and message are
recorded only in logs, not returned to the caller.

In `@tests/unit/engine/test_approval_gate_models.py`:
- Around line 15-161: Add Google-style docstrings to all public test functions
in this file (e.g., test_valid_construction, test_frozen_immutability,
test_blank_string_rejected, test_empty_string_rejected,
test_all_risk_levels_accepted, test_approved_without_reason,
test_rejected_with_reason, test_empty_decision_reason_rejected,
test_blank_decision_reason_rejected, etc.). For each test function add a concise
Google-style docstring describing the behavior being tested (one-line summary;
longer description or Args/Returns sections are optional for clarity) so the
functions satisfy the repository's docstring policy enforced by ruff D rules.
Ensure the docstrings are triple-quoted and placed immediately under each def to
mark them as public-function docstrings.

---

Outside diff comments:
In `@tests/unit/api/test_approval_store.py`:
- Around line 49-50: Add the pytest timeout marker to the TestApprovalStore
class by annotating the class with `@pytest.mark.timeout`(30) (i.e., add the
decorator directly above the TestApprovalStore class definition) so it follows
the same 30-second per-test timeout convention used by the other test classes.
- Around line 18-21: The helper function _make_item has an outdated default
action_type ("code_merge"); update its default to the new colon-delimited format
("code:merge") so tests that rely on the helper match the updated domain
convention (e.g., other tests that use "code:merge" or "deploy:staging"); locate
the _make_item definition and change the action_type default value accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cec01522-f6a3-4d8d-a624-775b3df68279

📥 Commits

Reviewing files that changed from the base of the PR and between 58dcc2a and 9b0638e.

📒 Files selected for processing (4)
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/tools/approval_tool.py
  • tests/unit/api/test_approval_store.py
  • tests/unit/engine/test_approval_gate_models.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Agent
  • GitHub Check: Build Backend
  • GitHub Check: Greptile Review
  • GitHub Check: Test (Python 3.14)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: use except A, B: (no parentheses) — ruff enforces this on Python 3.14.
Type hints required on all public functions; mypy strict mode enforced.
Google-style docstrings required on all public classes and functions (enforced by ruff D rules).
Line length: 88 characters (ruff).
Functions must be < 50 lines, files < 800 lines.

Files:

  • src/ai_company/persistence/sqlite/migrations.py
  • tests/unit/api/test_approval_store.py
  • tests/unit/engine/test_approval_gate_models.py
  • src/ai_company/tools/approval_tool.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__). Never use import logging / logging.getLogger() / print() in application code. Variable name: always logger (not _logger, not log).
Always use event name constants from domain-specific modules under ai_company.observability.events (e.g. PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT.
Use structured kwargs for logging: always logger.info(EVENT, key=value) — never logger.info('msg %s', val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG logging for object creation, internal flow, entry/exit of key functions.

Files:

  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/tools/approval_tool.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names. Tests must use test-provider, test-small-001, etc.

Files:

  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/tools/approval_tool.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow on test cases.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.
Vendor-agnostic everywhere: NEVER use real vendor names in tests. Tests must use test-provider, test-small-001, etc.

Files:

  • tests/unit/api/test_approval_store.py
  • tests/unit/engine/test_approval_gate_models.py
🧠 Learnings (3)
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.

Applied to files:

  • tests/unit/api/test_approval_store.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to tests/**/*.py : Use markers pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow on test cases.

Applied to files:

  • tests/unit/api/test_approval_store.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14.

Applied to files:

  • src/ai_company/tools/approval_tool.py
🧬 Code graph analysis (3)
tests/unit/api/test_approval_store.py (3)
src/ai_company/api/approval_store.py (2)
  • add (42-59)
  • list_items (75-104)
src/ai_company/core/enums.py (2)
  • ApprovalStatus (434-440)
  • ApprovalRiskLevel (443-449)
tests/unit/core/test_approval.py (1)
  • _now (11-12)
tests/unit/engine/test_approval_gate_models.py (1)
src/ai_company/engine/approval_gate_models.py (2)
  • EscalationInfo (14-33)
  • ResumePayload (36-51)
src/ai_company/tools/approval_tool.py (3)
src/ai_company/core/validation.py (1)
  • is_valid_action_type (6-19)
src/ai_company/api/approval_store.py (2)
  • ApprovalStore (27-188)
  • add (42-59)
src/ai_company/security/timeout/risk_tier_classifier.py (1)
  • DefaultRiskTierClassifier (62-101)
🔇 Additional comments (7)
tests/unit/api/test_approval_store.py (4)

110-120: LGTM!

The action_type values correctly use the new category:action format convention.


179-240: LGTM!

Good coverage of save_if_pending() edge cases including pending, already-decided, not-found, and expired scenarios. The timeout marker addresses the previous review feedback.


243-303: LGTM!

The combined filter tests properly validate multi-criteria filtering and use the new action_type format consistently.


306-377: LGTM!

Excellent coverage of the on_expire callback lifecycle, including exception resilience and lazy expiration behavior verification.

tests/unit/engine/test_approval_gate_models.py (1)

9-9: Good module-level test classification and timeout setup.

Using pytestmark here keeps all tests consistently marked as unit tests with a 30s timeout.

As per coding guidelines, tests/**/*.py: Use markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow on test cases. Test timeout: 30 seconds per test.

src/ai_company/tools/approval_tool.py (1)

215-236: Good structured lifecycle logging and parking metadata.

Line 215–Line 236 correctly logs the state transition with a domain event constant and returns deterministic parking metadata (requires_parking, approval_id, action_type, risk_level).

src/ai_company/persistence/sqlite/migrations.py (1)

270-278: Good idempotent row-copy approach.

Using INSERT OR IGNORE in _V7_COPY_ROWS is a solid choice for crash-retry safety during repeated copy attempts.

Comment on lines +99 to +160
async def execute(
self,
*,
arguments: dict[str, Any],
) -> ToolExecutionResult:
"""Create an approval item and signal parking.

Args:
arguments: Must contain ``action_type``, ``title``, and
``description``.

Returns:
``ToolExecutionResult`` with ``requires_parking=True`` in
metadata on success, or an error result on failure.
"""
try:
action_type = arguments["action_type"]
title = arguments["title"]
description = arguments["description"]
except KeyError as exc:
return ToolExecutionResult(
content=(
f"Missing required argument: {exc}. "
f"Required: action_type, title, description"
),
is_error=True,
)

if (
not isinstance(action_type, str)
or not isinstance(title, str)
or not isinstance(description, str)
or not action_type.strip()
or not title.strip()
or not description.strip()
):
return ToolExecutionResult(
content=(
"Arguments action_type, title, and description "
"must be non-empty strings"
),
is_error=True,
)

validation_error = self._validate_action_type(action_type)
if validation_error is not None:
return validation_error

risk_level = self._classify_risk(action_type)
approval_id = f"approval-{uuid4().hex}"

store_error = await self._persist_item(
approval_id,
action_type,
title,
description,
risk_level,
)
if store_error is not None:
return store_error

return self._build_success(approval_id, action_type, risk_level, title)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Split execute() to satisfy the function-size gate.

execute() currently exceeds the < 50 lines project limit (Line 99–Line 160). Please extract argument parsing/validation and orchestration into helpers to keep this entrypoint smaller and easier to test.

As per coding guidelines "Functions must be < 50 lines, files < 800 lines."

♻️ Refactor sketch
     async def execute(
         self,
         *,
         arguments: dict[str, Any],
     ) -> ToolExecutionResult:
-        ...
+        parsed = self._parse_and_validate_arguments(arguments)
+        if isinstance(parsed, ToolExecutionResult):
+            return parsed
+        action_type, title, description = parsed
+
+        risk_level = self._classify_risk(action_type)
+        approval_id = f"approval-{uuid4().hex}"
+        store_error = await self._persist_item(
+            approval_id, action_type, title, description, risk_level
+        )
+        if store_error is not None:
+            return store_error
+        return self._build_success(approval_id, action_type, risk_level, title)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/tools/approval_tool.py` around lines 99 - 160, Split execute
into a thin orchestrator plus two helpers: extract the argument
parsing/validation into a new method (e.g.
_parse_and_validate_arguments(arguments) -> either (action_type, title,
description) or a ToolExecutionResult error) which contains the try/except
KeyError block and the type/empty-string checks, and leave validation of action
type via the existing _validate_action_type call inside that helper; then
extract the remaining flow that classifies risk, generates approval_id, calls
_persist_item, and returns results into a second helper (e.g.
_create_and_store_approval(action_type, title, description) ->
ToolExecutionResult or success payload) that uses _classify_risk, uuid4().hex to
build approval_id, _persist_item, and _build_success; finally make execute() a
small async method that calls these two helpers in sequence and returns early on
any ToolExecutionResult error.

Comment on lines +189 to +190
except MemoryError, RecursionError:
raise
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Log fatal re-raises before propagating.

Line 189–Line 190 and Line 264–Line 265 re-raise MemoryError/RecursionError without a warning/error log. Add context logging before re-raise to satisfy error-path observability requirements.

As per coding guidelines "All error paths must log at WARNING or ERROR with context before raising."

🧭 Logging-before-raise example
-        except MemoryError, RecursionError:
-            raise
+        except MemoryError:
+            logger.error(
+                APPROVAL_GATE_ESCALATION_FAILED,
+                agent_id=self._agent_id,
+                action_type=action_type,
+                error="MemoryError",
+                note="Fatal during approval item persistence",
+            )
+            raise
+        except RecursionError:
+            logger.error(
+                APPROVAL_GATE_ESCALATION_FAILED,
+                agent_id=self._agent_id,
+                action_type=action_type,
+                error="RecursionError",
+                note="Fatal during approval item persistence",
+            )
+            raise

Also applies to: 264-265

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/tools/approval_tool.py` around lines 189 - 190, The except
blocks that catch MemoryError and RecursionError (the "except MemoryError,
RecursionError:" handlers in src/ai_company/tools/approval_tool.py) re-raise
without logging; modify each handler to log a warning or error with contextual
information (include the exception type and message and relevant
operation/context) via the module logger before re-raising the same exception so
the error path is observable; ensure you use the existing logger instance (or
create one consistent with the file) and call logger.warning/logger.error with
the exception info (e.g., include exc_info=True or include str(exc)) immediately
prior to the raise in both the first handler and the similar handler later in
the file.

Comment on lines +15 to +161
def test_valid_construction(self) -> None:
info = EscalationInfo(
approval_id="approval-1",
tool_call_id="tc-1",
tool_name="deploy_to_prod",
action_type="deploy:production",
risk_level=ApprovalRiskLevel.CRITICAL,
reason="Production deployment requires approval",
)
assert info.approval_id == "approval-1"
assert info.tool_call_id == "tc-1"
assert info.tool_name == "deploy_to_prod"
assert info.action_type == "deploy:production"
assert info.risk_level == ApprovalRiskLevel.CRITICAL
assert info.reason == "Production deployment requires approval"

def test_frozen_immutability(self) -> None:
info = EscalationInfo(
approval_id="approval-1",
tool_call_id="tc-1",
tool_name="deploy_to_prod",
action_type="deploy:production",
risk_level=ApprovalRiskLevel.HIGH,
reason="Needs approval",
)
with pytest.raises(ValidationError):
info.approval_id = "changed" # type: ignore[misc]

@pytest.mark.parametrize(
"field",
["approval_id", "tool_call_id", "tool_name", "action_type", "reason"],
)
def test_blank_string_rejected(self, field: str) -> None:
kwargs = {
"approval_id": "approval-1",
"tool_call_id": "tc-1",
"tool_name": "deploy_to_prod",
"action_type": "deploy:production",
"risk_level": ApprovalRiskLevel.LOW,
"reason": "Needs approval",
}
kwargs[field] = " "
with pytest.raises(ValidationError):
EscalationInfo(**kwargs) # type: ignore[arg-type]

@pytest.mark.parametrize(
"field",
["approval_id", "tool_call_id", "tool_name", "action_type", "reason"],
)
def test_empty_string_rejected(self, field: str) -> None:
kwargs = {
"approval_id": "approval-1",
"tool_call_id": "tc-1",
"tool_name": "deploy_to_prod",
"action_type": "deploy:production",
"risk_level": ApprovalRiskLevel.LOW,
"reason": "Needs approval",
}
kwargs[field] = ""
with pytest.raises(ValidationError):
EscalationInfo(**kwargs) # type: ignore[arg-type]

def test_all_risk_levels_accepted(self) -> None:
for level in ApprovalRiskLevel:
info = EscalationInfo(
approval_id="a",
tool_call_id="t",
tool_name="tool",
action_type="cat:act",
risk_level=level,
reason="reason",
)
assert info.risk_level == level


class TestResumePayload:
"""ResumePayload construction and immutability."""

def test_approved_without_reason(self) -> None:
payload = ResumePayload(
approval_id="approval-1",
approved=True,
decided_by="admin",
)
assert payload.approval_id == "approval-1"
assert payload.approved is True
assert payload.decided_by == "admin"
assert payload.decision_reason is None

def test_rejected_with_reason(self) -> None:
payload = ResumePayload(
approval_id="approval-1",
approved=False,
decided_by="admin",
decision_reason="Too risky",
)
assert payload.approved is False
assert payload.decision_reason == "Too risky"

def test_frozen_immutability(self) -> None:
payload = ResumePayload(
approval_id="approval-1",
approved=True,
decided_by="admin",
)
with pytest.raises(ValidationError):
payload.approved = False # type: ignore[misc]

@pytest.mark.parametrize("field", ["approval_id", "decided_by"])
def test_blank_string_rejected(self, field: str) -> None:
kwargs = {
"approval_id": "approval-1",
"approved": True,
"decided_by": "admin",
}
kwargs[field] = " "
with pytest.raises(ValidationError):
ResumePayload(**kwargs) # type: ignore[arg-type]

@pytest.mark.parametrize("field", ["approval_id", "decided_by"])
def test_empty_string_rejected(self, field: str) -> None:
kwargs = {
"approval_id": "approval-1",
"approved": True,
"decided_by": "admin",
}
kwargs[field] = ""
with pytest.raises(ValidationError):
ResumePayload(**kwargs) # type: ignore[arg-type]

def test_empty_decision_reason_rejected(self) -> None:
with pytest.raises(ValidationError):
ResumePayload(
approval_id="approval-1",
approved=False,
decided_by="admin",
decision_reason="",
)

def test_blank_decision_reason_rejected(self) -> None:
with pytest.raises(ValidationError):
ResumePayload(
approval_id="approval-1",
approved=False,
decided_by="admin",
decision_reason=" ",
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add Google-style docstrings to public test methods.

The test classes have docstrings, but the public test functions do not. This will violate the repository docstring policy for Python files.

🛠️ Suggested pattern (apply to all test methods in this file)
 class TestEscalationInfo:
     """EscalationInfo construction and immutability."""

     def test_valid_construction(self) -> None:
+        """Construct EscalationInfo with valid inputs and verify field values."""
         info = EscalationInfo(
             approval_id="approval-1",
             tool_call_id="tc-1",
             tool_name="deploy_to_prod",
             action_type="deploy:production",
             risk_level=ApprovalRiskLevel.CRITICAL,
             reason="Production deployment requires approval",
         )

As per coding guidelines, **/*.py: Google-style docstrings required on all public classes and functions (enforced by ruff D rules).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_approval_gate_models.py` around lines 15 - 161, Add
Google-style docstrings to all public test functions in this file (e.g.,
test_valid_construction, test_frozen_immutability, test_blank_string_rejected,
test_empty_string_rejected, test_all_risk_levels_accepted,
test_approved_without_reason, test_rejected_with_reason,
test_empty_decision_reason_rejected, test_blank_decision_reason_rejected, etc.).
For each test function add a concise Google-style docstring describing the
behavior being tested (one-line summary; longer description or Args/Returns
sections are optional for clarity) so the functions satisfy the repository's
docstring policy enforced by ruff D rules. Ensure the docstrings are
triple-quoted and placed immediately under each def to mark them as
public-function docstrings.

- Read action_type from result.metadata instead of tool.action_type
  in _track_parking_metadata (request_human_approval carries the
  requested action in metadata, not the tool's own comms:internal)
- Handle crash between V7 step 4 and 5: if both parked_contexts and
  parked_contexts_new exist, drop the redundant _new table
- Strip whitespace from action_type/title/description before use
- Return generic error message to agent instead of exception details
- Add timeout(30) to TestApprovalStore class
- Fix _make_item default action_type to code:merge format
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 14, 2026 11:17 — with GitHub Actions Inactive
@Aureliolo Aureliolo merged commit 2db968a into main Mar 14, 2026
27 of 29 checks passed
@Aureliolo Aureliolo deleted the feat/task-engine-approval branch March 14, 2026 11:19
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 14, 2026 11:19 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request Mar 14, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.4](v0.1.3...v0.1.4)
(2026-03-14)


### Features

* add approval workflow gates to TaskEngine
([#387](#387))
([2db968a](2db968a))
* implement checkpoint recovery strategy
([#367](#367))
([f886838](f886838))


### CI/CD

* add npm and pre-commit ecosystems to Dependabot
([#369](#369))
([54e5fe7](54e5fe7))
* bump actions/setup-node from 4.4.0 to 6.3.0
([#360](#360))
([2db5105](2db5105))
* bump github/codeql-action from 3.32.6 to 4.32.6
([#361](#361))
([ce766e8](ce766e8))
* group major dependabot bumps per ecosystem
([#388](#388))
([3c43aef](3c43aef))


### Maintenance

* bump @vitejs/plugin-vue from 5.2.4 to 6.0.5 in /web
([#382](#382))
([d7054ee](d7054ee))
* bump @vue/tsconfig from 0.7.0 to 0.9.0 in /web in the minor-and-patch
group across 1 directory
([#371](#371))
([64fa08b](64fa08b))
* bump astro from 5.18.1 to 6.0.4 in /site
([#376](#376))
([d349317](d349317))
* bump https://github.com/astral-sh/ruff-pre-commit from v0.15.5 to
0.15.6 ([#372](#372))
([dcacb2e](dcacb2e))
* bump https://github.com/gitleaks/gitleaks from v8.24.3 to 8.30.1
([#375](#375))
([a18e6ed](a18e6ed))
* bump https://github.com/hadolint/hadolint from v2.12.0 to 2.14.0
([#373](#373))
([47b906b](47b906b))
* bump https://github.com/pre-commit/pre-commit-hooks from v5.0.0 to
6.0.0 ([#374](#374))
([1926555](1926555))
* bump litellm from 1.82.1 to 1.82.2 in the minor-and-patch group
([#385](#385))
([fa4f7b7](fa4f7b7))
* bump node from 22-alpine to 25-alpine in /docker/web
([#359](#359))
([8d56cd3](8d56cd3))
* bump node from 22-slim to 25-slim in /docker/sandbox
([#358](#358))
([3de8748](3de8748))
* bump pinia from 2.3.1 to 3.0.4 in /web
([#381](#381))
([c78dcc2](c78dcc2))
* bump the major group across 1 directory with 9 updates
([#389](#389))
([9fa621b](9fa621b))
* bump the minor-and-patch group with 2 updates
([#362](#362))
([6ede2ce](6ede2ce))
* bump vue-router from 4.6.4 to 5.0.3 in /web
([#378](#378))
([6c60f6c](6c60f6c))
* expand review skills to 18 smart conditional agents
([#364](#364))
([494013f](494013f))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: implement request_human_approval tool for approval workflow feat: implement approval workflow gates in engine

2 participants