Skip to content

feat: add approval workflow gates to TaskEngine#368

Closed
Aureliolo wants to merge 9 commits intomainfrom
feat/task-engine-approval
Closed

feat: add approval workflow gates to TaskEngine#368
Aureliolo wants to merge 9 commits intomainfrom
feat/task-engine-approval

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

  • Approval gate integration: Wire SecOps ESCALATE verdicts and request_human_approval tool calls into the engine execution loop, parking agent context when human approval is required
  • RequestHumanApprovalTool: Agent-callable tool that creates ApprovalItem in the approval store and signals execution parking via metadata
  • ApprovalGate service: Coordinates context serialization (via ParkService), persistence (via ParkedContextRepository), and resume with decision message injection
  • Loop integration: Both ReactLoop and PlanExecuteLoop check for escalations after tool execution and return PARKED termination reason
  • ToolInvoker escalation tracking: Detects ESCALATE verdicts and parking metadata, exposes pending_escalations with proper tuple[EscalationInfo, ...] typing
  • API controller hardening: requested_by bound to auth user (not request body), prompt injection mitigation in resume messages, broadened WebSocket error handling
  • WebSocket fix: Frontend approval store now uses approval_id key from backend payload and fetches full items via API
  • Observability: 8 structured event constants for approval gate lifecycle

Closes #258, closes #259

Review coverage

Pre-reviewed by 15 specialized agents (docs-consistency, code-reviewer, python-reviewer, test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, conventions-enforcer, security-reviewer, api-contract-drift, test-quality-reviewer, async-concurrency-reviewer, issue-resolution-verifier).

43 findings identified and addressed across Critical (14), Major (23), and Medium (6) severities. Key fixes:

  • Security: identity spoofing, prompt injection, schema length limits
  • Lifecycle: premature context deletion, orphaned records, zombie parking states
  • Type safety: tuple[Any, ...]tuple[EscalationInfo, ...]
  • Frontend: WebSocket payload mismatch (real-time updates were completely broken)
  • Tests: markers, precise assertions, new error path coverage

Test plan

  • uv run ruff check src/ tests/ — clean
  • uv run mypy src/ tests/ — clean (926 files, strict)
  • uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80 — 7540 passed, 94.60% coverage
  • Verify WebSocket real-time updates work in dashboard (manual)
  • Verify approval create/approve/reject flows via API (manual)

Implement the approval gate layer that parks agent execution when
human approval is required and provides infrastructure for resuming
after a decision. Also adds the `request_human_approval` tool for
agents to explicitly request approval.

- Add EscalationInfo/ResumePayload models and approval gate events
- Add ApprovalGate service (park/resume via ParkService)
- Add RequestHumanApprovalTool (creates ApprovalItem, signals parking)
- Add ToolInvoker escalation tracking (ESCALATE verdicts + tool metadata)
- Integrate approval gate into execute_tool_calls and both loop types
- Wire into AgentEngine, AppState, and approvals controller resume path

Closes #258
Closes #259
Critical: empty task_id ValidationError, parking metadata bypass,
PARKED→ERROR on park failure, wire parked_context_repo, TOCTOU race
via save_if_pending, resume trigger wiring in approve/reject.

Security: prompt injection mitigation, max_length on comment,
ttl_seconds cap, action_type format validation, remove phantom
requested_by field.

Logging: wrong event constants for init/risk/park-info, consistent
MemoryError logging, new INITIALIZED/RISK_CLASSIFIED/RESUME_TRIGGERED
constants.

Code quality: extract security/tool factories (962→860 lines), split
execute() into helpers, deterministic escalation order, boolean
parking_failed, check delete() result, risk classifier error handling.

Frontend: async handler type fix, WS test payload/await fix,
filter-aware WS, 404 vs transient error discrimination, total desync
fix, empty catch logging.

Tests: park failure expects ERROR, module-level markers, extracted
fixtures, simplified PropertyMock, updated resume message assertions.

Docs: persistence qualifier, CLAUDE.md events, docstring fixes.
- _trigger_resume no longer calls resume_context() which would
  delete the parked record before a scheduler can consume it —
  now only logs the resume trigger for scheduler observation
- Remove dead APPROVAL_RESUMED WsEventType (never emitted)
- Remove unused ApprovalGate import from controller
Copilot AI review requested due to automatic review settings March 14, 2026 08:10
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 14, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Approval workflow: request human approval, park/resume agent execution, agent-facing approval tool, and strict action_type validation.
    • New observability events and WebSocket-driven approval updates.
  • Bug Fixes

    • Hardened error handling around expirations, auth, concurrency/conflict, parking/resume, and resume signaling.
  • Documentation

    • README and design/docs updated to describe the approval workflow and logging guidance.
  • Tests

    • Expanded unit tests covering approval gate, tools, invoker, DTOs, migrations, and WS handling.

Walkthrough

Adds an approval-gate feature: new ApprovalGate service and models, request_human_approval tool, ToolInvoker escalation tracking, loop/engine wiring to park/resume agent contexts (PARKED path), API/DTO/controller/store updates, DB migration to nullable parked task_id, observability events, and many tests and frontend WS/store updates.

Changes

Cohort / File(s) Summary
Core models & service
src/ai_company/engine/approval_gate.py, src/ai_company/engine/approval_gate_models.py, src/ai_company/engine/__init__.py
New ApprovalGate class, EscalationInfo and ResumePayload models; exported from engine package.
Engine wiring & loops
src/ai_company/engine/agent_engine.py, src/ai_company/engine/react_loop.py, src/ai_company/engine/plan_execute_loop.py, src/ai_company/engine/loop_helpers.py
Thread approval_gate through AgentEngine and loops; execute_tool_calls may park contexts on escalations, add parked_context_repo wiring and helper factories.
Security factory & detectors
src/ai_company/engine/_security_factory.py
New make_security_interceptor and registry_with_approval_tool utilities to build SecOps interceptors and register the approval tool when approval_store is present.
Tools & invoker
src/ai_company/tools/approval_tool.py, src/ai_company/tools/invoker.py, src/ai_company/tools/registry.py, src/ai_company/tools/__init__.py
New RequestHumanApprovalTool; ToolInvoker tracks pending_escalations and parking metadata; ToolRegistry adds all_tools; tools package exports the approval tool.
API, DTOs, store & controllers
src/ai_company/api/approval_store.py, src/ai_company/api/controllers/approvals.py, src/ai_company/api/dto.py, src/ai_company/api/state.py
ApprovalStore.save_if_pending added; controller auth/optimistic-save/retry signaling and events added; DTO validates action_type and tightens ttl; AppState exposes optional approval_gate.
Park/resume & persistence
src/ai_company/security/timeout/park_service.py, src/ai_company/security/timeout/parked_context.py, src/ai_company/persistence/sqlite/migrations.py
ParkService and ParkedContext.task_id now optional/nullable; migration bumped to v6 to make parked_contexts.task_id nullable.
Observability events
src/ai_company/observability/events/approval_gate.py
New typed event constants covering approval-gate lifecycle (initialized, escalation., risk.classified, context.parked/park_failed, resume. etc.).
Validation util
src/ai_company/core/validation.py
New is_valid_action_type("category:action") helper used by DTOs and tools.
Frontend WebSocket & store
web/src/stores/approvals.ts, web/src/__tests__/stores/approvals.test.ts
WS handler now uses approval_id and is synchronous (delegates async work); fetch-on-event behavior with Axios-aware error handling; tests updated to new payload shapes and behaviors.
Docs & README
CLAUDE.md, README.md, docs/design/engine.md
Documentation updated to describe approval-gate components, events, approval tool, and PARKED pathway after tool execution.
Tests
tests/unit/... (many files)
Extensive unit tests added/updated covering ApprovalGate, models, ApprovalStore, controllers, ToolInvoker escalation, RequestHumanApprovalTool, loop_helpers parking flows, security factory, DTO validations, migrations, and frontend WS behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Agent as Agent/Loop
    participant TI as ToolInvoker
    participant SO as SecOps
    participant AG as ApprovalGate
    participant PS as ParkService
    participant Repo as ParkedContextRepo
    participant AC as ApprovalController

    Agent->>TI: invoke(tool_call)
    TI->>SO: check_security(tool_call)
    alt ESCALATE with approval_id
        SO-->>TI: ESCALATE (approval_id, risk)
        TI->>TI: record pending_escalation
        TI-->>Agent: return pending_escalations
        Agent->>AG: should_park(escalations)
        AG-->>Agent: escalation_info
        Agent->>AG: park_context(escalation, context, agent_id, task_id)
        AG->>PS: serialize context + metadata
        PS-->>AG: serialized data
        AG->>Repo: save parked context
        Repo-->>AG: parked_context
        AG-->>Agent: success -> PARKED result
    else
        SO-->>TI: ALLOW / DENY
        TI->>Agent: tool result (no park)
    end

    rect rgba(200,150,255,0.5)
    Note over AC,Repo: Later — human decision triggers resume
    AC->>AC: approve/reject(approval_id)
    AC->>AG: resume_context(approval_id)
    AG->>Repo: load by approval_id
    Repo-->>AG: parked_context
    AG->>PS: deserialize -> AgentContext
    AG->>Repo: delete parked record
    AG-->>AC: (AgentContext, parked_id)
    AC->>Agent: inject resume message / continue execution
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • #258 — Implements engine-level approval gates, escalation tracking, parking/resume and wiring described by that issue.
  • #259 — Adds the request_human_approval tool and integrates it with approval store and parking, matching that issue's scope.

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.32% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: add approval workflow gates to TaskEngine' clearly summarizes the main change: implementing approval gate functionality in the TaskEngine. It is specific, concise, and directly related to the primary objective of the PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing approval gate integration, the RequestHumanApprovalTool, ApprovalGate service, loop integration, escalation tracking, API hardening, and observability improvements.
Linked Issues check ✅ Passed The PR successfully addresses both linked issues: #258 (approval workflow gates in engine—wire SecOps evaluation, block execution, route to approval queue, resume on decision) and #259 (request_human_approval tool—tool implementation, store integration, LLM protocol compliance). All coding requirements are met.
Out of Scope Changes check ✅ Passed All changes are directly scoped to approval workflow gates: engine integration, tool implementation, API hardening, WebSocket fixes for real-time updates, observability events, and supporting models. No unrelated refactorings or feature creep detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/task-engine-approval
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/task-engine-approval
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust human approval workflow into the system, enabling agents to pause their operations and await human review for critical or sensitive tasks. This involves deep integration across the core engine, API, and tools, ensuring that agent contexts can be reliably parked, persisted, and resumed. The changes also include significant security enhancements for API interactions and crucial fixes for real-time frontend updates, making the approval process transparent and secure.

Highlights

  • Approval Workflow Integration: Introduced a comprehensive human approval workflow, allowing agent execution to be paused for human review and resumed upon decision. This includes wiring SecOps ESCALATE verdicts and request_human_approval tool calls into the engine execution loop, parking agent context when human approval is required.
  • New ApprovalGate Service: Implemented a new ApprovalGate service that coordinates context serialization, persistence via ParkedContextRepository, and resumption with decision message injection, ensuring agents can be paused and restarted seamlessly.
  • Agent-Callable RequestHumanApprovalTool: Added a new agent-callable tool, RequestHumanApprovalTool, which creates an ApprovalItem in the approval store and signals execution parking, enabling agents to explicitly request human intervention for sensitive actions.
  • Enhanced ToolInvoker with Escalation Tracking: The ToolInvoker now detects ESCALATE verdicts and requires_parking metadata from tools, exposing pending_escalations with proper typing and clearing them between invocations for accurate tracking.
  • API Controller Hardening and Frontend Fixes: Hardened the API controller by binding requested_by to the authenticated user (not request body), mitigating prompt injection in resume messages, and broadening WebSocket error handling. A critical WebSocket fix ensures the frontend approval store correctly uses approval_id from backend payloads and fetches full items via API, resolving previously broken real-time updates.
  • Observability Improvements: Added 8 new structured event constants for the approval gate lifecycle, enhancing monitoring and debugging capabilities for the new workflow.
Changelog
  • CLAUDE.md
    • Updated descriptions for api/, engine/, and tools/ directories to reflect approval gate integration.
  • README.md
    • Updated project status to indicate that approval workflow gates are now implemented.
  • docs/design/engine.md
    • Added a new section detailing how ApprovalGate evaluates parking needs after tool execution and handles context serialization and persistence.
  • src/ai_company/api/approval_store.py
    • Added save_if_pending method to update approval items only if their status is still PENDING, preventing concurrent decision issues.
    • Improved error handling for the _on_expire callback with specific exception types and logging.
  • src/ai_company/api/controllers/approvals.py
    • Imported APPROVAL_GATE_RESUME_TRIGGERED for observability.
    • Added _log_approval_decision and _trigger_resume helper functions for logging and signaling context resumption.
    • Modified _resolve_decision to prioritize checking PENDING status before authenticating the user.
    • Updated create_approval to populate requested_by from the authenticated user and added an UnauthorizedError check.
    • Refactored approve and reject methods to use save_if_pending and call _trigger_resume after a decision, handling potential conflicts.
    • Enhanced WebSocket event publishing error handling to catch more general exceptions.
  • src/ai_company/api/dto.py
    • Added field_validator for action_type to enforce a 'category:action' format.
    • Removed requested_by from CreateApprovalRequest as it's now derived from the authenticated user.
    • Added max_length constraint to the comment field in ApproveRequest.
  • src/ai_company/api/state.py
    • Added _approval_gate to AppState slots and constructor.
    • Exposed approval_gate as a property in AppState.
  • src/ai_company/engine/init.py
    • Exported ApprovalGate, EscalationInfo, and ResumePayload for broader access within the engine module.
  • src/ai_company/engine/_security_factory.py
    • Added new file to extract make_security_interceptor and registry_with_approval_tool functions from agent_engine.py for better modularity.
  • src/ai_company/engine/agent_engine.py
    • Refactored security and tool factory methods to use the new _security_factory.py module.
    • Introduced _make_approval_gate to instantiate ApprovalGate if an approval store is configured.
    • Integrated ApprovalGate into the default ReactLoop initialization.
    • Removed direct imports of security rules and related components, delegating their creation to _security_factory.py.
  • src/ai_company/engine/approval_gate.py
    • Added new file implementing the ApprovalGate service, responsible for detecting escalations, parking agent contexts, and resuming them after human decisions.
    • Included methods for should_park, park_context, resume_context, and build_resume_message.
  • src/ai_company/engine/approval_gate_models.py
    • Added new file defining EscalationInfo and ResumePayload Pydantic models for carrying escalation details and approval decisions.
  • src/ai_company/engine/loop_helpers.py
    • Imported new approval gate event constants.
    • Modified execute_tool_calls to accept an optional approval_gate parameter.
    • Added _park_for_approval helper function to handle context parking when escalations are detected, returning a PARKED or ERROR result.
  • src/ai_company/engine/plan_execute_loop.py
    • Added approval_gate to the PlanExecuteLoop constructor.
    • Modified _handle_step_tool_calls to pass the approval_gate to execute_tool_calls.
  • src/ai_company/engine/react_loop.py
    • Added approval_gate to the ReactLoop constructor.
    • Modified _process_turn_response to pass the approval_gate to execute_tool_calls.
  • src/ai_company/observability/events/approval_gate.py
    • Added new file defining constants for various approval gate events, such as APPROVAL_GATE_INITIALIZED, APPROVAL_GATE_ESCALATION_DETECTED, APPROVAL_GATE_CONTEXT_PARKED, and APPROVAL_GATE_CONTEXT_RESUMED.
  • src/ai_company/security/timeout/park_service.py
    • Made the task_id parameter optional in the park method signature and docstring.
  • src/ai_company/security/timeout/parked_context.py
    • Made the task_id field optional in the ParkedContext model.
  • src/ai_company/tools/init.py
    • Exported RequestHumanApprovalTool for external use.
  • src/ai_company/tools/approval_tool.py
    • Added new file implementing RequestHumanApprovalTool, allowing agents to request human approval.
    • This tool creates an ApprovalItem and returns metadata to signal execution parking.
    • Includes logic for action type validation and risk classification.
  • src/ai_company/tools/invoker.py
    • Imported ApprovalRiskLevel and EscalationInfo.
    • Added _pending_escalations list and pending_escalations property to track escalations.
    • Modified _check_security to populate _pending_escalations when an ESCALATE verdict with an approval_id is returned.
    • Refactored invoke and invoke_all to clear _pending_escalations at the start of each call.
    • Added _track_parking_metadata to detect requires_parking metadata from tool results and add to escalations.
    • Ensured escalations are sorted deterministically in invoke_all.
  • src/ai_company/tools/registry.py
    • Added all_tools method to return all registered tool instances.
  • tests/unit/api/controllers/test_approvals.py
    • Updated _create_payload to use the new 'code:merge' action type format and removed requested_by.
  • tests/unit/api/test_dto.py
    • Updated test cases for CreateApprovalRequest to reflect the new action_type format and the removal of requested_by.
  • tests/unit/engine/test_approval_gate.py
    • Added new file with unit tests for the ApprovalGate service, covering parking, resuming, and message building.
  • tests/unit/engine/test_approval_gate_models.py
    • Added new file with unit tests for EscalationInfo and ResumePayload models, including validation and immutability checks.
  • tests/unit/engine/test_loop_helpers_approval.py
    • Added new file with unit tests for the approval gate integration within loop_helpers, specifically for execute_tool_calls.
  • tests/unit/observability/test_events.py
    • Added approval_gate to the list of expected domain modules for event constant discovery.
    • Updated pytest markers to include pytest.mark.unit.
  • tests/unit/tools/test_approval_tool.py
    • Added new file with unit tests for RequestHumanApprovalTool, verifying its creation, execution, approval item creation, risk classification, and error handling.
  • tests/unit/tools/test_invoker_escalation.py
    • Added new file with unit tests for ToolInvoker's escalation tracking, covering ESCALATE verdicts and requires_parking metadata.
  • web/src/tests/stores/approvals.test.ts
    • Added flushPromises helper for asynchronous testing.
    • Updated WebSocket event handling tests to mock getApproval and verify fetching full approval items.
    • Added tests for handling approval.submitted with active filters, decrementing total on item removal, and robust error handling for transient fetch errors.
  • web/src/stores/approvals.ts
    • Imported axios for improved error handling.
    • Refactored handleWsEvent to asynchronously fetch full approval items from the API using approval_id from the WebSocket payload.
    • Implemented logic to update local store state (approvals list and total count) based on fetched item status and active filters.
    • Added robust error handling for API calls within WebSocket event processing, distinguishing between transient errors and genuinely missing items.
Activity
  • Pre-reviewed by 15 specialized agents (docs-consistency, code-reviewer, python-reviewer, test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, conventions-enforcer, security-reviewer, api-contract-drift, test-quality-reviewer, async-concurrency-reviewer, issue-resolution-verifier).
  • 43 findings identified and addressed across Critical (14), Major (23), and Medium (6) severities.
  • Key fixes included: security (identity spoofing, prompt injection, schema length limits), lifecycle (premature context deletion, orphaned records, zombie parking states), type safety (tuple[Any, ...]tuple[EscalationInfo, ...]), frontend (WebSocket payload mismatch, real-time updates were completely broken), and tests (markers, precise assertions, new error path coverage).
  • Test plan executed: ruff check, mypy, and pytest passed with 94.60% coverage.
  • Manual verification of WebSocket real-time updates in the dashboard and approval create/approve/reject flows via API are pending.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 14, 2026 08:11 — with GitHub Actions Inactive
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 14, 2026

Preview Deployment

URL: https://pr-368.synthorg-pr-preview.pages.dev

Built from commit cd3768f

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/api/test_dto.py (1)

36-77: 🧹 Nitpick | 🔵 Trivial

LGTM with a suggestion.

Tests correctly updated to use the new category:action format ("deploy:release"). Consider adding test cases for invalid action_type formats to ensure the validator rejects malformed inputs.

,

💡 Suggested additional test cases for action_type validation
`@pytest.mark.parametrize`(
    "invalid_action_type",
    [
        "deploy",           # missing colon
        ":release",         # empty category
        "deploy:",          # empty action
        "deploy:  ",        # whitespace-only action
        "  :release",       # whitespace-only category
        "a:b:c",            # too many parts
    ],
)
def test_action_type_invalid_format(self, invalid_action_type: str) -> None:
    with pytest.raises(ValueError, match="category:action"):
        CreateApprovalRequest(
            action_type=invalid_action_type,
            title="Test",
            description="Test desc",
            risk_level=ApprovalRiskLevel.LOW,
        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/api/test_dto.py` around lines 36 - 77, Add new parametrized tests
to cover invalid action_type formats: create a test named
test_action_type_invalid_format that parametrizes invalid_action_type values
(e.g. "deploy", ":release", "deploy:", "deploy:  ", "  :release", "a:b:c") and
asserts CreateApprovalRequest raises ValueError with a message matching
"category:action"; locate where other DTO tests live and follow the same
pattern/fixtures as test_metadata_* functions so you assert
CreateApprovalRequest(action_type=invalid_action_type, title="Test",
description="Test desc", risk_level=ApprovalRiskLevel.LOW) raises the expected
ValueError.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/api/controllers/approvals.py`:
- Around line 298-301: Before raising UnauthorizedError in the auth check, log
the unauthorized path with context: capture request.scope.get("user")
(auth_user) and relevant request context (e.g., request.method or request.url)
and call processLogger.warning or processLogger.error with a clear message and
the auth_user value, then raise UnauthorizedError as currently done; update the
block around auth_user = request.scope.get("user") to emit that log before
raising to satisfy the "log error paths" guideline.
- Around line 171-205: The resume flow currently only logs
APPPROVAL_GATE_RESUME_TRIGGERED and never wakes parked approvals; update
_trigger_resume to call the approval gate's resume logic (use
approval_gate.resume_context or the appropriate resume method on
AppState.approval_gate) after logging, passing approval_id, approved, decided_by
and decision_reason, and ensure the call is non-blocking (e.g. schedule via
asyncio.create_task or other background executor) and properly error-handled
with logger.info/logger.error on success/failure so parked contexts are actually
resumed without consuming the approval record.

In `@src/ai_company/engine/approval_gate.py`:
- Around line 213-232: The code only logs when deletion fails or returns False;
add an explicit success log for observability by emitting a debug/info message
when deleted is True after calling self._parked_context_repo.delete(parked.id).
Locate the block around the deleted variable and the call to
_parked_context_repo.delete in the resume path (the try/except that currently
catches exceptions and logs APPROVAL_GATE_RESUME_DELETE_FAILED), and add a
logger.debug (or logger.info) that references approval_id and parked.id when
deleted is True so successful deletions are recorded alongside the existing
warning/exception logs.

In `@src/ai_company/engine/loop_helpers.py`:
- Around line 373-378: The debug log in logger.debug currently uses the
misleading APPROVAL_GATE_CONTEXT_PARKED event constant when noting a missing
task_execution; replace that constant with a semantically appropriate event
(either a generic debug event constant or a new specific constant) such as
APPROVAL_GATE_CONTEXT_MISSING_TASK_EXECUTION (or GENERIC_DEBUG_EVENT) and update
the call in the same logger.debug invocation that includes
escalation.approval_id and agent_id so the log accurately reflects "missing
task_execution / task_id will be None" rather than implying a parked approval.

In `@tests/unit/engine/test_loop_helpers_approval.py`:
- Line 42: Replace the hardcoded model identifier "test-model-001" with the
repository-standard test model id (e.g., "test-small-001") wherever it appears
in the test case; update the value passed to the model parameter in the test
helper/fixture invocation that contains model="test-model-001" so the test uses
the approved naming pattern (test-small-001) consistent with other tests and
fixtures.

In `@web/src/__tests__/stores/approvals.test.ts`:
- Around line 281-290: The test spies on axios.isAxiosError (via vi.spyOn((await
import('axios')).default, 'isAxiosError')) but never restores it; add an
afterEach hook that calls mockRestore() on that spy (or reassigns the
originalIsAxiosError) so the original implementation stored in
originalIsAxiosError is reinstated after each test; locate the spy created near
axiosError/originalIsAxiosError and ensure you call spy.mockRestore() in
afterEach to prevent leakage into other tests.

In `@web/src/stores/approvals.ts`:
- Around line 79-85: When handling an approval.submitted event, don't
unconditionally increment total.value when activeFilters.value is truthy;
instead fetch the new approval via approvalsApi.getApproval(approvalId) and
check whether it matches the current activeFilters (or re-run the active query
using listApprovals(activeFilters.value)) before mutating approvals.value and
total.value. Update the handler that calls approvalsApi.getApproval(approvalId)
to either (a) apply the same filter predicate used by listApprovals to the
returned item and only prepend to approvals.value and increment total.value if
it matches, or (b) call listApprovals(activeFilters.value) and set
approvals.value and total.value from that refreshed result so the filtered total
stays correct.
- Around line 96-112: The status-change branch must reconcile the active query
instead of assuming the changed item is in approvals.value: after handling
'approval.approved'|'approval.rejected'|'approval.expired' (where you currently
call approvalsApi.getApproval and update approvals.value/total.value), re-run
the current list query to refresh approvals.value and total.value (e.g., call
the module's existing loader like fetchApprovals() or invoke approvalsApi.list
with the current filters/page) so filtered views reflect removals, additions,
and updated rows; keep the 404/410 fallback only as a secondary removal step if
the item truly no longer exists.

---

Outside diff comments:
In `@tests/unit/api/test_dto.py`:
- Around line 36-77: Add new parametrized tests to cover invalid action_type
formats: create a test named test_action_type_invalid_format that parametrizes
invalid_action_type values (e.g. "deploy", ":release", "deploy:", "deploy:  ", "
:release", "a:b:c") and asserts CreateApprovalRequest raises ValueError with a
message matching "category:action"; locate where other DTO tests live and follow
the same pattern/fixtures as test_metadata_* functions so you assert
CreateApprovalRequest(action_type=invalid_action_type, title="Test",
description="Test desc", risk_level=ApprovalRiskLevel.LOW) raises the expected
ValueError.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 665c63f1-0f43-4e10-8243-1b719f49b678

📥 Commits

Reviewing files that changed from the base of the PR and between 494013f and 91823f4.

📒 Files selected for processing (32)
  • CLAUDE.md
  • README.md
  • docs/design/engine.md
  • src/ai_company/api/approval_store.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/api/dto.py
  • src/ai_company/api/state.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/_security_factory.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/engine/approval_gate_models.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/security/timeout/park_service.py
  • src/ai_company/security/timeout/parked_context.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/tools/registry.py
  • tests/unit/api/controllers/test_approvals.py
  • tests/unit/api/test_dto.py
  • tests/unit/engine/test_approval_gate.py
  • tests/unit/engine/test_approval_gate_models.py
  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/observability/test_events.py
  • tests/unit/tools/test_approval_tool.py
  • tests/unit/tools/test_invoker_escalation.py
  • web/src/__tests__/stores/approvals.test.ts
  • web/src/stores/approvals.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Agent
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Greptile Review
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (9)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python version 3.14+ with PEP 649 native lazy annotations required.
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14.
Line length must be 88 characters, enforced by ruff.
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Tests must use test-provider, test-small-001, etc.

Files:

  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/tools/test_approval_tool.py
  • tests/unit/api/test_dto.py
  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/engine/approval_gate.py
  • tests/unit/api/controllers/test_approvals.py
  • tests/unit/engine/test_approval_gate_models.py
  • src/ai_company/engine/_security_factory.py
  • src/ai_company/security/timeout/parked_context.py
  • tests/unit/engine/test_approval_gate.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/api/state.py
  • src/ai_company/tools/registry.py
  • tests/unit/tools/test_invoker_escalation.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/security/timeout/park_service.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/engine/approval_gate_models.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/api/dto.py
  • tests/unit/observability/test_events.py
  • src/ai_company/api/approval_store.py
  • src/ai_company/engine/__init__.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Tests must use markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Test coverage minimum: 80% (enforced in CI).
Async tests: use asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.

Files:

  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/tools/test_approval_tool.py
  • tests/unit/api/test_dto.py
  • tests/unit/api/controllers/test_approvals.py
  • tests/unit/engine/test_approval_gate_models.py
  • tests/unit/engine/test_approval_gate.py
  • tests/unit/tools/test_invoker_escalation.py
  • tests/unit/observability/test_events.py
docs/design/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

When approved deviations occur, update the relevant docs/design/ page to reflect the new reality.

Files:

  • docs/design/engine.md
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Documentation source in docs/ built with Zensical. Design spec in docs/design/ (7 pages). Architecture in docs/architecture/. Roadmap in docs/roadmap/. Security in docs/security.md.

Files:

  • docs/design/engine.md
web/src/**/*.{vue,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Web dashboard: Vue 3 + PrimeVue + Tailwind CSS, organized by feature in src/components/, src/stores/, src/views/. Enforce with ESLint and vue-tsc type-checking.

Files:

  • web/src/__tests__/stores/approvals.test.ts
  • web/src/stores/approvals.ts
web/src/__tests__/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Web dashboard tests: Vitest unit tests organized by feature in tests/

Files:

  • web/src/__tests__/stores/approvals.test.ts
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: All public functions must have type hints. Enforce via mypy strict mode.
Google-style docstrings required on public classes and functions, enforced by ruff D rules.
Create new objects instead of mutating existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction plus MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with BaseModel, model_validator, computed_field, ConfigDict. Use @computed_field for derived values instead of storing redundant fields (e.g., TokenUsage.total_tokens).
Use NotBlankStr from core.types for all identifier/name fields in Pydantic models, including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants, instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow exceptions.
Validate at system boundaries: user input, external APIs, config files.
NEVER use import logging, logging.getLogger(), or print() in application code.
Always use logger as the variable name for loggers (not _logger, not log).
Event names must always use constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import dire...

Files:

  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/engine/_security_factory.py
  • src/ai_company/security/timeout/parked_context.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/api/state.py
  • src/ai_company/tools/registry.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/security/timeout/park_service.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/engine/approval_gate_models.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/api/dto.py
  • src/ai_company/api/approval_store.py
  • src/ai_company/engine/__init__.py
src/ai_company/**/[!_]*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)

Files:

  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/security/timeout/parked_context.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/api/state.py
  • src/ai_company/tools/registry.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/security/timeout/park_service.py
  • src/ai_company/engine/approval_gate_models.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/api/dto.py
  • src/ai_company/api/approval_store.py
web/src/stores/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Frontend state management: Pinia stores organized by feature in src/stores/

Files:

  • web/src/stores/approvals.ts
🧠 Learnings (14)
📚 Learning: 2026-03-13T21:03:58.907Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.907Z
Learning: Applies to web/src/__tests__/**/*.ts : Web dashboard tests: Vitest unit tests organized by feature in __tests__/

Applied to files:

  • web/src/__tests__/stores/approvals.test.ts
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Event names must always use constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/engine/agent_engine.py
  • CLAUDE.md
  • src/ai_company/engine/loop_helpers.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/ai_company/**/[!_]*.py : Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • src/ai_company/api/state.py
  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Structured logging: always use `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : NEVER use `import logging`, `logging.getLogger()`, or `print()` in application code.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Always use `logger` as the variable name for loggers (not `_logger`, not `log`).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : DEBUG logging for object creation, internal flow, and entry/exit of key functions.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Pure data models, enums, and re-exports do NOT need logging.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Use Pydantic v2 with `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. Use `computed_field` for derived values instead of storing redundant fields (e.g., `TokenUsage.total_tokens`).

Applied to files:

  • src/ai_company/api/dto.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to tests/**/*.py : Tests must use markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`

Applied to files:

  • tests/unit/observability/test_events.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.

Applied to files:

  • tests/unit/observability/test_events.py
📚 Learning: 2026-03-13T21:03:58.907Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.907Z
Learning: Applies to web/src/stores/**/*.ts : Frontend state management: Pinia stores organized by feature in src/stores/

Applied to files:

  • web/src/stores/approvals.ts
🧬 Code graph analysis (19)
tests/unit/engine/test_loop_helpers_approval.py (5)
src/ai_company/engine/approval_gate.py (2)
  • ApprovalGate (40-274)
  • should_park (71-90)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/engine/loop_helpers.py (1)
  • execute_tool_calls (241-343)
src/ai_company/engine/loop_protocol.py (2)
  • ExecutionResult (79-140)
  • TerminationReason (28-36)
src/ai_company/providers/models.py (3)
  • CompletionResponse (257-306)
  • ToolCall (96-119)
  • ToolResult (122-135)
web/src/__tests__/stores/approvals.test.ts (2)
web/src/stores/approvals.ts (1)
  • useApprovalStore (8-136)
web/src/api/types.ts (1)
  • WsEvent (519-524)
tests/unit/api/test_dto.py (2)
src/ai_company/core/enums.py (1)
  • ApprovalRiskLevel (443-449)
src/ai_company/api/dto.py (1)
  • CreateApprovalRequest (196-249)
src/ai_company/engine/approval_gate.py (5)
src/ai_company/persistence/repositories.py (1)
  • ParkedContextRepository (199-267)
src/ai_company/security/timeout/park_service.py (3)
  • ParkService (29-144)
  • park (37-106)
  • resume (108-144)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/engine/context.py (1)
  • AgentContext (87-307)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
tests/unit/engine/test_approval_gate_models.py (1)
src/ai_company/engine/approval_gate_models.py (2)
  • EscalationInfo (14-33)
  • ResumePayload (36-51)
src/ai_company/engine/_security_factory.py (7)
src/ai_company/security/rules/engine.py (1)
  • RuleEngine (32-174)
src/ai_company/security/service.py (1)
  • SecOpsService (74-456)
src/ai_company/core/agent.py (1)
  • AgentIdentity (266-342)
src/ai_company/security/config.py (1)
  • SecurityConfig (93-140)
src/ai_company/security/protocol.py (1)
  • SecurityInterceptionStrategy (18-54)
src/ai_company/tools/registry.py (2)
  • ToolRegistry (30-126)
  • all_tools (102-104)
src/ai_company/tools/approval_tool.py (1)
  • RequestHumanApprovalTool (32-252)
src/ai_company/security/timeout/parked_context.py (2)
src/ai_company/engine/parallel_models.py (1)
  • task_id (87-89)
src/ai_company/tools/base.py (1)
  • description (138-140)
tests/unit/engine/test_approval_gate.py (5)
src/ai_company/engine/approval_gate.py (5)
  • ApprovalGate (40-274)
  • should_park (71-90)
  • park_context (92-160)
  • resume_context (162-239)
  • build_resume_message (242-274)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/persistence/repositories.py (1)
  • ParkedContextRepository (199-267)
src/ai_company/security/timeout/park_service.py (3)
  • ParkService (29-144)
  • park (37-106)
  • resume (108-144)
src/ai_company/api/approval_store.py (1)
  • save (106-123)
src/ai_company/engine/agent_engine.py (6)
src/ai_company/engine/_security_factory.py (2)
  • make_security_interceptor (39-117)
  • registry_with_approval_tool (120-148)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-274)
src/ai_company/persistence/repositories.py (1)
  • ParkedContextRepository (199-267)
src/ai_company/engine/loop_protocol.py (1)
  • ExecutionLoop (151-189)
src/ai_company/security/timeout/park_service.py (1)
  • ParkService (29-144)
src/ai_company/engine/react_loop.py (1)
  • ReactLoop (55-269)
src/ai_company/api/state.py (1)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-274)
tests/unit/tools/test_invoker_escalation.py (6)
src/ai_company/providers/models.py (1)
  • ToolCall (96-119)
src/ai_company/security/models.py (2)
  • SecurityVerdict (35-67)
  • SecurityVerdictType (23-32)
src/ai_company/tools/base.py (1)
  • ToolExecutionResult (25-54)
src/ai_company/tools/invoker.py (4)
  • registry (123-125)
  • pending_escalations (128-136)
  • invoke (343-365)
  • invoke_all (711-781)
src/ai_company/tools/registry.py (1)
  • ToolRegistry (30-126)
src/ai_company/engine/parallel_models.py (2)
  • agent_id (79-81)
  • task_id (87-89)
src/ai_company/security/timeout/park_service.py (1)
src/ai_company/engine/parallel_models.py (1)
  • task_id (87-89)
src/ai_company/tools/__init__.py (1)
src/ai_company/tools/approval_tool.py (1)
  • RequestHumanApprovalTool (32-252)
src/ai_company/engine/approval_gate_models.py (1)
src/ai_company/tools/base.py (1)
  • action_type (133-135)
src/ai_company/api/controllers/approvals.py (4)
src/ai_company/api/state.py (1)
  • approval_gate (139-141)
src/ai_company/api/approval_store.py (2)
  • get (61-73)
  • save_if_pending (125-142)
src/ai_company/api/auth/models.py (1)
  • AuthenticatedUser (68-87)
src/ai_company/api/errors.py (2)
  • UnauthorizedError (59-65)
  • ConflictError (41-47)
src/ai_company/engine/react_loop.py (2)
src/ai_company/api/state.py (1)
  • approval_gate (139-141)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-274)
web/src/stores/approvals.ts (1)
web/src/api/types.ts (1)
  • WsEvent (519-524)
src/ai_company/api/approval_store.py (4)
src/ai_company/core/approval.py (1)
  • ApprovalItem (24-96)
src/ai_company/core/enums.py (1)
  • ApprovalStatus (434-440)
src/ai_company/api/app.py (1)
  • _on_expire (74-99)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
src/ai_company/engine/__init__.py (3)
src/ai_company/api/state.py (1)
  • approval_gate (139-141)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-274)
src/ai_company/engine/approval_gate_models.py (2)
  • EscalationInfo (14-33)
  • ResumePayload (36-51)
🪛 LanguageTool
CLAUDE.md

[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...

(EG_NO_COMMA)

🔇 Additional comments (46)
web/src/__tests__/stores/approvals.test.ts (3)

6-7: Good async test helper for WS microtask completion.

This helper makes the sync handleWsEvent + async IIFE pattern testable and keeps assertions deterministic.


163-279: WS event coverage is strong and aligned with approval_id payload semantics.

Nice expansion across submitted/approved/rejected/expired plus filter-active behavior and duplicate suppression.


341-366: Transient error path assertions are solid.

Verifying unchanged state plus console.warn payload here is valuable for regression safety.

web/src/stores/approvals.ts (1)

67-71: Good guard for malformed WS payloads.

Early-returning unless approval_id is a non-empty string keeps bad events from mutating store state.

src/ai_company/tools/registry.py (1)

102-104: LGTM!

The all_tools() method follows the established pattern from list_tools() and to_definitions(), providing deterministic name-sorted iteration over tool instances. Clean implementation with proper type hints.

docs/design/engine.md (1)

450-468: LGTM!

The documentation accurately describes the new approval gate integration, including escalation detection via ToolInvoker.pending_escalations, context serialization through ParkService, and the PARKED termination reason semantics. Clear and consistent with the implementation.

src/ai_company/api/approval_store.py (2)

125-142: LGTM!

The save_if_pending method correctly implements atomic conditional update semantics: it applies lazy expiration check before status comparison, preventing TOCTOU races where an item expires or gets decided between check and save. Returns None on both "not found" and "not pending" cases, which is appropriate for the caller to handle.


170-179: LGTM!

Good defensive handling of the on_expire callback: re-raises critical errors (MemoryError, RecursionError) while logging and swallowing other exceptions to prevent callback failures from breaking the expiration flow. Uses logger.exception correctly for full traceback capture.

CLAUDE.md (2)

101-115: LGTM!

Package structure documentation accurately reflects the new approval gate integration points: API approval gate integration, engine approval gate components (escalation detection, context parking/resume, models), and the request_human_approval tool.


154-154: LGTM!

Event constant documentation properly extended with the new approval gate events (APPROVAL_GATE_ESCALATION_DETECTED, APPROVAL_GATE_INITIALIZED, APPROVAL_GATE_RISK_CLASSIFIED, APPROVAL_GATE_RESUME_TRIGGERED), following the established pattern for domain-specific event modules.

src/ai_company/api/dto.py (4)

10-17: LGTM!

Clean import organization for Pydantic v2 validation decorators. The addition of field_validator and model_validator enables the new action_type format and metadata bounds validation.


221-232: LGTM!

The _validate_action_type_format validator correctly enforces the category:action format by checking for exactly two non-empty parts after splitting on :. The use of strip() handles whitespace-padded segments appropriately.


217-217: LGTM!

Reasonable TTL bounds: minimum 60 seconds prevents excessively short-lived approvals, maximum 604800 seconds (7 days) prevents indefinite pending items. Aligns with the docstring update.


261-261: LGTM!

Adding max_length=4096 to the optional comment field provides consistent bounds with other text fields (description, reason) and prevents unbounded input.

src/ai_company/tools/__init__.py (2)

3-3: LGTM!

Clean import of the new RequestHumanApprovalTool from the approval_tool module.


68-68: LGTM!

RequestHumanApprovalTool correctly added to __all__ in alphabetical order, maintaining the established convention for public API exports.

tests/unit/api/controllers/test_approvals.py (1)

19-29: LGTM!

Test payload correctly updated to use the new category:action format ("code:merge"). The removal of requested_by from the payload aligns with the DTO change where this field is now derived from the authenticated user rather than the request body.

README.md (1)

132-132: LGTM!

The Status section accurately reflects the addition of approval workflow gates to the implemented features.

src/ai_company/api/state.py (1)

138-141: LGTM!

The approval_gate property correctly returns the optional service without raising 503, which is appropriate since approval gates are an optional feature. The pattern is consistent with the nullable return type.

src/ai_company/security/timeout/parked_context.py (1)

41-43: LGTM!

The task_id field change to optional with None default properly supports taskless agents. The description update clarifies the use case.

src/ai_company/engine/__init__.py (1)

9-10: LGTM!

The new imports and __all__ entries properly expose the approval gate components as part of the public engine API.

src/ai_company/engine/loop_helpers.py (2)

330-341: LGTM!

The escalation check integrates cleanly after tool execution. The flow correctly:

  1. Checks if approval_gate is provided
  2. Queries pending escalations via should_park
  3. Delegates to _park_for_approval when escalation detected

346-419: LGTM!

The _park_for_approval helper is well-structured:

  • Properly extracts agent_id and task_id from context
  • Handles parking failures by returning ERROR with descriptive metadata
  • Returns PARKED result with approval_id on success
  • Re-raises MemoryError and RecursionError as required
tests/unit/engine/test_approval_gate_models.py (3)

1-9: LGTM!

Test file follows guidelines with proper markers (pytest.mark.unit) and timeout configuration.


12-88: LGTM!

Comprehensive test coverage for EscalationInfo:

  • Valid construction with all fields
  • Immutability enforcement via frozen model
  • Parametrized tests for blank/empty string rejection
  • All risk levels validated

90-133: LGTM!

Good coverage for ResumePayload including approved/rejected paths and immutability checks.

tests/unit/observability/test_events.py (2)

135-135: LGTM!

The pytestmark format change to a list correctly combines the unit marker with the timeout.


178-178: LGTM!

Adding "approval_gate" to the expected domain modules ensures the new observability events module is discovered and validated.

src/ai_company/engine/approval_gate_models.py (1)

1-51: LGTM!

Well-structured frozen Pydantic models following project conventions:

  • Proper use of NotBlankStr for all identifier fields
  • Frozen config for immutability
  • Clear Google-style docstrings
  • No logging required for pure data models per guidelines
tests/unit/tools/test_approval_tool.py (1)

59-256: Strong coverage for approval-tool execution and failure paths.

This test suite exercises both happy paths and boundary/error behavior well (parking metadata, taskless mode, validation, and store failures).

src/ai_company/security/timeout/park_service.py (1)

43-53: task_id optionality is integrated cleanly.

The signature and docstring updates are consistent and preserve existing park/resume behavior for task-bound contexts.

tests/unit/engine/test_approval_gate.py (1)

188-291: Resume-path testing is thorough and well targeted.

Good coverage of success, unknown approval, deserialization failure, and delete-failure behavior.

tests/unit/tools/test_invoker_escalation.py (1)

114-280: Escalation-tracking tests look solid.

The cases for verdict-driven escalation, clearing between calls, and parking metadata are comprehensive and high-value.

src/ai_company/engine/react_loop.py (1)

65-70: Approval-gate wiring in ReAct loop is clean.

The optional dependency is threaded correctly and keeps default behavior intact when no gate is configured.

Also applies to: 217-224

src/ai_company/engine/plan_execute_loop.py (1)

85-93: PlanExecute approval-gate plumbing is correctly applied.

The instance-level gate injection and propagation into execute_tool_calls() are consistent and low-risk.

Also applies to: 733-758

src/ai_company/tools/approval_tool.py (1)

1-252: LGTM!

The implementation follows coding guidelines correctly:

  • Uses PEP 758 except syntax (except MemoryError, RecursionError:)
  • Proper structured logging with event constants from ai_company.observability.events.approval_gate
  • Google-style docstrings on public class and methods
  • Type hints on all public functions
  • Defensive error handling with fallback to ApprovalRiskLevel.HIGH
src/ai_company/engine/_security_factory.py (1)

1-148: LGTM!

The factory functions are well-structured:

  • make_security_interceptor properly validates configuration consistency and logs warnings/errors before raising
  • registry_with_approval_tool correctly creates a new registry rather than mutating the existing one
  • Proper use of structured logging with event constants
src/ai_company/observability/events/approval_gate.py (1)

1-16: LGTM!

Event constants module correctly follows the established patterns:

  • Uses typing.Final[str] for immutability
  • Dot-notation naming convention (approval_gate.<category>.<action>)
  • Comprehensive coverage of approval gate lifecycle events

Based on learnings: "Event names must always use constants from domain-specific modules under ai_company.observability.events" — this module provides those constants for the approval gate domain.

src/ai_company/engine/approval_gate.py (1)

241-274: Good security design in build_resume_message.

The prompt injection mitigation is well-implemented:

  • Uses repr() (!r format specifier) to escape user-supplied values
  • Explicitly labels decision_reason as untrusted data with clear instructions not to follow as instructions
  • Structural separation between system decision signal and user content
src/ai_company/engine/agent_engine.py (3)

139-142: Initialization order is correct.

The assignment order ensures dependencies are satisfied:

  1. _approval_store is stored first
  2. _parked_context_repo is stored
  3. _approval_gate is created (depends on both above)
  4. _loop is created (depends on _approval_gate)

597-617: LGTM!

The _make_approval_gate and _make_default_loop methods are well-structured:

  • Returns None when approval store is unavailable, allowing graceful degradation
  • Lazy import of ParkService avoids circular imports
  • ReactLoop receives the approval gate for escalation handling

631-655: LGTM!

The _make_tool_invoker correctly integrates approval tooling:

  • Uses registry_with_approval_tool to augment the registry when approval store is available
  • Passes identity and task_id for proper context in the approval tool
  • Creates security interceptor with effective_autonomy
src/ai_company/tools/invoker.py (4)

120-136: LGTM!

The escalation state management follows the immutability guidelines correctly:

  • Internal _pending_escalations is a mutable list for accumulation
  • Public pending_escalations property returns an immutable tuple
  • Cleared at the start of each invoke() and invoke_all() call

601-652: Good defensive design: re-raising as ToolExecutionError on tracking failure.

The comment at lines 613-616 correctly explains the rationale — if escalation tracking fails, the caller must treat this as a tool execution failure to prevent silently bypassing the approval gate. This is the right security posture.


773-779: Deterministic escalation ordering is a good design choice.

Sorting escalations by tool-call index ensures consistent behavior regardless of concurrent execution order, which aids debugging and testing.


633-635: The enum construction is correct; no issue exists.

ApprovalRiskLevel extends StrEnum with values LOW = "low", MEDIUM = "medium", HIGH = "high", and CRITICAL = "critical". StrEnum is designed for string-based construction, and ApprovalRiskLevel("high") will successfully resolve to the enum member. The default fallback value "high" also matches a valid enum value.

			> Likely an incorrect or invalid review comment.

Comment on lines +171 to +205
async def _trigger_resume(
app_state: AppState,
approval_id: str,
*,
approved: bool,
decided_by: str,
decision_reason: str | None = None,
) -> None:
"""Signal that a parked agent can be resumed after a decision.

Does NOT consume the parked record — that is the responsibility
of the future scheduling component that will actually re-enqueue
the agent. This function only logs the resume trigger so that
a scheduler observing log events or polling the approval store
can detect the decision and call ``ApprovalGate.resume_context()``.

Args:
app_state: Application state containing the approval gate.
approval_id: The approval item identifier.
approved: Whether the action was approved.
decided_by: Who made the decision.
decision_reason: Optional reason for the decision.
"""
approval_gate = app_state.approval_gate
if approval_gate is None:
return

logger.info(
APPROVAL_GATE_RESUME_TRIGGERED,
approval_id=approval_id,
approved=approved,
decided_by=decided_by,
has_reason=decision_reason is not None,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether any runtime path consumes APPROVAL_GATE_RESUME_TRIGGERED
# or otherwise resumes parked contexts on approval decisions.

rg -n --type=py 'APPROVAL_GATE_RESUME_TRIGGERED'
rg -n --type=py '\bresume_context\s*\('
rg -n --type=py '\b_trigger_resume\s*\('

Repository: Aureliolo/synthorg

Length of output: 1609


🏁 Script executed:

cat -n src/ai_company/api/controllers/approvals.py | head -80

Repository: Aureliolo/synthorg

Length of output: 3083


🏁 Script executed:

cat -n src/ai_company/api/controllers/approvals.py | sed -n '400,500p'

Repository: Aureliolo/synthorg

Length of output: 3707


🏁 Script executed:

rg -n --type=py 'event.*listener|subscribe|consume.*APPROVAL_GATE_RESUME' -i

Repository: Aureliolo/synthorg

Length of output: 31713


🏁 Script executed:

fd -type f -name '*scheduler*' -o -name '*background*' -o -name '*worker*' -o -name '*poll*' | grep -E '\.py$'

Repository: Aureliolo/synthorg

Length of output: 232


🏁 Script executed:

rg -n --type=py 'scheduler|background.*job|polling.*approval|event.*consumer' -i

Repository: Aureliolo/synthorg

Length of output: 176


🏁 Script executed:

cat -n src/ai_company/api/controllers/approvals.py | sed -n '171,210p'

Repository: Aureliolo/synthorg

Length of output: 1637


Resume mechanism incomplete—parked approvals are never resumed.

_trigger_resume() only logs APPROVAL_GATE_RESUME_TRIGGERED; no production code consumes this event or invokes ApprovalGate.resume_context(). The docstring acknowledges this as a future responsibility ("a scheduler observing log events or polling the approval store can detect the decision"), but that scheduler does not exist. Without a consumer, approvals remain parked indefinitely after decision.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/api/controllers/approvals.py` around lines 171 - 205, The
resume flow currently only logs APPPROVAL_GATE_RESUME_TRIGGERED and never wakes
parked approvals; update _trigger_resume to call the approval gate's resume
logic (use approval_gate.resume_context or the appropriate resume method on
AppState.approval_gate) after logging, passing approval_id, approved, decided_by
and decision_reason, and ensure the call is non-blocking (e.g. schedule via
asyncio.create_task or other background executor) and properly error-handled
with logger.info/logger.error on success/failure so parked contexts are actually
resumed without consuming the approval record.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 14, 2026

Greptile Summary

This PR implements the approval workflow gate feature, wiring SecOps ESCALATE verdicts and a new request_human_approval tool into the execution loop so agents can be parked pending human decisions. The implementation is solid overall — security improvements (identity binding, optimistic locking, prompt injection labeling, input bounds), observability (15 structured event constants, all registered in CLAUDE.md), and the WebSocket payload key fix are all well-executed.

Key changes:

  • ApprovalGate service (engine/approval_gate.py): coordinates serialization via ParkService, persistence via ParkedContextRepository, and future resume injection via build_resume_message.
  • RequestHumanApprovalTool (tools/approval_tool.py): agent-callable tool that creates ApprovalItem and signals the loop to park via requires_parking metadata.
  • Loop integration (react_loop.py, plan_execute_loop.py, loop_helpers.py): both loops pass approval_gate into execute_tool_calls, which checks ToolInvoker.pending_escalations after every tool execution batch and calls _park_for_approval when escalations are present.
  • ToolInvoker escalation tracking: captures escalations from both security ESCALATE verdicts and requires_parking metadata, sorts them deterministically in invoke_all.
  • API hardening: requested_by bound to auth identity, save_if_pending prevents double-decide race, RejectRequest.reason and ApproveRequest.comment both bounded to 4096 chars, action_type validated to category:action format.
  • V6 SQLite migration: makes parked_contexts.task_id nullable, adds missing indexes on agent_id and approval_id.
  • Frontend fix (web/src/stores/approvals.ts): WebSocket handler now reads approval_id from payload and fetches full items via API.

Two issues to address:

  • APPROVAL_GATE_CONTEXT_PARK_FAILED is emitted twice on every parking failure: once inside ApprovalGate._serialize_context / _persist_parked, and once again in _park_for_approval in loop_helpers.py. This doubles log counts for every failure and confuses alerting rules.
  • In invoke_all, ToolExecutionError raised by _track_parking_metadata escapes the _run_single wrapper (which only catches MemoryError/RecursionError), propagating through asyncio.gather and potentially canceling sibling concurrent tool calls — an asymmetry with the invoke() path that is undocumented.

Confidence Score: 4/5

  • Safe to merge with two moderate issues worth addressing before the scheduler integration lands.
  • The PR is comprehensive, well-tested (94.6% coverage, 7540 tests), and addresses a large number of pre-review findings. The two remaining issues are moderate: the double-emission of APPROVAL_GATE_CONTEXT_PARK_FAILED creates noise for log aggregation/alerting but does not affect correctness, and the ToolExecutionError escape in invoke_all is an edge-case asymmetry that only triggers on an already-unlikely failure path.
  • src/ai_company/engine/loop_helpers.py (double-log) and src/ai_company/tools/invoker.py (ToolExecutionError in invoke_all) deserve attention before the scheduler integration builds on this foundation.

Important Files Changed

Filename Overview
src/ai_company/engine/approval_gate.py New ApprovalGate service coordinating context parking and resume. Well-structured with proper exception handling and early-return after exception in _cleanup_parked. The _serialize_context and _persist_parked helpers both log APPROVAL_GATE_CONTEXT_PARK_FAILED before re-raising, which combined with the caller's log in loop_helpers._park_for_approval produces double-emission on every parking failure.
src/ai_company/engine/loop_helpers.py Added approval_gate parameter to execute_tool_calls and new _park_for_approval helper. The parking flow is correct, but _park_for_approval re-emits APPROVAL_GATE_CONTEXT_PARK_FAILED after ApprovalGate already logged the same event internally, creating duplicate log entries for every parking failure.
src/ai_company/tools/invoker.py Added pending_escalations property and _track_parking_metadata helper to detect approval-required tool results. The escalation sorting in invoke_all is a nice determinism improvement. However, ToolExecutionError raised by _track_parking_metadata escapes the _run_single wrapper in invoke_all without being caught, potentially canceling concurrent sibling tool calls via asyncio.gather.
src/ai_company/tools/approval_tool.py New RequestHumanApprovalTool cleanly creates ApprovalItem and signals parking via metadata. Risk classification fallback to HIGH is correct. The APPROVAL_GATE_RISK_CLASSIFIED constant is used only on the success path and APPROVAL_GATE_RISK_CLASSIFY_FAILED only on the exception path — clean separation after previous review fixes.
src/ai_company/api/controllers/approvals.py Good security improvements: requested_by now bound to auth identity, save_if_pending prevents double-decide race, saveConflictError path is correct. The _signal_resume_intent stub is clearly documented as incomplete pending a scheduler.
src/ai_company/api/dto.py ApproveRequest.comment gained max_length=4096, RejectRequest.reason also has max_length=4096, ttl_seconds gained le=604800 upper bound, and action_type now enforces category:action format via is_valid_action_type. All constraints are symmetric and consistent.
src/ai_company/engine/agent_engine.py Clean refactor extracting security/registry factory to _security_factory.py. APPROVAL_GATE_LOOP_WIRING_WARNING correctly warns when an external loop bypasses automatic gate wiring, using the new dedicated constant.
src/ai_company/persistence/sqlite/migrations.py V6 migration correctly makes parked_contexts.task_id nullable using the SQLite table-copy pattern, and adds missing indexes on agent_id and approval_id. INSERT OR IGNORE and CREATE TABLE IF NOT EXISTS make the migration reasonably idempotent on retry.
web/src/stores/approvals.ts WebSocket handleWsEvent now correctly reads approval_id from the backend payload (not id), fetches full items via API for new submissions and status changes, and handles 404/410 gracefully. The deduplication guard before inserting new items prevents race-condition duplicates.
src/ai_company/core/validation.py New shared is_valid_action_type utility. Uses split(":") (not split(":", 1)) so multi-colon strings like a:b:c are correctly rejected. Both segments are stripped before the non-blank check.

Comments Outside Diff (2)

  1. src/ai_company/engine/loop_helpers.py, line 1426-1445 (link)

    APPROVAL_GATE_CONTEXT_PARK_FAILED emitted twice on every parking failure

    ApprovalGate._serialize_context (and _persist_parked) already log APPROVAL_GATE_CONTEXT_PARK_FAILED and re-raise the exception. When that exception propagates back here, _park_for_approval logs the same event a second time. Every single parking failure therefore produces two approval_gate.context.park_failed entries — one from the gate internals and one from this handler.

    • Serialization failure path: _serialize_context logs (no note) → _park_for_approval logs (note="Parking failed — returning ERROR")
    • Persistence failure path: _persist_parked logs (note="Context serialized but persistence failed") → _park_for_approval logs (note="Parking failed — returning ERROR")

    Any alerting rule counting by event name will double-fire, and the second entry's note is misleading since the first entry already captured the root cause.

    The simplest fix is to remove the APPROVAL_GATE_CONTEXT_PARK_FAILED log here (the gate already logged it with full context) and only log a loop-level summary at a lower severity if needed:

    Alternatively, remove the log from _serialize_context / _persist_parked and keep only this one call-site emission, adding the root cause as a chained exception reference.

  2. src/ai_company/tools/invoker.py, line 2128-2130 (link)

    ToolExecutionError from _track_parking_metadata not caught in invoke_all path

    In _invoke_single, _track_parking_metadata is documented to raise ToolExecutionError when escalation tracking fails, with the intent of blocking silent bypass. However, the inner wrapper used by invoke_all only catches MemoryError and RecursionError:

    async def _run_single(index: int, tool_call: ToolCall) -> None:
        try:
            ...
            results[index] = await self._invoke_single(tool_call)
        except (MemoryError, RecursionError) as exc:
            fatal_errors.append(exc)

    A ToolExecutionError raised here escapes _run_single and propagates through asyncio.gather, which by default cancels all other in-flight sibling tasks. This means a metadata-tracking failure for one tool call in a batch will silently cancel the remaining concurrent calls. The behavior asymmetry between invoke() (raises normally from the caller) and invoke_all() (cancels sibling tasks) is unexpected and undocumented.

    If the intent is to block execution on this failure, add ToolExecutionError to the fatal-error list so it surfaces through the existing _raise_fatal_errors path without canceling siblings:

        except (MemoryError, RecursionError, ToolExecutionError) as exc:
            fatal_errors.append(exc)
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/ai_company/engine/loop_helpers.py
Line: 1426-1445

Comment:
**`APPROVAL_GATE_CONTEXT_PARK_FAILED` emitted twice on every parking failure**

`ApprovalGate._serialize_context` (and `_persist_parked`) already log `APPROVAL_GATE_CONTEXT_PARK_FAILED` and re-raise the exception. When that exception propagates back here, `_park_for_approval` logs the same event a second time. Every single parking failure therefore produces two `approval_gate.context.park_failed` entries — one from the gate internals and one from this handler.

- Serialization failure path: `_serialize_context` logs (no note) → `_park_for_approval` logs (note="Parking failed — returning ERROR")
- Persistence failure path: `_persist_parked` logs (note="Context serialized but persistence failed") → `_park_for_approval` logs (note="Parking failed — returning ERROR")

Any alerting rule counting by event name will double-fire, and the second entry's note is misleading since the first entry already captured the root cause.

The simplest fix is to remove the `APPROVAL_GATE_CONTEXT_PARK_FAILED` log here (the gate already logged it with full context) and only log a loop-level summary at a lower severity if needed:

```suggestion
    except MemoryError, RecursionError:
        raise
    except Exception:
        logger.debug(
            APPROVAL_GATE_CONTEXT_PARK_FAILED,
            approval_id=escalation.approval_id,
            agent_id=agent_id,
            task_id=task_id,
            note="Parking failed — returning ERROR (non-resumable); see gate log for root cause",
        )
        return build_result(
            ctx,
            TerminationReason.ERROR,
            turns,
            error_message=(
                f"Approval escalation detected (id={escalation.approval_id}) "
                f"but context parking failed — cannot resume"
            ),
            metadata={
                "approval_id": escalation.approval_id,
                "parking_failed": True,
            },
        )
```

Alternatively, remove the log from `_serialize_context` / `_persist_parked` and keep only this one call-site emission, adding the root cause as a chained exception reference.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/ai_company/tools/invoker.py
Line: 2128-2130

Comment:
**`ToolExecutionError` from `_track_parking_metadata` not caught in `invoke_all` path**

In `_invoke_single`, `_track_parking_metadata` is documented to raise `ToolExecutionError` when escalation tracking fails, with the intent of blocking silent bypass. However, the inner wrapper used by `invoke_all` only catches `MemoryError` and `RecursionError`:

```python
async def _run_single(index: int, tool_call: ToolCall) -> None:
    try:
        ...
        results[index] = await self._invoke_single(tool_call)
    except (MemoryError, RecursionError) as exc:
        fatal_errors.append(exc)
```

A `ToolExecutionError` raised here escapes `_run_single` and propagates through `asyncio.gather`, which by default cancels all other in-flight sibling tasks. This means a metadata-tracking failure for *one* tool call in a batch will silently cancel the remaining concurrent calls. The behavior asymmetry between `invoke()` (raises normally from the caller) and `invoke_all()` (cancels sibling tasks) is unexpected and undocumented.

If the intent is to block execution on this failure, add `ToolExecutionError` to the fatal-error list so it surfaces through the existing `_raise_fatal_errors` path without canceling siblings:

```python
    except (MemoryError, RecursionError, ToolExecutionError) as exc:
        fatal_errors.append(exc)
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: cd3768f

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an approval-gated execution path to the agent engine so SecOps ESCALATE verdicts and agent-initiated request_human_approval tool calls can park execution, persist context, and support resumption workflows.

Changes:

  • Introduces ApprovalGate + models and wires parking checks into ReactLoop / PlanExecuteLoop via loop_helpers.execute_tool_calls().
  • Adds RequestHumanApprovalTool and extends ToolInvoker to track pending escalations (SecOps verdicts + tool metadata).
  • Hardens approvals API (auth-bound requested_by, conflict-safe saves, resume trigger logging) and fixes frontend WS handling to use approval_id + fetch full items.

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
web/src/stores/approvals.ts Update WS approval event handling to use approval_id and fetch full item via API.
web/src/tests/stores/approvals.test.ts Adds async WS handler tests and error-path coverage for approval WS updates.
tests/unit/tools/test_invoker_escalation.py New unit tests for ToolInvoker escalation tracking and clearing behavior.
tests/unit/tools/test_approval_tool.py New unit tests for RequestHumanApprovalTool creation, validation, and error handling.
tests/unit/observability/test_events.py Updates event discovery expectations and standardizes pytest marks.
tests/unit/engine/test_loop_helpers_approval.py New tests for approval gate integration in execute_tool_calls().
tests/unit/engine/test_approval_gate_models.py New tests for frozen Pydantic models EscalationInfo / ResumePayload.
tests/unit/engine/test_approval_gate.py New tests for ApprovalGate park/resume behavior and resume message formatting.
tests/unit/api/test_dto.py Updates DTO tests for new action_type format and removed requested_by.
tests/unit/api/controllers/test_approvals.py Updates controller test payload defaults for category:action action_type.
src/ai_company/tools/registry.py Adds all_tools() to support composing registries with injected tools.
src/ai_company/tools/invoker.py Tracks pending_escalations from SecOps ESCALATE and tool parking metadata; deterministic ordering in invoke_all.
src/ai_company/tools/approval_tool.py Implements request_human_approval tool that creates an ApprovalItem and signals parking via metadata.
src/ai_company/tools/init.py Exposes RequestHumanApprovalTool from tools package.
src/ai_company/security/timeout/parked_context.py Makes task_id optional on parked contexts (taskless agents).
src/ai_company/security/timeout/park_service.py Allows task_id=None when parking contexts.
src/ai_company/observability/events/approval_gate.py Adds structured event constants for approval gate lifecycle.
src/ai_company/engine/react_loop.py Injects optional ApprovalGate into loop and passes through to tool execution helper.
src/ai_company/engine/plan_execute_loop.py Injects optional ApprovalGate and routes through to tool execution helper.
src/ai_company/engine/loop_helpers.py Adds approval-gate parking path after tool execution and returns TerminationReason.PARKED.
src/ai_company/engine/approval_gate_models.py Adds frozen models for escalation + resume payload.
src/ai_company/engine/approval_gate.py Implements park/persist and resume/load/delete orchestration plus resume message builder.
src/ai_company/engine/agent_engine.py Wires approval gate + approval tool registry injection; factors security setup into _security_factory.
src/ai_company/engine/_security_factory.py New factory helpers for security interceptor and approval-tool registry composition.
src/ai_company/engine/init.py Re-exports approval gate types as part of engine public API.
src/ai_company/api/state.py Stores optional ApprovalGate in AppState for controller access/logging.
src/ai_company/api/dto.py Hardens approval DTOs (action_type format validator; TTL max; approval comment max length).
src/ai_company/api/controllers/approvals.py Binds requested_by to authenticated user, adds conflict-safe decision saving and resume-trigger logging, broadens WS publish exception handling.
src/ai_company/api/approval_store.py Adds save_if_pending() and guards on_expire callback failures.
docs/design/engine.md Documents approval gate parking path in the engine run flow.
README.md Updates status to include approval workflow gates as implemented.
CLAUDE.md Updates repo structure notes and observability event guidance to include approval-gate domain.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +286 to +320
const originalIsAxiosError = (await import('axios')).default.isAxiosError
vi.spyOn((await import('axios')).default, 'isAxiosError').mockImplementation((err) => {
if (err === axiosError) return true
return originalIsAxiosError(err)
})

mockGetApproval.mockRejectedValue(axiosError)

const store = useApprovalStore()
store.approvals = [mockApproval]
store.total = 5

const event: WsEvent = {
event_type: 'approval.approved',
channel: 'approvals',
timestamp: '2026-03-12T10:01:00Z',
payload: { approval_id: 'approval-1', status: 'approved', action_type: 'deploy:production', risk_level: 'high' },
}
store.handleWsEvent(event)
await flushPromises()

// Item was found and removed, so total decremented
expect(store.approvals).toHaveLength(0)
expect(store.total).toBe(4)
})

it('does not decrement total when item was not in local list', async () => {
const axiosError = new Error('Not found') as Error & { isAxiosError: boolean; response: { status: number } }
axiosError.isAxiosError = true
axiosError.response = { status: 404 }
vi.spyOn((await import('axios')).default, 'isAxiosError').mockImplementation((err) => {
if (err === axiosError) return true
return false
})

Comment on lines +41 to +43
task_id: NotBlankStr | None = Field(
default=None, description="Task identifier (None for taskless agents)"
)
Comment on lines +373 to +378
logger.debug(
APPROVAL_GATE_CONTEXT_PARKED,
approval_id=escalation.approval_id,
agent_id=agent_id,
note="No task_execution on context — task_id will be None",
)
Comment on lines +79 to +86
const item = await approvalsApi.getApproval(approvalId)
if (activeFilters.value) {
// Filters are active — item may not match; just bump total
total.value++
} else {
approvals.value = [item, ...approvals.value]
total.value++
}
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is an excellent pull request that implements a critical approval workflow feature. The implementation is robust, secure, and well-tested across both backend and frontend. The code is well-designed with good separation of concerns, and includes important security hardening and resilience improvements. The refactoring of AgentEngine and the frontend approvals store are particularly noteworthy improvements. I have one suggestion to improve maintainability by reducing code duplication.

Comment on lines +217 to +218
parts = action_type.split(":")
if len(parts) != 2 or not parts[0].strip() or not parts[1].strip(): # noqa: PLR2004
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This validation logic for action_type is duplicated from src/ai_company/api/dto.py. While the implementation is slightly different (one raises ValueError for Pydantic, this one returns a ToolExecutionResult), the core logic is the same. This could lead to inconsistencies if the validation rules change in the future.

To improve maintainability, consider creating a shared utility function, for example in a new src/ai_company/core/utils.py module, that performs this validation and returns a boolean. Both the DTO and this tool could then use this shared function and handle the result as needed.

- Rename _trigger_resume → _signal_resume_intent with TODO for scheduler
- Add warning log before UnauthorizedError in controller auth checks
- Add DEBUG log on successful parked record deletion
- New APPROVAL_GATE_PARK_TASKLESS event for pre-park debug log
- New APPROVAL_GATE_RISK_CLASSIFY_FAILED event for classification failure
- Fix SQLite schema: parked_contexts.task_id TEXT NOT NULL → TEXT (nullable)
- Extract shared is_valid_action_type() to core/validation.py
- Fix filtered WS handler: re-fetch query instead of blind total increment
- Add afterEach(vi.restoreAllMocks()) for spy cleanup in frontend tests
- Add 10 missing event constants to CLAUDE.md reference list
- Fix test-model-001 → test-small-001 per project conventions
- Add comprehensive tests: validation, approval_store, controller helpers,
  security_factory, approval_gate init/resume, risk classification failure,
  action_type format validation, TTL bounds, ApproveRequest comment length,
  loop_helpers taskless path and IOError handling
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/ai_company/persistence/sqlite/migrations.py (1)

148-163: ⚠️ Potential issue | 🔴 Critical

Schema change in existing migration won't apply to upgraded databases.

Modifying _V3_STATEMENTS to make task_id nullable has no effect on databases already at version 3+. The CREATE TABLE IF NOT EXISTS guard skips DDL when the table exists, so existing installs retain the NOT NULL constraint. This causes INSERT failures when ParkService.park is called with task_id=None.

Add a V6 migration to alter the existing column. SQLite doesn't support ALTER COLUMN, so use table recreation:

Proposed V6 migration to fix existing databases
+_V6_STATEMENTS: Sequence[str] = (
+    # ── Make parked_contexts.task_id nullable ─────────────
+    """\
+CREATE TABLE IF NOT EXISTS parked_contexts_new (
+    id TEXT PRIMARY KEY,
+    execution_id TEXT NOT NULL,
+    agent_id TEXT NOT NULL,
+    task_id TEXT,
+    approval_id TEXT NOT NULL,
+    parked_at TEXT NOT NULL,
+    context_json TEXT NOT NULL,
+    metadata TEXT NOT NULL DEFAULT '{}'
+)""",
+    "INSERT INTO parked_contexts_new SELECT * FROM parked_contexts",
+    "DROP TABLE parked_contexts",
+    "ALTER TABLE parked_contexts_new RENAME TO parked_contexts",
+    "CREATE INDEX IF NOT EXISTS idx_pc_agent_id ON parked_contexts(agent_id)",
+    "CREATE INDEX IF NOT EXISTS idx_pc_approval_id ON parked_contexts(approval_id)",
+)
+
+
+async def _apply_v6(db: aiosqlite.Connection) -> None:
+    """Apply schema v6: make parked_contexts.task_id nullable."""
+    for stmt in _V6_STATEMENTS:
+        await db.execute(stmt)

Also update:

-SCHEMA_VERSION = 5
+SCHEMA_VERSION = 6

 _MIGRATIONS: list[tuple[int, _MigrateFn]] = [
     (1, _apply_v1),
     (2, _apply_v2),
     (3, _apply_v3),
     (4, _apply_v4),
     (5, _apply_v5),
+    (6, _apply_v6),
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/persistence/sqlite/migrations.py` around lines 148 - 163, The
schema change in _V3_STATEMENTS making parked_contexts.task_id nullable will not
affect existing DBs because CREATE TABLE IF NOT EXISTS is skipped; add a new
migration version (V6) that recreates the parked_contexts table with task_id
NULLABLE, copies data across, drops the old table and re-creates the indexes so
upgraded DBs get the new schema; ensure the new migration is appended to the
migrations sequence/registry used by the migrator and reference the
parked_contexts table and ParkService.park behavior so INSERTs with task_id=None
succeed.
tests/unit/api/test_approval_store.py (1)

49-50: 🛠️ Refactor suggestion | 🟠 Major

Add the required 30s timeout marker for this test module.

These classes are marked unit, but the timeout requirement is not applied here. Add a module-level pytestmark (or class-level timeout marks) to enforce per-test time limits.

Suggested fix
 import pytest
 
 from ai_company.api.approval_store import ApprovalStore
 from ai_company.api.errors import ConflictError
 from ai_company.core.approval import ApprovalItem
 from ai_company.core.enums import ApprovalRiskLevel, ApprovalStatus
 
+pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]
+
@@
-@pytest.mark.unit
 class TestApprovalStore:
@@
-@pytest.mark.unit
 class TestSaveIfPending:
@@
-@pytest.mark.unit
 class TestApprovalStoreFilters:
@@
-@pytest.mark.unit
 class TestOnExpireCallback:

As per coding guidelines, "Test timeout: 30 seconds per test."

Also applies to: 179-180, 242-243, 304-305

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/api/test_approval_store.py` around lines 49 - 50, Add a
module-level pytest mark that enforces a 30s per-test timeout so every test in
this file runs with the required limit; declare a pytestmark combining the
existing unit mark with a timeout (e.g., pytestmark = [pytest.mark.unit,
pytest.mark.timeout(30)]) near the top of tests/unit/api/test_approval_store.py
(before the TestApprovalStore class and the other test classes in the file) so
all classes like TestApprovalStore inherit the 30s timeout.
♻️ Duplicate comments (1)
src/ai_company/api/controllers/approvals.py (1)

177-213: ⚠️ Potential issue | 🟠 Major

Decisions are still not wired to an actual resume path.

_signal_resume_intent() only logs APPROVAL_GATE_RESUME_TRIGGERED and returns. With approve/reject now depending on this helper, parked contexts can remain indefinitely parked unless another component performs resume immediately.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/api/controllers/approvals.py` around lines 177 - 213,
_signal_resume_intent currently only logs APPROVAL_GATE_RESUME_TRIGGERED and
never triggers a resume, leaving parked contexts indefinitely; update
_signal_resume_intent to call into the approval gate (use
app_state.approval_gate) to perform the actual resume—e.g., await
approval_gate.resume_context(approval_id, approved=approved,
decided_by=decided_by, decision_reason=decision_reason) or schedule it as a
background task if non-blocking behavior is required—and add try/except to log
success or failure (including error details) so the resume path is executed
instead of relying solely on the log event.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/engine/approval_gate.py`:
- Around line 213-247: The resume_context() flow currently returns (context,
parked.id) even when self._parked_context_repo.delete(parked.id) raises or
returns False, which lets retries replay the same parked execution; change the
logic in resume_context so that after attempting delete (the call to
self._parked_context_repo.delete in resume_context) you only return (context,
parked.id) if delete succeeded (True); if delete raised an exception (other than
critical exceptions) or returned False then raise a specific exception (e.g.,
RuntimeError or a custom DeleteParkedContextError) including approval_id and
parked.id so callers know cleanup failed and won’t resume the same parked
record; also fix the exception clause around MemoryError/RecursionError to use
the tuple form (except (MemoryError, RecursionError):) so those are re-raised
correctly.
- Around line 92-160: The park_context and resume_context functions are too
large and mix serialization, persistence, logging, and recovery; split them into
small helpers: extract the serialization and initial logging that calls
self._park_service.park into a helper (e.g., _serialize_and_log_parked_context)
that returns the ParkedContext and preserves the current try/except behavior
(re-raising MemoryError/RecursionError, logging
APPROVAL_GATE_CONTEXT_PARK_FAILED on other Exceptions), extract the persistence
block that awaits self._parked_context_repo.save into a separate helper (e.g.,
_persist_parked_context) that also re-raises MemoryError/RecursionError and logs
persistence failures, and do the analogous split for resume_context around
self._park_service.unpark/self._parked_context_repo.* operations; keep logger
calls, APPROVAL_GATE_* constants, and parameter passing (escalation, agent_id,
task_id, parked.id) unchanged so callers of park_context/resume_context become
thin orchestrators that call the new helpers.

In `@src/ai_company/tools/approval_tool.py`:
- Around line 114-116: The execute() implementation currently accesses
arguments["action_type"], ["title"], and ["description"] directly which can
raise KeyError and bypass returning a ToolExecutionResult; update
ApprovalTool.execute to validate the arguments dict at the method boundary
(check presence and types of "action_type", "title", "description") and on any
missing/invalid value return a structured ToolExecutionResult error (or raise
the tool's expected exception) instead of indexing blindly; use
arguments.get(...) with explicit checks or an input validation helper, and
reference the existing execute(), action_type, title, and description symbols
when making the change.

In `@web/src/stores/approvals.ts`:
- Around line 77-85: The duplicate-prevention check using
approvals.value.some(...) runs before the async approvalsApi.getApproval call,
allowing a TOCTOU race if two approval.submitted events for the same approvalId
arrive; after awaiting getApproval(approvalId) re-check approvals.value for an
existing id and only prepend the fetched item and increment total.value if it
still does not exist (or bail out), referencing the approvals.value array,
approvalId variable, approvalsApi.getApproval method,
fetchApprovals(activeFilters.value) fallback, and total.value counter to locate
and update the logic.

---

Outside diff comments:
In `@src/ai_company/persistence/sqlite/migrations.py`:
- Around line 148-163: The schema change in _V3_STATEMENTS making
parked_contexts.task_id nullable will not affect existing DBs because CREATE
TABLE IF NOT EXISTS is skipped; add a new migration version (V6) that recreates
the parked_contexts table with task_id NULLABLE, copies data across, drops the
old table and re-creates the indexes so upgraded DBs get the new schema; ensure
the new migration is appended to the migrations sequence/registry used by the
migrator and reference the parked_contexts table and ParkService.park behavior
so INSERTs with task_id=None succeed.

In `@tests/unit/api/test_approval_store.py`:
- Around line 49-50: Add a module-level pytest mark that enforces a 30s per-test
timeout so every test in this file runs with the required limit; declare a
pytestmark combining the existing unit mark with a timeout (e.g., pytestmark =
[pytest.mark.unit, pytest.mark.timeout(30)]) near the top of
tests/unit/api/test_approval_store.py (before the TestApprovalStore class and
the other test classes in the file) so all classes like TestApprovalStore
inherit the 30s timeout.

---

Duplicate comments:
In `@src/ai_company/api/controllers/approvals.py`:
- Around line 177-213: _signal_resume_intent currently only logs
APPROVAL_GATE_RESUME_TRIGGERED and never triggers a resume, leaving parked
contexts indefinitely; update _signal_resume_intent to call into the approval
gate (use app_state.approval_gate) to perform the actual resume—e.g., await
approval_gate.resume_context(approval_id, approved=approved,
decided_by=decided_by, decision_reason=decision_reason) or schedule it as a
background task if non-blocking behavior is required—and add try/except to log
success or failure (including error details) so the resume path is executed
instead of relying solely on the log event.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 764fa9b3-8db0-4b16-b082-79aa884f4a96

📥 Commits

Reviewing files that changed from the base of the PR and between 91823f4 and 8defb9e.

📒 Files selected for processing (19)
  • CLAUDE.md
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/api/dto.py
  • src/ai_company/core/validation.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/tools/approval_tool.py
  • tests/unit/api/controllers/test_approvals_helpers.py
  • tests/unit/api/test_approval_store.py
  • tests/unit/api/test_dto.py
  • tests/unit/core/test_validation.py
  • tests/unit/engine/test_approval_gate.py
  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/engine/test_security_factory.py
  • tests/unit/tools/test_approval_tool.py
  • web/src/__tests__/stores/approvals.test.ts
  • web/src/stores/approvals.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Greptile Review
  • GitHub Check: Test (Python 3.14)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python version 3.14+ with PEP 649 native lazy annotations required.
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14.
Line length must be 88 characters, enforced by ruff.
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Tests must use test-provider, test-small-001, etc.

Files:

  • tests/unit/core/test_validation.py
  • tests/unit/api/controllers/test_approvals_helpers.py
  • tests/unit/engine/test_approval_gate.py
  • tests/unit/engine/test_security_factory.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/persistence/sqlite/migrations.py
  • tests/unit/tools/test_approval_tool.py
  • src/ai_company/engine/loop_helpers.py
  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/api/test_approval_store.py
  • src/ai_company/observability/events/approval_gate.py
  • tests/unit/api/test_dto.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/core/validation.py
  • src/ai_company/api/dto.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Tests must use markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Test coverage minimum: 80% (enforced in CI).
Async tests: use asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.

Files:

  • tests/unit/core/test_validation.py
  • tests/unit/api/controllers/test_approvals_helpers.py
  • tests/unit/engine/test_approval_gate.py
  • tests/unit/engine/test_security_factory.py
  • tests/unit/tools/test_approval_tool.py
  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/api/test_approval_store.py
  • tests/unit/api/test_dto.py
web/src/**/*.{vue,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Web dashboard: Vue 3 + PrimeVue + Tailwind CSS, organized by feature in src/components/, src/stores/, src/views/. Enforce with ESLint and vue-tsc type-checking.

Files:

  • web/src/stores/approvals.ts
  • web/src/__tests__/stores/approvals.test.ts
web/src/stores/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Frontend state management: Pinia stores organized by feature in src/stores/

Files:

  • web/src/stores/approvals.ts
web/src/__tests__/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Web dashboard tests: Vitest unit tests organized by feature in tests/

Files:

  • web/src/__tests__/stores/approvals.test.ts
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: All public functions must have type hints. Enforce via mypy strict mode.
Google-style docstrings required on public classes and functions, enforced by ruff D rules.
Create new objects instead of mutating existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction plus MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with BaseModel, model_validator, computed_field, ConfigDict. Use @computed_field for derived values instead of storing redundant fields (e.g., TokenUsage.total_tokens).
Use NotBlankStr from core.types for all identifier/name fields in Pydantic models, including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants, instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow exceptions.
Validate at system boundaries: user input, external APIs, config files.
NEVER use import logging, logging.getLogger(), or print() in application code.
Always use logger as the variable name for loggers (not _logger, not log).
Event names must always use constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import dire...

Files:

  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/core/validation.py
  • src/ai_company/api/dto.py
src/ai_company/**/[!_]*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)

Files:

  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/core/validation.py
  • src/ai_company/api/dto.py
🧠 Learnings (14)
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to docs/design/**/*.md : When approved deviations occur, update the relevant `docs/design/` page to reflect the new reality.

Applied to files:

  • web/src/stores/approvals.ts
📚 Learning: 2026-03-13T21:03:58.907Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.907Z
Learning: Applies to web/src/stores/**/*.ts : Frontend state management: Pinia stores organized by feature in src/stores/

Applied to files:

  • web/src/stores/approvals.ts
  • web/src/__tests__/stores/approvals.test.ts
📚 Learning: 2026-03-13T21:03:58.907Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.907Z
Learning: Applies to web/src/__tests__/**/*.ts : Web dashboard tests: Vitest unit tests organized by feature in __tests__/

Applied to files:

  • web/src/__tests__/stores/approvals.test.ts
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/ai_company/api/controllers/approvals.py
  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/ai_company/**/[!_]*.py : Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • src/ai_company/api/controllers/approvals.py
  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Event names must always use constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • CLAUDE.md
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/observability/events/approval_gate.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Structured logging: always use `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : NEVER use `import logging`, `logging.getLogger()`, or `print()` in application code.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Always use `logger` as the variable name for loggers (not `_logger`, not `log`).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : DEBUG logging for object creation, internal flow, and entry/exit of key functions.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Pure data models, enums, and re-exports do NOT need logging.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to **/*.py : NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`. Tests must use `test-provider`, `test-small-001`, etc.

Applied to files:

  • tests/unit/engine/test_loop_helpers_approval.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Use Pydantic v2 with `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. Use `computed_field` for derived values instead of storing redundant fields (e.g., `TokenUsage.total_tokens`).

Applied to files:

  • src/ai_company/api/dto.py
🧬 Code graph analysis (13)
tests/unit/core/test_validation.py (1)
src/ai_company/core/validation.py (1)
  • is_valid_action_type (6-19)
web/src/stores/approvals.ts (1)
web/src/api/types.ts (1)
  • WsEvent (519-524)
web/src/__tests__/stores/approvals.test.ts (2)
web/src/stores/approvals.ts (1)
  • useApprovalStore (8-141)
web/src/api/types.ts (1)
  • WsEvent (519-524)
tests/unit/engine/test_approval_gate.py (4)
src/ai_company/engine/approval_gate.py (5)
  • ApprovalGate (40-281)
  • should_park (71-90)
  • park_context (92-160)
  • resume_context (162-246)
  • build_resume_message (249-281)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/persistence/repositories.py (1)
  • ParkedContextRepository (199-267)
src/ai_company/security/timeout/park_service.py (3)
  • ParkService (29-144)
  • park (37-106)
  • resume (108-144)
tests/unit/engine/test_security_factory.py (4)
src/ai_company/engine/_security_factory.py (2)
  • make_security_interceptor (39-117)
  • registry_with_approval_tool (120-148)
src/ai_company/engine/errors.py (1)
  • ExecutionStateError (17-18)
src/ai_company/security/config.py (2)
  • RuleEngineConfig (73-90)
  • SecurityConfig (93-140)
src/ai_company/tools/registry.py (1)
  • all_tools (102-104)
src/ai_company/tools/approval_tool.py (5)
src/ai_company/core/enums.py (1)
  • ToolCategory (294-308)
src/ai_company/core/validation.py (1)
  • is_valid_action_type (6-19)
src/ai_company/tools/base.py (6)
  • BaseTool (57-184)
  • ToolExecutionResult (25-54)
  • description (138-140)
  • category (128-130)
  • action_type (133-135)
  • parameters_schema (143-151)
src/ai_company/api/approval_store.py (2)
  • ApprovalStore (27-181)
  • add (42-59)
src/ai_company/security/timeout/risk_tier_classifier.py (1)
  • DefaultRiskTierClassifier (62-101)
tests/unit/tools/test_approval_tool.py (3)
src/ai_company/api/approval_store.py (3)
  • ApprovalStore (27-181)
  • get (61-73)
  • add (42-59)
src/ai_company/security/timeout/risk_tier_classifier.py (1)
  • DefaultRiskTierClassifier (62-101)
src/ai_company/tools/approval_tool.py (2)
  • RequestHumanApprovalTool (34-253)
  • execute (99-135)
src/ai_company/engine/loop_helpers.py (2)
src/ai_company/engine/approval_gate.py (3)
  • ApprovalGate (40-281)
  • should_park (71-90)
  • park_context (92-160)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
tests/unit/api/test_approval_store.py (4)
src/ai_company/api/approval_store.py (5)
  • ApprovalStore (27-181)
  • add (42-59)
  • save_if_pending (125-142)
  • list_items (75-104)
  • get (61-73)
src/ai_company/core/enums.py (2)
  • ApprovalStatus (434-440)
  • ApprovalRiskLevel (443-449)
tests/unit/core/test_approval.py (1)
  • _now (11-12)
src/ai_company/core/approval.py (1)
  • ApprovalItem (24-96)
tests/unit/api/test_dto.py (2)
src/ai_company/api/dto.py (3)
  • ApiResponse (39-57)
  • ApproveRequest (247-256)
  • CreateApprovalRequest (196-244)
src/ai_company/core/enums.py (1)
  • ApprovalRiskLevel (443-449)
src/ai_company/engine/approval_gate.py (7)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/persistence/repositories.py (1)
  • ParkedContextRepository (199-267)
src/ai_company/security/timeout/park_service.py (3)
  • ParkService (29-144)
  • park (37-106)
  • resume (108-144)
src/ai_company/security/timeout/parked_context.py (1)
  • ParkedContext (19-66)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/engine/context.py (1)
  • AgentContext (87-307)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
src/ai_company/core/validation.py (1)
src/ai_company/tools/base.py (1)
  • action_type (133-135)
src/ai_company/api/dto.py (1)
src/ai_company/core/validation.py (1)
  • is_valid_action_type (6-19)
🪛 LanguageTool
CLAUDE.md

[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...

(EG_NO_COMMA)

🔇 Additional comments (18)
web/src/stores/approvals.ts (3)

3-3: LGTM!

Appropriate import of axios for isAxiosError() type guard used in the error handling branches.


56-72: LGTM!

Clear documentation of the synchronous contract with async IIFE pattern. The payload validation defensively handles edge cases, and the approval_id type narrowing is correctly implemented.


73-127: Past review concerns properly addressed.

The refactored handler now correctly reconciles filtered views by calling fetchApprovals(activeFilters.value) when filters are active (lines 78-80 and 99-101), rather than unconditionally mutating local state. The 404/410 handling and removal logic are also sound.

CLAUDE.md (1)

1-242: LGTM!

Documentation updates accurately reflect the new approval gate components, event constants, and integration points. The expanded event names section comprehensively covers the approval gate lifecycle events.

src/ai_company/engine/loop_helpers.py (3)

11-14: LGTM!

Event constants are correctly imported from the domain-specific module and the new APPROVAL_GATE_PARK_TASKLESS constant semantically matches the debug log at line 373-378, addressing the concern about using a misleading event constant.


241-342: LGTM!

The approval gate integration is well-structured:

  • Keyword-only approval_gate parameter prevents accidental positional usage
  • Escalation check happens after tool results are appended to context
  • Clear delegation to _park_for_approval helper

346-419: LGTM!

The _park_for_approval helper correctly:

  • Extracts task_id safely with None fallback for taskless agents
  • Logs appropriately with APPROVAL_GATE_PARK_TASKLESS for the taskless case
  • Returns ERROR termination with descriptive metadata when parking fails
  • Includes parking_failed flag in metadata for downstream handling
src/ai_company/core/validation.py (1)

1-19: LGTM!

Clean, focused validation utility with proper docstring and type hints. The implementation correctly handles edge cases like whitespace-only segments.

tests/unit/core/test_validation.py (1)

1-41: LGTM!

Comprehensive parameterized test coverage for both valid and invalid action_type formats. Edge cases like whitespace-only segments, empty strings, and multiple colons are well covered.

tests/unit/api/test_dto.py (1)

1-175: LGTM!

Comprehensive test coverage for the updated DTOs:

  • action_type format validation with parameterized valid/invalid cases
  • TTL bounds testing at boundary conditions
  • ApproveRequest.comment optional/length constraints
src/ai_company/api/dto.py (3)

10-27: LGTM!

Clean import organization with field_validator from Pydantic v2 and the shared is_valid_action_type utility from core validation.


196-244: LGTM!

Well-structured CreateApprovalRequest with:

  • action_type format validation via @field_validator
  • TTL bounds (60s min, 7 days max) via Field constraints
  • Metadata bounds validation via @model_validator(mode="after")

247-257: LGTM!

ApproveRequest correctly uses NotBlankStr | None with max_length=4096 for the optional comment field, consistent with other description-length fields in the codebase.

tests/unit/engine/test_approval_gate.py (1)

1-369: LGTM!

Excellent test coverage for the ApprovalGate service:

  • should_park behavior for empty/non-empty escalations
  • park_context serialization and persistence paths including error handling
  • resume_context load/delete/error scenarios with resilience testing
  • build_resume_message formatting including prompt injection mitigation via repr() escaping
  • Initialization warning behavior when no repository is provided
tests/unit/engine/test_loop_helpers_approval.py (2)

36-43: LGTM!

The model identifier correctly uses test-small-001 following the repository's standardized test naming pattern.


67-273: LGTM!

Comprehensive test coverage for approval gate integration in execute_tool_calls:

  • No-gate path returns context normally
  • Gate with no escalation continues execution
  • Escalation triggers PARKED result with metadata
  • Parking failures return ERROR with parking_failed flag
  • Taskless agents park with task_id=None
  • IO errors are handled gracefully
tests/unit/tools/test_approval_tool.py (2)

153-177: Parameterize the classifier matrix.

These two tests only vary by action_type and expected risk_level. Folding them into one parametrized case will keep the classifier expectations aligned and easier to extend.

[ suggest_recommended_refactor ]

♻️ Refactor sketch
 class TestRiskClassification:
     """Risk classification with and without classifier."""
 
-    async def test_with_classifier_uses_known_action(
-        self,
-        tool_with_classifier: RequestHumanApprovalTool,
-    ) -> None:
-        result = await tool_with_classifier.execute(
-            arguments={
-                "action_type": "deploy:production",
-                "title": "Deploy v2.0",
-                "description": "Full deployment",
-            },
-        )
-        assert result.metadata["risk_level"] == "critical"
-
-    async def test_with_classifier_unknown_defaults_to_high(
+    `@pytest.mark.parametrize`(
+        ("action_type", "title", "description", "expected_risk"),
+        [
+            (
+                "deploy:production",
+                "Deploy v2.0",
+                "Full deployment",
+                ApprovalRiskLevel.CRITICAL.value,
+            ),
+            (
+                "custom:unknown",
+                "Custom action",
+                "Unknown action type",
+                ApprovalRiskLevel.HIGH.value,
+            ),
+        ],
+    )
+    async def test_with_classifier_risk_level(
         self,
         tool_with_classifier: RequestHumanApprovalTool,
+        action_type: str,
+        title: str,
+        description: str,
+        expected_risk: str,
     ) -> None:
         result = await tool_with_classifier.execute(
             arguments={
-                "action_type": "custom:unknown",
-                "title": "Custom action",
-                "description": "Unknown action type",
+                "action_type": action_type,
+                "title": title,
+                "description": description,
             },
         )
-        assert result.metadata["risk_level"] == "high"
+        assert result.metadata["risk_level"] == expected_risk
As per coding guidelines, Prefer `@pytest.mark.parametrize` for testing similar cases.

183-208: Nice validation matrix.

The parametrized invalid action_type cases keep the negative-path coverage compact and easy to extend.

As per coding guidelines, Prefer @pytest.mark.parametrize for testing similar cases.

Comment on lines +213 to +247
deleted = False
try:
deleted = await self._parked_context_repo.delete(parked.id)
except MemoryError, RecursionError:
raise
except Exception:
logger.exception(
APPROVAL_GATE_RESUME_DELETE_FAILED,
approval_id=approval_id,
parked_id=parked.id,
note="Context resumed but parked record not cleaned up",
)

if not deleted:
logger.warning(
APPROVAL_GATE_RESUME_DELETE_FAILED,
approval_id=approval_id,
parked_id=parked.id,
note="delete() returned False — parked record may still exist",
)
else:
logger.debug(
APPROVAL_GATE_CONTEXT_RESUMED,
approval_id=approval_id,
parked_id=parked.id,
note="Parked record successfully deleted",
)

logger.info(
APPROVAL_GATE_CONTEXT_RESUMED,
approval_id=approval_id,
parked_id=parked.id,
)
return context, parked.id

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

resume_context() can replay the same parked execution when cleanup fails.

When delete() fails or returns False, the method still returns (context, parked.id). A later retry can load the same parked record again and resume duplicate work.

Suggested fix
         deleted = False
         try:
             deleted = await self._parked_context_repo.delete(parked.id)
         except MemoryError, RecursionError:
             raise
         except Exception:
             logger.exception(
                 APPROVAL_GATE_RESUME_DELETE_FAILED,
                 approval_id=approval_id,
                 parked_id=parked.id,
                 note="Context resumed but parked record not cleaned up",
             )
+            raise
 
         if not deleted:
-            logger.warning(
+            logger.error(
                 APPROVAL_GATE_RESUME_DELETE_FAILED,
                 approval_id=approval_id,
                 parked_id=parked.id,
                 note="delete() returned False — parked record may still exist",
             )
+            msg = "Failed to delete parked context; refusing resume to avoid replay"
+            raise RuntimeError(msg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/approval_gate.py` around lines 213 - 247, The
resume_context() flow currently returns (context, parked.id) even when
self._parked_context_repo.delete(parked.id) raises or returns False, which lets
retries replay the same parked execution; change the logic in resume_context so
that after attempting delete (the call to self._parked_context_repo.delete in
resume_context) you only return (context, parked.id) if delete succeeded (True);
if delete raised an exception (other than critical exceptions) or returned False
then raise a specific exception (e.g., RuntimeError or a custom
DeleteParkedContextError) including approval_id and parked.id so callers know
cleanup failed and won’t resume the same parked record; also fix the exception
clause around MemoryError/RecursionError to use the tuple form (except
(MemoryError, RecursionError):) so those are re-raised correctly.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 14, 2026

Codecov Report

❌ Patch coverage is 89.44444% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.00%. Comparing base (494013f) to head (cd3768f).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/ai_company/api/controllers/approvals.py 67.85% 8 Missing and 1 partial ⚠️
src/ai_company/tools/invoker.py 66.66% 7 Missing and 1 partial ⚠️
src/ai_company/engine/approval_gate.py 92.59% 5 Missing and 1 partial ⚠️
src/ai_company/engine/agent_engine.py 72.22% 3 Missing and 2 partials ⚠️
src/ai_company/engine/_security_factory.py 92.30% 0 Missing and 4 partials ⚠️
src/ai_company/tools/approval_tool.py 93.65% 4 Missing ⚠️
src/ai_company/api/approval_store.py 93.33% 1 Missing ⚠️
src/ai_company/engine/loop_helpers.py 95.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #368      +/-   ##
==========================================
+ Coverage   93.90%   94.00%   +0.09%     
==========================================
  Files         447      453       +6     
  Lines       20819    21117     +298     
  Branches     2011     2038      +27     
==========================================
+ Hits        19551    19851     +300     
+ Misses        981      975       -6     
- Partials      287      291       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Split park_context/resume_context into smaller helpers (<50 lines each)
- Fix duplicate APPROVAL_GATE_CONTEXT_RESUMED emission on success path
- Add V6 migration to make parked_contexts.task_id nullable on existing DBs
- Warn when external execution_loop bypasses approval gate wiring
- Guard execute() against KeyError on missing arguments
- Use APPROVAL_GATE_RISK_CLASSIFY_FAILED for "no classifier" debug log
- Fix TOCTOU race on WS duplicate prevention with post-fetch re-check
Copilot AI review requested due to automatic review settings March 14, 2026 09:07
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 14, 2026 09:08 — with GitHub Actions Inactive
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an approval-gating mechanism to the agent execution lifecycle so SecOps ESCALATE verdicts and agent-initiated request_human_approval tool calls can park execution, persist context, and later resume with a decision message injected into the conversation.

Changes:

  • Introduces ApprovalGate, escalation/resume models, and RequestHumanApprovalTool, and wires escalation detection into ToolInvoker plus both ReAct and Plan/Execute loops.
  • Hardens approvals API flows (auth-bound requested_by, optimistic concurrency via save_if_pending, resume-intent signalling, and broader WS publishing error handling) and updates DTO validation.
  • Fixes frontend WebSocket approval events by consuming approval_id and fetching full items via API, with expanded unit tests and observability event constants.

Reviewed changes

Copilot reviewed 39 out of 39 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
web/src/stores/approvals.ts Updates WS handling to use approval_id and fetch full items for list consistency.
web/src/tests/stores/approvals.test.ts Adds async WS handler coverage and error-path tests.
tests/unit/tools/test_invoker_escalation.py Adds unit tests for ToolInvoker escalation tracking behavior.
tests/unit/tools/test_approval_tool.py Adds unit tests for RequestHumanApprovalTool validation, persistence, and risk classification.
tests/unit/persistence/test_migrations_v2.py Updates schema-version assertions for the SQLite migration bump.
tests/unit/observability/test_events.py Extends event discovery expectations for the new approval_gate domain.
tests/unit/engine/test_security_factory.py Adds coverage for extracted security factory and approval-tool registry wiring.
tests/unit/engine/test_loop_helpers_approval.py Tests loop helper parking behavior when escalations are present.
tests/unit/engine/test_approval_gate_models.py Tests validation/immutability of escalation and resume payload models.
tests/unit/engine/test_approval_gate.py Tests ApprovalGate park/resume flows and resume-message construction.
tests/unit/core/test_validation.py Adds tests for shared category:action action_type validation helper.
tests/unit/api/test_dto.py Updates DTO tests for action_type validation and request schema changes.
tests/unit/api/test_approval_store.py Adds tests for save_if_pending, combined filters, and expiration callback behavior.
tests/unit/api/controllers/test_approvals_helpers.py Tests approvals controller helpers (auth checks, publishing, resume intent).
tests/unit/api/controllers/test_approvals.py Updates controller tests for action_type format and removal of requested_by from payload.
src/ai_company/tools/registry.py Adds all_tools() to support composing registries with the approval tool.
src/ai_company/tools/invoker.py Tracks ESCALATE and parking-metadata escalations via pending_escalations.
src/ai_company/tools/approval_tool.py Implements request_human_approval tool and persists ApprovalItems with parking metadata.
src/ai_company/tools/init.py Exports the new approval tool.
src/ai_company/security/timeout/parked_context.py Makes parked context task_id nullable for taskless agents.
src/ai_company/security/timeout/park_service.py Allows task_id=None during context parking.
src/ai_company/persistence/sqlite/migrations.py Bumps schema to v6 and alters parked_contexts schema for nullable task_id.
src/ai_company/observability/events/approval_gate.py Adds structured event constants for approval gate lifecycle.
src/ai_company/engine/react_loop.py Wires optional ApprovalGate into ReAct tool execution path.
src/ai_company/engine/plan_execute_loop.py Wires optional ApprovalGate into Plan/Execute step tool execution path.
src/ai_company/engine/loop_helpers.py Parks execution and returns PARKED termination when escalations require approval.
src/ai_company/engine/approval_gate_models.py Adds Pydantic models for escalation info and resume payloads.
src/ai_company/engine/approval_gate.py Implements context park/persist/resume coordination and safe resume-message formatting.
src/ai_company/engine/agent_engine.py Constructs ApprovalGate, wires it into default loop, and injects approval tool into registry.
src/ai_company/engine/_security_factory.py Extracts security interceptor + approval-tool registry composition from AgentEngine.
src/ai_company/engine/init.py Exposes ApprovalGate and related models at package level.
src/ai_company/core/validation.py Adds shared is_valid_action_type() helper.
src/ai_company/api/state.py Stores/exports an optional ApprovalGate on app state.
src/ai_company/api/dto.py Removes requested_by from create DTO, adds action_type validator, and bounds approve comment length.
src/ai_company/api/controllers/approvals.py Binds requested_by to auth user, adds concurrency guard, and emits resume-intent signals.
src/ai_company/api/approval_store.py Adds save_if_pending() and hardens expiration callback handling.
docs/design/engine.md Documents approval-gate parking behavior in the run loop.
README.md Updates status to reflect approval gates implemented.
CLAUDE.md Updates repository/module overview and observability event guidance for approval gate.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +129 to +132
"""Update only if the stored item is still PENDING.

Returns the saved item on success, or ``None`` if the stored
item is no longer PENDING (concurrent decision detected).
Comment on lines +237 to +239
"INSERT OR IGNORE INTO parked_contexts_new SELECT * FROM parked_contexts",
"DROP TABLE IF EXISTS parked_contexts",
"ALTER TABLE parked_contexts_new RENAME TO parked_contexts",
Comment on lines +402 to +410
saved = await app_state.approval_store.save_if_pending(updated)
if saved is None:
msg = "Approval has already been decided by another request"
logger.warning(
API_RESOURCE_NOT_FOUND,
resource="approval",
id=approval_id,
note="disappeared between get and save",
API_APPROVAL_CONFLICT,
approval_id=approval_id,
note=msg,
)
msg = f"Approval {approval_id!r} not found"
raise NotFoundError(msg)
raise ConflictError(msg)
Comment on lines +483 to +491
saved = await app_state.approval_store.save_if_pending(updated)
if saved is None:
msg = "Approval has already been decided by another request"
logger.warning(
API_RESOURCE_NOT_FOUND,
resource="approval",
id=approval_id,
note="disappeared between get and save",
API_APPROVAL_CONFLICT,
approval_id=approval_id,
note=msg,
)
msg = f"Approval {approval_id!r} not found"
raise NotFoundError(msg)
raise ConflictError(msg)
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/ai_company/engine/agent_engine.py (1)

631-667: 🛠️ Refactor suggestion | 🟠 Major

Extract the approval wiring out of AgentEngine.

This file is now 871 lines, above the repo's 800-line cap, and the new gate/tool/security factory logic widens an already broad orchestrator. Moving this wiring into a dedicated builder/module would keep execution flow readable and the class within policy.

As per coding guidelines, "Functions must be less than 50 lines; files must be less than 800 lines."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/agent_engine.py` around lines 631 - 667, Extract the
approval/permission/security wiring out of AgentEngine by creating a dedicated
builder (e.g., ToolInvokerFactory or ToolBuilder) that encapsulates the logic
currently in _make_security_interceptor and _make_tool_invoker: move the calls
to make_security_interceptor, registry_with_approval_tool,
ToolPermissionChecker.from_permissions and construction of ToolInvoker
(including use of self._approval_store, self._tool_registry, and
identity/task_id/effective_autonomy) into that new class, give it a clear API
(e.g., build_tool_invoker(identity, task_id=None, effective_autonomy=None) and
optionally build_security_interceptor(effective_autonomy=None)), update
AgentEngine._make_tool_invoker and _make_security_interceptor to delegate to the
new builder, and ensure no behavioral changes (preserve parameters and return
types for ToolInvoker and SecurityInterceptionStrategy).
♻️ Duplicate comments (2)
src/ai_company/tools/approval_tool.py (1)

113-124: ⚠️ Potential issue | 🟠 Major

Reject malformed tool args instead of coercing them.

str() turns None, numbers, and objects into persisted approval text, so invalid tool payloads can create real approval items instead of returning a validation error. Validate action_type, title, and description as strings at the boundary and log the rejection before returning.

Possible fix
-        try:
-            action_type = str(arguments["action_type"])
-            title = str(arguments["title"])
-            description = str(arguments["description"])
-        except KeyError as exc:
+        action_type = arguments.get("action_type")
+        title = arguments.get("title")
+        description = arguments.get("description")
+        if not (
+            isinstance(action_type, str)
+            and isinstance(title, str)
+            and isinstance(description, str)
+        ):
+            logger.warning(
+                APPROVAL_GATE_ESCALATION_FAILED,
+                agent_id=self._agent_id,
+                note="Invalid request_human_approval arguments",
+            )
             return ToolExecutionResult(
                 content=(
-                    f"Missing required argument: {exc}. "
-                    f"Required: action_type, title, description"
+                    "Invalid arguments: action_type, title, and description "
+                    "must be strings."
                 ),
                 is_error=True,
             )
As per coding guidelines, "Validate at system boundaries: user input, external APIs, config files."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/tools/approval_tool.py` around lines 113 - 124, The current
code coerces non-string values via str() for arguments["action_type"],
["title"], and ["description"], allowing None/number/object payloads to create
approvals; change this to explicitly validate that arguments contains these keys
and that each value is an instance of str (and non-empty if desired) before
proceeding, log a clear rejection message (including which arg failed and its
type/value) using the module/class logger, and return the existing
ToolExecutionResult error path when validation fails; reference the arguments
dict, the action_type/title/description symbols, and ToolExecutionResult to
locate where to add the checks and logging.
src/ai_company/engine/approval_gate.py (1)

260-287: ⚠️ Potential issue | 🟠 Major

Treat parked-record cleanup failures as fatal.

resume_context() still returns success after delete() raises or returns False, which leaves the parked row available for a later retry and can replay the same execution twice. _cleanup_parked() should raise (or return a hard failure) so the caller never resumes work until cleanup succeeds.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/approval_gate.py` around lines 260 - 287, The
_cleanup_parked method currently logs failures from
self._parked_context_repo.delete(parked.id) but does not propagate them,
allowing resume_context to proceed; change _cleanup_parked (and its except
handling) so any exception from _parked_context_repo.delete is re-raised after
logging (fix the except syntax to "except (MemoryError, RecursionError):" for
proper re-raise), and if delete returns False treat that as fatal by logging and
then raising a clear exception (e.g., RuntimeError or a custom CleanupError)
with approval_id and parked.id so callers (resume_context) never continue when
cleanup failed; keep the APPROVAL_GATE_RESUME_DELETE_FAILED log messages for
context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 609-625: The current _make_approval_gate() enables ApprovalGate
whenever _approval_store is present even if _parked_context_repo is None, which
would create non-resumable "parked" approvals; update _make_approval_gate to
require both _approval_store and _parked_context_repo before constructing
ApprovalGate: if _approval_store is None return None as before, but if
_approval_store is present and _parked_context_repo is None raise a clear
RuntimeError (or otherwise keep approval gating disabled) so we fail fast
instead of creating an ApprovalGate with parked_context_repo=None; reference the
_make_approval_gate function, ApprovalGate class, _approval_store and
_parked_context_repo when making the change.

In `@src/ai_company/persistence/sqlite/migrations.py`:
- Around line 224-241: The v6 migration (_V6_STATEMENTS) assumes parked_contexts
exists and always runs the INSERT/DROP/RENAME sequence, which breaks recovery
when parked_contexts was dropped but parked_contexts_new was not renamed; update
the migration logic in the _apply_v6 path to explicitly handle three states: (1)
normal pre-migration where parked_contexts exists — create parked_contexts_new,
copy rows, drop old table, rename new; (2) interrupted state where
parked_contexts is missing but parked_contexts_new already exists — skip the
INSERT and DROP and just ensure parked_contexts_new is renamed to
parked_contexts (or verify indices); and (3) already-migrated state where
parked_contexts exists and parked_contexts_new does not — no-op; implement these
checks by querying sqlite_master or using "SELECT name FROM sqlite_master WHERE
type='table' AND name IN ('parked_contexts','parked_contexts_new')" and
conditionally executing the appropriate statements instead of blindly running
the current _V6_STATEMENTS sequence.

In `@tests/unit/persistence/test_migrations_v2.py`:
- Around line 31-32: Update the test_schema_version_is_five (and the similar
test at lines 53-61) to actually verify the v5→v6 migration behavior: create a
pre-migration parked_contexts table with task_id NOT NULL, call the migration
runner (run_migrations or the test helper that invokes _apply_v6), then query
PRAGMA table_info('parked_contexts') and assert that the returned row for column
"task_id" has notnull == 0; keep the existing SCHEMA_VERSION assertion but add
this setup, migration invocation, and PRAGMA-based assertion to ensure _apply_v6
truly makes parked_contexts.task_id nullable.

In `@web/src/stores/approvals.ts`:
- Around line 78-87: The code reads activeFilters.value before awaiting
approvalsApi.getApproval(approvalId), so if filters change while the network
call is in flight you may mutate the wrong list; after the await re-read
activeFilters.value and if it no longer matches the value used to choose the
branch call fetchApprovals(activeFilters.value) (or fetchApprovals() when falsy)
instead of mutating approvals.value/total.value; update the same logic in the
other occurrence (lines ~102-119) and reference fetchApprovals,
approvalsApi.getApproval, activeFilters.value, approvals.value, and total.value
when making the change.
- Around line 78-80: WS-triggered fetchApprovals() calls can race with
user-driven requests and allow stale, slower responses to overwrite newer state;
update fetchApprovals to include request sequencing or cancellation so only the
latest response updates approvals, total, and loading (e.g., maintain a
per-store requestId/counter or an AbortController token checked on response and
incremented on each new invocation) and ensure WS handlers that call
fetchApprovals honor that mechanism so older responses are ignored and loading
reflects the real in-flight status.

---

Outside diff comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 631-667: Extract the approval/permission/security wiring out of
AgentEngine by creating a dedicated builder (e.g., ToolInvokerFactory or
ToolBuilder) that encapsulates the logic currently in _make_security_interceptor
and _make_tool_invoker: move the calls to make_security_interceptor,
registry_with_approval_tool, ToolPermissionChecker.from_permissions and
construction of ToolInvoker (including use of self._approval_store,
self._tool_registry, and identity/task_id/effective_autonomy) into that new
class, give it a clear API (e.g., build_tool_invoker(identity, task_id=None,
effective_autonomy=None) and optionally
build_security_interceptor(effective_autonomy=None)), update
AgentEngine._make_tool_invoker and _make_security_interceptor to delegate to the
new builder, and ensure no behavioral changes (preserve parameters and return
types for ToolInvoker and SecurityInterceptionStrategy).

---

Duplicate comments:
In `@src/ai_company/engine/approval_gate.py`:
- Around line 260-287: The _cleanup_parked method currently logs failures from
self._parked_context_repo.delete(parked.id) but does not propagate them,
allowing resume_context to proceed; change _cleanup_parked (and its except
handling) so any exception from _parked_context_repo.delete is re-raised after
logging (fix the except syntax to "except (MemoryError, RecursionError):" for
proper re-raise), and if delete returns False treat that as fatal by logging and
then raising a clear exception (e.g., RuntimeError or a custom CleanupError)
with approval_id and parked.id so callers (resume_context) never continue when
cleanup failed; keep the APPROVAL_GATE_RESUME_DELETE_FAILED log messages for
context.

In `@src/ai_company/tools/approval_tool.py`:
- Around line 113-124: The current code coerces non-string values via str() for
arguments["action_type"], ["title"], and ["description"], allowing
None/number/object payloads to create approvals; change this to explicitly
validate that arguments contains these keys and that each value is an instance
of str (and non-empty if desired) before proceeding, log a clear rejection
message (including which arg failed and its type/value) using the module/class
logger, and return the existing ToolExecutionResult error path when validation
fails; reference the arguments dict, the action_type/title/description symbols,
and ToolExecutionResult to locate where to add the checks and logging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4ee0ae83-5f68-4c43-9eb1-378901c8917a

📥 Commits

Reviewing files that changed from the base of the PR and between 8defb9e and 40aa50f.

📒 Files selected for processing (6)
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/tools/approval_tool.py
  • tests/unit/persistence/test_migrations_v2.py
  • web/src/stores/approvals.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Agent
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Greptile Review
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python version 3.14+ with PEP 649 native lazy annotations required.
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14.
Line length must be 88 characters, enforced by ruff.
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Tests must use test-provider, test-small-001, etc.

Files:

  • src/ai_company/persistence/sqlite/migrations.py
  • tests/unit/persistence/test_migrations_v2.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/engine/approval_gate.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: All public functions must have type hints. Enforce via mypy strict mode.
Google-style docstrings required on public classes and functions, enforced by ruff D rules.
Create new objects instead of mutating existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction plus MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with BaseModel, model_validator, computed_field, ConfigDict. Use @computed_field for derived values instead of storing redundant fields (e.g., TokenUsage.total_tokens).
Use NotBlankStr from core.types for all identifier/name fields in Pydantic models, including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants, instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow exceptions.
Validate at system boundaries: user input, external APIs, config files.
NEVER use import logging, logging.getLogger(), or print() in application code.
Always use logger as the variable name for loggers (not _logger, not log).
Event names must always use constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import dire...

Files:

  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/engine/approval_gate.py
src/ai_company/**/[!_]*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)

Files:

  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/engine/approval_gate.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Tests must use markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Test coverage minimum: 80% (enforced in CI).
Async tests: use asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.

Files:

  • tests/unit/persistence/test_migrations_v2.py
web/src/**/*.{vue,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Web dashboard: Vue 3 + PrimeVue + Tailwind CSS, organized by feature in src/components/, src/stores/, src/views/. Enforce with ESLint and vue-tsc type-checking.

Files:

  • web/src/stores/approvals.ts
web/src/stores/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Frontend state management: Pinia stores organized by feature in src/stores/

Files:

  • web/src/stores/approvals.ts
🧠 Learnings (5)
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Event names must always use constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to docs/design/**/*.md : When approved deviations occur, update the relevant `docs/design/` page to reflect the new reality.

Applied to files:

  • web/src/stores/approvals.ts
📚 Learning: 2026-03-13T21:03:58.907Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.907Z
Learning: When review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes), fix them all. No deferring, no "out of scope" skipping.

Applied to files:

  • web/src/stores/approvals.ts
📚 Learning: 2026-03-13T21:03:58.907Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.907Z
Learning: Applies to web/src/stores/**/*.ts : Frontend state management: Pinia stores organized by feature in src/stores/

Applied to files:

  • web/src/stores/approvals.ts
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Functions must be less than 50 lines; files must be less than 800 lines.

Applied to files:

  • src/ai_company/engine/approval_gate.py
🧬 Code graph analysis (4)
tests/unit/persistence/test_migrations_v2.py (2)
src/ai_company/persistence/sqlite/migrations.py (1)
  • get_user_version (247-251)
tests/unit/persistence/sqlite/conftest.py (1)
  • memory_db (15-22)
src/ai_company/engine/agent_engine.py (3)
src/ai_company/engine/_security_factory.py (1)
  • make_security_interceptor (39-117)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-322)
src/ai_company/security/timeout/park_service.py (1)
  • ParkService (29-144)
web/src/stores/approvals.ts (1)
web/src/api/types.ts (1)
  • WsEvent (519-524)
src/ai_company/tools/approval_tool.py (6)
src/ai_company/core/enums.py (1)
  • ToolCategory (294-308)
src/ai_company/core/validation.py (1)
  • is_valid_action_type (6-19)
src/ai_company/tools/base.py (5)
  • BaseTool (57-184)
  • ToolExecutionResult (25-54)
  • description (138-140)
  • category (128-130)
  • action_type (133-135)
src/ai_company/api/approval_store.py (2)
  • ApprovalStore (27-181)
  • add (42-59)
src/ai_company/security/timeout/risk_tier_classifier.py (1)
  • DefaultRiskTierClassifier (62-101)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
🔇 Additional comments (2)
web/src/stores/approvals.ts (1)

56-71: Good runtime guard for the new approval_id payload.

Switching to payload.approval_id and returning early on malformed payloads keeps the store aligned with the backend WS contract and avoids mutating state from bad events.

src/ai_company/engine/approval_gate.py (1)

289-322: Good separation of control data from untrusted approval text.

Keeping the decision in a fixed system wrapper and labeling the human reason as untrusted data is the right shape for resume injection hardening.

Comment on lines +609 to +625
def _make_approval_gate(self) -> ApprovalGate | None:
"""Build an ApprovalGate if an approval store is configured.

Raises:
ExecutionStateError: If effective_autonomy is provided but
no SecurityConfig is configured — autonomy cannot be
enforced without the security subsystem.
Returns ``None`` when no approval store is available — the
execution loop skips approval-gate checks in that case.
"""
if self._security_config is None:
if effective_autonomy is not None:
msg = (
"effective_autonomy cannot be enforced without "
"SecurityConfig — configure security or remove autonomy"
)
logger.error(SECURITY_DISABLED, note=msg)
raise ExecutionStateError(msg)
logger.warning(
SECURITY_DISABLED,
note="No SecurityConfig provided — all security checks skipped",
)
return None
if not self._security_config.enabled:
if effective_autonomy is not None:
msg = "effective_autonomy cannot be enforced when security is disabled"
logger.error(SECURITY_DISABLED, note=msg)
raise ExecutionStateError(msg)
if self._approval_store is None:
return None

cfg = self._security_config
re_cfg = cfg.rule_engine
policy_validator = PolicyValidator(
hard_deny_action_types=frozenset(cfg.hard_deny_action_types),
auto_approve_action_types=frozenset(cfg.auto_approve_action_types),
)
# Build the detector list respecting config flags.
detectors: list[SecurityRule] = [policy_validator]
if re_cfg.credential_patterns_enabled:
detectors.append(CredentialDetector())
if re_cfg.path_traversal_detection_enabled:
detectors.append(PathTraversalDetector())
if re_cfg.destructive_op_detection_enabled:
detectors.append(DestructiveOpDetector())
if re_cfg.data_leak_detection_enabled:
detectors.append(DataLeakDetector())

rule_engine = RuleEngine(
rules=tuple(detectors),
risk_classifier=RiskClassifier(),
config=re_cfg,
)
return SecOpsService(
config=cfg,
rule_engine=rule_engine,
audit_log=self._audit_log,
output_scanner=OutputScanner(),
from ai_company.security.timeout.park_service import ( # noqa: PLC0415
ParkService,
)

return ApprovalGate(
park_service=ParkService(),
parked_context_repo=self._parked_context_repo,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't enable approval parking without persisted parked contexts.

_make_approval_gate() creates a live gate whenever approval_store exists, but ApprovalGate explicitly treats parked_context_repo=None as non-resumable. In this configuration the engine can return PARKED and queue an approval that can never restore the execution. Fail fast here, or keep approval gating disabled until both components are configured.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/agent_engine.py` around lines 609 - 625, The current
_make_approval_gate() enables ApprovalGate whenever _approval_store is present
even if _parked_context_repo is None, which would create non-resumable "parked"
approvals; update _make_approval_gate to require both _approval_store and
_parked_context_repo before constructing ApprovalGate: if _approval_store is
None return None as before, but if _approval_store is present and
_parked_context_repo is None raise a clear RuntimeError (or otherwise keep
approval gating disabled) so we fail fast instead of creating an ApprovalGate
with parked_context_repo=None; reference the _make_approval_gate function,
ApprovalGate class, _approval_store and _parked_context_repo when making the
change.

Comment on lines +224 to +241
_V6_STATEMENTS: Sequence[str] = (
# ── Make parked_contexts.task_id nullable ─────────────
"""\
CREATE TABLE IF NOT EXISTS parked_contexts_new (
id TEXT PRIMARY KEY,
execution_id TEXT NOT NULL,
agent_id TEXT NOT NULL,
task_id TEXT,
approval_id TEXT NOT NULL,
parked_at TEXT NOT NULL,
context_json TEXT NOT NULL,
metadata TEXT NOT NULL DEFAULT '{}'
)""",
"INSERT OR IGNORE INTO parked_contexts_new SELECT * FROM parked_contexts",
"DROP TABLE IF EXISTS parked_contexts",
"ALTER TABLE parked_contexts_new RENAME TO parked_contexts",
"CREATE INDEX IF NOT EXISTS idx_pc_agent_id ON parked_contexts(agent_id)",
"CREATE INDEX IF NOT EXISTS idx_pc_approval_id ON parked_contexts(approval_id)",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Make the v6 rebuild rerunnable from a partially applied v5→v6 upgrade.

If startup dies after Line 238 drops parked_contexts but before Line 239 renames parked_contexts_new, user_version is still 5 and the next run executes only _apply_v6(). Line 237 then fails with no such table: parked_contexts, so recovery requires manual DB repair. Handle the parked_contexts_new exists / parked_contexts missing state explicitly instead of always copying from the old table.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/persistence/sqlite/migrations.py` around lines 224 - 241, The
v6 migration (_V6_STATEMENTS) assumes parked_contexts exists and always runs the
INSERT/DROP/RENAME sequence, which breaks recovery when parked_contexts was
dropped but parked_contexts_new was not renamed; update the migration logic in
the _apply_v6 path to explicitly handle three states: (1) normal pre-migration
where parked_contexts exists — create parked_contexts_new, copy rows, drop old
table, rename new; (2) interrupted state where parked_contexts is missing but
parked_contexts_new already exists — skip the INSERT and DROP and just ensure
parked_contexts_new is renamed to parked_contexts (or verify indices); and (3)
already-migrated state where parked_contexts exists and parked_contexts_new does
not — no-op; implement these checks by querying sqlite_master or using "SELECT
name FROM sqlite_master WHERE type='table' AND name IN
('parked_contexts','parked_contexts_new')" and conditionally executing the
appropriate statements instead of blindly running the current _V6_STATEMENTS
sequence.

Comment on lines +78 to +87
if (activeFilters.value) {
// Filters active — re-fetch the filtered query to stay consistent
await fetchApprovals(activeFilters.value)
} else {
try {
const item = await approvalsApi.getApproval(approvalId)
// Re-check after async fetch to prevent duplicate insertion
if (!approvals.value.some((a) => a.id === approvalId)) {
approvals.value = [item, ...approvals.value]
total.value++
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Re-check the current filters after getApproval() resolves.

The branch is chosen from activeFilters.value before the network await. If the user changes queries while getApproval() is in flight, the old unfiltered path can still prepend, update, or remove against the new filtered list and leave the UI in the wrong state. Re-read activeFilters.value immediately before any local mutation and refresh the current query when it changed.

🛠️ Suggested fix
                 try {
                   const item = await approvalsApi.getApproval(approvalId)
+                  if (activeFilters.value) {
+                    await fetchApprovals(activeFilters.value)
+                    return
+                  }
                   // Re-check after async fetch to prevent duplicate insertion
                   if (!approvals.value.some((a) => a.id === approvalId)) {
                     approvals.value = [item, ...approvals.value]
                     total.value++
                   }
@@
               try {
                 const updated = await approvalsApi.getApproval(approvalId)
+                if (activeFilters.value) {
+                  await fetchApprovals(activeFilters.value)
+                  return
+                }
                 approvals.value = approvals.value.map((a) =>
                   a.id === approvalId ? updated : a,
                 )

Also applies to: 102-119

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/stores/approvals.ts` around lines 78 - 87, The code reads
activeFilters.value before awaiting approvalsApi.getApproval(approvalId), so if
filters change while the network call is in flight you may mutate the wrong
list; after the await re-read activeFilters.value and if it no longer matches
the value used to choose the branch call fetchApprovals(activeFilters.value) (or
fetchApprovals() when falsy) instead of mutating approvals.value/total.value;
update the same logic in the other occurrence (lines ~102-119) and reference
fetchApprovals, approvalsApi.getApproval, activeFilters.value, approvals.value,
and total.value when making the change.

Comment on lines +78 to +80
if (activeFilters.value) {
// Filters active — re-fetch the filtered query to stay consistent
await fetchApprovals(activeFilters.value)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Background refreshes need stale-response protection.

These WS-triggered fetchApprovals() calls can overlap with a later user-driven fetch. Right now the slower, older response can land last, overwrite a newer query, and clear loading while the newer request is still in flight. Add request sequencing or cancellation before applying approvals, total, and loading.

🛠️ Suggested fix in `fetchApprovals()`
+  let fetchSeq = 0
+
   async function fetchApprovals(filters?: ApprovalFilters) {
+    const seq = ++fetchSeq
     loading.value = true
     error.value = null
     activeFilters.value = filters ? { ...filters } : undefined
     try {
       const result = await approvalsApi.listApprovals(filters)
+      if (seq !== fetchSeq) return
       approvals.value = result.data
       total.value = result.total
     } catch (err) {
+      if (seq !== fetchSeq) return
       error.value = getErrorMessage(err)
     } finally {
-      loading.value = false
+      if (seq === fetchSeq) loading.value = false
     }
   }

Also applies to: 102-104

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/stores/approvals.ts` around lines 78 - 80, WS-triggered
fetchApprovals() calls can race with user-driven requests and allow stale,
slower responses to overwrite newer state; update fetchApprovals to include
request sequencing or cancellation so only the latest response updates
approvals, total, and loading (e.g., maintain a per-store requestId/counter or
an AbortController token checked on response and incremented on each new
invocation) and ensure WS handlers that call fetchApprovals honor that mechanism
so older responses are ignored and loading reflects the real in-flight status.

- Fix double APPROVAL_GATE_RESUME_DELETE_FAILED emission: early return
  after exception log prevents fallthrough to if-not-deleted branch
- Add APPROVAL_GATE_LOOP_WIRING_WARNING constant for external loop
  misconfiguration (was misusing APPROVAL_GATE_INITIALIZED)
- Wire APPROVAL_GATE_RISK_CLASSIFIED to classifier success path; remove
  misleading usage for "no classifier" info log (was RISK_CLASSIFY_FAILED)
- Add APPROVAL_GATE_LOOP_WIRING_WARNING to CLAUDE.md event reference
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/ai_company/engine/agent_engine.py (1)

609-625: ⚠️ Potential issue | 🟠 Major

Do not enable approval parking without parked-context persistence.

This wiring allows PARKED termination with parked_context_repo=None, creating non-resumable parked executions.

🔧 Suggested fix
     def _make_approval_gate(self) -> ApprovalGate | None:
         """Build an ApprovalGate if an approval store is configured.
@@
         if self._approval_store is None:
             return None
+        if self._parked_context_repo is None:
+            msg = (
+                "approval_store configured without parked_context_repo; "
+                "PARKED executions would be non-resumable"
+            )
+            logger.error(
+                APPROVAL_GATE_LOOP_WIRING_WARNING,
+                note=msg,
+            )
+            raise RuntimeError(msg)
 
         from ai_company.security.timeout.park_service import (  # noqa: PLC0415
             ParkService,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/agent_engine.py` around lines 609 - 625, The current
_make_approval_gate creates an ApprovalGate when _approval_store is set even if
self._parked_context_repo is None, allowing non-resumable PARKED executions;
change _make_approval_gate (and its return logic) to only construct and return
an ApprovalGate when both self._approval_store and self._parked_context_repo are
present (otherwise return None or log/raise as appropriate), referencing
ApprovalGate, ParkService, self._approval_store and self._parked_context_repo so
parked-context persistence is enforced before enabling approval parking.
src/ai_company/engine/approval_gate.py (1)

260-287: ⚠️ Potential issue | 🔴 Critical

Fail resume when parked-context cleanup does not succeed.

This path logs delete failures but still lets resume succeed, which can replay the same parked execution later.

🔧 Suggested fix
     async def _cleanup_parked(
         self,
         parked: ParkedContext,
         approval_id: str,
     ) -> None:
@@
         try:
             deleted = await self._parked_context_repo.delete(parked.id)
         except MemoryError, RecursionError:
             raise
-        except Exception:
+        except Exception as exc:
             logger.exception(
                 APPROVAL_GATE_RESUME_DELETE_FAILED,
                 approval_id=approval_id,
                 parked_id=parked.id,
                 note="Context resumed but parked record not cleaned up",
             )
-            return
+            msg = (
+                f"Failed to delete parked context for approval_id={approval_id!r}, "
+                f"parked_id={parked.id!r}"
+            )
+            raise RuntimeError(msg) from exc
 
         if not deleted:
-            logger.warning(
+            logger.error(
                 APPROVAL_GATE_RESUME_DELETE_FAILED,
                 approval_id=approval_id,
                 parked_id=parked.id,
                 note="delete() returned False — may cause duplicate resume",
             )
+            msg = (
+                f"Delete returned False for approval_id={approval_id!r}, "
+                f"parked_id={parked.id!r}"
+            )
+            raise RuntimeError(msg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/approval_gate.py` around lines 260 - 287, The
_cleanup_parked method currently logs failures from
self._parked_context_repo.delete(parked.id) but allows resume to succeed, which
can cause duplicate replays; change it so that any delete failure or a False
return value causes the resume to fail by propagating an exception (or raising a
specific ResumeCleanupError) instead of only logging—update the except Exception
block (and the branch where not deleted) to raise after logging (include
APPROVAL_GATE_RESUME_DELETE_FAILED, approval_id and parked_id in the log) so
callers of _cleanup_parked/_resume see the failure and abort replay.
src/ai_company/tools/approval_tool.py (1)

114-117: ⚠️ Potential issue | 🟠 Major

Avoid coercing non-string tool inputs with str(...).

This path still accepts invalid argument types (e.g., dict/int) by coercing them to strings, which weakens the tool contract at the input boundary.

🔧 Suggested fix
         try:
-            action_type = str(arguments["action_type"])
-            title = str(arguments["title"])
-            description = str(arguments["description"])
+            action_type_raw = arguments["action_type"]
+            title_raw = arguments["title"]
+            description_raw = arguments["description"]
         except KeyError as exc:
             return ToolExecutionResult(
                 content=(
                     f"Missing required argument: {exc}. "
                     f"Required: action_type, title, description"
                 ),
                 is_error=True,
             )
+        if (
+            not isinstance(action_type_raw, str)
+            or not isinstance(title_raw, str)
+            or not isinstance(description_raw, str)
+        ):
+            logger.warning(
+                APPROVAL_GATE_ESCALATION_FAILED,
+                agent_id=self._agent_id,
+                note="Invalid request_human_approval argument types",
+            )
+            return ToolExecutionResult(
+                content=(
+                    "Invalid arguments: action_type, title, and description "
+                    "must be strings."
+                ),
+                is_error=True,
+            )
+        action_type = action_type_raw
+        title = title_raw
+        description = description_raw

As per coding guidelines, "Validate at system boundaries: user input, external APIs, config files."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/tools/approval_tool.py` around lines 114 - 117, The code
currently coerces inputs with
str(arguments["action_type"/"title"/"description"]) which masks invalid types;
instead validate the arguments dict entries explicitly (e.g., check presence and
use isinstance(arguments["action_type"], str), isinstance(arguments["title"],
str), isinstance(arguments["description"], str)), and if any check fails raise a
clear ValueError or return a structured error explaining the invalid parameter;
update the logic that assigns action_type, title, and description to perform
these checks before assignment and do not use str(...) coercion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CLAUDE.md`:
- Line 154: The long example list of event constants is missing a separating
comma near the end; update the sentence so every listed constant is followed by
a comma (specifically add a comma after `APPROVAL_GATE_LOOP_WIRING_WARNING` from
`events.approval_gate` and ensure other entries like
`APPROVAL_GATE_NO_PARKED_CONTEXT`, `APPROVAL_GATE_LOOP_WIRING_WARNING` are
comma-separated) so the examples remain readable and properly punctuated in the
line that imports EVENT_CONSTANTs from ai_company.observability.events domains.

---

Duplicate comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 609-625: The current _make_approval_gate creates an ApprovalGate
when _approval_store is set even if self._parked_context_repo is None, allowing
non-resumable PARKED executions; change _make_approval_gate (and its return
logic) to only construct and return an ApprovalGate when both
self._approval_store and self._parked_context_repo are present (otherwise return
None or log/raise as appropriate), referencing ApprovalGate, ParkService,
self._approval_store and self._parked_context_repo so parked-context persistence
is enforced before enabling approval parking.

In `@src/ai_company/engine/approval_gate.py`:
- Around line 260-287: The _cleanup_parked method currently logs failures from
self._parked_context_repo.delete(parked.id) but allows resume to succeed, which
can cause duplicate replays; change it so that any delete failure or a False
return value causes the resume to fail by propagating an exception (or raising a
specific ResumeCleanupError) instead of only logging—update the except Exception
block (and the branch where not deleted) to raise after logging (include
APPROVAL_GATE_RESUME_DELETE_FAILED, approval_id and parked_id in the log) so
callers of _cleanup_parked/_resume see the failure and abort replay.

In `@src/ai_company/tools/approval_tool.py`:
- Around line 114-117: The code currently coerces inputs with
str(arguments["action_type"/"title"/"description"]) which masks invalid types;
instead validate the arguments dict entries explicitly (e.g., check presence and
use isinstance(arguments["action_type"], str), isinstance(arguments["title"],
str), isinstance(arguments["description"], str)), and if any check fails raise a
clear ValueError or return a structured error explaining the invalid parameter;
update the logic that assigns action_type, title, and description to perform
these checks before assignment and do not use str(...) coercion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8d77745e-ccbc-4515-ac50-fffa0b25059a

📥 Commits

Reviewing files that changed from the base of the PR and between 40aa50f and e3392f5.

📒 Files selected for processing (5)
  • CLAUDE.md
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/tools/approval_tool.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Web
  • GitHub Check: Build Backend
  • GitHub Check: Greptile Review
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python version 3.14+ with PEP 649 native lazy annotations required.
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14.
Line length must be 88 characters, enforced by ruff.
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Tests must use test-provider, test-small-001, etc.

Files:

  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/tools/approval_tool.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: All public functions must have type hints. Enforce via mypy strict mode.
Google-style docstrings required on public classes and functions, enforced by ruff D rules.
Create new objects instead of mutating existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction plus MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with BaseModel, model_validator, computed_field, ConfigDict. Use @computed_field for derived values instead of storing redundant fields (e.g., TokenUsage.total_tokens).
Use NotBlankStr from core.types for all identifier/name fields in Pydantic models, including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants, instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow exceptions.
Validate at system boundaries: user input, external APIs, config files.
NEVER use import logging, logging.getLogger(), or print() in application code.
Always use logger as the variable name for loggers (not _logger, not log).
Event names must always use constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import dire...

Files:

  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/tools/approval_tool.py
src/ai_company/**/[!_]*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)

Files:

  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/tools/approval_tool.py
🧠 Learnings (10)
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/ai_company/**/[!_]*.py : Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Event names must always use constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • CLAUDE.md
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/observability/events/approval_gate.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Structured logging: always use `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : NEVER use `import logging`, `logging.getLogger()`, or `print()` in application code.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Always use `logger` as the variable name for loggers (not `_logger`, not `log`).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : DEBUG logging for object creation, internal flow, and entry/exit of key functions.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Pure data models, enums, and re-exports do NOT need logging.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Functions must be less than 50 lines; files must be less than 800 lines.

Applied to files:

  • src/ai_company/engine/approval_gate.py
🧬 Code graph analysis (2)
src/ai_company/engine/agent_engine.py (7)
src/ai_company/engine/_security_factory.py (2)
  • make_security_interceptor (39-117)
  • registry_with_approval_tool (120-148)
src/ai_company/engine/_validation.py (1)
  • validate_agent (61-73)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-322)
src/ai_company/persistence/repositories.py (1)
  • ParkedContextRepository (199-267)
src/ai_company/engine/loop_protocol.py (1)
  • ExecutionLoop (151-189)
src/ai_company/engine/react_loop.py (1)
  • ReactLoop (55-269)
src/ai_company/tools/permissions.py (2)
  • ToolPermissionChecker (35-225)
  • from_permissions (115-128)
src/ai_company/tools/approval_tool.py (5)
src/ai_company/core/enums.py (1)
  • ToolCategory (294-308)
src/ai_company/core/validation.py (1)
  • is_valid_action_type (6-19)
src/ai_company/tools/base.py (2)
  • category (128-130)
  • action_type (133-135)
src/ai_company/api/approval_store.py (2)
  • ApprovalStore (27-181)
  • add (42-59)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
🪛 LanguageTool
CLAUDE.md

[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...

(EG_NO_COMMA)

🔇 Additional comments (1)
src/ai_company/observability/events/approval_gate.py (1)

5-19: Approval-gate event constants are clean and well-scoped.

Nice coverage of the lifecycle and consistent domain prefixing.

- **Never** use `import logging` / `logging.getLogger()` / `print()` in application code
- **Variable name**: always `logger` (not `_logger`, not `log`)
- **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `API_ROUTE_NOT_FOUND` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`, `SECURITY_EVALUATE_START` from `events.security`, `HR_HIRING_REQUEST_CREATED` from `events.hr`, `PERF_METRIC_RECORDED` from `events.performance`, `TRUST_EVALUATE_START` from `events.trust`, `PROMOTION_EVALUATE_START` from `events.promotion`, `PROMPT_BUILD_START` from `events.prompt`, `MEMORY_RETRIEVAL_START` from `events.memory`, `MEMORY_BACKEND_CONNECTED` from `events.memory`, `MEMORY_ENTRY_STORED` from `events.memory`, `MEMORY_BACKEND_SYSTEM_ERROR` from `events.memory`, `AUTONOMY_ACTION_AUTO_APPROVED` from `events.autonomy`, `TIMEOUT_POLICY_EVALUATED` from `events.timeout`, `PERSISTENCE_AUDIT_ENTRY_SAVED` from `events.persistence`, `TASK_ENGINE_STARTED` from `events.task_engine`, `COORDINATION_STARTED` from `events.coordination`, `COMMUNICATION_DISPATCH_START` from `events.communication`, `COMPANY_STARTED` from `events.company`, `CONFIG_LOADED` from `events.config`, `CORRELATION_ID_CREATED` from `events.correlation`, `DECOMPOSITION_STARTED` from `events.decomposition`, `DELEGATION_STARTED` from `events.delegation`, `EXECUTION_LOOP_STARTED` from `events.execution`, `GIT_OPERATION_START` from `events.git`, `PARALLEL_EXECUTION_STARTED` from `events.parallel`, `PERSONALITY_LOADED` from `events.personality`, `QUOTA_CHECKED` from `events.quota`, `ROLE_ASSIGNED` from `events.role`, `ROUTING_STARTED` from `events.routing`, `SANDBOX_EXECUTE_START` from `events.sandbox`, `TASK_CREATED` from `events.task`, `TASK_ASSIGNMENT_STARTED` from `events.task_assignment`, `TASK_ROUTING_STARTED` from `events.task_routing`, `TEMPLATE_LOADED` from `events.template`, `TOOL_INVOKE_START` from `events.tool`, `WORKSPACE_CREATED` from `events.workspace`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
- **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `API_ROUTE_NOT_FOUND` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`, `SECURITY_EVALUATE_START` from `events.security`, `HR_HIRING_REQUEST_CREATED` from `events.hr`, `PERF_METRIC_RECORDED` from `events.performance`, `TRUST_EVALUATE_START` from `events.trust`, `PROMOTION_EVALUATE_START` from `events.promotion`, `PROMPT_BUILD_START` from `events.prompt`, `MEMORY_RETRIEVAL_START` from `events.memory`, `MEMORY_BACKEND_CONNECTED` from `events.memory`, `MEMORY_ENTRY_STORED` from `events.memory`, `MEMORY_BACKEND_SYSTEM_ERROR` from `events.memory`, `AUTONOMY_ACTION_AUTO_APPROVED` from `events.autonomy`, `TIMEOUT_POLICY_EVALUATED` from `events.timeout`, `PERSISTENCE_AUDIT_ENTRY_SAVED` from `events.persistence`, `TASK_ENGINE_STARTED` from `events.task_engine`, `COORDINATION_STARTED` from `events.coordination`, `COMMUNICATION_DISPATCH_START` from `events.communication`, `COMPANY_STARTED` from `events.company`, `CONFIG_LOADED` from `events.config`, `CORRELATION_ID_CREATED` from `events.correlation`, `DECOMPOSITION_STARTED` from `events.decomposition`, `DELEGATION_STARTED` from `events.delegation`, `EXECUTION_LOOP_STARTED` from `events.execution`, `GIT_OPERATION_START` from `events.git`, `PARALLEL_EXECUTION_STARTED` from `events.parallel`, `PERSONALITY_LOADED` from `events.personality`, `QUOTA_CHECKED` from `events.quota`, `ROLE_ASSIGNED` from `events.role`, `ROUTING_STARTED` from `events.routing`, `SANDBOX_EXECUTE_START` from `events.sandbox`, `TASK_CREATED` from `events.task`, `TASK_ASSIGNMENT_STARTED` from `events.task_assignment`, `TASK_ROUTING_STARTED` from `events.task_routing`, `TEMPLATE_LOADED` from `events.template`, `TOOL_INVOKE_START` from `events.tool`, `WORKSPACE_CREATED` from `events.workspace`, `APPROVAL_GATE_ESCALATION_DETECTED` from `events.approval_gate`, `APPROVAL_GATE_ESCALATION_FAILED` from `events.approval_gate`, `APPROVAL_GATE_INITIALIZED` from `events.approval_gate`, `APPROVAL_GATE_RISK_CLASSIFIED` from `events.approval_gate`, `APPROVAL_GATE_RISK_CLASSIFY_FAILED` from `events.approval_gate`, `APPROVAL_GATE_CONTEXT_PARKED` from `events.approval_gate`, `APPROVAL_GATE_CONTEXT_PARK_FAILED` from `events.approval_gate`, `APPROVAL_GATE_PARK_TASKLESS` from `events.approval_gate`, `APPROVAL_GATE_RESUME_STARTED` from `events.approval_gate`, `APPROVAL_GATE_CONTEXT_RESUMED` from `events.approval_gate`, `APPROVAL_GATE_RESUME_FAILED` from `events.approval_gate`, `APPROVAL_GATE_RESUME_DELETE_FAILED` from `events.approval_gate`, `APPROVAL_GATE_RESUME_TRIGGERED` from `events.approval_gate`, `APPROVAL_GATE_NO_PARKED_CONTEXT` from `events.approval_gate`, `APPROVAL_GATE_LOOP_WIRING_WARNING` from `events.approval_gate`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix punctuation in the long event-list sentence.

There’s a missing comma in this line, and the sentence becomes hard to parse. Please add the comma to keep the examples readable.

🧰 Tools
🪛 LanguageTool

[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...

(EG_NO_COMMA)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` at line 154, The long example list of event constants is missing a
separating comma near the end; update the sentence so every listed constant is
followed by a comma (specifically add a comma after
`APPROVAL_GATE_LOOP_WIRING_WARNING` from `events.approval_gate` and ensure other
entries like `APPROVAL_GATE_NO_PARKED_CONTEXT`,
`APPROVAL_GATE_LOOP_WIRING_WARNING` are comma-separated) so the examples remain
readable and properly punctuated in the line that imports EVENT_CONSTANTs from
ai_company.observability.events domains.

@@ -237,7 +253,7 @@ class ApproveRequest(BaseModel):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RejectRequest.reason missing max_length constraint

ApproveRequest.comment received a max_length=4096 bound in this PR, but RejectRequest.reason (which serves the same role — a user-supplied decision explanation injected into the resume message via build_resume_message) was not updated.

Since _signal_resume_intent passes decision_reason=data.reason to the gate, and the future scheduler will call ApprovalGate.build_resume_message(decision_reason=...) to build the prompt-injected message, an unbounded reason string can produce an arbitrarily large system message. Consistent with the approve path:

Suggested change
reason: NotBlankStr | None = Field(default=None, max_length=4096)
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/api/dto.py
Line: 253

Comment:
**`RejectRequest.reason` missing `max_length` constraint**

`ApproveRequest.comment` received a `max_length=4096` bound in this PR, but `RejectRequest.reason` (which serves the same role — a user-supplied decision explanation injected into the resume message via `build_resume_message`) was not updated.

Since `_signal_resume_intent` passes `decision_reason=data.reason` to the gate, and the future scheduler will call `ApprovalGate.build_resume_message(decision_reason=...)` to build the prompt-injected message, an unbounded `reason` string can produce an arbitrarily large system message. Consistent with the approve path:

```suggestion
    reason: NotBlankStr | None = Field(default=None, max_length=4096)
```

How can I resolve this? If you propose a fix, please make it concise.

- Fix save_if_pending() docstring to cover all None cases (not-found,
  expired, already-decided)
- Fix approve/reject ConflictError message: "no longer pending (already
  decided or expired)" instead of "decided by another request"
- Make V6 migration crash-safe: rename old→backup before replacing,
  explicit column list in INSERT to avoid schema-order coupling
- Add V6 migration tests: verify task_id nullable via PRAGMA, verify
  data preservation across migration
Copilot AI review requested due to automatic review settings March 14, 2026 09:27
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 14, 2026 09:28 — with GitHub Actions Inactive
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/persistence/test_migrations_v2.py (1)

20-26: 🧹 Nitpick | 🔵 Trivial

Consider using shared fixture from conftest.

The memory_db fixture is duplicated from tests/unit/persistence/sqlite/conftest.py. If this test file is moved or the fixture is needed elsewhere, consider either:

  • Moving this test to tests/unit/persistence/sqlite/ to use the shared fixture
  • Promoting the fixture to a higher-level conftest.py
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/persistence/test_migrations_v2.py` around lines 20 - 26, The
memory_db fixture is duplicated; instead of defining async def memory_db in
tests/unit/persistence/test_migrations_v2.py, remove this local fixture and use
the shared fixture from the existing conftest (or move this test file into
tests/unit/persistence/sqlite/ so it picks up the shared memory_db). Update any
references to the fixture name memory_db in the test to rely on the
conftest-provided fixture, or if you choose promotion, move the memory_db
definition into a higher-level conftest.py so all tests can import it
implicitly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/api/controllers/approvals.py`:
- Line 430: The return currently uses updated; change it to return the saved
value for defensive consistency: in the approve handler replace "return
ApiResponse(data=updated)" with "return ApiResponse(data=saved)" (saved is the
result of save_if_pending), and make the analogous change in the reject handler
(line where it returns updated around the reject function) so both handlers
return the store-backed "saved" object instead of the earlier "updated"
variable.

---

Outside diff comments:
In `@tests/unit/persistence/test_migrations_v2.py`:
- Around line 20-26: The memory_db fixture is duplicated; instead of defining
async def memory_db in tests/unit/persistence/test_migrations_v2.py, remove this
local fixture and use the shared fixture from the existing conftest (or move
this test file into tests/unit/persistence/sqlite/ so it picks up the shared
memory_db). Update any references to the fixture name memory_db in the test to
rely on the conftest-provided fixture, or if you choose promotion, move the
memory_db definition into a higher-level conftest.py so all tests can import it
implicitly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 161c203d-0e8b-4d82-8180-edf1ed71d802

📥 Commits

Reviewing files that changed from the base of the PR and between e3392f5 and cd3768f.

📒 Files selected for processing (4)
  • src/ai_company/api/approval_store.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/persistence/sqlite/migrations.py
  • tests/unit/persistence/test_migrations_v2.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Agent
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Greptile Review
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python version 3.14+ with PEP 649 native lazy annotations required.
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14.
Line length must be 88 characters, enforced by ruff.
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Tests must use test-provider, test-small-001, etc.

Files:

  • src/ai_company/api/approval_store.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/api/controllers/approvals.py
  • tests/unit/persistence/test_migrations_v2.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: All public functions must have type hints. Enforce via mypy strict mode.
Google-style docstrings required on public classes and functions, enforced by ruff D rules.
Create new objects instead of mutating existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction plus MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with BaseModel, model_validator, computed_field, ConfigDict. Use @computed_field for derived values instead of storing redundant fields (e.g., TokenUsage.total_tokens).
Use NotBlankStr from core.types for all identifier/name fields in Pydantic models, including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants, instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow exceptions.
Validate at system boundaries: user input, external APIs, config files.
NEVER use import logging, logging.getLogger(), or print() in application code.
Always use logger as the variable name for loggers (not _logger, not log).
Event names must always use constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import dire...

Files:

  • src/ai_company/api/approval_store.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/api/controllers/approvals.py
src/ai_company/**/[!_]*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)

Files:

  • src/ai_company/api/approval_store.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/api/controllers/approvals.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Tests must use markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Test coverage minimum: 80% (enforced in CI).
Async tests: use asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.

Files:

  • tests/unit/persistence/test_migrations_v2.py
🧠 Learnings (2)
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/ai_company/api/controllers/approvals.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/ai_company/**/[!_]*.py : Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • src/ai_company/api/controllers/approvals.py
🧬 Code graph analysis (4)
src/ai_company/api/approval_store.py (4)
src/ai_company/core/approval.py (1)
  • ApprovalItem (24-96)
src/ai_company/core/enums.py (1)
  • ApprovalStatus (434-440)
src/ai_company/api/app.py (1)
  • _on_expire (74-99)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
src/ai_company/persistence/sqlite/migrations.py (2)
tests/unit/persistence/sqlite/test_user_repo.py (1)
  • db (22-28)
tests/unit/hr/test_persistence.py (1)
  • db (29-35)
src/ai_company/api/controllers/approvals.py (4)
src/ai_company/api/state.py (1)
  • approval_gate (139-141)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
src/ai_company/api/errors.py (2)
  • UnauthorizedError (59-65)
  • ConflictError (41-47)
src/ai_company/api/auth/models.py (1)
  • AuthenticatedUser (68-87)
tests/unit/persistence/test_migrations_v2.py (2)
src/ai_company/persistence/sqlite/migrations.py (3)
  • get_user_version (257-261)
  • set_user_version (264-283)
  • run_migrations (334-402)
tests/unit/persistence/sqlite/conftest.py (1)
  • memory_db (15-22)
🔇 Additional comments (15)
src/ai_company/persistence/sqlite/migrations.py (3)

224-252: V6 migration is not idempotent and crashes on partial failure recovery.

The past review comment correctly identified that this migration breaks if interrupted between steps. Specifically:

  1. If parked_contexts_new is created and populated, but parked_contexts rename fails, the next run will fail at line 247 (ALTER TABLE parked_contexts RENAME TO parked_contexts_old) because parked_contexts_new already exists.
  2. If parked_contexts was renamed to parked_contexts_old but the subsequent rename of parked_contexts_new to parked_contexts failed, the INSERT at lines 238-246 will fail because parked_contexts no longer exists.

The migration should check table state before executing and handle partial states.

[raise_major_issue, duplicate_comment]

🛠️ Proposed fix for crash-safe v6 migration
 async def _apply_v6(db: aiosqlite.Connection) -> None:
     """Apply schema v6: make parked_contexts.task_id nullable."""
-    for stmt in _V6_STATEMENTS:
-        await db.execute(stmt)
+    # Check current table state for crash recovery
+    cursor = await db.execute(
+        "SELECT name FROM sqlite_master WHERE type='table' "
+        "AND name IN ('parked_contexts', 'parked_contexts_new', 'parked_contexts_old')"
+    )
+    existing = {row[0] for row in await cursor.fetchall()}
+
+    # Case 1: Already migrated (parked_contexts exists, no temp tables)
+    if "parked_contexts" in existing and "parked_contexts_new" not in existing:
+        # Just ensure indexes exist
+        await db.execute(
+            "CREATE INDEX IF NOT EXISTS idx_pc_agent_id ON parked_contexts(agent_id)"
+        )
+        await db.execute(
+            "CREATE INDEX IF NOT EXISTS idx_pc_approval_id ON parked_contexts(approval_id)"
+        )
+        return
+
+    # Case 2: Interrupted after rename to _old but before _new rename
+    if "parked_contexts_old" in existing and "parked_contexts_new" in existing:
+        await db.execute("ALTER TABLE parked_contexts_new RENAME TO parked_contexts")
+        await db.execute("DROP TABLE IF EXISTS parked_contexts_old")
+        await db.execute(
+            "CREATE INDEX IF NOT EXISTS idx_pc_agent_id ON parked_contexts(agent_id)"
+        )
+        await db.execute(
+            "CREATE INDEX IF NOT EXISTS idx_pc_approval_id ON parked_contexts(approval_id)"
+        )
+        return
+
+    # Case 3: Normal migration path
+    for stmt in _V6_STATEMENTS:
+        await db.execute(stmt)

148-163: LGTM!

The _V3_STATEMENTS correctly defines task_id TEXT (nullable) for fresh database installs. This is consistent with the _V6_STATEMENTS schema, ensuring fresh installs and migrated databases have identical schemas.


318-331: Migration registration is correct.

The _apply_v6 function and its registration in _MIGRATIONS follows the established pattern. The crash-safety concern was addressed in the previous comment.

tests/unit/persistence/test_migrations_v2.py (4)

31-32: LGTM!

The schema version assertion correctly updated to 6.


53-69: LGTM!

The version assertion correctly updated to expect the final schema version 6 after full migration.


98-129: Good test addressing prior review concern.

This test properly verifies the v6 migration makes task_id nullable using PRAGMA table_info. Consider adding assertions to verify the indexes (idx_pc_agent_id, idx_pc_approval_id) are recreated after the table swap.

[approve_code_changes, suggest_optional_refactor]

💡 Optional: Add index verification
         # Verify task_id is now nullable
         cursor = await memory_db.execute("PRAGMA table_info('parked_contexts')")
         cols = {row[1]: row[3] for row in await cursor.fetchall()}
         assert cols["task_id"] == 0  # notnull=0
+
+        # Verify indexes are recreated
+        cursor = await memory_db.execute(
+            "SELECT name FROM sqlite_master WHERE type='index' "
+            "AND tbl_name='parked_contexts'"
+        )
+        indexes = {row[0] for row in await cursor.fetchall()}
+        assert "idx_pc_agent_id" in indexes
+        assert "idx_pc_approval_id" in indexes

131-173: LGTM!

This test properly verifies that existing data is preserved during the v6 migration. The test setup, migration execution, and data verification are all correct.

src/ai_company/api/approval_store.py (2)

125-149: LGTM — well-designed conditional update with clear semantics.

The method correctly applies lazy expiration before the status check, and the absence of await between check and save ensures atomicity within the single-threaded event loop. The docstring accurately documents all three None-return scenarios.


177-186: LGTM — robust callback error handling.

Re-raising MemoryError and RecursionError ensures critical system errors propagate, while logging other exceptions with full traceback via logger.exception provides observability without crashing the expiration flow.

src/ai_company/api/controllers/approvals.py (6)

101-109: LGTM — exception handling with proper critical error propagation.

The PEP 758 except syntax is correctly applied, and re-raising MemoryError/RecursionError ensures critical system errors aren't swallowed.


147-152: LGTM — auth failure now properly logged.

The unauthorized path now logs API_AUTH_FAILED with context before raising, satisfying the error-path observability requirement.


177-213: Intentional signalling stub — acknowledged deferral to future scheduler.

The docstring and .. todo:: clearly document that _signal_resume_intent is a signalling-only stub. The actual resume logic is deferred to a future scheduling component (§12.4) that will observe status changes and call ApprovalGate.resume_context(). This matches the PR objectives describing the separation of concerns between the approval controller and the scheduling layer.


306-329: LGTM — auth binding hardens against prompt injection.

Binding requested_by to the authenticated user's username (rather than accepting it from the request body) mitigates potential prompt-injection vectors and ensures audit integrity.


402-410: LGTM — proper optimistic concurrency control.

The save_if_pending call provides an atomic check-and-save, correctly handling race conditions where the approval status could change between the initial fetch and the update. The error message clearly indicates both "already decided" and "expired" scenarios.


483-510: LGTM — reject flow consistent with approve pattern.

The rejection flow correctly mirrors the approval pattern with save_if_pending, proper conflict handling, and resume intent signalling with the decision reason.

decision_reason=data.comment,
)

return ApiResponse(data=updated)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Minor: consider returning saved for defensive consistency.

When save_if_pending succeeds, saved equals updated, so this is correct. However, returning saved would be slightly more defensive and signal that the returned value comes from the store.

♻️ Optional: use saved in return
-        return ApiResponse(data=updated)
+        return ApiResponse(data=saved)

Same applies to line 511 in reject.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/api/controllers/approvals.py` at line 430, The return
currently uses updated; change it to return the saved value for defensive
consistency: in the approve handler replace "return ApiResponse(data=updated)"
with "return ApiResponse(data=saved)" (saved is the result of save_if_pending),
and make the analogous change in the reject handler (line where it returns
updated around the reject function) so both handlers return the store-backed
"saved" object instead of the earlier "updated" variable.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an end-to-end approval workflow gate to the engine runtime so SecOps ESCALATE verdicts and request_human_approval tool calls can park execution and later resume safely, with supporting API hardening, persistence changes, and frontend WS fixes.

Changes:

  • Introduces ApprovalGate + models, integrates parking checks into loop helpers and both execution loops, and tracks escalations in ToolInvoker.
  • Adds RequestHumanApprovalTool, DTO validation hardening (action_type format, TTL/comment bounds), controller concurrency guard (save_if_pending), and WS publish resilience.
  • Updates SQLite schema/migrations for taskless parking and fixes dashboard approval WS handling + tests.

Reviewed changes

Copilot reviewed 39 out of 39 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
web/src/stores/approvals.ts Refetches full approval items based on approval_id from WS payloads; reconciles filtered views.
web/src/tests/stores/approvals.test.ts Adds async WS handler tests covering refetch, filtering, and error paths.
tests/unit/tools/test_invoker_escalation.py Verifies ToolInvoker.pending_escalations for SecOps ESCALATE and parking metadata.
tests/unit/tools/test_approval_tool.py Validates request_human_approval tool behavior, validation, and risk classification.
tests/unit/persistence/test_migrations_v2.py Updates schema version expectations and adds v6 migration coverage for nullable task_id.
tests/unit/observability/test_events.py Updates event discovery expectations to include approval_gate.
tests/unit/engine/test_security_factory.py Covers new _security_factory helpers and registry approval-tool wiring.
tests/unit/engine/test_loop_helpers_approval.py Ensures loop helper parks and returns PARKED / errors correctly on parking failures.
tests/unit/engine/test_approval_gate_models.py Tests Pydantic frozen models for escalation + resume payloads.
tests/unit/engine/test_approval_gate.py Tests parking, persistence, resume + resume-message injection mitigation behavior.
tests/unit/core/test_validation.py Adds unit tests for shared is_valid_action_type().
tests/unit/api/test_dto.py Updates tests for DTO constraints (action_type, TTL bounds, approve comment length).
tests/unit/api/test_approval_store.py Adds tests for save_if_pending, filters, and on-expire callback handling.
tests/unit/api/controllers/test_approvals_helpers.py Covers controller helper functions for decision validation + WS publishing behavior.
tests/unit/api/controllers/test_approvals.py Updates action_type payload to match new category:action requirements.
src/ai_company/tools/registry.py Adds all_tools() helper used by approval-tool registry augmentation.
src/ai_company/tools/invoker.py Tracks pending escalations from SecOps verdicts and tool parking metadata; batch ordering made deterministic.
src/ai_company/tools/approval_tool.py Implements RequestHumanApprovalTool creating ApprovalItems and returning parking metadata.
src/ai_company/tools/init.py Exports RequestHumanApprovalTool.
src/ai_company/security/timeout/parked_context.py Allows task_id=None for taskless agents.
src/ai_company/security/timeout/park_service.py Updates park API to accept optional task_id.
src/ai_company/persistence/sqlite/migrations.py Bumps schema to v6 and migrates parked_contexts task_id to nullable.
src/ai_company/observability/events/approval_gate.py Adds structured event constants for approval-gate lifecycle.
src/ai_company/engine/react_loop.py Wires approval_gate into tool execution path via loop helpers.
src/ai_company/engine/plan_execute_loop.py Wires approval_gate into per-step tool execution path.
src/ai_company/engine/loop_helpers.py Adds escalation parking logic and error reporting during parking failures.
src/ai_company/engine/approval_gate_models.py Defines frozen EscalationInfo and ResumePayload.
src/ai_company/engine/approval_gate.py Implements context park/persist/resume + safe resume message builder.
src/ai_company/engine/agent_engine.py Builds/wires default ApprovalGate and injects approval tool into ToolRegistry.
src/ai_company/engine/_security_factory.py Extracts security/tool factory logic and adds approval-tool registry augmentation.
src/ai_company/engine/init.py Re-exports ApprovalGate and gate models.
src/ai_company/core/validation.py Adds shared category:action format validator.
src/ai_company/api/state.py Adds approval_gate to application state.
src/ai_company/api/dto.py Hardens approval DTOs: action_type validator, TTL max, approve comment length.
src/ai_company/api/controllers/approvals.py Binds requested_by to auth user, uses save_if_pending, improves WS error handling, logs resume intent.
src/ai_company/api/approval_store.py Adds save_if_pending() and hardens on-expire callback error handling.
docs/design/engine.md Documents new parking flow in engine lifecycle.
README.md Updates status to reflect approval gates are implemented.
CLAUDE.md Updates repo layout/docs for approval gate + approval tool + event constants.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

task_id=task_id,
)
existing = list(tool_registry.all_tools())
return _ToolRegistry([*existing, approval_tool])
Comment on lines 135 to 152
if item.status != ApprovalStatus.PENDING:
msg = f"Approval {approval_id!r} is {item.status.value}, not pending"
logger.warning(
API_APPROVAL_CONFLICT,
approval_id=approval_id,
current_status=item.status.value,
)
raise ConflictError(msg)

auth_user = request.scope.get("user")
if not isinstance(auth_user, AuthenticatedUser):
msg = "Authentication required"
logger.warning(
API_AUTH_FAILED,
approval_id=approval_id,
note="No authenticated user in request scope",
)
raise UnauthorizedError(msg)
@Aureliolo Aureliolo closed this Mar 14, 2026
Aureliolo added a commit that referenced this pull request Mar 14, 2026
- Rename _trigger_resume → _signal_resume_intent with TODO for scheduler
- Add warning log before UnauthorizedError in controller auth checks
- Add DEBUG log on successful parked record deletion
- New APPROVAL_GATE_PARK_TASKLESS event for pre-park debug log
- New APPROVAL_GATE_RISK_CLASSIFY_FAILED event for classification failure
- Fix SQLite schema: parked_contexts.task_id TEXT NOT NULL → TEXT (nullable)
- Extract shared is_valid_action_type() to core/validation.py
- Fix filtered WS handler: re-fetch query instead of blind total increment
- Add afterEach(vi.restoreAllMocks()) for spy cleanup in frontend tests
- Add 10 missing event constants to CLAUDE.md reference list
- Fix test-model-001 → test-small-001 per project conventions
- Add comprehensive tests: validation, approval_store, controller helpers,
  security_factory, approval_gate init/resume, risk classification failure,
  action_type format validation, TTL bounds, ApproveRequest comment length,
  loop_helpers taskless path and IOError handling
Aureliolo added a commit that referenced this pull request Mar 14, 2026
- Split park_context/resume_context into smaller helpers (<50 lines each)
- Fix duplicate APPROVAL_GATE_CONTEXT_RESUMED emission on success path
- Add V6 migration to make parked_contexts.task_id nullable on existing DBs
- Warn when external execution_loop bypasses approval gate wiring
- Guard execute() against KeyError on missing arguments
- Use APPROVAL_GATE_RISK_CLASSIFY_FAILED for "no classifier" debug log
- Fix TOCTOU race on WS duplicate prevention with post-fetch re-check
Aureliolo added a commit that referenced this pull request Mar 14, 2026
- Rename _trigger_resume → _signal_resume_intent with TODO for scheduler
- Add warning log before UnauthorizedError in controller auth checks
- Add DEBUG log on successful parked record deletion
- New APPROVAL_GATE_PARK_TASKLESS event for pre-park debug log
- New APPROVAL_GATE_RISK_CLASSIFY_FAILED event for classification failure
- Fix SQLite schema: parked_contexts.task_id TEXT NOT NULL → TEXT (nullable)
- Extract shared is_valid_action_type() to core/validation.py
- Fix filtered WS handler: re-fetch query instead of blind total increment
- Add afterEach(vi.restoreAllMocks()) for spy cleanup in frontend tests
- Add 10 missing event constants to CLAUDE.md reference list
- Fix test-model-001 → test-small-001 per project conventions
- Add comprehensive tests: validation, approval_store, controller helpers,
  security_factory, approval_gate init/resume, risk classification failure,
  action_type format validation, TTL bounds, ApproveRequest comment length,
  loop_helpers taskless path and IOError handling
Aureliolo added a commit that referenced this pull request Mar 14, 2026
- Split park_context/resume_context into smaller helpers (<50 lines each)
- Fix duplicate APPROVAL_GATE_CONTEXT_RESUMED emission on success path
- Add V6 migration to make parked_contexts.task_id nullable on existing DBs
- Warn when external execution_loop bypasses approval gate wiring
- Guard execute() against KeyError on missing arguments
- Use APPROVAL_GATE_RISK_CLASSIFY_FAILED for "no classifier" debug log
- Fix TOCTOU race on WS duplicate prevention with post-fetch re-check
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: implement request_human_approval tool for approval workflow feat: implement approval workflow gates in engine

2 participants