feat: add approval workflow gates to TaskEngine#387
Conversation
Implement the approval gate layer that parks agent execution when human approval is required and provides infrastructure for resuming after a decision. Also adds the `request_human_approval` tool for agents to explicitly request approval. - Add EscalationInfo/ResumePayload models and approval gate events - Add ApprovalGate service (park/resume via ParkService) - Add RequestHumanApprovalTool (creates ApprovalItem, signals parking) - Add ToolInvoker escalation tracking (ESCALATE verdicts + tool metadata) - Integrate approval gate into execute_tool_calls and both loop types - Wire into AgentEngine, AppState, and approvals controller resume path Closes #258 Closes #259
Critical: empty task_id ValidationError, parking metadata bypass, PARKED→ERROR on park failure, wire parked_context_repo, TOCTOU race via save_if_pending, resume trigger wiring in approve/reject. Security: prompt injection mitigation, max_length on comment, ttl_seconds cap, action_type format validation, remove phantom requested_by field. Logging: wrong event constants for init/risk/park-info, consistent MemoryError logging, new INITIALIZED/RISK_CLASSIFIED/RESUME_TRIGGERED constants. Code quality: extract security/tool factories (962→860 lines), split execute() into helpers, deterministic escalation order, boolean parking_failed, check delete() result, risk classifier error handling. Frontend: async handler type fix, WS test payload/await fix, filter-aware WS, 404 vs transient error discrimination, total desync fix, empty catch logging. Tests: park failure expects ERROR, module-level markers, extracted fixtures, simplified PropertyMock, updated resume message assertions. Docs: persistence qualifier, CLAUDE.md events, docstring fixes.
- _trigger_resume no longer calls resume_context() which would delete the parked record before a scheduler can consume it — now only logs the resume trigger for scheduler observation - Remove dead APPROVAL_RESUMED WsEventType (never emitted) - Remove unused ApprovalGate import from controller
- Rename _trigger_resume → _signal_resume_intent with TODO for scheduler - Add warning log before UnauthorizedError in controller auth checks - Add DEBUG log on successful parked record deletion - New APPROVAL_GATE_PARK_TASKLESS event for pre-park debug log - New APPROVAL_GATE_RISK_CLASSIFY_FAILED event for classification failure - Fix SQLite schema: parked_contexts.task_id TEXT NOT NULL → TEXT (nullable) - Extract shared is_valid_action_type() to core/validation.py - Fix filtered WS handler: re-fetch query instead of blind total increment - Add afterEach(vi.restoreAllMocks()) for spy cleanup in frontend tests - Add 10 missing event constants to CLAUDE.md reference list - Fix test-model-001 → test-small-001 per project conventions - Add comprehensive tests: validation, approval_store, controller helpers, security_factory, approval_gate init/resume, risk classification failure, action_type format validation, TTL bounds, ApproveRequest comment length, loop_helpers taskless path and IOError handling
- Split park_context/resume_context into smaller helpers (<50 lines each) - Fix duplicate APPROVAL_GATE_CONTEXT_RESUMED emission on success path - Add V6 migration to make parked_contexts.task_id nullable on existing DBs - Warn when external execution_loop bypasses approval gate wiring - Guard execute() against KeyError on missing arguments - Use APPROVAL_GATE_RISK_CLASSIFY_FAILED for "no classifier" debug log - Fix TOCTOU race on WS duplicate prevention with post-fetch re-check
- Fix double APPROVAL_GATE_RESUME_DELETE_FAILED emission: early return after exception log prevents fallthrough to if-not-deleted branch - Add APPROVAL_GATE_LOOP_WIRING_WARNING constant for external loop misconfiguration (was misusing APPROVAL_GATE_INITIALIZED) - Wire APPROVAL_GATE_RISK_CLASSIFIED to classifier success path; remove misleading usage for "no classifier" info log (was RISK_CLASSIFY_FAILED) - Add APPROVAL_GATE_LOOP_WIRING_WARNING to CLAUDE.md event reference
- Fix save_if_pending() docstring to cover all None cases (not-found, expired, already-decided) - Fix approve/reject ConflictError message: "no longer pending (already decided or expired)" instead of "decided by another request" - Make V6 migration crash-safe: rename old→backup before replacing, explicit column list in INSERT to avoid schema-order coupling - Add V6 migration tests: verify task_id nullable via PRAGMA, verify data preservation across migration
- Fix mypy arg-type/index errors: wrap fetchall() in list() for Sized - Return store-backed saved object from approve/reject instead of the pre-save updated variable (defensive consistency)
ApprovalGate._serialize_context already logs CONTEXT_PARK_FAILED before re-raising — remove the redundant log in _park_for_approval to avoid double emission of the same event.
_track_parking_metadata now returns an error ToolResult instead of raising ToolExecutionError — prevents uncaught exception from cancelling sibling tasks in TaskGroup during invoke_all.
Checkpoint recovery (V6) landed on main before our parked_contexts nullable migration, which is now V7.
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds a human-approval workflow: new ApprovalGate service and models, a request_human_approval tool, ToolInvoker escalation tracking, engine loop wiring to park/resume contexts, API/DTO/controller changes for approvals, observability event constants, DB migration for parked_context.task_id, and frontend WS handling updates. Changes
Sequence Diagram(s)sequenceDiagram
participant Agent as Agent (Execution Loop)
participant Invoker as ToolInvoker
participant Gate as ApprovalGate
participant Park as ParkService
participant Repo as ParkedContextRepo
participant API as Approvals API
participant Store as ApprovalStore
Agent->>Invoker: invoke tool call(s)
Invoker->>Invoker: execute tools & collect verdicts
alt escalation detected
Invoker-->>Agent: pending_escalations (EscalationInfo)
Agent->>Gate: should_park(escalations)?
Gate-->>Agent: escalation to park (EscalationInfo)
Agent->>Gate: park_context(escalation, context)
Gate->>Park: serialize(context)
Park-->>Gate: ParkedContext
Gate->>Repo: persist parked context (optional)
Repo-->>Gate: persisted
Gate-->>Agent: parked (approval_id)
Agent-->>Agent: return PARKED ExecutionResult
end
rect rgba(50,150,200,0.5)
API->>Store: create/list/decide ApprovalItem
Store-->>API: decision saved
API->>Gate: resume_context(approval_id)
Gate->>Repo: load_by_approval_id
Repo-->>Gate: ParkedContext
Gate->>Park: deserialize -> AgentContext
Park-->>Gate: AgentContext
Gate->>Repo: cleanup(delete)
Repo-->>Gate: deleted
Gate-->>API: AgentContext
API-->>Agent: inject resume message / continue execution
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
✨ Simplify code
📝 Coding Plan
Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a critical approval workflow mechanism into the agent engine, enabling human oversight for sensitive operations. It allows agents to pause their execution, persist their current context, and await explicit human approval before proceeding. This enhancement significantly improves the safety and control of autonomous agent operations by integrating human-in-the-loop decision-making, ensuring that high-risk actions are vetted and approved, thereby reducing potential liabilities and increasing operational reliability. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #387 +/- ##
==========================================
+ Coverage 93.78% 93.83% +0.05%
==========================================
Files 456 462 +6
Lines 21330 21653 +323
Branches 2046 2079 +33
==========================================
+ Hits 20005 20319 +314
+ Misses 1033 1032 -1
- Partials 292 302 +10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive approval workflow, a crucial feature for managing agent actions. The changes are well-structured, including a new ApprovalGate service, a RequestHumanApprovalTool, and robust integration into the execution loop and API. The API hardening, particularly binding requested_by to the authenticated user and implementing atomic updates with save_if_pending, significantly improves security and data integrity. The frontend changes to handle WebSocket events asynchronously are also a major improvement.
However, there is a recurring critical issue across several new and modified Python files: the use of Python 2-style except syntax (e.g., except MemoryError, RecursionError:), which will cause a SyntaxError in Python 3. I've added comments with suggestions to fix this in all identified locations.
| self._on_expire(expired) | ||
| try: | ||
| self._on_expire(expired) | ||
| except MemoryError, RecursionError: |
| channels=[CHANNEL_APPROVALS], | ||
| ) | ||
| except RuntimeError, OSError: | ||
| except MemoryError, RecursionError: |
| "risk_level": escalation.risk_level.value, | ||
| }, | ||
| ) | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| ) | ||
| except MemoryError, RecursionError: |
| metadata={"source": "request_human_approval"}, | ||
| ) | ||
| await self._approval_store.add(item) | ||
| except MemoryError, RecursionError: |
| reason="Agent requested human approval", | ||
| ), | ||
| ) | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
Pull request overview
Adds an approval-gated “park/resume” workflow to the execution engine so SecOps ESCALATE verdicts and agent-initiated request_human_approval tool calls can pause execution, persist context, and support later resumption; includes API hardening, observability events, migration updates, and a frontend WS fix for approval events.
Changes:
- Introduce
ApprovalGate+ models, integrate escalation detection intoToolInvoker, and wire parking behavior into execution loops. - Add
request_human_approvaltool, sharedaction_typevalidation, and API/store hardening (requested_byfrom auth,save_if_pending, resume intent signal). - Update persistence schema to v7 (nullable
parked_contexts.task_id), add structured approval-gate events, and fix frontend WS approval event handling.
Reviewed changes
Copilot reviewed 40 out of 40 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| web/src/stores/approvals.ts | Update WS handler to use approval_id, fetch full items via API, and reconcile filtered lists. |
| web/src/tests/stores/approvals.test.ts | Expand WS handler tests to cover fetching, filtering re-fetches, and error paths. |
| tests/unit/tools/test_invoker_escalation.py | New unit tests for invoker escalation tracking and deterministic ordering. |
| tests/unit/tools/test_approval_tool.py | New unit tests for RequestHumanApprovalTool creation, validation, and error handling. |
| tests/unit/persistence/test_migrations_v2.py | Add schema version assertion and v7 migration tests for nullable task_id. |
| tests/unit/persistence/sqlite/test_migrations_v6.py | Update schema version expectation to 7. |
| tests/unit/observability/test_events.py | Include approval_gate module in event constant discovery tests. |
| tests/unit/engine/test_security_factory.py | New tests for extracted _security_factory helpers. |
| tests/unit/engine/test_loop_helpers_approval.py | New tests for approval gate integration in execute_tool_calls. |
| tests/unit/engine/test_approval_gate_models.py | New tests for EscalationInfo and ResumePayload validation/immutability. |
| tests/unit/engine/test_approval_gate.py | New tests for ApprovalGate park/resume behavior and resume message building. |
| tests/unit/core/test_validation.py | New tests for shared is_valid_action_type() utility. |
| tests/unit/api/test_dto.py | Update DTO tests for action_type format and approve comment bounds. |
| tests/unit/api/test_approval_store.py | Add tests for save_if_pending, combined filters, and on-expire callback behavior. |
| tests/unit/api/controllers/test_approvals_helpers.py | New tests for approvals controller helper functions and auth pre-checks. |
| tests/unit/api/controllers/test_approvals.py | Update controller tests for action_type format and server-side requested_by. |
| src/ai_company/tools/registry.py | Add all_tools() to support building derived registries. |
| src/ai_company/tools/invoker.py | Track pending escalations (SecOps + tool metadata), clear per call, and sort deterministically for batches. |
| src/ai_company/tools/approval_tool.py | Add RequestHumanApprovalTool that creates ApprovalItem and signals parking via metadata. |
| src/ai_company/tools/init.py | Export RequestHumanApprovalTool. |
| src/ai_company/security/timeout/parked_context.py | Make ParkedContext.task_id nullable to support taskless agents. |
| src/ai_company/security/timeout/park_service.py | Allow task_id=None when parking contexts. |
| src/ai_company/persistence/sqlite/migrations.py | Bump schema to v7 and add migration to make parked_contexts.task_id nullable. |
| src/ai_company/observability/events/approval_gate.py | Add approval-gate lifecycle event constants. |
| src/ai_company/engine/react_loop.py | Wire optional approval_gate into React loop tool-call execution. |
| src/ai_company/engine/plan_execute_loop.py | Wire optional approval_gate into plan/execute loop tool-call execution. |
| src/ai_company/engine/loop_helpers.py | Add approval-gate checks after tool calls and park-on-escalation helper. |
| src/ai_company/engine/approval_gate_models.py | Add EscalationInfo and ResumePayload frozen models. |
| src/ai_company/engine/approval_gate.py | Implement gate logic: choose escalation, park (serialize+persist), resume (load+deserialize+cleanup), build resume injection message. |
| src/ai_company/engine/agent_engine.py | Create/wire default ApprovalGate, inject approval tool into registry, extract security factory, and warn on external loop wiring. |
| src/ai_company/engine/_security_factory.py | New module extracting security interceptor and approval-tool registry composition. |
| src/ai_company/engine/init.py | Re-export approval gate types. |
| src/ai_company/core/validation.py | Add shared is_valid_action_type() utility. |
| src/ai_company/api/state.py | Add approval_gate to app state for controller access. |
| src/ai_company/api/dto.py | Enforce action_type format in CreateApprovalRequest; bound TTL max; bound approve comment max length. |
| src/ai_company/api/controllers/approvals.py | Bind requested_by to authenticated user; add save-if-pending conflict behavior; emit resume intent signal; improve auth logging. |
| src/ai_company/api/approval_store.py | Add save_if_pending() and harden on-expire callback error handling. |
| docs/design/engine.md | Document parking behavior in engine flow when escalations occur. |
| README.md | Update status line to reflect approval workflow gates implementation. |
| CLAUDE.md | Update codebase map/docs for approval-gate integration and approval tool. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/ai_company/tools/invoker.py
Outdated
| if result.metadata.get("requires_parking") is not True: | ||
| return None | ||
| if not result.metadata.get("approval_id"): | ||
| return None |
| FROM parked_contexts | ||
| """, | ||
| "ALTER TABLE parked_contexts RENAME TO parked_contexts_old", | ||
| "ALTER TABLE parked_contexts_new RENAME TO parked_contexts", | ||
| "DROP TABLE IF EXISTS parked_contexts_old", |
| async def test_v6_makes_task_id_nullable( | ||
| self, memory_db: aiosqlite.Connection | ||
| ) -> None: | ||
| """V7 migration makes parked_contexts.task_id nullable.""" | ||
| # Simulate a pre-v6 database with NOT NULL task_id | ||
| await memory_db.execute("""\ | ||
| CREATE TABLE parked_contexts ( | ||
| id TEXT PRIMARY KEY, | ||
| execution_id TEXT NOT NULL, | ||
| agent_id TEXT NOT NULL, | ||
| task_id TEXT NOT NULL, | ||
| approval_id TEXT NOT NULL, | ||
| parked_at TEXT NOT NULL, | ||
| context_json TEXT NOT NULL, | ||
| metadata TEXT NOT NULL DEFAULT '{}' | ||
| )""") | ||
| await set_user_version(memory_db, 6) | ||
| await memory_db.commit() | ||
|
|
||
| # Verify task_id is NOT NULL before migration | ||
| cursor = await memory_db.execute("PRAGMA table_info('parked_contexts')") | ||
| cols = {row[1]: row[3] for row in await cursor.fetchall()} | ||
| assert cols["task_id"] == 1 # notnull=1 | ||
|
|
||
| # Run migrations (applies v6) | ||
| await run_migrations(memory_db) | ||
| assert await get_user_version(memory_db) == 7 |
| item = ApprovalItem( | ||
| id="exp-001", | ||
| action_type="code_merge", | ||
| title="Test", |
Greptile SummaryThis PR implements a complete approval-gated execution system for the agent engine: a new Key areas of concern:
Confidence Score: 3/5
Important Files Changed
|
| self._on_expire(expired) | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception: | ||
| logger.exception( | ||
| API_APPROVAL_EXPIRED, | ||
| approval_id=item.id, | ||
| note="on_expire callback failed", | ||
| ) |
There was a problem hiding this comment.
Lazy expiration not written back to _items inside save_if_pending
When _check_expiration determines the stored item has expired, it returns an expired copy but does not update self._items. After save_if_pending returns None, self._items[item.id] still holds the original PENDING item. Any subsequent call to get() (or another save_if_pending) would retrieve the stale PENDING entry and re-run the expiry check again, but the expired on_expire callback would fire a second time.
If get() also applies lazy expiration this is harmless. If it doesn't, callers that check the status after a failed save_if_pending could observe an inconsistent PENDING state.
Consider writing back the expired record when expiration fires:
current = self._check_expiration(current)
if current.status != ApprovalStatus.PENDING:
# Write back the expired state so subsequent reads are consistent.
self._items[item.id] = current
return NonePrompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/api/approval_store.py
Line: 178-186
Comment:
**Lazy expiration not written back to `_items` inside `save_if_pending`**
When `_check_expiration` determines the stored item has expired, it returns an expired copy but does **not** update `self._items`. After `save_if_pending` returns `None`, `self._items[item.id]` still holds the original `PENDING` item. Any subsequent call to `get()` (or another `save_if_pending`) would retrieve the stale PENDING entry and re-run the expiry check again, but the expired `on_expire` callback would fire a second time.
If `get()` also applies lazy expiration this is harmless. If it doesn't, callers that check the status after a failed `save_if_pending` could observe an inconsistent `PENDING` state.
Consider writing back the expired record when expiration fires:
```python
current = self._check_expiration(current)
if current.status != ApprovalStatus.PENDING:
# Write back the expired state so subsequent reads are consistent.
self._items[item.id] = current
return None
```
How can I resolve this? If you propose a fix, please make it concise.| @@ -237,7 +253,7 @@ class ApproveRequest(BaseModel): | |||
|
|
|||
There was a problem hiding this comment.
RejectRequest.reason missing max_length bound
ApproveRequest.comment was correctly bounded with max_length=4096 in this PR, but RejectRequest.reason — which flows directly into decision_reason on the stored item — is not bounded. For consistency and to prevent unbounded payloads:
| reason: NotBlankStr = Field(max_length=4096) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/api/dto.py
Line: 253
Comment:
**`RejectRequest.reason` missing `max_length` bound**
`ApproveRequest.comment` was correctly bounded with `max_length=4096` in this PR, but `RejectRequest.reason` — which flows directly into `decision_reason` on the stored item — is not bounded. For consistency and to prevent unbounded payloads:
```suggestion
reason: NotBlankStr = Field(max_length=4096)
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/ai_company/engine/plan_execute_loop.py (1)
80-96: 🧹 Nitpick | 🔵 TrivialUpdate class docstring to document
approval_gateparameter.Similar to
ReactLoop, thePlanExecuteLoopclass docstring should document the newapproval_gateparameter for completeness.📝 Proposed docstring update
class PlanExecuteLoop: """Plan-and-Execute execution loop. Decomposes a task into steps via LLM planning, then executes each step with a mini-ReAct sub-loop. Supports re-planning on failure. + + Args: + config: Optional loop configuration. + checkpoint_callback: Optional async callback for checkpointing. + approval_gate: Optional gate for approval-required parking. """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/engine/plan_execute_loop.py` around lines 80 - 96, The PlanExecuteLoop class docstring is missing documentation for the new approval_gate parameter; update the class-level docstring above class PlanExecuteLoop to describe the approval_gate parameter (type ApprovalGate | None, purpose and behavior) similar to how ReactLoop documents its approval_gate, and mention that it can be passed to __init__ to gate step execution; ensure the parameter name approval_gate appears in the docstring and briefly state default value (None) and its effect on execution/re-planning.src/ai_company/engine/react_loop.py (1)
66-78: 🧹 Nitpick | 🔵 TrivialUpdate class docstring to document the new
approval_gateparameter.The
Argssection in the class docstring documentscheckpoint_callbackbut omits the newly addedapproval_gateparameter. Per coding guidelines, Google-style docstrings are required on public classes.📝 Proposed docstring update
Args: checkpoint_callback: Optional async callback invoked after each completed turn; the callback itself decides whether to persist. + approval_gate: Optional gate for approval-required parking. + When provided, tool escalations trigger context parking. """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/engine/react_loop.py` around lines 66 - 78, The class docstring's Args section documents checkpoint_callback but omits the new approval_gate parameter; update the class-level Google-style docstring to include an Args entry for approval_gate describing its type (ApprovalGate | None), that it's optional, and what it controls (used to gate/require approvals between turns passed into __init__ as approval_gate), referencing the existing CheckpointCallback/ checkpoint_callback description for parity and keeping wording concise and consistent with the existing docstring style.src/ai_company/api/controllers/approvals.py (1)
356-430: 🛠️ Refactor suggestion | 🟠 MajorExtract the shared decision flow out of
approve()andreject().These handlers now duplicate the same fetch →
save_if_pending()→ publish → log → signal path and both exceed the repo's 50-line limit, which makes future fixes easy to land in only one branch. As per coding guidelines, "Functions must be < 50 lines, files < 800 lines."Also applies to: 437-511
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/api/controllers/approvals.py` around lines 356 - 430, The approve() and reject() handlers duplicate the same fetch → save_if_pending() → publish → log → signal flow and exceed 50 lines; extract that shared flow into a new helper (e.g. _handle_approval_decision) that accepts (state/app_state, request, approval_id, status: ApprovalStatus, approved: bool, decision_reason: Optional[str], decided_by: str) and does: fetch via app_state.approval_store.get, call _resolve_decision, build updated model_copy with decided_at/decided_by/decision_reason/status, call save_if_pending, raise NotFoundError/ConflictError with the same messages when appropriate, call _publish_approval_event, _log_approval_decision, and await _signal_resume_intent, then return the saved item; then simplify approve() and reject() to call this helper with ApprovalStatus.APPROVED/REJECTED and the auth user (from _resolve_decision or pass through) and return ApiResponse(data=returned_item), preserving existing log keys/API messages and error behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Line 154: Fix the punctuation typo in the Event names guidance line in
CLAUDE.md by adding the missing comma in the final clause; edit the sentence
that currently ends with "Import directly: `from
ai_company.observability.events.<domain> import EVENT_CONSTANT`" so it includes
the needed comma (e.g., "Import directly, `from
ai_company.observability.events.<domain> import EVENT_CONSTANT`") to satisfy
lint/style.
In `@README.md`:
- Line 133: The status line in the README overstates completeness by implying
the approval workflow is end-to-end; update the sentence that currently ends
with "Remaining: CLI" to clearly note that resume/re-enqueue scheduling is not
implemented (e.g., "Remaining: CLI and resume/scheduler for re-enqueueing
executions") so readers understand the approval gate can signal intent but a
scheduler is still required to resume work; edit the line containing the phrase
"Remaining: CLI" and adjust wording to mention both the CLI and the deferred
scheduler/resume mechanism.
In `@src/ai_company/api/approval_store.py`:
- Around line 177-186: The except block that re-raises MemoryError and
RecursionError should log context before re-raising: in the try around
self._on_expire(expired) catch the non-recoverable errors as a tuple (except
(MemoryError, RecursionError):), call logger.error(API_APPROVAL_EXPIRED,
approval_id=item.id, note="on_expire callback failed - non-recoverable error")
and then re-raise; keep the existing broad except Exception block that logs via
logger.exception unchanged.
In `@src/ai_company/api/controllers/approvals.py`:
- Around line 177-214: _signal_resume_intent currently only logs decisions;
update it to perform the actual handoff by calling and awaiting
ApprovalGate.resume_context on app_state.approval_gate (if not None), passing
the approval_id and a payload containing approved, decided_by and
decision_reason so the gate/scheduler can resume the parked agent; ensure you
handle/await the async resume_context call, catch and log any exceptions (using
logger.error) so failures don’t silence the resume attempt, and preserve the
existing log entry for telemetry.
In `@src/ai_company/engine/_security_factory.py`:
- Around line 39-117: The make_security_interceptor function is too long and
mixes autonomy validation, detector selection, and service construction; split
it into small helpers (e.g., validate_autonomy_with_config(security_config,
effective_autonomy), build_detectors(rule_engine_cfg) returning the list/tuple
of detector instances including PolicyValidator, CredentialDetector,
PathTraversalDetector, DestructiveOpDetector, DataLeakDetector, and
build_secops_service(cfg, rule_engine, audit_log, approval_store,
effective_autonomy) that constructs and returns the SecOpsService) then refactor
make_security_interceptor to call these helpers and return the SecOpsService;
keep existing behavior and exceptions (ExecutionStateError, logger messages) and
use the same symbols RuleEngine, RiskClassifier, OutputScanner,
DefaultRiskTierClassifier to assemble the final service.
- Around line 141-148: The code unconditionally appends a new
RequestHumanApprovalTool to the registry which will raise on duplicate names;
modify the logic around approval_tool / tool_registry so you check
tool_registry.all_tools() for an existing tool with the same name (or identity)
as approval_tool and either replace that entry or simply return the original
registry when a matching tool exists before constructing _ToolRegistry; update
the block that builds `existing = list(tool_registry.all_tools())` to filter out
any tool whose name matches approval_tool.name (or replace it) and then return
_ToolRegistry([...]) so duplicate registration is avoided.
In `@src/ai_company/engine/agent_engine.py`:
- Around line 887-903: The current _make_approval_gate() constructs an
ApprovalGate whenever _approval_store is set even if _parked_context_repo is
None, which enables approval parking without persistent storage; modify
_make_approval_gate() to check both self._approval_store and
self._parked_context_repo and only return an ApprovalGate when both are present,
otherwise return None (or raise/explicitly disable approval paths) so
ApprovalGate is never created with parked_context_repo=None; update references
to ApprovalGate, _approval_store, and _parked_context_repo accordingly.
In `@src/ai_company/engine/approval_gate.py`:
- Around line 205-213: resume_context currently deserializes and then calls
_cleanup_parked but ignores failures/False results, returning a resumable
context even when cleanup failed; change resume_context to catch exceptions from
_cleanup_parked and check its return value, and if cleanup fails (exception or
falsy) log an error and raise an exception (fail closed) instead of returning
the context; apply the same defensive check to the other method that calls
_cleanup_parked in the second block (the resume_* caller around lines 260-287)
so both paths refuse to resume when cleanup doesn't succeed.
In `@src/ai_company/engine/loop_helpers.py`:
- Around line 345-413: The _park_for_approval function is over the 50-line
limit; refactor it into small helpers: extract agent and task id (e.g., a new
_extract_agent_and_task_id(ctx: AgentContext) -> tuple[str, str|None] that
contains the logger.debug branch), perform the park attempt (e.g.,
_attempt_park(escalation: EscalationInfo, ctx: AgentContext, approval_gate)
which calls approval_gate.park_context and re-raises MemoryError/RecursionError
but returns False on other exceptions), and build the final ExecutionResult
(e.g., _build_park_result(ctx, turns, escalation.approval_id, parked: bool) that
calls build_result with TerminationReason.PARKED or TerminationReason.ERROR and
the proper metadata). Replace the body of _park_for_approval with calls to these
helpers so the top-level function stays small and delegates extraction, parking,
and result construction.
In `@src/ai_company/persistence/sqlite/migrations.py`:
- Line 155: The _V3_STATEMENTS constant was modified to make task_id nullable,
which alters historical v3 DDL; restore the original v3 schema by changing
task_id back to NOT NULL inside _V3_STATEMENTS so migrations replay the true
pre-v7 schema and let the v7 migration (not _V3_STATEMENTS) handle making
task_id nullable; locate _V3_STATEMENTS and the task_id column definition and
revert it to its original NOT NULL declaration.
- Around line 258-286: The v7 migration _V7_STATEMENTS is not resumable; replace
the fixed statement list with an imperative function _apply_v7(conn) that checks
current schema state (existence of parked_contexts, parked_contexts_new,
parked_contexts_old) and performs only the missing idempotent steps: create
parked_contexts_new if absent, copy rows from parked_contexts into
parked_contexts_new if parked_contexts exists and new is empty, rename
parked_contexts to parked_contexts_old if parked_contexts exists and not already
renamed, rename parked_contexts_new to parked_contexts if the current table is
missing, and drop parked_contexts_old when present; update run_migrations to
call _apply_v7 instead of executing _V7_STATEMENTS, and apply the same
imperative, state-checking replacement for the similar sequence referenced
around the other block (the statements at the second occurrence).
In `@src/ai_company/tools/approval_tool.py`:
- Around line 114-129: The execute() method currently coerces arguments to
strings (action_type, title, description) allowing None/objects to become valid
text; instead validate that arguments["action_type"], ["title"], and
["description"] are present, are instances of str, and are non-blank before
calling _validate_action_type; if any check fails, return a ToolExecutionResult
with is_error=True and a clear message naming the offending key(s) and expected
type/non-empty constraint; keep calling _validate_action_type(action_type) only
after these checks pass.
In `@tests/unit/engine/test_approval_gate_models.py`:
- Around line 90-141: Add tests mirroring TestEscalationInfo by ensuring
ResumePayload also rejects empty strings for approval_id and decided_by: update
or add a test (e.g., test_empty_string_rejected) that parametrizes the same
fields ["approval_id","decided_by"], constructs kwargs like in
test_blank_string_rejected but sets the field to "" and asserts
ResumePayload(**kwargs) raises ValidationError; reference the ResumePayload
class and existing test_blank_string_rejected to keep behavior consistent.
In `@tests/unit/persistence/test_migrations_v2.py`:
- Around line 31-32: Add a 30s pytest timeout to the new async migration tests
in tests/unit/persistence/test_migrations_v2.py: either set a file-scope marker
`pytestmark = pytest.mark.timeout(30)` near the top of the file or add
`@pytest.mark.timeout(30)` to each async test (e.g.,
`test_schema_version_is_seven` and the tests covering lines ~98-173) so each
migration test runs with the repository-required 30‑second timeout.
In `@tests/unit/tools/test_invoker_escalation.py`:
- Around line 136-175: Replace the three nearly identical tests
(test_not_populated_on_escalate_without_approval_id,
test_not_populated_on_allow_verdict, test_not_populated_on_deny_verdict) with a
single parametrized pytest test that iterates different verdict inputs; for each
parameter set construct the AsyncMock interceptor (using _verdict with
SecurityVerdictType values and ApprovalRiskLevel where needed), set scan_output
only for the ALLOW case, create the invoker via _make_invoker(_StubTool(),
security_interceptor=interceptor), call await invoker.invoke(_make_tool_call()),
and assert invoker.pending_escalations == (); use pytest.mark.parametrize to
supply the verdict factory args so coverage remains identical while removing
duplicated arrange/act/assert code.
In `@web/src/stores/approvals.ts`:
- Around line 78-80: WS-driven refreshes currently call
fetchApprovals(activeFilters.value) which mutates shared loading/error state and
lacks a last-response-wins guard; create a dedicated helper (e.g.,
refreshApprovalsForFiltersFromWS or wsRefreshApprovals) that snapshots
activeFilters.value into a local const, calls listApprovals using that snapshot
without touching the global loading/error flags, and stores results only if a
per-filter request token/sequence matches the latest token for that snapshot
(ignore stale responses). Update WS paths that currently call fetchApprovals
(the calls around fetchApprovals and the similar usage at the other location) to
call this new helper instead. Add one regression test that issues overlapping
listApprovals calls (simulate out-of-order resolution) and verifies the store
retains the newest response and that foreground errors/loading are not clobbered
by WS refreshes.
---
Outside diff comments:
In `@src/ai_company/api/controllers/approvals.py`:
- Around line 356-430: The approve() and reject() handlers duplicate the same
fetch → save_if_pending() → publish → log → signal flow and exceed 50 lines;
extract that shared flow into a new helper (e.g. _handle_approval_decision) that
accepts (state/app_state, request, approval_id, status: ApprovalStatus,
approved: bool, decision_reason: Optional[str], decided_by: str) and does: fetch
via app_state.approval_store.get, call _resolve_decision, build updated
model_copy with decided_at/decided_by/decision_reason/status, call
save_if_pending, raise NotFoundError/ConflictError with the same messages when
appropriate, call _publish_approval_event, _log_approval_decision, and await
_signal_resume_intent, then return the saved item; then simplify approve() and
reject() to call this helper with ApprovalStatus.APPROVED/REJECTED and the auth
user (from _resolve_decision or pass through) and return
ApiResponse(data=returned_item), preserving existing log keys/API messages and
error behavior.
In `@src/ai_company/engine/plan_execute_loop.py`:
- Around line 80-96: The PlanExecuteLoop class docstring is missing
documentation for the new approval_gate parameter; update the class-level
docstring above class PlanExecuteLoop to describe the approval_gate parameter
(type ApprovalGate | None, purpose and behavior) similar to how ReactLoop
documents its approval_gate, and mention that it can be passed to __init__ to
gate step execution; ensure the parameter name approval_gate appears in the
docstring and briefly state default value (None) and its effect on
execution/re-planning.
In `@src/ai_company/engine/react_loop.py`:
- Around line 66-78: The class docstring's Args section documents
checkpoint_callback but omits the new approval_gate parameter; update the
class-level Google-style docstring to include an Args entry for approval_gate
describing its type (ApprovalGate | None), that it's optional, and what it
controls (used to gate/require approvals between turns passed into __init__ as
approval_gate), referencing the existing CheckpointCallback/ checkpoint_callback
description for parity and keeping wording concise and consistent with the
existing docstring style.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 22002a6e-d586-49b7-ac6e-cbaa69d43879
📒 Files selected for processing (40)
CLAUDE.mdREADME.mddocs/design/engine.mdsrc/ai_company/api/approval_store.pysrc/ai_company/api/controllers/approvals.pysrc/ai_company/api/dto.pysrc/ai_company/api/state.pysrc/ai_company/core/validation.pysrc/ai_company/engine/__init__.pysrc/ai_company/engine/_security_factory.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/approval_gate.pysrc/ai_company/engine/approval_gate_models.pysrc/ai_company/engine/loop_helpers.pysrc/ai_company/engine/plan_execute_loop.pysrc/ai_company/engine/react_loop.pysrc/ai_company/observability/events/approval_gate.pysrc/ai_company/persistence/sqlite/migrations.pysrc/ai_company/security/timeout/park_service.pysrc/ai_company/security/timeout/parked_context.pysrc/ai_company/tools/__init__.pysrc/ai_company/tools/approval_tool.pysrc/ai_company/tools/invoker.pysrc/ai_company/tools/registry.pytests/unit/api/controllers/test_approvals.pytests/unit/api/controllers/test_approvals_helpers.pytests/unit/api/test_approval_store.pytests/unit/api/test_dto.pytests/unit/core/test_validation.pytests/unit/engine/test_approval_gate.pytests/unit/engine/test_approval_gate_models.pytests/unit/engine/test_loop_helpers_approval.pytests/unit/engine/test_security_factory.pytests/unit/observability/test_events.pytests/unit/persistence/sqlite/test_migrations_v6.pytests/unit/persistence/test_migrations_v2.pytests/unit/tools/test_approval_tool.pytests/unit/tools/test_invoker_escalation.pyweb/src/__tests__/stores/approvals.test.tsweb/src/stores/approvals.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Agent
- GitHub Check: Build Backend
- GitHub Check: Greptile Review
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: useexcept A, B:(no parentheses) — ruff enforces this on Python 3.14.
Type hints required on all public functions; mypy strict mode enforced.
Google-style docstrings required on all public classes and functions (enforced by ruff D rules).
Line length: 88 characters (ruff).
Functions must be < 50 lines, files < 800 lines.
Files:
src/ai_company/tools/__init__.pytests/unit/api/controllers/test_approvals_helpers.pytests/unit/engine/test_approval_gate_models.pytests/unit/persistence/sqlite/test_migrations_v6.pytests/unit/api/controllers/test_approvals.pytests/unit/tools/test_invoker_escalation.pysrc/ai_company/engine/_security_factory.pysrc/ai_company/api/state.pysrc/ai_company/engine/approval_gate_models.pytests/unit/api/test_dto.pytests/unit/engine/test_approval_gate.pysrc/ai_company/observability/events/approval_gate.pysrc/ai_company/security/timeout/parked_context.pysrc/ai_company/tools/invoker.pysrc/ai_company/security/timeout/park_service.pytests/unit/core/test_validation.pytests/unit/engine/test_security_factory.pytests/unit/tools/test_approval_tool.pytests/unit/persistence/test_migrations_v2.pysrc/ai_company/persistence/sqlite/migrations.pysrc/ai_company/engine/react_loop.pysrc/ai_company/api/approval_store.pysrc/ai_company/engine/approval_gate.pysrc/ai_company/engine/__init__.pytests/unit/engine/test_loop_helpers_approval.pytests/unit/api/test_approval_store.pysrc/ai_company/tools/approval_tool.pysrc/ai_company/tools/registry.pysrc/ai_company/api/controllers/approvals.pysrc/ai_company/engine/plan_execute_loop.pysrc/ai_company/engine/loop_helpers.pytests/unit/observability/test_events.pysrc/ai_company/api/dto.pysrc/ai_company/core/validation.pysrc/ai_company/engine/agent_engine.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST have:from ai_company.observability import get_loggerthenlogger = get_logger(__name__). Never useimport logging/logging.getLogger()/print()in application code. Variable name: alwayslogger(not_logger, notlog).
Always use event name constants from domain-specific modules underai_company.observability.events(e.g.PROVIDER_CALL_STARTfromevents.provider,BUDGET_RECORD_ADDEDfromevents.budget). Import directly:from ai_company.observability.events.<domain> import EVENT_CONSTANT.
Use structured kwargs for logging: alwayslogger.info(EVENT, key=value)— neverlogger.info('msg %s', val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG logging for object creation, internal flow, entry/exit of key functions.
Files:
src/ai_company/tools/__init__.pysrc/ai_company/engine/_security_factory.pysrc/ai_company/api/state.pysrc/ai_company/engine/approval_gate_models.pysrc/ai_company/observability/events/approval_gate.pysrc/ai_company/security/timeout/parked_context.pysrc/ai_company/tools/invoker.pysrc/ai_company/security/timeout/park_service.pysrc/ai_company/persistence/sqlite/migrations.pysrc/ai_company/engine/react_loop.pysrc/ai_company/api/approval_store.pysrc/ai_company/engine/approval_gate.pysrc/ai_company/engine/__init__.pysrc/ai_company/tools/approval_tool.pysrc/ai_company/tools/registry.pysrc/ai_company/api/controllers/approvals.pysrc/ai_company/engine/plan_execute_loop.pysrc/ai_company/engine/loop_helpers.pysrc/ai_company/api/dto.pysrc/ai_company/core/validation.pysrc/ai_company/engine/agent_engine.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names. Tests must use test-provider, test-small-001, etc.
Files:
src/ai_company/tools/__init__.pysrc/ai_company/engine/_security_factory.pysrc/ai_company/api/state.pysrc/ai_company/engine/approval_gate_models.pysrc/ai_company/observability/events/approval_gate.pysrc/ai_company/security/timeout/parked_context.pysrc/ai_company/tools/invoker.pysrc/ai_company/security/timeout/park_service.pysrc/ai_company/persistence/sqlite/migrations.pysrc/ai_company/engine/react_loop.pysrc/ai_company/api/approval_store.pysrc/ai_company/engine/approval_gate.pysrc/ai_company/engine/__init__.pysrc/ai_company/tools/approval_tool.pysrc/ai_company/tools/registry.pysrc/ai_company/api/controllers/approvals.pysrc/ai_company/engine/plan_execute_loop.pysrc/ai_company/engine/loop_helpers.pysrc/ai_company/api/dto.pysrc/ai_company/core/validation.pysrc/ai_company/engine/agent_engine.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use markers@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowon test cases.
Test timeout: 30 seconds per test.
Prefer@pytest.mark.parametrizefor testing similar cases.
Vendor-agnostic everywhere: NEVER use real vendor names in tests. Tests must use test-provider, test-small-001, etc.
Files:
tests/unit/api/controllers/test_approvals_helpers.pytests/unit/engine/test_approval_gate_models.pytests/unit/persistence/sqlite/test_migrations_v6.pytests/unit/api/controllers/test_approvals.pytests/unit/tools/test_invoker_escalation.pytests/unit/api/test_dto.pytests/unit/engine/test_approval_gate.pytests/unit/core/test_validation.pytests/unit/engine/test_security_factory.pytests/unit/tools/test_approval_tool.pytests/unit/persistence/test_migrations_v2.pytests/unit/engine/test_loop_helpers_approval.pytests/unit/api/test_approval_store.pytests/unit/observability/test_events.py
🧠 Learnings (15)
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Dependency review: license allow-list (permissive only), PR comment summaries.
Applied to files:
README.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Package structure: `src/ai_company/` contains api/, budget/, cli/, communication/, config/, core/, engine/, hr/, memory/, persistence/, observability/, providers/, security/, templates/, tools/ modules organized by domain.
Applied to files:
README.mdCLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Web dashboard structure: Vue 3 + PrimeVue + Tailwind CSS with api/, components/, composables/, router/, stores/, styles/, utils/, views/, __tests__/ organized by feature.
Applied to files:
README.mdCLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
src/ai_company/api/state.pyCLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : Always use event name constants from domain-specific modules under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/ai_company/api/state.pysrc/ai_company/observability/events/approval_gate.pysrc/ai_company/engine/loop_helpers.pytests/unit/observability/test_events.pyCLAUDE.mdsrc/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Web dashboard unit tests: `npm --prefix web run test` (Vitest).
Applied to files:
web/src/__tests__/stores/approvals.test.ts
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to tests/**/*.py : Use markers pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow on test cases.
Applied to files:
tests/unit/observability/test_events.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to tests/**/conftest.py : Use asyncio_mode = 'auto' — no manual pytest.mark.asyncio needed.
Applied to files:
tests/unit/observability/test_events.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.
Applied to files:
tests/unit/observability/test_events.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.
Applied to files:
src/ai_company/api/dto.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/**/*.py : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names. Tests must use test-provider, test-small-001, etc.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : Use structured kwargs for logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
CLAUDE.mdsrc/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : All state transitions must log at INFO.
Applied to files:
CLAUDE.mdsrc/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : DEBUG logging for object creation, internal flow, entry/exit of key functions.
Applied to files:
CLAUDE.md
🧬 Code graph analysis (31)
src/ai_company/tools/__init__.py (1)
src/ai_company/tools/approval_tool.py (1)
RequestHumanApprovalTool(34-263)
tests/unit/api/controllers/test_approvals_helpers.py (4)
src/ai_company/api/controllers/approvals.py (4)
_log_approval_decision(157-174)_publish_approval_event(69-109)_resolve_decision(112-154)_signal_resume_intent(177-212)src/ai_company/api/errors.py (2)
ConflictError(41-47)UnauthorizedError(59-65)src/ai_company/api/state.py (2)
AppState(23-163)approval_gate(139-141)src/ai_company/api/auth/models.py (1)
AuthenticatedUser(68-87)
tests/unit/engine/test_approval_gate_models.py (1)
src/ai_company/engine/approval_gate_models.py (2)
EscalationInfo(14-33)ResumePayload(36-51)
tests/unit/persistence/sqlite/test_migrations_v6.py (1)
tests/unit/persistence/test_migrations_v2.py (1)
test_schema_version_is_seven(31-32)
tests/unit/tools/test_invoker_escalation.py (4)
src/ai_company/providers/models.py (1)
ToolCall(96-119)src/ai_company/security/models.py (2)
SecurityVerdict(35-67)SecurityVerdictType(23-32)src/ai_company/tools/base.py (3)
BaseTool(57-184)ToolExecutionResult(25-54)description(138-140)src/ai_company/tools/registry.py (1)
ToolRegistry(30-126)
src/ai_company/engine/_security_factory.py (6)
src/ai_company/engine/errors.py (1)
ExecutionStateError(17-18)src/ai_company/security/service.py (1)
SecOpsService(74-456)src/ai_company/api/approval_store.py (1)
ApprovalStore(27-188)src/ai_company/security/config.py (1)
SecurityConfig(93-140)src/ai_company/security/protocol.py (1)
SecurityInterceptionStrategy(18-54)src/ai_company/tools/registry.py (2)
ToolRegistry(30-126)all_tools(102-104)
src/ai_company/api/state.py (1)
src/ai_company/engine/approval_gate.py (1)
ApprovalGate(40-322)
src/ai_company/engine/approval_gate_models.py (1)
src/ai_company/tools/base.py (1)
action_type(133-135)
tests/unit/api/test_dto.py (3)
src/ai_company/api/dto.py (3)
ApiResponse(39-57)ApproveRequest(247-256)CreateApprovalRequest(196-244)src/ai_company/tools/base.py (2)
action_type(133-135)description(138-140)src/ai_company/core/enums.py (1)
ApprovalRiskLevel(443-449)
web/src/__tests__/stores/approvals.test.ts (2)
web/src/stores/approvals.ts (1)
useApprovalStore(8-144)web/src/api/types.ts (1)
WsEvent(519-524)
tests/unit/engine/test_approval_gate.py (3)
src/ai_company/engine/approval_gate_models.py (1)
EscalationInfo(14-33)src/ai_company/persistence/repositories.py (1)
ParkedContextRepository(204-272)src/ai_company/security/timeout/park_service.py (3)
ParkService(29-144)park(37-106)resume(108-144)
web/src/stores/approvals.ts (2)
src/ai_company/api/ws_models.py (1)
WsEvent(47-72)web/src/api/types.ts (1)
WsEvent(519-524)
src/ai_company/security/timeout/parked_context.py (2)
src/ai_company/engine/parallel_models.py (1)
task_id(87-89)src/ai_company/tools/base.py (1)
description(138-140)
src/ai_company/tools/invoker.py (5)
src/ai_company/core/enums.py (1)
ApprovalRiskLevel(443-449)src/ai_company/engine/approval_gate_models.py (1)
EscalationInfo(14-33)src/ai_company/tools/registry.py (2)
ToolRegistry(30-126)get(71-96)src/ai_company/tools/base.py (4)
name(123-125)action_type(133-135)ToolExecutionResult(25-54)BaseTool(57-184)src/ai_company/providers/models.py (2)
ToolCall(96-119)ToolResult(122-135)
src/ai_company/security/timeout/park_service.py (1)
src/ai_company/engine/parallel_models.py (1)
task_id(87-89)
tests/unit/core/test_validation.py (1)
src/ai_company/core/validation.py (1)
is_valid_action_type(6-19)
src/ai_company/persistence/sqlite/migrations.py (2)
tests/unit/persistence/sqlite/test_user_repo.py (1)
db(22-28)tests/unit/hr/test_persistence.py (1)
db(29-35)
src/ai_company/engine/react_loop.py (2)
src/ai_company/api/state.py (1)
approval_gate(139-141)src/ai_company/engine/approval_gate.py (1)
ApprovalGate(40-322)
src/ai_company/api/approval_store.py (4)
src/ai_company/core/approval.py (1)
ApprovalItem(24-96)src/ai_company/core/enums.py (1)
ApprovalStatus(434-440)src/ai_company/api/app.py (1)
_on_expire(74-99)src/ai_company/memory/errors.py (1)
MemoryError(13-14)
src/ai_company/engine/approval_gate.py (5)
src/ai_company/persistence/repositories.py (1)
ParkedContextRepository(204-272)src/ai_company/security/timeout/park_service.py (3)
ParkService(29-144)park(37-106)resume(108-144)src/ai_company/security/timeout/parked_context.py (1)
ParkedContext(19-66)src/ai_company/engine/approval_gate_models.py (1)
EscalationInfo(14-33)src/ai_company/engine/context.py (1)
AgentContext(87-307)
src/ai_company/engine/__init__.py (3)
src/ai_company/api/state.py (1)
approval_gate(139-141)src/ai_company/engine/approval_gate.py (1)
ApprovalGate(40-322)src/ai_company/engine/approval_gate_models.py (2)
EscalationInfo(14-33)ResumePayload(36-51)
tests/unit/engine/test_loop_helpers_approval.py (4)
src/ai_company/engine/approval_gate.py (3)
ApprovalGate(40-322)should_park(71-90)park_context(92-122)src/ai_company/engine/loop_helpers.py (1)
execute_tool_calls(240-342)src/ai_company/engine/loop_protocol.py (2)
ExecutionResult(79-140)TerminationReason(28-36)src/ai_company/providers/enums.py (1)
FinishReason(15-22)
tests/unit/api/test_approval_store.py (3)
src/ai_company/api/approval_store.py (5)
ApprovalStore(27-188)add(42-59)save_if_pending(125-149)list_items(75-104)get(61-73)src/ai_company/core/enums.py (2)
ApprovalStatus(434-440)ApprovalRiskLevel(443-449)tests/unit/core/test_approval.py (1)
_now(11-12)
src/ai_company/tools/approval_tool.py (5)
src/ai_company/core/enums.py (1)
ToolCategory(294-308)src/ai_company/core/validation.py (1)
is_valid_action_type(6-19)src/ai_company/tools/base.py (5)
BaseTool(57-184)description(138-140)category(128-130)action_type(133-135)parameters_schema(143-151)src/ai_company/api/approval_store.py (2)
ApprovalStore(27-188)add(42-59)src/ai_company/security/timeout/risk_tier_classifier.py (1)
DefaultRiskTierClassifier(62-101)
src/ai_company/tools/registry.py (1)
src/ai_company/tools/base.py (2)
BaseTool(57-184)name(123-125)
src/ai_company/api/controllers/approvals.py (6)
src/ai_company/api/state.py (1)
approval_gate(139-141)src/ai_company/memory/errors.py (1)
MemoryError(13-14)src/ai_company/api/errors.py (2)
UnauthorizedError(59-65)ConflictError(41-47)src/ai_company/api/approval_store.py (2)
get(61-73)save_if_pending(125-149)src/ai_company/api/auth/models.py (1)
AuthenticatedUser(68-87)src/ai_company/api/ws_models.py (1)
WsEventType(20-44)
src/ai_company/engine/plan_execute_loop.py (2)
src/ai_company/api/state.py (1)
approval_gate(139-141)src/ai_company/engine/approval_gate.py (1)
ApprovalGate(40-322)
src/ai_company/engine/loop_helpers.py (5)
src/ai_company/engine/approval_gate.py (1)
should_park(71-90)src/ai_company/engine/approval_gate_models.py (1)
EscalationInfo(14-33)src/ai_company/tools/invoker.py (1)
pending_escalations(128-136)src/ai_company/engine/loop_protocol.py (3)
TurnRecord(39-76)ExecutionResult(79-140)TerminationReason(28-36)src/ai_company/memory/errors.py (1)
MemoryError(13-14)
src/ai_company/api/dto.py (2)
src/ai_company/core/validation.py (1)
is_valid_action_type(6-19)src/ai_company/engine/parallel_models.py (1)
task_id(87-89)
src/ai_company/core/validation.py (1)
src/ai_company/tools/base.py (1)
action_type(133-135)
src/ai_company/engine/agent_engine.py (4)
src/ai_company/engine/_security_factory.py (1)
make_security_interceptor(39-117)src/ai_company/engine/approval_gate.py (1)
ApprovalGate(40-322)src/ai_company/engine/loop_protocol.py (1)
ExecutionLoop(151-189)src/ai_company/engine/react_loop.py (1)
ReactLoop(57-296)
🪛 LanguageTool
CLAUDE.md
[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...
(EG_NO_COMMA)
🔇 Additional comments (45)
web/src/__tests__/stores/approvals.test.ts (1)
262-399: Nice coverage for WS reconciliation edge cases.The filtered re-fetch, 404-removal, and transient-error cases line up well with the new
approval_id-driven store flow and should catch several regressions here.src/ai_company/tools/registry.py (1)
102-104: Deterministic ordering here looks good.Returning a tuple sorted by tool name keeps downstream registry augmentation predictable.
src/ai_company/persistence/sqlite/migrations.py (1)
26-26: Schema bump and migration registration are wired correctly.This makes the new migration reachable on upgrade and keeps the target version explicit.
Also applies to: 371-371
src/ai_company/security/timeout/parked_context.py (1)
41-43: Optionaltask_idkeeps the model aligned with taskless parking.This preserves validation for non-empty IDs when a value is present while allowing the new taskless flow.
src/ai_company/security/timeout/park_service.py (1)
43-52:park()now matches the taskless-agent model.Allowing
task_id=Nonehere keeps the service API consistent with the nullable parked-context shape.tests/unit/persistence/sqlite/test_migrations_v6.py (1)
20-21: Version expectation is updated correctly.This keeps the schema-version smoke test aligned with the new v7 migration.
src/ai_company/observability/events/approval_gate.py (1)
1-19: Good centralization of approval-gate event names.Keeping the lifecycle events in one domain module will make the new logging paths easier to reuse consistently.
Based on learnings, "Always use event name constants from domain-specific modules under
ai_company.observability.events."tests/unit/observability/test_events.py (2)
140-140: LGTM!The
pytestmarklist correctly combines theunitmarker and timeout, adhering to test guidelines.
181-228: LGTM!The
approval_gatedomain module addition aligns with the new observability events introduced in this PR. The expected set maintains alphabetical consistency.tests/unit/api/controllers/test_approvals.py (1)
19-29: LGTM!The
action_typeformat change to"code:merge"aligns with the newcategory:actionvalidation introduced inCreateApprovalRequest. The test payload defaults are correctly structured.src/ai_company/tools/__init__.py (1)
3-3: LGTM!The
RequestHumanApprovalToolimport and export are correctly added, maintaining alphabetical order in__all__.Also applies to: 68-68
docs/design/engine.md (2)
450-454: LGTM!The documentation clearly explains the approval gate integration point within the execution loop, describing how escalations trigger context parking via
ParkServiceandParkedContextRepository.
465-468: LGTM!The
PARKEDtermination reason is correctly documented alongside other reasons that preserve task state, with a clear explanation of the approval-timeout policy suspension semantics.src/ai_company/api/approval_store.py (1)
125-149: LGTM!The
save_if_pendingmethod correctly implements a race-free conditional update pattern. The lazy expiration check before status comparison prevents TOCTOU issues, and the method properly returnsNonefor all failure cases (not found, expired, or no longer pending).tests/unit/core/test_validation.py (1)
1-41: LGTM!The test file follows all guidelines: proper markers (
unit,timeout), parametrized tests for both valid and invalid cases, and comprehensive edge case coverage including empty strings, whitespace variations, and multiple colons.src/ai_company/core/validation.py (1)
1-19: LGTM!Clean implementation of the
category:actionformat validator. The function is well-documented with a Google-style docstring, has proper type hints, and uses a clear validation approach withsplitandstrip.tests/unit/api/test_dto.py (5)
5-5: LGTM!Import correctly updated to include the new
ApproveRequestDTO.
40-41: LGTM!The
action_typevalues correctly updated to use thecategory:actionformat throughout the metadata tests.Also applies to: 51-51, 62-62, 71-71
80-118: LGTM!Comprehensive parametrized tests for
action_typevalidation covering both valid formats and invalid edge cases (missing colon, empty parts, whitespace, multiple colons).
121-160: LGTM!TTL validation tests properly cover boundary conditions: below minimum (30s rejected), above maximum (700000 rejected), within bounds (3600 accepted), and default None.
163-175: LGTM!
ApproveRequestDTO tests correctly validate the optional comment field with proper length constraints.src/ai_company/engine/react_loop.py (1)
244-251: LGTM!The approval gate is correctly forwarded to
execute_tool_calls, maintaining consistency with the pattern established for checkpoint callbacks. The keyword-only parameter ensures explicit usage.tests/unit/engine/test_approval_gate_models.py (1)
1-9: LGTM!Test setup correctly applies
@pytest.mark.unitand timeout markers at module level, following testing guidelines.src/ai_company/engine/__init__.py (2)
9-10: LGTM!New imports are correctly placed and follow the alphabetical ordering convention used throughout the file.
213-295: LGTM!The
__all__exports maintain proper alphabetical ordering (ApprovalGate,EscalationInfo,ResumePayload), making the new public API surface discoverable.tests/unit/engine/test_security_factory.py (3)
1-13: LGTM!Test file properly applies module-level markers and follows testing conventions.
34-94: LGTM!
TestMakeSecurityInterceptorprovides good coverage of the factory's key behaviors: returningNonewhen disabled, raising errors on configuration mismatches witheffective_autonomy, and returning interceptors when properly configured.
97-122: LGTM!
TestRegistryWithApprovalToolcorrectly verifies identity preservation when no store is provided and registry replacement when a store is configured.tests/unit/api/test_approval_store.py (3)
177-239: LGTM!
TestSaveIfPendingcomprehensively covers the optimistic concurrency guard: successful save when pending, rejection when already decided, not found, and expired. The direct_itemsmanipulation on line 234 is appropriate for testing lazy expiration without triggering premature checks.
242-301: LGTM!
TestApprovalStoreFiltersprovides good coverage of combined filter scenarios, including edge cases where no matches are found.
304-374: LGTM!
TestOnExpireCallbackproperly validates callback invocation, exception handling that doesn't prevent expiration, and correct status transitions in listing operations.tests/unit/api/controllers/test_approvals_helpers.py (4)
1-18: LGTM!Test file correctly sets up markers and imports the private helper functions under test.
51-79: LGTM!
TestResolveDecisioncovers the key validation paths: conflict on non-pending status, unauthorized on missing/wrong user type, and successful extraction of the authenticated user.
101-135: LGTM!
TestSignalResumeIntentcorrectly validates the signaling stub behavior: no-op without gate, logging with gate present for both approval and rejection scenarios.
138-187: LGTM!
TestPublishApprovalEventproperly tests the best-effort publishing pattern: graceful handling when no plugin exists, successful publishing when available, and exception resilience.src/ai_company/engine/plan_execute_loop.py (1)
746-771: LGTM!The conversion from
@staticmethodto instance method is necessary to accessself._approval_gate, and the approval gate is correctly forwarded toexecute_tool_calls. This maintains consistency with theReactLoopintegration.src/ai_company/tools/invoker.py (4)
120-137: LGTM!The
pending_escalationsproperty is well-documented and correctly exposes an immutable tuple. Clearing at the start of each invoke cycle ensures clean state.
237-251: LGTM!Escalation tracking for security ESCALATE verdicts is correctly implemented. The escalation is only tracked when
approval_idis non-None, preventing incomplete escalation records.
364-421: LGTM!The refactoring to separate
invoke()and_invoke_single()cleanly handles escalation clearing at the appropriate level—once for single invokes and once at batch level forinvoke_all.
782-788: LGTM!Sorting escalations by original tool-call order ensures deterministic behavior when multiple tools trigger escalations in a batch. The fallback to
len(calls)for unknown IDs is defensive.src/ai_company/api/state.py (1)
39-41: ApprovalGate wiring inAppStateis coherent and safe.
__slots__, constructor injection, and the nullable property are aligned and preserve backward compatibility for deployments where the gate is not configured.Also applies to: 61-67, 138-141
src/ai_company/engine/approval_gate_models.py (1)
14-33: Model design is solid for approval-gate data flow.Both models are immutable, strongly typed, and use
NotBlankStrappropriately for identifier-like fields.Also applies to: 36-51
src/ai_company/api/dto.py (1)
217-228: Validation tightening here looks correct and well-scoped.The
action_typeformat validator,ttl_secondscap, metadata bounds checks, and optional approval comment constraints are consistent with the API hardening goals.Also applies to: 229-244, 256-256
tests/unit/engine/test_loop_helpers_approval.py (1)
107-139: Good coverage for parked/error termination behavior.These cases validate the critical approval-gate branch outcomes and metadata propagation paths in
execute_tool_calls.Also applies to: 171-206, 242-273
tests/unit/engine/test_approval_gate.py (1)
188-292: ApprovalGate unit coverage is comprehensive for core lifecycle and safety paths.The suite meaningfully verifies resume cleanup semantics and untrusted reason-string handling in resume message construction.
Also applies to: 294-354
| - **Required**: `mem0ai` (Mem0 memory backend — the default and currently only backend) | ||
| - **Install**: `uv sync` installs everything (dev group is default) | ||
| - **Web dashboard**: Node.js 20+, dependencies in `web/package.json` (Vue 3, PrimeVue, Tailwind CSS, Pinia, VueFlow, ECharts, Axios, vue-draggable-plus, Vitest, ESLint, vue-tsc) | ||
| - **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `API_ROUTE_NOT_FOUND` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`, `SECURITY_EVALUATE_START` from `events.security`, `HR_HIRING_REQUEST_CREATED` from `events.hr`, `PERF_METRIC_RECORDED` from `events.performance`, `TRUST_EVALUATE_START` from `events.trust`, `PROMOTION_EVALUATE_START` from `events.promotion`, `PROMPT_BUILD_START` from `events.prompt`, `MEMORY_RETRIEVAL_START` from `events.memory`, `MEMORY_BACKEND_CONNECTED` from `events.memory`, `MEMORY_ENTRY_STORED` from `events.memory`, `MEMORY_BACKEND_SYSTEM_ERROR` from `events.memory`, `AUTONOMY_ACTION_AUTO_APPROVED` from `events.autonomy`, `TIMEOUT_POLICY_EVALUATED` from `events.timeout`, `PERSISTENCE_AUDIT_ENTRY_SAVED` from `events.persistence`, `TASK_ENGINE_STARTED` from `events.task_engine`, `COORDINATION_STARTED` from `events.coordination`, `COMMUNICATION_DISPATCH_START` from `events.communication`, `COMPANY_STARTED` from `events.company`, `CONFIG_LOADED` from `events.config`, `CORRELATION_ID_CREATED` from `events.correlation`, `DECOMPOSITION_STARTED` from `events.decomposition`, `DELEGATION_STARTED` from `events.delegation`, `EXECUTION_LOOP_START` from `events.execution`, `CHECKPOINT_SAVED` from `events.checkpoint`, `PERSISTENCE_CHECKPOINT_SAVED` from `events.persistence`, `GIT_OPERATION_START` from `events.git`, `PARALLEL_GROUP_START` from `events.parallel`, `PERSONALITY_LOADED` from `events.personality`, `QUOTA_CHECKED` from `events.quota`, `ROLE_ASSIGNED` from `events.role`, `ROUTING_STARTED` from `events.routing`, `SANDBOX_EXECUTE_START` from `events.sandbox`, `TASK_CREATED` from `events.task`, `TASK_ASSIGNMENT_STARTED` from `events.task_assignment`, `TASK_ROUTING_STARTED` from `events.task_routing`, `TEMPLATE_LOADED` from `events.template`, `TOOL_INVOKE_START` from `events.tool`, `WORKSPACE_CREATED` from `events.workspace`, `APPROVAL_GATE_ESCALATION_DETECTED` from `events.approval_gate`, `APPROVAL_GATE_ESCALATION_FAILED` from `events.approval_gate`, `APPROVAL_GATE_INITIALIZED` from `events.approval_gate`, `APPROVAL_GATE_RISK_CLASSIFIED` from `events.approval_gate`, `APPROVAL_GATE_RISK_CLASSIFY_FAILED` from `events.approval_gate`, `APPROVAL_GATE_CONTEXT_PARKED` from `events.approval_gate`, `APPROVAL_GATE_CONTEXT_PARK_FAILED` from `events.approval_gate`, `APPROVAL_GATE_PARK_TASKLESS` from `events.approval_gate`, `APPROVAL_GATE_RESUME_STARTED` from `events.approval_gate`, `APPROVAL_GATE_CONTEXT_RESUMED` from `events.approval_gate`, `APPROVAL_GATE_RESUME_FAILED` from `events.approval_gate`, `APPROVAL_GATE_RESUME_DELETE_FAILED` from `events.approval_gate`, `APPROVAL_GATE_RESUME_TRIGGERED` from `events.approval_gate`, `APPROVAL_GATE_NO_PARKED_CONTEXT` from `events.approval_gate`, `APPROVAL_GATE_LOOP_WIRING_WARNING` from `events.approval_gate`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT` |
There was a problem hiding this comment.
Fix the punctuation typo in the logging guidance line.
Line 154 has a comma omission in the long sentence (as flagged by LanguageTool). Please add the missing comma to keep docs lint/style clean.
🧰 Tools
🪛 LanguageTool
[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...
(EG_NO_COMMA)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@CLAUDE.md` at line 154, Fix the punctuation typo in the Event names guidance
line in CLAUDE.md by adding the missing comma in the final clause; edit the
sentence that currently ends with "Import directly: `from
ai_company.observability.events.<domain> import EVENT_CONSTANT`" so it includes
the needed comma (e.g., "Import directly, `from
ai_company.observability.events.<domain> import EVENT_CONSTANT`") to satisfy
lint/style.
| try: | ||
| self._on_expire(expired) | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception: | ||
| logger.exception( | ||
| API_APPROVAL_EXPIRED, | ||
| approval_id=item.id, | ||
| note="on_expire callback failed", | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if MemoryError is imported or shadowed in this file
rg -n "import.*MemoryError|from.*import.*MemoryError" src/ai_company/api/approval_store.py
# Check if there's a pattern of catching built-in MemoryError elsewhere
rg -nC2 "except.*MemoryError" src/ai_company --type py | head -40Repository: Aureliolo/synthorg
Length of output: 2521
🏁 Script executed:
#!/bin/bash
# Get the imports at the top of approval_store.py
head -50 src/ai_company/api/approval_store.py
# Check if ai_company.memory.errors is imported anywhere
rg "from ai_company.memory.errors|import.*MemoryError" src/ai_company/api/approval_store.py
# Get the full context of the method containing the suspect code
sed -n '160,190p' src/ai_company/api/approval_store.pyRepository: Aureliolo/synthorg
Length of output: 2396
🏁 Script executed:
#!/bin/bash
# Verify if ai_company.memory.errors.MemoryError exists in codebase
find src -name "*.py" -exec grep -l "class MemoryError" {} \;
# Check ai_company.memory.errors module
fd "memory" src/ai_company --type dRepository: Aureliolo/synthorg
Length of output: 116
🏁 Script executed:
#!/bin/bash
# Check if MemoryError is actually defined in ai_company.memory.errors
cat src/ai_company/memory/errors.pyRepository: Aureliolo/synthorg
Length of output: 1471
🏁 Script executed:
#!/bin/bash
# Check if approval_store.py imports from ai_company.memory.errors
rg "from ai_company.memory|import.*MemoryError" src/ai_company/api/approval_store.py
# Check the complete import section of approval_store.py
sed -n '1,30p' src/ai_company/api/approval_store.pyRepository: Aureliolo/synthorg
Length of output: 877
Add logging before re-raising MemoryError and RecursionError errors.
The exception handler re-raises MemoryError and RecursionError without logging context first. Per guidelines, all error paths must log at WARNING or ERROR with context before raising.
try:
self._on_expire(expired)
except MemoryError, RecursionError:
raise
except Exception:
logger.exception(
API_APPROVAL_EXPIRED,
approval_id=item.id,
note="on_expire callback failed",
)
Log before the re-raise on line 179:
except MemoryError, RecursionError:
logger.error(
API_APPROVAL_EXPIRED,
approval_id=item.id,
note="on_expire callback failed - non-recoverable error",
)
raise🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/api/approval_store.py` around lines 177 - 186, The except
block that re-raises MemoryError and RecursionError should log context before
re-raising: in the try around self._on_expire(expired) catch the non-recoverable
errors as a tuple (except (MemoryError, RecursionError):), call
logger.error(API_APPROVAL_EXPIRED, approval_id=item.id, note="on_expire callback
failed - non-recoverable error") and then re-raise; keep the existing broad
except Exception block that logs via logger.exception unchanged.
| async def _signal_resume_intent( | ||
| app_state: AppState, | ||
| approval_id: str, | ||
| *, | ||
| approved: bool, | ||
| decided_by: str, | ||
| decision_reason: str | None = None, | ||
| ) -> None: | ||
| """Log that a decision was made so a scheduler can resume the agent. | ||
|
|
||
| This is intentionally a **signalling-only stub**. It does NOT call | ||
| ``ApprovalGate.resume_context()`` or re-enqueue the parked agent — | ||
| that is the responsibility of a future scheduling component that | ||
| will observe status changes (via log events or store polling) and | ||
| perform the actual resume. | ||
|
|
||
| .. todo:: Wire to a real scheduler once one exists (see §12.4). | ||
|
|
||
| Args: | ||
| app_state: Application state containing the approval gate. | ||
| approval_id: The approval item identifier. | ||
| approved: Whether the action was approved. | ||
| decided_by: Who made the decision. | ||
| decision_reason: Optional reason for the decision. | ||
| """ | ||
| approval_gate = app_state.approval_gate | ||
| if approval_gate is None: | ||
| return | ||
|
|
||
| logger.info( | ||
| APPROVAL_GATE_RESUME_TRIGGERED, | ||
| approval_id=approval_id, | ||
| approved=approved, | ||
| decided_by=decided_by, | ||
| has_reason=decision_reason is not None, | ||
| ) | ||
|
|
||
|
|
There was a problem hiding this comment.
Wire an actual resume handoff here.
_signal_resume_intent() is the only new post-decision hook in this PR, but it never calls ApprovalGate.resume_context() or hands the decision to any scheduler interface—it only logs. That means the code added here still has no concrete path from an approved/rejected item back to runnable agent state, so the approval flow stops at PARKED.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/api/controllers/approvals.py` around lines 177 - 214,
_signal_resume_intent currently only logs decisions; update it to perform the
actual handoff by calling and awaiting ApprovalGate.resume_context on
app_state.approval_gate (if not None), passing the approval_id and a payload
containing approved, decided_by and decision_reason so the gate/scheduler can
resume the parked agent; ensure you handle/await the async resume_context call,
catch and log any exceptions (using logger.error) so failures don’t silence the
resume attempt, and preserve the existing log entry for telemetry.
| def make_security_interceptor( | ||
| security_config: SecurityConfig | None, | ||
| audit_log: AuditLog, | ||
| *, | ||
| approval_store: ApprovalStore | None = None, | ||
| effective_autonomy: EffectiveAutonomy | None = None, | ||
| ) -> SecurityInterceptionStrategy | None: | ||
| """Build the SecOps security interceptor if configured. | ||
|
|
||
| Args: | ||
| security_config: Security configuration, or ``None`` to skip. | ||
| audit_log: Audit log for security events. | ||
| approval_store: Optional approval store for escalation items. | ||
| effective_autonomy: Optional autonomy level override. | ||
|
|
||
| Returns: | ||
| A ``SecOpsService`` interceptor, or ``None`` if security is | ||
| disabled or not configured. | ||
|
|
||
| Raises: | ||
| ExecutionStateError: If *effective_autonomy* is provided but | ||
| no SecurityConfig is configured. | ||
| """ | ||
| if security_config is None: | ||
| if effective_autonomy is not None: | ||
| msg = ( | ||
| "effective_autonomy cannot be enforced without " | ||
| "SecurityConfig — configure security or remove autonomy" | ||
| ) | ||
| logger.error(SECURITY_DISABLED, note=msg) | ||
| raise ExecutionStateError(msg) | ||
| logger.warning( | ||
| SECURITY_DISABLED, | ||
| note="No SecurityConfig provided — all security checks skipped", | ||
| ) | ||
| return None | ||
| if not security_config.enabled: | ||
| if effective_autonomy is not None: | ||
| msg = "effective_autonomy cannot be enforced when security is disabled" | ||
| logger.error(SECURITY_DISABLED, note=msg) | ||
| raise ExecutionStateError(msg) | ||
| return None | ||
|
|
||
| cfg = security_config | ||
| re_cfg = cfg.rule_engine | ||
| policy_validator = PolicyValidator( | ||
| hard_deny_action_types=frozenset(cfg.hard_deny_action_types), | ||
| auto_approve_action_types=frozenset(cfg.auto_approve_action_types), | ||
| ) | ||
| detectors: list[ | ||
| PolicyValidator | ||
| | CredentialDetector | ||
| | PathTraversalDetector | ||
| | DestructiveOpDetector | ||
| | DataLeakDetector | ||
| ] = [policy_validator] | ||
| if re_cfg.credential_patterns_enabled: | ||
| detectors.append(CredentialDetector()) | ||
| if re_cfg.path_traversal_detection_enabled: | ||
| detectors.append(PathTraversalDetector()) | ||
| if re_cfg.destructive_op_detection_enabled: | ||
| detectors.append(DestructiveOpDetector()) | ||
| if re_cfg.data_leak_detection_enabled: | ||
| detectors.append(DataLeakDetector()) | ||
|
|
||
| rule_engine = RuleEngine( | ||
| rules=tuple(detectors), | ||
| risk_classifier=RiskClassifier(), | ||
| config=re_cfg, | ||
| ) | ||
| return SecOpsService( | ||
| config=cfg, | ||
| rule_engine=rule_engine, | ||
| audit_log=audit_log, | ||
| output_scanner=OutputScanner(), | ||
| approval_store=approval_store, | ||
| effective_autonomy=effective_autonomy, | ||
| risk_classifier=DefaultRiskTierClassifier(), | ||
| ) |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Split make_security_interceptor() into smaller helpers.
This new function mixes autonomy validation, detector selection, and service construction across roughly 80 lines. Extracting those phases will satisfy the repo limit and make the wiring easier to test in isolation. As per coding guidelines, "Functions must be < 50 lines, files < 800 lines."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/engine/_security_factory.py` around lines 39 - 117, The
make_security_interceptor function is too long and mixes autonomy validation,
detector selection, and service construction; split it into small helpers (e.g.,
validate_autonomy_with_config(security_config, effective_autonomy),
build_detectors(rule_engine_cfg) returning the list/tuple of detector instances
including PolicyValidator, CredentialDetector, PathTraversalDetector,
DestructiveOpDetector, DataLeakDetector, and build_secops_service(cfg,
rule_engine, audit_log, approval_store, effective_autonomy) that constructs and
returns the SecOpsService) then refactor make_security_interceptor to call these
helpers and return the SecOpsService; keep existing behavior and exceptions
(ExecutionStateError, logger messages) and use the same symbols RuleEngine,
RiskClassifier, OutputScanner, DefaultRiskTierClassifier to assemble the final
service.
| def _track_parking_metadata( | ||
| self, | ||
| result: ToolExecutionResult, | ||
| tool: BaseTool, | ||
| tool_call: ToolCall, | ||
| ) -> ToolResult | None: | ||
| """Detect ``requires_parking`` metadata and add to escalations. | ||
|
|
||
| Tools like ``request_human_approval`` signal parking via | ||
| ``ToolExecutionResult.metadata``. Only tracks when both | ||
| ``requires_parking=True`` and ``approval_id`` are present. | ||
|
|
||
| Returns: | ||
| ``None`` on success, or an error ``ToolResult`` if tracking | ||
| fails — ensures the agent does not silently bypass the | ||
| approval gate. | ||
| """ | ||
| if result.metadata.get("requires_parking") is not True: | ||
| return None | ||
| if not result.metadata.get("approval_id"): | ||
| return None | ||
| try: | ||
| from ai_company.engine.approval_gate_models import ( # noqa: PLC0415 | ||
| EscalationInfo as _EscalationInfo, | ||
| ) | ||
|
|
||
| self._pending_escalations.append( | ||
| _EscalationInfo( | ||
| approval_id=str(result.metadata["approval_id"]), | ||
| tool_call_id=tool_call.id, | ||
| tool_name=tool.name, | ||
| action_type=tool.action_type, | ||
| risk_level=ApprovalRiskLevel( | ||
| result.metadata.get("risk_level", "high"), | ||
| ), | ||
| reason="Agent requested human approval", | ||
| ), | ||
| ) | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception as exc: | ||
| logger.exception( | ||
| TOOL_INVOKE_EXECUTION_ERROR, | ||
| tool_call_id=tool_call.id, | ||
| tool_name=tool.name, | ||
| note="Failed to track parking metadata", | ||
| ) | ||
| return ToolResult( | ||
| tool_call_id=tool_call.id, | ||
| content=f"Approval escalation tracking failed: {exc}", | ||
| is_error=True, | ||
| ) | ||
| return None |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Verify risk_level default value matches expected enum member.
The default "high" for risk_level (line 642) is valid per ApprovalRiskLevel enum, but consider whether tools that don't specify a risk level should default to HIGH or if this should be explicitly required in the metadata.
Also, the EscalationInfo import is duplicated (lines 238-240 and 631-633). Consider consolidating to a single helper or module-level lazy import pattern.
♻️ Optional: Consolidate lazy import
# At module level or as a helper
def _get_escalation_info_class() -> type[EscalationInfo]:
from ai_company.engine.approval_gate_models import EscalationInfo
return EscalationInfo| async def test_not_populated_on_escalate_without_approval_id(self) -> None: | ||
| interceptor = AsyncMock() | ||
| interceptor.evaluate_pre_tool = AsyncMock( | ||
| return_value=_verdict( | ||
| SecurityVerdictType.ESCALATE, | ||
| reason="Needs approval", | ||
| ), | ||
| ) | ||
| invoker = _make_invoker(_StubTool(), security_interceptor=interceptor) | ||
| await invoker.invoke(_make_tool_call()) | ||
| assert invoker.pending_escalations == () | ||
|
|
||
| async def test_not_populated_on_allow_verdict(self) -> None: | ||
| interceptor = AsyncMock() | ||
| interceptor.evaluate_pre_tool = AsyncMock( | ||
| return_value=_verdict( | ||
| SecurityVerdictType.ALLOW, | ||
| reason="OK", | ||
| risk_level=ApprovalRiskLevel.LOW, | ||
| ), | ||
| ) | ||
| interceptor.scan_output = AsyncMock( | ||
| return_value=AsyncMock(has_sensitive_data=False), | ||
| ) | ||
| invoker = _make_invoker(_StubTool(), security_interceptor=interceptor) | ||
| await invoker.invoke(_make_tool_call()) | ||
| assert invoker.pending_escalations == () | ||
|
|
||
| async def test_not_populated_on_deny_verdict(self) -> None: | ||
| interceptor = AsyncMock() | ||
| interceptor.evaluate_pre_tool = AsyncMock( | ||
| return_value=_verdict( | ||
| SecurityVerdictType.DENY, | ||
| reason="Blocked", | ||
| risk_level=ApprovalRiskLevel.CRITICAL, | ||
| ), | ||
| ) | ||
| invoker = _make_invoker(_StubTool(), security_interceptor=interceptor) | ||
| await invoker.invoke(_make_tool_call()) | ||
| assert invoker.pending_escalations == () |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider parameterizing verdict non-escalation cases.
Lines 136–175 repeat the same arrange/act/assert flow with only verdict inputs changed. Converting these to a single @pytest.mark.parametrize test will reduce maintenance overhead while keeping coverage identical.
As per coding guidelines: "Prefer @pytest.mark.parametrize for testing similar cases."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/tools/test_invoker_escalation.py` around lines 136 - 175, Replace
the three nearly identical tests
(test_not_populated_on_escalate_without_approval_id,
test_not_populated_on_allow_verdict, test_not_populated_on_deny_verdict) with a
single parametrized pytest test that iterates different verdict inputs; for each
parameter set construct the AsyncMock interceptor (using _verdict with
SecurityVerdictType values and ApprovalRiskLevel where needed), set scan_output
only for the ALLOW case, create the invoker via _make_invoker(_StubTool(),
security_interceptor=interceptor), call await invoker.invoke(_make_tool_call()),
and assert invoker.pending_escalations == (); use pytest.mark.parametrize to
supply the verdict factory args so coverage remains identical while removing
duplicated arrange/act/assert code.
| if (activeFilters.value) { | ||
| // Filters active — re-fetch the filtered query to stay consistent | ||
| await fetchApprovals(activeFilters.value) |
There was a problem hiding this comment.
Use a dedicated, sequenced refresh path for WS-driven filtered updates.
Line 80 and Line 104 reuse fetchApprovals() from detached WS tasks. That method mutates shared loading/error state and has no last-response-wins guard, so a burst of approval events can both hide a foreground action error and let an older filtered snapshot overwrite a newer one. Please route WS refreshes through a separate helper that snapshots the active filters and ignores stale responses; add one regression test with overlapping listApprovals calls that resolve out of order.
💡 Possible fix
+ let wsRefreshSeq = 0
+
+ async function refreshApprovalsFromWs(): Promise<void> {
+ if (!activeFilters.value) return
+
+ const seq = ++wsRefreshSeq
+ const filters = { ...activeFilters.value }
+
+ try {
+ const result = await approvalsApi.listApprovals(filters)
+ if (seq !== wsRefreshSeq) return
+
+ approvals.value = result.data
+ total.value = result.total
+ } catch (err) {
+ if (seq !== wsRefreshSeq) return
+ console.warn('Failed to refresh approvals:', err)
+ }
+ }
+
function handleWsEvent(event: WsEvent): void {
const payload = event.payload as Record<string, unknown> | null
if (!payload || typeof payload !== 'object') return
const approvalId = payload.approval_id
if (typeof approvalId !== 'string' || !approvalId) return
void (async () => {
try {
switch (event.event_type) {
case 'approval.submitted':
if (!approvals.value.some((a) => a.id === approvalId)) {
if (activeFilters.value) {
- await fetchApprovals(activeFilters.value)
+ await refreshApprovalsFromWs()
} else {
// ...
}
}
break
case 'approval.approved':
case 'approval.rejected':
case 'approval.expired':
if (activeFilters.value) {
- await fetchApprovals(activeFilters.value)
+ await refreshApprovalsFromWs()
} else {
// ...
}
break
}
} catch (err) {
console.warn('Unexpected error in WS event handler:', err)
}
})()
}Also applies to: 102-104
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/src/stores/approvals.ts` around lines 78 - 80, WS-driven refreshes
currently call fetchApprovals(activeFilters.value) which mutates shared
loading/error state and lacks a last-response-wins guard; create a dedicated
helper (e.g., refreshApprovalsForFiltersFromWS or wsRefreshApprovals) that
snapshots activeFilters.value into a local const, calls listApprovals using that
snapshot without touching the global loading/error flags, and stores results
only if a per-filter request token/sequence matches the latest token for that
snapshot (ignore stale responses). Update WS paths that currently call
fetchApprovals (the calls around fetchApprovals and the similar usage at the
other location) to call this new helper instead. Add one regression test that
issues overlapping listApprovals calls (simulate out-of-order resolution) and
verifies the store retains the newest response and that foreground
errors/loading are not clobbered by WS refreshes.
- Restore 89 lines accidentally deleted from CLAUDE.md during rebase (Resilience, Testing, Git, CI, Dependencies sections) - Make V7 migration crash-safe: _apply_v7 checks intermediate states (parked_contexts, _old, _new) and only runs missing steps - Return error ToolResult when requires_parking=True but approval_id is missing — prevents silent gate bypass - Restore V3 schema to original NOT NULL task_id (V7 handles change) - Rename test_v6 → test_v7 and fix action_type format in store tests
- Update README status to note deferred resume scheduler - Validate approval_tool arguments are strings (not just present) - Add approval_gate docstrings to ReactLoop and PlanExecuteLoop - Add ResumePayload empty-string rejection tests - Add module-level pytestmark timeout to migration tests
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
CLAUDE.md (1)
154-154:⚠️ Potential issue | 🟡 MinorFix the punctuation: add comma after "e.g."
The style guide expects a comma after "e.g." in this context. This was flagged by LanguageTool and noted in a prior review.
✏️ Proposed fix
-**Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, ... +**Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`, ...🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CLAUDE.md` at line 154, The sentence listing event constants uses "e.g." without the required comma; update the sentence that begins with "**Event names**: always use constants..." to insert a comma after "e.g." (so it reads "e.g., `PROVIDER_CALL_START`...") and ensure the same punctuation is applied to any other occurrences of "e.g." in that paragraph referencing constants like PROVIDER_CALL_START, BUDGET_RECORD_ADDED, CFO_ANOMALY_DETECTED, etc.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/persistence/sqlite/migrations.py`:
- Around line 375-379: The current migration checks `has_original and not
has_new` before running `_V7_COPY_ROWS`, which can skip copying if
`parked_contexts_new` was created then the process crashed; instead, change the
logic to always execute `_V7_COPY_ROWS` whenever a source exists (i.e., if
`has_original` run `_V7_COPY_ROWS` with source="parked_contexts"; if `has_old`
run `_V7_COPY_ROWS` with source="parked_contexts_old`) regardless of `has_new`,
relying on `_V7_COPY_ROWS`'s INSERT OR IGNORE to make the operation idempotent
and crash-safe; update the conditionals around `has_original`, `has_old`,
`has_new` in the migration function in migrations.py accordingly.
In `@src/ai_company/tools/approval_tool.py`:
- Around line 127-138: The current validation in the approval tool only checks
types for action_type, title, and description but allows empty or
whitespace-only strings; update the check in the function containing the
existing isinstance guards (referencing action_type, title, description, and the
returned ToolExecutionResult) to ensure each is a str and that each stripped
string has length > 0 (e.g., verify action_type.strip(), title.strip(),
description.strip() are non-empty) and return the same ToolExecutionResult error
when any fails.
In `@tests/unit/api/test_approval_store.py`:
- Around line 179-374: Add a 30-second pytest timeout marker to each newly added
test class to satisfy the per-test timeout rule: annotate the classes
TestSaveIfPending, TestApprovalStoreFilters, and TestOnExpireCallback with
pytest.mark.timeout(30) (as a class decorator) so every test method inside each
class gets the 30s guard.
In `@tests/unit/engine/test_approval_gate_models.py`:
- Around line 145-152: Add a sibling unit test to ensure an empty string for
decision_reason is rejected the same as blank space: update tests in
test_approval_gate_models.py by adding a test (e.g.,
test_empty_decision_reason_rejected) that constructs
ResumePayload(approval_id="approval-1", approved=False, decided_by="admin",
decision_reason="") and asserts it raises ValidationError, matching the behavior
enforced by NotBlankStr and the existing test_blank_decision_reason_rejected.
---
Duplicate comments:
In `@CLAUDE.md`:
- Line 154: The sentence listing event constants uses "e.g." without the
required comma; update the sentence that begins with "**Event names**: always
use constants..." to insert a comma after "e.g." (so it reads "e.g.,
`PROVIDER_CALL_START`...") and ensure the same punctuation is applied to any
other occurrences of "e.g." in that paragraph referencing constants like
PROVIDER_CALL_START, BUDGET_RECORD_ADDED, CFO_ANOMALY_DETECTED, etc.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a56c33bb-d097-45bd-a066-1d5b2e581be1
📒 Files selected for processing (10)
CLAUDE.mdREADME.mdsrc/ai_company/engine/plan_execute_loop.pysrc/ai_company/engine/react_loop.pysrc/ai_company/persistence/sqlite/migrations.pysrc/ai_company/tools/approval_tool.pysrc/ai_company/tools/invoker.pytests/unit/api/test_approval_store.pytests/unit/engine/test_approval_gate_models.pytests/unit/persistence/test_migrations_v2.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Greptile Review
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: useexcept A, B:(no parentheses) — ruff enforces this on Python 3.14.
Type hints required on all public functions; mypy strict mode enforced.
Google-style docstrings required on all public classes and functions (enforced by ruff D rules).
Line length: 88 characters (ruff).
Functions must be < 50 lines, files < 800 lines.
Files:
src/ai_company/tools/invoker.pysrc/ai_company/engine/react_loop.pysrc/ai_company/persistence/sqlite/migrations.pysrc/ai_company/engine/plan_execute_loop.pytests/unit/engine/test_approval_gate_models.pytests/unit/api/test_approval_store.pytests/unit/persistence/test_migrations_v2.pysrc/ai_company/tools/approval_tool.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST have:from ai_company.observability import get_loggerthenlogger = get_logger(__name__). Never useimport logging/logging.getLogger()/print()in application code. Variable name: alwayslogger(not_logger, notlog).
Always use event name constants from domain-specific modules underai_company.observability.events(e.g.PROVIDER_CALL_STARTfromevents.provider,BUDGET_RECORD_ADDEDfromevents.budget). Import directly:from ai_company.observability.events.<domain> import EVENT_CONSTANT.
Use structured kwargs for logging: alwayslogger.info(EVENT, key=value)— neverlogger.info('msg %s', val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG logging for object creation, internal flow, entry/exit of key functions.
Files:
src/ai_company/tools/invoker.pysrc/ai_company/engine/react_loop.pysrc/ai_company/persistence/sqlite/migrations.pysrc/ai_company/engine/plan_execute_loop.pysrc/ai_company/tools/approval_tool.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names. Tests must use test-provider, test-small-001, etc.
Files:
src/ai_company/tools/invoker.pysrc/ai_company/engine/react_loop.pysrc/ai_company/persistence/sqlite/migrations.pysrc/ai_company/engine/plan_execute_loop.pysrc/ai_company/tools/approval_tool.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use markers@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowon test cases.
Test timeout: 30 seconds per test.
Prefer@pytest.mark.parametrizefor testing similar cases.
Vendor-agnostic everywhere: NEVER use real vendor names in tests. Tests must use test-provider, test-small-001, etc.
Files:
tests/unit/engine/test_approval_gate_models.pytests/unit/api/test_approval_store.pytests/unit/persistence/test_migrations_v2.py
🧠 Learnings (12)
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Dependency review: license allow-list (permissive only), PR comment summaries.
Applied to files:
README.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Fix everything valid — never skip: When review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes), fix them all. No deferring, no 'out of scope' skipping.
Applied to files:
README.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Package structure: `src/ai_company/` contains api/, budget/, cli/, communication/, config/, core/, engine/, hr/, memory/, persistence/, observability/, providers/, security/, templates/, tools/ modules organized by domain.
Applied to files:
README.mdCLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.
Applied to files:
tests/unit/persistence/test_migrations_v2.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : Always use event name constants from domain-specific modules under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Web dashboard structure: Vue 3 + PrimeVue + Tailwind CSS with api/, components/, composables/, router/, stores/, styles/, utils/, views/, __tests__/ organized by feature.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : Use structured kwargs for logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : All state transitions must log at INFO.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to src/ai_company/**/*.py : DEBUG logging for object creation, internal flow, entry/exit of key functions.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14.
Applied to files:
src/ai_company/tools/approval_tool.py
🧬 Code graph analysis (5)
src/ai_company/tools/invoker.py (4)
src/ai_company/core/enums.py (1)
ApprovalRiskLevel(443-449)src/ai_company/engine/approval_gate_models.py (1)
EscalationInfo(14-33)src/ai_company/tools/base.py (2)
name(123-125)action_type(133-135)src/ai_company/providers/models.py (2)
ToolCall(96-119)ToolResult(122-135)
src/ai_company/engine/react_loop.py (2)
src/ai_company/api/state.py (1)
approval_gate(139-141)src/ai_company/engine/approval_gate.py (1)
ApprovalGate(40-322)
src/ai_company/engine/plan_execute_loop.py (2)
src/ai_company/api/state.py (1)
approval_gate(139-141)src/ai_company/engine/approval_gate.py (1)
ApprovalGate(40-322)
tests/unit/engine/test_approval_gate_models.py (2)
src/ai_company/engine/approval_gate_models.py (2)
EscalationInfo(14-33)ResumePayload(36-51)src/ai_company/tools/base.py (1)
action_type(133-135)
tests/unit/api/test_approval_store.py (2)
src/ai_company/api/approval_store.py (3)
list_items(75-104)ApprovalStore(27-188)get(61-73)src/ai_company/core/enums.py (2)
ApprovalStatus(434-440)ApprovalRiskLevel(443-449)
🪛 LanguageTool
CLAUDE.md
[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...
(EG_NO_COMMA)
🔇 Additional comments (25)
tests/unit/engine/test_approval_gate_models.py (3)
12-88: LGTM!The
TestEscalationInfoclass provides thorough coverage:
- Valid construction with all field assertions
- Frozen immutability verification
- Parametrized blank/empty string rejection for all
NotBlankStrfields- All
ApprovalRiskLevelenum values acceptedGood use of
@pytest.mark.parametrizefor the validation tests.
134-143: Past review comment addressed.The
test_empty_string_rejectedparametrized test forapproval_idanddecided_byhas been added, matching the consistency withTestEscalationInfo.
1-9: LGTM!Module setup is clean:
- Appropriate imports with no unnecessary dependencies
pytestmarkcorrectly applies@pytest.mark.unitand 30-second timeout to all tests per coding guidelinestests/unit/api/test_approval_store.py (2)
110-120: Action-type test updates are aligned with the new format.The switch to
category:actionvalues in filter/get/save test paths is consistent and improves coverage for the new action-type convention.Also applies to: 127-127, 165-165
199-239: Parameterize similarsave_if_pendingnegative-path cases.The
already_decided,not_found, andexpiredscenarios are structurally similar and are better expressed as parameterized cases to reduce duplication.[ suggest_recommended_refactor ]
As per coding guidelines, "Prefer@pytest.mark.parametrizefor testing similar cases."README.md (1)
133-133: LGTM!The status line now accurately reflects that the approval workflow can park agents but the resume/re-enqueue scheduler is still pending. This addresses the prior review concern about overstating completeness.
src/ai_company/tools/invoker.py (5)
120-136: LGTM!The
pending_escalationsproperty correctly exposes a read-only tuple of escalation infos collected during tool invocation. The clearing at the start ofinvoke()andinvoke_all()ensures fresh state per invocation batch.
237-251: LGTM!The escalation tracking on ESCALATE verdict correctly captures the approval_id, tool context, and risk level from the security verdict. The conditional check for
verdict.approval_id is not Noneprevents tracking incomplete escalations.
364-421: LGTM!The refactor to separate
invoke()(clears + delegates) from_invoke_single()(core logic) enables proper escalation collection across batch invocations while maintaining the single-call API contract.
609-674: LGTM!The
_track_parking_metadatamethod correctly:
- Returns early when
requires_parkingis not True- Returns an error
ToolResultwhenapproval_idis missing (fail-closed)- Creates
EscalationInfowith appropriate defaults- Propagates non-recoverable errors
The error-result approach ensures the agent cannot silently bypass the approval gate when metadata is inconsistent.
795-801: LGTM!Sorting escalations by original tool-call order ensures deterministic behavior regardless of concurrent execution timing. The fallback
len(calls)for unknown IDs is a reasonable edge-case handler.CLAUDE.md (1)
101-115: LGTM!The package structure documentation accurately reflects the new approval gate integration in the api/, engine/, and tools/ modules.
src/ai_company/persistence/sqlite/migrations.py (2)
357-400: LGTM! Crash-safe migration implementation.The
_apply_v7function correctly handles three intermediate states:
- Normal: creates new table, copies rows, renames original→old, renames new→parked_contexts, drops old
- Mid-crash after rename: detects parked_contexts_old exists without original, copies from backup, completes rename sequence
- Already done: IF NOT EXISTS guards and conditional checks prevent duplicate operations
The state checks at lines 367-369 and conditional operations throughout make this resumable from any crash point.
258-278: LGTM!The DDL and copy statement templates are well-structured. The
INSERT OR IGNOREin_V7_COPY_ROWSprovides idempotency for repeated copy attempts, and the{source}placeholder enables copying from either the original or backup table during crash recovery.tests/unit/persistence/test_migrations_v2.py (3)
19-19: LGTM!The module-level
pytestmarkcorrectly applies bothpytest.mark.unitandpytest.mark.timeout(30)to all tests in this file, addressing the prior review comment about missing timeout markers.
99-130: LGTM!The test correctly verifies the v7 migration behavior:
- Sets up pre-v7 schema with
task_id TEXT NOT NULL- Verifies NOT NULL constraint via
PRAGMA table_info(notnull=1)- Runs migration and asserts version becomes 7
- Verifies task_id is now nullable (notnull=0)
132-174: LGTM!The data preservation test properly:
- Creates pre-v7 schema and inserts a test row
- Runs migrations
- Verifies the row persists with correct id and task_id values
This ensures the
INSERT OR IGNOREcopy operation preserves existing data.src/ai_company/engine/plan_execute_loop.py (2)
94-103: LGTM!The constructor correctly accepts
approval_gateas an optional keyword-only parameter and stores it for use during step execution. The docstring accurately describes its purpose.
753-778: LGTM!The method correctly:
- Changed from
@staticmethodto instance method to accessself._approval_gate- Forwards
approval_gate=self._approval_gatetoexecute_tool_callsThis enables the approval gate to detect escalations and park execution when needed during plan step execution.
src/ai_company/engine/react_loop.py (2)
74-81: LGTM!The constructor signature mirrors
PlanExecuteLoop, acceptingapproval_gateas an optional keyword-only parameter. This maintains API consistency across execution loops.
247-254: LGTM!The
approval_gateis correctly forwarded toexecute_tool_calls, enabling escalation detection and parking during ReAct loop tool execution. This completes the approval gate integration in the core execution path.src/ai_company/tools/approval_tool.py (4)
49-97: LGTM!The constructor properly:
- Initializes the base tool with appropriate name, description, category, and JSON schema
- Stores dependencies (approval_store, risk_classifier, agent_id, task_id)
- Defines a clear parameters schema with maxLength constraints
159-202: LGTM!The
_persist_itemmethod correctly:
- Creates an
ApprovalItemwith all required fields including metadata source- Persists via the approval store
- Returns error result on failure with proper logging
- Re-raises non-recoverable errors (MemoryError, RecursionError)
204-233: LGTM!The success result correctly includes
requires_parking: Trueandapproval_idin metadata, which theToolInvoker._track_parking_metadatamethod will detect to trigger the escalation flow.
252-275: LGTM!The risk classification correctly:
- Uses the classifier when available
- Falls back to
HIGHon missing classifier or classification failure- Logs at appropriate levels (exception for failures, debug for success)
This fail-closed behavior ensures sensitive actions default to high-risk when classification fails.
- Remove not-has_new guard in V7 migration step 2 — INSERT OR IGNORE is idempotent, always copy when source exists - Validate approval_tool args are non-empty after stripping whitespace - Add timeout(30) markers to approval_store test classes - Add empty decision_reason rejection test for ResumePayload
There was a problem hiding this comment.
Pull request overview
Adds an approval-gated execution path to the agent engine so SecOps ESCALATE verdicts and agent request_human_approval calls can park execution, persist context, and support later resumption workflows.
Changes:
- Introduces
ApprovalGate+ models, integrates escalation parking into execution loops, and adds deterministicToolInvoker.pending_escalations. - Adds
request_human_approvaltool, sharedaction_typevalidation, and expands approval API/controller/store behavior (auth-boundrequested_by,save_if_pending, resume-intent signaling). - Updates SQLite schema to v7 (taskless parking) and fixes frontend approval WS handling by fetching full items via API.
Reviewed changes
Copilot reviewed 40 out of 40 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| web/src/stores/approvals.ts | Reworks WS event handling to use approval_id and reconcile state via API re-fetches. |
| web/src/tests/stores/approvals.test.ts | Adds async WS handler tests (fetching, filtered re-fetch, error cases). |
| tests/unit/tools/test_invoker_escalation.py | New tests for invoker escalation tracking + deterministic ordering. |
| tests/unit/tools/test_approval_tool.py | New tests for RequestHumanApprovalTool creation, validation, and error handling. |
| tests/unit/persistence/test_migrations_v2.py | Expands migration tests to include v7 task_id nullability + data preservation. |
| tests/unit/persistence/sqlite/test_migrations_v6.py | Updates schema version expectation to 7. |
| tests/unit/observability/test_events.py | Ensures approval_gate event module is discovered and unit-marked. |
| tests/unit/engine/test_security_factory.py | New tests for extracted security/tool factory helpers. |
| tests/unit/engine/test_loop_helpers_approval.py | New tests for loop helper parking integration and failure paths. |
| tests/unit/engine/test_approval_gate_models.py | New tests for frozen escalation/resume payload models. |
| tests/unit/engine/test_approval_gate.py | New tests for ApprovalGate park/resume lifecycle and resume message formatting. |
| tests/unit/core/test_validation.py | New tests for shared is_valid_action_type() validation helper. |
| tests/unit/api/test_dto.py | Updates DTO tests (action_type format, TTL bounds, approve comment bounds). |
| tests/unit/api/test_approval_store.py | Updates action_type fixtures + adds tests for save_if_pending, filters, on_expire. |
| tests/unit/api/controllers/test_approvals_helpers.py | New tests for controller helpers (decision resolution, resume intent stub, WS publish). |
| tests/unit/api/controllers/test_approvals.py | Updates controller payload defaults (action_type format, requested_by removed). |
| src/ai_company/tools/registry.py | Adds ToolRegistry.all_tools() for tool enumeration/cloning. |
| src/ai_company/tools/invoker.py | Tracks escalations from SecOps verdicts and tool parking metadata; exposes pending_escalations. |
| src/ai_company/tools/approval_tool.py | Implements RequestHumanApprovalTool that creates approvals and signals parking via metadata. |
| src/ai_company/tools/init.py | Exports the new approval tool. |
| src/ai_company/security/timeout/parked_context.py | Makes ParkedContext.task_id nullable for taskless agents. |
| src/ai_company/security/timeout/park_service.py | Allows task_id=None when parking contexts. |
| src/ai_company/persistence/sqlite/migrations.py | Bumps schema to v7 and adds migration to make parked_contexts.task_id nullable. |
| src/ai_company/observability/events/approval_gate.py | Adds structured event constants for approval gate lifecycle. |
| src/ai_company/engine/react_loop.py | Wires optional approval_gate into React loop tool execution. |
| src/ai_company/engine/plan_execute_loop.py | Wires optional approval_gate into plan/execute step tool execution. |
| src/ai_company/engine/loop_helpers.py | Adds escalation check after tool execution and parks context via helper. |
| src/ai_company/engine/approval_gate_models.py | Adds frozen Pydantic models for escalation info and resume payload. |
| src/ai_company/engine/approval_gate.py | Implements park/resume coordination over ParkService + ParkedContextRepository. |
| src/ai_company/engine/agent_engine.py | Integrates approval gate creation, loop wiring, and approval tool registration via factories. |
| src/ai_company/engine/_security_factory.py | Extracts security interceptor factory + approval-tool registry composition. |
| src/ai_company/engine/init.py | Exports approval gate service/models. |
| src/ai_company/core/validation.py | Adds shared is_valid_action_type() helper. |
| src/ai_company/api/state.py | Adds approval_gate to application state. |
| src/ai_company/api/dto.py | Removes requested_by from create payload; validates action_type format; bounds TTL/comment. |
| src/ai_company/api/controllers/approvals.py | Binds requested_by to auth user; adds conflict-safe save_if_pending; signals resume intent. |
| src/ai_company/api/approval_store.py | Adds save_if_pending() and hardens on_expire callback error handling. |
| docs/design/engine.md | Documents approval-gated parking behavior in engine run flow. |
| README.md | Updates status/roadmap text to reflect approval gates + scheduler remaining. |
| CLAUDE.md | Updates repo overview + event-constant guidance to include approval gate domain. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| _EscalationInfo( | ||
| approval_id=str(result.metadata["approval_id"]), | ||
| tool_call_id=tool_call.id, | ||
| tool_name=tool.name, | ||
| action_type=tool.action_type, | ||
| risk_level=ApprovalRiskLevel( | ||
| result.metadata.get("risk_level", "high"), | ||
| ), |
| # Step 4: rename new → parked_contexts. | ||
| if await _table_exists(db, "parked_contexts_new"): | ||
| await db.execute( | ||
| "ALTER TABLE parked_contexts_new RENAME TO parked_contexts", | ||
| ) | ||
|
|
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/unit/api/test_approval_store.py (2)
49-50: 🧹 Nitpick | 🔵 TrivialConsider adding timeout marker to existing test class for consistency.
The new test classes correctly include
@pytest.mark.timeout(30), butTestApprovalStorelacks this marker. For consistency with the coding guidelines and the newly added classes, consider adding it here as well.♻️ Proposed fix
`@pytest.mark.unit` +@pytest.mark.timeout(30) class TestApprovalStore:As per coding guidelines, "Test timeout: 30 seconds per test."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/api/test_approval_store.py` around lines 49 - 50, Add the pytest timeout marker to the TestApprovalStore class by annotating the class with `@pytest.mark.timeout`(30) (i.e., add the decorator directly above the TestApprovalStore class definition) so it follows the same 30-second per-test timeout convention used by the other test classes.
18-21: 🛠️ Refactor suggestion | 🟠 MajorUpdate default
action_typeto match the new format.The helper's default
action_type="code_merge"uses the old underscore format, while the changed tests explicitly use the new colon format ("code:merge","deploy:staging"). This inconsistency could cause confusion and tests using the default may not align with the new domain convention.♻️ Proposed fix
def _make_item( # noqa: PLR0913 *, approval_id: str = "approval-001", - action_type: str = "code_merge", + action_type: str = "code:merge", risk_level: ApprovalRiskLevel = ApprovalRiskLevel.MEDIUM,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/api/test_approval_store.py` around lines 18 - 21, The helper function _make_item has an outdated default action_type ("code_merge"); update its default to the new colon-delimited format ("code:merge") so tests that rely on the helper match the updated domain convention (e.g., other tests that use "code:merge" or "deploy:staging"); locate the _make_item definition and change the action_type default value accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/persistence/sqlite/migrations.py`:
- Around line 387-390: The migration's Step 4 currently only checks
_table_exists(db, "parked_contexts_new") and blindly renames it, causing an
OperationalError if "parked_contexts" already exists; update the logic around
that check (the block using _table_exists, "parked_contexts_new" and the
db.execute renaming) to first test whether "parked_contexts" already exists and,
if so, drop "parked_contexts_new" (via await db.execute("DROP TABLE
parked_contexts_new")) instead of renaming; otherwise perform the existing
RENAME TO operation; keep using the same _table_exists helper and db.execute
calls so the change is localized to the Step 4 branch.
In `@src/ai_company/tools/approval_tool.py`:
- Around line 189-190: The except blocks that catch MemoryError and
RecursionError (the "except MemoryError, RecursionError:" handlers in
src/ai_company/tools/approval_tool.py) re-raise without logging; modify each
handler to log a warning or error with contextual information (include the
exception type and message and relevant operation/context) via the module logger
before re-raising the same exception so the error path is observable; ensure you
use the existing logger instance (or create one consistent with the file) and
call logger.warning/logger.error with the exception info (e.g., include
exc_info=True or include str(exc)) immediately prior to the raise in both the
first handler and the similar handler later in the file.
- Around line 99-160: Split execute into a thin orchestrator plus two helpers:
extract the argument parsing/validation into a new method (e.g.
_parse_and_validate_arguments(arguments) -> either (action_type, title,
description) or a ToolExecutionResult error) which contains the try/except
KeyError block and the type/empty-string checks, and leave validation of action
type via the existing _validate_action_type call inside that helper; then
extract the remaining flow that classifies risk, generates approval_id, calls
_persist_item, and returns results into a second helper (e.g.
_create_and_store_approval(action_type, title, description) ->
ToolExecutionResult or success payload) that uses _classify_risk, uuid4().hex to
build approval_id, _persist_item, and _build_success; finally make execute() a
small async method that calls these two helpers in sequence and returns early on
any ToolExecutionResult error.
- Around line 127-147: The input strings are validated for emptiness with
.strip() but the untrimmed values are still used later (e.g., passed to
_validate_action_type, _classify_risk, and persisted), so after the initial
validation block set normalized variables (e.g., action_type =
action_type.strip(), title = title.strip(), description = description.strip() or
another canonical normalization if required) and use these normalized values for
the subsequent calls to self._validate_action_type(action_type) and
self._classify_risk(action_type) and any persistence/ToolExecutionResult content
to avoid misclassification or inconsistent storage.
- Around line 199-203: The tool currently returns internal exception
class/message in the agent-facing ToolExecutionResult (the return constructing
ToolExecutionResult in approval_tool.py), which can leak internals; change this
to return a generic, non-sensitive error message (e.g., "Failed to create
approval request") in the content and set is_error=True, and move the detailed
exception information into an internal log call (use existing logger or
logger.exception/logger.error) so the full exception type and message are
recorded only in logs, not returned to the caller.
In `@tests/unit/engine/test_approval_gate_models.py`:
- Around line 15-161: Add Google-style docstrings to all public test functions
in this file (e.g., test_valid_construction, test_frozen_immutability,
test_blank_string_rejected, test_empty_string_rejected,
test_all_risk_levels_accepted, test_approved_without_reason,
test_rejected_with_reason, test_empty_decision_reason_rejected,
test_blank_decision_reason_rejected, etc.). For each test function add a concise
Google-style docstring describing the behavior being tested (one-line summary;
longer description or Args/Returns sections are optional for clarity) so the
functions satisfy the repository's docstring policy enforced by ruff D rules.
Ensure the docstrings are triple-quoted and placed immediately under each def to
mark them as public-function docstrings.
---
Outside diff comments:
In `@tests/unit/api/test_approval_store.py`:
- Around line 49-50: Add the pytest timeout marker to the TestApprovalStore
class by annotating the class with `@pytest.mark.timeout`(30) (i.e., add the
decorator directly above the TestApprovalStore class definition) so it follows
the same 30-second per-test timeout convention used by the other test classes.
- Around line 18-21: The helper function _make_item has an outdated default
action_type ("code_merge"); update its default to the new colon-delimited format
("code:merge") so tests that rely on the helper match the updated domain
convention (e.g., other tests that use "code:merge" or "deploy:staging"); locate
the _make_item definition and change the action_type default value accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: cec01522-f6a3-4d8d-a624-775b3df68279
📒 Files selected for processing (4)
src/ai_company/persistence/sqlite/migrations.pysrc/ai_company/tools/approval_tool.pytests/unit/api/test_approval_store.pytests/unit/engine/test_approval_gate_models.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Agent
- GitHub Check: Build Backend
- GitHub Check: Greptile Review
- GitHub Check: Test (Python 3.14)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: useexcept A, B:(no parentheses) — ruff enforces this on Python 3.14.
Type hints required on all public functions; mypy strict mode enforced.
Google-style docstrings required on all public classes and functions (enforced by ruff D rules).
Line length: 88 characters (ruff).
Functions must be < 50 lines, files < 800 lines.
Files:
src/ai_company/persistence/sqlite/migrations.pytests/unit/api/test_approval_store.pytests/unit/engine/test_approval_gate_models.pysrc/ai_company/tools/approval_tool.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST have:from ai_company.observability import get_loggerthenlogger = get_logger(__name__). Never useimport logging/logging.getLogger()/print()in application code. Variable name: alwayslogger(not_logger, notlog).
Always use event name constants from domain-specific modules underai_company.observability.events(e.g.PROVIDER_CALL_STARTfromevents.provider,BUDGET_RECORD_ADDEDfromevents.budget). Import directly:from ai_company.observability.events.<domain> import EVENT_CONSTANT.
Use structured kwargs for logging: alwayslogger.info(EVENT, key=value)— neverlogger.info('msg %s', val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG logging for object creation, internal flow, entry/exit of key functions.
Files:
src/ai_company/persistence/sqlite/migrations.pysrc/ai_company/tools/approval_tool.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names. Tests must use test-provider, test-small-001, etc.
Files:
src/ai_company/persistence/sqlite/migrations.pysrc/ai_company/tools/approval_tool.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use markers@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowon test cases.
Test timeout: 30 seconds per test.
Prefer@pytest.mark.parametrizefor testing similar cases.
Vendor-agnostic everywhere: NEVER use real vendor names in tests. Tests must use test-provider, test-small-001, etc.
Files:
tests/unit/api/test_approval_store.pytests/unit/engine/test_approval_gate_models.py
🧠 Learnings (3)
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.
Applied to files:
tests/unit/api/test_approval_store.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to tests/**/*.py : Use markers pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow on test cases.
Applied to files:
tests/unit/api/test_approval_store.py
📚 Learning: 2026-03-14T09:39:32.007Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T09:39:32.007Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14.
Applied to files:
src/ai_company/tools/approval_tool.py
🧬 Code graph analysis (3)
tests/unit/api/test_approval_store.py (3)
src/ai_company/api/approval_store.py (2)
add(42-59)list_items(75-104)src/ai_company/core/enums.py (2)
ApprovalStatus(434-440)ApprovalRiskLevel(443-449)tests/unit/core/test_approval.py (1)
_now(11-12)
tests/unit/engine/test_approval_gate_models.py (1)
src/ai_company/engine/approval_gate_models.py (2)
EscalationInfo(14-33)ResumePayload(36-51)
src/ai_company/tools/approval_tool.py (3)
src/ai_company/core/validation.py (1)
is_valid_action_type(6-19)src/ai_company/api/approval_store.py (2)
ApprovalStore(27-188)add(42-59)src/ai_company/security/timeout/risk_tier_classifier.py (1)
DefaultRiskTierClassifier(62-101)
🔇 Additional comments (7)
tests/unit/api/test_approval_store.py (4)
110-120: LGTM!The action_type values correctly use the new
category:actionformat convention.
179-240: LGTM!Good coverage of
save_if_pending()edge cases including pending, already-decided, not-found, and expired scenarios. The timeout marker addresses the previous review feedback.
243-303: LGTM!The combined filter tests properly validate multi-criteria filtering and use the new action_type format consistently.
306-377: LGTM!Excellent coverage of the
on_expirecallback lifecycle, including exception resilience and lazy expiration behavior verification.tests/unit/engine/test_approval_gate_models.py (1)
9-9: Good module-level test classification and timeout setup.Using
pytestmarkhere keeps all tests consistently marked as unit tests with a 30s timeout.As per coding guidelines,
tests/**/*.py: Use markers@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowon test cases. Test timeout: 30 seconds per test.src/ai_company/tools/approval_tool.py (1)
215-236: Good structured lifecycle logging and parking metadata.Line 215–Line 236 correctly logs the state transition with a domain event constant and returns deterministic parking metadata (
requires_parking,approval_id,action_type,risk_level).src/ai_company/persistence/sqlite/migrations.py (1)
270-278: Good idempotent row-copy approach.Using
INSERT OR IGNOREin_V7_COPY_ROWSis a solid choice for crash-retry safety during repeated copy attempts.
| async def execute( | ||
| self, | ||
| *, | ||
| arguments: dict[str, Any], | ||
| ) -> ToolExecutionResult: | ||
| """Create an approval item and signal parking. | ||
|
|
||
| Args: | ||
| arguments: Must contain ``action_type``, ``title``, and | ||
| ``description``. | ||
|
|
||
| Returns: | ||
| ``ToolExecutionResult`` with ``requires_parking=True`` in | ||
| metadata on success, or an error result on failure. | ||
| """ | ||
| try: | ||
| action_type = arguments["action_type"] | ||
| title = arguments["title"] | ||
| description = arguments["description"] | ||
| except KeyError as exc: | ||
| return ToolExecutionResult( | ||
| content=( | ||
| f"Missing required argument: {exc}. " | ||
| f"Required: action_type, title, description" | ||
| ), | ||
| is_error=True, | ||
| ) | ||
|
|
||
| if ( | ||
| not isinstance(action_type, str) | ||
| or not isinstance(title, str) | ||
| or not isinstance(description, str) | ||
| or not action_type.strip() | ||
| or not title.strip() | ||
| or not description.strip() | ||
| ): | ||
| return ToolExecutionResult( | ||
| content=( | ||
| "Arguments action_type, title, and description " | ||
| "must be non-empty strings" | ||
| ), | ||
| is_error=True, | ||
| ) | ||
|
|
||
| validation_error = self._validate_action_type(action_type) | ||
| if validation_error is not None: | ||
| return validation_error | ||
|
|
||
| risk_level = self._classify_risk(action_type) | ||
| approval_id = f"approval-{uuid4().hex}" | ||
|
|
||
| store_error = await self._persist_item( | ||
| approval_id, | ||
| action_type, | ||
| title, | ||
| description, | ||
| risk_level, | ||
| ) | ||
| if store_error is not None: | ||
| return store_error | ||
|
|
||
| return self._build_success(approval_id, action_type, risk_level, title) |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Split execute() to satisfy the function-size gate.
execute() currently exceeds the < 50 lines project limit (Line 99–Line 160). Please extract argument parsing/validation and orchestration into helpers to keep this entrypoint smaller and easier to test.
As per coding guidelines "Functions must be < 50 lines, files < 800 lines."
♻️ Refactor sketch
async def execute(
self,
*,
arguments: dict[str, Any],
) -> ToolExecutionResult:
- ...
+ parsed = self._parse_and_validate_arguments(arguments)
+ if isinstance(parsed, ToolExecutionResult):
+ return parsed
+ action_type, title, description = parsed
+
+ risk_level = self._classify_risk(action_type)
+ approval_id = f"approval-{uuid4().hex}"
+ store_error = await self._persist_item(
+ approval_id, action_type, title, description, risk_level
+ )
+ if store_error is not None:
+ return store_error
+ return self._build_success(approval_id, action_type, risk_level, title)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/tools/approval_tool.py` around lines 99 - 160, Split execute
into a thin orchestrator plus two helpers: extract the argument
parsing/validation into a new method (e.g.
_parse_and_validate_arguments(arguments) -> either (action_type, title,
description) or a ToolExecutionResult error) which contains the try/except
KeyError block and the type/empty-string checks, and leave validation of action
type via the existing _validate_action_type call inside that helper; then
extract the remaining flow that classifies risk, generates approval_id, calls
_persist_item, and returns results into a second helper (e.g.
_create_and_store_approval(action_type, title, description) ->
ToolExecutionResult or success payload) that uses _classify_risk, uuid4().hex to
build approval_id, _persist_item, and _build_success; finally make execute() a
small async method that calls these two helpers in sequence and returns early on
any ToolExecutionResult error.
| except MemoryError, RecursionError: | ||
| raise |
There was a problem hiding this comment.
Log fatal re-raises before propagating.
Line 189–Line 190 and Line 264–Line 265 re-raise MemoryError/RecursionError without a warning/error log. Add context logging before re-raise to satisfy error-path observability requirements.
As per coding guidelines "All error paths must log at WARNING or ERROR with context before raising."
🧭 Logging-before-raise example
- except MemoryError, RecursionError:
- raise
+ except MemoryError:
+ logger.error(
+ APPROVAL_GATE_ESCALATION_FAILED,
+ agent_id=self._agent_id,
+ action_type=action_type,
+ error="MemoryError",
+ note="Fatal during approval item persistence",
+ )
+ raise
+ except RecursionError:
+ logger.error(
+ APPROVAL_GATE_ESCALATION_FAILED,
+ agent_id=self._agent_id,
+ action_type=action_type,
+ error="RecursionError",
+ note="Fatal during approval item persistence",
+ )
+ raiseAlso applies to: 264-265
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/tools/approval_tool.py` around lines 189 - 190, The except
blocks that catch MemoryError and RecursionError (the "except MemoryError,
RecursionError:" handlers in src/ai_company/tools/approval_tool.py) re-raise
without logging; modify each handler to log a warning or error with contextual
information (include the exception type and message and relevant
operation/context) via the module logger before re-raising the same exception so
the error path is observable; ensure you use the existing logger instance (or
create one consistent with the file) and call logger.warning/logger.error with
the exception info (e.g., include exc_info=True or include str(exc)) immediately
prior to the raise in both the first handler and the similar handler later in
the file.
| def test_valid_construction(self) -> None: | ||
| info = EscalationInfo( | ||
| approval_id="approval-1", | ||
| tool_call_id="tc-1", | ||
| tool_name="deploy_to_prod", | ||
| action_type="deploy:production", | ||
| risk_level=ApprovalRiskLevel.CRITICAL, | ||
| reason="Production deployment requires approval", | ||
| ) | ||
| assert info.approval_id == "approval-1" | ||
| assert info.tool_call_id == "tc-1" | ||
| assert info.tool_name == "deploy_to_prod" | ||
| assert info.action_type == "deploy:production" | ||
| assert info.risk_level == ApprovalRiskLevel.CRITICAL | ||
| assert info.reason == "Production deployment requires approval" | ||
|
|
||
| def test_frozen_immutability(self) -> None: | ||
| info = EscalationInfo( | ||
| approval_id="approval-1", | ||
| tool_call_id="tc-1", | ||
| tool_name="deploy_to_prod", | ||
| action_type="deploy:production", | ||
| risk_level=ApprovalRiskLevel.HIGH, | ||
| reason="Needs approval", | ||
| ) | ||
| with pytest.raises(ValidationError): | ||
| info.approval_id = "changed" # type: ignore[misc] | ||
|
|
||
| @pytest.mark.parametrize( | ||
| "field", | ||
| ["approval_id", "tool_call_id", "tool_name", "action_type", "reason"], | ||
| ) | ||
| def test_blank_string_rejected(self, field: str) -> None: | ||
| kwargs = { | ||
| "approval_id": "approval-1", | ||
| "tool_call_id": "tc-1", | ||
| "tool_name": "deploy_to_prod", | ||
| "action_type": "deploy:production", | ||
| "risk_level": ApprovalRiskLevel.LOW, | ||
| "reason": "Needs approval", | ||
| } | ||
| kwargs[field] = " " | ||
| with pytest.raises(ValidationError): | ||
| EscalationInfo(**kwargs) # type: ignore[arg-type] | ||
|
|
||
| @pytest.mark.parametrize( | ||
| "field", | ||
| ["approval_id", "tool_call_id", "tool_name", "action_type", "reason"], | ||
| ) | ||
| def test_empty_string_rejected(self, field: str) -> None: | ||
| kwargs = { | ||
| "approval_id": "approval-1", | ||
| "tool_call_id": "tc-1", | ||
| "tool_name": "deploy_to_prod", | ||
| "action_type": "deploy:production", | ||
| "risk_level": ApprovalRiskLevel.LOW, | ||
| "reason": "Needs approval", | ||
| } | ||
| kwargs[field] = "" | ||
| with pytest.raises(ValidationError): | ||
| EscalationInfo(**kwargs) # type: ignore[arg-type] | ||
|
|
||
| def test_all_risk_levels_accepted(self) -> None: | ||
| for level in ApprovalRiskLevel: | ||
| info = EscalationInfo( | ||
| approval_id="a", | ||
| tool_call_id="t", | ||
| tool_name="tool", | ||
| action_type="cat:act", | ||
| risk_level=level, | ||
| reason="reason", | ||
| ) | ||
| assert info.risk_level == level | ||
|
|
||
|
|
||
| class TestResumePayload: | ||
| """ResumePayload construction and immutability.""" | ||
|
|
||
| def test_approved_without_reason(self) -> None: | ||
| payload = ResumePayload( | ||
| approval_id="approval-1", | ||
| approved=True, | ||
| decided_by="admin", | ||
| ) | ||
| assert payload.approval_id == "approval-1" | ||
| assert payload.approved is True | ||
| assert payload.decided_by == "admin" | ||
| assert payload.decision_reason is None | ||
|
|
||
| def test_rejected_with_reason(self) -> None: | ||
| payload = ResumePayload( | ||
| approval_id="approval-1", | ||
| approved=False, | ||
| decided_by="admin", | ||
| decision_reason="Too risky", | ||
| ) | ||
| assert payload.approved is False | ||
| assert payload.decision_reason == "Too risky" | ||
|
|
||
| def test_frozen_immutability(self) -> None: | ||
| payload = ResumePayload( | ||
| approval_id="approval-1", | ||
| approved=True, | ||
| decided_by="admin", | ||
| ) | ||
| with pytest.raises(ValidationError): | ||
| payload.approved = False # type: ignore[misc] | ||
|
|
||
| @pytest.mark.parametrize("field", ["approval_id", "decided_by"]) | ||
| def test_blank_string_rejected(self, field: str) -> None: | ||
| kwargs = { | ||
| "approval_id": "approval-1", | ||
| "approved": True, | ||
| "decided_by": "admin", | ||
| } | ||
| kwargs[field] = " " | ||
| with pytest.raises(ValidationError): | ||
| ResumePayload(**kwargs) # type: ignore[arg-type] | ||
|
|
||
| @pytest.mark.parametrize("field", ["approval_id", "decided_by"]) | ||
| def test_empty_string_rejected(self, field: str) -> None: | ||
| kwargs = { | ||
| "approval_id": "approval-1", | ||
| "approved": True, | ||
| "decided_by": "admin", | ||
| } | ||
| kwargs[field] = "" | ||
| with pytest.raises(ValidationError): | ||
| ResumePayload(**kwargs) # type: ignore[arg-type] | ||
|
|
||
| def test_empty_decision_reason_rejected(self) -> None: | ||
| with pytest.raises(ValidationError): | ||
| ResumePayload( | ||
| approval_id="approval-1", | ||
| approved=False, | ||
| decided_by="admin", | ||
| decision_reason="", | ||
| ) | ||
|
|
||
| def test_blank_decision_reason_rejected(self) -> None: | ||
| with pytest.raises(ValidationError): | ||
| ResumePayload( | ||
| approval_id="approval-1", | ||
| approved=False, | ||
| decided_by="admin", | ||
| decision_reason=" ", | ||
| ) |
There was a problem hiding this comment.
Add Google-style docstrings to public test methods.
The test classes have docstrings, but the public test functions do not. This will violate the repository docstring policy for Python files.
🛠️ Suggested pattern (apply to all test methods in this file)
class TestEscalationInfo:
"""EscalationInfo construction and immutability."""
def test_valid_construction(self) -> None:
+ """Construct EscalationInfo with valid inputs and verify field values."""
info = EscalationInfo(
approval_id="approval-1",
tool_call_id="tc-1",
tool_name="deploy_to_prod",
action_type="deploy:production",
risk_level=ApprovalRiskLevel.CRITICAL,
reason="Production deployment requires approval",
)As per coding guidelines, **/*.py: Google-style docstrings required on all public classes and functions (enforced by ruff D rules).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/test_approval_gate_models.py` around lines 15 - 161, Add
Google-style docstrings to all public test functions in this file (e.g.,
test_valid_construction, test_frozen_immutability, test_blank_string_rejected,
test_empty_string_rejected, test_all_risk_levels_accepted,
test_approved_without_reason, test_rejected_with_reason,
test_empty_decision_reason_rejected, test_blank_decision_reason_rejected, etc.).
For each test function add a concise Google-style docstring describing the
behavior being tested (one-line summary; longer description or Args/Returns
sections are optional for clarity) so the functions satisfy the repository's
docstring policy enforced by ruff D rules. Ensure the docstrings are
triple-quoted and placed immediately under each def to mark them as
public-function docstrings.
- Read action_type from result.metadata instead of tool.action_type in _track_parking_metadata (request_human_approval carries the requested action in metadata, not the tool's own comms:internal) - Handle crash between V7 step 4 and 5: if both parked_contexts and parked_contexts_new exist, drop the redundant _new table
- Strip whitespace from action_type/title/description before use - Return generic error message to agent instead of exception details - Add timeout(30) to TestApprovalStore class - Fix _make_item default action_type to code:merge format
🤖 I have created a release *beep* *boop* --- ## [0.1.4](v0.1.3...v0.1.4) (2026-03-14) ### Features * add approval workflow gates to TaskEngine ([#387](#387)) ([2db968a](2db968a)) * implement checkpoint recovery strategy ([#367](#367)) ([f886838](f886838)) ### CI/CD * add npm and pre-commit ecosystems to Dependabot ([#369](#369)) ([54e5fe7](54e5fe7)) * bump actions/setup-node from 4.4.0 to 6.3.0 ([#360](#360)) ([2db5105](2db5105)) * bump github/codeql-action from 3.32.6 to 4.32.6 ([#361](#361)) ([ce766e8](ce766e8)) * group major dependabot bumps per ecosystem ([#388](#388)) ([3c43aef](3c43aef)) ### Maintenance * bump @vitejs/plugin-vue from 5.2.4 to 6.0.5 in /web ([#382](#382)) ([d7054ee](d7054ee)) * bump @vue/tsconfig from 0.7.0 to 0.9.0 in /web in the minor-and-patch group across 1 directory ([#371](#371)) ([64fa08b](64fa08b)) * bump astro from 5.18.1 to 6.0.4 in /site ([#376](#376)) ([d349317](d349317)) * bump https://github.com/astral-sh/ruff-pre-commit from v0.15.5 to 0.15.6 ([#372](#372)) ([dcacb2e](dcacb2e)) * bump https://github.com/gitleaks/gitleaks from v8.24.3 to 8.30.1 ([#375](#375)) ([a18e6ed](a18e6ed)) * bump https://github.com/hadolint/hadolint from v2.12.0 to 2.14.0 ([#373](#373)) ([47b906b](47b906b)) * bump https://github.com/pre-commit/pre-commit-hooks from v5.0.0 to 6.0.0 ([#374](#374)) ([1926555](1926555)) * bump litellm from 1.82.1 to 1.82.2 in the minor-and-patch group ([#385](#385)) ([fa4f7b7](fa4f7b7)) * bump node from 22-alpine to 25-alpine in /docker/web ([#359](#359)) ([8d56cd3](8d56cd3)) * bump node from 22-slim to 25-slim in /docker/sandbox ([#358](#358)) ([3de8748](3de8748)) * bump pinia from 2.3.1 to 3.0.4 in /web ([#381](#381)) ([c78dcc2](c78dcc2)) * bump the major group across 1 directory with 9 updates ([#389](#389)) ([9fa621b](9fa621b)) * bump the minor-and-patch group with 2 updates ([#362](#362)) ([6ede2ce](6ede2ce)) * bump vue-router from 4.6.4 to 5.0.3 in /web ([#378](#378)) ([6c60f6c](6c60f6c)) * expand review skills to 18 smart conditional agents ([#364](#364)) ([494013f](494013f)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Summary
Adds approval-gated execution to the agent engine so SecOps ESCALATE verdicts and agent-initiated
request_human_approvaltool calls can park execution, persist context, and support resumption workflows.ParkService+ParkedContextRepositoryApprovalItemand signals parking viaToolExecutionResult.metadatarequires_parkingmetadata, exposespending_escalationswith deterministic orderingexecute_tool_callsin loop_helpers checks for escalations after each tool-call batch and returns PARKED or ERROR terminationrequested_bybound to authenticated user,save_if_pendingatomic guard against double-decisions, prompt injection mitigation in resume messagesapproval_id(notid), fetches full items via API, re-fetches filtered queries on WS eventsparked_contexts.task_idnullable for taskless agents (crash-safe rename-first approach)is_valid_action_type()extracted tocore/validation.py, used by both DTO and toolKey architectural decisions
_signal_resume_intent) — actual re-enqueue deferred to a future scheduler component (documented with TODO)execution_loopsupplied alongsideapproval_storelogs a warning — the gate is only auto-wired for the defaultReactLoop_track_parking_metadatareturns errorToolResultinstead of raisingToolExecutionError— prevents sibling task cancellation ininvoke_allTaskGroup_cleanup_parkeddoes NOT prevent resume — the context is already deserializedTest plan
Closes #258, Closes #259