feat: implement AgentEngine core orchestrator (#11)#142
feat: implement AgentEngine core orchestrator (#11)#142
Conversation
Add AgentEngine as the top-level orchestrator that ties together prompt construction, execution context, execution loop, tool invocation, and budget tracking into a single run() entry point. New files: - engine/agent_engine.py: AgentEngine class with run() method - engine/run_result.py: AgentRunResult frozen model with computed fields - tests/unit/engine/test_agent_engine.py: 27 unit tests (14 classes) Modified: - engine/__init__.py: export AgentEngine, AgentRunResult - observability/events/execution.py: 6 new engine event constants Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 9 agents, 23 findings addressed: - Critical: cost recording failure no longer destroys successful results - Major: extracted _prepare_context, fixed duration in error path, removed duplicate DEFAULT_MAX_TURNS, added Raises docstring - Added 5 new tests (zero-cost skip, cost-tracker failure, completion config forwarding, max_turns forwarding, deadline formatting) - Updated DESIGN_SPEC.md §6.5 with AgentEngine + AgentRunResult docs - Added 4 new execution event constants for debug observability Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
📝 WalkthroughSummary by CodeRabbitNew Features
WalkthroughIntroduces AgentEngine, a top-level orchestrator for agent execution that validates inputs, builds system prompts, creates execution context, delegates to an ExecutionLoop, handles task state transitions, applies budget constraints, records costs, and returns structured results with metadata. Changes
Sequence DiagramsequenceDiagram
participant Client
participant AgentEngine
participant Validator
participant PromptBuilder
participant ExecutionLoop
participant CostTracker
participant TaskMgmt
Client->>AgentEngine: run(identity, task, ...)
AgentEngine->>Validator: _validate_agent(identity, agent_id)
Validator-->>AgentEngine: ✓ or ExecutionStateError
AgentEngine->>Validator: _validate_task(task, agent_id, task_id)
Validator-->>AgentEngine: ✓ or ExecutionStateError
AgentEngine->>PromptBuilder: build_system_prompt(agent config)
PromptBuilder-->>AgentEngine: SystemPrompt
AgentEngine->>TaskMgmt: transition task ASSIGNED→IN_PROGRESS
TaskMgmt-->>AgentEngine: ✓
AgentEngine->>AgentEngine: _prepare_context(identity, task, system_prompt, ...)
AgentEngine->>AgentEngine: seed conversation(system msg + task instruction)
AgentEngine->>ExecutionLoop: run(context, budget_checker, tool_invoker, max_turns)
ExecutionLoop-->>AgentEngine: ExecutionResult(termination_reason, turns, cost, ...)
alt MemoryError or RecursionError
AgentEngine->>AgentEngine: re-raise exception
else Other Exception
AgentEngine->>AgentEngine: _handle_fatal_error() → AgentRunResult(termination=ERROR)
end
AgentEngine->>CostTracker: _record_costs(result, identity, agent_id, task_id)
CostTracker-->>AgentEngine: ✓ (or fail gracefully)
AgentEngine->>Client: AgentRunResult(execution_result, system_prompt, duration, agent_id, task_id, ...)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces the foundational Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces the AgentEngine as the core orchestrator, a significant and well-implemented feature with well-structured code and comprehensive unit tests, along with thorough documentation updates in DESIGN_SPEC.md and CLAUDE.md. However, a significant security vulnerability related to prompt injection was identified: the AgentEngine directly concatenates potentially untrusted task and identity data into LLM prompts without sanitization, which could allow an attacker to manipulate the agent's behavior or execute unauthorized tool calls. Remediation is strongly recommended. Additionally, there are minor areas for improvement related to code consistency and formatting, specifically a formatting inconsistency in task instruction string generation and an inconsistent style for handling type-only imports.
| def _format_task_instruction(task: Task) -> str: | ||
| """Format a task into a user message for the initial conversation.""" | ||
| parts = [f"# Task: {task.title}", "", task.description] | ||
|
|
||
| if task.acceptance_criteria: | ||
| parts.append("") | ||
| parts.append("## Acceptance Criteria") | ||
| parts.extend(f"- {c.description}" for c in task.acceptance_criteria) | ||
|
|
||
| if task.budget_limit > 0: | ||
| parts.append("") | ||
| parts.append(f"**Budget limit:** ${task.budget_limit:.2f} USD") | ||
|
|
||
| if task.deadline: | ||
| parts.append(f"**Deadline:** {task.deadline}") | ||
|
|
||
| return "\n".join(parts) |
There was a problem hiding this comment.
A prompt injection vulnerability exists because the AgentEngine constructs prompts by directly concatenating fields from the Task and AgentIdentity objects without any sanitization or escaping. If an attacker controls Task or AgentIdentity fields (e.g., title, description, acceptance_criteria, name, personality.description), they can inject malicious instructions, potentially leading to LLM manipulation, unauthorized tool execution, or data exfiltration. Remediation requires implementing sanitization or escaping for all untrusted inputs, considering structured prompt formats or LLM-based guardrails, and instructing the LLM to treat user-provided task data as untrusted. (Also, note a minor formatting inconsistency: the deadline section lacks a blank line before it, unlike the budget limit section, which can reduce readability.)
| system_prompt = build_system_prompt( | ||
| agent=identity, | ||
| task=task, | ||
| available_tools=tool_defs, | ||
| ) | ||
|
|
||
| ctx = AgentContext.from_identity( | ||
| identity, | ||
| task=task, | ||
| max_turns=max_turns, | ||
| ) | ||
| # Seed conversation with system prompt and task instruction | ||
| ctx = ctx.with_message( | ||
| ChatMessage(role=MessageRole.SYSTEM, content=system_prompt.content), | ||
| ) | ||
| ctx = ctx.with_message( | ||
| ChatMessage( | ||
| role=MessageRole.USER, | ||
| content=_format_task_instruction(task), | ||
| ), | ||
| ) |
There was a problem hiding this comment.
The AgentEngine constructs prompts by directly concatenating fields from the Task and AgentIdentity objects without any sanitization or escaping. Specifically, in _prepare_context, the task and identity objects are passed to build_system_prompt to create the system prompt, and the task object is passed to _format_task_instruction to create the initial user message. If an attacker can control the fields of the Task (e.g., title, description, acceptance_criteria) or AgentIdentity (e.g., name, personality.description), they can inject malicious instructions into the prompt. This can lead to manipulation of the LLM's behavior, unauthorized tool execution, or exfiltration of sensitive information.
To remediate this, implement sanitization or escaping for all untrusted inputs included in prompts. Consider using structured prompt formats or LLM-based guardrails to detect and prevent prompt injection. Ensure that the LLM is instructed to treat user-provided task data as untrusted content.
| from ai_company.core.types import NotBlankStr # noqa: TC001 | ||
| from ai_company.engine.loop_protocol import ( | ||
| ExecutionResult, | ||
| TerminationReason, | ||
| ) | ||
| from ai_company.engine.prompt import SystemPrompt # noqa: TC001 |
There was a problem hiding this comment.
For consistency with other files in this PR (e.g., agent_engine.py), type-only imports should be placed within a TYPE_CHECKING block. This avoids using noqa suppressions and is a standard practice for handling circular dependencies and improving startup time. You would also need to update the type hints in the class to be string forward references (e.g., agent_id: "NotBlankStr").
| import pytest | ||
|
|
||
| from ai_company.budget.tracker import CostTracker | ||
| from ai_company.core.agent import AgentIdentity # noqa: TC001 |
There was a problem hiding this comment.
For consistency with other files in this PR (e.g., agent_engine.py), type-only imports should be placed within a TYPE_CHECKING block. This avoids using noqa suppressions. You would also need to update the type hints in function signatures to be string forward references (e.g., sample_agent_with_personality: "AgentIdentity").
There was a problem hiding this comment.
Pull request overview
This PR implements AgentEngine, the top-level orchestrator that ties together prompt construction, execution context, execution loop, tool invocation, and budget tracking. It also introduces AgentRunResult, a frozen Pydantic model wrapping ExecutionResult with engine-level metadata.
Changes:
AgentEngine+AgentRunResultimplementation with 32 unit tests covering happy paths, error handling, budget checking, tool integration, and computed fields- 10 new observability event constants under
EXECUTION_ENGINE_*namespace - Documentation updates in
DESIGN_SPEC.md(§6.5 AgentEngine orchestrator) andCLAUDE.md
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/ai_company/engine/agent_engine.py |
Core orchestrator: validation, prompt building, context seeding, loop delegation, cost recording, error wrapping |
src/ai_company/engine/run_result.py |
Frozen AgentRunResult model with computed fields delegating to ExecutionResult |
src/ai_company/observability/events/execution.py |
10 new EXECUTION_ENGINE_* event constants |
src/ai_company/engine/__init__.py |
Adds AgentEngine and AgentRunResult to public API exports |
tests/unit/engine/test_agent_engine.py |
32 unit tests covering orchestration, error handling, budget, cost recording, immutability |
DESIGN_SPEC.md |
AgentEngine pipeline description and AgentRunResult field documentation in §6.5 |
CLAUDE.md |
Updates engine/ package description |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| max_turns=max_turns, | ||
| start=start, | ||
| ) | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
Both except clauses use except MemoryError, RecursionError: which is invalid Python 3 syntax. In Python 3, this is a SyntaxError. The correct syntax for catching multiple exception types is except (MemoryError, RecursionError):. Without the parentheses, this code will fail to parse entirely and cannot be imported or executed.
The correct pattern, consistent with how it is used elsewhere in the codebase (e.g., src/ai_company/tools/invoker.py), is except (MemoryError, RecursionError):.
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
| timestamp=datetime.now(UTC), | ||
| ) | ||
| await self._cost_tracker.record(record) | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
Same invalid Python 3 syntax as at line 138: except MemoryError, RecursionError: is a SyntaxError in Python 3. It must be written as except (MemoryError, RecursionError): to correctly catch both MemoryError and RecursionError and re-raise them.
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
| EXECUTION_ENGINE_CREATED: Final[str] = "execution.engine.created" | ||
| EXECUTION_ENGINE_START: Final[str] = "execution.engine.start" | ||
| EXECUTION_ENGINE_PROMPT_BUILT: Final[str] = "execution.engine.prompt_built" | ||
| EXECUTION_ENGINE_COMPLETE: Final[str] = "execution.engine.complete" | ||
| EXECUTION_ENGINE_ERROR: Final[str] = "execution.engine.error" | ||
| EXECUTION_ENGINE_INVALID_INPUT: Final[str] = "execution.engine.invalid_input" | ||
| EXECUTION_ENGINE_TASK_TRANSITION: Final[str] = "execution.engine.task_transition" | ||
| EXECUTION_ENGINE_COST_RECORDED: Final[str] = "execution.engine.cost_recorded" | ||
| EXECUTION_ENGINE_COST_SKIPPED: Final[str] = "execution.engine.cost_skipped" | ||
| EXECUTION_ENGINE_COST_FAILED: Final[str] = "execution.engine.cost_failed" |
There was a problem hiding this comment.
The PR description states "4 new execution event constants for debug observability (EXECUTION_ENGINE_CREATED, EXECUTION_ENGINE_PROMPT_BUILT, EXECUTION_ENGINE_COST_SKIPPED, EXECUTION_ENGINE_COST_FAILED)", but the diff actually adds 10 new constants: EXECUTION_ENGINE_CREATED, EXECUTION_ENGINE_START, EXECUTION_ENGINE_PROMPT_BUILT, EXECUTION_ENGINE_COMPLETE, EXECUTION_ENGINE_ERROR, EXECUTION_ENGINE_INVALID_INPUT, EXECUTION_ENGINE_TASK_TRANSITION, EXECUTION_ENGINE_COST_RECORDED, EXECUTION_ENGINE_COST_SKIPPED, and EXECUTION_ENGINE_COST_FAILED. The PR description should be updated to reflect the actual number of new constants added.
| parts.append("") | ||
| parts.append(f"**Budget limit:** ${task.budget_limit:.2f} USD") | ||
|
|
||
| if task.deadline: |
There was a problem hiding this comment.
The deadline is appended without a blank-line separator before it, while budget_limit inserts an empty string (parts.append("")) before it for spacing. When a deadline exists but budget_limit is 0 (or negative), the deadline immediately follows the description or last acceptance criterion with no blank line, resulting in inconsistent markdown formatting compared to the budget block.
| if task.deadline: | |
| if task.deadline: | |
| if task.budget_limit <= 0: | |
| parts.append("") |
| task_id: Task identifier, if task-bound. | ||
| """ | ||
|
|
||
| model_config = ConfigDict(frozen=True) |
There was a problem hiding this comment.
The codebase consistently includes a test_frozen test for every frozen Pydantic model (e.g., TurnRecord, ExecutionResult, SystemPrompt, AgentContext, TaskExecution, etc. all have dedicated test_frozen cases). The AgentRunResult model uses model_config = ConfigDict(frozen=True) but test_agent_engine.py does not include a corresponding test_frozen test. A test that attempts to reassign a field and expects a ValidationError should be added, consistent with the convention established throughout the engine test suite.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 2155-2156: Section 15.3's "planned-only" note conflicts with the
file tree because agent_engine.py is listed with a milestone tag (M3) but the PR
actually adds that file; update the doc so readers can tell which modules exist
now by either removing milestone tags from files that are already shipped
(remove "(M3)" from agent_engine.py and any other added files like run_result.py
if shipped) or rewrite the note to state that milestone tags indicate original
plan (not current presence) and explicitly mark planned-only files (e.g., add
"(planned)" or a separate subsection). Also update any cross-references
mentioned (CLAUDE.md, README.md) to reflect the corrected status of
agent_engine.py and related engine modules.
In `@src/ai_company/engine/agent_engine.py`:
- Around line 162-212: After successfully calling _prepare_context (which
returns ctx and system_prompt) modify _execute so any exceptions raised after
that point are caught and forwarded to _handle_fatal_error with the original ctx
and system_prompt instead of letting _handle_fatal_error rebuild a blank
AgentContext; specifically wrap the post-_prepare_context work (including
creation of budget_checker/tool_invoker, the await self._loop.execute call,
_record_costs, and construction of AgentRunResult) in a try/except that calls
_handle_fatal_error(exception, ctx=ctx, system_prompt=system_prompt, ...) to
preserve the seeded conversation, max_turns and ASSIGNED->IN_PROGRESS state when
returning the AgentRunResult.
- Around line 94-148: Validate the incoming max_turns at the start of the method
(before the try/except that wraps _execute) so invalid caller input (e.g.,
max_turns <= 0) is rejected at the boundary rather than being converted into a
fatal engine error; add a short check right after agent_id/task_id assignment
that raises a clear user-level exception (e.g., ValueError with a concise
message) when max_turns is not positive, keeping references to the existing
symbols _execute, _handle_fatal_error, and AgentContext.from_identity to locate
the related logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 961cac72-615d-4e70-84bd-02851daef231
📒 Files selected for processing (7)
CLAUDE.mdDESIGN_SPEC.mdsrc/ai_company/engine/__init__.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/run_result.pysrc/ai_company/observability/events/execution.pytests/unit/engine/test_agent_engine.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:(no parentheses) for exception handling—PEP 758 except syntax. Ruff enforces this on Python 3.14
All public functions require type hints. Enforce mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones. Usecopy.deepcopy()at construction andMappingProxyTypewrapping for read-only enforcement on non-Pydantic internal collections (registries,BaseTool)
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention andcopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (withmodel_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict). Use@computed_fieldfor derived values instead of storing + validating redundant fields (e.g.TokenUsage.total_tokens)
UseNotBlankStrfromcore.typesfor all identifier/name fields (including optionalNotBlankStr | Noneand tupletuple[NotBlankStr, ...]variants) instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task
Maximum line length is 88 characters (enforced by ruff)
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
src/ai_company/engine/run_result.pysrc/ai_company/observability/events/execution.pysrc/ai_company/engine/__init__.pytests/unit/engine/test_agent_engine.pysrc/ai_company/engine/agent_engine.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST havefrom ai_company.observability import get_loggerthenlogger = get_logger(__name__)
Never useimport logging,logging.getLogger(), orprint()in application code
Logger variable name must always belogger(not_logger, notlog)
Use event name constants from domain-specific modules underai_company.observability.events(e.g.PROVIDER_CALL_STARTfromevents.provider,BUDGET_RECORD_ADDEDfromevents.budget). Import directly:from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured logging with kwargs:logger.info(EVENT, key=value). Never use format strings likelogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging
SetRetryConfigandRateLimiterConfigper-provider inProviderConfig
Files:
src/ai_company/engine/run_result.pysrc/ai_company/observability/events/execution.pysrc/ai_company/engine/__init__.pysrc/ai_company/engine/agent_engine.py
src/ai_company/{engine,providers}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
RetryExhaustedErrorsignals that all retries failed—the engine layer catches this to trigger fallback chains
Files:
src/ai_company/engine/run_result.pysrc/ai_company/engine/__init__.pysrc/ai_company/engine/agent_engine.py
{src/ai_company,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:
example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2).claude/skill/agent files, (3) third-party import paths/module names. Tests must usetest-provider,test-small-001, etc.
Files:
src/ai_company/engine/run_result.pysrc/ai_company/observability/events/execution.pysrc/ai_company/engine/__init__.pytests/unit/engine/test_agent_engine.pysrc/ai_company/engine/agent_engine.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowfor test categorization
Maintain 80% minimum code coverage (enforced in CI)
Files:
tests/unit/engine/test_agent_engine.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Use base class functionality and leverage reusable components when possible in agent implementations
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
src/ai_company/observability/events/execution.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/{engine,providers}/**/*.py : `RetryExhaustedError` signals that all retries failed—the engine layer catches this to trigger fallback chains
Applied to files:
src/ai_company/engine/__init__.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : Agents must extend `BaseAgent`, use retry logic, and implement configurable timeout via settings.
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state
Applied to files:
tests/unit/engine/test_agent_engine.py
| │ │ ├── run_result.py # AgentRunResult outcome model | ||
| │ │ ├── agent_engine.py # Agent execution engine (M3) |
There was a problem hiding this comment.
Section 15.3 now contradicts itself.
The note above this tree says milestone-tagged files are planned-only, but agent_engine.py is still tagged (M3) even though this PR adds it. Please either drop milestone tags from shipped files or rewrite the note so readers can tell which engine modules actually exist.
Based on learnings, "When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@DESIGN_SPEC.md` around lines 2155 - 2156, Section 15.3's "planned-only" note
conflicts with the file tree because agent_engine.py is listed with a milestone
tag (M3) but the PR actually adds that file; update the doc so readers can tell
which modules exist now by either removing milestone tags from files that are
already shipped (remove "(M3)" from agent_engine.py and any other added files
like run_result.py if shipped) or rewrite the note to state that milestone tags
indicate original plan (not current presence) and explicitly mark planned-only
files (e.g., add "(planned)" or a separate subsection). Also update any
cross-references mentioned (CLAUDE.md, README.md) to reflect the corrected
status of agent_engine.py and related engine modules.
| max_turns: int = DEFAULT_MAX_TURNS, | ||
| ) -> AgentRunResult: | ||
| """Execute an agent on a task. | ||
|
|
||
| Args: | ||
| identity: Frozen agent identity card. | ||
| task: Task to execute (must be ASSIGNED or IN_PROGRESS). | ||
| completion_config: Optional per-run LLM config override. | ||
| max_turns: Maximum LLM turns allowed. | ||
|
|
||
| Returns: | ||
| ``AgentRunResult`` with execution outcome and metadata. | ||
|
|
||
| Raises: | ||
| ExecutionStateError: If the agent is not ACTIVE or the task | ||
| is not ASSIGNED/IN_PROGRESS. | ||
| MemoryError: Re-raised unconditionally (non-recoverable). | ||
| RecursionError: Re-raised unconditionally (non-recoverable). | ||
| """ | ||
| agent_id = str(identity.id) | ||
| task_id = task.id | ||
|
|
||
| self._validate_agent(identity, agent_id) | ||
| self._validate_task(task, agent_id, task_id) | ||
|
|
||
| logger.info( | ||
| EXECUTION_ENGINE_START, | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| loop_type=self._loop.get_loop_type(), | ||
| max_turns=max_turns, | ||
| ) | ||
|
|
||
| start = time.monotonic() | ||
| try: | ||
| return await self._execute( | ||
| identity=identity, | ||
| task=task, | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| completion_config=completion_config, | ||
| max_turns=max_turns, | ||
| start=start, | ||
| ) | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception as exc: | ||
| return self._handle_fatal_error( | ||
| exc=exc, | ||
| identity=identity, | ||
| task=task, | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| duration_seconds=time.monotonic() - start, | ||
| ) |
There was a problem hiding this comment.
Validate max_turns before entering the generic error wrapper.
Line 94 exposes max_turns, but the first real validation happens later inside AgentContext.from_identity(). Today max_turns <= 0 gets caught by the broad except Exception path and comes back as TerminationReason.ERROR, which turns bad caller input into a runtime engine failure instead of rejecting it at the boundary.
Suggested fix
agent_id = str(identity.id)
task_id = task.id
+ if max_turns <= 0:
+ msg = f"max_turns must be greater than 0, got {max_turns}"
+ logger.warning(
+ EXECUTION_ENGINE_INVALID_INPUT,
+ agent_id=agent_id,
+ task_id=task_id,
+ reason=msg,
+ )
+ raise ValueError(msg)
+
self._validate_agent(identity, agent_id)
self._validate_task(task, agent_id, task_id)As per coding guidelines, "Validate at system boundaries (user input, external APIs, config files)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/engine/agent_engine.py` around lines 94 - 148, Validate the
incoming max_turns at the start of the method (before the try/except that wraps
_execute) so invalid caller input (e.g., max_turns <= 0) is rejected at the
boundary rather than being converted into a fatal engine error; add a short
check right after agent_id/task_id assignment that raises a clear user-level
exception (e.g., ValueError with a concise message) when max_turns is not
positive, keeping references to the existing symbols _execute,
_handle_fatal_error, and AgentContext.from_identity to locate the related logic.
| ctx, system_prompt = self._prepare_context( | ||
| identity=identity, | ||
| task=task, | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| max_turns=max_turns, | ||
| ) | ||
| budget_checker = _make_budget_checker(task) | ||
| tool_invoker = self._make_tool_invoker() | ||
|
|
||
| logger.debug( | ||
| EXECUTION_ENGINE_PROMPT_BUILT, | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| estimated_tokens=system_prompt.estimated_tokens, | ||
| ) | ||
|
|
||
| execution_result = await self._loop.execute( | ||
| context=ctx, | ||
| provider=self._provider, | ||
| tool_invoker=tool_invoker, | ||
| budget_checker=budget_checker, | ||
| completion_config=completion_config, | ||
| ) | ||
| duration = time.monotonic() - start | ||
|
|
||
| await self._record_costs( | ||
| execution_result, | ||
| identity, | ||
| agent_id, | ||
| task_id, | ||
| ) | ||
|
|
||
| result = AgentRunResult( | ||
| execution_result=execution_result, | ||
| system_prompt=system_prompt, | ||
| duration_seconds=duration, | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| ) | ||
| logger.info( | ||
| EXECUTION_ENGINE_COMPLETE, | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| termination_reason=result.termination_reason.value, | ||
| total_turns=result.total_turns, | ||
| total_tokens=execution_result.context.accumulated_cost.total_tokens, | ||
| duration_seconds=duration, | ||
| cost_usd=result.total_cost_usd, | ||
| ) | ||
| return result |
There was a problem hiding this comment.
Preserve the prepared context/prompt on fatal errors.
Once _prepare_context() succeeds, any later exception still falls through to _handle_fatal_error(), which rebuilds a blank AgentContext and synthetic prompt at Lines 400-412. That drops the seeded conversation, the caller-supplied max_turns, and the ASSIGNED -> IN_PROGRESS transition, so the returned AgentRunResult can report misleading state after execution has already started.
Catch the post-setup failure path inside _execute() and thread the prepared ctx / system_prompt into _handle_fatal_error() when they already exist.
Also applies to: 381-419
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/engine/agent_engine.py` around lines 162 - 212, After
successfully calling _prepare_context (which returns ctx and system_prompt)
modify _execute so any exceptions raised after that point are caught and
forwarded to _handle_fatal_error with the original ctx and system_prompt instead
of letting _handle_fatal_error rebuild a blank AgentContext; specifically wrap
the post-_prepare_context work (including creation of
budget_checker/tool_invoker, the await self._loop.execute call, _record_costs,
and construction of AgentRunResult) in a try/except that calls
_handle_fatal_error(exception, ctx=ctx, system_prompt=system_prompt, ...) to
preserve the seeded conversation, max_turns and ASSIGNED->IN_PROGRESS state when
returning the AgentRunResult.
Greptile SummaryThis PR introduces Key finding:
Confidence Score: 4/5
Last reviewed commit: b033a75 |
| if task.deadline: | ||
| parts.append(f"**Deadline:** {task.deadline}") |
There was a problem hiding this comment.
The deadline section is missing the blank-line separator that the budget section includes. When both budget and deadline are present, they will render on consecutive lines with no visual separation. When only deadline is present, it butts directly against the preceding section.
| if task.deadline: | |
| parts.append(f"**Deadline:** {task.deadline}") | |
| if task.deadline: | |
| parts.append("") | |
| parts.append(f"**Deadline:** {task.deadline}") |
This ensures consistent Markdown formatting across all task instruction blocks.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/agent_engine.py
Line: 435-436
Comment:
The deadline section is missing the blank-line separator that the budget section includes. When both budget and deadline are present, they will render on consecutive lines with no visual separation. When only deadline is present, it butts directly against the preceding section.
```suggestion
if task.deadline:
parts.append("")
parts.append(f"**Deadline:** {task.deadline}")
```
This ensures consistent Markdown formatting across all task instruction blocks.
How can I resolve this? If you propose a fix, please make it concise.
Summary
AgentEngine— top-level orchestrator tying together prompt construction, execution context, execution loop, tool invocation, and budget tracking into a singlerun()entry pointAgentRunResult— frozen Pydantic model wrappingExecutionResultwith engine-level metadata (system prompt, duration, agent/task IDs, computed convenience fields)EXECUTION_ENGINE_CREATED,EXECUTION_ENGINE_PROMPT_BUILT,EXECUTION_ENGINE_COST_SKIPPED,EXECUTION_ENGINE_COST_FAILED)Key design decisions
MemoryErrorandRecursionErrorpropagate unconditionally; all other exceptions are caught and wrapped in an error resultDEFAULT_MAX_TURNSimported fromcontext.py(single source of truth, no duplication)_prepare_context()extracted to keep all methods under the 50-line conventionduration_secondscaptured inrun()and preserved even in error resultsDocumentation updates
run_result.pyto project structureengine/package descriptionCloses #11
Test plan
uv run ruff check src/ tests/— all checks passuv run mypy src/ tests/— no issues in 218 filesuv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80— 1937 passed, 95.38% coverageagent_engine.pyat 99% coverage,run_result.pyat 100%Review coverage
Pre-reviewed by 9 agents: code-reviewer, python-reviewer, test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency. 26 findings total, 23 implemented (4 Critical, 6 Major, 10 Medium, 3 Minor) + 3 suggestions. 3 items deferred as future M4 concerns (exception object preservation, broad catch narrowing, fresh context in error path).