refactor: pre-PR review improvements for ExecutionLoop + ReAct loop (#124)#141
Conversation
Implement the ExecutionLoop protocol and ReAct execution loop that wires together the provider, tool system, and agent context into a think-act-observe loop. This is the critical-path foundation for the agent engine (#11).

New files:
- engine/loop_protocol.py: ExecutionLoop protocol, ExecutionResult, TurnRecord, TerminationReason, BudgetChecker
- engine/react_loop.py: ReactLoop implementation

Modified:
- engine/errors.py: BudgetExhaustedError, LoopExecutionError
- observability/events/execution.py: 7 loop event constants
- tools/invoker.py: registry property (read-only)
- engine/__init__.py: re-export new public API
- CLAUDE.md: add dependency-order planning, GH query, PR ref rules

Tests: 27 new tests (100% coverage on new code)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…124)

- Convert total_tool_calls and total_tokens to @computed_field
- Add model_validator for error_message/termination_reason consistency
- Decompose ReactLoop.execute() into focused helper methods
- Add comprehensive error handling (provider, budget, tool exceptions)
- Use logger.exception in except blocks (TRY400 compliance)
- Use PEP 758 except syntax where compatible with ruff
- Update DESIGN_SPEC.md §6.5 with ExecutionLoop protocol docs
- Update §15.3 project structure with loop_protocol.py and react_loop.py
- Add tests for new error paths and computed fields

Pre-reviewed by 10 agents, 16 findings addressed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.

⚠️ Review failed: the pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration: Organization UI · Review profile: ASSERTIVE · Plan: Pro · Run ID:
📒 Files selected for processing (5)
📝 Walkthrough — Summary by CodeRabbit
Walkthrough

Adds an ExecutionLoop protocol and a ReAct implementation, supporting structured execution results, termination reasons, turn records, budget checking, observability events, new engine errors, a ToolInvoker registry accessor, and extensive unit tests validating loop behavior.
Sequence Diagram(s): (mermaid source not captured)

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the agent's execution loop, moving towards a more modular and robust design by introducing a formal `ExecutionLoop` protocol.
Code Review
This pull request introduces a well-designed ExecutionLoop protocol and a robust ReactLoop implementation, which is a significant architectural improvement for agent execution. The code is well-structured with a clear separation of concerns, and the test coverage is comprehensive. The decomposition of the main execution logic into smaller helper methods is excellent for readability and maintainability. My primary concern is a critical syntax issue in the exception handling within the new react_loop.py file, which seems to originate from a misleading guideline in the project's documentation. This issue will prevent the code from running in a standard Python 3 environment.
```python
except MemoryError, RecursionError:
    raise
```
This except block uses Python 2 syntax for catching multiple exceptions. In Python 3, this will raise a SyntaxError. Multiple exception types must be enclosed in parentheses as a tuple.
While I notice the CLAUDE.md file recommends this comma-separated syntax, that guideline appears to be based on a misunderstanding. The associated PEP 758 was rejected, and this syntax is not valid in any version of Python 3. To ensure the code is runnable in a standard Python 3 environment, this should be corrected.
```python
except (MemoryError, RecursionError):
    raise
```

```python
except MemoryError, RecursionError:
    raise
```
As with the other except block in this file, this uses Python 2 syntax. This will cause a SyntaxError in Python 3. The exceptions should be grouped in a tuple to ensure the code is valid and runnable.
```diff
-except MemoryError, RecursionError:
-    raise
+except (MemoryError, RecursionError):
+    raise
```
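For interpreters that lack the unparenthesized form, the parenthesized tuple is the spelling that runs everywhere. A tiny illustrative sketch (the `reraise_fatal` helper is hypothetical, not code from this PR):

```python
def reraise_fatal(exc: BaseException) -> str:
    """Re-raise fatal exception types; report anything else."""
    try:
        raise exc
    except (MemoryError, RecursionError):
        # The tuple form is valid on every Python 3.x release.
        raise
    except Exception as caught:
        return f"handled: {type(caught).__name__}"
```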
Greptile Summary

This PR refactors the execution loop layer by introducing a formal `ExecutionLoop` protocol.
Confidence Score: 3/5
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([execute called]) --> B{has_turns_remaining?}
    B -- No --> Z([return MAX_TURNS])
    B -- Yes --> C[_check_budget]
    C -- exhausted --> Y([return BUDGET_EXHAUSTED])
    C -- checker raises --> X([return ERROR])
    C -- ok --> D[_call_provider]
    D -- provider raises --> W([return ERROR])
    D -- ok --> E[append TurnRecord]
    E --> F[_check_response_errors]
    F -- CONTENT_FILTER / ERROR --> V([return ERROR\n⚠️ task_execution cost NOT updated])
    F -- ok --> G[ctx.with_turn_completed]
    G --> H{tool_calls empty?}
    H -- Yes --> I[_handle_completion]
    I -- TOOL_USE + no calls --> U([return ERROR])
    I -- STOP / MAX_TOKENS --> T([return COMPLETED])
    H -- No --> J[_execute_tool_calls]
    J -- no invoker --> S([return ERROR])
    J -- invoke_all raises --> R([return ERROR])
    J -- ok --> K[append tool result messages to ctx]
    K --> B
```

Last reviewed commit: 6e739c7
Pull request overview
Adds the ExecutionLoop protocol and a ReAct-based loop implementation with structured observability, clearer engine-layer error types, updated design docs, and comprehensive unit coverage for loop termination/error paths.
Changes:
- Introduce `ExecutionLoop` protocol + frozen result models (`ExecutionResult`, `TurnRecord`, `TerminationReason`) with computed fields and cross-field validation.
- Implement `ReactLoop` with decomposed helpers, explicit handling for provider/tool/budget failures, and new execution-loop observability events.
- Add/extend unit tests and documentation to cover new loop behaviors and clarify engine vs loop error semantics.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| `src/ai_company/engine/loop_protocol.py` | Defines protocol + result models with computed fields and validation. |
| `src/ai_company/engine/react_loop.py` | Implements the ReAct execution loop with structured logging and robust error handling. |
| `src/ai_company/observability/events/execution.py` | Adds event constants for loop lifecycle/turn/tool logging. |
| `src/ai_company/tools/invoker.py` | Exposes `ToolInvoker.registry` for deriving tool definitions. |
| `src/ai_company/engine/errors.py` | Adds engine-layer exceptions to mirror loop termination reasons. |
| `src/ai_company/engine/__init__.py` | Re-exports new loop APIs and related errors. |
| `tests/unit/engine/conftest.py` | Adds a `MockCompletionProvider` test double + fixture. |
| `tests/unit/engine/test_react_loop.py` | Adds comprehensive `ReactLoop` behavior tests across success and failure modes. |
| `tests/unit/engine/test_loop_protocol.py` | Tests protocol models, computed fields, and validation rules. |
| `tests/unit/engine/test_errors.py` | Extends error hierarchy tests for new exceptions. |
| `DESIGN_SPEC.md` | Documents the ExecutionLoop protocol and updates diagrams/project structure. |
| `CLAUDE.md` | Updates contributor workflow guidance. |
```python
try:
    exhausted = budget_checker(ctx)
except Exception as exc:
    error_msg = f"Budget checker failed: {type(exc).__name__}: {exc}"
    logger.exception(
        EXECUTION_LOOP_ERROR,
        execution_id=ctx.execution_id,
        turn=ctx.turn_count,
        error=error_msg,
    )
    return _build_result(
        ctx,
        TerminationReason.ERROR,
        turns,
        error_message=error_msg,
    )
```
In _check_budget, MemoryError/RecursionError are currently caught by the broad except Exception and converted into an ExecutionResult. Elsewhere in this loop (provider call + tool execution) these exceptions are treated as non-recoverable and re-raised. Consider adding an explicit except MemoryError, RecursionError: raise here as well to keep fatal error handling consistent and avoid masking OOM/recursion failures.
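The consistency this comment asks for can be sketched as follows; the `run_budget_check` helper is hypothetical (the real `_check_budget` builds an `ExecutionResult` rather than a tuple), and the portable tuple form is used in place of the project's PEP 758 spelling:

```python
def run_budget_check(checker, ctx):
    """Sketch: fatal errors propagate, everything else becomes an error outcome."""
    try:
        return ("ok", checker(ctx))
    except (MemoryError, RecursionError):
        # Match the provider/tool paths: never convert OOM or recursion
        # failures into a recoverable result.
        raise
    except Exception as exc:
        return ("error", f"Budget checker failed: {type(exc).__name__}: {exc}")
```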
tests/unit/engine/test_react_loop.py
```python
@pytest.mark.unit
class TestReactLoopToolExecutionException:
    """Tool invoker raising fatal error during invoke_all()."""
```
The class docstring says the tool invoker raises a fatal error during invoke_all(), but this test actually exercises a tool raising RuntimeError, which ToolInvoker.invoke() converts into a ToolResult(is_error=True) (i.e., not fatal). Updating the docstring to match the behavior would make the test intent clearer.
```diff
-    """Tool invoker raising fatal error during invoke_all()."""
+    """Tool execution errors are captured by ToolInvoker and do not crash the loop."""
```
tests/unit/engine/test_react_loop.py
```python
assert result.termination_reason in (
    TerminationReason.ERROR,
    TerminationReason.COMPLETED,
)
```
This assertion allows both ERROR and COMPLETED, which makes the test too permissive and reduces its ability to catch regressions. Given the current setup (provider has only one response), the loop should deterministically return TerminationReason.ERROR on the next provider call due to IndexError (no more responses); it would be better to assert the exact termination reason and, ideally, key parts of the error message.
```diff
-assert result.termination_reason in (
-    TerminationReason.ERROR,
-    TerminationReason.COMPLETED,
-)
+assert result.termination_reason == TerminationReason.ERROR
```
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 811-814: The spec removes a distinct blocked termination state and
folds it into ERROR which loses important semantics; update the
TerminationReason enum to include BLOCKED as a first-class value (e.g., add
`BLOCKED` alongside `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `ERROR`) and
adjust ExecutionResult rules so that an error message is required only when
TerminationReason == `ERROR`, while `BLOCKED` is documented as a
resumable/parked state (not an outright failure) and any references to
termination handling or consumers of ExecutionResult/TurnRecord reflect this
distinction; if BLOCKED is intentionally deferred, add a clear note in the spec
stating it will be introduced in M3 rather than collapsing it into ERROR.
In `@src/ai_company/tools/invoker.py`:
- Around line 87-90: The current registry property on ToolInvoker exposes the
live ToolRegistry (_registry), widening the public surface and allowing callers
to bypass validation via registry.get(...).execute(...); change the registry
property to instead return only provider-facing tool definitions (e.g., a
deep-copied mapping/list of ToolDefinition or ToolSchema objects) that contain
schemas/metadata needed by ReactLoop and do not expose ToolRegistry methods;
ensure the property returns immutable or copied schema objects (not the original
_registry or Tool instances) so callers cannot call execute or mutate the
underlying registry.
In `@tests/unit/engine/test_react_loop.py`:
- Around line 554-570: The test only verifies completion succeeded but doesn't
confirm the custom CompletionConfig was passed through to the provider; update
test_custom_completion_config to assert the MockCompletionProvider recorded the
request with the override. After calling ReactLoop.execute, inspect the
provider's request log (e.g., provider.request_log or provider.requests) and
assert the first request's completion config or fields (temperature and
max_tokens) match the passed custom_config, ensuring ReactLoop.execute actually
forwarded the override to the provider.
- Around line 667-687: The test incorrectly treats FinishReason.MAX_TOKENS as a
successful completion; update the test (test_max_tokens_returns_completed) to
assert that a MAX_TOKENS finish is NOT mapped to TerminationReason.COMPLETED
(e.g., assert result.termination_reason != TerminationReason.COMPLETED and/or
assert it equals the non-success code your system uses such as
TerminationReason.INTERRUPTED/INCOMPLETE), and verify ReactLoop.execute’s
mapping of CompletionResponse.finish_reason (FinishReason.MAX_TOKENS) treats it
as a non-success path instead of producing TerminationReason.COMPLETED.
- Around line 577-597: The test currently expects the loop to return a generic
TerminationReason.ERROR and mask the provider's exception; instead change it to
assert the provider exception propagates: update
test_provider_exception_returns_error_result to call loop.execute with the
_FailingProvider and use pytest.raises(ConnectionError) (or assert the raised
exception is ConnectionError) around the await, remove the post-call assertions
about termination_reason/error_message, and ensure the failing provider class
and its complete method remain the same so the ConnectionError is what bubbles
up to the engine layer.
- Around line 620-660: The test test_tool_exception_returns_error_result is
non-deterministic because the mock provider only returns one response; add a
second response when constructing provider (use mock_provider_factory([... ,
_tool_use_response(...)])) so the ReactLoop actually continues past the tool
error, and then change the final assertion to require a single deterministic
outcome (e.g., assert result.termination_reason == TerminationReason.COMPLETED).
Update only the provider creation and the assertion; leave _ExplodingTool,
ToolInvoker, and ReactLoop usage unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1463bb3e-80fb-4849-a07a-1c27ef9bb915
📒 Files selected for processing (12)
- CLAUDE.md
- DESIGN_SPEC.md
- src/ai_company/engine/__init__.py
- src/ai_company/engine/errors.py
- src/ai_company/engine/loop_protocol.py
- src/ai_company/engine/react_loop.py
- src/ai_company/observability/events/execution.py
- src/ai_company/tools/invoker.py
- tests/unit/engine/conftest.py
- tests/unit/engine/test_errors.py
- tests/unit/engine/test_loop_protocol.py
- tests/unit/engine/test_react_loop.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
- Do not use `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations
- Use `except A, B:` syntax (no parentheses) for exception handling per PEP 758 — ruff enforces this on Python 3.14
- Add type hints to all public functions; mypy strict mode must pass
- Use Google-style docstrings on all public classes and functions; enforced by ruff D rules
- Create new objects for immutability; never mutate existing ones. For non-Pydantic internal collections, use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement
- For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` for field reassignment prevention and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
- Use frozen Pydantic models for config and identity; use separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves (e.g. agent execution state, task progress). Never mix static config fields with mutable runtime fields
- Use Pydantic v2 with adopted conventions: use `@computed_field` for derived values instead of storing redundant fields (e.g. `TokenUsage.total_tokens`); use `NotBlankStr` for all identifier/name fields including optional and tuple variants instead of manual whitespace validators
- Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`
- Keep functions under 50 lines and files under 800 lines
- Handle errors explicitly; never silently swallow exceptions
- Validate at system boundaries (user input, external APIs, config files)
- All provider calls go through `BaseCompletionProvider` which applies retry + rate limiting automatically; never implement retry logic in driver subclasses or calling code
- Set `RetryConfig` and `RateLimiterConfig` per-provider in `ProviderConfig`
- Retryable erro...
Files:
- tests/unit/engine/test_errors.py
- src/ai_company/engine/__init__.py
- src/ai_company/observability/events/execution.py
- src/ai_company/engine/errors.py
- src/ai_company/engine/react_loop.py
- tests/unit/engine/test_react_loop.py
- tests/unit/engine/test_loop_protocol.py
- src/ai_company/tools/invoker.py
- src/ai_company/engine/loop_protocol.py
- tests/unit/engine/conftest.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
- Use pytest markers `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow` to categorize tests
- Set test timeout to 30 seconds per test
- In `tests/**/*.py`, use vendor-agnostic naming with `test-provider`, `test-small-001` instead of real vendor model names
Files:
- tests/unit/engine/test_errors.py
- tests/unit/engine/test_react_loop.py
- tests/unit/engine/test_loop_protocol.py
- tests/unit/engine/conftest.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
- Every module with business logic MUST have `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
- Never use `import logging` / `logging.getLogger()` / `print()` in application code
- Always use `logger` as the variable name for loggers, not `_logger` or `log`
- Always use event name constants from domain-specific modules under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`); import directly
- Always use structured logging with `logger.info(EVENT, key=value)` format; never use `logger.info('msg %s', val)`
- All error paths must log at WARNING or ERROR with context before raising
- All state transitions must log at INFO level
- Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
- Pure data models, enums, and re-exports do NOT need logging
- NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small` as aliases. Tests must use `test-provider`, `test-small-001`, etc.
Files:
- src/ai_company/engine/__init__.py
- src/ai_company/observability/events/execution.py
- src/ai_company/engine/errors.py
- src/ai_company/engine/react_loop.py
- src/ai_company/tools/invoker.py
- src/ai_company/engine/loop_protocol.py
DESIGN_SPEC.md
📄 CodeRabbit inference engine (CLAUDE.md)
Update `DESIGN_SPEC.md` to reflect new reality when approved deviations occur
Files:
DESIGN_SPEC.md
🧠 Learnings (22)
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to README.md : Update README.md for significant feature changes
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: At every phase of planning and implementation, be critical and actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free); surface improvements as suggestions, not silent changes
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Every implementation plan must be presented to the user for accept/deny before coding starts
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Always read `DESIGN_SPEC.md` before implementing any feature or planning any issue; the design spec is the starting point for architecture, data models, and behavior
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Never defer work—do not suggest "this can be done later" or "consider for a future PR". Complete all requested changes fully.
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Never defer work. Do not suggest 'this can be done later' or 'consider for a future PR'. Complete all requested changes fully.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Alert the user and explain why if implementation deviates from the spec; user decides whether to proceed or update the spec. Do NOT silently diverge — every deviation needs explicit user approval
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Enable pre-commit hooks: trailing-whitespace, end-of-file-fixer, check-yaml, check-toml, check-json, check-merge-conflict, check-added-large-files, no-commit-to-branch (main), ruff check+format, gitleaks
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: After finishing an issue implementation, create a feature branch (`<type>/<slug>`), commit, and push — do NOT create a PR automatically
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Use commit message format `<type>: <description>` with types: feat, fix, refactor, docs, test, chore, perf, ci; enforced by commitizen
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Always create a PR for issue work. When implementing changes for a GitHub issue, create a branch and open a pull request. Do not wait to be asked.
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: No bypassing CI - never use `git push --no-verify` or modify test coverage thresholds to make tests pass. If tests fail, fix the actual issue. Pre-push hooks exist to catch problems before they reach CI.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: NEVER create a PR directly — `gh pr create` is blocked by hookify; ALWAYS use `/pre-pr-review` to create PRs
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions
Applied to files:
src/ai_company/engine/__init__.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state
Applied to files:
src/ai_company/engine/__init__.py
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Applies to src/**/*.py : Always use event name constants from domain-specific modules under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`); import directly
Applied to files:
src/ai_company/observability/events/execution.py
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Applies to **/*.py : `RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains
Applied to files:
src/ai_company/engine/errors.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state
Applied to files:
- tests/unit/engine/test_react_loop.py
- tests/unit/engine/test_loop_protocol.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings
Applied to files:
tests/unit/engine/conftest.py
🔇 Additional comments (13)
src/ai_company/engine/react_loop.py (5)
1-40: LGTM! Module docstring clearly describes the ReAct pattern, imports are well-organized with TYPE_CHECKING guards, and observability setup follows project conventions with `get_logger(__name__)`.
56-168: LGTM! The execute loop structure is clean and follows the documented think→act→observe pattern. The decomposition into focused helper methods improves readability, and each early-return path correctly propagates ExecutionResult. The immutable context evolution via `ctx.with_turn_completed()` and `ctx.with_message()` aligns with the project's frozen model conventions.
322-389: LGTM! Comprehensive error handling for tool execution: gracefully handles missing tool invoker when the LLM unexpectedly requests tools, and properly catches tool execution failures with full context logging. The context evolution loop using `ctx.with_message()` correctly maintains immutability.
392-439: LGTM! Helper functions are well-focused and correctly handle the data transformations. `_build_result` properly converts the mutable `turns` list to an immutable tuple for the frozen `ExecutionResult` model.

231-248: No issues. The code correctly uses PEP 758 exception syntax for Python 3.14.

src/ai_company/engine/__init__.py (1)
1-61: LGTM! Public API surface cleanly expanded with all new loop protocol types, the ReactLoop implementation, and related error types. The `__all__` list maintains alphabetical ordering and the docstring is updated to reflect the new scope.

tests/unit/engine/conftest.py (2)
176-230: LGTM! The `MockCompletionProvider` test double correctly implements the `CompletionProvider` protocol with matching method signatures. The implementation is appropriately minimal: `complete()` pops pre-configured responses, `stream()` raises `NotImplementedError` (acceptable for tests not exercising streaming), and `get_model_capabilities()` returns valid minimal capabilities using the vendor-agnostic `"test-provider"` name.
233-236: LGTM! The fixture returning the class type (rather than an instance) provides good test flexibility: each test can construct a mock with its specific response sequence.

src/ai_company/engine/loop_protocol.py (5)
src/ai_company/engine/loop_protocol.py (5)
1-21: LGTM! Module structure is clean with appropriate imports. The TYPE_CHECKING guard for forward references (`CompletionConfig`, `CompletionProvider`, `ToolInvoker`) is correctly applied while `AgentContext` is imported directly since it's used in runtime field definitions.
23-29: LGTM! Clear, minimal enum covering the four termination scenarios documented in the PR objectives.
32-63: LGTM! `TurnRecord` is well-designed with appropriate field constraints (`gt=0`, `ge=0`), immutable tuple for `tool_calls_made`, and `@computed_field` for derived `total_tokens`, following project conventions.
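The derived-not-stored pattern can be sketched with a frozen dataclass standing in for the Pydantic model; a plain `@property` plays the role of Pydantic v2's `@computed_field`, and the field names follow the review comment:

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnRecord:
    """Sketch of the per-turn record; the real model is frozen Pydantic."""
    turn: int            # gt=0 in the real model
    input_tokens: int    # ge=0
    output_tokens: int   # ge=0
    tool_calls_made: tuple = ()

    @property
    def total_tokens(self) -> int:
        # Derived on access, never stored, so it cannot drift out of sync.
        return self.input_tokens + self.output_tokens
```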
66-117: LGTM! `ExecutionResult` is well-designed with proper validation. The `@model_validator` correctly enforces the error_message ↔ ERROR invariant. The docstring appropriately warns about `metadata` dict mutation and references the project's deep-copy convention for system boundaries.
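The cross-field rule can be sketched as follows; a `__post_init__` check stands in for the Pydantic `@model_validator`, the enum is trimmed to two values for illustration, and "message present exactly when the reason is ERROR" is one reading of the ↔ invariant:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TerminationReason(str, Enum):
    COMPLETED = "completed"
    ERROR = "error"


@dataclass(frozen=True)
class ExecutionResult:
    """Sketch: error_message is present exactly when the reason is ERROR."""
    termination_reason: TerminationReason
    error_message: Optional[str] = None

    def __post_init__(self) -> None:
        is_error = self.termination_reason is TerminationReason.ERROR
        if is_error != (self.error_message is not None):
            raise ValueError("error_message required iff termination_reason is ERROR")
```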
120-160: LGTM! The `BudgetChecker` type alias is simple and well-documented. The `ExecutionLoop` protocol with `@runtime_checkable` provides a clean contract that `ReactLoop` implements, with comprehensive docstrings explaining the callback semantics and configuration options.
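The structural contract works roughly like this sketch; the signatures are simplified stand-ins, since the real protocol takes richer context and config types:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ExecutionLoop(Protocol):
    """Sketch of the loop contract; real signatures are richer."""

    async def execute(self, ctx: object) -> object: ...


class ReactLoop:
    """Any class with a matching execute method satisfies the protocol."""

    async def execute(self, ctx: object) -> object:
        return ctx
```

`@runtime_checkable` enables `isinstance(ReactLoop(), ExecutionLoop)`; note the check is structural and only verifies method presence, not signatures.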
```
- **`TerminationReason`** — enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `ERROR`
- **`TurnRecord`** — frozen per-turn stats (tokens, cost, tool calls, finish reason)
- **`ExecutionResult`** — frozen outcome with final context, termination reason, turn records, and optional error message (required when reason is `ERROR`)
- **`BudgetChecker`** — callback type `Callable[[AgentContext], bool]` invoked before each LLM call
```
Keep a first-class blocked termination state.
Issue #124 still calls out blocked termination, and later sections in this spec already distinguish parked/resumable work from outright failure. Folding those cases into `ERROR` loses whether the loop failed versus needs external input and can be resumed; if `BLOCKED` is intentionally deferred to M3, call that out explicitly instead of redefining the protocol here.
📝 Suggested spec correction

```diff
-- **`TerminationReason`** — enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `ERROR`
+- **`TerminationReason`** — enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `BLOCKED`, `ERROR`
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@DESIGN_SPEC.md` around lines 811 - 814, The spec removes a distinct blocked
termination state and folds it into ERROR which loses important semantics;
update the TerminationReason enum to include BLOCKED as a first-class value
(e.g., add `BLOCKED` alongside `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`,
`ERROR`) and adjust ExecutionResult rules so that an error message is required
only when TerminationReason == `ERROR`, while `BLOCKED` is documented as a
resumable/parked state (not an outright failure) and any references to
termination handling or consumers of ExecutionResult/TurnRecord reflect this
distinction; if BLOCKED is intentionally deferred, add a clear note in the spec
stating it will be introduced in M3 rather than collapsing it into ERROR.
```python
@property
def registry(self) -> ToolRegistry:
    """Read-only access to the underlying tool registry."""
    return self._registry
```
🧹 Nitpick | 🔵 Trivial
Expose only provider-facing tool definitions here.
`ReactLoop` only needs schemas. Returning the live `ToolRegistry` widens `ToolInvoker`'s public surface and makes it easy for callers to step around the validation/deep-copy boundary via `registry.get(...).execute(...)`.
♻️ Narrow the accessor

```diff
-from ai_company.providers.models import ToolCall, ToolResult
+from ai_company.providers.models import ToolCall, ToolDefinition, ToolResult
 ...
-    @property
-    def registry(self) -> ToolRegistry:
-        """Read-only access to the underlying tool registry."""
-        return self._registry
+    def to_definitions(self) -> tuple[ToolDefinition, ...]:
+        """Return provider-facing tool definitions."""
+        return self._registry.to_definitions()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/tools/invoker.py` around lines 87 - 90, The current registry
property on ToolInvoker exposes the live ToolRegistry (_registry), widening the
public surface and allowing callers to bypass validation via
registry.get(...).execute(...); change the registry property to instead return
only provider-facing tool definitions (e.g., a deep-copied mapping/list of
ToolDefinition or ToolSchema objects) that contain schemas/metadata needed by
ReactLoop and do not expose ToolRegistry methods; ensure the property returns
immutable or copied schema objects (not the original _registry or Tool
instances) so callers cannot call execute or mutate the underlying registry.
```python
async def test_provider_exception_returns_error_result(
    self,
    sample_agent_context: AgentContext,
) -> None:
    ctx = _ctx_with_user_msg(sample_agent_context)

    class _FailingProvider:
        async def complete(self, *_args: Any, **_kwargs: Any) -> None:
            msg = "connection refused"
            raise ConnectionError(msg)

    loop = ReactLoop()
    result = await loop.execute(
        context=ctx,
        provider=_FailingProvider(),  # type: ignore[arg-type]
    )

    assert result.termination_reason == TerminationReason.ERROR
    assert result.error_message is not None
    assert "ConnectionError" in result.error_message
```
Don't lock provider failures into TerminationReason.ERROR.
An error result here hides the original provider exception from the engine layer. That breaks the fallback path for retry exhaustion and similar provider failures, which the engine is supposed to catch directly rather than receiving a generic loop error.
As per coding guidelines, "`RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/test_react_loop.py` around lines 577 - 597, The test
currently expects the loop to return a generic TerminationReason.ERROR and mask
the provider's exception; instead change it to assert the provider exception
propagates: update test_provider_exception_returns_error_result to call
loop.execute with the _FailingProvider and use pytest.raises(ConnectionError)
(or assert the raised exception is ConnectionError) around the await, remove the
post-call assertions about termination_reason/error_message, and ensure the
failing provider class and its complete method remain the same so the
ConnectionError is what bubbles up to the engine layer.
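A stand-alone sketch of the propagation contract the reviewer is asking for, with invented stand-ins for the loop and provider (not the project's real classes):

```python
import asyncio


class FailingProvider:
    async def complete(self) -> str:
        raise ConnectionError("connection refused")


class PropagatingLoop:
    async def execute(self, provider: FailingProvider) -> str:
        # no try/except here: provider failures bubble up so the
        # engine's fallback chain can catch and handle them
        return await provider.complete()


async def main() -> str:
    try:
        return await PropagatingLoop().execute(FailingProvider())
    except ConnectionError as exc:
        return type(exc).__name__


print(asyncio.run(main()))  # → ConnectionError
```

The test would then assert with `pytest.raises(ConnectionError)` around the awaited call instead of inspecting a wrapped error result.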
tests/unit/engine/test_react_loop.py
```python
async def test_tool_exception_returns_error_result(
    self,
    sample_agent_context: AgentContext,
    mock_provider_factory: type[MockCompletionProvider],
) -> None:
    ctx = _ctx_with_user_msg(sample_agent_context)
    provider = mock_provider_factory([_tool_use_response("explode", "tc-1")])

    class _ExplodingTool(BaseTool):
        def __init__(self) -> None:
            super().__init__(
                name="explode",
                description="boom",
            )

        async def execute(
            self,
            *,
            arguments: dict[str, Any],
        ) -> ToolExecutionResult:
            msg = "kaboom"
            raise RuntimeError(msg)

    registry = ToolRegistry([_ExplodingTool()])
    invoker = ToolInvoker(registry)
    loop = ReactLoop()

    result = await loop.execute(
        context=ctx,
        provider=provider,
        tool_invoker=invoker,
    )

    # The tool error is caught by ToolInvoker.invoke and returned
    # as ToolResult(is_error=True), so the loop continues normally.
    # It terminates because the mock has no more responses.
    # This validates the loop doesn't crash on tool errors.
    assert result.termination_reason in (
        TerminationReason.ERROR,
        TerminationReason.COMPLETED,
    )
```
Make the post-tool-error contract deterministic.
Allowing either `ERROR` or `COMPLETED` means this test will keep passing even if the loop starts terminating for the wrong reason; at the moment it's mostly asserting the mock provider's exhaustion behavior. Feed a second provider response and pin the expected outcome after a recoverable tool failure.
💚 Make the expectation explicit

```diff
-    provider = mock_provider_factory([_tool_use_response("explode", "tc-1")])
+    provider = mock_provider_factory(
+        [
+            _tool_use_response("explode", "tc-1"),
+            _stop_response("done after tool error"),
+        ]
+    )
 ...
-    # It terminates because the mock has no more responses.
-    # This validates the loop doesn't crash on tool errors.
-    assert result.termination_reason in (
-        TerminationReason.ERROR,
-        TerminationReason.COMPLETED,
-    )
+    assert result.termination_reason == TerminationReason.COMPLETED
+    assert result.context.conversation[2].tool_result is not None
+    assert result.context.conversation[2].tool_result.is_error is True
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/test_react_loop.py` around lines 620 - 660, The test
test_tool_exception_returns_error_result is non-deterministic because the mock
provider only returns one response; add a second response when constructing
provider (use mock_provider_factory([... , _tool_use_response(...)])) so the
ReactLoop actually continues past the tool error, and then change the final
assertion to require a single deterministic outcome (e.g., assert
result.termination_reason == TerminationReason.COMPLETED). Update only the
provider creation and the assertion; leave _ExplodingTool, ToolInvoker, and
ReactLoop usage unchanged.
```python
async def test_max_tokens_returns_completed(
    self,
    sample_agent_context: AgentContext,
    mock_provider_factory: type[MockCompletionProvider],
) -> None:
    ctx = _ctx_with_user_msg(sample_agent_context)
    response = CompletionResponse(
        content="partial output",
        finish_reason=FinishReason.MAX_TOKENS,
        usage=_usage(),
        model="test-model-001",
    )
    provider = mock_provider_factory([response])
    loop = ReactLoop()

    result = await loop.execute(context=ctx, provider=provider)

    assert result.termination_reason == TerminationReason.COMPLETED
    assert len(result.turns) == 1
    assert result.turns[0].finish_reason == FinishReason.MAX_TOKENS
```
MAX_TOKENS is not a successful completion signal.
A `FinishReason.MAX_TOKENS` response only says the model hit the output ceiling; it says nothing about the task being done. Baking `TerminationReason.COMPLETED` into the test will bless truncated answers as successful executions.
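One possible shape for such a mapping, sketched with invented enum members (`TRUNCATED` is not an existing project name; the real termination state is up to the loop design):

```python
from enum import Enum, auto


class FinishReason(Enum):
    STOP = auto()
    MAX_TOKENS = auto()


class TerminationReason(Enum):
    COMPLETED = auto()
    TRUNCATED = auto()  # hypothetical non-success state for output-limit hits


def map_finish(reason: FinishReason) -> TerminationReason:
    # Hitting the output ceiling says nothing about the task being done,
    # so it must not collapse into the success path.
    if reason is FinishReason.MAX_TOKENS:
        return TerminationReason.TRUNCATED
    return TerminationReason.COMPLETED


print(map_finish(FinishReason.MAX_TOKENS).name)  # → TRUNCATED
print(map_finish(FinishReason.STOP).name)        # → COMPLETED
```

Keeping the mapping explicit makes the success criterion testable in isolation from the full loop.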
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/test_react_loop.py` around lines 667 - 687, The test
incorrectly treats FinishReason.MAX_TOKENS as a successful completion; update
the test (test_max_tokens_returns_completed) to assert that a MAX_TOKENS finish
is NOT mapped to TerminationReason.COMPLETED (e.g., assert
result.termination_reason != TerminationReason.COMPLETED and/or assert it equals
the non-success code your system uses such as
TerminationReason.INTERRUPTED/INCOMPLETE), and verify ReactLoop.execute’s
mapping of CompletionResponse.finish_reason (FinishReason.MAX_TOKENS) treats it
as a non-success path instead of producing TerminationReason.COMPLETED.
Source fixes:
- Use `NotBlankStr` for `TurnRecord.tool_calls_made` (convention consistency)
- Add `MemoryError`/`RecursionError` guard to `_check_budget` (consistency with `_call_provider` and `_execute_tool_calls`)
- Fix cost accounting: error responses now include failing turn's cost in `context.accumulated_cost` via `model_copy`
- Update module, method, and class docstrings for accuracy

Test improvements:
- Tighten overly permissive tool error assertion (ERROR, not ERROR|COMPLETED)
- Add `RecursionError` propagation tests for provider and tool execution
- Add `invoke_all` exception path test with mock `ToolInvoker`
- Add empty `ToolRegistry` test
- Add cost accounting test for error responses
- Add `TurnRecord` field validation boundary tests
- Record and verify `CompletionConfig` forwarding in `MockCompletionProvider`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```python
updated_ctx = ctx.model_copy(
    update={
        "turn_count": ctx.turn_count + 1,
        "accumulated_cost": add_token_usage(
            ctx.accumulated_cost, response.usage
        ),
    },
)
return _build_result(
    updated_ctx,
    TerminationReason.ERROR,
    turns,
    error_message=error_msg,
)
```
**`task_execution` cost skipped on error responses**

`_check_response_errors` manually assembles the context update via `model_copy` but omits the `task_execution.with_cost(usage)` step that `AgentContext.with_turn_completed()` always performs. For task-bound executions, this means the `CONTENT_FILTER` and `ERROR` finish-reason paths never credit the failing turn's token cost to the `TaskExecution`, creating a gap between `result.context.accumulated_cost` (updated) and `result.context.task_execution.accumulated_cost` (stale).
`with_turn_completed` (context.py line 223–224) shows the pattern:

```python
if self.task_execution is not None:
    updates["task_execution"] = self.task_execution.with_cost(usage)
```

This same guard is missing here:
```diff
 updated_ctx = ctx.model_copy(
     update={
         "turn_count": ctx.turn_count + 1,
         "accumulated_cost": add_token_usage(
             ctx.accumulated_cost, response.usage
         ),
+        **(
+            {"task_execution": ctx.task_execution.with_cost(response.usage)}
+            if ctx.task_execution is not None
+            else {}
+        ),
     },
 )
```
None of the existing tests catch this because every test that hits this error path uses the sample_agent_context fixture, which has no task_execution (AgentContext.from_identity without a task= argument). A test exercising CONTENT_FILTER on a task-bound context would expose the discrepancy.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/react_loop.py
Line: 279-292
Comment:
How can I resolve this? If you propose a fix, please make it concise.
Summary
- `loop_protocol.py`: Convert `total_tool_calls` and `total_tokens` to `@computed_field`; add a `model_validator` for `error_message`/`termination_reason` cross-field consistency
- `react_loop.py`: Decompose the monolithic `execute()` into 5 focused helpers (`_check_budget`, `_call_provider`, `_check_response_errors`, `_handle_completion`, `_execute_tool_calls`); add comprehensive error handling for provider failures, budget checker exceptions, and tool execution errors
- `errors.py`: Update docstrings to clarify the relationship between the loop-internal `TerminationReason` and engine-layer exceptions
- `DESIGN_SPEC.md`: Add `ExecutionLoop` protocol interface docs to §6.5, update §15.3 project structure with `loop_protocol.py` and `react_loop.py`, fix diagram termination text ("blocked" → "error")

Closes #124
Test plan
- `uv run ruff check src/ tests/` — all checks passed
- `uv run ruff format src/ tests/` — all formatted
- `uv run mypy src/ tests/` — no issues in 215 files
- `uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80` — 1896 passed, 95.11% coverage

Review coverage
Pre-reviewed by 10 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency). 16 findings consolidated, all 16 implemented.